This article concentrates on one little-known aspect of the wake-on-LAN portion of 1E NightWatchman. The product can be installed in a so-called “integrated” fashion, where it becomes part of System Center Configuration Manager (SCCM), or in a “stand-alone” model, where it is integral to the NightWatchman Management Center (NWMMC) and has no dependency on SCCM. In both scenarios, a fundamental element of its wakeup operation is ensuring that a proxy 1E Agent is up and running at all times on every subnet. This agent acts as a proxy, working in concert with the server-side agent to receive wakeup requests from the server and then create and issue magic packets to its subnet neighbors. Part of this process is the method by which the WakeUp Server component finds a partner agent on each subnet, plus a second, backup agent. These are known as the Primary and Alternate agents for each subnet. The end goal is to find a pair of machines to which these roles can be assigned, and then to ensure that one or the other of them is up and running at all times to process wakeup requests. That latter process, known as “Last Man Standing”, will be discussed in a subsequent article. This article provides a high-level overview of the discovery and assignment process used to identify and assign the two roles initially.
Agent Discovery Overview
Before we go under the hood with specifics, let’s take a moment to review the high-level process. This comes directly from the product documentation. Once installed, the WakeUp Server component scans the subnets associated with its site (more on that later, when we discuss the AgentList.dat file). The purpose is to find a pair of agents on each subnet to register as the site’s Primary and Alternate proxy agents, to be called upon for subsequent wakeups.
Within the WakeUp Server console, the Agent Finder screen, below, controls how 1E Agents are discovered (using ping packets) when using Multi-Agent mode. In our discussion we assume this default installation scenario, known as “Multi-Agent” mode, which implies a 1E Agent is installed on every computer in the estate. The WakeUp Server process is identical whether you use SCCM integrated or stand-alone mode. You can set the number and frequency of pings sent, the timeout for receiving a response and how often the subnet is rescanned. This lets you strike a balance between successfully finding all the Agents and the amount of network traffic generated.
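To get a feel for that trade-off, a back-of-the-envelope calculation helps. The sketch below is purely illustrative: it assumes every probed address receives the configured number of pings on every rescan, and the function and parameter names are hypothetical rather than actual WakeUp Server settings. It counts ping requests only, not replies.

```python
# Hypothetical helpers; the parameters mirror the Agent Finder options
# (pings per target, rescan interval) but are NOT WakeUp Server settings.

def pings_per_scan(targets: int, pings_per_target: int) -> int:
    """Ping requests sent during one scan of a subnet."""
    return targets * pings_per_target


def pings_per_day(targets: int, pings_per_target: int, rescan_minutes: int) -> int:
    """Ping requests per subnet per day at a given rescan interval."""
    scans_per_day = (24 * 60) // rescan_minutes
    return pings_per_scan(targets, pings_per_target) * scans_per_day


# A /24 subnet has 254 usable host addresses.
print(pings_per_day(254, 2, 60))   # 12192
print(pings_per_day(254, 2, 240))  # 3048
```

Quadrupling the rescan interval cuts the daily request count by the same factor, at the cost of noticing newly awake agents later.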
The ultimate result of all this is seen in the Agents node of the console, below. This screen displays a list of all discovered 1E Agent systems, by subnet, and enables centralized control over their settings. When a subnet is discovered, and a pair of agents found, the Primary and Alternate Agents are displayed under Agent List.
Discovery Stage 1
The following picture shows the WakeUp Server scanning the target subnets for a running 1E Agent. By default the scan favors servers and workstations on the subnet; laptop PCs get the lowest priority.
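One way to picture that bias is as a stable sort on machine type. This is a minimal sketch under a stated assumption: the weights (servers and workstations tied first, laptops and unknown types last) and the `Host` type are hypothetical, since the article doesn’t document how the product actually ranks machines.

```python
from dataclasses import dataclass

# Hypothetical weighting: servers and workstations first, laptops last.
# The product's real ranking logic is not documented in this article.
PRIORITY = {"server": 0, "workstation": 0, "laptop": 1}


@dataclass
class Host:
    name: str
    kind: str  # "server", "workstation" or "laptop"


def scan_order(hosts: list[Host]) -> list[str]:
    """Return host names in scan order.

    sorted() is stable, so hosts with equal priority keep their input order;
    unknown kinds fall into the lowest (laptop) tier.
    """
    return [h.name for h in sorted(hosts, key=lambda h: PRIORITY.get(h.kind, 1))]


hosts = [Host("LAP1", "laptop"), Host("WS1", "workstation"),
         Host("SRV1", "server"), Host("LAP2", "laptop")]
print(scan_order(hosts))  # ['WS1', 'SRV1', 'LAP1', 'LAP2']
```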
Discovery Stage 2
The 1E Agents on the remote subnet respond to the WakeUp Server scan by declaring themselves up, running and available to distribute any wakeup calls, as shown in the following picture.
Discovery Stage 3
The first two 1E Agents to respond are stored by the WakeUp Server on the SCCM Primary Site Server (integrated mode) or the NightWatchman Management Center server (stand-alone mode). The first Agent is stored as the Primary Agent; the second becomes the Alternate Agent.
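The selection rule amounts to ordering responses by arrival and taking the first two. A minimal sketch, assuming each response is recorded as a hypothetical (seconds after the scan, agent name) pair; this is not how WakeUp Server stores responses internally.

```python
from typing import Optional


def assign_roles(responses: list[tuple[float, str]]) -> tuple[Optional[str], Optional[str]]:
    """Given (seconds_after_scan, agent_name) pairs, return (primary, alternate)."""
    ordered = [name for _, name in sorted(responses)]  # earliest response first
    primary = ordered[0] if ordered else None
    alternate = ordered[1] if len(ordered) > 1 else None
    return primary, alternate


print(assign_roles([(0.42, "WS7"), (0.05, "SRV1"), (0.31, "WS2")]))  # ('SRV1', 'WS2')
print(assign_roles([(0.10, "XP2")]))  # ('XP2', None)
```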
Agent Discovery Detail
Now that we’ve established the general concepts and process flows used to find and register these agents on each subnet, let’s look at the details of how this actually happens under the hood. Assume the processes are the same in integrated and stand-alone scenarios unless otherwise stated.
- The WakeUp Server service starts
- Its AgentFinder thread starts
- AgentFinder reads its companion data file, AgentList.dat, for the list of known subnets
- See more on this file and its purpose under Agent State Manager later in this article
- It also queries the NWMMC or SCCM database for all subnets reported by the WakeUp clients
- Known subnets are listed in the WakeUp Console
- Each subnet is pinged to look for awake agents
- Responding agents are evaluated against the assignment criteria (suitability for the Primary or Alternate agent role)
- The first two are registered as the Primary and Alternate agents
- Confirm that the registered clients have the following in the WakeUp Registry:
Value Name: MiniAgentTo
Value Data: ‘Insert name of WakeUp Server’
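The first few steps assemble the list of known subnets from two sources. A minimal sketch of that merge, assuming both sources have already been read into “address/mask” strings; the AgentList.dat file format isn’t described in this article, so parsing the file itself is left out, and `known_subnets` is a hypothetical name. Using `strict=False` lets a client’s host address and mask collapse onto its subnet.

```python
import ipaddress


def known_subnets(agentlist_entries: list[str], db_entries: list[str]) -> list[ipaddress.IPv4Network]:
    """Hypothetical helper: union of both subnet lists, de-duplicated and sorted."""
    networks = set()
    for entry in agentlist_entries + db_entries:
        # strict=False maps a host address plus mask onto its subnet,
        # e.g. "192.168.10.10/255.255.255.0" -> 192.168.10.0/24.
        networks.add(ipaddress.IPv4Network(entry, strict=False))
    return sorted(networks)


from_agentlist = ["192.168.0.0/255.255.255.0", "192.168.1.0/255.255.255.0"]
from_database = ["192.168.1.0/24", "192.168.10.10/255.255.255.0"]
print([str(n) for n in known_subnets(from_agentlist, from_database)])
# ['192.168.0.0/24', '192.168.1.0/24', '192.168.10.0/24']
```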
What determines which machine becomes a primary or an alternate?
- The first machine to respond to an agent discovery request from the WakeUp Server becomes the Primary, and the second becomes the Alternate.
- If only one agent (the Primary) is discovered on the subnet, the first machine it wakes up as part of a wakeup job is forced to become the Alternate.
- When an agent that was previously a Primary or Alternate comes back up, it assumes the role of Alternate and probes the subnet for an existing Primary. If no Primary exists, it automatically becomes the Primary and registers itself with the WakeUp Server as such. If a Primary already exists, it registers as the Alternate, which forces the WakeUp Server to unregister the previously registered Alternate (should one exist). Please note that this exchange of messages (again, over UDP) happens when a machine comes up, so timing issues can occur here.
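These three rules can be modelled as a tiny state holder per subnet. This is a simplified, hypothetical sketch (`SubnetRoles` and its methods are illustrative names only): it assumes the `primary` field reflects a live Primary answering the probe, and it ignores the UDP timing issues the last rule warns about.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubnetRoles:
    primary: Optional[str] = None
    alternate: Optional[str] = None

    def on_machine_woken(self, name: str) -> None:
        """Rule 2: with only a Primary registered, the first machine it wakes becomes the Alternate."""
        if self.primary is not None and self.alternate is None and name != self.primary:
            self.alternate = name

    def on_former_agent_up(self, name: str) -> None:
        """Rule 3: a former Primary/Alternate starts as Alternate, then probes for a Primary."""
        if self.primary is None or self.primary == name:
            # No other Primary answered the probe: register itself as Primary.
            self.primary = name
            if self.alternate == name:
                self.alternate = None
        else:
            # A Primary exists: register as Alternate, displacing any previous Alternate.
            self.alternate = name


roles = SubnetRoles(primary="XP2")
roles.on_machine_woken("WS5")    # only a Primary existed, so WS5 becomes Alternate
roles.on_former_agent_up("WS9")  # a Primary exists, so WS9 displaces WS5 as Alternate
print(roles)  # SubnetRoles(primary='XP2', alternate='WS9')
```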
To manually force the AgentFinder process
- Stop the WakeUp Server service
- Delete the AgentList.dat file from its default location here:
C:\Documents and Settings\All Users\Application Data\1E\WakeUpSvr\AgentList.dat
- Set the following registry value to ON:
- Start the WakeUp Server service
- After the WakeUp Server service has started, wait a few minutes for the Agent Manager thread to start.
- You should see something similar to the following in the WakeUp Server agent log:
17/06/2009 13:05:08: WakeUpSvr Copyright (c) 1999-2009 1E Ltd. (220.127.116.11r50796) – Service Started
17/06/2009 13:05:08: new file C:\Documents and Settings\All Users\Application Data\1E\WakeUpSvr\AgentList.dat
17/06/2009 13:05:18: Agentfinder started
17/06/2009 13:06:08: AgentManager: Processing subnet ‘192.168.0.0\255.255.255.0’
17/06/2009 13:06:08: Creating an entry for 192.168.0.0 [255.255.255.0]
17/06/2009 13:06:08: AgentManager: Processing subnet ‘192.168.1.0\255.255.255.0’
17/06/2009 13:06:08: Creating an entry for 192.168.1.0 [255.255.255.0]
17/06/2009 13:06:16: AgentFinder Rescan 192.168.10.0 Requested
17/06/2009 13:06:16: Finding agent(s) subnet – 192.168.10.0 [255.255.255.0]
New Agent found 192.168.10.10 for subnet 192.168.10.0
17/06/2009 13:06:19: AGTSTAT4 from XP2 for subnet 192.168.10.0 (Note: XP2 is the NetBIOS name of the machine found)
Agent ‘XP2, 0, 0, mask=255.255.255.0’ registering for subnet ‘192.168.10.0’
17/06/2009 13:06:19: subnet-192.168.10.0 – Agents=1 OK_pings=2
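When checking many subnets, reading the agent log by hand gets tedious. The sketch below extracts agent registrations and the per-subnet summary from lines shaped like the excerpt above. The patterns are inferred from this single excerpt, so treat them as an assumption and adjust them to your actual log; they accept straight or typographic quotes and a hyphen or en dash in case the excerpt was reformatted for publication. `parse_agent_log` is a hypothetical name.

```python
import re

# Patterns inferred from the sample log excerpt only (an assumption).
REGISTER = re.compile(r"Agent [‘']([^,]+),.*?[’'] registering for subnet [‘']([\d.]+)[’']")
SUMMARY = re.compile(r"subnet-([\d.]+) [-–] Agents=(\d+) OK_pings=(\d+)")


def parse_agent_log(lines):
    """Return (registrations, summaries) found in WakeUp Server agent log lines."""
    registrations, summaries = [], {}
    for line in lines:
        if m := REGISTER.search(line):
            registrations.append((m.group(1), m.group(2)))  # (agent name, subnet)
        elif m := SUMMARY.search(line):
            summaries[m.group(1)] = {"agents": int(m.group(2)), "ok_pings": int(m.group(3))}
    return registrations, summaries


sample = [
    "Agent ‘XP2, 0, 0, mask=255.255.255.0’ registering for subnet ‘192.168.10.0’",
    "17/06/2009 13:06:19: subnet-192.168.10.0 – Agents=1 OK_pings=2",
]
print(parse_agent_log(sample))
# ([('XP2', '192.168.10.0')], {'192.168.10.0': {'agents': 1, 'ok_pings': 2}})
```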
More on the Agent State Manager
Now that we’ve been swimming about in the deep end of the pool for a while, and the general concepts and related processes have been documented at length, it’s time to add a bit of “current events” about the Agent State Manager (known as the AgentFinder in earlier versions) as it works today. There’s a difference between the way the Agent State Manager works in SCCM mode and in Stand-alone mode (formerly referred to as AFR mode, after the legacy name of the Agility Framework Reporting database used by the NightWatchman Management Center).
The Agent State Manager today
- It is a background task in WakeUp Server which runs by default every 10 minutes
- It can be enabled/disabled by setting the “AgentStateManager” registry key (remember, it used to be called just “AgentManager” in 5.6 and previous versions)
- Its job is to ensure that, for all the subnets that are registered in the AgentList.dat file, there are contactable agents which do the job of waking up peers
However, before this process kicks in, one of the first things WakeUp Server does is to work out its boundaries.
- In the AFR mode/strategy, it makes a call to the AFR Web Service: “I’m WakeUp Server X. What are the subnet/IP range boundaries that have been defined for me in the NWMMC console?”
- In SCCM strategy, it queries the local SCCM server and retrieves the boundaries for that site
- It then makes the same call to the AFR Web Service
- It then merges the results (i.e. the SCCM boundaries are merged with the NWMMC console-defined boundaries)
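Merging boundaries is essentially a set union over IP addresses. A minimal sketch using Python’s `ipaddress` module, assuming boundaries arrive either as CIDR subnets or as hypothetical “start-end” range strings; the real representation used by SCCM and NWMMC is not shown in this article, and `to_networks`/`merge_boundaries` are illustrative names. Ranges are converted to CIDR blocks with `summarize_address_range`, and `collapse_addresses` removes duplicates and merges adjacent blocks.

```python
import ipaddress


def to_networks(boundary: str) -> list[ipaddress.IPv4Network]:
    """Turn 'a.b.c.d/len' or a hypothetical 'start-end' range into IPv4 networks."""
    if "-" in boundary:
        start, end = (ipaddress.IPv4Address(p.strip()) for p in boundary.split("-"))
        return list(ipaddress.summarize_address_range(start, end))
    return [ipaddress.IPv4Network(boundary, strict=False)]


def merge_boundaries(sccm: list[str], nwmmc: list[str]) -> list[ipaddress.IPv4Network]:
    """Union of both boundary sets, collapsed into the fewest CIDR blocks."""
    nets = [n for b in sccm + nwmmc for n in to_networks(b)]
    return list(ipaddress.collapse_addresses(nets))


sccm = ["192.168.0.0/24", "10.1.0.0-10.1.0.127"]
nwmmc = ["192.168.1.0/24", "192.168.0.0/24"]
print([str(n) for n in merge_boundaries(sccm, nwmmc)])  # ['10.1.0.0/25', '192.168.0.0/23']
```

Collapsing 192.168.0.0/24 and 192.168.1.0/24 into a single /23 is fine for membership tests; keep the original list if you need it for display.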
Now WakeUp Server has its boundaries and, if enabled, the Agent State Manager can do its thing. Remember, the IP boundary information controls which subnets and machines it is solely responsible for. This is why we require WakeUp Server to be installed on all SCCM Primary Site Servers in SCCM mode, and on all NWMMC servers should there be more than one in the estate. Every 10 minutes, the Agent State Manager first tries to “discover” new subnets, but it only does this in the AFR (Stand-alone) strategy.
- It makes another call to the AFR Web Service and gets the distinct list of subnets for all the network adapters stored in the database
- It then goes through this list, and discards anything which falls outside of its configured boundaries
- It adds the subnets which ARE inside of its boundaries into the AgentList.dat file, if they aren’t already there
- Any subnets which weren’t in the AgentList.dat file beforehand are considered “new”.
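The filtering step can be sketched as follows. This assumes, as an interpretation rather than documented behavior, that a subnet counts as inside the boundaries only if it is wholly contained in one boundary network; partially covered subnets (see the shared-subnet discussion later in this article) are treated as outside. `update_agentlist` is a hypothetical name, and the in-memory set stands in for AgentList.dat.

```python
import ipaddress


def update_agentlist(db_subnets, boundaries, agentlist):
    """Hypothetical helper: add in-boundary subnets to agentlist and return the new ones.

    db_subnets: subnet strings from the database (one per adapter subnet)
    boundaries: IPv4Network objects this WakeUp Server is responsible for
    agentlist:  set of IPv4Network standing in for AgentList.dat (mutated in place)
    """
    new = []
    for text in db_subnets:
        net = ipaddress.IPv4Network(text, strict=False)
        # Assumption: only subnets wholly inside one boundary are kept.
        if not any(net.subnet_of(b) for b in boundaries):
            continue  # outside this server's boundaries: discard
        if net not in agentlist:
            agentlist.add(net)
            new.append(net)  # wasn't in AgentList.dat before, so it is "new"
    return new


agentlist = {ipaddress.IPv4Network("192.168.0.0/24")}
boundaries = [ipaddress.IPv4Network("192.168.0.0/23")]
new = update_agentlist(["192.168.0.0/24", "192.168.1.0/24", "192.168.10.0/24"], boundaries, agentlist)
print([str(n) for n in new])  # ['192.168.1.0/24']
```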
Next, it deletes subnets that have been unreachable for more than a set amount of time (default 30 days), and marks as “stale” (i.e. needing rediscovery) those subnets from which there has been no communication for a set amount of time (default 3 hours).
Now we’ve got subnets marked as “new”, and others marked as “stale”. These are the subnets that need discovery.
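The aging rules are two thresholds applied to the time since a subnet was last heard from. A minimal sketch with the article’s defaults; it assumes, for simplicity, that both checks use a single hypothetical last-contact timestamp per subnet, and that a subnet old enough to be deleted is not also reported as stale. `age_subnets` is an illustrative name.

```python
from datetime import datetime, timedelta

DELETE_AFTER = timedelta(days=30)  # article's default for unreachable subnets
STALE_AFTER = timedelta(hours=3)   # article's default before a subnet needs rediscovery


def age_subnets(last_contact: dict[str, datetime], now: datetime):
    """Split subnets into (deleted, stale, fresh) by time since last contact."""
    deleted, stale, fresh = [], [], []
    for subnet, seen in last_contact.items():
        idle = now - seen
        if idle > DELETE_AFTER:
            deleted.append(subnet)
        elif idle > STALE_AFTER:
            stale.append(subnet)  # queued for rediscovery
        else:
            fresh.append(subnet)
    return deleted, stale, fresh


now = datetime(2009, 6, 17, 13, 0)
seen = {
    "192.168.0.0": now - timedelta(hours=1),
    "192.168.1.0": now - timedelta(hours=5),
    "192.168.10.0": now - timedelta(days=40),
}
print(age_subnets(seen, now))  # (['192.168.10.0'], ['192.168.1.0'], ['192.168.0.0'])
```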
WakeUp Server then processes the first ten of these (adjustable using the “MaxSubnetsPerPoll” setting) and queues them for Agent Discovery, as described in detail earlier. If there are more than ten to process, the next ten are dealt with the next time round (i.e. after 10 minutes), and so on until all subnets have been processed.
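The batching behavior, and the resulting delay before every subnet has been queued, can be sketched as below. The defaults match the article (ten subnets per poll, one poll every ten minutes); the function names are hypothetical, and the estimate assumes the queue doesn’t grow between polls.

```python
import math


def batches(queue, max_per_poll=10):
    """Yield successive poll-sized slices of the discovery queue."""
    for i in range(0, len(queue), max_per_poll):
        yield queue[i:i + max_per_poll]


def minutes_until_last_batch(n_subnets, max_per_poll=10, poll_minutes=10):
    """Minutes after the first poll until the final batch is queued."""
    polls = math.ceil(n_subnets / max_per_poll)
    return max(polls - 1, 0) * poll_minutes


print([len(b) for b in batches(list(range(25)))])  # [10, 10, 5]
print(minutes_until_last_batch(250))  # 240
```

For example, 250 new subnets take 25 polls, so the last batch is queued about four hours after the first.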
The process described above in the “To manually force the AgentFinder process” section only works in the AFR strategy, and will only actually find Agents at a rate of ten subnets every ten minutes (with the default MaxSubnetsPerPoll). In the SCCM strategy, subnets are discovered on demand, so they only appear in the WakeUp Server console (which reflects the AgentList.dat file) once an initial wakeup request has been sent to them. This last point often confuses new administrators, whose expectation is: “OK, I’ve got all this installed, but it’s now a day later and I only see a few of my subnets! Where are all the rest?” Simple: there has yet to be a deployment targeted at them! Be patient. They will appear just fine.
That said, you may ask why the Agent State Manager proactively discovers new subnets only in the AFR strategy. It’s historical. Back when the AFR strategy supported only one WakeUp Server (prior to v6.0.500), it was easy for an AFR WakeUp Server to determine which subnets it cared about: all the subnets for all the adapters in the AFR database. This wasn’t so easy in the SCCM strategy, because SCCM might have been configured to use boundaries based on IP ranges as well as subnets. For example, if SCCM site A looks after 192.168.0.1 – 192.168.0.128, and SCCM site B looks after 192.168.0.129 – 192.168.0.255, neither the WakeUp Server for Site A nor the WakeUp Server for Site B really owns the 192.168.0.0 subnet. This was the reason for the “lazy” discovery of subnets: the first WakeUp Server asked to do a wakeup on that subnet would try to do the discovery.
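The ownership problem in that example is easy to check mechanically. This sketch classifies how a single IP-range boundary covers a subnet; `coverage` is a hypothetical helper, and it treats a range as owning a subnet only if it spans everything from the network address to the broadcast address.

```python
import ipaddress


def coverage(start: str, end: str, subnet: str) -> str:
    """Classify how an IP-range boundary covers a subnet: 'full', 'partial' or 'none'."""
    lo, hi = ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
    net = ipaddress.IPv4Network(subnet)
    if lo <= net.network_address and hi >= net.broadcast_address:
        return "full"
    if hi < net.network_address or lo > net.broadcast_address:
        return "none"
    return "partial"


print(coverage("192.168.0.1", "192.168.0.128", "192.168.0.0/24"))    # partial (site A)
print(coverage("192.168.0.129", "192.168.0.255", "192.168.0.0/24"))  # partial (site B)
```

Neither site’s range is "full" for 192.168.0.0/24, which is exactly why neither WakeUp Server could claim the subnet for proactive discovery.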
This logic is less applicable today, as you can now also have multiple WakeUp Servers in the AFR strategy, and these can be configured to use IP address ranges too. The problem of ownership of a “shared” subnet (as in the example above) still exists, which is why you should generally avoid splitting an individual subnet between multiple WakeUp Servers wherever possible.
The differences between installing WakeUp Server in the AFR and SCCM strategies shrink with each release. Since v6.0.500 and the introduction of boundary configuration in the NightWatchman Management Center, WakeUp Server requires the whole AFR backend (in order to pull down boundary information) irrespective of the strategy. It would make sense that, as these differences gradually disappear, we do away with the concept of strategy altogether and simplify things by having WakeUp Server “just work”. For that eventuality, “stay tuned”!
Hopefully this series of articles has shed enough light on how subnets are discovered, and how a pair of proxy agents is found to work with, that the process now makes much more sense, regardless of how you have NightWatchman installed and configured.
(Special thanks to James Davies and Andy Brand for their valuable assistance in this article)