Background
This article concentrates on one little-known aspect of the wake-on-LAN portion of 1E NightWatchman. The product can be installed in a so-called “integrated” fashion, where it becomes part of the System Center Configuration Manager (SCCM) product, or in a “stand-alone” model, where it is integral to the NightWatchman Management Center (NWMMC) and has no dependency on SCCM. In both scenarios, a fundamental element of its wakeup operation is to ensure that a proxy 1E Agent is up and running at all times on every subnet. This agent acts as a proxy, working in concert with the server-side component to receive wakeup requests from the server and then create and issue magic packets to its subnet neighbors. Part of this process is the method by which the WakeUp Server component finds a partner agent on each subnet, plus a second, backup agent. These are known as the Primary and Alternate agents for each subnet, and the end goal is to find a pair of machines to which these roles can be assigned. The product must also ensure that one or the other of these agents is up and running at all times to process wakeup requests; that process, known as “Last Man Standing”, will be discussed in a subsequent article. This article provides a high-level overview of the discovery and assignment process that initially identifies and assigns the two roles.
Agent Discovery Overview
Before we dive into the deep end and go under the hood with specifics, let’s take a moment to review the high-level process, which comes directly from the product documentation. Once installed, the WakeUp Server component scans the subnets associated with its site (more on that later, when we discuss the AgentList.dat file). Its purpose is to find, on each subnet, a pair of agents to register as the site’s Primary and Alternate proxy agents, to be called upon for subsequent wakeups on that subnet.
The WakeUp Server process is identical in SCCM-integrated and stand-alone mode. Throughout this discussion we assume the default installation scenario, known as “Multi-Agent” mode, in which a 1E Agent is installed on every computer in the estate. Within the WakeUp Server console, the Agent Finder screen (below) controls how 1E Agents are discovered (using ping packets) in Multi-Agent mode. You can set the number and frequency of pings sent, the timeout for receiving a response, and how often each subnet is rescanned. This lets you balance successfully finding all the Agents against the amount of network traffic generated.
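To get a feel for that balance, here is a back-of-the-envelope model. The field names (pings_per_scan, ping_interval_s, response_timeout_s, rescan_interval_h) are hypothetical stand-ins for the Agent Finder settings, not the product's own setting names, and the example values are not product defaults. The traffic estimate also assumes, purely for illustration, that every host address on the subnet is pinged on every scan; the real packet pattern may differ.

```python
from dataclasses import dataclass


@dataclass
class AgentFinderSettings:
    # Hypothetical field names standing in for the Agent Finder screen settings.
    pings_per_scan: int        # number of pings sent per scan
    ping_interval_s: float     # seconds between successive pings
    response_timeout_s: float  # seconds to wait for replies after the last ping
    rescan_interval_h: float   # hours between rescans of a subnet


def scan_duration_s(s: AgentFinderSettings) -> float:
    """Worst-case length of one scan: the pings themselves plus the final timeout."""
    return (s.pings_per_scan - 1) * s.ping_interval_s + s.response_timeout_s


def pings_per_day(s: AgentFinderSettings, host_addresses: int) -> int:
    """Pings per subnet per day.

    ASSUMPTION (illustration only): every host address is pinged on every scan.
    """
    scans_per_day = 24 / s.rescan_interval_h
    return round(host_addresses * s.pings_per_scan * scans_per_day)


# Example values only, not product defaults.
settings = AgentFinderSettings(pings_per_scan=3, ping_interval_s=1.0,
                               response_timeout_s=5.0, rescan_interval_h=3.0)
print(scan_duration_s(settings))     # 7.0
print(pings_per_day(settings, 254))  # 6096 for a /24 (254 host addresses)
```

In this model, doubling rescan_interval_h halves the daily traffic at the cost of noticing changes on the subnet less often, while raising pings_per_scan does the opposite.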
The ultimate result of all this is seen in the Agents node of the console, below. This screen displays a list of all discovered 1E Agent systems, by subnet, and enables centralized control over their settings. When a subnet is discovered, and a pair of agents found, the Primary and Alternate Agents are displayed under Agent List.
Discovery Stage 1
The following picture shows the WakeUp Server scanning the target subnets for a running 1E Agent. By default the scan favors servers and workstations on the subnet; laptop PCs have the lowest priority.
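The bias described above can be pictured as a sort over the scan targets. This is a hypothetical sketch, not 1E's implementation: it assumes each host's machine class is already known, and it places servers and workstations in one shared top tier (the text does not say which of the two ranks higher), with laptops last.

```python
from enum import Enum


class MachineClass(Enum):
    SERVER = "server"
    WORKSTATION = "workstation"
    LAPTOP = "laptop"


# Hypothetical tiers: servers and workstations are preferred, laptops come last.
SCAN_TIER = {
    MachineClass.SERVER: 0,
    MachineClass.WORKSTATION: 0,
    MachineClass.LAPTOP: 1,
}


def scan_order(hosts: list[tuple[str, MachineClass]]) -> list[str]:
    """Return host names in the order they would be probed.

    sorted() is stable, so hosts in the same tier keep their original order.
    """
    return [name for name, _ in sorted(hosts, key=lambda h: SCAN_TIER[h[1]])]


hosts = [
    ("LT01", MachineClass.LAPTOP),
    ("XP2", MachineClass.WORKSTATION),
    ("SRV1", MachineClass.SERVER),
    ("LT02", MachineClass.LAPTOP),
]
print(scan_order(hosts))  # ['XP2', 'SRV1', 'LT01', 'LT02']
```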
Discovery Stage 2
The 1E Agents on the remote subnet respond to the WakeUp Server scan by declaring themselves up and running and available to distribute any wakeup calls, as shown in the following picture.
Discovery Stage 3
The first two 1E Agents to respond are stored by the WakeUp Server on the SCCM Primary Site Server (integrated mode) or the NightWatchman Management Center server (stand-alone mode). The first Agent is stored as the Primary Agent; the second becomes the Alternate Agent.
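The first-come rule of Stage 3 can be sketched as follows. The AgentResponse and SubnetAgents types are hypothetical, and ordering by arrival time is simply a way of modelling “the first two to respond”. The sketch also shows what happens when only one Agent answers: no Alternate is assigned, which corresponds to the Agents=1 case in the log excerpt later in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentResponse:
    # Hypothetical record of one 1E Agent answering the WakeUp Server scan.
    name: str          # NetBIOS machine name, e.g. "XP2"
    ip: str
    arrival_ms: float  # when the reply reached the WakeUp Server


@dataclass
class SubnetAgents:
    primary: Optional[str] = None
    alternate: Optional[str] = None


def assign_roles(responses: list[AgentResponse]) -> SubnetAgents:
    """First responder becomes Primary, second becomes Alternate; later ones are ignored."""
    ordered = sorted(responses, key=lambda r: r.arrival_ms)
    roles = SubnetAgents()
    if len(ordered) >= 1:
        roles.primary = ordered[0].name
    if len(ordered) >= 2:
        roles.alternate = ordered[1].name
    return roles


replies = [
    AgentResponse("XP3", "192.168.10.11", 42.0),
    AgentResponse("XP2", "192.168.10.10", 17.5),
    AgentResponse("SRV9", "192.168.10.5", 90.0),
]
print(assign_roles(replies))
# SubnetAgents(primary='XP2', alternate='XP3')
```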
Agent Discovery Detail
Now that we’ve established the basic groundwork around the general concepts and process flows used to find and register these agents on each subnet, let’s go into the details of just how this is actually happening under the hood. Assume the processes are the same in integrated and stand-alone scenarios unless otherwise stated.
HKEY_LOCAL_MACHINE\SOFTWARE\1E\WakeUpAgt\
Value Name: MiniAgentTo
Value Data: 'Insert name of WakeUp Server'
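One way to apply the MiniAgentTo setting above on an Agent machine is a .reg file. The sketch below only builds the file text. It assumes the value is a plain string (REG_SZ), which the setting above does not state, and WUSERVER01 is a hypothetical server name.

```python
def mini_agent_to_reg(wakeup_server: str) -> str:
    """Build .reg file text that sets MiniAgentTo under the WakeUpAgt key.

    ASSUMPTION: MiniAgentTo is a string value (REG_SZ).
    """
    # In .reg files, backslashes and double quotes inside string data must be escaped.
    data = wakeup_server.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\1E\WakeUpAgt]",
        f'"MiniAgentTo"="{data}"',
        "",
    ]
    return "\r\n".join(lines)


print(mini_agent_to_reg("WUSERVER01"))  # hypothetical WakeUp Server name
```

Regedit's own Version 5.00 exports are UTF-16 encoded, so one option is to write the text with open(path, "w", encoding="utf-16", newline="") (newline="" stops Python turning each \r\n into \r\r\n on Windows) and import it with reg import. Check the value type on an existing installation before rolling this out.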
What determines which machine becomes a primary or an alternate?
To manually force the AgentFinder process
C:\Documents and Settings\All Users\Application Data\1E\WakeUpSvr\AgentList.dat
HKLM\Software\1E\WakeUpSvr\AgentManager=ON
17/06/2009 13:05:08: WakeUpSvr Copyright (c) 1999-2009 1E Ltd. (5.6.10.3r50796) – Service Started
Software\1E\WakeUpSvr\Strategy="AFRStrategy"
Software\1E\WakeUpSvr\AgentManager="ON"
17/06/2009 13:05:08: new file C:\Documents and Settings\All Users\Application Data\1E\WakeUpSvr\AgentList.dat
17/06/2009 13:05:18: Agentfinder started
17/06/2009 13:06:08: AgentManager: Processing subnet '192.168.0.0\255.255.255.0'
17/06/2009 13:06:08: Creating an entry for 192.168.0.0 [255.255.255.0]
17/06/2009 13:06:08: AgentManager: Processing subnet '192.168.1.0\255.255.255.0'
17/06/2009 13:06:08: Creating an entry for 192.168.1.0 [255.255.255.0]
17/06/2009 13:06:16: AgentFinder Rescan 192.168.10.0 Requested
17/06/2009 13:06:16: Finding agent(s) subnet – 192.168.10.0 [255.255.255.0]
New Agent found 192.168.10.10 for subnet 192.168.10.0
17/06/2009 13:06:19: AGTSTAT4 from XP2 for subnet 192.168.10.0 (Note: where XP2 is the NetBIOS machine name found)
Agent 'XP2, 0, 0, mask=255.255.255.0' registering for subnet '192.168.10.0'
17/06/2009 13:06:19: subnet-192.168.10.0 – Agents=1 OK_pings=2
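When checking a log like the one above, it can help to pull out the per-subnet summary lines (subnet-… Agents=N OK_pings=M) automatically. The parser below is a sketch based only on the lines shown in this excerpt; other messages and product versions may be formatted differently. It reads the timestamp as day/month/year, as the 17/06/2009 entries imply, and flags subnets whose most recent summary shows fewer than two Agents, i.e. no Alternate yet.

```python
import re
from datetime import datetime
from typing import NamedTuple

# A timestamped line: "dd/mm/yyyy hh:mm:ss: message".
LINE_RE = re.compile(r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}): (?P<msg>.*)$")
# Per-subnet summary, e.g. "subnet-192.168.10.0 – Agents=1 OK_pings=2".
# The separator is accepted as either a hyphen or an en dash (U+2013).
STATUS_RE = re.compile(
    r"subnet-(?P<subnet>\d{1,3}(?:\.\d{1,3}){3})\s+[-\u2013]\s+"
    r"Agents=(?P<agents>\d+)\s+OK_pings=(?P<ok_pings>\d+)"
)


class SubnetStatus(NamedTuple):
    when: datetime
    subnet: str
    agents: int
    ok_pings: int


def parse_status_lines(lines: list[str]) -> list[SubnetStatus]:
    statuses = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # untimestamped continuation lines, e.g. "New Agent found ..."
        s = STATUS_RE.search(m.group("msg"))
        if s:
            statuses.append(SubnetStatus(
                when=datetime.strptime(m.group("ts"), "%d/%m/%Y %H:%M:%S"),
                subnet=s.group("subnet"),
                agents=int(s.group("agents")),
                ok_pings=int(s.group("ok_pings")),
            ))
    return statuses


def subnets_without_alternate(statuses: list[SubnetStatus]) -> list[str]:
    """Subnets whose most recent summary reports fewer than two Agents."""
    latest: dict[str, SubnetStatus] = {}
    for st in sorted(statuses, key=lambda st: st.when):
        latest[st.subnet] = st  # later summaries overwrite earlier ones
    return [subnet for subnet, st in latest.items() if st.agents < 2]


sample = [
    "17/06/2009 13:06:16: Finding agent(s) subnet – 192.168.10.0 [255.255.255.0]",
    "New Agent found 192.168.10.10 for subnet 192.168.10.0",
    "17/06/2009 13:06:19: AGTSTAT4 from XP2 for subnet 192.168.10.0",
    "17/06/2009 13:06:19: subnet-192.168.10.0 – Agents=1 OK_pings=2",
]
statuses = parse_status_lines(sample)
print(statuses[0].subnet, statuses[0].agents, statuses[0].ok_pings)  # 192.168.10.0 1 2
print(subnets_without_alternate(statuses))  # ['192.168.10.0']
```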
More on the Agent State Manager
Now that we’ve been swimming about in the deep end of the pool for a while, and the general concepts and related processes have been documented at length, it’s time to add a bit of “current events” about the Agent State Manager process (known in earlier versions as the AgentFinder) as it works today. The Agent State Manager behaves somewhat differently in SCCM mode than in stand-alone mode (formerly referred to as AFR mode, after the legacy name of the Agility Framework Reporting database used by the NightWatchman Management Center).
The Agent State Manager today
Before this process kicks in, one of the first things WakeUp Server does is work out its boundaries.
Once WakeUp Server has its boundaries, the Agent State Manager (if enabled) can do its thing. Remember, the IP boundary information defines the subnets and machines for which that WakeUp Server is solely responsible. This is why WakeUp Server must be installed on every SCCM Primary Site Server in SCCM mode, and on every NWMMC server if there is more than one in the estate. Every 10 minutes, the Agent State Manager first tries to "discover" new subnets, but it does this only in the AFR (stand-alone) strategy.
Next, it will delete subnets which have been unreachable for more than a set amount of time (default 30 days), and will also mark as "stale" (i.e. needing rediscovery) those subnets from which we have had no communication in a set amount of time (default 3 hours).
Now we've got subnets marked as "new", and others marked as "stale". These are the subnets that need discovery.
WakeUp Server then takes the first ten of these (adjustable using the "MaxSubnetsPerPoll" setting) and queues them for Agent Discovery, as described in detail earlier. If there are more than ten to process, the next ten are dealt with on the following pass (i.e. 10 minutes later), and so on until all subnets have been processed.
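The 10-minute cycle described above (new-subnet discovery in AFR only, 30-day deletion, 3-hour staleness, and batching by MaxSubnetsPerPoll) can be modelled as a small simulation. This is an illustrative sketch, not WakeUp Server code: the Subnet type, the state names and the poll function are hypothetical, "unreachable" and "no communication" are both approximated by a single last-contact time, and which ten subnets count as "first" is simply insertion order here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

DELETE_AFTER = timedelta(days=30)  # default: forget subnets unreachable this long
STALE_AFTER = timedelta(hours=3)   # default: rediscover subnets silent this long
MAX_SUBNETS_PER_POLL = 10          # mirrors the "MaxSubnetsPerPoll" setting


@dataclass
class Subnet:
    # Hypothetical bookkeeping record, not the AgentList.dat format.
    address: str
    last_contact: Optional[datetime] = None  # None: never heard from
    state: str = "new"                       # "new", "stale" or "ok"


def poll(subnets: dict[str, Subnet], now: datetime, strategy: str,
         boundary_subnets: Iterable[str] = (),
         max_per_poll: int = MAX_SUBNETS_PER_POLL) -> list[str]:
    """One pass of the 10-minute cycle; returns subnets queued for Agent Discovery."""
    # 1. Proactively add new subnets, but only in the AFR (stand-alone) strategy.
    if strategy == "AFR":
        for addr in boundary_subnets:
            subnets.setdefault(addr, Subnet(addr))
    # 2. Delete subnets with no contact for longer than DELETE_AFTER.
    expired = [a for a, s in subnets.items()
               if s.last_contact is not None and now - s.last_contact > DELETE_AFTER]
    for addr in expired:
        del subnets[addr]
    # 3. Mark subnets silent for longer than STALE_AFTER as needing rediscovery.
    for s in subnets.values():
        if s.last_contact is not None and now - s.last_contact > STALE_AFTER:
            s.state = "stale"
    # 4. Queue at most max_per_poll new/stale subnets; the rest wait for a later pass.
    pending = [a for a, s in subnets.items() if s.state in ("new", "stale")]
    return pending[:max_per_poll]


# Simulate 25 subnets in the AFR strategy, pretending each queued
# subnet is discovered successfully before the next pass.
# Prints: "13:00 10", then "13:10 10", then "13:20 5".
known: dict[str, Subnet] = {}
boundary = [f"10.0.{i}.0" for i in range(25)]
t = datetime(2009, 6, 17, 13, 0)
for _ in range(3):
    queued = poll(known, t, "AFR", boundary)
    print(t.strftime("%H:%M"), len(queued))
    for addr in queued:
        known[addr].state, known[addr].last_contact = "ok", t
    t += timedelta(minutes=10)
```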
The process described above in the “To manually force the AgentFinder process” section works only in the AFR strategy, and even then finds Agents at a rate of only ten subnets every ten minutes. In the SCCM strategy, subnets are discovered on demand, so they appear in the WakeUp Server console (which reflects the contents of the AgentList.dat file) only once an initial wakeup request has been sent to them. This last point is a frequent cause of confusion for new administrators, whose expectation is “OK, I’ve got all this installed, but it’s now a day later and I only see a few of my subnets! Where are all the rest?” The answer is simple: no deployment has been targeted at them yet. Be patient; they will appear just fine.
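The rate of ten subnets every ten minutes makes it easy to estimate how long an AFR backlog takes to clear. The helper below is simple arithmetic under the defaults quoted above; it counts only when each batch is queued, not how long Agent Discovery then takes on each subnet.

```python
import math


def discovery_backlog(subnet_count: int, max_per_poll: int = 10,
                      poll_minutes: int = 10) -> tuple[int, int]:
    """Return (passes needed, minutes until the last batch is queued).

    The first batch is queued on the first pass (minute 0) and each later
    batch one poll interval after the previous one.
    """
    if subnet_count <= 0:
        return 0, 0
    passes = math.ceil(subnet_count / max_per_poll)
    return passes, (passes - 1) * poll_minutes


print(discovery_backlog(250))  # (25, 240): the last batch is queued after 4 hours
```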
That said, you may ask why the Agent State Manager proactively discovers new subnets only in the AFR strategy. It's a historic thing. Back in the day, when our AFR strategy supported only one WakeUp Server (prior to v6.0.500), it was easy for an AFR WakeUp Server to determine which subnets it cared about: all the subnets for all the adapters in the AFR database. This wasn't so easy in the SCCM strategy, however, because SCCM might have been configured to use boundaries based on IP ranges as well as subnets. For example, if SCCM site A looks after 192.168.0.1 to 192.168.0.128 and SCCM site B looks after 192.168.0.129 to 192.168.0.255, neither the WakeUp Server for site A nor the one for site B really owns the 192.168.0.0 subnet. This was the reason for the "lazy" discovery of subnets: the first WakeUp Server asked to perform a wakeup on a subnet would try to do the discovery.
This logic is less applicable today, as you can now have multiple WakeUp Servers in the AFR strategy too, and these can also be configured to use IP address ranges. The problem of ownership of a "shared" subnet (as in the example above) still exists, which is why you should generally avoid splitting an individual subnet between multiple WakeUp Servers wherever possible.
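The shared-subnet problem from the site A / site B example can be checked with Python's standard ipaddress module. The helpers and the SITES table are hypothetical. They show that neither range covers the whole 192.168.0.0/24 subnet (site A starts at .1 and stops at .128, and site B starts at .129), even though each individual address from .1 upward maps to exactly one site.

```python
import ipaddress
from typing import Optional

# Hypothetical boundary table for the site A / site B example.
SITES = {
    "A": ("192.168.0.1", "192.168.0.128"),
    "B": ("192.168.0.129", "192.168.0.255"),
}


def covers_whole_subnet(first: str, last: str, subnet: str) -> bool:
    """True if the inclusive IP range first..last contains every address in subnet."""
    net = ipaddress.ip_network(subnet)
    lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
    return lo <= net.network_address and net.broadcast_address <= hi


def owning_site(ip: str, sites: dict[str, tuple[str, str]]) -> Optional[str]:
    """Name of the site whose range contains ip, or None if no site does."""
    addr = ipaddress.ip_address(ip)
    for name, (first, last) in sites.items():
        if ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last):
            return name
    return None


for name, (first, last) in SITES.items():
    print(name, covers_whole_subnet(first, last, "192.168.0.0/24"))  # A False, B False
print(owning_site("192.168.0.50", SITES), owning_site("192.168.0.200", SITES))  # A B
```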
The differences between installing WakeUp Server in the AFR and SCCM strategies shrink with each release. Since v6.0.500 and the notion of configuring your boundaries in NightWatchman Management Center, WakeUp Server requires the whole AFR backend (in order to pull down boundary information) irrespective of the strategy. As these differences gradually disappear, it would make sense to do away with the concept of strategy altogether and simplify things by having WakeUp Server "just work". For that eventuality, “stay tuned”!
Hopefully this series of articles has shed enough light on the process of discovering subnets and finding a pair of proxy agents to work with that it now makes much more sense, regardless of how you have NightWatchman installed and configured.
(Special thanks to James Davies and Andy Brand for their valuable assistance in this article)