I recently posted a lengthy and detailed blog article, 1E WakeUp Server and its AgentFinder Process. That article introduced the concept of identifying the 1E Agents present on the local subnets within the boundaries of a WakeUp Server installation, and assigning two of them the roles of primary and alternate agent for their respective subnets. The purpose of this effort comes into play when it becomes necessary to ensure that client machines on a given subnet are up and running at a designated time. This technology uses one of the remote agents as a proxy for the WakeUp Server to power on the required systems using magic packets in a traditional Wake-On-LAN (WOL) scenario. This article picks up where the previous one left off and takes the reader through the final stage of the process. In the previous article we stated that a fundamental requirement of the wakeup operation in our 1E NightWatchman product is that a proxy 1E Agent is up and running at all times on every subnet. This agent works in concert with the server-side agent to receive wakeup requests from the server and then create and issue magic packets to its subnet neighbors.
Now that we have identified a pair of 1E Agents on each subnet (documented in the previous article referenced above), how do we ensure that there is always at least one agent up and running after hours if we are applying a power management policy to turn them off at a specified time? How are these actually used when it is time to awaken systems? Let’s take each of these questions in turn. Together they provide the end-to-end story.
Last Man Standing (LMS)
This term refers to the process used to ensure that there is always one agent up and running on each subnet to act as the proxy for WakeUp Server. For this discussion, we continue to assume we are working in an integrated fashion, where 1E WakeUp Server is installed on a Microsoft System Center Configuration Manager (SCCM) primary site server. Its purpose is to identify “when” a mandatory deployment is scheduled to execute (explained extensively in the previous post, above). At the scheduled execution time, the WakeUp Server component derives the “list” of all machines in the target collection of the deployment, including all of the data needed to create a magic packet. It then determines which agent(s) on a given subnet are, or should be, up and running to receive this data and then create and send the magic packets to the target systems.
The problem here is simple: how do we ensure that one or the other of the previously discovered and assigned primary and alternate machines remains on at all times? If one of the two is powered off, we need to ensure that the other is powered on. Whichever remains on is referred to as the Last Man Standing on that particular subnet. So how is a machine powered off in the first place? Assuming there wasn’t a power outage, there are really only two scenarios: a NightWatchman power policy is applied to a location (e.g. “I want all of you systems to shut down at 6pm”); or the user of the machine initiates a normal shutdown, perhaps when leaving work early (i.e. the user initiates shutdown from the Windows START menu). In either case, we need to ensure that one of the systems remains powered on and holds the role of the “primary” agent, as it is to this system that the WakeUp Server hands off the task of waking the required systems on the local subnet.
In the first scenario, where a system is told to shut down via a policy (“It’s now 6pm. It’s time to shut down”), the primary agent simply ignores the request and stays powered on. It is the last man standing. The alternate, because it is the alternate, shuts down as directed. When the user initiates the shutdown, however, things get interesting. In this scenario, the primary agent will not ignore the shutdown. Instead, it will interrogate the alternate agent. If the alternate is on, the role of “primary” is transferred to it (that system will then ignore the policy-based shutdown). If, on the other hand, the alternate is in an “off” state, the primary (in the process of shutting down) will wake the alternate up. The alternate then assumes the primary role, and it is now the last man standing. The following short video animation illustrates these scenarios clearly.
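The Last Man Standing rules described above can be summarized in a few lines of code. What follows is only an illustrative sketch of the decision logic as this article describes it, not the actual 1E Agent implementation; the names Role, Trigger, Agent and handle_shutdown are hypothetical and exist purely for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    ALTERNATE = "alternate"


class Trigger(Enum):
    POLICY = "policy"  # scheduled power policy ("shut down at 6pm")
    USER = "user"      # user-initiated shutdown from the Start menu


@dataclass
class Agent:
    # Hypothetical model of one agent on a subnet (illustration only).
    name: str
    role: Role
    powered_on: bool = True


def handle_shutdown(agent: Agent, peer: Agent, trigger: Trigger) -> list:
    """Apply the Last Man Standing rules to a shutdown request on `agent`.

    `peer` is the other agent of the primary/alternate pair on the subnet.
    Returns a list of human-readable actions, in the order they occur.
    """
    actions = []

    # The alternate always shuts down as directed.
    if agent.role is Role.ALTERNATE:
        agent.powered_on = False
        actions.append(f"{agent.name} shuts down as directed")
        return actions

    # The primary ignores a policy-based shutdown and stays on.
    if trigger is Trigger.POLICY:
        actions.append(f"{agent.name} ignores the policy shutdown")
        return actions

    # User-initiated shutdown of the primary: hand the role to the peer,
    # waking it first if it is currently powered off.
    if not peer.powered_on:
        peer.powered_on = True
        actions.append(f"{agent.name} wakes {peer.name}")
    agent.role, peer.role = Role.ALTERNATE, Role.PRIMARY
    actions.append(f"{peer.name} becomes primary")
    agent.powered_on = False
    actions.append(f"{agent.name} shuts down")
    return actions
```

In every branch, exactly one machine of the pair ends up powered on and holding the primary role, which is the invariant the Last Man Standing process exists to protect.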
“WAKE UP, People! There is WORK to do!”
Now that we know how to discover a pair of systems on every subnet, assign them a primary or alternate agent role, and ensure one stays on at all times, what exactly does that agent do when it’s called upon by the WakeUp Server? As we discussed earlier, when an SCCM mandatory deployment reaches its scheduled execution time, the WakeUp Server identifies all the systems in the collection targeted by that active deployment. The list of those machines, along with the elements needed to create a magic packet (all taken from the SCCM database inventory of those machines), is then parceled out to the primary agent on each subnet involved. (Note: part of the WakeUp Server install actually installs a 1E Agent on that primary site server, in addition to the WakeUp Server service. The service hands off the wakeup list to its companion agent, and that agent handles the communications with the respective primary agents involved. I’ve omitted that small element here for simplicity.)
It is important to note that this process is done through simple HTTP/HTTPS. Consequently there is absolutely no impact to, or need for configuration of, any routed network equipment in play anywhere in the enterprise! The primary agent on a subnet simply receives the list of machines that need to be awakened on that subnet. That primary agent then crafts industry standard WOL magic packets (using the data included in the wakeup list generated at the server side) and proceeds to send them to those target devices. Once those machines are powered on, the SCCM client agent is started, sees the existing (or perhaps receives new) policy to execute the task that started all of this in the first place, and the task is initiated.
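For readers curious about what an industry standard magic packet actually contains, here is a minimal sketch. The payload format itself is standard: six bytes of 0xFF followed by the target’s MAC address repeated sixteen times. Everything else is an assumption made for illustration: the function names are hypothetical, and sending the payload as a UDP broadcast to port 9 is simply a common convention, not necessarily what the 1E Agent does internally.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte magic packet payload for a MAC address.

    Accepts MAC addresses written with ':' or '-' separators.
    """
    hex_digits = mac.replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Synchronization stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast a magic packet on the local subnet over UDP.

    The broadcast address and port are assumptions for this sketch.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling send_magic_packet("00:11:22:AA:BB:CC") from a machine on the same subnet as the target would broadcast one packet. Because the packet must be seen on the target’s own subnet, having a local agent send it is what removes the need to reconfigure routers.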
If the full NightWatchman product is in play, then once the SCCM task is complete, and assuming there is not a second task scheduled in the near term, the shutdown element of the 1E Agent automatically returns the awakened client to the appropriate low power state. Simply put, this means that with the addition of NightWatchman in the environment, critical tasks like Patch Tuesday security updates can now be deployed overnight, complete with the needed “reboot” (i.e. shut down when the task is completed; powered on the next morning for normal operations), resulting in near 100% success and added security overnight! The following animation illustrates this process nicely.
While the above process illustrates the most common scenario, 1E WakeUp integrated with SCCM, the process of waking systems in a standalone NightWatchman Enterprise installation is essentially identical. The only difference is that the standalone installation doesn’t have the ad hoc waking action generated in concert with something like an SCCM deployment schedule. A standalone wakeup would more likely be initiated manually, for example by the administrator waking a group of machines directly from the server console. Regardless of how the action is initiated at the server side, the entire process of creating and handing off the wakeup list of target machines to a local proxy is identical, as the clients also report a basic inventory upon installation which is adequate for magic packet creation.
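Whichever way the wakeup is triggered, the server-side step of dividing the wakeup list among the subnets involved can be pictured roughly as follows. This is a hedged sketch only: the record fields (name, mac, ip, mask) and the function group_by_subnet are hypothetical stand-ins for whatever inventory data the real product uses, which this article does not spell out.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(targets):
    """Group wakeup targets by the subnet they live on.

    `targets` is an iterable of dicts with hypothetical keys
    'name', 'mac', 'ip' and 'mask'. Returns a dict that maps each
    ipaddress.IPv4Network to the list of targets on that subnet,
    i.e. one wakeup list per primary agent.
    """
    groups = defaultdict(list)
    for target in targets:
        # strict=False lets us derive the network from a host address.
        network = ipaddress.ip_network(
            f"{target['ip']}/{target['mask']}", strict=False
        )
        groups[network].append(target)
    return dict(groups)
```

Each resulting list would then be handed to the primary agent on the matching subnet, which is the only machine that needs to send magic packets there.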
A Few Final Thoughts
In order for any of this WOL based technology to work, whether it is our NightWatchman solution or any other, there are a few basic caveats that need to be made clear. First of all, the managed computers themselves need to be properly configured to support WOL magic packets in the first place. This typically has to be done in two places: in the network interface card (NIC) driver itself (typically seen under the [power management] property as shown below); and in the computer’s BIOS.
It is also worth noting the importance of enabling the option shown above, “Only allow a magic packet to wake this computer”. This will prevent any number of random bits of noise that may be present at the network port from waking the machine needlessly. If you are unsure whether a given machine is able to respond to a magic packet, or whether it is properly configured to do so, 1E provides an excellent tool in our 1E Free Tools repository called Magic Test. This provides a quick and easy means to determine whether a machine will awake at all, and whether it is even receiving magic packets in the first place.
The web reporting system included with NightWatchman provides a serious amount of statistics around wakeup successes, failures, and so on. Armed with this data, together with native SCCM deployment success reports, the Administrator is provided with comprehensive information about the environment and its general condition related to WOL activities.
Lastly, there are also scenarios where no WOL tool will ever be able to wake a machine. One is the power outage I mentioned in the LMS section above, even if power is later restored. Another is a user shutting down a machine in a “less than graceful” way: pressing and holding the OFF button! In each of these scenarios, the machine is left in a state where the NIC is totally dead, with no power applied at all. The NIC is therefore no longer able to monitor its network jack to “see” and process a magic packet that may be aimed at it, and a system in this state cannot be awakened. You can easily confirm this state by physically looking at the NIC’s Ethernet jack: the telltale flickering green and yellow lights will be absent. In these situations, there is nothing to be done, unfortunately.
This article, together with my earlier post 1E WakeUp Server and its AgentFinder Process, provides the definitive overview of the underlying process behind the WOL portion of our 1E NightWatchman Enterprise power management offering. It gives the enterprise a powerful systems management capability: deploying software off hours, including application of a reboot, with no disruption to the end user. Ad hoc, one-off wakeups may also be initiated by a help desk technician for remote access to a user machine if needed. In my next article in this series I will address the process whereby a user outside the organization (when at home, for example) is able to wake his or her machine, even after the evening’s scheduled policy shutdown has occurred. This provides a simple means for remote RDP access to the work computer from anywhere. This capability is implemented via NightWatchman’s component known as Web WakeUp.