Jumping in right where we left off from part one:
Why 2%: Increase Speed by…
The failure rate and the number of field techs are related to MPD in a triple-constraint kind of way. As a result, there are 2 schools of thought on how to increase velocity:
- If we can drive down the failure rate and keep the number of field techs as it is, we can drive up the MPD
- If we have more field techs, we can handle more failures, thus allowing the MPD to increase
Obviously, the first option is more advantageous, since more field techs ultimately carry more project cost. Decreasing the failure rate also yields a larger increase in MPD than employing more field techs.
Why 2%: Real numbers, 30k environment
Getting back to 2%, let’s go through a thought exercise and see what this would look like. Let’s say we are in charge of a 30k-seat org. From a technology standpoint, as mentioned, the number of deployments per day is, in theory, infinite, and we should be able to do all our migrations at once. For our 30k estate we can, in theory, do them all in 1 day, so our MPD is 100%. This is our starting point.
Let’s dive deeper using some more realistic figures.
From customers past and present, we are hearing failure rates of about 3 – 7% for deployments as a whole. This number will be higher during the early days of the migration, as we showed in the deployment profile, but a success rate in the mid-90% range is good.
I’m going to use a 5% failure rate as a round number, i.e. a 95% success rate. At 5%, a 30k-seat org could have 1,500 machines that fail the automatic migration, meaning they will need to be serviced by a field tech via heavy touch. This is a lot of machines at first glance. The goal is to find ways to drive this failure rate as low as possible. As the migration progresses and we get better and better, the failure rate will go down and the MPD will go up naturally.
Another data point is the amount of time it takes to fix a machine. Per a related Gartner report, a manual rebuild of a machine can take up to 5 hours, depending on the complexity of the machine and the environment. This may seem like a lot, but think about the process: a ticket comes in, the tech gets assigned and goes to the office, and once there he/she has to find the user’s desk, assess the situation, try to salvage user data, copy it to a thumb drive, bare-metal the machine, install the OS, join the domain, encrypt the data, re-apply the user profile, re-install the apps, set up Outlook, etc. It’s a lot.
Therefore, let’s say a field tech can service 3 machines a day via heavy touch, which means we have to schedule migrations such that the field tech can keep up with the migration velocity. 3 machines serviced is 5% of 60 machines, so the number of machines per day immediately drops from 30k in one day to 60 per day. That puts us at 0.2% MPD (60 of 30,000).
What is a way we can increase the MPD? As a lowly non-tech project manager, I don’t understand TSs and I don’t understand computers, but I do understand scheduling meetings and resource allocation, so one thing PMs can do is throw more resources at this. How many field techs does the organization have? For a 30k-seat national organization, let’s say there are 10 field techs to cover the W10 project. 10 field techs, each still taking 4 hours to fix a machine and doing 3 per day, can collectively cover 30 broken machines per day, which is 5% of 600 machines. Now the MPD is 2% of the estate.
Velocity Cheat Sheet (30k seats)
In our thought exercise we picked numbers to make it work, but managers can modify these figures based on the org to see what might work best. For example, 15 field techs can cover 45 broken machines per day, which is 5% of 900 machines, so the daily limit is now 3% of the estate. As mentioned, what is better is driving down the failure rate, which can make a much bigger impact on MPD.
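The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal sketch, assuming the daily migration limit is simply the techs' repair capacity divided by the failure rate; the function names are illustrative and not part of any tool.

```python
def max_migrations_per_day(field_techs, repairs_per_tech_per_day, failure_rate):
    """Largest daily migration count whose expected failures the techs can absorb."""
    if not 0 < failure_rate <= 1:
        raise ValueError("failure_rate must be in (0, 1]")
    repair_capacity = field_techs * repairs_per_tech_per_day
    return repair_capacity / failure_rate


def mpd_percent(migrations_per_day, estate_size):
    """Daily migrations expressed as a percentage of the estate."""
    return 100 * migrations_per_day / estate_size


ESTATE = 30_000  # seats, as in the thought exercise

# Print a small cheat sheet: vary the number of techs and the failure rate.
for techs in (1, 10, 15):
    for failure_rate in (0.05, 0.03, 0.02):
        per_day = max_migrations_per_day(techs, 3, failure_rate)
        print(f"{techs:>2} techs, {failure_rate:.0%} failures: "
              f"{per_day:>5.0f} machines/day = "
              f"{mpd_percent(per_day, ESTATE):.1f}% of estate")
```

With 10 techs, dropping the failure rate from 5% to 2% raises the limit from 600 to 1,500 machines per day (5% of the estate). Reaching the same number at a 5% failure rate would take 25 techs.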
Process – 4 Main Factors concerning speed
How can we decrease failures from a process standpoint?
From our experience, there are 4 main inputs to the question of migration velocity: Visibility, Targeting, Agility, and Strategy. These 4 areas will answer the question of how fast you should go.
Perhaps the most critical factor is visibility into the environment. How small or large is the problem? How small or large are the potential risks? The greater the awareness of the enterprise environment, the less the migration has to rely on ‘hope’. The goal for visibility is to ascertain the various components of the environment that may impact a successful migration, as well as those that factor into its speed. Reporting is paramount for gathering this information. Tools such as Tachyon can help uncover and quantify various aspects of migration visibility.
Windows Readiness Assessment
What is the “application footprint”?
- Is there a standard baseline configuration? e.g. Apps installed on 50% or more computers.
- What machines vary from the baseline? % installed in an environment
- (Per machine) Determine application usage e.g. per application
- Determine if discovered apps are currently packaged in ConfigMgr e.g. preparing for subsequent installs during OS deployment
- Are the apps packaged to be installed silently/unattended?
- If not packaged, what is the priority?
- Determine application compatibility e.g. use Microsoft’s Operations Management Suite (OMS) Upgrade Readiness
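The first two questions above can be sketched against exported inventory data. This is a minimal illustration, assuming the inventory has already been reduced to a mapping of machine name to installed apps; the machine names, app names, and function names are hypothetical.

```python
from collections import Counter


def baseline_apps(inventory, threshold=0.5):
    """Return {app: install share} for apps on at least `threshold` of machines."""
    machine_count = len(inventory)
    if machine_count == 0:
        return {}
    installs = Counter(app for apps in inventory.values() for app in set(apps))
    return {
        app: count / machine_count
        for app, count in installs.items()
        if count / machine_count >= threshold
    }


def off_baseline(inventory, baseline):
    """Map each machine to the apps it carries beyond the baseline."""
    return {machine: set(apps) - set(baseline) for machine, apps in inventory.items()}


# Hypothetical sample inventory.
inventory = {
    "PC-001": {"Office", "Chrome", "Acrobat Reader"},
    "PC-002": {"Office", "Chrome"},
    "PC-003": {"Office", "AutoCAD"},
    "PC-004": {"Office", "Chrome", "HomegrownERP"},
}

baseline = baseline_apps(inventory)
print("Baseline:", sorted(baseline))
for machine, extras in off_baseline(inventory, baseline).items():
    print(machine, "varies by", sorted(extras))
```

Machines with an empty "varies by" set are the vanilla ones. The extras on the others are the apps to check for packaging, silent install, and compatibility.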
What is “data”?
- Define which file types, folders, registry data, application configuration files, and personality (user profile and customizations) constitute “data”
- What are the data size limits that determine candidates for Zero Touch vs. white-glove/manual deployments? e.g. How much data is “too much” data?
- What will be the capture/backup and restore methods? e.g. cloud-sync before deployment, local drive via USMT hardlink, USMT to centralized storage (file-share/NAS), USMT to peer (1E Nomad Peer Backup Assistant)
What is the hardware readiness? – the goal is to deploy the most secure configuration possible, in order to take advantage of the advanced security features in Windows 10. It is important to know not only the current state of the machine, but also what the end-state configuration of the machine will be at the end of the Windows 10 deployment.
- What is the current state of each endpoint?
  - Make/Model e.g. supported, unsupported, to be decommissioned
  - OS version
  - TPM status
  - Virtualization Extensions in firmware
  - Disk encryption
  - Power-state e.g. battery or AC
  - Network connection e.g. wired or wireless
  - Hyper-V is running
  - SLAT processor
- What are the endpoint’s hardware capabilities?
  - Can UEFI be enabled?
  - Can Secure Boot be enabled?
  - Can BitLocker be run?
  - Can the TPM be enabled and activated?
  - Can Hyper-V be enabled?
  - Can it run Windows Defender Device Guard and Credential Guard?
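One way to turn the checklist above into something sortable is to bucket each endpoint by the gaps between its current state and the desired end state. The sketch below is only an illustration: the field names, the capability list, and the three buckets are assumptions, not an official Windows 10 requirement set or the output of any particular inventory tool.

```python
from dataclasses import dataclass

# Capability checks for the target end state. This list is illustrative,
# not an official requirement set; adjust it to your own standard.
END_STATE_CAPABILITIES = (
    "uefi_capable",
    "secure_boot_capable",
    "tpm_available",
    "virtualization_extensions",
    "slat_processor",
)


@dataclass
class Endpoint:
    """One row of (hypothetical) hardware inventory."""
    name: str
    model_supported: bool
    uefi_capable: bool
    secure_boot_capable: bool
    tpm_available: bool
    virtualization_extensions: bool
    slat_processor: bool


def assess(endpoint):
    """Return (bucket, gaps): the bucket the machine falls into and why."""
    if not endpoint.model_supported:
        return "replace", ["model_supported"]
    gaps = [c for c in END_STATE_CAPABILITIES if not getattr(endpoint, c)]
    return ("ready" if not gaps else "remediate"), gaps


# Hypothetical sample fleet.
fleet = [
    Endpoint("PC-001", True, True, True, True, True, True),
    Endpoint("PC-002", True, True, True, False, True, True),
    Endpoint("PC-003", False, False, False, False, False, False),
]
for ep in fleet:
    bucket, gaps = assess(ep)
    print(f"{ep.name}: {bucket} {gaps}")
```

Counting machines per bucket gives a first read on how much of the estate can go now, how much needs firmware or configuration work first, and how much is better left to hardware refresh.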
How much variation is in the environment – Visibility
- Leveraging discovery tools such as Tachyon can help you know what you might run into
How standardized are the machines? – Visibility
- This will largely be dependent upon location, but the more vanilla the machine (e.g. call center), the easier the migration and the more migrations can be done each day, since variables are reduced. On the other hand, if users are admins on their machines and have the ability to install and configure anything, edge cases will increase.
- What types of hardware models exist in the environment, and how will they respond to running W10?
What is the site breakdown? – Visibility
- Are there point-of-sale machines, manufacturing machines, or VIP machines that will always need white-glove treatment?
Application rules and readiness – Visibility
- This is better when compared to XP -> W7, but there may be homegrown applications that need to be tested and validated for W10 before migrations begin. It’s important to know which machines this may impact in order to plan accordingly.
- Are applications allowed to be installed automatically and silently?
What and where is Data – Visibility
- How much data does each user have (leveraging things like a USMT estimate as part of CM inventory), and how long will this machine take during migration?
Once we have visibility into the estate, it acts as a filter for knowing which machines to target and when. The goal is to always be targeting the machines that can go right now, which also shows where the exceptions will present themselves. There is no use in targeting machines for migration that might have applications that aren’t yet supported on W10. There are also benefits to targeting the easier machines first to gain momentum and overall migration acumen, while leaving outlier machines to the end.
Are there hardware models that aren’t going to support W10 and should instead be migrated naturally via attrition and the standard machine replacement process over time?
Start Small and Ramp Up – Targeting
- Achieving velocity is done by working out kinks through small initial trials and then expanding. The solution will naturally harden over the course of initial piloting, accounting for more environmental scenarios that were impossible to test in the lab or foresee. It’s important to let the business know that there will be issues initially, as with any pilot, but the key is to not see recurring issues across subsequent pilots.
Advance the Content – Targeting
- One of the greatest sources of friction to velocity is the speed of the network, or lack thereof. Especially for low-bandwidth sites, small sites, and remote locations, just-in-time download of content over slow connections can’t be counted upon. It’s important to stage or pre-cache data at these locations ahead of time, going back to Visibility to know which locations may have network issues, and Targeting these locations to go later in the migration schedule.
Good visibility will lead to improved targeting, but things will still come up. The ability to be Agile with deployments, made possible by visibility, will keep the migration velocity high. If there is an issue or a strategic change in direction, it’s important to have a contingency. If a section of machines cannot be migrated as planned, are there other groups of machines that can migrate instead?
How much capacity can you handle when you get errors (they will happen) – Agility
- Experience suggests that anywhere between 1 – 5% of machines migrated will experience issues, from hard drive failures to user (mis)intervention, old BIOS versions, and killer applications, as noted previously.
- How much deskside support staff is available? If someone needs to visit 5% of machines, how many can the business handle per day/week/month based on the number of deskside support staff available?
Holidays, Weather Impacts – Agility
- The human element (holidays, vacations, and local weather) needs to be considered. If offices are in the North East and migrations are occurring during winter months, you can anticipate locations being closed due to snow. Similarly, during summer months users will be away on vacation, which might limit migration numbers.
Automation can bring out flaws in processes and team procedures – Agility
- Achieving velocity can highlight breaks in communication between teams very quickly. If communication and team cooperation break down throughout the coordination of mass migrations, this can bring progress to a halt.
The overarching strategy of the business will factor into the migration velocity, both accelerating and potentially limiting it. Identifying the goals or concerns of the business stakeholders will uncover any ground rules to be used in velocity considerations. It’s important to understand what is expected of the technology solution.
What output or solutions are expected – Strategy
- Does the business have expectations around how the migrations are facilitated? For example, what is possible with regard to scheduling and targeting (e.g. user self-service migration or migration push)?
- What is the end state, and when is it due? Can you work backward to fill in the run rate you need to achieve to hit any committed milestone dates?
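Working backward from a milestone is simple division once the working days are counted. The sketch below is a rough illustration that assumes a Monday-to-Friday migration schedule and a caller-supplied holiday list; the dates and function names are made up for the example.

```python
import math
from datetime import date, timedelta


def working_days(start, end, holidays=()):
    """Count Monday-Friday days in [start, end), skipping any listed holidays."""
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5 and current not in holidays:
            days += 1
        current += timedelta(days=1)
    return days


def required_run_rate(remaining_machines, start, deadline, holidays=()):
    """Machines per working day needed to finish by the deadline."""
    days = working_days(start, deadline, holidays)
    if days == 0:
        raise ValueError("no working days left before the deadline")
    return math.ceil(remaining_machines / days)


# Hypothetical plan: 30,000 machines over 12 calendar weeks.
start = date(2024, 1, 8)
deadline = date(2024, 4, 1)
rate = required_run_rate(30_000, start, deadline)
print(f"{working_days(start, deadline)} working days -> {rate} machines/day")
```

In this hypothetical plan the required rate is 500 machines per day, which sits under the 600 per day ceiling from the 10-tech example earlier. If the required rate exceeds the ceiling, either the failure rate, the staffing, or the milestone has to move.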
How much end-user communication is required – Strategy
- A good communication team not only has mechanisms to reach the target users but also finds ways to empower them through self-service, or to get the customer help when something goes wrong.
How savvy are users – Strategy
- The last thing you want is a help desk call. This erases all the savings from your investment in automation. Are users good at figuring out what is new in W10 vs W7, or will they call the help desk to find out where the Start Menu has moved or who Cortana is? The good news is that W10 has been out for a long time, so users should already be familiar with it from their own personal computers running W10.
Who ‘owns’ the machine, the user or the business? – Strategy
- Some companies are very sensitive to the end user: it’s perceived that end users have a right to the endpoint and should not be inconvenienced in any way. This might lend itself to a gentler ramp-up or a lower max velocity.
- Some companies are more cavalier about changes and may allow for minor inconveniences to the end user for the sake of velocity and shortened migration schedules.
How do I start?
The best approach is to start with Visibility, understand all of these considerations, and divide the estate into reasonable migration tranches. Some of these tranches can be done very quickly (e.g. corporate offices, call centers, well-connected sites, pilot program users, field techs) while others will need to be done over time (e.g. very remote sites, developer machines, manufacturing machines). It’s important to give the business a target range, but also to inform them there will be peaks and valleys.