
Creating a Windows cluster: part 3 – Creating a Windows failover cluster


If you’re reading this third part of the Creating a Windows Cluster series, welcome back!
In the first two installments of the series we covered Using iSCSI to Connect to Shared Storage in Part 1 and Configuring Shared Disk in the OS in Part 2. Now everything is in place to actually create the cluster.
This installment will be a little longer. It covers two main tasks. First, the Failover Clustering role will be installed on the prospective cluster nodes. Next, the cluster itself will be created and we’ll check some basic configuration items.
At this point the word “prospective” will be dropped and the machines we have been working with will have graduated to the position of full-blown “Cluster Nodes”.

Server Manager: Open Server Manager and select the Local Server object in the left pane. Scroll to the bottom of the right pane and select Add Roles and Features from the Tasks menu as shown.
Installation Type: Select the option to perform a role-based or feature-based installation and click Next.
Server Selection: Click Next at the Select destination server screen.
Server Roles: Click Next at the Select server roles screen. You do not need to select any roles here.
Server Features: Select the Failover Clustering checkbox at the Select features screen to install the failover clustering components on the server.
Add Required Features: When prompted to add features required for Failover Clustering, click Add Features.
Confirmation: Click Install at the Confirm installation selections screen and complete the wizard as appropriate.

Repeat the steps above on the other prospective cluster nodes.
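
If you would rather script this step than click through the wizard on each server, the Failover Clustering feature can also be installed with PowerShell. A minimal sketch, assuming two prospective nodes with the placeholder names NODE1 and NODE2 and an account with administrative rights on both:

```powershell
# Install the Failover Clustering feature (and its management tools) on each
# prospective cluster node. NODE1 and NODE2 are placeholder server names.
$nodes = "NODE1", "NODE2"

foreach ($node in $nodes) {
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -ComputerName $node
}

# Confirm the feature shows as Installed on both nodes.
foreach ($node in $nodes) {
    Get-WindowsFeature -Name Failover-Clustering -ComputerName $node |
        Select-Object @{ n = 'Node'; e = { $node } }, Name, InstallState
}
```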
Now we have completed installing the Failover Clustering feature on each of the servers that will be nodes in our cluster. To review, we connected to shared storage in the first blog in this series. We used iSCSI, but there are other ways to achieve the connection to shared storage, such as Fibre Channel. Next, we configured the disks in the OS of the prospective nodes; at this point, one node has the disks online and the other has them offline. The next step is to actually create the cluster.

Cluster Manager On Start: Open Failover Cluster Manager from the Start page.
Validate Cluster Nodes: A console with an unpopulated Failover Cluster Manager node opens. Right-click on the node and select Validate Configuration from the context menu.
Before You Begin: Click Next at the Before You Begin screen.
Select Servers: An unpopulated Select Servers or a Cluster screen appears.
Add Servers: Enter the name of one of the servers that will be a node in your cluster and click the Add button. Repeat for the other server that will be a node in your cluster. Click Next when your prospective nodes have all been added.
Testing Options: At the Testing Options screen you can select either Run all tests or Run only tests I select. In the lab environment I used when taking these screenshots I had to select Run only tests I select because the virtual SAN appliance I built does not support SCSI-3 Persistent Reservations. This didn’t concern me for a lab. For production, where you want your configuration to be supported, your choice of hardware and software in the solution is more important. You want your cluster configuration to pass all tests in a production environment. If you choose to limit the tests you will be presented with a list of tests to choose from before moving on to the next step.
Confirmation: At the Confirmation screen click Next to continue.
Validating: The Validating screen will show the progress of the validation checks.
Validation Summary: In the Summary screen you can view the results of the validation. You can click the View Report button to open a browser page that displays details about each of the checks. If your configuration is suitable for clustering, the Create the cluster now using the validated nodes checkbox will be selected and you can create the cluster on the heels of the Cluster Validation Wizard.
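
If you prefer the command line, the same validation can be run with the Test-Cluster cmdlet from the FailoverClusters PowerShell module. Here is a minimal sketch, assuming the placeholder node names NODE1 and NODE2; remember that in production you want the full test run to pass.

```powershell
# Run the full cluster validation against both prospective nodes.
# Test-Cluster writes an HTML report and returns the report location.
Test-Cluster -Node "NODE1", "NODE2"

# Lab-only variation: run a subset of test categories, for example when the
# storage cannot pass the SCSI-3 Persistent Reservations checks.
Test-Cluster -Node "NODE1", "NODE2" -Include "Inventory", "Network", "System Configuration"
```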
Create Cluster Wizard: If you are continuing with the cluster creation click Next at the Before You Begin screen.
Cluster Access Point: Your first step in creating the cluster is to assign a cluster name and a cluster IP address. Enter the desired information in the Access Point for Administering the Cluster screen and click Next to continue.
Cluster Creation Confirmation: Select the Add all eligible storage to the cluster checkbox and click Next at the Confirmation screen.
Forming Cluster: The progress of cluster creation is shown in the Creating New Cluster screen.
Summary: After the cluster is created you can view a report like the one shown in the Validation steps by clicking the View Report button. Click Finish when completed with the Create Cluster Wizard.
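
The Create Cluster Wizard also has a PowerShell equivalent. This is only a sketch; the cluster name MYCLUSTER, the static address 192.168.1.50, and the node names are placeholders for your own values.

```powershell
# Create the cluster, supplying the same cluster name and static IP address
# that the Access Point for Administering the Cluster screen asks for.
New-Cluster -Name "MYCLUSTER" -Node "NODE1", "NODE2" -StaticAddress "192.168.1.50"

# By default New-Cluster adds all eligible storage, which matches leaving the
# "Add all eligible storage to the cluster" checkbox selected in the wizard.
Get-ClusterNode -Cluster "MYCLUSTER"
```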
Failover Cluster Console: At this point you will see that the Failover Cluster Manager node in the console is populated with the nodes, the storage, and the network resources in the cluster, and you will be ready to check the configuration of the cluster.
Cluster Disks: Click on the Disks node in the Storage object and note that the disks you configured on the nodes in a previous blog from this series are present in the right pane.
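
To double-check the same thing from PowerShell, a quick sketch like this lists the physical disk resources the cluster picked up and which node currently owns each one:

```powershell
# List the clustered disks along with their state, group, and owning node.
Get-ClusterResource |
    Where-Object { $_.ResourceType -eq "Physical Disk" } |
    Format-Table Name, State, OwnerGroup, OwnerNode -AutoSize
```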
Network Properties: Open the network properties for one of the three networks under the Networks object by right-clicking on the network and selecting Properties from the context menu.
Heartbeat Network Config: Identify the network’s purpose by the subnet listed in the Subnets box. In this example the first network being checked is the Heartbeat network. Clients will not be communicating over the dedicated heartbeat network, however cluster nodes will communicate with one another over this link. The Allow cluster network communication on this network option should be selected and the Allow clients to connect through this network checkbox should be deselected.
Storage Network Config: Open the network properties for the second connection. In our example the second connection is the Storage network. Again, it can be identified by the subnet listed. The Storage network will only communicate with the iSCSI storage. No client communication will occur over this network and the cluster nodes will not use it to communicate with one another, so for the storage network select the option Do not allow cluster network communication on this network.
LAN Network Config: The final network in the example is the LAN network. Since this is the network connected to the LAN and it is the one that cluster clients will use to communicate with the cluster, both the Allow cluster network communication on this network option and the Allow clients to connect through this network checkbox will be selected. Make note that a cluster “client” can be a workstation or a server. Do not be confused by the word “client” and think it refers only to workstations.
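
These three settings correspond to the Role property on each cluster network, so you can check or set them from PowerShell as well. A sketch, assuming the networks have been named Heartbeat, Storage, and LAN in Failover Cluster Manager (substitute your own network names):

```powershell
# Show each cluster network, its subnet, and its current role.
Get-ClusterNetwork | Format-Table Name, Address, Role -AutoSize

# Role values: 0 = no cluster communication (the iSCSI storage network),
#              1 = cluster communication only (the heartbeat network),
#              3 = cluster and client communication (the LAN network).
(Get-ClusterNetwork "Heartbeat").Role = 1
(Get-ClusterNetwork "Storage").Role = 0
(Get-ClusterNetwork "LAN").Role = 3
```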

Now the cluster is created and we have checked that the storage is available and the network connections are all configured as they should be. The next thing we should do is to test failover. After all, what good is a cluster if it doesn’t fail over? Might as well just have a standalone machine.
In this test we will monitor the disk resources and watch them as they fail over from one node to the next.
Examine the Failover Cluster Manager. Check the Disks node and look at the storage in the right pane. Make note of which machine owns the storage as indicated in the Owner Node column.
If you aren’t already there, log on to the machine that is not the storage Owner Node; you will monitor the failover from Failover Cluster Manager on that machine. The test will involve doing something that simulates the unexpected failure of the node that owns the storage resource, and we won’t be able to monitor the cluster from a “failed” node.
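
If you would rather watch from PowerShell on the surviving node, a rough sketch like this polls the cluster and reports which node owns the Available Storage group (the built-in group where cluster disks sit until they are assigned to a role):

```powershell
# Poll every few seconds and show which node owns the Available Storage group.
# Press Ctrl+C to stop once you have seen the owner change after the failure.
while ($true) {
    Get-ClusterGroup "Available Storage" |
        Select-Object Name, OwnerNode, State |
        Format-Table -AutoSize
    Start-Sleep -Seconds 5
}
```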
As fun as it might sound, I doubt any of us will test our node failover by wailing on it with a sledgehammer, or dousing it with a bucket of water, or making a big noise with a stick of dynamite strategically wedged into one of the drive bays.
Makes me think of a guy I worked with back in the early 90’s. When he would see someone with an open server on the bench something compelled him to make a point of walking by to flip a quarter into the open chassis. Oh the fun of watching George bounce around on the motherboard while scrambling for the power button before something bad happened. But I digress…
No, we are going to pick something a little more benign to simulate a failure. Whatever you do, make sure it is unexpected to the OS of the node you are causing to fail. If you stop the cluster service the resources will fail over, but that failover is handled by the service as it stops. If you do a proper shutdown the service will also handle the failover coordination. We want something that will simulate something like when the motherboard decides to do an impression of a genie leaving the bottle (OK. Old reference I haven’t heard in a long time. Let’s go back 28 years. I was doing component level electronics, repairing military equipment that had been returned from the field, and this was a phrase we used when you powered up a unit and it let out some smoke. “Hey, you just let the genie out of the bottle!” Man, this post turned into a trip down memory lane, didn’t it?).
Your failure needs to be something more along the lines of disconnecting the network cables. Another good option (if you are using virtual machines) is to go into the machine configuration and disconnect the NICs. This way the machine is no longer communicating on the network and the other node will detect that as a failure. Another option is to disconnect the power, not giving the machine a chance to initiate a failover.
Personally I prefer the network connection interruption method. I haven’t seen a machine go belly up due to an improper shutdown in some time, but I guess old habits die hard and I don’t like the idea of not performing a graceful shutdown. Probably never will.
Once you do something to cause an unexpected failure on the node that owns the disk resources, watch the other cluster node. It will take about 10 or 15 seconds, and you will see the disk resources flash “Offline” and then come back online. The name of the server in the Owner Node column in Failover Cluster Manager will change from the node that you caused to “fail” to the node that you are monitoring from.
And when you see that, you know your basic cluster is working. Next time we’ll look at the creation of a SQL Server cluster on top of the failover cluster that we just built.
Until then, wishing you all the best!
