Because we know that human error is one of the largest causes of system downtime, let’s imagine that in their haste to repair the downed server and return to full capacity, an overzealous administrator pulls the wrong server from the enclosure!
Here we go again: clients lose their connections momentarily while the system redefines itself as a cluster of two. We’re still not down, but we’d better get that third good server back up and running…
So, let’s power it back on… this time we see there are no connections to the new node initially, but then clients start connecting to the new node and the system starts to balance out again.
Wow! Disaster averted. Now we can concentrate on getting the right server repaired and back in service.
This is only one example of the routine testing we perform. We also run a number of automated fault tests that exercise the cluster interconnect and the LAN.
This extreme testing helps to ensure that our customers will be able to deploy this technology with confidence.