System Upgrades Discussion

Thread Starter

MisterBill2

Joined Jan 23, 2018
18,463
How large of a company is this? If properly prepared, all updates (not just OS) should always be able to be successfully reverted. If not, you put your livelihood at risk. Note that I didn’t specify Windows; this principle stands for ALL software upgrades.
The company was large enough, but the problem was due to an update the day before, and we discovered it as the computer was being started up about ten minutes before riding in a crash car. The computer was our data recording device, and suddenly everything that had been OK two days earlier was no longer OK.
And you cannot imagine how much a crash car crash costs. Fortunately, we were not the auto company running the test, so we only lost our data. That auto company does not use MS anything for their crash data system. It was explained to me that they simply could not afford unreliable software in the crash test area. That says it all, doesn't it?
 

djsfantasi

Joined Apr 11, 2010
9,160
It says a lot about the person who made the statement and less about the software.

Our e-commerce systems were available year-round with 99.995% uptime. Every system was mirrored, system snapshots were taken before every update (EVERY), and running systems were leapfrogged. That is, System A was running a proven configuration while System B was being updated. Then System B was, at a minimum, run through an automated test suite that certified it was ready for production (manual test scenarios were run for the site app software if that is what was updated). Once System B was certified, all future traffic was routed to the new system (via network configuration files).

All systems were paired with an offline partner. All online systems were grouped and traffic was load balanced across the group.

It rarely happened, but if a system (or group of systems) failed, all it took was a network configuration file update, and seconds later we were running again with NO loss of data.

Oh, and these weren’t all physical systems. One piece of iron ran 4-8 virtual Windows systems, so this topology was affordable.
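
To make the leapfrog idea concrete, here is a rough Python sketch of the check-then-switch step. The host names, the /health endpoint, and the certify/switch_traffic functions are all hypothetical stand-ins; in the setup described above, certification was a full automated test suite and the switch was a network configuration file change, not a script like this.

```python
# Minimal sketch of a leapfrog (blue/green) switchover, assuming two
# hypothetical app hosts and a hypothetical /health endpoint on each.
import urllib.request

LIVE = "system-a.example.internal"       # currently serving traffic
CANDIDATE = "system-b.example.internal"  # just updated, awaiting certification

def healthy(host: str) -> bool:
    """Return True if the host answers its health check."""
    try:
        with urllib.request.urlopen(f"http://{host}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def certify(host: str) -> bool:
    """Stand-in for the automated test suite run against the updated system."""
    return healthy(host)  # a real suite would exercise the application itself

def switch_traffic(new_live: str) -> None:
    """Stand-in for rewriting the network configuration to route traffic."""
    print(f"Routing all new traffic to {new_live}")

if certify(CANDIDATE):
    switch_traffic(CANDIDATE)  # leapfrog: B goes live, A becomes the idle partner
else:
    print("Candidate failed certification; traffic stays on", LIVE)
```

The point of the pattern is that the updated system never sees production traffic until it has passed its checks, and the previous known-good system stays available as the fallback.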

In over ten years, we had only one failure, when the vendor's firmware on a storage system seized. Not an update, just an obscure bug. Once in ten years. On systems running Windows.

A poor craftsman blames his tools. A gifted craftsman uses his tools to make something that works.
 

Thread Starter

MisterBill2

Joined Jan 23, 2018
18,463
There is a serious limit to the amount of anything that you can put on a crash car, and there is a limit to the number of "g"s that any equipment can survive repeatedly. And there was a definite limit to both our budget and our installation time window. Our project was being given a ride-along because the large auto company knew it was worth it, but we did not have an unlimited budget.
 