Why does reset work on so many machines?

joeyd999

Joined Jun 6, 2011
5,237
That every system should be designed and tested until it is error-free and 100% reliable?
There are methods of writing (and executing) structured test plans. Usually, these test plans are written parallel to and in conjunction with the code specification. These include vectors that test for allowed/disallowed inputs, all possible conditions and branches, and tests for common errors such as null pointers and buffer overflows.

This stuff wasn't taught in engineering school (I studied for a Computer Engineering track EE degree). I learned one of the methods on my first job.

Testing needs to be a large part of software development. The metric we used was 40% time on code, 60% on testing, and 100% coverage of code in the test plan. And no excuses.

Good code and good testing are possible. With the ease of transmitting patches and updates via the internet, I think the emphasis on testing has waned.

But the importance of testing is finding a resurgence: IoT can make for lots of little evil bots that can do lots of damage very quickly.
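
As a rough illustration of the test vectors mentioned above, here is a minimal sketch in Python. The function `parse_percent`, the `MAX_LEN` limit, and the vectors themselves are all hypothetical, made up for this example; they are not taken from any particular test-plan method. The table covers allowed inputs, boundary values, disallowed inputs, a missing input (the "null pointer" case), and an oversized input (the "buffer overflow" case).

```python
MAX_LEN = 8  # hypothetical input-length limit, standing in for a fixed buffer size


def parse_percent(text):
    """Parse a string such as '42' into an int in 0..100, or raise ValueError."""
    if text is None:                      # the "null pointer" case
        raise ValueError("input is missing")
    if len(text) > MAX_LEN:               # the "buffer overflow" case
        raise ValueError("input too long")
    stripped = text.strip()
    if not stripped.isdigit():            # rejects '', signs, letters, decimals
        raise ValueError("not a non-negative integer")
    value = int(stripped)
    if value > 100:
        raise ValueError("out of range")
    return value


# Test vectors: (input, expected result). ValueError marks a disallowed input.
VECTORS = [
    ("0", 0),                # lower boundary
    ("100", 100),            # upper boundary
    (" 42 ", 42),            # surrounding whitespace is allowed
    ("101", ValueError),     # just past the upper boundary
    ("-1", ValueError),      # sign is not allowed
    ("", ValueError),        # empty input
    ("4x", ValueError),      # non-digit characters
    (None, ValueError),      # missing input
    ("1" * 9, ValueError),   # longer than MAX_LEN
]


def run_vectors():
    """Return every vector whose actual outcome differs from the expected one."""
    failures = []
    for text, expected in VECTORS:
        try:
            actual = parse_percent(text)
        except ValueError:
            actual = ValueError
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

An empty list from `run_vectors()` means every vector passed. Writing the table alongside the specification, before the code, is the part that matches the approach described above.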
 

nsaspook

Joined Aug 27, 2009
13,086
There are methods of writing (and executing) structured test plans. Usually, these test plans are written parallel to and in conjunction with the code specification. These include vectors that test for allowed/disallowed inputs, all possible conditions and branches, and tests for common errors such as null pointers and buffer overflows.

This stuff wasn't taught in engineering school (I studied for a Computer Engineering track EE degree). I learned one of the methods on my first job.

Testing needs to be a large part of software development. The metric we used was 40% time on code, 60% on testing, and 100% coverage of code in the test plan. And no excuses.

Good code and good testing are possible. With the ease of transmitting patches and updates via the internet, I think the emphasis on testing has waned.

But the importance of testing is finding a resurgence: IoT can make for lots of little evil bots that can do lots of damage very quickly.
Exactly. I think it's less a case of "Programmers don't know" and more a case of "Programmers don't care".

https://quoteinvestigator.com/2019/...'s Law.,came along would destroy civilization.

If Builders Built Buildings the Way Programmers Wrote Programs, Then the First Woodpecker That Came Along Would Destroy Civilization
 

djsfantasi

Joined Apr 11, 2010
9,156
Exactly. I think it's less a case of "Programmers don't know" and more a case of "Programmers don't care".

https://quoteinvestigator.com/2019/09/19/woodpecker/#:~:text=WEINBERG'S SECOND LAW:,came along would destroy civilization.&text=Weinberg's Law.,came along would destroy civilization.

If Builders Built Buildings the Way Programmers Wrote Programs, Then the First Woodpecker That Came Along Would Destroy Civilization
That’s why in a commercial environment, programmers are NOT allowed to test their code for production. (Of course, they’re required to unit test their code).

A separate department with a separate manager is tasked with integrating all the code and ensuring that it works. In some cases, where software is THE product, a third department tests that the code is ready for “prime time”.

That’s how my company was organized. I was manager of the third level of testing, just before release to the public.
 

MisterBill2

Joined Jan 23, 2018
18,179
Another weakness I have found in many appliances is switch debouncing. So many times I have turned on my coffee pot, kitchen fan, washing machine and microwave only to have them instantly turn on and back off. On the new washing machine, they went a little too far. You now have to push and hold for 3 seconds to shut it off.
Push and hold is a great idea because it greatly reduces things happening accidentally.
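
For anyone curious what debouncing plus push-and-hold looks like in logic, here is a minimal simulation sketch in Python. The class name, the tick counts, and the assumed 10 ms sampling tick (so 5 ticks is about 50 ms and 300 ticks is about 3 seconds) are all assumptions for illustration; real appliance firmware would typically be C on a microcontroller and may work quite differently.

```python
class DebouncedButton:
    """Debounce a raw switch signal sampled at a fixed tick rate.

    A new raw level is accepted only after it has been unchanged for
    `debounce_ticks` consecutive samples. A "hold" event fires once when the
    debounced state has stayed pressed for `hold_ticks` samples.
    """

    def __init__(self, debounce_ticks=5, hold_ticks=300):
        self.debounce_ticks = debounce_ticks
        self.hold_ticks = hold_ticks
        self.pressed = False       # debounced state
        self._candidate = False    # last raw level seen
        self._stable = 0           # ticks the raw level has been unchanged
        self._held = 0             # ticks the debounced state has been pressed
        self._hold_fired = False

    def sample(self, raw):
        """Feed one raw reading (True = contact closed); return an event or None."""
        if raw == self._candidate:
            self._stable += 1
        else:
            self._candidate = raw
            self._stable = 1

        event = None
        # Accept a change only once the raw level has stopped bouncing.
        if self._stable >= self.debounce_ticks and raw != self.pressed:
            self.pressed = raw
            self._held = 0
            self._hold_fired = False
            event = "press" if raw else "release"

        # Count how long the debounced state has been pressed.
        if self.pressed:
            self._held += 1
            if self._held >= self.hold_ticks and not self._hold_fired:
                self._hold_fired = True
                event = "hold"
        return event


def events_for(samples, **kwargs):
    """Run a list of raw samples through a fresh button and collect the events."""
    button = DebouncedButton(**kwargs)
    return [e for e in (button.sample(s) for s in samples) if e is not None]
```

With a short debounce and hold window, a bouncy press followed by a steady hold yields one press and one hold, while a brief tap yields only a press and a release. Acting on "hold" instead of "press" is what keeps an accidental bump from shutting the machine off.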

And as for testing, it certainly seems that the budget for checking each new Windows OS must be about $25. Yes, it may work, but NO, it is not an improvement.
Now stop and think about a computer-driven car: Would YOU want to be in a car that needed 200 MB of updates every week???
 

nsaspook

Joined Aug 27, 2009
13,086
That’s why in a commercial environment, programmers are NOT allowed to test their code for production. (Of course, they’re required to unit test their code).

A separate department with a separate manager is tasked with integrating all the code and ensuring that it works. In some cases, where software is THE product, a third department tests that the code is ready for “prime time”.

That’s how my company was organized. I was manager of the third level of testing, just before release to the public.
As stated in the post above about testing, somebody had better tell the MS Windows programmers about this.
 

crutschow

Joined Mar 14, 2008
34,285
If I know it right, Apollo's famous 1202 alarm triggered not one but four software reboots.
Yes, but that was caused by a synchronization phase error between two AC power supplies for the radar dish angle resolvers.
This caused the radar system to send multiple interrupts to the computer (12.8K interrupts/sec), which then caused the computer to run out of memory, triggering the reboot.
It was not a software or computer problem.
The software continued to do its job and landed them safely on the Moon.
The power supply specification erroneously stated the two power supplies needed to be frequency locked, when it should have said they needed to be also phase locked.
 

WBahn

Joined Mar 31, 2012
29,979
Though the issues TS is talking about are system-level issues. It really doesn't matter that the software wasn't the cause of the problem -- the system was getting into a state that it couldn't recover from without rebooting the system (or some part of it) to a known good state. It was still a design flaw -- but far from being an example of a design gone wrong, it was an example of a design done sufficiently well that a whole host of potential (and clearly real) design flaws could be compensated for in real time as they were encountered. It was a design philosophy that incorporated not only a design-and-test approach, but also a design-to-accommodate-unforeseen-occurrences approach. That approach has paid dividends on numerous occasions during the space program as a whole.
 

crutschow

Joined Mar 14, 2008
34,285
It was still a design flaw -
But not a software design flaw.
If a memory overflow occurred for unknown reasons, it was designed to do a partial reboot to clear up some memory so it could continue processing the critical data.
How is that a flaw?
 

WBahn

Joined Mar 31, 2012
29,979
But not a software design flaw.
If a memory overflow occurred for unknown reasons, it was designed to do a partial reboot to clear up some memory so it could continue processing the critical data.
How is that a flaw?
So using AC power supplies that were not phase-locked wasn't a design flaw?

So a system is fine as long as it doesn't have any software design flaws?

Guess the Challenger was just fine, since it wasn't a software design flaw that caused it to explode.

The TS did not ask why do software systems often respond to the software being reset when the software does something bad. He asked why do so many various machines respond to being reset in some way. There are plenty of purely mechanical or purely electrical systems that exhibit this behavior. There does not have to be software anywhere near it or, even if there is, the issue may be completely unrelated to the software. It is not uncommon for some types of valves, particularly relief valves, to occasionally get stuck open and they can be reset by shutting the system down (or at least that part of the system that has the valve in it) which will let it close. Some valves are specifically designed to operate this way, while other times it is due to wear or contaminants or something else unintended.

One situation I saw was where the system would start misbehaving and then would keep misbehaving until it was shut down and allowed to cool to near ambient temperature. At that point, you could restart it and it would work fine, sometimes for months, before it would happen again. No software in sight, but that system had a flaw. IIRC, the issue was a sensing rod linkage that, under the right thermal gradients through the system, could go over center where it stayed. When the system cooled down, one of the links that adjusted the transfer ratio would put enough side pull on the over center one to pop it back into its correct orientation. Since that was never intended to happen, it was quite fortuitous that the system could recover by shutting it down, waiting, and restarting. On the other hand, if it hadn't, then perhaps the flaw would have gotten diagnosed and fixed much more quickly than it did since the first time it happened would have rendered the system inoperable until it was properly repaired.

EDIT: Removed obsolete text.
 

nsaspook

Joined Aug 27, 2009
13,086
https://www.americanscientist.org/article/moonshot-computing
The cause of this behavior was not a total mystery. It had been seen in test runs of the flight hardware. Two out-of-sync power supplies were driving a radar to emit a torrent of spurious pulses, which the AGC dutifully counted. Each pulse consumed one computer memory cycle, lasting about 12 microseconds. The radar could spew out 12,800 pulses per second, enough to eat up 15 percent of the computer’s capacity. The designers had allowed a 10 percent timing margin.
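
The 15 percent figure can be checked from the two numbers quoted above. This is only a back-of-the-envelope sketch using the article's rounded values (about 12 microseconds per stolen memory cycle, 12,800 pulses per second, a 10 percent margin); the variable and function names are my own.

```python
PULSES_PER_SECOND = 12_800   # spurious radar pulses per second, as quoted
CYCLE_SECONDS = 12e-6        # "about 12 microseconds" per memory cycle, as quoted
TIMING_MARGIN = 0.10         # the 10 percent margin the designers allowed


def stolen_fraction(pulses_per_second, cycle_seconds):
    """Fraction of each second spent counting pulses instead of running jobs."""
    return pulses_per_second * cycle_seconds


load = stolen_fraction(PULSES_PER_SECOND, CYCLE_SECONDS)
overrun = load - TIMING_MARGIN
print(f"load: {load:.1%} of capacity, {overrun:.1%} beyond the margin")
```

That works out to roughly 15 percent of capacity, about 5 points more than the margin could absorb, which is why jobs started piling up.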

Much has been written about the causes of this anomaly, with differing opinions on who was to blame and how it could have been avoided. I am more interested in how the computer reacted to it. In many computer systems, exhausting a critical resource is a fatal error. The screen goes blank, the keyboard is dead, and the only thing still working is the power button. The AGC reacted differently. It did its best to cope with the situation and keep running. After each alarm, the BAILOUT routine purged all the jobs running under the Executive, then restarted the most critical ones. The process was much like rebooting a computer, but it took only milliseconds.
Recalling the episode of the 1202 alarms, I asked if the key might be to seek resilience rather than perfection. If they could not prevent all mistakes, they might at least mitigate their harm. This suggestion was rejected outright. Their aim was always to produce a flawless product.

I asked Hamilton similar questions via email, and she too mentioned a “never-ending focus on making everything as perfect as possible.” She also cited the system of interrupts and priority-based multitasking, which I had been seeing as a potential trouble spot, as ensuring “the flexibility to detect anything unexpected and recover from it in real time.”
http://ibiblio.org/apollo/Documents/CherryApollo11Exegesis.pdf
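
To make the "purge everything, then restart only the critical jobs" idea concrete, here is a toy sketch in Python. It is emphatically not the AGC's actual Executive or BAILOUT code; the `Job` and `Executive` classes, the slot count, and the job names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    priority: int
    critical: bool = False


class Executive:
    """Toy job table with a fixed number of slots (a hypothetical model)."""

    def __init__(self, slots):
        self.slots = slots
        self.jobs = []
        self.restarts = 0

    def schedule(self, job):
        """Add a job; if the table is full, purge and restart instead of halting."""
        if len(self.jobs) >= self.slots:
            self.bailout()
            # After a restart, shed non-critical work rather than re-queue it.
            if not job.critical or len(self.jobs) >= self.slots:
                return False
        self.jobs.append(job)
        return True

    def bailout(self):
        """Drop every job, then bring back only the critical ones, highest priority first."""
        survivors = [j for j in self.jobs if j.critical]
        survivors.sort(key=lambda j: j.priority, reverse=True)
        self.jobs = survivors
        self.restarts += 1


executive = Executive(slots=3)
executive.schedule(Job("display", priority=5, critical=True))
executive.schedule(Job("guidance", priority=10, critical=True))
executive.schedule(Job("radar_count", priority=1))
accepted = executive.schedule(Job("radar_count_again", priority=1))
```

In this run the fourth request finds the table full, so the executive purges, keeps `guidance` and `display`, and drops the low-priority work. The system loses some non-essential output but keeps doing the job that matters, which is the resilience the article is describing.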
 

crutschow

Joined Mar 14, 2008
34,285
So using AC power supplies that were not phase-locked wasn't a design flaw?

So a system is fine as long as it doesn't have any software design flaws?

Guess the Challenger was just fine, since it wasn't a software design flaw that caused it to explode.
Sorry, we are talking about two different things.
This post has been mainly talking about software flaws requiring a reset, so I was referring to that.
I did not mean that the phase-lock error was not a flaw in the system (which it certainly was, of course).
 

WBahn

Joined Mar 31, 2012
29,979
Sorry, we are talking about two different things.
This post has been mainly talking about software flaws requiring a reset, so I was referring to that.
I did not mean that the phase-lock error was not a flaw in the system (which it certainly was, of course).
Okay. We were just working from two different contexts.
 

Thread Starter

Volttrekkie

Joined Jul 27, 2017
63
A lot of it comes down to budget. Every project has a budget, and often there's only enough budget to make it "good enough". The marketing guys call it MVP, Minimum Viable Product. Something just good enough to sell. A team of engineers is not cheap. Guess that it costs maybe $5k-$10k/week to employ a good engineer. Maybe you've got a group of 10 engineers working on your fancy new touch screen widget, so that's $50k-$100k / week to employ them. Why pay them for 12 months to make it "great" when you can pay them for 8 months to make it "good enough" and potentially sell the same number, but at a higher profit margin? Plus during that 4 months you saved, they can be working on the next product, and get that product to market sooner too, turning even more profit. And it snowballs. Not every market tolerates MVP, but it sure seems to work in the price sensitive consumer electronics market.
Engineers make 260k to 520k a year? Where?
 

Reloadron

Joined Jan 15, 2015
7,501
This is a question that has been bugging me for a long time. I have fixed a whole variety of machines. Air conditioners, ice machines, printers, lifts, scanners, labelers, packaging machines...the list is endless...and of course your own laptop. And as we all know, when the machine just won't do what it should, like get onto the web, and you checked everything else, modem is working, reset it, cables all good, no software conflict you can find...can't find any cause...so you finally decide, well, maybe I'll just try restart. And lo and behold it works. Now, this works on all these machines time and time again. There has to be a common reason. And you feel like an idiot just having to restart again and act like you know what you are doing without really understanding the exact cause. What is the exact cause? Is it the capacitors? Do they need to be relieved? What is going on? Do you see what I am getting at? We've all seen this surely. Why do all machines do this crap?
Simply put, my life's experiences have taught me a few things. There are two keys to happiness: the first is the Delete key and the second is the Reset key. Don't look for logic where there is none; it will only make your head hurt.

Ron
 

Reloadron

Joined Jan 15, 2015
7,501
Oops, I actually meant every paycheck instead of week, but @ElectricSpidey got my point, it costs a lot more to employ someone than what they make.
During her early years my wife served as an HR (Human Resources) administrator for the salaried side of a large corporation. The rough estimate was 150% plus there was no return on an employee for the first six months. Keep in mind this was thirty years ago but it worked out that if a new salary employee was hired at 50 K annually the cost to have them was about 75 K annually and during the first six months there was no return on investment.

What needs to be considered is that this included many things long gone, like a fully paid medical insurance plan, LTD (Long Term Disability) and any pension plan, as well as stock options or a matching 401K or similar. Normally it took 5 years to become vested in the pension plan. The company was the old Lear Siegler Corporation. On the bright side, she had 15 years under the old LSI and gets a monthly pension check.

Anyway, as to employee cost? It really depends on position and benefit plan, and good benefits are a disappearing thing these days. Those with city, county, state or federal jobs still have good benefits and retirement plans, but such plans are disappearing in the private sector versus the public sector.

Ron
 

WBahn

Joined Mar 31, 2012
29,979
Engineers make 260k to 520k a year? Where?
There's a big difference between what an engineer gets paid and what it costs to hire and/or employ an engineer.

A common rule of thumb for lots of professions is that the total cost to employ someone is twice what their gross pay is -- it's seldom much less than that and sometimes quite a bit more. This covers employee-specific costs such as the employer side of the employment tax, health and other insurance premiums paid by the employer, retirement program contributions, vacation/holiday/sick leave, plus other benefits such as educational or child care expenses. Then there are indirect costs such as building lease/mortgage costs, general insurance premiums, maintenance, utilities, and office equipment and supplies. When hiring an engineer you also have to cover the employee costs of those people in the company that do not get hired out, such as IT technicians, office managers, receptionists, etc.
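
Putting numbers on that, here is a quick sketch that takes the $5k-$10k/week cost figure from earlier in the thread and applies the 2x rule of thumb. The 52-week year and the exact factor of 2.0 are simplifying assumptions, and the function names are just for illustration.

```python
WEEKS_PER_YEAR = 52
LOAD_FACTOR = 2.0   # rule of thumb: total cost is about twice gross pay


def annual_cost(weekly_cost):
    """Yearly cost to the employer for a given weekly cost."""
    return weekly_cost * WEEKS_PER_YEAR


def implied_gross_pay(weekly_cost, load_factor=LOAD_FACTOR):
    """Gross salary that would produce this cost under the rule of thumb."""
    return annual_cost(weekly_cost) / load_factor


for weekly in (5_000, 10_000):
    print(f"${weekly:,}/week costs ${annual_cost(weekly):,}/year, "
          f"implying about ${implied_gross_pay(weekly):,.0f} gross pay")
```

So a $260k to $520k yearly cost corresponds to something like $130k to $260k in gross pay under this rule, not $260k to $520k in salary.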
 