Your biggest fumble?

Thread Starter

strantor

Joined Oct 3, 2010
6,782
I've been developing a PC-based (Python) automation system for the past 18 months and last week I entered the home stretch, nearly done. I had just a couple more tiny features to integrate. I was maybe 2-3 days from the end. And my program starts crashing. Not the gentle Python let-down with a nice traceback to tell you exactly where you messed up; no, the Windows "Python has stopped working" message with no feedback or clue whatsoever as to the cause. It crashed like this once or twice several months ago but not since, so I assumed I had worked it out. Now the crashes happen without fail, but randomly. Maybe 1 hr, maybe 36 hrs, but when you start it, you can be assured it will crash as soon as it is needed most.

I found enough wisps of a clue in the Windows Event Viewer that with exhaustive googling I finally figured out what was wrong. I'm handling threads very poorly. I thought I had threading figured out, but I was wrong. I built everything I have worked on in the past year on a flawed implementation of threading. I am talking thousands upon thousands of lines of script. Most of it needs to be changed. It won't be as arduous as the original writing of it but it will not be easy either. Like building a house and, while laying the carpet prior to moving in, you learn that your slab is too thin and you have to rip the house down and start over. You know how to build a house now so it should go quicker, but still...

This is one of the worst bumbles that the world has allowed me to make. In what reality or dimension does a mistake like this go unpunished for long? It never should have worked the way I wrote it. But it just "happened to" work until IT DIDN'T.

WHAT? WHY?

FML

I'm leaving work early. Going to go get some donuts. Not sure what to do after that. I have to dispose of a dead raccoon later but only after a few beers. It would help to hear that I'm not the only one who has made such grand booboos. What's your story? Please, one-up me. I need it.
 

Wolframore

Joined Jan 21, 2019
2,609
I’ve got a boo boo that could have been very dangerous... ugh, luckily my son discovered it before anything happened because he just likes to break things. Complete programming oversight. Unfortunately can’t go into all the details but it could have been a lawsuit.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
Hello there :)
Just think of it as your... crash course in Python.

:p

"Crash" course indeed. This will be one of those lessons not soon forgotten.

Well, a minor feast on Krispy Kreme donuts pacified the right side of my brain, distracting it long enough for the left side to retake its rightful place at the helm and disable panic mode. I realized that it's not actually "that" bad. Most of my code has nothing to do with threading, so most of it won't require changes. The only parts that will require changes are those where my various scripts interact with the UI. Maybe 25% of the whole project. Once I figure out the changes that need to be made, I should naturally find a "rhythm" in going through and making them. Might only take a week or two of mind-numbing drudgery to set this right. I'm sure it won't be as simple as "find & replace" but maybe simple enough to write a quick script to make all the changes for me.

Good thing I haven't been tooting my horn about how close I was. With any luck I'll have it fixed before anyone knows there was a setback. Of course I'll give an accurate progress report if asked, but I'm not often asked.

This just in... right brain still expects beer. Will comply. Now.
 

KeithWalker

Joined Jul 10, 2017
3,063
This is one that I will never forget:
I was a Junior Technician in the Royal Air Force in 1957. I worked in the maintenance unit of No. 1 Radio School, RAF Locking. One of my duties was to supervise the public address system during parades on the square. I mentioned to my boss, the Flight Lieutenant in charge of maintenance, that the amplifier was very obsolete and in really bad shape. He asked me if I could build a better one if he got me the components. I accepted the challenge because routine maintenance was very boring.
I built one, using valves (vacuum tubes) of course in those days, with two EL84 output pentodes. I checked it on the square and it worked very well except for a tiny bit of 50Hz hum. I decided that the filaments should have a hum-dinger added (a 100 ohm pot across the filaments with the slider connected to chassis ground).
I took it back to the shop, plugged it in and did a couple of tests. Then, just to be very safe, I unplugged it and with a screwdriver I momentarily shorted the smoothing capacitors on the 600VDC anode supply. They made quite a splat! When it was all "safe" I went in with both hands to wire up the new pot. There was a very loud BANG and I was hurled through the plywood wall behind me. I had unplugged my soldering iron, not the amplifier! I sat up in the middle of the floor of my boss's office. He looked over his desk at me and casually said "Would you mind knocking, next time, airman!"
 

MrChips

Joined Oct 2, 2009
30,706
My biggest fumble came when I was in charge of backing up the hard disk drive on to magnetic tape on our research computer (DG Nova 2/10). I was doing this routinely once a month but had never bothered to check that the backups were done correctly. Then the inevitable happened. The hard drive crashed and the platter had to be replaced. When that was done and it came time to recover the backup files, horror! All the backed up directories were empty. I was supposed to go into each directory and copy the contents over to the backup directory! There was about 18 months' worth of research work lost.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
My biggest fumble was I was in charge of backing up the hard disk drive on to magnetic tape on our research computer (DG Nova 2/10). [...] There was about 18 months' worth of research work lost.
Wow, that's pretty bad. I bet you weren't very popular for a while. I guess I can be thankful that I've only inconvenienced myself. I mean there will be a delay that affects others but chances are they won't know it. Thanks for sharing.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
This is one that I will never forget: [...] I had unplugged my soldering iron, not the amplifier! I sat up in the middle of the floor of my boss's office. He looked over his desk at me and casually said "Would you mind knocking, next time, airman!"
The way you told that story, I can see it all play out in my head as if it were a movie. Great (bad) story! Thanks for sharing.
 

Papabravo

Joined Feb 24, 2006
21,158
Building several hundred units to reduce the expected 8 week lead time for an order from a customer...that was never placed.
 

bogosort

Joined Sep 24, 2011
696
This is one of the worst bumbles that the world has allowed me to make. In what reality or dimension does a mistake like this go unpunished for long? It never should have worked the way I wrote it. But it just "happened to" work until IT DIDN'T.
I'm guessing you inadvertently introduced subtle race conditions in your threaded code. Few bugs are as vile and wretched. Pure evil hiding in your seemingly functioning code.

I had a related fumble at work a few years ago. The engineers who designed and developed one of our industrial data acquisition products had recently left the company and I inherited the project. It was basically a custom analog front-end that interfaced with a commercial single-board computer, which ran our application software on Linux. The SBC had gone obsolete but I found a suitable replacement part. Unfortunately, the old 2.6 kernel we had been using couldn't recognize the new hardware, so my first real task was to port the application to a more modern vintage of Linux.

The application was a tightly-coupled set of software, including a hardware driver, custom kernel module, and a fairly complex threaded C++ program to coordinate data acquisition, calculations, etc. As I dove into the poorly documented code to try to understand how everything worked, it quickly became clear how much work I was going to have to do. Linux had changed the entire kernel locking paradigm (no more Big Kernel Lock), so I had to rewrite the kernel module from scratch. Worse still, the app was heavily (and improperly) threaded and full of race conditions -- running Valgrind on the code showed over 10,000 threading errors.

So, naturally, when I discovered the product behaving weirdly with the newly ported software, I immediately suspected the threading code. With our customers impatiently waiting for an update, I spent six weeks tearing apart the code, making timing diagrams of how it should work, redesigning the mutex strategies, and rewriting the core of the code. Of course, the weird behavior persisted despite my efforts.

Completely at a loss for what could be wrong with the software, I decided to see what was happening on the hardware side. The product used a PC/104 bus -- essentially 16-bit ISA for industrial applications -- for communication, so I hooked up my scope and began probing pins. Surprisingly, the weird behavior stopped whenever I would look at a signal. As long as I was probing, everything functioned normally. A true Heisenbug!

Sitting back in my chair, I considered how this could be possible. It suddenly dawned on me that the probe itself introduces a small capacitance into the circuit. Looking around my desk, I found a 1 nF ceramic cap and placed it between a signal pin and ground; everything started working again. Eureka! Like any bus, PC/104 has capacitance thresholds to meet its timing requirements, and the new single-board computer didn't quite meet that threshold. After six weeks of intense, frustrating, anxious work, the fix was to include a 1-cent capacitor on the back of the PCB.

I had long been conditioned to assume that every problem was a software problem, to the point that I didn't even first check some simple stuff on the hardware. It was a painful and humbling lesson to learn, but all of that pain ensures that I'll never forget it! Consider your fumble a valuable lesson and use the opportunity to become proficient in threaded programming.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
I'm guessing you inadvertently introduced subtle race conditions in your threaded code. Few bugs are as vile and wretched. Pure evil hiding in your seemingly functioning code.
Yes, at least I think that's the proper term. It came up repeatedly in my quest for resolution. I wrote several scripts that each need to run in continuous loops and do various things:
1. Communicate with a web API server
2. Communicate with a Siemens PLC
3. Communicate with a P3000 PLC
4. Communicate with several Click PLCs
5. Communicate with an RS232 printer
6. Communicate with a TCP/IP RFID reader
7. Decode railroad RFID IDs
8. Write .CSV data logs to a network drive
9. Communicate with an Adam RS485 I/O module
10. Make information to & from all these sources available through a Qt GUI.

Since Qt has built-in threading support which is more intuitive than Python's threading, I chose to use my GUI as the central script, calling each of these scripts in turn via its threadpool. The most convenient thing of all was that I could then create global variables for every piece of data collected by, or sent to, any of these scripts, and use it whenever and wherever I wanted, without having to use slots and signals (.emit & .connect). Except that was absolutely untrue. I was reading and writing data between several concurrently running threads with reckless abandon. I have no idea how or why it ever worked at all, it was so rampant throughout.
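For readers who have not met the term "race condition", here is a minimal Python sketch of the mildest form of the problem described above: several threads updating one shared value with no coordination. The `Counter` class and the `run_workers` helper are hypothetical names made up for this illustration, not anything from the actual project. The example only shows lost updates; it does not try to reproduce the hard "Python has stopped working" crashes.

```python
import threading


class Counter:
    """Hypothetical shared state, standing in for a global variable."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read and write are separate steps. Another thread can run in
        # between them, in which case one of the two updates is lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single indivisible step.
        with self._lock:
            self.value += 1


def run_workers(increment, n_threads=8, n_iterations=10_000):
    """Call `increment` n_iterations times from each of n_threads threads."""

    def work():
        for _ in range(n_iterations):
            increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


safe = Counter()
run_workers(safe.safe_increment)

unsafe = Counter()
run_workers(unsafe.unsafe_increment)

print("with lock:   ", safe.value)    # always 80000
print("without lock:", unsafe.value)  # never more than 80000, may be less
```

The unlocked version can pass many runs in a row before it loses an update, which is consistent with code that "happened to" work for months.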

I'm starting from scratch on my GUI thread, creating signals for every piece of data. Each script running in a thread other than the GUI thread will read and write only to/from the GUI thread, nothing else, and only via signals, no more global variables. The GUI thread will truly serve as the central point where all data from other threads is kept, no more grabassing amongst threads.
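The "one owner, everything else sends messages" idea in the paragraph above can be sketched without Qt. This is not PyQt5 code: it swaps in the standard library's `queue.Queue` where the real program uses signals and slots, and the worker names (`siemens_plc`, `rfid_reader`) and the `plc_worker` / `run_owner` functions are hypothetical. The point is only the shape: workers never touch shared state, they just report, and a single thread applies the updates.

```python
import queue
import threading


def plc_worker(name, readings, outbox):
    """Hypothetical worker: reports each reading instead of writing a global."""
    for value in readings:
        outbox.put((name, value))
    outbox.put((name, None))  # sentinel meaning "this worker is finished"


def run_owner(worker_specs):
    """Owner loop: the only code that ever reads or writes `state`."""
    outbox = queue.Queue()  # thread-safe channel from workers to the owner
    state = {}

    threads = [
        threading.Thread(target=plc_worker, args=(name, readings, outbox))
        for name, readings in worker_specs.items()
    ]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        name, value = outbox.get()  # blocks until some worker reports
        if value is None:
            remaining -= 1
        else:
            state[name] = value  # updates are applied one at a time, here only

    for t in threads:
        t.join()
    return state


final_state = run_owner({
    "siemens_plc": [1, 2, 3],
    "rfid_reader": ["A", "B"],
})
print(final_state)
```

Each worker's messages arrive in the order it sent them, so the owner ends up holding the latest reading from each source no matter how the threads interleave.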

Unfortunately I don't see any way to automate the changes. It's a grueling manual process. I finished (properly, I hope) integrating the first script today (the easiest one) with the GUI. Only 8 more to go. And I did just happen to be asked for a progress report and I spilled the beans, so everyone knows I'm holding up the show. But it's kind of a relief that it's out there now.
Consider your fumble a valuable lesson and use the opportunity to become proficient in threaded programming.
Yeah, definitely. I didn't even know what threads were a year ago. I'm not really a computer programmer. I have zero formal training. I'm an industrial automation guy (PLC "programmer") who occasionally used Python for tasks that were outside the scope of what a PLC is cut out for. This project was supposed to be one of those simple projects at the beginning but it just kinda grew, faster than my skills, and turned into something that I never thought I would be working on. I have learned A LOT from this, and the school of hard knocks is still in session.


I had a related fumble at work a few years ago. The engineers who designed and developed one of our industrial data acquisition products had recently left the company and I inherited the project. [...] After six weeks of intense, frustrating, anxious work, the fix was to include a 1-cent capacitor on the back of the PCB.
Man, that sounds like a major PITA. At least it wasn't your mistake though.
 

cmartinez

Joined Jan 17, 2007
8,218
I've been programming in VB.net for quite a few years now, and in my experience proper thread programming becomes unbearably hard to keep track of for complicated systems. So the technique I use is to set flags in objects that raise events (for example, buttons) and use a single polling loop in the entire program that checks the state of said flags and acts accordingly in the best possible order, clearing those flags after done.

That way the possibility of thread crashing is completely avoided, and also conflicting assignment of values to critical global variables is eliminated. It is not as sophisticated as "real" thread execution, but it is versatile and very easy to keep track of. And given the speed of modern computers, response time is not an issue.
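A rough Python rendering of the flag-and-poll technique described above, offered as a sketch rather than a translation of any real VB.net code. The `RequestFlags` class, the two `on_..._clicked` handlers and `poll_once` are hypothetical names; `threading.Event` is used as the flag so that setting it from another thread is safe.

```python
import threading


class RequestFlags:
    """Hypothetical flags that event handlers set and the main loop clears."""

    def __init__(self):
        self.start = threading.Event()
        self.stop = threading.Event()


def on_start_clicked(flags):
    # An event handler does no real work; it only records the request.
    flags.start.set()


def on_stop_clicked(flags):
    flags.stop.set()


def poll_once(flags, actions):
    """One pass of the single polling loop, handling flags in priority order."""
    # Stop is checked first so it wins when both requests are pending.
    if flags.stop.is_set():
        flags.stop.clear()
        actions.append("stop")
    if flags.start.is_set():
        flags.start.clear()
        actions.append("start")


flags = RequestFlags()
actions = []

# Simulate two handlers firing on other threads before the next poll.
clickers = [
    threading.Thread(target=on_start_clicked, args=(flags,)),
    threading.Thread(target=on_stop_clicked, args=(flags,)),
]
for t in clickers:
    t.start()
for t in clickers:
    t.join()

poll_once(flags, actions)  # handles both requests, stop first
poll_once(flags, actions)  # nothing pending, so nothing happens
print(actions)  # ['stop', 'start']
```

In a real program `poll_once` would be called repeatedly from one loop or timer. This sketch clears each flag just before acting rather than after, so a request that arrives while an action is running is kept for the next pass; that ordering is a choice made for the example.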
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
I've been programming in VB.net for quite a few years now, and in my experience proper thread programming becomes unbearably hard to keep track of for complicated systems. So the technique I use is set flags in objects that raise events (for example, buttons) and use a single polling loop in the entire program that checks the state of said flags and acts accordingly in the best possible order, clearing those flags after done.

That way the possibility of thread crashing is completely avoided, and also conflicting assignment of values to critical global variables is eliminated. It is not as sophisticated as "real" thread execution, but it is versatile and very easy to keep track of. And given the speed of modern computers, response time is not an issue.
I played around with VB6 when I was a tween, it was my very first experience with programming. But I've had no more exposure to VB since then apart from the occasional Excel Macro. I really have no idea in what ways it's similar/different from Python. I don't see the need for the polling. Is that maybe something that makes more sense in VB.net than it does in Python? Or am I missing the benefit? I actually considered doing a polling main (GUI) loop but then decided there was no need for it. Since I'll be writing each variable back to my main thread in one place and one place only, I don't see the opportunity for any conflicts. And since this (seems to) happen in real time, all event driven, all of the concurrently running threaded functions will have the same access to the same real-time information. The polling would only introduce unneeded (albeit minimal) delay. Maybe I will change my tune later this week when I actually have several threads running concurrently (again). But I would like to hear your reasoning, may save me a few days of (re)re-work.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
Ok I've just finished rewriting my entire GUI/threading program to route all data shared between threads, through signals to/from the GUI. It's been running about an hour now, no crashes. Now to see if it will run at least a week....

I am actually kinda happy this happened. I learned a lot and it gave me a perfect opportunity to do some housekeeping in my code.

I opted not to use any polling timers for anything. The threaded functions go immediately back into the queue (QThreadPool) as soon as they've run, and the GUI is updated in real time. I have incorporated about 10x more data into the GUI than I had before* and it is cool to see all the data changing on screen in real time finally.

*This incorporating of the data into the GUI display was what set off the crashes I think. It seems all the data being shared across threads wasn't a problem until I tried to display it in the GUI. For the duration of the development I was using only the terminal output to verify data, and bringing it into the GUI was the last step.
 

Thread Starter

strantor

Joined Oct 3, 2010
6,782
1 week test complete, successful. Very pleased. If anyone needs assistance with PyQt5 threading in the future, tag me. I can almost confidently say that I sorta know what I'm doing now.
 