"Boneheaded" Mistakes

strantor

Joined Oct 3, 2010
6,875
I've only skimmed the thread and haven't read all the posts, but I see a common theme that I think everyone, most definitely myself included, is afflicted by.

Humans are very, very good at pattern recognition and extrapolation. We see part of something and we fill in the gaps with what we don't see but that we know needs to be there. We see the tip of the tail and we know that there is the rest of the lion attached. I think this carries forward to our writing and programming. We know what we meant to write, and so when we read it we tend to not see what's actually there, but rather what we know is supposed to be there. I can't even begin to count the number of times I have written something, from a post here to a paper I'm submitting for publication, that I have proofread over and over only to find several typos as soon as I print it out. Or the number of times I've been trying to point out a silly mistake that a student has made in a program, such as a misspelled variable name, and even after focusing their attention right to the specific word they still don't see the problem because their brain is taking what it sees and replacing it with what it knows it is supposed to be.
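As an illustration only (it is not from the original post), here is a minimal Python sketch of the kind of misspelled-variable bug described above. The function names `average` and `average_fixed` and the sample readings are hypothetical. In a language that creates variables on assignment, the typo raises no error, which is part of why the eye slides right past it.

```python
def average(readings):
    """Intended to return the mean of the readings, but contains a typo."""
    total = 0
    for r in readings:
        # Typo: 'totl' instead of 'total'. Python silently creates a new
        # variable, so 'total' is never updated and stays at 0.
        totl = total + r
    return total / len(readings)


def average_fixed(readings):
    """Same function with the variable name spelled consistently."""
    total = 0
    for r in readings:
        total = total + r
    return total / len(readings)
```

Reading the two loops side by side, the difference is a single missing letter, which is exactly the sort of thing the brain tends to "correct" on the fly.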

The good news is that there are some simple things we can do to greatly help (but never eliminate) this. I've noticed, and confirmed in conversations with countless others, that even minor changes in context make a huge impact. That's why printing out a paper, or previewing a post, allows me to immediately see errors that I've overlooked repeatedly in the editor. On several occasions when I've gone back to the editor to fix something I found in the preview, I couldn't find it again because it was now back in the context where my brain was tricking me. But as soon as I went back to the preview I found it immediately. Another technique, particularly good with programming situations, is to consciously tell myself to read what I wrote, not what I wanted to write, or to read what's there and not what I want to be there. Essentially I am forcing myself into the role of a reviewer looking at someone else's work, and that often moves me far enough away from the mindset that established the familiar context that is masking the problem for me to spot it. If that doesn't help, I start explaining my work, in detail, to my dog (even if she's not around). That forces me to think about the problem and my solution differently, because now I am explaining it instead of solving it and implementing it, which again serves to create a new context in which I notice things I've overlooked. Numerous head-slap moments have resulted -- plus I should, by all rights, have a very well-educated dog by now.
Yep, I always read what I post here because I could proofread something 3 times in the message editing window and find perfection each time, but as soon as I post it and it's in a different font against a different background, well... it's like when the lights come on in a strip club. And like you said, when I go back to fix issues in the editor, I can't find them half the time. I usually program with my IDE in the "dark" theme (easier on the eyes) but sometimes I find it beneficial to change themes and look over things again.

IMO it's pretty bizarre really. It implies (I think?) that you have a mental image of, in some cases, thousands of lines of code, stored (incorrectly, but consistently) in your cranial RAM. And that's pretty impressive.
 

panic mode

Joined Oct 10, 2011
4,983
today i was asked for help... but as i was answering, more and more pieces of the puzzle came out. it is quite fascinating how things slowly unravel. basically i was informed of an "accident"... and it turns out, when it rains it really pours...

a year or so ago we delivered to client two (out of three) identical systems. both are working and life is good. third one was delayed because of some shortages...

this was a nice project, with a lot of development and a great learning opportunity for new members. each system is made of many products and a lot of code, but one thing we had some issues with is lead time on one of the components. it was the little control board that we had to source from a specific supplier to meet the client's exacting standards. each system uses a bunch of those - they use RS485 to connect to our server - everything else is already on Ethernet.

and when the time came to order the material for the third and last system (plus some spares - just in case) that vendor was dealt a blow by supply chain problems. i really did not want to make the third unit different from the two that are already running, so i bugged the product vendor trying to find out what the exact challenge was. it turns out that really the only problem was the simple 485 interface chip... well that can't be too bad, they are very similar with many substitutes etc. also i have a whole bunch of inventory but - not for this one (3.3V in a small MSOP10 package).

so after some searching, i found a few distributors that did have small quantities and sure enough - it worked. our vendor was able to get them, make the products and deliver them on time. we made sure to get two full sets as one is meant to be spare (just in case the chip shortage lasts longer than expected).

and as mentioned, today i found out that there was a problem...

first i was shown a photo of one board (another product) with burned resistors. they were terminating resistors on an adapter board. apparently one of the techs connected power to the wrong terminals. ok, no biggie, resistors are easy to replace plus i made sure to have spares of the complete board too.

so, just to be sure that is all, i started asking what exactly happened because those resistors are used to terminate the 485 bus. was there anything else on the bus? it turns out that at the moment of the accident the power was indeed connected to the communication port - the RS485 terminals, and yes, half of the system was populated with the little controller boards. the ones that our vendor had a problem getting 485 chips for. and of course all of the connected boards suffered the same fate too.

now this is by no means good news, but i try to look on the bright side - it was only half a set and we still have another full set of spares. and one of the programmers is there too. he handled those many times, because fresh boards needed to be powered individually to be programmed (at least set high baud rate and assign node address) before they can be added into the network - settings are loaded through software and saved to eeprom. so i suggested he do that.

and that is exactly what was already done. the programmer stepped in to help, powered the system down, removed all damaged products, configured and labeled each of the new ones, then populated the system completely (including two reserved slots that are only meant for future use) and then - he powered it up. but he was way overconfident and did so in haste so this time he too made a mistake and - yeah... Houston, we have a problem.

so there are not enough working units to complete the system, not to mention having spares... if he had not decided to connect the two extra units (remember the reserved slots...) there would be no spares but it would be enough to complete the system to full capacity.

but that is no longer the case. and of the remaining and reportedly operational units, there are two that are not performing as well. obviously they are damaged too, even though they are not fully dead.

So right now i am banging my head against the wall and trying to get a few more of those chips - asap. if i can get them in a couple of days or maybe a week, i may replace the blown chips myself, saving tons of time and even more headache. plus that would give at least some relief to two team members feeling really really bad now and hoping for a miracle...



so i went to my list of all places that had stock six months ago, checked the links and of course all of them are sold out - except one! we may still be in luck, will probably know tomorrow. i never dealt with this company before. also could not just place an order - they do show stock but require official RFQ, so...

fingers crossed...:(
 

Thread Starter

Ya’akov

Joined Jan 27, 2019
10,235
today i was asked for help... but as i was answering more and more of the pieces of the puzzle came out. it is quite fascinating how things slowly unravel. basically i was informed of an "accident"... and it turns out, when it rains it really pours...
:(
I often reflect on how mostly things aren’t like this any more. What I mean is, only a few decades ago almost any even slightly sophisticated electronic/electromechanical device was extremely vulnerable to simple configuration mistakes—polarity, load, bad adjustment, etc.

I remember how easy it was to just connect something wrong and see smoke. It made for nervous checking and rechecking of everything before applying power, and sometimes you couldn’t even control the application of power. Things seemed so sophisticated, and they certainly were, but they were also naïvely designed. This was also exacerbated by the level of sophistication of the components available.

But today, it is a great surprise when something can be hurt by plugging something into it. For example, there was my problem with a Type-C power supply that was only Type-C as deep as the connector but was otherwise simply a 12V 5A power supply with no intelligence. Plugging it into a 5V device, particularly one that expected 5V because if you do nothing that’s what you get, meant smoke.

But lately, that’s the exception. With integrated power management components that include protection, and polarity reversal & overvoltage protection very common, it just doesn’t happen much. But I think it also comes from a difference in the culture of design. In the past, a designer could just pass the buck on protection to the end user. “Just do it right” and nothing will happen.

But we don’t find that acceptable any more. We won’t tolerate a design that can be broken by connecting plugs that fit into the jacks provided. We don’t accept a device that can be destroyed by adjusting its exposed controls. We expect that the designers will either provide protection against these things doing damage, or if somehow that’s not possible, clear warnings about the danger.

But, really, today we mostly expect things to “just work”. In the days of yore, it was not unexpected to struggle with getting sophisticated devices to work, particularly if it was making things from two manufacturers work together. This was often down to the use of cheesy workarounds in the absence of standards for interoperability. There is still some of this, but we have so many standards in every area it is almost never necessary to do it.

I have some smoke-related boneheaded mistakes in my past. It was a learn-the-hard-way thing to be sure. But now, just plug it in (you don’t even have to turn the power off when you do), or just pair the two and magic happens. To me, this is an amazing advance—but then I worked with desktop and handheld computing literally from their beginnings, and with things like radio transceivers, instruments, and other devices while some still didn’t use ICs.

It helps me to recall because it makes me feel like we live in a great technical age today. What will the future bring?
 

strantor

Joined Oct 3, 2010
6,875
today i was asked for help... but as i was answering more and more of the pieces of the puzzle came out. it is quite fascinating how things slowly unravel. basically i was informed of an "accident"... and it turns out, when it rains it really pours...
You mean like this?

20210810_101856.jpg
20221025_092337.jpg
20221025_092424.jpg

The electrical contractor I had installing my systems brought 120V into the RS485 ports on ALL the remote I/O slaves on the whole line. Luckily I brought up one panel at a time, so I caught it on the first fried slave; it could have been much worse if I had just flipped the big switch. I wasn't even using RS485 (hence the open terminals' tempting invitation to land 120V), but it still rendered the slave inoperable.

My bad for not micromanaging them, but this was the 3rd line they've done for me and the first 2 went off without a hitch, so I had a bit of trust in them.

Your calamity was much worse than mine but hit so close to home that I couldn't pass up the opportunity to #metoo.
 

panic mode

Joined Oct 10, 2011
4,983
yup... just like that..

on the left i marked dimension on one of SOIC for scale.

in red is the RS485 chip that is toast. it has internal TVS on field side IO. they paid the price.

on the right in yellow is one of four X2Y caps in 0603 package. there are two diagonal streaks (one on cap and one on PCB) but both seem to be just the result of smoke from the IC blowing up. if i get my hands on the boards i will check if this just wipes off. also those are the only ones where soldering and alignment look different - clearly they were hand soldered at the factory... probably because the 485 option was added later on. empty spot above 485 chip is for RS232 version.

1666711657074.png
 