Inconsistent Results with Different Chips

Discussion in 'Embedded Systems and Microcontrollers' started by jpanhalt, Oct 24, 2014.

  1. jpanhalt

    Thread Starter AAC Fanatic!

    Jan 18, 2008
    5,685
    900
    Last March I put a project to bed and reopened it last week. When I put it to bed, I was getting inconsistent results. That is, one chip would work and a presumably identical chip wouldn't work. That inconsistency still exists.

    Here is the basic project:
    upload_2014-10-24_10-18-8.png

    Oops: The Inclinometer chip is a PIC12F683.

    T2 is the 16-bit high count for a PWM signal; T3 is the 16-bit count for the period. The ratio gives a value that is converted to inclination using a look-up table.

    When I started getting inconsistencies, I stripped the program to its bare essentials and just print out the four bytes of data. That program in MPASM with a lot of comments to myself is attached (change txt extension to .asm for viewing with MPLab).

    The symptom I see is that one chip (call it #1) will print to screen the register values and update several times a second as expected. Another chip (#2) will print to screen one set of register values BUT NOT UPDATE. If I reset with MCLR, then chip #2 will print appropriate new values, but still not update. Both chips have been programmed with the same build using either an ICD3 or PK3, yet one sometimes works and the other doesn't. Chip #1 is the older chip and has been erased and programmed many times. Chip #2 is actually any one of three chips. One is a bit older than the others. One was absolutely brand new out of the anti-static shipping package yesterday.

    Hardware: Two setups have been used. One is a breadboard that drives the display data with PortB. It has a high-quality 40-pin ZIF socket for the MCU. The other is a commercially made PCB that drives the display with PortA. It uses a standard socket for the MCU. The includes in the header simply have the defines needed for using PortB or PortA. They are no secret and I would be happy to include them, but long questions get fewer replies. Even when using only the breadboard or only the PCB, different chips behave differently.

    I have tried wiggling the chips; different power-up and power-down sequences; removing chips, placing them on anti-static pad and replacing; letting the frozen display run for up to an hour or more; and anything else I could think of.

    My ultimate question is what sort of things can I try to pin down the program error that appears to be handled differently by different chips? If you do a compile and disassembly, you will see that the PC goes to 0100h during the uDelay subroutine. I tried adding a Pagesel for the GOTO LCD_POWER instruction and elsewhere with no effect. (I didn't think that would be necessary with the enhanced chips for such a relatively short program.)

    Regards, John
     
  2. ErnieM

    AAC Fanatic!

    Apr 24, 2011
    7,386
    1,605
    What are these mystery chips 1 & 2? Where do they go? Did you build both halves or is one part purchased? Do you have a way to monitor raw serial data?

    The ultimate way to pin down problems that arise while the code is running is to use an in-circuit debugger such as an ICD or a PICkit. If the code seems to freeze you can see what bit isn't flipped (such as the serial data is no longer being received due to an unhandled overrun).

    I'd also check the date code of the #1 & #2 chips to make sure one isn't an older silicon rev and has a weird problem.
     
  3. jpanhalt

    Thread Starter AAC Fanatic!

    The GLCD chips are all 16F1519 from Microchip purchased from DigiKey. Date code for #1 = 1119EJB; date codes for the #2's are 1113485, 1113485, and 1119EJB.

    I designed and built all circuits. The Inclinometer is a small PCB plugged into a breadboard for power and ease of access. I can and do monitor the serial output on a Tek210 'scope and it looks fine regardless of which chip is driving the GLCD.

    I have done in-circuit debugging with the ICD3 and a separate accessory board, not the board with the GLCD. It ran fine in debug. Of course, it runs fine with one chip in real time too. I am reluctant to attach the PCB with the GLCD attached (referred to above as the PortA program) because I did not design that board to allow turning off the backlight. That extra 30 mA or so may be too much for the ICD3. Of course, I could attach an external supply, but haven't done that.

    I don't think it is an overrun or framing error, as I check the OERR and FERR and branch to clear and return from the interrupt on failure. Just a thought to myself, maybe I will do a "RESET" instruction at that appropriate place to see if or how many times the program goes there (screen should blink).
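
    For readers following along, the kind of OERR/FERR handling described above usually looks something like the sketch below. It is not the code from the attachment: the labels and the RxByte location are hypothetical, and it assumes the standard p16f1519.inc definitions and that only the receive interrupt is enabled.

    ```asm
    ; Hypothetical EUSART receive ISR fragment (enhanced mid-range PIC).
    ; RxByte, Rx_Ferr, Rx_Read and Rx_Exit are illustrative names only.
    RxByte  EQU     0x70            ; assumed free byte in common RAM

            BANKSEL RCSTA           ; RCSTA and RCREG share a bank
            BTFSS   RCSTA,OERR      ; overrun error?
            GOTO    Rx_Ferr         ; no, go check framing
            BCF     RCSTA,CREN      ; yes: toggling CREN clears OERR
            BSF     RCSTA,CREN
            GOTO    Rx_Exit
    Rx_Ferr
            BTFSS   RCSTA,FERR      ; framing error on the byte waiting in RCREG?
            GOTO    Rx_Read         ; no, the byte is good
            MOVF    RCREG,W         ; yes: read RCREG to discard the bad byte
            GOTO    Rx_Exit
    Rx_Read
            MOVF    RCREG,W         ; fetch the received byte
            MOVWF   RxByte          ; common RAM, so no bank switch is needed
    Rx_Exit
            RETFIE                  ; context is restored automatically
    ```

    FERR has to be tested before RCREG is read, because the flag belongs to the byte currently at the top of the receive FIFO.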

    One additional comment: The four bytes are received in two transmissions of two bytes (H/L) each. One concern was that by poor luck one might start one chip at the wrong time relative to those transmissions. That is why I do multiple starts, MCLR's, and/or let the chips run a long time before concluding one is not working.

    I am assuming that a RESET inside an ISR does not need to have any other resets, including RETFIE before or after it? Please let me know if that is not correct.

    John
     
  4. jpanhalt

    Thread Starter AAC Fanatic!

    Added RESET here:
    upload_2014-10-24_13-5-40.png

    No joy. Chip #1 works fine; Chip #2 (1113485) doesn't work and has the same symptom. That is, after a first pass of gibberish data, an MCLR reset gives a second result that is seemingly correct but frozen. The screen does not blink, which would be expected on any reset similar to MCLR.

    I believe the bug must be outside the ISR, because if it froze while in the ISR, the data would not be printed. I will try some resets elsewhere to hopefully pinpoint the problem spot.

    John
     
  5. jpanhalt

    Thread Starter AAC Fanatic!

    Well to add to the mystery and frustration, I added a RESET here:
    upload_2014-10-24_13-38-55.png
    That should have allowed valid data to be captured followed by RESET and a re-initialization blink. (Not sure registers are cleared. Will need to test that too.) However, Chip #1 (working) acted funny. Chip #2 started working. So, I removed the RESET and am back to the original. BOTH chips now worked.

    It is that type of inconsistency that has been gnawing at me since March. I have ordered a new 40-pin ZIF socket just in case that is the culprit and will solder up a stripboard test fixture. It should be here this weekend or Monday. I am pretty much out of other ideas.

    John
     
  6. jpanhalt

    Thread Starter AAC Fanatic!

    A couple of hours ago I was messing with inserting the chips (#1 and #2 as immediately above), and both started working. Once working, I could remove and re-insert and they still worked. So, I left a working chip in the ZIF socket and turned off the power supply (Agilent 3630A) and let it sit. After at least an hour, I turned the PS back on, and the chip showed the same lock-up result. I used MCLR to "turn off" the chip, but left power to the breadboard and socket.* Opened the ZIF and removed the chip. Touched some of the pins and re-inserted it. I am left handed, ZIF lever is on the left, and pins 20/21 get inserted first. Chip then worked. That seems reproducible with both chips.

    I am stumped.

    Edit: To save folks the trouble, VDD is pins 11/32; VSS is pins 12/31; EUSART RX input is pin 26; EUSART TX is not enabled.

    John

    *My usual procedure was to turn the chip off, disconnect power to the breadboard, and remove the chip, in that order.
     
    Last edited: Oct 24, 2014
  7. Alec_t

    AAC Fanatic!

    Sep 17, 2013
    5,791
    1,103
    Does chip temperature affect the results?
     
  8. jpanhalt

    Thread Starter AAC Fanatic!

    So far as I can tell, the temperature does not make a difference. Today was a particularly beautiful Fall day, 18°C with bright sunshine, and that occurred to me too. So, I tried to warm the chip to something above RT with my finger, and that didn't help.

    John
     
  9. Alec_t

    AAC Fanatic!

    Removing/reinserting would mean the chip seeing an instantaneous rise in supply voltage, whereas switching on/off the supply might result in a ramped (i.e. relatively slow) rise in supply voltage. Could that cause the effect?
     
  10. jpanhalt

    Thread Starter AAC Fanatic!

    Power-up seemed to be the problem, so I searched, found, and read AN607 from Microchip. That gave me some ideas. I need to confirm the exact error, but I think I know from history what it is. The section of code involved seems to be in the initialization of the USART. I moved that code around (about 3 AM this morning!), and it seemed to work more consistently. Got some sleep, and now I am testing my theory about the history thing. I am being intentionally vague, because I don't want to appear twice the fool if that theory is wrong.

    John
     
  11. ErnieM

    AAC Fanatic!

    You make it sound like a power-on problem, perhaps something like the voltage ramping up for too long. You can test for that with a simple switch in series with the power to make it SNAP on.

    That makes me wonder how you are connecting MCLR.

    For a forever fix consider a power supply supervisor.
     
  12. jpanhalt

    Thread Starter AAC Fanatic!

    UPDATE
    Didn't get much done yesterday -- had to clean gutters and rake leaves

    Got an early start today. It is now working with both chips, and there is no need to invoke MCLR or fondle the pins. I did not check each change I made to see which was the guilty party(ies).
    Changes:
    1) Set baud rate before enabling CREN. Before, I had set baud after enabling the serial port and CREN. Somehow, I thought that wouldn't make a difference, if I was toggling SPEN and CREN later. I am not sure it made a difference, but it seems more logical to do it that way.
    2) Made RC5 an output. That bit had been set as an input for applying synchronous stimulation and had been left as an input with no pull-up or pull-down. I did try adding a pull-down before making any other changes, but that by itself didn't cure the problem.
    3) Cleared the user ram (0x20 thru 0x2F) and common ram (0x70 - 0x7f) at start. I did not realize those registers could be other than zero on POR. It is an issue discussed on the Microchip forum. That may explain the effects of touching the pins and insertion into the ZIF socket.

    Here's my code for that:

    Code (ASM):

    ;Clear GPR(0x20 - 0x2F) and Common Ram
    ;Flag0, Flag1, and GIEF are the registers most likely to cause a problem
    ;on reset if not cleared. FSR0H = 0 on any reset, probably not necessary
    ;to clear it in code.
         CLRF      FSR0H               ;FSR0 addresses banked RAM              |B0
         MOVLW     0x20                ;first GPR address to clear             |B0
         MOVWF     FSR0                ;                                       |B0
    Clr_GPR
         CLRF      INDF0               ;clear the byte FSR0 points to          |B0
         INCF      FSR0,F              ;advance the pointer                    |B0
         MOVLW     0x30                ;one past 0x2F, so 0x2F is cleared too  |B0
         XORWF     FSR0,W              ;Z is set when FSR0 = 0x30              |B0
         BTFSS     STATUS,Z            ;done with GPR?                         |B0
         GOTO      Clr_GPR             ;no, clear the next byte                |B0
         MOVLW     0x70                ;first common RAM address               |B0
         MOVWF     FSR0                ;                                       |B0
    Clr_Com
         CLRF      INDF0               ;clear the byte FSR0 points to          |B0
         INCF      FSR0,F              ;advance the pointer                    |B0
         MOVLW     0x80                ;one past 0x7F, so 0x7F is cleared too  |B0
         XORWF     FSR0,W              ;Z is set when FSR0 = 0x80              |B0
         BTFSS     STATUS,Z            ;done with common RAM?                  |B0
         GOTO      Clr_Com             ;no, clear the next byte                |B0
         NOP

    Will check multiple starts/stops throughout the day to see if it is stable. Thank you all for contributing.

    John
     
  13. atferrari

    AAC Fanatic!

    Jan 6, 2004
    2,648
    762
    Regarding 1) I always followed the sequence stated in the datasheet.
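
    A minimal sketch of that ordering for asynchronous reception follows, based on the set-up list in the PIC16F1519 datasheet. The baud value is an assumption (9600 baud with Fosc = 16 MHz and BRGH = 1, neither of which is stated in this thread), and the standard p16f1519.inc definitions are assumed.

    ```asm
    ; Hypothetical EUSART receive set-up, baud rate first and CREN last.
    ; SPBRG value assumes Fosc = 16 MHz, 9600 baud: 16e6/(16*9600) - 1 = 103
            BANKSEL SPBRGL          ; SPBRGx, BAUDCON, TXSTA, RCSTA share a bank
            MOVLW   .103            ; assumed baud rate divisor
            MOVWF   SPBRGL
            CLRF    SPBRGH
            BCF     BAUDCON,BRG16   ; 8-bit baud rate generator
            BSF     TXSTA,BRGH      ; high-speed divisor (Fosc/16)
            BCF     TXSTA,SYNC      ; asynchronous mode
            BANKSEL ANSELC
            BCF     ANSELC,7        ; RX pin (RC7, pin 26) must be digital
            BANKSEL RCSTA
            BSF     RCSTA,SPEN      ; enable the serial port
            BANKSEL PIE1
            BSF     PIE1,RCIE       ; receive interrupt enable
            BSF     INTCON,PEIE     ; peripheral interrupts
            BSF     INTCON,GIE      ; global interrupts
            BANKSEL RCSTA
            BSF     RCSTA,CREN      ; enable the receiver only after all else is set
    ```

    Setting CREN last means the receiver never samples the line with a stale or default baud divisor.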

    Regarding 3) Many years ago, one of the best contributors to that forum (banned long ago), answering a question about a recurrent failure, suggested two things: explicitly setting all SFRs to the values they should have on POR, as per the datasheet (which is what actually cured my problem), and additionally clearing all RAM before running the program.

    Since then, I always do it.

    Regarding the clearing, two comments:

    a) For debugging I would sometimes write a value other than H'00', to identify more easily the registers that could have changed.

    b) It is important NOT to clear the RAM register holding the counter value used in the clearing. Been there...
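
    Both points can be illustrated with a short sketch. The fill value 0xA5 and the label are hypothetical, and the range 0x20 - 0x6F assumes the bank 0 GPR layout of an enhanced mid-range part such as the 16F1519. The pointer lives in FSR0 (an SFR), so there is no RAM counter for the loop to overwrite.

    ```asm
    ; Hypothetical debug fill: stuff bank 0 GPR (0x20 - 0x6F) with 0xA5 so that
    ; any register the program later changes stands out in a RAM dump.
            CLRF    FSR0H           ; FSR0 addresses banked RAM
            MOVLW   0x20            ; first GPR address
            MOVWF   FSR0L
    Fill_GPR
            MOVLW   0xA5            ; assumed marker pattern, anything but 0x00
            MOVWF   INDF0           ; write it where FSR0 points
            INCF    FSR0L,F         ; advance the pointer
            MOVLW   0x70            ; one past the last address (0x6F)
            XORWF   FSR0L,W         ; Z is set when the pointer reaches 0x70
            BTFSS   STATUS,Z
            GOTO    Fill_GPR        ; not there yet, fill the next byte
    ```

    Swapping the MOVLW 0xA5 / MOVWF INDF0 pair for CLRF INDF0 turns the same loop back into a plain clear.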

    Routines available if interested, John.
     
  14. jpanhalt

    Thread Starter AAC Fanatic!

    Yes, I simulated it with other than zeros in the registers. I stuffed them all with .10 and then got to watch them fall like dominoes. At 0400 local, that was all the fun I was likely to have. :)

    I don't use a counter per se for clearing. I look for the last of the register locations in the FSR, but I know what you mean about clearing registers that are stuffed.

    John
     
  15. jpanhalt

    Thread Starter AAC Fanatic!

    I have used just a resistor touch to ground or the following, which is modified from Microchip:
    upload_2014-10-26_11-28-0.png

    With a 10K pull-up resistor on the MCU, the rise time to 4 V is about 150 ms. It also acts as a debounce. By using the ICSP header in this way, I don't need to change anything on the PCB to program. I just unplug the MCLR reset circuit.

    John
     
  16. ErnieM

    AAC Fanatic!

    What 10K pullup? Where might that be?

    I ask because the MCLR connections are on the sensitive side and can lead to turn-on problems similar to the ones you are seeing.
     
  17. jpanhalt

    Thread Starter AAC Fanatic!

    Here it is in the ICD3 manual:
    upload_2014-10-26_12-59-12.png

    Interestingly, the 16F1519 datasheet doesn't show it. The datasheet, however, instructs the user to follow the specific instructions of the programmer. My experience is pretty limited in terms of chips, and the other ones I have used showed the pull-up. Some even had a small capacitor to ground, which I did not use as the ICD3 recommends against doing that specifically.

    John
     
  18. Markd77

    Senior Member

    Sep 7, 2009
    2,803
    594
    It is always best to clear the registers. I dumped most of the memory content of one chip via serial, and almost all user registers were 0xFF, but some had a bit or two cleared.
     