PIC watchdog timer

Discussion in 'Embedded Systems and Microcontrollers' started by geoffers, Feb 11, 2015.

  1. geoffers

    Thread Starter Active Member

    Hi all,
    I have a project up and running; it's been going for a long time now (two-ish years) with constant dabbling from me! Just recently, after I rewrote the program, I had a few niggles with it hanging in funny places. That is now resolved, but every so often when I check it, it's frozen (the interval has varied from a week to twice in 24 hours). This hasn't happened before and I suspect an I2C hang.
    I have a display/keypad module, EEPROM and RTCC on the I2C bus. I suspect it could be a bad connection to the keypad/display, which I'm going to have a look at.

    It's got me a bit spooked, as the thing runs 24/7 feeding calves, and I thought maybe I should be using the watchdog timer to force a reset if it happens again.

    First question, will a watchdog reset force whichever I2C thing that could be hanging on to the bus to let go?

    Are there any issues using the watchdog timer I should look out for?

    I'm using an 18F2523. Is there any way to tell if it's a watchdog reset as opposed to a power-off reset?

    Cheers Geoff
     
  2. tshuck

    Well-Known Member

    Unless you are going to use an external watchdog that will reset all devices in the system, no. The watchdog timer will reset the PIC; if another device is pulling the lines low, that device would need to be made to release them. Have you checked SDA and SCL when the device is frozen? The problem may be elsewhere as well...

    You should look at how long it takes the devices on the I2C bus to time out and discard a partially sent transmission, and have the master (PIC) wait for that long after a restart.

    Yes, check out RCON on page 42 of the datasheet for the 18F2520 (the datasheet for the 18F2523 specifically mentions that it only covers the differences between the two PICs).
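For reference, a host-testable sketch of decoding the RCON flags, assuming the 18F2520-family bit layout (RI = bit 4, TO = bit 3, PD = bit 2, POR = bit 1, BOR = bit 0, all active-low); `classify_reset` and the enum are hypothetical names, and the bit positions should be checked against the datasheet before relying on them. On the PIC you would pass it RCON as read at the very top of main(), before anything else touches it.

```c
#include <stdint.h>

/* RCON status bits (0 = event occurred), per the 18F2520-family layout. */
#define RCON_BOR (1u << 0)   /* Brown-out Reset; only meaningful if BOR is enabled */
#define RCON_POR (1u << 1)   /* Power-on Reset */
#define RCON_PD  (1u << 2)   /* Power-down: SLEEP was executed */
#define RCON_TO  (1u << 3)   /* Watchdog time-out */
#define RCON_RI  (1u << 4)   /* RESET instruction executed */

typedef enum {
    RESET_POWER_ON,
    RESET_BROWN_OUT,
    RESET_WATCHDOG,
    RESET_INSTRUCTION,
    RESET_OTHER          /* e.g. MCLR pin, or stack over/underflow (see STKPTR) */
} reset_cause_t;

reset_cause_t classify_reset(uint8_t rcon)
{
    /* POR must be checked first: BOR also reads 0 after a power-on. */
    if (!(rcon & RCON_POR)) return RESET_POWER_ON;
    if (!(rcon & RCON_BOR)) return RESET_BROWN_OUT;
    if (!(rcon & RCON_TO))  return RESET_WATCHDOG;
    if (!(rcon & RCON_RI))  return RESET_INSTRUCTION;
    return RESET_OTHER;
}
```

After acting on the result, start-up code would typically write POR, BOR and RI back to 1 so that the next reset can be told apart; TO and PD are read-only and are set again by CLRWDT.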
     
  3. ErnieM

    AAC Fanatic!

    A slave device can hang the I2C bus in several ways. First, though rarely, a slave can legally hold the clock line low when it needs extra time to process a transaction (clock stretching), so if it gets stuck the bus appears dead.

    If the master never completes reading back an ACK bit then the slave is stuck, and restarting the master will not help as the master will see the bus in use and wait for it to end.

    One workaround for the latter is to manually toggle the clock line (SCL) to get past the ACK.

    You can check if either is the problem by reading the I2C lines during the master hang up: if both lines are high then the I2C bus is free to work again.
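A minimal sketch of that check, assuming you can read the raw pin levels (RC3 = SCL, RC4 = SDA on the 18F2523); `i2c_bus_state` and its enum are hypothetical. Because both lines toggle during normal traffic, a single sample proves little, so sample several times over a few milliseconds while the master is hung.

```c
#include <stdbool.h>

typedef enum {
    I2C_BUS_FREE,       /* SCL and SDA both high: the bus is idle */
    I2C_BUS_SDA_STUCK,  /* SCL high, SDA low: a slave is holding data/ACK low */
    I2C_BUS_SCL_STUCK   /* SCL low: a slave is stretching (or stuck holding) the clock */
} i2c_bus_state_t;

/* Classify the bus from pin levels sampled while the master appears hung. */
i2c_bus_state_t i2c_bus_state(bool scl_high, bool sda_high)
{
    if (!scl_high) return I2C_BUS_SCL_STUCK;
    if (!sda_high) return I2C_BUS_SDA_STUCK;
    return I2C_BUS_FREE;
}
```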
     
  4. joeyd999

    AAC Fanatic!

    And this is why I don't use blocking code for peripheral devices!
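One way to avoid blocking is to bound every wait. The sketch below assumes the peripheral status lives in an 8-bit register you can point at; `wait_bits` is a hypothetical helper, and a poll count is only a stand-in for a proper timer-based deadline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll until all bits in `mask` are set (want_set = true) or all are clear
   (want_set = false), giving up after `max_polls` reads. On the PIC, `reg`
   could point at PIR1 (SSPIF is bit 3) or SSPCON2 (SEN, PEN etc. clear by
   themselves when the operation finishes). */
bool wait_bits(const volatile uint8_t *reg, uint8_t mask, bool want_set, uint16_t max_polls)
{
    for (uint16_t i = 0; i < max_polls; i++) {
        uint8_t bits = (uint8_t)(*reg & mask);
        if (want_set ? (bits == mask) : (bits == 0)) {
            return true;   /* peripheral reached the expected state */
        }
    }
    return false;          /* timed out: report it and recover instead of hanging */
}
```

For example, `if (!wait_bits(&PIR1, 0x08, true, 1000)) { /* flag an I2C fault, run bus recovery */ }` in place of `while (!PIR1bits.SSPIF);` (SFR names as in the Microchip device headers).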
     
  5. MrChips

    Moderator

    Diagnosing intermittent faults such as yours is extremely difficult without access to expensive transient recorders and logic analyzers.

    Watchdog timers (WDTs) can be useful for restarting mission-critical systems, but you should employ one as a last resort. A WDT will recover from failures in complex software systems caused by software bugs or by lock-up problems such as the "deadly embrace", where two sections of the code are each waiting for a resource that the other refuses to release.

    On a rare occasion, the fault may be due to a hardware failure such as spurious electromagnetic interference, ionizing radiation or other physical malfunction.

    Before implementing WDT I would look for other hardware suspects.

    1) Power supply. How stable, clean and regulated is the power supply? Does it have proper regulation and filtration?
    Are the filter capacitors new and up to specs?

    2) Do you suffer from regular electrical power brownouts? Does your microcontroller (MCU) have brownout detection circuitry?

    3) Does your circuit have proper 0.1μF capacitors across the power rails at every IC?

    4) Are all power supply voltages and currents up to specs?

    5) How noisy is your electrical environment? Do you have heavy electrical machinery that stops and starts on a regular basis?

    6) Is the RESET circuitry to the MCU properly designed?

    7) Is your entire circuitry laid out properly and professionally on a double sided PCB?
    Is there a ground plane on the PCB?
    Are power traces sufficiently wide for the amount of current they are expected to carry?
    Are your I2C connections as short as possible and separated from other interfering signals?

    8) Is the whole assembly of electronics in a grounded metal chassis?

    Just a short list of things to start thinking about that need attention.
     
  6. JohnInTX

    Moderator

    In addition to the excellent ideas posted by the others, I'd do something like this:

    You want the system to keep running, so I'd definitely use the WDT and detect when the PIC was reset due to a WDT time-out. Use a long postscaler setting to avoid stray trips due to delays etc. When the system detects a WDT reset, it should run self-tests that exercise and verify all of the peripherals and get them operational from an unknown state. This includes reconfiguring the I2C, sending the SCL clocks to recover a lost slave, etc. I usually put all of this in the normal reset/start-up code, since I don't trust external peripherals. If it can't get things running, set an alarm and try again. Hopefully, a full reconfigure and error recovery at least keeps the calves fed.
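To put numbers on a "long" setting: the PIC18F2520-family datasheet gives a nominal 4 ms WDT base period multiplied by a postscaler from 1:1 to 1:32768, i.e. roughly 4 ms to 131 s. The helper below (hypothetical names) picks the shortest setting that still leaves a chosen margin over the worst-case gap between watchdog clears. Since the WDT runs from the imprecise INTRC, treat the figures as nominal.

```c
#include <stdint.h>

#define WDT_BASE_MS        4u      /* nominal period before the postscaler */
#define WDT_MAX_POSTSCALER 32768u  /* longest setting, about 131 s */

/* Nominal time-out for a postscaler of 1, 2, 4 ... 32768. */
uint32_t wdt_timeout_ms(uint16_t postscaler)
{
    return (uint32_t)WDT_BASE_MS * postscaler;
}

/* Smallest postscaler whose nominal time-out is at least `margin` times the
   worst-case gap between WDT clears; 0 if no setting is long enough.
   (Assumes worst_gap_ms * margin does not overflow 32 bits.) */
uint16_t wdt_pick_postscaler(uint32_t worst_gap_ms, uint32_t margin)
{
    uint32_t needed = worst_gap_ms * margin;
    for (uint32_t ps = 1; ps <= WDT_MAX_POSTSCALER; ps <<= 1) {
        if (wdt_timeout_ms((uint16_t)ps) >= needed) {
            return (uint16_t)ps;
        }
    }
    return 0;
}
```

With a worst-case loop of 100 ms and a 2x margin it picks 1:64 (about 256 ms nominal); that value then goes into the WDTPS configuration bits.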

    Something I've done to find the cause of stray WDT time-outs that happen in the field is to log entry into and exit from the various routines. One way is to load a register with a non-zero number before entering a function and clear it upon return. If the WDT resets the PIC, inspecting the number will identify where it was when the time-out occurred. If it doesn't need to survive a power-off, it can live in RAM; otherwise use the internal EEPROM. If you use RAM and C, tell the compiler not to initialize that variable to 0 on a reset. For readout, you could put the code on the display, assuming it came back up. A crude way would be just to plug in the PICkit and read the EEPROM image. Lots of ways to do it.
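A sketch of the breadcrumb idea in C, assuming one byte of RAM that survives a WDT reset; `g_checkpoint`, the `CP_*` IDs and the macros are hypothetical. On the PIC the variable must be kept out of the C start-up clearing (XC8 provides a persistent qualifier for this; check the compiler manual), and after a genuine power-on its contents are garbage, so only trust it after a watchdog reset.

```c
#include <stdint.h>

/* Hypothetical IDs for the routines worth tracking; 0 means "none". */
enum { CP_NONE = 0, CP_READ_RTCC = 1, CP_WRITE_EEPROM = 2, CP_UPDATE_DISPLAY = 3 };

/* Must NOT be zeroed by start-up code on a WDT reset. On a PC this is a
   plain global; on the PIC it needs the compiler's persistent mechanism. */
volatile uint8_t g_checkpoint;

#define CHECKPOINT_ENTER(id) (g_checkpoint = (uint8_t)(id))
#define CHECKPOINT_EXIT()    (g_checkpoint = CP_NONE)

/* Call at start-up after a watchdog reset: returns where the code was when
   the WDT fired (CP_NONE if outside every tracked routine), then re-arms. */
uint8_t checkpoint_take(void)
{
    uint8_t where = g_checkpoint;
    g_checkpoint = CP_NONE;
    return where;
}
```

A tracked routine is wrapped as `CHECKPOINT_ENTER(CP_READ_RTCC); read_rtcc(); CHECKPOINT_EXIT();` (`read_rtcc` being a stand-in), and after a WDT reset the value from `checkpoint_take()` can go to the display or to internal EEPROM.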

    In the code itself, consider clearing the WDT only once, at the top of your main loop. I always take the time to inspect the system setup as well, making sure that the interrupt configuration, I/O directions, MSSP setup etc. are as they should be and haven't been clobbered by a stray write through an FSR (it's happened to me). If everything is not up to snuff, it's time to flag it and reinitialize.
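The register check can be table-driven, as in this sketch (`reg_check_t` and `config_mismatches` are hypothetical). Each entry lists an SFR, the bits firmware owns, and their expected value, with flag and status bits masked out so they don't cause false alarms; the trailing comment outlines where the check could sit relative to the single WDT clear.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per SFR worth checking: on the PIC `reg` would point at e.g.
   TRISC, SSPCON1 or INTCON; `mask` selects the bits firmware set up and
   expects never to change; `expected` holds their values, already masked. */
typedef struct {
    const volatile uint8_t *reg;
    uint8_t mask;
    uint8_t expected;
} reg_check_t;

/* Count the registers whose owned bits no longer match the set-up values. */
size_t config_mismatches(const reg_check_t *checks, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if ((uint8_t)(*checks[i].reg & checks[i].mask) != checks[i].expected) {
            bad++;
        }
    }
    return bad;
}

/* Main-loop outline (hypothetical helper names):
 *
 *   for (;;) {
 *       clear_watchdog();                  // the one and only WDT clear
 *       if (config_mismatches(checks, n_checks) != 0) {
 *           log_fault();                   // flag it...
 *           reinit_peripherals();          // ...and reinitialize
 *       }
 *       run_tasks();
 *   }
 */
```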

    Reconsider any code that does single-bit outputs on PORTC (shared with I2C). Even doing r-m-w on LATC has caused I2C lockups for us in the past; we've tested it, we know. It's not supposed to, but it does, at least on the 18F parts we use. I don't know about your particular part, but I still won't r-m-w on a LATx shared with I2C. Also, never, ever r-m-w on TRISC when using I2C. uCHIP is beginning to come clean, and later datasheets finally recommend against it. If you have the misfortune to r-m-w on TRISC while the MSSP module is sending an ACK, it will lock SDA low until you reset the MSSP. At that point, you'll need to be able to resync the various slaves on the bus.
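A minimal sketch of shadowing the shared latch, assuming all PORTC outputs go through one helper; `latc_shadow`, `latc_write_bits` and `LATC_REG` (a stand-in for the real LATC so the sketch runs on a PC) are hypothetical. Firmware never reads LATC or PORTC back: it edits the shadow and writes the whole byte in one store, and the mask keeps the SCL/SDA bits (RC3/RC4) out of reach.

```c
#include <stdint.h>

#define I2C_PIN_MASK 0x18u   /* RC3 = SCL, RC4 = SDA */

volatile uint8_t LATC_REG;   /* stand-in for the real LATC */

static uint8_t latc_shadow;  /* the only copy of LATC that firmware ever reads */

/* Set the non-I2C output bits selected by `mask` to `value`, then write the
   whole byte in one store. No read of PORTC or LATC is involved, and the
   SCL/SDA latch bits are always rewritten with the same (initially 0) value. */
void latc_write_bits(uint8_t mask, uint8_t value)
{
    mask &= (uint8_t)~I2C_PIN_MASK;
    latc_shadow = (uint8_t)((latc_shadow & ~mask) | (value & mask));
    LATC_REG = latc_shadow;
}
```

For example, `latc_write_bits(0x01, 0x01)` sets RC0 and `latc_write_bits(0x01, 0x00)` clears it again.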

    Pull the errata for your silicon revision and make sure you've accounted for the issues. Keep in mind that it sometimes takes a LONG time for a known issue to make it to the errata sheet. Filing a support ticket can fill in the gaps.

    I don't know how your code is written, but I agree with joeyd999: your code should never wait indefinitely on a peripheral. Using the WDT as a backup is OK, but it would be nice to detect and handle issues in the normal program flow.

    Good luck!
     
  7. geoffers

    Thread Starter Active Member

    Thanks Guys,
    So much good information in a few posts! I've learnt a fair bit already. Until now I'd disregarded the WDT and thought that if everything works OK I could leave it out.
    If the I2C gets stuck, clocking SCL can get it back?
    John, do you always use a shadow register for LATC on an 18F series part if you're using I2C? I've already learnt about r-m-w the hard way by BSF-ing PORT instead of LAT!
    I like the idea of using a register to debug too.
    Thanks again, you don't get this stuff on a datasheet!
    Cheers Geoff
     
  8. ErnieM

    AAC Fanatic!

    You can typically unstick a slave by clocking SCL.

    Shadowing the TRIS bits behind the I2C pins will not help, since they are under secret control of the I2C module. Best to change them only before the I2C is in use, or perhaps in between uses.
     
  9. JohnInTX

    Moderator

    Not so. I have (unfortunately) extensive, expensive experience in this area. Shadowing TRIS while using an I2C slave fixes the r-m-w on TRIS problem, at least on the midrange and some early 18F parts that we ported our system to. At the time, the problem and solution were documented in internal uCHIP documents only.
    True enough but you can't control when a slave SSP turns the port around for ACK. If your program needs to change TRIS on the port shared by I2C, you have to shadow it.

    I absolutely do - and I can't tell you why it should be necessary on 18F. All I can tell you is that after the pain-fest of mixing IO and directions on midrange, we had extensive testing procedures developed for I2C. During the port to 18F, I got everything working and ran the tests. Un-shadowed PORTC (doing r-m-w on it directly) failed. Shadow ON - worked. Shadow OFF - fail. That was enough for me. Now, with I2C, any new designs either use the other PORTC pins as inputs only or they get shadowed. And there is no freakin' way I'd consent to changing TRISC for I2C in any new designs.

    To be fair, we used the bus heavily, with over 50 slaves and 3 masters hammering away at all the slaves at 400 kHz at >90% duty. The basic test protocol called for 50E6 messages, in packets of 1 to 65 characters, with no errors. Most applications don't use the bus that hard. I'm kind of a belt-and-suspenders guy and don't like solving the same problem twice, so that's what I do with PORTC/I2C. YMMV
     
    Last edited: Feb 12, 2015
  10. ErnieM

    AAC Fanatic!

    As it happens, the only TRIS twiddling I've been doing is confined to a single port used just for straight digital I/O, no I2C or any other fancy functions. So I've not had any trouble with the I2C/TRIS issue.
    My thought was that on the master side a macro or function could set/reset a semaphore for I2C at the START and STOP commands. That could be used to tell the rest of the system to BACK OFF while a background I2C transaction is in process and leave the dang TRIS bits alone.
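A sketch of that semaphore combined with a TRISC shadow, assuming the master's START and STOP routines are the only places that change the flag; all names are hypothetical, and `TRISC_REG` stands in for the real TRISC so the sketch runs on a PC. If the I2C driver runs from an interrupt, the check and the write would also need interrupts disabled around them to close the race.

```c
#include <stdbool.h>
#include <stdint.h>

#define I2C_PIN_MASK 0x18u            /* RC3 = SCL, RC4 = SDA */

volatile uint8_t TRISC_REG = 0xFF;    /* stand-in for the real TRISC */

static volatile bool i2c_busy;        /* true between START and STOP */
static uint8_t trisc_shadow = 0xFF;   /* RC3/RC4 stay 1 (inputs) for the MSSP */

/* Called by the driver just before it issues START (SEN) ... */
void i2c_mark_start(void) { i2c_busy = true; }
/* ... and once its STOP (PEN) has completed. */
void i2c_mark_stop(void)  { i2c_busy = false; }

/* Change direction bits on the shared port only between transactions.
   Returns false without touching anything if the bus is busy, so the
   caller can retry on its next pass round the main loop. */
bool trisc_try_write(uint8_t mask, uint8_t value)
{
    if (i2c_busy) {
        return false;                            /* BACK OFF */
    }
    mask &= (uint8_t)~I2C_PIN_MASK;              /* never touch the I2C pins */
    trisc_shadow = (uint8_t)((trisc_shadow & ~mask) | (value & mask));
    TRISC_REG = trisc_shadow;                    /* whole-byte write; TRISC is never read */
    return true;
}
```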
     
  11. JohnInTX

    Moderator

    Good thought. Monitoring the S/P status bits tells you when the bus is busy. One thing I've never been able to resolve to my satisfaction is 'OK, bus is busy, now what?'. Suspend the IO until the bus isn't busy maybe, but how, and what other problems will that cause? Not a lot of spare resources on a PIC. Definitely agree with you on 'leaving the dang TRIS alone'. Learned that one the hard way.

    Someday maybe I'll do a blog post on some of our experiences with this. In one case, a simple late addition of a software PWM that toggled a bit on TRISC (to dim a shared LED) tumbled into a train wreck of issues that rippled way beyond I2C. Ugly. Distilled to its essence, the whole thing reduces to what ErnieM said: Don't.do.r-m-w.on.the.shared.I2C.port. (or any other midrange port or TRIS, for that matter). EDIT: and, on midrange, shadow the ports from the get-go.

    Cheers!
     
    Last edited: Feb 14, 2015
  12. geoffers

    Thread Starter Active Member

    Thanks again for all this. I've been fiddling with my code over the last few days, studying datasheets etc. Ernie, to clock SCL I must first disable my I2C module, change the TRISC bits that the I2C was using (in my case RC3 and RC4), then toggle SCL high/low. Have I got that right? How many clocks do you send to try and recover things?

    I think I'm going to add a bit of a handling routine to tell which slave has gone AWOL too; my application can manage the critical stuff without a display but not without its EEPROM and RTCC.
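A sketch of such a routine, assuming the driver can probe a slave by sending its address and checking for an ACK (with a timeout). The probe is passed in as a function pointer so the logic can be tested on a PC; `i2c_check_slaves`, the slave table type and the `fake_probe` stand-in are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*i2c_probe_fn)(uint8_t addr7);  /* true if the slave ACKed its address */

typedef struct {
    const char *name;
    uint8_t     addr7;     /* 7-bit address, from each part's datasheet */
    bool        critical;  /* true if the system cannot run safely without it */
} i2c_slave_t;

/* Probe each slave (up to 32); bit i of *missing is set if slave i did not
   answer. Returns true if every critical slave answered. */
bool i2c_check_slaves(const i2c_slave_t *slaves, size_t n,
                      i2c_probe_fn probe, uint32_t *missing)
{
    bool critical_ok = true;
    *missing = 0;
    for (size_t i = 0; i < n && i < 32; i++) {
        if (!probe(slaves[i].addr7)) {
            *missing |= (uint32_t)1u << i;
            if (slaves[i].critical) {
                critical_ok = false;
            }
        }
    }
    return critical_ok;
}

/* PC stand-in for the real probe: only addresses listed in fake_bus[] ACK. */
static uint8_t fake_bus[8];
static size_t  fake_bus_len;

static bool fake_probe(uint8_t addr7)
{
    for (size_t i = 0; i < fake_bus_len; i++) {
        if (fake_bus[i] == addr7) return true;
    }
    return false;
}
```

A table such as `{ {"eeprom", 0x50, true}, {"rtcc", 0x6F, true}, {"display", 0x27, false} }` (addresses illustrative; 0x50 is typical for a 24-series EEPROM with its address pins grounded) then tells the main loop whether to carry on without the display or to stop and raise an alarm.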

    Thanks to John, LATC is now shadowed :).

    Cheers Geoff
     
  13. geoffers

    Thread Starter Active Member

    Just found this out, I think, from the I2C spec sheet:

    3.1.16 Bus clear

    In the unlikely event where the clock (SCL) is stuck LOW, the preferential procedure is to reset the bus using the HW reset signal if your I2C devices have HW reset inputs. If the I2C devices do not have HW reset inputs, cycle power to the devices to activate the mandatory internal Power-On Reset (POR) circuit. If the data line (SDA) is stuck LOW, the master should send nine clock pulses. The device that held the bus LOW should release it sometime within those nine clocks. If not, then use the HW reset or cycle power to clear the bus.
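A host-testable sketch of that bus-clear procedure, assuming the MSSP has been switched off first (SSPEN = 0) and LATC bits 3 and 4 are 0, so that "release" means setting the TRIS bit (the pull-up takes the line high) and "low" means clearing it. The pin callbacks, `i2c_bus_clear` and the simulated slave are hypothetical. The final STOP is a common addition rather than part of the quoted spec text, and a stuck SCL is not handled, since the spec calls for a hardware reset or power cycle there.

```c
#include <stdbool.h>

/* Open-drain line control: high = release (TRIS bit set), low = drive 0. */
typedef struct {
    void (*set_scl)(bool high);
    void (*set_sda)(bool high);
    bool (*read_sda)(void);
    void (*delay)(void);          /* about half an SCL period, e.g. 5 us at 100 kHz */
} i2c_lines_t;

/* Clock SCL up to nine times until the slave releases SDA, then send a STOP.
   Returns false if SDA is still low: HW reset or power cycle needed. */
bool i2c_bus_clear(const i2c_lines_t *io)
{
    io->set_sda(true);            /* master lets go of SDA */
    io->set_scl(true);
    io->delay();
    for (int i = 0; i < 9 && !io->read_sda(); i++) {
        io->set_scl(false); io->delay();   /* slave moves on to its next bit */
        io->set_scl(true);  io->delay();
    }
    if (!io->read_sda()) {
        return false;
    }
    /* STOP condition: SDA rises while SCL is high. */
    io->set_scl(false); io->delay();
    io->set_sda(false); io->delay();
    io->set_scl(true);  io->delay();
    io->set_sda(true);  io->delay();
    return true;
}

/* ---- PC simulation of a slave holding SDA low (testing only) ---- */
static bool sim_scl = true, sim_master_sda = true;
static int  sim_slave_low_edges;  /* slave holds SDA low for this many more SCL falling edges */
static int  sim_clocks;           /* SCL falling edges generated */

static void sim_set_scl(bool high)
{
    if (sim_scl && !high) {
        sim_clocks++;
        if (sim_slave_low_edges > 0) sim_slave_low_edges--;
    }
    sim_scl = high;
}
static void sim_set_sda(bool high) { sim_master_sda = high; }
static bool sim_read_sda(void)     { return sim_master_sda && sim_slave_low_edges == 0; }
static void sim_delay(void)        { }
```

On the PIC, the four callbacks would set or clear TRISC bits 3 and 4, read PORTC bit 4 and busy-wait; once the function returns true, the MSSP can be re-enabled and reconfigured.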

    What are the chances? I coded in ten; best make it nine.
    Cheers Geoff
     
  14. ErnieM

    AAC Fanatic!

    9 or 900 makes no difference as long as you hold data at some value and don't send out any valid slave addresses.

    Anything addressed and locked will release after 9 and ignore the next 891 clocks.
     