Single cycle accuracy 16bit delay for PIC

Discussion in 'Embedded Systems and Microcontrollers' started by Markd77, May 24, 2013.

  1. Markd77

    Thread Starter Senior Member

    Sep 7, 2009
    2,803
    594
    Thought I'd share this here in case it is useful to anyone.
    I'm sure it's been done already, but I couldn't find one. I'm pretty sure it works for delays of 28 to 65535 cycles, passed in periodH and periodL (both are left unchanged by the routine). It should work on all baseline and midrange PICs.
    The call and retlw instruction times are included in the delay. You can increase the offset (up to 255) if you want to compensate for the time taken by your main loop.
    If anyone spots any issues, let me know.
    Code ( (Unknown Language)):
    delay2                      ;d2 = High byte d1 = Low byte
    offset equ d'28'

        movf periodH, W         ;copy
        movwf d2
        incf d2, F              ;so that decfsz can be used later
        movf periodL, W
        movwf d1

        movlw offset            ;subtract the overhead
        subwf d1, F
        btfss STATUS, C
        decf d2, F

        comf d1, W              ;delay for the number of cycles of the
        andlw b'00000111'       ;lower 3 bits
        addwf PCL, F
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

        movlw b'11111000'       ;lower 3 bits of delay taken care of already
        andwf d1, F

        movlw d'8'              ;preload W (unchanged in loop)
    delay2loop                  ;this loop always takes 8 cycles
        subwf d1, F             ;so subtract 8 from low byte
        btfsc STATUS, C         ;check overflow of low byte
        goto nocarry            ;make loop up to 8 cycles
        decfsz d2, F            ;decrease high byte
        goto delay2waste2cycles
        goto enddelay2          ;job done
    nocarry
        goto $+1
    delay2waste2cycles
        goto delay2loop

    enddelay2
        retlw 0x00
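
For anyone who wants to check the cycle accounting without a simulator, here is a minimal Python sketch of the routine above. It is a hand-built model, not simulator output: it assumes the usual baseline/mid-range timing (one cycle per instruction, two cycles for call, goto, retlw, a write to PCL, and a skip that is taken), and the name `delay2_cycles` is made up for illustration.

```python
def delay2_cycles(period):
    """Model the cycles from `call delay2` through `retlw` for a 16-bit period."""
    OFFSET = 28
    cycles = 2                            # call
    d2 = ((period >> 8) + 1) & 0xFF       # movf, movwf, incf
    d1 = period & 0xFF                    # movf, movwf
    cycles += 5
    carry = d1 >= OFFSET                  # subwf leaves C set when there is no borrow
    d1 = (d1 - OFFSET) & 0xFF
    if not carry:
        d2 = (d2 - 1) & 0xFF              # decf d2,F on borrow
    cycles += 4                           # movlw, subwf, btfss + decf (2 either way)
    w = (~d1) & 0x07                      # comf, andlw
    cycles += 2
    cycles += 2                           # addwf PCL,F changes the PC: 2 cycles
    cycles += 8 - w                       # nops left to execute in the 8-entry table
    d1 &= 0xF8                            # movlw, andwf
    cycles += 2
    cycles += 1                           # movlw d'8'
    while True:
        carry = d1 >= 8
        d1 = (d1 - 8) & 0xFF              # subwf d1,F
        if carry:
            cycles += 8                   # subwf, btfsc, goto, goto $+1, goto
            continue
        d2 = (d2 - 1) & 0xFF              # decfsz d2,F
        if d2 != 0:
            cycles += 8                   # subwf, btfsc (skip), decfsz, goto, goto
            continue
        cycles += 7                       # subwf, btfsc (skip), decfsz (skip), goto
        break
    return cycles + 2                     # retlw


if __name__ == "__main__":
    for period in (28, 100, 10000, 65535):
        print(period, delay2_cycles(period))
```

Under those timing assumptions the model returns exactly `period` for every value it is given in the 28 to 65535 range, which is consistent with the claim in the post.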
     
  2. MMcLaren

    Well-Known Member

    Feb 14, 2010
    759
    116
    Hi Mark,

    Here's another solution for mid-range devices that's relatively simple and compact:

    Code ( (Unknown Language)):
    ;==================================================================
    ;  vDelay(16bitvariable), 17..65535 cycles          Mike, K8LH    =
    ;==================================================================

    DelayHi equ     0x40

    vDelay  macro   U16Var          ; 16 bit variable
            movf    U16Var+1,W      ; the 'hi' byte
            movwf   DelayHi         ;
            movf    U16Var+0,W      ; the 'lo' byte
            call    vdelay          ;
            endm                    ;

    ;*****************************************************************
    ;  code for simulation testing
    ;
    testit
            vDelay  (delvar)        ; delvar = 10000 (lil endian)
            nop                     ; <- simulator break point here

    ;*****************************************************************
    ;  the 16-bit timing subroutine                                  *
    ;*****************************************************************

    vdelay  addlw   -17             ; subtract subsystem 'overhead'
            skpnc                   ;
            incf    DelayHi,F       ;
    vloop
            addlw   -5              ; subtract 5 cycle loop time
            skpc                    ; borrow? no, skip, else
            decfsz  DelayHi,F       ; done? yes, skip, else
            goto    vloop           ; loop again
            xorlw   -1              ;
            addwf   PCL,F           ; take care of delay%5 cycles
            nop                     ;
            nop                     ;
            nop                     ;
            nop                     ;
            return                  ;

    If you can use a constant for the delay parameter instead of a 16-bit variable, the operands can be worked out at assembly time, and a cycle-accurate fixed-delay subsystem becomes extremely versatile:

    Code ( (Unknown Language)):
    ;******************************************************************
    ;  K8LH DelayCy() subsystem macro generates four instructions     *
    ;******************************************************************
            radix   dec
    clock   equ     4               ; 4, 8, 12, 16, 20 (MHz), etc.
    usecs   equ     clock/4         ; cycles/microsecond multiplier
    msecs   equ     clock/4*1000    ; cycles/millisecond multiplier

    DelayCy macro   delay           ; 11..327690 cycle range
            movlw   high((delay-11)/5)+1
            movwf   delayhi
            movlw   low ((delay-11)/5)
            call    uDelay-((delay-11)%5)
            endm
    ;******************************************************************
    ;  example code for simulation testing                            *
    ;******************************************************************
            org     0x000
    SimTest
            DelayCy(200*usecs)      ; <- put simulator PC here
            goto    $               ; <- put simulator break point here
    ;******************************************************************
    ;  K8LH DelayCy() 16-bit uDelay subroutine                        *
    ;******************************************************************
            nop                     ; entry for (delay-11)%5 == 4     |B0
            nop                     ; entry for (delay-11)%5 == 3     |B0
            nop                     ; entry for (delay-11)%5 == 2     |B0
            nop                     ; entry for (delay-11)%5 == 1     |B0
    uDelay  addlw   -1              ; subtract 5 cycle loop time      |B0
            skpc                    ; borrow? no, skip, else          |B0
            decfsz  delayhi,F       ; done?  yes, skip, else          |B0
            goto    uDelay          ; do another loop                 |B0
            return                  ;                                 |B0
    ;******************************************************************

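
The assembly-time arithmetic in the macro can be followed more easily with concrete numbers. The Python sketch below mirrors the three operand expressions and then counts cycles through a model of the uDelay loop. The function names are hypothetical, the timing model is the standard one (two cycles for call, goto, return and a taken skip), and it assumes the assembler keeps only the low 8 bits when `high(...)+1` reaches 256.

```python
def delaycy_operands(delay):
    """Return (delayhi, w, entry_nops) as the macro expressions would compute them."""
    if not 11 <= delay <= 327690:
        raise ValueError("delay outside the macro's 11..327690 cycle range")
    n, rem = divmod(delay - 11, 5)
    delayhi = ((n >> 8) + 1) & 0xFF   # high(n)+1, assumed truncated to 8 bits
    w = n & 0xFF                      # low(n)
    return delayhi, w, rem            # rem selects the entry point uDelay-rem


def delaycy_cycles(delay):
    """Count cycles for the macro plus the uDelay subroutine."""
    delayhi, w, nops = delaycy_operands(delay)
    cycles = 3 + 2 + nops             # movlw, movwf, movlw, call, entry nops
    while True:
        carry = w != 0                # addlw -1 sets C unless W was 0
        w = (w - 1) & 0xFF
        if carry:
            cycles += 5               # addlw, skpc (skips), goto
            continue
        delayhi = (delayhi - 1) & 0xFF
        if delayhi != 0:
            cycles += 5               # addlw, skpc, decfsz, goto
            continue
        cycles += 4                   # addlw, skpc, decfsz (skips)
        break
    return cycles + 2                 # return


if __name__ == "__main__":
    print(delaycy_operands(200))      # operands for DelayCy(200*usecs) at 4 MHz
    print(delaycy_cycles(200))
```

For `DelayCy(200)` the quotient is 37 with remainder 4, so the macro loads delayhi = 1 and W = 37 and calls four words before uDelay. The model counts 11 cycles of fixed overhead, 5 cycles per loop pass, and one cycle per entry nop.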
    Here's a code example where the delay is specified as a time (1 msec here) minus the loop overhead in cycles, which produces a precise tone with any clock:

    Code ( (Unknown Language)):
    ;
    ;  key press beep
    ;
    ;  DelayCy(1*msecs) produces     DelayCy(1*msecs-6) produces
    ;  497.018 Hz --  4 MHz clock    500.000 Hz tone -- any clock
    ;  498.504 Hz --  8 MHz clock
    ;  499.002 Hz -- 12 MHz clock
    ;  499.251 Hz -- 16 MHz clock
    ;  499.401 Hz -- 20 MHz clock
    ;
            bsf     Beep,5          ; do 32 msec "new press" beep     |B0
    DoBeep  movf    PORTA,W         ; read port A                     |B0
            xorlw   1<<Spkr         ; toggle speaker bit              |B0
            movwf   PORTA           ; toggle speaker pin              |B0
            DelayCy(1*msecs-6)      ; 1 msec minus 6 cycles           |B0
            decfsz  Beep,F          ; done?  yes, skip, else          |B0
            goto    DoBeep          ; loop (toggle Spkr pin again)    |B0

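
The figures in the comment table follow from the loop structure: the pin toggles once per pass, so one tone period is two passes, and each pass costs the DelayCy() operand plus the 6 cycles of loop instructions (movf, xorlw, movwf, decfsz, and a 2-cycle goto). A short Python sketch of that arithmetic, with a made-up function name and the usual assumption of four oscillator clocks per instruction cycle:

```python
def tone_hz(clock_mhz, delay_cycles, loop_overhead=6):
    """Tone frequency when the pin toggles once per (delay + overhead) cycles."""
    cycles_per_toggle = delay_cycles + loop_overhead
    instruction_rate_hz = clock_mhz * 1e6 / 4      # Fosc/4 instruction clock
    return instruction_rate_hz / (2 * cycles_per_toggle)


if __name__ == "__main__":
    for clock in (4, 8, 12, 16, 20):
        msecs = clock // 4 * 1000                  # same as the 'msecs' equate
        uncorrected = tone_hz(clock, 1 * msecs)
        corrected = tone_hz(clock, 1 * msecs - 6)
        print(f"{clock:2d} MHz  {uncorrected:8.3f} Hz  {corrected:8.3f} Hz")
```

Subtracting the 6 overhead cycles from the operand makes each pass exactly 1 msec long, which is why the corrected column is 500 Hz regardless of the clock.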
    A cycle-accurate fixed-delay subsystem makes it possible to use routines like the bit-banged RS232 routines below with almost any uC clock without change. You simply set the 'clock' equate in the delay subsystem, and each DelayCy() operand subtracts the cycle count of the loop it sits in:

    Code ( (Unknown Language)):
    Put232                          ; 19200 baud Tx subroutine
            movwf   txbyte          ; save Tx data byte               |B0
            movlw   10              ; 1 start + 8 data + 1 stop bit   |B0
            movwf   BitCtr          ; setup bit counter               |B0
            clrc                    ; C = 0 (start bit)               |B0
            goto    SendBit         ; send start bit                  |B0
    NextBit
            DelayCy(52*usecs-10)    ; 52 usecs -10 cycle loop time    |B0
            setc                    ; always shift in a 'stop' bit    |B0
            rrf     txbyte,F        ; put data bit in Carry           |B0
    SendBit
            movf    GPIO,W          ; read port                       |B0
            iorlw   1<<TxPin        ; set TxPin bit to 1              |B0
            skpc                    ; if data bit = 1 skip, else      |B0
            xorlw   1<<TxPin        ; set TxPin bit to 0              |B0
            movwf   GPIO            ; precise 52-usec bit timing      |B0
            decfsz  BitCtr,F        ; done? yes, skip, else           |B0
            goto    NextBit         ; send next bit                   |B0
            return                  ;                                 |B0
    ;
    Get232                          ; 19200 baud Rx subroutine
            btfsc   GPIO,RxPin      ; start bit (0)? yes, skip, else  |B0
            goto    Get232          ; loop (wait for start bit)       |B0
            DelayCy(52*usecs/2)     ; delay 1/2 bit time              |B0
            movlw   8               ;                                 |B0
            movwf   BitCtr          ; BitCtr = 8                      |B0
    RxBit
            DelayCy(52*usecs-7)     ; precise 52-usec bit timing      |B0
            clrc                    ; assume '0'                      |B0
            btfsc   GPIO,RxPin      ; a '0' bit? yes, skip, else      |B0
            setc                    ; set to '1'                      |B0
            rrf     rxbyte,F        ; b0 first, b7 last               |B0
            decfsz  BitCtr,F        ; done? yes, skip, else           |B0
            goto    RxBit           ; get another bit                 |B0
            movf    rxbyte,W        ; put "rxbyte" in WREG            |B0
            return                  ;                                 |B0

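
As a rough check on the bit timing, the sketch below works out the DelayCy() operand for each loop at a given clock and the error introduced by rounding the 19200 baud bit time (about 52.083 usecs) down to 52 usecs. The helper names are invented for this example; the 11-cycle minimum is the lower limit stated in the DelayCy macro comment.

```python
def bit_delay_operand(clock_mhz, loop_cycles, bit_usecs=52):
    """Cycle count passed to DelayCy() for one bit time minus the loop overhead."""
    usecs = clock_mhz // 4                     # cycles per usec, as in the equate
    operand = bit_usecs * usecs - loop_cycles
    if operand < 11:                           # DelayCy's stated minimum delay
        raise ValueError("bit time too short for DelayCy at this clock")
    return operand


def baud_error_percent(bit_usecs=52, baud=19200):
    """Relative error of the rounded bit time against the ideal 1/baud."""
    ideal_usecs = 1e6 / baud
    return (bit_usecs - ideal_usecs) / ideal_usecs * 100.0


if __name__ == "__main__":
    for clock in (4, 8, 20):
        tx = bit_delay_operand(clock, 10)      # Put232 loop is 10 cycles
        rx = bit_delay_operand(clock, 7)       # RxBit loop is 7 cycles
        print(clock, tx, rx)
    print(round(baud_error_percent(), 3))
```

The rounding error is about 0.16 percent per bit, so over a 10-bit frame the accumulated drift stays under 2 percent of one bit time.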
     
    Last edited: May 28, 2013
  3. Markd77

    Thread Starter Senior Member

    Sep 7, 2009
    2,803
    594
    Very nice, I wouldn't have thought of doing it that way. For this project I'm using a baseline PIC, which unfortunately is missing the very useful addlw instruction, but for midrange PICs your code is a vast improvement.
     
  4. MMcLaren

    Well-Known Member

    Feb 14, 2010
    759
    116
    Here's a variation of the cycle accurate fixed delay subsystem that works for both baseline (12-bit core) and mid-range (14-bit core) devices. It's slightly less efficient than the routine for mid-range devices due to the lack of an "addlw" instruction, as you mentioned.

    BTW, several delay subsystems, including some of the ones I've shown here, can be found in the PICLIST source code library.

    Cheerful regards, Mike

    Code ( (Unknown Language)):
    ;==================================================================
    ;  K8LH DelayCy() subsystem macro generates five instructions     =
    ;==================================================================
            radix   dec
    clock   equ     4               ; 4 MHz
    usecs   equ     clock/4         ; cycles/usec operand multiplier
    msecs   equ     clock/4*1000    ; cycles/msec operand multiplier

    DelayCy macro   delay       ; 12..524299 cycle range
        movlw   high((delay-12)/8)^255
        movwf   delayhi
        movlw   low ((delay-12)/8)^255
        movwf   delaylo
        call    uDelay-((delay-12)%8)
        endm

    ;******************************************************************
    ;  K8LH DelayCy() subsystem 16-bit "uDelay" timing subroutine     *
    ;******************************************************************
            nop                     ; entry for (delay-12)%8 == 7
            nop                     ; entry for (delay-12)%8 == 6
            nop                     ; entry for (delay-12)%8 == 5
            nop                     ; entry for (delay-12)%8 == 4
            nop                     ; entry for (delay-12)%8 == 3
            nop                     ; entry for (delay-12)%8 == 2
            nop                     ; entry for (delay-12)%8 == 1
    uDelay  incf    delaylo,F       ; subtract one 8-cycle loop
            skpnz                   ; borrow? no, skip, else
            incfsz  delayhi,F       ; done?  yes, skip, else
            goto    uDelay-3        ; do another 8-cycle loop
            retlw   0               ;

    ;******************************************************************

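
Because the baseline core has no addlw, this version counts up with incf/incfsz on the one's complement of the loop count. The Python sketch below is one way to picture it: it mirrors the macro expressions and models the loop, which re-enters three nops early to pad each pass to 8 cycles. Function names are hypothetical and the timing model is the standard one (two cycles for call, goto, retlw and a taken skip).

```python
def baseline_operands(delay):
    """Return (delayhi, delaylo, entry_nops) for the baseline DelayCy macro."""
    if not 12 <= delay <= 524299:
        raise ValueError("delay outside the range this model covers")
    n, rem = divmod(delay - 12, 8)
    delayhi = (n >> 8) ^ 0xFF          # high(n)^255
    delaylo = (n & 0xFF) ^ 0xFF        # low(n)^255
    return delayhi, delaylo, rem


def baseline_cycles(delay):
    """Count cycles for the five macro instructions plus uDelay."""
    delayhi, delaylo, nops = baseline_operands(delay)
    cycles = 4 + 2 + nops              # movlw, movwf, movlw, movwf, call, nops
    while True:
        delaylo = (delaylo + 1) & 0xFF # incf delaylo,F
        if delaylo != 0:
            cycles += 8                # incf, skpnz (skips), goto, 3 nops
            continue
        delayhi = (delayhi + 1) & 0xFF # incfsz delayhi,F
        if delayhi != 0:
            cycles += 8                # incf, skpnz, incfsz, goto, 3 nops
            continue
        cycles += 4                    # incf, skpnz, incfsz (skips)
        break
    return cycles + 2                  # retlw


if __name__ == "__main__":
    print(baseline_operands(1000), baseline_cycles(1000))
```

In this model the fixed overhead is 12 cycles, each loop pass is 8 cycles, and the entry point adds 0 to 7 cycles for the remainder.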
     
    Last edited: May 25, 2013
  5. Eric007

    Senior Member

    Aug 5, 2011
    1,041
    33
    Mike! Your coding is ALWAYS very neat! I like that.
     
  6. MMcLaren

    Well-Known Member

    Feb 14, 2010
    759
    116
    Hey Mark,

    I don't mean to detract from your accomplishment. You've done a really nice job and I'm excited to see someone getting into this level of assembly language programming. For me, optimizing code is more fun than a crossword puzzle (grin), so hopefully you won't mind if I make a few comments or suggestions.

    After you copy the 16-bit delay value into working variables d1 and d2 in your first section of code, you increment the high byte so that you can use the decfsz instruction on it in your delay loop. You also decrement the high byte if there's a borrow when you subtract the subsystem overhead. You can accomplish these same two operations, while shaving off one word of memory and one cycle of overhead, in a couple of different ways:

    Code ( (Unknown Language)):
    delay2
            incf    periodH,W       ; bump hi byte
            movwf   d2              ; make a working copy
            movf    periodL,W       ;
            movwf   d1              ;
            movlw   offset          ;
            subwf   d1,F            ; subtract overhead
            skpc                    ; borrow? no, skip, else
            decf    d2,F            ; decrement the hi byte

    Code ( (Unknown Language)):
            movf    periodH,W       ; make a working copy
            movwf   d2              ;
            movf    periodL,W       ;
            movwf   d1              ;
            movlw   offset          ;
            subwf   d1,F            ; subtract overhead
            skpnc                   ; borrow? yes, skip, else
            incf    d2,F            ; bump hi byte

    In the next section of code you take care of the modulo 8 portion of the delay but you included an extra 'nop' instruction for delay%8 == 0. Get rid of the extra 'nop' and you reduce overhead by one more cycle and one more word of program memory.

    Code ( (Unknown Language)):
            comf    d1,W            ;
            andlw   b'00000111'     ; handle delay%8 cycles
            addwf   PCL,F           ;
            nop                     ; delay%8 == 7
            nop                     ; delay%8 == 6
            nop                     ; delay%8 == 5
            nop                     ; delay%8 == 4
            nop                     ; delay%8 == 3
            nop                     ; delay%8 == 2
            nop                     ; delay%8 == 1

    The next two instructions aren't necessary. You're subtracting 8 from the low byte so it doesn't matter if the three "%8" bits are there (they don't affect when you get the borrow flag in your timing loop). Get rid of these two instructions and you're down from 32 words and 28 cycles overhead to 28 words and 24 cycles overhead.

    Code ( (Unknown Language)):
    ;       movlw   b'11111000'     ; lower 3 bits of delay taken care of already
    ;       andwf   d1,F            ;

    Also, leaving the three "%8" bits in your delay variable through the timing loop leaves d1 holding b'11111xxx' (the %8 bits with all the upper bits set) when the loop exits, so complementing it gives the jump offset directly. If you move your %8 code to the end of the timing loop, you can eliminate the andlw b'00000111' instruction in your %8 code and save another word of memory and another cycle of overhead.
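
Both points about the low three bits can be checked exhaustively for a single byte. The Python sketch below (function names invented for the example) runs the subtract-8 step until the first borrow and compares every low-byte value against its masked counterpart. It models only the low byte, which is enough here because later passes subtract multiples of 8 and never disturb the low three bits.

```python
def run_subtract8_loop(d1):
    """Subtract 8 from an 8-bit value until the first borrow.

    Returns (iterations, value left in d1 after the borrowing subtraction).
    """
    iterations = 0
    while True:
        iterations += 1
        borrow = d1 < 8
        d1 = (d1 - 8) & 0xFF
        if borrow:
            return iterations, d1


def low_bits_are_harmless():
    """Check all 256 low-byte values against their masked counterparts."""
    for d1 in range(256):
        masked_iters, _ = run_subtract8_loop(d1 & 0xF8)
        iters, final = run_subtract8_loop(d1)
        if iters != masked_iters:
            return False                 # the borrow would arrive at a different time
        if (~final) & 0xFF != 7 - (d1 & 0x07):
            return False                 # comf would not give a clean 0..7 offset
    return True


if __name__ == "__main__":
    print(low_bits_are_harmless())
```

The borrow arrives after the same number of passes with or without the mask, and the complement of the leftover byte is already in the 0 to 7 range, which is why neither the andwf before the loop nor the andlw after it is needed.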

    Your last section of code, the 16-bit timing loop, is flawless. But if you change the structure a bit, going from an 8-cycle loop to an isochronous 5-cycle loop, you can realize a significant reduction in size and overhead (modulo 5 code is smaller than modulo 8 code). Here's a summary of the changes discussed so far:

    Code ( (Unknown Language)):
    delay2
            movf    periodH,W       ; make a working copy
            movwf   d2              ; of the 16-bit delay
            movf    periodL,W       ;
            movwf   d1              ;
            movlw   d'20'           ; subtract the overhead
            subwf   d1,F            ;
            skpnc                   ; borrow? yes, skip, else
            incf    d2,F            ; bump the hi byte

            movlw   d'5'            ; (5 cycle loop)
    dloop
            subwf   d1,F            ; subtract 5 cycle loop time
            skpc                    ; borrow? no, skip, else
            decfsz  d2,F            ; done? yes, skip, else
            goto    dloop           ; loop (not done)
            comf    d1,W            ; complement %5 remainder
            addwf   PCL,F           ; handle %5 portion of delay
            nop                     ; entry for delay%5 == 4
            nop                     ; entry for delay%5 == 3
            nop                     ; entry for delay%5 == 2
            nop                     ; entry for delay%5 == 1
            retlw   0               ; entry for delay%5 == 0

    Here we've gone from 32 words of memory and 28 cycles overhead down to 20 words of memory and 20 cycles overhead.
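
The 20-cycle overhead figure can be reproduced with a small Python model of the revised routine. As before, this is only a sketch under the standard timing assumptions (two cycles for call, goto, retlw, a write to PCL, and a taken skip), and `delay2_v2_cycles` is a name chosen for the example.

```python
def delay2_v2_cycles(period):
    """Model the cycles from `call delay2` through `retlw` for the 5-cycle version."""
    cycles = 2                          # call
    d2 = (period >> 8) & 0xFF           # movf, movwf
    d1 = period & 0xFF                  # movf, movwf
    cycles += 4
    carry = d1 >= 20                    # subwf leaves C set when there is no borrow
    d1 = (d1 - 20) & 0xFF
    if carry:
        d2 = (d2 + 1) & 0xFF            # incf d2,F when there was no borrow
    cycles += 4                         # movlw, subwf, skpnc + incf (2 either way)
    cycles += 1                         # movlw d'5'
    while True:
        carry = d1 >= 5
        d1 = (d1 - 5) & 0xFF            # subwf d1,F
        if carry:
            cycles += 5                 # subwf, skpc (skips), goto
            continue
        d2 = (d2 - 1) & 0xFF            # decfsz d2,F
        if d2 != 0:
            cycles += 5                 # subwf, skpc, decfsz, goto
            continue
        cycles += 4                     # subwf, skpc, decfsz (skips)
        break
    w = (~d1) & 0xFF                    # comf d1,W gives 4 - remainder
    cycles += 1 + 2 + (4 - w) + 2       # comf, addwf PCL, remaining nops, retlw
    return cycles


if __name__ == "__main__":
    for period in (20, 21, 1000, 65535):
        print(period, delay2_v2_cycles(period))
```

In this model the routine is exact from 20 cycles upward, so the minimum delay drops along with the overhead.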

    Take care. Have fun.

    Cheerful regards, Mike
     
    Last edited: May 26, 2013
  7. Markd77

    Thread Starter Senior Member

    Sep 7, 2009
    2,803
    594
    Thanks, that's great. For some unknown reason I thought the loop needed to be a power of 2 cycles long.
    Here's what I'm working on, just playing a scale through a piezo at the moment:
    http://youtu.be/92h_24mE69o
    The saved instructions will mean I can add more notes to the tune it will eventually play.
     
  8. THE_RB

    AAC Fanatic!

    Feb 11, 2008
    5,435
    1,305
    So you are using this to produce an output frequency? Not for a one-time delay?

    If you want really simple code to produce a cycle-accurate period for exact frequency generation, then have a look at the "ZEZJ algorithm" halfway down this page:
    http://www.romanblack.com/one_sec.htm
    it uses TMR0 interrupt and only a tiny bit of code and will produce the frequency automatically while any of your other code is running.

    If you want to make musical notes with a PIC, see this page:
    http://www.libstock.com/projects/vi...-notes-with-pic16-or-pic18-and-any-xtal-value
    It has code to make highly exact musical notes in a 10 octave range from any xtal value.

    Or see the bottom of this page:
    http://www.romanblack.com/onesec/High_Acc_Timing.htm
    There is code to generate exact frequencies in decimal Hz, 0.0001 Hz to 50000.0000 Hz, with adjustment resolution of 0.0001 Hz.

    All those code examples are for PICs in C code, but you can easily adapt the principle of operation to assembler if you prefer. :)
     
  9. MMcLaren

    Well-Known Member

    Feb 14, 2010
    759
    116
    That's pretty cool. Which baseline PIC are you using, Mark?
     
  10. Markd77

    Thread Starter Senior Member

    Sep 7, 2009
    2,803
    594
    This method is good enough for my purposes. The worst-case error isn't all that bad: about 1 Hz at 1 kHz, and it gets better for lower notes.
    I could try to get it perfect, but it eats into code space, especially as I'm already using timer 0 for the note decay and timing. There's no interrupt on the PIC I'm using (PIC12F508) which would make things trickier.
    In the video I haven't accounted for the time the rest of the code takes so the notes are out by about 3% for the higher notes, but only around 0.1% for the lowest, and it still sounds fairly good to me.
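
The "1 Hz at 1 kHz" figure is what one would expect from whole-cycle delay resolution. The Python sketch below shows the arithmetic under two assumptions that are not stated in the post: a 1 usec instruction cycle (a 4 MHz clock) and a pin that toggles once per delay, so one period is two delays. The function names are made up for the example.

```python
def nearest_tone(target_hz, cycle_us=1.0):
    """Return (half-period in whole cycles, resulting frequency in Hz)."""
    ideal_half_cycles = 1e6 / (2 * target_hz * cycle_us)
    half_cycles = round(ideal_half_cycles)
    actual_hz = 1e6 / (2 * half_cycles * cycle_us)
    return half_cycles, actual_hz


def worst_case_error_hz(target_hz, cycle_us=1.0):
    """Frequency error when the half period is off by half a cycle."""
    ideal_half_cycles = 1e6 / (2 * target_hz * cycle_us)
    return 1e6 / (2 * (ideal_half_cycles - 0.5) * cycle_us) - target_hz


if __name__ == "__main__":
    for hz in (250, 440, 1000):
        print(hz, nearest_tone(hz), round(worst_case_error_hz(hz), 3))
```

Because the error scales with the square of the frequency, halving the note frequency cuts the worst-case error to roughly a quarter, which matches the observation that lower notes come out more accurate.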
     