Single cycle accuracy 16bit delay for PIC

Thread Starter

Markd77

Joined Sep 7, 2009
2,806
Thought I'd share this here in case it is useful to anyone.
I'm sure it's been done already, but I couldn't find one. I'm pretty sure it works for delays of 28 to 65535 cycles, passed in periodH and periodL (which the function leaves unchanged). It should work on all baseline and midrange PICs.
The call and retlw instruction times are included. You can change the offset (up to 255) if you want to balance out the time taken by your main loop.
If anyone spots any issues let me know.
Rich (BB code):
delay2                    ;d2 = High byte d1 = Low byte
offset equ d'28'

    movf periodH, W        ;copy
    movwf d2
    incf d2, F            ;so that decfsz can be used later
    movf periodL, W
    movwf d1

    movlw offset        ;subtract the overhead
    subwf d1, F
    btfss STATUS, C
    decf d2, F
    
    comf d1, W        ;delay for the number of cycles of the
    andlw b'00000111'    ;lower 3 bits
    addwf PCL, F
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop

    movlw b'11111000'    ;lower 3 bits of delay taken care of already
    andwf d1, F

    movlw d'8'            ;preload W (unchanged in loop)
delay2loop                ;this loop always takes 8 cycles
    subwf d1, F            ;so subtract 8 from low byte
    btfsc STATUS, C        ;check overflow of low byte
    goto nocarry        ;make loop up to 8 cycles
    decfsz d2, F        ;decrease high byte
    goto delay2waste2cycles
    goto enddelay2        ;job done
nocarry
    goto $+1
delay2waste2cycles
    goto delay2loop

enddelay2
    retlw 0x00
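
One way to sanity-check the 28-cycle overhead is to model the routine instruction by instruction. The sketch below is Python, not PIC code, and the function name is made up for illustration. It assumes the usual timing: one cycle per instruction, two for call, goto, retlw, a taken skip, and a write to PCL.

```python
OFFSET = 28  # fixed overhead of delay2, including call and retlw


def delay2_cycles(period):
    """Count the instruction cycles delay2 uses for a 16-bit period."""
    d2 = ((period >> 8) + 1) & 0xFF      # incf d2,F biases the high byte for decfsz
    d1 = period & 0xFF
    cycles = 2 + 5                       # call, then the five copy/increment instructions
    borrow = d1 < OFFSET                 # subwf leaves C clear on a borrow
    d1 = (d1 - OFFSET) & 0xFF
    if borrow:
        d2 = (d2 - 1) & 0xFF
    cycles += 4                          # movlw, subwf, btfss, decf (or the skipped slot)
    skipped = (~d1) & 0x07               # comf/andlw: number of nops jumped over
    cycles += 1 + 1 + 2 + (8 - skipped)  # comf, andlw, addwf PCL, remaining nops
    d1 &= 0xF8                           # movlw b'11111000' / andwf d1,F
    cycles += 2 + 1                      # the mask pair, then movlw d'8'
    while True:
        borrow = d1 < 8
        d1 = (d1 - 8) & 0xFF
        if not borrow:
            cycles += 8                  # subwf, btfsc, goto nocarry, goto $+1, goto loop
            continue
        d2 = (d2 - 1) & 0xFF
        if d2 != 0:
            cycles += 8                  # subwf, btfsc (skip), decfsz, two gotos
            continue
        # subwf, btfsc (skip), decfsz (skip), goto enddelay2, retlw
        return cycles + 1 + 2 + 2 + 2 + 2
```

Under those assumptions, delay2_cycles(n) comes out equal to n across the stated 28 to 65535 range.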
 

MMcLaren

Joined Feb 14, 2010
861
Hi Mark,

Here's another solution for mid-range devices that's relatively simple and compact;

Rich (BB code):
;==================================================================
;  vDelay(16bitvariable), 17..65535 cycles          Mike, K8LH    =
;==================================================================

DelayHi equ     0x40

vDelay  macro   U16Var          ; 16 bit variable
        movf    U16Var+1,W      ; the 'hi' byte
        movwf   DelayHi         ;
        movf    U16Var+0,W      ; the 'lo' byte
        call    vdelay          ;
        endm                    ;

;*****************************************************************
;  code for simulation testing
;
testit
        vDelay  (delvar)        ; delvar = 10000 (lil endian)
        nop                     ; <- simulator break point here

;*****************************************************************
;  the 16-bit timing subroutine                                  *
;*****************************************************************

vdelay  addlw   -17             ; subtract subsystem 'overhead'
        skpnc                   ;
        incf    DelayHi,F       ;
vloop
        addlw   -5              ; subtract 5 cycle loop time
        skpc                    ; borrow? no, skip, else
        decfsz  DelayHi,F       ; done? yes, skip, else
        goto    vloop           ; loop again
        xorlw   -1              ;
        addwf   PCL,F           ; take care of delay%5 cycles
        nop                     ;
        nop                     ;
        nop                     ;
        nop                     ;
        return                  ;
If you can use a fixed or constant value for the delay parameter instead of a 16-bit variable, then a cycle accurate fixed delay subsystem can be extremely versatile;

Rich (BB code):
;******************************************************************
;  K8LH DelayCy() subsystem macro generates four instructions     *
;******************************************************************
        radix   dec
clock   equ     4               ; 4, 8, 12, 16, 20 (MHz), etc.
usecs   equ     clock/4         ; cycles/microsecond multiplier
msecs   equ     clock/4*1000    ; cycles/millisecond multiplier

DelayCy macro   delay           ; 11..327690 cycle range
        movlw   high((delay-11)/5)+1
        movwf   delayhi
        movlw   low ((delay-11)/5)
        call    uDelay-((delay-11)%5)
        endm
;******************************************************************
;  example code for simulation testing                            *
;******************************************************************
        org     0x000
SimTest
        DelayCy(200*usecs)      ; <- put simulator PC here
        goto    $               ; <- put simulator break point here
;******************************************************************
;  K8LH DelayCy() 16-bit uDelay subroutine                        *
;******************************************************************
        nop                     ; entry for (delay-11)%5 == 4     |B0
        nop                     ; entry for (delay-11)%5 == 3     |B0
        nop                     ; entry for (delay-11)%5 == 2     |B0
        nop                     ; entry for (delay-11)%5 == 1     |B0
uDelay  addlw   -1              ; subtract 5 cycle loop time      |B0
        skpc                    ; borrow? no, skip, else          |B0
        decfsz  delayhi,F       ; done?  yes, skip, else          |B0
        goto    uDelay          ; do another loop                 |B0
        return                  ;                                 |B0
;******************************************************************
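
The arithmetic the macro performs at assembly time can be sketched in Python (a model only; both function names are invented for illustration). It assumes that movlw keeps just the low 8 bits when high(...)+1 reaches 256, and the usual two-cycle cost for call, goto, return and a taken skip.

```python
def delaycy_operands(delay):
    """Return (delayhi, w, entry_nops) as the DelayCy macro would assemble them."""
    if not 11 <= delay <= 327690:
        raise ValueError("DelayCy range is 11..327690 cycles")
    loops, entry_nops = divmod(delay - 11, 5)
    delayhi = ((loops >> 8) + 1) & 0xFF   # high(...)+1, assumed truncated to 8 bits
    w = loops & 0xFF                      # low(...)
    return delayhi, w, entry_nops


def delaycy_cycles(delayhi, w, entry_nops):
    """Count cycles from the first movlw of the macro to the return."""
    cycles = 3 + 2 + entry_nops           # movlw, movwf, movlw, call, entry nops
    while True:
        carry = w >= 1                    # addlw -1 sets C unless W was 0
        w = (w - 1) & 0xFF
        if carry:
            cycles += 5                   # addlw, skpc (skip), goto
            continue
        delayhi = (delayhi - 1) & 0xFF
        if delayhi != 0:
            cycles += 5                   # addlw, skpc, decfsz, goto
            continue
        return cycles + 4 + 2             # addlw, skpc, decfsz (skip), return
```

For DelayCy(200*usecs) at 4 MHz, delaycy_operands(200) gives delayhi = 1, W = 37 and an entry point four nops before uDelay.
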
Here's a code example where the delay is specified as a time (here in msecs) minus the loop time in cycles to produce a precise delay using any clock;

Rich (BB code):
;
;  key press beep
;
;  DelayCy(1*msecs) produces     DelayCy(1*msecs-6) produces
;  497.018 Hz --  4 MHz clock    500.000 Hz tone -- any clock
;  498.504 Hz --  8 MHz clock
;  499.002 Hz -- 12 MHz clock
;  499.251 Hz -- 16 MHz clock
;  499.401 Hz -- 20 MHz clock
;
        bsf     Beep,5          ; do 32 msec "new press" beep     |B0
DoBeep  movf    PORTA,W         ; read port A                     |B0
        xorlw   1<<Spkr         ; toggle speaker bit              |B0
        movwf   PORTA           ; toggle speaker pin              |B0
        DelayCy(1*msecs-6)      ; 1 msec minus 6 cycles           |B0
        decfsz  Beep,F          ; done?  yes, skip, else          |B0
        goto    DoBeep          ; loop (toggle Spkr pin again)    |B0
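
The tone frequencies in the comment block can be checked with a few lines of Python (a sketch, not PIC code; the function names are illustrative). It assumes each pass of the DoBeep loop is one half period and costs 6 cycles on top of the delay: movf, xorlw, movwf, decfsz, and a two-cycle goto.

```python
def msecs(clock_mhz):
    """Cycles per millisecond, mirroring 'msecs equ clock/4*1000'."""
    return clock_mhz // 4 * 1000


def tone_hz(clock_mhz, delay_cycles, loop_overhead=6):
    """Square-wave frequency when each half period is delay + loop overhead."""
    cycles_per_second = clock_mhz * 1_000_000 / 4
    half_period = delay_cycles + loop_overhead
    return cycles_per_second / (2 * half_period)


for mhz in (4, 8, 12, 16, 20):
    uncorrected = tone_hz(mhz, 1 * msecs(mhz))
    corrected = tone_hz(mhz, 1 * msecs(mhz) - 6)
    print(f"{mhz:2d} MHz: {uncorrected:.3f} Hz uncorrected, {corrected:.3f} Hz corrected")
```

Subtracting the 6 loop cycles makes the half period exactly 1 msec at every clock, which is why the corrected column does not depend on the clock.
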
A cycle accurate fixed delay subsystem makes it possible to use routines like the bit-banged RS232 routines below with almost any uC clock without change (you simply have to set the 'clock' equate in the delay subsystem);

Rich (BB code):
Put232                          ; 19200 baud Tx subroutine
        movwf   txbyte          ; save Tx data byte               |B0
        movlw   10              ; 1 start + 8 data + 1 stop bit   |B0
        movwf   BitCtr          ; setup bit counter               |B0
        clrc                    ; C = 0 (start bit)               |B0
        goto    SendBit         ; send start bit                  |B0
NextBit
        DelayCy(52*usecs-10)    ; 52 usecs -10 cycle loop time    |B0
        setc                    ; always shift in a 'stop' bit    |B0
        rrf     txbyte,F        ; put data bit in Carry           |B0
SendBit
        movf    GPIO,W          ; read port                       |B0
        iorlw   1<<TxPin        ; set TxPin bit to 1              |B0
        skpc                    ; if data bit = 1 skip, else      |B0
        xorlw   1<<TxPin        ; set TxPin bit to 0              |B0
        movwf   GPIO            ; precise 52-usec bit timing      |B0
        decfsz  BitCtr,F        ; done? yes, skip, else           |B0
        goto    NextBit         ; send next bit                   |B0
        return                  ;                                 |B0
;
Get232                          ; 19200 baud Rx subroutine
        btfsc   GPIO,RxPin      ; start bit (0)? yes, skip, else  |B0
        goto    Get232          ; loop (wait for start bit)       |B0
        DelayCy(52*usecs/2)     ; delay 1/2 bit time              |B0
        movlw   8               ;                                 |B0
        movwf   BitCtr          ; BitCtr = 8                      |B0
RxBit
        DelayCy(52*usecs-7)     ; precise 52-usec bit timing      |B0
        clrc                    ; assume '0'                      |B0
        btfsc   GPIO,RxPin      ; a '0' bit? yes, skip, else      |B0
        setc                    ; set to '1'                      |B0
        rrf     rxbyte,F        ; b0 first, b7 last               |B0
        decfsz  BitCtr,F        ; done? yes, skip, else           |B0
        goto    RxBit           ; get another bit                 |B0
        movf    rxbyte,W        ; put "rxbyte" in WREG            |B0
        return                  ;                                 |B0
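
As a rough check on the timing budget, here is a Python sketch (not PIC code; names are illustrative). It assumes the loop overheads given in the comments (10 cycles for Tx, 7 for Rx), a clock that is a multiple of 4 MHz, and the 11-cycle minimum of the mid-range DelayCy macro.

```python
def delaycy_arguments(clock_mhz):
    """Cycle counts passed to DelayCy by Put232 and Get232 for a given clock."""
    usecs = clock_mhz // 4               # same integer division as 'usecs equ clock/4'
    return {
        "tx_bit": 52 * usecs - 10,       # DelayCy(52*usecs-10)
        "rx_half_bit": 52 * usecs // 2,  # DelayCy(52*usecs/2)
        "rx_bit": 52 * usecs - 7,        # DelayCy(52*usecs-7)
    }


def frame_drift(baud=19200, bit_usecs=52, frame_bits=10):
    """Fraction of one bit time by which a 52 usec bit clock drifts over a frame."""
    ideal_us = 1_000_000 / baud          # 19200 baud is about 52.083 usec per bit
    return abs(bit_usecs - ideal_us) * frame_bits / ideal_us
```

At 4 MHz the three arguments are 42, 26 and 45 cycles, all above the macro minimum, and rounding the bit time to 52 usecs costs under 2% of a bit over a 10-bit frame.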
 

Thread Starter

Markd77

Very nice, I wouldn't have thought of doing it that way. For this project I'm using a baseline PIC, which unfortunately is missing the very useful addlw instruction, but for midrange PICs your code is a vast improvement.
 

MMcLaren

Here's a variation of the cycle accurate fixed delay subsystem that works for both baseline (12-bit core) and mid-range (14-bit core) devices. It's slightly less efficient than the routine for mid-range devices due to the lack of an "addlw" instruction, as you mentioned.

BTW, several delay subsystems, including some of the ones I've shown here, can be found in the PICLIST source code library.

Cheerful regards, Mike

Rich (BB code):
;==================================================================
;  K8LH DelayCy() subsystem macro generates five instructions     =
;==================================================================
        radix   dec
clock   equ     4               ; 4 MHz
usecs   equ     clock/4         ; cycles/usec operand multiplier
msecs   equ     clock/4*1000    ; cycles/msec operand multiplier

DelayCy macro   delay           ; 12..524298 cycle range
    if((delay < 12) | (delay > 524298))
        error "DelayCy() range error"
    endif
        movlw   high((delay-12)/8)^255
        movwf   delayhi
        movlw   low ((delay-12)/8)^255
        movwf   delaylo
        call    uDelay-((delay-12)%8)
        endm

;******************************************************************
;  K8LH DelayCy() subsystem 16-bit "uDelay" timing subroutine     *
;******************************************************************
        nop                     ; entry for (delay-12)%8 == 7
        nop                     ; entry for (delay-12)%8 == 6
        nop                     ; entry for (delay-12)%8 == 5
        nop                     ; entry for (delay-12)%8 == 4
        nop                     ; entry for (delay-12)%8 == 3
        nop                     ; entry for (delay-12)%8 == 2
        nop                     ; entry for (delay-12)%8 == 1
uDelay  incf    delaylo,F       ; subtract one 8-cycle loop
        skpnz                   ; borrow? no, skip, else
        incfsz  delayhi,F       ; done?  yes, skip, else
        goto    uDelay-3        ; do another 8-cycle loop
        retlw   0               ;

;******************************************************************
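
The complemented up-counter can be modelled the same way. This is a Python sketch (not PIC code, with invented function names), assuming two cycles for call, goto, retlw and a taken skip. Because there is no addlw, the loop count is stored inverted and incremented until the 16-bit pair rolls over to zero.

```python
def delaycy_baseline_operands(delay):
    """Return (delayhi, delaylo, entry_nops) as the baseline macro assembles them."""
    if not 12 <= delay <= 524298:
        raise ValueError("DelayCy() range error")
    loops, entry_nops = divmod(delay - 12, 8)
    delayhi = (loops >> 8) ^ 0xFF        # high(...)^255
    delaylo = (loops & 0xFF) ^ 0xFF      # low(...)^255
    return delayhi, delaylo, entry_nops


def delaycy_baseline_cycles(delayhi, delaylo, entry_nops):
    """Count cycles from the first movlw of the macro to the retlw."""
    cycles = 4 + 2 + entry_nops          # two movlw/movwf pairs, call, entry nops
    while True:
        delaylo = (delaylo + 1) & 0xFF
        if delaylo != 0:
            cycles += 8                  # incf, skpnz (skip), goto, three nops
            continue
        delayhi = (delayhi + 1) & 0xFF
        if delayhi != 0:
            cycles += 8                  # incf, skpnz, incfsz, goto, three nops
            continue
        return cycles + 4 + 2            # incf, skpnz, incfsz (skip), retlw
```
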
 

MMcLaren

Hey Mark,

I don't mean to detract from your accomplishment. You've done a really nice job and I'm excited to see someone getting into this level of assembly language programming. For me, optimizing code is more fun than a crossword puzzle (grin) so hopefully you won't mind if I make a few comments or suggestions.

After you copy the 16 bit delay value into working variables d1 and d2 in your first section of code you increment the high byte so that you can use the decfsz instruction on the high byte in your delay loop. You also decrement the high byte if there's a borrow when you subtract the subsystem overhead. You can accomplish these same two operations while shaving off one word of memory and one cycle of overhead in a couple different ways;

Rich (BB code):
delay2
        incf    periodH,W       ; bump hi byte
        movwf   d2              ; make a working copy
        movf    periodL,W       ;
        movwf   d1              ;
        movlw   offset          ;
        subwf   d1,F            ; subtract overhead
        skpc                    ; borrow? no, skip, else
        decf    d2,F            ; decrement the hi byte
Rich (BB code):
        movf    periodH,W       ; make a working copy
        movwf   d2              ;
        movf    periodL,W       ;
        movwf   d1              ;
        movlw   offset          ;
        subwf   d1,F            ; subtract overhead
        skpnc                   ; borrow? yes, skip, else
        incf    d2,F            ; bump hi byte
In the next section of code you take care of the modulo 8 portion of the delay but you included an extra 'nop' instruction for delay%8 == 0. Get rid of the extra 'nop' and you reduce overhead by one more cycle and one more word of program memory.

Rich (BB code):
        comf    d1,W            ; 
        andlw   b'00000111'     ; handle delay%8 cycles
        addwf   PCL,F           ;
        nop                     ; delay%8 == 7
        nop                     ; delay%8 == 6
        nop                     ; delay%8 == 5
        nop                     ; delay%8 == 4
        nop                     ; delay%8 == 3
        nop                     ; delay%8 == 2
        nop                     ; delay%8 == 1
The next two instructions aren't necessary. You're subtracting 8 from the low byte so it doesn't matter if the three "%8" bits are there (they don't affect when you get the borrow flag in your timing loop). Get rid of these two instructions and you're down from 32 words and 28 cycles overhead to 28 words and 24 cycles overhead.

Rich (BB code):
;       movlw   b'11111000'     ; lower 3 bits of delay taken care of already
;       andwf   d1,F            ;
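
To see why the mask is redundant: the loop only depends on how many times 8 can be subtracted before the 16-bit value borrows, and the low three bits never change that count. A small Python check (a model, not PIC code; the function name is made up for illustration):

```python
def passes_until_borrow(value, step=8):
    """Number of 'subtract step' passes until the value goes negative."""
    passes = 0
    while value >= 0:
        value -= step
        passes += 1
    return passes


# Masking off the low three bits first makes no difference to the pass count.
same = all(
    passes_until_borrow(v) == passes_until_borrow(v & ~7)
    for v in range(2048)
)
print(same)
```
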
Also, leaving the three "%8" bits in your delay variable through the timing loop leaves you with the complement of the %8 bits at the end of the timing loop. If you move your %8 code to the end of the timing loop you can eliminate the andlw b'00000111' instruction in your %8 code and save another word of memory and another cycle overhead.

Your last section of code, the 16-bit timing loop, is flawless. But, if you change the structure a bit, going from an 8 cycle loop to an isochronous 5 cycle loop, you can realize a significant reduction in size and overhead (modulo 5 code is smaller than modulo 8 code). Here's a summary of the changes discussed so far;

Rich (BB code):
delay2
        movf    periodH,W       ; make a working copy
        movwf   d2              ; of the 16-bit delay
        movf    periodL,W       ;
        movwf   d1              ;
        movlw   d'20'           ; subtract the overhead
        subwf   d1,F            ;
        skpnc                   ; borrow? yes, skip, else
        incf    d2,F            ; bump the hi byte

        movlw   d'5'            ; (5 cycle loop)
dloop
        subwf   d1,F            ; subtract 5 cycle loop time
        skpc                    ; borrow? no, skip, else
        decfsz  d2,F            ; done? yes, skip, else
        goto    dloop           ; loop (not done)
        comf    d1,W            ; complement %5 remainder
        addwf   PCL,F           ; handle %5 portion of delay
        nop                     ; entry for delay%5 == 4
        nop                     ; entry for delay%5 == 3
        nop                     ; entry for delay%5 == 2
        nop                     ; entry for delay%5 == 1
        retlw   0               ; entry for delay%5 == 0
Here we've gone from 32 words of memory and 28 cycles overhead down to 20 words of memory and 20 cycles overhead.
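
The 20-cycle figure can be checked with a cycle-count model of the routine above. This is a Python sketch, not PIC code; the function name is illustrative, and it assumes two cycles for call, goto, retlw, a taken skip and a write to PCL.

```python
def delay2_optimized_cycles(period):
    """Count the cycles used by the 5-cycle-loop version of delay2."""
    d2 = (period >> 8) & 0xFF
    d1 = period & 0xFF
    cycles = 2 + 4                   # call, then the four copy instructions
    carry = d1 >= 20                 # subwf leaves C set when there is no borrow
    d1 = (d1 - 20) & 0xFF
    if carry:
        d2 = (d2 + 1) & 0xFF         # bias the high byte for decfsz
    cycles += 4 + 1                  # movlw, subwf, skpnc, incf slot, then movlw d'5'
    while True:
        carry = d1 >= 5
        d1 = (d1 - 5) & 0xFF
        if carry:
            cycles += 5              # subwf, skpc (skip), goto
            continue
        d2 = (d2 - 1) & 0xFF
        if d2 != 0:
            cycles += 5              # subwf, skpc, decfsz, goto
            continue
        cycles += 4                  # subwf, skpc, decfsz (skip)
        break
    skipped = d1 ^ 0xFF              # comf d1,W gives 4 minus the %5 remainder
    return cycles + 1 + 2 + (4 - skipped) + 2   # comf, addwf PCL, nops, retlw
```

After the final borrow d1 holds the remainder minus 5, so its complement is exactly the number of nops to jump over, and delay2_optimized_cycles(n) equals n from 20 up to 65535.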

Take care. Have fun.

Cheerful regards, Mike
 

Thread Starter

Markd77

Thanks, that's great. For some unknown reason I thought the loop needed to be a power of 2 cycles long.
Here's what I'm working on, just playing a scale through a piezo at the moment:
http://youtu.be/92h_24mE69o
The saved instructions will mean I can add more notes to the tune it will eventually play.
 

THE_RB

Joined Feb 11, 2008
5,438
So you are using this to produce an output frequency? Not for a one-time delay?

If you want really simple code to produce cycle accurate period for exact frequency generation then have a look at "ZEZJ algorithm" halfway down this page;
http://www.romanblack.com/one_sec.htm
it uses TMR0 interrupt and only a tiny bit of code and will produce the frequency automatically while any of your other code is running.

If you want to make musical notes with a PIC see this page;
http://www.libstock.com/projects/vi...-notes-with-pic16-or-pic18-and-any-xtal-value
It has code to make highly exact musical notes in a 10 octave range from any xtal value.

Or the bottom of this page;
http://www.romanblack.com/onesec/High_Acc_Timing.htm
There is code to generate exact frequencies in decimal Hz, 0.0001 Hz to 50000.0000 Hz, with adjustment resolution of 0.0001 Hz.

All those code examples are for PICs in C code, but you can easily adapt the principle of operation to assembler if you prefer. :)
 

Thread Starter

Markd77

This method is good enough for my purposes. The worst actual error isn't all that bad (1 Hz at 1 kHz), and it gets better for lower notes.
I could try to get it perfect, but it eats into code space, especially as I'm already using timer 0 for the note decay and timing. There's no interrupt on the PIC I'm using (PIC12F508) which would make things trickier.
In the video I haven't accounted for the time the rest of the code takes so the notes are out by about 3% for the higher notes, but only around 0.1% for the lowest, and it still sounds fairly good to me.
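
For a rough idea of where a figure like 1 Hz at 1 kHz comes from, here is a Python sketch. It rests on assumptions not stated above: a 1 MHz instruction rate (the PIC12F508 running from its 4 MHz internal oscillator) and a delay that sets the half period with one-cycle resolution. The function names are illustrative.

```python
def square_wave_hz(half_period_cycles, instr_hz=1_000_000):
    """Frequency of a square wave whose half period is a whole number of cycles."""
    return instr_hz / (2 * half_period_cycles)


def worst_rounding_error_hz(target_hz, instr_hz=1_000_000):
    """Half the gap between the two nearest achievable frequencies."""
    n = round(instr_hz / (2 * target_hz))   # nearest whole-cycle half period
    step = square_wave_hz(n, instr_hz) - square_wave_hz(n + 1, instr_hz)
    return step / 2
```

Under those assumptions the worst-case rounding error is about 1 Hz at 1 kHz and about 0.06 Hz at 250 Hz, so the error shrinks quickly for lower notes.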
 