Compare Conventional and Alternative Division Methods

Discussion in 'Embedded Systems and Microcontrollers' started by jpanhalt, May 21, 2016.

  1. jpanhalt

    Thread Starter AAC Fanatic!

    Jan 18, 2008
    5,675
    899
    This thread is related to a previous thread on binary division here: http://forum.allaboutcircuits.com/threads/divide-a-binary-number.121934/

    Rather than continuing off topic, I decided to start a new thread to compare a "standard" multiprecison method for division with one based on the Kenyan multiplication algorithm but developed independently.

    For the standard method I chose Peter Hemsley's from Piclist (http://www.piclist.com/techref/microchip/math/div/32by16ph.htm). I have used it many times and simply adapted it to 24-bit X 16-bit for this experiment.

    For the Kenyan method, I simply followed the algorithm developed by G. Reichborn-Kjennerud and presented here: http://www.piclist.com/techref/method/math/muldiv.htm The code is written for 24-bit X 24-bit. I will refer to it as the R-K algorithm -- not to be confused with any other person living or dead. ;)

    Both methods were simulated on MPLab 8.92 using MPLab SIM at 4 MHz. The pseudo-Kenyan method uses the instruction set from an enhanced mid-range PIC 16F1829, namely the subwfb instruction. My code is still pretty crude, and I do intend to post it once it is cleaned up. The common subwf(b) method destroys the number being subtracted from (minuend) and my current code uses a shadow register to reconstruct those registers when needed. I don't like doing that, if it can be avoided. It can be done other ways, but the added loop time for doing that may be self-defeating. For example, this destroys the minuend, which ends up with the result:
    Code (Microchip Assembler):
    1.  
    2.      movf      T2L,w     ;
    3.      subwf     T1L,f     ;
    4.      movf      T2H,w     ;
    5.      subwfb    T1H,f     ;
    6.      nop
    7.  
    This preserves the minuend, but takes more time:
    Code (Microchip Assembler):
    1.  
    2.      movf      T2L,w     ;
    3.      subwf     T1L,w
    4.      movwf     T3L     ;
    5.      movf      T2H,w     ;
    6.      subwfb    T1H,w   ;
    7.      movwf     T3H
    8.      nop
    9.  
    Cutting to the chase, here are my timing results:
    upload_2016-5-21_7-1-14.png
    The Hemsley method uses rotations and has a relatively fixed time. The alternative method depends on the number of subtractions needed. It can be quite short when a number is divided by itself or long when the divisor is 1. My code does not check (yet) for a zero divisor.

    Although no question is presented here, I would like to hear from others who have efficient multi-precision methods or who have also tried the Reichborn-Kjennerud algorithm themselves.

    Regards, John
     
    NorthGuy likes this.
  2. jpanhalt

    Thread Starter AAC Fanatic!

    Jan 18, 2008
    5,675
    899
    Due to the overwhelming interest, it looks like time to put this baby to bed.

    I updated the code a little (posted below). Basically, one must do a trial subtraction. If that result is negative (STATUS,0 is clear), a zero is rotated into the result. If the subtraction result is positive, a 1 is rotated into the result. In either instance, the subtrahend is divided by 2 and the process is repeated. Subtraction often results in destruction of the minuend registers. The first approach shown above used back-up registers and restored from those registers when a subtraction gave a negative result ("failed"). In today's version, the subtraction is done twice. The first time avoids destruction of the minuend registers. Then if needed, the subtraction is repeated with the result put in to the minuend registers as usual. It seemed like a quotient with a lot of set bits (e.g., dec. 63) would take longer than one with a lot of clear bits (e.g, dec. 64). That is the case; however, the double-subtraction method is still faster.

    Here is a revised chart from my first post:
    upload_2016-5-22_10-37-35.png
    Here is the revised code:
    Code (ASM):
    1.  
    2. ;Author: ©JPANHALT
    3. ;Date: May 22,2016
    4.  
    5.     list        p=16f1829      ; list directive to define processor
    6.     #include    <p16f1829.inc> ; processor specific variable definitions
    7.      errorlevel -302,-305
    8.      RADIX     DEC
    9.    
    10.     __CONFIG _CONFIG1, _FOSC_INTOSC & _WDTE_OFF & _PWRTE_OFF & _MCLRE_ON & _CP_OFF & _CPD_OFF & _BOREN_OFF & _CLKOUTEN_ON & _IESO_OFF & _FCMEN_OFF
    11.     __CONFIG _CONFIG2, _WRT_OFF & _PLLEN_OFF & _STVREN_OFF & _BORV_19 & _LVP_OFF
    12.  
    13.      cblock    0x20
    14.      T1U
    15.      T1H
    16.      T1L
    17.      T2U
    18.      T2H
    19.      T2L
    20.      count
    21.      resultL
    22.      resultH
    23.      endc
    24.  
    25.     org 0x0000
    26.  
    27.      goto      Start
    28. Start
    29. ;T1(24bit)/T2(16bit)
    30.      movlw     high(881)
    31.      movwf     T2H
    32.      movlw     low(881)  
    33.      movwf     T2L
    34.      call      Refresh
    35.      clrf      count
    36.      clrf      T2U
    37.      clrf      resultL
    38.      clrf      resultH
    39.      nop
    40. Kenyan
    41.      incf      count,f
    42.      movf      T2L,w
    43.      subwf     T1L,f
    44.      movf      T2H,w
    45.      subwfb    T1H,f
    46.      movf      T2U,w
    47.      subwfb    T1U,f
    48.      btfss     STATUS,0
    49.      goto      Calculate
    50.      lslf      T2L
    51.      rlf       T2H
    52.      rlf       T2U
    53.      goto      Kenyan
    54.  
    55. Calculate
    56.      call      Refresh
    57.  
    58. Cycle                      
    59. ;T1 registers are destroyed during subtraction.  When the subtrahend is too
    60. ;big, which results in a zero being rotated into the result, those registers
    61. ;must be recovered.  Back-up registers can be used; however, several steps are
    62. ;required to restore them.  Doing a trial subtraction and preserving the
    63. ;minuend registers was slightly faster.
    64.      movf      T2L,w
    65.      subwf     T1L,w
    66.      movf      T2H,w
    67.      subwfb    T1H,w
    68.      movf      T2U,w
    69.      subwfb    T1U,w
    70.      btfss     STATUS,0       ;if C=0, subtrahend too big divide by 2 and repeat
    71.      goto      Result
    72.      movf      T2L,w          ;subtractand OK, repeat subtraction w/saves in f
    73.      subwf     T1L,f
    74.      movf      T2H,w
    75.      subwfb    T1H,f
    76.      movf      T2U,w
    77.      subwfb    T1U,f          ;carry will be set
    78. Result    
    79.      rlf       resultL,f      ;carry bit is set or clear correctly from above
    80.      rlf       resultH,f
    81.      decfsz    count,f
    82.      goto      DivideT2
    83.      goto      Finish
    84. DivideT2
    85.      lsrf      T2U
    86.      rrf       T2H
    87.      rrf       T2L
    88.      goto      Cycle
    89.  
    90. Finish
    91.      nop                      ;answer in resultH and resultL
    92.  
    93. ;Subroutine
    94.  
    95. Refresh
    96.      movlw     upper(55536)
    97.      movwf     T1U
    98.      movlw     high(55536)
    99.      movwf     T1H
    100.      movlw     low(55536)
    101.      movwf     T1L
    102.      return  
    103.  
    104.      end
    105.  
    Regards, John
     
    andre_teprom likes this.
Loading...