Compare Conventional and Alternative Division Methods

Discussion in 'Embedded Systems and Microcontrollers' started by jpanhalt, May 21, 2016.

1. jpanhalt Thread Starter AAC Fanatic!

Jan 18, 2008
5,675
899

Rather than continuing off topic, I decided to start a new thread to compare a "standard" multiprecison method for division with one based on the Kenyan multiplication algorithm but developed independently.

For the standard method I chose Peter Hemsley's from Piclist (http://www.piclist.com/techref/microchip/math/div/32by16ph.htm). I have used it many times and simply adapted it to 24-bit X 16-bit for this experiment.

For the Kenyan method, I simply followed the algorithm developed by G. Reichborn-Kjennerud and presented here: http://www.piclist.com/techref/method/math/muldiv.htm The code is written for 24-bit X 24-bit. I will refer to it as the R-K algorithm -- not to be confused with any other person living or dead.

Both methods were simulated on MPLab 8.92 using MPLab SIM at 4 MHz. The pseudo-Kenyan method uses the instruction set from an enhanced mid-range PIC 16F1829, namely the subwfb instruction. My code is still pretty crude, and I do intend to post it once it is cleaned up. The common subwf(b) method destroys the number being subtracted from (minuend) and my current code uses a shadow register to reconstruct those registers when needed. I don't like doing that, if it can be avoided. It can be done other ways, but the added loop time for doing that may be self-defeating. For example, this destroys the minuend, which ends up with the result:
Code (Microchip Assembler):
1.
2.      movf      T2L,w     ;
3.      subwf     T1L,f     ;
4.      movf      T2H,w     ;
5.      subwfb    T1H,f     ;
6.      nop
7.
This preserves the minuend, but takes more time:
Code (Microchip Assembler):
1.
2.      movf      T2L,w     ;
3.      subwf     T1L,w
4.      movwf     T3L     ;
5.      movf      T2H,w     ;
6.      subwfb    T1H,w   ;
7.      movwf     T3H
8.      nop
9.
Cutting to the chase, here are my timing results:

The Hemsley method uses rotations and has a relatively fixed time. The alternative method depends on the number of subtractions needed. It can be quite short when a number is divided by itself or long when the divisor is 1. My code does not check (yet) for a zero divisor.

Although no question is presented here, I would like to hear from others who have efficient multi-precision methods or who have also tried the Reichborn-Kjennerud algorithm themselves.

Regards, John

NorthGuy likes this.
2. jpanhalt Thread Starter AAC Fanatic!

Jan 18, 2008
5,675
899
Due to the overwhelming interest, it looks like time to put this baby to bed.

I updated the code a little (posted below). Basically, one must do a trial subtraction. If that result is negative (STATUS,0 is clear), a zero is rotated into the result. If the subtraction result is positive, a 1 is rotated into the result. In either instance, the subtrahend is divided by 2 and the process is repeated. Subtraction often results in destruction of the minuend registers. The first approach shown above used back-up registers and restored from those registers when a subtraction gave a negative result ("failed"). In today's version, the subtraction is done twice. The first time avoids destruction of the minuend registers. Then if needed, the subtraction is repeated with the result put in to the minuend registers as usual. It seemed like a quotient with a lot of set bits (e.g., dec. 63) would take longer than one with a lot of clear bits (e.g, dec. 64). That is the case; however, the double-subtraction method is still faster.

Here is a revised chart from my first post:

Here is the revised code:
Code (ASM):
1.
3. ;Date: May 22,2016
4.
5.     list        p=16f1829      ; list directive to define processor
6.     #include    <p16f1829.inc> ; processor specific variable definitions
7.      errorlevel -302,-305
9.
10.     __CONFIG _CONFIG1, _FOSC_INTOSC & _WDTE_OFF & _PWRTE_OFF & _MCLRE_ON & _CP_OFF & _CPD_OFF & _BOREN_OFF & _CLKOUTEN_ON & _IESO_OFF & _FCMEN_OFF
11.     __CONFIG _CONFIG2, _WRT_OFF & _PLLEN_OFF & _STVREN_OFF & _BORV_19 & _LVP_OFF
12.
13.      cblock    0x20
14.      T1U
15.      T1H
16.      T1L
17.      T2U
18.      T2H
19.      T2L
20.      count
21.      resultL
22.      resultH
23.      endc
24.
25.     org 0x0000
26.
27.      goto      Start
28. Start
29. ;T1(24bit)/T2(16bit)
30.      movlw     high(881)
31.      movwf     T2H
32.      movlw     low(881)
33.      movwf     T2L
34.      call      Refresh
35.      clrf      count
36.      clrf      T2U
37.      clrf      resultL
38.      clrf      resultH
39.      nop
40. Kenyan
41.      incf      count,f
42.      movf      T2L,w
43.      subwf     T1L,f
44.      movf      T2H,w
45.      subwfb    T1H,f
46.      movf      T2U,w
47.      subwfb    T1U,f
48.      btfss     STATUS,0
49.      goto      Calculate
50.      lslf      T2L
51.      rlf       T2H
52.      rlf       T2U
53.      goto      Kenyan
54.
55. Calculate
56.      call      Refresh
57.
58. Cycle
59. ;T1 registers are destroyed during subtraction.  When the subtrahend is too
60. ;big, which results in a zero being rotated into the result, those registers
61. ;must be recovered.  Back-up registers can be used; however, several steps are
62. ;required to restore them.  Doing a trial subtraction and preserving the
63. ;minuend registers was slightly faster.
64.      movf      T2L,w
65.      subwf     T1L,w
66.      movf      T2H,w
67.      subwfb    T1H,w
68.      movf      T2U,w
69.      subwfb    T1U,w
70.      btfss     STATUS,0       ;if C=0, subtrahend too big divide by 2 and repeat
71.      goto      Result
72.      movf      T2L,w          ;subtractand OK, repeat subtraction w/saves in f
73.      subwf     T1L,f
74.      movf      T2H,w
75.      subwfb    T1H,f
76.      movf      T2U,w
77.      subwfb    T1U,f          ;carry will be set
78. Result
79.      rlf       resultL,f      ;carry bit is set or clear correctly from above
80.      rlf       resultH,f
81.      decfsz    count,f
82.      goto      DivideT2
83.      goto      Finish
84. DivideT2
85.      lsrf      T2U
86.      rrf       T2H
87.      rrf       T2L
88.      goto      Cycle
89.
90. Finish
91.      nop                      ;answer in resultH and resultL
92.
93. ;Subroutine
94.
95. Refresh
96.      movlw     upper(55536)
97.      movwf     T1U
98.      movlw     high(55536)
99.      movwf     T1H
100.      movlw     low(55536)
101.      movwf     T1L
102.      return
103.
104.      end
105.
Regards, John

andre_teprom likes this.