Binary to BCD

Thread Starter

Ian0

Joined Aug 7, 2020
13,097
I think we all know that “double dabble” is the best binary to bcd algorithm for processors with no multiply and limited memory, but what is the best algorithm for processors with a single-cycle multiply and no shortage of memory?
 

MrChips

Joined Oct 2, 2009
34,628
I think we all know that “double dabble” is the best binary to bcd algorithm for processors with no multiply and limited memory, but what is the best algorithm for processors with a single-cycle multiply and no shortage of memory?
How many binary bits do you need to convert, and are they signed or unsigned?
Does the MCU have integer divide?
What MCU are you using?
 

MrChips

Joined Oct 2, 2009
34,628
I would do it one of two ways (rough sketches below):

1) divide by decreasing powers of 10 starting with the largest, for example, 10,000 for 16-bit integer

2) iterative divide by 10.
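
Something like this, assuming a 16-bit unsigned value and an unpacked digit buffer (most significant digit first):

```c
#include <stdint.h>

/* 1) Divide by decreasing powers of ten, starting with 10,000. */
static void bcd_by_powers(uint16_t value, uint8_t digits[5])
{
    static const uint16_t pow10[5] = {10000, 1000, 100, 10, 1};
    for (int i = 0; i < 5; i++) {
        digits[i] = (uint8_t)(value / pow10[i]);   /* most significant first */
        value %= pow10[i];
    }
}

/* 2) Iterative divide by ten (naturally yields the least significant digit first). */
static void bcd_by_div10(uint16_t value, uint8_t digits[5])
{
    for (int i = 4; i >= 0; i--) {
        digits[i] = (uint8_t)(value % 10);
        value /= 10;
    }
}
```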
 

WBahn

Joined Mar 31, 2012
32,703
I think we all know that “double dabble” is the best binary to bcd algorithm for processors with no multiply and limited memory, but what is the best algorithm for processors with a single-cycle multiply and no shortage of memory?
No shortage of memory?

A look-up table.

But, there're LUTs and then there're LUTs.

The fastest would be a true brute-force LUT with the BCD values for every possible integer of interest. For 16-bit, that would be 64K entries (I'm assuming you are only working with unsigned integers) with each entry being five BCD values (so three bytes, if they are packed), which may or may not be feasible. For 32-bit integers, it's hard to imagine it being feasible.

But you could partition this pretty easily to get a tradeoff between computational effort and LUT size.
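
One way to partition it, purely as an illustration (the table layout and names are my own assumptions): replace the 64K-entry table with two 256-entry tables, one holding the decimal digits of the high byte times 256 and one the digits of the low byte, then add the two digit strings with decimal carries.

```c
#include <stdint.h>

/* Partitioned LUT: 16-bit unsigned -> five unpacked BCD digits, using two
   256-entry tables (about 2.5 KB) instead of one 64K-entry table. */
static uint8_t hi_digits[256][5];   /* digits of i * 256, MSD first */
static uint8_t lo_digits[256][5];   /* digits of i,       MSD first */

static void build_tables(void)      /* run once at start-up (or generate offline) */
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned h = i * 256, l = i;
        for (int d = 4; d >= 0; d--) {
            hi_digits[i][d] = (uint8_t)(h % 10);  h /= 10;
            lo_digits[i][d] = (uint8_t)(l % 10);  l /= 10;
        }
    }
}

static void to_bcd(uint16_t value, uint8_t digits[5])
{
    const uint8_t *a = hi_digits[value >> 8];
    const uint8_t *b = lo_digits[value & 0xFF];
    uint8_t carry = 0;
    for (int d = 4; d >= 0; d--) {            /* decimal add, LSD first */
        uint8_t s = (uint8_t)(a[d] + b[d] + carry);
        carry = (uint8_t)(s >= 10);
        digits[d] = carry ? (uint8_t)(s - 10) : s;
    }
}
```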

Also, avoid doing division. Depending on your compiler (if you are writing in a high-level language), it may do that for you by converting a division (or remainder) by a constant divisor into a multiplication by the reciprocal. If your compiler doesn't do that, you can do it yourself in your source code.
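
For illustration, that transformation for dividing an unsigned 32-bit value by ten looks something like this (the constant is ceil(2^35/10); the exact shift depends on the word size):

```c
#include <stdint.h>

/* Division by a constant replaced by multiplication by a scaled reciprocal --
   the same transformation a compiler applies to x / 10 for a constant divisor.
   0xCCCCCCCD is ceil(2^35 / 10); shifting the 64-bit product right by 35
   recovers the quotient exactly for all 32-bit x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder recovered with a multiply and subtract, so no divide is needed. */
static uint32_t mod10(uint32_t x)
{
    return x - 10u * div10(x);
}
```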
 

Futurist

Joined Apr 8, 2025
721
I think we all know that “double dabble” is the best binary to bcd algorithm for processors with no multiply and limited memory, but what is the best algorithm for processors with a single-cycle multiply and no shortage of memory?
There are several ways to represent BCD, e.g. "packed". Which one do you have in mind?
 

Thread Starter

Ian0

Joined Aug 7, 2020
13,097
Not familiar with that processor.

When you do a 32 bit integer multiply, does it result in a 64-bit result stored in two registers?
Yes, the UMULL or SMULL instruction does that. But when it does division, it doesn't give you quotient and remainder, just the quotient; it takes a multiply and subtract to get the remainder.
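
In C that pair of operations is just the obvious thing; on a core with a quotient-only divide the compiler typically turns the subtraction into a single multiply-and-subtract instruction:

```c
#include <stdint.h>

/* Quotient from the hardware divide, remainder recovered with a
   multiply and subtract (one MLS-style instruction on most cores). */
static uint32_t last_decimal_digit(uint32_t x)
{
    uint32_t q = x / 10u;      /* hardware divide: quotient only */
    return x - q * 10u;        /* remainder = x - q * 10 */
}
```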
 

MrChips

Joined Oct 2, 2009
34,628
If the values are randomly distributed across the 32-bit range, then I don’t think it matters which technique you choose. Otherwise, I would opt for repeated division by 10. Small numbers will take less time to convert.
 

Futurist

Joined Apr 8, 2025
721
I'm curious what might motivate the use of BCD anyway, on an MCU I mean. IBM instruction sets supported (well, still do) BCD arithmetic; that was motivated by business use and support for stuff like COBOL.

BCD arithmetic is actually not hard, I just learned that. In C it's pretty simple: we add two four-bit values, and if the result exceeds 9 we add 6, then split the byte (assuming we're not using packed BCD).

Of course that's more work than just adding two registers!
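
A minimal sketch of that digit-add rule, assuming unpacked BCD (one digit per byte) and a made-up helper name:

```c
#include <stdint.h>

/* Unpacked BCD digit addition as described above: add the two digits,
   and if the result exceeds 9 add 6, so the low nibble is the corrected
   digit and the high nibble becomes the carry out. */
static uint8_t bcd_add_digit(uint8_t a, uint8_t b, uint8_t *carry)
{
    uint8_t sum = a + b + *carry;   /* a and b are single digits, 0..9 */
    if (sum > 9)
        sum += 6;                   /* decimal adjust */
    *carry = sum >> 4;              /* 1 if the digit overflowed */
    return sum & 0x0F;              /* corrected digit */
}
```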
 

WBahn

Joined Mar 31, 2012
32,703
I just looked up the GCC implementation and the vfprintf.c file (which printf.c is based on) is over 2300 lines long.
 

MrChips

Joined Oct 2, 2009
34,628
If you want to use the library, you would sprintf( ) to a string and then send the string to the LCD.
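
Something like this, where lcd_puts() is just a placeholder for whatever routine actually writes a string to the display:

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative only: lcd_puts() stands in for the real LCD write routine. */
extern void lcd_puts(const char *s);

static void show_value(uint32_t value)
{
    char buf[12];                               /* enough for a 32-bit unsigned */
    sprintf(buf, "%lu", (unsigned long)value);  /* library does the conversion */
    lcd_puts(buf);
}
```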

However, I agree. I would write my own functions.
 

Thread Starter

Ian0

Joined Aug 7, 2020
13,097
If you want to use the library, you would sprintf( ) to a string and then send the string to the LCD.

However, I agree. I would write my own functions.
Yes. That's what started this all off. The program "evolved" and ended up with four different functions to display numbers in decimal in four different formats, and I decided it needed "tidying".
I spent some time working out exactly how double dabble worked, and realised that a processor with a barrel shifter does a better job. Now I've just become intrigued as to what is the best way of doing it (with no particular idea what "best" means).
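
For reference, the classic double dabble (shift-and-add-3) looks something like this for a 16-bit value; this is just the textbook form, not anything tuned for a barrel shifter:

```c
#include <stdint.h>

/* Double dabble: 16-bit binary to five packed BCD digits.
   Before each left shift, any BCD nibble that is 5 or more gets 3 added. */
static uint32_t double_dabble16(uint16_t bin)
{
    uint32_t bcd = 0;
    for (int i = 0; i < 16; i++) {
        for (int n = 0; n < 5; n++) {                 /* adjust each nibble */
            if (((bcd >> (4 * n)) & 0xFu) >= 5)
                bcd += 3u << (4 * n);
        }
        bcd = (bcd << 1) | (uint32_t)(bin >> 15);     /* shift in the next bit */
        bin <<= 1;
    }
    return bcd;                                       /* packed BCD result */
}
```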
 

Thread Starter

Ian0

Joined Aug 7, 2020
13,097
After some thought, I realised that converting the fractional part of a fixed-point binary number is considerably easier than converting the integer part. Each digit just needs a multiply by ten: the integer part of the product is the digit, the fractional part goes on to the next stage, and UMULL will do that for you, outputting the top 32 bits in one register and the bottom 32 bits in another.
Applying that directly to a 32-bit integer doesn't work, because what you get is the decimal expansion of the number as a fraction of 2^32.
However, multiplying the original number by 2^64/10^9 normalises it, and it gives the correct output. I've not yet tested whether that constant is big enough to avoid an error in the LSB for all numbers; if it isn't, an extra shift would be required (but it would only be needed ONCE for the whole conversion, not for every digit).
[EDIT] dividing by 10^9 by multiplying by 2^m/10^9 (where m=64) is a magic number division, and the theory says that m > (32 + log2(n)), where n is the divisor.
Log2 of 10^9 is 29.9, so it suggests m=62 ought to be good enough.
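
To make that concrete, here's a rough sketch of the idea in C, using the GCC/Clang unsigned __int128 extension purely for readability (on the target the same thing would be a handful of long multiplies). The constant is ceil(2^64/10^9); the first multiply puts the leading digit above bit 63 and leaves the remainder as a 64-bit binary fraction, after which each further digit is one multiply by ten. A back-of-envelope error check suggests m=64 keeps the accumulated error below one LSB for all 32-bit inputs, consistent with the m=62 estimate above.

```c
#include <stdint.h>

/* Sketch of the normalise-then-multiply-by-ten conversion for a 32-bit
   unsigned value, producing ten ASCII digits (with leading zeros).
   K = ceil(2^64 / 10^9).  After p = x * K, the bits above bit 63 are the
   10^9 digit and the low 64 bits are the remainder as a 0.64 fraction;
   each following digit is the integer part of (fraction * 10). */
static void u32_to_decimal(uint32_t x, char out[11])
{
    const uint64_t K = 18446744074ULL;                /* ceil(2^64 / 10^9) */
    unsigned __int128 p = (unsigned __int128)x * K;
    out[0] = (char)('0' + (unsigned)(p >> 64));       /* billions digit, 0..4 */
    uint64_t frac = (uint64_t)p;                      /* remainder / 10^9 as 0.64 */
    for (int i = 1; i <= 9; i++) {
        unsigned __int128 t = (unsigned __int128)frac * 10u;
        out[i] = (char)('0' + (unsigned)(t >> 64));   /* next digit */
        frac = (uint64_t)t;
    }
    out[10] = '\0';
}
```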
 