Addition of IEEE 754 half precision floating point numbers

MrAl

Joined Jun 17, 2014
11,496
Hi,

I think you are overcomplicating this and not understanding my intent here.

For example, when I said that we get a leading 10, I ALSO said that the second digit there can CHANGE later. If the second digit is a zero (which it is), then if it changes it can only change to a 1, making the two digits 11. Yet you stated that we might get not only 10 but also 11, which is the same thing I said.

This is why we have to stick to very simple examples, not long drawn-out examples with base changes and stuff like that. We also don't really have to consider the exponent just yet because we can assume that it gets adjusted correctly when needed.

Whatever you are talking about might be different, but I can assure you that what I am talking about works and works efficiently. It has actually been done on the Z80 CPU and the mid-range Microchip uCs. It's just so easy that it is almost not worth discussing :)

For example, if we add the four-bit numbers 1001 and 1010 we get:
01001
01010
-------
10011

and we can see that the carry bit (on the left) ended up being a 1. This is standard addition in CPUs and uCs.

If you want to drop the leading 1 then we have 0001 and 0010, or 001 and 010, and before we add we can put the 1's back to get 1001 and 1010, which gives the same as above. If we want to add them first then we have:
0 001
0 010
--------
0 011

but then we have to add the two upper 1's.

If instead we had 1111 and 1110 with dropped leading 1's then we have 111 and 110, and putting the 1's back we have:
01111
01110
---------
11101

and then adjust that.

If we do it without the leading 1's then we have:
0 111
0 110
-------
1 101

and then we know we had two leading 1's which, when added, end up as a leading 10, so we have to add that next:
10000
01101
--------
11101

and so we get the same result either way.
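
The two routes above can be checked with a short sketch. Python is used here only for illustration; the function names and the 3-bit stored width are assumptions chosen to match the 4-bit examples in this post, not anything taken from real Z80 or PIC code.

```python
FRAC_BITS = 3  # assumed: number of bits stored after the implied leading 1

def add_restored(a_frac, b_frac):
    """Put the implied leading 1 back on each operand, then add."""
    hidden = 1 << FRAC_BITS
    return (hidden | a_frac) + (hidden | b_frac)

def add_fractions_first(a_frac, b_frac):
    """Add the stored bits first, then add the two implied 1's (binary 10, shifted)."""
    partial = a_frac + b_frac
    leading = 0b10 << FRAC_BITS  # 1 + 1 = 10 at the hidden-bit position
    return partial + leading

# 1111 + 1110 with the leading 1's dropped: 111 and 110
print(format(add_restored(0b111, 0b110), "05b"))         # 11101
print(format(add_fractions_first(0b111, 0b110), "05b"))  # 11101
```

Either way the result needs one more bit than the operands, which is the carry position in the examples above.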

But I can't remember running into a case where I had to do this. That means I must have restored the leading 1's before the addition. In fact, if we don't restore the leading 1's then I don't see how we could 'denormalize' the number in preparation for the addition.


Perhaps you can show a SIMPLE example of the way you would propose to add the two numbers. You seem to be suggesting that we don't use the carry bit, and I don't see how that is possible. If we have a register length of N then we need a register length of N+1 to do an integer addition, or else we lose 1 bit every time we do an addition, assuming we don't want to have to check for leading zeros, which we never want to take the time to do.

I could probably find some code for the uC chips. I know they have routines published online somewhere too.
But if you have another way of doing it, that would be good to see too.
 

WBahn

Then why ARE you discussing it at ALL??? The thread is EXPLICITLY about the IEEE-754 half-precision floating point representation. It is NOT about whatever ad hoc representation YOU used on the Z80 or a mid-range PIC!

So whether whatever you used worked or not or whether it was efficient or not is immaterial -- you are NOT talking about the IEEE-754 half-precision floating point standard and that is what the TS IS talking about.
 

MrAl

Hi,

Well, I didn't think it would be that much different. Do you think you can give a simple binary example of how you would do the addition?

Another instruction I remember is "add with carry", which means that we add two numbers like:
0001
0010

and if we add them we get 0011, but if we add with carry and carry was 1, then we get:
0001
0010
0001 (the carry from the previous addition)
------
0100 result

Since this last addition produced no carry out, the carry bit is now 0, so the next add with carry would just add the two numbers; adding the zero carry would not change anything.

Wider example:
0001 1001 (two 4 bit registers)
0100 1000

first add the last 4 bit regs in each group:
1001
1000
results in:
0001 with carry=1

next add the first 4 bit regs in each group:
0001
0100
result:
0101
then add the carry:
0110

so the total result of the two four bit groups is:
0110 0001

so we were able to add two 8 bit numbers represented in two four bit groups each.
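
The same multi-register addition can be sketched in a few lines. This is only an illustration in Python; `adc4` and `add8_via_nibbles` are made-up names that mimic an "add with carry" instruction on 4-bit registers, not a real instruction set.

```python
NIBBLE_MASK = 0xF

def adc4(a, b, carry_in):
    """Add two 4-bit values plus a carry; return (4-bit sum, carry out)."""
    total = (a & NIBBLE_MASK) + (b & NIBBLE_MASK) + carry_in
    return total & NIBBLE_MASK, total >> 4

def add8_via_nibbles(x, y):
    """Add two 8-bit values one 4-bit group at a time, low group first."""
    lo, carry = adc4(x & 0xF, y & 0xF, 0)    # plain add: carry in is 0
    hi, carry = adc4(x >> 4, y >> 4, carry)  # add with carry from the low group
    return (hi << 4) | lo, carry

result, carry = add8_via_nibbles(0b00011001, 0b01001000)
print(format(result, "08b"), carry)  # 01100001 0
```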

But I think if you could provide a simple example like this, done the way you propose to add numbers, that would be informative.
 

WBahn

I gave TWO examples. For IEEE-754 half-precision:

00101010 (0x2A) + 01010101 (0x55) = 01011000 (0x58)

and

00101010 (0x2A) + 01011111 (0x5F) = 01100001 (0x61)
 

MrAl

Hi again,

Ok good.

Here is my rendition:

Code:
line organization:  sgn exp crry mantissa

  0 101 0 11111 first #
  0 101 0 11111 second #
  0 101 1 11110 added, first + second
  0 110 0 11111 initial adjustment (shift right with crry)
  0 110 0 01111 leading zero replacement
  0 110 0 1111  finishing adjustment
I don't see too much difference other than the adjusting. If we have 5 digits then we don't need any intermediate carry, just the one on the left, which in the above is nonzero only on the third line, where the two numbers are added.
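
As a rough cross-check of the steps above, here is a minimal Python sketch. It assumes the 8-bit layout used in the listing (1 sign bit, 3 exponent bits, 4 stored mantissa bits), handles only positive normal operands, ignores exponent overflow, and simply truncates bits shifted out during alignment rather than rounding. The names `unpack` and `add_positive` are made up for illustration.

```python
EXP_BITS, FRAC_BITS = 3, 4      # assumed layout: 1 sign, 3 exponent, 4 mantissa bits
HIDDEN = 1 << FRAC_BITS         # the implied leading 1 (bit 4)
CARRY = 1 << (FRAC_BITS + 1)    # carry out of the 5-bit significand (bit 5)

def unpack(byte):
    """Return (exponent field, 5-bit significand with the implied 1 restored)."""
    exp = (byte >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = byte & (HIDDEN - 1)
    return exp, HIDDEN | frac

def add_positive(a, b):
    """Add two positive, normal 8-bit values; shifted-out bits are truncated."""
    (ea, ma), (eb, mb) = unpack(a), unpack(b)
    if ea < eb:                 # make (ea, ma) the operand with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb              # align the smaller operand
    total = ma + mb             # may set the carry bit
    if total & CARRY:           # shift right with carry and bump the exponent
        total >>= 1
        ea += 1
    # drop the implied 1 again and repack (sign bit is 0 here)
    return (ea << FRAC_BITS) | (total & (HIDDEN - 1))

print(hex(add_positive(0x5F, 0x5F)))  # 0x6f
print(hex(add_positive(0x2A, 0x55)))  # 0x58
print(hex(add_positive(0x2A, 0x5F)))  # 0x61
```

For these particular operands truncation gives the same answers as rounding would; a full implementation would also need rounding, signs, subnormals and overflow.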
 

WBahn

But look at what you've done compared to what you originally claimed. I said that you needed an additional bit to hold the implied 1 and you said that this additional bit was the carry bit, to which I replied that the carry bit could not serve this purpose. Now, in YOUR example, you expand the eight bit representation into TEN bits, including a carry bit AND the additional bit to hold the implied 1. So, in other words, you are now agreeing with me that an additional bit is needed to hold the implied 1 and that this additional bit is NOT the carry out bit.
 

MrAl

Hi,

Ok then, I guess you were talking about something that I wasn't talking about :)

Of course we need more bits if we have more digits to add, right? How could we not?

After all is said and done though, I think we got the point across to the OP about how to go about doing this.

Back when I did it in Z80 code I also used an index rather than an actual shift. The index would access the bits. I'd have to find the code again though to see exactly what I did. It was so long ago now, maybe 20 years since I worked with the Z80.

Also interesting is pseudo floating point, where the float numbers are stored in integers rather than type float or double. The integers are scaled floats and they act like fixed point floats.
 

WBahn

I'm pretty sure I understand what you are referring to, in which case it is simply called fixed-point.
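
For anyone following along, a minimal fixed-point sketch looks something like this. The 8 fractional bits and the helper names are assumptions picked for illustration, not anything from the posts above.

```python
FRAC_BITS = 8            # assumed scaling: stored integer = value * 2**8
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Scale a real value to its integer representation (rounded)."""
    return round(x * SCALE)

def to_real(n):
    """Convert a scaled integer back to a real value."""
    return n / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point values; the raw product has twice the scale, so shift back."""
    return (a * b) >> FRAC_BITS

a, b = to_fixed(1.5), to_fixed(2.25)
print(to_real(a + b))            # 3.75  (addition is plain integer addition)
print(to_real(fixed_mul(a, b)))  # 3.375
```

The binary point never moves, so there is no exponent to align and no normalization step, which is the main difference from the floating point addition discussed earlier in the thread.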
 