can someone explain how Binary floating point addition works

ThatComputerGuy · Jun 13, 2018

hi,

I am currently making a 16-bit CPU in Logisim and am working on the ALU. It has all the basic functions and works just fine, but I can't go any higher than 65535 or lower than -65535, and I can't represent values between whole numbers, like 0.2.

I can add and subtract in binary with no problem, but I just can't understand floating point. If anyone could teach me how it works and give me an example, I'd be very happy.

thanks in advance,

ThatComputerGuy

MrChips · Jun 13, 2018

Floating point (FP) is like scientific notation.
For example, 1234.56 could be written as 1.23456 x 10^3
and 0.000123456 would be 1.23456 x 10^-4

Hence you now need two fields, one for the exponent, for example 3 or -4, and another for the mantissa, 1.23456

In binary, the mantissa is always reduced to the form 1.XXXXXX.
This process is called normalization.
Since the most significant bit is always 1, we can ignore this (i.e. it is always implied) and hence have space for one more bit.
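
As a rough illustration of the fields described above, here is a minimal Python sketch that splits a number into sign, exponent and mantissa bits. It assumes the IEEE-754 16-bit "half precision" layout (1 sign bit, 5 exponent bits stored with a bias of 15, 10 mantissa bits), which is only one possible choice for a 16-bit machine. The name half_fields is made up for this example.

```python
import struct

def half_fields(x):
    """Round x to IEEE-754 half precision and return
    (sign, exponent_field, fraction_field) as integers."""
    (bits,) = struct.unpack(">H", struct.pack(">e", x))
    sign = bits >> 15
    exponent_field = (bits >> 10) & 0x1F  # 5 bits, stored with a bias of 15
    fraction_field = bits & 0x3FF         # 10 bits; the leading 1 is not stored
    return sign, exponent_field, fraction_field

# 5.0 is 1.01 (binary) x 2^2, so only the ".01" part is stored.
sign, e, f = half_fields(5.0)
print(sign, e - 15, format(f, "010b"))  # 0 2 0100000000
```

Note that the stored fraction holds only the bits after the binary point, which is the "one more bit" gained by leaving the leading 1 implied.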

How do you add two FP numbers?
The two numbers must have the same exponent before you can add.
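To do this, the mantissa of the number with the smaller exponent (with its implied leading 1 put back) must be shifted right (divided by two) and 1 added to its exponent. Do this until the exponents match. Obviously, what is happening here is that you lose precision in the smaller number.
When the smaller number has been adjusted to match the larger number, you can go ahead and add the two mantissas. If the sum no longer fits the 1.XXXXXX form, shift it right once more and add 1 to the exponent to normalize it again.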
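
The align-then-add steps above could be sketched roughly like this in Python. This is a toy model, not any standard's exact algorithm: it handles only positive, normalized numbers, truncates instead of rounding, and ignores zero, signs and overflow. The (exponent, mantissa) tuple layout, MANT_BITS, fp_add and to_float are all assumptions made for this example.

```python
MANT_BITS = 10  # fraction bits; the mantissa integer keeps an explicit leading 1 in bit 10

def fp_add(a, b):
    """Add two positive normalized numbers given as (exponent, mantissa) pairs,
    where value = mantissa * 2**(exponent - MANT_BITS)."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:                      # make a the operand with the larger exponent
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= (ea - eb)                 # align: shift the smaller number right, losing low bits
    m = ma + mb                      # exponents now match, so add the mantissas
    e = ea
    if m >> (MANT_BITS + 1):         # sum reached 2.0 or more: renormalize to 1.XXXX
        m >>= 1
        e += 1
    return e, m

def to_float(x):
    """Convert an (exponent, mantissa) pair to a Python float for checking."""
    e, m = x
    return m * 2.0 ** (e - MANT_BITS)

six = (2, 0b11000000000)           # 1.1 (binary) x 2^2 = 6.0
one_and_half = (0, 0b11000000000)  # 1.1 (binary) x 2^0 = 1.5
print(to_float(fp_add(six, one_and_half)))  # 7.5
print(to_float(fp_add(six, six)))           # 12.0
```

In hardware, the same steps map onto a comparator for the exponents, a right shifter for the alignment, the integer adder you already have, and a one-bit shift for the final normalization.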

All of this sounds like a lot of work. Yes, it is. The solution is to resort to pre-written FP SW libraries or use HW co-processors to do the hard work for you.

A very acceptable option in many applications is to do fixed-point arithmetic using integers.
For example, if you have to create an application that displays temperatures to one or two decimal places, you can store all your values at 100x the actual value and scale the results later.
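
A minimal Python sketch of that fixed-point idea, assuming a scale factor of 100 (two decimal places). The names SCALE, fixed_mul and format_fixed are invented for this example.

```python
SCALE = 100  # every stored integer is 100x the real value

def fixed_mul(a, b):
    """Multiply two scaled values; the raw product carries SCALE twice,
    so divide once to get back to the same scale (truncating)."""
    return (a * b) // SCALE

def format_fixed(v):
    """Render a scaled integer as a decimal string with two places."""
    sign = "-" if v < 0 else ""
    whole, frac = divmod(abs(v), SCALE)
    return f"{sign}{whole}.{frac:02d}"

a = 2150  # 21.50 degrees
b = 375   #  3.75 degrees
print(format_fixed(a + b))            # 25.25 (addition needs no rescaling)
print(format_fixed(fixed_mul(a, b)))  # 80.62 (exact product is 80.625, truncated)
```

Addition and subtraction work directly on the stored integers, so an existing integer ALU can be used unchanged. Only multiplication and division need the extra rescaling step.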

BobTPH · Jun 14, 2018

A floating point arithmetic unit will be much more complex than your entire CPU. Software floating point is the usual solution for simple processors. For instance, pretty much all microcontrollers.

Bob

WBahn · Jun 14, 2018

MrChips said:
Floating point (FP) is like scientific notation.
For example, 1234.56 could be written as 1.23456 x 10^3
and 0.000123456 would be 1.23456 x 10^-4

Hence you now need two fields, one for the exponent, for example 3 or -4, and another for the mantissa, 1.23456

In binary, the mantissa is always reduced to the form 1.XXXXXX.
This process is called normalization.
Since the most significant bit is always 1, we can ignore this (i.e. it is always implied) and hence have space for one more bit.

The problem with this is that it means that you can't represent zero.

There were floating point representations for which this was the case. But most people agreed that this was unacceptable and so they found ways to represent zero somehow.

The IEEE-754 standard deals with this by specifying that the smallest exponent pattern represents the same exponent as the next smallest, but that the most significant bit is now assumed to be 0. This is called "denormalization" or "gradual underflow".

Which only goes to underscore that you probably do NOT want to get into developing hardware that deals with this.
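
To make the denormalization rule above concrete, here is a small Python sketch that decodes a 16-bit pattern. It assumes the IEEE-754 half-precision layout (5 exponent bits with a bias of 15, 10 mantissa bits) and deliberately leaves out the infinity and NaN patterns. The name decode_half is made up for this example.

```python
def decode_half(bits):
    """Return the value of a 16-bit IEEE-754 half-precision pattern.
    Patterns with an all-ones exponent (Inf/NaN) are NOT handled."""
    sign = -1.0 if bits >> 15 else 1.0
    e = (bits >> 10) & 0x1F
    f = bits & 0x3FF
    if e == 0:
        # Smallest exponent pattern: implied bit is 0 and the exponent
        # is the same as for e == 1, i.e. 1 - 15 = -14.
        return sign * (f / 1024) * 2.0 ** -14
    # Normal case: implied leading 1.
    return sign * (1 + f / 1024) * 2.0 ** (e - 15)

print(decode_half(0x0000))  # 0.0, representable because the implied bit is 0
print(decode_half(0x0400) == 2.0 ** -14)  # True: smallest normal number
print(decode_half(0x0001) == 2.0 ** -24)  # True: smallest denormalized number
```

The single special case on the exponent field is what lets zero, and the very small values just above it, fit into the format.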

First option: Use someone else's floating point coprocessor (if one exists that is usable with your processor)
Second option: Use someone else's IEEE-754 emulation libraries (or non-IEEE-754 if you have to).
Third option: Write your own floating point emulation library.
Fourth option: Develop your own floating point hardware.

For an MCU, your best bet is probably going to be Option #2.

ThatComputerGuy · Jun 23, 2018

BobTPH said:
A floating point arithmetic unit will be much more complex than your entire CPU. Software floating point is the usual solution for simple processors. For instance, pretty much all microcontrollers.

Bob

Can you maybe tell me more about this software floating point?

BobTPH · Jun 23, 2018

ThatComputerGuy said:
Can you maybe tell me more about this software floating point?

Not in a forum post. Google is your friend.

Bob
