problem with representing real numbers in binary system

Thread Starter

Ghina Bayyat

Joined Mar 11, 2018
139
Thank you all for your help, but you all said that fixed-point representation is more common and easier. So why do many books, and many videos on YouTube, say that floating-point representation is the more common way? I'm really getting lost now. Can anyone give me a summary, or a table of all the ways of representing negative and real numbers? Which one is better, or which one is used nowadays? How can I decide which one to use? Does it depend on the computer and on the number?
 

Papabravo

Joined Feb 24, 2006
21,158
Thank you all for your help, but you all said that fixed-point representation is more common and easier. So why do many books, and many videos on YouTube, say that floating-point representation is the more common way? I'm really getting lost now. Can anyone give me a summary, or a table of all the ways of representing negative and real numbers? Which one is better, or which one is used nowadays? How can I decide which one to use? Does it depend on the computer and on the number?
Normally floating point calculations are done with dedicated hardware. This is expensive in terms of chip real estate. In low end microcontrollers it is common to support only integer operations. In some cases you don't even get integer multiplication and division.
 

Deleted member 115935

Joined Dec 31, 1969
0
Thank you all for your help, but you all said that fixed-point representation is more common and easier. So why do many books, and many videos on YouTube, say that floating-point representation is the more common way? I'm really getting lost now. Can anyone give me a summary, or a table of all the ways of representing negative and real numbers? Which one is better, or which one is used nowadays? How can I decide which one to use? Does it depend on the computer and on the number?

Quote us the books, and we might be able to comment more.

This is posted in Digital Design (see the end).

In digital design, despite what the silicon vendors say, silicon is expensive, both in terms of dollars and in power used.

One of the main skills of a digital designer is to use the most efficient method of achieving the requirement.

Floating point is always going to cost an order of magnitude or more compared with a fixed-point solution.
If minimum development time is the requirement, and you have dollars and watts to throw at the problem, go floating point.

As an example, take the classic problem sqrt(A*A + B*B),

where A and B are in the range 0 to 1.

The simple answer is to use real fractional numbers and a square-root algorithm;
this would take a lot of silicon.

Alternatively, you could use largest + 1/2 smallest and do it all in fixed point;
it would be a few percent in error, but maybe that's good enough.
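That "largest + half the smallest" trick can be sketched quickly (a Python sketch, purely illustrative; `mag_approx` is a made-up name, not from any library):

```python
import math

def mag_approx(a, b):
    """Approximate sqrt(a*a + b*b) as the largest plus half the smallest.

    In fixed-point hardware this needs only a compare, a shift, and an
    add -- no multiplier and no square-root unit."""
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + lo / 2

# Worst case is a == b: the approximation gives 1.5*a while the exact
# answer is sqrt(2)*a, roughly a 6 % error.
a, b = 0.6, 0.6
exact = math.hypot(a, b)
approx = mag_approx(a, b)
print(round(100 * (approx - exact) / exact, 1))  # -> 6.1
```

The error is zero when one input is zero and peaks near 6 % when the inputs are equal, consistent with the "few percent" figure above.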

Digital design is all about understanding the constraints, the requirements, and the compromises.

BUT

Then you mention a CPU.
Now, a CPU is not digital design; it's programming.

Traditionally, a CPU did not have a floating-point unit, so floating-point numbers were "evaluated" in multiple steps, i.e. they were traditionally much slower than integers.

More modern CPUs have floating-point units that are almost as fast as the integer units,
so the argument for using only integers in small programs is weaker;
the saving from using the integer unit instead of the floating-point unit may be only 10:1 in execution time.

To a first approximation, if a small program took, say, 10 ms in floating point, it would take 1 ms in integer.
Who cares?

But

if your program took 10 hours to run, and going to integer took it to 1 hour, then it's a different story.
 

jpanhalt

Joined Jan 18, 2008
11,087
Normally floating point calculations are done with dedicated hardware. This is expensive in terms of chip real estate. In low end microcontrollers it is common to support only integer operations. In some cases you don't even get integer multiplication and division.
Sounds "good," but I have never seen a low-end MCU (limited to 12F5xx and above) that didn't support fixed-point math, including multiplication and division routines. Of course, those are not single-step operations either.
 

MrChips

Joined Oct 2, 2009
30,706
No one here is saying one is better than the other. There is never a "best" solution. It depends on your application and implementation options.

Floating point is costly in one way or another.
If the MCU does not have FP HW then you have to implement it in SW.
If you implement it in SW it will require lots of program space and will take lots of execution cycles.
FP math is not exact. There are errors in FP computation.

Integer arithmetic is exact.
Integer computation in SW is fast and efficient. Many MCUs have HW multiply and divide. This makes computation even faster and more code-space efficient.

If you are running programs such as spreadsheet or MATLAB on a PC, go ahead and use FP. I am not going to quibble. If you are building an embedded system on an MCU my choice would be integer arithmetic for speed and efficiency.

I once had to calculate dewpoint on an MCU. There was no way to implement this in FP in 2K bytes on an 8-bit MCU.
 

Papabravo

Joined Feb 24, 2006
21,158
No one here is saying one is better than the other. There is never a "best" solution. It depends on your application and implementation options.

Floating point is costly in one way or another.
If the MCU does not have FP HW then you have to implement it in SW.
If you implement it in SW it will require lots of program space and will take lots of execution cycles.
FP math is not exact. There are errors in FP computation.

Integer arithmetic is exact.
Integer computation in SW is fast and efficient. Many MCUs have HW multiply and divide. This makes computation even faster and more code-space efficient.

If you are running programs such as spreadsheet or MATLAB on a PC, go ahead and use FP. I am not going to quibble. If you are building an embedded system on an MCU my choice would be integer arithmetic for speed and efficiency.

I once had to calculate dewpoint on an MCU. There was no way to implement this in FP in 2K bytes on an 8-bit MCU.
As another anecdotal data point, it might be worthwhile to point out that DSP is often done with fixed-point arithmetic, with values normalized to lie in the interval (-1, 1), an open interval that does not include either endpoint.
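That normalized-fraction convention is usually called Q15 on 16-bit hardware (values scaled by 2^15). A minimal Python sketch of the idea; the helper names here are made up for illustration:

```python
def to_q15(x):
    """Map a real number in [-1, 1) onto a 16-bit signed integer (Q15)."""
    return max(-32768, min(32767, int(round(x * 32768))))

def q15_mul(a, b):
    """Q15 multiply: form the full-width product, then drop 15 fraction
    bits. The product of two in-range values stays inside (-1, 1), which
    is exactly why DSP code normalizes to that open interval."""
    return (a * b) >> 15

x = to_q15(0.5)               # 16384
y = to_q15(-0.25)             # -8192
print(q15_mul(x, y) / 32768)  # -> -0.125
```

All the arithmetic is plain integer work, which is why a multiplier-equipped MCU can run DSP loops at full speed.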
 

MrChips

Joined Oct 2, 2009
30,706
As another anecdotal data point, it might be worthwhile to point out that DSP is often done with fixed-point arithmetic, with values normalized to lie in the interval (-1, 1), an open interval that does not include either endpoint.
Good point. An FP FFT would be too darn slow. The FFT is usually computed in scaled integer arithmetic.
 

MrChips

Joined Oct 2, 2009
30,706
Here is a straightforward comparison.
Take any MCU. Compare how long it takes (or how many cycles it takes) to add two numbers
(a) using integers
(b) using floating point.
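On a PC you can get a rough feel for this with a micro-benchmark (a Python sketch; on an MCU without an FPU the gap would be far larger, and you would count cycles instead of wall-clock time):

```python
import timeit

# Time one million additions in each representation. The absolute numbers
# depend entirely on the machine; only the ratio is of interest.
int_time = timeit.timeit("a + b", setup="a, b = 12345, 67890", number=1_000_000)
flt_time = timeit.timeit("a + b", setup="a, b = 123.45, 678.90", number=1_000_000)
print(f"integer add: {int_time:.3f} s, float add: {flt_time:.3f} s")
```

On a modern desktop with a hardware FPU the two times come out very close, which is exactly the "almost as fast as the integer units" point made earlier in the thread.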
 

Thread Starter

Ghina Bayyat

Joined Mar 11, 2018
139
Thanks a lot for your help.
So, depending on the requirements, I can use any appropriate method:
No one here is saying one is better than the other. There is never a "best" solution. It depends on your application and implementation options.

Floating point is costly in one way or another.
Normally floating point calculations are done with dedicated hardware. This is expensive in terms of chip real estate.
Floating point is always going to cost an order of magnitude or more compared with a fixed-point solution.
If minimum development time is the requirement, and you have dollars and watts to throw at the problem, go floating point.
But one last question: I noticed that none of you mentioned the 2's complement method; you only talked about fixed and floating point.
Is there a reason, such as it not being used when we talk about real numbers?
If it is used, then how? I can't understand how to use 2's complement to represent a real number.
 

LesJones

Joined Jan 8, 2017
4,174
The 2's complement is just an easy way to change the sign of a number. Invert all the bits (Most microcontrollers have a complement instruction.) which gives the 1's complement. Then just add 1 to the 1's complement to give the 2's complement. Using this method makes subtraction easier. Just add the 2's complement of a number to the number you want to subtract it from.
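That invert-and-add-one recipe, and subtraction by adding the complement, look like this in Python (8-bit words for illustration):

```python
MASK = 0xFF  # pretend we have 8-bit registers, as on a small MCU

def twos_complement(x):
    """Invert all the bits (one's complement), then add 1."""
    return ((x ^ MASK) + 1) & MASK

# Subtract by adding the two's complement: 100 - 25
print((100 + twos_complement(25)) & MASK)  # -> 75
```

The final `& MASK` models the carry-out being discarded, which is what an 8-bit ALU does for free.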
If you program in assembler you will soon get to understand binary. (And other base number systems.)

Les.
 

BobTPH

Joined Jun 5, 2013
8,804
As I said a few posts into this thread, fixed point can use two's complement to represent negative numbers.

What makes you think it cannot?

Bob
 

jpanhalt

Joined Jan 18, 2008
11,087
I noticed that none of you mentioned the 2's complement method; you only talked about fixed and floating point.
Is there a reason, such as it not being used when we talk about real numbers?
If it is used, then how? I can't understand how to use 2's complement to represent a real number.
2's complement is frequently used to represent both positive and negative "real" numbers, as shown in post #5. Take these three examples from that table:

[attached image: table of 2's-complement fixed-point examples]

The left 12 bits are integers.

0011 1110 1000 = 0x3E8 = 512+256+128+64+32+8 = 1000
0000 0110 0100 = 0x64 = 100
0000 0001 1001 = 0x19 = 25

One of the nice things about 2's complement, besides its use in subtraction, is that positive numbers represented in that manner do not change (as illustrated above). Thus, you will see 2's complement used very often for data from a variety of sensors.
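To connect 2's complement with real (fractional) numbers: the same wrap-around works when some of the bits are fraction bits. A Python sketch, assuming a hypothetical 16-bit layout with 12 integer and 4 fraction bits (matching the 12 integer bits above):

```python
def from_fixed(word, frac_bits=4, width=16):
    """Decode an unsigned word as a two's-complement fixed-point value."""
    if word & (1 << (width - 1)):   # sign bit set -> negative value
        word -= 1 << width          # undo the two's-complement wrap
    return word / (1 << frac_bits)

print(from_fixed(0x3E80))              # -> 1000.0 (0x3E8 with 4 fraction bits)
print(from_fixed(0x0198))              # -> 25.5
print(from_fixed((-0x3E80) & 0xFFFF))  # -> -1000.0
```

The negation is ordinary two's complement on the whole word; the binary point's position never enters into it.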
 

Deleted member 115935

Joined Dec 31, 1969
0
Two's complement and fixed point are two different things.

Fixed/floating point is a way of representing a number with a "point" in it.

2's complement, 1's complement, offset binary et al. are all ways of representing a number that has both negative and positive values.

Fixed/floating point can be represented in 2's complement, 1's complement, offset binary et al.
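The same bit pattern decodes differently under each of those signed-number schemes; a quick illustrative sketch (the `decode` helper is made up for this example):

```python
def decode(bits, scheme):
    """Decode an 8-bit pattern under three signed-number schemes."""
    if scheme == "twos":                            # two's complement
        return bits - 256 if bits & 0x80 else bits
    if scheme == "ones":                            # one's complement
        return bits - 255 if bits & 0x80 else bits
    if scheme == "offset":                          # offset binary, bias 128
        return bits - 128

pattern = 0b1111_1011  # 0xFB
print(decode(pattern, "twos"))    # -> -5
print(decode(pattern, "ones"))    # -> -4
print(decode(pattern, "offset"))  # -> 123
```

Which scheme the bits mean is purely a matter of convention, independent of where the binary point sits.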

There is an interesting theory, from the '70s, that the universe is coded in 2's complement!

Look at item 154:
http://www.inwap.com/pdp10/hbaker/hakmem/hacks.html#item154
 

MrChips

Joined Oct 2, 2009
30,706
2's complement was not mentioned because it is a given; it is the commonly used technique for distinguishing between positive and negative numbers.
 

Thread Starter

Ghina Bayyat

Joined Mar 11, 2018
139
thank you all so much
everything is clear now
As I said a few posts into this thread, fixed point can use two's complement to represent negative numbers.

What makes you think it cannot?
I didn't know that before. I thought fixed point, floating point, and 1's & 2's complement were all ways of representing numbers in binary, and that a number could be represented in only one of them; that's what made me confused about how to represent a real negative number.
But thanks to what you said, I get it now.
Two's complement and fixed point are two different things.

Fixed/floating point is a way of representing a number with a "point" in it.

2's complement, 1's complement, offset binary et al. are all ways of representing a number that has both negative and positive values.
thanks for your help
 

MrChips

Joined Oct 2, 2009
30,706
To be clear, floating point uses 2's complement.
In the example I gave you on fixed point arithmetic in post #9 I used 2's complement. Hence it is not one versus the other.
 

Deleted member 115935

Joined Dec 31, 1969
0
To be clear, floating point uses 2's complement.
In the example I gave you on fixed point arithmetic in post #9 I used 2's complement. Hence it is not one versus the other.
To be clear...

Floating point NORMALLY uses 2's complement; it does not HAVE to.
IEEE 754 does, but other formats are possible and have been used in the past.
 