problem with representing real numbers in binary system

Ghina Bayyat · May 28, 2020

I have a question; can you please help me?
I know that real numbers are represented using the floating-point method and that negative numbers are represented using two's complement.
I know there are other methods, but these are the most common, and I know how to represent real numbers and negative numbers using each of them. My question is:
What if I have a negative real number, like -3.5 or -4.25?
How would I represent it? Is floating point used for these numbers, since its leftmost bit is the sign bit, or is there another way to represent them? Maybe a mix of the two methods?

jpanhalt · May 28, 2020

Source: http://cstl-csm.semo.edu/xzhang/Class Folder/CS280/Workbook_HTML/FLOATING_tut.htm
A 1 bit indicates a negative number, and a 0 bit indicates a positive number. Before a floating-point binary number can be stored correctly, its mantissa must be normalized. ... The exponent expresses the number of positions the decimal point was moved left (positive exponent) or moved right (negative exponent).

Of course, you can also use fixed point or integer math with both types of numbers.

Ian Rogers · May 28, 2020

I tend to use fixed point. The trade-off is a fixed decimal point.

3.50 is just 350, but the position of the decimal point is known and it is inserted when needed. -3.50 is the same, with the integer's sign marking it as negative.
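
A minimal Python sketch of this scaled-decimal idea, assuming a scale of 100 (two decimal places). The helper names `to_fixed` and `fixed_to_str` are made up for illustration and do not come from any library.

```python
SCALE = 100  # assumed: two decimal places

def to_fixed(x, scale=SCALE):
    """Store a value as an integer count of 1/scale units."""
    return round(x * scale)

def fixed_to_str(v, scale=SCALE, places=2):
    """Put the decimal point back in only when the value is displayed."""
    sign = "-" if v < 0 else ""
    whole, frac = divmod(abs(v), scale)
    return f"{sign}{whole}.{frac:0{places}d}"

print(to_fixed(3.50), fixed_to_str(to_fixed(3.50)))
print(to_fixed(-3.50), fixed_to_str(to_fixed(-3.50)))
```

All arithmetic between those two steps stays in plain signed integers.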

Ghina Bayyat · May 28, 2020

So you are saying that the two's complement method cannot be used to represent real numbers and that it is only for negative numbers?

jpanhalt · May 28, 2020

Ghina Bayyat said:
So you are saying that the two's complement method cannot be used to represent real numbers and that it is only for negative numbers?

Here's a 2's complement table:

Notice the range of real values. Try working with it to understand the concept rather than simply assuming what can and cannot be done.

EDIT: Here's a link to the original document: https://datasheets.maximintegrated.com/en/ds/MAX31856.pdf

Ghina Bayyat · May 28, 2020

jpanhalt said:
Here's a 2's complement table:

Notice the range of real values. Try working with it to understand the concept rather than simply assuming what can and cannot be done.

EDIT: Here's a link to the original document: https://datasheets.maximintegrated.com/en/ds/MAX31856.pdf

Thanks, but the link is not opening.
Anyway, if two's complement can be used to represent real numbers, can you please explain how, with an example? Let's say -3.25. How can I write this number using two's complement?

BobTPH · May 28, 2020

If you are using fixed point with two decimal places, -3.25 would be represented the same way as the integer -325.

Floating point formats typically use sign magnitude.
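
To see the sign-magnitude layout, here is a small Python sketch that pulls apart the bits of -3.25. It assumes the IEEE 754 single-precision format (1 sign bit, 8 exponent bits, 23 fraction bits), which the thread does not name explicitly; `float32_fields` is a made-up helper name.

```python
import struct

def float32_fields(x):
    """Split x, stored as a 32-bit float, into (raw bits, sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 means negative
    exponent = (bits >> 23) & 0xFF  # stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 stored bits, the leading 1 is implied
    return bits, sign, exponent, fraction

for value in (3.25, -3.25):
    bits, sign, exponent, fraction = float32_fields(value)
    print(f"{value:6} -> {bits:032b}  sign={sign} exp={exponent - 127} frac={fraction:#x}")
```

Under that assumption, 3.25 and -3.25 differ only in the top bit; the exponent and fraction fields are identical, which is what sign magnitude means here.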

Bob

jpanhalt · May 28, 2020

Ghina Bayyat said:
Thanks, but the link is not opening.
Anyway, if two's complement can be used to represent real numbers, can you please explain how, with an example? Let's say -3.25. How can I write this number using two's complement?

It opens for me fine. Just search for and open the datasheet for the Maxim MAX31856 thermocouple amplifier.

MrChips · May 28, 2020

Fixed point representation

We will use 4 bits for integer bits and 4 bits for fractional bits.

0 = 0000 0000
8 = 1000 0000
4 = 0100 0000
2 = 0010 0000
1 = 0001 0000
0.5 = 0000 1000
0.25 = 0000 0100
0.125 = 0000 0010
0.0625 = 0000 0001

The value of each bit going from left to right is
2 ^ 3 = 8
2 ^ 2 = 4
2 ^ 1 = 2
2 ^ 0 = 1
2 ^ -1 = 0.5
2 ^ -2 = 0.25
2 ^ -3 = 0.125
2 ^ -4 = 0.0625

-1 = 1111 0000
-0.5 = 1111 1000

3.25 = 0011 0100
-3.25 = 1100 1100

You may wish to think of the value as being scaled by 16.
For example,
0011 0100 = 52 = 3.25 x 16
1100 1100 = -52 = -3.25 x 16

If the MSB is 1, take the straight binary value and subtract 256, then divide by 16
For example,
1100 1100 = 204
Subtract 256
204 - 256 = -52
Divide by 16
-52 / 16 = -3.25
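
The same encode and decode steps as a short Python sketch, assuming the 4.4 layout above stored in one byte. The function names `encode_q4_4` and `decode_q4_4` are illustrative only.

```python
def encode_q4_4(x):
    """Scale by 16 and keep the low 8 bits as a two's complement byte."""
    raw = round(x * 16)
    if not -128 <= raw <= 127:
        raise ValueError("value out of range for 4.4 fixed point")
    return raw & 0xFF

def decode_q4_4(byte):
    """If the MSB is set, subtract 256; then divide by 16."""
    if byte & 0x80:
        byte -= 256
    return byte / 16

for value in (3.25, -3.25, -0.5, -1):
    b = encode_q4_4(value)
    print(f"{value:6} -> {b:08b} -> {decode_q4_4(b)}")
```

With 8 bits this layout covers -8.0 to +7.9375 in steps of 0.0625.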

atferrari · May 28, 2020

@MrChips @Ian Rogers

Not to derail this thread, just two questions (I have never used fixed point, only integer math):

When should I use it instead of integer?

How do you express 1783.487 in fixed-point format?

Thanks.

MrChips · May 28, 2020

atferrari said:
@MrChips @Ian Rogers

Not to derail this thread, just two questions (I have never used fixed point, only integer math):

When should I use it instead of integer?

How do you express 1783.487 in fixed-point format?

Thanks.

It comes down to the range of numbers (minimum value to maximum value) and the precision desired.
1783487 (the value scaled by 1000) needs 21 bits, or 22 with a sign bit. You would be better off going to floating-point representation.

If you can accept 1 decimal place,
1783.5 can be represented as 17835 / 10.
With 16 bits you can have a range of -3276.8 to +3276.7 with 0.1 resolution.
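
A quick way to check these sizing numbers, sketched in Python. `bits_needed` and `scaled_range` are hypothetical helper names, and the bit count is a simple estimate (magnitude bits plus one sign bit).

```python
def bits_needed(value, signed=True):
    """Bits for the magnitude of value, plus one sign bit if signed."""
    return abs(value).bit_length() + (1 if signed else 0)

def scaled_range(bits, scale):
    """Smallest and largest real values a signed integer of this width can hold."""
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    return lo / scale, hi / scale

print(bits_needed(1783487))   # 1783.487 scaled by 1000
print(bits_needed(17835))     # 1783.5 scaled by 10
print(scaled_range(16, 10))   # 16-bit integer, 0.1 resolution
```

Dropping from three decimal places to one is what lets the value fit in a 16-bit signed integer.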

As a general rule I do not use floating point.
I use scaled decimal as in the example above. All numbers are scaled by a suitable scaling factor, for example, x10, x40, x100, x200, x800, x1600
Numbers are displayed in decimal format and the decimal point is inserted in the correct place.
This is far more efficient than using floating point.

btw, fixed point arithmetic is still integer math. It uses only integer math.

Ian Rogers · May 28, 2020

Once upon a time, floating point was cumbersome and slow, and it took half your memory. All my products use trig, lots of trig. I couldn't fit all the code into a PIC18F, so we used fixed point. We only needed two decimal places, but calculated with three.

1783.487 is 1783487; just remember where the decimal point is. For example, 4 / 3 = 1 with integer math, but 400 / 3 = 133; display the result with two decimal places and you get 1.33, which is correct.
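
The 4 / 3 example as a minimal Python sketch, assuming a scale of 100 and non-negative operands. `fixed_div` is an illustrative name, not a library function.

```python
SCALE = 100  # assumed: two decimal places

def fixed_div(a, b, scale=SCALE):
    """Divide two plain integers, returning the quotient scaled by `scale`."""
    return (a * scale) // b  # scale up BEFORE dividing so the fraction survives

plain = 4 // 3            # integer math alone throws the fraction away
scaled = fixed_div(4, 3)  # same division, pre-scaled by 100
print(plain, scaled, f"{scaled // SCALE}.{scaled % SCALE:02d}")
```

The order matters: `(4 // 3) * 100` would give 100, because the fraction is already lost by the time the scale is applied.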

Ian Rogers · May 28, 2020

Sorry MrChips... I was typing...

MrChips · May 28, 2020

As an example, I had to create an MCU product to display dewpoint in °C or °F calculated from temperature and relative humidity.
One decimal place was desired. In order to maintain accuracy I scaled all values by x40. Final results were divided by 4 (i.e. rounded and shifted right 2 bits). Decimal values were generated and the decimal point put in place.
For example 422 became 10.6°C.
This was all done on a simple 8-bit MCU.
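
The final step can be sketched in Python like this. It assumes the "rounded and shifted" step means adding half the divisor before the shift; the actual firmware is not shown in the thread, and the function names are made up.

```python
def x40_to_tenths(v):
    """Convert a x40-scaled value to tenths: add 2 (half of 4), then shift right 2 bits."""
    return (v + 2) >> 2

def tenths_to_str(t):
    """Insert the decimal point for display."""
    sign = "-" if t < 0 else ""
    whole, frac = divmod(abs(t), 10)
    return f"{sign}{whole}.{frac}"

reading = 422  # internal value, scaled by 40
print(tenths_to_str(x40_to_tenths(reading)) + "°C")
```

Carrying the extra factor of 4 through the calculation gives two guard bits, so rounding happens once at the end rather than at every step.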

Papabravo · May 28, 2020

Another way to look at the problem is to consider the use of rational approximations to real numbers. Each real number is represented as a pair of integers, call them N and D. For example, pi can be represented by (N, D) = (22, 7) = 22/7. A better approximation would be (N, D) = (355, 113) = 355/113.
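
One way to find such (N, D) pairs is Python's standard `fractions` module, shown here as a sketch. The denominator limits 10 and 1000 are arbitrary choices for the example.

```python
from fractions import Fraction
import math

# Best rational approximation to pi with a bounded denominator
crude = Fraction(math.pi).limit_denominator(10)
better = Fraction(math.pi).limit_denominator(1000)

for approx in (crude, better):
    n, d = approx.numerator, approx.denominator
    print(f"({n}, {d})  error = {n / d - math.pi:+.7f}")
```

Multiplying an integer x by pi then becomes `x * N // D`, which uses only integer operations.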

atferrari · May 28, 2020

Thanks to you both.

It seems that I will have to seriously revisit this soon.

The lack of practice is a problem.

Ian Rogers · May 28, 2020

One thing to remember: a 32-bit float and a 32-bit long can hold the same number of distinct values. The float, however, covers a larger range by trading off precision. It can hold something as small as 0.000000001, but when you represent a large number such as 4.345876 x 10^15 it gets a bit loose: 4,345,876,000,000,000. "Where's the precision?"
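
A small Python sketch of that trade-off, assuming the 32-bit float is an IEEE 754 single (24 significant bits), which is what the `struct` module's `"f"` format produces on common platforms. `as_float32` is a made-up helper name.

```python
import struct

def as_float32(x):
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(as_float32(0.5))         # small values like this survive exactly
print(as_float32(16777216.0))  # 2**24 is still exact
print(as_float32(16777217.0))  # 2**24 + 1 needs 25 significant bits
```

A 32-bit long stores every integer up to about 2.1 billion exactly; the float starts skipping integers above 2**24, and the gaps keep growing with magnitude.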

Horses for courses...

MrChips · May 28, 2020

There are many tricks you can use in order to avoid using floats and also to speed up computation.
Let us use the value of π as an example. As Papabravo points out, 22/7 is crude.
π = 3.1415926535897
22 / 7 = 3.14286, error = +0.00126
355 / 113 = 3.1415929, error = +0.0000003

To multiply by π using integer arithmetic we need one multiply and one divide. Now recognize that shifting bits is much more efficient than multiplication and division.

Hence we convert
355 / 113 = (355 x 256) / (113 x 256) = (355 x 256 / 113) / 256 ≈ 804 / 256 (355 x 256 / 113 = 804.25, truncated to 804)
804 / 256 = 3.140625, error = -0.001

In this example we notice that 804 is divisible by 4.
201 / 64 gives the same result = 3.140625

If we use 10-bit shifts
(355 x 1024 / 113) / 1024 ≈ 3217 / 1024 (355 x 1024 / 113 = 3216.99, rounded to 3217)
3217 / 1024 = 3.14160, error = +0.00001

You can use this same technique for multiplying or dividing by any number.
(Watch out for overflows when working with large integers!)
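
The multiply-and-shift trick as a Python sketch, using the constants 804 and 3217 derived above. The function names are illustrative. Python integers never overflow, so the overflow caveat applies to fixed-width integers on an MCU, not to this sketch.

```python
PI_Q8 = (355 * 256) // 113        # 804, truncated
PI_Q10 = round(355 * 1024 / 113)  # 3217, rounded

def mul_pi_q8(x):
    """x * pi using one multiply and an 8-bit right shift."""
    return (x * PI_Q8) >> 8

def mul_pi_q10(x):
    """x * pi using one multiply and a 10-bit right shift."""
    return (x * PI_Q10) >> 10

x = 1000
print(mul_pi_q8(x), mul_pi_q10(x), x * 355 // 113)
```

For x = 1000 the three results are 3140, 3141 and 3141, against a true value of about 3141.59; the shift truncates, so add half the divisor before shifting if rounding is wanted.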

Edit: We will move all of these integer math tips to a new thread in Math & Science or maybe a blog.

BobaMosfet · May 28, 2020

Ghina Bayyat said:
I have a question; can you please help me?
I know that real numbers are represented using the floating-point method and that negative numbers are represented using two's complement.
I know there are other methods, but these are the most common, and I know how to represent real numbers and negative numbers using each of them. My question is:
What if I have a negative real number, like -3.5 or -4.25?
How would I represent it? Is floating point used for these numbers, since its leftmost bit is the sign bit, or is there another way to represent them? Maybe a mix of the two methods?

Binary works fine for real numbers so long as you use fixed-point notation. You can have as much or as little precision as you have bits, and during calculations you can move the binary point one way or the other to gain precision where you need it. You still use the high bit for the sign.

In a 16-bit register, 3.5 = 896 or 0x380, which is 11 1000 0000 in binary. If we make it negative it is 1111 1100 1000 0000 (0xFC80), or -896. This puts the binary point between the 8th and 9th bits (the very middle), so there are 8 bits on either side of the point.
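
The 16-bit case can be checked with a few lines of Python. This is a sketch assuming 8 integer bits and 8 fractional bits in two's complement; `encode_q8_8` and `decode_q8_8` are made-up names.

```python
FRAC_BITS = 8  # assumed: binary point in the middle of a 16-bit word

def encode_q8_8(x):
    """Scale by 2**8 and keep the low 16 bits as a two's complement word."""
    return round(x * (1 << FRAC_BITS)) & 0xFFFF

def decode_q8_8(word):
    """Undo the two's complement wrap, then divide by 2**8."""
    if word & 0x8000:
        word -= 0x10000
    return word / (1 << FRAC_BITS)

for value in (3.5, -3.5):
    w = encode_q8_8(value)
    print(f"{value:5} -> 0x{w:04X} -> {decode_q8_8(w)}")
```

Changing `FRAC_BITS` moves the binary point, trading range for resolution.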

Deleted member 115935 · May 28, 2020

Your original statement is incorrect.
A real number does not have to be a floating-point number.

A real number just has a part before and a part after the point.

Fixed point is real by this definition.
