signed char

Thread Starter

gogo00

Joined Oct 28, 2023
43
The char data type in C stores characters. I'm a bit confused about the signed char variable in C. In C, an unsigned char holds values ranging from 0 to 255, while a signed char holds both positive and negative values in the range -128 to 127.
 

Papabravo

Joined Feb 24, 2006
22,058
You are correct. In a signed character representation, the 256 possible values in a byte represent 128 non-negative values from 0 to 127 (0x00 to 0x7F) and 128 negative values from -128 to -1 (0x80 to 0xFF). Where does this make a difference? If you cast a signed character to a signed integer, it gets "sign extended". That is, 0xC0 when cast to a 16-bit signed int becomes 0xFFC0 (0xFFFFFFC0 for a 32-bit int). It also matters for comparisons. That is, -15 < 3. In the unsigned case it would be the other way around.
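
A quick demo of both effects -- a sketch assuming an 8-bit two's complement signed char, which is what virtually every modern compiler gives you:
Code:
#include <stdio.h>

int main(void)
{
    signed char   sc = (signed char)0xC0;  /* bit pattern 1100 0000 = -64 */
    unsigned char uc = 0xC0;               /* same bits, value 192 */

    /* sign extension: the signed char widens to a negative int */
    printf("%d\n", (int)sc);               /* prints -64 */
    printf("%d\n", (int)uc);               /* prints 192 */

    /* comparisons depend on the signedness */
    signed char   a = -15;
    unsigned char b = (unsigned char)-15;  /* wraps around to 241 */
    printf("%d\n", a < 3);                 /* 1: -15 is less than 3 */
    printf("%d\n", b < 3);                 /* 0: 241 is not */
    return 0;
}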
 

dl324

Joined Mar 30, 2015
18,220
I'm a bit confused about the signed char variable in C.
Other than the minimum signed char value being implementation dependent, your statements are correct.

This is for gcc on Debian on Win10:
Code:
grep SCHAR_MIN limits.h
limits.h:#  define SCHAR_MIN    (-128)
limits.h:#   define CHAR_MIN    SCHAR_MIN
What are you confused about?
 

BobTPH

Joined Jun 5, 2013
11,463
In C, char is an integer data type that is guaranteed to be able to store one ASCII character. ASCII characters are in the range 0 to 127. Either a signed 8-bit integer or an unsigned one can store values in that range, so both are capable of storing one ASCII character.
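
For instance, the letter 'A' (ASCII 65) fits in either one. A minimal illustration:
Code:
#include <stdio.h>

int main(void)
{
    signed char   s = 'A';    /* 65 fits in -128..127 */
    unsigned char u = 'A';    /* 65 fits in 0..255    */
    printf("%c %c\n", s, u);  /* prints: A A */
    return 0;
}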
 

ApacheKid

Joined Jan 12, 2015
1,762
The char data type in C stores characters. I'm a bit confused about the signed char variable in C. In C, an unsigned char holds values ranging from 0 to 255, while a signed char holds both positive and negative values in the range -128 to 127.
So, what are you confused about exactly?
 

Thread Starter

gogo00

Joined Oct 28, 2023
43
I understand that the char data type can store ASCII values from 0 to 127, but I'm a bit confused about the range from -128 to -1.

Could you help clarify what happens in this range? I'm curious about how negative ASCII values, such as -128 to -1, are handled or if they have any representation.
 

BobTPH

Joined Jun 5, 2013
11,463
There are no negative ASCII values. C actually has no datatype that is an ASCII character. signed char is an integer type. What happens in that range is simply what happens with integer values.
 

Papabravo

Joined Feb 24, 2006
22,058
I understand that the char data type can store ASCII values from 0 to 127, but I'm a bit confused about the range from -128 to -1.

Could you help clarify what happens in this range? I'm curious about how negative ASCII values, such as -128 to -1, are handled or if they have any representation.
The representation of negative numbers is known as Two's Complement notation. Negative values are handled in the same way as positive numbers. If I tell you that a signed character of all 1's represents -1 and ask you to add +10 and -1 you would immediately say "9"! Now do it with a binary adder.

 0000 1010 = +10 decimal
+1111 1111 =  -1 decimal
-----------
 0000 1001 =  +9 decimal

It is true that there is a carry out of the high order bit, but we drop that when using Two's Complement arithmetic.
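
Here is the same sum in C. One caveat: C promotes the operands to int before adding and then narrows the result back, but that lands on the same answer the 8-bit adder gives (assuming the usual 8-bit two's complement signed char):
Code:
#include <stdio.h>

int main(void)
{
    signed char a = 10;  /* 0000 1010 */
    signed char b = -1;  /* 1111 1111 */

    /* the narrowing conversion back to 8 bits matches the
       adder output with the carry out of the high bit dropped */
    signed char sum = (signed char)(a + b);
    printf("%d\n", sum); /* prints 9 */
    return 0;
}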
 

xox

Joined Sep 8, 2017
936
I understand that the char data type can store ASCII values from 0 to 127, but I'm a bit confused about the range from -128 to -1.


Could you help clarify what happens in this range? I'm curious about how negative ASCII values, such as -128 to -1, are handled or if they have any representation.
ASCII is a standard which defines a mapping from a 7-bit unsigned integer to a "glyph". These characters are sometimes treated like ordinals during comparisons. For example 'a' < 'z' and 'a' + 2 = 'c'. So it is a quasi-number system of sorts.

An 8-bit signed byte, on the other hand, uses the highest bit to indicate the sign. Since ASCII needs only seven bits, you could theoretically "store" an extra bit of information within a given char variable (which of course would have to be masked off before treating it as an ordinary char value). The point being that ASCII, signed integers, and their unsigned counterparts are each an example of an encoding scheme. Among others...
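
The ordinal behaviour is easy to demonstrate, since character constants in C are really just small integers (this assumes an ASCII execution character set):
Code:
#include <stdio.h>

int main(void)
{
    printf("%d\n", 'a' < 'z');  /* 1: 97 < 122 */
    printf("%c\n", 'a' + 2);    /* c: 97 + 2 = 99 */
    return 0;
}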
 

MrChips

Joined Oct 2, 2009
34,628
8-bit binary is, well, binary.

You can use it to represent 256 different things, even 256 random numbers, negative or positive. What it represents is entirely up to you.

The era of the IBM PC introduced the extended 256-character set.

https://www.ascii-codes.com/
 

WBahn

Joined Mar 31, 2012
32,704
I understand that the char data type can store ASCII values from 0 to 127, but I'm a bit confused about the range from -128 to -1.

Could you help clarify what happens in this range? I'm curious about how negative ASCII values, such as -128 to -1, are handled or if they have any representation.
There are no negative ASCII values.

The 'char' data type in C is fundamentally just an integer data type whose name happens to be 'char'. It is no different from the other integer data types such as 'int' or 'long'.

The C standard sets lower limits on the range of values each data type must be able to represent. For an integer of type 'char', that is from 0 through +255 if the type is unsigned and at least -127 through +127 if the type is signed (two's complement implementations extend the negative end to -128).

The C standard also requires that all naked integer data types EXCEPT 'char' must be signed. For 'char', it leaves that up to the implementation. So if you want to be able to represent values greater than +127, you should be careful to include the 'unsigned' type modifier, while if you want to be able to represent negative values, you want to include the 'signed' modifier.
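
For instance (the variable names here are made up purely for illustration):
Code:
#include <stdio.h>

int main(void)
{
    unsigned char pixel  = 200;   /* needs 0..255, so force unsigned  */
    signed char   offset = -40;   /* needs negatives, so force signed */
    char          letter = 'A';   /* plain char is fine for character codes */
    printf("%d %d %c\n", pixel, offset, letter);  /* 200 -40 A */
    return 0;
}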

The 'char' data type is typically used to store single-byte values (although a given implementation is free to make that byte wider than eight bits).

Notice that NOTHING above says ANYTHING about characters or ASCII.

The C standard also requires that the execution character set include encodings for a specific set of characters, at a minimum. ASCII happens to cover all of those required characters and is, by far, the most commonly used execution character set. It also requires that the 'char' data type be wide enough to be able to represent all encodings of the execution character set.

Probably the most common use of a variable of type 'char' is to store the encoding of one member of the execution character set, hence the name that was chosen for this data type. This does NOT mean that EVERY variable of type char can be interpreted as a character. How the value is interpreted is up to the programmer.

Consider the following:

int monkeys;

This variable might be used to store the number of monkeys in a room.

Just because one instance of an 'int'-type variable is used to store a number of monkeys, does not imply that all 'int'-type variables have something to do with monkeys.

The same with instances of 'char'-type variables. Just because they are commonly used to store character codes, that does NOT imply that they always do.

IF you are using it to store character codes, then it is your responsibility to ensure that the values stored in that variable do not exceed the bounds of the character encoding that you are using. If you do allow that, then the behavior is usually undefined if you try to interpret that value as a character code at some point.
 

ApacheKid

Joined Jan 12, 2015
1,762
I understand that the char data type can store ASCII values from 0 to 127, but I'm a bit confused about the range from -128 to -1.

Could you help clarify what happens in this range? I'm curious about how negative ASCII values, such as -128 to -1, are handled or if they have any representation.
Modern computer memory is organized into pieces called "bytes" and these comprise eight bits. A "char" is a type that represents a byte of memory, it refers to a value that is always eight bits, an "unsigned char" too is also always eight bits.

We can store data in an eight bit byte, we can store 00000000 up to 11111111 and all the combinations in-between.

The difference between these is important when doing arithmetic. Outside of the eight bits there is no way to store a sign for a number, so to indicate positive or negative we need a bit. One of the eight bits is therefore used for the sign, and the remaining seven bits are used to store the value.

char means a value that is stored as seven bits and a sign, + or -.
unsigned char means a value that is stored as eight bits and no sign; there is no + or -, it is just always regarded as being positive.

As others here have explained, ASCII (an old way to represent typewriter symbols and the like) is a seven-bit code, created many years ago to represent a limited number of symbols. There's no such thing as a negative ASCII character either.

An ASCII character can - in the C language - be stored in either a char or an unsigned char; it doesn't matter. The difference only matters when doing arithmetic, that is, when using the value as a number rather than a symbol like 'a' or 'T' or '@'.

Does this help?
 
Last edited:

Papabravo

Joined Feb 24, 2006
22,058
8-bit binary is, well, binary.

You can use it to represent 256 different things, even 256 random numbers, negative or positive. What it represents is entirely up to you.

The era of the IBM PC introduced the extended 256-character set.

https://www.ascii-codes.com/
I would argue that 8-bit character sets preceded the introduction of the IBM PC by at least a decade and a half. System 360 (ca. 1964) had the 8-bit EBCDIC character set.
 

WBahn

Joined Mar 31, 2012
32,704
Modern computer memory is organized into pieces called "bytes" and these comprise eight bits. A "char" is a type that represents a byte of memory, it refers to a value that is always eight bits, an "unsigned char" too is also always eight bits.
This is NOT true -- The C standard does NOT require that a 'char' data type always be eight bits -- that is merely the minimum required width, even if the execution character set can be represented with fewer bits. But the standard also requires that it be wide enough to represent every code in the execution character set, so if the execution character set has 500 characters, the minimum width of a 'char' would be 9, but would most likely be 16.

char means a value that is stored as seven bits and a sign, + or -.
unsigned char means a value that is stored as eight bits and no sign; there is no + or -, it is just always regarded as being positive.
Again, not true. A naked 'char' declaration is either signed or unsigned, at the discretion of the implementation. The limits.h header file has macros that let the program detect which case was chosen, should the distinction be important.
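
For example, testing CHAR_MIN tells you which choice your implementation made:
Code:
#include <stdio.h>
#include <limits.h>

int main(void)
{
#if CHAR_MIN < 0
    printf("plain char is signed here (CHAR_MIN = %d)\n", CHAR_MIN);
#else
    printf("plain char is unsigned here (CHAR_MIN = %d)\n", CHAR_MIN);
#endif
    return 0;
}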

I've worked with compilers that have made different choices on this. If you assume that every compiler is going to make the same choices that the compilers you have used have made, that is asking to get bit.
 

WBahn

Joined Mar 31, 2012
32,704
I would argue that 8-bit character sets preceded the introduction of the IBM PC by at least a decade and a half. System 360 (ca. 1964) had the 8-bit EBCDIC character set.
But that is not an extended ASCII character set, which is what he was referring to.

Having said that, I don't know whether the PC era is the first in which the 7-bit ASCII code was extended to 8-bits or not. It's almost certainly the one that has caused the most heartache, since there are so many of them and no standard was ever devised or adopted. In fact, I'm pretty sure it wasn't, since systems like Atari and other video game systems often used extensions for defining sprites in the game.

The character set that we often refer to as IBM Extended ASCII is not even a true ASCII extension, as it assigned glyphs (such as smiley faces) to many of the ASCII control codes.
 

MrChips

Joined Oct 2, 2009
34,628
I would argue that 8-bit character sets preceded the introduction of the IBM PC by at least a decade and a half. System 360 (ca. 1964) had the 8-bit EBCDIC character set.
Yeah but...
EBCDIC was not ASCII.
ASCII started off as a 128-character set. The 8th bit was reserved for parity.
 

Papabravo

Joined Feb 24, 2006
22,058
But that is not an extended ASCII character set, which is what he was referring to.

Having said that, I don't know whether the PC era is the first in which the 7-bit ASCII code was extended to 8-bits or not. It's almost certainly the one that has caused the most heartache, since there are so many of them and no standard was ever devised or adopted. In fact, I'm pretty sure it wasn't, since systems like Atari and other video game systems often used extensions for defining sprites in the game.

The character set that we often refer to as IBM Extended ASCII is not even a true ASCII extension, as it assigned glyphs (such as smiley faces) to many of the ASCII control codes.
The original claim did not mention ASCII, only 8-bit character sets. There were 8-bit extensions of 7-bit ASCII for other languages and alphabets. It's unlikely they would be called American standards, but they existed at least a decade before 1981. The IBM PC's claim to fame is notable, but it was one among many.
 

ApacheKid

Joined Jan 12, 2015
1,762
This is NOT true -- The C standard does NOT require that a 'char' data type always be eight bits -- that is merely the minimum required width, even if the execution character set can be represented with fewer bits. But the standard also requires that it be wide enough to represent every code in the execution character set, so if the execution character set has 500 characters, the minimum width of a 'char' would be 9, but would most likely be 16.



Again, not true. A naked 'char' declaration is either signed or unsigned, at the discretion of the implementation. The limits.h header file has macros that let the program detect which case was chosen, should the distinction be important.

I've worked with compilers that have made different choices on this. If you assume that every compiler is going to make the same choices that the compilers you have used have made, that is asking to get bit.
Yes, I accept your points; you are correct. But for the purposes of explaining the idea to the OP I wanted to avoid this idiosyncratic stuff. You also know my opinions on these traits of the C language: I think it's poorly designed, outdated, and hugely confusing when you factor in stuff like this.

If a char were, say, 16 bits, then the entire question of the legal range of numeric values changes: we're no longer able to say the range of char is -128 to 127, nor can we say the range of unsigned char is 0 to 255. So for the OP's question to even make sense, we are inherently assuming a char is 8 bits, given the way it is stated.
 

WBahn

Joined Mar 31, 2012
32,704
Yes, I accept your points, you are correct. But for the purposes of explaining the idea to the OP I wanted to avoid this idiosyncratic stuff. You also know my opinions on these traits of the C language, I think it's poorly designed, outdated and a hugely confusing language when you factor in stuff like this.

If a char was say 16 bits then the entire question of the legal range of numeric values changes, we're no longer able to say the range of char is -128 to 127 nor can we say the range of unsigned char is 0 to 255, so for the OP to even make sense we are inherently assuming a char is 8 bits given the way his question is stated.
The TS gave the range for the char data type on his system, so it is quite reasonable to discuss the limits on his system. That is very different from making the blanket assertion that a char is ALWAYS eight bits and that a char is always signed.
 

ApacheKid

Joined Jan 12, 2015
1,762
The TS gave the range for the char data type on his system, so it is quite reasonable to discuss the limits on his system. That is very different than making the blanket assertion that a char is ALWAYS eight bits and that a char is always signed.
Yes, I accept your point, but if a newcomer to C were confronted with the true nature of the language when first learning it, they'd likely scream at how much basic behavior is "implementation defined".
 