Analyzing Output [ SOLVED ]

Thread Starter

Kittu20

Joined Oct 12, 2022
511
Consider the following C code

C:
#include<stdio.h>


int main (){

    unsigned int x = 10 ;

    int y = -40;

    if(x+y > 10)
    {

        printf("Greater than 10");

    }
    else
    {
        printf("Less than or equals 10");
    }
    
  return 0;
}
When this code is executed, it prints "Greater than 10".

I don't understand why the program prints the statement inside the if; the correct result should be "Less than or equals 10". The condition x + y > 10 evaluates to 10 + (-40) > 10, which simplifies to -30 > 10. This condition is not true, so the program should execute the code inside the else block and print "Less than or equals 10."
 

Irving

Joined Jan 30, 2016
4,994
You are adding an unsigned integer to a signed integer. The signed operand is converted to unsigned, so the result is an unsigned value.

Add this print statement before the 'if' block:
printf(" x is\t%08x,\n y is\t%08x,\n x+y is\t%08x\n",x, y, x+y);

and you get:

x is 0000000a,
y is ffffffd8,
x+y is ffffffe2


FFFFFFE2 is 4,294,967,266 as unsigned, very definitely > 10!

To make the comparison work, you must either assign the result to a signed integer and use that in the 'if', or cast x+y to a signed integer:


if((int)(x+y) > 10) {


Note: just to confuse things,

printf("%d", (x+y));

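does render it as -30 in practice, because %d interprets the argument as a signed int (strictly, passing an unsigned value that does not fit in an int to %d is not guaranteed by the standard).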
 

WBahn

Joined Mar 31, 2012
32,702
As Irving stated, the signed integer y is converted to an unsigned integer, which wraps it around to a large value.

Here's where this is spelled out in the standard (using n869, the C99 Draft Standard):

6.3.1.8 Usual arithmetic conversions
From para 1:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

unsigned int and int have equal rank (defined in 6.3.1.1):
The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.

The conversion from integer to unsigned is also spelled out:

6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; the result is implementation-defined.

This may seem odd, since we look at the numbers in this example and see that they can both be represented as signed integers, so why not convert the unsigned int to a signed int? But the compiler has to insert the conversion code without making assumptions about what values will be there when it is executed, and there are values that can be represented by each type that can't be represented by the other (so the clause that converts the unsigned operand to the signed type does not apply). When this is the case, the conversion in either direction risks losing information or producing an incorrect value, but the rules for unsigned integers are fully defined whereas the rules for signed integers have significant implementation-defined aspects, so the choice was made to convert to the unsigned integer.

To see how these rules play out with a different choice of types, make y a long int instead of just an int. Then, on compilers where long int is wider than int (such as typical 64-bit Linux and macOS compilers), x will be converted to a signed long int, because there a signed long int can represent all of the values of an unsigned int. If the compiler uses, for example, a 32-bit representation for both int and long int, then this will not be the case and the final clause will apply: both operands are converted to unsigned long int.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
511
Thanks for the help

I have another question whose output surprised me

Code:
#include <stdio.h>

#define mul(x) (x)*(x)


int main()
{
    printf("%d ", mul(5));       //  25
    int x = 5;                   // x = 5
    printf("%d ", mul(++x));    // Expecting 25 but getting 49
  
    return 0;
}
Why does line 10 print 49 instead of 25?
 

Irving

Joined Jan 30, 2016
4,994
Because ++x means "increment x before using it", so when mul(++x) is expanded inline, ++x is evaluated twice, incrementing x to 7, and x is then multiplied by itself to give 49.

If mul had been a function (subroutine) rather than a macro substitution, x would be incremented only once, before the call, and the function would return 6 * 6 = 36.
 

WBahn

Joined Mar 31, 2012
32,702
You are invoking undefined behavior because you are modifying the same variable twice between the same pair of sequence points.

Since it is undefined, it can do anything and still be in compliance with the language standard -- it can use the incremented value after each execution, it can perform both increments after the multiplication, it can start a global thermonuclear war, or it can do exactly what you expect. The last possibility is the worst, because you will think you understand what it is doing and how it will behave in other programs and you don't.
 

Irving

Joined Jan 30, 2016
4,994
Regarding calling ++x twice, I don't see how that's happening. Can you provide more details?

It should be 6*6 = 36.
Consider the #define: when you write mul(++x), the preprocessor starts by expanding it to (++x) * (++x).

The prefix increment is applied to x twice before the multiply

We can see this in the way prefix increment is defined in the standard. Prefix calls postfix, which updates x but returns the original value; however, prefix ignores the return value from postfix and returns the now-incremented variable by reference.
I'm not sure I agree that it's undefined behaviour, though it is possible that a highly optimising compiler could recognise that ++x occurs twice and optimise it to one increment, but all 3 compilers that I've tried here behave as expected.

 

WBahn

Joined Mar 31, 2012
32,702
But then why doesn't one of the ++x evaluate to 6 and the other evaluate to 7, since it should be starting out with the updated value from the first one, to yield a value of 42?

Here's what the C Standard has to say about it:

Code:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

This paragraph renders undefined statement expressions such as

    i = ++i + 1;
    a[i++] = i;

while allowing

    i = i + 1;
    a[i] = i;
The example with a[] is a milder case of ++x * ++x
 

Irving

Joined Jan 30, 2016
4,994
Because it's passed by reference: both instances of (++x) act on the same memory location.

The machine code for this would be something like:

// do first ++X
Load <X> -> ACC // X = 5
INC ACC
Store ACC -> <X> // X now 6
// do second ++X
Load <X> -> ACC // X = 6
INC ACC
Store ACC -> <X> // X now 7
//do mul
Load <X> -> ACC
Load <X> -> REG1
MUL ACC, REG1 -> ACC
Store ACC, <temp result>
 

WBahn

Joined Mar 31, 2012
32,702
And where in the language standard does it require this behavior?
 

Irving

Joined Jan 30, 2016
4,994
It doesn't, but if I look at the generated code, that's essentially what it is for 3 different compilers with optimisation switched off. IMHO that mirrors the intent of what was written, even though that's not what was expected.
 

WBahn

Joined Mar 31, 2012
32,702
That's a dangerous approach -- you are assuming that because the compilers you happen to have tested behaved a certain way, that that somehow defines the intent of the language and therefore the way that all compilers must behave.

The language was written the way it was specifically to put as few constraints on the compiler implementers as possible, so that they could leverage the strengths and mitigate the weaknesses of the hardware they were targeting (as well as simplify their task as much as possible). When the language standard says that something is implementation-defined, they really mean that it is up to the implementation to define it (sometimes within constraints, such as the rounding direction of integer division with negative operands before C99), and when they say that something is undefined behavior, they really mean that ANY behavior is acceptable as far as the language is concerned. The fact that some, even most, compilers choose to behave a certain way in a specific instance of undefined behavior does NOT mean that this was in any way what was intended by the language committee, and any compiler is free to do something else, including the next version of the compiler you are using.

This may not make a lot of sense in today's world, but C was written in an era of extremely disparate hardware capabilities and extremely resource-starved platforms both for compiling the code and for executing the resulting programs. New languages have far less maneuvering room for implementers, and the C language itself has been progressively tightening down the behavior definition, but there are limits due to the need to avoid breaking legacy code as much as reasonably possible.

This is a perfect example of what I said in an earlier post: Since it is undefined, it can do anything and still be in compliance with the language standard -- it can use the incremented value after each execution, it can perform both increments after the multiplication, it can start a global thermonuclear war, or it can do exactly what you expect. The last possibility is the worst, because you will think you understand what it is doing and how it will behave in other programs and you don't.

Any program that relies on implementation-defined or undefined behavior has locked itself to a particular version of a particular compiler and must be carefully vetted whenever a change in compiler or a compiler upgrade is made. That is not the way to write good, maintainable code.

Back when I was first learning C and was unaware of such things as language specifications, I got bitten by this. I was working with modular arithmetic and so had to find residues of negative results. My compiler rounded negative integer quotients toward negative infinity (one of the two rounding methods the standard allowed at the time, with the choice left to the implementation), and I wrote my programs accordingly, assuming that this was how C programs worked because it was exactly what I expected. Then a year later I compiled the code with a different compiler, without testing it since it had been thoroughly tested before and, after all, C is C, right? Well, that compiler made the other allowed choice and rounded negative results toward zero. The results weren't pretty: I had all kinds of incorrect results that took some time even to notice because they were so intermittent (having to work with negative results was a fairly rare case), and a lot longer to track down. When I finally did, I was absolutely convinced that I had found a bug in the compiler because it wasn't doing integer division and remainder operations correctly. If nothing else, I had proof that at least one of the two compilers had a bug. In retrospect, that was a very valuable experience, because that is how I was introduced to the notions of implementation-defined and undefined behavior, and to the fact that there was this thing known as a language specification that defined how the language had to behave, and that programs needed to stay within those bounds if they wanted to be portable.
 

Irving

Joined Jan 30, 2016
4,994
No, I didn't say that was the intent of the language; I said it mirrors, IMHO, the intent of what the programmer wrote, even though it's not what they intended.

The issue is the use of a #define rather than a subroutine. With a subroutine, ++x would have been evaluated once, before the call, giving x = 6 as defined by the standard, and that value would then have been passed to the inner code, giving x * x = 36. However, a #define is a simple substitution, so the (++x) sub-clause is evaluated twice, by definition: the first gives x = 6 and the second, independently, gives x = 7. This must be true because an inline prefix INC is passed by reference as stated in the standard. Then the multiply takes x = 7 and multiplies it by x = 7, giving 49. This is not helped by using x as both a variable and a substitution token. If the TS had written #define mul(a) (a) * (a), the substitution, and its impact, might have been more obvious.

The machine code simply mirrors what the programmer wrote. I agree that the translation of C to machine code is not, IMHO, defined by the standard, but the expected output for a given operation is defined, and that's the result, albeit the result of a side effect (surprising to the uninitiated) of the way prefix increment is defined and potentially implemented, and the standard actually notes that possibility.

Passing by reference, and its bigger brother, multi-level indirection, catches out even very experienced programmers.
 

WBahn

Joined Mar 31, 2012
32,702
WHERE is this defined by the standard? The word "reference" appears 24 times in the current draft standard for C23 (N3096) and not one of them has anything to do with passing information.

So which section of which document states that an inline prefix INC is passed by reference?
 