Rules of the C language are very confusing

nsaspook

Joined Aug 27, 2009
13,086
C cannot protect a programmer from making certain mistakes, whereas Rust does. Yes, all such languages work at an unsafe level under the hood, where machine instructions are used, but Rust can and does strictly and explicitly isolate unsafe code.

Have a read of this: Computer Scientist proves safety claims of the programming language Rust

That's proof in the mathematical sense, as in we can prove the square root of two is irrational, that kind of proof.



So if the team are all writing safe Rust code these kinds of errors simply cannot happen, ever, ever, ever - that is a huge improvement in quality.
It protects until you need to disable that protection to actually program a solution in Rust.
From the link you supplied:
Sometimes, however, it is necessary to write an operation into the code that Rust would not accept because of its type safety," the computer scientist continues. "This is where a special feature of Rust comes into play: programmers can mark their code as 'unsafe' if they want to achieve something that contradicts the programming language's safety precautions.
...
This proof, called RustBelt, is complemented by Ralf Jung with a tool called Miri, with which 'unsafe' Rust code can be automatically tested for compliance with important rules of the Rust specification - a basic requirement for correctness and safety of this code.
In other words, you need a knife (an external program, 'Miri') too, just like with C, to check for correctness in the land-mine field. I've seen lots of CS mathematical proofs get junked because they don't prove physical correctness under non-deterministic conditions like hardware interfacing.

I looked at the proof and its preconditions and limitations.
https://plv.mpi-sws.org/rustbelt/popl18/paper.pdf
9 CONCLUSION
We have described λRust, a formal version of the Rust type system that we used to study Rust’s
ownership discipline in the presence of unsafe code. We have shown that various important
Rust libraries with unsafe implementations, many of them involving interior mutability, are safely
encapsulated by their type. We had to make some concessions in our modeling: We do not model
(1) more relaxed forms of atomic accesses, which Rust uses for efficiency in libraries like Arc; (2)
Rust’s trait objects (comparable to interfaces in Java), which can pose safety issues due to their
interactions with lifetimes; or (3) stack unwinding when a panic occurs, which causes issues similar
to exception safety in C++ [Abrahams 1998]. We proved safety of the destructors of the verified
libraries, but do not handle automatic destruction, which has already caused problems [Ben-Yehuda
2015b] for which the Rust community still does not have a modular solution [Rust team 2016]. The
remaining omissions are mostly unrelated to ownership, like proper support for type-polymorphic
functions, and “unsized” types whose size is not statically known.
Mathematical proofs relate to exactly how a MODEL will behave. They don't have much to do with how the real world behaves.
 
Last edited:

ApacheKid

Joined Jan 12, 2015
1,533
It protects until you need to disable that protection to actually program a solution in Rust.
From the link you supplied:

In other words, you need a knife (an external program, 'Miri') too, just like with C, to check for correctness in the land-mine field. I've seen lots of CS mathematical proofs get junked because they don't prove physical correctness under non-deterministic conditions like hardware interfacing.

I looked at the proof and its preconditions and limitations.
https://plv.mpi-sws.org/rustbelt/popl18/paper.pdf


Mathematical proofs relate to exactly how a MODEL will behave. They don't have much to do with how the real world behaves.
Well, C is unsafe all the time; it has no "safe" option at all. The proof I mentioned proves that certain execution outcomes are impossible in safe Rust code, assuming of course that the compiler, hardware, microcode, etc. are all trustworthy.

In any real project the amount of unsafe code can likely be kept tiny, so the vast bulk of a team's code can be trusted to be 100% free of certain kinds of execution problems.
 

nsaspook

Joined Aug 27, 2009
13,086
Well, C is unsafe all the time; it has no "safe" option at all. The proof I mentioned proves that certain execution outcomes are impossible in safe Rust code, assuming of course that the compiler, hardware, microcode, etc. are all trustworthy.

In any real project the amount of unsafe code can likely be kept tiny, so the vast bulk of a team's code can be trusted to be 100% free of certain kinds of execution problems.
I agree (though I'm doubtful of the impossibility claim), and that's the reason I like Rust as a programming principle for any programming language. But it's pretty much the same for C with a proper checker for that class of bugs, like Sparse:
https://en.wikipedia.org/wiki/Sparse
https://lwn.net/Articles/689907/
 

ApacheKid

Joined Jan 12, 2015
1,533
Curiously, I actually enjoyed learning C back in the early 1990s. I only initially studied it because I was interested in developing software on "PCs" at that time, and the only language I knew really well was PL/I from mainframes and minicomputers. That had once been advocated and used for PCs because Gary Kildall used it as the preferred language on CP/M. But DOS gradually killed CP/M and so C prevailed.

So I started to learn C on a 286 PC and eventually purchased Borland's tools, which were very good. To push myself with real challenges (not just trivial "Hello World" programs), one of the things I set myself was a lexical analyzer for PL/I. So I learned C by eventually developing a full multi-phase compiler for a different language, including a code generator and optimizer.

Once NT became prevalent I rebuilt the code for 32-bit NT using Visual C, eventually writing a complex C API that could generate the COFF OBJ and DLL files, which enabled me to use the MS linker to link the PL/I code modules. It all worked well but had no commercial future (I did sell a few copies, though, to shops that wanted to compile PL/I locally for developer use where the code was destined for other platforms).

I understand that C works OK, especially when some discipline is enforced. I later used C professionally on minicomputers and PCs. A huge problem I found (and devised strict rules to minimize) was the way C headers often include other headers, which can lead to an unfathomable mess if left unchecked.
 

ApacheKid

Joined Jan 12, 2015
1,533
Another totally weird thing about C is the way arrays and pointers (completely different abstractions) are almost interchangeable. Most C developers soon internalize this and just live with it, but when you step back and take a fresh look, it is another of those horrible language "features".

I mean look:

Code:
    char message[] = "0123456789";
    char record[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    
    int w, x, y, z;
    
    for (;;)
    {
        w = sizeof(message);
        x = strlen(message);
        
        y = sizeof(record);
        z = strlen(record);

        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_RESET);
        HAL_SPI_Transmit(&spi, (uint8_t *)message, strlen(message), HAL_MAX_DELAY);
        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET);
        HAL_Delay(10);
    }
The variable "message" is an array, apparently with ten elements, but NO! it actually has 11.
 

nsaspook

Joined Aug 27, 2009
13,086
Another totally weird thing about C is the way arrays and pointers (completely different abstractions) are almost interchangeable. Most C developers soon internalize this and just live with it, but when you step back and take a fresh look, it is another of those horrible language "features".

I mean look:

Code:
    char message[] = "0123456789";
    char record[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
  
    int w, x, y, z;
  
    for (;;)
    {
        w = sizeof(message);
        x = strlen(message);
      
        y = sizeof(record);
        z = strlen(record);

        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_RESET);
        HAL_SPI_Transmit(&spi, (uint8_t *)message, strlen(message), HAL_MAX_DELAY);
        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET);
        HAL_Delay(10);
    }
The variable "message" is an array, apparently with ten elements, but NO! it actually has 11.
Obvious, simple rules with logical cause and effect if you really understand memory semantics on hardware. "message" is an array initialized from a zero-terminated character string literal, so string memory semantics apply. Nothing horrible about that feature at all.

None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

This invention enabled most existing B code to continue to work, despite the underlying shift in the language's semantics. The few programs that assigned new values to an array name to adjust its origin—possible in B and BCPL, meaningless in C—were easily repaired. More important, the new language retained a coherent and workable (if unusual) explanation of the semantics of arrays, while opening the way to a more comprehensive type structure.
Critique
Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax. They are also among its most frequently criticized features, and often serve as stumbling blocks to the beginner. In both cases, historical accidents or mistakes have exacerbated their difficulty. The most important of these has been the tolerance of C compilers to errors in type. As should be clear from the history above, C evolved from typeless languages. It did not suddenly appear to its earliest users and developers as an entirely new language with its own rules; instead we continually had to adapt existing programs as the language developed, and make allowance for an existing body of code. (Later, the ANSI X3J11 committee standardizing C would face the same problem.)
Dennis M Ritchie, Development of the C Language

https://www.bell-labs.com/usr/dmr/www/chist.html
 

ApacheKid

Joined Jan 12, 2015
1,533
Obvious, simple rules with logical cause and effect if you really understand memory semantics on hardware. "message" is an array initialized with zero terminated character string "" memory semantics. Nothing horrible about that feature at all.




Dennis M Ritchie, Development of the C Language

https://www.bell-labs.com/usr/dmr/www/chist.html
Oh, I understand it perfectly.

But the scope for human error is increased when we have this kind of thing.

Code:
    char message[] = "0123456789";
    char record[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
message has a strlen of 10 and a sizeof of 11; record has a strlen of 23 in my particular case (calling strlen on record is actually undefined behavior, since there is no terminator and it reads past the end of the array until it happens to hit a zero byte) and a sizeof of 10.

Yet they are identical data types, arrays of char; this non-uniformity is the kind of thing that can lead to misunderstandings.

If the language had simply supported a type string this kind of problem would not arise:

Code:
    string message = "0123456789";
    char record[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
Different data types, hence different semantics. C is infamous for buffer overrun bugs; it's practically a feature of the language! Also, arrays do not carry metadata in C, so a function accepting a parameter like:

Code:
void process_message(char buffer[]);
Has no idea about the bounds of the array; it cannot know how many elements it has or how "big" the array is. So if this is needed (and it invariably is) it has to be passed in by the caller, but if the caller gets that wrong...

A well-designed language would invisibly pass that metadata around. It's a tiny cost, no more than the caller having to think about this and pass it around explicitly, and the benefits are huge: the compiler knows 100% how big that array is, whereas a human might pass the wrong thing, might pass the size of a different array by mistake, and so on. This is all well known in the world of C.

No amount of static analysis performed on the source code of function process_message can tell if a buffer overrun is present.
 
Last edited:

nsaspook

Joined Aug 27, 2009
13,086
Water under the bridge with C my friend. Being efficient, powerful, mysterious and 'dangerous' are traits humans admire, always will.
 

WBahn

Joined Mar 31, 2012
29,979
It's rather pointless to keep complaining about things that C "should" have done but didn't. The fact is, it didn't and that ship has sailed. Many of the things you are complaining about, such as assignments evaluating to a value, are things that were done intentionally for the sake of performance by giving the programmer a lot of control over how the code compiled by how they wrote their programs. That's why there are three different ways of incrementing the value of an integer, c=c+1, c+=1, and c++. Most early compilers were not sophisticated enough to take a single expression and figure out how best to use the processor's instruction set to shave clock cycles off the execution, but by having three different ways of doing it, compiler writers could leverage a given processor's capabilities when modifying the value of a target used in the expression or when incrementing the value of a variable without needing to evaluate an expression. Today, there are probably few mainstream compilers that need those hints (though compilers for specific families of embedded products might still use them).

Remember, C was NOT meant to be used by novice programmers to write generic programs -- it was meant to be used by experienced system programmers to write code that had demanding performance requirements to meet. Two very different groups of people and mindsets and programming strategies. I would be one of the first to say that, in an ideal world, C would be used almost exclusively in situations in which performance mattered above nearly all else. Especially today, when we can tolerate all the "safe" stuff because we have processors with more than enough speed and memory to handle all of that overhead while still achieving more than acceptable performance for the overwhelming majority of applications.
 

ApacheKid

Joined Jan 12, 2015
1,533
Water under the bridge with C my friend. Being efficient, powerful, mysterious and 'dangerous' are traits humans admire, always will.
Would you regard the same kind of laxity as admirable in, say, a handgun, say a safety catch that doesn't quite work when certain brands of ammo are used? Should the scope for human error be admired, rather than steps being taken to minimize or even eliminate it? I think not. Likewise in software, where such code might well be used to control some industrial system and where such dangers are a liability.
 

ApacheKid

Joined Jan 12, 2015
1,533
It's rather pointless to keep complaining about things that C "should" have done but didn't. The fact is, it didn't and that ship has sailed. Many of the things you are complaining about, such as assignments evaluating to a value, are things that were done intentionally for the sake of performance by giving the programmer a lot of control over how the code compiled by how they wrote their programs. That's why there are three different ways of incrementing the value of an integer, c=c+1, c+=1, and c++. Most early compilers were not sophisticated enough to take a single expression and figure out how best to use the processor's instruction set to shave clock cycles off the execution, but by having three different ways of doing it, compiler writers could leverage a given processor's capabilities when modifying the value of a target used in the expression or when incrementing the value of a variable without needing to evaluate an expression. Today, there are probably few mainstream compilers that need those hints (though compilers for specific families of embedded products might still use them).

Remember, C was NOT meant to be used by novice programmers to write generic programs -- it was meant to be used by experienced system programmers to write code that had demanding performance requirements to meet. Two very different groups of people and mindsets and programming strategies. I would be one of the first to say that, in an ideal world, C would be used almost exclusively in situations in which performance mattered above nearly all else. Especially today, when we can tolerate all the "safe" stuff because we have processors with more than enough speed and memory to handle all of that overhead while still achieving more than acceptable performance for the overwhelming majority of applications.
Very well, but the complaints I make about C were only supporting arguments for my view that the language's grammar continues to be a negative influence on newer languages and should not be used as a starting point for them; that was the context.

C is indeed C, and I've written many thousands of lines of it over the years, some of it in big systems (settlement systems, message routing systems, etc.) used at securities trading firms, stock exchanges, and so on. But new languages continue to "pay homage", as it were, to C. Take Rust, for example, which borrows some of the grammar (but has the good sense to eliminate x++-style monstrosities); Java, C#, Swift (since removed), JavaScript, etc. all inherit these grammatical traits.

Note:

The increment/decrement operators in Swift were added very early in the development of Swift, as a carry-over from C. These were added without much consideration, and haven't been thought about much since then. This document provides a fresh look at them, and ultimately recommends we just remove them entirely, since they are confusing and not carrying their weight.

I'm really discussing programming language grammar and design, the criticisms of C are just examples of what I see as bad practice in the ongoing development of new languages today.
 
Last edited:

ApacheKid

Joined Jan 12, 2015
1,533
This is a great example of why I balk at C sometimes, this is me looking at options for a project I'm playing with:

1666911079342.png

That option "Language Standard for C files" is blank, so I clicked to see what the options were and look!

There are in fact fourteen options, and IMHO that means fourteen languages that are very similar but not the same.

And C++ is no better, look:

1666911304634.png

(I have no idea what is used when the field is left blank!).
 

joeyd999

Joined Jun 6, 2011
5,237
This is a great example of why I balk at C sometimes, this is me looking at options for a project I'm playing with:

That option "Language Standard for C files" is blank, so I clicked to see what the options were and look!

There are in fact fourteen options, and IMHO that means 14 languages that are very similar but not the same.

And C++ is no better, look:

(I have no idea what is used when the field is left blank!).
I feel the same way when walking down the toilet paper aisle at Walmart.

(My preferred options are "absolute" or "relocatable")
 

WBahn

Joined Mar 31, 2012
29,979
This is a great example of why I balk at C sometimes, this is me looking at options for a project I'm playing with:

That option "Language Standard for C files" is blank, so I clicked to see what the options were and look!

There are in fact fourteen options, and IMHO that means 14 languages that are very similar but not the same.
So C is the only language that has ever evolved or had variants?

I've never had to modify old C code to get it to work with a newer version of the language. Sure can't say the same for Python.
 

xox

Joined Sep 8, 2017
838
So C is the only language that has ever evolved or had variants?

I've never had to modify old C code to get it to work with a newer version of the language. Sure can't say the same for Python.
Python is such a headache to work with. Just installing/loading libraries can be an absolute nightmare...
 

ApacheKid

Joined Jan 12, 2015
1,533
I forgot that C does not support the equality operator for two identically structured structs. So to compare two structs with many fields for equality we must write the code ourselves: either a member-by-member compare or a block compare of memory. It does allow assignment, though: I can assign one struct to another and it does (effectively) a member-by-member assignment of all members.

It really would be trivial for any compiler to support this, and it would not be a breaking change either, since it is illegal now!
 

joeyd999

Joined Jun 6, 2011
5,237
I forgot that C does not support the equality operator for two identically structured structs. So to compare two structs with many fields for equality we must write different code either a member by member compare or a block compare of memory. It does allow assignment though, I can assign one struct to another and it does (effectively) a member to member assignment for all members.

It really would be trivial for any compiler to support this and it would not be a breaking change either since it is illegal now!
That's what C++ is for.
 

xox

Joined Sep 8, 2017
838
I forgot that C does not support the equality operator for two identically structured structs. So to compare two structs with many fields for equality we must write different code either a member by member compare or a block compare of memory.

Ah yes, the beloved C boilerplate. Provided there are no "complex" pointers involved you can always use a simple macro.


Code:
#define MEQ(a, b) (memcmp(&(a), &(b), sizeof(a)) == 0)

Or even


Code:
#define MEQ(a, b) ((sizeof(a) == sizeof(b)) \
    ? (memcmp(&(a), &(b), sizeof(a)) == 0) : 0)
Granted, without type-checking we are back to old arguments about how C can be so unsafe if used improperly! (And padding bytes between members can make memcmp report otherwise-equal structs as unequal.)

That's what C++ is for.

Exactly. Although I must say I never did bother to learn the newer version of the language. I still use the older syntax.


https://codebeautify.org/c-formatter-beautifier


C:
#include <stdio.h>

int main(void) {
    int i = 1, d = 1;

    for (;;) {
        printf("%d", i);
        i += d;
        if (!i) break;
        if (i == 2446) d = -1;
    }

    puts(": PRIME");
}

Kind of long for a signature, if you ask me. Plus the minified version isn't nearly as likely to be confused for a response. I also just like the idea of this short snippet of code producing such a large prime number. Although admittedly Python is capable of producing even more impressive feats in that respect:


Code:
import sys
sys.stdout.write(str(2*(2618163402417*2**1290000-1)+1))
print(": SAFE PRIME")

Big-integers are baked into the language.
 