C Language Conundrum

Discussion in 'Programmer's Corner' started by MrAl, Mar 18, 2017.

  1. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    Hello there,

    I was recently maintaining some old C code for the Windows platform, and it always amazes me how strange things
    turn up sometimes, either with C itself or with Windows. Windows coding is so strange sometimes
    that I could probably write forever on that, but here I limit it to a C conundrum.

    This involves the C function "memset", which has the prototype:
    void *memset(void *s, int c, size_t n);

    The name itself, "memset", implies that you can set memory with this function, but guess what,
    you can't, unless of course it is zero or some other value that fits a narrow range of applications
    not really worth mentioning here. In fact, the only thing this is really good for is if you
    want to zero your memory out, and then you have to be aware of the fictitious prototype definition,
    because two out of the three arguments are not represented correctly.

    First we have the void *, which normally would mean you could pass any type of pointer you wanted
    to pass such as an int* and then the function would know that you are using a pointer to type int.
    But no way. That void* means, at least here, that everything gets automatically cast down to type char
    or perhaps unsigned char. This of course means it only operates on bytes (ANSI version).

    Next we have "int c", which is also NOT true. It's not an integer, it's a char or unsigned char.
    In any case, it's also cast down to type char or unsigned char, so you can only "set" chars (bytes)
    not integers or anything else.

    Now we come to "size_t n", which is somewhat correct, if you know about the limit to type char above.
    If you don't, then this is not true either. The reason you might not know is the first
    two arguments, which are so misleading that you would think you can set 'n' equal to the number of
    elements you want to set to that so-called integer, 'c'.
    But yeah, assuming you know the pitfalls already, I guess we can say that the third arg is OK.

    What is problematic about this function is that so much of it is misleading that I can't help but think
    that somebody, somewhere, screwed up when the language was first being invented. I don't think this
    happens too much in languages, but it did this time.

    Example that won't work:
    int a[20];
    memset((int*)a,0,20);

    Looks like it should work because we cast 'a' into an int pointer, and there are 20 elements. However,
    since that last count must be the count of bytes, we should do something like this (assuming 4-byte ints):
    int a[20];
    memset((int*)a,0,20*4);

    However that (int*)a part is fictitious because 'a' is automatically cast to type char (or uchar).
    So the final version ends up being:
    memset(a,0,20*4);

    although more appropriate is:
    memset(a,0,sizeof(a));

    and that works only because sizeof returns the entire number of bytes that 'a' takes up for storage
    not the number of elements.
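To make the byte-count point concrete, here is a small sketch assuming a hosted C99 compiler; the helper names are hypothetical. With a 4-byte int, the first call clears only 5 of the 20 elements, while the sizeof version clears them all.

```c
#include <stddef.h>
#include <string.h>

#define N 20  /* element count from the example above */

/* Counts how many of the n ints are still nonzero. */
size_t count_nonzero(const int *a, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            k++;
    return k;
}

/* The "won't work" call: memset's n is a byte count, so only the first
   N bytes are cleared (5 ints when int is 4 bytes). */
size_t nonzero_after_byte_count_bug(void)
{
    int a[N];
    for (size_t i = 0; i < N; i++)
        a[i] = 7;
    memset(a, 0, N);
    return count_nonzero(a, N);
}

/* The corrected call: for a true array, sizeof a is N * sizeof(int) bytes. */
size_t nonzero_after_sizeof(void)
{
    int a[N];
    for (size_t i = 0; i < N; i++)
        a[i] = 7;
    memset(a, 0, sizeof a);
    return count_nonzero(a, N);
}
```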

    So one question that comes up is just what good is that "int c" then? It is only good if c=0 or if
    you have a very special case where you fill every byte with the (char)((int)c) value and it happens
    to work out to the right four bytes for every integer in the array. You still have to remember, though,
    to use the right value for 'n' too, or not all the ints would get set, and it is sort of a hack
    anyway.
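A quick check of the "special case" idea, assuming a two's-complement machine (which is what current compilers target): memset works for 0 and -1 because every byte of those int values is identical, but any other value is replicated into each byte of each int. The function name is hypothetical.

```c
#include <string.h>

/* Sets every byte of a small int array to c with memset and returns the
   value one element ends up holding. */
int int_after_memset(int c)
{
    int a[4];
    memset(a, c, sizeof a);
    return a[0];
}
```

With a 4-byte int, int_after_memset(1) gives 0x01010101 (16843009), not 1.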


    Very strange, huh? What do you think?
     
  2. joeyd999

    AAC Fanatic!

    Jun 6, 2011
    Evolution from .asm was a mistake. Time to get back into the sea.
     
  3. WBahn

    Moderator

    Mar 31, 2012
    What is the basis for the claim that it would normally mean that the function would know what kind of pointer it is?

    Your problems come from very basic misunderstandings of the fundamentals of C.

    A void pointer is a pointer that points to an unknown type of data. Period. It is interpreted as an address in memory and nothing more. How can it be anything else? If I tell you memory address 420000, how can you possibly determine from that what type of data that is stored there. You can't. Well, neither can a function that is passed a void pointer. So a function that takes a void pointer can accept a pointer to any kind of data, but what it can do with it is limited to what can be done with a pointer to an address that contains data of an unknown type. So it points to the beginning of a block of memory. In C, memory is organized as blocks of bytes, so the only thing that a general purpose function like memset() can interpret a void pointer as is the address of a block of memory that is to be treated as a block of bytes.

    It is an int because that is the most efficient type of data to pass back and forth on the stack. Of course memset can only be used to set bytes of memory, because that is the only reasonable assumption that a general purpose function like memset() can make.
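A naive byte-at-a-time version shows what that means in practice: the void pointer is converted to an unsigned char pointer, and the int is used only as a byte. This is an illustrative sketch (the name naive_memset is hypothetical), not any library's actual implementation.

```c
#include <stddef.h>

/* Byte-at-a-time fill with memset's signature. */
void *naive_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;                    /* void * converts implicitly in C */
    unsigned char value = (unsigned char)c;  /* only the low byte of c is used */
    while (n-- > 0)
        *p++ = value;
    return s;
}
```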

    The reason you might not know what the first two arguments are is because you are insisting on using a function without bothering to read what that function is or does. If you use it incorrectly, that's a direct result of the choice YOU made and that's on YOU.

    How much effort is involved in Googling "memset()" and looking at any of the following links (the first four hits) for about ten seconds?

    https://www.tutorialspoint.com/c_standard_library/c_function_memset.htm

    http://www.geeksforgeeks.org/memset-c-example/

    http://man7.org/linux/man-pages/man3/memset.3.html

    http://www.cplusplus.com/reference/cstring/memset/

    I think you should start reading about the functions you want to use. That has nothing to do with C or any language in particular. How can you expect to get what you want if you won't take the time to bother reading about what the function you want to use does as opposed to blindly assuming that it is somehow going to magically read your mind and do what you would like it to do?
     
  4. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    Hi,

    Wow, you really went off on the deep end with that reply :)

    Yes, for some reason I thought (at first) you could 'tell' the function what kind of pointer you were passing by casting it, but that is simply because I haven't been using C that much lately and might have gotten confused by the way we can usually cast a return value. It has nothing whatsoever to do with mind reading.

    Although you made a lot of sense with your reply, you color it with overtones which don't help the situation at all. But one point stands out, and that is where you state that a function can't do this; it would be easy to have a memset that actually sets memory with any kind of pointer if we could just pass its type. It would be simple to write one, but that wasn't what I was looking for anyway.

    The fact that type 'int' is faster is nice, but since it has to be cast to type char I would have to see the function in order to know for sure if type 'int' is really the fastest here. Since memset sets EVERY single byte, passing an int rather than a char would take a minuscule amount of time compared to setting all those bytes one at a time. Of course on most systems it probably defaults to the asm instruction that fills memory, but that will still take longer than the difference between passing an int and a char.
    That's the whole ball game in time profiling: noting the time taken by a function call as compared to its body.

    I do appreciate your reply though, as you reminded me of what the void pointer is doing for us, but maybe you could keep the mocking tone down just a little bit, huh :)
    That's of course unless you never make a mistake yourself :)

    Anyway, back to the main idea of an improved version.
    If we could set the type that might be helpful.

    You'll also see a lot of confusion on the web about this function if you look around.
    I ran into this when I found a call written like this:
    memset((char*)a,0,sizeof(a));

    and I had to look three times to figure out why they went through the trouble of casting a to type char*.
    It's not necessary.

    Yeah this is one of those low level functions which operates on bytes.
     
  5. WBahn

    Moderator

    Mar 31, 2012
    No, it's not necessary to cast 'a' to a char * and, if you are going to cast it to anything, it should be cast to what it needs to be, namely a void *.

    The call to sizeof(a) in that call can also get you in trouble. If 'a' is of pointer type (as opposed to array type), then sizeof(a) will return how many bytes it takes to represent a pointer and not how many bytes it takes to store the array pointed to by 'a'.
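A sketch of that pitfall in standard C: a parameter declared as an array (int a[20]) is adjusted to a pointer, so only the caller can take sizeof of the real array, and the callee needs the count passed in. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Inside the callee 'a' is an int *, exactly as if it had been declared
   int a[20], so sizeof a is the size of a pointer. */
size_t size_seen_by_callee(int *a)
{
    return sizeof a;
}

/* The safe pattern: the caller passes the element count explicitly. */
void clear_ints(int *a, size_t count)
{
    memset(a, 0, count * sizeof *a);
}
```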

    As for making memset() somehow figure out what type of pointer it is pointing to, there is no way to do that. The notion of "types" is a compile time issue and the necessary symbol table does not survive the compilation process. How would you pass this information, anyway?

    Since you say it would be simple to write one, then why don't you write one and show us just how simple it would be?

    Say I have a pointer that points to an array of structures of type FRED. How would you pass this information to the MrAl_memset() function? What would your function, which was compiled long before the FRED structure was ever defined, do with this information?

    The very best you could do would be to pass it a pointer to an object of type FRED and, separately, the number of bytes occupied by a FRED object. Then the function could copy that block of bytes repeatedly a specified number of times starting at the address pointed to by the void pointer. But that is an oddball use and any structure that needs such a utility probably needs something more and so would have its own function for that purpose. The memset() function is intended to efficiently initialize a block of memory to a specific value. That value is usually zero, but it is common enough to want to initialize it to something else (usually -1) so it accommodates this use as well.
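The "very best you could do" described above might look like this minimal sketch; fill_repeat and struct fred are hypothetical names standing in for the FRED example, and memcpy does the byte copying. Because memcpy is used, the source object must not overlap the destination block.

```c
#include <stddef.h>
#include <string.h>

/* Copies the object_size-byte object at 'object' into 'count' consecutive
   slots starting at 'dst'. */
void *fill_repeat(void *dst, const void *object, size_t object_size, size_t count)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < count; i++) {
        memcpy(p, object, object_size);
        p += object_size;
    }
    return dst;
}

/* Stand-in for the FRED structure in the post. */
struct fred {
    int id;
    double weight;
};
```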

    The passing of the int is generally the most efficient because the language standard allows the compiler writer to match the int data type to the natural integer width of the targeted machine (as long as it is at least 16 bits wide). The underlying implementation of the memset() function is a completely different matter and, yes, most compilers make it as efficient as they can on the target hardware. This often involves a slight bit of initial overhead to break the block into three pieces -- a central one that is an integer multiple of the largest block of data that can be written and is properly aligned, plus the fractional block prefix and suffix.
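One way that prefix/middle/suffix split could look, as a simplified sketch (word_memset is a hypothetical name; real library versions are usually hand-tuned or written in assembly). It uses uintptr_t as the "word" and memcpy for the word stores, which compilers typically turn into single stores and which stays within C's aliasing rules.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *word_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;
    unsigned char b = (unsigned char)c;

    /* Prefix: single bytes until p sits on a word boundary. */
    while (n > 0 && (uintptr_t)p % sizeof(uintptr_t) != 0) {
        *p++ = b;
        n--;
    }

    /* Build one word with b replicated into every byte. */
    uintptr_t word = 0;
    for (size_t i = 0; i < sizeof word; i++)
        word = (word << CHAR_BIT) | b;

    /* Middle: as many whole words as fit. */
    while (n >= sizeof word) {
        memcpy(p, &word, sizeof word);
        p += sizeof word;
        n -= sizeof word;
    }

    /* Suffix: the leftover bytes. */
    while (n > 0) {
        *p++ = b;
        n--;
    }
    return s;
}
```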
     
  6. ErnieM

    AAC Fanatic!

    Apr 24, 2011
    The main issues I see here are all related to RTFM, either outright not doing it, or not believing what it says.

    While there may be some serious issues with parts of the standard C libraries, this function is quite well behaved when used in the manner intended.
     
  7. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    Hi,

    Thanks for the intelligent replies guys.

    Yes the main issue here was that void* as argument just means a pointer to the most low level type, which is the byte (or char or uchar), and it does no good to cast it into any other type because the function itself sees only the pointer in the same way that passing a pointer becomes a void pointer in a function anyway.
    I was thinking along the lines of a prototype like:
    void* ThisFunc(int k);

    and then calling it like so:
    int* lp;
    lp=(int*)ThisFunc(10);

    and here we cast the void* to type int* so we can use it as a pointer to ints instead of a pointer to chars.

    There is another issue, though: the "int c" argument makes it seem like you can pass an integer and that the function will treat it as an integer, but the function really treats it as a byte.

    The name itself, "memset", has to be treated as a low level function which can only work on bytes. It does make sense to do this, but it also makes sense to have a higher level function to operate on other types, especially regular arrays. I guess we just have to depend on higher level libraries to handle that, or else write the function ourselves.

    Yes, I could envision a function that takes a pointer, a value, the size of the array, and the size of the type:
    Memset(void*,size_t n, size_t sizeof(int), int c);

    What I don't remember right now is which arg gets pushed on the stack LAST. If it is the last arg then that is probably OK, but if it is the first arg then we change the order and put c first. Either way we run into a problem.
    If we want to tell the function we want to use int_64 we would need:
    Memset(void*, size_t n, size_t sizeof(int_64), int_64 c);

    and then "int_64 c" ends up being the hack.
    Maybe passing a pointer to the data to be placed into mem:
    Memset(void*,size_t n,size_t elementsize,void* value);

    and now we pass the address of value rather than the value itself, and elementsize can be one of:
    sizeof(char)
    sizeof(int)
    sizeof(int_64)
    or just some number that places value into successive memory 'n' times.
    I guess this may seem a little overcomplicated though.

    ErnieM:
    It's not really about understanding the function and knowing how it works in advance, it's about the way the function is presented in the prototype. It's like a hack, where we are writing one thing and thinking another. An example would be:
    memset(void*,unsigned char c,size_t n);

    On inspection, this IMMEDIATELY tells us that we can only set chars, not ints, longs, ulongs, int_64s, etc. If it was written that way it would be very apparent how it works. What I see on the web is a lot of mistakes where the true prototype, written as:
    memset(void*,int c,size_t n);
    makes people think that they can actually pass an integer for 'c', and so they are trying it, and have problems of course. I guess what the function does internally is something like:
    value=c&0x000000FF
    and then uses value to set the bytes, or something like that. Don't quote me on that part though :)
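For what it's worth, the C standard settles this: the value c is converted to unsigned char before it is written, which on the usual 8-bit-byte machines gives the same result as c & 0xFF. A small check (byte_written is a hypothetical helper):

```c
#include <limits.h>
#include <string.h>

/* Returns the byte memset actually stores when given c; the standard
   specifies conversion to unsigned char (range 0..UCHAR_MAX). */
unsigned char byte_written(int c)
{
    unsigned char buf[1];
    memset(buf, c, sizeof buf);
    return buf[0];
}
```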

    So in the end "void*" is ok, but "int c" is still questionable because it is after all a hack.
    The name itself, "memset" is ok if we think of it as a low level function that can only operate on bytes.

    To illustrate a little more clearly why it is a hack, which admittedly may be hard to avoid, is this prototype:
    int SetValues(int a, int b, int c);

    Now this can be any function really:
    int DoStuff(int a, int b, int c);

    What do we expect for these functions when we see the prototype?
    We expect to be able to pass three integers, and get as return one integer. But what if the function made 'b' into a char before using it? It would make a lot more sense to write it as:
    int DoStuff(int a, char b, int c);

    and now we know RIGHT away that 'b' can only be of type char.

    Another possibility I have not looked at yet is whether memset is always compiled as an inline function rather than as a regular function. I don't think there is a need for that, but I can't be sure without seeing some asm code. They don't have the word 'inline' in the prototype definition though, so I guess it is not, although who knows, the writers may have left that out too :)

    Ok, so a function like this:
    Memset(void*,size_t numbofelements, size_t elementsize, void* value);
    might work. Then we might call it like so:
    Memset(arry,10,sizeof(int),&b);
    or something like that anyway.
    This would sometimes require this though:
    Memset(arry, sizeof(arry)/sizeof(int), sizeof(int), &b);
    but hey that's life :)
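The sizeof(arry)/sizeof(int) idiom is often wrapped in a macro so the element type doesn't have to be repeated. ARRAY_LEN is a hypothetical name, and the macro only gives the right answer on true arrays, never on pointers.

```c
#include <stddef.h>

/* Number of elements in a true array (not valid for pointers). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))
```

With it, the call above could read Memset(arry, ARRAY_LEN(arry), sizeof arry[0], &b);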

    I guess the main problem with the original prototype is that we want to PASS type int, but USE type char, and there is no good way to indicate that in the prototype:
    memset(void*,char_as_int c,size_t n);
    We would have to make up the 'type' "char_as_int".
    typedef int char_as_int;
     
    Last edited: Mar 19, 2017
  8. WBahn

    Moderator

    Mar 31, 2012
    I wouldn't classify it quite like that.

    A void pointer is a pointer to an UNKNOWN type of data, not "the most low level" type. You CANNOT dereference a void pointer and you CANNOT perform pointer arithmetic on it, because the compiler has no idea what type of data it points to (so it can't interpret the bit pattern found there) or how large the data is. It is unknown. But the function is free to cast it into any type that it wants and, in fact, it MUST cast it into something before it can do anything other than pass it as an argument to another function or return it (or set it equal to something else, but that something else is now a void pointer).

    The way I write structures makes it so that I have some pseudo-generic functions (functions that aren't tied to a specific structure definition), and I pass in a void pointer (that allows me to pass in a pointer to any of the structures that have been defined according to my protocol). I then cast the pointer to an unsigned int pointer, which allows me to access the first four bytes of the structure. In these structures, the first four bytes are a code that identifies the type of structure. I can then use a look-up table to cast the pointer to a pointer to the proper structure type and then proceed from there. It's a kludge that allows me to get some semblance of object-oriented behavior from my C code and has proven very useful.
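A minimal sketch of that tagged-structure scheme, with all names hypothetical and a switch standing in for the look-up table. Each structure starts with a 32-bit tag, and C guarantees that a pointer to a structure, suitably converted, points to its first member.

```c
#include <stdint.h>

enum { TAG_CIRCLE = 1, TAG_RECT = 2 };

struct circle { uint32_t tag; double r; };
struct rect   { uint32_t tag; double w, h; };

/* Accepts a pointer to any tagged structure and dispatches on its tag. */
double area(const void *obj)
{
    const uint32_t *tag = obj;   /* the leading 4-byte type code */
    switch (*tag) {
    case TAG_CIRCLE: {
        const struct circle *c = obj;
        return 3.14159265358979 * c->r * c->r;
    }
    case TAG_RECT: {
        const struct rect *r = obj;
        return r->w * r->h;
    }
    default:
        return -1.0;             /* unknown tag */
    }
}
```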

    The claim that "passing a pointer becomes a void pointer in a function anyway" is entirely incorrect. The type of pointer is specified in the parameter list, just as the type of every other parameter. The same for the return value. If these are pointers, then they most certainly can be typed.

    If ThisFunc() returns a pointer that points to a value of type int (or an array of values of type int), then the prototype should specify that and it should be

    int *ThisFunc(int k);

    Functions like malloc() return a void pointer precisely because they have no idea what kind of data they are returning a pointer to -- that is up to the person using malloc() and therefore they should cast the return value properly so that the compiler can perform proper type checking.
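For the ThisFunc example, a typed return might look like this sketch (make_counting_ints is a hypothetical name). In C the void * returned by malloc converts implicitly to int *, so the sketch omits the cast; C++ would require one.

```c
#include <stdlib.h>

/* Returns k ints holding 0..k-1, or NULL on bad input or allocation failure.
   The int * return type lets the compiler type-check every caller. */
int *make_counting_ints(int k)
{
    if (k <= 0)
        return NULL;
    int *p = malloc((size_t)k * sizeof *p);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < k; i++)
        p[i] = i;
    return p;
}
```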

    Again, remember C's pedigree. It is about performance. It was written for the purpose of implementing an operating system at a time when even the most capable machines were extremely limited by today's norms. The standard libraries reflect this -- they pass int data types over smaller data types whenever possible because the smaller data types impose overhead that you don't want in your operating system kernel if you can avoid it.

    You get to declare your Memset() with one prototype or the other. You don't have the option of declaring it one way and then using it as though it is declared the other. It doesn't matter which order things are pushed on the stack. The compiler has to know the exact type and size of all arguments at compile time. If you declared the function to take an int and you pass it an int_64, then it will coerce the int_64 to an int. Similarly, if you declared it to take an int_64 and pass an int to it, then the int will be coerced to an int_64.
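A small illustration of that coercion, using fixed-width unsigned types in place of int and int_64 (converting an out-of-range value to a signed type is implementation-defined, whereas unsigned narrowing is well defined as reduction modulo 2^32). The function names are hypothetical.

```c
#include <stdint.h>

/* The prototypes fix the parameter types, so each argument is converted
   to that type at the call site. */
uint32_t take_u32(uint32_t x) { return x; }  /* wider values are reduced mod 2^32 */
uint64_t take_u64(uint64_t x) { return x; }  /* narrower values are widened */
```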

    A prototype ONLY tells you the number and type of arguments a function takes. Nothing more. The prototype for memset() could be

    void *memset(void *, int, size_t);

    There is NOTHING in the prototype that documents what the function does or what the parameters mean. If someone wants to go guessing because they are too lazy to read the documentation, then they have no one but themselves to blame when it doesn't do what they want it to or think it should.

    The C language places a HUGE amount of power in the hands of the programmer, but it also saddles them with a commensurate level of responsibility. Many programmers simply aren't emotionally prepared to deal well with that reality.
     
  9. nsaspook

    AAC Fanatic!

    Aug 27, 2009
    It's a common kludge that's used a lot. In the Linux kernel, some functions embed an error code in the pointer they return.
    Code (C):
    /*
     * make two threads for the spi i/o streams
     */
    static int32_t daqgert_create_thread(struct comedi_device *dev,
        struct daqgert_private *devpriv)
    {
        const char hunk_thread_name[] = "daqgerth", thread_name[] = "daqgert";
        const char *name_ptr;

        if (devpriv->use_hunking)
            name_ptr = hunk_thread_name;
        else
            name_ptr = thread_name;

        devpriv->ai_spi->daqgert_task =
            kthread_create_on_node(&daqgert_ai_thread_function,
            (void *) dev,
            cpu_to_node(devpriv->ai_node),
            "%s_a/%d", name_ptr,
            devpriv->ai_node);
        if (!IS_ERR(devpriv->ai_spi->daqgert_task)) {
            kthread_bind(devpriv->ai_spi->daqgert_task, devpriv->ai_node);
            wake_up_process(devpriv->ai_spi->daqgert_task);
        } else {
            return PTR_ERR(devpriv->ai_spi->daqgert_task);
        }

        devpriv->ao_spi->daqgert_task =
            kthread_create_on_node(&daqgert_ao_thread_function,
            (void *) dev,
            cpu_to_node(devpriv->ao_node),
            "%s_d/%d", name_ptr,
            devpriv->ao_node);
        if (!IS_ERR(devpriv->ao_spi->daqgert_task)) {
            kthread_bind(devpriv->ao_spi->daqgert_task, devpriv->ao_node);
            wake_up_process(devpriv->ao_spi->daqgert_task);
        } else {
            return PTR_ERR(devpriv->ao_spi->daqgert_task);
        }

        return 0;
    }

    Here 'daqgert_task' is an I/O thread (running daqgert_ai_thread_function, which receives a void data pointer to the device structures) bound to a specific core in a multi-core machine. If 'kthread_create_on_node' fails, the error code embedded in the returned pointer is extracted with PTR_ERR and returned from 'daqgert_create_thread'.
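The kernel does this with its ERR_PTR, IS_ERR and PTR_ERR helpers. A user-space imitation of the idea might look like the sketch below; all names are hypothetical, and it assumes (as the kernel does on its platforms) that pointer/integer conversions round-trip and that no valid object lives in the top 4095 addresses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, and decode it again. */
static inline void *err_ptr(intptr_t error) { return (void *)error; }
static inline intptr_t ptr_err(const void *ptr) { return (intptr_t)ptr; }

/* True if ptr falls in the top MAX_ERRNO addresses, i.e. encodes an error. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Example producer: returns a buffer, or an error code inside the pointer. */
void *make_buffer(size_t n)
{
    if (n == 0)
        return err_ptr(-EINVAL);
    void *p = malloc(n);
    return p != NULL ? p : err_ptr(-ENOMEM);
}
```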

    Code (C):
    /*
     * A client must be connected with a valid comedi cmd
     * and *data a pointer to that comedi structure
     * for this not to segfault
     */
    static DECLARE_WAIT_QUEUE_HEAD(daqgert_ai_thread_wq);

    static int32_t daqgert_ai_thread_function(void *data)
    {
        struct comedi_device *dev = (void*) data;
        struct comedi_subdevice *s = dev->read_subdev;
        struct daqgert_private *devpriv = dev->private;
        struct spi_param_type *spi_data = s->private;
        struct spi_device *spi = spi_data->spi;
        struct comedi_spigert *pdata = spi->dev.platform_data;

        if (!devpriv)
            return -EFAULT;
        dev_info(dev->class_dev, "ai device thread start\n");

        while (!kthread_should_stop()) {
            while (unlikely(!devpriv->run)) {
                if (devpriv->timer)
                    schedule();
                else
                    wait_event_interruptible(daqgert_ai_thread_wq, test_bit(AI_CMD_RUNNING, &devpriv->state_bits));

                if (kthread_should_stop())
                    return 0;
            }
            if (likely(test_bit(AI_CMD_RUNNING, &devpriv->state_bits))) {
                if (likely(devpriv->ai_hunk)) {
                    daqgert_handle_ai_hunk(dev, s);
                    devpriv->hunk_count++;
                    hunk_count = devpriv->hunk_count;
                } else {
                    daqgert_handle_ai_eoc(dev, s);
                    devpriv->ai_count++;
                    pdata->kmin = ktime_set(0, pdata->delay_nsecs);
                    __set_current_state(TASK_UNINTERRUPTIBLE);
                    schedule_hrtimeout_range(&pdata->kmin, 0,
                        HRTIMER_MODE_REL_PINNED);
                }
            } else {
                clear_bit(SPI_AI_RUN, &devpriv->state_bits);
                smp_mb__after_atomic();
                wait_event_interruptible(daqgert_ai_thread_wq, test_bit(AI_CMD_RUNNING, &devpriv->state_bits));
                smp_mb__before_atomic();
                set_bit(SPI_AI_RUN, &devpriv->state_bits);
                smp_mb__after_atomic();
            }
        }

        return 0;
    }

     
  10. ErnieM

    AAC Fanatic!

    Apr 24, 2011
    Structures in C can have a close resemblance to objects in C++. A while back I did some work making some libraries for the MASM32 project, which targets assembly language programming on the PC. My main focus was making the interface to create assembly language plug-ins using the COM object model, where code blobs present themselves as objects. These objects followed the C++ scheme, so to write COM you needed to know C++ under the hood.

    When you pass an object in C++ you pass a reference to memory, some blob you obtain from a system call. I forget which call, but to make it a COM object you needed to use something other than malloc to keep it thread safe. The object is filled with elements defined by a published structure which includes the data used by that object and a set of pointers to the public functions. When you access an object (call a function) you use the object pointer as a structure to locate where the code for that object's function resides.

    The name of the object pointer is "this", as in this is the object I am using. C++ objects include this pointer implicitly, C and other languages must pass it explicitly.
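A stripped-down C sketch of that layout, loosely modeled on the description above rather than the real COM ABI (no IUnknown, no calling-convention details): the object's first member points to a shared table of function pointers, and every method takes the object explicitly. The parameter is called self because this is a keyword in C++; all names are hypothetical.

```c
struct counter;

/* Table of "public functions", shared by every counter object. */
struct counter_vtbl {
    void (*add)(struct counter *self, int amount);
    int  (*get)(const struct counter *self);
};

struct counter {
    const struct counter_vtbl *vtbl;  /* first member: pointer to the methods */
    int value;                        /* the object's own data */
};

static void counter_add(struct counter *self, int amount) { self->value += amount; }
static int counter_get(const struct counter *self) { return self->value; }

static const struct counter_vtbl counter_methods = { counter_add, counter_get };

void counter_init(struct counter *c)
{
    c->vtbl = &counter_methods;
    c->value = 0;
}
```

A call then looks like c.vtbl->add(&c, 5), with the object passed explicitly where C++ would pass this implicitly.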
     
  11. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    Hi,

    First, when I said "the most low level type" I did actually mean the byte. The void pointer is a pointer to bytes because the byte is the most low level type, and until we cast it to a higher level type it just points to the start of the byte string. Yes, it should be cast to a type like char, int, etc., but that just tells the compiler something more than it had before that cast. Semantics? Maybe, but whether the type is a long, int, char, struct, etc., the void pointer points to the first byte of that.
    I'm not saying that it should not be cast to a known type, just that it defaults to a pointer to bytes.
    I'm also not saying that the following will work:
    void MyFunc(void* p)
    {
    uchar a;
    a=*p;
    }
    but that the void pointer can't point to anything other than bytes, because the machine can't split bytes by itself; the byte is the lowest level form known to the language. More importantly, it can't group them into multiples either.
    Maybe that is not the best interpretation, but in binary it makes sense. If you can never split a byte up, then the byte is the lowest level form.
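For comparison, a version that does compile: the void pointer is converted to unsigned char * (or to the object's real type) before it is dereferenced. The function names are hypothetical.

```c
/* Returns the first byte of whatever p points to. */
unsigned char first_byte(const void *p)
{
    const unsigned char *bytes = p;  /* void * converts implicitly in C */
    return bytes[0];
}

/* Reads an int back; p must really point to an int. */
int as_int(const void *p)
{
    const int *ip = p;
    return *ip;
}
```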

    As to the speed of "int c" vs "uchar c", time profiling has not changed since the dawn of programming. It's never about ONE thing, it is always about a GROUP of things and how ONE thing compares to the GROUP, with attention to the timing of course.
    For a simple example, say we have a function that sets array bytes by making each byte the same value as a counter variable like 'k', starting from 0 to 1000. As the counter progresses, p[0]=0, p[1]=1, p[2]=2, etc., up to p[1000]=1000. Now if this is an unsigned char array and instead of starting at 0 we want to start at a value passed to the function, the function might be:
    void MyFunc(int c)
    {
        unsigned char cc;
        int k;
        cc=(unsigned char)c;
        for (k=cc;k<=1000;k++) {
            /* etc. */
        }
    }

    or it might be:
    void MyFunc(unsigned char cc)
    {
        int k;
        for (k=cc;k<=1000;k++) {
            /* etc. */
        }
    }

    Which looks better to you?

    But more to the point, let's just look at the function def:
    void MyFunc1(int c)

    versus:
    void MyFunc2(unsigned char c)

    Now if we have to do 1000 iterations inside either of these functions, and each iteration takes 1 unit of time, and pushing and popping type int takes 1 unit of time while pushing and popping type uchar takes 2 units of time, then the time for the first function is some common overhead plus 1 unit plus 1000 units, which is 1001 units, and for the second function it is some common overhead plus 2 units plus 1000 units, which is 1002 units of time. If the unit of time was 1us, that's 1.001ms vs 1.002ms. If the unit of time was 1ns, then that's 1.001us vs 1.002us.
    So I think we've established that there is not much difference when the count of iterations inside the loop is somewhat high.

    If we lower the count to 100, then with 1us units it becomes 101us vs 102us. If we lower the count to 10, it becomes 11us vs 12us, a little more significant but still only about a 9 percent difference. It's only when we get to very low counts that it starts to make a difference, but then of course we have to consider the rest of the function block and see how that part affects the overall time too.
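The cost model in this example can be written down directly. This sketch (hypothetical names, abstract time units, per-iteration cost of 1, common overhead left out) shows how the relative difference shrinks as the iteration count grows.

```c
/* Total cost: the argument-passing cost plus one unit per iteration. */
unsigned long model_cost(unsigned long iterations, unsigned long call_cost)
{
    return call_cost + iterations;
}

/* Extra cost of the 2-unit call as a percentage of the 1-unit total. */
double percent_slower(unsigned long iterations)
{
    unsigned long fast = model_cost(iterations, 1);
    unsigned long slow = model_cost(iterations, 2);
    return 100.0 * (double)(slow - fast) / (double)fast;
}
```

With 10 iterations the 2-unit call is about 9.1 percent slower; with 1000 iterations it is about 0.1 percent slower.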
    We also have to consider that casting type int to type char probably requires an AND operation like:
    cc=0x000000FF&c;

    which adds to the time for the version where we pass type 'int'. To be sure we'd have to see the exact implementation of memset and maybe also if it varies with compiler.

    It is true, however, that whoever wrote the original function believed there would be a big time saving, so we should really look at the implementation of memset to be sure. But you should still be able to see my main point, and that is that the function does not use type int, it uses type char or uchar, so writing the prototype as having type int is very misleading. It doesn't even really matter whether it takes more time or not; what matters is that the statement with 'int' in it tells us something that is not entirely true. I will repeat the kind of function I quoted earlier here:
    void MyFunc(int a, int b, int c, int d)

    which one of those ints is used as type char, or are all of them or none of them used as type char?
    Now substitute a new type for one of those:
    void MyFunc(int a,int b, intchar c, int d)

    and we immediately see that 'c' is different than the rest. "intchar" is actually the same as type 'int':
    typedef int intchar;

    but now the prototype reflects this difference and so we have to see what intchar means here, and when we do that we see that inside the function that 'c' is being used as a char.
    Keep in mind however that all we ever see is:
    void MyFunc(int a, int b, int c, int d)

    and so we think we are passing all ints and they are being used as ints, when really one of them is not.
    This, I believe, is very bad practice. Not everything is about following the 'rules' to the letter; it's also about making the code readable. Yes, it is true that the function prototype tells what the function "takes", but when the function does not actually use that type internally, it makes for some confusion.

    So to recap, to find out if using type 'int' is faster than using type 'char' we'd have to see the implementation of memset. I don't even think we can push or pop a lone byte on a modern system; I think we have to push at least a 32 bit value. That would make it sensible to use type 'int', but some way of showing this in the prototype would be a good idea.
    To take this idea to the extreme, does that mean we can never pass type "char" simply because we can't push a char?
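On the narrower question of passing a char: standard C allows char parameters, and how they are physically passed (register or stack slot width) is up to the platform ABI. What C does guarantee is that in a variadic call a char argument undergoes the default argument promotions and arrives as an int; pre-ANSI calls without prototypes promoted every char argument the same way, which is one commonly cited reason memset's c is an int. A sketch (sum_chars is a hypothetical name):

```c
#include <stdarg.h>

/* Sums 'count' char arguments passed through "...". Each char has been
   promoted to int, so va_arg must read an int; va_arg(ap, char) would be
   undefined behavior. */
int sum_chars(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```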

    Yes, I see many programmers crying when they see their functions are not working...they have to take lots of antidepressants :)
    I agree it takes more effort to work with C than with a higher level language. These days I use a higher level language because there are certain benefits that make more sense for most of the stuff I do. There's no way around using C for programs already written in C, however, so I have to use it sometimes, just like C++. I do get 'spoiled' by using the higher level language, so I have to change my way of thinking when I go back to C. What I do like about C, though, is that it is usually very concise, USUALLY. Compilers sometimes mess this up a little though.
     
  12. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    3,229
    647
    Hi,

    I think that is when I got the idea that the byte was the lowest level type on the computer. That's because I had to work in binary when using a higher level language that knew nothing of COM and yet had to access COM functions. COM is a binary standard, so it works at the "lowest level". This meant converting everything in the higher level language to bytes so COM could understand it.
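    Packing values into an explicit byte layout, as described above for COM, can be sketched like this. The helper names are hypothetical, and least-significant-byte-first (little-endian) order is an assumption chosen to match x86. A real binary interface defines its own layout.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as 4 bytes, least
   significant byte first, regardless of the host's byte order. */
void put_u32_le(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Inverse: rebuild the 32-bit value from 4 little-endian bytes. */
uint32_t get_u32_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

    Because the shifts operate on values rather than on memory, the same code produces the same bytes on little-endian and big-endian hosts.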
     
  13. ErnieM

    AAC Fanatic!

    Apr 24, 2011
    7,879
    1,772
    Actually, when making prototypes in assembler for the commonly used COM interfaces, almost every parameter was a 32 bit DWORD. The PC was 32 bit at the time and passing 32 bits was the most efficient way to do anything; using anything smaller just wasted bits, because it was transferring 32 bits anyway.

    Today's PCs would be passing 64 bit integers to fit the native data size. Bytes are so 1980's.

    Aside: I got into MASM32 assembly because I needed to know the Windows API for other C/C++ work I was doing but had a brain fart understanding data types due to my assembly background (why are these different types if they are all the same size???). I found it an interesting subject all on its own and pursued it for some time.
     
  14. WBahn

    Moderator

    Mar 31, 2012
    19,295
    5,233
    NO!!!! It is NOT a pointer to bytes!

    It is a pointer to data of UNKNOWN type and size! PERIOD! End of sentence.

    Perhaps one of the reasons you keep thinking that things are misleading is because you keep insisting on holding on to interpretations that are based on what you want to believe and not in reality.

    Besides, on most machines today the fundamental, atomic data size is NOT the byte, it is either the 32-bit word or the 64-bit word. Accessing anything smaller than this, whether it be a short, a byte, or a bit, involves bit banging while working on an atomic-sized data item.
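    The read-modify-write on a word-sized item described above can be modelled in software with shifts and masks. This is a sketch of the idea, not a description of any particular CPU; many processors do have byte-sized load and store instructions. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical illustration: replace byte number idx (0 = least
   significant, valid range 0..3) of a 32-bit word without
   disturbing the other three bytes. */
uint32_t set_byte_in_word(uint32_t word, unsigned idx, uint8_t value)
{
    unsigned shift = 8u * idx;
    uint32_t mask = (uint32_t)0xFF << shift;    /* selects the target byte */

    return (word & ~mask) | ((uint32_t)value << shift);
}
```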
     
    Last edited: Mar 20, 2017
    ErnieM likes this.
  15. ErnieM

    AAC Fanatic!

    Apr 24, 2011
    7,879
    1,772
    Agreed, not pointing to bytes. It is a pointer, period. A pointer appropriate to the hardware in use such that it may refer to memory.

    What is at that location is unknown, hence the pointer type is "void". Void as in there's no type checking possible, so the developer must check that something appropriate is being sent, lest they send a pointer to an apple to something expecting a pointer to a pickup truck.
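    The standard library's qsort is a concrete case of this contract. It receives a void pointer plus an element count and element size, and the caller-supplied comparator is the only place where the real element type is known. A sketch for an array of int (compare_ints and sort_ints are our own names):

```c
#include <stdlib.h>

/* qsort hands us void pointers; only this function knows they point
   at int, so it converts them back before dereferencing. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);    /* avoids the overflow risk of x - y */
}

/* Sort n ints in place. */
void sort_ints(int *arr, size_t n)
{
    qsort(arr, n, sizeof arr[0], compare_ints);
}
```

    If the comparator were handed an array of double instead, nothing would stop it at compile time; that is the missing type checking described above.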
     
  16. WBahn

    Moderator

    Mar 31, 2012
    19,295
    5,233
    And let me point out that there was a time when I was under the same impression and, as a result, kept writing code that tried to treat a pointer as a pointer to a byte and kept having to waste time debugging code as a consequence. Once it finally got through my thick skull that a void pointer is a pointer to an UNKNOWN data type, those problems evaporated instantly.
     
  17. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    3,229
    647
    Hi,

    Well, I have to disagree again, because the 'atomic' size of storage is not the same as the atomic size of the CPU. The atomic size of storage is the byte, while that of the CPU depends on the CPU type, anything from a single 8 bit byte to a 64 bit word on various machines. It might be true that modern machines fetch at least 32 bits at a time, but conceptually it's still a byte.
    I guess this point can be argued, but when we buy memory we see that we are buying it in units of bytes, not ints, longs, int_64's, etc.
    The void pointer can be viewed as a pointer to bytes when we realize that a pointer can NOT point to anything other than bytes on a machine that stores everything in bytes. That might be the more general view of the void pointer though, and it comes from working with binary standards. I would bet it would be hard to work with this in the C language, and I think that is why you disagree, which is understandable.
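    Note the following code, which assumes *p contains the 32 bit value 0x12345678:
    #include <stdint.h>

    typedef uint32_t ulong;      //assumed 32 bit, as 'unsigned long' is on Windows
    typedef unsigned char uchar;

    void MyFunc(void* p)
    {
    ulong x;
    uchar a,b,c,d;

    x=*(ulong*)p; //0x12345678
    a=*((uchar*)p+0); //0x78 (byte order shown is little-endian, e.g. x86)
    b=*((uchar*)p+1); //0x56
    c=*((uchar*)p+2); //0x34
    d=*((uchar*)p+3); //0x12
    //the above all should work, but there is no such thing as the following
    //(C has no 'nibble' type, so these lines would not compile):
    //e=*((nibble*)p+0); //0x7
    //f=*((nibble*)p+1); //0x8
    }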
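    The byte values in the comments above (0x78 at the lowest address) hold only on a little-endian machine such as x86. C always permits viewing an object's storage through an unsigned char pointer, so the host's byte order can be checked directly. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Returns 1 if the lowest-addressed byte of a uint32_t holds the
   least significant byte (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t x = 0x12345678u;
    const unsigned char *bytes = (const unsigned char *)&x;

    return bytes[0] == 0x78;
}
```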

    At the binary level, the void pointer must point to bytes, although the common use is not like that. In fact, every pointer points to bytes; it's the compiler that sees it differently. When you work in binary you see this right away, because when you pass a pointer to code written in another language, that code has to know how many bytes are in each group, such as with type long. When we pass a pointer we are just passing a number like 0x92345668, and that is a one dimensional value, so it can never tell anything else how many bytes are in each item in the storage area, and it cannot point to a bit within a byte, because the byte is the atomic unit of storage (even though the CPU may not see it like that either).

    Also, we can pass a void pointer and an element size (such as 1, 2, 3, 4, etc.), but the element size will be in bytes. Can we pass a pointer to nibbles? No. We can't even simulate this using the addressing standard, which is in bytes.
    For example, if we had two bytes with values 1 and 2, we could point to that 1 or to that 2, but if we had two nibbles that were 1 and 2, we could point to the low order nibble or perhaps the high order nibble, but not to both separately on the same system. This is obvious because there is no address increment smaller than 1 byte.
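    There is indeed no address for half a byte, but code can still refer to individual nibbles by combining a byte address with an index, which is how packed 4-bit data is commonly handled. A sketch under assumed conventions: the function name is hypothetical, and nibble 0 is taken to be the low half of the first byte.

```c
#include <stddef.h>

/* Return nibble number idx from the buffer at p. Addresses only
   reach byte granularity, so idx / 2 selects the byte and idx % 2
   selects which half of it to extract. */
unsigned get_nibble(const void *p, size_t idx)
{
    const unsigned char *bytes = p;
    unsigned char b = bytes[idx / 2];

    return (idx % 2 == 0) ? (b & 0x0Fu) : (unsigned)(b >> 4);
}
```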

    I believe most compilers will complain if we try to dereference a void pointer directly, though, because the compiler does not know what type of data to read.
     
  18. MrAl

    Thread Starter Distinguished Member

    Jun 17, 2014
    3,229
    647
    Going from one standard to another, everything is a void pointer, because there is no way to pass anything about the type from one standard to another. It must therefore be taken as a pointer to bytes, and how it is handled must be determined by another number that is also passed. If you would rather call this a 'byte' pointer than a void pointer, that's up to you, but try getting a void pointer to point to something smaller than one full byte.

    Note also that as of late we have encountered 'aligned' pointers, which are pointers to memory that fall on boundaries of either 2 bytes or 4 bytes.
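    Checking whether a pointer falls on a given boundary can be sketched as below (is_aligned is a hypothetical name). Converting a pointer to uintptr_t is implementation-defined, and uintptr_t itself is optional in the standard, but on mainstream compilers the result is the plain address. Actual alignment requirements vary by type and platform, and _Alignof (C11) reports them.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of align bytes. align must be a
   power of two, so align - 1 is a mask of the low address bits. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```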

    What you seem to be suggesting is that a void pointer points to nothing at all unless we say it is something, and that can't be true, because when we look at the numerical value of the pointer it is the same no matter what we cast it to.
    I think the value I got one time was 0x00066E44, and you can cast this to pointers of type char, long, unsigned long, int_64, int_jibberjabber and it still holds that same value.
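    Converting an object pointer to void* and back, or to another object pointer type and back (given suitable alignment), does give a pointer that compares equal to the original. What the cast changes is how the compiler dereferences the pointer and how far pointer arithmetic moves it. A small sketch (the function name is ours):

```c
#include <stddef.h>

/* Same address, different step sizes: arithmetic on a typed pointer
   advances by the size of the pointed-to type. Returns how many bytes
   "+ 1" moves an int pointer, or -1 if the addresses differ. */
ptrdiff_t int_step_in_bytes(void)
{
    int arr[2] = {0, 0};
    void *v = arr;
    char *as_char = v;
    int *as_int = v;

    /* All three refer to the same address... */
    if ((void *)as_char != v || (void *)as_int != v)
        return -1;

    /* ...but + 1 moves a different number of bytes for each type. */
    return (char *)(as_int + 1) - as_char;
}
```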

    Another interesting point is that if you have numbers stored in the space pointed to by that void pointer, and you pass that void pointer to a program that has to read memory directly and resides in the same memory space, that program can read the bytes one by one using a function like peek(address). I've done this so many times I can't count it on 100 people's hands and feet :)

    BTW, I misspoke a while back when I said that everything becomes a void pointer in a function. What I meant to say was that if an array is passed as a regular pointer (like int* p), the function does not know whether it is an array or a pointer to a single value. That's not the same as a void pointer :)
    I might call this a "void array pointer", which is only 'void' in the sense that we don't know how many elements are in that array, if any.
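    This is why C functions that take arrays usually take a pointer plus a count; memset's size_t n plays that role. A sketch, where sum_ints and the ARRAY_COUNT macro are hypothetical names:

```c
#include <stddef.h>

/* Inside the function, arr is just an int pointer: sizeof arr would
   give the pointer size, not the array size, so the element count
   must be passed separately. */
long sum_ints(const int *arr, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++)
        total += arr[i];
    return total;
}

/* At the call site the array type is still known, so the count can
   be computed there. Only valid on true arrays, not on pointers. */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof (a)[0])
```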
     
    Last edited: Mar 20, 2017
  19. WBahn

    Moderator

    Mar 31, 2012
    19,295
    5,233
    What "standard" are you talking about? And what is the "going from one standard to another" babble about?

    When you compile a C program, the only way that a pointer can be dereferenced is if the compiler knows what type of data it is pointing to. If it is a void pointer, then the type of data it is pointing to is unknown. Where are you getting this silly notion that anyone is claiming that a void pointer doesn't point to anything? Its value is that of a pointer on the underlying hardware. The compiler keeps track of the type of data that a pointer points to. If it is a void pointer, then the compiler simply does not know what type of data it points to.

    You can keep insisting on your fantasy interpretation. I can't stop you. But don't be surprised when you keep being "misled" by clearly defined and well-documented functions because you insist on viewing them through that fantasy world's lens.
     
  20. MrSoftware

    Active Member

    Oct 29, 2013
    674
    186
    In response to the original post; you're expecting the language to do too much for you. C is a lower level language, and memset() is a very old C function. What you get is exactly what you see.

    void *memset(void *s, int c, size_t n);

    It sets "n" bytes of memory, starting at the address specified by "s" to the value specified by "c", and returns the value of s. That's all it does, no more no less. It works exactly as the man page describes.
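    What "sets n bytes" means for an array of 32-bit integers can be shown with a short sketch (function names are hypothetical). memset stores (unsigned char)c in every byte, so an element only ends up with a useful value when all of its bytes are meant to be equal: 0 gives 0, 0xFF gives -1, and 0x01 gives 0x01010101 (16843009), not 1. Setting each element to an arbitrary value needs an ordinary loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a small int32_t array with memset and report what one
   element ends up holding. */
int32_t memset_element(int c)
{
    int32_t arr[3];

    memset(arr, c, sizeof arr);
    return arr[0];
}

/* Setting each element to an arbitrary 32-bit value. */
void fill_int32(int32_t *arr, size_t n, int32_t value)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = value;
}
```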
     
    nsaspook likes this.