Reallocation function

Thread Starter

gogo00

Joined Oct 28, 2023
43
I'm grappling with understanding dynamic memory allocation and reallocation in C, particularly the intricacies of the realloc function. To illustrate my confusion, let's consider the following scenario:

Program initially allocate dynamic memory for three variables with values 30, 31, and 32, resulting in memory locations 30, 31, and 32. Now, if I want to add two extra numbers but find that memory locations 33 and 34 are already occupied by other variables, and memory is available from location 35.

My question is: What happens when I use the realloc function in this situation? How does it handle the case where the desired memory locations are already in use?
 

nsaspook

Joined Aug 27, 2009
16,275
The reallocation is done by either:

a) expanding or contracting the existing area pointed to by ptr, if possible. The contents of the area remain unchanged up to the lesser of the new and old sizes. If the area is expanded, the contents of the new part of the array are undefined.
b) allocating a new memory block of size new_size bytes, copying memory area with size equal the lesser of the new and the old sizes, and freeing the old block.
https://en.cppreference.com/w/c/memory/realloc
 

Papabravo

Joined Feb 24, 2006
22,070
There is NO correlation between the value contained in a memory location and the address of that memory location. Just get THAT silly notion out of your head. It would be silly to allocate memory in small blocks. Normally you would allocate a larger block. When you are done with the block you would free it and return the storage to the heap. The next time you need some memory you would allocate a brand new block and you don't care where the block is located. This would be the normal way with things.

The realloc() function is used to resize an existing block. The actual result may be implementation dependent, in particular it seems that it will return a block of the requested size which may be in a completely different area of memory. It will do this after "freeing" the original block. There is literally no guarantee the pointer returned will be the same one that was passed. One reason for this is to avoid creating a large number of "small" fragments. This would be a fairly easy test to instrument.
 

xox

Joined Sep 8, 2017
936
I'm grappling with understanding dynamic memory allocation and reallocation in C, particularly the intricacies of the realloc function. To illustrate my confusion, let's consider the following scenario:

Program initially allocate dynamic memory for three variables with values 30, 31, and 32, resulting in memory locations 30, 31, and 32. Now, if I want to add two extra numbers but find that memory locations 33 and 34 are already occupied by other variables, and memory is available from location 35.

My question is: What happens when I use the realloc function in this situation? How does it handle the case where the desired memory locations are already in use?
Maybe it would help to see an example of using realloc in an actual program?

Code:
/*
    realloc example (note: code was written for clarity, not efficiency)
*/

#include <stdio.h>
#include <stdlib.h>

/*
    read a line into an allocated buffer from standard input
*/
char* read_line_as_buffer(void) {
  size_t len = 1;
  size_t cur = 0;
  //    always allocate an extra byte for the sentinal zero
  char* buf = malloc(len + 1);
  while (1) {
    int ch = getchar();
    //    check for end-of-line condition(s)
    if (ch == '\n' || ch == EOF)
      break;
    //    grow the buffer to make room for more data 
    if (cur >= len) {
      len = 2 * len + 1;
      buf = realloc(buf, len);
    }
    buf[cur++] = ch;
  }
  //    null-terminate to make a proper "string"
  buf[cur] = 0;
  return buf;
}

int main(void) {
  while (1) {
    printf("Type something, then press enter (or an empty line to exit)\n > ");
    char* line = read_line_as_buffer();
    //    did the user enter an empty line?
    if (*line == 0)
      break;
    printf("You typed: `%s`\n", line);
    free(line);  //    DO NOT forget to do this!!!
  }
}
 

WBahn

Joined Mar 31, 2012
32,777
As others have said, realloc() returns a pointer to the new memory block and it may (or may not) be at the same base address as the original memory block. So it is extremely important that ALL references to the original block (i.e., all copies of the original pointer value) be updated to the new value.

It's also sometimes useful to know that if realloc() fails to allocate the new memory, it returns a NULL pointer but does NOT deallocate the old memory block. However, this requires you to have a local copy of the pointer saved instead of blindly overwriting it with the return value from realloc().

The typical application for realloc() is when you are gathering an unknown amount of data, particularly if it might be either very small or very large. You can incrementally grow the amount of memory to match the size of the data as it is read. A common way to do this is to double the amount of memory that is allocated each time in order to reduce the reallocation overhead, since moving the data from one block to another is increasingly expensive as the block size grown.
 

WBahn

Joined Mar 31, 2012
32,777
As a follow-up, something I was going to mention, but forgot, is that if realloc() fails and you don't have a copy of the original pointer saved, then you have just suffered a memory leak. So best practice is to always assign the return value of realloc() to a temp pointer, then check to see if it is NULL. If it isn't, assign it to the actual pointer and move on (realloc() took care of freeing the block that the original pointer pointed to). If it IS null, then you have the option of keeping the original pointer value, knowing that it points to a block of the original size and NOT a block of the size you just requested. If, instead, you want to do something else and don't need the original block, you can now free it properly.
 

xox

Joined Sep 8, 2017
936
As a follow-up, something I was going to mention, but forgot, is that if realloc() fails and you don't have a copy of the original pointer saved, then you have just suffered a memory leak. So best practice is to always assign the return value of realloc() to a temp pointer, then check to see if it is NULL. If it isn't, assign it to the actual pointer and move on (realloc() took care of freeing the block that the original pointer pointed to). If it IS null, then you have the option of keeping the original pointer value, knowing that it points to a block of the original size and NOT a block of the size you just requested. If, instead, you want to do something else and don't need the original block, you can now free it properly.
All good and sound suggestions of course. But to be fair if your program ever does happen to reach that point most likely it will fail anyway. Unless there is some way for the program to cope with running out of memory then what is the point in even bothering to check return values? Not to mention the kruft of all the boilerplate you'd be adding to every single function that attempts to do so. Maybe better to install an exception handler for SIGSEGV or what have you (setjmp/longjmp might useful).
 

nsaspook

Joined Aug 27, 2009
16,275
All good and sound suggestions of course. But to be fair if your program ever does happen to reach that point most likely it will fail anyway. Unless there is some way for the program to cope with running out of memory then what is the point in even bothering to check return values? Not to mention the kruft of all the boilerplate you'd be adding to every single function that attempts to do so. Maybe better to install an exception handler for SIGSEGV or what have you (setjmp/longjmp might useful).
If it's something like a communications buffer with flow-control it shouldn't fail if programmed correctly with FIFO buffers (usually in hardware so no software overhead) that have fractionally full (3/4 is a common value) interrupts. You assert flow-control to constrain the need for more memory growth.
 

ApacheKid

Joined Jan 12, 2015
1,762
I'm grappling with understanding dynamic memory allocation and reallocation in C, particularly the intricacies of the realloc function. To illustrate my confusion, let's consider the following scenario:

Program initially allocate dynamic memory for three variables with values 30, 31, and 32, resulting in memory locations 30, 31, and 32. Now, if I want to add two extra numbers but find that memory locations 33 and 34 are already occupied by other variables, and memory is available from location 35.

My question is: What happens when I use the realloc function in this situation? How does it handle the case where the desired memory locations are already in use?
I'd be inclined to avoid realloc myself. As others point out there's some discipline required and frankly there's no problem that realloc can solve that can't be solved without it.

There might be some good use cases, I've never seen one myself though.
 

michael8

Joined Jan 11, 2015
472
The only advantage I can see for using realloc is to expand an area where the realloc routine might be able to expand it
without having to copy the data while malloc/copy/free would require a copy and in addition would leave a "hole" in
memory allocation (more fragmentation).

I've never used realloc.
 

WBahn

Joined Mar 31, 2012
32,777
I'd be inclined to avoid realloc myself. As others point out there's some discipline required and frankly there's no problem that realloc can solve that can't be solved without it.

There might be some good use cases, I've never seen one myself though.
A good use case has already been presented here, twice. When you don't know how much data you are going to receive, such as getting input from a keyboard, or data from a network interface or some other source where you don't know how much data you are going to get in advance, you can store the data as it comes in and expand the size of your buffer as you approach its limits. Most of the time, the memory manager can simply extend the memory block you already have. When it can't it handles moving the data to the new block.

The issues with using it are no more complicated that the issues using memory allocation in general. You should ALWAYS confirm that the request succeeded before using the block and, sadly, most people don't.
 

ApacheKid

Joined Jan 12, 2015
1,762
A good use case has already been presented here, twice. When you don't know how much data you are going to receive, such as getting input from a keyboard, or data from a network interface or some other source where you don't know how much data you are going to get in advance, you can store the data as it comes in and expand the size of your buffer as you approach its limits. Most of the time, the memory manager can simply extend the memory block you already have. When it can't it handles moving the data to the new block.

The issues with using it are no more complicated that the issues using memory allocation in general. You should ALWAYS confirm that the request succeeded before using the block and, sadly, most people don't.
Yes I can see what you're driving at, but I'm just not convinced it actually buys us anything. With realloc you cannot control whether it simply grows the existing block (if there is contiguous room) or whether it will perform a bulk copy to a newly allocated block, this means there is a potential cost, it might just resize the block or it might do a costly copy.

If the C heap functions had "expalloc" (expand allocated block) or something that would grow the block if possible and do nothing otherwise, that would certainly be a very good capability.

Surely in time critical and limited resource systems, we want to avoid copying data around for no good reason?

Multiple discontiguous buffers is how I've solved some of these kinds of problems in the past.
 
Last edited:

WBahn

Joined Mar 31, 2012
32,777
Yes I can see what you're driving at, but I'm just not convinced it actually buys us anything. With realloc you cannot control whether it simply grows the existing block (if there is contiguous room) or whether it will perform a bulk copy to a newly allocated block, this means there is a potential cost, it might just resize the block or it might do a costly copy.

Surely in time critical and limited resource systems, we want to avoid copying data around for no good reason?
What it buys you is that most of the time it DOESN'T have to copy the data around for no good reason. If you allocate a new block with malloc() every time you run up against the limit of the current block, you have to copy over the data EVERY time you expand the array.

Also, realloc() had the potential to expand the block under some situations in which using malloc() can't succeed. Using malloc() both the old and the new block have to exist at the the same time. But realloc() has options. Let's say that you have 10 MB of memory on the heap and, to keep things simply we will ignore memory block overhead, you have 4 blocks of 2 MB each allocated with 1 MB of free memory at the end. Now you deallocate the next to last block. You now want to increase the last block from 2 MB to 4 MB. You don't have a 4 MB block of memory available, so malloc() will fail. But realloc() can recognize that, once the original block is released, that there will be a 4 MB block available by combining the current 2 MB block that is available with the soon-to-be-released 2 MB block. It can then copy from the new to the old even though the two blocks overlap.

An alternative is to store the data in something like a linked list, which will require more overhead, both in processing time and in memory requirements, though it is more predictable time-wise. If this is important, then don't use realloc(), since it is more erratic in the execution time from call-to-call. But just like you don't have to use realloc() for every case, finding a situation in which you can't use it does not mean that you shouldn't use it in every case, either.
 

ApacheKid

Joined Jan 12, 2015
1,762
What is buys you is that most of the time it DOESN'T have to copy the data around for no good reason. If you allocate a new block with malloc() every time you run up against the limit of the current block, you have to copy over the data EVERY time you expand the array.

Also, realloc() had the potential to expand the block under some situations in which using malloc() can't succeed. Using malloc() both the old and the new block have to exist at the the same time. But realloc() has options. Let's say that you have 10 MB of memory on the heap and, to keep things simply we will ignore memory block overhead, you have 4 blocks of 2 MB each allocated with 1 MB of free memory at the end. Now you deallocate the next to last block. You now want to increase the last block from 2 MB to 4 MB. You don't have a 4 MB block of memory available, so malloc() will fail. But realloc() can recognize that, once the original block is released, that there will be a 4 MB block available by combining the current 2 MB block that is available with the soon-to-be-released 2 MB block. It can then copy from the new to the old even though the two blocks overlap.

An alternative is to store the data in something like a linked list, which will require more overhead, both in processing time and in memory requirements, though it is more predictable time-wise. If this is important, then don't use realloc(), since it is more erratic in the execution time from call-to-call. But just like you don't have to use realloc() for every case, finding a situation in which you can't use it does not mean that you shouldn't use it in every case, either.
Yes but this approach is simply striving for contiguity as much as possible, that's certainly a way of doing this stuff but it does entail copying of data - a cost, I just question the need, the necessity of striving for contiguity.

The scatter/gather API summarized in that article I linked to, doesn't care about contiguity, it just requires a way of collectively referring to the individual, discontiguous buffers. This approach only sees copying when data arrives or is transmitted, no copying of it takes place otherwise.

That example uses an array of "buffer descriptors" but as you say one could use a list too.

See Zero Copy.
 

WBahn

Joined Mar 31, 2012
32,777
Yes but this approach is simply striving for contiguity as much as possible, that's certainly a way of doing this stuff but it does entail copying of data - a cost, I just question the need, the necessity of striving for contiguity.

The scatter/gather API summarized in that article I linked to, doesn't care about contiguity, it just requires a way of collectively referring to the individual, discontiguous buffers. This approach only sees copying when data arrives or is transmitted, no copying of it takes place otherwise.

That example uses an array of "buffer descriptors" but as you say one could use a list too.

See Zero Copy.
And how big is the array of buffer descriptors? What happens if you run up against that limit while you are still reading in data?

If the end result is that you need the data in an array, then that data has to end up in a contiguous block of memory. If you don't need it in an array, fine. But if you do, then at the end you are still going to have to allocate a block large enough to hold it all and copy the data out of all the buffers over to the block. That this happens once after the data has been all read may be important if the reading of the data on time is critical. But if you need it to be in an array when everything is done, you may find yourself in a situation where you don't have enough memory to allocate the final block while all of the buffers are allocated, even though there is plenty of memory overall.

If you want to never use realloc(), then by all means, never use it. But that doesn't mean that it doesn't have a use and isn't a better alternative for many problems.
 

ApacheKid

Joined Jan 12, 2015
1,762
And how big is the array of buffer descriptors? What happens if you run up against that limit while you are still reading in data?

If the end result is that you need the data in an array, then that data has to end up in a contiguous block of memory. If you don't need it in an array, fine. But if you do, then at the end you are still going to have to allocate a block large enough to hold it all and copy the data out of all the buffers over to the block. That this happens once after the data has been all read may be important if the reading of the data on time is critical. But if you need it to be in an array when everything is done, you may find yourself in a situation where you don't have enough memory to allocate the final block while all of the buffers are allocated, even though there is plenty of memory overall.

If you want to never use realloc(), then by all means, never use it. But that doesn't mean that it doesn't have a use and isn't a better alternative for many problems.
Well I never said one had to use an array to represent the buffer descriptors, an unbounded list would certainly work fine too. I can envisage a FragmentedBuffer concept (perhaps a class in a suitable language) where we do not care about contiguity at all.

Reads and writes from/into such a thing are then simply a matter of multiple reads/writes if the IO calls do not support scatter/gather, if they do then it's a non issue.

It may well have some use cases sure, but the absence of control over whether it does a copy or not is a worry for me, i'd prefer it if the call took some options, flags so we could grow the existing block is possible but do nothing otherwise and just return a status. Being able to grow an existing allocated block if there is sufficient unused contiguous space is a superb concept, I do not object at all to that.

Heap managers often allocate blocks in integer multiples of some binary value too, like 8, 16, 32, 64 bytes at a time and so on. So allocating a 33 byte block will of course actually allocate a 64 byte block but the last 31 bytes are just unused, so being able to "expand" an allocation (if possible) is a good idea.

In my experience huge amounts of CPU time can be wasted by copying data all over the place only because a simplistic approach was taken for managing heap storage.

Having said all that I do recognize too that IO calls are also a cost and in tests I did on Windows some years ago, a single socket write of 1000 bytes was much less OS CPU than ten socket writes 100 bytes took in total, so minimizing the number of distinct IO calls is also an important parameter.
 
Last edited:
Top