Storing big data

Thread Starter

Gajyamadake

Joined Oct 9, 2019
310
In C, how do we store data for anywhere from one person to 2 million people?

Let's say we need to write a program that stores only the age and gender of a person.

We can declare two variables:

int age = 18;
char gender = 'M';


We can choose between an array and a structure, and I think we can also use a class to store large data in C++.


What data type do we need for a program that stores the name, age and gender of the people living in a colony?
What data type do we need for a program that stores the name, age and gender of the people living in a city?
What data type do we need for a program that stores the name, age and gender of the people living in a state?
What data type do we need for a program that stores the name, age and gender of the people living in a country?
 

BobTPH

Joined Jun 5, 2013
8,813
How you store the data is all wrapped up with how you need to access it. There are entire university courses on the topic (the one I took was called "Data Structures"). It is not something you can cover in a forum post.

Bob
 

Thread Starter

Gajyamadake

Joined Oct 9, 2019
310
I am asking for a suggestion: what would an experienced programmer prefer, an array, a structure, a class, or something else?

Let's look at two cases:

1. What data type do we need for a program that stores the name, age and gender of the people living in a city?

2. What data type do we need for a program that stores the name, age and gender of the people living in a country?

Do you think a structure is the best choice?
 

Papabravo

Joined Feb 24, 2006
21,159
You are making the default assumption that there is a single correct, easy, and obvious answer. That is far from true. For example, if we are aiming for extreme flexibility, I would argue for a single array of wide characters containing a variable-length, delimited string with the required data. Now I am more than certain that I will get pushback from one or more members, and that would just confirm my thesis.
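To make the delimited-string idea concrete, here is a minimal sketch in C of pulling one field out of such a record. The function name get_field, the '|' delimiter and the name|gender|age field order are assumptions made up for this illustration; nothing in the thread prescribes them.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies field number `index` (0-based) of a
   delimiter-separated wide-character record into `out`.
   Returns the field length, or -1 if the field is missing or does not fit. */
long get_field(const wchar_t *record, wchar_t delim, size_t index,
               wchar_t *out, size_t out_size)
{
    assert(record != NULL && out != NULL && out_size > 0 && delim != L'\0');

    /* Skip `index` delimiters to reach the start of the wanted field. */
    const wchar_t *start = record;
    for (size_t i = 0; i < index; i++) {
        start = wcschr(start, delim);
        if (start == NULL)
            return -1;              /* record has too few fields */
        start++;                    /* step past the delimiter */
    }

    /* The field ends at the next delimiter or at the end of the record. */
    const wchar_t *end = wcschr(start, delim);
    size_t len = end ? (size_t)(end - start) : wcslen(start);
    if (len >= out_size)
        return -1;                  /* no room for the field plus terminator */

    wmemcpy(out, start, len);
    out[len] = L'\0';
    return (long)len;
}
```

With the record L"Asha|F|30", asking for field 2 copies "30" into the output buffer. The flexibility comes at a price: every access has to scan the string, and every numeric field has to be converted from text before it can be used.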
 

bogosort

Joined Sep 24, 2011
696
In the real world, there's only one answer: use a database. Given the constraints, this seems to be a homework assignment designed to have the student consider the scaling impact of the various built-in data structures.
 

nsaspook

Joined Aug 27, 2009
13,086
@nsaspook @Papabravo

I think I asked the question in the wrong way. As far as I know, we can declare a single variable, an array, a structure, a union, a class and others to store information in memory.

I was thinking that if there is only one person, I should declare only one variable.

If there are five people in a family, we should declare an array.

If there are 20 thousand people in a city, we should declare a structure.

If there are 1 million people in a state, we should declare a class.

I might be wrong, but this is how I think about choosing the data type for a given requirement.
Your basic data array/structure/class for each person should be the same (excluding indexing and accessing data) for 1 person or two million people. The question is how to store those X person records for actual use in a program. IMO this is more a database (internal/external) construction question (how to add and use indexing/access data), not a general programming question of memory storage and organization types.
 

Papabravo

Joined Feb 24, 2006
21,159
Granted, but that's only what he thinks, not the reality. Two million possible records, each with individual personal-description tuples, is not a problem normally solved with a general-purpose language unless his goal is to design a new database program.
https://en.wikipedia.org/wiki/Relational_model
I agree, but giving him the answer in this way is the equivalent of doing his homework for him. I thought we generally discourage that. He is obviously not a professional and I doubt he has access to a commercial-grade database software suite. Even if he does, his question suggests that he needs to learn a great deal more than he knows at this point. Let him discover things by doing them. It never hurt me taking that path.
 

nsaspook

Joined Aug 27, 2009
13,086
Your point that he needs to learn a great deal more is why this is not just giving him the answer. It will hopefully point him in the right direction to find a good answer.
 

Papabravo

Joined Feb 24, 2006
21,159
In the dawn of computer time (ca. 1970 when Unix time begins) many primitive "databases" were constructed as text files with variable length records separated by a pair of characters known as "CRLF" or "CRNL". The CR is "Carriage Return" and LF or NL is "Line Feed" or "NewLine". This usage should be understood in the context of early printers attached to computers being more like typewriters than today's laser and inkjet printers. In that era a powerful tool for searching this primitive database called "grep" was developed. The name comes from the ed editor command g/re/p, "globally search for a regular expression and print". It was used to scan "large", by the standards of the day, text files for all lines that matched an easily constructed pattern. If each line in the primitive "database" contained a zip code, it was easy to make a list of records that had the zip code 34432 without writing a special piece of software to do it.

$ grep 34432 ./database.txt

would print all lines with 34432 in the line from the file "database.txt" in the current directory.

A couple of observations:
  1. The time to search the database is a linear function of the total number of characters in the file.
  2. The items in the database are not stored in any particular order. If you need to sort them, it may take a while if the database is large and the keys involve multiple fields.
  3. The amount of time it takes to find a particular record is highly variable, but at least it is bounded.
  4. There is no obvious way to efficiently update and maintain such a database as it gets large.
This is really where you can begin to investigate the issues that you are hinting at.
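To see observation 1 in code, here is a rough sketch in C of what a grep-like scan has to do. It assumes one record per line and plain substring matching rather than regular expressions, and the function name count_matching_records is invented for this example; it is not how grep itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the newline-separated records in `db`
   that contain `pattern` as a plain substring. Every character of the
   "database" is visited, so the cost grows linearly with its size. */
size_t count_matching_records(const char *db, const char *pattern)
{
    assert(db != NULL && pattern != NULL);
    size_t pattern_len = strlen(pattern);
    size_t matches = 0;

    while (*db != '\0') {
        /* Find the end of the current record (line). */
        const char *end = strchr(db, '\n');
        size_t len = end ? (size_t)(end - db) : strlen(db);

        /* Naive substring search confined to this record. */
        for (size_t i = 0; i + pattern_len <= len; i++) {
            if (memcmp(db + i, pattern, pattern_len) == 0) {
                matches++;
                break;              /* count each record at most once */
            }
        }

        db += len;                  /* move to the end of the record */
        if (*db == '\n')
            db++;                   /* and past the line terminator */
    }
    return matches;
}
```

There is no shortcut here: without an index or a sort order, finding the records for one zip code means reading all of them. That is the cost a real database avoids by keeping an index.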
 

MrChips

Joined Oct 2, 2009
30,712
You need to look at the big picture. It shouldn't matter whether the database has one entry or one billion entries.
You may at first think that you can create a solution for single person and then scale it up. But is it scalable?
Think of the big picture first. What do you need to do with the information? How can you create, manage, maintain, and search the database efficiently?

This is really a database problem, not a programming problem.
 

djsfantasi

Joined Apr 11, 2010
9,156
Database problem; programming problem. They are the same problems.

I maintain that a program cannot be efficiently developed unless the data is strongly defined. Thus, the data drives the program and not the other way around.

Take a look at the links on this site that I’ve written for a custom language. The basic record consists of four values. I implemented this as four arrays, but it could easily have been implemented as a structure. I treated other constructs similarly. Besides the coding structure, I have a multitasking structure, a mapped structure and an execution structure. Or arrays instead of structures... Hence my initial statement, “database problems are the same as programming problems”!
IMHO, @MrChips has the best answer to this post.

It’s not what you think you need to do; it’s what you must do, that’s the issue.

So, I suggest to the thread starter that he becomes 100% sure of what he needs to do. Then, using whichever of the options available to him (arrays/structures/classes) he is most comfortable with, he designs the data structures before the code.

This may be an iterative process. I added an additional element during this process because it made the code easier.

In closing, define the DATA before the CODE, and you’ll likely arrive at the best solution.
 

nsaspook

Joined Aug 27, 2009
13,086
This is why you use a database system that's been tested instead of writing your own indexing code for things that are important. A programmer assumed that the result of Python's "glob" function is sorted, when in fact it is not.
https://www.vice.com/en_us/article/...sed-errors-in-more-than-100-published-studies

The code in question depended on the OS to sort files in a specific order on the physical disk. However, that order changes by file and operating system.
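The same trap exists in C, where directory listings also come back in no guaranteed order. A minimal sketch of the defensive habit, sorting explicitly with qsort before relying on any order, could look like this. The function name sort_names and the example file names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: each element is a pointer to a C string. */
static int compare_names(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Hypothetical helper: puts `names` into byte-wise (strcmp) order, so
   later code never depends on the order the OS happened to return. */
void sort_names(const char **names, size_t count)
{
    assert(names != NULL);
    qsort(names, count, sizeof *names, compare_names);
}
```

Note that strcmp order is byte-wise, not numeric: "run_10.out" sorts before "run_2.out". Deciding which order you actually need is part of defining the data.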
 

bogosort

Joined Sep 24, 2011
696
I was looking one real world example for array/structure/class so I decided to store information of one or more persons
You'll receive better help the more specific you are about what you're trying to do and with what you're having problems. From the quote above, it seems like your real goal is to learn about arrays/structures/classes, using a list of people as example data. Is that accurate?

In your example, a person is associated with a name, a gender, and an age. The most natural data types for these are a string (or array of char), a char (or boolean), and an integer. Considering that the elements of an array, by definition, must have the same data type, an array doesn't seem like a good fit for the given data.

A structure, on the other hand, groups multiple data types into a single record, which is exactly what you're trying to do. To store the records of a list of people, a common paradigm is to use an array of structures: each element in the array is the same type of "Person" structure, each of which stores the information for a single person. There are other ways of doing this, including a structure of arrays, wherein a single structure represents the entire list; to handle more than one person, each element in the structure is an array.
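As a rough illustration of the two layouts just described, here is a sketch in C. The type names Person and People, the tiny capacity and the field sizes are assumptions chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEOPLE 4    /* tiny capacity, chosen only for illustration */
#define NAME_LEN   32

/* Array of structures (AoS): one complete record per person. */
struct Person {
    char name[NAME_LEN];
    char gender;        /* 'M' or 'F', as in the thread's example */
    int  age;
};

/* Structure of arrays (SoA): one array per field; the same index in
   each array refers to the same person. */
struct People {
    char name[MAX_PEOPLE][NAME_LEN];
    char gender[MAX_PEOPLE];
    int  age[MAX_PEOPLE];
};

/* Sum of ages, AoS layout: each step also strides over name and gender. */
long total_age_aos(const struct Person *list, size_t count)
{
    assert(list != NULL);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += list[i].age;
    return total;
}

/* Sum of ages, SoA layout: the ages sit next to each other in memory. */
long total_age_soa(const struct People *people, size_t count)
{
    assert(people != NULL && count <= MAX_PEOPLE);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += people->age[i];
    return total;
}
```

Both functions return the same answer for the same people. The difference is in how the data is laid out: the array of structures keeps everything about one person together, while the structure of arrays keeps each field together.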

These organizational paradigms -- whether you use an array of structs, a struct of arrays, or something else entirely -- have their own pros and cons in terms of memory usage, speed of access (cache locality), and programmatic ease of access. Regardless of your choice, you'll need to write functions that let you insert, edit, delete, and display the data. There are other considerations, too. For example, a person's age isn't static, it will change on their birthday. In this light, perhaps the age field should be replaced by a birthday field, with a helper function that returns the person's current age based on the current date.
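The birthday idea can be sketched in C as follows. The Date type and the function name age_on are hypothetical, and the sketch takes today's date as a parameter instead of reading the clock; a real program could fill it in from time() and localtime() in <time.h>.

```c
#include <assert.h>

/* Hypothetical calendar date; no full validation is attempted. */
struct Date {
    int year;
    int month;  /* 1..12 */
    int day;    /* 1..31 */
};

/* Age in whole years on `today` for someone born on `birth`. */
int age_on(struct Date birth, struct Date today)
{
    assert(birth.month >= 1 && birth.month <= 12);
    assert(today.month >= 1 && today.month <= 12);

    int age = today.year - birth.year;

    /* If this year's birthday has not happened yet, subtract one year. */
    if (today.month < birth.month ||
        (today.month == birth.month && today.day < birth.day))
        age--;

    return age;
}
```

Storing the birthday means the record never goes stale; the age is computed when it is needed rather than updated once a year.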

Such helper functions can be integrated into the structure itself with function pointers, but as the complexity and scope of the structure grows, it becomes more difficult to manage from a programming point of view. This is where classes shine. A class is essentially a structure with some extra behind-the-scenes stuff added in to help the programmer deal with code complexity. Most notably, in the context of your example, is inheritance: you may, for example, decide to further organize the people in your list according to their roles, using role-specific subclasses that inherit from the generic Person class.
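For comparison with a class, here is a minimal sketch of a C structure that carries its own helper through a function pointer. All names are invented for the example, and to keep it short the age is computed from the birth year only, ignoring month and day.

```c
#include <assert.h>
#include <stddef.h>

struct Person {
    const char *name;
    int birth_year;
    /* "Method": called as p.age_in(&p, year). Unlike a C++ class,
       C passes no implicit `this`, so the object is passed by hand. */
    int (*age_in)(const struct Person *self, int year);
};

/* The function the pointer will refer to. */
static int person_age_in(const struct Person *self, int year)
{
    assert(self != NULL);
    return year - self->birth_year;
}

/* Hypothetical "constructor": fills in the data and wires up the helper. */
struct Person person_make(const char *name, int birth_year)
{
    struct Person p = { name, birth_year, person_age_in };
    return p;
}
```

A call looks like p.age_in(&p, 2024). Every record carries the pointer, and every new helper means another member to initialise, which is the bookkeeping a class takes off the programmer's hands.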

Hopefully you can see that there is no single "right" choice when it comes to organizing data in memory. If you're trying to learn about the various options, I highly recommend implementing all of them to get a first-hand feel of the advantages and disadvantages of each. You should even try to do it just using arrays. One possible way is to have an array of strings for the names, an enum for the gender, an integer representation of the birthday, and an integer "id" field. Each of these can be mapped to a single unique integer, which gets stored in the "persons" array of integers, indexed by the "id" field.
 

MrSoftware

Joined Oct 29, 2013
2,188
To be very general: define a container for your data to represent one person. Put whatever data you want into it. Use a class, nested classes, struct, nested structs, whatever; there are probably at least a hundred ways to skin the cat. The first step is defining exactly what data you want to store. Let's make it really basic for a moment. We want name, address and age. All you get is 128 chars for name, 256 for address and a single int for age. These can all be dynamic if you want, but you must decide before you start coding. For our simple case, let's use a basic struct:

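typedef struct personInfo
{
    char name[128];     /* the array size follows the member name */
    char address[256];
    int age;
} personInfo;           /* the typedef lets plain C write personInfo without "struct" */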

To overgeneralize: if you want to work with a few people, maybe make an array on the stack:

personInfo fewPeople[10];

If you want to work with many people, maybe allocate it on the heap:

personInfo* manyPeople = (personInfo*)malloc(1000 * sizeof(personInfo));
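If the number of people is not known in advance, the heap array can grow as records arrive. The following is a minimal sketch, not a tuned implementation: PersonList, person_list_push and person_list_free are names invented here, the starting capacity of 16 is arbitrary, and overflow of the size calculation is not checked.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct personInfo {
    char name[128];
    char address[256];
    int age;
} personInfo;

/* Hypothetical growable list of people stored on the heap. */
typedef struct {
    personInfo *items;
    size_t count;       /* records in use */
    size_t capacity;    /* records allocated */
} PersonList;

/* Appends a copy of *p. Returns 0 on success, -1 if memory runs out. */
int person_list_push(PersonList *list, const personInfo *p)
{
    assert(list != NULL && p != NULL);
    if (list->count == list->capacity) {
        /* Double the capacity so repeated pushes stay cheap on average. */
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        personInfo *grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;  /* the old block is still valid and unchanged */
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count++] = *p;
    return 0;
}

/* Releases the storage and resets the list to empty. */
void person_list_free(PersonList *list)
{
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

A list starts out zeroed (PersonList list = {0};), and the return value of every push should be checked, because an allocation for a large number of records can fail.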

If you want to work with a whole lot of people, more than you have memory for, you're going to need to store the data in some sort of persistent storage, such as a file or database.

In reality, you want to store this data in a file or database of some sort anyway, otherwise your data will all be lost when the program terminates.
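As a minimal sketch of the file option, the records can be written and read back with fwrite and fread. The function names save_people and load_people are made up for this example, and the caller is assumed to know how many records the file holds. Dumping raw structs like this is only safe when the same program on the same kind of machine reads the file back, since padding and byte order can differ between compilers and platforms.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct personInfo {
    char name[128];
    char address[256];
    int age;
} personInfo;

/* Writes `count` records to `out`. Returns 0 on success, -1 on a short write. */
int save_people(FILE *out, const personInfo *people, size_t count)
{
    assert(out != NULL && people != NULL);
    return fwrite(people, sizeof *people, count, out) == count ? 0 : -1;
}

/* Reads exactly `count` records from `in` into a newly allocated array.
   Returns NULL on allocation failure or a short read; the caller must
   free() the result. */
personInfo *load_people(FILE *in, size_t count)
{
    assert(in != NULL && count > 0);
    personInfo *people = malloc(count * sizeof *people);
    if (people == NULL)
        return NULL;
    if (fread(people, sizeof *people, count, in) != count) {
        free(people);
        return NULL;
    }
    return people;
}
```

The file would be opened in binary mode, for example with fopen(path, "wb") for saving and fopen(path, "rb") for loading. Searching, updating and deleting records in such a file is where the hand-written approach starts to turn into writing your own database.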
 