Reading characters from a text and other types of files in C and C++

Thread Starter

ArakelTheDragon

Joined Nov 18, 2016
1,362
Hi!

I made this program to count characters:
Code:
int CharacterCount()
{
  /* Local variables. */
  using namespace std;

  ifstream  fin(TextSaved);  /* Read from the file given in the "case WM_CREATE" .*/
  /* Local variables. */
  char ch;

  iAll = c = Blanks = Tabs = NewLines = 0;  /* Global variables need to be cleared. */

  while(fin)
  {
    fin.get(ch);  /* Get the next character from the file. */
    if((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9'))/* If ch is "a to z" or "A to Z" or "0 to 9". */
      c++;
    else
      if(ch== ' ')
        Blanks++;
    if (ch == '\t')
      Tabs++;
    if (ch == '\n')
      NewLines++;
  }

  iAll = c + Blanks + Tabs + NewLines;

  return '0';
}
Problem_0: The last character is counted twice, no matter whether it is a character, newline, blank, or tab.
Problem_1: When I try to read an ".odt" (OpenOffice Writer) file, I get +200 characters.

Can anyone give me a hint as to why this happens?
 

WBahn

Joined Mar 31, 2012
29,976
It would be helpful if you would post reasonably formatted code.

My guess is that your problem is that fin doesn't become NULL until AFTER it has attempted to read past the end of the file. What does fin.get() return if you are reading past the end of the file? My guess is that it's not doing anything, so you have the same situation you did when you read the last character in the prior pass.
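Here is a minimal sketch of that mechanism. It reads from an in-memory std::istringstream instead of the OP's file so it can be tried anywhere, and the function names are made up for illustration. The first loop tests the stream before reading, as the original does; the second tests the result of get() itself.

```cpp
#include <sstream>
#include <string>

// Mirrors the original loop: test the stream, then read, then count.
// The final get() fails at end of input, leaves ch unchanged,
// and the stale character is counted a second time.
int count_while_stream_ok(const std::string& text)
{
    std::istringstream in(text);
    char ch = '\0';
    int count = 0;
    while (in)
    {
        in.get(ch);
        ++count;  // runs even when get() just failed
    }
    return count;
}

// Tests the result of get() itself: the body only runs when a
// character was actually extracted.
int count_while_get_succeeds(const std::string& text)
{
    std::istringstream in(text);
    char ch;
    int count = 0;
    while (in.get(ch))
    {
        ++count;
    }
    return count;
}
```

With the input "abc", the first function returns 4 and the second returns 3: the failed final get() leaves ch holding 'c', which the first loop counts again.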

How does the value of ch get changed? It appears to be a local variable that is being passed to fin.get(), so how does the value get updated?

Is there some reason that you have so many global variables?

Is there a reason why you have the check for blanks in an else clause but not the checks for tabs and newlines?

What do you mean you get + 200 characters? It would really help if you gave more meaningful descriptions.
 

Thread Starter

ArakelTheDragon

Joined Nov 18, 2016
1,362
The code was formatted, but copy-pasting it or the "code" tag changed it like this.

They are all in the else except the characters. If it's not a character, check what it is.


If I read a ".txt" file, I get the real characters +1.
If I read an ".odt" file, I get the real characters +200.
 

WBahn

Joined Mar 31, 2012
29,976

The code was formatted, but copy-pasting it or the "code" tag changed it like this.
You need to always go in and add in the spacing when you paste something into the response box (depends somewhat on the browser you are using). It's a royal pain, but it's really important.

They are all in the else except the characters. If it's not a character, check what it is.

Uh... nope, they aren't. Only the first one is.

Remember, C doesn't give a flip about how you indent things. If you want more than one statement in the else clause, then they need to be made a single compound statement by enclosing them in curly braces.

But is there any need for an 'else' clause at all? Aren't all of those things mutually exclusive?
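A sketch of the counting loop with braces, an else-if chain, and the counts returned in a struct instead of global variables (CharCounts and count_chars are hypothetical names, not from the original program). Like the original range checks, it assumes an ASCII-compatible character set where 'a' to 'z' and 'A' to 'Z' are contiguous.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts returned by value instead of being kept in global variables.
struct CharCounts
{
    long alnum = 0;     // 'a'-'z', 'A'-'Z', '0'-'9'
    long blanks = 0;    // ' '
    long tabs = 0;      // '\t'
    long newlines = 0;  // '\n'

    long all() const { return alnum + blanks + tabs + newlines; }
};

CharCounts count_chars(std::istream& in)
{
    CharCounts counts;
    char ch;
    while (in.get(ch))  // true only when a character was actually read
    {
        // The categories are mutually exclusive, so at most one branch runs.
        // Braces make the extent of each branch explicit.
        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9'))
        {
            ++counts.alnum;
        }
        else if (ch == ' ')
        {
            ++counts.blanks;
        }
        else if (ch == '\t')
        {
            ++counts.tabs;
        }
        else if (ch == '\n')
        {
            ++counts.newlines;
        }
        // Anything else (punctuation, other bytes) is not counted.
    }
    return counts;
}

// Convenience overload for in-memory text.
CharCounts count_chars(const std::string& text)
{
    std::istringstream in(text);
    return count_chars(in);
}
```

For example, count_chars(std::string("ab 1\t\n!")) reports 3 letters/digits, 1 blank, 1 tab, and 1 newline; the '!' falls through every branch and is not counted.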

If I read a ".txt" file, I get the real characters +1.
If I read an ".odt" file, I get the real characters +200.
What do you consider "real" characters?

Have you opened a .odt file in a text editor and counted them manually (for a small file), or looked to see what is in there besides the characters you see on the screen? Hint: it's a ZIP archive of XML files, so that should tell you something about what is in there that you might not be taking into account.
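Strictly speaking, a .odt is a ZIP package whose members (such as content.xml) are XML, and ZIP archives normally begin with the local-file-header signature "PK\x03\x04". A quick sketch for checking that from code (the function names are hypothetical): if the signature is present, the bytes a character-counting loop sees are mostly compressed data, not the document's text.

```cpp
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream, for trying this on in-memory bytes
#include <string>

// True if the stream begins with "PK\x03\x04", the signature that starts
// a ZIP archive's first local file header.
bool starts_like_zip(std::istream& in)
{
    char sig[4] = {};
    if (!in.read(sig, 4))
        return false;  // fewer than 4 bytes available
    return sig[0] == 'P' && sig[1] == 'K' && sig[2] == '\x03' && sig[3] == '\x04';
}

// Opens the file in binary mode so the bytes are seen exactly as stored.
bool file_starts_like_zip(const std::string& path)
{
    std::ifstream fin(path, std::ios::binary);
    return fin && starts_like_zip(fin);
}
```

To read the actual text of a .odt, the archive would first have to be unpacked and content.xml parsed; counting bytes of the raw file says little about the document.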
 

402DF855

Joined Feb 9, 2013
271
How does the value of ch get changed?
The variable is passed by reference to ifstream::get. That's basically syntactic sugar in C++, so you don't have to write the address-of operator (&).
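A minimal sketch of the difference (the function names are hypothetical). At the call site, set_by_value(ch) and set_by_reference(ch) look identical; only the declarations reveal that the second one can change ch.

```cpp
// Takes a copy: assigning to c changes only the local copy.
void set_by_value(char c) { c = 'X'; }

// Takes a reference, like istream::get(char&): assigns to the caller's variable.
void set_by_reference(char& c) { c = 'X'; }

// C-style equivalent: the caller has to write &ch, so the change is visible at the call site.
void set_by_pointer(char* c) { *c = 'X'; }
```

Called as set_by_value(ch), set_by_reference(ch), and set_by_pointer(&ch), only the last two change ch.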

Check for end of file when reading the next character.
C:
if (!fin.get(ch))  /* Get the next character; stop at end of file (or on a read error). */
    break;
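If you do want to compare against EOF (the C idiom), use the no-argument overload istream::get(), which returns an int like C's fgetc(); get(ch) returns the stream itself, so its result should be tested as a bool instead. A sketch (count_with_int_get is a hypothetical name):

```cpp
#include <cstdio>   // EOF
#include <istream>
#include <sstream>
#include <string>

// istream::get() with no argument returns an int, so its result can be
// compared with EOF. Storing it in an int, not a char, keeps EOF distinct
// from a real byte such as 0xFF.
long count_with_int_get(std::istream& in)
{
    long count = 0;
    int c;
    while ((c = in.get()) != EOF)
    {
        ++count;
    }
    return count;
}

// Convenience overload for in-memory text.
long count_with_int_get(const std::string& text)
{
    std::istringstream in(text);
    return count_with_int_get(in);
}
```

A byte with value 0xFF is counted as an ordinary character here, because get() returns it as 255 rather than -1.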
 

WBahn

Joined Mar 31, 2012
29,976
The variable is passed by reference to ifstream::get. That's basically syntactic sugar in C++, so you don't have to write the address-of operator (&).
Thanks. Seems like one more reason not to like C++. I take it that the compiler uses the function prototype to decide whether to pass the value or pass the reference?

Seems awfully dangerous if the programmer can't tell at a glance which is happening. They could be passing what they think is a value and then be surprised when their variable keeps changing on them.

I'm assuming this is not a new observation, so how does C++ deal with this (or, perhaps more aptly, how do C++ programmers learn to cope with it)?
 

Thread Starter

ArakelTheDragon

Joined Nov 18, 2016
1,362
I came up with a different solution that fully works (using good old, always-working "C"). By the way, this is a combination of "C" and "C++".

The character count now counts correctly from the ".txt" file. But when I open the ".odt", all I get is 201, no matter what I do. If I add or remove characters, new lines, anything, it still shows 201.

I think the bold and other formatting must be in the file too, but this should account for it:
Code:
  if((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))/* If c is "a to z" or "A to Z" or "0 to 9". */
    iC++;
This only counts the characters "a to z", "A to Z", and the digits "0 to 9". If that's not enough, I need to add something to filter out the unnecessary things, but I have no idea what it is.

EDIT:
Isn't there some method of extracting only the characters from all types of files?
 

WBahn

Joined Mar 31, 2012
29,976
I came up with a different solution that fully works (using good old, always-working "C"). By the way, this is a combination of "C" and "C++".
Which is actually one of C++'s big drawbacks: the ability to mix object-oriented and imperative programming. The usual result is a mishmash.

The character count now counts correctly from the ".txt" file. But when I open the ".odt", all I get is 201, no matter what I do. If I add or remove characters, new lines, anything, it still shows 201.
How about attaching the .odt file that you are trying to work with so that we can see what it actually has in it.

What happens if you just change the extension from .odt to .txt?

I think the bold and other formatting must be in the file too, but this should account for it:
Code:
  if((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))/* If c is "a to z" or "A to Z" or "0 to 9". */
    iC++;
This only counts the characters "a to z", "A to Z", and the digits "0 to 9". If that's not enough, I need to add something to filter out the unnecessary things, but I have no idea what it is.

EDIT:
Isn't there some method of extracting only the characters from all types of files?
What constitutes "only the characters"? If you ask five different people you are likely to get five different answers. Some might say it means only the letters, some might say it also includes the digits, some might say it also includes the punctuation characters, some might say it also includes the space, some might say it also includes horizontal tabs, some might say it also includes vertical tabs, some might say it also includes newline characters. Okay, more than five answers.
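To make this concrete, here is a sketch that counts the same text under several of those definitions using the <cctype> classification functions. ClassCounts and classify are hypothetical names, and the field comments assume the default "C" locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// One field per candidate definition of "character".
struct ClassCounts
{
    std::size_t alpha = 0;  // letters only
    std::size_t digit = 0;  // decimal digits
    std::size_t punct = 0;  // punctuation
    std::size_t space = 0;  // space, tab, newline, vertical tab, form feed, carriage return
    std::size_t print = 0;  // everything printable, including the space
};

ClassCounts classify(const std::string& text)
{
    ClassCounts c;
    for (char ch : text)
    {
        // <cctype> functions need a value representable as unsigned char (or EOF).
        const unsigned char u = static_cast<unsigned char>(ch);
        if (std::isalpha(u)) ++c.alpha;
        if (std::isdigit(u)) ++c.digit;
        if (std::ispunct(u)) ++c.punct;
        if (std::isspace(u)) ++c.space;
        if (std::isprint(u)) ++c.print;
    }
    return c;
}
```

For "Hi, 42!\n" it reports 2 letters, 2 digits, 2 punctuation characters, 2 whitespace characters (the space and the newline), and 7 printable characters, out of 8 in total. Each of those is a defensible "character count".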
 

Ian Rogers

Joined Dec 12, 2012
1,136
Why not check out the character-classification functions in <ctype.h> (<cctype> in C++)? All of this is already written: isalpha and isdigit are a couple of the many functions that take care of classifying text input!
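As a sketch of that suggestion, std::isalnum can replace the hand-written range test used earlier in the thread (count_letters_digits is a hypothetical name). The cast to unsigned char is needed because passing a negative char value to these functions is undefined behavior, which matters for the non-ASCII bytes found in binary files.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Same intent as the hand-written test
//   (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
// In the default "C" locale std::isalnum accepts exactly those 62 characters.
std::size_t count_letters_digits(const std::string& text)
{
    std::size_t n = 0;
    for (char ch : text)
    {
        // Cast first: a negative char (e.g. a byte >= 0x80 where char is
        // signed) must not be passed to isalnum directly.
        if (std::isalnum(static_cast<unsigned char>(ch)))
            ++n;
    }
    return n;
}
```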
 

dl324

Joined Mar 30, 2015
16,839
If you're using Linux, you can use the strings command. It has an option to set the minimum string length.

If you want to run on different operating systems, get the code for the strings command.
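For a rough idea of what strings does, here is a hypothetical, simplified imitation (not the GNU binutils source): it collects runs of printable ASCII characters, treating tab as printable, that are at least min_len long, and skips everything else. The default of 4 matches the documented default minimum length of strings; the real tool has many more options.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, for trying this on in-memory bytes
#include <string>
#include <vector>

std::vector<std::string> extract_strings(std::istream& in, std::size_t min_len = 4)
{
    std::vector<std::string> found;
    std::string run;
    char ch;
    while (in.get(ch))
    {
        const unsigned char u = static_cast<unsigned char>(ch);
        if ((u >= 0x20 && u < 0x7F) || u == '\t')
        {
            run += ch;  // still inside a printable run
        }
        else
        {
            if (run.size() >= min_len)
                found.push_back(run);
            run.clear();  // a non-printable byte ends the run
        }
    }
    if (run.size() >= min_len)
        found.push_back(run);  // a run that reaches the end of input
    return found;
}
```

Open the file with std::ios::binary when using this on real files, so no newline translation is applied to the bytes.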
 

Thread Starter

ArakelTheDragon

Joined Nov 18, 2016
1,362
It will take me some time to do this, thanks for the help!

There is no language better than C. Even my professors told me that, and they make software for corporations. Anything you can do in "C++" you can do in C, and it's just as easy even if it's not OOP.
 

WBahn

Joined Mar 31, 2012
29,976
While I love C, any claim that there is no language better than it (or a similar claim for ANY language) reveals a significant amount of naivety. If this were true, then all those other languages would have died in favor of that "best" language. There's a simple reason that we have so many languages that have strong market presence -- no one language can be the best for all applications.
 

402DF855

Joined Feb 9, 2013
271
Thanks. Seems like one more reason not to like C++. I take it that the compiler uses the function prototype to decide whether to pass the value or pass the reference?
Correct. Something like: istream::get(char &output);

I'm assuming this is not a new observation, so how does C++ deal with this (or, perhaps more aptly, how do C++ programmers learn to cope with it)?
It's probably considered an improvement. One could argue, as in this case, that get(ch) must use a reference or it wouldn't make sense. While I do use references frequently, I still resort to pointers when it makes sense. For instance, calling a function with a NULL pointer parameter can indicate that the parameter is optional and there's no need to do anything with it.
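A sketch of that optional-pointer idiom (count_alnum is a hypothetical name; nullptr is the C++11 spelling of a null pointer). A reference parameter must always refer to a real object, so it could not express "I don't want this output" the same way.

```cpp
#include <cstddef>
#include <string>

// Counts letters and digits in text. The blanks parameter is optional:
// pass a pointer to receive the number of spaces, or nullptr to skip it.
std::size_t count_alnum(const std::string& text, std::size_t* blanks = nullptr)
{
    std::size_t alnum = 0;
    std::size_t spaces = 0;
    for (char ch : text)
    {
        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9'))
            ++alnum;
        else if (ch == ' ')
            ++spaces;
    }
    if (blanks != nullptr)
        *blanks = spaces;  // only written when the caller asked for it
    return alnum;
}
```

Both count_alnum(text) and count_alnum(text, &blanks) are valid calls, and the & at the call site also makes it visible that blanks may be modified.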
 

MrSoftware

Joined Oct 29, 2013
2,188
If you're using ifstream, instead of using get(), have you tried using getline() and then line.length() to count the chars in the line?
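A sketch of the getline() approach (count_via_getline is a hypothetical name). One detail to watch: std::getline() consumes the '\n' at the end of each line but does not store it, so summing line.length() alone undercounts by the number of newlines.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, for trying this on in-memory text
#include <string>

std::size_t count_via_getline(std::istream& in)
{
    std::size_t total = 0;
    std::string line;
    while (std::getline(in, line))
    {
        total += line.length();
        // If end of input was not reached, getline stopped at a '\n'
        // that it consumed, so count that newline too.
        if (!in.eof())
            ++total;
    }
    return total;
}
```

For "ab\ncd" this returns 5 and for "ab\ncd\n" it returns 6, the same totals a character-by-character get() loop gives.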

It will take me some time to do this, thanks for the help!

There is no language better than C. Even my professors told me that, and they make software for corporations. Anything you can do in "C++" you can do in C, and it's just as easy even if it's not OOP.
If someone makes this statement and is serious, don't take career advice from them. Several people here, including myself, write or have at some point in their careers written "software for corporations"; I wouldn't use that as a measure of competence.

C is a tool, and it is a great tool for solving some problems, but for other problems there are much better tools. Have you ever written a website in C? Painful is an understatement. Need multiple teams working on your app, one writing the GUI and the other writing the back end? Some of the managed languages, such as C#, make this much easier than C. Need an app for your Android phone? C is going to be the hard way to make that happen. Need to automate something from a shell? There are a number of options better than C, such as Perl, Python, bash, sh, csh, PowerShell, PHP, etc.

And there are plenty of situations where C++ is preferable to C. For example, I worked on a multi-platform (Windows, Linux, OS X (PPC and x86), AIX, IRIX, Solaris (SPARC and x86)) file system library that built physical images of CDs and DVDs with multiple file systems on each. C was the best language for the library's API, but under the hood C++ was significantly better due to the complexity of the work. The features of C++ made it much easier to organize the code, which led to fewer bugs and a faster, more efficient design. I'm a big C fan, but pretending that it's the best solution for every problem would be a mistake.
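The library itself isn't shown, so here is only a hypothetical sketch of the general "C API, C++ internals" pattern (every name is made up). The C++ side uses a class and std::string; the exported functions use C linkage, C types, and an opaque handle, and they never let a C++ exception escape to the caller.

```cpp
#include <cstddef>
#include <new>
#include <string>

namespace detail {
// Internal C++ implementation: free to use classes, std::string, exceptions.
class TextCounter
{
public:
    void feed(const char* data, std::size_t len) { text_.append(data, len); }

    std::size_t letters() const
    {
        std::size_t n = 0;
        for (char ch : text_)
            if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
                ++n;
        return n;
    }

private:
    std::string text_;
};
}  // namespace detail

// Opaque handle: a C header would only declare
//   typedef struct counter_handle counter_handle;
struct counter_handle
{
    detail::TextCounter impl;
};

extern "C" {

// Returns NULL on allocation failure instead of throwing into C code.
counter_handle* counter_create(void)
{
    return new (std::nothrow) counter_handle();
}

// Returns 0 on success, -1 on bad arguments or failure.
int counter_feed(counter_handle* h, const char* data, std::size_t len)
{
    if (h == nullptr || data == nullptr)
        return -1;
    try
    {
        h->impl.feed(data, len);
        return 0;
    }
    catch (...)
    {
        return -1;  // never let a C++ exception cross the C boundary
    }
}

std::size_t counter_letters(const counter_handle* h)
{
    return h != nullptr ? h->impl.letters() : 0;
}

void counter_destroy(counter_handle* h)
{
    delete h;
}

}  // extern "C"
```

A C caller sees only the four functions and an incomplete struct type, so the C++ internals can change without breaking the API.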
 