Difference between .ignore(…) and getline(…) for skipping lines in C++
Here I want to compare .ignore() and getline() and see which one is more restrictive. If we want to skip a few lines before reading the actual data, there are two main methods we could use: .ignore(...) and getline(...).
inFile.open(FILE_NAME);
.ignore(..)
inFile.ignore(LONG_MAX, '\n'); // Ignore the first line
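With ignore, characters are discarded until either the number of characters given as the first argument has been skipped OR the delimiter ('\n') is reached, whichever comes first. Here, we passed LONG_MAX as a very large character count.
Where long is 64 bits wide (for example, on 64-bit Linux), LONG_MAX is about 9.2 × 10^18, so a single line of LONG_MAX characters would be more than 1 million terabytes (TB)! Where long is only 32 bits wide (as with Visual C++ on Windows), LONG_MAX is about 2 × 10^9, roughly 2 GB; std::numeric_limits<std::streamsize>::max() is the portable way to tell ignore not to limit the count at all.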
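To make the "count OR delimiter" rule concrete, here is a minimal sketch. The helper restAfterIgnore is a hypothetical name introduced only for illustration, and a std::istringstream stands in for the file stream used in this post.

```cpp
#include <ios>
#include <iterator>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): calls ignore(count, '\n')
// on a stream built from `text` and returns whatever is left unread.
std::string restAfterIgnore(const std::string& text, std::streamsize count) {
    std::istringstream in(text);  // stands in for inFile
    in.ignore(count, '\n');       // stops at `count` characters or after '\n'
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a small count, ignore stops in the middle of the line; with a large count (or std::numeric_limits<std::streamsize>::max()), it stops right after the newline, so the next read starts at the second line.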
getline(..)
string junk;
getline(inFile, junk);
With getline, a junk string is created and used to STORE one line of input. The string lives in memory, so if the first line has n characters, we need at least n+1 bytes of memory.
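The storage cost can be seen directly by checking the size of the throwaway string after the call. This is only a sketch: skipLineWithGetline and junkSizeFor are hypothetical helpers, and an in-memory stream replaces the file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: skips one line with getline() and reports how many
// characters had to be stored in the throwaway string to do so.
std::size_t skipLineWithGetline(std::istream& in) {
    std::string junk;
    std::getline(in, junk);  // the whole line is copied into junk
    return junk.size();
}

// Builds a first line of `lineLength` characters followed by a short second
// line, then measures how much getline() stored while skipping the first one.
std::size_t junkSizeFor(std::size_t lineLength) {
    std::istringstream in(std::string(lineLength, 'x') + "\nsecond line\n");
    return skipLineWithGetline(in);
}
```

The stored size grows one-for-one with the length of the skipped line, which is exactly the cost that .ignore() avoids.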
Comparison
At first glance, getline() does not seem to restrict the number of characters. Compared with .ignore(), however, getline() tries to STORE the first line in memory, while .ignore() simply IGNORES characters until it hits the delimiter or its count limit.
Let's see what happens in reality. If these assumptions are correct, then for getline() to read LONG_MAX characters and store them in the junk variable, we would need more than 1 million TB of main memory!
Creating test file (37 GB)
I only have 16 GB of RAM in my computer, so I created a 37 GB file on it (two lines: the first is 37 GB long and the second contains a short string).
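A file like this can be generated without holding the long line in memory by writing it in fixed-size chunks. The following is a sketch under my own assumptions, not the exact code used for the test: writeTestFile, its parameters, and the 1 MiB chunk size are all hypothetical choices.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: writes a two-line file made of `firstLineBytes` filler
// characters, a newline, and then `secondLine`. Returns false if the file
// could not be opened or a write failed.
bool writeTestFile(const std::string& path, std::uint64_t firstLineBytes,
                   const std::string& secondLine) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;

    const std::string chunk(1 << 20, 'a');  // 1 MiB of filler, reused each pass
    std::uint64_t remaining = firstLineBytes;
    while (remaining > 0 && out) {
        const std::uint64_t n = std::min<std::uint64_t>(remaining, chunk.size());
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        remaining -= n;
    }
    out << '\n' << secondLine << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

For a first line of roughly 37 GB, the size argument would be something like 37ULL * 1024 * 1024 * 1024; a few megabytes is enough to try the function out.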
The created file:
Comparing both methods
Then I tried both methods:
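The two approaches can be sketched as a pair of functions that each skip the first line and then try to read the second. The function names and the bool-plus-output-parameter interface are hypothetical, and the ignore version uses std::numeric_limits<std::streamsize>::max() rather than LONG_MAX so that the count is unlimited on every platform.

```cpp
#include <fstream>
#include <limits>
#include <string>

// Skips the first line with ignore(), then reads the second line into `out`.
// Returns false if the file cannot be opened or the second line cannot be read.
bool secondLineViaIgnore(const std::string& path, std::string& out) {
    std::ifstream in(path);
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return static_cast<bool>(std::getline(in, out));
}

// Skips the first line by storing it in a junk string, then reads the second
// line into `out`. Returns false on any stream failure.
bool secondLineViaGetline(const std::string& path, std::string& out) {
    std::ifstream in(path);
    std::string junk;
    std::getline(in, junk);  // the entire first line is held in memory here
    return static_cast<bool>(std::getline(in, out));
}
```

Returning the stream state matters here: if skipping the first line leaves the stream in a failed state, the second read does nothing, and the caller can see that instead of silently getting an empty string.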
Results
.ignore()
Using ignore, the program used less than 1 MB of memory (920 KB), finished in around 20 seconds, and read the second line successfully. I captured these numbers while the application was running, using the Visual Studio performance profiler.
getline()
The getline method consumed memory quite quickly! Within four seconds it reached 1 GB of memory, and then … it stopped reading. The result? Nothing! It did not read the second line at all. It seems it simply gave up, and the input stream was probably left in a broken state.
Conclusion
Although it may seem that .ignore() cannot handle very long lines and is more restricted than getline(), real-world testing showed that in practice getline() could not even skip a 37 GB line on a computer with 16 GB of memory. .ignore() is the more appropriate method for skipping very long lines, up to 1 million terabytes (in theory at least).
This makes sense from a memory point of view as well. If we do not need a few lines of input, we should IGNORE them rather than STORE them in memory using getline().
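For the common case of skipping a few lines rather than one, the same idea extends to a small loop. This is a minimal sketch: skipLines and lineAfterSkipping are hypothetical helpers, and an in-memory stream is used so the example stays self-contained.

```cpp
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: discards `count` lines without storing them.
// Returns false if the stream ran out of input or failed along the way.
bool skipLines(std::istream& in, std::size_t count) {
    for (std::size_t i = 0; i < count && in; ++i) {
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return static_cast<bool>(in);
}

// Skips `count` lines of `text` and returns the next line, or an empty
// string if there is nothing left to read.
std::string lineAfterSkipping(const std::string& text, std::size_t count) {
    std::istringstream in(text);
    std::string line;
    if (skipLines(in, count)) {
        std::getline(in, line);
    }
    return line;
}
```

Memory use stays constant no matter how long the skipped lines are, because nothing is stored until the first line we actually want.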
Copyright © 2020 Baran Erfani (baranerf.github.io)