This is a follow-up to Jumping Into C++, a first-hand account of my experiences writing my first modern C++ application. It was a simple exercise that turned out to be not so simple, and much more educational than I imagined, thanks to the community.
And now, in no particular order, a few thoughts.
About not using regular expressions. I avoided regular expressions early in the project because I was leery of the additional complexity they might introduce. Not complexity from the regular expression language itself (\w+ and so on) but complexity introduced by the C++ plumbing required to use a regular expression. I didn't research the issue before nixing them; I simply nixed them. I was worried about getting stuck in what I assumed would be extra complexity.
Fear is not the reason to eliminate a potential solution.
On the plus side, avoiding regular expressions gave me the opportunity to learn how to remove characters from a std::string and, in a subsequent version, the joys of lambdas. It also taught me that even with short example projects, correctness matters. I should have had a more rigorous definition of "word" and implemented a slightly better solution, one a regular expression delivers concisely.
About the speed of Community. MattPD posted a commented alternative version a little over an hour after the original post. In the first twelve hours, many more shared alternative versions and observations, including Diego Dagum, Stephan Lavavej, Carl Daniel, Ivan, Daniel Earwicker, Mike, and others. By Tuesday afternoon there were at least five different versions, including one in C# and two that used memory-mapped files, plus many more comments on the blog and elsewhere. Comments were thoughtful and there was no trolling, not even when DotNet quipped, "There is a better way. Jumping into C#!"
The C# part of me agreed 100%.
Community is awesome.
About the different solutions. In broad strokes, all solutions made changes to the same core functionality, namely file input and word parsing.
To handle file input, the original program used iostream, a filename check to weed out non-text files, and a convoluted loop to check for failure on the initial read from the file. The code worked for normal cases but failed to weed out some non-text files. Enhancements include:
To parse a word, the original program read a string from the input stream and hacked out certain punctuation characters using a custom function. It left room for improvement but met the spirit (*cough* weasel word *cough*) of the requirements. Enhancements include:
The awesomest part about these solutions is watching how each evolved from previous discussions and versions. It's the journey, not the destination!
About scaring off potential new C++ developers. As the original program evolved, there were discussions about variable declaration best practices, performance tradeoffs for different statements and libraries, which arguments are passed at the command line, how memory is allocated in C++ versus C#, memory-mapped files, long long (as in a guaranteed 64 bits, not a galaxy far, far away), and parallel tasks.
From the outside, it is intimidating. What, can't those C++ folks agree on anything? How hard can it be? In my favorite language, it would only be two lines (if that)! From the inside, it is a technical exercise. It is about precision and developer personality. How can the code be improved? What is the proof? Can it be simpler?
I think the discussion is helpful for newbies; learning takes time and no code is perfect.
About optimization. Optimization was beyond the scope of the original exercise. Those who offered enhanced versions had experience profiling code, likely profiled the original and subsequent enhancements, and used their experience with C++ to create faster versions. I like to think that my version was just fine.
I remember some advice from a grizzled veteran developer: get a reasonable version of your code working before profiling. If you do, you will spend time optimizing where it makes the most difference, not in the places you think matter but probably do not.
About reading a file using iostream. Don't. It is slower and harder to manage for complicated file formats. This application was trivial, but I will avoid iostream in the future.
About the next exercise. My journey to C++ Ninjahood continues. Next time, I'll write a class and likely come to terms with copy versus move semantics and all that jazz. I also want to look at how a lack of domain knowledge can make an example program much more difficult to understand even where the underlying C++ is standard, basic stuff. Word counting does not take much domain knowledge; 3D object manipulation for real-time games does. Kinematic equations fall somewhere in between.
Was this article helpful? Let us know in the comments, Twitter, or Facebook.
What will you use instead of iostream?
One possible reason (implicit flush) for slow iostream:
> About reading a file using iostream. Don't. It is slower and difficult to manage for complicated file formats.
I could not disagree more. It's not (much) slower if you call `std::ios_base::sync_with_stdio(false)` first, and more importantly, it's *type-safe* – throwing away type-safety in C++ is the absolute last thing you want to do. If performance/complexity is that much of a concern, that's what Boost.Spirit is for. :-]
(And I know people will be inclined to balk at Spirit as a solution to complexity, but don't bother – after the initial learning curve, it's extremely easy to maintain very complicated file formats, both generation and parsing, and the runtime performance is unparalleled.)
@ildjarn have you actually benchmarked this? Because every test I've made has shown iostreams to be monstrously slow (easily 5x slower than stdio). Of course, often performance doesn't matter, and type-safety is very nice, but the iostreams API is also so, well, messed up that I don't think safety is much of an argument for using it. There's far too much to trip you up, far too much you can get wrong. It's just not a very good library, and as far as performance goes, the half-arsed "let's not even *try* to make it fast" implementation certainly doesn't help. (In fairness, it's not just the MSVC implementation that's slow. The same can be said of GCC's. I haven't tested Clang's, but I wouldn't be surprised to see that perform like crap as well.)
It's not a bad idea to have concrete performance requirements before dropping a meant-to-be-used library in favor of another one just because the latter is faster.
The risk otherwise is overcomplicating the approach to get a benefit that could be more apparent than real.
@ildjarn: another quite important point: sync_with_stdio does nothing for file manipulation. It disables synchronization of the *standard* streams (stdin/stdout/stderr), but for file streams, it makes zero difference. If you think that setting sync_with_stdio to false magically solves all the performance problems with iostream, it sounds like cargo-cult programming to me. You haven't investigated whether what you're doing actually has an effect, you're just assuming it makes the problem go away. :)
Update: Stephan (STL) talks about do/while in his latest Core C++ video on Channel 9. Check it out at channel9.msdn.com/.../Stephan-T-Lavavej-Core-Cpp-8-of-n.