Header files and the preprocessor - Can't Live With 'em, Can't Live Without 'em



Hello, my name is Richard Russo. I'm the newest member of the Visual C++ compiler front-end developer team, having started in January. I'm not new to Microsoft, though; I spent the last three years working on Windows Vista. I'm excited to be on the front-end team because compiler development has been a hobby of mine for a few years.

Most posts on this blog discuss new features that are being added or what daily work is like here in various positions. Instead, I'd like to discuss my thoughts on a particular aspect of C/C++ that I think deserves more design attention: header files and the preprocessor. I'll probably be discussing a few hypothetical features here, and I want to say up front that this does not necessarily mean the Visual C++ team will be working on or delivering these features. In writing up these ideas, mainly what I'm hoping to do is spark discussion and get your feedback. There should be a link below for leaving comments, so please do!

First off, what are header files really for? Well, without doing research into the design rationale of Kernighan and Ritchie, the most basic purpose they serve is to let us maintain units of code for separate compilation in a declare-before-use language. Without the preprocessor we'd have to repeat the same declarations in every source file; we'd quickly get tired of that and probably hack together something that looks a lot like the current preprocessor. Sure, we use the preprocessor for lots of clever things, but to me that is its essential purpose.

What are some frustrations with header files? Well, they seem to contribute to really long build times. Have you ever used the preprocessor modes of cl.exe? You can access them with /E (preprocess to stdout), /P (preprocess to a file), and /EP (preprocess to stdout without #line directives; combine it with /P to write that output to a file instead). Give it a try: for instance, "cl /P foo.cpp" will produce "foo.i". On my system, I wrote a quick Windows "hello world" that calls the MessageBox function. When I preprocessed this 5-line program, it came out to roughly 200,000 lines of source code for the compiler to parse. Now imagine what it is like in your project if every source file includes windows.h. You can imagine that parsing 200k lines of declarations slows down the compiler a bit.

What else? Well, header files seem redundant. We have to type class names, method names, parameter lists, etc. twice. While that's certainly not the most costly part of developing a C++ project, it probably slows your thought process down a little. It also creates additional maintenance: if you change a parameter list in one place, you need to change it in the other. Again, not a huge cost, but a real one.

We're still not done yet; I can add a few more potential issues. Header files change depending on the context in which they are preprocessed, or, to say it more succinctly, they have isolation problems. What if you are integrating two libraries that both have a foo.h and both use the guard macros FOO_H? Well, that's something you as a coder have to take time to deal with. Along those same lines, if you have a really big project without carefully designed headers, you might notice differences in compile-time (or even run-time) behavior depending on the order in which you include headers. It's not the end of the world, but you pay a cost to investigate and fix the problem.

I think most C/C++ coders would agree that the preprocessor comes with a price tag and a long list of potential pitfalls.
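To make the guard-macro collision concrete, here is a small sketch. The two library headers, the helper functions, and the LIB_*_FOO_SEEN marker macros are all hypothetical. Because a single listing can't hold three files, the contents of both foo.h headers are pasted inline, which is effectively what the preprocessor does after #include "libA/foo.h" followed by #include "libB/foo.h".

```cpp
// ---- simulated contents of libA/foo.h ----
#ifndef FOO_H
#define FOO_H
inline int lib_a_version() { return 1; }
#define LIB_A_FOO_SEEN 1
#endif // FOO_H

// ---- simulated contents of libB/foo.h ----
#ifndef FOO_H            // same guard name as libA: this whole block is skipped
#define FOO_H
inline int lib_b_version() { return 2; }
#define LIB_B_FOO_SEEN 1
#endif // FOO_H

// Report which library's declarations actually survived preprocessing.
constexpr bool lib_a_visible() {
#ifdef LIB_A_FOO_SEEN
    return true;
#else
    return false;
#endif
}

constexpr bool lib_b_visible() {
#ifdef LIB_B_FOO_SEEN
    return true;
#else
    return false;
#endif
}
```

Only libA's declarations make it through; any call to lib_b_version() would fail to compile as an undeclared identifier, and which library "wins" depends purely on include order.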

Well, it's not all bad, right? Certainly. The first benefit I'm thinking of is the most interesting to me. In C/C++ we have to declare before use, and we put all those shared declarations in header files. This way we can easily produce separate compilation units that are mutually dependent. But because the mutual dependencies are satisfied by the header files, in general all of our source files can be compiled in parallel. I think most people would agree that this parallelism is a good thing.

Take a look at a counter-example. Say you're compiling a C# program. First off, you probably rarely compile that program one source file at a time. You pass a collection of source files to the compiler, the same way you pass those files to the C/C++ compiler. But the C# compiler must treat that batch of files differently. The C/C++ compiler can compile each file separately and only at the end consider them as a unit for linking purposes. In the C# case, the compiler has to consider all of those source files somewhat simultaneously because they can have inter-dependencies: you can refer to a class in another source file without having to declare it first. In short, in C/C++ a translation unit is always a single source file plus the headers it includes, whereas in C# the translation unit is potentially multiple source files. At the very least, that makes it seem more difficult to parallelize the C# build process. No doubt it is doable, but there will be some amount of overhead associated with this issue. You can think of this as the C/C++ coder paying the cost of this overhead by maintaining header files.

No doubt you can think of other benefits of header files. For instance, you can use the macro preprocessor to add rudimentary syntax to the language, to help "automate" some tasks.
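Here is a minimal sketch of two mutually dependent units held together by a shared header. The file names parity.h, even.cpp, and odd.cpp are hypothetical, and all three are shown in one listing; in a real build each .cpp file would be its own translation unit. The point is that even.cpp only needs the declaration of is_odd, not its definition, so the two source files could be compiled in parallel and tied together at link time.

```cpp
// ---- parity.h (hypothetical shared header) ----
// Declarations only: any file that includes this can call either function.
bool is_even(unsigned n);
bool is_odd(unsigned n);

// ---- even.cpp (would #include "parity.h") ----
// Uses is_odd, which is defined in a different translation unit.
bool is_even(unsigned n) { return n == 0 ? true : is_odd(n - 1); }

// ---- odd.cpp (would #include "parity.h") ----
// Uses is_even, defined in even.cpp; the header breaks the cycle.
bool is_odd(unsigned n) { return n == 0 ? false : is_even(n - 1); }
```

In C#, by contrast, the equivalent two classes could reference each other with no declarations at all, which is exactly why that compiler has to look at both files together.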

What can we do about it?  I'm going to discuss two proposals.  One of them is not mine at all and regards modules for C++.  The other is perhaps a more practical tool that might help you diagnose issues with header files in your codebase.

The word "module" has a lot of different meanings, even in the context of the coding world. I would say that many programming systems out there (notice this does not include C and C++) provide module functionality, which is some joining of the concepts of separate compilation and namespaces. C and C++ give us facilities for separate compilation, and C++ gives us namespaces, but neither gives us a unified feature that encapsulates the two, allowing us to refer to a separately compiled module and import selected symbols from it. Enter Vandevoorde's Modules for C++ proposal (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2073.pdf). It gives us a mechanism to refer to potentially previously-compiled modules instead of including the source code of declarations in our projects. This has some important benefits with respect to the caveats pointed out above; for example, it has isolation properties, and it does not have the maintenance burden associated with the redundancy of header files. I can see this feature hurting the parallelism property if it is used in certain ways. For example (and similar to the C# case), if each source file in your project is a separate module, the compiler will have to analyze dependencies before attempting to compile the files in parallel. Luckily, the "import" statements give it some clues about these dependencies and can probably be scanned quickly, so there are no doubt ways to solve that problem, but there will be some amount of overhead. I don't have much to add to Vandevoorde's discussion in that paper; it is very thorough, and if you're interested in the design of such a module system and its potential issues, I encourage you to read it.

As fun as it is to consider redesigning the world, what could we potentially do today, without changing the infrastructure of header files, to make things better? I can envision tools that generate headers, or that scan your header files and the rest of your code base and attempt to diagnose problems for you. Most of what I argued above as problems with header files had to do with a cost in work for a human programmer. Well, maybe an automated tool can help reduce or eliminate that cost in some cases.

The first suggestion is around "automagically" generating headers. Think of a compiler that analyzes your C and/or C++ source and extracts just the declarations into a minimal header file. You might have a source file that includes windows.h and then declares several classes. The header file would need to contain declarations of those classes, and probably just a few of the typedefs declared in windows.h and the various headers it includes. The result would most likely be fairly short; perhaps all you need to declare those classes is a few typedefs from windows.h like HMODULE, HWND, etc. This tool could potentially be integrated into your IDE so that when you change the source file, it automatically updates the header if necessary. For efficiency, the tool could assume that system headers like windows.h don't change. Another efficiency idea might be to have the tool generate a header for an entire library instead of one per translation unit.
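As a rough illustration of what such a generated header might contain, here is a sketch with hypothetical names. Widget is the class being exported, and Window stands in for a type that would normally come from a large header such as windows.h. The generated header declares only what the class interface mentions; because the interface uses Window only through a pointer, a forward declaration is enough and the big header stays out of every file that includes widget.h. This is one plausible output under those assumptions, not a description of any existing tool.

```cpp
#include <string>

// ---- widget.h, as a hypothetical header-generating tool might emit it ----
// Window stays opaque: the interface only ever uses a pointer to it.
struct Window;

class Widget {
public:
    explicit Widget(Window* owner);
    Window* owner() const;
    std::string title() const;
private:
    Window* owner_;
};

// ---- widget.cpp, the hand-written source ----
// The full definition is visible only here, standing in for a type that
// would normally be pulled in from a heavyweight system header.
struct Window {
    std::string caption;
};

Widget::Widget(Window* owner) : owner_(owner) {}
Window* Widget::owner() const { return owner_; }
std::string Widget::title() const { return owner_->caption; }
```

Note that <string> still has to be included because title() returns a std::string by value; a generator can only drop dependencies the interface genuinely doesn't need.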

Such a tool seems like it could potentially help with the maintenance and build time issues.  That whole windows.h header would need to be parsed by the header-generating tool, but that cost is amortized.  It is paid only when you modify that source file, and you reap the rewards every time you compile another source file that includes the associated header.  I see potential problems in this area, but if you have a code-base that is partitioned the right way, you might even be able to check in those generated headers and save even more compile time throughout your development team.

What about other issues, such as isolation? The header-generating tool might be able to help with some of them by creating "well behaved" header files. For instance, it might generate preprocessor code that saves and restores the state of macros within the file, or generate headers that don't use any preprocessor features other than perhaps guard macros at the beginning and end of the file. But what would likely be more useful is some sort of tool to analyze your headers. I'll give a few simple examples. It could look for isolation issues such as two different header files in your INCLUDE path with the same file name or the same guard macro. It could find headers that don't have proper guard macros. It could build a dependency graph that shows you how one header's definitions affect the definitions in another, and what the include graph looks like for particular headers and source files.
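One mechanism a generator could use today to save and restore macro state is the #pragma push_macro / #pragma pop_macro pair. These are compiler extensions (MSVC, GCC, and Clang accept them) rather than standard C++, so a portable tool could not rely on them everywhere. In the sketch below, BUFFER_SIZE is a hypothetical macro name, and the header's contents are pasted inline between the marker comments.

```cpp
// The including file has its own definition of a common name.
#define BUFFER_SIZE 64
constexpr int size_before_header = BUFFER_SIZE;

// ---- contents of a hypothetical generated, "well behaved" header ----
#pragma push_macro("BUFFER_SIZE")   // save the includer's definition
#undef BUFFER_SIZE
#define BUFFER_SIZE 4096             // the header's private meaning
constexpr int size_inside_header = BUFFER_SIZE;
#pragma pop_macro("BUFFER_SIZE")    // restore the includer's definition
// ---- end of header ----

// The includer sees its original value again, regardless of what the header did.
constexpr int size_after_header = BUFFER_SIZE;
```

The cost of this approach is that the generator has to know which macro names the header touches, which is the kind of analysis such a tool would be doing anyway.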

Such an analysis tool might help you get your build times down as well.  For instance it could scan your source files and tell you that it is unnecessary to include a particular header because none of the declarations are used.  Or it might tell you that instead of including foo.h which includes bar.h, you could just include the latter because this particular source file only uses declarations from bar.h.
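A rough sketch of the include analysis described above might look like the following. It assumes two inputs: the include graph (which files directly include which headers) and used_from, the set of headers that declare something the source file actually uses. Computing used_from requires real parsing and is simply taken as given here; all type and function names are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// file -> headers it #includes directly
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Adds every header reachable from `start` (including `start`) to `out`.
void collect_reachable(const IncludeGraph& graph, const std::string& start,
                       std::set<std::string>& out) {
    if (!out.insert(start).second) return;  // already visited; also stops cycles
    auto it = graph.find(start);
    if (it == graph.end()) return;
    for (const std::string& child : it->second) collect_reachable(graph, child, out);
}

struct IncludeAdvice {
    // Direct includes that provide nothing the source uses, even transitively.
    std::vector<std::string> unnecessary;
    // Direct include -> the nested headers that actually provide what is used.
    std::map<std::string, std::set<std::string>> replace_with;
};

IncludeAdvice analyze_includes(const IncludeGraph& graph, const std::string& source,
                               const std::set<std::string>& used_from) {
    IncludeAdvice advice;
    auto it = graph.find(source);
    if (it == graph.end()) return advice;
    for (const std::string& inc : it->second) {
        std::set<std::string> reachable;
        collect_reachable(graph, inc, reachable);
        std::set<std::string> needed;  // reachable headers the source really uses
        for (const std::string& h : reachable)
            if (used_from.count(h)) needed.insert(h);
        if (needed.empty())
            advice.unnecessary.push_back(inc);       // "you don't need this include"
        else if (!needed.count(inc))
            advice.replace_with[inc] = needed;       // "include bar.h instead of foo.h"
    }
    return advice;
}
```

For the foo.h / bar.h case above, a graph where main.cpp includes foo.h (which includes bar.h) and only bar.h's declarations are used yields the advice to include bar.h directly. The results are hints only: a header can also matter for the macros it defines, which this model ignores.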

I'm afraid I've used up my time and space for this post, but I hope it was an interesting read, and got you thinking about what features regarding the preprocessor and header files might make you more productive in your C and C++ codebase.  Please leave comments if you liked these ideas, didn't like them, or have your own. 

Thanks for reading!

Richard

  • I've found that precompiled headers massively cut down compile times when used properly, but determining which files are most effective in the PCH, and which files are still missing, is time-consuming.

    Another thing that would help is to get the PSDK team to clean up the rampant #define abuse in the Win32 headers. I could speed up my build if I put windows.h in the PCH, but I've been burned too many times by symbols like OpenRaw getting #defined, so I partition it off as much as I can, and the files that have to include it compile quite slowly. And having two PCHs in a project is error-prone.

    Finally, it seems like Intellisense doesn't take advantage of PCHs. When I traced CreateFile() calls out of FEACP to track down what was triggering an Intellisense infinite loop, I noticed it reading the same files over and over.

  • You can use a script to embed #pragma message("Compiling _filename_") directives within each header included in your project. Then search through the build log(s) to count what's compiled most frequently. It's then a bit easier to decide what to precompile.

  • Regarding your statement: "What if you are integrating two libraries that both have a foo.h and both use the guard macros FOO_H". Isn't this type of header guard unnecessary, since we have #pragma once?

    In the past I used these header guards, but I no longer do since I noticed that Visual Studio 2005 generates .h files that contain only a #pragma once and no longer something like:

    #if !defined(_MYDLG_H__)
    #define _MYDLG_H__

    My questions:

    a) What version of VS added #pragma once?

    b) Doesn't this replace old style header guards? (see above)

    c) Are there any disadvantages to #pragma once?

  • @ jamome: I'm not Richard, but I can possibly answer your questions:

    >> a) What version of VS added #pragma once?

    From the looks of my VC6 header files, you need a compiler version (_MSC_VER) of greater than 1000.  VC6 shipped with compiler version 1200.

    >> b) Doesn't this replace old style header guards? (see above)

    Probably not if you want to write portable code.  If you don't care about portability, then it's probably fine to use #pragma once.

    Regarding your third question, I don't know of any, besides the portability issue.

  • Thanks everyone for their comments so far!

    In response to Jamome:

    >> Regarding your statement: "What if you are integrating two libraries that both have a foo.h and both use the guard macros FOO_H".  Isn't this type of header guard unnecessary since we have #pragma once

    The intent of the example was more along the lines of you as a coder having to integrate libraries from various sources, potentially code you didn't write and may not want to change for various reasons.  In that case the library developer might not use #pragma once for portability or other reasons.

    I agree with ChrisR that I don't believe there are any particular disadvantages to #pragma once over the guard macro idiom, other than the portability issue.

    Hope that helps,

    Richard

  • Instead of choosing one over the other, do both! :-) Portable headers should use idempotency guards that look like this:

    #ifndef PROJECTNAME_HEADERNAME_HPP
    #define PROJECTNAME_HEADERNAME_HPP

    #ifdef _MSC_VER
    #pragma once
    #endif

    // Code goes here!

    #endif // Idempotency

    The portable "#ifndef PROJECTNAME_HEADERNAME_HPP" guard is for other compilers (some will recognize the guard in order to avoid opening the file again, as long as nothing exists outside of it), while the "#pragma once" is for Visual Studio (which tells it to avoid opening the file again, for faster builds of very large projects). The "#ifdef _MSC_VER" prevents other compilers from seeing the Visual Studio-specific pragma.

    Of course, Visual Studio headers do not necessarily follow this practice because they do not have to be portable.

    Prefixing your macros with your project name is a general practice that avoids clashes, since macros aren't aware of namespaces.

    Stephan T. Lavavej

    Visual C++ Libraries Developer

  • I liked the modules proposal a lot!

    Unfortunately, it's not going to be in the new C++ standard. The committee liked the idea, but thought that it was too much work to polish the proposal in time (before 2010), since there is no available compiler where it could be tested (IOW, it's too big and it's not "existing practice").

    Probably modules will be standardised in the future, but it would be nice if compilers started to ship with a few optionally-enabled new features. If the programmer wants to test modules, or thinks they are useful to his project even if the syntax or semantics are not in their final form, the feature can be enabled.

  • We've been doing a lot of thinking lately about header files and modularity in C, and we've developed a tool called CMod that enforces modularity in C.  The CMod web page (http://www.cs.umd.edu/projects/PL/CMod/) contains a short paper we wrote about our system and an earlier implementation.

    We have been improving the system since that paper was written, and we expect to have a new paper ready in a couple of weeks, with more evaluation and tighter rules.   A finished implementation will be available then, too, but you can download the latest version from our web page.  We look forward to comments you might have.

    -Mike (and Saurabh, Jeff, and Pat, the CMod team)

  • This is a really interesting topic. The application I work on is quite big and has been developed over a long period of time. Over time the dependencies (real and fake) have grown; we try to clean them up bit by bit, but that takes a lot of time. The main problem is build times. We use Incredibuild to do parallel builds, and that helps a lot.

    It sounds quite good to have a tool that could generate "well behaved" header files with a minimum of dependencies, and also a tool that can visualize dependencies. We built a small, simple tool that can create an include graph from a source file, and that helps a lot. It does not do any analysis of the graph to recommend what to do, but that would also help a lot.

    Another thing that I have thought about is sharing precompiled headers, and preferably being able to use several precompiled headers for a single project. We generate quite a large number of DLLs, and several projects should be able to share a common precompiled header.

    Anything that could help in this area is interesting, and I'm looking forward to more of this.

  • Is there a preprocessor macro that can be checked to see if precompiled headers are being used for a particular compilation unit?

    I like to test my files to see if they explicitly include everything they actually need, and nothing extra, so I test-compile them individually as I write them, without using precompiled headers. However, they then include everything in stdafx.h un-precompiled, which is slow. I'd like to put a guard in stdafx.h itself to ignore its contents if precompiled headers aren't being used.

  • The best way to improve build times is to build all your C++ files together in one 'bundled' .cpp file, like this:

    #include "src1.cpp"
    #include "src2.cpp"
    #include "src3.cpp"
    ...

    Yes, it means you might have to resolve name conflicts between duplicate static definitions, but these issues are usually trivial, the build-time improvement is dramatic, and the codegen is better as well. Try it, you'll like it.

  • Richard,

    Take a look at this link:

    http://os.inf.tu-dresden.de/~hohmuth/prj/preprocess/

    It is a preprocessing tool that generates header files!
