Go Ask printf() To Behave Predictably

This is one of those problems whose solution is very evident… or bitterly hard to guess depending on how much you were influenced by what you were told about it.

I’ll paste the case here:

I must be getting blind, as I can’t see what’s wrong. This code works on amd64, but crashes on x86.


    #include <regex>
    using namespace std;

    int main() {
        char searchMe[] = "Go ask printf() to behave predictably.";
        regex rx("^Go ask ");
        cmatch found;

        if (regex_search(searchMe, found, rx))
            printf("Found it: \"%s\".\n", found.str());
        else
            printf("Didn't find it.\n");
    }


I debugged a bit and found out that what is actually failing is the first printf: the formatting for %s apparently does not use the address of the string, but tries to dereference its contents.

Can anyone help me see what’s wrong?

Thanks

 

You will rapidly solve it… or not! It will actually depend on what impressed you the most about the symptoms.

  • Just a thought. shouldn't be found.str().c_str() ?

  • str returns a basic_string, not a char* pointer, so the code is just plain wrong.

  • I just wonder how it works (anywhere), because %s requires a pointer to char, but str() returns basic_string<>. Probably, on amd64 the compiler puts the internal char* first in the class, but on other platforms it does not. But how that can be, I can't understand.

  • And the lesson is:  avoid the non-typesafe C I/O functions at all costs.

  • Any C++ programmer that's spent *any* time with the language will get the type error - the interesting thing is why does printf("%s", std::string("foo")) work on amd64, but not x86?

    Well, C variable arguments, as used by the formatted arguments to printf(), are just passed as if the signature matched, so in this case as if printf() was declared as "printf(const char* format, std::string a1)". Of course, "printf()" is a C function, it doesn't know anything about C++'s std::string, it's expecting a raw char* because of the "%s" in the format string.

    As it turns out, Visual C++'s implementation of std::string looks like "char* ptr, size_t length, size_t buffer_length" to C for strings longer than 16 bytes (otherwise, replace char* ptr with char data[16]), so you would expect that treating it as a char* should actually work if the string is longer than 16 bytes, right?

    The problem is that pesky thing, alignment, and its buddy, packing: VC will add padding bytes before certain values to ensure they start at specified addresses, depending on type annotations, pragmas, compiler options, the size of the members in structures, and who knows what else!

    In the 64-bit case, everything is 8 bytes, so no padding bytes are inserted and the C rules agree with the C++ rules. But in the 32-bit case, according to C the char* should come immediately after the format char*, because everything is 4 bytes (32 bits = 4 bytes * 8 bits in a byte), right? But C++ is still aligning the std::string to 8 bytes (due to the complex struct alignment rules), meaning 4 bytes have to be inserted between it and the format argument, right where printf() expects the char* to be. And as the final piece of the puzzle, in DEBUG builds VC++ is nice and sets the unused space to a value that will always fault if you read it as a pointer, so you see where the problem is quickly.

  • Normal developers use environments with checked printf (e.g. GCC)... But this is just a precaution, because normal developers think before they type and therefore do not make dumb mistakes like that one.

    No one gives a damn why this works on AMD -- it is forbidden by the standard and therefore does not work, even if it appears to work sometimes.

  • In our code we have many cases of:

    CString strTmp(L"bla");
    wprintf(L"%s",strTmp);

    This code scares the hell out of me.

    I know that one day I will need to ctrl-h them all.

  • It really is a shame that VS doesn't seem to allow type-checking of printf-style arguments.

    (Especially as the alternative C++ I/O functions are horrible to use by comparison. printf is fine with good tool support but we still don't have it built-in. How hard would it be, really?)

  • printf is supposed to be SAL annotated with __format_string or something.

  • @Leo Davidson, how can you add type-checking to it?

    Have you looked at the inside of printf?

    Maybe with variadic templates, maybe with tuples.

    But I can't see how you can rewrite it to be easy to use.

  • cl.exe /analyze will emit warnings for the SAL-annotated printf, saying that you're passing an object to the ellipsis (...).

    It's mostly the same as the last quiz: use compiler warnings and save yourself low-level code troubles.

  • @Leo Davidson

    1. Talk to me when you're able to use printf to write generic functions that can write to any kind of stream (file stream, string stream, etc.):

    void WriteSomething(std::ostream& anyStream)
    {
       anyStream << "...";
    }

    2. Talk to me when you're able to extend printf by deriving new stream classes from it:

    class SocketStream : public std::iostream
    {
       ...
    };

    3. Talk to me when you're able to extend printf by adding new formatting data types to it:

    class MyClass
    {
       friend std::ostream& operator<<(std::ostream&, const MyClass&) { ... }
       friend std::istream& operator>>(std::istream&, MyClass&) { ... }
    };

    MyClass mc;
    std::cout << mc;

  • Sadly the /analyze feature of VC++ doesn't help here either.

    Gimpel's PC-Lint however spots the problem:

    "error 437: (Warning -- Passing struct 'basic_string' to ellipsis)"

    I would suspect other commercial static analysis tools would spot it too.

  • I really am curious how c++ io is horrible by comparison.  Type checking is a win, templates are a win... if you don't want to flush don't use endl.  There, now it's powerful and still as efficient...

  • It's funny people were asking how you could make printf safe when other tools have already done exactly that.

    I've looked into cl /analyze before but from what I remember it isn't actually supported by the IDE unless you have the uber-expensive Team Builder version, or something like that. I'm fuzzy on the details.

    And, yes, I fully realise that C++ I/O is more powerful than printf. I rarely actually want any of that power, though; I just want to write expressive code that outputs simple strings and doesn't require a load of operator overloads, hidden conversions, spending ages looking at the types to see what they will do when streamed...

    Most of all, I prefer printf because it adds less noise to the string I'm generating. You can read the string on its own without a load of code being interwoven with it. (Obviously this is a pro as well as a con since the code is then in another place and things can get out of sync.)

    If you prefer the C++ stuff then more power to you. Maybe it's also a lot to do with what we're used to. I think a lot of people, like me, who used C before C++ still prefer the old C way here. I've certainly spoken to many others who feel the same.

    Anyway, the fact is that with better tooling, printf can be just as type-safe. People are not going to stop using printf and friends. It'd be nice if the toolset did a better job checking their types.
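
To tie the thread together: the first replies point at found.str().c_str(), and the later ones argue over printf versus streams. The sketch below shows both repairs side by side. It is a minimal illustration, not the original poster's final code. The helper names describe_match_printf and describe_match_stream are hypothetical, and both return a std::string instead of printing so that the result is easy to inspect.

```cpp
#include <cstdio>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not in the original post): the printf-style repair.
// %s needs a const char*, so pass c_str() rather than the std::string object.
std::string describe_match_printf(const char* text) {
    std::regex rx("^Go ask ");
    std::cmatch found;

    if (!std::regex_search(text, found, rx))
        return "Didn't find it.";

    // A named variable is not strictly required (the temporary returned by
    // found.str() lives until the end of the full expression), but it makes
    // the lifetime of the buffer behind c_str() obvious.
    const std::string matched = found.str();

    char buffer[128];
    std::snprintf(buffer, sizeof buffer, "Found it: \"%s\".", matched.c_str());
    return buffer;
}

// Hypothetical helper: the stream-based alternative. operator<< is resolved
// by the compiler from the argument type, so a std::string cannot be
// misread as a pointer.
std::string describe_match_stream(const char* text) {
    std::regex rx("^Go ask ");
    std::cmatch found;
    std::ostringstream out;

    if (std::regex_search(text, found, rx))
        out << "Found it: \"" << found.str() << "\".";
    else
        out << "Didn't find it.";
    return out.str();
}
```

Called with the string from the original question, both helpers should return Found it: "Go ask ". The printf variant stays close to the original code; the stream variant trades the compact format string for compile-time checking, which is the trade-off the later comments debate.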
