With the introduction of Windows RT for ARM devices, many Windows software developers will be encountering ARM processors for the first time. For the native C++ developer this means the potential for running afoul of undefined, unspecified, or implementation-defined behavior--as defined by the C++ language--that is expressed differently on the ARM architecture than on the x86 or x64 architectures that most Windows developers are familiar with.
In C++, the results of certain operations, under certain circumstances, are intentionally ambiguous according to the language specification. In the specification this is known as "undefined behavior" and, simply put, means "anything can happen, or nothing at all. You're on your own". Having this kind of ambiguity in a programming language might seem undesirable at first, but it's actually what allows C++ to be supported on so many different hardware platforms without sacrificing performance. By not requiring a specific behavior in these circumstances, compiler vendors are free to do whatever makes reasonable sense. Usually that means whatever is easiest for the compiler or most efficient for the target hardware platform. However, because undefined behavior can't be anticipated by the language, it's an error for a program to rely on the expressed behavior of a particular platform. As far as the C++ specification is concerned, it's perfectly standards-conforming for the behavior to be expressed differently on another processor architecture, to change between microprocessor generations, to be affected by compiler optimization settings, or for any other reason at all.
Another category of behavior, known as "implementation-defined behavior", is similar to undefined behavior in that the C++ language specification doesn't prescribe a particular expression, but is different in that the specification requires the compiler vendor to define and document how their implementation behaves. This gives the vendor freedom to implement the behavior as they see fit, while also giving users of their implementation a guarantee that the behavior can be relied upon, even if the behavior might be non-portable or can change in the next version of the vendor's compiler.
Finally, there is "unspecified behavior", which is just implementation-defined behavior without the requirement for the compiler vendor to document it.
All of this undefined, unspecified, and implementation-defined behavior can make porting code from one platform to another a tricky proposition. Even writing new code for an unfamiliar platform might seem at times like you've been dropped into a parallel universe that's just slightly askew from the one you know. Some developers might routinely cite the C++ language specification from memory, but for the rest of us the border between portable C++ and undefined, unspecified, or implementation-defined behavior might not always be at the forefront of our minds. It can be easy to write code that relies on undefined, unspecified or implementation-defined behavior without even realizing it.
To help you make a smooth transition to Windows RT and ARM development, we've compiled some of the most common ways that developers might encounter (or stumble into) undefined, unspecified, or implementation-defined behavior in "working" code--complete with examples of how the behavior is expressed on ARM, x86 and x64 platforms using the Visual C++ tool chain. The list below is by no means exhaustive, and although the specific behaviors cited in these examples can be demonstrated on particular platforms, the behaviors themselves should not be relied upon in your own code. We include the observed behaviors only because this information might help you recognize how your own code might rely on them.
On the ARM architecture, when a floating-point value is converted to a 32-bit integer, it saturates; that is, if the floating-point value is outside the range of the integer, it converts to the nearest value that the integer can represent. For example, when converting to an unsigned integer, a negative floating-point value always converts to 0, and a value that is too large for an unsigned integer to represent converts to 4294967295. When converting to a signed integer, the floating-point value is converted to -2147483648 if it is too small for a signed integer to represent, or to 2147483647 if it is too large. On x86 and x64 architectures, floating-point conversion does not saturate; instead, the conversion wraps around if the target type is unsigned, or is set to -2147483648 if the target type is signed.
The differences are even more pronounced for integer types smaller than 32 bits. None of the architectures discussed have direct support for converting a floating-point value to integer types smaller than 32 bits, so the conversion is performed as if the target type is 32 bits wide, and then truncates to the correct number of bits. Here you can see the result of converting +/- 5 billion (5e009) to various signed and unsigned integer types on each platform:
As you can see, there's no simple pattern to what's going on because saturation doesn't take place in all cases, and because truncation doesn't preserve the sign of a value.
Still other values introduce more arbitrary conversions. On ARM, when you convert a NaN (Not-a-Number) floating point value to an integer type, the result is 0x00000000. On x86 and x64, the result is 0x80000000.
The bottom line for floating point conversion is that you can't rely on a consistent result unless you know that the value fits within a range that the target integer type can represent.
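One way to stay inside that safe range is to check the value before converting. The following is a minimal sketch, not something taken from the Visual C++ libraries: the helper name to_int32_saturated is hypothetical, and the choice to map NaN to 0 is an arbitrary assumption made only so that the result is the same on every platform.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a double to a 32-bit signed integer with
// explicit saturation, so the result never depends on what the hardware
// does for out-of-range values.
inline std::int32_t to_int32_saturated(double value)
{
    // NaN is the only value that compares unequal to itself.
    // Mapping it to 0 is an arbitrary (assumed) policy.
    if (value != value)
        return 0;

    // Both limits are exactly representable as a double.
    const double lo = static_cast<double>(std::numeric_limits<std::int32_t>::min());
    const double hi = static_cast<double>(std::numeric_limits<std::int32_t>::max());

    if (value <= lo) return std::numeric_limits<std::int32_t>::min();
    if (value >= hi) return std::numeric_limits<std::int32_t>::max();

    // In range: the conversion is well defined and truncates toward zero.
    return static_cast<std::int32_t>(value);
}
```

With this helper, to_int32_saturated(5e9) yields 2147483647 and to_int32_saturated(-5e9) yields -2147483648 regardless of the target architecture. Narrower target types would need their own limits checked in the same way.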
On the ARM architecture, the shift operators always behave as if they take place in a 256-bit pattern space, regardless of the operand size--that is, the pattern repeats, or "wraps around", only every 256 positions. Another way of thinking of this is that the pattern is shifted the specified number of positions modulo 256. Then, of course, the result contains just the least-significant bits of the pattern space.
On the x86 and x64 architectures, the behavior of the shift operator depends on both the size of the operand and on whether the code is compiled for x86 or x64. On both x86 and x64, operands that are 32 bits in size or smaller behave the same--the pattern space repeats every 32 positions. However, operands that are larger than 32 bits in size behave differently when compiled for x86 and x64. Because the x64 architecture has an instruction for shifting 64-bit values, the compiler emits this instruction to carry out the shift, and its pattern space repeats every 64 positions; but the x86 architecture doesn't have a 64-bit shift instruction, so the compiler emits a small software routine to shift the 64-bit operand instead. The pattern space of this routine repeats every 256 positions. As a result, the x86 platform behaves less like its x64 sibling and more like ARM when shifting 64-bit operands.
Let's look at some examples. Notice that in the first table the x86 and x64 columns are identical, while in the second table it's the x86 and ARM columns that match.
To help you avoid this error, the compiler emits warning C4293 to let you know that your code uses shifts that are too large (or negative) to be safe, but only if the shift amount is a constant or literal value.
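When the shift amount is only known at run time, the compiler can't warn you, so the check has to live in the code. Here is a minimal sketch of that idea; the helper names shift_left and shift_right are hypothetical, and returning 0 for oversized shifts is an assumed policy (it matches what you would get if all the bits were shifted out).

```cpp
#include <cstdint>

// Hypothetical helpers: logical shifts of a 32-bit value that give the same
// answer on every platform, whatever the shift count. The hardware shift is
// only used when the count is within the range the language defines.
inline std::uint32_t shift_left(std::uint32_t value, unsigned count)
{
    return count < 32 ? value << count : 0u;
}

inline std::uint32_t shift_right(std::uint32_t value, unsigned count)
{
    return count < 32 ? value >> count : 0u;
}
```

For example, shift_left(1u, 32) is 0 everywhere, whereas a bare 1u << n with n equal to 32 can produce different results depending on the processor and on whether the compiler folded the expression at compile time.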
On the ARM architecture, the memory model is weakly-ordered. This means that a thread observes its own writes to memory in-order, but that writes to memory by other threads can be observed in any order unless additional measures are taken to synchronize the threads. The x86 and x64 architectures, on the other hand, have a strongly-ordered memory model. This means that a thread observes both its own memory writes, and the memory writes of other threads, in the order that the writes are made. In other words, a strongly-ordered architecture guarantees that if a thread, B, writes a value to location X, and then writes again to location Y, then another thread, A, will not see the update to Y before it sees the update to X. Weakly-ordered memory models make no such guarantee.
Where this intersects with the behavior of volatile variables is that, combined with the strongly-ordered memory model of x86 and x64, it was possible to (ab)use volatile variables for certain kinds of inter-thread communication in the past. This is the traditional semantics of the volatile keyword in Microsoft's compiler, and a lot of software exists that relies on those semantics to function. However, the C++11 language specification does not require that such memory accesses be strongly-ordered across threads, so it is an error to rely on this behavior in portable, standards-conforming code.
For this reason, the Microsoft C++ compiler now supports two different interpretations of the volatile storage qualifier that you can choose between by using a compiler switch. /volatile:iso selects the strict C++ standard volatile semantics that do not guarantee strong ordering. /volatile:ms selects the Microsoft extended volatile semantics that do guarantee strong ordering.
Because /volatile:iso implements the C++ standard volatile semantics and can open the door for greater optimization, it's a best practice to use /volatile:iso whenever possible, combined with explicit thread synchronization primitives where required. /volatile:ms is only necessary when the program depends upon the extended, strongly-ordered semantics.
Here's where things get interesting.
On the ARM architecture, the default is /volatile:iso because ARM software doesn't have a legacy of relying on the extended semantics. However, on the x86 and x64 architectures the default is /volatile:ms because a lot of the x86 and x64 software written using Microsoft's compiler in the past relies on the extended semantics. Changing the default to /volatile:iso for x86 and x64 would silently break that software in subtle and unexpected ways.
Still, it's sometimes convenient or even necessary to compile ARM software using the /volatile:ms semantics--for example, it might be too costly to rewrite a program to use explicit synchronization primitives. But take note that in order to achieve the extended /volatile:ms semantics within the weakly-ordered memory model of the ARM architecture, the compiler has to insert explicit memory barriers into the program which can add significant runtime overhead.
Likewise, x86 and x64 code that doesn't rely on the extended semantics should be compiled with /volatile:iso in order to ensure greater portability and free the compiler to perform more aggressive optimization.
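As an illustration of what "explicit thread synchronization primitives" can look like, here is a minimal sketch that replaces a volatile flag with a C++11 std::atomic using release/acquire ordering. The Mailbox type and the function names are hypothetical and exist only for this example; it assumes a toolchain that provides the C++11 atomic and thread headers.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example type: one thread publishes 'payload', another waits
// for it. 'ready' plays the role a volatile flag used to play.
struct Mailbox {
    int payload;              // ordinary, non-atomic data (location X)
    std::atomic<bool> ready;  // flag that publishes it (location Y)
    Mailbox() : payload(0), ready(false) {}
};

void producer(Mailbox& box)
{
    box.payload = 42;                                  // write X
    box.ready.store(true, std::memory_order_release);  // then publish Y
}

int consumer(Mailbox& box)
{
    // The acquire load pairs with the release store above: once 'ready'
    // is observed as true, the write to 'payload' is visible as well.
    while (!box.ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    return box.payload;
}

int run_handoff()
{
    Mailbox box;
    int seen = 0;
    std::thread reader([&box, &seen] { seen = consumer(box); });
    std::thread writer([&box] { producer(box); });
    writer.join();
    reader.join();
    return seen;
}
```

Because the ordering is requested explicitly, this code has the same meaning on strongly-ordered and weakly-ordered processors, and it does not depend on which /volatile switch is in effect.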
Code that relies on function call arguments being evaluated in a specific order is faulty on any architecture because the C++ standard says that the order in which function arguments are evaluated is unspecified. This means that, for a given function call F(A, B), it's impossible to know whether A or B will be evaluated first. In fact, even when targeting the same architecture with the same compiler, things like calling convention and optimization settings can influence the order of evaluation.
While the standard leaves this behavior unspecified, in practice, the evaluation order is determined by the compiler based on properties of the target architecture, calling convention, optimization settings, and other factors. When these factors remain stable, it's possible that code which inadvertently relies on a specific evaluation order can go unnoticed for quite some time. But migrate that same code to ARM, and you might shake things up enough to change the evaluation order, causing it to break.
Fortunately, many developers are already aware that argument evaluation order is unspecified and are careful not to rely on it. Even still, it can creep into code in some unintuitive places, such as member functions or overloaded operators. Both of these constructs are translated by the compiler into regular function calls, complete with unspecified evaluation order. Take the following code example:
    Foo foo;
    foo->bar(*p);
This looks well defined, but what if -> and * are actually overloaded operators? Then, this code expands to something like this:
    Foo::bar(operator->(foo), operator*(p));
Thus, if operator->(foo) and operator*(p) interact in some way, this code example might rely on a specific evaluation order, even though it would appear at first glance that bar() has only one argument.
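To make the hazard concrete, here is a minimal sketch under the C++11 rules this article describes. All of the types here (Foo, Bar, Ptr) and the global trace string are hypothetical stand-ins; the overloaded operators record when they run so that the interaction between them is visible. Later revisions of the language (C++17) tightened some of these ordering rules, but code that must build with older tool chains can't count on that.

```cpp
#include <string>

std::string trace;  // records the order in which the operators run

struct Bar {
    int bar(int x) { return x; }
};

struct Foo {
    Bar target;
    Bar* operator->() { trace += "->"; return &target; }
};

struct Ptr {
    int value;
    int operator*() { trace += "*"; return value; }
};

// Order-dependent form: under C++11 the compiler may run operator-> and
// operator* in either order, so 'trace' may end up as "->*" or "*->".
int call_unsequenced(Foo& foo, Ptr& p)
{
    return foo->bar(*p);
}

// Explicit form: separate statements impose the order.
int call_sequenced(Foo& foo, Ptr& p)
{
    int arg = *p;          // operator* runs first
    return foo->bar(arg);  // operator-> runs second
}
```

Splitting the expression into separate statements, as call_sequenced does, is the simplest way to remove the dependency: after it returns, trace always reads "*->".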
On the ARM architecture, all loads and stores are aligned. Even variables that are on the stack are subject to alignment. This is different than on x86 and x64, where there is no alignment requirement and variables pack tightly onto the stack. For local variables and regular parameters, the developer is well-insulated from this detail by the type system. But for variadic functions--those that take a variable number of arguments--the additional arguments are effectively typeless, and the developer is no longer insulated from the details of alignment.
This code example is actually a bug, regardless of platform. But what makes it interesting in this discussion is that the way that x86 and x64 architectures express the behavior happens to make the code function as the developer probably intended it to for a subset of potential values, while the same code running on the ARM architecture always produces the wrong result. Here's an example using the cstdio function printf:
    // Note that a 64-bit integer is being passed to the function, but '%d' is being used to read it.
    // On x86 and x64, this may work for small values since %d will "parse" the lower 32 bits of the argument.
    // On ARM, the stack is padded to align the 64-bit value and the code below will print whatever value
    // was previously stored in the padded position.
    printf("%d\n", 1LL);
In this case, the bug can be corrected by making sure that the correct format specification is used, which ensures that the alignment of the argument is considered. The following code is correct:
    // CORRECT: use %I64d for 64-bit integers
    printf("%I64d\n", 1LL);
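The %I64d specifier is specific to Microsoft's runtime library. If the same code also has to build with other tool chains, one possible alternative is the standard PRId64 macro together with a fixed-width type. The sketch below assumes a tool chain that ships the C++11 cinttypes header and std::snprintf, which may not be true of every compiler version; the helper name format_int64 is hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit integer using PRId64, which expands
// to the correct conversion specifier for int64_t on the target platform.
inline std::string format_int64(std::int64_t value)
{
    char buffer[32];  // large enough for any 64-bit value plus sign and terminator
    std::snprintf(buffer, sizeof buffer, "%" PRId64, value);
    return buffer;
}
```

Because the format specifier and the argument type are both tied to int64_t, the size and alignment that the variadic function expects always agree with what was actually passed.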
Windows RT, powered by ARM processors, is an exciting new platform for Windows developers. Hopefully this blog post has exposed some of the subtle portability gotchas that might be lurking in your own code, and has made it easier for you to bring your code to Windows RT and the Windows Store.
Do you have questions, feedback, or perhaps your own portability tips? Leave a comment!
Why not allow MFC to compile to ARM as it already did in Windows Mobile 6.5 !! All porting was already done, it would be a great benefit to allow MFC applications to be available on Windows RT. Why not have some certification which attests an MFC application to comply with the needed energy saving routines Windows RT is capable of. That would be a real benefit for all customers using their devices for real business tasks which no modern UI app can handle with the same efficiency!!
@Moritz I'm very happy that MFC is being banned from WinRT. It is a horrible framework, compared to current standards. And hopefully one day the WinRT XAML will be the default way to develop native C++ application on all Windows platforms.
I'm confused by the Shift operators section. The << and >> operators are logical, and do not rotate the bits. How does this 256-bit "pattern space" affect C++ code? If you shift a 32-bit integer of value 1 left 32 bits on an x86 platform, you don't get 1, you get 0!
In the first place C++ is a horrible language so that's why MFC is horrible too.
MS should have made C# a compiled language; I'm not sure why they continue with the monstrosity of C++.
C# with full garbage collection could use the CLR but they could do C# with just ref counter and be compiled to native, C# also offers pointers and many many goodies and the syntax is clean and awesome.
@emddudley Thanks for your feedback, and in fact you have exposed a subtlety that I had not discussed in the article. What I imagine you are seeing is that you've shifted by 32 positions using a constant or literal value. Then, assuming that the value being shifted is also known at compile time, the compiler's optimizer notices that it can fold the expression away; the expression is evaluated by the compiler's own internal routines, and this result is left in place of the expression.
However, if the compiler cannot fold this expression away because either operand is unknown, the expression is encoded into the instruction stream and eventually goes to the x86 hardware for execution. You will then see a result of 1 in this case and, more generally, the "wrap around" behavior I describe in this article. This is because x86 processors effectively reduce the shift amount modulo 32 internally (most likely by ignoring all but the 5 lowest bits of the shift amount).
This inconsistency of behavior is perfectly fine by the standard, because it leaves the result of these too-large shifts undefined.
In fact, it's conceivable that there might be examples of x86 hardware that don't behave this way, and as long as your code doesn't rely on the wrap-around behavior, the difference in execution would not have an impact on your code.
Thank you again for your feedback; I'll update the post to reflect this subtlety.
Luckily, Microsoft have artificially blocked third party developers from building desktop apps for ARM (unlike Microsoft's own apps such as IE10 and Office), so we don't have to care about any of this stuff.
Not? Last time I checked C++ on WinRT is still compiled language to machine code...
Microsoft Apps ordered by form-factor target client personal computing devices categorized as .net micro, Windows Phone, Windows RT, Windows 8 and server cloud computing as Azure. Leaving aside intricate details to reach bottom line sooner, how soon will store apps independent from micro-specific hardware capability be sellable as for both phone & micro through the store? Or one independent from phone specific hardware capability be sellable as meant for RT+Phone 8+.net micro? Hardware constraints understandable, however, over-segmentation without cross-segment compatibility is leading to dearth of apps.. :(
Even if I intend to startup business targeting RT, and .net micro, without a mid-size initial setup, undesirable as I wish small micro sole proprietorship startup, constraint of segment to target without confining business prospects to remaining a hobbyist clogs judgment! Technically specializing on WRL ARM is appealing..
@Moritz Leutenecker: MFC for Windows RT would be a recompile of the desktop code if it were permitted, except for the very, very few pieces of assembly. On x64 this is merely _AfxParseCall and _AfxDispatchCall, used for implementing IDispatch. x86 has an assembly version of AfxFindMessageEntry but other platforms just have a C++ implementation.
The port of MFC for Windows Mobile was largely about dealing with the numerous Windows APIs that are or were not guaranteed to be present on Windows CE. There is a tiny amount of ARM-specific code which I would expect would apply the same to Windows RT, assuming that the Application Binary Interface/calling conventions are the same for Windows 8 on ARM as for Windows CE (no guarantee, but I would expect them to be).
For whatever reason, the Windows organization have decided that they will not support ports of legacy desktop applications to Windows RT. It could be down to power saving. It could be that there are plenty of difficult-to-port APIs and libraries that developers would expect to find present. It could be that providing the complete API set would occupy so much space that there would be no space for user apps and data. Windows 8 x86 requires 16 GB, Windows 8 x64 requires 20 GB, and I'd expect that ARM code is likely to be a bit larger than x86 due to its fixed 32-bit instruction size and that it can often take a greater number of instructions to perform an action encoded as one instruction for x86.
> printf("%I64d\n", 1LL)
Is %I64d standard C++11? Isn't it %lld?
C++ really needs a type-safe variant of printf.
"C++ really needs a type-safe variant of printf."
Leo: Anyone can write a C++ native ARM application for Windows RT and the Windows Phone 8 platform using the freely available Visual Studio 2012 and the Windows Store app model. The only restriction here is developing Win32 desktop applications for Windows RT. The information in this article applies to any C++ native application compiled for ARM using VC++.
Olaf/Samuel: One of the most useful warnings found by /analyze is validating printf-style format strings against the parameters. Run Code Analysis for all your arch flavors, clean them up, and you'll get them all right. Note that implementations are free to define 'long long' to different bit sizes (VC++ uses the LLP64 model for x64, other platforms may choose LP64 or ILP64). The most portable thing to do with C++11 is to use stdint.h types such as int64_t instead of "long long" if you really want 64 bits.