Holy cow, I wrote a book!
As a prerequisite, I am going to assume that you understand
how TLS works, and in particular how __declspec(thread)
There's a quite thorough treatise on the subject
Ken Johnson (better known as Skywing),
who comments quite frequently on this site.
and continues for a total of 8 installments,
That last page also has a table of contents so you can skip
over the parts you already know to get to the parts you don't know.
Now that you've read Ken's articles...
No, wait I know you didn't read them
and you're just skimming past it in the hopes that you will be
able to fake your way through the rest of this article
without having read the prerequisites.
Well, okay, but don't be surprised when I get frustrated if you
ask a question that is answered in the prerequisites.
as you learned from Part 5 of Ken's series,
the __declspec(thread) model, as originally envisioned,
assumed that all DLLs which use the feature would be present
at process startup, so that all the _tls_index
values can be computed and the total sizes of each module's TLS data
can be calculated before any threads get created.
(Well, okay, the initial thread already got created, but that's
okay; we'll set up that thread's TLS before we execute any application
If you loaded a __declspec(thread)-dependent module dynamically,
bad things happened.
TLS data was not set up for
any pre-existing threads,
since those threads were initialized before your module got loaded.
Windows doesn't have a time machine where it can go back in time to
when those threads were initialized and pre-reserve space for the TLS
variables your new module needed.
Nope, your module is just out of luck with respect to those pre-existing
threads, and if it tries to use __declspec(thread) variables,
it'll find that its TLS slot never got initialized, and there's no data
there to access.
Unfortunately, there's an even worse problem, which Ken quite ably
elaborates on in
The _tls_index variable inside the module arrived
after the train left the station.
All those TLS indices were assigned at process initialization.
When it loads dynamically, the _tls_index variable
just sits there, and nobody bothers to initialize it,
leaving it at its default value of zero.
(Too bad the compiler didn't initialize it to
As a result, the module thinks that its TLS variables are at
slot zero in the TLS array, leading to what Ken characterizes
as "one of the absolute worst possible kinds of problems to debug":
Two modules both think they are the rightful owners of the same data,
each with a different concept of what that data is supposed to be.
It'd be like if there was a bug in HeapAllocate where
it returned the same pointer to two separate callers.
Each caller would use the memory, cheerfully believing that the
values the code writes to the memory will be there when it comes back.
What truly frightens me is that there's at least one person
who considers this horrific data corruption bug a feature.
webcyote calls this bug "sharing all variables between the EXE and the
DLL" and complained that fixing the bug breaks programs that
"depend on the old behavior".
That's like saying
"We found that if we use this exact pattern of memory allocations,
we can trick HeapAllocate into allocating the same
so we will have our EXE allocate some memory, then perform the
magic sequence of allocations, and then load the DLL,
and then the DLL will call HeapAllocate to allocate some memory,
and it will get the
same pointer back, and now the EXE and DLL can share memory."
this crazy "EXE and DLL sharing thread variables" trick
is extremely fragile.
You have to intentionally delay loading the DLL until
after process startup.
(If you load it as part of an explicit dependency, then you
don't trigger the bug and the DLL gets its own set of variables
And then you have to make sure that the EXE and DLL declare
exactly the same variables in exactly the same order
and link the OBJ files in exactly the right sequence,
so that all the offsets match.
Oh, and you have to make sure your DLL is loaded only into the EXE
with which it is in cahoots.
If you load it into any other EXE, it will start corrupting that
EXE's thread variables.
(Or, if the EXE doesn't use thread variables, it'll corrupt some
other random DLL's thread variables.)
If the feature had been intended to be used in this insane way,
they would have been called "shared variables" instead of "thread variables".
No wait, they would have been called "thread variables
that sometimes end up shared under conditions outside your DLL's control."
I wonder if Webcyote also
drives a manual transmission and
just slams the gear stick into position without using the clutch.
Yes, you can do it if you are really careful and get everything
to align just right,
but if you mess up,
your transmission explodes and spews parts all over the road.
Don't abuse a bug in the loader.
If you want shared variables, then create shared variables.
Don't create per-thread variables and then intentionally trigger
a bug that causes them to overlay each other by mistake.
That's such a crazy idea that it probably never occurred
to anyone that somebody would actually build a system that relies on it!
A customer ran into a problem with the "inadvertently sharing
variables between the EXE and the DLL" bug.
Here is the message from the customer liaison:
My customer has a DLL that uses static thread local storage
(__declspec(thread)), and he wants to use this
DLL from his C# program.
Unfortunately, he is running into the limitation when running
on Windows XP that DLLs which use static thread local storage
crash when they try to access their thread variables.
The customer cannot modify the DLL.
What do you recommend?
Commenter shf gives
the most complete answer.