HoppeRx - the cure for your ailing device

A community site dedicated to the support of device problems found by Hopper

Using MAP files - part 1

jbroxson@microsoft.com

 

Back in February, the Doctor talked about manually unwinding stacks.  MAP files are a great help when you are doing this, and they are your best tool for resolving the code addresses found on the stack.  I was recently working on an issue that required converting a lot of unresolved stacks and trying to pin down what was happening on several devices.  Doing this reminded me of all the tricks, and the Doctor and I decided we should document them for everyone.

 

In part one we’re going to talk about what a MAP file is.  In part two, we’ll talk about how you can use this information for resolving stacks.

 

Today, symbol files contain a lot of information, including some extras like source line information.  To get pretty much ALL of that data in human-readable form, you would need both MAP and COD files.  Most people don’t want COD files getting out because they contain their source code.  MAP files are just the basics – offsets for functions, globals, and other data.  That is plenty of information for what we need to do – resolve a stack – and you should already have them in your flat release directory.

 

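I'm going to use COREDLL as an example (I’ve trimmed this a lot). Lines marked with >>>> or <<<< are my commentary, not part of the MAP file. The preferred load address, the segmented addresses, and the Rva+Base column are discussed after the example: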

 

Coredll    <<<< the module name

 

 Timestamp is d600142f   <<<< timestamp when it was built                       

 

 Preferred load address is 10000000  <<<< Where the module wants to load. Don’t trust this!

 

>>>> Begin information about the sections in this module.

 Start         Length     Name                   Class

 0001:00000000 00005c44H .rdata                  CODE

 0001:00005c44 00000024H .rdata$debug            CODE

 0001:00005c68 000003c4H .rdata$r                CODE

 0001:0000602c 00065338H .text                   CODE

 0001:0006c1f0 0000c36eH .edata                  CODE

 0002:00000000 00000004H .CRT$XCA                DATA

 0002:00000004 00000004H .CRT$XCAA               DATA

 0002:00000028 000009acH .data                   DATA

 0002:000009e0 00000384H .bss                    DATA

 0003:00000000 00005310H .pdata                  DATA

 0004:00000000 000002b0H .rsrc$01                DATA

 

>>>> Begin actual symbolic information

  Address         Publics by Value              Rva+Base     Lib:Object

 

 0000:00000000       ___safe_se_handler_count   00000000     <absolute>

 0000:00000000       ___safe_se_handler_table   00000000     <absolute>

 0001:00000030       ??_7exception@std@@6B@     10001030     coredll_ALL:stdexcpt.obj

 0001:00000174       ??_C@_17LDADEION@?$AAI?$AAM?$AAE?$AA?$AA@ 10001174     coredll_ALL:Imm.obj

 0001:0000017c       ??_C@_13COJANIEC@?$AA0?$AA?$AA@ 1000117c     coredll_ALL:Imm.obj

 0001:00000180       ??_C@_15KNBIKKIN@?$AA?$CF?$AAd?$AA?$AA@ 10001180     coredll_ALL:Imm.obj

 0001:00000434       ??_7logic_error@std@@6B@   10001434     coredll_ALL:string.obj

>>>> Ok, enough of that.  It’s ugly and not relevant to us. Further in, we start seeing the following:

 

 0001:00000960       cszTimeZones               10001960     coredll_ALL:time.obj

 0001:00000978       NormalYearDaysBeforeMonth  10001978     coredll_ALL:time.obj

 0001:00000994       LeapYearDaysBeforeMonth    10001994     coredll_ALL:time.obj

 0001:000009b0       NormalYearDayToMonth       100019b0     coredll_ALL:time.obj

 0001:00000b20       LeapYearDayToMonth         10001b20     coredll_ALL:time.obj

<<<< Those are global variables.  I’ll tell you how I knew that soon.

 

>>>> Now things get interesting.  Notice that right after the Rva+Base value there is an “f”?  That stands for function. (That’s how I knew the ones above were globals... no “f”.)

 0001:0000602c       mbstowcs                   1000702c f   coredll_ALL:coredll.obj

 0001:00006114       wcstombs                   10007114 f   coredll_ALL:coredll.obj

 0001:00006210       RegisterDlgClass           10007210 f   coredll_ALL:coredll.obj

 0001:000062a4       CoreDllInit                100072a4 f   coredll_ALL:coredll.obj

  

>>>> More functions, but they look kinda funny. That’s because they are “decorated” or “mangled”. More on this in part two.

 0001:00006338       ??0exception@std@@QAA@XZ   10007338 f   coredll_ALL:stdexcpt.obj

 0001:00006354       ??0exception@std@@QAA@PBD@Z 10007354 f   coredll_ALL:stdexcpt.obj

 0001:000063a0       ??0exception@std@@QAA@ABV01@@Z 100073a0 f   coredll_ALL:stdexcpt.obj

 0001:00006404       ??1exception@std@@UAA@XZ   10007404 f   coredll_ALL:stdexcpt.obj
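The layout of these symbol lines is regular enough to pick apart mechanically. Here is a minimal Python sketch, assuming every symbol line looks like the ones in this example: a segmented address, a name, the Rva+Base value, an optional “f”, and then Lib:Object. The names PublicSymbol and parse_public_line are made up for illustration and are not part of any tool. Real MAP files may carry markers other than “f”, which this sketch does not handle.

```python
import re
from typing import NamedTuple, Optional


class PublicSymbol(NamedTuple):
    segment: int         # e.g. 0x0001
    offset: int          # offset within the segment
    name: str            # possibly a decorated C++ name
    rva_plus_base: int   # value from the Rva+Base column
    is_function: bool    # True when the "f" marker is present
    lib_object: str      # Lib:Object column, or "<absolute>"


# Assumed layout: "SSSS:OOOOOOOO  name  RvaBase  [f]  Lib:Object"
_SYMBOL_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{4}):([0-9a-fA-F]{8})\s+(\S+)\s+"
    r"([0-9a-fA-F]{8})\s+(?:(f)\s+)?(\S+)\s*$"
)


def parse_public_line(line: str) -> Optional[PublicSymbol]:
    """Return the parsed symbol, or None if the line is not a symbol line."""
    match = _SYMBOL_LINE.match(line)
    if match is None:
        return None
    segment, offset, name, rva_base, f_flag, lib_object = match.groups()
    return PublicSymbol(
        segment=int(segment, 16),
        offset=int(offset, 16),
        name=name,
        rva_plus_base=int(rva_base, 16),
        is_function=f_flag is not None,
        lib_object=lib_object,
    )


func = parse_public_line(
    " 0001:00006114       wcstombs                   10007114 f   coredll_ALL:coredll.obj"
)
glob = parse_public_line(
    " 0001:00000960       cszTimeZones               10001960     coredll_ALL:time.obj"
)
print(func.name, func.is_function)  # wcstombs True
print(glob.name, glob.is_function)  # cszTimeZones False
```

Header lines and my >>>> comments do not match the pattern, so the function returns None for them and they can simply be skipped.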

  

I said above that you should not trust the “Preferred load address”.  There is a simple reason for this.  It is preferred.  If I do a findstr on MAP files in one of my flat release directories for the string “Preferred load address is 10000000”,  I get back 582 hits.  We know that all those files aren’t loading at the same address.  Some binaries will be rebased to a specific pre-determined address, others will be dynamically rebased when they are loaded.  In part two, I will talk about ways to determine the true load address for a module.
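If you want to see how many of your own MAP files share a preferred load address, the findstr check can be approximated in a few lines of Python. This is only a sketch: the helper name preferred_load_address is made up, and it assumes the header line reads exactly “Preferred load address is” followed by a hex value, as in the COREDLL example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

_PREFERRED = re.compile(r"Preferred load address is ([0-9a-fA-F]+)")


def preferred_load_address(map_text: str) -> Optional[int]:
    """Return the preferred load address from MAP file text, or None."""
    match = _PREFERRED.search(map_text)
    return int(match.group(1), 16) if match else None


def count_preferred_addresses(map_texts: Iterable[str]) -> Counter:
    """Count how many MAP files claim each preferred load address."""
    return Counter(preferred_load_address(text) for text in map_texts)


# Two made-up MAP headers that both claim the same preferred address.
samples = [
    "Coredll\n Timestamp is d600142f\n Preferred load address is 10000000\n",
    "Other\n Preferred load address is 10000000\n",
]
counts = count_preferred_addresses(samples)
print(hex(0x10000000), counts[0x10000000])  # 0x10000000 2
```

A count greater than one for the same address is the hint that the value is only a link-time preference, not where the module actually ends up.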

 

Above you will also notice the addresses in the first column that look like this: 0001:00005c44.  This is a segmented address.  Without going into too much explanation, a segmented address is an offset relative to a segment, written in the form segment:offset.  In this case, the address represents an offset of 0x5c44 bytes into segment number 0x0001.  So, if segment 0x0001 began at 0x01000000, this would represent an address of 0x01005c44.
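The segment:offset arithmetic can be sketched in Python as follows. The function names and the segment_starts table are hypothetical, and the 0x01000000 start address is just the made-up value from the paragraph above, not something read from a real device.

```python
from typing import Dict, Tuple


def parse_segmented(text: str) -> Tuple[int, int]:
    """Split 'SSSS:OOOOOOOO' into (segment, offset), both parsed as hex."""
    segment_text, offset_text = text.strip().split(":")
    return int(segment_text, 16), int(offset_text, 16)


def resolve_segmented(text: str, segment_starts: Dict[int, int]) -> int:
    """Add the offset to the (assumed known) start of its segment."""
    segment, offset = parse_segmented(text)
    return segment_starts[segment] + offset


# Hypothetical: pretend segment 0x0001 begins at 0x01000000.
address = resolve_segmented("0001:00005c44", {0x0001: 0x01000000})
print(hex(address))  # 0x1005c44
```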

 

Another term above that should be defined is Rva+Base.  RVA is the Relative Virtual Address.  Base is the Preferred load address I already told you about.  So, let’s look at the following line:

 

 Address         Publics by Value              Rva+Base     Lib:Object

0001:00006114       wcstombs                   10007114 f   coredll_ALL:coredll.obj

 

Here, the Rva+Base is 0x10007114.  We know from the top of the file that the Preferred load address is 0x10000000.  Subtracting that, we are left with 0x7114.  This is the RVA: the offset from the start of the module image to the beginning of the wcstombs function.  We can learn something else here as well.  The segmented address is 0001:00006114.  If we subtract 0x6114 from 0x7114 we get 0x1000.  Why does this difference exist?  Well, 0001:00006114 is segmented.  It is an offset into segment 0x0001.  We now know that segment 0x0001 begins at RVA 0x1000, 4 KB into the module image.
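The same subtraction, written out as a small Python sketch. The function names are made up for illustration; the numbers come straight from the wcstombs and mbstowcs lines in the COREDLL example.

```python
def rva_from_map(rva_plus_base: int, preferred_base: int) -> int:
    """Strip the preferred load address to get the RVA."""
    return rva_plus_base - preferred_base


def segment_start_rva(rva: int, segment_offset: int) -> int:
    """Given a symbol's RVA and its offset within its segment,
    work out the RVA where that segment begins."""
    return rva - segment_offset


PREFERRED_BASE = 0x10000000  # "Preferred load address" from the MAP header

# wcstombs: 0001:00006114, Rva+Base 10007114
rva = rva_from_map(0x10007114, PREFERRED_BASE)
start = segment_start_rva(rva, 0x6114)
print(hex(rva), hex(start))  # 0x7114 0x1000

# mbstowcs (0001:0000602c, Rva+Base 1000702c) gives the same segment start.
assert segment_start_rva(rva_from_map(0x1000702C, PREFERRED_BASE), 0x602C) == start
```

Any symbol in segment 0x0001 should give the same segment start, which makes this a handy sanity check when you are reading an unfamiliar MAP file.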

 

Ok.  So now you’ve seen a MAP file, and have a couple of clues about what’s in it.  In part two, we will talk about using this information to resolve call stacks.

 

Comments
  • Good article,

    Where is the next part?

  • Also see http://blogs.msdn.com/sloh/archive/2005/02/28/381706.aspx

    Though I never talked about global vars at all.  I always meant to post about how to go the other way (figure out where in RAM a particular symbol was, which is more useful for global variables) but never got around to it.

    Sue

  • Where can i find part2, thanks!

  • Maybe this guy quit from M.S, because the E-mail address not work;

    but thanks to him too;

  • Can you tell how to get the actual address of the global variables, and is it possible to set the values of variables based on this address?
