Here is the scenario: you have compiled and linked a big program, and you may even have shipped it to customers. After it was built, you realize that in order to find a bug or gather some necessary information, you need to instrument a certain function in the program. With Phoenix you don’t need to rebuild the program; you can simply use this Phoenix tool to instrument the binary directly. We do this by inserting into the original program a call to a function from a DLL that you have written. (Note that the inserted function call takes no arguments in this sample. We will handle passing arguments later, as that is a more complex task.)
As I promised, I will switch between C# and C++/CLI. This program is taken from the Phoenix RDK and is written in C++/CLI. If you aren’t familiar with the syntax, it is actually quite similar to C#, but if you want more details then the language specification is located here.
Things Covered in this Article
· Reading/Writing a PE file.
· Creating imports.
· Creating a memory operand (MemOpnd).
· Loading functions.
· Adding function calls to an instruction stream.
There is one part of the program that this article will NOT cover: controls, which is Phoenix lingo for the command-line arguments. We will cover controls in depth in a later article. Their use is fairly apparent from reading the code, so I don’t think you will be confused by their presence, as you’ve probably written code to parse the command line a million times yourself.
Also, I haven’t talked about the details of the various Units in Phoenix. This is something I’ll have to do in a future posting. If any of this article is unclear due to this omission, let me know, and I’ll make sure to clarify.
The Main Function
Like StaticGlobalDump, we start with main(), which is given below (it’s just after code point 9). The bolded code consists of function calls that have user-defined functionality behind them, whereas the non-bold code calls directly into supplied framework code (the CRT, STL, CLR, or Phoenix).
Code point 1: Looking at the code, we see that the first thing we do is initialize the Phoenix targets. I explained this code in the StaticGlobalDump walkthrough, so I’ll skip discussion of it here. The code is identical (except that in StaticGlobalDump it was in a separate function).
Code point 2: This is where we begin initialization of the infrastructure. This is the second time we have seen the BeginInit method, as we also saw it in the StaticGlobalDump program.
When you call BeginInit, a LOT of things get initialized under the covers: everything from the threading and memory management infrastructure of Phoenix, to the symbol and type tables, to the controls infrastructure. BeginInit is just something you need to do to get Phoenix started.
Code point 3: Phoenix has a very rich set of command-line parsing capabilities. The various command-line arguments one can pass to a Phoenix client are called controls, and are accessed in the Phx::Ctrls namespace. InitCmdLineParser is a user-defined function that parses the command-line arguments for the program using the routines in the Phx::Ctrls namespace. We will cover this capability in a future article.
Code point 4: One reasonable question is “Why is InitCmdLineParser in-between BeginInit and EndInit, whereas in StaticGlobalDump there was nothing in-between those two calls?”
The reason is that Phoenix parses the command line for the controls at EndInit. You therefore need to have the controls set up before EndInit is called, and naturally you can’t set up the Phoenix controls before you start initializing Phoenix. The call to InitCmdLineParser must therefore sit between BeginInit and EndInit.
EndInit actually does more than just parse the command line, but for this particular piece of code that’s the only part that is relevant. We will dive into some of the other things that need to happen between BeginInit and EndInit in another article where they are relevant.
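To make the ordering constraint concrete, here is a minimal sketch in plain C++ of a two-phase initialization. It is not Phoenix code: ToyFramework, RegisterControl, and the "-name:value" argument syntax are all hypothetical names and conventions invented for illustration. The only point it demonstrates is that controls registered between the begin and end calls are the ones the end call can recognize.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a framework with two-phase initialization.
// None of these names are Phoenix APIs.
class ToyFramework {
public:
    void BeginInit() { started_ = true; }

    // Controls can only be registered after BeginInit and before EndInit.
    bool RegisterControl(const std::string& name) {
        if (!started_ || finished_) return false;
        controls_[name] = "";
        return true;
    }

    // EndInit parses "-name:value" arguments (an assumed syntax). Only
    // controls that were registered beforehand are recognized.
    void EndInit(const std::vector<std::string>& args) {
        assert(started_ && "BeginInit must be called first");
        for (const std::string& arg : args) {
            std::size_t colon = arg.find(':');
            if (arg.size() < 2 || arg[0] != '-' || colon == std::string::npos)
                continue;
            auto it = controls_.find(arg.substr(1, colon - 1));
            if (it != controls_.end()) it->second = arg.substr(colon + 1);
        }
        finished_ = true;
    }

    // Returns the parsed value, or an empty string for unknown controls.
    std::string GetValue(const std::string& name) const {
        auto it = controls_.find(name);
        return it == controls_.end() ? "" : it->second;
    }

private:
    bool started_ = false;
    bool finished_ = false;
    std::map<std::string, std::string> controls_;
};
```

In this toy model, registering a control before BeginInit or after EndInit simply fails, which mirrors why InitCmdLineParser has to sit between the two calls.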
Code point 5: This is a call to CheckCmdLine. This is a simple user-defined function that checks to make sure that each of the required command-line arguments is supplied. If not, it exits the program with an error. Again, we will cover this capability in a future article.
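The sample’s actual CheckCmdLine is not shown in this article, so the following is only a sketch of the idea in plain C++. FindMissingControls and CheckRequiredControls are hypothetical helpers, and treating every control as required is an assumption. The sketch reports each required argument that has no value and returns a non-zero code that a caller could pass to exit().

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of required controls whose value
// is missing or empty in the parsed command line.
std::vector<std::string> FindMissingControls(
    const std::map<std::string, std::string>& parsed,
    const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        auto it = parsed.find(name);
        if (it == parsed.end() || it->second.empty()) missing.push_back(name);
    }
    return missing;
}

// Hypothetical helper: prints one error per missing argument and returns
// 0 when everything is present, 1 otherwise.
int CheckRequiredControls(const std::map<std::string, std::string>& parsed,
                          const std::vector<std::string>& required) {
    std::vector<std::string> missing = FindMissingControls(parsed, required);
    for (const std::string& name : missing)
        std::cerr << "missing required argument: -" << name << "\n";
    return missing.empty() ? 0 : 1;
}
```

A caller would pass the parsed values together with the list of names it considers mandatory (for example in, out, and pdbout) and exit if the result is non-zero.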
Code point 6: This is where we open a PE file, in the same way we did with StaticGlobalDump. The main difference is that we get the file name from a global class called “GlobalPlaceHolder”. GlobalPlaceHolder is a class that has a set of controls in it, each one mapping to one of the command-line arguments:
public ref class GlobalPlaceHolder {
public:
    static Phx::Ctrls::StringCtrl ^ in;
    static Phx::Ctrls::StringCtrl ^ out;
    static Phx::Ctrls::StringCtrl ^ pdbout;
    static Phx::Ctrls::StringCtrl ^ importdll;
    static Phx::Ctrls::StringCtrl ^ importmethod;
    static Phx::Ctrls::StringCtrl ^ localmethod;
};
GlobalPlaceHolder::in->GetValue(nullptr) gets the string out of the “in” field, which corresponds to the name of the input PE file for this program. For this walkthrough, ignore the nullptr argument. We will cover that in the future when I discuss controls.
Code point 7: These two lines simply copy the command-line arguments out of the controls into the two corresponding fields of the PEModuleUnit. The first line sets the path for the resulting PE image, and the second sets the output PDB filename.
Code point 8: LoadGlobalSyms takes a PEModuleUnit and loads all of the global symbol data out of the associated PDB file. After this call, you have a symbol table populated with all of the global/static variables, the symbols for the types, and the methods associated with those types.
We will go into more depth about LoadGlobalSyms in a future posting, but if you’re curious, you can dump the symbol table for the PEModuleUnit after you load it (PEModuleUnit->SymTable->Dump(dumpOptions)).
Code point 9: DoAddInstrumentation is where the user logic to add the new calls to the existing function takes place. See later in this article for the section on DoAddInstrumentation.
After that comes the call to moduleUnit->Close(), which does more than simply close the PEModuleUnit. It also checks whether OutputImagePath is non-null and, if so, writes the PEModuleUnit out to disk at that path. Because we set OutputImagePath in code point 7, this program generates a new binary when it ends. It also generates a new PDB file at OutputPdbPath, which we also set in code point 7.
int main(array<String ^> ^ args) {
    // Initialize the target architectures.
    // 1
    Phx::Targets::Archs::Arch ^ arch =
        Phx::Targets::Archs::X86::Arch::New();
    Phx::Targets::Runtimes::Runtime ^ runtime =
        Phx::Targets::Runtimes::VCCRT::Win32::X86::Runtime::New(arch);
    Phx::GlobalData::RegisterTargetArch(arch);
    Phx::GlobalData::RegisterTargetRuntime(runtime);

    // Initialize the infrastructure.
    // 2
    Phx::Init::BeginInit();

    // Init the cmd line stuff.
    // 3
    ::InitCmdLineParser();

    // Check for Phoenix wide options like "-assertbreak".
    // 4
    Phx::Init::EndInit(L"PHX|*|_PHX_", args);

    // Check the command line.
    // 5
    ::CheckCmdLine();

    // Open the module and read it in.
    Phx::PEModuleUnit ^ moduleUnit;
    // 6
    moduleUnit =
        Phx::PEModuleUnit::Open(GlobalPlaceHolder::in->GetValue(nullptr));

    // Setup output file name and PDB.
    // 7
    moduleUnit->OutputImagePath = GlobalPlaceHolder::out->GetValue(nullptr);
    moduleUnit->OutputPdbPath = GlobalPlaceHolder::pdbout->GetValue(nullptr);

    // Iterator will load symbols implicitly.
    // However Load Global Symbols upfront and print total.
    // 8
    moduleUnit->LoadGlobalSyms();
    Phx::Output::WriteLine(L"Total Global Symbols Count - {0} ",
        moduleUnit->SymTable->SymCount.ToString());

    // Do some useful work on the tool front here:
    // 9
    ::DoAddInstrumentation(moduleUnit);

    // Close the ModuleUnit.
    moduleUnit->Close();

    // If this was not the end of the application it would be best to
    // delete the ModuleUnit.
    // moduleUnit->Delete();
    return 0;
}
The DoAddInstrumentation Function
This function is where we do the meat of the work to instrument the PE image. We instrument the PE image by adding a call to an imported function at the entry of the function that is specified on the command line.
Let’s step back and think about the steps that are required to add a call to a function, even outside of a framework such as Phoenix.
1. Import the DLL that contains the function, F, that we wish to call from the specified function, S, in the PE file.
2. Get the import symbol of F from within the import module.
3. Get S and find its first instruction.
4. Inject a call to F from the imported DLL before this first instruction in S.
Those are the basic steps that we need to do. DoAddInstrumentation will do these four steps using Phoenix.
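Before looking at the Phoenix version, the four steps can be sketched on a toy model in plain C++. Nothing here is a Phoenix class or method: ToyInstr, ToyBinary, and AddEntryCall are hypothetical names, and the “binary” is just a map from function names to lists of instructions. The sketch only shows the shape of the work: record the import, look up the target function, and insert a call before its first instruction.

```cpp
#include <algorithm>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of a binary; none of these types are Phoenix classes.
struct ToyInstr {
    std::string opcode;
    std::string operand;
};

struct ToyBinary {
    std::vector<std::string> importedDlls;
    std::map<std::string, std::string> importSymbols;       // function -> DLL
    std::map<std::string, std::list<ToyInstr>> functions;   // name -> instructions
};

// Adds a call to importMethod (F, from importDll) at the entry of
// localMethod (S). Returns false, leaving the binary untouched, if S does
// not exist or has no instructions.
bool AddEntryCall(ToyBinary& bin, const std::string& importDll,
                  const std::string& importMethod,
                  const std::string& localMethod) {
    // Validate S first so that a failure does not modify the import tables.
    auto fn = bin.functions.find(localMethod);
    if (fn == bin.functions.end() || fn->second.empty()) return false;

    // Step 1: import the DLL that contains F (once).
    if (std::find(bin.importedDlls.begin(), bin.importedDlls.end(),
                  importDll) == bin.importedDlls.end()) {
        bin.importedDlls.push_back(importDll);
    }

    // Step 2: record the import symbol of F within that import module.
    bin.importSymbols[importMethod] = importDll;

    // Step 3: get S and find its first instruction.
    std::list<ToyInstr>& body = fn->second;
    std::list<ToyInstr>::iterator first = body.begin();

    // Step 4: insert a call to F before that first instruction.
    body.insert(first, ToyInstr{"call", importMethod});
    return true;
}
```

A linked list is used for the instruction stream because inserting before an existing element does not disturb the other instructions, which is the property an instrumentation tool relies on.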