MrEpl: a fun way to learn MSchema
12 November 08 10:34 PM | joshwil | 2 Comments   

One of the hidden gems (in my mind, but then I’m biased) in the Microsoft “Oslo” SDK that went public at PDC is the “M Script Mode” in Intellipad which we affectionately know as MrEpl.

MrEpl is a REPL processor for the MSchema language and is a great way to play around with M and learn about it.

I have recorded a screen cast of some adventuring through MrEpl and it is posted on the Oslo dev center here: http://wm.microsoft.com/ms/msdn/oslo/MrEplWalkthrough.wmv

 

For those who just want to play, make sure you have the Oslo SDK installed, then run “Intellipad (With Samples)”.

- Once in Intellipad, press Ctrl-/ to get to the minibuffer.

- There, type SetMode('MScriptMode')

You’re now in MrEpl. Feel free to type an M expression, such as: 1 + 1

 

Have Fun!

Using MGrammar to create .Net instances through Xaml
12 November 08 10:30 PM | joshwil | 5 Comments   

In the last couple of months I have had more fun playing with MGrammar than with just about any other programming I’ve ever done, but in its current form the one thing that really feels missing to me is an easy way to create CLR object instances.

MGrammar is great for tying in with MSchema, and you can write custom IGraphBuilder implementations for all kinds of things, but in the end most of the code I am writing is still C# and needs instances as input.

Enter Xaml, a wonderful format for writing down arbitrary .Net objects (well, many of them anyway). It turns out that the MGraph data representation is reasonably friendly to a Xaml projection, so I have posted a sample on the Oslo dev center with some code I wrote to enable targeting Xaml with MGrammar output.

 

Get the bits here (link updated 11/28; apparently the bits moved): http://download.microsoft.com/download/B/9/8/B982BFD9-3E17-426A-A537-833BEF376698/MGrammarXamlSample.msi

 

And after that go watch a couple of screen casts that I recorded about using the sample.

 

First: a very simple example of instantiating a Person object from MGrammar output: http://wm.microsoft.com/ms/msdn/oslo/PersonExample.wmv

 

This example takes a simple grammar and some input, creates a graph and then runs it through the MGraphXamlReader to create instances of the type Person (see below) which I defined in C#.


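using System.Collections.Generic;

public class Person
{
    readonly List<Person> children = new List<Person>();

    public string First { get; set; }
    public string Last { get; set; }

    // Get-only collection: a Xaml reader fills it by calling Add on the existing list.
    public ICollection<Person> Children { get { return children; } }
}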
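Xaml fills in a get-only collection property such as Children by calling Add on the collection the object already owns, rather than assigning a new one, which is why Person exposes Children without a setter. The sketch below builds by hand the same kind of object graph a Xaml reader would produce for the markup shown in its comment. The names, the clr-namespace mapping and the PersonDemo helper are hypothetical, and Person is repeated so the block compiles on its own.

```csharp
using System.Collections.Generic;

public class Person
{
    readonly List<Person> children = new List<Person>();

    public string First { get; set; }
    public string Last { get; set; }
    public ICollection<Person> Children { get { return children; } }
}

public static class PersonDemo
{
    // Hypothetical markup this graph corresponds to (namespace mapping is illustrative):
    //
    //   <Person xmlns="clr-namespace:Sample;assembly=Sample" First="Ada" Last="Smith">
    //     <Person.Children>
    //       <Person First="Ben" Last="Smith" />
    //       <Person First="Cy" Last="Smith" />
    //     </Person.Children>
    //   </Person>
    public static Person Build()
    {
        // Attributes become property sets on a default-constructed instance.
        var parent = new Person { First = "Ada", Last = "Smith" };

        // Items under a read-only collection property become Add calls
        // on the collection returned by the getter.
        parent.Children.Add(new Person { First = "Ben", Last = "Smith" });
        parent.Children.Add(new Person { First = "Cy", Last = "Smith" });
        return parent;
    }
}
```

The same Add-based behavior is what lets the C# collection initializer syntax `Children = { ... }` work on a get-only property.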

 

 

A more complex example: a grammar for writing down WPF applications in an English-like syntax:

http://wm.microsoft.com/ms/msdn/oslo/mwindowexample.wmv

What have I been up to?
02 November 08 10:17 AM | joshwil | 1 Comments   

My blog has been a little quiet for the last year or two, since I left the CLR and went to work for Doug Purdy’s (http://www.douglasp.com) team in the Connected Systems Division. Since then I’ve been having an amazing amount of fun working on projects ranging from duck typing, serialization/deserialization stacks and change notification systems to, most recently, languages and compilers, with the new language that Microsoft is unleashing on the world called M (http://blogs.msdn.com/mlanguage).

You can check out a bunch of interesting content about the Oslo product on MSDN at http://msdn.microsoft.com/oslo. Or, just hang around here as I’m sure to post entries now that things are public about my adventures in this new family of languages and their uses.

Haven't posted in a while... New challenges... New peers?
18 November 07 10:21 PM | joshwil | 1 Comments   

So, I haven't posted in quite a while, I need to get on that... In the meantime I've moved on from the CLR team after about 5 years of fun times and am now having fun exploring some new stuff that unfortunately I can't talk about... However, the peers are awesome, and, now we're looking for more awesome people:

http://www.douglasp.com/blog/2007/11/15/MyTeamIsHiring.aspx

Interested in kick ass technology with crazy smart people? Email Doug (douglasp@microsoft.com).

Patrick is blogging, make sure to check it out!
22 November 06 01:10 PM | joshwil | 1 Comments   

http://blogs.msdn.com/patrick_dussud/default.aspx

Patrick is super-smart... Hopefully his blog will pick up where Chris Brumme's left off a couple of years back.

 -josh

PerfConsole is my kind of 20% time...
10 August 06 10:44 AM | joshwil | 3 Comments   

Other companies have their much-talked-about 20% time... That sounds great to me! Here at Microsoft (at least in my group) we have it too, kinda. I find that every one of my peers has some interesting project that they're working on outside of their normal work activities. Some of the categories I know of among people on the CLR team are:

- pet projects related to the product (in this case CLR) which they are prototyping in the hope of getting some traction in the problem space

- pet projects related to the engineering effort around the product (e.g. static analysis, etc...)

- tools related to software engineering in general (much of my stuff falls into this category)

- random stuff using managed code just to use managed code (eating your own dog food is good, right? especially when it tastes this good)

Many of the tools that we use to build and test the CLR started this way, two public examples being PerfConsole and MDBG. Both of these were started as pet projects and have become an integral part of CLR processes (admittedly MDBG significantly more so than PerfConsole at the moment, MDBG being the CLR's primary managed debugging test harness).

 

Anecdotally, I believe that there are lots of people inside Microsoft doing this as well. Internally we have a system for sharing tools you've written that help you get things done; last time I checked it had 4500+ tools on it, and that's just the stuff that people bother to publish.

 

I strongly believe that pet projects should be encouraged within any software development team. I'm glad that my experience with Microsoft has been that they do encourage experimentation, after all where do you think the CLR came from?

PerfConsole is unleashed...
04 August 06 08:01 AM | joshwil | 8 Comments   

So, Rico beat me to the punch: http://blogs.msdn.com/ricom/archive/2006/08/03/688019.aspx

 

As we saw in the previous entry, it is nice that the VSTS Profiler team has provided a tool which converts their .VSP files into something more easily readable by other programs. While I was working on CLR performance last year I spent a lot of time with the output of the VSTS Profiler before its UI pieces were ready, which meant using the data in the .CSV files to diagnose problems. After just a short amount of time doing this by hand I decided to automate the process, and I created a number of utilities to parse the output CSV files and find interesting information. In that first rev you basically had to rerun the tool any time you wanted to change the inputs to the query (a different string, different cutoffs for various things, etc.). It didn't take long to realize that this wasn't going to scale, and PerfConsole was born...

What we can see above are some of the basics of PerfConsole's command syntax. Overall it strives to mix some of the console model that cmd.exe or Windows PowerShell has with what is more or less a domain specific language for interpreting profile data. A number of commands are included in the base package, including the following which we can see above:

- load: loads a profile from disk, commonly you specify a .VSP file and PerfConsole will look for the (required) associated .CSV files which were generated either from the "export" function in VS or using the command line as outlined here.

- functions: takes a profile as input and returns the piece that corresponds to a list of functions which were seen in the profile with Inclusive and Exclusive times.

- modules: takes a profile as input and returns the piece that contains a list of modules which were seen in the profile with Inclusive and Exclusive times.

- sort: takes in a list and returns a sorted list ordered by some property (in this case "ex" is shorthand for "ExclusivePercentage")

- top: takes in a list and returns a subset of that list including X elements starting at element 0

Through the example above we can start to see the basic syntax of using PerfConsole: you use the '|' to connect statements where each statement (reading left to right) has input from the return value of the previous statement and the parameters specified. Like Windows PowerShell when things are passed from left to right they are full fledged .Net objects, and PerfConsole can use those objects to do some type checking as you go in order to tell you if you're combining commands in unsupported ways.
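The left-to-right flow with type checking between stages can be sketched as below. This is not PerfConsole's code: the IPipeCommand interface, the PipeCommand and Pipeline classes, and the numeric sort/top commands are hypothetical stand-ins, assuming each command declares the .NET type it accepts and the runner checks the incoming object against it before executing the stage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical command shape: each command declares the .NET type it accepts.
public interface IPipeCommand
{
    string Name { get; }
    Type InputType { get; }
    object Execute(object input);
}

// Simple delegate-backed command for the sketch.
public sealed class PipeCommand : IPipeCommand
{
    private readonly Func<object, object> _body;

    public PipeCommand(string name, Type inputType, Func<object, object> body)
    {
        Name = name;
        InputType = inputType;
        _body = body;
    }

    public string Name { get; }
    public Type InputType { get; }
    public object Execute(object input) => _body(input);
}

public static class Pipeline
{
    // Runs the stages left to right; each stage receives the previous stage's result.
    // Before executing a stage, check that the incoming object is acceptable to it.
    public static object Run(object seed, params IPipeCommand[] stages)
    {
        object current = seed;
        foreach (var stage in stages)
        {
            if (!stage.InputType.IsInstanceOfType(current))
            {
                throw new InvalidOperationException(
                    $"'{stage.Name}' expects {stage.InputType.Name} but got {current?.GetType().Name ?? "null"}");
            }
            current = stage.Execute(current);
        }
        return current;
    }
}

public static class PipelineDemo
{
    public static readonly IPipeCommand SortDescending = new PipeCommand(
        "sort", typeof(IEnumerable<double>),
        input => ((IEnumerable<double>)input).OrderByDescending(x => x).ToArray());

    public static IPipeCommand Top(int count) => new PipeCommand(
        "top", typeof(IEnumerable<double>),
        input => ((IEnumerable<double>)input).Take(count).ToArray());

    // Similar in spirit to: <data> | sort | top 2
    public static double[] Run(double[] data) =>
        (double[])Pipeline.Run(data, SortDescending, Top(2));
}
```

Because the stages pass real objects, a bad combination is rejected with a type error before the offending stage runs, rather than producing garbage text downstream.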

Incidentally, I initially tried really hard to reverse the execution order and execute from right to left instead; it made for statements that read like a sentence: "top 5 | sort ex | functions @board_ngen" == "get me the top 5 exclusive functions in @board_ngen". However, people internally revolted and said that if that was what I wanted, I couldn't overload '|' to do it.

As a side effect (or design point) of the way that PerfConsole passes objects between statements, I have been able to implement a very powerful help system. At any point you can pipe the result of any statement or set of statements to '?' and it will tell you the resulting data type and which commands will accept that type as an input (in a strongly typed manner). For instance:

> functions @board_ngen | ?
Data type is: PerfConsole.Data.FunctionSummaryData

This data type can be input to the following commands:

bottom
count
find
getitem
getproperty
sort
top
trim
timebytype
partition

Here we see that the functions command returns a FunctionSummaryData object which can be input to a number of other commands, including top and sort. We also see that functions takes this weird "@board_ngen" parameter: in PerfConsole @ denotes a temporary (you can create these using the save command), and the temporary named "board_ngen" is the profile that I've imported for this demo.
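Answering "which commands accept this type?" only needs each command to record its input type; the '?' query then filters with Type.IsAssignableFrom. The sketch below assumes that design. The ISortable, ITree, FunctionSummaryData and CallTreeData types and the four registered commands are hypothetical stand-ins, not PerfConsole's actual registry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for PerfConsole's data types.
public interface ISortable { }
public interface ITree { }
public sealed class FunctionSummaryData : ISortable { }
public sealed class CallTreeData : ITree { }

public static class HelpRegistry
{
    // Each command is registered with the type it accepts as pipeline input.
    private static readonly Dictionary<string, Type> Inputs = new Dictionary<string, Type>
    {
        ["sort"] = typeof(ISortable),
        ["top"] = typeof(ISortable),
        ["count"] = typeof(object),       // accepts anything
        ["compacttree"] = typeof(ITree),
    };

    // The '?' query: which commands would accept a value of this runtime type?
    public static IReadOnlyList<string> CommandsAccepting(Type dataType) =>
        Inputs.Where(kv => kv.Value.IsAssignableFrom(dataType))
              .Select(kv => kv.Key)
              .OrderBy(name => name, StringComparer.Ordinal)
              .ToList();

    // Formats output in the style of the help text shown above.
    public static string Describe(object value) =>
        "Data type is: " + value.GetType().FullName + Environment.NewLine +
        "This data type can be input to the following commands:" + Environment.NewLine +
        string.Join(Environment.NewLine, CommandsAccepting(value.GetType()));
}
```

Matching on interfaces (ISortable) rather than concrete classes is what lets one sort command serve both function and module summaries.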

Help also works well with commands. For instance:

> ? sort
Sort a ISortable (e.g. FunctionSummary, ModuleSummary). Each type has its own
sort types (e.g. 'name', 'address', 'exclusive', etc...).

Usage:
ISortable | sort <field:SortField> | ISortable
ISortable | sort <field:SortField> <direction:SortDirection> | ISortable

Have enum paramters:
SortDirection { Ascending, Descending }
SortField { Name, ExclusivePercentage, InclusivePercentage, Address }

Here we can see that the sort command takes as input something which implements ISortable (which, by inference from the previous help output, we believe FunctionSummaryData does). It can also take up to two parameters, which are values of the enums SortField and SortDirection (these can be abbreviated arbitrarily so long as they remain unambiguous among their peers). As its return value sort specifies an ISortable; if we pipe the return value from sort to help we see:
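Accepting "ex" for ExclusivePercentage amounts to unique-prefix matching against the enum's member names. A minimal sketch, assuming case-insensitive matching (an assumption; the post does not say); the EnumPrefix helper is hypothetical, and the two enums are copied from the help output.

```csharp
using System;
using System.Linq;

public enum SortField { Name, ExclusivePercentage, InclusivePercentage, Address }
public enum SortDirection { Ascending, Descending }

public static class EnumPrefix
{
    // Resolves a possibly abbreviated name to an enum value. The abbreviation is
    // accepted only if it is a case-insensitive prefix of exactly one member name.
    public static TEnum Parse<TEnum>(string text) where TEnum : struct, Enum
    {
        string[] matches = Enum.GetNames(typeof(TEnum))
            .Where(n => n.StartsWith(text, StringComparison.OrdinalIgnoreCase))
            .ToArray();

        if (matches.Length == 1)
        {
            return (TEnum)Enum.Parse(typeof(TEnum), matches[0]);
        }
        if (matches.Length == 0)
        {
            throw new ArgumentException($"'{text}' does not match any {typeof(TEnum).Name} value");
        }
        throw new ArgumentException($"'{text}' is ambiguous: {string.Join(", ", matches)}");
    }
}
```

For example, `EnumPrefix.Parse<SortField>("ex")` resolves to ExclusivePercentage and `EnumPrefix.Parse<SortDirection>("desc")` to Descending, while a prefix shared by two members is rejected instead of guessed.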

> functions @board_ngen | sort ex | ?
Data type is: PerfConsole.Data.FunctionSummaryData

This indicates that sort actually returns a FunctionSummaryData (which implements ISortable), behind the scenes the sort command clones the original FunctionSummaryData and returns the modified copy.
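The clone-then-modify behavior of sort, together with top's "first X elements starting at element 0", can be sketched like this. FunctionRow, this FunctionSummaryData and its method names are hypothetical; the post does not show PerfConsole's classes. One side benefit of returning a new instance (our reading, not something the post states) is that a saved temporary such as @board_ngen is never modified by a later query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row and summary types standing in for PerfConsole's.
public sealed class FunctionRow
{
    public FunctionRow(string name, double exclusivePercentage)
    {
        Name = name;
        ExclusivePercentage = exclusivePercentage;
    }

    public string Name { get; }
    public double ExclusivePercentage { get; }
}

public sealed class FunctionSummaryData
{
    private readonly List<FunctionRow> _rows;

    public FunctionSummaryData(IEnumerable<FunctionRow> rows)
    {
        _rows = rows.ToList();   // always take a private copy of the rows
    }

    public IReadOnlyList<FunctionRow> Rows => _rows;

    // sort: order a copy of the rows and return it as a new FunctionSummaryData,
    // leaving this instance untouched.
    public FunctionSummaryData SortByExclusiveDescending() =>
        new FunctionSummaryData(_rows.OrderByDescending(r => r.ExclusivePercentage));

    // top: the first 'count' rows starting at element 0, again as a new instance.
    public FunctionSummaryData Top(int count) => new FunctionSummaryData(_rows.Take(count));
}
```

Returning the concrete type rather than a bare ISortable is also what lets the help system report FunctionSummaryData after a sort.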

 

Note that for simplicity's sake I've run PerfConsole with the /console command line option, which causes results to show in the console. The default is to show results as HTML, which you can click around in to do more queries and such. In fact, before I end this blog entry, which has already managed to get too long, I'll show you Rico's favorite command:

> calltree @board_ngen | trim in > 5 | compacttree

 

Anyway, that's a primer on using PerfConsole, please download it here. When you start it up, the ?? command will open a longer document which should hopefully get you started quickly.

Performance Analysis
03 August 06 11:15 AM | joshwil | 4 Comments   

I have spent the last couple of years investigating the performance of various parts of the CLR and managed code, and in that time I have learned that a profiler is an invaluable tool. There are quite a few profilers available, including CLR Profiler, Intel's VTune, AMD's CodeAnalyst, Microsoft's VSTS Profiler, Kernrate and various profilers which Microsoft has developed internally (including the predecessors to VSTS Profiler), as well as many others which I don't list here only because I don't have experience with them. While every profiler has strengths and weaknesses, lately I've found that I turn to the VSTS Profiler more often than not for my first analysis.

If you have VSTS and want to learn more about the profiler see the team's blog: http://blogs.msdn.com/profiler/

VSTS Profiler has many options and through them can get at a lot of data about your application; however, I usually find that some of the most interesting data can be obtained with simple, vanilla sample-based profiling. You can do this through the VS interface or from the command line using the VSTS Profiler command line utilities. Basically the steps are as follows:

1) Only if profiling managed code: vsPerfClrEnv.cmd /sampleOn 

2) vsPerfCmd.exe /start:sample /output:<name>.VSP

3) vsPerfCmd.exe /launch:<application>.exe

    <<wait for application to close>>

4) vsPerfCmd.exe /shutdown

By following those 4 steps you can profile an application; the result is a file named <name>.VSP which contains all the profile data collected during the run and is readable directly by VSTS. There are, however, interesting analyses to do on the data that the VSTS 2005 interface doesn't support. To address this, the VSTS Profiler team has included a utility called vsPerfReport.exe which can translate a .VSP file into a set of comma separated value (CSV) files, one for each of the major views seen in VSTS. To do this you execute:

5) vsPerfReport.exe /summary:all <name>.VSP

At the end of all this you will have a set of files named <name>_<type>.CSV which are views into the data in the VSP. You can open these in Notepad to view the text, or import them into Excel to see the columns more clearly called out.

Once the data is in this form we can do whatever we like with it, for instance sorting by a particular column in Excel or searching for a particular row using a string in Notepad. However, wouldn't it be interesting to be able to do more?
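As a first step beyond Notepad and Excel, a few lines of code can pull the top rows out of one of these CSV views. The sketch below is an assumption-laden illustration: the ProfileCsv helper is hypothetical, and the column names used in the usage note ("Function Name", "Exclusive %") are made up, so check the header row of your own <name>_<type>.CSV file. Numbers are parsed with the invariant culture, which may not match files written under other locales.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class ProfileCsv
{
    // Splits one CSV line, honoring double-quoted fields (function signatures often
    // contain commas) and "" as an escaped quote inside a quoted field.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"') { field.Append('"'); i++; }
                else if (c == '"') inQuotes = false;
                else field.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',') { fields.Add(field.ToString()); field.Clear(); }
            else field.Append(c);
        }
        fields.Add(field.ToString());
        return fields;
    }

    // Returns the 'count' rows with the largest value in 'valueColumn'. Column names are
    // looked up in the header row, so they are parameters rather than baked-in guesses.
    public static List<(string Name, double Value)> Top(
        string csvText, string nameColumn, string valueColumn, int count)
    {
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        List<string> header = SplitLine(lines[0]);
        int nameIdx = header.IndexOf(nameColumn);
        int valueIdx = header.IndexOf(valueColumn);
        if (nameIdx < 0 || valueIdx < 0) throw new ArgumentException("column not found in header");

        return lines.Skip(1)
            .Select(line => SplitLine(line))
            .Select(f => (Name: f[nameIdx], Value: double.Parse(f[valueIdx], CultureInfo.InvariantCulture)))
            .OrderByDescending(r => r.Value)
            .Take(count)
            .ToList();
    }
}
```

In real use the text would come from File.ReadAllText on the exported file, e.g. `ProfileCsv.Top(text, "Function Name", "Exclusive %", 5)` with the column names adjusted to match.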

Stay tuned...

Visual Studio Team System 180 day trial...
01 August 06 08:46 AM | joshwil | 1 Comments   

I just found out that VSTS has a free 180-day trial edition.

http://msdn.microsoft.com/vstudio/products/trial/

This is AWESOME because VSTS has a great profiler included in the package!

Should I choose to take advantage of 64-bit?
18 July 06 01:08 PM | joshwil | 12 Comments   

Here's the guts of a response that I posted a while back to an internal mailing list re: tradeoffs of running your managed code as 64-bit vs 32-bit. YMMV, and I'll remind you that every perf question has a thousand answers depending on the situation.

 

>>>>>>> snip >>>>>>>>>>>>>>>>>

 

Here's my own personal list of the big pluses and minuses of moving to 64-bit code...

 

Pluses:

- more memory (+++++)

- better 64-bit math (+++)

- X64 OS kernel takes advantage of more memory to do good things for a lot of stuff (+++)

Minuses:

- things need more memory (pointers are bigger, and especially in managed code references are everything and are everywhere) (--)

- the processor's cache is effectively smaller (when comparing against the same machine in 32-bit vs 64-bit mode) because of the prior point (----)

- code also tends to be bigger because of extra prefix bytes and instructions that carry around 8-byte immediate values instead of 4 byte immediate values
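The "references are everything" point is easy to put numbers on: every object reference is one pointer, so just the reference slots of a collection double in size on 64-bit. A minimal sketch (the PointerCost helper is hypothetical) that ignores array headers and the objects themselves, which also grow:

```csharp
using System;

public static class PointerCost
{
    // Bytes needed just for the slots of an array of 'count' object references.
    // Each slot is one pointer: 4 bytes in a 32-bit process, 8 in a 64-bit process.
    public static long ReferenceSlotBytes(long count, int pointerSize) => count * pointerSize;

    // Same calculation using this process's actual pointer size (IntPtr.Size).
    public static long ReferenceSlotBytesInThisProcess(long count) =>
        ReferenceSlotBytes(count, IntPtr.Size);
}
```

For 10 million references that is 40,000,000 bytes at 4 bytes per pointer versus 80,000,000 bytes at 8, all of it competing for the same processor cache.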

 

What this tends to mean is that code that runs extremely well on 32-bit, doesn't do any 64-bit math (or otherwise take advantage of improvements in the 64-bit processor), and runs well in < 2GB of memory without having to hit the disk for anything will likely continue to run on 64-bit, but with somewhat MORE memory usage and a little more slowly, because the processor's cache is effectively smaller relative to the bloated size of the things that need to be in it.

In the scenario described above you get the minuses of the platform without taking advantage of the pluses.

If however you have an application or set of applications that can take advantage of the pluses to offset the minuses they can come out in the black (sometimes _very_ much so). We have seen a number of large applications which used to be memory starved on 32-bit and had some type of home-grown paging able to throw that more or less out the window and see their performance go up by 2, 3 or even 4X. PaintDotNet (which is a pretty cool photo editing application, Rick Brewster's blog: http://blogs.msdn.com/rickbrew/default.aspx) rewrote a bunch of their filters to take advantage of 64-bit math and saw speed boosts moving to x64 of 3X+ for those filters. I just saw a presentation the other day where microsoft.com was saying that they have seen both significant reliability boosts and throughput increases moving to 64-bit (however they were running 12 app pools on a box and were definitely running into the memory limits of the 32-bit system).

Blog about writing profiler stubs to interact with the 2.0 runtime.
11 August 05 12:52 PM | joshwil | 3 Comments   

Check out this blog entry (http://blogs.msdn.com/jkeljo/archive/2005/08/11/450506.aspx) to see some information that is rather interesting to people writing managed profilers, and probably not very interesting to everyone else.

BigArray<T>, getting around the 2GB array size limit
10 August 05 06:14 PM | joshwil | 14 Comments   

I’ve received a number of queries as to why the 64-bit version of the 2.0 .Net runtime still has array maximum sizes limited to 2GB. Given that it seems to be a hot topic of late I figured a little background and a discussion of the options to get around this limitation was in order.

First some background: in the 2.0 version of the .Net runtime (CLR) we made a conscious design decision to keep the maximum object size allowed in the GC Heap at 2GB, even on the 64-bit version of the runtime. This is the same as the current 1.1 implementation of the 32-bit CLR; however, you would be hard pressed to actually allocate a 2GB object on the 32-bit CLR because the virtual address space is simply too fragmented to realistically find a 2GB hole. Generally people aren’t particularly concerned with creating types that would be >2GB when instantiated (or anywhere close); however, since arrays are just a special kind of managed type created within the managed heap, they also suffer from this limitation.

<Sidenote> managed arrays: arrays are a first class type in the CLR world and they are laid out in one contiguous block of memory in the managed garbage collected heap. In CLR 1.1 they can be thought of as the only generic type (in 2.0 we’re introducing a much more universal concept of generics) in that you can have an array of any managed type that you like (primitive types, value types, reference types). It is interesting to think about what that means in the context of the 2GB instance size limit imposed on objects in the managed heap. With value types (bool, char, int, long, struct X {}, etc…) the actual data of each element in the array is laid out contiguously with the next element in memory. Since the 2GB limit discussed earlier applies to the total array size, and the array size is proportional to the size of the element type, the maximum number of elements you can store in an array of type X varies inversely with the size of X.

Differing from this are arrays of reference types (e.g. objects, strings, class Y {}, etc…), for these arrays the actual array will be that of a bunch of references, initially null. To initialize the array your code will need to go through one element at a time and create or assign an appropriate instance of the type to that array element. The 2GB size limit for arrays applies to this array of references, not the instances of the objects themselves. On a 32-bit machine if you create an array of type object (object[]) and one instance of type object per element in the array then your available virtual address space will end up limiting the size of your array as you will never be able to fit enough objects in memory to be able to fill up a 2GB object array with unique object references.</Sidenote>
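The arithmetic in the sidenote works out as below: ignoring the few bytes of array header, the element count that fits under the 2GB cap is roughly 2GB divided by the element size. The ArrayLimits helper is a hypothetical illustration, not a CLR API.

```csharp
using System;

public static class ArrayLimits
{
    // The 2.0 CLR caps any single object, arrays included, at 2GB.
    const long TwoGB = 2L * 1024 * 1024 * 1024;

    // Approximate maximum element count for a given element size,
    // ignoring the array header and other CLR book-keeping bytes.
    public static long ApproxMaxElements(int elementSizeInBytes) => TwoGB / elementSizeInBytes;
}
```

That gives about 536,870,912 elements for int (4 bytes) and 268,435,456 for long (8 bytes). For an object[] the element is a reference, so the per-element size is IntPtr.Size: 4 bytes on 32-bit and 8 on 64-bit.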

The developer visible side of this is that array indexes are signed integers (with a byte[] you can use the full positive space of the signed integer as an index (assuming the array is 0 based), with other types you use some subset of that space until the total array size is 2GB). While some of the BCL APIs that deal with arrays have alternate signatures that take longs this isn’t yet ubiquitous in the framework (i.e. the IList interface (which the BCL’s Array class implements) uses int indexes).

It is debatable whether or not we should have included a “Big Array” implementation in the 2.0 64-bit runtime, and I’m sure that debate will rage for some years to come. However, as 2.0 is getting ready to ship and there currently isn’t any support for this we are going to have to live without it until at least the next version.

So, what is there to do in .Net 2.0 if you have an application which requires arrays that are very large?

Well, first switch to 64-bit! As mentioned, it is next to impossible to allocate a full 2GB array on 32-bit because of the way that the virtual address space is broken up by modules and other various allocations. Simply switching to 64-bit will buy you the ability to allocate those full 2GB blocks (well, close anyway, the total object size is limited to 2GB, but there is some CLR book-keeping goo in there that takes a few bytes).

What if that still isn’t enough? You have a couple of choices:

A) Rethink your application’s design? Do you really need a single gigantor array to store your data? Keep in mind that if you’re allocating 8GB of data in a single array and then accessing it in a sparse and random manner you’re going to be in for a world of paging pain unless you have a ton of physical memory. It is very possible that there is another data organization scheme you can use where you can group data into frequently used groups of some sort or another and manage to keep under the 2GB limit for an individual group. If you choose correctly you can vastly improve your application’s performance due to lower paging and better cache access characteristics that come from keeping things that are used together close to one another.

B) Use native allocations. You can always P/Invoke to NT’s native heap and allocate memory which you can then use unsafe code to access. This isn’t going to work if you want to have an array full of object references, but if you just need a huge byte[] to store an image this might work out fine, even great. The added cost of the P/Invoke is low because the NT APIs have simple signatures that don’t require marshaling, and the code executed when allocating an 8GB block is probably mostly zeroing the memory anyway. If you choose this option you will have to write a small memory management class of some kind and be comfortable using unsafe code. I know that Paint.Net (http://blogs.msdn.com/joshwil/archive/2005/04/07/406218.aspx) uses this very method for allocating the memory in which they store the image (and its various layers) which you’re editing. This is a good solution for the case where you really need a single unbroken allocation of some large size. While it isn’t a very general purpose solution, it works out great for the Paint.Net guys.

C) Write your own BigArray class.
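Option B's "small memory management class" might look roughly like the sketch below. It swaps in Marshal.AllocHGlobal for a hand-written P/Invoke to the NT heap, and Marshal.ReadByte/WriteByte for unsafe pointers so that it compiles without /unsafe; a real implementation using a byte* would be faster in hot loops. NativeByteBuffer is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

// A minimal native-memory byte buffer addressed with a 64-bit index.
public sealed class NativeByteBuffer : IDisposable
{
    private IntPtr _ptr;
    public long Length { get; }

    public NativeByteBuffer(long length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _ptr = Marshal.AllocHGlobal(new IntPtr(length)); // contents are NOT zeroed
        GC.AddMemoryPressure(length);                     // tell the GC about the native allocation
    }

    public byte this[long index]
    {
        get { return Marshal.ReadByte(Address(index)); }
        set { Marshal.WriteByte(Address(index), value); }
    }

    // Bounds-checks the index and converts it to an absolute address.
    private IntPtr Address(long index)
    {
        if (_ptr == IntPtr.Zero) throw new ObjectDisposedException(nameof(NativeByteBuffer));
        if ((ulong)index >= (ulong)Length) throw new IndexOutOfRangeException();
        return new IntPtr(_ptr.ToInt64() + index);
    }

    public void Dispose()
    {
        Free();
        GC.SuppressFinalize(this);
    }

    // Safety net in case the caller forgets to Dispose.
    ~NativeByteBuffer()
    {
        Free();
    }

    private void Free()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            GC.RemoveMemoryPressure(Length);
            _ptr = IntPtr.Zero;
        }
    }
}
```

As with the post's description of option B, this only suits blittable data such as image bytes; the GC cannot see object references stored in native memory.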

I’d stress that option C is my least favorite of the above three, but I will acknowledge that there are probably cases where it is the right thing to do. Given that, I have gone and written one myself. This is a very bare bones implementation, just the array allocation and accessors are implemented, I will leave implementing any extended functionality (like the functionality provided by the static members of the Array class, Sort, Copy, etc… or writing big collections on top of it) as an exercise for the reader.

// Goal: create an array that allows for a number of elements > Int32.MaxValue
class BigArray<T>
{
    // These need to be const so that the getter/setter get inlined by the JIT into
    // calling methods just like with a real array to have any chance of meeting our
    // performance goals.
    //
    // BLOCK_SIZE must be a power of 2, and we want it to be big enough that we allocate
    // blocks in the large object heap so that they don't move.
    internal const int BLOCK_SIZE = 524288;      // 2^19
    internal const int BLOCK_SIZE_LOG2 = 19;

    // Don't use a multi-dimensional array here because then we can't right size the last
    // block and we have to do range checking on our own and since there will then be
    // exception throwing in our code there is a good chance that the JIT won't inline.
    T[][] _elements;
    ulong _length;

    // maximum BigArray size = BLOCK_SIZE * Int32.MaxValue
    public BigArray(ulong size)
    {
        // Round up to whole blocks. The product is computed as a ulong: in int arithmetic
        // numBlocks * BLOCK_SIZE overflows as soon as size exceeds 2^31.
        int numBlocks = (int)(size / BLOCK_SIZE);
        if ((ulong)numBlocks * BLOCK_SIZE < size)
        {
            numBlocks += 1;
        }

        _length = size;
        _elements = new T[numBlocks][];
        if (numBlocks == 0)
        {
            return;     // zero-length array: nothing to allocate
        }

        for (int i = 0; i < (numBlocks - 1); i++)
        {
            _elements[i] = new T[BLOCK_SIZE];
        }
        // by making sure to make the last block right sized we get the range checks
        // for free with the normal array range checks and don't have to add our own
        int numElementsInLastBlock = (int)(size - (ulong)(numBlocks - 1) * BLOCK_SIZE);
        _elements[numBlocks - 1] = new T[numElementsInLastBlock];
    }

    public ulong Length
    {
        get
        {
            return _length;
        }
    }

    public T this[ulong elementNumber]
    {
        // these must be _very_ simple in order to ensure that they get inlined into
        // their caller. An index far beyond Length (2^51 or more) can wrap around when
        // the block number is cast to int, so callers must stay below Length.
        get
        {
            int blockNum = (int)(elementNumber >> BLOCK_SIZE_LOG2);
            int elementNumberInBlock = (int)(elementNumber & (BLOCK_SIZE - 1));
            return _elements[blockNum][elementNumberInBlock];
        }
        set
        {
            int blockNum = (int)(elementNumber >> BLOCK_SIZE_LOG2);
            int elementNumberInBlock = (int)(elementNumber & (BLOCK_SIZE - 1));
            _elements[blockNum][elementNumberInBlock] = value;
        }
    }
}

The beauty of this implementation is that the JIT already understands single dimensional array accesses intrinsically, including range checking code. In practice this class ends up being almost as fast as real array access for small arrays (< BLOCK_SIZE) and not too much slower once you get to reasonably big arrays. It doesn’t waste much space compared to a normal array because the last block is right sized and the performance is good because the getter and setter for array elements are simple enough that they should get inlined into the calling method, this becomes very important for getting anywhere close to normal array access speeds.

Here is an example of big array usage:

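using System;

static class Program
{
    public static void Main()
    {
        // 0x1FFFFFFFF ints is roughly 32GB of data: this needs a 64-bit process and lots of memory.
        ulong size = 0x1FFFFFFFF;
        BigArray<int> baInt = new BigArray<int>(size);
        ulong len = baInt.Length;
        for (ulong i = 0; i < len; i++)
        {
            baInt[i] = (int)i;   // the stored value wraps once i passes Int32.MaxValue
        }
        Console.WriteLine("baInt[len/2]=" + baInt[len / 2]);
    }
}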

You could imagine also exposing the fact that this BigArray<T> implementation has blocks through a couple of properties and an indexer of this[int block, int element], which would allow people to intelligently write code to do block based access on the array (e.g. merge sorts that are block intelligent). This can be important for performance, as we know that elements within a single block are contiguous in memory; however, we cannot make that guarantee about elements in neighboring blocks.
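A cut-down version of that block-exposing idea is sketched below. BlockedArray<T>, BlockCount, ElementsInBlock and BlockedArrayDemo are hypothetical names, and the block size is shrunk to 16 elements so the example is cheap to run (BigArray<T> uses 2^19). The block-wise loop walks contiguous memory and never recomputes a block number per element.

```csharp
using System;

public sealed class BlockedArray<T>
{
    public const int BlockSizeLog2 = 4;
    public const int BlockSize = 1 << BlockSizeLog2;   // 16 elements per block

    private readonly T[][] _blocks;
    public ulong Length { get; }

    public BlockedArray(ulong length)
    {
        Length = length;
        int numBlocks = (int)((length + BlockSize - 1) / BlockSize);
        _blocks = new T[numBlocks][];
        for (int b = 0; b < numBlocks; b++)
        {
            // Every block is full-sized except possibly the last, which is right sized.
            ulong remaining = length - (ulong)b * BlockSize;
            _blocks[b] = new T[(int)Math.Min(remaining, (ulong)BlockSize)];
        }
    }

    public int BlockCount => _blocks.Length;
    public int ElementsInBlock(int block) => _blocks[block].Length;

    // Flat index, as in BigArray<T>.
    public T this[ulong index]
    {
        get => _blocks[(int)(index >> BlockSizeLog2)][(int)(index & (BlockSize - 1))];
        set => _blocks[(int)(index >> BlockSizeLog2)][(int)(index & (BlockSize - 1))] = value;
    }

    // Block-relative index: elements within one block are contiguous in memory.
    public T this[int block, int element]
    {
        get => _blocks[block][element];
        set => _blocks[block][element] = value;
    }
}

public static class BlockedArrayDemo
{
    // Sums the array one block at a time.
    public static long SumByBlocks(BlockedArray<long> a)
    {
        long total = 0;
        for (int b = 0; b < a.BlockCount; b++)
        {
            int n = a.ElementsInBlock(b);
            for (int e = 0; e < n; e++)
            {
                total += a[b, e];
            }
        }
        return total;
    }
}
```

With 50 elements this produces blocks of 16, 16, 16 and 2, and a[2, 1] is the same element as a[33].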

It is worth noting that given the allocation scheme of the BigArray<T> constructor we may very well have multiple garbage collections while it runs, because of this you don’t really want to be using large instances of this class in a throw away manner. My advice would be to use this carefully and sparingly, instead favoring architectures which don’t require such large single arrays.

 

What is the difference in a P/Invoke signature between “byref byte” and “byte[]”?
10 August 05 06:11 PM | joshwil | 3 Comments   

Lately we’ve seen a spate of issues coming up on 64-bit platforms within the Developer Division around usages of P/Invoke signatures which declare a parameter as type “byref byte” where the developer really means “byte[]” (the corresponding native parameter type being something like LPBYTE). Usually when something works on 32-bit and doesn’t work on 64-bit we quickly get a phone call or email indicating that this must be a CLR problem, and this case was no different.

I received an email which pointed me to the following P/Invoke signature:
[DllImport("kernel32.dll")]
public static extern int ReadProcessMemory(
                             IntPtr hProcess,
                             IntPtr lpBaseAddress,
                             ref byte lpBuffer,
                             IntPtr nSize, 
                             IntPtr lpNumberOfBytesWritten
);

Looking at MSDN we can see that the C prototype for this function is:
BOOL ReadProcessMemory(
         HANDLE hProcess,
         LPCVOID lpBaseAddress,
         LPVOID lpBuffer,
         SIZE_T nSize,
         SIZE_T* lpNumberOfBytesRead
);

There are a number of problems with the P/Invoke declaration (its return type should be bool, which marshals as the 4-byte Win32 BOOL, for instance, and the nSize parameter should probably be a UIntPtr instead of an IntPtr). Those aside, the real problem is that the lpBuffer parameter shouldn’t be defined as a byref byte. The intended usage was:

byte[] b = new byte[100];
ReadProcessMemory(…, …, ref b[0], …, …);

The expectation being that this would result in a pointer to the beginning of the byte array being delivered to the native code to play with. However that wasn’t happening and ReadProcessMemory was returning a failure (something that was actually very convenient in tracking down this bug). In the end though, this isn’t a CLR problem, it is a usage problem with the P/Invoke signature declaration. If that’s the case then you might ask: why did it work on 32-bit in the first place? Well, because of an “optimization” (I put it in quotes for a reason) in the x86 P/Invoke code “byref byte” means that we just happen to pin the reference to the byte which is passed through the P/Invoke layer and we pass that pinned original reference on to the native code.

This means that if you pass in a reference to the first byte (or any byte for that matter) of an array of bytes then we will pass a pointer to that and the native code can party on the rest of the array just as if we passed an interior pointer into the object (well, we did). It is very possible that this makes a lot of sense to those C++ programmers out there who have become very accustomed to a reference and a pointer being the same thing, and being able to do fancy pointer math on references just by casting them to pointers.

It turns out that in the 64-bit implementation of P/Invoke (which under the covers is radically different than the 32-bit implementation) we decided to more accurately represent a “byref byte” as a reference to a single byte, in fact, we allocate the byte on the interop layer’s stack and pass along a reference to that to the native code. On the way back to managed code we copy that byte back into the GCHeap wherever the managed object identified by the incoming object reference is currently living (in this case some byte in a byte array). This decision was also made as an “optimization” to avoid some of the frequent pinning that the CLR does during interop (as pinning can be rather hard for the GC to deal with, especially for very small objects and generally the less of it the better).

We do this for small native types that we can move around with an instruction or two. For larger types, however (like an actual byte array, specified as a “byte[]” in a P/Invoke signature, or a “byref byte” identified by an array attribute of sorts), we still go ahead and pin the reference in the GC heap and pass the pinned reference along to native code to party on. This is what the developer of the above code intended to happen.
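The difference between the two strategies can be modeled in purely managed code. This is only a sketch under stated assumptions: MarshalingSketch and FakeNativeFill are hypothetical names, GCHandle pinning stands in for what the marshaler does with a byte[] parameter, and the copy-in/copy-out method merely models the 64-bit “byref byte” behavior described above; it is not the CLR's actual interop stub code.

```csharp
using System;
using System.Runtime.InteropServices;

// Managed-only model of the two marshaling strategies (not real CLR stub code).
public static class MarshalingSketch
{
    // Hypothetical stand-in for native code such as ReadProcessMemory:
    // writes count bytes (1, 2, 3, ...) starting at dest.
    static void FakeNativeFill(IntPtr dest, int count)
    {
        byte[] data = new byte[count];
        for (int i = 0; i < count; i++) data[i] = (byte)(i + 1);
        Marshal.Copy(data, 0, dest, count);
    }

    // byte[] parameter: pin the whole array and hand out a pointer to element 0,
    // so every byte the native code writes lands in the managed array.
    public static byte[] PinWholeArray(int length)
    {
        byte[] buffer = new byte[length];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            FakeNativeFill(handle.AddrOfPinnedObject(), length);
        }
        finally
        {
            handle.Free();
        }
        return buffer;
    }

    // "byref byte" as modeled for the 64-bit stubs: copy one byte into a temporary,
    // pass the temporary's address, then copy only that one byte back.
    public static byte[] CopyInOutSingleByte(int length)
    {
        byte[] buffer = new byte[length];
        // Oversized only so this simulation stays in bounds; in the real bug the
        // extra writes went past a one-byte slot on the interop stack.
        byte[] scratch = new byte[length];
        scratch[0] = buffer[0];
        GCHandle handle = GCHandle.Alloc(scratch, GCHandleType.Pinned);
        try
        {
            FakeNativeFill(handle.AddrOfPinnedObject(), length);
        }
        finally
        {
            handle.Free();
        }
        buffer[0] = scratch[0]; // copy-back of the single byte
        return buffer;
    }
}
```

Calling PinWholeArray(4) yields 1, 2, 3, 4, while CopyInOutSingleByte(4) yields 1, 0, 0, 0: only the first byte ever makes it back to the managed array.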

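The correct P/Invoke signature would be (conveniently, this can be found on http://www.pinvoke.net):
using System;
using System.Runtime.InteropServices;

static class NativeMethods   // container class; the name is arbitrary
{
    // byte[] (not "ref byte") makes the marshaler pin the whole array and pass a
    // pointer to its first element on both 32-bit and 64-bit.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool ReadProcessMemory(
        IntPtr hProcess,
        IntPtr lpBaseAddress,
        [Out] byte[] lpBuffer,
        UIntPtr nSize,
        IntPtr lpNumberOfBytesRead);   // SIZE_T*; pass IntPtr.Zero if the count isn't needed
}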

Given this fixed signature, we will pin the byte[] reference and pass a pointer along to the unmanaged code, and everything will work as expected. Fortunately for the group that wrote this code, ReadProcessMemory was able to return a failure when it received what it deemed to be a bad pointer for lpBuffer. In most cases you will probably just end up seeing spectacular failures when the native code that you’re P/Invoke-ing out to starts overwriting your application’s stack. So it is very important to remember to get your P/Invoke signatures right!!! It will save you some serious debugging later.

 

Bit specific code in agnostic assemblies???
10 August 05 06:09 PM | joshwil | 7 Comments   

In previous blog entries I’ve spent some time talking about how to mark assemblies as bit specific and how the loader deals with those markings.

What, however, is the preferred mode for an application? I will posit that it is to be compiled agnostic and to run equally well on both 32-bit and 64-bit platforms. That makes a lot of things easier: development, build, testing, deployment, servicing…

Caveat: The following discussion deals only with fully IL assemblies. If you generate managed C++ code, you may end up with some native code in your image, at which point it has to be tied to one specific platform.

If you have a reason to tie yourself to only one platform (e.g. x86 because you only have an x86 version of some native DLL that you need to P/Invoke to), then your decision is easy and it’s been made for you. Just flip the /platform:x86 switch on your compiler and go. However, if you have some code that works on both 32-bit and 64-bit platforms with just a few subtle differences, then you have a couple of options to think about for implementing the differences:

1) Use compile-time defines (#if/#else in C#) to separate your 64-bit code from your 32-bit code, and use the /platform:X switch of your compiler to generate separate assemblies for 32-bit and for each of the 64-bit platforms (x64 and IA-64).
2) Use runtime if/else blocks to separate 32-bit and 64-bit code.
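A small sketch of the two options side by side, assuming a hypothetical PointerWidth helper and a hypothetical BUILD_64 symbol. The symbol is something you would have to define yourself (for example with /define:BUILD_64 next to /platform:x64); the /platform switch by itself does not define any preprocessor symbol.

```csharp
using System;

// Hypothetical helper comparing the two options above.
public static class PointerWidth
{
    // Option 1: compile-time selection. BUILD_64 is a hypothetical symbol that
    // only exists if the build passes /define:BUILD_64 for the 64-bit assembly.
    public static int CompileTimeHexDigits()
    {
#if BUILD_64
        return 16;
#else
        return 8;
#endif
    }

    // Option 2: runtime selection inside one platform-agnostic assembly.
    public static int RuntimeHexDigits()
    {
        if (IntPtr.Size == 8)
            return 16;
        else
            return 8;
    }
}
```

Option 1 bakes the answer into each assembly, so you build and ship one per platform; option 2 keeps a single agnostic assembly and answers the question when the code runs.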

Both of these have their place. In most cases I’ve seen, people have only a small amount of code that needs to be bit-specific, and in those cases the extra hassle of building and deploying multiple assemblies isn’t really worth it…

But, what about the runtime cost of the check? What if your bit specific code is on the hot path? Won’t that hurt?

Actually, that’s the cool part: if you do it right, it won’t[1]. So, what are your options for determining the bitness of a process at runtime?

A) if (Marshal.SizeOf(typeof(IntPtr)) == 8) {/*64*/} else {/*32*/}
B) if (IntPtr.Size == 8) {/*64*/} else {/*32*/}
C) static readonly bool is64Bit = (IntPtr.Size == 8);
    if (is64Bit) {/*64*/} else {/*32*/}

Of those options there are two right ways and one wrong way. Unfortunately, some of the early information coming out of Microsoft indicated that you should use Marshal.SizeOf (option A), which is definitely the wrong way to do this. That check involves a call to the marshaling code in mscorlib.dll, and since the JIT (or ngen) compiler doesn’t know at JIT (or ngen) time what the result will be, the unused half of the code can’t be optimized away as dead code.

The easiest way to do this is B. Since IntPtr.Size is a constant that is hard-coded into mscorlib.dll when we build the runtime, the JIT (or ngen) can check the loaded mscorlib.dll (which differs depending on bitness) and optimize away both the check and the unused half of the code.

Option C also works, but it has a potentially subtle pitfall. If you don’t mark the static variable definition as readonly, then the JIT (and ngen) won’t be able to optimize away the check and the unused code, because it has to assume that the value can change at runtime. This is very important to remember, because without this keyword this solution becomes almost as bad as A.

Recommendation: for simple cases, use “if (IntPtr.Size==8)” to determine 64-bitness. For more complex cases consider using a static boolean, but remember to mark it as readonly.
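All three checks give the same answer at run time; per the discussion above, the difference is only in what the JIT (or ngen) can fold away. A sketch with a hypothetical BitnessChecks class, written so each option returns the pointer width in bits. Whether a given JIT actually drops the dead branch is best confirmed by disassembling the generated code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical class contrasting the three checks from the list above.
public static class BitnessChecks
{
    // Option C: the field is readonly so the JIT may treat it as a constant;
    // without readonly it must assume the value can change at runtime.
    static readonly bool is64Bit = IntPtr.Size == 8;

    // Option A (not recommended): calls into the marshaling code, so the
    // unused branch cannot be discarded as dead code.
    public static int PointerBitsA()
    {
        if (Marshal.SizeOf(typeof(IntPtr)) == 8)
            return 64;
        else
            return 32;
    }

    // Option B (recommended for simple cases): IntPtr.Size is a JIT-time constant.
    public static int PointerBitsB()
    {
        if (IntPtr.Size == 8)
            return 64;
        else
            return 32;
    }

    // Option C: same result, read from the static readonly field.
    public static int PointerBitsC()
    {
        if (is64Bit)
            return 64;
        else
            return 32;
    }
}
```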

Unfortunately, the if/else solution won’t work well for cases where you need different structure definitions on 64-bit and 32-bit platforms for P/Invoke-ing to native routines. If you have a very small number of usages, you might consider having two separate structure definitions and P/Invoke declarations, and using if/else to determine which one to use (maybe hiding the bitness stuff behind a wrapper). However, if it is a frequently used structure, then it probably makes more sense to just use platform-specific assemblies and compile-time defines to determine the structure layout, since then changes only need to be made at the structure definition site.
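A minimal sketch of the “two definitions behind a wrapper” approach, assuming a hypothetical native struct { void* data; int flags; } that is described with explicit field offsets (fields declared as IntPtr in a sequential layout would adapt on their own, so explicit offsets are where two definitions become necessary). Header32, Header64 and HeaderMarshaling are invented for illustration, and ToNativeBytes only serializes the struct instead of calling a real native routine; a real wrapper would choose between two P/Invoke declarations the same way.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 32-bit layout: 4-byte pointer field, flags at offset 4.
[StructLayout(LayoutKind.Explicit)]
struct Header32
{
    [FieldOffset(0)] public int Data;
    [FieldOffset(4)] public int Flags;
}

// Hypothetical 64-bit layout: 8-byte pointer field, flags at offset 8.
[StructLayout(LayoutKind.Explicit)]
struct Header64
{
    [FieldOffset(0)] public long Data;
    [FieldOffset(8)] public int Flags;
}

// Wrapper that hides the bitness choice from callers.
public static class HeaderMarshaling
{
    // Returns the struct's bytes in the layout native code would expect in this process.
    public static byte[] ToNativeBytes(long data, int flags)
    {
        object header;
        if (IntPtr.Size == 8)
            header = new Header64 { Data = data, Flags = flags };
        else
            header = new Header32 { Data = (int)data, Flags = flags };

        int size = Marshal.SizeOf(header);
        IntPtr mem = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(header, mem, false);
            byte[] bytes = new byte[size];
            Marshal.Copy(mem, bytes, 0, size);
            return bytes;
        }
        finally
        {
            Marshal.FreeHGlobal(mem);
        }
    }

    // Offset of the Flags field for the current process's layout.
    public static int FlagsOffset()
    {
        Type t = IntPtr.Size == 8 ? typeof(Header64) : typeof(Header32);
        return Marshal.OffsetOf(t, "Flags").ToInt32();
    }
}
```

FlagsOffset() reports 4 in a 32-bit process and 8 in a 64-bit process, while callers never see which definition was used.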

If you’d like to see some of this stuff in action I’ve posted the source to a test that you can run and then inspect in the debugger to see what the JIT does (http://homepage.mac.com/willij3/blog/testing_bitness.cs). I’m sure it can be done more easily in VS, but I’ve been using WinDbg with SOS’s !name2ee and !u commands to disassemble the resulting code.


[1] Well, there is a small cost involved in the JIT having to parse the extra IL code for both platforms before it can evaluate the constant condition and throw away half. However, this cost is minimal, and for frequently executed code it is trivial. For ngen’d code the cost at runtime is nonexistent.

Ferrari 4000
24 July 05 10:42 AM | joshwil | 5 Comments   

I am forced to admit that this is one damn fine notebook. Thanks to the helpful instructions on Volker's blog (http://blogs.msdn.com/volkerw/), I was able to get it up and running with 32-bit and 64-bit OSes very quickly. I'm currently trying to live with the 64-bit OS for a while before I fully commit to it. I've been fully 64-bit on my dev machines at work for over a year now, and most everything works seamlessly. Still, I think it's smart to give it a bit of a test run before fully committing to 64-bit on a laptop. I'll keep you updated. Also, once I kill the 32-bit install I'll have room for 64-bit Longhorn (whoops, I mean Vista).

My only complaint about the laptop is its size. Generally I'm more of a ThinkPad X series form factor type of guy; I like my laptops small and light. This is neither, though at 6 lbs and change it isn't bad given the size. The screen is great, battery life seems reasonable, having a CD/DVD burner is cool, and the DVI out on the back has me contemplating ditching my desktop machine at home and just keeping an extra LCD around to do dual-monitor with the laptop when I'm working from home.

Here's to the 64-bit future of computing. Now all I yearn for is a quad core laptop so that I can do all my builds quickly on the run!

 

p.s. Does anyone have any suggestions for a low-power/quiet case that I can stuff my old desktop P4 into to turn it into a media server hidden in the closet? The current power supply sucks way too much juice to leave it on all the time...
