Update on Virtual PC and profiling

Last year, Richard explained  why sampling does not work on Virtual PC.   Since the sampling failure is quite catastrophic, we blocked all profiling on Virtual PC.

Some Beta users noticed that we also block instrumentation-based profiling. If you are one of these people, I have good news and bad news for you.

The good news is that we will enable profiling instrumented applications on a Virtual PC. [Disclaimer: I am writing here about a product that is not yet released so this is not final]

The bad news is that doing so may not be a good idea. The performance characteristics of a virtual machine can differ from those of a real machine. For example, I/O on a virtual machine takes longer because it is virtualized. Virtual PC also emulates a single, non-hyper-threaded CPU. If your application is going to be deployed on a multi-processor server, you’d better profile in such an environment and not on a single-CPU machine.

Posted by Ishai | 1 Comments

Bugs that hide from debuggers

Sometimes running a program under a debugger makes it work.   Running the same program without a debugger causes a failure or a crash.   Here are some reasons why this can happen:

1. Timing - attaching a debugger changes timing and can hide race conditions. Even without single-stepping and without breakpoints, the debugger affects timing because it receives notifications from the operating system for every exception thrown, thread created, DLL loaded, etc.

2. Debug heaps – when a process is started under a debugger, the operating system automatically enables heap validation. This can help find some types of heap corruption bugs, but it sometimes has the opposite effect of hiding a race condition or a use of already-freed heap memory. This side effect can be avoided by using the -hd parameter (Windows debugger) or by launching the process outside the debugger and attaching later. You can get a similar effect on a process by using GFlags. I’m not sure exactly which flags are enabled by the debugger, but the following are likely candidates: FLG_HEAP_ENABLE_FREE_CHECK, FLG_HEAP_VALIDATE_PARAMETERS, FLG_HEAP_ENABLE_TAIL_CHECK.

3. SeDebugPrivilege – The debugger enables SeDebugPrivilege in its process token. A debugger may need this privilege to open a process that belongs to a different user. When a debugger creates a child process, that process inherits the privilege. Usually, even if you run as administrator, this privilege is disabled by default and needs to be explicitly enabled for the process. As a result, a process launched by the debugger has access to some objects that it would otherwise fail to access.

 

Posted by Ishai | 0 Comments

x64 calling convention and the disappearing process syndrome

Raymond Chen describes the parameter passing aspect of the x64 calling conventions. But there is more to the calling convention than parameter passing. Exception handling is an important part of it.

A function that calls another function, allocates stack space, or requires exception handling (e.g. has a try statement) must have a prolog and an epilog. It must also have an entry in a special function table. The function table includes unwind information – information that enables the exception-handling routines to unwind the stack and undo the effect of the function prolog. For exception handling to work, there are restrictions on what the function prolog and epilog may contain.

The fun begins when a function does not have correct unwind information. If, in addition, no debugger is attached to the process, the system notifies the Win32 subsystem about the exception, and the Win32 subsystem simply kills the process. You will not see any Watson or JIT debugger dialog box. The process will just disappear.

This happened to me last week. I had an assembly thunk function that called some C++ code that had a race condition (which seemed to happen only when a debugger was not attached). Debugging would have been much easier if my assembly thunk had played by the rules and had a function-table entry.

Posted by Ishai | 1 Comments

Why doesn’t sampling show the actual time spent in each function?

Some people have asked for a “wall clock time” column in the sampling profiler report.   Unfortunately, the actual time spent in a function cannot be reliably deduced from the collected data. 

Sampling counts “hits” on a function when a certain event occurs. By default, this event is a CPU cycle counter reaching N cycles, but you can also sample on every Nth page fault or every Nth system call.

When an event occurs, if the CPU happens to be running in a process that is being sampled, and is running user-mode code, this will count as a hit on the current function and every function above it on the stack.    The hit on the current function is counted as exclusive; the hit on the callers is counted as inclusive.  
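The attribution rule above can be sketched in a few lines of C. This is only an illustration, not the profiler's implementation: the HitTable type, the record_sample function and the small integer function ids are all made up for the example. It also assumes, as profiler reports usually do, that the running function's inclusive count includes its own hit, and that a recursive function is counted once per sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_FUNCS = 8 };  /* hypothetical limit on distinct function ids */

typedef struct {
    unsigned exclusive[MAX_FUNCS];  /* samples taken while the function itself was running */
    unsigned inclusive[MAX_FUNCS];  /* samples taken while the function was anywhere on the stack */
} HitTable;

/* Record one sample. stack[0] is the outermost caller and
 * stack[depth - 1] is the function that was executing. */
static void record_sample(HitTable *t, const int *stack, size_t depth)
{
    bool seen[MAX_FUNCS] = { false };

    assert(t != NULL && stack != NULL && depth > 0);

    for (size_t i = 0; i < depth; i++) {
        int f = stack[i];
        assert(f >= 0 && f < MAX_FUNCS);
        if (!seen[f]) {          /* count a recursive function once per sample */
            seen[f] = true;
            t->inclusive[f]++;
        }
    }
    /* Only the function that was actually running gets the exclusive hit. */
    t->exclusive[stack[depth - 1]]++;
}
```

Feeding it the stacks {0, 1, 2} and {0, 1} gives function 0 two inclusive hits and no exclusive hits, which is the pattern you would expect for a main function that only calls other functions.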

Because of this, the number of hits does not represent the wall clock time spent in the function.   A function that is blocked waiting on a resource, waiting for a different process (e.g. calling a remote process), or just calling an expensive system call, may get a number of hits that is not proportional to the time spent in the function.
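To see why a blocked function collects few hits, it helps to compare wall-clock time with CPU time around a call that only waits. The sketch below uses POSIX clocks rather than anything from the profiler, and the names Elapsed, now_seconds and measure_blocked_call are invented for the example. A cycle-based sampler can only fire while cycles are being consumed, so it sees roughly the CPU figure, not the wall-clock one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <time.h>

typedef struct {
    double wall_s;  /* elapsed wall-clock time */
    double cpu_s;   /* CPU time actually consumed by the process */
} Elapsed;

static double now_seconds(clockid_t clock_id)
{
    struct timespec ts;
    int rc = clock_gettime(clock_id, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time a "function" that does nothing but block for the given
 * number of milliseconds. */
static Elapsed measure_blocked_call(long ms)
{
    struct timespec delay = { .tv_sec = ms / 1000,
                              .tv_nsec = (ms % 1000) * 1000000L };
    double wall0 = now_seconds(CLOCK_MONOTONIC);
    double cpu0 = now_seconds(CLOCK_PROCESS_CPUTIME_ID);

    nanosleep(&delay, NULL);  /* blocked: almost no cycles are consumed here */

    Elapsed e;
    e.wall_s = now_seconds(CLOCK_MONOTONIC) - wall0;
    e.cpu_s = now_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    return e;
}
```

Calling measure_blocked_call(100) should report a wall-clock time of about a tenth of a second and a CPU time close to zero, so a sampler driven by the cycle counter would record almost nothing for it.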

Posted by Ishai | 1 Comments

There are no Safe Functions – only Safe Programming

The Platform SDK includes a set of string manipulation functions defined in strsafe.h. These functions were added during the Windows Security push. They offer an alternative to the C run-time string functions that is more consistent and less error-prone.

But for the functions to actually be safe, they must still be used correctly. Used incorrectly, these functions are as unsafe as the old C run-time functions.

In order to use the functions safely the following must be true:

  1. The first parameter is a pointer to a memory buffer.
  2. The second parameter is the size of that buffer: in bytes when the function name begins with StringCb, in chars when the function name begins with StringCch.
  3. The return code of the function is checked for errors.

So when I see something like:

   StringCbCopy(Foo, 2, "\\");

It immediately raises a red flag for me. Unless 2 happens to be the number of bytes in Foo, this is not a safe use of the function! The second parameter here is not the number of bytes available in the buffer but rather the number of bytes needed from the source string. Usually, code like this is the result of someone converting code from strncpy to StringCbCopy without making the changes necessary to actually benefit from the new API.

Now consider the following code:

   

char FileName[MAX_PATH];
StringCbCopy(FileName, MAX_PATH, DirectoryName);

Is this a safe use of the function?

Only if you will never convert the file name and directory name to wide chars. Although for a char the count of chars and the count of bytes have the same numeric value, MAX_PATH is a count of chars, so either of the following would be better:

// Use a count of chars
StringCchCopy(FileName, MAX_PATH, DirectoryName);

// Use a count of bytes
StringCbCopy(FileName, sizeof(FileName), DirectoryName);
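The difference between the two counts only becomes visible once the buffer holds wide chars. The following is a portable sketch of that point and does not use strsafe.h: PATH_CHARS stands in for MAX_PATH, and COUNT_OF, copy_cch and copy_cb are hypothetical helpers that merely mimic the Cch/Cb naming. They return 0 on success and -1 on truncation, which is not how the real functions report errors.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define PATH_CHARS 260                             /* stand-in for MAX_PATH */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))   /* count of chars in an array */

/* Bounded copy that takes the destination size as a count of chars.
 * Always null-terminates; returns -1 if the source had to be truncated. */
static int copy_cch(wchar_t *dst, size_t cch_dst, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    if (cch_dst == 0)
        return -1;

    size_t len = wcslen(src);
    size_t n = (len < cch_dst) ? len : cch_dst - 1;
    wmemcpy(dst, src, n);
    dst[n] = L'\0';
    return (len < cch_dst) ? 0 : -1;
}

/* Same copy, but the destination size is a count of bytes. */
static int copy_cb(wchar_t *dst, size_t cb_dst, const wchar_t *src)
{
    return copy_cch(dst, cb_dst / sizeof(wchar_t), src);
}
```

With wchar_t name[PATH_CHARS], the safe pairings are copy_cb(name, sizeof(name), src) and copy_cch(name, COUNT_OF(name), src). Passing PATH_CHARS to the byte-count version would silently shrink the usable buffer, and passing sizeof(name) to the char-count version would allow a write past the end of the buffer.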

Each of the String APIs has an Ex version with output parameters that return a pointer to the terminating null at the end of the target string and the number of chars/bytes remaining in the buffer. The Ex versions provide an alternative to doing the pointer arithmetic yourself, which is error-prone and dangerous.

Program safely.

[edited 8-13-04]

Posted by Ishai | 5 Comments

ISA Server 2004

ISA Server 2004 is now available.    Lots of new and improved features.

 

Posted by Ishai | 0 Comments

Why does the compiler generate a MOV EDI, EDI instruction at the beginning of functions?


I’ve recently noticed that on the XP SP2 Beta I am running, the function prologs look like this:

     MOV    EDI, EDI
     PUSH   EBP
     MOV    EBP, ESP

The PUSH EBP and MOV EBP, ESP instructions are standard frame establishment, but what is the purpose of the MOV EDI, EDI instruction? It looks like a 2-byte NOP.

MOV EDI, EDI is indeed a 2-byte no-op that is there to enable hot-patching. It enables the application of a hot-fix to a function without a reboot, or even a restart of the running application. Instead, at runtime, the 2-byte NOP is replaced by a short jump to a long jump instruction, which in turn jumps to the hot-fix function. A single 2-byte instruction is required so that, while the patch is applied, no thread’s instruction pointer can point into the middle of the bytes being replaced, as could happen between two 1-byte NOPs.
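The byte-level effect of such a patch can be simulated on a plain buffer. The sketch below is not the real patching code: hot_patch and PAD_BYTES are names made up for the example, and it assumes five unused padding bytes immediately before the function entry to hold the long jump, a detail the text above does not spell out. The encodings are the standard x86 ones: 8B FF for MOV EDI, EDI, EB rel8 for a short jump and E9 rel32 for a long jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAD_BYTES = 5 };  /* assumed padding before the entry point: room for E9 rel32 */

/* Patch the function at offset `entry` in `image` so that it jumps to
 * offset `target`. Returns 0 on success, -1 if the function does not
 * start with MOV EDI, EDI or lacks room for the long jump. */
static int hot_patch(uint8_t *image, size_t image_len, size_t entry, size_t target)
{
    assert(image != NULL);
    if (entry < PAD_BYTES || entry + 2 > image_len)
        return -1;
    if (image[entry] != 0x8B || image[entry + 1] != 0xFF)
        return -1;

    /* Step 1: write the long jump into the padding. Its displacement is
     * relative to the end of the jump, which is the function entry. */
    uint32_t rel32 = (uint32_t)((int64_t)target - (int64_t)entry);
    uint8_t *pad = image + entry - PAD_BYTES;
    pad[0] = 0xE9;
    pad[1] = (uint8_t)(rel32 & 0xFF);          /* little-endian rel32 */
    pad[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    pad[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    pad[4] = (uint8_t)((rel32 >> 24) & 0xFF);

    /* Step 2: replace the 2-byte NOP with a short jump back to the
     * padding: the next instruction is at entry + 2, so rel8 = -7. */
    image[entry] = 0xEB;
    image[entry + 1] = 0xF9;
    return 0;
}
```

In this simulation the order of the two steps is what matters: the long jump is in place before the 2-byte NOP is overwritten, so a thread entering the function sees either the original NOP or a complete short jump.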

 

Posted by Ishai | 4 Comments

Battered fries are considered “fresh vegetables”!?

Well, probably not by nutritionists, but the USDA, backed by a court decision says they are.

(The link above will probably expire in a few days.)

[typo corrected 15:00]

Posted by Ishai | 2 Comments

Developing firewall and NAT friendly network applications

While working on ISA Server, I saw network applications that were incompatible with firewalls and NAT, and for which it was difficult or impossible to configure the firewall, even when the firewall administrator wanted the application to pass through. Below are some design considerations that can help make network applications more “firewall/NAT friendly”.

For a client-server application that has request-reply semantics, the obvious choice is probably HTTP, since it is so commonly available. Be warned, though, that it is not a good idea to use port 80 for traffic that is not HTTP. While port 80 is usually open in firewalls, its content is often inspected for validity or transparently redirected to proxy servers for caching, and non-HTTP traffic may not survive either treatment.

Things get more complicated if both hosts that need to communicate could be behind a NAT device or a firewall. From the point of view of the firewall administrator (and of the end user who has to ask the administrator to open ports), outbound connections are better than inbound connections, and a fixed port is better than a port range. It is better still if no new port needs to be opened at all. Again, be warned against overloading a well-known port with traffic that does not conform to the standard. I once encountered a P2P application that tried to use port 21 (assigned to FTP). ISA Server blocked this application because it knows how to filter FTP, and this P2P application was definitely not FTP.

Now, if you absolutely have to receive inbound connections to your client application, here are a few rules to consider:

1. Have plan B ready. Firewalls and routers from different vendors behave differently; some will work, some will not. Even with the same vendor, some deployments will have a stricter policy that does not allow inbound connections at all. Plan for a fall-back to an outbound connection or a different protocol.

2. gethostbyname() on the local machine does not necessarily return the address you need. What this API returns is a list of addresses. If the machine happens to have a single NIC, you get its address. If the machine is multi-homed (as would be the case if the machine is used for sharing an Internet connection), you get a list of addresses, and the first address on the list could be the internal private address that is not accessible from the outside world. So how do you get the right address? If you already have a connection with the other host, or with a host that is accessible to both, you can use getsockname() on that socket. If the client is behind an ISA Server and using the ISA Firewall Client, this method retrieves the “external” IP address of the ISA Server. This is good because you can later also bind() using this address, which causes the firewall client to send a bind request to the ISA Server. If you call bind() with 0.0.0.0 as the IP address, the firewall client software has to guess whether this is a local or a remote bind() and may guess wrong.

3. UPnP - ICF and modern routers support configuration of port mappings via UPnP. There are APIs that can help. This works great with home routers, but it is not suitable for WANs or any network with more than one segment, because UPnP is based on broadcasts and works only on a LAN.

4. Allocate from a port range. While a single port is preferred for the outbound case, for inbound you should take into account that a single port could already be in use by another user on the same machine (think about XP Home with multiple user sessions open) or by another application on the machine. In the case of NAT, the port could be in use by another machine. You may want to allocate from a range of ports instead of letting the system select an available port. This helps firewall configuration by limiting the ports that need to be opened.

5. UDP sessions. Some NAT devices and firewalls create a UDP “session” when a packet is sent outbound. By sending a single UDP packet, it may be possible to establish a UDP mapping on the NAT device. Teredo, for example, uses this technique.
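Rules 2 and 4 can be sketched together in C. The example below is written against BSD sockets so that it is easy to try out; on Winsock the same getsockname() and bind() calls exist, but you would also need WSAStartup(), closesocket() and WSAGetLastError(). The helper names local_address_of and bind_in_range are made up for this sketch, and how the ISA Firewall Client reacts to these calls is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Rule 2: ask an already connected socket which local address it uses,
 * instead of picking the first entry returned by gethostbyname().
 * Returns 0 on success, -1 on failure. */
static int local_address_of(int connected_sock, struct in_addr *out)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;

    assert(out != NULL);
    if (getsockname(connected_sock, (struct sockaddr *)&local, &len) != 0)
        return -1;
    if (local.sin_family != AF_INET)
        return -1;
    *out = local.sin_addr;
    return 0;
}

/* Rule 4: bind to the first free port in [lo, hi] on a specific local
 * address (not 0.0.0.0). Returns the port, or 0 if none could be bound. */
static uint16_t bind_in_range(int sock, struct in_addr addr, uint16_t lo, uint16_t hi)
{
    struct sockaddr_in sa;

    assert(lo != 0 && lo <= hi);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr = addr;

    for (uint32_t port = lo; port <= hi; port++) {
        sa.sin_port = htons((uint16_t)port);
        if (bind(sock, (struct sockaddr *)&sa, sizeof sa) == 0)
            return (uint16_t)port;
        if (errno != EADDRINUSE)
            break;  /* some other failure: trying more ports will not help */
    }
    return 0;
}
```

A typical sequence would be to call local_address_of() on the socket that is already connected to the server, then create the listening socket and pass that address to bind_in_range() with whatever range the firewall administrator agreed to open.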

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Ishai | 2 Comments