In the first step of profile-guided optimization, we reduced the total CPU samples for ReadNewLog.ReadFile from 6,223 to 5,803; the second step brought the count down further to 3,982. Wouldn't it be nice if we could get it below 3,111 samples, essentially doubling the speed of CLRProfiler's profile loading?
The biggest targets now are ReadChar and ReadInt, which have 1,573 and 1,160 exclusive samples respectively.
Here is the current implementation of ReadChar and ReadInt:
internal int ReadChar()
{
    pos++;
    if (bufPos < bufLevel)
        return buffer[bufPos++];
    else
        return FillBuffer();
}
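int ReadInt()
{
    while (c == ' ' || c == '\t')       // skip leading blanks
        c = ReadChar();
    bool negative = false;
    if (c == '-')
    {
        negative = true;
        c = ReadChar();
    }
    if (c >= '0' && c <= '9')
    {
        int value = 0;
        if (c == '0')
        {
            c = ReadChar();
            if (c == 'x' || c == 'X')   // "0x" prefix: hexadecimal number
                value = ReadHex();
        }
        while (c >= '0' && c <= '9')    // one ReadChar call per digit
        {
            value = value * 10 + c - '0';
            c = ReadChar();
        }
        if (negative)
            value = -value;
        return value;
    }
    return -1;                          // no number found
}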
Here is the modified version:
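int FastReadInt()
{
    int lc = c;                         // local copy of the current character
    int len = 0;                        // characters consumed by this call
    while (lc == ' ' || lc == '\t')
    {
        if (bufPos < bufLevel)          // ReadChar inlined, without pos++
            lc = buffer[bufPos ++];
        else
            lc = FillBuffer();
        len++;
    }
    int value = 0;
    uint diff = (uint) (lc - '0');      // one unsigned compare replaces two signed ones
    if (diff <= 9)
    {
        do
        {
            value = value * 10 + (int) diff;
            if (bufPos < bufLevel)
                lc = buffer[bufPos ++];
            else
                lc = FillBuffer();
            len ++;
            diff = (uint) (lc - '0');
        }
        while (diff <= 9);
    }
    else
        value = -1;                     // no number found
    c = lc;                             // write the state back once, at the end
    pos += len;
    return value;
}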
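The `diff <= 9` test in FastReadInt relies on unsigned wraparound: a character below '0' gives a negative difference, which becomes a very large value once cast to uint, so a single comparison covers both ends of the range. Here is a minimal sketch that checks the equivalence with the two-sided test used in ReadInt. The class and method names are made up for illustration and are not part of CLRProfiler, and the explicit `unchecked` is an addition so the cast also works if overflow checking is enabled for the build.

```csharp
static class DigitCheckSketch
{
    // Two-sided range test, as written in ReadInt.
    public static bool IsDigitTwoCompares(int c)
    {
        return c >= '0' && c <= '9';
    }

    // One-sided test, as written in FastReadInt: values below '0' wrap
    // around to large unsigned numbers and fail the <= 9 check.
    public static bool IsDigitOneCompare(int c)
    {
        uint diff = unchecked((uint)(c - '0'));
        return diff <= 9;
    }

    // Returns true when both tests agree for -1 (used here as an
    // end-of-input marker) and for every 16-bit character value.
    public static bool TestsAgree()
    {
        for (int c = -1; c <= char.MaxValue; c++)
        {
            if (IsDigitTwoCompares(c) != IsDigitOneCompare(c))
                return false;
        }
        return true;
    }
}
```

Calling `DigitCheckSketch.TestsAgree()` returns true, so the single comparison can stand in for the pair without changing which characters count as digits.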
We change only the two most frequent calls to ReadInt, in the call ('c') command processing, to use the new FastReadInt method:
case 'C':
case 'c':
    if (pos < startFileOffset || pos >= endFileOffset)
    {
        while (c >= ' ')                // outside the requested range: skip the rest of the line
            c = ReadChar();
        break;
    }
    int threadIndex = FastReadInt();
    int stackTraceIndex = FastReadInt();
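One way to gain confidence that inlining ReadChar leaves the parse results unchanged is to run both loops over the same input and compare the values, the current character, and pos. The sketch below does that on an in-memory stream. It is not CLRProfiler's code: the field names follow the snippets above, but the class, the constructor, the body of FillBuffer, and the decimal-only SlowReadInt (no sign or hex handling) are assumptions made for illustration.

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-in for the log reader.
class ReaderSketch
{
    readonly Stream r;
    readonly byte[] buffer;
    int bufPos, bufLevel;
    public int c;      // current character, -1 at end of input
    public long pos;   // number of characters consumed so far

    public ReaderSketch(string text, int bufferSize)
    {
        r = new MemoryStream(Encoding.ASCII.GetBytes(text));
        buffer = new byte[bufferSize];
        c = ReadChar();
    }

    // Assumed refill logic: reload the buffer and return its first byte,
    // or -1 when the stream is exhausted.
    int FillBuffer()
    {
        bufPos = 0;
        bufLevel = r.Read(buffer, 0, buffer.Length);
        return bufPos < bufLevel ? buffer[bufPos++] : -1;
    }

    public int ReadChar()
    {
        pos++;
        return bufPos < bufLevel ? buffer[bufPos++] : FillBuffer();
    }

    // Decimal-only version of the original loop: one ReadChar call per character.
    public int SlowReadInt()
    {
        while (c == ' ' || c == '\t')
            c = ReadChar();
        if (c < '0' || c > '9')
            return -1;
        int value = 0;
        while (c >= '0' && c <= '9')
        {
            value = value * 10 + c - '0';
            c = ReadChar();
        }
        return value;
    }

    // Same parsing with ReadChar inlined and c/pos written back once at the end.
    public int FastReadInt()
    {
        int lc = c;
        int len = 0;
        while (lc == ' ' || lc == '\t')
        {
            lc = bufPos < bufLevel ? buffer[bufPos++] : FillBuffer();
            len++;
        }
        int value = -1;
        uint diff = unchecked((uint)(lc - '0'));
        if (diff <= 9)
        {
            value = 0;
            do
            {
                value = value * 10 + (int)diff;
                lc = bufPos < bufLevel ? buffer[bufPos++] : FillBuffer();
                len++;
                diff = unchecked((uint)(lc - '0'));
            } while (diff <= 9);
        }
        c = lc;
        pos += len;
        return value;
    }
}
```

With two readers built from `new ReaderSketch("  12\t345 x", 4)`, three calls to SlowReadInt on one and FastReadInt on the other both return 12, 345, and -1, and `c` and `pos` match after each call. The 4-byte buffer is deliberately tiny so that the FillBuffer branch is exercised in the middle of a number.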
Results for the top 10 most expensive (exclusive) functions:
ReadFile now has only 2,977 inclusive samples, down from 6,223 at the start. We have more than doubled the loading performance of CLRProfiler in three simple steps.
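The "more than doubled" claim follows directly from the sample counts quoted in this post, if we treat samples as proportional to CPU time. A small sketch of the arithmetic; the class and member names are made up for illustration:

```csharp
static class SpeedupSketch
{
    // Sample counts for ReadNewLog.ReadFile quoted in this post.
    public const int Baseline = 6223;
    public const int AfterStep1 = 5803;
    public const int AfterStep2 = 3982;
    public const int AfterStep3 = 2977;

    // Speedup relative to the baseline, assuming samples are
    // proportional to CPU time.
    public static double Speedup(int samples)
    {
        return (double)Baseline / samples;
    }
}
```

`SpeedupSketch.Speedup(SpeedupSketch.AfterStep3)` comes out at about 2.09, compared with roughly 1.07 after the first step and 1.56 after the second.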
Here are the changes:
There are still things we could do to improve FastReadInt's performance. But now that we've demonstrated the power of profile-guided CPU time optimization on CLRProfiler's profile loading time, we should switch to other optimizations.