<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Software Sleuthing : Performance</title><link>http://blogs.msdn.com/joshpoley/archive/tags/Performance/default.aspx</link><description>Tags: Performance</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Poor-Man's Profiler</title><link>http://blogs.msdn.com/joshpoley/archive/2008/03/12/poor-man-s-profiler.aspx</link><pubDate>Wed, 12 Mar 2008 18:59:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8171100</guid><dc:creator>joshpoley</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/joshpoley/comments/8171100.aspx</comments><wfw:commentRss>http://blogs.msdn.com/joshpoley/commentrss.aspx?PostID=8171100</wfw:commentRss><description>&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Microsoft's C/C++ compiler supports the &lt;A href="http://msdn2.microsoft.com/en-us/library/c63a9b7h.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/c63a9b7h.aspx"&gt;/Gh&lt;/A&gt; and &lt;A href="http://msdn2.microsoft.com/en-us/library/xc11y76y.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/xc11y76y.aspx"&gt;/GH&lt;/A&gt; switches. These options let the developer inject a function call at the entry and exit of every function being compiled. Aside from enabling some cool logging/traceability scenarios, you can also use this functionality to build a simple profiler.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;There have been a (surprisingly large) handful of times when I haven't had a profiler available, either due to the fact that one simply didn't exist for the platform I was working on, or a licensed copy of one wasn't readily obtainable. As such, having a simple profiler library ready to link into a project has been a welcome and useful addition to my toolbox.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Here is the basic outline of what needs to happen to pull this together. Also note that the discussion here assumes a 32bit Intel architecture. If you are on a 64bit or a non-Intel compatible platform, you will have to forgo the inline assembly, and supply the _penter/_pexit functions in a stand-alone assembly module.&lt;/P&gt;
&lt;H2 style="MARGIN: 12pt 0in 3pt"&gt;&lt;FONT size=3&gt;1. Provide a &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;_penter function&lt;/FONT&gt;&lt;/H2&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Per the example referenced in the /Gh documentation, you will need to link in your own _penter function which utilizes the &lt;A href="http://msdn2.microsoft.com/en-us/library/h5w10wxs.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/h5w10wxs.aspx"&gt;naked&lt;/A&gt; calling convention. Because of this (and not shown in the official example), you will need to adjust the stack pointer to account for any local variables you use. For example, assuming we use 24 bytes of local stack:&lt;/P&gt;
&lt;DIV style="BACKGROUND-COLOR: #aaaaaa"&gt;
&lt;P class=CodeComment style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;FONT color=#008000&gt;&lt;FONT style="BACKGROUND-COLOR: #aaaaaa"&gt;&lt;FONT face="Courier New"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;// adjust our stack pointer for local variables&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=Code style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;FONT color=#050505&gt;&lt;FONT style="BACKGROUND-COLOR: #aaaaaa"&gt;&lt;FONT face="Courier New"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;mov ebp, esp&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=Code style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;FONT color=#050505&gt;&lt;FONT style="BACKGROUND-COLOR: #aaaaaa"&gt;&lt;FONT face="Courier New"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;sub esp, 24&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;In this function you will want to grab the return address off the stack (which will be used to identify the caller) and take a snapshot of the time. To store these values, you will need to use a stack-like data structure so you can keep track of the start times for all the functions in the current call chain. Have each thread utilize its own storage buffers for the timing stats, the last thing you want to do is introduce a deadlock with your profiling code (or corrupt memory if you don't protect yourself).&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;To get the time, use a high resolution clock. I would recommend either calling &lt;A href="http://msdn2.microsoft.com/en-us/library/ms644904.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/ms644904.aspx"&gt;QueryPerformanceCounter&lt;/A&gt;() or invoke &lt;A href="http://en.wikipedia.org/wiki/RDTSC" mce_href="http://en.wikipedia.org/wiki/RDTSC"&gt;rdtsc&lt;/A&gt; from assembly if you can account for/mitigate the issues around using it directly.&lt;/P&gt;
&lt;H2 style="MARGIN: 12pt 0in 3pt"&gt;&lt;FONT size=3&gt;2. Provide a &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;_pexit function&lt;/FONT&gt;&lt;/H2&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;In your custom _pexit function, grab the current time so you can calculate the run-time. Since file IO is very slow, write out the caller's address and the elapsed time to a separate buffer in memory.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Warning: If your _penter or _pexit functions call any other functions or methods (such as the above mentioned QueryPerformanceCounter), make sure those are &lt;I style="mso-bidi-font-style: normal"&gt;not&lt;/I&gt; compiled with /Gh or /GH. Otherwise death by recursion will ensue.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Depending on how much run-time overhead you want to deal with, you can also add additional code to keep track of:&lt;/P&gt;
&lt;P class=MsoListParagraphCxSpFirst style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1"&gt;&lt;SPAN style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"&gt;&lt;SPAN style="mso-list: Ignore"&gt;·&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;hit counts (how many times a function is called)&lt;/P&gt;
&lt;P class=MsoListParagraphCxSpMiddle style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1"&gt;&lt;SPAN style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"&gt;&lt;SPAN style="mso-list: Ignore"&gt;·&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;minimum and maximum times spent in a function&lt;/P&gt;
&lt;P class=MsoListParagraphCxSpLast style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1"&gt;&lt;SPAN style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"&gt;&lt;SPAN style="mso-list: Ignore"&gt;·&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;the amount of time spent in calls to children&lt;/P&gt;
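&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;The extra statistics listed above can be folded into one small per-function record. The following is only a sketch: Stats, stats_update and stats_self_time are hypothetical names, and it assumes the caller already knows how many ticks of each call were spent in child functions.&lt;/P&gt;

```c
/* Sketch only: hypothetical names, one Stats record per profiled function. */
typedef struct {
    void *function;               /* address identifying the function */
    unsigned long long hits;      /* how many times it was called */
    unsigned long long total;     /* total ticks, children included */
    unsigned long long min;       /* shortest single call */
    unsigned long long max;       /* longest single call */
    unsigned long long child;     /* ticks spent in calls to children */
} Stats;

/* Fold one completed call into the running statistics.
   elapsed: ticks from entry to exit of this call.
   child:   the part of elapsed spent inside functions it called. */
static void stats_update(Stats *s,
                         unsigned long long elapsed,
                         unsigned long long child)
{
    if (s->hits == 0 || s->min > elapsed)
        s->min = elapsed;
    if (elapsed > s->max)
        s->max = elapsed;
    s->hits++;
    s->total += elapsed;
    s->child += child;
}

/* Time spent in the function's own code, excluding children. */
static unsigned long long stats_self_time(const Stats *s)
{
    return s->total - s->child;
}
```

&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;One way to obtain the child value is to have each exiting function add its elapsed time to a running total kept in its parent's stack entry, though that adds a little more work to every call.&lt;/P&gt;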
&lt;H2 style="MARGIN: 12pt 0in 3pt"&gt;&lt;FONT size=3&gt;3. Write an "at end" function&lt;/FONT&gt;&lt;/H2&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Since our timings are still in memory, we need to provide a function which will be called at the end of the application to write the results out to a file.&lt;/P&gt;
&lt;H2 style="MARGIN: 12pt 0in 3pt"&gt;&lt;FONT size=3&gt;4. Post-Processing&lt;/FONT&gt;&lt;/H2&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Write a tool to translate the caller addresses into the actual function name. This can either be a separate application or done within step 3 above. To do the translation, you can either utilize a &lt;A href="http://msdn2.microsoft.com/en-us/library/k7xkk3e2.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/k7xkk3e2.aspx"&gt;map&lt;/A&gt; file or use the &lt;A href="http://msdn2.microsoft.com/en-us/library/x93ctkx8.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/x93ctkx8.aspx"&gt;PDB&lt;/A&gt; APIs. Then it is just a matter of providing the results in an easy to view/sort manner. Below is a screen shot of some results taken from an application which searches through the metadata in photos (times shown are in clock ticks to avoid rounding or truncation issues, and in this specific sample, there happens to be 2175.21 ticks in a microsecond).&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG title="Sample Gh Profiler Output" style="WIDTH: 693px; HEIGHT: 333px" height=333 alt="Sample Gh Profiler Output" src="http://blogs.msdn.com/photos/joshpoley/images/8162156/original.aspx" width=693 border=0 mce_src="http://blogs.msdn.com/photos/joshpoley/images/8162156/original.aspx"&gt;&lt;/P&gt;
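&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;For the address translation, one possible approach is sketched below. It assumes the map file has already been parsed into a table of function start addresses sorted in ascending order. Symbol, symbol_lookup and ticks_to_microseconds are hypothetical names. Because the recorded return address falls just inside the instrumented function, the lookup picks the last symbol that starts at or before the address. The second helper converts ticks to microseconds using a ticks-per-microsecond rate such as the 2175.21 quoted for the sample above.&lt;/P&gt;

```c
/* Sketch only: hypothetical names, table assumed sorted by address. */
typedef struct {
    unsigned long address;        /* start address of the function */
    const char *name;             /* function name from the map file */
} Symbol;

/* Return the name of the last symbol whose start address is not above
   the given address, or "?" if the address precedes every symbol. */
static const char *symbol_lookup(const Symbol *table, int count,
                                 unsigned long address)
{
    int lo = 0;
    int hi = count;
    while (hi > lo) {
        int mid = lo + (hi - lo) / 2;
        if (table[mid].address > address)
            hi = mid;             /* symbol starts after the address */
        else
            lo = mid + 1;         /* symbol starts at or before it */
    }
    /* lo now counts the symbols that start at or before the address */
    return lo == 0 ? "?" : table[lo - 1].name;
}

/* Convert a tick count to microseconds for display purposes. */
static double ticks_to_microseconds(unsigned long long ticks,
                                    double ticks_per_microsecond)
{
    return (double)ticks / ticks_per_microsecond;
}
```

&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.25in"&gt;Keeping the raw tick counts in the results file and converting only at display time preserves the precision mentioned above.&lt;/P&gt;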
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8171100" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/joshpoley/archive/tags/C_2F00_C_2B002B00_/default.aspx">C/C++</category><category domain="http://blogs.msdn.com/joshpoley/archive/tags/Poor-Man/default.aspx">Poor-Man</category><category domain="http://blogs.msdn.com/joshpoley/archive/tags/Performance/default.aspx">Performance</category></item></channel></rss>