<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Dr. HardwareBlog</title><link>http://blogs.msdn.com/b/carmencr/</link><description>or: How I Learned to Stop Blaming Windows and Love the BSOD</description><dc:language>en-US</dc:language><generator>Telligent Evolution Platform Developer Build (Build: 5.6.50428.7875)</generator><item><title>SMIs Are EEEEVIL (Part 2)</title><link>http://blogs.msdn.com/b/carmencr/archive/2005/09/01/459194.aspx</link><pubDate>Thu, 01 Sep 2005 18:29:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:459194</guid><dc:creator>carmencr</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=459194</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2005/09/01/459194.aspx#comments</comments><description>&lt;P&gt;&lt;FONT face=Tahoma&gt;In &lt;/FONT&gt;&lt;a href="http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx"&gt;&lt;FONT face=Tahoma&gt;Part 1&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Tahoma&gt;, I discussed a bit of the history and function of SMIs.&amp;nbsp; How does this make them EEEEVIL, is the question?&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;Essentially, SMIs are the final word in what happens on a CPU, short of removing power.&amp;nbsp; They cannot be interrupted, even by a Non-Maskable Interrupt (NMI).&amp;nbsp; Also, since they are not assertable from within software, it's impossible for the OS to use them or detect when they happen.&amp;nbsp; Essentially, the BIOS has control over everything that happens when it takes over.&amp;nbsp; Since SMM is its own execution mode, the assumptions and mechanisms of the other modes are ignored.&amp;nbsp; Specifically, this means any hardware breakpoints you may have set in your debugger will not fire on anything that happens in SMM.&amp;nbsp; &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;Now, when SMM was used only to implement power savings via&amp;nbsp;Advanced Power Management (APM), this wasn't a huge problem.&amp;nbsp; It became a problem when BIOS makers and their OEMs started using this ability to implement other functionality via SMM trickery.&amp;nbsp; The most common application is implementing a USB keyboard handler for real-mode operation.&amp;nbsp; This also happens to be one of the most frustrating issues we see, as it can cause all sorts of problems with the system's normal operation.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;To understand why, think of the implications of an undetectable hypervisor mode that has full access to the system.&amp;nbsp; To implement a keyboard handler like that, the SMM code necessarily has to touch the hardware.&amp;nbsp; This means meddling with registers on devices, and even physical memory.&amp;nbsp; Now if you implement the perfect SMM handler for this kind of work, fine.&amp;nbsp; If you have a bug, however, havoc can ensue.&amp;nbsp; You can be running along in a critical, essentially non-preemptible code path, and from one instruction to the next have a section of memory or a hardware register changed out from under you.&amp;nbsp; This can result in all kinds of strange issues, from a crashed application, to a bluescreen, to a hang.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;I'll cover some of the more common problems and symptoms in another article.&lt;/FONT&gt;&lt;/P&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=459194" width="1" height="1"&gt;</description></item><item><title>SMIs Are EEEEVIL (Part 1)</title><link>http://blogs.msdn.com/b/carmencr/archive/2005/08/31/458609.aspx</link><pubDate>Wed, 31 Aug 2005 22:18:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:458609</guid><dc:creator>carmencr</dc:creator><slash:comments>1</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=458609</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2005/08/31/458609.aspx#comments</comments><description>&lt;P&gt;&lt;FONT face=Tahoma&gt;As a quick introduction, SMIs were introduced to the x86 world by the 386SL.&amp;nbsp; The SMI was created to allow system designers to have access to the CPU while unspecified software of any type was running.&amp;nbsp; The reasons for this are obvious when you look at the market the 386SL was aimed at.&amp;nbsp; It was Intel's first attempt at a truly mobile CPU.&amp;nbsp; SMIs allowed the BIOS to control various aspects of power management on the CPU, regardless of what kind of OS was running on top of it.&amp;nbsp; That was a good thing in the days when DOS still ruled the land.&amp;nbsp; DOS knew as much about power management as your average light bulb, and letting the system designer control how and when devices were turned on and off seemed like a great solution.&amp;nbsp; The problem is how it was implemented.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;To implement SMI, Intel created a new interrupt pin on their CPU, appropriately named the SMI# pin.&amp;nbsp; When this pin was asserted (turned on, essentially), the system would halt everything it was doing, save state, and transition into System Management Mode (SMM).&amp;nbsp; SMM is essentially&amp;nbsp;another entire&amp;nbsp;operating mode for the CPU, just like Real Mode, V86 Mode, and Protected Mode.&amp;nbsp; The big difference here is that this mode can't be signaled from software, and everything that happens is 100% transparent to software.&amp;nbsp; Once the system enters SMM, it's truly like time stands still for everything else.&amp;nbsp; Everything you know about the state of the system can change from underneath you from one instruction to the next.&amp;nbsp; This includes every part of the OS, which generally assumes (rightfully so) that things will always be a certain way until it says otherwise.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;As you can imagine, this requires the systems programmer to get everything right, or disaster can ensue.&amp;nbsp; Needless to say, the reason I am writing about it is because it often does.&amp;nbsp; I'll go into some of the weirdness you can see, and what you might do about it in part 2.&lt;/FONT&gt;&lt;/P&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=458609" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Hardware/">Hardware</category></item><item><title>Favorite Hardware Bugs &lt;CENSORED&gt;</title><link>http://blogs.msdn.com/b/carmencr/archive/2005/08/31/458489.aspx</link><pubDate>Wed, 31 Aug 2005 18:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:458489</guid><dc:creator>carmencr</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=458489</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2005/08/31/458489.aspx#comments</comments><description>&lt;P&gt;&lt;FONT face=Tahoma&gt;Want to know why I started posting again just now?&amp;nbsp; &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;a href="http://blogs.msdn.com/adioltean/"&gt;&lt;FONT face=Tahoma&gt;Adi Oltean&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Tahoma&gt; posted a great entry about &lt;/FONT&gt;&lt;a href="http://blogs.msdn.com/adioltean/archive/2005/08/27/457072.aspx"&gt;&lt;FONT face=Tahoma&gt;his favorite hardware bug&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Tahoma&gt;.&amp;nbsp; This prompted &lt;/FONT&gt;&lt;a href="http://blogs.msdn.com/larryosterman/"&gt;&lt;FONT face=Tahoma&gt;Larry Osterman&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Tahoma&gt; to post &lt;/FONT&gt;&lt;a href="http://blogs.msdn.com/larryosterman/comments/457541.aspx"&gt;&lt;FONT face=Tahoma&gt;his favorite&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Tahoma&gt;, and I started feeling left out.&amp;nbsp; I have a ton to choose from; after all, I deal with new bleeding-edge hardware on a daily basis.&amp;nbsp; I'm hard pressed to call them my favorites, because they generally cause me to suffer, but it's my job so I can't complain too much.&amp;nbsp; They can be a ton of fun to work on too, much more abstract and asynchronous&amp;nbsp;than&amp;nbsp;just working on dry code.&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;Unfortunately, I can't really share the best ones since they're still pretty fresh on the market.&amp;nbsp; All the stories would start&amp;nbsp;like this:&amp;nbsp; There was this one time a few &lt;STRONG&gt;&amp;lt;CENSORED&amp;gt;&lt;/STRONG&gt; ago, when our OEM, &lt;STRONG&gt;&amp;lt;CENSORED&amp;gt;&lt;/STRONG&gt;, was about to ship a new &lt;STRONG&gt;&amp;lt;CENSORED&amp;gt;&lt;/STRONG&gt;, and it had this really strange behavior when you &lt;STRONG&gt;&amp;lt;CENSORED&amp;gt;&lt;/STRONG&gt;...and end like this:&amp;nbsp; In the end, we fixed it.&amp;nbsp; OR: In the end,&amp;nbsp;they hoped not many people would see it and just shipped anyway.&amp;nbsp; &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;Riveting stuff, eh?&amp;nbsp; I know I could be more generic, but since I work with these vendors on a daily basis I don't want to chance it.&amp;nbsp; I can tell you what is the bane of any operating system developer: System Management Interrupts (SMIs).&amp;nbsp; BIOSes that make heavy use of SMIs can wreak havoc with an OS, and there's very little to help you figure out what is going on, and no way to stop it except a new BIOS.&amp;nbsp; I'll go into the pain and misery of SMIs in my next post.&lt;/FONT&gt;&lt;/P&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=458489" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Hardware/">Hardware</category></item><item><title>Back from a long hiatus</title><link>http://blogs.msdn.com/b/carmencr/archive/2005/08/30/458007.aspx</link><pubDate>Tue, 30 Aug 2005 18:53:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:458007</guid><dc:creator>carmencr</dc:creator><slash:comments>2</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=458007</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2005/08/30/458007.aspx#comments</comments><description>&lt;P&gt;&lt;FONT face=Tahoma&gt;Well, I sort of had to stop blogging for a while there because I moved on to a slightly different role.&amp;nbsp; I have the same job at the end of the day, but now I support more general portions of the OS, and of course one of the things I enjoy most:&amp;nbsp;storage.&amp;nbsp; This has always been one of my strong suits as a debugger, and over the last year I've had plenty of opportunities to improve.&amp;nbsp; My group now owns support for most SAN/NAS solutions, storport.sys, scsiport.sys, and our new iSCSI 2.0 solution.&amp;nbsp; Expect to see more posts in the coming days about these topics.&amp;nbsp; I'm also still interested in random other debugging and hardware topics of course, since as I've said before, it's almost as much my hobby as my job.&amp;nbsp; Those topics will never go away.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;I'm going to attempt to be slightly more spontaneous with my posts, to keep myself from getting into a rut where I refuse to post because I am so busy and don't have enough information to make an informative post.&amp;nbsp; That said, I'll try not to walk off into the weeds of random discussions too often.&amp;nbsp; Look forward to more random hardware information shortly.&lt;/FONT&gt;&lt;/P&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=458007" width="1" height="1"&gt;</description></item><item><title>NUMA and you, perfect together (Part 1)</title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/30/223027.aspx</link><pubDate>Tue, 31 Aug 2004 00:25:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:223027</guid><dc:creator>carmencr</dc:creator><slash:comments>7</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=223027</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/30/223027.aspx#comments</comments><description>&lt;p&gt;&lt;font face="Tahoma"&gt;I know this is a slightly more esoteric topic, even for me, but I want to address cc:NUMA platforms, and how they matter to Windows and Windows applications.&amp;nbsp; What is NUMA, you ask?&amp;nbsp; NUMA stands for Non-Uniform Memory Architecture (you will also see it expanded as Non-Uniform Memory Access).&amp;nbsp; (The cc: stands for Cache Coherent, by the way, because there is non-cache-coherent NUMA as well, but I won't address that here since none of the platforms Windows supports are non-cache-coherent.)&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;To understand why NUMA exists, we need to look at Symmetric Multiprocessing (SMP).&amp;nbsp; SMP has a few core principles it is built around, and one is that every CPU in the system has an identical view of the system. Memory, the I/O subsystem, and other CPUs can all be treated the same by software.&amp;nbsp; The problem comes when this assumption is no longer true.&amp;nbsp; As you scale up the size of a system, it becomes harder and harder to keep everything close together, literally.&amp;nbsp; The more switches and buses your data flows through, the longer it takes.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;This fact means that in order to squeeze the maximum amount of performance out of the system, it behooves the OS as well as the programmer to try to keep data as close as possible to the place where it's needed.&amp;nbsp; By keeping track of which pages of memory and which CPUs have the best locality to each other, decisions can be made when threads are scheduled and memory is allocated that will squeeze that extra little bit out of the system.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;Until only a few years ago, this was exclusively the realm of large mainframe-style computers, not the PC world.&amp;nbsp; But with the introduction of the Unisys ES7000 in 2000, the PC suddenly had something to gain from being NUMA aware.&amp;nbsp; Even then, this was something that mostly concerned large scale-up server implementations, not the average user or programmer.&amp;nbsp; That is, until AMD announced the unique design of their new Opteron and Athlon64 processors.&amp;nbsp; Suddenly, any system that has more than one of those CPUs could potentially benefit from NUMA optimizations.&amp;nbsp; I'll go into why in the next entry.&lt;/font&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=223027" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/General+Windows/">General Windows</category></item><item><title>The Frustrations with Social Engineering, Even in
Support</title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/20/217971.aspx</link><pubDate>Fri, 20 Aug 2004 21:26:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:217971</guid><dc:creator>carmencr</dc:creator><slash:comments>2</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=217971</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/20/217971.aspx#comments</comments><description>&lt;p&gt;&lt;font face="Tahoma"&gt;I just got another first-hand experience in the difficulty of trying to affect computing through social engineering.&amp;nbsp; Our fax forwarding people do their forwarding based on the cover letter.&amp;nbsp; Whoever is listed&amp;nbsp;on the From: line&amp;nbsp;gets a TIFF of the fax forwarded to their e-mail inbox.&amp;nbsp; Normally, this works great.&amp;nbsp; However, just a few minutes ago, a customer sent a fax where the cover letter just had a generic destination.&amp;nbsp; Since there was an e-mail alias that matched this generic description, the fax got forwarded, and targeted literally thousands of people.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;My first reaction was confusion, since I hadn't looked at who it was intended for.&amp;nbsp; (After all, you usually don't get a fax to a distribution list.)&amp;nbsp; I looked it over, and when it didn't make any sense I finally looked at who it was intended for.&amp;nbsp; I laughed, and thought about all the confused souls out there who will be getting this too.&amp;nbsp; Then, the inevitable happened.&amp;nbsp; Someone replied to all, saying the fax wasn't meant for them.&amp;nbsp; Then the flood of Me Too's came.&amp;nbsp; It reminded me in a painful way of the Bedlam DL3 fiasco that happened right after I joined Microsoft 7 years ago.&amp;nbsp; The Exchange team covers it in all its painful detail &lt;A 
href="http://blogs.msdn.com/exchange/archive/2004/04/08/109626.aspx"&gt;here&lt;/a&gt;.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;People who have been here long enough, and are "technical" enough to know better, are just as guilty of replying to everyone as anyone else.&amp;nbsp; In this case, the smaller scale,&amp;nbsp;Exchange 2003 servers, and the quick action of others prevented the disaster that Bedlam caused.&amp;nbsp; Still, I can't help but wonder how we're supposed to prevent people from taking self-damaging actions when it seems it's part of human nature to do so, and ignore the mistakes of the past.&lt;/font&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=217971" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Other/">Other</category></item><item><title>Support is not Manufacturing: Part 2</title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/13/214325.aspx</link><pubDate>Fri, 13 Aug 2004 19:15:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:214325</guid><dc:creator>carmencr</dc:creator><slash:comments>4</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=214325</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/13/214325.aspx#comments</comments><description>&lt;p&gt;&lt;font face="Tahoma"&gt;Ok, my arm is warm now.&amp;nbsp; Time to start tossing some theory bombs out there, and hope none get picked off.&amp;nbsp; They said Italians couldn’t quarterback, but look at Vinny Testaverde!&amp;nbsp; (Err…no, don’t.)&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;The reason treating support processes like a manufacturing endeavor fails is because it doesn’t take into account the sheer mass of uncontrolled variables that go into fixing software, IMO.&amp;nbsp; In an 
ideal world, you could have an unlimited number of truly gifted people performing every job function.&amp;nbsp; In the real world, you have to balance the needs for customer service with technical savvy, troubleshooting with good communication, prompt response with cost, etc.&amp;nbsp; People aren’t machines.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;Taking a deterministic approach to creating your workflows means “sticking your head in the sand” to the variables out of your control, and overloading the system should a bottleneck occur.&amp;nbsp; Example:&amp;nbsp; If you require all cases go to your next tier of support at N days, but you neglect to factor that all of Europe takes August off, your small, specialized upper tier will be overloaded with issues from people who aren’t answering phones, (and when they do it will all be on the same day.)&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;Going in the opposite direction is fraught with peril as well.&amp;nbsp; As anyone who has worked in a helpdesk or support environment knows (at least from the armchair quarterback standpoint), the only consistently effective way to improve the quality of support is to have more people available to handle your volume, many of whom are aces at what they do.&amp;nbsp; As anyone who has been involved in managing a project like that knows, doing so is so expensive that you’d trash any profit your product might make.&amp;nbsp; So the question is how to strike the right balance.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;The problem is, I don’t think there’s a good answer, at least not in terms of traditional support.&amp;nbsp; The real answer lies with software.&amp;nbsp; I’ll go into that next.&lt;br /&gt;&lt;/font&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=214325" width="1" height="1"&gt;</description><category 
domain="http://blogs.msdn.com/b/carmencr/archive/tags/Other/">Other</category></item><item><title>Support is not Manufacturing: Part 1</title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/12/213768.aspx</link><pubDate>Thu, 12 Aug 2004 21:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:213768</guid><dc:creator>carmencr</dc:creator><slash:comments>2</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=213768</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/12/213768.aspx#comments</comments><description>&lt;p&gt;&lt;font face="Tahoma"&gt;Ok, I know I said when I started this blog that I wouldn’t be going into the support aspects of my job much, but I lied.&amp;nbsp; I can’t resist being an armchair quarterback, so I am going to warm up my arm today, and start tossing Hail Mary’s tomorrow.&amp;nbsp; Just remember, this is coming from a tech guy with an eye for process, not a manager in charge of making the tough decisions. 
:)&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;&amp;nbsp;If there’s one thing that has been consistent in my 7 years in support, it’s that processes rule the day.&amp;nbsp; Not the kind that people like to debug, the kind that tells you what you should do and when.&amp;nbsp; &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;After some grim time fighting the system, I came to see why they are important.&amp;nbsp; After all, how can you manage and understand your costs when you don’t know what’s going on and why?&amp;nbsp; It’s not like creating a product where you can see something coming out the other end of the line, like software, or toothpaste.&amp;nbsp; So, you need to put some rules around your workflows, and figure out a way to get a very ephemeral and soft result: Happy Customers.&amp;nbsp; I accept that now, even if it cramps my style.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;My problem comes from the data used to make some of these process flow decisions.&amp;nbsp; Your choices can only be as good as your data, unless you get lucky.&amp;nbsp; Playing carefree games with your data doesn’t help you make good decisions.&amp;nbsp; Too many people spend a lot of time trying to find the best way to get the data to fit their ideas and preconceptions, and make horrible decisions based on their “findings”.&amp;nbsp; Richard Feynman famously called it &lt;a href="http://www.columbia.edu/itc/applied/wiggins/Classes/E4903/Fall2003/cargo.pdf"&gt;Cargo Cult Science&lt;/a&gt;.&amp;nbsp; &lt;/font&gt;&lt;font face="Tahoma"&gt;The result is often a system that doesn’t help customers, and makes employees unhappy to come to work every day.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font face="Tahoma"&gt;One of the biggest culture-shock changes comes when someone who is familiar with manufacturing process comes in and tries to “shape things up.”&amp;nbsp; It usually doesn’t work out well, and I’ll go into why I think that is in my next 
entry.&lt;/font&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=213768" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Other/">Other</category></item><item><title>Self-Monitoring and Diagnosing Hardware</title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/10/212155.aspx</link><pubDate>Tue, 10 Aug 2004 19:27:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:212155</guid><dc:creator>carmencr</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=212155</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/10/212155.aspx#comments</comments><description>&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;This is something that most people in the mainframe business have taken for granted for decades now.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;To the PC world, it&amp;#8217;s relatively new&amp;#8230;and to the PC OS world, even newer.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;Starting with the Pentium and Pentium Pro, Intel introduced the Machine Check Architecture (MCA), which is a way for the CPU and other components of the system to report internal inconsistencies to software, so that the operating system can make decisions about how best to protect the user and data and/or report the problem.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;For full information on how this works, see the &lt;A href="http://www.intel.com/design/pentium4/manuals/index_new.htm"&gt;IA-32 Architecture Software Developer&amp;#8217;s Manual Volume 3: System Programming Guide&lt;/A&gt;, Chapter 14.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;Now, that&amp;#8217;s all well and good, but unfortunately Windows didn&amp;#8217;t support anything but the most basic level of reporting until Server 2003.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Before that release, we would stop the system if a fatal error occurred, but not much else.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;With Server 2003, however, the reporting mechanism became more sophisticated.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;If your processor and platform support it, we can read and log events into the event log to tell you more clearly what happened.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;This might seem redundant, but not all Machine Check Exceptions (MCE) are fatal.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Some are just informative.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;For example, you could&amp;nbsp;have one particular region of memory that keeps returning corrected parity errors.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Corrected is great, no problems with your data.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;The fact that they keep happening?&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Usually bad news.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Go get it replaced!&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
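&lt;P&gt;&lt;FONT face=Tahoma&gt;To make the corrected-versus-fatal distinction concrete, here is a minimal Python sketch of how software might classify the value read from one machine check bank status register.&amp;nbsp; The bit positions used (VAL = 63, UC = 61, PCC = 57) are the architectural ones described in the System Programming Guide.&amp;nbsp; The helper names, the severity labels, and the sample values are made up for illustration, and this is not a description of how Windows itself does it.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
# Architectural flag bits of an IA32_MCi_STATUS register value.
VAL_BIT = 63   # register holds a valid error record
UC_BIT = 61    # error was NOT corrected by hardware
PCC_BIT = 57   # processor context may be corrupt


def bit(value, position):
    """Return bit `position` of `value` as 0 or 1."""
    return (value // 2 ** position) % 2


def classify_mc_status(status):
    """Hypothetical helper: map a raw status value to a rough severity label."""
    if bit(status, VAL_BIT) == 0:
        return "no error logged"
    if bit(status, UC_BIT) == 0:
        return "corrected (informational, but watch for repeats)"
    if bit(status, PCC_BIT) == 1:
        return "uncorrected, context corrupt (fatal)"
    return "uncorrected"


# Made-up sample values, not captured from real hardware.
print(classify_mc_status(0))
print(classify_mc_status(2 ** 63))
print(classify_mc_status(2 ** 63 + 2 ** 61 + 2 ** 57))
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;A monitoring tool built along these lines would count the corrected results per memory region and raise a flag when the same region keeps showing up.&lt;/FONT&gt;&lt;/P&gt;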
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;The worst-case scenario, of course, is an unrecoverable error.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Those are reported with a STOP 0x0000009C.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;If you encounter one of these, it&amp;#8217;s best to contact your OEM instead of Microsoft.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;There&amp;#8217;s really nothing we can do.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;This is a hardware problem, always.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;We might be able to help interpret, but it&amp;#8217;s not likely.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;If the system is critical, get to hardware swapping.&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=212155" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Hardware/">Hardware</category></item><item><title>How a Bluescreen Button (NMI) can Save Your Bacon </title><link>http://blogs.msdn.com/b/carmencr/archive/2004/08/09/211385.aspx</link><pubDate>Mon, 09 Aug 2004 19:47:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:211385</guid><dc:creator>carmencr</dc:creator><slash:comments>3</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.msdn.com/b/carmencr/rsscomments.aspx?WeblogPostID=211385</wfw:commentRss><comments>http://blogs.msdn.com/b/carmencr/archive/2004/08/09/211385.aspx#comments</comments><description>&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;I know, another title that seems ridiculous.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Why in the world would anyone want a button that intentionally bluescreens your system?!&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;When you&amp;#8217;re confronted with a hard hang though, (no mouse or keyboard) you&amp;#8217;re in for a heck of a time trying to figure out what&amp;#8217;s wrong without one.&lt;SPAN&gt;&amp;nbsp; 
&lt;/SPAN&gt;That&amp;#8217;s where the NMI button can come in handy.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;Many people are already familiar with the &lt;A href="http://support.microsoft.com/?id=244139"&gt;mechanism introduced in Windows 2000&lt;/A&gt; for these kinds of issues.&amp;nbsp; &lt;/FONT&gt;&lt;FONT face=Tahoma&gt;The gist is that by setting a registry key, you can enable a key sequence (at the local keyboard only) that will bluescreen the machine.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;Thus if you&amp;#8217;re having problems with hangs, you can get a memory.dmp and send it to your OEM or Microsoft for analysis.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;However, this mechanism can&amp;#8217;t cover every scenario that will result in a hang.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;The keyboard interrupt is typically a fairly low priority on the system in relation to the rest of the devices.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;If your hang isn&amp;#8217;t the result of a deadlock in the kernel itself, the key sequence will never get through and initiate the crash.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;It&amp;#8217;s simply too easy for other devices and drivers to turn off that interrupt while doing their own I/O.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;This is where the Non-Maskable Interrupt (NMI) comes in to save the day.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;As the name implies, this is an interrupt that cannot be masked off by software.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;When the interrupt is generated, the CPU will always get it, and the interrupt handler (which you also must &lt;A href="http://www.microsoft.com/whdc/system/CEC/dmpsw.mspx"&gt;explicitly enable&lt;/A&gt; in the registry) will start the process of bluescreening the box.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;It will break into the kernel debugger if one is attached, or generate a STOP 0x00000080 blue screen if not.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
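&lt;P&gt;&lt;FONT face=Tahoma&gt;For reference, both crash triggers are switched on by a DWORD value in the registry.&amp;nbsp; The Python sketch below only prints the text of a .reg file that would set them.&amp;nbsp; The key and value names are the ones I understand the two linked articles to describe (CrashOnCtrlScroll under the i8042prt service parameters for PS/2 keyboards, and NMICrashDump under CrashControl), so check them against those articles before importing anything.&amp;nbsp; The helper name is made up for illustration.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
# Registry values believed to enable the two manual-crash mechanisms.
# Verify against the linked Microsoft articles before importing.
CRASH_SETTINGS = [
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
     "CrashOnCtrlScroll"),   # keyboard sequence (PS/2 keyboards)
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl",
     "NMICrashDump"),        # NMI button
]


def render_reg_file(settings, enabled=True):
    """Hypothetical helper: build .reg file text setting each DWORD to 1 or 0."""
    data = "00000001" if enabled else "00000000"
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name in settings:
        lines.append("[" + key + "]")
        lines.append('"' + name + '"=dword:' + data)
        lines.append("")
    return "\n".join(lines)


print(render_reg_file(CRASH_SETTINGS))
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;Generating the file instead of writing to the registry directly keeps the sketch harmless to run anywhere, and it leaves you with something you can review before applying it to a server.&lt;/FONT&gt;&lt;/P&gt;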
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;Now if the NMI doesn&amp;#8217;t work, you can be confident that something is seriously wrong with your system, and it&amp;#8217;s probably hardware.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;The CPU typically has to move into an unknown state for this feature to fail.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;It&amp;#8217;s time to contact your hardware vendor, and quick.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;If you think no one uses this feature, you&amp;#8217;d be surprised.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;A number of major server vendors do in fact ship systems with this button, but they keep it hidden (for good reason) and don&amp;#8217;t really use it as a feature to sell the box.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;They consider it purely diagnostic.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT face=Tahoma&gt;Personally, I&amp;#8217;d want every system in my server room to have this mechanism.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;I don&amp;#8217;t want 2 or 3 hangs before I can even begin to troubleshoot.&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;I want it done the first time, every time.&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P&gt;&lt;FONT face=Tahoma&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=211385" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/b/carmencr/archive/tags/Hardware/">Hardware</category></item></channel></rss>