Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Interacting with Services

    • 15 Comments
    In the comments for my first services post, someone asked about the SERVICE_INTERACTIVE_PROCESS flag that can be specified for the CreateService API.

    This flag allows the user to specify that the service should be allowed to interact with the logged on user.  The idea of an interactive service was added back in NT 3.1 for components like printer drivers that want to pop up UI.

    IMHO this was a spectacularly bad idea that should never have been added to the system.

    MSDN has an entire page on interactive services; unfortunately, IMHO, it doesn't go into enough detail about why it's a bad idea to ever specify the SERVICE_INTERACTIVE_PROCESS flag on a service.

    The primary reason for this being a bad idea is that interactive services enable a class of threats known as "Shatter" attacks (because they "shatter windows", I believe). 

    If you do a search for "shatter attack", you can see some details of how these security threats work.  Microsoft also published KB article 327618 which extends the documentation about interactive services, and Michael Howard wrote an article about interactive services for the MSDN Library.  Initially the shatter attacks went after windows components that had background window message pumps (which have long been fixed), but they've also been used to attack 3rd party services that pop up UI.

    The second reason it's a bad idea is that the SERVICE_INTERACTIVE_PROCESS flag simply doesn't work correctly.  The service UI pops up in the system session (normally session 0), so if the user is running in another session, the user never sees the UI.  There are two main scenarios that have a user connecting in another session - Terminal Services and Fast User Switching.  TS isn't that common, but in home scenarios where multiple people share a single computer, FUS is often enabled (we have 4 people logged in pretty much all the time on the computer in our kitchen, for example).

    The third reason interactive services are a bad idea is that they aren't guaranteed to work with Windows Vista :)  As part of the security hardening process that went into Windows Vista, interactive users log onto sessions other than the system session - the first interactive user runs in session 1, not session 0.  This has the effect of totally cutting shatter attacks off at the knees - user apps can't interact with high privilege windows running in services. 

     

    On the other hand, sometimes it's important to interact with the logged-on user.  How do you deal with this problem?  There are a couple of suggestions for resolving the issue.  The first is to use the CreateProcessAsUser API to create a process on the user's desktop, as sketched below.  Since the new process is running in the context of the user, privilege elevation attacks don't apply.  Another variant of this solution is to use an existing systray process to communicate with the service.
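
    Here's a minimal sketch of how a service might do this, assuming it runs as LocalSystem (WTSQueryUserToken requires the SE_TCB_NAME privilege); the function and its name are mine, for illustration only:

        // Sketch: launch a process on the interactive user's desktop from a service.
        #include <windows.h>
        #include <wtsapi32.h>    // link with wtsapi32.lib
        #include <userenv.h>     // link with userenv.lib

        BOOL LaunchAppInUserSession(LPWSTR commandLine)
        {
            // Find the session attached to the physical console.
            DWORD sessionId = WTSGetActiveConsoleSessionId();
            if (sessionId == 0xFFFFFFFF)
                return FALSE;

            // Get the logged-on user's token and build their environment block.
            HANDLE userToken;
            if (!WTSQueryUserToken(sessionId, &userToken))
                return FALSE;

            LPVOID environment = NULL;
            CreateEnvironmentBlock(&environment, userToken, FALSE);

            STARTUPINFOW si = { sizeof(si) };
            si.lpDesktop = const_cast<LPWSTR>(L"winsta0\\default");
            PROCESS_INFORMATION pi = {};

            // The new process runs with the USER's token, not the service's,
            // so privilege elevation attacks don't apply.
            BOOL ok = CreateProcessAsUserW(userToken, NULL, commandLine, NULL, NULL,
                                           FALSE, CREATE_UNICODE_ENVIRONMENT,
                                           environment, NULL, &si, &pi);
            if (ok)
            {
                CloseHandle(pi.hThread);
                CloseHandle(pi.hProcess);
            }
            if (environment)
                DestroyEnvironmentBlock(environment);
            CloseHandle(userToken);
            return ok;
        }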

    In addition, if a COM object is marked as running in the security context of the interactive user, it will be activated in the interactive user's session.  You can use a session moniker to start a COM object in a particular session; there's an example of how to do this here.
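
    Roughly, activating a class in a particular session looks like the sketch below (the all-zero CLSID is a placeholder for your own class's CLSID; this is illustrative, not tested code, and assumes COM is already initialized):

        // Sketch: activate a COM class in a specific session via the session moniker.
        #include <windows.h>
        #include <stdio.h>
        #include <string.h>

        HRESULT ActivateInSession(DWORD sessionId, REFIID iid, void **ppv)
        {
            // "Session:<id>!new:<clsid>" composes the session moniker with the
            // new moniker to create the object in that session.
            wchar_t displayName[128];
            swprintf_s(displayName, 128,
                       L"Session:%u!new:{00000000-0000-0000-0000-000000000000}",
                       sessionId);

            BIND_OPTS3 bindOpts;
            memset(&bindOpts, 0, sizeof(bindOpts));
            bindOpts.cbStruct = sizeof(bindOpts);
            bindOpts.dwClassContext = CLSCTX_LOCAL_SERVER;

            return CoGetObject(displayName, (BIND_OPTS *)&bindOpts, iid, ppv);
        }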

     

  • Larry Osterman's WebLog

    Bob's Math Question: The Official Answers

    • 23 Comments

    EDIT: Please note: This is a single post explaining the answer to a question posted earlier on this blog. 

    This site is NOT intended as a general purpose site in which to get help with your math homework.

    If you're having problems with your math homework, you should consider asking your parents for help; you're not likely to find it here, sorry about that.

     


    Ok, he's back :)  My last post was a math problem that the teacher in my wife's classroom gave to the students (mostly 11 and 12 year olds, fwiw).

    Here's the official answer to the problem; the kids needed to show ALL the calculations (sorry for the word-junk):


    Pyramid: L=W=2’, H² = 2² – 1² so H = 1.73

    V = 1/3*l*w*h

    = 1/3*2*2*1.73 = 2.31 cubic feet

    SA = b² + 2bh

    = (2)² + 2*(2)*1.73

    = 4 + 6.92 = 10.92 square feet.

     

    Triangles

    V = B*h   SA = front + back + 3 sides

    = 2*(1/2*L*H) + 3*L*W

    Triangle #1: L=8’, W=2’, H² = 8² – 4² so H = 6.93

    V = 1/2*8*6.93*2 = 55.44 cubic feet

    SA = 2(1/2*8*6.93) + 3*8*2 = 103.44 square feet

     

    Triangle #2: L=9’, W=2’, H² = 9² – 4.5² so H = 7.79

    V = 1/2*9*7.79*2 = 70.11 cubic feet

    SA = 2(1/2*9*7.79) + 3*9*2 = 124.11 square feet

     

    Triangle #3: L=10’, W=2’, H² = 10² – 5² so H = 8.66

    V = 1/2*10*8.66*2 = 86.6 cubic feet

    SA = 2(1/2*10*8.66) + 3*10*2 = 146.6 square feet

     

    Base of Tree: L=W=2’  H= 3’

    V = L*W*H = 2*2*3 = 12 cubic feet

    SA     = 2(L*H) + 2(W*H) + 2(L*W)

              = 2(2*3 + 2*3 + 2*2)

              = 2(6 + 6 + 4)

              = 32 square feet

     

    6 cones with H=1’, R=.5’, S= 1.12’

    V = 1/3*π*r²h = 1/3 * 3.14 * .5² * 1 = .26 cubic feet

    Total volume = 6*.26 = 1.56 cubic feet

    Volume before cutouts:

    Pyramid          2.31
    Triangle #1     55.44
    Triangle #2     70.11
    Triangle #3     86.60
    Base            12.00
    Cones            1.56
    TOTAL          228.02 cubic feet

     

    Surface Area before cutouts:

    Pyramid          10.92
    Triangle #1     103.44
    Triangle #2     124.11
    Triangle #3     146.60
    Base             32.00
    Cones            15.30
    TOTAL           432.37 square feet

     


    Cutout Calculations - Volume

    All of the volumes of the cutouts are subtracted from the total volume of the Christmas tree.

     

    There are 6 cylinders total.

    1 has r=1, h=2

    4 have r=1.5, h=2

    1 has r=2, h=2

     

    V = πr²h       SA = 2πr² + 2πrh

    V = π*(1² + 4(1.5²) + 2²)*2

    = π*(1+9+4)*2

    = 3.14*14*2 = 87.92 cubic feet

     

    Small Triangular Prisms

    There are three triangular prisms.

    1 has L=B=1 and W = 2’

    H² = 1² - .5² so H = .87’

    2 have L=B=1.5 and W = 2

              H² = 1.5² - .75² so H = 1.69’

     

    V = B*w where B = 1/2*l*h

    V        = (1/2*1*.87*2) + 2*(1/2*1.5*1.69*2)

              = .87 + 5.07

              = 5.94 cubic feet

     

    Total volume to subtract:

    87.92

    +5.94

    93.86 cubic feet

     

    Christmas tree volume minus cutouts:

              228.02

              -93.86

    134.16 Cubic Feet total


    Cutout Calculations – SA

    The front and back SA’s are subtracted from the total SA of the Christmas Tree but the side SA’s are added to the total.

     

    Cylinders

    Front and back SA = 2πr²

    Side SA = 2πrh

    Front and Back SA

              = 2π(1² + 4*1.5² + 2²)

              = 6.28 * (1+9+4)

              = 87.92 Square feet

    Side SA

              = 2πrh

              =2*π*(1+4*1.5+2)*2

              = 12.56 * 9 = 113.04 Square feet

    Small Triangular Prisms

    Front and Back SA

    = 2*1/2*b*h

    = b*h

    = 1*.87 + 2(1.5*1.69)

    = .87 + 5.07

    = 5.94 Square Feet

     

    Side SA

              = 3*b*w

              = 3*(1+1.5+1.5)*2

              = 24 square feet

    Twice the SA of top of Base

              =2(2*2)=8 Square Feet

     

    SA to Add:            137.04

    SA to Subtract:      101.86

    Total SA to add:      35.18

     

    Christmas Tree SA plus cutouts:

              432.37

              +35.18

              467.55 Square Feet Total

    Edit: Reduced Google juice of this post by changing the title from "Bobs Math Answers" to something more accurate - this post isn't intended to be a Q&A for students who are having trouble with their math homework :)

     

  • Larry Osterman's WebLog

    Tipping Points

    • 84 Comments

    One of my birthday presents was the book "The Tipping Point" by Malcolm Gladwell.

    In it, he talks about how epidemics and other flash occurrences happen - situations that are stable until some small thing changes, and suddenly the world changes overnight.

    I've been thinking a lot about yesterday's blog post, and I realized that not only is it a story about one of the coolest developers I've ever met, it also describes a tipping point for the entire computer industry.

    Sometimes, it's fun to play the "what if" game, so...

    What if David Weise hadn't gotten Windows applications running in protected mode?  Now, keep in mind, this is just my rampant speculation, not what would have happened.  Think of it kinda like the Marvel Comics "What if..." series (What would have happened if Spiderman had rescued Gwen Stacy, etc [note: the deep link may not work, you may have to navigate directly]).

    "What If David Weise hadn't gotten Windows applications running in protected mode..."[1]

    Well, if Windows 3.0 hadn't had Windows apps running in protected mode, then it likely would not have been successful.  That means that instead of revitalizing Microsoft's interest in the MS-DOS series of operating systems, Microsoft would have continued working on OS/2.  Even though working under the JDA was painful for both Microsoft and IBM, it was the best game in town.

    By 1993, Microsoft and IBM would have debuted OS/2 2.0, which would have supported 32-bit applications and had MVDM support built in.

    Somewhere over the next couple of years, the Windows NT kernel would have come out as the bigger, more secure brother of OS/2, and it would have kept the Workplace Shell that IBM wrote (instead of the Windows 3.1 Task Manager).

    Windows 95 would never have existed, since the MS-DOS line would have withered and died off.  Instead, OS/2 would be the 32-bit operating system for lower end machines.  And instead of Microsoft driving the UI story for the platform, IBM would have owned it.

    By 2001, most PC class machines would have OS/2 running on them (probably OS/2 2.5) with multimedia support.  NT OS/2 would also be available for business and office class machines.  With IBM's guidance, instead of the PCI bus becoming dominant, MCA would have been the dominant bus form factor.  The nickname for the PC architecture wouldn't have been "Wintel"; instead it would have been "Intos" (OS2tel was just too awkward to say).  IBM, Microsoft and Intel would all have worked to drive the hardware platform, and, since IBM was the biggest vendor of PC class hardware, they'd have had a lot to say in the decisions.

    And interestingly enough, when IBM came to the realization that they could make more money selling consulting services than selling hardware, instead of moving to Linux, they'd have stuck with OS/2 - they had a significant ownership stake in the platform, and they'd have pushed it as hard as they could.

    From Microsoft's perspective, the big change would be that instead of Microsoft driving the industry, IBM (as Microsoft's largest OEM, and development partner in OS/2) would be the driving force (at least as far as consumers were concerned).  UI decisions would be made by IBM's engineers, not Microsoft's.

    In my mind, the biggest effect of such a change would be on Linux.  Deprived of the sponsorship of a major enterprise vendor (the other enterprise players having followed IBM's lead and gone with OS/2), Linux would have remained primarily an 'interesting' alternative to Solaris, AIX, and the other *nix based operating systems sold by hardware vendors.  Instead, AIX and Solaris would have become the major players in the *nix OS space, and flourished as an alternative. 

     

    Anyway, it's all just silly speculation, about what might have happened if the industry hadn't tipped, so take it all with a healthy pinch of salt.

    [1] I'm assuming that all other aspects of the industry remain the same: the internet tidal wave hit in the mid 90s, computers kept getting faster as they always had, etc. - this may not be a valid set of assumptions, but it's my fantasy.  I'm also not touching on what effects the DoJ would have had on the situation.

  • Larry Osterman's WebLog

    Building a flicker free volume control

    • 30 Comments

    When we shipped Windows Vista, one of the really annoying problems with the volume control was that whenever you resized it, it would flicker. 

    To be more specific, the right side of the control would flicker – the rest didn’t flicker (which was rather strange).

     

    Between the Win7 PDC release (what we called M3 internally) and the Win7 Beta, I decided to bite the bullet and see if I could fix the flicker.  It seemed like I tried everything to make the flickering go away, but I wasn’t able to do it until I ran into the WM_PRINTCLIENT message, which allowed me to direct all of the internal controls on the window to paint themselves.

    Basically, on a paint call, I’d take the paint DC and send a WM_PRINTCLIENT message to each of the controls in sndvol, asking them each to paint themselves to the new DC.  This worked almost perfectly – I was finally able to build a flicker free version of the UI.  The UI wasn’t perfect (for instance the animations that faded in the “flat buttons” didn’t fire), but the UI worked just fine and looked great, so I was happy that I’d finally nailed the problem.  That happiness lasted until I got a bug report that I simply couldn’t figure out.  It seems that if you launched the volume mixer, set the focus to another application, then selected the volume mixer’s title bar and moved the mixer, there were a ton of drawing artifacts left on the screen.
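
    (For reference, the WM_PRINTCLIENT approach looked roughly like the sketch below - my reconstruction of the idea, not the actual sndvol code:)

        // Sketch: the WM_PRINTCLIENT band-aid - on WM_PAINT, ask every child
        // control to render itself into our paint DC.
        #include <windows.h>

        void PaintByForwarding(HWND hwnd)
        {
            PAINTSTRUCT ps;
            HDC hdc = BeginPaint(hwnd, &ps);
            for (HWND child = GetWindow(hwnd, GW_CHILD); child != NULL;
                 child = GetWindow(child, GW_HWNDNEXT))
            {
                // Shift the DC origin so each child paints at its own position.
                RECT rc;
                GetWindowRect(child, &rc);
                MapWindowPoints(HWND_DESKTOP, hwnd, (LPPOINT)&rc, 2);
                SetViewportOrgEx(hdc, rc.left, rc.top, NULL);
                SendMessage(child, WM_PRINTCLIENT, (WPARAM)hdc, PRF_CLIENT);
                SetViewportOrgEx(hdc, 0, 0, NULL);
            }
            EndPaint(hwnd, &ps);
        }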

    I dug into it a bunch and was stumped.  It appeared that the clipping rectangle sent in the WM_PAINT message to the top level window didn’t include the entire window, thus portions of the window weren’t erased.  I worked on this for a couple of days trying to figure out what was going wrong, and I finally asked for help on one of our internal mailing lists.

    The first response I got was that I shouldn’t use WM_PRINTCLIENT because it was going to cause me difficulty.  I’d already come to that conclusion – by trying to control every aspect of the drawing experience for my app, I was essentially working against the window manager – that’s why the repaint problem was happening.  By calling WM_PRINTCLIENT I was essentially putting a band-aid on the real problem; I hadn’t solved it, all I’d done was hide it.

     

    So I had to go back to the drawing board.  Eventually (with the help of one of the developers on the User team) I finally tracked down the original root cause of the problem and it turns out that the root cause was somewhere totally unexpected.

    Consider the volume UI:

    [Image: screenshot of the volume mixer UI]

    The UI is composed of two major areas: The “Devices” group and the “Applications” group.  There’s a group box control wrapped around the two areas.

    Now let's look at the group box control.  For reasons that are buried deep in the early history of Windows, a group box is actually a form of the “button” control.  If you look at the window styles for a button in SpyXX, you’ll see:

    [Image: SpyXX showing the button window class styles]

     

    Notice the CS_VREDRAW and CS_HREDRAW window class styles.  The MSDN documentation for class styles says:

    CS_HREDRAW - Redraws the entire window if a movement or size adjustment changes the width of the client area.
    CS_VREDRAW - Redraws the entire window if a movement or size adjustment changes the height of the client area.

    In other words, every window class with the CS_HREDRAW or CS_VREDRAW style will always be fully repainted whenever the window is resized (including all the controls inside the window).  And ALL buttons have these styles.  That means that whenever you resize any button, it’s going to flicker, and so will all of the content that lives below the button.  For most buttons this isn’t a big deal, but for group boxes it can be a big issue, because group boxes contain other controls.

    In the case of sndvol, when you resize the volume control, we resize the applications group box (because it’s visually pinned to the right side of the dialog).  That causes the group box and all of its contained controls to repaint, and thus flicker like crazy.  The only way to fix this is to remove the CS_HREDRAW and CS_VREDRAW styles from the window class for the control.

    The good news is that once I’d identified the root cause, the solution to my problem was relatively simple.  I needed to build my own custom version of the group box which handled its own painting and didn’t have the CS_HREDRAW and CS_VREDRAW class styles.  Fortunately it’s really easy to draw a group box – if themes are enabled, a group box can be drawn with the DrawThemeBackground API with the BP_GROUPBOX part, and if theming is disabled, you can use the DrawEdge API to draw the group box.

    Once I added the new control and dealt with a number of other clean-up issues (making sure that the right portions of the window were invalidated when the window was resized, for example; making sure that my top level window had the WS_CLIPCHILDREN style; and making sure that each of the sub windows had the WS_CLIPSIBLINGS style), I had a version of sndvol that was flicker free AND which let the window manager handle all the drawing complexity.  There are still some minor visual gotchas in the UI (for example, if you resize the window using the left edge, the right side of the group box “shudders” a bit – this is apparently an artifact that’s outside my control; other apps have similar issues when resized on the left edge), but they’re acceptable.
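
    For what it's worth, a minimal sketch of the paint logic for such a custom group box might look like this (again, my reconstruction, not the actual sndvol code):

        // Sketch: paint a group box without the built-in BUTTON class (whose
        // CS_HREDRAW/CS_VREDRAW styles force full repaints on resize).
        #include <windows.h>
        #include <uxtheme.h>     // link with uxtheme.lib
        #include <vsstyle.h>     // BP_GROUPBOX, GBS_NORMAL

        void PaintGroupBox(HWND hwnd, HDC hdc)
        {
            RECT rc;
            GetClientRect(hwnd, &rc);

            HTHEME theme = OpenThemeData(hwnd, L"BUTTON");
            if (theme != NULL)
            {
                // Themes enabled: draw the group box frame with the theme API.
                DrawThemeBackground(theme, hdc, BP_GROUPBOX, GBS_NORMAL, &rc, NULL);
                CloseThemeData(theme);
            }
            else
            {
                // Themes disabled: an etched rectangle gives the classic look.
                DrawEdge(hdc, &rc, EDGE_ETCHED, BF_RECT);
            }
        }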

    As an added bonus, now that I was no longer painting everything manually, the fade-in animations on the flat buttons started working again!

     

    PS: While I was writing this post, I ran into this tutorial on building flicker free applications.  I wish I’d run into it while I was trying to deal with the flickering problem, because it nicely lays out how to solve the problem.

  • Larry Osterman's WebLog

    So what's wrong with DRM in the platform anyway?

    • 53 Comments

    As I said yesterday, it's going to take a bit of time to get the next article in the "cdrom playback" series working, so I thought I'd turn the blog around and ask the people who read it a question.

    I was reading Channel9 the other day, and someone turned a discussion of Longhorn into a rant against the fact that Longhorn's going to be all about DRM (it's not; there will be DRM support in Longhorn, just like there has been DRM support in just about every version of Windows that's distributed Windows Media format).

    But I was curious.  Why is it so evil that a platform contain DRM support?

    My personal opinion is that DRM is a tool for content producers.  Content Producers are customers, just like everyone else that uses our product is a customer.  They want a platform that provides content protection.  You can debate whether or not that is a reasonable decision, but it's moot - the content producers today want it.

    So Microsoft, as a platform vendor provides DRM for the content producers.  If we didn't, they wouldn't use our media formats, they'd find some other media format that DOES have DRM support for their content.

    The decision to use (or not use) DRM is up to the content producer.  It's their content, they can decide how to distribute it.  You can author and distribute WMA/WMV files without content protection - all my ripped CDs are ripped without content protection (because I don't share them).  I have a bunch of WMV files shot on the camcorder that aren't DRM'ed - they're family photos, there's no point in using rights management.

    There are professional content producers out there that aren't using DRM for their content (Thermal and a Quarter is an easy example I have on the tip of my tongue (as I write this, they've run out of bandwidth :( but...)).  And there are content producers that are using DRM.

    But why is it evil to put the ability to use DRM into the product?

  • Larry Osterman's WebLog

    What IS audio on a PC anyway?

    • 39 Comments

    This may be well known, but maybe not (I didn’t understand it until I joined the Windows Audio team).

    Just what is digital audio, anyway?  Well, at its core, all of digital audio is a “pop” sound made on the speaker.  When you get right down to it, that’s all it is.  A “sound” in digital audio is a voltage spike applied to a speaker jack, with a specific amplitude.  The amplitude determines how much the speaker diaphragm moves when the signal is received by the speaker.

    That’s it, that’s all that digital audio is – it’s a “pop” noise.  The trick that makes it sound like Sondheim is that you make a LOT of pops every second – thousands and thousands of pops per second.  When you make the pops quickly enough, your ear puts the pops together and turns them into a continuous sound.  You can hear a simple example of this effect when you walk near a high voltage power transformer.  AC power in the US runs at 60 cycles per second, and as the transformer works, it emits a noise on each cycle.  The brain smears that 60 Hz sound together and turns it into the “hum” that you hear near power equipment.

    Another way of thinking about this (thanks Frank) is to consider the speaker on your home stereo.  As you’re listening to music, if you pull the cover off the speaker, you can see the cone move in and out with the music.  Well, if you were to take a ruler and measure the displacement of the cone from 0, the distance that it moves from the origin is the volume of the pop.  Now start measuring really fast – thousands of times a second.  Your collected measurements make up an encoded representation of the sound you just heard.

    To play back the audio, take your measurements, and move the cone the same amount, and it will reproduce the original sound.

    Since a picture is worth a thousand words, Simon Cooke was gracious enough to draw the following...

    Take an audio signal, say a sine wave:

    Then, you sample the sine wave (in this case, 16 samples per cycle):

    Each of the bars under the sine wave is a sample.  When you play back the samples, the speaker will reproduce the original sound.  One thing to keep in mind (as Simon commented) is that the output waveform doesn't look quite like the stepped function that the samples would generate.  Instead, after the Digital-to-Analog Converter (DAC) in the sound card, there's a low pass filter that smooths the output of the signal.

    When you take an analog audio signal, and encode it in this format, it’s also known as “Pulse Coded Modulation”, or “PCM”.  Ultimately, all PC audio comes out in PCM, that’s typically what’s sent to the sound card when you’re playing back audio.

    When an analog signal is captured (in a recording studio, for example), the volume of the signal is sampled at some frequency (typically 44.1 kHz for CD audio).  Each of the samples is captured with a particular range of amplitudes (or quantization).  For CD audio, the quantization is 16 bits per sample.  Obviously, this means that each sample has one of at most 65,536 values, which is typically enough for most audio applications.  Since CD audio is stereo, there are two 16 bit values for each sample. 
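
    To make that concrete, here's a little sketch (my example; the 440 Hz tone and the names are purely illustrative) that synthesizes one second of 16 bit stereo PCM at 44.1 kHz:

        // Sketch: synthesize one second of a 440 Hz tone as CD-format PCM.
        #include <cmath>
        #include <cstdint>
        #include <vector>

        int main()
        {
            const int sampleRate = 44100;    // samples per second (CD audio)
            const double frequency = 440.0;  // concert A
            const double pi = 3.14159265358979;

            std::vector<int16_t> samples;
            samples.reserve(sampleRate * 2);  // two channels per sample

            for (int i = 0; i < sampleRate; i++)
            {
                // Each sample is the amplitude of the waveform at this instant,
                // quantized to one of 65,536 possible 16 bit values.
                double t = static_cast<double>(i) / sampleRate;
                int16_t amplitude = static_cast<int16_t>(
                    32767.0 * std::sin(2.0 * pi * frequency * t));
                samples.push_back(amplitude);   // left channel
                samples.push_back(amplitude);   // right channel
            }

            // One second of CD audio: 44,100 samples * 16 bits * 2 channels,
            // which is the 176KB/second bandwidth figure quoted below.
            return 0;
        }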

    Other devices, like telephones, typically use 8 bit samples, acquired at 8kHz – that’s why the sound quality on telephone communications is so poor (btw, telephones don’t actually use direct 8 bit samples; instead their data stream is compressed using a format called mu-law (or a-law in Europe), or G.711).  On the other hand, the bandwidth used by typical telephone communication is significantly lower than CD audio – CD audio’s bandwidth is 44,100*16*2=1.35Mb/second, or 176KB/second.  The bandwidth of a telephone conversation is 64Kb/second, or 8KB/second (reduced to between 3.2Kb/s and 11Kb/s with compression), an order of magnitude lower.  When you’re dealing with low bandwidth networks like the analog phone network or wireless networks, this reduction in bandwidth is critical.

    It’s also possible to sample at higher frequencies and higher sample sizes.  Some common sample sizes are 20bits/sample and 24bits/sample.  I’ve also seen 96.2 kHz sample frequencies and sometimes even higher.

    When you’re ripping your CDs, on the other hand, it’s pointless to rip them at anything other than 44.1 kHz, 16 bit stereo – there’s nothing you can do to improve the resolution.  There ARE other forms of audio that have a higher bit rate; for example, DVD-Audio allows samples at 44.1, 48, 88.2, 96, 176.4 or 192 kHz, and sample sizes of 16, 20, or 24 bits/sample, with up to six 96 kHz audio channels or two 192 kHz channels.

    One thing to realize about PCM audio is that it’s extraordinarily sparse – there is a huge amount of compression that can be done to the data to reduce the size of the audio data.  But in most cases, when the data finally hits your sound card, it’s represented as PCM data (this isn’t always the case, for example, if you’re using the SPDIF connector on your sound card, then the data sent to the card isn’t PCM).

    Edit: Corrected math slightly.

    Edit: Added a couple of pictures (Thanks Simon!)

    Edit3: Not high pass, low pass filter, thanks Stefan.

  • Larry Osterman's WebLog

    Threat Modeling Again, Threat Modeling Rules of Thumb

    • 12 Comments

    I wrote this piece up for our group as we entered the most recent round of threat models.  I've cleaned it up a bit (removing some Microsoft-specific stuff), and there's stuff that's been talked about before, but the rest of the document is pretty relevant. 

     

    ---------------------------------------

    As you go about filling in the threat model threat list, it’s important to consider the consequences of entering threats and mitigations.  While it can be easy to find threats, it is important to realize that all threats have real-world consequences for the development team.

    At the end of the day, this process is about ensuring that our customers’ machines aren’t compromised. When we’re deciding which threats need mitigation, we concentrate our efforts on those where the attacker can cause real damage.

     

    When we’re threat modeling, we should ensure that we’ve identified as many of the potential threats as possible (even if you think they’re trivial). At a minimum, the threats we list that we chose to ignore will remain in the document to provide guidance for the future. 

     

    Remember that the feature team can always decide that we’re ok with accepting the risk of a particular threat (subject to the SDL security review process). But we want to make sure that we mitigate the right issues.

    To help you guide your thinking about what kinds of threats deserve mitigation, here are some rules of thumb that you can use while performing your threat modeling.

    1. If the data hasn’t crossed a trust boundary, you don’t really care about it.

    2. If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    3. If your code runs with any elevated privileges (even if your code runs in a restricted svchost instance) you need to be concerned.

    4. If your code invalidates assumptions made by other entities, you need to be concerned.

    5. If your code listens on the network, you need to be concerned.

    6. If your code retrieves information from the internet, you need to be concerned.

    7. If your code deals with data that came from a file, you need to be concerned (these last two are the inverses of rule #1).

    8. If your code is marked as safe for scripting or safe for initialization, you need to be REALLY concerned.

     

    Let’s take each of these in turn, because there are some subtle distinctions that need to be called out.

    If the data hasn’t crossed a trust boundary, you don’t really care about it.

    For example, consider the case where a hostile application passes bogus parameters into our API. In that case, the hostile application lives within the same trust boundary as the application, so you can simply certify the threat. The same thing applies to window messages that you receive. In general, it’s not useful to enumerate threats within a trust boundary. [Editor's Note: Yesterday, David LeBlanc wrote an article about this very issue - I 100% agree with what he says there.] 

    But there’s a caveat (of course there’s a caveat, there’s ALWAYS a caveat). Just because your threat model diagram doesn't have a trust boundary on it, it doesn't mean that the data being validated hasn't crossed a trust boundary on the way to your code.

    Consider the case of an application that takes a file name from the network and passes that filename into your API. And further consider the case where your API has an input validation bug that causes a buffer overflow. In that case, it’s YOUR responsibility to fix the buffer overflow – an attacker can use the innocent application to exploit your code. Before you dismiss this issue as being unlikely, consider CVE-2007-3670. The Firefox web browser allows the user to execute scripts passed in on the command line, and registered a URI handler named “firefoxurl” with the OS with the start action being “firefox.exe %1” (this is a simplification). The attacker simply included a “firefoxurl:<javascript>” in a URL and was able to successfully take ownership of the client machine. In this case, the firefox browser assumed that there was no trust boundary between firefox.exe and the invoker, but it didn’t realize that it introduced such a trust boundary when it created the “firefoxurl” URI handler.

    If the threat requires that the attacker is ALREADY running code on the client at your privilege level, you don’t really care about it.

    For example, consider the case where a hostile application writes values into a registry key that’s read by your component. Writing those keys requires that there be some application currently running code on the client, which requires that the bad guy first be able to get code to run on the client box.

    While the threats associated with this are real, it’s not that big a problem and you can probably state that you aren’t concerned by those threats because they require that the bad guy run code on the box (see Immutable Law #1: “If a bad guy can persuade you to run his program on your computer, it’s not your computer anymore”).

    Please note that this item has a HUGE caveat: it ONLY applies if the attacker’s code is running at the same privilege level as your code. If that’s not the case, you have the next rule of thumb:

    If your code runs with any elevated privileges, you need to be concerned.

    We DO care about threats that cross privilege boundaries. That means that any data communication between an application and a service (which could be an RPC, it could be a registry value, it could be a shared memory region) must be included in the threat model.

    Even if you’re running in a low privilege service account, you still may be attacked – one of the privileges that all services get is the SE_IMPERSONATE_NAME privilege. This is actually one of the more dangerous privileges on the system because it can allow a patient attacker to take over the entire box. Ken “Skywing” Johnson wrote about this in a couple of posts (1 and 2) on his excellent blog, Nynaeve. David LeBlanc has a subtly different take on this issue (see here), but the reality is that both David and Ken agree more than they disagree on this issue. If your code runs as a service, you MUST assume that you’re running with elevated privileges. This applies to all data read – rule #2 (requiring an attacker to run code) does not apply when you cross privilege levels, because the attacker could be writing code under a low privilege account to enable an elevation of privilege attack.

    In addition, if your component has a use scenario that involves running the component elevated, you also need to consider that in your threat modeling.

    If your code invalidates assumptions made by other entities, you need to be concerned

    The reason that the firefoxurl problem listed above was such a big deal was that the firefoxurl handler invalidated some of the assumptions made by the other components of Firefox. When the Firefox team threat modeled firefox, they made the assumption that Firefox would only be invoked in the context of the user.  As such it was totally reasonable to add support for executing scripts passed in on the command line (see rule of thumb #1).  However, when they threat modeled the firefoxurl: URI handler implementation, they didn’t consider that they had now introduced a trust boundary between the invoker of Firefox and the Firefox executable.  

    So you need to be aware of the assumptions of all of your related components and ensure that you’re not changing those assumptions. If you are, you need to ensure that your change doesn’t introduce issues.

    If your code retrieves information from the internet, you need to be concerned

    The internet is a totally untrusted resource (no duh). But this has profound consequences when threat modeling. All data received from the Internet MUST be treated as totally untrusted and must be subject to strict validation.

    If your code deals with data that came from a file, then you need to be concerned.

    In the previous section, I talked about data received over the internet. Microsoft has issued several bulletins this year for vulnerabilities that required an attacker to trick a user into downloading a specially crafted file; as a consequence, ANY file data must be treated as potentially malicious. For example, MS07-047 (a vulnerability in WMP) required that the attacker force the user to view a specially crafted WMP skin. The consequence of this is that ANY file parsed by our code MUST be treated as coming from a lower level of trust.

    Every single file parser MUST treat its input as totally untrusted – MS07-047 is only one example of an MSRC vulnerability; there have been others. Any code that reads data from a file MUST validate the contents. It also means that we need to work to ensure that we have fuzzing in place to validate our mitigations.
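
    As a concrete (and entirely hypothetical) illustration of what that validation looks like, consider a parser reading a length-prefixed record from an untrusted buffer - the length field came from the file, so it's attacker-controlled:

        // Sketch: never trust a length field read from the file.
        #include <cstdint>
        #include <cstring>
        #include <vector>

        struct Record
        {
            uint32_t type;
            std::vector<uint8_t> payload;
        };

        // Returns false rather than reading past the end of the untrusted buffer.
        bool ParseRecord(const uint8_t *data, size_t dataSize, Record *out)
        {
            // Hypothetical format: 4-byte type followed by 4-byte payload length.
            if (dataSize < 8)
                return false;

            uint32_t type, length;
            memcpy(&type, data, 4);
            memcpy(&length, data + 4, 4);

            // Validate the attacker-controlled length against what's actually in
            // the buffer before using it (phrased to avoid integer overflow).
            if (length > dataSize - 8)
                return false;

            out->type = type;
            out->payload.assign(data + 8, data + 8 + length);
            return true;
        }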

    And the problem goes beyond file parsers directly. Any data that can possibly be read from a file cannot be trusted. <A senior developer in our division> brings up a codec as a perfect example. The file parser parses the container and determines that the container isn't corrupted. It then extracts the format information and finds the appropriate codec for that format. The parser then loads the codec and hands the format information and file data to the codec.

    The only thing that the codec knows is that the format information that’s been passed in is valid. That’s it. Beyond the fact that the format information is of an appropriate size and has a verifiable type, the codec can make no assumptions about the contents of the format information, and it can make no assumptions about the file data. Even though the codec doesn’t explicitly parse the file, it’s still dealing with untrusted data read from the file.

    If your code is marked as “Safe For Scripting” or “Safe for Initialization”, you need to be REALLY concerned.

    If your code is marked as “Safe For Scripting” (or if your code can be invoked from a control that is marked as Safe For Scripting), it means that your code can be executed in the context of a web browser, and that in turn means that the bad guys are going to go after your code. There have been way too many MSRC bulletins about issues with ActiveX controls.

    Please note that some of the issues with ActiveX controls can be quite subtle. For instance, in MS02-032 we had to issue an MSRC fix because one of the APIs exposed by the WMP OCX returned a different error code depending on whether a path passed into the API was a file or a directory – that constituted an Information Disclosure vulnerability, and an attacker could use it to map out the contents of the user's hard disk.

    In conclusion

    Vista raised the security bar for attackers significantly. As Vista adoption spreads, attackers will be forced to find new ways to exploit our code. That means that it’s more and more important to ensure that we do a good job ensuring that they have as few opportunities as possible to make life difficult for our customers.  The threat modeling process helps us understand the risks associated with our features and understand where we need to look for potential issues.

  • Larry Osterman's WebLog

    COM registration if you need a typelib

    • 8 Comments
    The problem with the previous examples I posted on minimal COM object registration is that they don't always work.  As I mentioned, if you follow the rules specified, while your COM object will work just fine from Win32 applications, you'll have problems if you attempt to access it from a managed environment (either an app running under the CLR or another managed environment such as the VB6 runtime or the scripting host).

    For those environments, you need to have a typelib.  Since typelibs were designed primarily for interoperating with Visual Basic, they don't provide full access to the functionality that's available via MIDL (for instance, unnamed unions get turned into named unions, the MIDL boolean type isn't supported, etc.), but if you gotta interoperate, you gotta interoperate.

    So you've followed the examples listed here and you've registered your COM object, now how do you hook it up to the system?

    First, you could call the RegisterTypeLib function, which will perform the registration, but that would be cheating :)  More importantly, there are lots of situations where it's inappropriate to use RegisterTypeLib - for instance, if you're building an app that needs to be installed, you need to enumerate all the registry manipulations done by your application so they can be undone.

    So if you want to register a typelib, it's a smidge more complicated than registering a COM component or interface.

    To register a typelib, you need (from here):

    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\
    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\<major version>.<minor version>\   
        Default Value: <friendly name for the library> Again, not really required, but nice for oleview
    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\<major version>.<minor version>\HELPDIR   
        Default Value: <Directory that contains the help file for the type library>
    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\<major version>.<minor version>\FLAGS   
        Default Value: Flags for the ICreateTypeLib::SetLibFlags call (typically 0)
    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\<major version>.<minor version>\<LCID for library>
    Key: HKEY_CLASSES_ROOT\Typelib\<LibID>\<major version>.<minor version>\<LCID>\<Platform>
        Default Value: <File name that contains the typelib>

    Notes:

    If your typelib isn't locale-specific, you can specify 0 for the LCID.  Looking at my system, that's typically what most apps do.

    <Platform> can be win32, win64 or win16 depending on the platform of the binary.
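
    In code, the registration above boils down to a handful of RegCreateKeyEx/RegSetValueEx calls - here's a minimal sketch (the LibID, friendly name, and paths are all placeholders); the CLSID and Interface values described below can be written the same way:

        // Sketch: write the typelib registration keys directly.
        #include <windows.h>
        #include <string.h>

        static LONG SetKeyDefaultValue(const wchar_t *subkey, const wchar_t *value)
        {
            HKEY key;
            LONG status = RegCreateKeyExW(HKEY_CLASSES_ROOT, subkey, 0, NULL, 0,
                                          KEY_SET_VALUE, NULL, &key, NULL);
            if (status != ERROR_SUCCESS)
                return status;
            // Passing NULL for the value name sets the key's default value.
            status = RegSetValueExW(key, NULL, 0, REG_SZ,
                                    (const BYTE *)value,
                                    (DWORD)((wcslen(value) + 1) * sizeof(wchar_t)));
            RegCloseKey(key);
            return status;
        }

        void RegisterMyTypeLib()
        {
            // {12345678-...} is a placeholder - generate your own LibID.
            SetKeyDefaultValue(
                L"Typelib\\{12345678-1234-1234-1234-123456789012}\\1.0",
                L"My Sample Type Library 1.0");
            SetKeyDefaultValue(
                L"Typelib\\{12345678-1234-1234-1234-123456789012}\\1.0\\HELPDIR",
                L"c:\\mycomponent");
            SetKeyDefaultValue(
                L"Typelib\\{12345678-1234-1234-1234-123456789012}\\1.0\\FLAGS",
                L"0");
            SetKeyDefaultValue(
                L"Typelib\\{12345678-1234-1234-1234-123456789012}\\1.0\\0\\win32",
                L"c:\\mycomponent\\mycomponent.dll");
        }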
     

    But this isn't quite enough to get the typelib hooked up  - the system still doesn't know how to get access to the type library.  To do that, you need to enhance your CLSID registration to let COM know that there's a typelib available.  With the typelib, a managed environment can synthesize all the interfaces associated with a class.  To do that, you enhance the class registration:

    Key: HKEY_CLASSES_ROOT\CLSID\<CLSID>\TypeLib = <LibID>

    But we're still not quite done.  For each of the interfaces in the typelib, you can let the system do the marshaling for you without having to specify a proxy library, by letting the universal marshaler do the work.  The universal marshaler has a CLSID of {00020424-0000-0000-C000-000000000046}, so instead of using the interface registration mentioned in the last article, you can replace it with:

    Key: HKEY_CLASSES_ROOT\Interface\<IID>\
        Default Value: <friendly name for the interface> Again, not really required, but nice for oleview
    Key: HKEY_CLASSES_ROOT\Interface\<IID>\ProxyStubClsid32\
        Default Value: {00020424-0000-0000-C000-000000000046}
    Key: HKEY_CLASSES_ROOT\Interface\<IID>\TypeLib\
        Default Value: <LibID>

    Now instead of using the proxy code in a proxy DLL, the system will do the marshaling for you.

    Next: Ok, but what if I don't want to deal with all those ugly GUID thingies?

  • Larry Osterman's WebLog

    Microsoft Anti-Spyware

    • 23 Comments

    I don't normally do "Me Too" posts, and I know that this one will get a lot of coverage on the Microsoft blogs, but the Seattle PI blog just mentioned that the beta of Microsoft's new anti-spyware solution was just released to the web here.

    I installed it on my machines at work yesterday, and it seems pretty nice so far.  Of course I didn't have any spyware for it to find (because I'm pretty darned careful, and run as a limited user), but...  It'll be interesting to run it at home, especially since Valorie (and I) like playing some of the online games (like Popcap's) that get singled out as being spyware by some tools.

     

    I have no knowledge of their final product plans, so it's pointless to ask.  All I know about this is what I've read in the press.

     

  • Larry Osterman's WebLog

    Collectionz

    • 23 Comments

    For those of you that know us, you know that everyone in my family is an inveterate reader.  One of the more unfortunate consequences of this is that we have a TON of books.  The front office in our house, the bonus room over the garage, and Daniel's old bedroom are all given over to books, my guess is that we have two or three thousand of them.

    For Father's Day this year, Valorie got me a Flic barcode scanner and some software from Collectorz.  The Flic barcode scanner is a small handheld scanner with memory for about 500 UPC codes; combined with the Collectorz movie, book and music collector software, it has the ability to categorize all our collections.

    Initially I sort-of ignored it, but last night at about 10:00, Valorie reminded me of it.  I installed the software and played around with it a bit.  And a bit more.  And still some more.

    Darn, I had never thought that I'd spend two and a half hours (with Valorie) running around pulling books from the library trying to find ones that the program wouldn't find.  And I've got to say, except for the hundred or so books that pre-date barcodes (I still have the very first book I ever purchased, Checkpoint Lambda by Murray Leinster), it did a remarkable job. 

    Essentially the software reads the data off the barcode, then datamines off of a bunch of sites to build the database, including Amazon.com, B&N.com, Powells.com, the Library of Congress, imdb (for movies), etc.  It's actually pretty cool.

    Again, this is just first impressions - one tricky bit is that the barcode on the back of the book often isn't the ISBN, which screws up the database lookup, but that's really not the fault of the software.

    Anyway, it's a cool toy :)

     

  • Larry Osterman's WebLog

    Moving Offices

    • 27 Comments
    Well, last week, we had yet another office move.

    Office moves are sort-of a tradition at Microsoft, this one's something like my 20th.  Personally I think that management schedules them just to make sure we don't collect too much junk in our offices... 

    For me, it doesn't help; I moved 14 boxes of stuff this time (and a boatload of Legos that were stashed in my grandmanager's office).

    As I said, moving's a regular occurrence - I'm in my 4th office in this building alone.  Fortunately, intra-building moves aren't NEARLY as painful as inter-building moves, but they're still a pain in the neck.

    My longest time in an office was something like two years; my shortest was two weeks (they moved us out of building one into building four for two weeks while they moved another group out of building two, then moved us from building four back into building two).  I've had corner offices (twice, once in building two, another time in 25), I've had window offices, and I've had interior offices.  I've got to say that I REALLY hate corner offices - my office has a whiteboard, a corkboard and two bookshelves, but in a corner office, you lose one of your walls, which means that you can only have two of the four items (we have modular shelving and corkboard units in our offices; in an interior office, you get two walls full of hanging shelving racks, but in a corner office, you only get one, plus a partial one).  The great view doesn't even come close to making up for the loss of a bookshelf.  In my case, one of my bookshelves is filled with Lego models, but who's counting :)

    I can't wait to see the view from my new office though - it faces more-or-less northeast, which means that I get to see the Cascades.  I took the opportunity to reorient my office as well - traditionally, I have had my office laid out like this:

    But I'm laying my new office out like this:

    just to take advantage of the view (Ignore the units, they're Visio goop from when I made the drawing).  I like facing the door (so I can see who's coming), but I figured that the view would be worth the startle effect.  I suspect I'll end up getting a mirror to put into the window so I can see people at the door...  The cool thing about the new layout is that I'll be able to add a round table to the office, so I'll be able to get the manipulative puzzles off my main desk onto the round table.

    Unfortunately, this morning, just before I came into work to unpack, the fan motor on the AC blower feeding into my office gave up the ghost, filling the office (and the corridor) with REALLY noxious fumes, so I'm currently installed in an empty office near my office (I'd forgotten how heavy a 21 inch CRT monitor is).

    Anyway, today's tech-light, hopefully I'll get bandwidth to do more tomorrow.

    Edit: Clarified text around new office layout; it was awkward.

     

  • Larry Osterman's WebLog

    Nathan's laws of software

    • 16 Comments
    Way back in 1997, Nathan Myhrvold (CTO of Microsoft at the time) wrote a paper entitled "The Next Fifty Years of Software" (subtitled "Software: The Crisis Continues!"), which was presented at the ACM97 conference (focused on the next 50 years of computing).

    I actually attended an internal presentation of this talk; it was absolutely riveting.  Nathan's a great public speaker, maybe even better than Michael Howard :).

    But an email I received today reminded me of Nathan's First Law of Software:  "Software is a Gas!"

    Nathan's basic premise is that as machines get bigger, the software that runs on those computers will continue to grow. It doesn't matter what kind of software it is, or what development paradigm is applied to that software.  Software will expand to fit the capacity of the container.

    Back in the 1980's, computers were limited.  So software couldn't do much.  Your spell checker didn't run automatically, it needed to be invoked separately.  Nowadays, the spell checker runs concurrently with the word processor.

    The "Bloatware" phenomenon is a direct consequence of Nathan's First Law.

    Nathan's second law is also fascinating: "Software grows until it becomes limited by Moore's Law". 

    The second law is interesting because we're currently nearing the end of the cycle of CPU growth brought on by Moore's law.  So in the future, the growth of software is going to become significantly constrained (until some new paradigm comes along).

    His third law is "Software growth makes Moore's Law possible".  Essentially he's saying that because software grows to hit the limits of Moore's law, software regularly comes out that pushes the boundaries.  And that's what drives hardware sales.  And the drive for ever increasing performance drives hardware manufacturers to make even faster and smaller machines, which in turn makes Moore's Law a reality.

    And I absolutely LOVE Nathan's 4th law.  "Software is only limited by human ambition and expectation."   This is so completely true.  Even back when the paper was written, the capabilities of computers today were mere pipe dreams.  Heck, in 1997, you physically couldn't have a computer with a large music library - a big machine in 1997 had a 600M hard disk.

    What's also interesting are the efforts in fighting Nathan's first law.  It's a constant fight, waged by diligent performance people against the hordes of developers who want to add their new feature to the operating system.  All the developers want to expand their features.  And the perf people need to fight back to stop them (or at least make them justify what they're doing).  The fight is ongoing, and unending.

    Btw, check out the slides, they're worth reading.  Especially when he gets to the part where the stuff that makes you genetically unique fits on a 3 1/2" floppy disk.

    He goes on from that point - at one point in his presentation, he pointed out that the entire human sensory experience can be transmitted easily on a 100Mb Ethernet connection.

     

    Btw, for those of you who would like, there's a link to two different streaming versions of the talk here: http://research.microsoft.com/acm97/

     

    Edit: Added link to video of talk.

     

  • Larry Osterman's WebLog

    Fun with names

    • 10 Comments
    The other day, someone sent an email to an internal mailing list asking about a "typo" in the eventvwr.

    It seems they noticed a number of events coming from the "bowser" event source, and they were convinced that it had to be a typo.

     

    Well, it's not :)  The name of the component is bowser, and I wrote it back in NT 3.1...

     

    The bowser is actually the kernel mode portion of the Computer Browser service.  It also handles receiving broadcast mailslot messages and handling them.  When I originally described the functionality, my boss at the time (who was rather opinionated) said "What a dog!  Why don't we call it the bowser?" 

    For various technical reasons we didn't want to call the kernel component browser.sys (because it messed up the debugger to have two components with the same name), so the name bowser just stuck.

    Thus was born the name of the "misspelled" system component.  Nowadays the bowser is essentially gone (for instance, I can't find it on my XP SP2 installation), but the name lives on in eventlogs everywhere...

     

  • Larry Osterman's WebLog

    AARDvarks in your code.

    • 29 Comments

    If there was ever a question that I’m a glutton for punishment, this post should prove it.

    We were having an email discussion the other day, and someone asked:

    Isn't there a similar story about how DOS would crash when used with [some non-MS thing] and only worked with [some MS thing]? I don't remember what the "thing" was though =)

    Well, the only case I could think of where that was the case was the old AARD code in Windows.  Andrew Schulman wrote a great article on it back in the early 1990’s, which dissected the code pretty thoroughly.

    The AARD code in Windows was code to detect when Windows was running on a cloned version of MS-DOS, and to disable Windows on that cloned operating system.  By the time that Windows 3.1 shipped, it had been pulled from Windows, but the vestiges of the code were left behind.  As Andrew points out, the code was obfuscated, and had debugger-hiding logic, but it could be reverse engineered, and Andrew did a great job of doing it.

    I can’t speak to why the AARD code was obfuscated; I have no explanation for that, and it seems totally stupid to me.  But I’ve got to say that I totally agree with the basic concept of Windows checking for an alternative version of MS-DOS and refusing to run on it.

    The thing is that the Windows team had a problem to solve, and they didn’t care how they solved it.  Windows decided that it owned every part of the system, including the internal data structures of the operating system.  It knew where those structures were located, it knew what the size of those data structures was, and it had no compunction against replacing those internal structures with its own version.  Needless to say, from a DOS developer’s standpoint, keeping Windows working was an absolute nightmare.

    As a simple example, when Windows started up, it increased the size of MS-DOS’s internal file table (the SFT, that’s the table that was created by the FILES= line in config.sys).  It did that to allow more than 20 files to be opened on the Windows system (a highly desirable goal for a multi-tasking operating system).  But it did that by using an undocumented API call, which returned a pointer to a set of “interesting” pointers in MS-DOS.  It then indexed a known offset relative to that pointer, and replaced the value of the master SFT table with its own version of the SFT.

    When I was working on MS-DOS 4.0, we needed to support Windows.  Well, it was relatively easy to guarantee that our SFT was at the location that Windows was expecting.  But the problem was that the MS-DOS 4.0 SFT was 2 bytes larger than the MS-DOS 3.1 SFT.  In order to get Windows to work, I had to change the DOS loader to detect when win.com was being loaded, and if it was being loaded, I looked at the code at an offset relative to the base code segment, and if it was a “MOV” instruction, and the amount being moved was the old size of the SFT, I patched the instruction in memory to reflect the new size of the SFT!  Yup, MS-DOS 4.0 patched the running Windows binary to make sure Windows would still continue to work.

    Now then, considering how sleazy Windows was about MS-DOS, think about what would happen if Windows ran on a clone of MS-DOS.  It’s already groveling internal MS-DOS data structures.  It’s making assumptions about how our internal functions work, when it’s safe to call them (and which ones are reentrant and which are not).  It’s assuming all SORTS of things about the way that MS-DOS’s code works.

    And now we’re going to run it on a clone operating system.  Which is different code.  It’s a totally unrelated code base.

    If the clone operating system isn’t a PERFECT clone of MS-DOS (not a good clone, a perfect clone), then Windows is going to fail in mysterious and magical ways.  Your app might lose data.  Windows might corrupt the hard disk.   

    Given the degree with which Windows performed extreme brain surgery on the innards of MS-DOS, it’s not unreasonable for Windows to check that it was operating on the correct patient.

     

    Edit: Given that most people aren't going to click on the link to the Schulman article, it makes sense to describe what the AARD check was :)

    Edit: Fixed typo, thanks KC

  • Larry Osterman's WebLog

    Hey, why am I leaking all my BSTR's?

    • 12 Comments

    IMHO, every developer should have a recent copy of the Debugging Tools for Windows package installed on their machine (it's updated regularly, so check to see if there's a newer version).

    One of the most useful leak tracking tools around is a wonderfully cool tool that's included in this package, UMDH.  UMDH allows you to take a snapshot of the heaps in a process and perform a diff of the heap over time - basically you run it once to take a snapshot, then run it a second time after running a particular test, and it allows you to compare the differences in the heaps.

    This tool can be unbelievably useful when debugging services, especially shared services.  The nice thing about it is that it provides a snapshot of the heap usage, there are often times when that's the only way to determine the cause of a memory leak.

    As a simple example of this, the Exchange 5.5 IMAP server cached user logons.  It did this for performance reasons; it could take up to five seconds for a call to LogonUser to complete, and that affected our ability to service large numbers of clients - all of the server threads ended up being blocked waiting on the domain controllers to respond.  So we put in a logon cache.  The cache took the user's credentials, performed a LogonUser with those credentials, and put the results into a heap.  On subsequent logons, the cache took the user's credentials, looked them up in the heap, and if they were found, it just reused the token from the cache (and no, it didn't do the lookup in clear text, I'm not that stupid).  Unfortunately, when I first wrote the cache implementation, I had an uninitialized variable in the hash function used to look up the user in the cache, and as a result, every user logon occupied a different slot in the hash table.  As a result, when run over time, I had a hideous memory leak (hundreds of megabytes of VM).  But, since the cache was purged on exit, the built-in leak tracking logic in the Exchange store didn't detect any memory leaks. 
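
    To give a flavor of the bug (this is a hypothetical reconstruction, not the actual Exchange code), an uninitialized variable in a hash function means identical credentials no longer hash to the same bucket, so the cache grows without bound even though nothing is technically leaked:

        // Sketch: the hash starts from an uninitialized value, so the same
        // input can land in a different bucket on every call.
        #include <cstdint>

        uint32_t HashCredentials(const char *user, const char *domain)
        {
            uint32_t hash;   // BUG: never initialized
            for (const char *p = user; *p != '\0'; p++)
                hash = hash * 31 + static_cast<uint8_t>(*p);
            for (const char *p = domain; *p != '\0'; p++)
                hash = hash * 31 + static_cast<uint8_t>(*p);
            return hash;     // every logon can occupy a new slot in the table
        }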

    We didn't have UMDH at the time, but UMDH would have been a perfect solution to the problem.

    I recently went on a tear trying to find memory leaks in some of the new functionality we've added to the Windows Audio Service, and used UMDH to try to catch them.

    I found a bunch of the leaks, and fixed them, but one of the leaks I just couldn't figure out showed up every time we allocated a BSTR object.

    It drove me up the wall trying to figure out how we were leaking BSTR objects; nothing I did found the silly things.  A bunch of the leaks were in objects allocated with CComBSTR, which really surprised me, since I couldn't see how on earth they could leak memory.

    And then someone pointed me to this KB article (KB139071).  KB139071 describes the OLE caching of BSTR objects.  It also turns out that this behavior is described right on the MSDN page for the string manipulation functions, proving once again that I should have looked at the documentation :).

    Basically, OLE caches all BSTR objects allocated in a process to allow it to pool together strings.  As a result, these strings are effectively leaked “on purpose”.  The KB article indicates that the cache is cleared when the OLEAUT32.DLL's DLL_PROCESS_DETACH logic is run, which is good to know, but didn't help me to debug my BSTR leak - I could still be leaking BSTRs.

    Fortunately, there's a way of disabling the BSTR caching: simply set the OANOCACHE environment variable to 1 before launching your application.  If your application is a service, then you need to set OANOCACHE as a system environment variable (the bottom set of environment variables) and reboot.
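    For a normal interactive application, that's as simple as (with myapp.exe standing in for whatever you're debugging):

    C:\>set OANOCACHE=1
    C:\>myapp.exe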

    I did this and all of my memory leaks mysteriously vanished.  And there was much rejoicing.

     

  • Larry Osterman's WebLog

    UUIDs are only unique if you generate them...

    • 28 Comments

    We had an internal discussion recently, and the upshot was that some distributed component on the web appears to have used the UUID of a sample COM component.

    Sigh.

    I wonder sometimes why people do this.  It's not like it's hard to run uuidgen and then copy the relevant GUIDs to your RGS file (and/or IDL file, or however it is you're defining and registering your class).

    I guess the developers of the distributed component figured that they didn't have to follow the rules because everyone else was going to follow them.

    And, no, I don't know what component it was, or why they decided to copy the sample.

    So here's a good rule of thumb.  When you're designing a COM component, you should probably use UUIDGEN (or UuidCreate()) to generate unique (and separate) GUIDs for the Interface ID, Class ID, Library ID, and App ID.
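    For the programmatic route, it only takes a few lines.  Here's a minimal sketch (link with rpcrt4.lib; note that UuidCreate can also return RPC_S_UUID_LOCAL_ONLY on machines without a network address, so treating only RPC_S_OK as success is a simplification):

        #include <windows.h>
        #include <rpc.h>
        #include <stdio.h>

        int main()
        {
            UUID uuid;
            RPC_CSTR uuidString = NULL;
            // UuidCreate fills in a brand-new UUID; UuidToStringA converts it
            // to the familiar xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx form.
            if (UuidCreate(&uuid) == RPC_S_OK &&
                UuidToStringA(&uuid, &uuidString) == RPC_S_OK)
            {
                printf("%s\n", (char *)uuidString);
                RpcStringFreeA(&uuidString);
            }
            return 0;
        }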

     

  • Larry Osterman's WebLog

    What is a BUGBUG?

    • 27 Comments
    One of the internal software engineering traditions here at Microsoft is the "BUGBUG".

    BUGBUGs are annotations that are added to the source code when the developer writing the code isn't sure if the code they're writing is "correct", or if there's some potential issue with the code that the developer feels needs further investigation.

    So when looking through source code, you sometimes find things like:

        // BUGBUG: I'm sure these GUIDs are defined somewhere but I'm not sure which library contains them, so defining them here.
        DEFINE_GUID(IID_IFoo, 0x12345678,0x1234,0x1234,0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12);
     

    The idea behind a BUGBUG annotation is that a BUGBUG is something that you should fix before you ship, but that you won't necessarily hold shipping the product for - as in the example above, it's not the end of the world if the definition of IID_IFoo is duplicated in this module, but it IS somewhat sloppy.  Typically every component has a P1 bug in the database to remove all the BUGBUGs - either turn them into real bugs, remove them, or ensure that unit tests exist to verify (or falsify) the bugbug.

    As far as I know, the concept of the BUGBUG was initially created by Alan Whitney, who was my first manager at Microsoft - I know he's the first person who explained their use to me.  Lately they've fallen out of favor, replaced by more structured constructs, but conceptually, I still like them.

  • Larry Osterman's WebLog

    Office Decorations

    • 18 Comments

    One of the long-standing traditions here at Microsoft is decorating other employees’ offices.

    Over the years, people have come up with some extraordinarily creative ways to trash others’ offices.  It’s truly awe-inspiring how people use their imagination when they want to make mischief.

    One of my all-time favorites was done to one of the Xenix developers for his birthday.

    This particular developer had recently taken up golf.  So the members of his team came in one night, removed all the furniture from the office, and brought in sod to cover the office floor.

    They then cut a hole in the sod for the golf cup, mounted a golf pole (stolen from a nearby golf course, I believe), and put all the office furniture back in the office, making him his own in-office putting green.

    You could smell the sod from one side of the building to the other, it was that strong.

    I don’t want to think of how they ended up cleaning it up.

     

  • Larry Osterman's WebLog

    How do you know what a particular error code means?

    • 16 Comments

    So you're debugging your program, and all of a sudden you get this weird error code - say error 0x00000011.  How do you know what that error code means?

    Well, one way is to memorize the entire Win32 error return code set, but that's got some issues.

    Another way, if you have the right debugger extension, is to use the !error extension - it will return the error text associated with the error.  There's a similar trick for dev studio (although I'm not sure what it is, since I don't use the devstudio debugger).

    But sometimes you're not running under windbg or devstudio and you've got a Win32 error code to look up.

    And here's where the clever trick comes in.  You see, there's a complete list of error codes built into the system.  It's buried in the NET.EXE command that's used for network administration.

    If you type "NET HELPMSG <errorno>" on the command line, you'll get a human readable version of the error code. 

    So:

    C:\>net helpmsg 17
    The system cannot move the file to a different disk drive.

    It's a silly little trick, but I've found it extraordinarily useful.
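    And if you need the same lookup from inside a program, the FormatMessage API with the FORMAT_MESSAGE_FROM_SYSTEM flag pulls the text out of the system message table.  A minimal sketch (PrintErrorText is just a name I picked):

        #include <windows.h>
        #include <stdio.h>

        // Print the system-supplied text for a Win32 error code.
        void PrintErrorText(DWORD errorCode)
        {
            char buffer[512];
            DWORD length = FormatMessageA(
                FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                NULL,           // no message source - use the system table
                errorCode,
                0,              // default language
                buffer,
                sizeof(buffer),
                NULL);
            if (length != 0)
            {
                printf("%lu: %s", errorCode, buffer);
            }
        }

    Calling PrintErrorText(17) should print the same text that NET HELPMSG 17 does.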

     

  • Larry Osterman's WebLog

    Some final thoughts on Threat Modeling...

    • 16 Comments

    I want to wrap up the threat modeling posts with a summary and some comments on the entire process.  Yeah, I know I should have done this last week, but I got distracted :). 

    First, a summary of the threat modeling posts:

    Part 1: Threat Modeling, Once again.  In which our narrator introduces the idea of a threat model diagram

    Part 2: Threat Modeling Again. Drawing the Diagram.  In which our narrator introduces the diagram for the PlaySound API

    Part 3: Threat Modeling Again, Stride.  Introducing the various STRIDE categories.

    Part 4: Threat Modeling Again, Stride Mitigations.  Discussing various mitigations for the STRIDE categories.

    Part 5: Threat Modeling Again, What does STRIDE have to do with threat modeling?  The relationship between STRIDE and diagram elements.

    Part 6: Threat Modeling Again, STRIDE per Element.  In which the concept of STRIDE/Element is discussed.

    Part 7: Threat Modeling Again, Threat Modeling PlaySound.  Which enumerates the threats against the PlaySound API.

    Part 8: Threat Modeling Again, Analyzing the threats to PlaySound.  In which the threat modeling analysis work against the threats to PlaySound is performed.

    Part 9: Threat Modeling Again, Pulling the threat model together.  Which describes the narrative structure of a threat model.

    Part 10: Threat Modeling Again, Presenting the PlaySound threat model.  Which doesn't need a pithy summary, because the title describes what it is.

    Part 11: Threat Modeling Again, Threat Modeling in Practice.  Presenting the threat model diagrams for a real-world security problem.[1]

    Part 12: Threat Modeling Again, Threat Modeling and the firefoxurl issue. Analyzing the real-world problem from the standpoint of threat modeling.

    Part 13: Threat Modeling Again, Threat Modeling Rules of Thumb.  A document with some useful rules of thumb to consider when threat modeling.

     

    Remember that threat modeling is an analysis tool. You threat model to identify threats to your component, which then lets you know where you need to concentrate your resources.  Maybe you need to encrypt a particular data channel to protect it from snooping.  Maybe you need to change the ACLs on a data store to ensure that an attacker can't modify its contents.  Maybe you just need to carefully validate the contents of the store before you read it.  The threat modeling process tells you where to look and gives you suggestions about what to look for, but it doesn't solve the problem.  It might be that the only thing that comes out of your threat modeling process is a document that says "We don't care about any of the threats to this component".  That's OK; at a minimum, it means that you considered the threats and decided that they were acceptable.

    The threat modeling process is also a living process. I'm 100% certain that 2 years from now, we're going to be doing threat modeling differently from the way that we do it today.  Experience has shown that every time we apply threat modeling to a product, we realize new things about the process of performing threat modeling, and find new, more efficient ways of going about the process.  Even now, the various teams involved with threat modeling in my division have proposed changes to the process based on the experiences of our current round of threat modeling.  Some of them will be adopted as best practices across Microsoft; some of them will be dropped on the floor. 

     

    What I've described over these posts is the process of threat modeling as it's done today in the Windows division at Microsoft.  Other divisions use threat modeling differently - the threat landscape for Windows is different from the threat landscape for SQL Server and Exchange, which is different from the threat landscape for the various Live products, and it's entirely different for our internal IT processes.  All of these groups use threat modeling, and they use the core mechanisms in similar ways, but because each group that does threat modeling has different threats and different risks, the process plays out differently for each team.

    If your team decides to adopt threat modeling, you need to consider how it applies to your components and adopt the process accordingly.  Threat Modeling is absolutely not a one-size-fits-all process, but it IS an invaluable tool.

     

    EDIT TO ADD: Adam Shostak on the Threat Modeling Team at Microsoft pointed out that the threat modeling team has a developer position open.  You can find more information about the position by going here:  http://members.microsoft.com/careers/search/default.aspx and searching for job #207443.

    [1] Someone posting a comment on Bruce Schneier's blog took me to task for using a browser vulnerability.  I chose that particular vulnerability because it was the first that came to mind.  I could have just as easily picked the DMG loading logic in OSX or the .ANI file code in Windows as examples (actually the DMG file issues are in several ways far more interesting than the firefoxurl issue - the .ANI file issue is actually relatively boring from a threat modeling standpoint).

  • Larry Osterman's WebLog

    Why I removed the MSN desktop search bar from IE

    • 16 Comments

    I was really quite excited to see that the MSN Desktop Search Team had finally released the final version of their MSN Desktop Search toolbar.

    I've been using it for quite a while, and I've been really happy with it (except for the minor issue that the index takes up 220M of virtual memory, but that's just VA - the working set of the index is quite reasonable).

    So I immediately downloaded it and enabled the toolbar on IE.

    As often happens with toolbars, the toolbar was in the wrong place.  No big deal, I unlocked the toolbar and repositioned it to where I want it (immediately to the right of the button bar, where it takes up less real-estate).

    Then I locked the toolbar.  And watched as the MSN desktop search toolbar repositioned itself back where it was originally.

    I spent about 10 minutes trying to figure out a way of moving the desktop search bar next to the button bar, with no success.  By positioning it in the menu bar, I was able to get it to move into the button bar when I locked the toolbar, but it insisted on being positioned to the left of the button bar, not the right.

    Eventually I gave up.  I'm not willing to give up 1/4 inch of screen real-estate to an IE toolbar - it doesn't give me enough value to justify the real-estate hit.

    Sorry guys.  I'm still using the desktop search stuff (it's very, very cool), including the taskbar toolbar, but not the IE toolbar.  I hate it when my toolbars have a mind of their own.

    Update: Someone on the CLR team passed on a tip: The problem I was having is because I run as a limited user.  But it turns out that if you exit IE and restart it, the toolbar sticks where you put it!

    So the toolbar's back on my browser.

  • Larry Osterman's WebLog

    What's wrong with this code, part lucky 13

    • 35 Comments
    Today's example is a smidge long; I've stripped out everything I could possibly imagine stripping out to reduce its size.

    This is a very real world example that we recently hit - only the names have been changed to protect the innocent.

    I've used the built-in C++ decorations for interfaces, but that was just to get this stuff to compile in a single source file; it's not related to the bug.

    extern CLSID CLSID_FooDerived;
    [
        object,
        uuid("0A0DDEDC-C422-4BB3-9869-4FED020B66C5"),
    ]
    __interface IFooBase : IUnknown
    {
        HRESULT FooBase();
    };

    class CFooBase: public IFooBase
    {
        LONG _refCount;
        virtual ~CFooBase()
        {
            ASSERT(_refCount == 0);
        };
    public:
        CFooBase() : _refCount(1) {};
        virtual HRESULT STDMETHODCALLTYPE QueryInterface(const IID& iid, void** ppUnk)
        {
            HRESULT hr=S_OK;
            *ppUnk = NULL;
            if (iid == IID_FooBase)
            {
                AddRef();
                *ppUnk = reinterpret_cast<void *>(this);
            }
            else if (iid == IID_IUnknown)
            {
                AddRef();
                *ppUnk = reinterpret_cast<void *>(this);
            }
            else
            {
                hr = E_NOINTERFACE;
            }
            return hr;
        }
        virtual ULONG STDMETHODCALLTYPE AddRef(void)
        {
            return InterlockedIncrement(&_refCount);
        }
        virtual ULONG STDMETHODCALLTYPE Release(void)
        {
            LONG refCount;
            refCount = InterlockedDecrement(&_refCount);
            if (refCount == 0)
            {
                delete this;
            }
            return refCount;

        }
        STDMETHOD(FooBase)(void);
    };
    class ATL_NO_VTABLE CFooDerived :
        public CComObjectRootEx<CComMultiThreadModel>,
        public CComCoClass<CFooDerived, &CLSID_FooDerived>,
        public CFooBase
    {
        virtual ~CFooDerived();
        public:
        CFooDerived();
        DECLARE_NO_REGISTRY()
        BEGIN_COM_MAP(CFooDerived)
            COM_INTERFACE_ENTRY(IFooBase)
        END_COM_MAP()
        DECLARE_PROTECT_FINAL_CONSTRUCT()

    };

    OBJECT_ENTRY_AUTO(CLSID_FooDerived, CFooDerived)

     

    As always, tomorrow I'll post the answers along with kudos and mea culpas.

    Edit: Fixed missing return value in Release() - without it it doesn't compile.  Also added the addrefs - my stupid mistake.  mirobin gets major props for those ones.

  • Larry Osterman's WebLog

    Nobody ever reads the event logs…

    • 19 Comments

    In my last post, I mentioned that someone was complaining about the name of the bowser.sys component that I wrote 20 years ago, and that he included a screen shot of the event viewer.

    What was also interesting was the contents of the screen shot.

    “The browser driver has received too many illegal datagrams from the remote computer <redacted> to name <redacted> on transport NetBT_Tcpip_<excluded>.  The data is the datagram.  No more events will be generated until the reset frequency has expired.”

    I added this message to the browser 20 years ago to detect computers that were going wild sending illegal junk on the intranet.  The idea was that every one of these events indicated that something had gone horribly wrong on the machine which originated the event and that a developer or network engineer should investigate the problem (these illegal datagrams were often caused by malfunctioning networking hardware (which was not uncommon 20 years ago)).

    But you’ll note that the person reporting the problem only complained about the name of the source of the event log entry.  He never bothered to look at the contents of this “error” event log entry to see if there was something that was worth reporting.

    Part of the reason that nobody bothers to read the event logs is that too many components log to the eventlog.  The event logs on customers’ computers are filled with unactionable, meaningless events (“The <foo> service has started.  The <foo> service has entered the running state.  The <foo> service is stopping.  The <foo> service has entered the stopped state.”).  So customers stop reading the event log, because there’s never anything actionable in the logs.

    There’s a pretty important lesson here: Nobody ever bothers reading event logs because there’s simply too much noise in the logs. So think really hard about when you want to write an event to the event log.  Is the information in the log really worth generating?  Is there important information that a customer will want in those log entries?

    Unless you have a way of uploading troublesome logs to be analyzed later (and I know that several enterprise management solutions do have such mechanisms), it’s not clear that there’s any value to generating log entries.

  • Larry Osterman's WebLog

    The dirty little secret of Windows volume

    • 8 Comments

    Here's a dirty little secret about volume in Windows.

    If you look at the documentation for waveOutSetVolume it very clearly says:

    Volume settings are interpreted logarithmically. This means the perceived increase in volume is the same when increasing the volume level from 0x5000 to 0x6000 as it is from 0x4000 to 0x5000.

    The implication of this is that you can implement a linear slider for volume control and use the position of the slider to represent the volume.  This is pretty cool.

    But if you've ever written an application that uses the waveform volume (say an app that plays content with a volume slider attached to it), you'll notice that your volume control is far more responsive when it's on the low end of the slider and less responsive on the high end of the slider.

    [Image: a logarithmic volume taper curve]

    That's weird.  The volume settings are supposed to be logarithmic, but a slider that's more responsive at the low end of the scale than the high end of the scale is an indicator that the slider's controlling LINEAR volume.

    And that's the dirty little secret.  Even though the wave volume is supposed to be logarithmic, the wave volume is actually linear.

    What's worse is that we didn't notice this until we shipped Media Center Edition.  The PM for my group was playing with his MCE machine and noticed that the volume was linear.  To confirm it, he whipped out his sound pressure meter (he's a recording artist, so he has stuff like that in his house).  And yup, the volume control was linear.

    When he came back to work the next day, panic ensued.  I can't explain WHY nobody had noticed this, but they hadn't.

    In response, we added support (for XP SP2) for customized volume tapers for the audio APIs.  The results of that are discussed in this article.

     

    Interestingly enough, it appears that this problem is well known.  The article from which I stole this image discusses the problem of linear vs. logarithmic tapers and discusses how to find the optimal volume taper.
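    If you're stuck with a linear volume control and want logarithmic behavior, you can apply the taper yourself before handing the value to the API.  Here's a rough sketch - the 60 dB range and the function name are my own choices for the example, not anything the APIs mandate:

        #include <math.h>

        // Map a linear slider position (0.0 to 1.0) onto a logarithmic taper,
        // producing the 16-bit per-channel value that waveOutSetVolume expects.
        unsigned short SliderToWaveVolume(double sliderPosition)
        {
            const double rangeInDb = 60.0;    // arbitrary usable range
            if (sliderPosition <= 0.0)
            {
                return 0;                     // bottom of the slider is silence
            }
            double attenuationInDb = (1.0 - sliderPosition) * rangeInDb;
            double amplitude = pow(10.0, -attenuationInDb / 20.0);
            return (unsigned short)(amplitude * 0xFFFF);
        }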

     

    Edit: Cleared up some ambiguities in the language.

  • Larry Osterman's WebLog

    Concurrency, Part 10 - How do you know if you've got a scalability issue?

    • 21 Comments
    Well, the concurrency series is finally running down (phew, it's a lot longer than I expected it to be)...

    Today's article is about determining how you know if you've got a scalability problem.

    First, a general principle: All non-trivial, long-lived applications have scalability problems.  It's possible that the scalability issues don't matter to your application.  For example, if your application is Microsoft Word (or mIRC, or Firefox, or just about any other application that interacts with the user), then scalability isn't likely to be an issue for your application - the reality is that the user isn't going to try to make your application faster by throwing more resources at it.

    As I wrote the previous paragraph, I realized that it describes the heart of scalability issues - if the user of your application feels it's necessary to throw more resources at your application, then your application needs to worry about scalability.  It doesn't matter if the resources being thrown at your application are disk drives, memory, CPUs, GPUs, blades, or entire computers; if the user decides that your system is bottlenecked on a resource, they're going to try to throw more of that resource at your application to make it run faster.  And that means that your application needs to be prepared to handle it.

    Normally, these issues are only for server applications living in data farms, but we're starting to see the "throw more hardware at it" idea trickle down into the home space.  As usual, the gaming community is leading the way - the AlienWare SLI machines are a great example of this - to improve your 3d graphics performance, simply throw more GPUs at the problem.

    I'm not going to go into diagnosing bottlenecks in general, there are loads of resources available on the web for it (my first Google hit on Microsoft.com was this web cast from 2003).

    But for diagnosing CPU bottlenecks related to concurrency issues, there's actually a relatively straightforward way of determining if you've got a scalability issue associated with your locks.  And that's to look at the "Context Switches/sec" perfmon counter.  There's an article on how to measure this in the Windows 2000 resource kit here, so I won't go into the details, but in a nutshell, you start the perfmon application, select all the threads in your application, and look at the context switches/sec for each thread.

    You've got a scalability problem related to your locks if the context switches/second is somewhere above 2000 or so.

    And that means you need to dig into your code to find the "hot" critical sections.  The good news is that it's not usually too hard to detect which critical section is "hot" - hook a debugger up to your application, start your stress test, and put a breakpoint in the ntdll!RtlEnterCriticalSection routine.  You'll get a crazy number of hits, but if you look at your call stacks, the "hot" critical sections will start to show up.  It sounds tedious (and it is, somewhat), but it's surprisingly effective.  There are other techniques for detecting the "hot" critical sections in your process, but they aren't guaranteed to work on all releases of Windows (and will make Raymond Chen very, very upset if you use them).
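    If you're running under windbg, you can automate some of the tedium.  Something like the following (a sketch; adjust the stack depth to taste) logs a short stack at every critical section acquisition and keeps running:

        bp ntdll!RtlEnterCriticalSection "kb 5; gc"

    After a few minutes of stress, the stacks that dominate the output point at your "hot" critical sections.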

    Sometimes, your CPU bottleneck is simply that you're doing too much work on a single thread - if it simply takes too much time to calculate something, then you need to start seeing if it's possible to parallelize your code - you're back in the realm of making your code go faster and out of the realm of concurrent programming.  Another option that you might have is the OpenMP language extensions for C and C++ that allow the compiler to start parallelizing your code for you.
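    To give a flavor of OpenMP, here's a minimal sketch (ScaleBuffer is a made-up example; compile with the /openmp switch in Visual C++):

        // The pragma asks the compiler to split the loop iterations across
        // the available processors.
        void ScaleBuffer(float *buffer, int count, float scale)
        {
            #pragma omp parallel for
            for (int i = 0; i < count; i++)
            {
                buffer[i] *= scale;
            }
        }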

    But even if you do all that and ensure that your code is bottleneck free, you still can have scalability issues.  That's for tomorrow.

    Edit: Fixed spelling mistakes.

     
