Larry Osterman's WebLog

Confessions of an Old Fogey

Venting steam



Ok, today I'm going to vent a bit...

This has been an extraordinarily frustrating week (that's a large part of why I've had virtually no technical content this week).  Think of this one as a peek behind the curtain at a bit of what goes on here.

The week started off great: on Tuesday we had a meeting that finally put the last pieces together on a month-long, multi-group design effort that I've been driving (over the course of the month, the effort has wandered through the core Windows team, the security team, the terminal services team, the multimedia team, and I don't know how many other teams).  For me, it's been a truly challenging development effort, and I was really happy to see it finally come to a conclusion.  In the meantime, I've been implementing the non-controversial parts of the work, and that's been going pretty well.

On Wednesday, I started trying to test the next set of changes I've made.

I dropped a version of win32k.sys that I'd built (since my feature involves some minor changes to win32k.sys) onto my test machine and rebooted.  Kaboom.  The system failed to boot.  It turns out that you can't drop a checked version of win32k.sys onto a retail build (yeah, I test on a retail OS).  This isn't totally surprising; if I'd thought about it, I'd have realized it wouldn't work.

But it's not the end of the world; I rebooted my test machine back to the safe build.  You always have to have a safe build if you're doing OS development: if the test OS crashes irretrievably (and that does happen on test OSes), you need to be able to recover your system.

Unfortunately, one of the security changes in Longhorn meant that I was unable to put the working version of win32k.sys back on my machine when running my safe build.  Not a huge deal; if I'd been thinking about it, I probably could have tried the recovery console to repair the system.

Instead, I decided to try installing the checked build on my test machine (that way I'd be able to just copy my checked binary over).

One of the tools we have internally automates the installation of a new OS.  Since we do this regularly, it's invaluable.  Essentially, after installing it on our test machines, we can click a couple of buttons and have the latest build installed cleanly (or we can click a different set of buttons and have a build upgraded, etc.).  It's extraordinarily useful because it pretty much guarantees that we don't have to waste time chasing down a debugger and installing it, enabling the kernel debugger, etc.  It's a highly specialized tool, totally unsuitable for general distribution, but boy is it useful if you're installing a new build once a week or so.

I installed the checked build, and my test machine went to work copying the binaries and running setup.  A while later, it had rebooted.

It turns out that the driver for the network card in my test machine isn't in the current Longhorn build - this is temporary, but...  No big deal; I have a copy of the network card's driver saved on the test machine's hard disk.

The thing is, the auto-install tool can be temperamental.  It's extremely sensitive to failure scenarios (one of the domain controllers being unavailable, bad sectors on the disk, etc.), and this week the tool was particularly temperamental.  And it turns out that not having a network card is one of the situations that makes the tool temperamental.  If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset).

And that's what happened.  My test machine got stuck somewhere in the middle of running the scripts.  I'm not even sure where in the scripts it got stuck, since the tool doesn't report progress (it's intended for unattended use, so that normally isn't necessary). 

Sigh.  Well, it's time to reinstall.  And reinstall.  And reinstall.  The stupid tool got stuck three different times.  All at the same place.  It's quite frustrating.   I'm skipping a bunch of stuff that went on here as I tried to make progress, but you get the picture.  I think I did this about 4 times yesterday alone.

And of course the team expert for this tool is on vacation, so...

This morning, I'm trying one more time. 

** Flashes to an image of someone banging their head against the wall exclaiming that they're hoping it will stop hurting soon **

I just want to get to testing my code - I've got a bunch of work to do on this silly feature and the stupid tool is getting in my way.  Aargh.

Oh, and one of the program managers on the team that's asking for my new feature just added a new requirement to the feature.  That's going to involve even more cross-group discussions and coordination of work items.

Oh well.  On the other hand, I've made some decent progress documenting the new feature in its feature spec, and I've been to some quite interesting meetings about the process for our annual review cycle (which runs through this month).

 

Edit: One of the testers in my group came by and helped me get the machine unstuck.  Yay.

 

  • Boy, do I know that feeling... when it almost seems that something is determined not to let you even test your program, it's infuriating!
    I'm not sure I feel comforted by the fact that you have the same problems at MSFT sometimes...
  • Why don't you have a dedicated Virtual PC machine? That way you can trash your machine, delete the drive image and just copy / paste back over the top from a backup.
  • Manip,
    Because a dedicated VirtualPC machine is great if I'm not updating the OS.

    But I'm installing a brand new OS.

    The tool does everything you're describing for me, but it runs on the machine (which means I don't get the performance hit of VirtualPC).
  • I was thinking the same thing... use Virtual PC.  But when doing OS development, it's better to rely on physical devices and not emulated stuff.  What if there's a bug in Virtual PC...?
  • Even automated tools should produce output (to be redirected to a log file or whatever) so one can see what went wrong when something does.  Especially tools that are used internally and don't undergo the testing a released product goes through.  And yes, I've learned this the hard way :)
  • Jerry, I know what's going wrong (the script is failing to unjoin a domain); I just don't know why the script is trying to unjoin a domain, or how to fix it.

    And for the people who suggested Virtual PC: I need to test (regularly) at least four different audio adapters and two different types of USB devices. If I'm using VPC, can I do that? What about USB arrival/removal scenarios - can I test those as well?

    I believe the answer is "no", but on the PC, the answer is "of course".
  • Yes, many times I feel like I spend more time fighting the environment, the tools, my machine, everything - than simply trying to fix a problem or get some programming done.

    For instance: we use SourceGear Vault and Visual Studio 2003. I have a project where, no matter how I retrieve it from Vault, VS2003 won't let me edit the file. It claims the file is under source code control under a different project and "editing isn't recommended." Recommended?! HA! It's not possible! The file isn't marked read-only, but I'll be damned if I can get VS2003 to let me change the damned thing. I've used the Vault CLI to download the project, the IDE, the Vault GUI... bleh.

    I finally used the Vault GUI, checked out the project to a new location, and used an alternate IDE (Eclipse) to make my changes. I haven't the foggiest what's gotten into Visual Studio, but ... well, sounds like maybe my life of fighting the machine to get things done isn't so unusual. ;p

    I just dread the day when I have to work on the aforementioned project again and have to find a way around the problem. ;)
  • "which means I don't get the performance hit of VirtualPC"

    And that is why you need the quad XEON machine with 4GB of RAM ;-)
  • "I was thinking the same thing..use virtual pc, but when doing os development, it's better to rely on physical devices and not emulated stuff. What if there's a bug in virtual pc...? "
    The opposite may also be true: your hardware contains a bug which the virtualization environment does not contain.

    Think about developing an OS for an embedded system, where you're developing the hardware at the same time as the software... and you may not have the luxury of having any real hardware finished for you to test on. Emulation comes to the rescue!

    Virtualization is very good when you can use it. Would be nice if MS Virtual Server supported custom virtual hardware. Like a "Virtual Server DDK" :)

    Btw, USB support is no. 3 on the "Most wanted features" list at www.virtualization.info.
  • "And it turns out that not having a network card is one of the situations that makes the tool temperamental. If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset)."

    Glad I'm not the only person who has tripped across that problem. I worked as an SOE build guy for a large company. Our NT4 scripted build would fail when it couldn't find a network card (go figure!). The solution? Detect if a card was missing, and then install the loopback adapter.
  • Andreas Haeber wrote:
    "Virtualization is very good when you can use it."

    And it is ideal for Application Packaging. You save a bunch of time by not having to re-image a physical PC.
  • D, absolutely. VirtualPC is a developer's dream for a certain class of developers.

    Unfortunately, I'm not a member of that class in my current job. In previous jobs it would have been quite nice, but...
  • While I'm happy to see that you also suffer like countless Microsoft customers do, even with released versions, in your shoes I'd be pissed to the point where I'd get hold of Bill himself and say "We gotta talk - now".

    Yeah, it's just me, and it sometimes gets me into trouble, but then I make stuff work. For a living.

    What I really found interesting here is what you didn't write. Are you, finally, going to put at least remote audio into the Terminal Server Client? ActiveX audio too? No? Oh, OK, it was worth a shot.

    When you do D3D remoting (hehe, this will bake MS' noodles considering how they hardcoded it to be machine-local) as well as any OpenGL over X - create a blog entry, willya? :-)

    Make no mistake - I love your blog. It's just that I couldn't stop myself when facing an open door of that size. Keep it up.
  • Mike,
    Remote audio works just fine today in Windows XP. I'm not 100% sure about DSound, but I believe it works too.

    I don't know what "activex audio" is. There are two ways of playing audio on Windows - the MME APIs (PlaySound, waveOutXxx) and DSound (DShow uses DSound). I don't know what activex audio is.

    I can't speak about D3D remoting, I'm not on the remote team.

    And venting to Bill wouldn't help. This was just a stupid tool issue. And I've complained to the right people.

    My customers won't see this, ever.
  • > Unfortunately, one of the security changes
    > in Longhorn meant that I was unable to put
    > the working version of win32k.sys back on my
    > machine when running my safe build.

    I don't know enough about the security changes between current systems and Longhorn, but with current systems this seems pretty trivial. The OS that you're debugging is installed on some partition, say E. The safe OS that you use for recovery is on some other partition, say D. You boot the one on D, look at E:\Windows\System32, and assign that folder's ownership to the Administrators group so you can copy your safe win32k.sys back to that directory. Will the Longhorn on partition E refuse to boot when it detects that its System32 directory has been modified that way?

    Regarding parallel installations for this kind of recovery, some Knowledge Base articles even used to recommend it back in the days of NT4, but now Microsoft says anyone doing this has to pay for multiple licences for their one machine. Don't tell anyone, but before I noticed that about licences, on one machine I activated Windows XP installations on both partitions D and F. I've only needed to boot that F version around 5 times though.

    Hmm, wait a minute - on one friend's machine where I couldn't log in through the recovery console, I put a parallel installation on partition E even though his real one was on D. After that I could repair his D, so he didn't lose any data. I don't remember if I activated the one on his E. (Actually, I had told him to put all his data files on E so that if his installation on D died we could wipe D and reinstall, but he didn't understand and he still had a bunch of stuff in "My Documents".)