October, 2005

Posts
• Flow, and Cycling

Tonight I read a post by Eldon, where (among other things) he says that he has learned how to lose himself in his ride. I've experienced that a number of times - I'm just riding, still paying attention to what's going on, but not thinking about anything.

I've been thinking about how that relates to the flow state when programming. In my experience, it's a lot easier to get into the flow state when programming than it is to get into the analogous state when riding. It may have to do with the degree to which you can shut out external input. In programming, it's pretty easy - put on some appropriate music, and lose myself in the task. On the bike, it's often not possible, since losing oneself can have some pretty bad consequences.

Thoughts? Are the two states analogous, or are they different things entirely?

• Regex 101 Exercise S2 - Verify a string is a hex number

Welcome to week #2 of our class. This is a simple one:

S2 - Verify a string is a hex number

Given a string, verify that it contains only the digits 0-9 and the letters a through f (either in uppercase or lowercase).

• Regex 101 Exercise S1 - Discussion

Welcome to Regex 101 discussion section 3.

My goals for these discussions are twofold. First, I'd like to give a reasonable answer (or set of answers, more likely) to the exercise. Second, I'd like to impart some understanding of how the regex works, both in the "this is what this construct does" and the "this is how this works under the covers" sense. This first one is going to cover a lot of basics, so if you're already familiar with regex, you may want to "read for flavor"...

Our challenge is the following:

*****

S1 - Match a Social Security Number

Verify that a string is a social security number of the format ddd-dd-dddd.

*****

So, we need to come up with a regex pattern that will match a valid string, and not match an invalid string. Consider "999-55-1827" as our sample string.

Our first task is to match a digit. We do that with a character class, written as follows:

[0123456789]

which means "match any one of the characters inside the []". So, this matches a single digit. It's also really ugly to write. We can write this using a shorthand:

[0-9]

meaning "any one character in the range 0-9". It turns out matching digits is a common operation, so the regex language provides a shorthand (table of common shorthands):

\d

So, to match three digits in a row, we'd write:

\d\d\d

Characters that don't have a meaning in the regex pattern language can be used as literals, and "-" doesn't have a meaning at this level, so we can also use that as a literal, giving us the pattern:

\d\d\d-\d\d-\d\d\d\d

That matches our sample correctly. But there is nothing to stop the engine from finding a valid match in the middle of a string - our pattern will also match:

My father's SSN was 999-55-1827, and he loved that number

To limit it to work only with the SSN string, we use anchors. Technically, they're called Atomic Zero-width assertions, which makes it clear why I prefer "anchor". Anchors set limitations on where a match must take place. The two that we want are "^", which means "anchor to the beginning of the string", and "$", which means "anchor to the end of the string". So, our pattern becomes:

^\d\d\d-\d\d-\d\d\d\d$

and I would consider that a valid answer to the exercise. But not a good one - it's not as readable as it could be (and I'm not going where you think I'm going... yet...). C# provides the option to write a string on multiple lines, and the .NET regex classes support comments within the regex (use RegexOptions.IgnorePatternWhitespace), so I'm able to write this as:

^         # beginning of string
\d\d\d    # three digits
-         # literal '-'
\d\d      # two digits
-         # literal '-'
\d\d\d\d  # four digits
$         # end of string

When it is possible, I think all regex patterns should be written like this.

Now, that's a fine string, but it is a bit hard visually to parse "\d\d\d\d". We can make that simpler by using a quantifier, which modifies a matching item. To match four digits in a row, we can write:

\d{4}

and our whole pattern becomes:

^         # beginning of string
\d{3}     # three digits
-         # literal '-'
\d{2}     # two digits
-         # literal '-'
\d{4}     # four digits
$         # end of string

Whether that is preferable to the previous choice is a matter of aesthetics. I think that "\d{3}" is slightly nicer than "\d\d\d", but I also think that "ooo" is better than "o{3}", so it's not an obvious choice.

To use this in a C# program, I used "Copy as C#" from Regex Workbench:

Regex regex = new Regex(@"
^         # beginning of string
\d{3}     # three digits
-         # literal '-'
\d{2}     # two digits
-         # literal '-'
\d{4}     # four digits
$         # end of string
",
RegexOptions.IgnorePatternWhitespace);

Match match = regex.Match("999-55-1827");

Console.WriteLine(match.Success);
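For anyone following along outside .NET, here is a rough sketch of the same anchored-versus-unanchored behavior using C++'s std::regex. It is only an illustration under a few assumptions: the function names are made up for this example, std::regex is used with its default ECMAScript grammar, and since that library has no equivalent of IgnorePatternWhitespace, the pattern is written on one line.

```cpp
#include <regex>
#include <string>

// Unanchored: succeeds if an SSN-shaped run appears anywhere in the text.
bool contains_ssn(const std::string& text) {
    static const std::regex pattern(R"(\d{3}-\d{2}-\d{4})");
    return std::regex_search(text, pattern);
}

// Anchored with ^ and $: succeeds only if the whole text is ddd-dd-dddd.
bool is_ssn(const std::string& text) {
    static const std::regex pattern(R"(^\d{3}-\d{2}-\d{4}$)");
    return std::regex_search(text, pattern);
}

// Same anchored check with the digits spelled out instead of quantified.
bool is_ssn_spelled_out(const std::string& text) {
    static const std::regex pattern(R"(^\d\d\d-\d\d-\d\d\d\d$)");
    return std::regex_search(text, pattern);
}
```

Calling is_ssn("999-55-1827") succeeds, while the sentence about my father's SSN only gets past contains_ssn. The quantified and spelled-out versions accept exactly the same strings, so the choice between them really is one of aesthetics.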

I think that's a manageable chunk for the first exercise. If you're the kind who likes spoilers, you might want to read the quantifiers page.

Careful readers may have noticed that my solution doesn't cover some of the cases discussed in the comments to the original question. I will get to those parts later, but they're too complex for early posts.

I'm also considering disabling comments on the exercise posts so that people aren't distracted by them.

My goal is to do one of these a week - with the exercise on Monday, and the discussion on Thursday/Friday. Something like that.

• The rewards of studying science prove tasty...

From the Improbable Research blog, a link to a site using paper plates in education.

Here's my favorite

• A plea to development tool designers...

Systems hang. Especially systems with alpha or beta-quality operating systems on them.

Sometimes dev tools hang, and the only way to get things back is to do a hard reset.

So why do some tools put off saving their Tools->options settings until you exit the program?

• Shower Debugging...

I have a book on my shelf titled "Software Exorcism", which is all about debugging and optimization. It's a good book, but it doesn't have any reference to the (I suspect) widely-used-but-rarely-discussed technique of "shower debugging".

Shower debugging is somewhat related to inspection, where you read source code to try to find the bug, but substitutes hot water and toiletries for a code editor.

I was working on a problem this morning, and managed to figure out what was going on and why in about 10 minutes.

There are, however, a few cautions with this technique:

1. Additional hot water usage, with the associated increased chance of the "arctic blast mind refocusing experience".
2. You will, at some point, forget whether you've washed your hair yet. For the sake of others, please do it again, just to be sure.
3. Two words: Prune City
• Regular Expression Exercise S1

The first in a series of exercises designed to teach you more about regular expressions, written by a guy who got partway through writing a regex book.

But first, a word about tools. It's a lot easier to use a tool to do this sort of thing than it is to write code to do it. So, I suggest one of the following:

So, S stands for simple, 1 stands for 1, so this first one is going to be pretty simple.

S1 - Match a Social Security Number

Verify that a string is a social security number of the format ddd-dd-dddd.

• The book that never was

After I finished my C# book, I looked around for something else to write about. After some thinking, I decided to write a regular expression book that was focused explicitly on the .NET version of regular expressions. Mastering Regular Expressions is a pretty good book, but it suffers from trying to cover too much and at the time, didn't talk about .NET at all.

So, over a period of about a year, I spent time working on the book. (what I probably should have been doing was working on updates to my C# book, but that wasn't very appetizing at the time). After a while, I showed it to some friends, who gave me their honest feedback, which was, "well, yeah, um, we think it needs a lot of work".

They were right, and I have found no desire to spend the time required to get it good enough. But when I'm in a lamenting mood, I lament the fact that there's a bunch of content that I wrote that could be useful. I toyed with posting both the kit and the kaboodle, but that whole "needs a lot of work" thing means that that's probably not the most helpful thing.

So, here's what I'm going to do. One of the unique (or at least rare) things that I did was come up with a series of 25 regex exercises, so I'm going to post one, let people chew on it for a while, and then post my answer. And repeat that at intervals resembling but not precisely equal to a week.

• Insert Smiley

:)

I haven't been paying too much attention to my blog formatting options recently. I'm pretty much a straightforward text guy. Back when I started coding, all we had was uppercase letters. Struth!

Anyway, I'm not big on lots of extra formatting - I'll occasionally grab a bold, and if I'm feeling funky, I might do a bit in Courier New. (Personally, I think that "Courier New" has been a huge marketing mistake for the Courier company, and I'm waiting for the return of "Courier Classic".)

But today I noticed that I now have a combo box labelled "Insert Smiley", and the sky's the limit. Not only does it have about 50 smileys listed, it has translation next to them. So, if I'm talking about a trip to France, instead of writing a paragraph about lunch, I can just write:

sn pi

to let you know that I had snail pizza.

Or, to describe a Friday night with friends, I could use:

D B |-)

Which means "Drinks beer and sleeps".

I'm sure all of this will be easy for you to understand. For example, given what you've learned, what does this mean?

D B ip

That's right. "Drinks beer in paradise".

See how well that works? Well, time to go now.

[:O][6][:-*][sn][:@][:D]

• Come work in the Movie Maker Team...

The Movie Maker team is hiring.

I haven't written much about the Movie Maker team, but it's a great place to work. Challenging and interesting problems. Making the complex easy (easier?). Pushing the PC architecture to its limits. Drinking lots of free pop.

If you would like to apply, you can find the job listing here. If you aren't sure but just want to learn more, drop me a line, and we'll talk.

Disclaimer: This job posting was packed as full as possible by modern equipment, but the contents may have settled during transmission. Be assured that it contains the full number of bytes imprinted on the email packet. Offer to talk does not represent a guarantee of physical conversation, much less of interesting conversation. But you probably could have figured that out. Free pop does not include all pop but does include a wide variety of assorted beverages, including the ever-popular chocolate milk. We also have, for those of you raised back east, "free soda", and many different flavors of Coke (as in "you want a Coke?" - "Sure, I'll take a Sprite", not as in "hey, want one of these new 'Spumoni Cokes (tm)'?"). Only one job per applicant, cash value 1 / 2^8 cents.

• Bitmap::FromStream() issues

I've been writing some code the last day or so to pull a PNG bitmap out of a resource file and put it into a GDI+ bitmap. You would like to use:

Bitmap::FromResource()

but unfortunately, that only works for BITMAP resources. I found some code internally that uses Bitmap::FromStream() to create a GDI+ bitmap - it's somewhat similar to this code.

That worked fine to draw an image sometimes, but didn't work other times. Specifically, I couldn't do something like:

graphics.DrawImage(&bitmap, 0, 0);

while it would work with:

graphics.DrawImage(&bitmap, 0, 0, 32, 32);

It seems that the bitmap you get back from Bitmap::FromStream() doesn't know how big it is. The first call therefore doesn't work, nor does trying to draw with transparency using ImageAttributes. If, however, you create a new bitmap:

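// Allocate a fresh 32-bit ARGB bitmap with explicit dimensions.
Bitmap* pGoodBitmap = new Bitmap(width, height, PixelFormat32bppARGB);
Graphics graphics(pGoodBitmap);

// Draw the stream-loaded bitmap into it, passing the size explicitly.
Status status = graphics.DrawImage(pLoadedBitmap, 0, 0, width, height);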

then that new bitmap (pGoodBitmap here) will be a fully-fledged bitmap that you can then use as you wish.

It might also work to call Bitmap::Clone().

• A new version of C# beyond Linq there will be...

Matt provides some insight...
• Help needed: ASP.NET 2.0 and Google Maps...

[Update: I got this to work by turning on IIS, installing ASP.NET on it, and then using a Google Maps key registered for http://localhost.

I had tried this with Cassini (the development web server that comes with ASP.NET 2.0) with keys registered to both http://localhost and http://localhost:3348 (or whatever the port ended up being), but couldn't get it to work.

So, now my code uses different keys based on whether I'm on my dev system or on the real site.

Thanks for the help...]

I think I'm missing something.

I spent some time today trying to get google maps up and running on my development system, without any luck. I've tried editing my hosts file and pointing my domain name to 127.0.0.1, making up a fake name and pointing that to my current IP address (with a new google api key), and a few other variations.

None of these are working. I think it might be because I'm using the Cassini web server rather than IIS, but I'm not sure.

So, if somebody can list the steps I need to get this working, I'd be much obliged.

• Why doesn't Eric respond to my comment?

Eric doesn't respond to comments because:

1. He prefers pontification to conversation
2. Time spent on comments is better spent learning how to cat juggle
3. They bummer my buzz, man
4. Some other reason

All of these are true to some degree. The author does admit to periods of pontification when up in front of an audience (they break up the periods of intense boredom), and has dabbled in juggling, though not of the feline variety (thwarted by the lack of a "juggling" checkbox on the "Why do you want to adopt a cat?" form). And his buzz, though not necessarily bummered, has at least been tempered at times.

The real reason is technological. Community server has this nice feature where you can tell it to email you comments to your blogs, but for some reason, it only works sporadically. Couple that with the fact that some responses that do make it to my mail server get shunted off to my junk email folder (which I do check from time to time), and it's a wonder I respond to any comments at all.

So, anyway, if you make a comment that you'd like me to respond to and I haven't, you can drop me a line at my traditional microsoft address or my new, easy-to-remember microsoft address.

I have some code that uses ReadDirectoryChangesW() to monitor when files are modified, renamed, or deleted. I've gotten a few bugs from our testers saying that the right thing isn't happening, but I've been unable to repro them.

Today, I finally got a repro, and figured out the cause.

If you read the docs for ReadDirectoryChangesW(), you'll find that the file change information is passed back in a FILE_NOTIFY_INFORMATION structure. In the docs for the FileName field, there's a single line that says:

If there is both a short and long name for the file, the function will return one of these names, but it is unspecified which one.

For the youngsters, in ancient times, all DOS (and windows) filenames were short. You got 8 characters for filename, a period, and then 3 for the extension, giving you the "8.3" format for filenames. When long filenames were introduced, you needed a way to not break every app in existence, so a mapping was developed that took a long filename and gave you a unique short filename for a given directory. If you've ever seen a name like "graph2~1.jpg", that's a short filename.

Short filenames really aren't seen much these days, but apparently they're still alive and well in some places. And the bad part is that not only is it unspecified whether it will return a short or long name, the one that it returns isn't invariant for a single file. In my case, if you renamed a file, you got a notification with the long filename, whereas if you deleted the file, you got a notification with the short filename. For some files...

For other files, you got the long filename in both cases, which was why I was having so much trouble getting a repro.

That behavior is evil.

The fix is to store both the long and short names in my list of files to watch, so I can detect both cases.
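As a rough sketch of that idea (not the code I actually shipped), the watch list can index every file under both names, so that whichever one the notification carries resolves to the same entry. The WatchList class and its method names are hypothetical, names are plain std::string with ASCII-only case folding to keep the example portable, and the short name is passed in by the caller; on Windows you would probably get it from something like GetShortPathNameW.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical watch list: each watched file is indexed under both its long
// name and its 8.3 short name, and both keys map to the long name.
class WatchList {
public:
    // short_name may be empty when the file has no separate 8.3 alias.
    void add(const std::string& long_name, const std::string& short_name) {
        by_name_[fold(long_name)] = long_name;
        if (!short_name.empty()) {
            by_name_[fold(short_name)] = long_name;
        }
    }

    // Returns the long name for a name taken from a notification,
    // or nullptr if that name does not belong to a watched file.
    const std::string* resolve(const std::string& notified_name) const {
        const auto it = by_name_.find(fold(notified_name));
        return it == by_name_.end() ? nullptr : &it->second;
    }

private:
    // ASCII-only lowercase folding; real Windows name comparison is more involved.
    static std::string fold(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return s;
    }

    std::unordered_map<std::string, std::string> by_name_;
};
```

With this in place, a rename notification that arrives with the long name and a delete notification that arrives with the short name both land on the same watched file.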

Maybe Ray can tell me why things work this way.
