I love coming up with new programming projects for students to do. Coming up with something different is fun, makes things more interesting for the students, and helps keep a teacher fresh. But where do new ideas come from? Sometimes from textbooks, of course. I have a large collection of textbooks, and I know many other teachers do too. They borrow ideas from old textbooks and fit them into a new programming language or design paradigm.

But I really like to come up with ideas from real life. I think I found one today. Adam Barr, who works for Microsoft, wrote a blog post today about how Microsoft comes up with email addresses (called aliases here) for employees. The basic plan is the first name and the first letter of the last name. It’s all simple enough until you add some constraints.

  • No alias can be more than eight letters
  • If two people have the same first name and last initial, the second person uses the first two letters of the last name
  • If two people have the same first name and the same first two letters of the last name, go to a third letter, and so on (never forgetting the eight-character limit)

Oh, and because there were starting to be too many duplicates, sometimes (as an alternate, perhaps?) the alias uses the first letter of the first name and as many letters of the last name as it takes to complete the name or reach eight letters. I’ll leave the rest of the constraints to the student, ah, I mean classroom teacher.
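For anyone who wants a starting point, here is a rough Python sketch of the rules above. It is my own reading of them, not how Microsoft actually does it. The function names (`clean`, `candidates`, `assign_alias`) are made up for illustration, and I am assuming that a long first name gets truncated to leave room for the last-name letters, which the rules do not spell out.

```python
MAX_LEN = 8  # alias length limit from the rules above


def clean(name):
    """Lowercase a name and keep only its letters (drops spaces, hyphens, apostrophes)."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def candidates(first, last):
    """Yield alias candidates in the order the rules suggest."""
    first, last = clean(first), clean(last)
    # First name plus 1, 2, 3, ... letters of the last name.
    # Assumption: a long first name is cut short to make room for those letters.
    for k in range(1, min(len(last), MAX_LEN - 1) + 1):
        yield first[:MAX_LEN - k] + last[:k]
    # Alternate scheme: first initial plus as much of the last name as fits.
    yield first[:1] + last[:MAX_LEN - 1]


def assign_alias(first, last, taken):
    """Return the first free candidate and record it in the `taken` set."""
    for alias in candidates(first, last):
        if alias not in taken:
            taken.add(alias)
            return alias
    # What happens next is one of the constraints left to the student.
    raise ValueError(f"no free alias for {first} {last}")


taken = set()
print(assign_alias("Adam", "Barr", taken))   # adamb
print(assign_alias("Adam", "Baker", taken))  # adamba
print(assign_alias("Adam", "Barr", taken))   # adambar
```

A Python `set` is hash-based, so the duplicate check stays fast even with a long list of names. What to do when every candidate is taken is a nice open question for the class.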

To me this screams “text manipulation project.” And of course I love text manipulation projects. But wait, there is more. Since you have to watch out for duplicates, that opens the possibility of adding a database or even using hashing algorithms for duplicate detection. The sky is the limit. One could get very creative here.
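As one possible take on the database angle, here is a small sketch using Python's built-in `sqlite3` module. The table layout and the helper names (`open_alias_db`, `try_claim`) are my own inventions for illustration. The idea is simply to let a primary key constraint do the duplicate detection.

```python
import sqlite3


def open_alias_db(path=":memory:"):
    """Open (or create) a database with one table of claimed aliases."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aliases ("
        "alias TEXT PRIMARY KEY, full_name TEXT NOT NULL)"
    )
    return conn


def try_claim(conn, alias, full_name):
    """Return True if the alias was free and is now recorded, False if it was taken."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO aliases (alias, full_name) VALUES (?, ?)",
                (alias, full_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a duplicate alias.
        return False


conn = open_alias_db()
print(try_claim(conn, "adamb", "Adam Barr"))   # True
print(try_claim(conn, "adamb", "Adam Baker"))  # False
```

Students could compare this with an in-memory set or dictionary and talk about when each approach makes sense.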

One last thing. To do this really right you need names. The more names the better. Good news. The US Census Bureau has lists of names. They have lists of the most common male and female first names and most common last names from the 1990 US census at their web site.

Each of the three files (dist.all.last, dist.male.first, and dist.female.first) contains four items of data. The four items are:

  1. A "Name"
  2. Frequency in percent
  3. Cumulative Frequency in percent
  4. Rank
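Reading those four items into a program could look something like the sketch below. I am assuming each line holds the four fields in the order listed, separated by whitespace. The `NameRecord` type and the function names are hypothetical, and the sample lines are placeholders I made up, not real census values.

```python
from typing import NamedTuple


class NameRecord(NamedTuple):
    name: str
    frequency: float   # percent
    cumulative: float  # cumulative percent
    rank: int


def parse_line(line):
    """Split one whitespace-separated line into a NameRecord."""
    name, freq, cum, rank = line.split()
    return NameRecord(name, float(freq), float(cum), int(rank))


def parse_records(lines):
    """Parse every non-blank line from any iterable of strings (such as an open file)."""
    return [parse_line(line) for line in lines if line.strip()]


# Placeholder lines in the assumed layout; the numbers are invented.
sample = [
    "ALPHA          1.500  1.500      1",
    "BRAVO          0.750  2.250      2",
]
for record in parse_records(sample):
    print(record.name.capitalize(), record.rank)
```

With a real file, something like `with open("dist.all.last") as f: records = parse_records(f)` should do the job, assuming the file has been downloaded to the working directory.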

Now, if that data doesn’t suggest some interesting data parsing and manipulation projects, you really do need some summer vacation, don’t you? Or am I just way too much the geek?