If you read my What Do I Do post, you'll know that I'm the Content Owner for this year's PDC 2008 in Los Angeles. MIX08 is behind us, and I've just recently transitioned away from my Web GO role. This means that I can now focus 100% of my time and attention on our October event. It's going to be fantastic!
One of the many responsibilities I have as Content Owner is to create the master schedule for the event. This is the schedule that tells you which session is in which room and at what time. For PDC 2005, we delivered over 200 sessions at the conference, not including repeats (we run repeats of popular sessions that are filled to capacity).
Because PDC is where we talk about the future of Microsoft's platform, all of the content relates in some way to our overall strategy (which is typically delivered during big keynotes and general sessions). This means that some sessions need to be scheduled ahead of others to provide foundational and prerequisite knowledge. For example, a 200-level (intermediate) session covering a specific topic should precede a 300-level (advanced) or 400-level deep-dive. The following diagram is a simplification, but it illustrates my point.
Here are some additional constraints that must be considered when creating a master schedule:
To tackle the scheduling problem, sessions are typically grouped into tracks. These tracks are then scheduled in parallel, almost like mini-conferences running alongside each other. While this can be effective, many studies show that conference attendees don't attend tracks; they attend sessions. In other words, it's relatively rare for someone to sit through all of the sessions in a single track. An attendee is much more likely to hop from track to track to attend the sessions they're interested in.
Because of this track-hopping behavior, post-event surveys often reflect the inability of people to attend all of their favorite sessions. This is almost always due to a conflict where two or more desired sessions share the same time slot, and an attendee is forced to pick only one. For our MIX07 and MIX08 events, we tried to mitigate this undesirable outcome by publishing the session recordings within 24 hours of their completion (we actually averaged under 12 hours in both cases). This helps, but it's not a panacea.
In an ideal world, conference participants would be able to attend all of their favorite sessions. We almost always do a pre-event survey asking people to pick the sessions they'd like to attend, and we use that data to extrapolate expected room capacities for scheduling. For PDC, where we announce many brand new technologies, we have the additional challenge of making educated guesses about how many people will want to attend sessions that aren't revealed until the first day of the conference. It's an inexact science, to be sure.
For PDC 2008, we're going to try something brand new. In my spare time, I've been working on a genetic algorithm (you may want to review this article to understand some upcoming terms) that takes all of the above factors into account, including a couple more:
In my version of the algorithm, each solution in the population represents a conference schedule. The fitness function takes all of the aforementioned factors into account, and penalizes solutions with undesirable attributes. At the end of each generation, an elite group of solutions is retained, and the remainder are subject to both crossover and mutation. Generally speaking, optimal solutions are discovered in under 500 total generations.
The end result is that most conference attendees should be able to attend most of their desired sessions, all without a rigid track structure.
Update: Channel 9 has published a 32-minute video interview of me discussing this technique.
If the genetic algorithm doesn't work out, try simulated annealing. It is very good for this kind of problem :)
Man, I really wish I can make it to PDC. With Live Mesh announced and Silverlight, this is going to be exciting.
Thanks for the tip, Florian. Although I didn't mention it in the article, I did experiment with simulated annealing, and it's still part of the project. I'm about to do some testing with a much larger dataset, and my guess is that I'll start to see some differences in the two algorithms soon.
Interesting! I'm curious how the differences will turn out in terms of compute time and result.
Well, that just about wraps up my "Is there anything Mike Swanson can't do?" list. (scratches off "#32 - Applied Genetics"). Only two to go... "#57 - Paragliding" and "#138 - Surfing." I expect these will be on your FY09 commitments.
That sounds like a fun challenge, although I like your use of the word 'most' at the end of the post. Years ago, as my senior project in college, I tried writing a scheduling application (in LISP of all things) that would create the class schedule for the Computer Science department for an upcoming quarter. After working on it for a while, I learned about NP-Complete problems. Man. So, at the end, I found solice in the fact that I was able to automatically schedule *most* of the classes. I'm eager to see how your scheduling turns out!
I love it! You can always introduce a random number generator in there if it helps... :-)
How compute intensive do you expect this to be? We've had several customers use our .NET-based grid computing framework to run genetic algorithms across small and large grids alike. Let me know (at dan (at) digipede (dot) net if you're interested). MSR has a grid they may be willing to share if you need it...
Cool stuff. What does your fitness function look like? Seems to me that if you're factoring into your score each attendee's ability to attend all their favourite sessions, then you need for each possible solution some clever algorithm to see how well that can be done. For example, if one session does occur more than once during the day, then there are multiple options for the attendee (for that solution). Does that mean there are then multiple options for each attendee+solution you must score?
I have an internal PDC2008 gadget running in my Windows Sidebar, and it reports that there are only 25
Today was our first day of PDC2008 dry-runs on campus. To present a session at the conference, all speakers
La liste des sessions de la conférence Microsoft 2008 pour les développeurs et architectes est au 2/3
After 1.5 days at PDC, so far I'd have to say that the chosen algorithm was not a very good one. There have been countless room overflows, many times to a second overflow room, and quite often the overflow room is on the opposite side of the facility. Saw in Hanselman's talk yesterday that he had to remove some slides / content since his talk was scheduled before the Windows 7 keynote.
Was there a snafu in the number of attendees, or was all this expected?