Exercise S4 - Extract load average from a string
The shop that you work with has a server that writes a log entry every hour in the following format:
8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
You need to write a utility that lets you track the load average on an hourly basis. Write a regex that extracts the time and the load average from the string.
This is pretty close to the first thing I ever did with regular expressions. I had some logfile information I needed to process. I started writing in C++, and if you've ever tried to do lots of character manipulation in C++, you know how much fun that can be.
For this sort of thing, I like to look for good delimiters. To get the time, I'll use "up" as the delimiter, which means I can match with:
.+\s*up
The \s is something new; it means "any whitespace character." Next, I need to pull out the load average. I'll use "load average:" as the delimiter, so the regex to pull that out is:
load average:\s*[0-9.]+
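Before combining them, each delimiter regex can be tried on its own. Here is a minimal C# sketch of that, assuming a C# 9+ top-level program and .NET's System.Text.RegularExpressions. Note that both matches still include their delimiters; neither isolates just the value yet.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample log entry from the exercise.
string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// Time: everything up to and including the "up" delimiter.
Match timeMatch = Regex.Match(line, @".+\s*up");

// Load average: the "load average:" delimiter plus the number after it.
Match loadMatch = Regex.Match(line, @"load average:\s*[0-9.]+");

Console.WriteLine(timeMatch.Value);  // 8:00 am up
Console.WriteLine(loadMatch.Value);  // load average: 0.13
```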
I can then string them together to get:
.+\s*up                    # match time
.+?                        # skip middle section
load\ average:\s*[0-9.]+   # match load average
I added the middle clause (.+?) to skip the characters in the middle that I don't care about. I also spread the regex across multiple lines with comments, which means that I need to use RegexOptions.IgnorePatternWhitespace (not RegexOptions.Multiline, which only changes what ^ and $ match). That in turn required me to change "load average" to "load\ average" so that the regex engine wouldn't ignore the space (which I figured out after staring at it for a minute, wondering why it wasn't working...).
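The whitespace pitfall is easy to reproduce. This C# sketch (assuming a C# 9+ top-level program and System.Text.RegularExpressions) runs the commented pattern under RegexOptions.IgnorePatternWhitespace twice: once with the escaped "load\ average" and once with a plain space, which the option silently strips.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// Commented, multi-line pattern; the escaped space keeps "load\ average" intact.
string pattern = @"
    .+\s*up                     # match time
    .+?                         # skip middle section
    load\ average:\s*[0-9.]+    # match load average
";

// Same pattern with an unescaped space: IgnorePatternWhitespace removes it,
// so the engine looks for "loadaverage:" and never finds it.
string broken = pattern.Replace(@"load\ average", "load average");

bool works = Regex.IsMatch(line, pattern, RegexOptions.IgnorePatternWhitespace);
bool fails = Regex.IsMatch(line, broken, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine($"escaped space: {works}");    // escaped space: True
Console.WriteLine($"unescaped space: {fails}");  // unescaped space: False
```

Escaping the space is one option; \s would also work there, at the cost of matching tabs as well.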
If I run this in Regex Workbench, it will report:
0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
That tells me that the match worked, but not much else. What I need is a way to extract certain parts of the string, which is done with a "capture" in the regex language. The simplest form of a capture is made by enclosing part of the regex in parentheses:
(.+)\s*up                    # match time
.+?                          # skip middle section
load\ average:\s*([0-9.]+)   # match load average
Executing that gives:
0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
1 => 8:00 am
2 => 0.13
The first capture (index 0) is always the entire match, and the subsequent captures correspond to the portions of the match enclosed in parentheses. In code, if I wanted to pull the time out, I would write something like:
string time = match.Groups[1].Value;
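A runnable version of the numbered-capture approach might look like this C# sketch (again a C# 9+ top-level program using System.Text.RegularExpressions). One detail the workbench output hides: the greedy (.+) stops right before "up", so Groups[1] is actually "8:00 am " with a trailing space. The sketch removes it with Trim().

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

string pattern = @"
    (.+)\s*up                     # match time
    .+?                           # skip middle section
    load\ average:\s*([0-9.]+)    # match load average
";

Match match = Regex.Match(line, pattern, RegexOptions.IgnorePatternWhitespace);

// Groups[0] is the whole match; Groups[1] and Groups[2] are the parenthesized parts.
// The greedy (.+) also keeps the space before "up", so Trim() drops it.
string time = match.Groups[1].Value.Trim();
string load = match.Groups[2].Value;

Console.WriteLine(time);  // 8:00 am
Console.WriteLine(load);  // 0.13
```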
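That works fine. I could declare victory, but I don't really like the "Groups[1]" part - it doesn't tell me much. Nicely, the .NET regex flavor (like some others) provides a way to name captures, which allows me to write:
(?<Time>.+)\s*up                             # match time
.+?                                          # skip middle section
load\ average:\s*(?<LoadAverage>[0-9.]+)     # match load average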
Running that gives me:
0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
Time => 8:00 am
LoadAverage => 0.13
and I could now write code that looks like:
string time = match.Groups["Time"].Value;
which is very clear - clear enough that I often will not bother with the local variable.
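Put together, the named-capture version could be written as the following C# sketch, under the same assumptions (C# 9+ top-level program, System.Text.RegularExpressions). It uses .NET's (?<Name>...) syntax; other regex flavors may spell named groups differently.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// .NET named-capture syntax: (?<Name>...)
string pattern = @"
    (?<Time>.+)\s*up                              # match time
    .+?                                           # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)      # match load average
";

Match match = Regex.Match(line, pattern, RegexOptions.IgnorePatternWhitespace);

// Read the captures by name instead of by index.
// Trim() drops the space that the greedy Time group picks up before "up".
string time = match.Groups["Time"].Value.Trim();
string loadAverage = match.Groups["LoadAverage"].Value;

Console.WriteLine($"Time => {time}");               // Time => 8:00 am
Console.WriteLine($"LoadAverage => {loadAverage}"); // LoadAverage => 0.13
```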
That gets us where I wanted to go. You may have noticed that I didn't try to validate the time, nor did I use anchors for the beginning and end of the string. In this example, I'm dealing with well-formed text - the server log is always going to look the way that it does - and it's not worth the effort or complexity to do more than what I did.
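Going back to the original task of tracking the load average hour by hour, the utility might look something like this C# sketch. Everything beyond the regex is an assumption: only the first log line comes from the exercise (the other two are invented for illustration), the tuple list is an arbitrary choice of storage, and lines that don't match are simply skipped, in the same spirit of trusting well-formed input.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// The named-capture pattern, built once and reused for every log line.
var loadRegex = new Regex(@"
    (?<Time>.+)\s*up                              # match time
    .+?                                           # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)      # match load average
", RegexOptions.IgnorePatternWhitespace);

// Hypothetical hourly log; only the first line comes from the exercise.
string[] log =
{
    "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13",
    "9:00 am up 23 day(s), 22:34, 9 users, load average: 0.42",
    "10:00 am up 23 day(s), 23:34, 8 users, load average: 1.05",
};

var samples = new List<(string Time, double Load)>();
foreach (string entry in log)
{
    Match m = loadRegex.Match(entry);
    if (!m.Success)
        continue; // skip anything that doesn't look like a load entry

    string time = m.Groups["Time"].Value.Trim();
    // InvariantCulture so "0.13" parses the same way regardless of locale.
    double load = double.Parse(m.Groups["LoadAverage"].Value, CultureInfo.InvariantCulture);
    samples.Add((time, load));
}

foreach (var (time, load) in samples)
    Console.WriteLine($"{time}: {load.ToString(CultureInfo.InvariantCulture)}");
```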