Fabulous Adventures In Coding

Eric Lippert's Blog

Regular Expressions From Scratch, Part Four: The Kleene Closure of a Language

Languages are sets, so we can take any two languages (over the same alphabet) and take their union to form a new language. Just as a reminder:

Definition 7: The union of two sets L and K is the set with all the members found in either, and is written L ∪ K.

We're going to take the Kleene Star one level further and say that it applies to languages too.

Definition 8: The Kleene Closure of a language L is L* = { e } ∪ L ∪ LL ∪ LLL …

Got it? L* is the language consisting of set of strings, where those strings are any number of strings in L concatenated together.

I want to extend our notion of concatenation in one more way. When I write a string or string variable next to a language, I mean "concatenate the language onto the end of the language that consists of only this string". So when I say abLa I mean {ab}L{a} Again, you can infer the braces from the context.

Let's do an example of using * and string concatenation of languages, just so its clear.

S = {0, 1}
L = {011*}
1L* = {1, 101, 1011, 101101, 1011011, …}

Or, in English, this is the set of strings w in S* such that w:

  • starts with a single 1,
  • has no more than one 0 in a row, and
  • ends with 1

We need one more thing to make this notation sufficiently terse. You probably noticed that the Kleene Star above only applied to the thing immediately to the star's left, whether the language or the symbol. Suppose we've got four languages: H, J, K and L. We can form two new languages through concatenation: G = HJ and F = KL. And then we can form a seventh language, D = G*F*. But we can't write D = HJ*KL*, because that's a different language. What we mean is D = (HJ)*(KL)*.

Therefore I'm declaring that we can use parenthesis in the normal way we're all used to for order of operations. We will figure out from context whether parens mean "this is a finite sequence" or "this is a grouping of concatenations".

Remember, we're looking for a way to concisely characterize languages, and clearly we're onto something here. We now have union, grouping, concatenation and Kleene closure. In other words, we have enough tools to define regular expressions, which we'll do next time. Stay tuned!

Published Monday, November 28, 2005 7:00 AM by Eric Lippert

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Attentive said:

Just so you know we are out here!
Thanks for this
November 30, 2005 9:03 AM
 

Jeff Mayeur said:

Second that thanks... pretty cool... It's still a few miles over my head, but I'd have to say, it's making regexp much more understandable when viewing the concept from sets plateau... very cool.

Thanks
November 30, 2005 6:00 PM
 

Shawn Cheng said:

Hello Eric, I don't understand what you mean by "has no more than one 0 in a row". What row? Do you mean a member of 1L*? As the example shows that we can have more that one 0...
January 8, 2006 11:44 AM
 

Jon Turner said:

Shawn, I think he means that there is no more than one consecutive 0. So you can not have any strings that contain 00 or 000, but they could contain 1010, etc.
January 24, 2006 11:22 AM
 

Synesthesia » Links Roundup for 2006-02-21 said:

February 22, 2006 4:18 AM
 

Dan Sellers's WebLog said:

When it comes to validating input regular expression becomes a very important part of your security plan. ...
March 3, 2006 6:29 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Eric Lippert

Eric Lippert is a senior developer on the Microsoft C# compiler team. Before that he worked on the framework of Visual Studio Tools For Office. Before that, he worked on the compilers, runtimes and tools for VBScript, JScript, Windows Script Host and other Microsoft Scripting technologies. He lives in Seattle and spends his free time editing books about programming languages, playing the piano, and trying to keep his tiny sailboat upright in Puget Sound.

This Blog

Syndication


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker