Regular Expressions - Video Introduction to the STL, Part 8

C++0x's <regex> header combines Perl's regular expressions with C++'s templates and iterators.  The result, like the rest of the STL, is insanely powerful but potentially overwhelming at first sight.  It's actually easy to use, and much easier than writing string processing code by hand.  I demonstrate how to start using <regex> in Part 8 of my video lecture series introducing the Standard Template Library.
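
As a rough illustration of what getting started can look like, here is a minimal sketch using std::regex_match (the whole string must match) and std::regex_search (a match anywhere will do). The helper names looks_like_date and find_year, the date pattern, and the sample strings are assumptions made up for this example; they are not taken from the video or the slides.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE string has the shape "YYYY-MM-DD".
bool looks_like_date(const std::string& s) {
    static const std::regex date_re("\\d{4}-\\d{2}-\\d{2}");
    return std::regex_match(s, date_re);
}

// Hypothetical helper: finds a date ANYWHERE in the string and returns its
// year (capture group 1), or an empty string if no date is present.
std::string find_year(const std::string& s) {
    static const std::regex date_re("(\\d{4})-(\\d{2})-(\\d{2})");
    std::smatch m;
    if (std::regex_search(s, m, date_re)) {
        return m[1].str();
    }
    return std::string();
}
```

Calling looks_like_date("posted 2010-06-15") returns false because regex_match requires the entire input to match, whereas find_year on the same string returns "2010". The backslashes are doubled because the patterns are ordinary string literals.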

 

Previous parts:

 

Part 1 (sequence containers)

Part 2 (associative containers)

Part 3 (smart pointers)

Part 4 (Nurikabe solver) - see Wikipedia's article and my updated source code

Part 5 (Nurikabe solver, continued)

Part 6 (algorithms and functors)

Part 7 (algorithms and functors, continued)

 

Attached to this blog post are the slides that I mentioned in the video, which contain an overview of the Perl/JavaScript/ECMAScript regular expression grammar used by C++, as well as many examples.

 

Stephan T. Lavavej

Visual C++ Libraries Developer

Attachment: regex-1.0.pptx
  • Played around with regex, and found it pretty awesome. I did run into a few places in the implementation that moved iterators via it++ instead of ++it, but that's a minor complaint. A few suggestions (for the STL itself, not for the VC implementation): First, in the event of a syntax error in the pattern, it would be nice for the thrown exception to include an iterator that references the position in the pattern that triggered the error. Second, it would be nice for the match format function to accept a sequence of CharT instead of requiring that the replacement format be provided as a basic_string<CharT>.
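
For context on the first suggestion: std::regex_error exposes an error category through code() and a message through what(), but no position in the pattern. A minimal sketch of what can be inspected, assuming a hypothetical helper named try_compile:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the pattern compiles. On failure it
// stores the error category in 'code'; no position information is available.
bool try_compile(const std::string& pattern,
                 std::regex_constants::error_type& code) {
    try {
        std::regex re(pattern);  // the constructor throws on a malformed pattern
        (void)re;
        return true;
    } catch (const std::regex_error& e) {
        code = e.code();  // e.what() also gives an implementation-defined message
        return false;
    }
}
```

A pattern with an unclosed group such as "(abc" should make try_compile return false, while "(abc)" compiles.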

  • > I did run into a few places in the implementation that moved iterators via it++ instead of ++it, but that's a minor complaint.

    Looking at VC10 RTM's <regex>, all but two occurrences of foo++ are of the form *foo++, where we want to dereference the iterator before stepping it forward. This "should" (famous last words, I know) be equivalent to *foo followed by ++foo, after the optimizer has done its work. The traditional advice to prefer ++it to it++, which we are obviously aware of, applies when you're incrementing an iterator and doing nothing else (in that case, there's no reason to construct a return value, and hope that the optimizer will throw it away).

    The last two occurrences are _Chrs[_Nchrs++] = _Ch; and while (_Node->_Max == -1 || _Ix++ < _Node->_Max), where integers instead of iterators are involved, and where we do want the old values.

    > Second, it would be nice for the match format function to accept a sequence of CharT instead of requiring that the replacement format be provided as a basic_string<CharT>.

    If you're referring to slide 28 of my presentation, the Library Issue that I filed was fixed by the Standardization Committee in the C++0x Working Paper, and will be implemented in VC.
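
Assuming the fix mentioned here corresponds to the const charT* overloads of match_results::format that appear in the final C++11 standard, the format string can then be passed as a plain string literal. A minimal sketch follows; the helper swap_names and its pattern are hypothetical, and an implementation that predates the fix may still require a basic_string argument.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "Last, First" into "First Last".
// Returns the input unchanged if it does not have that shape.
std::string swap_names(const std::string& s) {
    static const std::regex name_re("(\\w+), (\\w+)");
    std::smatch m;
    if (!std::regex_match(s, m, name_re)) {
        return s;
    }
    // The format string is a const char*; $1 and $2 refer to the capture groups.
    return m.format("$2 $1");
}
```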

  • Interesting stuff, regex, and finally with C++0x I don't need to write a custom mini parser every time I have to parse some strings.

    Keep these videos coming please,

    Stephan T. Lavavej, good work

    In case you missed it in the comments of part 7:

    @STL: > Is there anything else that people want to see?

    How about making a simple yet extensible calculator using STL

    I think a lot of the STL will come in handy.

    What do you think about that suggestion?

  • I think a detailed rundown of iostreams would be worth a talk.

  • It's great to have regular expressions in VC++ 2010, but the implementation doesn't seem finished. Flags like (?i) which I use, possessive quantifiers, atomic grouping, look behind, conditionals, begin and end of string with \A and \Z and comments don't work yet.

  • [Sam]

    > How about making a simple yet extensible calculator using STL

    The hard part there is parsing, not computation. For parsing something beyond the capability of a regular expression, you really want a dedicated parsing library like Boost.Spirit: boost.org/.../index.html

    [Michael Hamilton]

    > I think a detailed rundown of iostreams would be worth a talk.

    I agree. However, I don't consider myself to be an expert with them, so I don't think I'll cover them in this series (which covers the STL proper, anyways - I consider things like regex to be new citizens of the STL, but iostreams is definitely different).

    [Mark Van Peteghem]

    > Flags like (?i) which I use, possessive quantifiers, atomic grouping, look behind, conditionals, begin and end of string with \A and \Z and comments don't work yet.

    That's because none of those features are in ECMAScript 3, which is the grammar used by the C++0x Working Paper. They are, however, implemented by Boost.Regex: boost.org/.../index.html
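
For the (?i) case specifically, case-insensitive matching is still available in std::regex, but it is requested with a flag when the regex is constructed instead of inside the pattern. A minimal sketch, assuming a hypothetical helper named contains_icase; it does not address the other missing features in the list.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: case-insensitive search for 'pattern' anywhere in 'text'.
// The icase flag plays the role that an inline (?i) would play in Perl syntax.
bool contains_icase(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern,
                        std::regex_constants::ECMAScript |
                        std::regex_constants::icase);
    return std::regex_search(text, re);
}
```

Note that the flag applies to the whole pattern, so it cannot be switched on for only part of an expression the way an inline modifier can.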

  • [Stephan T. Lavavej]

    Yes, I'm used to the Boost.Regex library, and wanted to convert my code to std::regex, hence my disappointment.

    I'm a bit confused. On boost.org/.../perl_syntax.html I see that the features I mentioned are part of the Perl syntax, and at the bottom of that page it says that the Perl syntax flag is equivalent to the ECMAScript syntax flag.

    And in the video you mentioned that ECMAScript 3 (the default) is preferred because it has the most features, so the other syntaxes won't help me. Too bad that C++0x uses ECMAScript 3 if it doesn't have these features.

  • Stephan,

    thank you for these introductions. Thumbs up!

    Although I have been working with the STL for quite a long time, many aspects became much clearer to me.

    Two suggestions for future introductions: iostream (as mentioned earlier) and the new std::bind.

  • It would be nice to see a series on the char- and wchar_t-related stuff inside the standard library:

    - of course the std strings, and how to do efficient string processing

     (support for multi-char/wchar_t encodings and how the STL handles them, memory allocation, whether it is good to have const std::string& as a function parameter or better to use const char*)

    - fstreams in different encodings and how they handle char/wchar_t

    - converting between char and wchar_t in different encodings, and how locales play a role here

    Anyway, good series! I like the simple examples, and I still learn some cool tricks in every video.
