Regex 101 Exercise I6 - Remove font directives from HTML
Remove all <font…> or </font> directives from an HTML string.
I've decided to start linking my answers back to the original posts, since the answers given there are often as good as or better than the one that I give.
The most obvious way to write this is:
That's pretty straightforward - match either a <font...> or a </font>. But it's also wrong: the match in the first part is greedy, so its ">" will match the last ">" in the string rather than the one that closes the font tag. We need the non-greedy qualifier:
That does what we want it to do (assuming we use the singleline and ignorecase options, so that tags spanning several lines or written in upper case are matched too).
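To make the greedy versus non-greedy difference concrete, here is a minimal sketch in Python rather than .NET. The two patterns are my assumption of what the "obvious" and the non-greedy versions look like, and the sample HTML is made up; `re.DOTALL` and `re.IGNORECASE` stand in for the singleline and ignorecase options.

```python
import re

# Hypothetical sample input, not taken from the original exercise.
html = '<p><font color="red">Hello</font> <b>world</b></p>'

# Assumed greedy pattern: ".*" runs on to the last ">" in the string.
greedy = re.compile(r"<font.*>|</font>", re.DOTALL | re.IGNORECASE)

# Assumed non-greedy pattern: ".*?" stops at the first ">" it can.
lazy = re.compile(r"<font.*?>|</font>", re.DOTALL | re.IGNORECASE)

greedy_result = greedy.sub("", html)
lazy_result = lazy.sub("", html)

print(repr(greedy_result))  # everything from "<font" onward is gone
print(repr(lazy_result))    # only the two font tags are removed
```

With this input the greedy pattern swallows everything from the opening font tag to the end of the string, while the non-greedy one removes just the two tags.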
Other ways of doing this showed up in the comments. Maurits suggested using 3 regexes, or a simple one:
I don't know whether I prefer that one over mine. It is shorter, though it's a bit harder for me to read the /? part.
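A sketch of how the single-pattern idea might look, again in Python and assuming the pattern is `</?font.*?>`, where `/?` makes the slash optional so one branch covers both the opening and the closing tag. The helper name `strip_font` and the sample string are hypothetical.

```python
import re

# Assumed pattern: "/?" matches zero or one slash, so a single
# expression handles both <font ...> and </font>.
font_tag = re.compile(r"</?font.*?>", re.DOTALL | re.IGNORECASE)


def strip_font(html: str) -> str:
    """Remove opening and closing font tags, keeping the text between them."""
    return font_tag.sub("", html)


# Mixed case and a tag split across two lines, to exercise both options.
stripped = strip_font('<FONT size="2">a</FONT>b<font\nface="x">c</font>')
print(stripped)
```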
Kbiel suggested a version without the non-greedy option:
which also works well, though I prefer the non-greedy version due to readability.
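One common way to avoid the non-greedy qualifier is a negated character class, so the sketch below assumes a pattern along the lines of `</?font[^>]*>`; whether that is exactly what Kbiel wrote is a guess. Since `[^>]` also matches newlines, this form should not need the singleline option at all.

```python
import re

# Assumed pattern: "[^>]*" consumes anything except ">", so the match
# cannot run past the end of the tag even though "*" is greedy.
font_tag_nc = re.compile(r"</?font[^>]*>", re.IGNORECASE)

# Hypothetical input with an attribute on a second line.
result_nc = font_tag_nc.sub("", '<font\ncolor="red">Hi</font> there')
print(result_nc)
```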