Sorting it all Out: Michael Kaplan's random stuff of dubious value
A few days ago, Scott Hanselman asked in the Suggestion Box:
I'm doing an English/Spanish site with ASP.NET using some client side validation with Regular Expressions.
I wanted to write a single Regular Expression for most large text fields:
^[\w\d\s-'.,&#@:?!()$\/]+$
Notice that I'm using \w and \d for WORD characters and DIGITS respectively. I was assuming that JavaScript would allow "áÁéÉíÍóÓúÚñÑüÜ" and characters like it when a browser is configured for Spanish, but it seems only to care about A-Za-z.
I wanted to avoid using A-Za-z as it's so English focused.
What's the i18n "right thing to do" when using Regular Expressions?
Ayudame por favor! ;)
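The behavior described in the question is easy to reproduce. Here is a minimal sketch, assuming any current JavaScript engine; the helper name `acceptedByOriginal` is made up for illustration. Without extra flags, `\w` in JavaScript covers only A-Z, a-z, 0-9 and the underscore, no matter how the browser is configured:

```javascript
// The pattern from the question, unchanged.
const original = /^[\w\d\s-'.,&#@:?!()$\/]+$/;

// Hypothetical helper: true when every character is in the allowed set.
function acceptedByOriginal(text) {
  return original.test(text);
}

console.log(acceptedByOriginal("Ayudame por favor!")); // true
console.log(acceptedByOriginal("España"));             // false, "ñ" is not matched by \w
```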
Of course he did not wait for me to answer <grin>, instead choosing to post about it on his own blog in a post entitled Internationalized Regular Expressions.
It's funny, I find myself using Regular Expressions more often in Visual Studio's Find/Replace than I do in actual code using the Regex classes. Not sure what that means, but it is probably bad....
Anyway, the Regular Expressions help topic that opens when you click the Help button in the Find dialog includes the following table, which I have used quite a bit:
The following table lists the syntax for matching by standard Unicode character properties. The two-letter abbreviation is the same as listed in the Unicode character properties database. These may be specified as part of a character set. For example, the expression [:Nd:Nl:No] matches any kind of digit.
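:Lu (uppercase letter): for example, :Luhe matches "The" but not "the".
:Ll (lowercase letter): for example, :Llhe matches "the" but not "The".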
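The `:Xx` syntax above is specific to the Visual Studio Find dialog. As a rough sketch of the same idea in JavaScript, assuming an engine that supports Unicode property escapes (ES2018 and later, and only with the `u` flag), the two-letter category names carry over directly. The constant names here are made up for illustration:

```javascript
// Rough counterpart of [:Nd:Nl:No]: one character from any of the three
// Unicode number categories. The `u` flag is what enables \p{...}.
const anyNumber = /^[\p{Nd}\p{Nl}\p{No}]$/u;

// Rough counterparts of :Luhe and :Llhe.
const upperThenHe = /^\p{Lu}he$/u;
const lowerThenHe = /^\p{Ll}he$/u;

console.log(anyNumber.test("7"));      // true, DIGIT SEVEN (Nd)
console.log(anyNumber.test("\u0663")); // true, ARABIC-INDIC DIGIT THREE (Nd)
console.log(anyNumber.test("\u2167")); // true, ROMAN NUMERAL EIGHT (Nl)
console.log(anyNumber.test("\u00BD")); // true, VULGAR FRACTION ONE HALF (No)
console.log(anyNumber.test("x"));      // false

console.log(upperThenHe.test("The"));  // true
console.log(upperThenHe.test("the"));  // false
console.log(lowerThenHe.test("the"));  // true
```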
I use these all the time when I am trying to get behavior that respects more of Unicode.
Not sure if this will help you with what you are looking for, but it is the approach I use to get internationally aware regular expressions.... :-)
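For what it is worth, here is one way the pattern from the question might be rewritten along these lines. This is only a sketch, not something taken from the Visual Studio help or from Scott's post: it assumes a JavaScript engine with Unicode property escapes (ES2018 and later), and the names `largeTextField` and `isValidLargeText` are hypothetical.

```javascript
// \w and \d are swapped for Unicode categories:
//   \p{L}  any letter, in any script
//   \p{M}  combining marks, so decomposed accents (n + U+0303) also pass
//   \p{Nd} decimal digits
// The underscore is kept because \w included it. The hyphen is escaped
// because `u` mode does not allow \s to sit next to a bare "-" in a class.
const largeTextField = /^[\p{L}\p{M}\p{Nd}_\s\-'.,&#@:?!()$\/]+$/u;

// Hypothetical validator for the "large text fields" in the question.
function isValidLargeText(text) {
  return largeTextField.test(text);
}

console.log(isValidLargeText("áÁéÉíÍóÓúÚñÑüÜ"));     // true
console.log(isValidLargeText("Ayúdame por favor!")); // true
console.log(isValidLargeText("<script>"));           // false, "<" and ">" are not in the set
```

The punctuation list is left exactly as it was in the question; only the letter and digit classes change.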
This post brought to you by "ר" (U+05e8, a.k.a. HEBREW LETTER RESH) (Because this post is כשר לפסח ("kosher for Passover") in anticipation of the festivities that start less than 24 hours from now)