User-Agent: Mozilla/5.0 (X11; U; Linux i686; ca; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13
Build Identifier: Firefox 3.6 and Firefox 4 (beta 1..10)

The RegExp /\bword\b/ returns true when it finds the whole word "word". But when the search word starts or ends with a non-ASCII Unicode character, the results are unexpected.

Reproducible: Always

Actual Results:
For example:
1) /\baixò\b/.test(" això ") : false, should be TRUE
2) /\baixò\b/.test("això") : false, should be TRUE
3) /\baixò\b/.test("aixòs") : true, should be FALSE

Without Unicode characters it works fine:
/\baixo\b/.test(" aixo ") : true
/\baixo\b/.test("aixo") : true
/\baixo\b/.test("aixos") : false

I've found some bugs related to this:
https://bugzilla.mozilla.org/show_bug.cgi?id=247179
https://bugzilla.mozilla.org/show_bug.cgi?id=550984

Both bugs say that, per ECMA-262 15.10.2.6, the \b assertion should break at 'word' boundaries, where 'word' means the characters A-Za-z0-9_ and no others. If that is the case, examples 1 and 2 are correct and should return FALSE, but example 3 is still incorrect and should return FALSE.
> but example 3 is still incorrect

No, it's correct and should return true. \b matches between chars X and Y if one of X and Y is in [A-Za-z0-9_] and the other is not. It also matches at the beginning or end of the string when the adjacent char is in that set. So your example 3 matches, because 's' is in that set but 'ò' is not, so /ò\b/ matches the string "òs".
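
The rule described above can be sketched in a few lines of JavaScript. This is only an illustration, not the engine's actual implementation: isWordChar and isBoundary are hypothetical helper names, and the sketch assumes 'ò' is stored as the single precomposed character U+00F2.

```javascript
// Word characters as \b sees them (no /i or /u flag): A-Z, a-z, 0-9 and _.
// Positions before the start or past the end of the string yield undefined,
// which is treated as "not a word character".
function isWordChar(ch) {
  return ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);
}

// True if \b would match at position i, i.e. between str[i - 1] and str[i]:
// exactly one of the two neighbours must be a word character.
function isBoundary(str, i) {
  return isWordChar(str[i - 1]) !== isWordChar(str[i]);
}

// The three examples from the report, plus one ASCII control case.
var results = {
  spaced: /\baixò\b/.test(" això "),   // 'ò' then ' ': neither is a word char
  bare: /\baixò\b/.test("això"),       // 'ò' then end of string: no word char
  suffixed: /\baixò\b/.test("aixòs"),  // 'ò' then 's': exactly one word char
  ascii: /\baixo\b/.test("aixos")      // 'o' then 's': both are word chars
};
```

Under this rule only the "suffixed" case has a boundary after 'ò', which is why example 3 returns true while examples 1 and 2 return false.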
Thank you Boris, now I understand how it works. But it seems that \b is of little use when working with localized strings.
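
For localized strings, one possible workaround is to replace \b with explicit lookarounds over Unicode letters. The sketch below is an assumption-laden illustration, not something available in the Firefox versions discussed here: it relies on the "u" flag, Unicode property escapes and lookbehind, which were added to the language later (ES2015 and ES2018). The helper names escapeRegExp and hasWholeWord are hypothetical, and treating letters, marks, digits and underscore as "word" characters is a simplification that will not suit every language.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `word` occurs in `text` with no Unicode letter,
// combining mark, digit or underscore immediately before or after it.
// Assumption: this character set is an adequate stand-in for "word character".
function hasWholeWord(word, text) {
  var wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  var re = new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(word) + "(?!" + wordChar + ")",
    "u"
  );
  return re.test(text);
}

// The three examples from the original report.
var wholeWord = {
  spaced: hasWholeWord("això", " això "),
  bare: hasWholeWord("això", "això"),
  suffixed: hasWholeWord("això", "aixòs")
};
```

With these assumptions the three cases give the results the reporter originally expected: a match in " això " and "això", and none in "aixòs".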
Well, yes. It is. But for general Unicode strings the concept of "word" doesn't exactly make sense (or more precisely, what sort of sense it makes, if any, is still an active research topic in linguistics, even with decades of work behind us).
lwall realized Perl 5 went down a bad path with regex extensions that were long-winded for the more common operations, e.g. (?:...) for non-capturing groups, yet still had [abc] for Unicode-hostile character classes. So for Perl 6 he broke compat utterly. JS is stuck with Perl5-based regexps but we'll try to fix things up for Harmony. If you are interested, see http://wiki.ecmascript.org/doku.php?id=strawman:strawman under "Regular Expressions". /be