Closed Bug 247179 Opened 21 years ago Closed 21 years ago

\b and \w assertions recognises non-ASCII alphanumerics as word characters

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: malcolm-bmo, Assigned: malcolm-bmo)

References

()

Details

(Keywords: js1.5)

Attachments

(2 files, 2 obsolete files)

per ECMA-262 15.10.2.6, the \b assertion should break at 'word' boundaries, where 'word' means characters A-Za-z0-9_ and no others. It seems that the current implementation includes other characters (see example URL). js/src/jsregexp.c includes the following comments (line 930ff or so): * '\b' Word boundary (between \w and \W). * '\w' A word character, [0-9a-z_A-Z]. This is happening because JS_ISWORD is based on JS_ISALNUM, which is true for non-ASCII alphabetics. Nothing else seems to use JS_ISALNUM. I think that JS_ISWORD should probably not be using JS_CTYPE to determine whether a character is a 'word' character. Rather, it should probably check the allowed characters specifically. JS_ISALNUM would then no longer be necessary.
Something like this? (disclaimer: compiles, not tested). I forgot to mention: IE currently appears to follow the ECMA-262 spec, as far as this behaviour is concerned. Should we change to match IE and ECMA-262?
Assignee: general → malcolm-bmo
Status: NEW → ASSIGNED
Attachment #150975 - Flags: review?(brendan)
Seems real enough -- wonder whether the ECMA spec reflects the intentions of the technical group, though. Cc'ing waldemar. /be
Keywords: js1.5
Comment on attachment 150975 [details] [diff] [review] Change JS_ISWORD to disallow non-ASCII alphabetics No need to move the #define down. Also, do you want JS_ISDIGIT or JS7_ISDEC? /be
Attachment #150975 - Flags: review?(brendan) → review-
> No need to move the #define down. Ok, just thought it would read better if it depended only upon macros that were fully defined at point-of-definition. I know it doesn't make a difference. > Also, do you want JS_ISDIGIT or JS7_ISDEC? Ah, didn't see JS7_ISDEC - yes, that would be right. I'll generate a new patch tonight, if required. Hopefully, I'll also be able to test it this time, too (my tree's broken at the moment). I probably should have mentioned that the URL is encoded as UTF-8, though as it happens, I don't think it's making any difference.
Attached patch Updated patch (obsolete) — Splinter Review
Updated according to brendan's comments. I still can't test this; my tree is shot.
Attachment #150975 - Attachment is obsolete: true
Attachment #151052 - Flags: review?(brendan)
Comment on attachment 151052 [details] [diff] [review] Updated patch My tree is happier now: the patch seems to work as intended. I also ran the JS regression suite with no differences between pre-patch and post-patch.
This cleans up a nearby comment style nit, and avoids repeating the (c) < 128 test (although any decent compiler should common it). /be
Comment on attachment 151052 [details] [diff] [review] Updated patch Looks good, but I'm gonna check in the next patch, crediting Malcolm and marking r=brendan. Hope that's ok. /be
Attachment #151052 - Attachment is obsolete: true
Attachment #151052 - Flags: review?(brendan)
Comment on attachment 151173 [details] [diff] [review] patch based on Malcom's patch Checking this in now. /be
Attachment #151173 - Flags: review+
Fixed. /be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Malcolm, with your permission this will be included in the javascript test library.
Comment on attachment 174931 [details] js1_5/Regress/regress-247179.js Fine by me, although for proper attribution I should probably note that the original example came from Jim Ley, in the context of the WHATWG discussions.
Malcolm, can you send me Jim's email?
(In reply to comment #13) > Malcolm, can you send me Jim's email? Done.
js1_5/Regress/regress-247179.js checked in.
Summary: \b assertion recognises non-ASCII alphanumerics as word characters → \b and \w assertions recognises non-ASCII alphanumerics as word characters
Flags: testcase+
verified fixed 1.8.x and trunk.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: