Closed
Bug 247179
Opened 21 years ago
Closed 21 years ago
\b and \w assertions recognises non-ASCII alphanumerics as word characters
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
People
(Reporter: malcolm-bmo, Assigned: malcolm-bmo)
References
()
Details
(Keywords: js1.5)
Attachments
(2 files, 2 obsolete files)
|
2.27 KB,
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
|
2.19 KB,
text/plain
|
Details |
per ECMA-262 15.10.2.6, the \b assertion should break at 'word' boundaries,
where 'word' means characters A-Za-z0-9_ and no others. It seems that the
current implementation includes other characters (see example URL).
js/src/jsregexp.c includes the following comments (line 930ff or so):
* '\b' Word boundary (between \w and \W).
* '\w' A word character, [0-9a-z_A-Z].
This is happening because JS_ISWORD is based on JS_ISALNUM, which is true for
non-ASCII alphabetics. Nothing else seems to use JS_ISALNUM.
I think that JS_ISWORD should probably not be using JS_CTYPE to determine
whether a character is a 'word' character. Rather, it should probably check the
allowed characters specifically. JS_ISALNUM would then no longer be necessary.
| Assignee | ||
Comment 1•21 years ago
|
||
Something like this? (disclaimer: compiles, not tested).
I forgot to mention: IE currently appears to follow the ECMA-262 spec, as far
as this behaviour is concerned. Should we change to match IE and ECMA-262?
Assignee: general → malcolm-bmo
Status: NEW → ASSIGNED
| Assignee | ||
Updated•21 years ago
|
Attachment #150975 -
Flags: review?(brendan)
Comment 2•21 years ago
|
||
Seems real enough -- wonder whether the ECMA spec reflects the intentions of the
technical group, though. Cc'ing waldemar.
/be
Keywords: js1.5
Comment 3•21 years ago
|
||
Comment on attachment 150975 [details] [diff] [review]
Change JS_ISWORD to disallow non-ASCII alphabetics
No need to move the #define down.
Also, do you want JS_ISDIGIT or JS7_ISDEC?
/be
Attachment #150975 -
Flags: review?(brendan) → review-
| Assignee | ||
Comment 4•21 years ago
|
||
> No need to move the #define down.
Ok, just thought it would read better if it depended only upon macros that were
fully defined at point-of-definition. I know it doesn't make a difference.
> Also, do you want JS_ISDIGIT or JS7_ISDEC?
Ah, didn't see JS7_ISDEC - yes, that would be right.
I'll generate a new patch tonight, if required. Hopefully, I'll also be able to
test it this time, too (my tree's broken at the moment).
I probably should have mentioned that the URL is encoded as UTF-8, though as it
happens, I don't think it's making any difference.
| Assignee | ||
Comment 5•21 years ago
|
||
Updated according to brendan's comments. I still can't test this; my tree is
shot.
Attachment #150975 -
Attachment is obsolete: true
| Assignee | ||
Updated•21 years ago
|
Attachment #151052 -
Flags: review?(brendan)
| Assignee | ||
Comment 6•21 years ago
|
||
Comment on attachment 151052 [details] [diff] [review]
Updated patch
My tree is happier now: the patch seems to work as intended.
I also ran the JS regression suite with no differences between pre-patch and
post-patch.
Comment 7•21 years ago
|
||
This cleans up a nearby comment style nit, and avoids repeating the (c) < 128
test (although any decent compiler should common it).
/be
Comment 8•21 years ago
|
||
Comment on attachment 151052 [details] [diff] [review]
Updated patch
Looks good, but I'm gonna check in the next patch, crediting Malcolm and
marking r=brendan. Hope that's ok.
/be
Attachment #151052 -
Attachment is obsolete: true
Attachment #151052 -
Flags: review?(brendan)
Comment 9•21 years ago
|
||
Comment on attachment 151173 [details] [diff] [review]
patch based on Malcom's patch
Checking this in now.
/be
Attachment #151173 -
Flags: review+
Comment 10•21 years ago
|
||
Fixed.
/be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 11•21 years ago
|
||
Malcolm, with your permission this will be included in the javascript test
library.
| Assignee | ||
Comment 12•21 years ago
|
||
Comment on attachment 174931 [details]
js1_5/Regress/regress-247179.js
Fine by me, although for proper attribution I should probably note that the
original example came from Jim Ley, in the context of the WHATWG discussions.
Comment 13•21 years ago
|
||
Malcolm, can you send me Jim's email?
| Assignee | ||
Comment 14•21 years ago
|
||
(In reply to comment #13)
> Malcolm, can you send me Jim's email?
Done.
Comment 15•21 years ago
|
||
js1_5/Regress/regress-247179.js checked in.
Updated•20 years ago
|
Summary: \b assertion recognises non-ASCII alphanumerics as word characters → \b and \w assertions recognises non-ASCII alphanumerics as word characters
Updated•20 years ago
|
Flags: testcase+
You need to log in
before you can comment on or make changes to this bug.
Description
•