Closed
Bug 773687
Opened 12 years ago
Closed 9 years ago
RegExp flag /y (sticky) causes ^ anchor to apply to start of match attempt rather than entire target string
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla44
Tracking | Status | |
---|---|---|
firefox44 | --- | fixed |
People
(Reporter: steves_list, Assigned: arai)
References
(Blocks 1 open bug)
Details
(Keywords: dev-doc-complete, Whiteboard: [js:p2][DocArea=JS])
Attachments
(3 files)
Repro steps for Firefox 13.0.1:
var r = /^r/gy;
r.lastIndex = 2;
console.log(r.exec('str'));
Result: ['r']
Expected: null
If you remove the /y flag, you do get the correct result (null).
Because flag /y is (as-yet) nonstandard, I suppose I'm arguing for what the correct result should be, based on intuitive semantics and what would actually be useful.
Note that XRegExp <http://xregexp.com> uses flag /y behind the scenes for various perf optimizations, and shims it via its XRegExp.exec function via an optional 'sticky' argument (which uses native /y in browsers that support it, i.e., Firefox 3+). The described, incorrect behavior is therefore causing bugs in XRegExp.
Comment 1•12 years ago
|
||
Maybe I'm missing something, but isn't the exactly the semantics /y was designed for? For example, if you were using regular expressions to parse header lines in an HTTP request, you could use a /y regex to extract each header line, without having to mess with creating substrings of the entire HTTP request. If /y doesn't do what's demonstrated here, what's it supposed to do, and what behavior does it actually change?
Reporter | ||
Comment 2•12 years ago
|
||
For the use case you described, you should probably use the /m flag combined with ^ and /y, in which you would get the behavior you expect.
Flag /y should add an *implicit* non-/m ^ anchor that is applied to the beginning of the match attempt. However, an *explicit* (user-specified) ^ in the RegExp pattern should match only at the beginning of the string, or at the beginning of a line if used with /m.
Reporter | ||
Comment 3•12 years ago
|
||
Consider something like 'baaba'.match(/a|^b/gy). This should return ['b','a','a']. However, currently Firefox returns ['b','a','a','b','a'].
Comment 4•12 years ago
|
||
Really should have a https://bugs.ecmascript.org report on file instead of this, or upstream of this.
/be
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 5•12 years ago
|
||
I've added a new bug at https://bugs.ecmascript.org/show_bug.cgi?id=526
Updated•12 years ago
|
Whiteboard: [js:p2]
But there are no way to test if a string "startsWith" a RegExp at position "position".
The current behaviour of FireFox with "y" flag allows this.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky
This pages tells:
"This allows the match-only-at-start capabilities of the character "^" to effectively be used at any location in a string by changing the value of the lastIndex property."
Also this page contains an example, where "g" flag should be used instead of "y".
I cannot think, when "y" can be usefull. But why not to extend "startsWith" to work with RegExpes at least?
Comment 7•11 years ago
|
||
(In reply to 4esn0k from comment #6)
> I cannot think, when "y" can be usefull.
I'm not sure I understand. The part you quite, and the way you quote it, sounds like it does what you want. Why can it then never be useful?
> But why not to extend "startsWith" to work with RegExpes at least?
The exact behavior of the "sticky" flag is now specified in ES6[1]. If you think that something is seriously wrong with the way it works, you should file a bug against the specification, as mentioned by brendan in comment 4.
The same applies if you want to have other functions extended, String.prototype.startsWith[2] in this case. Note that the specification of that function already contains a note about potentially supporting Regexps in the future.
[1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-regexpbuiltinexec
[2]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.startswith
Updated•10 years ago
|
Assignee: general → nobody
Comment 9•10 years ago
|
||
The draft ES6 spec now lays down the rule:
21.2.2.6 ... Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.
But FF gets this wrong (tested in FF 32 on MacOSX):
var re = /^foo/y
re.lastIndex = 2
re.test("..foo")
This tests true, but should be false.
Comment 10•10 years ago
|
||
(In reply to Erik Corry from comment #9)
> But FF gets this wrong (tested in FF 32 on MacOSX):
Thanks for testing. It still does in current Nightly.
Assignee | ||
Comment 11•9 years ago
|
||
Changed following:
* In irregexp::CompilePattern, do not prepend nodes for ".*?" pattern when sticky flag is set,
so pattern match starts directly from specified offset
* remove hack for sticky flag in following:
* In RegExpShared::compile do not enclose pattern with "^(?:)"
* In RegExpShared::execute, do not modify the start of input text, and related other parameters, results
* In RegExpShared::execute, call HasSubstringAt instead of StringFindPattern if sticky flag is set and pattern is simple string
Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9df753d96a0
Currently ^, \b, and \B assertions are tested. Do you know any other case that sticky flag bahaves differently between current SpiderMonkey and the spec?
Assignee: nobody → arai.unmht
Attachment #8663358 -
Flags: review?(till)
Comment 12•9 years ago
|
||
Assignee | ||
Comment 13•9 years ago
|
||
Thank you Erik :)
I tested with latest revision:
https://v8.googlecode.com/svn/trunk/test/mjsunit/harmony/regexp-sticky.js
and looks like the test itself has 2 bugs in lastIndex after match (or, maybe the spec changed after that?).
Except them, patched SpiderMonkey passed it, and I think the test in my patch covers almost cases :)
Assignee | ||
Comment 14•9 years ago
|
||
wow, it was old repository :P
anyway it's not changed:
https://chromium.googlesource.com/v8/v8.git/+/master/test/mjsunit/harmony/regexp-sticky.js
Comment 15•9 years ago
|
||
Comment on attachment 8663358 [details] [diff] [review]
Fix assertion pattern in RegExp with sticky flag.
Review of attachment 8663358 [details] [diff] [review]:
-----------------------------------------------------------------
Very nice, thank you. r=me with feedback addressed.
::: js/src/jsstr.h
@@ +218,5 @@
>
> extern int
> StringFindPattern(JSLinearString* text, JSLinearString* pat, size_t start);
>
> +/* Return true if the string contains a pattern at start. */
s/start/|start|/ to make clear that the value of that parameter is meant, not the start of the string.
::: js/src/tests/ecma_6/RegExp/sticky.js
@@ +2,5 @@
> +var summary = 'sticky flag should not break assertion behavior.';
> +
> +print(BUGNUMBER + ": " + summary);
> +
> +function test(re, text, results) {
I think I would call |results| |expectations|, and the |ret| member |matches|. And rename |m| to |match|. However, all that is utterly unimportant, so feel free to ignore.
@@ +3,5 @@
> +
> +print(BUGNUMBER + ": " + summary);
> +
> +function test(re, text, results) {
> + // sanity check for test data itself
Nit: full-sentence comments should start with an upper-case letter and end with a "."
@@ +18,5 @@
> + } else {
> + assertEq(re.lastIndex, result.lastIndex);
> + assertEq(m !== null, true);
> + assertEq(m.length, result.ret.length);
> + for (var j = 0; j < result.ret.length; j++) {
Nit: no braces needed for single-line body.
::: js/src/vm/RegExpObject.cpp
@@ +666,5 @@
> if (canStringMatch) {
> MOZ_ASSERT(pairCount() == 1);
> + size_t sourceLength = source->length();
> + if (sticky()) {
> + if (sourceLength + start < sourceLength || sourceLength + start > length)
I don't think the first part of this condition can ever hold: |start| is unsigned, and this boils down to `start < 0`. Simply removing it should be fine, IINM.
@@ +753,5 @@
> result = irregexp::InterpretCode(cx, byteCode, chars, start, length, matches);
> }
>
> + if (result == RegExpRunStatus_Success && matches)
> + matches->checkAgainst(length);
I think `MatchPairs::displace` is dead after this removal. If I'm right, please also remove the definition.
Attachment #8663358 -
Flags: review?(till) → review+
Comment 16•9 years ago
|
||
(In reply to Tooru Fujisawa [:arai] from comment #11)
> Currently ^, \b, and \B assertions are tested. Do you know any other case
> that sticky flag bahaves differently between current SpiderMonkey and the
> spec?
Forgot to comment on this: no, I can't think of more cases either.
And thanks, Erik, for the tests!
Comment 17•9 years ago
|
||
Comment on attachment 8663358 [details] [diff] [review]
Fix assertion pattern in RegExp with sticky flag.
Review of attachment 8663358 [details] [diff] [review]:
-----------------------------------------------------------------
Good point, I think V8 has a bug there (and the test) where it is not writing back lastIndex for non-global sticky regexps.
Comment 18•9 years ago
|
||
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox44:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Keywords: dev-doc-needed
Whiteboard: [js:p2] → [js:p2][DocArea=JS]
Comment 20•9 years ago
|
||
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky
https://developer.mozilla.org/en-US/Firefox/Releases/44#JavaScript
Keywords: dev-doc-needed → dev-doc-complete
Updated•8 years ago
|
Flags: needinfo?(supriyantomaftuh)
Updated•8 years ago
|
Flags: needinfo?(supriyantomaftuh)
You need to log in
before you can comment on or make changes to this bug.
Description
•