Closed Bug 773687 Opened 12 years ago Closed 9 years ago

RegExp flag /y (sticky) causes ^ anchor to apply to start of match attempt rather than entire target string

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: steves_list, Assigned: arai)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete, Whiteboard: [js:p2][DocArea=JS])

Attachments

(3 files)

Repro steps for Firefox 13.0.1: var r = /^r/gy; r.lastIndex = 2; console.log(r.exec('str')); Result: ['r'] Expected: null If you remove the /y flag, you do get the correct result (null). Because flag /y is (as-yet) nonstandard, I suppose I'm arguing for what the correct result should be, based on intuitive semantics and what would actually be useful. Note that XRegExp <http://xregexp.com> uses flag /y behind the scenes for various perf optimizations, and shims it via its XRegExp.exec function via an optional 'sticky' argument (which uses native /y in browsers that support it, i.e., Firefox 3+). The described, incorrect behavior is therefore causing bugs in XRegExp.
Maybe I'm missing something, but isn't the exactly the semantics /y was designed for? For example, if you were using regular expressions to parse header lines in an HTTP request, you could use a /y regex to extract each header line, without having to mess with creating substrings of the entire HTTP request. If /y doesn't do what's demonstrated here, what's it supposed to do, and what behavior does it actually change?
For the use case you described, you should probably use the /m flag combined with ^ and /y, in which you would get the behavior you expect. Flag /y should add an *implicit* non-/m ^ anchor that is applied to the beginning of the match attempt. However, an *explicit* (user-specified) ^ in the RegExp pattern should match only at the beginning of the string, or at the beginning of a line if used with /m.
Consider something like 'baaba'.match(/a|^b/gy). This should return ['b','a','a']. However, currently Firefox returns ['b','a','a','b','a'].
Really should have a https://bugs.ecmascript.org report on file instead of this, or upstream of this. /be
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [js:p2]
But there are no way to test if a string "startsWith" a RegExp at position "position". The current behaviour of FireFox with "y" flag allows this. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky This pages tells: "This allows the match-only-at-start capabilities of the character "^" to effectively be used at any location in a string by changing the value of the lastIndex property." Also this page contains an example, where "g" flag should be used instead of "y". I cannot think, when "y" can be usefull. But why not to extend "startsWith" to work with RegExpes at least?
(In reply to 4esn0k from comment #6) > I cannot think, when "y" can be usefull. I'm not sure I understand. The part you quite, and the way you quote it, sounds like it does what you want. Why can it then never be useful? > But why not to extend "startsWith" to work with RegExpes at least? The exact behavior of the "sticky" flag is now specified in ES6[1]. If you think that something is seriously wrong with the way it works, you should file a bug against the specification, as mentioned by brendan in comment 4. The same applies if you want to have other functions extended, String.prototype.startsWith[2] in this case. Note that the specification of that function already contains a note about potentially supporting Regexps in the future. [1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-regexpbuiltinexec [2]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.startswith
sorry, i missed, that i need to use /y flag without "^" in pattern.
Assignee: general → nobody
The draft ES6 spec now lays down the rule: 21.2.2.6 ... Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line. But FF gets this wrong (tested in FF 32 on MacOSX): var re = /^foo/y re.lastIndex = 2 re.test("..foo") This tests true, but should be false.
(In reply to Erik Corry from comment #9) > But FF gets this wrong (tested in FF 32 on MacOSX): Thanks for testing. It still does in current Nightly.
Blocks: es6
OS: Windows 7 → All
Hardware: x86_64 → All
Version: 13 Branch → unspecified
Blocks: 887016
Changed following: * In irregexp::CompilePattern, do not prepend nodes for ".*?" pattern when sticky flag is set, so pattern match starts directly from specified offset * remove hack for sticky flag in following: * In RegExpShared::compile do not enclose pattern with "^(?:)" * In RegExpShared::execute, do not modify the start of input text, and related other parameters, results * In RegExpShared::execute, call HasSubstringAt instead of StringFindPattern if sticky flag is set and pattern is simple string Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9df753d96a0 Currently ^, \b, and \B assertions are tested. Do you know any other case that sticky flag bahaves differently between current SpiderMonkey and the spec?
Assignee: nobody → arai.unmht
Attachment #8663358 - Flags: review?(till)
Thank you Erik :) I tested with latest revision: https://v8.googlecode.com/svn/trunk/test/mjsunit/harmony/regexp-sticky.js and looks like the test itself has 2 bugs in lastIndex after match (or, maybe the spec changed after that?). Except them, patched SpiderMonkey passed it, and I think the test in my patch covers almost cases :)
Comment on attachment 8663358 [details] [diff] [review] Fix assertion pattern in RegExp with sticky flag. Review of attachment 8663358 [details] [diff] [review]: ----------------------------------------------------------------- Very nice, thank you. r=me with feedback addressed. ::: js/src/jsstr.h @@ +218,5 @@ > > extern int > StringFindPattern(JSLinearString* text, JSLinearString* pat, size_t start); > > +/* Return true if the string contains a pattern at start. */ s/start/|start|/ to make clear that the value of that parameter is meant, not the start of the string. ::: js/src/tests/ecma_6/RegExp/sticky.js @@ +2,5 @@ > +var summary = 'sticky flag should not break assertion behavior.'; > + > +print(BUGNUMBER + ": " + summary); > + > +function test(re, text, results) { I think I would call |results| |expectations|, and the |ret| member |matches|. And rename |m| to |match|. However, all that is utterly unimportant, so feel free to ignore. @@ +3,5 @@ > + > +print(BUGNUMBER + ": " + summary); > + > +function test(re, text, results) { > + // sanity check for test data itself Nit: full-sentence comments should start with an upper-case letter and end with a "." @@ +18,5 @@ > + } else { > + assertEq(re.lastIndex, result.lastIndex); > + assertEq(m !== null, true); > + assertEq(m.length, result.ret.length); > + for (var j = 0; j < result.ret.length; j++) { Nit: no braces needed for single-line body. ::: js/src/vm/RegExpObject.cpp @@ +666,5 @@ > if (canStringMatch) { > MOZ_ASSERT(pairCount() == 1); > + size_t sourceLength = source->length(); > + if (sticky()) { > + if (sourceLength + start < sourceLength || sourceLength + start > length) I don't think the first part of this condition can ever hold: |start| is unsigned, and this boils down to `start < 0`. Simply removing it should be fine, IINM. @@ +753,5 @@ > result = irregexp::InterpretCode(cx, byteCode, chars, start, length, matches); > } > > + if (result == RegExpRunStatus_Success && matches) > + matches->checkAgainst(length); I think `MatchPairs::displace` is dead after this removal. If I'm right, please also remove the definition.
Attachment #8663358 - Flags: review?(till) → review+
(In reply to Tooru Fujisawa [:arai] from comment #11) > Currently ^, \b, and \B assertions are tested. Do you know any other case > that sticky flag bahaves differently between current SpiderMonkey and the > spec? Forgot to comment on this: no, I can't think of more cases either. And thanks, Erik, for the tests!
Comment on attachment 8663358 [details] [diff] [review] Fix assertion pattern in RegExp with sticky flag. Review of attachment 8663358 [details] [diff] [review]: ----------------------------------------------------------------- Good point, I think V8 has a bug there (and the test) where it is not writing back lastIndex for non-global sticky regexps.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Keywords: dev-doc-needed
Whiteboard: [js:p2] → [js:p2][DocArea=JS]
Flags: needinfo?(supriyantomaftuh)
Flags: needinfo?(supriyantomaftuh)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: