Closed Bug 1408452 Opened 5 years ago Closed 5 years ago

/\x/ and /\u/ produce incorrect regular expressions


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
firefox-esr52 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- fixed


(Reporter: zirakertan, Assigned: zirakertan)



(2 files, 3 obsolete files)

Attached patch solution-1.diff (obsolete) — Splinter Review
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20171009141542

Steps to reproduce:

    /\x/.test('x'); // false
    /\x/.test('xx'); // true

Does not trigger when there's any character at all after the escape:

    /\x!/.test('x!'); // true

(if you don't care about the debugging, Ctrl-F "solution")

After a long debugging session, it came down to RegExpParser.cpp:
The problem stems from the "backtracing" done in ParseHexEscape, and it's a subtle problem at that (at least for my tastes).

In the beginning of the `case 'x'` label in ParseDisjunction, our state looks something like:

    current()  = 0x5c ('\')
    position() = 0x...3368 ("\\x")
    next_pos_  = 0x...3369 ("x")
    has_more   = true

After the Advance(2) call in line 1756, we've supposedly stepped over both characters, giving us:

    current()  = random memory
    position() = 0x...3369 ('x')
    next_pos_  = end_ = 0x...3360
    has_more   = false

The important bit is that even though we've "stepped over" both characters, `position()` still returns `"x"` and not `end_`.

Now inside of `ParseHexEscape`, line 371 saves our current position and when `Reset` is called on line 378, `has_more` is flipped back to `true`.

We leave the function with the following state:

    current()  = 0x78 ('x')
    position() = 0x...3369 ("x")
    next_pos_  = end_ = 0x...3360
    has_more_  = true

Outside of the function in `ParseDisjunction`:1763 adds 'x' to the regexp, giving us /x/. That's good.

However, the `while` loop in `ParseDisjunction` continues since `has_more` is `true`. Now we're pointing at 'x' again which is consumed as a literal character, giving us the effective regexp /xx/. Yikes.

This also happens for \u: /\u/.test('uu') === true

There're several solutions to this sort of problem. It's a classic combination of "state management is hard" and "parsing is hard", giving us a bit of a headache.

One simple and obvious solution is to add `has_more_` checks to `ParseX`, for instance:

    1 file changed, 4 insertions(+)
    js/src/irregexp/RegExpParser.cpp | 4 ++++

    modified   js/src/irregexp/RegExpParser.cpp
    @@ -368,6 +368,10 @@ template <typename CharT>
     RegExpParser<CharT>::ParseHexEscape(int length, widechar* value)
    +    if (!has_more_) {
    +        return false;
    +    }
         const CharT* start = position();
         uint32_t val = 0;
         bool done = false;

That works but it doesn't solve the underlying problem: Advance does not correctly set next_pos_ when has_more_ is set to false. Those are two pieces of state which should be linked but they are divorced in this specific case. A more thorough solution could be:

    modified   js/src/irregexp/RegExpParser.cpp
    @@ -328,6 +328,7 @@ RegExpParser<CharT>::Advance()
         } else {
             current_ = kEndMarker;
    +        next_pos_ = end_ + 1;
             has_more_ = false;

That also seems to be the approach Chromium took:
Attached patch More appropriate solution (obsolete) — Splinter Review
Test case on test262:
Assignee: nobody → zirakertan
Ever confirmed: true
Comment on attachment 8918374 [details] [diff] [review]
More appropriate solution

thanks :)
this looks good.

can you export this as a patch file and attach it here, so that it contains commit message and author information?
Attachment #8918374 - Flags: feedback+
Attached. Hopefully it's the correct format, as I am but a git plebian I followed the instructions in
Attachment #8918367 - Attachment is obsolete: true
Attachment #8918374 - Attachment is obsolete: true
Comment on attachment 8918484 [details] [diff] [review]

Review of attachment 8918484 [details] [diff] [review]:

Thanks again :D

I've triggered try run (automated testing)

while waiting for the try, can you fix the 1st line of the commit message to
"Bug 1408452 - Fix RegExpParser position tracking" ?

also, once your test262 patch gets merged there, can you create another patch to add the test under js/src/tests/test262/local ?
with appending `reportCompare(0, 0);` at the end of file, like the following file:

we're updating js/src/tests/test262 directory periodically,
and js/src/tests/test262/local is the directory for adding tests temporary.
Attachment #8918484 - Flags: review+
the test gets merged.
zirak, can you prepare a patch for testcase, and also update the commit message of the 1st patch?
Flags: needinfo?(zirakertan)
Attachment #8918484 - Attachment is obsolete: true
Flags: needinfo?(zirakertan)
Comment on attachment 8919928 [details] [diff] [review]

Review of attachment 8919928 [details] [diff] [review]:

Thank you :D
Attachment #8919928 - Flags: review+
Was this your intention?
Comment on attachment 8919934 [details] [diff] [review]

Review of attachment 8919934 [details] [diff] [review]:

Yes, thanks again!
Attachment #8919934 - Flags: review+
The test passed locally.
Keywords: checkin-needed
Huzzah! What happens now?
these patches will be checked in shortly :)
You need to log in before you can comment on or make changes to this bug.