Closed Bug 224676 Opened 21 years ago Closed 21 years ago

JS RegExp failing on certain disjunction + character class patterns

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: dev+mozilla, Unassigned)

References

()

Details

(Keywords: js1.5, regression, Whiteboard: [no such issue in Rhino ])

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030729
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031031

JSLint at http://www.crockford.com/javascript/jslint.html so far has worked
fine, but now it always reports "Unclosed string" for non-empty string literals.
Probably this is a regression in the regexp code.

Reproducible: Always

Steps to Reproduce:
1. Load URL.
2. Enter the following code:
   var test = "hallo";
3. Press the "jslint" button.

Actual Results:  
JSlint displays the following:

(1) : error at character 13: Unclosed string: hallo";

Expected Results:  
JSlint displays the following:

ok


Global:  test

This works fine in Opera 7.21, Konqueror 3.1 IE 6.0.2 and Mozilla 1.5.
Keywords: regression
Summary: JSLint fails to recognize string → JSLint fails to recognize string
Summary: JSLint fails to recognize string → JSLint fails to recognize string delimiters
I bet that this is fallout from bug 85721 (which also fixed bug 123437). See 
also bug 190685.
This is most likely a regexp problem.
The problem is caused by this:

  /^((\\[^\x00-\x1f]|[^\x00-\x1f"\\])*)"/.exec( "hallo\";" );

Smallest testcase I found is this:

  /a|[^b]b/.exec( "xb" );

Note that

  /a|[^b]{1}b/.exec( "xb" );

works as expected.
The problem exists if all of the following are true:
- a disjunction has 2 or more alternatives
- a character class is the first term of an alternative
- it has no quantifier
- the alternative is one of the 2 last alternatives

So, this fails:
  /a|[x]b/.exec( "xb" );
  /[x]b|a/.exec( "xb" );
  /([x]b|a)/.exec( "xb" );
  /([x]b|a)|a/.exec( "xb" );

But this works:
  /^[x]b|a/.exec( "xb" );
  /([x]b)|a/.exec( "xb" );
  /()[x]b|a/.exec( "xb" );
  /x[b]|a/.exec( "xb" );
  /[x]{1}b|a/.exec( "xb" );
  /[x]b|a|a/.exec( "xb" );
This works too:
  /[x]b|[a]/.exec( "xb" );
  /[x]b|a+/.exec( "xb" );
  /[x]b|a{1}/.exec( "xb" );
  /[x]b|(a)/.exec( "xb" );
  /[x]b|()a/.exec( "xb" );
  /[x]b|^a/.exec( "xb" );
Zack: thanks, as always. You've saved us a lot of time!
Cc'ing Brendan; will add regression testcase later today -
Keywords: js1.5
Summary: JSLint fails to recognize string delimiters → JS RegExp failing on certain disjunction + character class patterns
I wish rogerl were around to help.  Phil, do you know of any way to reach him?

/be
Yes, I will try to get in touch with Roger today -
ALTPREREQ2 seems to be broken. It seems to be generated only if the character
class has no quantifier, and seems to fail. But I don't understand the code enough
to say why.

BTW, the simplest testcase is
  /[x]|x/.exec( "x" );
Attached patch patch (effects mostly unknown) (obsolete) — Splinter Review
By comparing with REOP_ALTPREREQ I guess the code for REOP_ALTPREREQ2 was meant

like in the patch. But since REOP_ALTPREREQ2 is almost never generated, nobody
saw the error in testcases.
Requesting 1.6b blocker.
Flags: blocking1.6b?
Attachment #134771 - Attachment is obsolete: true
Simple approach to account for REOP_QUANTs. May or may not work. Untested.
This patch passes the test suite.

Besides fixing the bug it makes the prerequisites useful for character classes,

because they will be created for classes with quantifiers too (unless they have

a minimum of zero), and most real-world character classes have quantifiers.

The benchmark in attachment 99877 [details] of bug 85721 executes about 3% faster, but
there is a penalty at compile time, making attachment 116384 [details] about 3% slower.
Attachment #134786 - Attachment is obsolete: true
Testcase added to JS testsuite:

      mozilla/js/tests/ecma_3/RegExp/regress-224676.js

Currently passing in the Rhino shell; failing in SpiderMonkey on 
Sections 1-6, 20, and 22.
Prerequisites in the current form aren't that useful. They apply only to the
last two alternatives and only if both begin with a flat or character class
(and have no quantifier without my previous patch). Maybe it's better to remove
the code bloat.

A IMHO better implementation would give /each/ alternative a prerequisite
whenever possible to give the opportunity for a short-circuit.

Brendan: Are you still planning to rewrite the regexp code again? If so, then
it's not worth spending much thoughts on such relatively small improvements.
Otherwise I may try to make a better patch.

Has anybody good regexp benchmarks? It would be interesting to compare this
patch with the previous patch. attachment 99877 [details] is about 7% slower with this
patch, so prerequisites aren't completely useless.
Depends on: 225289
Fixed by patch for bug 225289, which was quite similar to this bug's working
extended patch, although developed independently!

/be
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Verified FIXED. 

The above testcase now passes in the SpiderMonkey shell.
Note, it already did pass in the Rhino shell -
Status: RESOLVED → VERIFIED
Whiteboard: [no such issue in Rhino ]
Flags: blocking1.6b?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: