Closed Bug 98306 Opened 23 years ago Closed 23 years ago

JS regexp code crashes in ParseAtom for script using Regexp() [@ParseAtom]

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: jrgmorrison, Assigned: rogerl)

References

()

Details

(Keywords: crash, js1.5)

Crash Data

Attachments

(3 files)

Overview Description: 

JS parser crashes in ParseAtom for script using Regexp()

Steps to Reproduce: 
1) go to 
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/refere
nce/dhtmlrefs.asp
2) click on the item labelled 'DHTML Properties' in the left hand frame
3) go back a page, then forward a page. 

Test case with just a simple html document and the offending JS file
is as http://jrgm/bugs/ParseAtomCrash/foo.html (and foo.js).

  Actual Results: You should crash. If not reload a couple of times.
Expected Results: Bliss.

Reproducibility: 100% after some number of tries.

Build Date & Platform Bug Found: build pulled 8/2 on win2k
 and confirmed in 8/29 win2k comm. verification build.

Additional Information: 

Stack trace:

ParseAtom(CompilerState * 0x00bfeb29) line 919 + 8 bytes
ParseQuantAtom(CompilerState * 0x00bfeb17) line 657 + 13 bytes
ParseItem(CompilerState * 0x032d5ce0) line 634 + 6 bytes
ParseRegExp(CompilerState * 0x030c88c0) line 497
js_NewRegExpObject(JSContext * 0x030c88c0, JSTokenStream * 0x031dab08, unsigned 
short * 0x0316a760, unsigned int 8, unsigned int 0) line 2846 + 15 bytes
js_GetToken(JSContext * 0x00c041cc, JSTokenStream * 0x030c88c0) line 1152 + 27 
bytes
js_MatchToken(JSContext * 0x030c88c0, JSTokenStream * 0x031dab08, int 8) line 
1278 + 13 bytes
ArgumentList(JSContext * 0x00000000, JSTokenStream * 0x00000000, JSTreeContext 
* 0x00001312, JSParseNode * 0x00b7d010) line 2534
0306e010()
030c88c0()
00b7d200()
00cc7c88()
00ce0b48()
... and these weird symbols continue on for 100's of frames
Seems funny that this would crash in that routine (which hasn't changed much
for some time). But, also seems like this is a pretty high profile place to 
crash for web developers (reading MS online docs) so we should probably fix
this sooner as opposed to later (yes, it's ironic; fix the mozilla browser 
so people can read docs on IE, but ...).
Keywords: nsbranch
Testcase added to JS test suite:

         mozilla/js/tests/ecma_3/RegExp/regress-98306.js


If you want to run this directly in the JS shell, independently of
the test driver, do e.g.:

[D:/JS_TRUNK/mozilla/js/src/WINNT4.0_DBG.OBJ] ./js
js> load('D:/JS_TRUNK/mozilla/js/tests/ecma_3/shell.js')
js> load('D:/JS_TRUNK/mozilla/js/tests/ecma_3/RegExp/shell.js')
js> load('D:/JS_TRUNK/mozilla/js/tests/ecma_3/RegExp/regress-98306.js')


As jrgm notes, you don't crash 100% of the time, but pretty nearly ...
The testcase can be reduced to a single line:

                   "Hello".match(/[/]/);


Remember, if you key this directly in the JS shell, you may need to
execute it a few times before the crash occurs. Happens on Linux, too.
OS : Win2K --> All.

As I understand it this is incorrect syntax, because "/" is a special
character and therefore must be escaped in the middle of the RegExp.
Yet this is what the Microsoft site fails to do:

 this.userAgent.match(/Mozilla[/].*(95[/]NT|95|NT|98|3.1).*Opera.*(\d+)\.(\d+)/)

On the other hand, we shouldn't crash on this!
Keywords: crash, js1.5
OS: Windows 2000 → All
By the way, when you execute the one-line testcase and DON'T crash,
this is what you see in the JS shell:

js> "Hello".match(/[/]/);
1: missing ) after argument list:
1: "Hello".match(/[/]/);
1: .................^
Note: in Rhino (both the interpreted and compiled shells), I "never"
crash on either the one-line testcase or the larger testcase, no matter
how many times I execute them. I "always" get the error message instead:

js> "Hello".match(/[/]/);
js: "<stdin>", line 17: uncaught JavaScript exception:
SyntaxError: missing ) after argument list (<stdin>; line 17)
js: "Hello".match(/[/]/);
js: .................^
I just realized, there is an ambibuity in the regular expression /[/]/.
I had been regarding it as: 

                "/" is an element of a character set

But "]" is a special character, so the parser might view the whole thing as:

                "]" is being escaped
'ambibuity' : my best one yet
Note: this alone causes SpiderMonkey to crash!

js> var re = /[/

Whereas Rhino says:

js> var re = /[/
js: uncaught JavaScript exception: SyntaxError: Unterminated character class [
More tests in SpiderMonkey:

Here's the original regexp; takes a few tries before crashing:
js> var re = /[/]/
8: missing ; before statement:
8: var re = /[/]/
8: ............^

js> var re = /[/]/
9: missing ; before statement:
9: var re = /[/]/
9: ............^

js> var re = /[/]/
CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!



Now add a letter in the character set:
js> var re = /[/a]/
1: invalid flag after regular expression:
1: var re = /[/a]/
1: ............^

(STABLE; DOESN'T SEEM TO CRASH)


Now try two backslashes in the character set. Crashes after awhile:
js> var re = /[//]/
16: syntax error:
16: var re = /[//]/
16: .............^
js> var re = /[//]/
17: syntax error:
17: var re = /[//]/
17: .............^
js> var re = /[//]/
CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Compare: Rhino gives the same three error messages as in the entry above,
but remains stable. I can't ever get it to crash on any of these -
So, spidermonkey is not trapping the bogus syntax (although to some degree
this is a parseable syntax if you lookahead to see whether the second '/'
is or isn't the last '/' in the expression. [But I'm not volunteering to 
do that, as that probably is more of a world of hurt to do than I can even
imagine :-]).

And when it botches the parse, that sets the pointer a-walking randomly 
through other people's heap allocations (Purify reports 680 ABR - Array Bounds
Reads for the simple case of `var re = /ab[/]cd/i;'). Here is the first ABR
with details:

[E] ABR: Array bounds read in ParseAtom {1 occurrence}
    Reading 2 bytes from 0x0989a020 (2 bytes at 0x0989a020 illegal)
    Address 0x0989a020 is 1 byte past the end of a 8 byte block at 0x0989a018
    Address 0x0989a020 points to a malloc'd block in heap 0x02d90000
    Thread ID: 0x50c
    Error location
    ParseAtom      [jsregexp.c:910]
        if (!ren)
                return NULL;
        
     =>         while ((c = *++cp) != ']') {
                if (cp == state->cpend) {
                        js_ReportCompileErrorNumber(state->context, 
                                                    state->tokenStream,
                                                    NULL,
    ???            [ip=0x0989a01c]
    ???            [ip=0x099ba028]
    RelExpr        [jsparse.c:2300]
            tc->flags &= ~TCF_IN_FOR_INIT;
        #endif /* JS_HAS_IN_OPERATOR */
        
     =>     pn = ShiftExpr(cx, ts, tc);
            while (pn &&
                   (js_MatchToken(cx, ts, TOK_RELOP)
        #if JS_HAS_IN_OPERATOR
    EqExpr         [jsparse.c:2275]
    BitAndExpr     [jsparse.c:2263]
    BitXorExpr     [jsparse.c:2250]
    BitOrExpr      [jsparse.c:2237]
    AndExpr        [jsparse.c:2226]
    OrExpr         [jsparse.c:2215]
Allocation location
    malloc         [msvcrt.DLL]
    JS_malloc      [jsapi.c:1409]
            if (nbytes == 0)
                nbytes = 1;
            cx->runtime->gcMallocBytes += nbytes;
     =>     p = malloc(nbytes);
            if (!p)
                JS_ReportOutOfMemory(cx);
            return p;
    js_NewRegExpObject [jsregexp.c:2843]
            JSObject *obj;
            JSRegExp *re;
        
     =>     str = js_NewStringCopyN(cx, chars, length, 0);
            if (!str)
            return NULL;
            re = js_NewRegExp(cx, ts,  str, flags, JS_FALSE);
    js_GetToken    [jsscan.c:1152]
    RelExpr        [jsparse.c:2300]
    EqExpr         [jsparse.c:2275]
    BitAndExpr     [jsparse.c:2263]
    BitXorExpr     [jsparse.c:2250]
    BitOrExpr      [jsparse.c:2237]
    AndExpr        [jsparse.c:2226]


(Note: I don't know why the two stack frames preceding ParseAtom are 
" ???   [ip=0x0989a01c]" and " ???   [ip=0x099ba028]". That's what Purify
coughed up.].
Forget my idea at 2001-09-05 13:09 above - I had my slashes mixed up.
Of course it's the BACKslash that escapes...
*** Bug 99377 has been marked as a duplicate of this bug. ***
See the duplicate bug 99377 for a look at jsregexp.c line: 910
I just want to reiterate that this must be fixed for emojo, imho. Way too
common a location for developers (i.e., msdn).
This patch fixes the crash. But I notice that we don't handle [] & [^] per ECMA,
I guess that should be a separate bug.
Status: NEW → ASSIGNED
BTW, Perl accepts the regexp /[/]/ 
Here are a couple examples:


regexp        /[/]/
string        "Hello"
NO MATCH

regexp        /[/]/
string        "Hello/there"
MATCH!!!      /
*** Bug 99667 has been marked as a duplicate of this bug. ***
cc'ing potential reviewers -
r=jst
Bug 100199 has been filed for the [], [^] issue -
Summary: JS parser crashes in ParseAtom for script using Regexp() → JS regexp code crashes in ParseAtom for script using Regexp()
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
On the basis of the JS shell testcase,

         mozilla/js/tests/ecma_3/RegExp/regress-98306.js,

ready to mark VERIFIED. Testcase now passes on WinNT, Linux, and Mac.
But I will wait until tomorrow's builds come out and test the msdn URL
before wrapping this one up -
I'd like to see this on the nsbranch.

(Bob the developer goes to MSDN to check some docs. 
Netscape crashes. "Nice browser! Netscape sucks.").
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I just realized something else: what the patch has done is prevent JS
from crashing on the regular expression /[/]/. But it still produces a 
syntax error:

js> "Hello/there".match(/[/]/);
3: unterminated character class [:
3: "Hello/there".match(/[/]/);
3: ....................^


Contrast this with the Perl example above: 

regexp        /[/]/
string        "Hello/there"
MATCH!!!      /


So - it's good that we don't crash on this, but shouldn't we accept
this regexp as a valid one? Or as John pointed out above, would that 
be asking too much of this fix? Compare what happens in the JS shell
when you use the RegExp constructor instead of a literal RegExp:

js> var re = new RegExp("[/]");
js> "Hello/there".match(re);
/
Which perl is working? This fails for me, with "unmatched [] in regexp".

    print "match: $1, $2\n" 
        if "one/two" =~ /(.*)[/](.*)/;

But, at any rate, let's take the crash fix, and leave the rest of this to 
some other bug. (It borders on ambiguous, and the alternate constructor
syntax allows this re to be written. What the author was thinking when 
they wrote the test in that page was, well, I don't know).

Can I get an nsbranch+ for the crash fix, at a high profile collection of 
pages? (I doubt we can 'evangelize' this one away :-]).
> Which perl is working? 

my $pattern = "(.*)[/](.*)";
print "match: $1, $2\n" 
  if  "one/two" =~ /$pattern/;

OUTPUT
match: one, two


This approach is similar to the RegExp constructor syntax in JS,
in that the pattern string is parsed separately from the // operator.
> So - it's good that we don't crash on this, but shouldn't we accept
> this regexp as a valid one?

I believe I can answer this from a reading of the ECMA-262 Final spec.
It says that "/" IS permitted as a "pattern character" (Chapter 15.10.1)
but is specifically NOT permitted in RegExp literals (Chapter 7.8.5). 

Hence "/" may be provided to the RegExp function or constructor as part
of a pattern, but may not appear (unescaped) as part of a RegExp literal.
Thus, we seem to be acting exactly according to spec.

I will attach the documentation below. A similar example is the following: 
RegExp("") is permitted, but the literal // is not (it's the comment sign).
                                  SUMMARY
 
      "/" allowed in RegExp patterns    "/" allowed unescaped in RegExp literals

Perl 5            YES                                 NO
ECMA-262          YES                                 NO
JS1.5             YES                                 NO
IE4.7             YES                                 YES   


To see the IE4.7 behavior, try this:

javascript: var re = /[/]/;
            var str = "Hello/there";
            var res = str.match(re);  
            alert(res);

OUTPUT:
/
NOTE: SpiderMonkey, Rhino are also handling things like this correctly:

js> var re = /[/a]/
1: invalid flag after regular expression
1: var re = /[/a]/
1: ............^

js> var re = /[/g]/
8: unterminated character class [:
8: var re = /[/g]/
Verification in browser: using trunk builds 20010919xx on WinNT, Linux, Mac9.1
No problem loading the MSDN site, reloading it, going back and forth, etc.
Similarly, no problem with jrgm's foo.html testcase. I never crashed.
Attachment #49232 - Flags: superreview+
Attachment #49232 - Flags: review+
Pulled and built branch engine, ran tests fine.

Low risk fix - the loop is simply re-structured to detect the end condition
earlier, before the increment, no other changes.
My boss 'thesteve' says this is a nsbranch+ nominee.
Keywords: nsbranchnsbranch+
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Fix checked in to branch.
Bug 101070 has been filed on the evangelism issue - 
Verified fixed on 0.9.4 branch. Using Mozilla branch binaries 20010923xx
on WinNT, Mac 9.1, and Linux. I verified the msdn URL given in this bug,
plus the two msdn URLS given in the duped bugs: 

                              bug 99377 
                              bug 99667

Also verified against jrgm's "foo.html" testcase. 
Also verified with trunk builds 20010923xx. 


Marking VERIFIED FIXED -
Status: RESOLVED → VERIFIED
Adding the signature to the summary for talkback tracking. Good job guys!
Summary: JS regexp code crashes in ParseAtom for script using Regexp() → JS regexp code crashes in ParseAtom for script using Regexp() [@ParseAtom]
*** Bug 103559 has been marked as a duplicate of this bug. ***
Flags: testcase+
Crash Signature: [@ParseAtom]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: