Last Comment Bug 280769 - crash while running javascript that has large regex
: crash while running javascript that has large regex
Status: VERIFIED FIXED
[sg:fix][no l10n impact] [needs branc...
: crash, js1.5, verified1.7.13, verified1.8
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86 Windows XP
: P1 critical (vote)
: mozilla1.8beta4
Assigned To: Igor Bukanov
: Bob Clary [:bc:]
Mentors:
Depends on: 308566 310539
Blocks:
  Show dependency treegraph
 
Reported: 2005-02-02 04:47 PST by Alden D'Souza
Modified: 2006-06-02 22:58 PDT (History)
15 users (show)
dveditz: blocking1.7.13+
dveditz: blocking‑aviary1.0.8+
brendan: blocking1.8b5+
bob: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Webpage that crashes firefox (27.16 KB, text/html)
2005-02-02 04:50 PST, Alden D'Souza
no flags Details
'jsshell script' (26.79 KB, text/plain)
2005-02-13 21:19 PST, timeless
no flags Details
Fix: replaing assert(x) by generating JS runtime error if x is false (7.76 KB, patch)
2005-02-28 12:02 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
JS test with controllable complexity (142 bytes, text/plain)
2005-02-28 12:05 PST, Igor Bukanov
no flags Details
Fixing tree_depth assert (12.67 KB, patch)
2005-02-28 12:46 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Test case for re class count overflow (202 bytes, patch)
2005-02-28 15:33 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Fix extension to cover class count overflow (14.11 KB, patch)
2005-02-28 16:13 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Fix extension to cover the assert in AddCharacterToCharSet() (15.74 KB, patch)
2005-02-28 17:22 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Fix for the fix (15.82 KB, patch)
2005-03-01 05:35 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
proposed fix, based on igor's patches but with jump-to-jump extension (49.38 KB, patch)
2005-03-02 17:37 PST, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
Patch update to reflect CVS changes (46.49 KB, patch)
2005-06-27 15:11 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch extension to deal with flat_string_length >= 64K (47.92 KB, patch)
2005-06-27 16:39 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch extension to deal with flat_string_length >= 64K (47.92 KB, patch)
2005-06-27 16:39 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
The previous patch with smaller whitespace differences (50.75 KB, patch)
2005-06-27 16:44 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Variable-length bytecode for flat strings (49.82 KB, patch)
2005-06-28 08:06 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch version against CVS head (49.20 KB, patch)
2005-07-30 11:17 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Fixes to address the above issues (37.88 KB, patch)
2005-08-02 17:23 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split: whitespace only changes (15.00 KB, patch)
2005-08-03 16:37 PDT, Igor Bukanov
mrbkap: review+
brendan: superreview+
brendan: approval1.8b4+
Details | Diff | Splinter Review
Patch split: the real changes (25.84 KB, patch)
2005-08-03 16:47 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Yet another patch split: comments, moves and treeDepth (5.22 KB, patch)
2005-09-10 10:53 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Yet another patch split: comments, moves and treeDepth (5.22 KB, patch)
2005-09-10 10:53 PDT, Igor Bukanov
mrbkap: review+
Details | Diff | Splinter Review
The real fixes (23.98 KB, patch)
2005-09-10 11:06 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
More fix splitting: compact index for flat strings (7.12 KB, patch)
2005-09-10 14:11 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
More splitting 2: overflow checks. (19.24 KB, patch)
2005-09-10 14:33 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Split: eliminating uint16 as indexes and limits check (25.28 KB, patch)
2005-09-11 07:02 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Split: jump overflow checks (10.64 KB, patch)
2005-09-11 07:09 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
split test results (18.84 KB, text/plain)
2005-09-12 19:01 PDT, Bob Clary [:bc:]
no flags Details
compact index for flat strings (version 2) (8.12 KB, patch)
2005-09-13 06:56 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
stacks for patches 1, 2 in comment 84 (29.16 KB, text/plain)
2005-09-13 10:48 PDT, Bob Clary [:bc:]
no flags Details
fix bug pointed out by bc (1.75 KB, patch)
2005-09-13 23:14 PDT, Blake Kaplan (:mrbkap)
brendan: review-
Details | Diff | Splinter Review
Updated split: mostly code movements and treeDepth fixes (5.22 KB, patch)
2005-09-15 06:41 PDT, Igor Bukanov
mrbkap: review+
brendan: superreview+
Details | Diff | Splinter Review
Updated split: tighter memory allocation and assert checks for bytecode (6.22 KB, patch)
2005-09-15 06:42 PDT, Igor Bukanov
brendan: review+
Details | Diff | Splinter Review
Updated split: compact index for flat strings (7.55 KB, patch)
2005-09-15 06:43 PDT, Igor Bukanov
brendan: review+
Details | Diff | Splinter Review
Updated split: compact indexes for all index-like data (25.59 KB, patch)
2005-09-15 06:45 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Updated split: jump overflow checks (10.64 KB, patch)
2005-09-15 06:46 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: mostly code movements and treeDepth fixes (5.22 KB, patch)
2005-09-17 12:02 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: tighter memory allocation and assert checks for bytecode (7.46 KB, patch)
2005-09-17 12:05 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: compact index for flat strings (7.63 KB, patch)
2005-09-17 12:06 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: compact indexes for all index-like data (24.67 KB, patch)
2005-09-17 12:07 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: jump overflow checks (10.64 KB, patch)
2005-09-17 12:09 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Combined patch 2005-09-17 (47.05 KB, patch)
2005-09-17 12:21 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Patch split 2009-09-17: V2 of compact indexes for all index-like data (25.36 KB, patch)
2005-09-17 14:26 PDT, Igor Bukanov
no flags Details | Diff | Splinter Review
Combined patch 2005-09-17 v2 (47.74 KB, patch)
2005-09-17 14:42 PDT, Igor Bukanov
mrbkap: review+
brendan: superreview+
brendan: approval1.8b5+
Details | Diff | Splinter Review
Patch to commit (47.98 KB, patch)
2005-09-29 03:28 PDT, Igor Bukanov
dveditz: approval‑aviary1.0.8-
dveditz: approval1.7.13-
Details | Diff | Splinter Review
backported to the 1.7/aviary101 branch (49.05 KB, patch)
2006-02-15 14:52 PST, Daniel Veditz [:dveditz]
mrbkap: review+
dveditz: approval‑aviary1.0.8+
dveditz: approval1.7.13+
Details | Diff | Splinter Review

Description Alden D'Souza 2005-02-02 04:47:54 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

If the Number of Strings compared in the regular expression, ie:

Newington|Abbeville|Abbotsford| .....

is more than a certain number, firefox will crash.

On my machine that number happens to be exactly 2729. Comparing
2730 values will crash Firefox everytime.

The attachment page provided, contains a regex comparing 2729 values,
the last entry being 'x'. changing the last entry to 'xa' or will crash
firefox.



Reproducible: Always

Steps to Reproduce:
1. Open Attachment Webpage in Firefox.
2. Copy the Address at the bottom of the page and paste it into the TextArea.
3. Click the "Check" Button.
4. If firefox does not Crash, Try editing the script and increase the number
of elements compared in the Regular Expression.
5. The attachment page provided, contains a regex comparing 2729 values,
the last entry being 'x'. changing the last entry to 'xa' crashes firefox
on my machine.
Actual Results:  
Crashed

Expected Results:  
should alert the user if the string is matched. (works fine in browsers)
Comment 1 Alden D'Souza 2005-02-02 04:50:11 PST
Created attachment 173162 [details]
Webpage that crashes firefox
Comment 2 Varun 2005-02-13 18:57:58 PST
good way to test if your quality feedback agent is working fine.
yes. confirmed. I built firefox and installed it (i also built the installer
which was missing the talkback.xpi). so i copied the missing files to
c:\programming files\mozilla firefox\components\ and tried this bug. It crashed
my ff and confirmed that my QFA was working fine.

(In reply to comment #0)
> User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0
> Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0
> 
> If the Number of Strings compared in the regular expression, ie:
> 
> Newington|Abbeville|Abbotsford| .....
> 
> is more than a certain number, firefox will crash.
> 
> On my machine that number happens to be exactly 2729. Comparing
> 2730 values will crash Firefox everytime.
> 
> The attachment page provided, contains a regex comparing 2729 values,
> the last entry being 'x'. changing the last entry to 'xa' or will crash
> firefox.
> 
> 
> 
> Reproducible: Always
> 
> Steps to Reproduce:
> 1. Open Attachment Webpage in Firefox.
> 2. Copy the Address at the bottom of the page and paste it into the TextArea.
> 3. Click the "Check" Button.
> 4. If firefox does not Crash, Try editing the script and increase the number
> of elements compared in the Regular Expression.
> 5. The attachment page provided, contains a regex comparing 2729 values,
> the last entry being 'x'. changing the last entry to 'xa' crashes firefox
> on my machine.
> Actual Results:  
> Crashed
> 
> Expected Results:  
> should alert the user if the string is matched. (works fine in browsers)

Comment 3 timeless 2005-02-13 21:19:14 PST
Created attachment 174273 [details]
'jsshell script'

js> load('/tmp/0.htm')
Assertion failure: ((diff) >= -32768) && ((diff) <= 32767), at jsregexp.c:1448

Program received signal SIGABRT, Aborted.
0x900429ac in kill ()
(gdb) where
#0  0x900429ac in kill ()
#1  0x9009eb1c in abort ()
#2  0x00007548 in JS_Assert (s=0xf1450 "((diff) >= -32768) && ((diff) <=
32767)", file=0xf13ac "jsregexp.c", ln=1448) at jsutil.c:155
#3  0x000d491c in EmitREBytecode (state=0xbfffd000, re=0x28a000,
treeDepth=2731, pc=0x292023 "", t=0x90f778) at jsregexp.c:1448
#4  0x000d58dc in js_NewRegExp (cx=0x3001c0, ts=0x817210, str=0x805070,
flags=1, flat=0) at jsregexp.c:1674
#5  0x000dd66c in js_NewRegExpObject (cx=0x3001c0, ts=0x817210, chars=0x26b018,
length=27068, flags=1) at jsregexp.c:3786
#6  0x000a7474 in js_GetToken (cx=0x3001c0, ts=0x817210) at jsscan.c:1851
#7  0x000af554 in UnaryExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2705
#8  0x000aefe4 in MulExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2611
#9  0x000aeec8 in AddExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2593
#10 0x000aedec in ShiftExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2578
#11 0x000aec98 in RelExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2546
#12 0x000aeb94 in EqExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2522
#13 0x000aead8 in BitAndExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2510
#14 0x000aea1c in BitXorExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2497
#15 0x000ae960 in BitOrExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2484
#16 0x000ae8ac in AndExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2473
#17 0x000ae7f8 in OrExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2462
#18 0x000ae620 in CondExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2422
#19 0x000ae374 in AssignExpr (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2357
#20 0x000ae030 in Variables (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:2301
#21 0x000acd84 in Statement (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:1908
#22 0x000aa2bc in Statements (cx=0x3001c0, ts=0x817210, tc=0xbfffdeb0) at
jsparse.c:1016
#23 0x000a917c in FunctionBody (cx=0x3001c0, ts=0x817210, fun=0x809b80,
tc=0xbfffdeb0) at jsparse.c:651
#24 0x000a9ee8 in FunctionDef (cx=0x3001c0, ts=0x817210, tc=0xbfffe210,
lambda=0) at jsparse.c:910
#25 0x000aa12c in FunctionStmt (cx=0x3001c0, ts=0x817210, tc=0xbfffe210) at
jsparse.c:984
#26 0x000ab1fc in Statement (cx=0x3001c0, ts=0x817210, tc=0xbfffe210) at
jsparse.c:1294
#27 0x000aa2bc in Statements (cx=0x3001c0, ts=0x817210, tc=0xbfffe210) at
jsparse.c:1016
#28 0x000a8a0c in js_CompileTokenStream (cx=0x3001c0, chain=0x804ba0,
ts=0x817210, cg=0xbfffe210) at jsparse.c:464
#29 0x0001263c in CompileTokenStream (cx=0x3001c0, obj=0x804ba0, ts=0x817210,
tempMark=0x300208, eofp=0x0) at jsapi.c:3219
#30 0x00012cd0 in JS_CompileFile (cx=0x3001c0, obj=0x804ba0, filename=0x303470
"/tmp/0.htm") at jsapi.c:3366
#31 0x0000380c in Load (cx=0x3001c0, obj=0x804ba0, argc=1, argv=0x818424,
rval=0xbfffe450) at js.c:682
#32 0x0006af68 in js_Invoke (cx=0x3001c0, argc=1, flags=0) at jsinterp.c:1293
#33 0x0007c078 in js_Interpret (cx=0x3001c0, pc=0x303427 ":",
result=0xbfffeac0) at jsinterp.c:3563
#34 0x0006baac in js_Execute (cx=0x3001c0, chain=0x804ba0, script=0x3033f0,
down=0x0, flags=0, result=0xbfffebdc) at jsinterp.c:1523
#35 0x00013898 in JS_ExecuteScript (cx=0x3001c0, obj=0x804ba0, script=0x3033f0,
rval=0xbfffebdc) at jsapi.c:3630
#36 0x000028cc in Process (cx=0x3001c0, obj=0x804ba0, filename=0x0) at js.c:385

#37 0x000032d8 in ProcessArgs (cx=0x3001c0, obj=0x804ba0, argv=0xbffffdd0,
argc=0) at js.c:573
#38 0x000074bc in main (argc=0, argv=0xbffffdd0, envp=0xbffffdd4) at js.c:2441
(gdb) up 3
#3  0x000d491c in EmitREBytecode (state=0xbfffd000, re=0x28a000,
treeDepth=2731, pc=0x292023 "", t=0x90f778) at jsregexp.c:1448
1448		    CHECK_OFFSET(diff);
(gdb) l
1443		    op = t->op;
1444		    continue;
1445
1446		case REOP_ENDALT:
1447		    diff = pc - emitStateSP->nextTermFixup;
1448		    CHECK_OFFSET(diff);
1449		    SET_OFFSET(emitStateSP->nextTermFixup, diff);
1450		    if (t->op != REOP_ALT) {
1451			diff = pc - emitStateSP->endTermFixup;
1452			CHECK_OFFSET(diff);
Comment 4 Brendan Eich [:brendan] 2005-02-14 21:08:33 PST
Sigh.  Another bug I wish rogerl were around to fix.

/be
Comment 5 Hermann Schwab 2005-02-25 15:09:00 PST
(In reply to comment #2)
> good way to test if your quality feedback agent is working fine.
No
> yes. confirmed. I built firefox and installed it (i also built the installer
> which was missing the talkback.xpi). 
fine
> so i copied the missing files to
> c:\programming files\mozilla firefox\components\ and tried this bug. It crashed
> my ff and confirmed that my QFA was working fine.

Your QFA isn´t working fine, as it acts like an agent seriously having fever.
Being able to talk, but nothing giving sense.
Talkback is packaged to the build, so installing talkback from another build is
easy, because it is trusted xpi, but it won´t give useful, but wrong info. 

But maybe I´m wrong, that was the info I was told somewhere in a bug.
Comment 6 Bob Clary [:bc:] 2005-02-26 15:56:33 PST
js/tests/Regress/regress-280769.js checked in. 
Comment 7 Igor Bukanov 2005-02-28 12:02:45 PST
Created attachment 175838 [details] [diff] [review]
Fix: replaing assert(x) by generating JS runtime error if x is false

The reason for the bug is that CHECK_OFFSET(diff) defined
(JS_ASSERT(((diff) >= -32768) && ((diff) <= 32767)))
is called for diff that depends on the regexp complexity. With sufficiently
complex RegExp the condition is violated. 

To fix this the patch replaces sequences like 
	    diff = pc - emitStateSP->nextTermFixup;
	    CHECK_OFFSET(diff);
	    SET_OFFSET(emitStateSP->nextTermFixup, diff);

by a call to:
+	     if (!SetForwardJumpOffset(emitStateSP->nextTermFixup, pc))
+		 goto too_big_jump;

where SetForwardJumpOffset returns false if the offset is too big to fit the
bytecode limit. The patch also changes the definition of offset from int16 to
unit16 since all the usages of offset was for forward jumping so it can not be
negative.
Comment 8 Igor Bukanov 2005-02-28 12:05:22 PST
Created attachment 175840 [details]
JS test with controllable complexity

The test case assemble a RegExp instance at runtime to allow for simple
manipulations of the complexity.
Comment 9 Igor Bukanov 2005-02-28 12:14:58 PST
Results with different N from the above script:

N = 5000
js regexp_test.js
<EXPECTED EMPTY OUTPUT>

N = 10000
js regexp_test.js
InternalError: too complex regular expression <EXPECTED>

N= 100*1000
js regexp_test.js
Assertion failure: (emitStateSP - emitStateStack) <= treeDepth, at jsregexp.c:1476
Aborted

That points to JS_ASSERT((emitStateSP - emitStateStack) <= treeDepth)

It looks like another input-dependent assert ...
Comment 10 Igor Bukanov 2005-02-28 12:46:38 PST
Created attachment 175844 [details] [diff] [review]
Fixing tree_depth assert

The reason for the above assert for tree_depth is that it was declared as uin16
and its incrementing was never checked against overflow. The patch changes that
by adding an explicit overflow checks to report about too complex regexp if
necessary. 

The patch also changes the type of tree_depth from uin16 to size_t since
tree_depth is never embedded into any bytecode so it is limited only by maximum
size of EmitStateStackEntry array that system can theoretically malloc.
Comment 11 Igor Bukanov 2005-02-28 15:25:57 PST
The last patch is not enough to fix all regexp overflows. There is a similar bad
problem with CompilerState.classCount which is declared uint16 with no overflow
checks. AFAICS it can be used to cause controllable heap corruption like writing
past allocated array etc. The following test case demonstrates it.
Comment 12 Igor Bukanov 2005-02-28 15:33:13 PST
Created attachment 175864 [details] [diff] [review]
Test case for re class count overflow

If I run the test case in the shell as is (with or without the above patch) I
got:

Assertion failure: c <= cs->length, at jsregexp.c:1943
Aborted

If I change N in th test case from 0xFFFF to 100*1000, I got:
Segmentation fault
Comment 13 Igor Bukanov 2005-02-28 16:13:13 PST
Created attachment 175870 [details] [diff] [review]
Fix extension to cover class count overflow

That patch adds checks for class count overflow. The maximum class limit is set
to 2**15, not 2**16, as the later would require to add a few changes to
ExecuteREBytecode which currently uses signed numbers for class array indexes.

This fixes segfault case of the test case from the attachment 175864 [details] [diff] [review]. So what
is left is fixing the assert problem there.
Comment 14 Bob Clary [:bc:] 2005-02-28 17:00:52 PST
checked in js/tests/js1_5/Regress/regress-280769-1.js for Igor's testcase in
comment 12.
Comment 15 Igor Bukanov 2005-02-28 17:09:52 PST
(In reply to comment #14)
> checked in js/tests/js1_5/Regress/regress-280769-1.js for Igor's testcase in
> comment 12.

Could you make the test to consist of 3 sections with N set to 20000, 0xFFFF and
100000 to have a good coverage. The version with N=100*1000 would not expose the
assert in AddCharacterToCharSet.
Comment 16 Igor Bukanov 2005-02-28 17:22:12 PST
Created attachment 175877 [details] [diff] [review]
Fix extension to cover the assert in AddCharacterToCharSet()

The reason for that assert failure is overflow via RENode.u.startIndex and
RENode.u.kidlen. They represent indexes into the source array and clearly can
exceed 64K. 

This is exactly what happens in the last test case with N=20000 or so. In this
case the string exceeds 64K, startIndex wraps and points to the character ']'
of some previous char range.

The patch changes type for startIndex and kidlen to size_t. It increases the
size of RENode, but avoids adding code for overflow checks since with
startIndex and kidlen defined as pointer difference which always fits size_t.
Comment 17 Bob Clary [:bc:] 2005-02-28 17:47:22 PST
(In reply to comment #15)

Done.

Checking in regress-280769-1.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-280769-1.js,v  <-- 
regress-280769-1.js
new revision: 1.2; previous revision: 1.1
Comment 18 Igor Bukanov 2005-02-28 17:50:22 PST
(In reply to comment #17)
> (In reply to comment #15)
> 
> Done.

It the test the lines:

function crash(N)
{
  var N = 100*1000;

should be simpe:
function crash(N)
{
Comment 19 Igor Bukanov 2005-02-28 18:04:14 PST
(In reply to comment #17)
> (In reply to comment #15)
> 
> Done.
> 

One more thing. The version of the original testcase in the attachment from
comment 8 contains in fact 2 independent tests:

If you set N  == 10000, then it is more or less the original assert
If you set N  == 100000, then it crashes due to different bug. So it is
necessary either to add this 2 cases to regress-280769-1.js or have a separated
regress-280769-2.js to cover N == 10000, 0xFFFF and 100000.
Comment 20 Bob Clary [:bc:] 2005-03-01 01:13:18 PST
(In reply to comment #19)

How about if I create separate files each which calls crash() with:

regress-280769-1.js: N = 10000
regress-280769-2.js: N = 20000
regress-280769-3.js: N = 0xFFFF
regress-280769-4.js: N = 100000

will that do the trick?

Comment 21 Igor Bukanov 2005-03-01 03:24:22 PST
(In reply to comment #20)
> How about if I create separate files each which calls crash() with:
> 

Here are the reasons why I would prefer to have many test cases per test file
rather then multiple test files.

1. When I run the test suite against Rhino each test file means JVM restart. It
takes time and flushes away dynamically compiled bytecode. For a few stress
tests with 100K regexp it is significant delay especially under older JVM like
JDK 1.1.8 or MS JVM.

2. The tests for the bug are stress tests and running them together in one
runtime provides additional assurances if runtime can survive all of them.

3. It is trivial to modify the suite to add more cases of N.

But these are just from my point of view and if one test per file is more
convenient for you then I have no objections what so ever.
Comment 22 Igor Bukanov 2005-03-01 05:14:09 PST
Here is a summary of overflow and their relations to test cases to avoid confusion.

1. Overflow over 32K boundary in regexp bytecode jump offset. This is the
original bug report and is covered by js1_5/Regress/regress-280769.js. It is
also covered by the JS code from the attachment 175840 [details] which I reproduce here
with comments:

var N = 10 * 1000;
var a = new Array(N);
for (var i = 0; i != N; ++i) {
    a[i] = i;
}
var str = a.join('|');  // str is 0|1|2|3|...|<printed value of N -1>
var re = new RegExp(str);
re.exec(N - 1);

2. Overflow over 64K boundary in the treeDepth. It can be tested through the
above script with N changed to 100*1000. There is NO test cases in test suite to
cover this.

3. Overflow over 64K boundary of [] offset from the beginning of string. It
corresponds to N=20000 and N=0xFFFF cases currently present in
js1_5/Regress/regress-280769-1.js. But perhaps it is better to have explicit
testing against it in the form of:

var N = 100 * 1000; // N should be more then 64K
var a = new Array(N + 1);
var prefix = a.join("z"); // prefix is sequence of N "z", zzz...zzz
var str = prefix+"[AB]"; // str is zzz...zzz[A] 
var re = new RegExp(str);
re.exec(prefix+"A");

4. Overflow over 64K boundary in number of classes in regexp. This is tested by
the case with N=100*1000 in js1_5/Regress/regress-280769-1.js.

5. Overflow over 64K length of char sequence inside []. There is no currently
test case for this particular bug. AFAICS it can not be used to crash, but is
visible through wrong exec which would not see the tail of []. Here is a test
case which construct [] to match all chars with even code between 0 and 2N:

var N = 20 * 1000; // It should be that N*6  > 64K and N < 32K

var prefixes = ["000", "00", "0"];

function to_4_hex(i)
{
    var printed = i.toString(16);
    if (printed.length < 4) {
        printed= prefixes[printed.length - 1]+printed;
    }
    return printed;

}

var a = new Array(N);
for (var i = 0; i != N; ++i) {
    a[i] = to_4_hex(2*i);
}

var str = '[\\u'+a.join('\\u')+']'; 
// str is [\u0000\u0002\u0004...\u<printed value of 2N>]

var re = new RegExp(str);

var string_to_match = String.fromCharCode(2 * (N - 1));

var value = re.exec(string_to_match);

var expected =  string_to_match;
var actual = value[0];

if (expected === actual) {
    print("GOOD");
} else {
    print("BAD");
}
Comment 23 Igor Bukanov 2005-03-01 05:35:08 PST
Created attachment 175936 [details] [diff] [review]
Fix for the fix

In the previous patch I forgot to change CompilerState.classCache[].length from
uint16 to size_t to match RENode.u.ucclass.kidlen changes.
Comment 24 Igor Bukanov 2005-03-01 06:42:27 PST
There are few other more places with 64K overflow:

1. Flat strings: they use uint16 for offset and length. The overflow would not
trigger crash but would produce incorrect results during matching.

2. Overflow in REGlobalData.stateStackTop, stateStackLimit. It can be used to
write past allocated array.

3. Overflow over RECapture.length

It seems that the simplest way to fix this is to throw an internal error as long
as regexp string or regexp bytecode exceeds 64K. Is it OK?

Comment 25 Bob Clary [:bc:] 2005-03-01 09:11:04 PST
per Igor's comments, the testcases have been separated and checked in as:

regress-280769.js - Do not crash on overflow of 32K boundary in regexp bytecode
jump offset

regress-280769-1.js - Do not crash on overflow of 64K boundary of [] offset in
regexp search string

regress-280769-2.js - Do not overflow 64K boundary in treeDepth

regress-280769-3.js - Do not crash on overflow of 64K boundary in number of
classes in regexp

regress-280769-4.js - Do not overflow 64K length of char sequence in RegExp [] 
Comment 26 Brendan Eich [:brendan] 2005-03-02 17:37:00 PST
Created attachment 176080 [details] [diff] [review]
proposed fix, based on igor's patches but with jump-to-jump extension

Ok, here's big fun.  Thanks to igor's patches, this fixes up a bunch of
unchecked or arbitrarily half-sized limits.  But I go further, and when a jump
will overflow its two-byte unsigned immediate offset operand, I find a later
alternate's jump to jump to (and so on, spanning megabytes of regexp bytecode
with 64K jumps if necessary).

Comments?  Please remind me if I missed some remaining bogus limit, or left a
dangling comment.

/be
Comment 27 Brendan Eich [:brendan] 2005-03-02 17:42:13 PST
Passes the testsuite, AFAICT -- Bob, what are the known failures for the trunk?

/be
Comment 28 Igor Bukanov 2005-03-03 04:13:31 PST
(In reply to comment #26)
> 
> Please remind me if I missed some remaining bogus limit, or left a
> dangling comment.

The patch does not fix 64K overflow on string offset.

Here is the test case that tests it:

var N = 100 * 1000;

var prefix = new Array(N).join("a"); // prefix is sequence of N "a"

var suffix = "111";

var re = new RegExp(prefix+"0?"+suffix);  // re is /aaa...aaa0?111/

var str_to_match = prefix+suffix;  // str_to_match is "aaa...aaa111"

var index = str_to_match.search(re);

print(0 == index ? "GOOD" : "BAD");

Currently it prints "BAD" since the offset of suffix is encoded only in 2 bytes.
Note that this test also checks if there is an overflow on 64K boundary of
string length, but that is OK now.

I guess one way to fix this is to to REOP_FLATi?_BIG that would use 4 bytes for
string offset and use them instead of REOP_FLATi? as long as the size of the
input string exceeds 64K.

Also the patch tries to support jumps over 64K of bytecode for the "|" case
while in the rest of jump cases it simply causes JS to report too complex regexp
error. This is OK, but if the goal is to match MSIE complexity limits and
extending the jump offset to 3 or even 4 bytes is not an option, then there are
alternative ways to encode jump offset that I believe would take less code then
the current jump to jump workaround for "|".
Comment 29 Igor Bukanov 2005-03-03 05:10:27 PST
(In reply to comment #26)
> Please remind me if I missed some remaining bogus limit

There is one more problem with overflow in ReallocStateStack(REGlobalData *gData):

    uint16 limit = gData->stateStackLimit;
    size_t sz = sizeof(REProgState) * limit;
    
Thats "uint16 limit" should be "size_t limit". 

Plus there is a theoretical problem in the second line: what if
sizeof(REProgState) * limit exceeds SIZE_T_MAX? This is theoretical here since
it would require here for JS_ARENA_GROW_CAST to succeed to allocate 2GB on 32bit
platform previously...

Comment 30 Bob Clary [:bc:] 2005-03-03 07:05:31 PST
The most current results I have are in <http://bclary.com/2004/10/03/results/>.
I will post new ones as soon as I rebuild everything. ETA an hour or so.
Comment 31 Igor Bukanov 2005-03-03 07:16:08 PST
(In reply to comment #26)
> But I go further, and when a jump
> will overflow its two-byte unsigned immediate offset operand, I find a later
> alternate's jump to jump to (and so on, spanning megabytes of regexp bytecode
> with 64K jumps if necessary).

Ths can be quite slow. Consider 

var N = 500 * 1000;
var str = new Array(N).join("a|")+"a"; // str is a|a|a|...|a|a
new RegExp(str);

It took 5 minutes on 1.2 GHz box spending allmost all of the time in re bytecode
generation. Fortunately it is the upper time limit since increasing N would hit
TREE_DEPTH_MAX almost instantly and report internal error, but the script can
repeat new RegExp(str); to get denial of service.


Comment 32 Igor Bukanov 2005-03-03 08:49:10 PST
(In reply to comment #30)
> The most current results I have are in <http://bclary.com/2004/10/03/results/>.
> I will post new ones as soon as I rebuild everything. ETA an hour or so.

Bob, would you add the test code code from comment 28 to the suite as well?
Comment 33 Bob Clary [:bc:] 2005-03-03 09:11:32 PST
testcase from comment 28 added as js/tests/js1_5/Regress/regress-280769-5.js. 
FF trunk from 3/1, had three different results running in the browser: 1) failed
with index != 0; 2) passed with index == 0; 3) crashed. SM trunk opt build from
this morning crashes consistently.
Comment 34 Igor Bukanov 2005-03-03 10:16:55 PST
One more idea to handle 64K overflows.

Such overflows can only happen if string length exceeds (64K -
1)/MAX_POSSIBLE_BYTECODE_SIZE_PER_CHAR. With the current code
MAX_POSSIBLE_BYTECODE_SIZE_PER_CHAR is 13 so only regexp build from strings
longer then 5041 can cause troubles. So the idea is to always use 4 bytes per
arg/offset for such strings and adjust GET_OFFSET etc. macros to be sensitive to
the size flag. 
Comment 35 Brendan Eich [:brendan] 2005-03-03 11:51:36 PST
Comment on attachment 176080 [details] [diff] [review]
proposed fix, based on igor's patches but with jump-to-jump extension

Looking harder at the code, 4-byte jump offsets should not be too horrible. 
I'll take a run at it based on this patch with the altHead/jumpToJumpFlag data
members and related logic removed.

Plus fix the long string probs.  13 bytes per char matched, yeesh!

/be
Comment 36 Chris 2005-06-05 09:35:11 PDT
cannot reproduce any testcase on Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
rv:1.8b2) Gecko/20050604 Firefox/1.0+ ID:2005060406

windows 2000sp4
Comment 37 Bob Clary [:bc:] 2005-06-09 23:15:36 PDT
this patch does fix js1_5/Regress/regress-280769.js,
js1_5/Regress/regress-280769-1.js, js1_5/Regress/regress-280769-2.js,
js1_5/Regress/regress-280769-3.js, js1_5/Regress/regress-280769-4.js leaving
only js1_5/Regress/regress-280769-5.js to fail.
Comment 38 Asa Dotzler [:asa] 2005-06-21 11:46:09 PDT
brendan, if this has to happen for b3, then it needs to land in the next few
days. If not, can you move it out to b4?
Comment 39 Brendan Eich [:brendan] 2005-06-21 12:44:09 PDT
This must get fixed, but I don't have time for b3.

/be
Comment 40 Igor Bukanov 2005-06-27 15:11:56 PDT
Created attachment 187442 [details] [diff] [review]
Patch update to reflect CVS changes
Comment 41 Igor Bukanov 2005-06-27 16:33:09 PDT
(In reply to comment #40)
> Created an attachment (id=187442) [edit]
> Patch update to reflect CVS changes
> 
To be precise, this the version of the last Brendan's patch adopted for CVS head.
Comment 42 Igor Bukanov 2005-06-27 16:39:37 PDT
Created attachment 187453 [details] [diff] [review]
Patch extension to deal with flat_string_length >= 64K

The patch extension adds REOP_FLAT32 and REOP_FLAT32i to deal with cases when
string offset or length in the patten exceeds 64K.
Comment 43 Igor Bukanov 2005-06-27 16:39:45 PDT
Created attachment 187454 [details] [diff] [review]
Patch extension to deal with flat_string_length >= 64K

The patch extension adds REOP_FLAT32 and REOP_FLAT32i to deal with cases when
string offset or length in the pattern exceeds 64K.
Comment 44 Igor Bukanov 2005-06-27 16:44:43 PDT
Created attachment 187455 [details] [diff] [review]
The previous patch with smaller whitespace differences
Comment 45 Igor Bukanov 2005-06-28 08:06:34 PDT
Created attachment 187508 [details] [diff] [review]
Variable-length bytecode for flat strings

Compared with the previous version the patch uses variable-length bytecode to
encode flat substring offset and length. For shorter substrings with
length/offset < 128 it would even make bytecode more compact.
Comment 46 Blake Kaplan (:mrbkap) 2005-07-29 14:21:36 PDT
I think I understand what's going on here. I'm not quite sure enough to review
quite yet. Igor, is there anything else that needs fixing here, or is your last
patch enough to get this bug FIXED? If so, I'll review it as soon as I can so we
can try to get it in before the branch.
Comment 47 Igor Bukanov 2005-07-29 16:16:09 PDT
(In reply to comment #46)
> I think I understand what's going on here. I'm not quite sure enough to review
> quite yet. Igor, is there anything else that needs fixing here, or is your last
> patch enough to get this bug FIXED? If so, I'll review it as soon as I can so we
> can try to get it in before the branch.

The patch should fix all known issues with regexps indeed. I was hoping to add
some cleanups to the patch before asking for review, but it would take time
which is hard to find this days for me for spidermonkey.
Comment 48 Bob Clary [:bc:] 2005-07-29 17:48:28 PDT
attachment 187508 [details] [diff] [review] didn't apply cleanly to the trunk. I have a hacked up version
I  can attach unless Igor wants to attach a new version?
Comment 49 Igor Bukanov 2005-07-30 11:17:43 PDT
Created attachment 191066 [details] [diff] [review]
Patch version against CVS head

Updating patch from attachment 187508 [details] [diff] [review] to reflect CVS HEAD changes
Comment 50 Blake Kaplan (:mrbkap) 2005-08-02 15:14:33 PDT
Comment on attachment 191066 [details] [diff] [review]
Patch version against CVS head

There is some whitespace cleanup here that leaves the file in a weirder
whitespace state than it was before. Are you going to clean that up before
checkin? If not, it'd probably be better to leave that for another bug.

>         }
>-        else
>+        else {
>             /* ALT, <next>, ..., JUMP, <end> ... ENDALT */

Move the else up next to the }

>+            /*
>+             * If there's no stacked open parenthesis, we accept the close
>+             * as a flat ')' to match.
>              */

This comment is no longer true. Unmatched ) are errors.

>+                operatorStack = (REOpData *)
>+                    JS_realloc(state->context, operatorStack,
>                                            sizeof(REOpData) * operatorStackSize);

This whitespace fix is half-finished (the sizeof line should move left). There
were a couple earlier in the patch too.

>+        state.progLength += 1 + getCompactIndexWidth(0)
>+                            + getCompactIndexWidth(state.cpend - state.cpbegin);

I admit, I don't quite understand this change, could you explain it?

Other than that and the whitespace nits, I think this looks good, so r=me.
Bob Clary was running this through the JS test suite and had some results. He
should probably post them here.
Comment 51 Igor Bukanov 2005-08-02 16:39:37 PDT
(In reply to comment #50)
> (From update of attachment 191066 [details] [diff] [review] [edit])
> There is some whitespace cleanup here that leaves the file in a weirder
> whitespace state than it was before. Are you going to clean that up before
> checkin? If not, it'd probably be better to leave that for another bug.
> 
> >         }
> >-        else
> >+        else {
> >             /* ALT, <next>, ..., JUMP, <end> ... ENDALT */
> 
> Move the else up next to the }

But then the else if abive has to be rerganized as well which would produce too
much whitespace noise in the patch.
> 
> >+            /*
> >+             * If there's no stacked open parenthesis, we accept the close
> >+             * as a flat ')' to match.
> >              */
> 
> This comment is no longer true. Unmatched ) are errors.
> 
> >+                operatorStack = (REOpData *)
> >+                    JS_realloc(state->context, operatorStack,
> >                                            sizeof(REOpData) *
operatorStackSize);
> 
> This whitespace fix is half-finished (the sizeof line should move left). There
> were a couple earlier in the patch too.

I will fix this.

> 
> >+        state.progLength += 1 + getCompactIndexWidth(0)
> >+                            + getCompactIndexWidth(state.cpend - state.cpbegin);
> 
> I admit, I don't quite understand this change, could you explain it?

The patch changes the encoding of flat strings from fixed 5 bytes (opcode + 2
bytes for string offset + 2 bytes for string length) to corresponding variable
length encoding. In the above case getCompactIndexWidth(0) stands for encoding
of the offset. I will adds comments about that.
> 
> Other than that and the whitespace nits, I think this looks good, so r=me.

Unfortunately I was not careful with coalessing of single char nodes into
REOP_FLAT string. The problem is that it may increase bytecode size with
variable size encoding. The single char bytecode takes 3 bytes. The coalessed
bytecode would become then 
1 + getCompactIndexWidth(offset_of_2_chars) +
getCompactIndexWidth(length_of_2_chars_string) 
  == 2 + getCompactIndexWidth(offset_of_2_chars) 
That exceeds the original allocation of 6 == 2 * 3 bytes when
getCompactIndexWidth(offset_of_2_chars) > 4 or when offset_of_2_chars >= 2^28.
Although I have doubts that on typical hardware one could parse with
SpiderMonkey regexp with 256MB chars, still it is a bug.
Comment 52 Igor Bukanov 2005-08-02 17:23:06 PDT
Created attachment 191429 [details] [diff] [review]
Fixes to address the above issues
Comment 53 Bob Clary [:bc:] 2005-08-02 20:03:40 PDT
(In reply to comment #50)

> 
> Bob Clary was running this through the JS test suite and had some results. He
> should probably post them here.

I finally got the browser based version of the js test library to run all of the
tests by invoking the browser for each test. I see different behavior with
regard to return codes/termination for the following tests:

ecma_3/RegExp/octal-001.js
js1_2/regexp/octal.js
js1_2/regexp/special_characters.js

I am confused on how to interpret my test results, as I could not reproduce the
same behavior when running these tests by hand either via the menu or by
invoking the test url directly. Blake, Igor: if you can investigate these
particular testcases in more detail while I try to figure out what my tests are
doing that would be very cool.
Comment 54 Igor Bukanov 2005-08-03 00:22:46 PDT
(In reply to comment #53)
> I finally got the browser based version of the js test library to run all of the
> tests by invoking the browser for each test. I see different behavior with
> regard to return codes/termination for the following tests:
> 
> ecma_3/RegExp/octal-001.js
> js1_2/regexp/octal.js
> js1_2/regexp/special_characters.js
> 

Hm, js shell gives no regression there:
Test results, smdebug

   Test    List:    ecma_3/RegExp/octal-001.js,    js1_2/regexp/octal.js,
   js1_2/regexp/special_characters.js
   Skip List: (none)
   3 test(s) selected, 3 test(s) completed, 0 failures reported (0% failed)

What were the test reports?
Comment 55 Bob Clary [:bc:] 2005-08-03 00:35:36 PDT
(In reply to comment #54)
> 
> What were the test reports?

I also did not find any problems in the shell versions of the tests, nor when
running them individually in the browser. The test set up is a new one which
creates a new profile, installs my Spider xpi, then launches the browser to
execute a single test via a perl script that watches the process to determine if
it exits normally, crashes or has to be killed after a specified timeout. This
was done for each of the 1200+ tests in the ecma* and js1_* suites. The test
results weren't consistent between runs: in the first run they crashed (or so
the report said), in the second run they timed out. Unfortunately in the first
run, I had Windows debugger disabled so was not able to capture any stacks. I
will try to pin down the situation a bit more and will retest with the latest
patch and get back to you. I mentioned it in the bug since the behavior was
anomalous and I thought you should know. 

Comment 56 Blake Kaplan (:mrbkap) 2005-08-03 14:27:30 PDT
Comment on attachment 191429 [details] [diff] [review]
Fixes to address the above issues

>+            if (t->kid
>+                && getCompactIndexWidth((jschar *)t->kid - state->cpbegin)
>+                     <= 4)

Move the && up onto the "if(t->kid" line. That should allow the "<= 4" to fit
nicely on the getCompactIndexWidth line.

r=me
Comment 57 Bob Clary [:bc:] 2005-08-03 14:37:39 PDT
Got a stack on ecma_3/RegExp/octal-001.js and a very similar one in
js1_2/regexp/octal.js

-	cap	0x02a3e9b4
	index	CXX0030: Error: expression cannot be evaluated
	length	CXX0030: Error: expression cannot be evaluated
	i	0x0012e910
	len	0x0012e8e0
+	parenContent	0x10212788 "??????????"
	parenIndex	0x0000ffff
+	&x->parens[parenIndex]	0x02a3e9b4


BackrefMatcher(REGlobalData * 0x0012ea10, REMatchState * 0x029be9b8, unsigned
int 0x0000ffff) line 2086 + 3 bytes
SimpleMatch(REGlobalData * 0x0012ea10, REMatchState * 0x029be9b8, int
0x0000000d, unsigned char * * 0x0012e970, int 0x00000001) line 2474 + 17 bytes
ExecuteREBytecode(REGlobalData * 0x0012ea10, REMatchState * 0x029be9b8) line
2599 + 23 bytes
MatchRegExp(REGlobalData * 0x0012ea10, REMatchState * 0x029be9b8) line 3072 + 13
bytes
js_ExecuteRegExp(JSContext * 0x028969a0, JSRegExp * 0x029a2800, JSString *
0x0298a508, unsigned int * 0x0012ead4, int 0x00000000, long * 0x0012ebf4) line
3171 + 13 bytes
match_or_replace(JSContext * 0x028969a0, JSObject * 0x0298a7f8, unsigned int
0x00000001, long * 0x029ba04c, int (JSContext *, long, GlobData *)* 0x100b4550
match_glob(JSContext *, long, GlobData *), GlobData * 0x0012eb1c, long *
0x0012ebf4) line 1206 + 29 bytes
str_match(JSContext * 0x028969a0, JSObject * 0x0298a7f8, unsigned int
0x00000001, long * 0x029ba04c, long * 0x0012ebf4) line 1260 + 34 bytes
js_Invoke(JSContext * 0x028969a0, unsigned int 0x00000001, unsigned int
0x00000000) line 1173 + 23 bytes
js_Interpret(JSContext * 0x028969a0, unsigned char * 0x029b4c12, long *
0x0012f5fc) line 3463 + 15 bytes
js_Execute(JSContext * 0x028969a0, JSObject * 0x028dcb38, JSScript * 0x029b4af0,
JSStackFrame * 0x00000000, unsigned int 0x00000000, long * 0x0012f704) line 1403
+ 19 bytes
JS_EvaluateUCScriptForPrincipals(JSContext * 0x028969a0, JSObject * 0x028dcb38,
JSPrincipals * 0x02911fdc, const unsigned short * 0x029b6c88, unsigned int
0x00000f61, const char * 0x029ab2d8, unsigned int 0x00000001, long * 0x0012f704)
line 3854 + 25 bytes
nsJSContext::EvaluateString(const nsAString_internal & {...}, void * 0x028dcb38,
nsIPrincipal * 0x02911fd8, const char * 0x029ab2d8, unsigned int 0x00000001,
const char * 0x100da83c _js_default_str, nsAString_internal * 0x00000000, int *
0x0012f768) line 1060 + 67 bytes
nsScriptLoader::EvaluateScript(nsScriptLoadRequest * 0x0299d0a8, const nsString
& {...}) line 757
nsScriptLoader::ProcessRequest(nsScriptLoadRequest * 0x0299d0a8) line 658 + 22 bytes
nsScriptLoader::OnStreamComplete(nsScriptLoader * const 0x02910ab4,
nsIStreamLoader * 0x029a20d0, nsISupports * 0x0299d0a8, unsigned int 0x00000000,
unsigned int 0x00000f61, const unsigned char * 0x02995010) line 1020
nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x029a20d4, nsIRequest *
0x029a79e8, nsISupports * 0x0299d0a8, unsigned int 0x00000000) line 137
nsStreamListenerTee::OnStopRequest(nsStreamListenerTee * const 0x029ab050,
nsIRequest * 0x029a79e8, nsISupports * 0x0299d0a8, unsigned int 0x00000000) line 66
nsHttpChannel::OnStopRequest(nsHttpChannel * const 0x029a79f0, nsIRequest *
0x029a2a18, nsISupports * 0x00000000, unsigned int 0x00000000) line 4058
nsInputStreamPump::OnStateStop() line 507
nsInputStreamPump::OnInputStreamReady(nsInputStreamPump * const 0x029a2a1c,
nsIAsyncInputStream * 0x0299fda0) line 343 + 11 bytes
nsInputStreamReadyEvent::EventHandler(PLEvent * 0x029a2b0c) line 120
PL_HandleEvent(PLEvent * 0x029a2b0c) line 688 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00ae9418) line 623 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x11a400e2, unsigned int 0x0000c0e8, unsigned int
0x00000000, long 0x00ae9418) line 1408 + 9 bytes
USER32! 77d48734()
USER32! 77d48816()
USER32! 77d489cd()
USER32! 77d48a10()
nsAppShell::Run(nsAppShell * const 0x00b61250) line 135
nsAppStartup::Run(nsAppStartup * const 0x00b611b0) line 145 + 26 bytes
XRE_main(int 0x00000005, char * * 0x003f7eb8, const nsXREAppData * 0x0042201c
kAppData) line 2322 + 35 bytes
main(int 0x00000005, char * * 0x003f7eb8) line 61 + 18 bytes
mainCRTStartup() line 338 + 17 bytes
Comment 58 Igor Bukanov 2005-08-03 16:37:04 PDT
Created attachment 191540 [details] [diff] [review]
Patch split: whitespace only changes

Given Bob's findings I decided to split patch into whitespace-only part and the
real changes. So here comes whitespace part.

Note that I do not have CVS commit access, so when approved this has to be
committed by somebody else.
Comment 59 Igor Bukanov 2005-08-03 16:47:45 PDT
Created attachment 191543 [details] [diff] [review]
Patch split: the real changes

If time permits I will split this patch to 2-3 more parts to locate the crash
trigger.
Comment 60 Blake Kaplan (:mrbkap) 2005-08-03 19:19:56 PDT
Comment on attachment 191540 [details] [diff] [review]
Patch split: whitespace only changes

>     RECapture parens[1];      /* first of 're->parenCount' captures,
>-                               * allocated at end of this struct.
>-                               */
>+                                 allocated at end of this struct */

Prevailing style for these sorts of comments (at least in this file) would
dictate a * in front of "allocated" and pushing the */ onto the next line.

>+    JSArenaPool     pool;           /* It's faster to use one malloc'd pool
>+                                       than to malloc/free the three items
>+                                       that are allocated from this pool */

Same here.

>+                 * If there's no stacked open parenthesis, throw syntax error.

throw "a" syntax error.

>+             * If there's no stacked open parenthesis, throw syntax error.

Same.

Other than those nits, r=me. I won't be able to check this in until this coming
Sunday, so if you want to make a new patch so someone else can check this in,
feel free. Otherwise, I'll just pick the nits when I check it in.
Comment 61 Igor Bukanov 2005-08-04 01:47:34 PDT
(In reply to comment #60)
> Prevailing style for these sorts of comments (at least in this file) would
> dictate a * in front of "allocated" and pushing the */ onto the next line.

Those changes were in Brendan's patch and I simply kept them.
Comment 62 Brendan Eich [:brendan] 2005-08-19 11:07:31 PDT
Igor is the man here, I'm just holding the bug and ready to review and check in.
 Blake (mrbkap) is even more involved in that than I am, so maybe he should take
it.  Igor, if you will have time in the next four days to figure out what caused
the crash regression, please let us know -- if not, say so too so we can help do
that work.

This should be fixed for 1.8 if at all possible, and it certainly seems possible
to fix within a week or ten days.  Anyone disagree?

/be
Comment 63 Igor Bukanov 2005-08-19 11:58:33 PDT
Comment on attachment 191540 [details] [diff] [review]
Patch split: whitespace only changes

Asking for aproval of the prepatch with ONLY code-style changes.
Comment 64 Igor Bukanov 2005-08-19 12:10:06 PDT
(In reply to comment #62)
> Igor, if you will have time in the next four days to figure out what caused
> the crash regression, please let us know -- if not, say so too so we can help 
> do that work.

I did not see the crash with the test suite running against js shell. Thus I
will try to check the patch one more time and perhaps split it into few more
parts so the regressions can be located more easly.
Comment 65 Brendan Eich [:brendan] 2005-08-19 12:23:27 PDT
Comment on attachment 191540 [details] [diff] [review]
Patch split: whitespace only changes

Sure -- this should go into the trunk and the MOZILLA_1_8_BRANCH, for sanity's
sake.

/be
Comment 66 Igor Bukanov 2005-08-23 09:22:49 PDT
(In reply to comment #65)
> (From update of attachment 191540 [details] [diff] [review] [edit])
> Sure -- this should go into the trunk and the MOZILLA_1_8_BRANCH, for sanity's
> sake.
> 
> /be
> 

So could somebody commit this?
Comment 67 Igor Bukanov 2005-08-23 09:25:19 PDT
(In reply to comment #64)
> (In reply to comment #62)
> > Igor, if you will have time in the next four days to figure out what caused
> > the crash regression, please let us know -- if not, say so too so we can help 
> > do that work.
> 
> I did not see the crash with the test suite running against js shell. Thus I
> will try to check the patch one more time and perhaps split it into few more
> parts so the regressions can be located more easly.
> 

Unfortunately I have no time to look at it until the next weekend :(

Comment 68 Blake Kaplan (:mrbkap) 2005-08-23 09:52:17 PDT
(In reply to comment #66)
> So could somebody commit this?

Whitespace changes checked in to trunk and MOZILLA_1_8_BRANCH. Sorry for taking
so long to check this in.
Comment 69 Blake Kaplan (:mrbkap) 2005-08-25 11:53:42 PDT
This isn't quite fixed yet. Only the whitespace changes were checked in. I've
volunteered to try to drive this in.
Comment 70 Asa Dotzler [:asa] 2005-08-29 15:05:49 PDT
still interested as soon as this is available but if we're all ready to go with
everything else, this might have to slip to beta2
Comment 71 Igor Bukanov 2005-09-10 10:53:35 PDT
Created attachment 195555 [details] [diff] [review]
Yet another patch split: comments, moves and treeDepth

Here is another patch split to help to locate the crash bug in the original
version. This is the part that contains comments and code movements/renames
from the attachment 191543 [details] [diff] [review] plus making treeDepth to be size_t instead of
uint16. The later is the only essential fix from the older version.
Comment 72 Igor Bukanov 2005-09-10 10:53:43 PDT
Created attachment 195556 [details] [diff] [review]
Yet another patch split: comments, moves and treeDepth

Here is another patch split to help to locate the crash bug in the original
version. This is the part that contains comments and code movements/renames
from the attachment 191543 [details] [diff] [review] plus making treeDepth to be size_t instead of
uint16. The later is the only essential fix from the older version.
Comment 73 Igor Bukanov 2005-09-10 11:06:11 PDT
Created attachment 195557 [details] [diff] [review]
The real fixes

Here is the rest of chnages from the attachment 191543 [details] [diff] [review] with real fixes for
overflows and that bug somewhere causing crashes.
Comment 74 Igor Bukanov 2005-09-10 11:11:27 PDT
Comment on attachment 195556 [details] [diff] [review]
Yet another patch split: comments, moves and treeDepth

Consider this for commit. I wish I had time to locate the bug but at least the
patch split can help with that.
Comment 75 Blake Kaplan (:mrbkap) 2005-09-10 12:40:05 PDT
(In reply to comment #57)
> +	parenContent	0x10212788 "??????????"
> 	parenIndex	0x0000ffff
> +	&x->parens[parenIndex]	0x02a3e9b4

This looks like Bob might have been hitting bug 307317. If that's the case, this
patch might have not been at fault all along. I'm not sure I can explain the
parenIndex, though. That does seem to be bogus, though I haven't studied the
testcase in question.
Comment 76 Igor Bukanov 2005-09-10 14:11:22 PDT
Created attachment 195576 [details] [diff] [review]
More fix splitting: compact index for flat strings


I decided to split the "real fixes" into 2 more parts. Here comes the compact
index changes to use variable-length encoding for flat string length and
offset.
Comment 77 Igor Bukanov 2005-09-10 14:33:41 PDT
Created attachment 195580 [details] [diff] [review]
More splitting 2: overflow checks.

The remaining overflow checks.
Comment 78 Igor Bukanov 2005-09-10 14:39:54 PDT
(In reply to comment #75)
> I'm not sure I can explain the
> parenIndex, though. That does seem to be bogus, though I haven't studied the
> testcase in question.

Could you post one more time the problematic patch lines?

Here is the structure od the lasted patch split:

1. Attachment 195556 [details] [diff] -- mostly code movements and treeDepth fixes
2. Attachment 195576 [details] [diff] -- compact index fixes
3. Attachment 195580 [details] [diff] -- checks for overflow

I think if there are problems, then given the crash description the case 2.
Comment 79 Igor Bukanov 2005-09-11 07:02:09 PDT
Created attachment 195628 [details] [diff] [review]
Split: eliminating uint16 as indexes and limits check

The split contains all changes regarding replacing uint16 by size_t/uintN from
the the attachment 195580 [details] [diff] [review]. It addition it extends the usage of compact indexes
for cases like parenthesis count or min/max quantifiers. In this way more
compact bytecode is generated for typical cases of small numbers.

Another change compared with attachment 195580 [details] [diff] [review] is 2^24 bytes limit on the size
of memory that can be allocated for class bitmaps. It replaces the previous
limit of 64K classes.
Comment 80 Igor Bukanov 2005-09-11 07:09:48 PDT
Created attachment 195629 [details] [diff] [review]
Split: jump overflow checks

This is the final part of the split that adds jump overflow checks for bytecode
and special jump-to-jump workaround to support case1|case2|...|caseN regexp
when otherwise 64K jump limit would report error.

This is the patch to fix the original bug report and in principle can be
applied independently if those compact index changes are too drastic.
Comment 81 Igor Bukanov 2005-09-11 07:14:58 PDT
New summary of patches:

1. Attachment 195556 [details] [diff] -- mostly code movements and treeDepth fixes
2. Attachment 195576 [details] [diff] -- compact index for flat strings
3. Attachment 195628 [details] [diff] -- compact indexes for all index-like data
4. Attachment 195629 [details] [diff] -- jump overflow checks

If cases 2-3 would be buggy, then the case 4 can be applied independently (after
some patch edits).
Comment 82 Bob Clary [:bc:] 2005-09-12 19:01:27 PDT
Created attachment 195826 [details]
split test results

(In reply to comment #81)

I ran a series of tests on winxsp consisting of the tests (ecma*, js*, e4x), in
the shell, in the browser, and each test run individually in the browser with
Firefox 1.5 and Firefox 1.6 opt and debug builds first beginning with a cvs
pull without any patch from Saturday and after each of the 4 patches.

The shell based tests showed _no regressions_.

Note that js1_5/Regress/regress-280769-3.js was not fixed in either the shell
or browser tests.

ecma_3/RegExp/octal-001.js, js1_2/regexp/octal.js,
js1_2/regexp/special_characters.js began crashing in the browser based tests
with patch 2.

There were random, non reproducible crashes especially with the trunk. Full
result summary is attached. I do not have stacks for these yet.
Comment 83 Igor Bukanov 2005-09-13 06:56:09 PDT
Created attachment 195865 [details] [diff] [review]
compact index for flat strings (version 2)

This is a new version of the patch from attachment 195576 [details] [diff] [review]. It adds realloc to
shrink memory for bytecode to the final value. In this way if there are buffer
overwrites, they would more likely touch malloc internal structs causing faster
crashes.
Comment 84 Igor Bukanov 2005-09-13 07:07:33 PDT
(In reply to comment #82)
> ecma_3/RegExp/octal-001.js, js1_2/regexp/octal.js,
> js1_2/regexp/special_characters.js began crashing in the browser based tests
> with patch 2.
> 
> There were random, non reproducible crashes especially with the trunk.

I do not see how the changes may introduce this :(. Assuming there is a bug that
escape shell detection and visible only through browser, it is hard to see how
the changes may affect the other parts of the browser since even the amount of
allocated memory in the given test cases remains the same. 

To try to expose the bug I updated the second patch from the list with code to
shrink allocated space for regexp to get earlier heap overrun expose. So here is
new summary list:

1. Attachment 195556 [details] [diff] -- mostly code movements and treeDepth fixes
2. Attachment 195865 [details] [diff] -- compact index for flat strings and bytecode shrink
3. Attachment 195628 [details] [diff] -- compact indexes for all index-like data
4. Attachment 195629 [details] [diff] -- jump overflow checks

Is it feasible to apply 1-2 and watch again for the crashes? And is it possible
to run teastcases on Linux with MALLOC_CHECK_ environment var (see man malloc)
set to 2?

Comment 85 Blake Kaplan (:mrbkap) 2005-09-13 10:20:52 PDT
Comment on attachment 195556 [details] [diff] [review]
Yet another patch split: comments, moves and treeDepth

r=mrbkap
Comment 86 Bob Clary [:bc:] 2005-09-13 10:48:20 PDT
Created attachment 195884 [details]
stacks for patches 1, 2 in comment 84

(In reply to comment #84)

I tested with a clean checkout and a clobber build for trunk on winxpsp2.

Here are the stacks for 

ecma_3/Function/regress-131964.js (probably not related, but I'll include it
for completeness sake)

ecma_3/RegExp/octal-001.js
js1_2/regexp/octal.js
js1_2/regexp/special-characters.js
js1_5/Regress/regress-280769.js
js1_5/Regress/regress-[123].js

Igor, if you need me to poke around the stacks in the debugger, lets schedule a
time where we can get together on irc.
Comment 87 Blake Kaplan (:mrbkap) 2005-09-13 23:14:05 PDT
Created attachment 196012 [details] [diff] [review]
fix bug pointed out by bc

It looks like this old bug (which bc pointed out to me on IRC) is being
expounded by the loop in ReadCompactIndex). We need to protect against
parenCount = 0.
Comment 88 Bob Clary [:bc:] 2005-09-14 07:16:37 PDT
+    resultsize = sizeof (REMatchState);
     ^^^^^^^^^^
+    if (re->parenCount)
+        resultSize += (re->parenCount - 1) * sizeof(RECapture);
         ^^^^^^^^^^

Anyone else notice that when trying to edit the attachment you could not delete
or cut text in yesterdays FF 1.4 build?
Comment 89 Brendan Eich [:brendan] 2005-09-14 14:38:45 PDT
Comment on attachment 196012 [details] [diff] [review]
fix bug pointed out by bc

Just curious why this is necessary, given -1 is 0xffffffff on 32-bit systems,
and multiplying times sizeof(RECapture) is probably 0xfffffff8, and adding that
to the size being computed just subtracts 8 from the total size -- and that's
ok because there's a 1-element RECapture array in REMatchState already.

What am I missing?

/be
Comment 90 Blake Kaplan (:mrbkap) 2005-09-14 15:12:40 PDT
(In reply to comment #89)
> What am I missing?

Nothing, I suppose. I just don't like relying on integer overflow (in addition
and multiplication!) to allocate memory. This makes it explicit, and is
guaranteed to allocate the correct amount of memory, as opposed to hoping that
everything turns out correctly.
Comment 91 Brendan Eich [:brendan] 2005-09-14 15:20:05 PDT
Comment on attachment 196012 [details] [diff] [review]
fix bug pointed out by bc

But there's no bug, right?  Whatever size_t, whether 32- or 64-bit, it's
unsigned, and unsigned modular arithmetic will do the right thing.

This patch will overallocate trivially in the 0-parens case.

I just don't see the point, given other real bugs to fix in jsregexp.c.  Sell
me more?

/be
Comment 92 Igor Bukanov 2005-09-15 06:41:30 PDT
Created attachment 196159 [details] [diff] [review]
Updated split: mostly code movements and treeDepth fixes

This is an updated version of attachment 195556 [details] [diff] [review] to refelect changes in CVS with
the fix for bug 308566.
Comment 93 Igor Bukanov 2005-09-15 06:42:57 PDT
Created attachment 196160 [details] [diff] [review]
Updated split: tighter memory allocation and assert checks for bytecode

This is yet another patch subsplit to add realloc to shrink memory for
allocation and adds assert checks during bytecode read to ensure its
consistency.

The patch also changes problematic-looking code pointed above:

     JS_ARENA_ALLOCATE_CAST(result, REMatchState *,
			    &gData->pool,
-			    sizeof(REMatchState)
-				+ (re->parenCount - 1) * sizeof(RECapture));
+			    offsetof(REMatchState, parens)
+				+ re->parenCount * sizeof(RECapture));

which states intentions clearly.
Comment 94 Igor Bukanov 2005-09-15 06:43:53 PDT
Created attachment 196161 [details] [diff] [review]
Updated split: compact index for flat strings

This is an update of the attachment 195865 [details] [diff] [review] to take into account the above
subsplit.
Comment 95 Igor Bukanov 2005-09-15 06:45:21 PDT
Created attachment 196162 [details] [diff] [review]
Updated split: compact indexes for all index-like data

This is an update of attachment 195628 [details] [diff] [review].
Comment 96 Igor Bukanov 2005-09-15 06:46:06 PDT
Created attachment 196163 [details] [diff] [review]
Updated split: jump overflow checks

This is an update of attachment 195629 [details] [diff] [review].
Comment 97 Igor Bukanov 2005-09-15 06:50:14 PDT
New summary for patches:

Attachment 196159 [details] [diff] -- mostly code movements and treeDepth fixes
Attachment 196160 [details] [diff] -- tighter memory allocation and assert checks for bytecode
Attachment 196161 [details] [diff] -- compact index for flat strings
Attachment 196162 [details] [diff] -- compact indexes for all index-like data
Attachment 196163 [details] [diff] -- jump overflow checks
Comment 98 Blake Kaplan (:mrbkap) 2005-09-15 11:01:29 PDT
Comment on attachment 196159 [details] [diff] [review]
Updated split: mostly code movements and treeDepth fixes

r=mrbkap
Comment 99 Brendan Eich [:brendan] 2005-09-16 17:22:28 PDT
Comment on attachment 196159 [details] [diff] [review]
Updated split: mostly code movements and treeDepth fixes

sr=me, this is good to go on the trunk -- mrbkap, can you land it ASAP?

/be
Comment 100 Brendan Eich [:brendan] 2005-09-16 17:35:43 PDT
Comment on attachment 196160 [details] [diff] [review]
Updated split: tighter memory allocation and assert checks for bytecode

>@@ -1756,7 +1756,19 @@ js_NewRegExp(JSContext *cx, JSTokenStrea
>         goto out;
>     }
>     *endPC++ = REOP_END;
>-    JS_ASSERT(endPC <= (re->program + (state.progLength + 1)));
>+    /*
>+     * Check if size was overestimated and shrink using realloc.

Nit: use "whether" instead of "if".

>@@ -2010,7 +2019,18 @@ ProcessCharSet(REGlobalData *gData, RECh
>     intN nDigits, i;
> 
>     JS_ASSERT(!charSet->converted);
>+    /*
>+     * Assert that startIndex and length points to chars inside [] inside
>+     * source string.
>+     */
>+    JS_ASSERT(1 <= charSet->u.src.startIndex
>+              && charSet->u.src.startIndex <= JSSTRING_LENGTH(gData->regexp->source));

Nit: put && at the end of the first line to match prevailing style.

Also, can't we assert <, not <=, in the second line?

>+    JS_ASSERT(charSet->u.src.length <= JSSTRING_LENGTH(gData->regexp->source)
>+                                           - charSet->u.src.startIndex - 1);

Style nit only: underhang so that - lines up with the J.  Or something.  The
above is just indented four more, but indentation usually means something to-do
with block structure, case or goto labels, etc.  Terms in an expression
generally start in the same column, or the overflow line starts with an
operator underhanging the previous line's term.  Rarely, you see the operator
pulled back so that terms line up -- it's all good, but the above just seems
overindented to me.

>+        JS_ASSERT(1 <= length
>+                  && length <= JSSTRING_LENGTH(gData->regexp->source) - offset);

Nit again about && at end of first line.

>+        JS_ASSERT(1 <= length
>+                  && length <= JSSTRING_LENGTH(gData->regexp->source) - offset);

Ibid.

>@@ -2666,6 +2697,8 @@ ExecuteREBytecode(REGlobalData *gData, R
>                 pc += ARG_LEN;
>                 curState->u.quantifier.max = GET_ARG(pc);
>                 pc += ARG_LEN;
>+                JS_ASSERT(curState->u.quantifier.min
>+                          <= curState->u.quantifier.max);

Here's a nice underhang -- could use this style for that earlier assertion,
might be more readable.

>@@ -2969,8 +3004,8 @@ InitMatch(JSContext *cx, REGlobalData *g
> 
>     JS_ARENA_ALLOCATE_CAST(result, REMatchState *,
>                            &gData->pool,
>-                           sizeof(REMatchState)
>-                               + (re->parenCount - 1) * sizeof(RECapture));
>+                           offsetof(REMatchState, parens)
>+                               + re->parenCount * sizeof(RECapture));

Here's another case that overindents the underhang, but I'm really picking nits
here -- forgive me.

Looks great otherwise, on to the next patch!  Thanks again, Igor.

Mrbkap: perhaps we should land all at once, to ease branch patch providing (in
the bug) and applying?

/be
Comment 101 Brendan Eich [:brendan] 2005-09-16 18:36:52 PDT
Comment on attachment 196161 [details] [diff] [review]
Updated split: compact index for flat strings

>+static size_t
>+GetCompactIndexWidth(size_t index)
>+{
>+   size_t width = 1;
>+   while ((index >>= 7) != 0) {

One-time nit, applies to several places in the patch: prevailing style puts an
empty line between declarations and statements in function bodies.

>@@ -1541,9 +1581,22 @@ EmitREBytecode(CompilerState *state, JSR
> 
>         case REOP_FLAT:
>             /*
>-             * Consecutize FLAT's if possible.
>+             * Coalesce FLATs if possible and if it would not increase bytecode
>+             * size. The later can only happen when
>+             * GetCompactIndexWidth(offset_of_2_chars)

This would fit and be less ragged-right if offset_of_2_chars was shorter --
just a thought.  That name does not appear in the code, so you might just use
offset and explain what it is separately.

>+                                                       exceeds 4, i.e. when
>+             * initial coalescing replaces 2*3==6 bytes preallocated for 2
>+             * single char nodes by
>+             * 1 + GetCompactIndexWidth(offset_of_2_chars)
>+             *  + GetCompactIndexWidth(length_of_2_chars_string)

Nit: indent that + to underhang the one on the line before :-).

>+             * or 2 + GetCompactIndexWidth(offset_of_2_chars)
>+             * and that exceeds 6.
>+             * Since when GetCompactIndexWidth(offset_of_2_chars) <= 4

Comma at end of phrase here.

>+             * coalescing of 3 or more nodes strictly decreases bytecode
>+             * size, that check has to be done only for the first coalescing.

Not sure whether it's "this check" or "that check" -- probably "this".	English
grammar/usage nuts please help me out.

>              */
>-            if (t->kid) {
>+            if (t->kid &&
>+                GetCompactIndexWidth((jschar *)t->kid - state->cpbegin) <= 4)
>+            {
>                 while (t->next &&
>                        t->next->op == REOP_FLAT &&
>                        (jschar*)t->kid + t->u.flat.length ==
>@@ -1552,17 +1605,14 @@ EmitREBytecode(CompilerState *state, JSR
>                     t->next = t->next->next;
>                 }
>             }
>-            if (t->kid && (t->u.flat.length > 1)) {
>-                if (state->flags & JSREG_FOLD)
>-                    pc[-1] = REOP_FLATi;
>-                else
>-                    pc[-1] = REOP_FLAT;
>-                SET_ARG(pc, (jschar *)t->kid - state->cpbegin);
>-                pc += ARG_LEN;
>-                SET_ARG(pc, t->u.flat.length);
>-                pc += ARG_LEN;
>+            if (t->kid && t->u.flat.length > 1) {
>+                pc[-1] = (state->flags & JSREG_FOLD)
>+                         ? REOP_FLATi : REOP_FLAT;

Nit: change this to

		pc[-1] = (state->flags & JSREG_FOLD) ? REOP_FLATi : REOP_FLAT;
0123456789012345678901234567890123456789012345678901234567890123456789012345678
9

It fits on one line without entering forbidden column 80.

>+                pc = WriteCompactIndex(pc, (jschar *)t->kid - state->cpbegin);
>+                pc = WriteCompactIndex(pc, t->u.flat.length);
>             } else if (t->u.flat.chr < 256) {
>-                pc[-1] = (state->flags & JSREG_FOLD) ? REOP_FLAT1i : REOP_FLAT1;
>+                pc[-1] = (state->flags & JSREG_FOLD)
>+                         ? REOP_FLAT1i : REOP_FLAT1;

This one squeaks by in my book!

		pc[-1] = (state->flags & JSREG_FOLD) ? REOP_FLAT1i :
REOP_FLAT1;

0123456789012345678901234567890123456789012345678901234567890123456789012345678
9

Who cares about Emacs' losing (fixed now?) tendency to wrap if a newline is the
character that exceeds column 80 by falling into column 81?

>@@ -1726,7 +1776,9 @@ js_NewRegExp(JSContext *cx, JSTokenStrea
>         state.result->u.flat.chr = *state.cpbegin;
>         state.result->u.flat.length = JSSTRING_LENGTH(str);
>         state.result->kid = (void *) state.cpbegin;
>-        state.progLength += 5;
>+        /* Flat bytecode: REOP_FLAT compact(string_offset) compact(length). */
>+        state.progLength += 1 + GetCompactIndexWidth(0)
>+                            + GetCompactIndexWidth(state.cpend - state.cpbegin);

Perhaps cleaner to underhang + here.

Great work, I'm reduced to minor comments.  Thanks again.

/be
Comment 102 Brendan Eich [:brendan] 2005-09-16 18:38:21 PDT
Comment on attachment 196160 [details] [diff] [review]
Updated split: tighter memory allocation and assert checks for bytecode

Forgot to mark r+ with assertion < vs. <= question addressed, and nits picked
as much as you can stand.

/be
Comment 103 Igor Bukanov 2005-09-17 12:02:43 PDT
Created attachment 196440 [details] [diff] [review]
Patch split 2009-09-17: mostly code movements and treeDepth fixes

This is an update of attachment 196159 [details] [diff] [review] to reflect commit for bug 306727.
Comment 104 Igor Bukanov 2005-09-17 12:05:01 PDT
Created attachment 196441 [details] [diff] [review]
Patch split 2009-09-17: tighter memory allocation and assert checks for bytecode

This is an update for attachment 196160 [details] [diff] [review] to reflect comment #100 including nits
and using < in class bytecode consistency checks in:

+    JS_ASSERT(charSet->u.src.startIndex
+	       < JSSTRING_LENGTH(gData->regexp->source));

In addition this version of the patch replaces JS_ASSERT(a && b) by 2 separated
asserts to get more resize printouts in case of bugs and get better looking
indentation as a bonus.
Comment 105 Igor Bukanov 2005-09-17 12:06:38 PDT
Created attachment 196442 [details] [diff] [review]
Patch split 2009-09-17: compact index for flat strings

This is an update for attachment 196161 [details] [diff] [review] to reflect comment #101 and add
documemtation about compact index bytecode.
Comment 106 Igor Bukanov 2005-09-17 12:07:53 PDT
Created attachment 196444 [details] [diff] [review]
Patch split 2009-09-17: compact indexes for all index-like data

This is an update for attachment 196162 [details] [diff] [review] to make patch to apply cleanly.
Comment 107 Igor Bukanov 2005-09-17 12:09:10 PDT
Created attachment 196446 [details] [diff] [review]
Patch split 2009-09-17: jump overflow checks

This is an update for attachment 196163 [details] [diff] [review] to make patch to apply cleanly.
Comment 108 Igor Bukanov 2005-09-17 12:21:26 PDT
Created attachment 196449 [details] [diff] [review]
Combined patch 2005-09-17

New summary of patches:

Attachment 196440 [details] [diff] -- mostly code movements and treeDepth fixes
Attachment 196441 [details] [diff] -- tighter memory allocation and assert checks for bytecode
Attachment 196442 [details] [diff] -- compact index for flat strings
Attachment 196444 [details] [diff] -- compact indexes for all index-like data
Attachment 196445 [details] -- jump overflow checks
This attachment -- the combined patch against CVS HEAD from 2005-09-17
Comment 109 Igor Bukanov 2005-09-17 14:26:42 PDT
Created attachment 196458 [details] [diff] [review]
Patch split 2009-09-17: V2 of compact indexes for all index-like data

In the attachment 196444 [details] [diff] [review] I forgot to include jsregexp.h with uint16->size_t
changes.
Comment 110 Igor Bukanov 2005-09-17 14:42:22 PDT
Created attachment 196460 [details] [diff] [review]
Combined patch 2005-09-17 v2

New summary of patches:

Attachment 196440 [details] [diff] [edit] -- mostly code movements and treeDepth fixes
Attachment 196441 [details] [diff] [edit] -- tighter memory allocation and assert checks for
bytecode
Attachment 196442 [details] [diff] [edit] -- compact index for flat strings
Attachment 196458 [details] [diff] [edit] -- compact indexes for all index-like data
Attachment 196445 [details] [edit] -- jump overflow checks
This attachment -- the combined patch against CVS HEAD from 2005-09-17 that
allows to pass all 6 js1_5/Regress/regress-280769* tests.
Comment 111 Asa Dotzler [:asa] 2005-09-21 11:53:43 PDT
Have we fixed enough of the actual crash problem here to avoid taking further
changes (and risk) ?
Comment 112 Blake Kaplan (:mrbkap) 2005-09-21 12:21:35 PDT
(In reply to comment #111)
> Have we fixed enough of the actual crash problem here to avoid taking further
> changes (and risk) ?

No, none of the actual crash fixes have made it in, yet. I'll try to get this in
either today or tomorrow.
Comment 113 Blake Kaplan (:mrbkap) 2005-09-26 18:47:19 PDT
Comment on attachment 196460 [details] [diff] [review]
Combined patch 2005-09-17 v2

>Index: src/jsregexp.c
>@@ -1264,7 +1374,17 @@ doSimple:
>+        n = (state->result->u.ucclass.bmsize >> 3) + 1;

This needs a comment about why you're doing |>> 3| here.

r=mrbkap
Comment 114 Brendan Eich [:brendan] 2005-09-28 07:32:28 PDT
Comment on attachment 196460 [details] [diff] [review]
Combined patch 2005-09-17 v2

I'm sr'ing but wanted to raise one small design issue, and point out a few more
nits to pick.

>+/*
>+ * Functions to get size and write/read bytecode that represent small indexes
>+ * compactly.

The small design issue concerns whether a single-bit frequency encoding where
bit 7 clear means bits 6..0 are the index, else bits 6..0 are high order 7 of
23 bits, with the rest coming from the following two bytes?  Avoids loops, does
just one bit-test, simplifies later analysis under REOP_FLAT coalescing code
that worries about compact width > 4.

If we really want to handle the full 32-bit size_t domain, we could use 5 bytes
with bits 7 and 6 of the first byte set, and bits 1..0 containing bits 31 and
30 of the size_t.

If size_t is 64 bits, I claim we do not need to support more than 32 (or 23)
bits.

>+ * Each byte in the code represent 7-bit chunk of the index. 8th bit when set
>+ * indicates that the following byte brings more bits to the index. Otherwise
>+ * this is the last byte in the index bytecode representing highest index bits.
>+ */
>+static size_t
>+GetCompactIndexWidth(size_t index)
>+{
>+    size_t width;
>+
>+    for (width = 1; (index >>= 7) != 0; ++width) { }

Nit: House style puts continue on next line for empty-body loops.

>+            /*
>+             * QUANT, <min>, <max>, <next> ... <ENDCHILD>
>+             * where <max> is written as compact(max+1) to make
>+             * (uintN)-1 sentinel to occupy 1 byte, not sizeof(max)+1.

s/sizeof/widthof/ ?

>         case REOP_FLAT:
>             /*
>-             * Consecutize FLAT's if possible.
>+             * Coalesce FLATs if possible and if it would not increase bytecode
>+             * beyond preallocated limit. The later happens only when bytecode

s/later/latter/

Thanks for keeping this patch alive.  It has been too long since I hacked
jump-to-jump on your original patch!

/be
Comment 115 Igor Bukanov 2005-09-29 01:56:17 PDT
(In reply to comment #114)
> The small design issue concerns whether a single-bit frequency encoding where
> bit 7 clear means bits 6..0 are the index, else bits 6..0 are high order 7 of
> 23 bits, with the rest coming from the following two bytes?  Avoids loops, does
> just one bit-test, simplifies later analysis under REOP_FLAT coalescing code
> that worries about compact width > 4.

It would not simplify code since then REOP_FLAT would need to deal with index or
offset overflow over 2**23 boundary. A similar code would have to be added to
report too big offset/size for [class_definition]. Of cause it is insane to
write regexp with length exceeding 2*23, but if supporting it is simpler then
not, why use more complex code?

> 
> If we really want to handle the full 32-bit size_t domain, we could use 5 bytes
> with bits 7 and 6 of the first byte set, and bits 1..0 containing bits 31 and
> 30 of the size_t.
> 

I suspect that using single if and 5 bytes may generate more code then the
current version with loops.

Comment 116 Brendan Eich [:brendan] 2005-09-29 02:19:53 PDT
(In reply to comment #115)
> > Avoids loops, does
> > just one bit-test, simplifies later analysis under REOP_FLAT coalescing code
> > that worries about compact width > 4.
> 
> It would not simplify code since then REOP_FLAT would need to deal with index
> or offset overflow over 2**23 boundary.

True, it substitutes one check for another.

> A similar code would have to be added to
> report too big offset/size for [class_definition]. Of cause it is insane to
> write regexp with length exceeding 2*23, but if supporting it is simpler then
> not, why use more complex code?

The tradeoff is more complex code vs. cross-platform, smaller limits.  We could
even be more restrictive (65535 flat chars max?).

But, you are right.  It's definitely smaller and cleaner code to do what you
did, and the Denial Of Service opportunity does not seem real here.  Ok, thanks.
This patch should land ASAP.

/be
Comment 117 Igor Bukanov 2005-09-29 03:28:19 PDT
Created attachment 197830 [details] [diff] [review]
Patch to commit

This is the version of attachment 196460 [details] [diff] [review] with added comments per comment 113
and nits per comment 114.

Note: I kept that loop with empty body written as 
for (width = 1; (index >>= 7) != 0; ++width) { }

since I am not sure should it be 

for (width = 1; (index >>= 7) != 0;
     ++width) { 
}

or 

for (width = 1; 
     (index >>= 7) != 0; ++width) { 
}

or

for (width = 1; (index >>= 7) != 0; 
     ++width);

Feel free to change that.
Comment 118 Brendan Eich [:brendan] 2005-09-29 04:19:13 PDT
(In reply to comment #117)
> Note: I kept that loop with empty body written as 
> for (width = 1; (index >>= 7) != 0; ++width) { }

That's ok with me, it was just a suggestion for consistency with other
empty-body loops in the code.

> since I am not sure should it be ...

The idea was to make it

+    for (width = 1; (index >>= 7) != 0; ++width)
+        continue;

/be
Comment 119 Blake Kaplan (:mrbkap) 2005-09-29 18:22:37 PDT
Attachment 197830 [details] [diff] has been checked into the trunk. Leaving this assigned to me
so I remember to check it into the branch tomorrow.
Comment 120 Philip K. Warren 2005-09-29 22:21:33 PDT
(In reply to comment #119)
> Attachment 197830 [details] [diff] [edit] has been checked into the trunk. Leaving this
assigned to me
> so I remember to check it into the branch tomorrow.

This checkin broke the AIX tinderbox. Bug 310539 filed for the bustage.
Comment 121 Blake Kaplan (:mrbkap) 2005-09-30 14:58:51 PDT
Fix checked in to MOZILLA_1_8_BRANCH. Reopening this so I can give it to Igor
who deserves many thanks.
Comment 122 Blake Kaplan (:mrbkap) 2005-09-30 15:00:06 PDT
Re-resolving as fixed. Sorry for the spam, and thanks again for doing this, Igor!
Comment 123 Boris Zbarsky [:bz] 2005-09-30 17:38:38 PDT
This broke the AIX build:

  "/home/tbox/sb/tinderbox/AIX_5.1_Clobber/mozilla/js/src/jsregexp.c", line
  206.9: 1506-213 (S) Macro name ARG_MAX cannot be redefined.

Note that that tinderbox has been red for about 24 hours at this point...
Comment 124 Bob Clary [:bc:] 2005-11-09 18:30:09 PST
no crash in firefox 1.5 rc2 winxp/linux
Comment 125 Tim Riley [:timr] 2006-02-06 11:33:55 PST
Comment on attachment 197830 [details] [diff] [review]
Patch to commit

Need to double check the merge into the 1.0/1.7 branch
Comment 126 Daniel Veditz [:dveditz] 2006-02-12 22:18:53 PST
Comment on attachment 197830 [details] [diff] [review]
Patch to commit

a=dveditz
Comment 127 Daniel Veditz [:dveditz] 2006-02-14 13:17:48 PST
Comment on attachment 197830 [details] [diff] [review]
Patch to commit

This patch doesn't merge at all cleanly on the old branches. Working up a new one.
Comment 128 Daniel Veditz [:dveditz] 2006-02-15 14:52:30 PST
Created attachment 212040 [details] [diff] [review]
backported to the 1.7/aviary101 branch
Comment 129 Daniel Veditz [:dveditz] 2006-02-15 15:27:08 PST
Comment on attachment 212040 [details] [diff] [review]
backported to the 1.7/aviary101 branch

>@@ -206,11 +296,11 @@ typedef struct REProgState {
>     jsbytecode continue_op;
>     uint16 index;                   /* progress in text */

I've changed this to ptrdiff_t locally and plan to check in that way.
Comment 130 Blake Kaplan (:mrbkap) 2006-02-15 18:14:19 PST
Comment on attachment 212040 [details] [diff] [review]
backported to the 1.7/aviary101 branch

Looks good.
Comment 131 Daniel Veditz [:dveditz] 2006-02-15 18:54:49 PST
Comment on attachment 212040 [details] [diff] [review]
backported to the 1.7/aviary101 branch

Carrying over approvals from the unmerged patch.
Comment 132 Daniel Veditz [:dveditz] 2006-02-15 19:09:08 PST
backported patch checked into aviary101/moz17 branches
Comment 133 Bob Clary [:bc:] 2006-02-22 02:10:40 PST
no crash ff on 1.7.13, 1.8.0.1, 1.8, 1.9a1 on windows/linux/mac
no crash moz on 1.7.13 on windows.
Comment 134 Bob Clary [:bc:] 2006-03-10 11:28:34 PST
This is already in the 1.5 release as is the aix regression fix from bug 310539, removing js16 bug 309169 blocking.

Note You need to log in before you can comment on or make changes to this bug.