Closed Bug 76717 Opened 24 years ago Closed 22 years ago

regexp literals wrongly shared among threads executing precompiled scripts

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: brendan, Assigned: brendan)

References

Details

(Keywords: js1.5)

Attachments

(3 files, 8 obsolete files)

This bites brutal sharing.  Any embedding that compiles once (using a dummy or
super global) and executes among N threads will leave those threads contending
for the lastIndex member.  This breaks global regexp uses, and should be fixed
by generating a prolog bytecode to conditionally clone the regexp (sharing all
the immutable state) if the execute-time scope chain is not the same as the
compile-time one, saving the clone in an optional frame member.

This means regexp objects can't be fetched from script atom maps, in general. 
They need to come from the optiona frame member mapping a dense index (bytecode
immediate operand) to the clone.  So two bytecodes, JSOP_DEFREGEXP in prologs,
and JSOP_USEREGEXP for the literal site.

This applies only to global regexps, because those are the only ones for which
lastIndex is updated after being initialized to 0.  So JSOP_OBJECT can still be
used to index a script's shared, immutable atomMap to find non-global regexps
created for literals.

This suggests that the global regexp map indexed by JSOP_DEFREGEXP/USEREGEXP
ought not be in every JSStackFrame.  There are other members of JSStackFrame
that are used rarely, if ever.  I'm going to make a sparse-to-parallel stack of
such optional frame members, to minimize call overhead and bloat.

/be
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: js1.5, mozilla0.9.1
Priority: -- → P2
Target Milestone: --- → mozilla0.9.1
No way this'll happen in one day.

/be
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Would like to fix this soon, for js1.5.

/be
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Next milestone, this needs fixing.

/be
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Ran out of time (don't ask), but this will get fixed in 0.9.6.

/be
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Pushing out, this is not yet important to Mozilla-the-browser, but is to Mozilla
the platform.

/be
Keywords: mozilla0.9.6mozilla1.0
Target Milestone: mozilla0.9.6 → mozilla0.9.9
Oops, off-by-1 on the target milestone.  The changes to fix this should not be
landing as late as 0.9.9.

/be
Target Milestone: mozilla0.9.9 → mozilla0.9.8
0.9.9.

/be
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Plan A is too invasive and costly for the gain, besides being too much for
0.9.9.  I have a plan B: at the cost of an extra word per JSRegExp, multiplex
lastIndex so that different threads do not collide on it.  This is a smaller
change that I hope to attach soon.

/be
Not just globals, all regexp objects created by the compiler are badly shared
(you can use lastIndex on a non-global regexp to make that re start matching at
a given offset).  Patch next, it adds costs only #ifdef JS_THREADSAFE, and for
all I know violates the spirit of ECMA -- but ECMA does not talk about threads,
or about scripts compiled once and executed concurrently.

/be
Summary: global regexp literals wrongly shared among threads executing precompiled scripts → regexp literals wrongly shared among threads executing precompiled scripts
Attached patch proposed fix (obsolete) — Splinter Review
Only regexps with objects (those created by literals or by new RegExp, or a
native equivalent) need to worry about being shared among threads --
object-less regexps are internal to the engine and thread-private/temporary.

Note also how the lastIndexes table is created lazily, in SetLastIndex.

Testing and review craved, esp. testing from tellme.com folks.

/be
Attached patch I need to eat lunch (obsolete) — Splinter Review
I should have observed that only regexps owned by objects (created by the
compiler, via new RegExp or just RegExp, or via an equivalent native call) have
their lastIndex members got or set.  The getting or setting code (now the code
that calls GetLastIndex or SetLastIndex) already locks the object, carefully. 
See bug 76233.

This saves the re->object member and code to update it added in the earlier
patches, and I further optimized GetLastIndex to avoid claiming ownership (i.e.
setting re->ownerThread) -- only SetLastIndex needs to do that.

/be
Attachment #68372 - Attachment is obsolete: true
Final note: JSRegExp grows by only one word (on 32-bit-pointer systems) #ifdef
JS_THREADSAFE, and shrinks by one word in the single-threaded engine.

/be
Attached patch I ate lunch (obsolete) — Splinter Review
One change only from last patch: regexp_exec_sub was clobbering the ok return
value of js_ExecuteRegExp for a global regexp, by setting ok to the result of
SetLastIndex.  It shouldn't even have called SetLastIndex (or stored the
updated index) if js_ExecuteRegExp failed.

/be
Attachment #68385 - Attachment is obsolete: true
Comment on attachment 68405 [details] [diff] [review]
I ate lunch

r=rogerl
Comment on attachment 68405 [details] [diff] [review]
I ate lunch

Recording rogerl's r= with the patch manager.  Shaver, how about sr=?

/be
Attachment #68405 - Flags: review+
Comment on attachment 68405 [details] [diff] [review]
I ate lunch

I will not be bullied!

OK, maybe just a little bit.  sr=shaver.
Attachment #68405 - Flags: superreview+
The testsuite made me transpose lastIndex and parenCount, so the latter was the
24-bit field sharing a 32-bit word wit flags.  That seems bogus of the
testsuite, but I made it happy.  Who needs > 16M parens anyhoo?

ECMA seems to say that lastIndex is a read-write property whose value is an
Integer.  The regexp semantics won't ever let it be negative, but put no upper
bound less than Integer's u.b. on its domain.  Does that really require
conforming implementations to store lastIndex using an integer-valued double? 
We don't, and the testsuite actually expects us to use a uint32.  I say both the
engine and the suite are buggy.  Phil, can you ping waldemar and open a new bug
with fixed/new tests as needed?  Thanks,

/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
The test that wants uint32 storage is ecma_2/RegExp/properties-001.js, IIRC.

/be
Have filed bug 124339 on the lastIndex issue.

I've run the full testsuite on this fix, in both the debug and optimized
JS shell, on both WinNT and Linux. No new regressions have occurred.

Marking this bug Verified Fixed -
Status: RESOLVED → VERIFIED
The checked in changes created the folowing three warnings on win32.

X:\trunk\mozilla\js\src\jsregexp.c(2174) : warning C4244: '=' : conversion from
'unsigned long ' to 'unsigned char ', possible loss of data
X:\trunk\mozilla\js\src\jsregexp.c(2640) : warning C4244: '=' : conversion from
'unsigned long ' to 'unsigned char ', possible loss of data
X:\trunk\mozilla\js\src\jsregexp.c(2833) : warning C4244: '=' : conversion from
'double ' to 'unsigned int ', possible loss of data

The first two seem to be just the MS compiler being a little braindead about
bitfields - and easly fixed with a safe cast. 

The third is more interesting. It is easily quieted with a pretty safe cast to
size_t. But this begs the question of whether or not {S,G}etLastIndex should
really be working with doubles when we are using a data type with a much smaller
range. Ah! I see bug 124339. So, I assume we are not worried about overflow and
a cast in this code is just fine. I'll attach a patch to kill the warnings.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attached patch kill the warnings (obsolete) — Splinter Review
Oops, ECMA-262 15.10.6.2 step 6 (if you fix the wrong uppercasing of i to I)
says that exec should fail if i < 0 or i > length.  Patch coming up.

/be
Status: REOPENED → ASSIGNED
Looking for quick r= and sr= for 0.9.8.

/be
Attachment #69885 - Attachment is obsolete: true
I mean 0.9.9 of course!

/be
Comment on attachment 69890 [details] [diff] [review]
jband's patch + ECMA 15.10.6.2 step 6 + s/double/jsdouble/

sr=shaver.
Attachment #69890 - Flags: superreview+
I'm happy if someone else r='s, but I'm willing if I can be schooled a bit.

shaver did the sr= and left me to ask the stupid question...

+    if (lastIndex < 0 || JSSTRING_LENGTH(str) < lastIndex) {
+        ok = SetLastIndex(cx, re, 0);
+        *rval = JSVAL_NULL;

Why in this case can we call SetLastIndex without the object locking and the
flag check? And why would the lastIndex be out of range like this?
jband, thanks for the good questions.  Locking is required, I'm a dumbass.  And
silly ECMA requires us to set .lastIndex to 0 for non-global as well as global
regexps if a script author set it to a negative number, or left it set too
large for the current target string.  This revalidation for non-globals means
that I cannot share the tail of the else clause, so I did another lock/unlock
around the revalidation SetLastIndex call in the then clause.

/be
Attachment #68405 - Attachment is obsolete: true
Attachment #69890 - Attachment is obsolete: true
Comment on attachment 70108 [details] [diff] [review]
final proposed fix

r/sr=jband. I'm happy.
Attachment #70108 - Flags: superreview+
Comment on attachment 70108 [details] [diff] [review]
final proposed fix

r/sr=jband. I'm happy.
Comment on attachment 70108 [details] [diff] [review]
final proposed fix

After jband's sweet review, who wouldn't be happy?

r=shaver. (mildly shamed, alas.)
Attachment #70108 - Flags: review+
Fixed.

/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Inspired by Brendan's observations in Comment #23 and Comment #28,
I have added this test to the JS testsuite:

       mozilla/js/tests/ecma_3/RegExp/15.10.6.2-2.js

It tests re.exec(str) when re.lastIndex is < 0 or > str.length, 
treating both the case when |re| has the global flag set,
and when it does not.

The testcase failed before this fix, but now passes -
Ran the entire JS testsuite on WinNT and Linux, and got only the known
test failures. The testsuite does not set up a multi-threaded environment, 
however. Lacking that, marking Verified Fixed until proven otherwise -
Status: RESOLVED → VERIFIED
Now that bug 164697 has been filed and has a patch, perhaps this bug's patch
should be backed out, and the lastIndex property's value should be stored in
each RegExp object (requiring a slot per object, i.e., no more JSPROP_SHARED
attribute for .lastIndex), not in a private data structure indexed by thread-id.

/be
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Depends on: 164697
Phil, I don't know when this broke (maybe it never worked), but we seem to
violate ECMA-262 Edition 3 15.5.4.10, the "If regexp.global is true" clause, in
that we do not update lastIndex.  E.g.

[~/src/trunk-opt/mozilla/js/src]$ Linux_All_DBG.OBJ/js
js> r = /\ba+/g
/\ba+/g
js> s = "a,aa, aaa "
a,aa, aaa 
js> r.lastIndex
0
js> s.match(r)
a,aa,aaa
js> r.lastIndex
0
js> s.match(r)  
a,aa,aaa
js> r.lastIndex
0

Shouldn't r.lastIndex be 9 after each match, if you follow the
RegExp.prototype.exec spec at 15.10.6.2?

I'm fixing this as well as ripping out the per-thread lastIndex jazz.  I guess a
new bug is in order, although the patch for both that new bug and this bug will
be attached here.

/be
Status: REOPENED → ASSIGNED
Keywords: mozilla1.0mozilla1.2
Target Milestone: mozilla0.9.9 → mozilla1.2alpha
> Shouldn't r.lastIndex be 9 after each match, if you follow the
> RegExp.prototype.exec spec at 15.10.6.2?

The way I read it, no -


15.5.4.10 String.prototype.match (regexp)

If regexp is not an object whose [[Class]] property is "RegExp",
it is replaced with the result of the expression new RegExp(regexp).

Let string denote the result of converting the this value to a string.
Then do one of the following:

• If regexp.global is false: Return the result obtained by invoking 
RegExp.prototype.exec (see section 15.10.6.2) on regexp with string
as parameter.

• If regexp.global is true: Set the regexp.lastIndex property to 0
and invoke RegExp.prototype.exec repeatedly until there is no match.

                          etc.


The phrase I'm keying on is "repeatedly until there is no match."
When r.exec() is invoked until there is no match, then r.lastIndex
always gets set back to 0:


js> r = /\ba+/g
/\ba+/g
js> s = "a,aa, aaa "
a,aa, aaa

js> r.exec(s)
a
js> r.lastIndex
1
js> r.exec(s)
aa
js> r.lastIndex
4
js> r.exec(s)
aaa
js> r.lastIndex
9
js> r.exec(s)
null
js> r.lastIndex
0


This is because of:


15.10.6.2 RegExp.prototype.exec(string)

Performs a regular expression match of string against the regular expression
and returns an Array object containing the results of the match, or null
if the string did not match.

The string ToString(string) is searched for an occurrence of the regular 
expression pattern as follows:

1. Let S be the value of ToString(string).
2. Let length be the length of S.
3. Let lastIndex be the value of the lastIndex property.
4. Let i be the value of ToInteger(lastIndex).
5. If the global property is false, let i = 0.
6. If i < 0 or i > length then set lastIndex to 0 and return null.
7. Call [[Match]], giving it the arguments S and i. If [[Match]] returned    
failure, go to step 8; otherwise let r be its State result and go to step 10.
8. Let i = i+1.
9. Go to step 6.
10. Let e be r's endIndex value.
11. If the global property is true, set lastIndex to e.
12. Let n be the length of r's captures array. (This is the same value
as section 15.10.2.1's NCapturingParens.)
13. Return a new array with the following properties:
            
                       etc.


In your example, the last successful match leaves lastIndex at 9.
Note the test string has length 10. When we try to match again,
there is no match and we fall into "failure" in step 7, which sends
us to step 8 and i is incremented from 9 to 10. Then we go back
to step 6 and step 7 and fail again, and i is incremented to 11.

We go back to step 6 again, and now, since 11 > length of test string,
we return and set lastIndex to 0.
We do differ from IE6 on this point. Mozilla is consistent with NN4.7.
I will attach an HTML testcase below.
Try the HTML testcase in all three browsers. All three show the same
data for repeated appeals to r.lastIndex + r.exec(s) + r.lastIndex :

0
a
1

1
aa
4

4
aaa
9

9
null
0


However, IE6 differs from Mozilla and NN4.x when we appeal to
r.lastIndex + s.match(r) + r.lastIndex:

0
a,aa,aaa
9

9
a,aa,aaa
9

9
a,aa,aaa
9

etc.


Whereas Mozilla and NN4.7 do what Brendan observed above:

0
a,aa,aaa
0

0
a,aa,aaa
0

0
a,aa,aaa
0

etc.


The way I read the spec, we seem to be doing the right thing,
and IE is not -
Phil, you're right -- I was wrong.  New patch coming up, it simply removes the
second js_SetLastIndex call in match_or_replace in jsstr.c.

/be
Attached patch latest proposed fix (obsolete) — Splinter Review
Ok, this should be good.  Phil, please let me know how it tests.

/be
Attachment #96959 - Attachment is obsolete: true
In js_CloneRegExpObject, shouldn't

    if (!JS_SetPrivate(cx, clone, re) || !js_SetLastIndex(cx, obj, 0)) {

be

    if (!JS_SetPrivate(cx, clone, re) || !js_SetLastIndex(cx, clone, 0)) { 

?

The patch produces one test regression:

          js/tests/js1_2/function/regexparg-2-n.js


js> load('D:/JS_TRUNK/mozilla/js/tests/js1_2/shell.js');
js> load('D:/JS_TRUNK/mozilla/js/tests/js1_2/function/regexparg-2-n.js');
JS_1.2 The variable statement
function f(x) {return x;}; x = f(/abc/); x() = undefined FAILED! expected: error


From the test:
-------------------------------------------------------------------------------
/*
   Regression test for 
    http://scopus/bugsplat/show_bug.cgi?id=122787
    Passing a regular expression as the first constructor argument fails

    Author:             christine@netscape.com
    Date:               15 June 1998
*/

    var SECTION = "JS_1.2";
    var VERSION = "JS_1.2";
    startTest();
    var TITLE   = "The variable statment";

    writeHeaderToLog( SECTION + " "+ TITLE);

    var testcases = new Array();

    function f(x) {return x;}

    x = f(/abc/);

    testcases[tc++] = new TestCase( SECTION,
	              "function f(x) {return x;}; x = f(/abc/); x()",
                      "error", x() );

    test();
-------------------------------------------------------------------------------
The test is failing because the original initialization of ok to JS_FALSE got
subsumed at line 2731 and the error (arising from argc == 0) gets ignored.
rogerl showed me the essence of the above test:

CURRENT JS SHELL:             js> /a/();
                              1: SyntaxError: no input for /a/

JS SHELL WITH LATEST PATCH:   js> /a/();
                              js>


That is, before the patch it was a SyntaxError to call a regexp
with no input string. This was true whether we used the syntax
|re()| or |re.exec()|. The testcase is expecting this error.

After the patch, however, no error is being produced -
Thanks^2 rogerl.

/be
Attachment #97109 - Attachment is obsolete: true
Comment on attachment 97238 [details] [diff] [review]
i hope this is it!

r=rogerl
Attachment #97238 - Flags: review+
Ran the latest patch in the optimized JS shell on WinNT against
the entire JS testsuite. No test regressions were introduced -
Fix went in the other day.

/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → FIXED
Marking Verified Fixed.

The above testcase (see Comment #33) passes in both the debug and
optimized JS shell on WinNT. Furthermore, as in Comment #34:

> Ran the entire JS testsuite on WinNT and Linux, and got only the known
> test failures. The testsuite does not set up a multi-threaded environment, 
> however. Lacking that, marking Verified Fixed until proven otherwise -
Status: RESOLVED → VERIFIED
Flags: testcase+
note <http://test.bclary.com/tests/mozilla.org/js/js-test-driver-standards.html?test=js1_2/function/regexparg-2-n.js;language=type;text/javascript> fails in all versions of ff 1.0.x, 1.5.x, 1.6 on win/linux/mac in the 
_browser only if popups are disabled for the site_. 

When run in the shell, the js console, or in the url bar as 
|javascript:function f(x) {return x;}; x = f(/abc/); x()| it will throw the error.
(In reply to comment #53)

ignore that popup comment. it just fails in the browser period. its late and thats the story i'm sticking to.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: