Closed
Bug 76717
Opened 24 years ago
Closed 22 years ago
regexp literals wrongly shared among threads executing precompiled scripts
Categories
(Core :: JavaScript Engine, defect, P2)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: brendan, Assigned: brendan)
References
Details
(Keywords: js1.5)
Attachments
(3 files, 8 obsolete files)
3.09 KB,
patch
|
shaver
:
review+
jband_mozilla
:
superreview+
|
Details | Diff | Splinter Review |
681 bytes,
text/html
|
Details | |
11.27 KB,
patch
|
rogerl
:
review+
|
Details | Diff | Splinter Review |
This bites brutal sharing. Any embedding that compiles once (using a dummy or super global) and executes among N threads will leave those threads contending for the lastIndex member. This breaks global regexp uses, and should be fixed by generating a prolog bytecode to conditionally clone the regexp (sharing all the immutable state) if the execute-time scope chain is not the same as the compile-time one, saving the clone in an optional frame member. This means regexp objects can't be fetched from script atom maps, in general. They need to come from the optiona frame member mapping a dense index (bytecode immediate operand) to the clone. So two bytecodes, JSOP_DEFREGEXP in prologs, and JSOP_USEREGEXP for the literal site. This applies only to global regexps, because those are the only ones for which lastIndex is updated after being initialized to 0. So JSOP_OBJECT can still be used to index a script's shared, immutable atomMap to find non-global regexps created for literals. This suggests that the global regexp map indexed by JSOP_DEFREGEXP/USEREGEXP ought not be in every JSStackFrame. There are other members of JSStackFrame that are used rarely, if ever. I'm going to make a sparse-to-parallel stack of such optional frame members, to minimize call overhead and bloat. /be
Assignee | ||
Updated•24 years ago
|
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: js1.5,
mozilla0.9.1
Priority: -- → P2
Target Milestone: --- → mozilla0.9.1
Assignee | ||
Comment 1•23 years ago
|
||
No way this'll happen in one day. /be
Keywords: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Updated•23 years ago
|
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Assignee | ||
Comment 2•23 years ago
|
||
Would like to fix this soon, for js1.5. /be
Keywords: mozilla0.9.2 → mozilla0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Assignee | ||
Comment 3•23 years ago
|
||
Next milestone, this needs fixing. /be
Keywords: mozilla0.9.4 → mozilla0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Assignee | ||
Comment 4•23 years ago
|
||
Ran out of time (don't ask), but this will get fixed in 0.9.6. /be
Keywords: mozilla0.9.5 → mozilla0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Assignee | ||
Comment 5•23 years ago
|
||
Pushing out, this is not yet important to Mozilla-the-browser, but is to Mozilla the platform. /be
Keywords: mozilla0.9.6 → mozilla1.0
Target Milestone: mozilla0.9.6 → mozilla0.9.9
Assignee | ||
Comment 6•23 years ago
|
||
Oops, off-by-1 on the target milestone. The changes to fix this should not be landing as late as 0.9.9. /be
Target Milestone: mozilla0.9.9 → mozilla0.9.8
Assignee | ||
Comment 8•23 years ago
|
||
Plan A is too invasive and costly for the gain, besides being too much for 0.9.9. I have a plan B: at the cost of an extra word per JSRegExp, multiplex lastIndex so that different threads do not collide on it. This is a smaller change that I hope to attach soon. /be
Assignee | ||
Comment 9•23 years ago
|
||
Not just globals, all regexp objects created by the compiler are badly shared (you can use lastIndex on a non-global regexp to make that re start matching at a given offset). Patch next, it adds costs only #ifdef JS_THREADSAFE, and for all I know violates the spirit of ECMA -- but ECMA does not talk about threads, or about scripts compiled once and executed concurrently. /be
Summary: global regexp literals wrongly shared among threads executing precompiled scripts → regexp literals wrongly shared among threads executing precompiled scripts
Assignee | ||
Comment 10•23 years ago
|
||
Only regexps with objects (those created by literals or by new RegExp, or a native equivalent) need to worry about being shared among threads -- object-less regexps are internal to the engine and thread-private/temporary. Note also how the lastIndexes table is created lazily, in SetLastIndex. Testing and review craved, esp. testing from tellme.com folks. /be
Assignee | ||
Comment 11•23 years ago
|
||
Attachment #68366 -
Attachment is obsolete: true
Assignee | ||
Comment 12•23 years ago
|
||
I should have observed that only regexps owned by objects (created by the compiler, via new RegExp or just RegExp, or via an equivalent native call) have their lastIndex members got or set. The getting or setting code (now the code that calls GetLastIndex or SetLastIndex) already locks the object, carefully. See bug 76233. This saves the re->object member and code to update it added in the earlier patches, and I further optimized GetLastIndex to avoid claiming ownership (i.e. setting re->ownerThread) -- only SetLastIndex needs to do that. /be
Attachment #68372 -
Attachment is obsolete: true
Assignee | ||
Comment 13•23 years ago
|
||
Final note: JSRegExp grows by only one word (on 32-bit-pointer systems) #ifdef JS_THREADSAFE, and shrinks by one word in the single-threaded engine. /be
Assignee | ||
Comment 14•23 years ago
|
||
One change only from last patch: regexp_exec_sub was clobbering the ok return value of js_ExecuteRegExp for a global regexp, by setting ok to the result of SetLastIndex. It shouldn't even have called SetLastIndex (or stored the updated index) if js_ExecuteRegExp failed. /be
Attachment #68385 -
Attachment is obsolete: true
Comment 15•23 years ago
|
||
Comment on attachment 68405 [details] [diff] [review] I ate lunch r=rogerl
Assignee | ||
Comment 16•23 years ago
|
||
Comment on attachment 68405 [details] [diff] [review] I ate lunch Recording rogerl's r= with the patch manager. Shaver, how about sr=? /be
Attachment #68405 -
Flags: review+
Comment 17•23 years ago
|
||
Comment on attachment 68405 [details] [diff] [review] I ate lunch I will not be bullied! OK, maybe just a little bit. sr=shaver.
Attachment #68405 -
Flags: superreview+
Assignee | ||
Comment 18•23 years ago
|
||
The testsuite made me transpose lastIndex and parenCount, so the latter was the 24-bit field sharing a 32-bit word wit flags. That seems bogus of the testsuite, but I made it happy. Who needs > 16M parens anyhoo? ECMA seems to say that lastIndex is a read-write property whose value is an Integer. The regexp semantics won't ever let it be negative, but put no upper bound less than Integer's u.b. on its domain. Does that really require conforming implementations to store lastIndex using an integer-valued double? We don't, and the testsuite actually expects us to use a uint32. I say both the engine and the suite are buggy. Phil, can you ping waldemar and open a new bug with fixed/new tests as needed? Thanks, /be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 19•23 years ago
|
||
The test that wants uint32 storage is ecma_2/RegExp/properties-001.js, IIRC. /be
Comment 20•23 years ago
|
||
Have filed bug 124339 on the lastIndex issue. I've run the full testsuite on this fix, in both the debug and optimized JS shell, on both WinNT and Linux. No new regressions have occurred. Marking this bug Verified Fixed -
Status: RESOLVED → VERIFIED
Comment 21•23 years ago
|
||
The checked in changes created the folowing three warnings on win32. X:\trunk\mozilla\js\src\jsregexp.c(2174) : warning C4244: '=' : conversion from 'unsigned long ' to 'unsigned char ', possible loss of data X:\trunk\mozilla\js\src\jsregexp.c(2640) : warning C4244: '=' : conversion from 'unsigned long ' to 'unsigned char ', possible loss of data X:\trunk\mozilla\js\src\jsregexp.c(2833) : warning C4244: '=' : conversion from 'double ' to 'unsigned int ', possible loss of data The first two seem to be just the MS compiler being a little braindead about bitfields - and easly fixed with a safe cast. The third is more interesting. It is easily quieted with a pretty safe cast to size_t. But this begs the question of whether or not {S,G}etLastIndex should really be working with doubles when we are using a data type with a much smaller range. Ah! I see bug 124339. So, I assume we are not worried about overflow and a cast in this code is just fine. I'll attach a patch to kill the warnings.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 22•23 years ago
|
||
Assignee | ||
Comment 23•23 years ago
|
||
Oops, ECMA-262 15.10.6.2 step 6 (if you fix the wrong uppercasing of i to I) says that exec should fail if i < 0 or i > length. Patch coming up. /be
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 24•23 years ago
|
||
Looking for quick r= and sr= for 0.9.8. /be
Attachment #69885 -
Attachment is obsolete: true
Assignee | ||
Comment 25•23 years ago
|
||
I mean 0.9.9 of course! /be
Comment 26•23 years ago
|
||
Comment on attachment 69890 [details] [diff] [review] jband's patch + ECMA 15.10.6.2 step 6 + s/double/jsdouble/ sr=shaver.
Attachment #69890 -
Flags: superreview+
Comment 27•23 years ago
|
||
I'm happy if someone else r='s, but I'm willing if I can be schooled a bit. shaver did the sr= and left me to ask the stupid question... + if (lastIndex < 0 || JSSTRING_LENGTH(str) < lastIndex) { + ok = SetLastIndex(cx, re, 0); + *rval = JSVAL_NULL; Why in this case can we call SetLastIndex without the object locking and the flag check? And why would the lastIndex be out of range like this?
Assignee | ||
Comment 28•23 years ago
|
||
jband, thanks for the good questions. Locking is required, I'm a dumbass. And silly ECMA requires us to set .lastIndex to 0 for non-global as well as global regexps if a script author set it to a negative number, or left it set too large for the current target string. This revalidation for non-globals means that I cannot share the tail of the else clause, so I did another lock/unlock around the revalidation SetLastIndex call in the then clause. /be
Attachment #68405 -
Attachment is obsolete: true
Attachment #69890 -
Attachment is obsolete: true
Comment 29•23 years ago
|
||
Comment on attachment 70108 [details] [diff] [review] final proposed fix r/sr=jband. I'm happy.
Attachment #70108 -
Flags: superreview+
Comment 30•23 years ago
|
||
Comment on attachment 70108 [details] [diff] [review] final proposed fix r/sr=jband. I'm happy.
Comment 31•23 years ago
|
||
Comment on attachment 70108 [details] [diff] [review] final proposed fix After jband's sweet review, who wouldn't be happy? r=shaver. (mildly shamed, alas.)
Attachment #70108 -
Flags: review+
Assignee | ||
Comment 32•23 years ago
|
||
Fixed. /be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 33•23 years ago
|
||
Inspired by Brendan's observations in Comment #23 and Comment #28, I have added this test to the JS testsuite: mozilla/js/tests/ecma_3/RegExp/15.10.6.2-2.js It tests re.exec(str) when re.lastIndex is < 0 or > str.length, treating both the case when |re| has the global flag set, and when it does not. The testcase failed before this fix, but now passes -
Comment 34•23 years ago
|
||
Ran the entire JS testsuite on WinNT and Linux, and got only the known test failures. The testsuite does not set up a multi-threaded environment, however. Lacking that, marking Verified Fixed until proven otherwise -
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 35•22 years ago
|
||
Now that bug 164697 has been filed and has a patch, perhaps this bug's patch should be backed out, and the lastIndex property's value should be stored in each RegExp object (requiring a slot per object, i.e., no more JSPROP_SHARED attribute for .lastIndex), not in a private data structure indexed by thread-id. /be
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 36•22 years ago
|
||
Phil, I don't know when this broke (maybe it never worked), but we seem to violate ECMA-262 Edition 3 15.5.4.10, the "If regexp.global is true" clause, in that we do not update lastIndex. E.g. [~/src/trunk-opt/mozilla/js/src]$ Linux_All_DBG.OBJ/js js> r = /\ba+/g /\ba+/g js> s = "a,aa, aaa " a,aa, aaa js> r.lastIndex 0 js> s.match(r) a,aa,aaa js> r.lastIndex 0 js> s.match(r) a,aa,aaa js> r.lastIndex 0 Shouldn't r.lastIndex be 9 after each match, if you follow the RegExp.prototype.exec spec at 15.10.6.2? I'm fixing this as well as ripping out the per-thread lastIndex jazz. I guess a new bug is in order, although the patch for both that new bug and this bug will be attached here. /be
Status: REOPENED → ASSIGNED
Keywords: mozilla1.0 → mozilla1.2
Target Milestone: mozilla0.9.9 → mozilla1.2alpha
Assignee | ||
Comment 37•22 years ago
|
||
Comment 38•22 years ago
|
||
> Shouldn't r.lastIndex be 9 after each match, if you follow the
> RegExp.prototype.exec spec at 15.10.6.2?
The way I read it, no -
15.5.4.10 String.prototype.match (regexp)
If regexp is not an object whose [[Class]] property is "RegExp",
it is replaced with the result of the expression new RegExp(regexp).
Let string denote the result of converting the this value to a string.
Then do one of the following:
• If regexp.global is false: Return the result obtained by invoking
RegExp.prototype.exec (see section 15.10.6.2) on regexp with string
as parameter.
• If regexp.global is true: Set the regexp.lastIndex property to 0
and invoke RegExp.prototype.exec repeatedly until there is no match.
etc.
The phrase I'm keying on is "repeatedly until there is no match."
When r.exec() is invoked until there is no match, then r.lastIndex
always gets set back to 0:
js> r = /\ba+/g
/\ba+/g
js> s = "a,aa, aaa "
a,aa, aaa
js> r.exec(s)
a
js> r.lastIndex
1
js> r.exec(s)
aa
js> r.lastIndex
4
js> r.exec(s)
aaa
js> r.lastIndex
9
js> r.exec(s)
null
js> r.lastIndex
0
This is because of:
15.10.6.2 RegExp.prototype.exec(string)
Performs a regular expression match of string against the regular expression
and returns an Array object containing the results of the match, or null
if the string did not match.
The string ToString(string) is searched for an occurrence of the regular
expression pattern as follows:
1. Let S be the value of ToString(string).
2. Let length be the length of S.
3. Let lastIndex be the value of the lastIndex property.
4. Let i be the value of ToInteger(lastIndex).
5. If the global property is false, let i = 0.
6. If i < 0 or i > length then set lastIndex to 0 and return null.
7. Call [[Match]], giving it the arguments S and i. If [[Match]] returned
failure, go to step 8; otherwise let r be its State result and go to step 10.
8. Let i = i+1.
9. Go to step 6.
10. Let e be r's endIndex value.
11. If the global property is true, set lastIndex to e.
12. Let n be the length of r's captures array. (This is the same value
as section 15.10.2.1's NCapturingParens.)
13. Return a new array with the following properties:
etc.
In your example, the last successful match leaves lastIndex at 9.
Note the test string has length 10. When we try to match again,
there is no match and we fall into "failure" in step 7, which sends
us to step 8 and i is incremented from 9 to 10. Then we go back
to step 6 and step 7 and fail again, and i is incremented to 11.
We go back to step 6 again, and now, since 11 > length of test string,
we return and set lastIndex to 0.
Comment 39•22 years ago
|
||
We do differ from IE6 on this point. Mozilla is consistent with NN4.7. I will attach an HTML testcase below.
Comment 40•22 years ago
|
||
Comment 41•22 years ago
|
||
Try the HTML testcase in all three browsers. All three show the same data for repeated appeals to r.lastIndex + r.exec(s) + r.lastIndex : 0 a 1 1 aa 4 4 aaa 9 9 null 0 However, IE6 differs from Mozilla and NN4.x when we appeal to r.lastIndex + s.match(r) + r.lastIndex: 0 a,aa,aaa 9 9 a,aa,aaa 9 9 a,aa,aaa 9 etc. Whereas Mozilla and NN4.7 do what Brendan observed above: 0 a,aa,aaa 0 0 a,aa,aaa 0 0 a,aa,aaa 0 etc. The way I read the spec, we seem to be doing the right thing, and IE is not -
Assignee | ||
Comment 42•22 years ago
|
||
Phil, you're right -- I was wrong. New patch coming up, it simply removes the second js_SetLastIndex call in match_or_replace in jsstr.c. /be
Assignee | ||
Comment 43•22 years ago
|
||
Ok, this should be good. Phil, please let me know how it tests. /be
Attachment #96959 -
Attachment is obsolete: true
Comment 44•22 years ago
|
||
In js_CloneRegExpObject, shouldn't if (!JS_SetPrivate(cx, clone, re) || !js_SetLastIndex(cx, obj, 0)) { be if (!JS_SetPrivate(cx, clone, re) || !js_SetLastIndex(cx, clone, 0)) { ?
Comment 45•22 years ago
|
||
The patch produces one test regression: js/tests/js1_2/function/regexparg-2-n.js js> load('D:/JS_TRUNK/mozilla/js/tests/js1_2/shell.js'); js> load('D:/JS_TRUNK/mozilla/js/tests/js1_2/function/regexparg-2-n.js'); JS_1.2 The variable statement function f(x) {return x;}; x = f(/abc/); x() = undefined FAILED! expected: error From the test: ------------------------------------------------------------------------------- /* Regression test for http://scopus/bugsplat/show_bug.cgi?id=122787 Passing a regular expression as the first constructor argument fails Author: christine@netscape.com Date: 15 June 1998 */ var SECTION = "JS_1.2"; var VERSION = "JS_1.2"; startTest(); var TITLE = "The variable statment"; writeHeaderToLog( SECTION + " "+ TITLE); var testcases = new Array(); function f(x) {return x;} x = f(/abc/); testcases[tc++] = new TestCase( SECTION, "function f(x) {return x;}; x = f(/abc/); x()", "error", x() ); test(); -------------------------------------------------------------------------------
Comment 46•22 years ago
|
||
The test is failing because the original initialization of ok to JS_FALSE got subsumed at line 2731 and the error (arising from argc == 0) gets ignored.
Comment 47•22 years ago
|
||
rogerl showed me the essence of the above test: CURRENT JS SHELL: js> /a/(); 1: SyntaxError: no input for /a/ JS SHELL WITH LATEST PATCH: js> /a/(); js> That is, before the patch it was a SyntaxError to call a regexp with no input string. This was true whether we used the syntax |re()| or |re.exec()|. The testcase is expecting this error. After the patch, however, no error is being produced -
Assignee | ||
Comment 48•22 years ago
|
||
Thanks^2 rogerl. /be
Attachment #97109 -
Attachment is obsolete: true
Comment 49•22 years ago
|
||
Comment on attachment 97238 [details] [diff] [review] i hope this is it! r=rogerl
Attachment #97238 -
Flags: review+
Comment 50•22 years ago
|
||
Ran the latest patch in the optimized JS shell on WinNT against the entire JS testsuite. No test regressions were introduced -
Assignee | ||
Comment 51•22 years ago
|
||
Fix went in the other day. /be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Comment 52•22 years ago
|
||
Marking Verified Fixed. The above testcase (see Comment #33) passes in both the debug and optimized JS shell on WinNT. Furthermore, as in Comment #34: > Ran the entire JS testsuite on WinNT and Linux, and got only the known > test failures. The testsuite does not set up a multi-threaded environment, > however. Lacking that, marking Verified Fixed until proven otherwise -
Status: RESOLVED → VERIFIED
Updated•19 years ago
|
Flags: testcase+
Comment 53•19 years ago
|
||
note <http://test.bclary.com/tests/mozilla.org/js/js-test-driver-standards.html?test=js1_2/function/regexparg-2-n.js;language=type;text/javascript> fails in all versions of ff 1.0.x, 1.5.x, 1.6 on win/linux/mac in the _browser only if popups are disabled for the site_. When run in the shell, the js console, or in the url bar as |javascript:function f(x) {return x;}; x = f(/abc/); x()| it will throw the error.
Comment 54•19 years ago
|
||
(In reply to comment #53) ignore that popup comment. it just fails in the browser period. its late and thats the story i'm sticking to.
You need to log in
before you can comment on or make changes to this bug.
Description
•