Closed
Bug 604756
Opened 14 years ago
Closed 14 years ago
crash [@ JSString::flatten() ] [@ JSString::flatten ]
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 608142
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta7+ |
People
(Reporter: scoobidiver, Assigned: dmandelin)
References
Details
(Keywords: crash, regression, reproducible)
Crash Data
Attachments
(3 files, 1 obsolete file)
9.88 KB,
patch
|
luke
:
review+
damons
:
approval2.0+
|
Details | Diff | Splinter Review |
3.19 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
1.56 KB,
patch
|
dvander
:
review+
gal
:
review+
|
Details | Diff | Splinter Review |
Build: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b8pre) Gecko/20101015 Firefox/4.0b8pre
It is a residual crash signature with a low daily rate of about 5
crashes/day.
From b8pre/20101015 build, there is a spike in crashes. It becomes #6 top
crasher.
The regression range for the spike is :
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ad0a0be8be74&tochange=19cb42fa4554
Signature JSString::flatten()
UUID 84334e51-5271-49ec-a2c4-8972b2101015
Time 2010-10-15 12:57:58.4697
Uptime 129
Last Crash 136 seconds before submission
Install Age 12337 seconds (3.4 hours) since version was first installed.
Product Firefox
Version 4.0b8pre
Build ID 20101015042126
Branch 2.0
OS Windows NT
OS Version 6.1.7600
CPU x86
CPU Info GenuineIntel family 6 model 15 stepping 11
Crash Reason EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 0x0
App Notes AdapterVendorID: 10de, AdapterDeviceID: 0e22
Frame Module Signature [Expand] Source
0 mozjs.dll JSString::flatten js/src/jsstr.cpp:118
1 mozjs.dll js_ConcatStrings js/src/jsstr.cpp:404
2 mozjs.dll js::Interpret js/src/jsinterp.cpp:3588
3 mozjs.dll js::RunScript js/src/jsinterp.cpp:638
4 mozjs.dll js::Invoke js/src/jsinterp.cpp:747
5 mozjs.dll js_fun_apply js/src/jsfun.cpp:2369
6 mozjs.dll js::Interpret js/src/jsinterp.cpp:4714
7 mozjs.dll js::RunScript js/src/jsinterp.cpp:638
8 mozjs.dll js::Invoke js/src/jsinterp.cpp:747
9 mozjs.dll js::ExternalInvoke js/src/jsinterp.cpp:871
10 mozjs.dll JS_CallFunctionValue js/src/jsapi.cpp:4961
11 xul.dll nsXPCWrappedJSClass::CallMethod js/src/xpconnect/src/xpcwrappedjsclass.cpp:1692
12 xul.dll nsXPCWrappedJS::CallMethod js/src/xpconnect/src/xpcwrappedjs.cpp:571
13 xul.dll PrepareAndDispatch xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
14 xul.dll SharedStub xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
15 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:547
16 xul.dll nsThread::ThreadFunc xpcom/threads/nsThread.cpp:263
17 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426
18 nspr4.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:122
19 mozcrt19.dll _callthreadstartex obj-firefox/memory/jemalloc/crtsrc/threadex.c:348
20 mozcrt19.dll _threadstartex obj-firefox/memory/jemalloc/crtsrc/threadex.c:326
21 kernel32.dll BaseThreadInitThunk
22 ntdll.dll __RtlUserThreadStart
23 ntdll.dll _RtlUserThreadStart
More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=JSString%3A%3Aflatten%28%29&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=JSString%3A%3Aflatten%28%29
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
Reporter | ||
Comment 1•14 years ago
|
||
It happens also on Mac OS X.
OS: Windows 7 → All
Summary: crash [@ JSString::flatten() ] → crash [@ JSString::flatten() ] [@ JSString::flatten ]
Comment 2•14 years ago
|
||
#10 and #57 ranked crash on the trunk yesterday. might have been around on the trunk for awhile but something seems to have caused a bit of a spike in the last few days.
date tl crashes at, count build, count build, ...
JSString::flatten
20100927
20100928
20100929 2 4.0b62010091407 2 ,
20100930 1 4.0b62010091407 1 ,
20101001
20101002
20101003 2 1 4.0b7pre2010100303,
1 4.0b7pre2010100203,
20101004 1 4.0b7pre2010100403 1 ,
20101005 3 2 4.0b7pre2010100403,
1 4.0b7pre2010100503,
20101006
20101007
20101008
20101009
20101010
20101011 1 4.0b8pre2010101003 1 ,
20101012
20101013
20101014 2 4.0b8pre2010101403 2 ,
20101015 6 4.0b8pre2010101503 6 ,
20101016 9 4 4.0b8pre2010101603,
4 4.0b8pre2010101503, 1 4.0b8pre2010101602,
there was some similar higher activity on this signature back around the 1st of sept.
date tl crashes at, count build, count build, ...
JSString::flatten
20100901 4 4.0b4pre2010080703 4 ,
20100902 3 2 4.0b42010081812,
1 4.0b4pre2010080703,
20100903 5 4 4.0b4pre2010080703,
1 4.0b5pre2010082406,
20100904
20100905 2 4.0b42010081812 2 ,
20100906 2 4.0b42010081812 2 ,
20100907 1 4.0b42010081812 1 ,
20100908 3 2 4.0b42010081812,
1 4.0b52010083107,
20100909
20100910
then things tailed off.
Comment 3•14 years ago
|
||
er, that's just the mac track record. the windows spike is much higher with it going from around 1-10 crashes per day to near 50 maybe starting with 4.0b8pre build 2010101504
20101012 2 4.0b62010091408 2 ,
20101013 8 7 4.0b62010091408,
1 4.0b8pre2010101304,
20101014 9 4 4.0b8pre2010101322,
2 4.0b8pre2010101404, 2 4.0b62010091408,
1 4.0b62010091323,
20101015 49 38 4.0b8pre2010101504,
8 4.0b62010091408, 2 4.0b8pre2010101404,
1 4.0b42010081813,
20101016 53 27 4.0b8pre2010101604,
24 4.0b8pre2010101504, 1 4.0b62010091408,
1 4.0b42010081813,
Comment 4•14 years ago
|
||
Just hit that crash (bp-ceb8464f-13e2-43e7-8f9b-5b0c92101018 - stack identical to the one quoted by Scoobidiver) when testing a new Adblock Plus feature which isn't checked in yet. The crash happened when I hit a button that closes a XUL window and reloads a page in browser (http://www.heise.de/). However, doing exactly the same thing again no longer triggers the crash, so the issue is probably intermittent. Or it is related to one of the ads on that page which change on each reload.
Comment 5•14 years ago
|
||
Hit this after restoring my Mac from sleep and opening a URL from another application (in this case, IRC in a terminal).
Updated•14 years ago
|
blocking2.0: ? → beta7+
Assignee | ||
Comment 6•14 years ago
|
||
Here is the crash:
void
JSString::flatten()
{
JSString *topNode;
jschar *chars;
size_t capacity;
JS_ASSERT(isRope());
/*
* This can be called from any string in the rope, so first traverse to the
* top node.
*/
topNode = this;
while (topNode->isInteriorNode())
topNode = topNode->interiorNodeParent(); *** CRASH HERE
In the common case, it is an NPE on topNode. The assembly shows that it occurs after |interiorNodeParent| has been called at least once. This makes sense, because we would have crashed on |topNode->isInteriorNode| otherwise.
So, it appears at the point of the crash we have a string that is tagged as an INTERIOR_NODE, but has a NULL parent, which is presumably invalid. But it's hard to say whether the tag is wrong or the parent is wrong. If the tag is wrong, then the memory that stores the parent for interior nodes is also unioned with a 4-byte inline storage, a mutable flat string capacity, and a JSRopeBufferInfo for top nodes. Alan, do you know which of those, if any, are allowed to be NULL?
An easy first step is to add an assertion in this loop, and see if any of our existing test cases trip it.
Assignee | ||
Comment 7•14 years ago
|
||
(In reply to comment #6)
> An easy first step is to add an assertion in this loop, and see if any of our
> existing test cases trip it.
Of course, that makes no sense, because any such test case would already be hitting this crash.
Assignee | ||
Comment 8•14 years ago
|
||
In terms of history, there are 3 notable events with this crash:
0. There was bug 585309, which got fixed around 8/8/2010.
1. The crash started again on 9/23. The earliest builds with the 'new' crash are from 9/21 (20100921041551) and occurred on 9/24.
The last merge from TM before then was on 9/20:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e4fdb23efa33&tochange=e6cfaf0840cb
This happens to be the same time period as bug 601725, but I don't know that there is any relationship. Bug 601725 does relate to requests and/or threads, and if something of that kind went wrong in there, I could see that maybe it is corrupting string headers by sharing them when it shouldn't.
Another thing that happened in that merge is that we started JM-compiling some new ops that we hadn't before, so it's possible we exposed a bug somewhere by doing that. That doesn't seem likely, though.
2. The crash got much more frequent starting with 10/15 builds.
Compartments landed just before that, but the first m-c build with compartments was, I believe, on 10/14, so I think the big compartments landing was *not* the cause here.
But there were 4 compartments-related landings on m-c on 10/14, that would have first run in 10/15 builds:
(Don't try to waive Xray wrapper for primitives)
http://hg.mozilla.org/mozilla-central/rev/4666d67cba75
(Don't create Xray wrappers for chrome objects in sandboxes)
http://hg.mozilla.org/mozilla-central/rev/ed24c21e6497
(Some test changes)
http://hg.mozilla.org/mozilla-central/rev/ca9083cab8ef
http://hg.mozilla.org/mozilla-central/rev/83a06cce6b14
Of these, the first seems the most likely to me, since it involves primitives.
---
Given the clear spike in 10/15 builds, if code inspection/thinking don't provide a ready answer, it seems like backing out these patches one by one (modulo not introducing orange) and seeing if the numbers go back down is a reasonable next step.
Comment 9•14 years ago
|
||
(In reply to comment #6)
> So, it appears at the point of the crash we have a string that is tagged as an
> INTERIOR_NODE, but has a NULL parent, which is presumably invalid. But it's
> hard to say whether the tag is wrong or the parent is wrong. If the tag is
> wrong, then the memory that stores the parent for interior nodes is also
> unioned with a 4-byte inline storage, a mutable flat string capacity, and a
> JSRopeBufferInfo for top nodes. Alan, do you know which of those, if any, are
> allowed to be NULL?
-Inline storage is only valid in static strings and short strings, and is rarely NULL, if ever.
-Capacity is usually 0, and that field is always NULL in dependent strings, so it's possible that we have a flat or dependent string mistagged as an interior node.
-mBufferWithInfo is almost never NULL. However, it's NULL in js_ConcatStrings just before a top node is converted to an interior node, after we free the buffer that it points to.
A few other thoughts:
I noticed that the caller of flatten is often js_ConcatStrings on line 404, which is the call to FinishConcat. A bunch of things get inlined, but I don't see where flatten() would be called. In js_ConcatStrings, my code assumes that NewFinalizableGCThing will not flatten any existing strings (e.g. by calling chars() on them). If one of the arguments is flattened while calling NewFinalizableGCThing, then I think we would crash in flatten when we first because a top node has a NULL buffer (which doesn't agree with any of the crash line numbers, I think).
There seem to be a number of broken invariants with strings here. For example
http://crash-stats.mozilla.com/report/index/db1d4cc0-3766-430e-9a51-1e96b2101017
shows flatten calling itself, which should never happen. What's happening there is there's a node that has a top node as its left child, which should never happen.
The change http://hg.mozilla.org/mozilla-central/rev/4666d67cba75 seems like it could have an effect on this kind of thing, since JS_WrapValue flattens its argument if it's a string.
Comment 10•14 years ago
|
||
This is happening upon restarting from applying the latest nightly update, on Mac. 20101019.
Keywords: reproducible
Comment 11•14 years ago
|
||
Alan, can you explain a bit why the JS_WrapValue code flattening is a problem here?
Assignee | ||
Comment 12•14 years ago
|
||
From conversation with Andreas: the JS_WrapValue patch vastly increased the number of flatten operations that we do, so it makes sense that that patch would spike the frequency of this crash.
At this point we have:
- This started around the time as bug 601725.
- Invariants are being violated. (Interestingly, that is also happening in bug 601725.)
Adding more frequent assertions to check the invariants might help here.
Assignee | ||
Comment 13•14 years ago
|
||
Update on the history--I think comment 8 is invalid. For that comment, I was looking only at nightlies. For some reason we don't see this crash in nightlies before 9/24, but it was occurring in betas. In fact, it seems we had this crash in all betas that had ropes. So, there's a good chance this was a bug in the original code.
At this point, I think we should figure out again what invariant violations are occurring, then audit the code for them, and add assertions to catch this as early as possible. We probably want them to be assertions in opt builds as well so we can get data from Socorro.
Comment 14•14 years ago
|
||
looking at a summary of the top ten frames of all the stacks in the top 200 signatures these signatures all seem to run though a lot of the same code as the stack in comment 0
#9 JSString::flatten
#12 js_NewGCStringJSContext
#21 JSString::flatten
#22 js_ConcatStrings
#87 js_ConcatStringsJSContext,JSString,JSString
#135 js_NewGCShortStringJSContext
search for ConcatStrings in
http://people.mozilla.com/crash_stacks/stack-summary-4.0b8pre.txt
bugs like bug 605852 are appearing on these other signatures. should we dup against this one?
Assignee | ||
Comment 15•14 years ago
|
||
(In reply to comment #14)
> bugs like bug 605852 are appearing on these other signatures. should we dup
> against this one?
Seems OK to me--we can easily reopen them if they prove to be different bugs.
Comment 16•14 years ago
|
||
Bug 605852 is definitely a dup of
Bug 605836 - crash [@ js_NewGCShortString(JSContext*)
but comments in 605836 there suggest it may be related to
Bug 601725 - Firefox 4.0b7pre Crash Report [@ StartRequest ]
Maybe it adds do much complication to mix that chain of bugs in until we understand that one better.
Assignee | ||
Comment 17•14 years ago
|
||
Luke gave me the idea of looking more closely at the crash stacks. It turns out they cluster into 2 main kinds:
- type 1: In these, the next stack signature after |flatten| is |ConcatStrings:404|. This line of code calls FinishConcat. I'm not even sure how that calls flatten, so I need to look at minidumps more closely.
- type 2: In these, we crash on the lnes of ConcatStrings that call flatten directly:
if (left->isInteriorNode())
left->flatten();
if (right->isInteriorNode())
right->flatten();
The crash appears to be that in walking up to find the top node, we run into a null pointer first. This was the cause I noted earlier.
In crashes from the last few days, 2/3 (of a 30-crash sample I classified were of type 1, and 1/3 were of type 2. Before the 10/15 spike, 1/6 (of a 12-crash sample) were of type 1, and 5/6 were of type 2. So there's a good chance there are 2 separate bugs here, where presumably the type 1 crashes are the reason this blocks.
Assignee | ||
Updated•14 years ago
|
Assignee: general → dmandelin
Assignee | ||
Comment 18•14 years ago
|
||
It turns out the stack traces are bogus in the type 1 crashes. The minidumps in MSVC show that we are really calling flatten from the same lines of code as in type 2 crashes.
Build identifier: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101021 Firefox/4.0b8pre
Crashed as well. Flash had crashed, I dismissed the dialog. After browsing some more, the app ended up crashing as well.
http://crash-stats.mozilla.com/report/index/bp-f75000de-30bc-47a5-87d5-9dcba2101021
Assignee | ||
Comment 20•14 years ago
|
||
I want to land this temporarily on m-c to gather more data.
Assignee | ||
Updated•14 years ago
|
Attachment #485202 -
Flags: review?(dvander)
Assignee | ||
Updated•14 years ago
|
Attachment #485202 -
Flags: review?(dvander) → review?(lw)
Comment 21•14 years ago
|
||
Comment on attachment 485202 [details] [diff] [review]
Diagnostic patch
Excellent!
>+ // About to crash: record some black-box info.
>+ char blackbox[4 + sizeof(JSString) + 4];
>+ memcpy(blackbox, "AAAA", 4);
>+ memcpy(blackbox + 4 + sizeof(JSString), "AAAA", 4);
>+ memcpy(blackbox + 4, (char *) topNode, sizeof(JSString));
Since memcpy is known to the compiler, this whole sequence may get killed. 'volatile char blackbox' should do the trick although to be really really safe, you might additionally assign &blackbox[0] to a volatile global.
Attachment #485202 -
Flags: review?(lw) → review+
Assignee | ||
Comment 22•14 years ago
|
||
Comment on attachment 485202 [details] [diff] [review]
Diagnostic patch
Requesting approval for this diagnostic patch that will help us fix the b7 blocker.
Attachment #485202 -
Flags: approval2.0?
Updated•14 years ago
|
Attachment #485202 -
Flags: approval2.0? → approval2.0+
Assignee | ||
Comment 23•14 years ago
|
||
Crash reports with the diagnostic patch are starting to come in. So far, 3 out 4 crash at the same point with a crash address of 0x2. This suggests the bug is related to the traversal loop in flatten:
- Before the loop starts, the top node is converted to an interior node with a parent of 0x2 (with the diagnostic patch--it is NULL in the original code).
- The loop is supposed to walk over the tree postorder, left child first, converting each interior node to a flat dependent string.
- The loop finishes when it reaches a node with a parent of 0x2: this is supposed to be the original top node.
- After the loop finishes, the top node is converted to a flat mutable string.
So, how do we end up with interior nodes with a parent of 0x2 at the crash point? It seems like it can't happen. flatten() sets topNode's parent to 0x2. At the end, it sets topNode to a flat string, at a point that postdominates the point that sets 0x2.
- One theoretical possibility is that two threads call flatten on the same string simultaneously. Say T1 calls flatten, sets the 0x2 parent, then we switch to thread T2. T2 calls flatten on the same string and reaches the original top node while it still has a 0x2 parent, then we crash? Can this happen? I don't know, but it seems like the best bet at this point.
We could try to detect that by keeping a global variable that tells us if we are in flatten, and seeing if we have re-entered flatten when we crash.
I'll put recommendations in the next comment to set them off from these investigative notes.
Comment 24•14 years ago
|
||
Is it true that (almost) all crashes happen in chrome code? I saw only one stack with JaegerShot, and JM is disabled for chrome...
Assignee | ||
Comment 25•14 years ago
|
||
Recommendations:
1. Disable the ropes code for the b7 branch. It looks like a threading bug, so schedule risk is high. We can re-enable on trunk and continue investigating.
2. Next steps in investigation:
2.1. Try to show that we re-enter |flatten| when we crash. As noted above, we can keep a global "in-flatten" flag and assert if we are about to crash while re-entering. We could try to get more fine-grained by keeping track of which strings we are trying to flatten and asserting if it's the same one, but that is harder because in this context S1 and S2 are "the same" if they are members of the same rope.
2.2. Try preventing re-entrance of |flatten| with some locking. If that makes the crash go away, then we know the cause. This seems to give less information than 2.1, so it is probably worth trying only if it is easier or more reliable.
2.3. Add a |cx| parameter to |flatten|, then assert that we don't have a compartment mismatch in flatten. This seems to be a pretty direct way of detecting violations of our basic threading invariants. And if it does assert, then we know the exact point that is making a bad cross-compartment call. The disadvantage is that if we do this and don't get asserts, we still haven't ruled out all threading issues: we'd still have to do 2.1 or 2.2 for that.
2.4. The spike on 10/15 seems kind of interesting again. If we look at those changes, in particular the one that calls |flatten| more often when wrapping strings, we might be able to spot the issue.
I think 2.3 is the most promising right now. We know there are compartment mismatch bugs, and this could just be another one. Keep in mind that this crash predates compartments, so compartments probably didn't cause the bug; the issue would be more that compartments didn't stop it.
Assignee | ||
Comment 26•14 years ago
|
||
(In reply to comment #24)
> Is it true that (almost) all crashes happen in chrome code? I saw only one
> stack with JaegerShot, and JM is disabled for chrome...
Hey, great observation! I think that supports the idea that these crashes are threading-related, since content isn't supposed to use multithreading for the most part. It also means we could disable the ropes optimization only for contexts that are chrome (or that have methodjit off), which would hopefully nearly eliminate the crash while also not costing us any perf in content.
Comment 27•14 years ago
|
||
Thats a very useful observation Jan. Awesome. Dave, we should look at extensions. Browser chrome doesn't multithread either.
Comment 28•14 years ago
|
||
Is JM enabled for workers?
Comment 29•14 years ago
|
||
Looks to me like JM is enabled for workers:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#1241
and workers use nsIScriptContext's only mplementation, nsJSContext. Bent can confirm.
The string code grew up immutable, so safe cross-thread. I think we need to restore that even if it costs us on the thread-crossing boundaries (should have zero cost on same-thread-only code and data).
/be
Comment 30•14 years ago
|
||
Compartment transitions flatten and copy strings. We are thread-safe by design. This is a bug.
(In reply to comment #29)
> Looks to me like JM is enabled for workers:
No. Workers have their own contexts and the relevant options are here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/threads/nsDOMThreadService.cpp#1031
JM is not enabled for workers yet, though I'd love to try it out.
Comment 32•14 years ago
|
||
Alright, I am pretty sure this is a workers-related bug then. Lets whip up a stress test that sends strings back and forth maybe.
Hm, then maybe related to the new structured clone stuff.
We serialize here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/threads/nsDOMWorker.cpp#1544
and deserialize here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/threads/nsDOMWorkerEvents.cpp#303
Comment 34•14 years ago
|
||
The structured cloning stuff decomposes the string into a stream, so that seems fine. Bent, any other way a worker can touch a JS string?
Well, we do in a few places here:
http://mxr.mozilla.org/mozilla-central/search?string=nsdependentjsstring&find=dom%2Fsrc%2Fthreads&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
... but it doesn't look like we hang on to them or anything, we just convert to xpcom strings.
The only other place where we use JSStrings is in setTimeout calls where a string expression is passed, here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/threads/nsDOMWorkerTimeout.cpp#155
The strings are then evaluated here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/threads/nsDOMWorkerTimeout.cpp#196
Oh, and those setTimeout strings may not be saved and evaluated on the same thread, but they're guaranteed to not run on two threads simultaneously.
Comment 37•14 years ago
|
||
We are flattening a string on 2 threads simultaneously, so this must be some sort of user data that leaks across. Ropes tend to be generated by user code, not within gecko.
Comment 38•14 years ago
|
||
ProxyAutoConfig was on a thread at one time...
Comment 39•14 years ago
|
||
(In reply to comment #25)
> Recommendations:
>
> 1. Disable the ropes code for the b7 branch. It looks like a threading bug, so
> schedule risk is high. We can re-enable on trunk and continue investigating.
If you do this, you shouldn't just revert to the pre-ropes string concat code, since that isn't compatible with our current design of strings (in particular, it reallocs buffers of flat strings, which isn't allowed anymore because dependent strings now keep a direct pointer to their chars).
The right way to disable ropes, I think, is to just call flatten at the end of js_ConcatStrings (in particular, at the end of FinishConcat). That way, even though we make and flatten ropes during js_ConcatStrings, ropes are always trivial, and the outside world never sees them. The mutable/flatCapacity optimization in js_ConcatStrings makes it so we don't get quadratic append time when concatenating to the end of a string. On my machine, making this change causes about a 5% slowdown on sunspider.
>
> 2.3. Add a |cx| parameter to |flatten|, then assert that we don't have a
> compartment mismatch in flatten. This seems to be a pretty direct way of
> detecting violations of our basic threading invariants. And if it does assert,
> then we know the exact point that is making a bad cross-compartment call. The
> disadvantage is that if we do this and don't get asserts, we still haven't
> ruled out all threading issues: we'd still have to do 2.1 or 2.2 for that.
>
Adding a cx parameter is probably hard, since JSString::chars doesn't take a cx, and chars is used all over the place (this is roughly the same reason that we eagerly allocate rope buffers instead of allocating them during flatten). Maybe a more reasonable approach would be to pass a cx from the most common callers and pass NULL everywhere else (and don't do the assert if cx is NULL).
Comment 40•14 years ago
|
||
(In reply to comment #38)
> ProxyAutoConfig was on a thread at one time...
At least in my case the crash happened without proxy autoconfig.
Comment 41•14 years ago
|
||
I disagree with disabling ropes. The bug here is that non-immutable strings are shared between threads. We happen to see ropes fail over harder than other forms of mutable strings, but disabling ropes is the wrong fix. We have to find the cross thread string leak. We have a test case that shows a crash with workers while they are concat-ing strings. Lets chase that one down with replay on Monday.
Comment 42•14 years ago
|
||
Andreas meant bug 606705 by "We have a test case ...." in comment 41.
/be
Comment 43•14 years ago
|
||
Dave, I only just noticed that the crash-if-not-on-the-string's-compartment's-thread check is only in js_ConcatStrings. Perhaps you could push it down into flatten()?
Assignee | ||
Comment 44•14 years ago
|
||
Over the weekend, almost all the crashes had an address of 0x2, so the analysis above has a good foundation. We should back out the diagnostic patch now. Do I need to get approval on that? Is that going to need 'a=' in the checkin comment?
Assignee | ||
Comment 45•14 years ago
|
||
(In reply to comment #41)
> I disagree with disabling ropes. The bug here is that non-immutable strings are
> shared between threads. We happen to see ropes fail over harder than other
> forms of mutable strings, but disabling ropes is the wrong fix. We have to find
> the cross thread string leak. We have a test case that shows a crash with
> workers while they are concat-ing strings. Lets chase that one down with replay
> on Monday.
Andreas, do you think:
(a) We should fix this bug eventually, but it is OK to switch off ropes for now to get this blocker out of the way, or
(b) The underlying thread string leak is severe enough that fixing that bug should block b7?
Assignee | ||
Comment 46•14 years ago
|
||
This patch should stop most of the immediate crashes. But Andreas wants to find the cross-thread string leakage first, so not requesting review for now.
Comment 47•14 years ago
|
||
Just following along, but I agree with Andreas: cross-thread string leakage is bad even without ropes, due to the mutable string optimization (which allows a mutable single-threaded string to become a dependent string pointing into a larger string concat result string).
/be
Assignee | ||
Comment 48•14 years ago
|
||
Regarding recent discussion of next steps:
- If we want to fix this bug fast (i.e., in order to get beta 7 out the door), and we want to fix the underlying bug in threads/compartments, then I am the wrong assignee. I know little about compartments, so I'd have to spend a bunch of time just learning how things work.
- If we don't care to fix this fast, then let's just make it block beta 8 instead.
- If we don't care to fix the underlying bug immediately, then I think we can make the crash largely go away by restricting ropes to the main content thread.
Guidance from release drivers would be greatly appreciated.
Assignee | ||
Updated•14 years ago
|
Attachment #485846 -
Flags: review?(lw)
Assignee | ||
Updated•14 years ago
|
Attachment #485846 -
Flags: approval2.0?
Updated•14 years ago
|
Attachment #485846 -
Flags: review?(lw) → review+
Comment 49•14 years ago
|
||
B7 blockers doesn't need approval.
Assignee | ||
Updated•14 years ago
|
Attachment #485846 -
Flags: approval2.0?
Assignee | ||
Comment 50•14 years ago
|
||
Restrict patch landed:
http://hg.mozilla.org/mozilla-central/rev/8dd996c91f40
- We'll have to check the crash stats for the next day or two to see if this did it.
- If this works, once b7 goes out, if we haven't figured out the underlying cause yet, we can enable ropes for content or DOM workers selectively to see if it is crashing in one of those areas or both.
Assignee | ||
Comment 51•14 years ago
|
||
So far, it doesn't look like the test patch worked. It's hard to say why. I'm pretty sure this method of 'disabling' ropes doesn't actually entirely prevent rope-related multithreading bugs, because two threads could concatenate strings S1 and S2 simultaneously, and then they could try to flatten their trees simultaneously, both of which contain both S1 and S2.
Not sure what to do next. We could try to step on races harder, by really making it not access two mutable string data structures simultaneously. Or we could try to track down the leakage. Neither path necessarily gets us to a fix very quickly.
Comment 52•14 years ago
|
||
(In reply to comment #51)
> We could try to step on races harder, by really
> making it not access two mutable string data structures simultaneously.
What about adding asserts to rope manipulation code that the current thread matches the compartment thread? One do not need cx in string methods for that. It would be sufficient to add JSThread into the compartment and then check in ropes or any other string mutation code that str->compartment()->thread->is == GetCurrentThreadId.
Assignee | ||
Comment 53•14 years ago
|
||
(In reply to comment #52)
> (In reply to comment #51)
> > We could try to step on races harder, by really
> > making it not access two mutable string data structures simultaneously.
>
> What about adding asserts to rope manipulation code that the current thread
> matches the compartment thread? One do not need cx in string methods for that.
> It would be sufficient to add JSThread into the compartment and then check in
> ropes or any other string mutation code that str->compartment()->thread->is ==
> GetCurrentThreadId.
That sounds like a good idea. But how does one go from a compartment to its thread? I don't see the name 'thread' anywhere in jscompartment.h. Also, what exactly would the thread of a compartment mean? I thought different threads could enter the same compartment, just not at the same time.
Comment 54•14 years ago
|
||
There is no notion of a compartment thread at the moment since threads enter compartments in an unbalanced manner due to cx->globalObject and SetFrameRegs.
Assignee | ||
Comment 55•14 years ago
|
||
I looked at the first 13 crashes that came in from the latest build. 5 of them are still crashing walking up the tree, mostly on address 0, but one on 0x1f. 5 more crash in the traversal loop, where str becomes 0x1ff or something bad like that. There's no way to tell which branch of the switch did it, so it could be just another case where we see a bad parent pointer, or maybe not.
Assignee | ||
Comment 56•14 years ago
|
||
We are in a tough spot here. We have pretty good evidence that the crash is caused by two threads calling |flatten| simultaneously, but that's all we know so far.
Here's what we think is happening. |str->flatten()| can be abstracted to this:
1. enter the function
2. walk to the root of str's concat tree
3. break invariants of the concat tree
4. do stuff
5. restore invariants of the concat tree
If we crash in thread T1, it could happen because of a schedule like this:
T1 T2
enter
walk
break
enter
walk
XXX walk crashes because invariants are broken
Note that the diagnostic patch checked compartments of the arguments to js_ConcatStrings. That is, it would have crashed with a specific address if the compartment of either string argument did not match the compartment of the cx. No such crashes were observed, which tells us that js_ConcatStrings, at least, is usually being used correctly wrt compartments. It seems that some other thread, one that doesn't run scripts, is making the mismatched-compartment call.
We might like to learn more about this by setting things up so that we crash if a second thread enters |flatten| for a given concat tree while a thread is already in |flatten|. But this seems like it probably wouldn't tell us much that we don't already know. The reason is that the second thread is already crashing, so we wouldn't get any new stacks. And it's really the first thread that we probably want to know about, because our crash stacks have a lot of js_ConcatStrings, which appears to be using compartments correctly, as explained in the previous paragraph. So all it would really do is help confirm that we are crashing because of races. But we have other ways of doing that that may be easier.
Comment 57•14 years ago
|
||
(In reply to comment #56)
> Note that the diagnostic patch checked compartments of the arguments to
> js_ConcatStrings. That is, it would have crashed with a specific address if the
> compartment of either string argument did not match the compartment of the cx.
> No such crashes were observed, which tells us that js_ConcatStrings, at least,
> is usually being used correctly wrt compartments.
I don't think this is true. From the recent nightly crashes in js_ConcatStrings:
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-26%2014%3A00%3A00&signature=js_ConcatStrings&version=Firefox%3A4.0b8pre
it looks like when the diagnostic patch was there, there were a number of crashes at addresses 0xd0 and 0xd4, which were the two addresses used for the compartment checks.
Assignee | ||
Comment 58•14 years ago
|
||
(In reply to comment #57)
> (In reply to comment #56)
> > Note that the diagnostic patch checked compartments of the arguments to
> > js_ConcatStrings. That is, it would have crashed with a specific address if the
> > compartment of either string argument did not match the compartment of the cx.
> > No such crashes were observed, which tells us that js_ConcatStrings, at least,
> > is usually being used correctly wrt compartments.
>
> I don't think this is true. From the recent nightly crashes in
> js_ConcatStrings:
>
> http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-26%2014%3A00%3A00&signature=js_ConcatStrings&version=Firefox%3A4.0b8pre
>
> it looks like when the diagnostic patch was there, there were a number of
> crashes at addresses 0xd0 and 0xd4, which were the two addresses used for the
> compartment checks.
That helps a lot! I forgot that we need to look with the other signature to find those. I looked at a bunch of them and they are pretty much just normal calls to stubs::Add, which makes me think there's just random crap that leaks strings across compartments, which is about what we expected. (We did also wonder about other threads just calling stuff, like DOM workers or minor browser threads.)
Here's the correlation report for js_ConcatStrings:
78% (25/32) vs. 27% (103/379) firebug@software.joehewitt.com (Firebug, https://addons.mozilla.org/addon/1843)
44% (14/32) vs. 6% (24/379) jsonview@brh.numbera.com (JSONView, https://addons.mozilla.org/addon/10869)
34% (11/32) vs. 5% (18/379) firediff@johnjbarton.com (Firediff, https://addons.mozilla.org/addon/13179)
34% (11/32) vs. 9% (34/379) {c45c406e-ab73-11d8-be73-000a95be3b12} (Web Developer, https://addons.mozilla.org/addon/60)
44% (14/32) vs. 19% (71/379) {3d7eb24f-2740-49df-8937-200b1cc08f8a} (Flashblock, https://addons.mozilla.org/addon/433)
31% (10/32) vs. 9% (33/379) {e4a8a97b-f2ed-450b-b12d-ee082ba24781} (Greasemonkey, https://addons.mozilla.org/addon/748)
25% (8/32) vs. 3% (13/379) netexport@getfirebug.com
25% (8/32) vs. 3% (13/379) eventbug@getfirebug.com
25% (8/32) vs. 3% (13/379) selectbug@getfirebug.com
25% (8/32) vs. 3% (13/379) firestarter@getfirebug.com
25% (8/32) vs. 4% (14/379) facebookBlocker@webgraph.com
25% (8/32) vs. 4% (15/379) rapportive@rapportive.com
25% (8/32) vs. 4% (16/379) {6AC85730-7D0F-4de0-B3FA-21142DD85326} (ColorZilla, https://addons.mozilla.org/addon/271)
25% (8/32) vs. 4% (16/379) foxmarks@kei.com (Xmarks (formerly Foxmarks), https://addons.mozilla.org/addon/2410)
25% (8/32) vs. 4% (17/379) {c151d79e-e61b-4a90-a887-5a46d38fba99} (Pearl Crescent Page Saver Basic, https://addons.mozilla.org/addon/10367)
25% (8/32) vs. 4% (17/379) firecookie@janodvarko.cz (Firecookie, https://addons.mozilla.org/addon/6683)
50% (16/32) vs. 30% (114/379) compatibility@addons.mozilla.org
22% (7/32) vs. 3% (12/379) forcetls@sid.stamm (Force-TLS, https://addons.mozilla.org/addon/12714)
25% (8/32) vs. 7% (28/379) {46551EC9-40F0-4e47-8E18-8E5CF550CFB8} (Stylish, https://addons.mozilla.org/addon/2108)
25% (8/32) vs. 9% (34/379) {DDC359D1-844A-42a7-9AA1-88A850A938A8} (DownThemAll!, https://addons.mozilla.org/addon/201)
16% (5/32) vs. 3% (10/379) chromebug@johnjbarton.com
13% (4/32) vs. 3% (11/379) grafxbot@mozilla.org
9% (3/32) vs. 1% (5/379) speak.words@agadak.net
9% (3/32) vs. 2% (7/379) es-es@dictionaries.addons.mozilla.org (Spanish (Spain) Dictionary, https://addons.mozilla.org/addon/3554)
9% (3/32) vs. 2% (9/379) sendtophone@martinezdelizarrondo.com
50% (16/32) vs. 44% (166/379) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
9% (3/32) vs. 3% (13/379) yslow@yahoo-inc.com (YSlow, https://addons.mozilla.org/addon/5369)
9% (3/32) vs. 4% (14/379) en-US@dictionaries.addons.mozilla.org (United States English Dictionary, https://addons.mozilla.org/addon/3497)
It looks very possible that there are a variety of extensions, principally developer extensions like Firebug, that are leaking strings across compartments. I do see a few of these that have no extensions installed, but most of them have Firebug, and several others are common, so there's a good chance extensions are driving the volume up here.
Assignee | ||
Comment 59•14 years ago
|
||
So, what can we try next? I see a few possibilities:
1. Try to confirm that we are crashing because of races.
One way would be to put a mutex around flatten, so 2 threads can never enter it
simultaneously. This actually isn't quite right, because you are allowed to
call flatten only on a rope node, and flatten makes the nodes not be rope
nodes. So the mutex would have to be acquired before deciding to call flatten,
and released at the end.
Another way would be to put a count of how many threads are currently
flattening a given rope top node at a time in the string header. Then, if we
are about to crash, we can encode that value into the address. If it is usually
greater than 1, then we can conclude races cause the crash.
What would this buy us? If it says we are crashing due to races, not much, just
confirmation of our guess. If is says we are not crashing due to races, then we
need to start over.
2. Try to find the first thread that calls flatten.
This is thread we don't get to see now. Bill and I had an idea how to do this.
Strings would carry an 'in-flatten' flag, and a 'crash-when-done-with-flatten'
flag. If a call enters flatten and the in-flatten flag is set, it would set the
crash flag and then wait for 10 seconds. When exiting flatten, crash if the
crash flag is set. This would give us a chance to see the call stack that
presumably is not supposed to be able to call flatten.
It's also possible that that first thread is showing in dumps we have already
received. I checked a few and it wasn't there. I'm not sure if we should expect
to see it--I don't know what happens with thread scheduling between a crash and
crashreport generation.
What would this buy us? It might point us directly at the answer, if the call
stack was obviously calling from the wrong place, or doing some JSAPI from the
wrong thread. Conversely, it might tell us very little, if both call stacks
look fine, and the real problem is that a string was leaked to the wrong
thread/compartment earlier in time. But maybe we could add a compartment check
near the bad call into flatten, and then try to work back from there.
3. Try to find the cross-compartment call.
We could add a cx argument to flatten, push that back through the APIs, and
then do a compartment check in flatten. It might touch many lines of code,
because chars() also has no cx and calls flatten, and is called in many places.
Also, there might be public APIs that transitively call flatten and don't take
a cx. For those, we'd need to create a new version that does take a cx, and
then use that in the browser. We'd have to provide some other check for
extensions, maybe just check whether they are racing using one of the other
techniques.
What would this buy us? Potentially, a lot. There are a lot of compartment
checks in the code now, and they seem pretty productive for bugfinding (of
course, in those cases there are STR so it's easier). Still, it seems like it
could take us a lot closer to the problem if they hit. If we added these checks
and they didn't hit, we'd have to rethink.
4. Try to find the "wrong thread" call.
It seems that the problem is that thread T2 calls flatten when it really
shouldn't. So, can we check in flatten that we are on the right thread? It
seems hard--it is perfectly OK for different threads to call into the same
apartment at different times. I'm not sure there is any notion of "correct
thread" that we can use here.
--
It seems best to start with #1 today, to find out if threads actually is the
right idea at all. After that, #2 seems better, because although #3 seems more
direct, #2 seems much easier to implement.
A crazy option would be to just make flatten thread-safe. We could use some
kind of thin lock on the root node of the rope, and then make any second
thread that enters wait, and then return once the first thread has returned.
I'm not sure what the perf effects would be.
Assignee | ||
Comment 60•14 years ago
|
||
See also comments 11-14 in bug 604818, which may be related. For at least one user, the problem went away with Firebug, consistent with my comment 58 here on Firebug and such possibly causing a lot of this crash.
Andreas says Firebug doesn't create threads, so it shouldn't cause races. But we know it does cause compartment mismatches, at least, and it seems to be associated with crashes in flatten, js_ConcatStrings, and Decompile, which are all string-related crashes. I'm not sure what to make of that.
I think I'll still try the mutex, just to see if it can tell us anything.
But I'm starting to think that a lot of this is JSD-related and that grinding away much harder on this before getting the known JSD problems taken care of may be a waste of time.
Comment 61•14 years ago
|
||
Most of the jsd-related compartment mismatches should be fixed as of yesterday. Maybe we should resample over night with your patch?
Assignee | ||
Comment 62•14 years ago
|
||
(In reply to comment #61)
> Most of the jsd-related compartment mismatches should be fixed as of yesterday.
> Maybe we should resample over night with your patch?
We could do that, but what information do you expect to get from it? Fixing the JSD compartment mismatches should bring down the total crash count, which seems like it may have been happening. Is there something else we can learn from that?
Assignee | ||
Comment 63•14 years ago
|
||
This will test the hypothesis that concurrent calls to flatten cause the crash. It doesn't seem to affect SunSpider all that much either--maybe 20ms more than the last nightly I tested, but at least some of that is from PGO, so it seems pretty low impact.
Assignee | ||
Updated•14 years ago
|
Attachment #486253 -
Flags: review?(gal)
Assignee | ||
Updated•14 years ago
|
Attachment #486253 -
Flags: review?(gal) → review?(dvander)
Comment on attachment 486253 [details] [diff] [review]
Diagnostic patch to serialize flatten
r=me with UNLOCK_RUNTIME before the return
Attachment #486253 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 65•14 years ago
|
||
Attachment #486253 -
Attachment is obsolete: true
Attachment #486254 -
Flags: review?(dvander)
Updated•14 years ago
|
Attachment #486254 -
Flags: review?(dvander) → review+
Comment 66•14 years ago
|
||
Comment on attachment 486254 [details] [diff] [review]
v2 serializing diagnostic
Looks good.
Attachment #486254 -
Flags: review?
Updated•14 years ago
|
Attachment #486254 -
Flags: review? → review+
Comment 67•14 years ago
|
||
I hit this crash just now, and I don't have Firebug installed, FWIW:
http://crash-stats.mozilla.com/report/index/bp-09af5fd4-398a-42df-a3bb-d4ac62101027
about the only interesting extension I have installed is Flashblock.
Comment 68•14 years ago
|
||
(In reply to comment #67)
> I hit this crash just now, and I don't have Firebug installed, FWIW:
> http://crash-stats.mozilla.com/report/index/bp-09af5fd4-398a-42df-a3bb-d4ac62101027
>
> about the only interesting extension I have installed is Flashblock.
Hmm, I'll run with Flashblock in a debug build. I bet I'll hit a compartments mismatch.
Reading
http://www.mozdev.org/source/browse/flashblock/source/content/flashblock/flashblock.xml?annotate=1.8.2.47#14
I see this:
function nativeMethod(untrustedObject, methodName)
and your stack has a call to apply on it.
Comment 69•14 years ago
|
||
Oh ho, I had forgotten all about that nonsense. Since the Flashblock XBL binding loads in web content, it uses that silliness to ensure that webpages can't kill DOM methods that it needs to function.
Comment 70•14 years ago
|
||
I just tried Flashblock with a TM tip debug build and didn't have any problems. But it looks like Ted's build is from 10-25.
Comment 71•14 years ago
|
||
(In reply to comment #54)
> There is no notion of a compartment thread at the moment since threads enter
> compartments in an unbalanced manner due to cx->globalObject and SetFrameRegs.
For diagnostic purposes this should not be a problem. So my suggestion is to record the last thread that entered the compartment and then assert on that thread in string manipulation code. The false positives are not bad in this case as they may also give some insight into the problem.
Assignee | ||
Comment 72•14 years ago
|
||
Well, it looks like it's stopped crashing: there have been reports of this crash coming in all morning, but zero from the 10/27 build. Unfortunately, there seems to have been no Windows build this morning, so we don't have data there yet, but it looks like it has stopped on Mac/Linux.
I tried SunSpider on an m-c tinderbox build, and my score was 232.5, which seems to show that there was no perf impact from the latest patch. (My last run, with a TM build of a few days ago, got 241. Chrome 8 was 218, so we are within 7% now on trunk even on my machine.) So, we could probably keep this as a mitigation patch if we wanted.
Currently contemplating next steps.
Comment 73•14 years ago
|
||
I hit a slightly different manifestation of this today:
http://crash-stats.mozilla.com/report/index/15cfddcb-c9d9-46a6-8eba-5b4152101028
[@ RtlpWaitOnCriticalSection | RtlpDeCommitFreeBlock | PR_Lock ]
The source line in frame 4 looks wrong, but it looks like we're calling PR_Lock with a NULL PRLock.
Comment 74•14 years ago
|
||
Thanks Ted. This is pretty interesting. We inserted a lock into trunk so exclude mutual access. You crashing with a NULL lock there looks really weird. Memory corruption? Bad string pointer? (we find the lock by finding the compartment for the string).
Comment 75•14 years ago
|
||
philor's all-seeing-eye points out a crash @ JSString::flatten from yesterday:
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288241702.1288242307.16454.gz#err2
which occurred in the middle of a Sync test. I should note that I do use Sync, so it's plausible that these crashes are occurring during a background sync.
Comment 76•14 years ago
|
||
The profile that crashed for me has Sync set up as well.
Assignee | ||
Comment 77•14 years ago
|
||
The volume is way down with the mutex in place, but we still get a few. The most common one of what's left crashes at jsstr.cpp:136 [1] (reported as line 144 in both Socorro and VS, but it's really 136) getting the capacity of a rope top node's buffer. It is an NPE, so somehow we can get a rope top node with no buffer.
That should be filed as a separate bug, though, if we want to study it, but I don't recommend doing that until the main one is solved. I'm wondering if it could be caused by a race in some other string-related function, because the mutex only stops races in flatten.
[1] https://crash-stats.mozilla.com/report/index/a2bf9fa1-ab1f-4fa3-ba8f-c8a262101027
Comment 78•14 years ago
|
||
(In reply to comment #76)
> The profile that crashed for me has Sync set up as well.
I think Sync is a red herring. We're seeing this in pretty high numbers, higher than I believe is representative of sync users within our nightly audience.
Comment 79•14 years ago
|
||
Thanks beltzner. I also can't find any multithreading in sync.
Comment 80•14 years ago
|
||
philiKON: When did off-thread sync crypto land? The crypto stuff has plenty of string passing through ctypes.
Comment 81•14 years ago
|
||
Well, there's http://mxr.mozilla.org/mozilla-central/source/services/crypto/modules/threaded.js, which hit m-c at the time this spiked...
Comment 82•14 years ago
|
||
This sends a function made on one thread to another thread, which is completely unsafe :-/ (and got even unsafer when jorendorff turned off multithreaded objects).
93 this.cbThread.dispatch(new Runner(this.callback, this.thisObj,
94 returnval, error),
95 Ci.nsIThread.DISPATCH_NORMAL);
Comment 83•14 years ago
|
||
By "red herring" I meant "almost assuredly the actual culprit", obviously. Those idioms are right next to each other.
/me hides in shame
Comment 84•14 years ago
|
||
(In reply to comment #82)
> This sends a function made on one thread to another thread, which is completely
> unsafe :-/ (and got even unsafer when jorendorff turned off multithreaded
> objects).
>
> 93 this.cbThread.dispatch(new Runner(this.callback, this.thisObj,
> 94 returnval, error),
> 95 Ci.nsIThread.DISPATCH_NORMAL);
Not only did we seem to support this, it looks just like "good JS". If it were in a single-threaded-by-design framework, e.g. nodejs, it would be fine. But it is not fine.
Advising people not to do this risks whispering into the wind. We need stronger checks and migration help, or MT wrappers (bug 566951).
/be
Comment 85•14 years ago
|
||
(In reply to comment #83)
> By "red herring" I meant "almost assuredly the actual culprit", obviously.
> Those idioms are right next to each other.
>
> /me hides in shame
No worries, it is not easy to call these by sparse sampling of crashes and add-on correlations.
/be
Comment 86•14 years ago
|
||
I filed a bug to fix the crypto stuff. I will file a second bug to stop this at the API level. That will hopefully reveal what we have to fix and stop the crashes.
Comment 87•14 years ago
|
||
(In reply to comment #86)
> I filed a bug to fix the crypto stuff. I will file a second bug to stop this at
> the API level. That will hopefully reveal what we have to fix and stop the
> crashes.
It will only reveal what we can dynamically cover with testing. That's not enough for our addons and others' addons. It is definitely not enough for XULRunner apps (all the dark matter; Le Monde's daily production system? dunno, don't want to have to worry).
So, bug 566951.
/be
Comment 88•14 years ago
|
||
I didn't say its enough. Its a step up from "crash randomly in far away places in the JS engine in rope code".
Comment 89•14 years ago
|
||
We've backed out off-thread crypto so hopefully we should see the number of these crashes go down starting with tomorrow's nightlies.
Comment 90•14 years ago
|
||
Thanks Philipp.
Assignee | ||
Comment 91•14 years ago
|
||
Backed out the serialization patch so we can see more clearly the effect of backing out background crypto:
http://hg.mozilla.org/mozilla-central/rev/16888138cd07
Assignee | ||
Comment 92•14 years ago
|
||
Looking good so far. We got several crash reports this morning from builds before the post-thread crypto backout, but none from after.
Comment 93•14 years ago
|
||
This is awesome. I will dup this bug against 608142, which disallows sending closures across threads.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Updated•14 years ago
|
Crash Signature: [@ JSString::flatten() ]
[@ JSString::flatten ]
You need to log in
before you can comment on or make changes to this bug.
Description
•