Closed
Bug 1158558
Opened 10 years ago
Closed 10 years ago
crashes while browsing due to Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.)
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
People
(Reporter: dbaron, Assigned: mccr8)
References
Details
(Keywords: assertion, crash)
Attachments
(4 files, 2 obsolete files)
5.04 KB,
text/plain
|
Details | |
2.35 KB,
patch
|
Details | Diff | Splinter Review | |
3.49 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
6.89 KB,
patch
|
smaug
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-esr38+
jocheng
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
In the past 2 days I've crashed twice with this assertion:
Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call
FixWeakMappingGrayBits during a GC.), at
mozilla-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1022
in my 64-bit Linux debug build of mozilla-central (plus my own patches).
This seems to be a recent regression.
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Comment 1•10 years ago
|
||
Assignee | ||
Comment 2•10 years ago
|
||
Yeah, this looks like bug 1054648 which has been happening on tree herder for a while, though it kicked up in frequency recently. I haven't been able to figure out what is going wrong.
Flags: needinfo?(continuation)
See Also: → 1054648
Assignee | ||
Comment 3•10 years ago
|
||
I'll use this as the bug, given that the intermittent is too noisy.
In certain circumstances, we need to tidy up various gray bits when we start the GC. We don't want to mess with mark bits while we're in the middle of a GC, so we assert in that case.
We actually try to maintain a stronger invariant, that we never run the CC at all in the middle of an IGC, but there's no actual assertion for it. To try to enforce this, we call FinishAnyIncrementalGC() before we start a CC. So, either this is failing to actually finish the GC in some case, or something else that runs after the start of the CC begins a new CC. I'll add some more asserts to get more information about when the invariant is failing to hold, though that may just cause more orange.
Assignee: nobody → continuation
Blocks: 1054648
Assignee | ||
Comment 4•10 years ago
|
||
A try run with assertions at the end of FinishAnyIncrementalGC() didn't trigger, so we must be restarting a GC inside the CC, I guess?
Assignee | ||
Comment 5•10 years ago
|
||
My current theory is that we end up running JS inside BeginCollection() itself, which ends up starting a GC, which attempts to finish off the current CC (this won't do anything, because we're currently in the CC and we have reentrance guards), then we come back to BeginCollection and continue, and hit the assertion.
We haven't really done anything yet at this point in the CC, so if that's accurate, the best course of action is to check for this failure condition explicitly (we probably want to move it after the call to mListener->Begin() as that could also trigger a GC) before we call FixGrayBits(), and then somehow just bail out of the current CC.
Assignee | ||
Comment 6•10 years ago
|
||
I did a try run with additional assertions. It looks like that at least mJSRuntime->BeginCycleCollectionCallback() can begin an IGC. I'll think about what the fix should be.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c332887e6b7d
Assignee | ||
Comment 7•10 years ago
|
||
Oh, another thing I found is that when we start a CC from shutdown, there can be an IGC in progress. I'm not sure why that doesn't hit the existing assertion.
A data point.
On a local PC, I performedwas running C-C Thunderbird |make mozmill|
test suite by running TB binary under valgrind/memcheck to weed out
any remaining memory issues.
Running TB under valgrind/memcheck slows down memory intensive
operation so much, it can uncover hidden assumptions about ordering of
events as such, whichs need to be enfornced by hihg-level locking, etc.
I think I found a bug of this nature, and found that it has already been reported.
C-C TB running under valgrind/memcheck crashed in one instance (of about 1100 tests).
Obviously this is hightly timing dependent.
Thrown assertion is:
Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.), at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1031
Looking at the stack in the backtrace left in the log (quoted below),
I think one of the functions below ought to realize that the PREVIOUS(?) incremental
GC is not finished (the next incremental GC is invoked by timer event, but due to the
slowdown by valgrind/memcheck, I think the previous incremental GC was not finished.).
nsCycleCollector::FixGrayBits(bool) (nsCycleCollector.cpp:3496)
nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3771)
nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3604)
nsCycleCollector_collectSlice(js::SliceBudget&, bool) (nsCycleCollector.cpp:4105)
nsJSContext::RunCycleCollectorSlice() (nsJSEnvironment.cpp:1558)
CCTimerFired(nsITimer*, void*) (nsJSEnvironment.cpp:1894)
If the previous incremental GC has not finished, we should not invoke another one, I suspect.
Excerpt from the log:
[...]
TEST-START | /REF-COMM-CENTRAL/comm-central/mail/test/mozmill/quick-filter-bar/test-display-issues.js | teardownModule
[23571] WARNING: NS_ENSURE_SUCCESS(EnsureScriptEnvironment(), nullptr) failed with result 0x80040111: file /REF-COMM-CENTRAL/comm-central/mozilla/docshell/base/nsDocShell.cpp, line 4595
[23571] WARNING: No inner window available!: file /REF-COMM-CENTRAL/comm-central/mozilla/dom/base/nsGlobalWindow.cpp, line 9779
[23571] WARNING: No inner window available!: file /REF-COMM-CENTRAL/comm-central/mozilla/dom/base/nsGlobalWindow.cpp, line 9779
Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.), at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1031
#01: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xf99e3e]
#02: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xf9b75a]
#03: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa379c]
#04: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa4c8c]
#05: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa5b45]
#06: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x2154b53]
#07: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x2155b14]
#08: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x10250c3]
#09: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1026ee8]
#10: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x101fbab]
#11: NS_InvokeByIndex[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1036492]
#12: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1a25120]
#13: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1a299fd]
#14: ??? (???:???)
==23571== Thread 1:
==23571== Invalid write of size 4
==23571== at 0x721FE3E: mozilla::CycleCollectedJSRuntime::FixWeakMappingGrayBits() const (CycleCollectedJSRuntime.cpp:1030)
==23571== by 0x7221759: nsCycleCollector::FixGrayBits(bool) (nsCycleCollector.cpp:3496)
==23571== by 0x722979B: nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3771)
==23571== by 0x722AC8B: nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3604)
==23571== by 0x722BB44: nsCycleCollector_collectSlice(js::SliceBudget&, bool) (nsCycleCollector.cpp:4105)
==23571== by 0x83DAB52: nsJSContext::RunCycleCollectorSlice() (nsJSEnvironment.cpp:1558)
==23571== by 0x83DBB13: CCTimerFired(nsITimer*, void*) (nsJSEnvironment.cpp:1894)
==23571== by 0x72AB0C2: nsTimerImpl::Fire() (nsTimerImpl.cpp:630)
==23571== by 0x72ACEE7: nsTimerEvent::Run() (nsTimerImpl.cpp:723)
==23571== by 0x72A5BAA: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:868)
==23571== by 0x72BC491: NS_InvokeByIndex (xptcinvoke_x86_64_unix.cpp:176)
==23571== by 0x7CAB11F: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:2080)
==23571== by 0x7CAF9FC: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:1141)
==23571== by 0x2865A5AE: ???
==23571== by 0x3AD1CF2F: ???
==23571== by 0x4204D20: ???
==23571== by 0xAF36997: EnterBaseline(JSContext*, js::jit::EnterJitData&) (BaselineJIT.cpp:125)
==23571== by 0xAF4990B: js::jit::EnterBaselineMethod(JSContext*, js::RunState&) (BaselineJIT.cpp:156)
==23571== by 0xAD64845: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:667)
==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746)
==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783)
==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77)
==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289)
==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391)
==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697)
==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235)
==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720)
==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955)
==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677)
==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746)
==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783)
==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77)
==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289)
==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391)
==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697)
==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235)
==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720)
==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955)
==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677)
==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746)
==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783)
==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77)
==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289)
==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391)
==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697)
==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235)
==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720)
==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955)
==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677)
==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746)
==23571== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==23571==
[...]
TIA
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #8)
> I think one of the functions below ought to realize that the PREVIOUS(?)
> incremental GC is not finished (the next incremental GC is invoked by timer event, but
> due to the slowdown by valgrind/memcheck, I think the previous incremental GC was not
> finished.).
Before we go into the cycle collector (CC), we finish off any existing incremental garbage collection (GC). The problem is that there are a few places we can end up starting a new GC while we're in the middle of running the CC.
Assignee | ||
Comment 10•10 years ago
|
||
This test triggers the assertion, by adding an observer for cycle-collector-begin that starts an incremental GC, then starting a CC.
Assignee | ||
Comment 11•10 years ago
|
||
Attachment #8603477 -
Attachment is obsolete: true
Assignee | ||
Comment 12•10 years ago
|
||
I filed bug 1163663 for additional checks that we don't run the GC during CC.
Assignee | ||
Comment 13•10 years ago
|
||
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6e75af4955d
Sorry for dumping all of these CC reviews on you. :)
Attachment #8605860 -
Flags: review?(bugs)
Assignee | ||
Comment 14•10 years ago
|
||
Various parts of the first half of BeginCollection() can start an incremental GC.
This is bad because running the GC and CC at the same time can cause the CC to end
up with pointers to dead JS objects.
To avoid this, we finish any incremental GC in progress in BeginCollection. This
is slow, but hopefully it is rare.
Attachment #8605861 -
Flags: review?(bugs)
Assignee | ||
Updated•10 years ago
|
Attachment #8605861 -
Attachment is obsolete: true
Attachment #8605861 -
Flags: review?(bugs)
Assignee | ||
Comment 15•10 years ago
|
||
I just forgot to remove the old version of an assertion when I did clean up.
Attachment #8605869 -
Flags: review?(bugs)
Comment 16•10 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #13)
> Sorry for dumping all of these CC reviews on you. :)
It is always fun to look at cycle collector code.
Updated•10 years ago
|
Attachment #8605860 -
Flags: review?(bugs) → review+
Updated•10 years ago
|
Attachment #8605869 -
Flags: review?(bugs) → review+
Comment 17•10 years ago
|
||
Assignee | ||
Comment 18•10 years ago
|
||
Some time early next week I should backport this. RyanVM said Aurora, esr38 and b2g37 would be good to have this on. The latter two because they are going to be long-lived. The risk of this patch is mostly in causing performance problems, but I feel like if we actually were getting into that situation in the wild we may crash so this patch is likely an improvement.
Flags: needinfo?(continuation)
Comment 19•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/985c6db923c3
https://hg.mozilla.org/mozilla-central/rev/bed961b49b25
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Assignee | ||
Comment 20•10 years ago
|
||
I don't really want to back port this until I have some telemetry data, but apparently that's broken right now.
Depends on: 1156857
Flags: needinfo?(continuation)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(continuation)
Assignee | ||
Comment 21•10 years ago
|
||
Comment on attachment 8605869 [details] [diff] [review]
part 2 - Finish incremental GCs in progress in BeginCollection() and ShutdownCollect().
This approval is for both parts. When uplifting, please uplift the version that landed on m-c, not the attached version. I was hoping to get some telemetry to verify it wasn't causing any regressions, but telemetry is apparently broken.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): the underlying issue is old, but something more recent has started causing this to happen more in test automation.
User impact if declined: possible crashes. I'm mostly nominating this for uplift to fix all of the intermittent test failures at the request of RyanVM (see the many bugs blocked by this one).
Testing completed: This code runs frequently.
Risk to taking this patch (and alternatives if risky): The main risk is some kind of minor performance problem, but it is probably better than the status quo of possible memory corruption in that situation.
String or UUID changes made by this patch: none
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: this fixes a source of frequent intermittent failures on TreeHerder (see the many blocking bugs), and RyanVM requested I uplift this for a long-living branch.
Flags: needinfo?(continuation)
Attachment #8605869 -
Flags: approval-mozilla-esr38?
Attachment #8605869 -
Flags: approval-mozilla-b2g37?
Attachment #8605869 -
Flags: approval-mozilla-aurora?
Comment 22•10 years ago
|
||
Andrew, if you request the uplift to esr38, I guess 39 (aka beta) is affected too. Correct?
I can approve it in aurora but I would like to wait for telemetry to be back before accepting it to 38.
status-firefox38:
--- → wontfix
status-firefox38.0.5:
--- → wontfix
status-firefox39:
--- → affected
status-firefox-esr38:
--- → affected
Updated•10 years ago
|
Attachment #8605869 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #22)
> Andrew, if you request the uplift to esr38, I guess 39 (aka beta) is
> affected too. Correct?
> I can approve it in aurora but I would like to wait for telemetry to be back
> before accepting it to 38.
Yes, 39 is affected, but I don't think it is a big enough issue to necessarily be worth backporting. I requested esr38 because it will be such a long living branch and there's bunch of related orange bugs.
Comment 24•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/daf478785c62
https://hg.mozilla.org/releases/mozilla-aurora/rev/feaee5986ce1
Flags: in-testsuite+
Updated•10 years ago
|
Attachment #8605869 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated•10 years ago
|
Comment 25•10 years ago
|
||
This failure has been happening on all desktop platforms since this landed on b2g37:
https://treeherder.mozilla.org/logviewer.html#?job_id=132242&repo=mozilla-b2g37_v2_2
Flags: needinfo?(ryanvm)
Flags: needinfo?(continuation)
Assignee | ||
Comment 27•10 years ago
|
||
The problem is here:
14:05:23 INFO - JavaScript error: http://mochi.test:8888/tests/js/xpconnect/tests/mochitest/test_bug1158558.html, line 33: TypeError: SpecialPowers.Cu.getJSTestingFunctions(...).startgc is not a function
I'm guessing the way I'm triggering startgc is not valid in b2g37. You can just disable the test for now and I'll try to figure that out.
Flags: needinfo?(ryanvm)
Flags: needinfo?(continuation)
Comment 28•10 years ago
|
||
Looks like startgc() is supported by 38+.
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/663bf4703588
Assignee | ||
Comment 29•10 years ago
|
||
Yeah, I'm not sure off hand of any other way to trigger an IGC from JS, so I think I'll just leave it disabled. We may get some recurrence of the intermittent orange on b2g37 if somehow it starts failing there.
Comment 30•9 years ago
|
||
Comment on attachment 8605869 [details] [diff] [review]
part 2 - Finish incremental GCs in progress in BeginCollection() and ShutdownCollect().
Taking it in esr 38 to avoid the orange.
Attachment #8605869 -
Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Comment 31•9 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•