Closed Bug 1158558 Opened 10 years ago Closed 10 years ago

crashes while browsing due to Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.)

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
firefox41 --- fixed
firefox-esr38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: dbaron, Assigned: mccr8)

References

Details

(Keywords: assertion, crash)

Attachments

(4 files, 2 obsolete files)

In the past 2 days I've crashed twice with this assertion: Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.), at mozilla-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1022 in my 64-bit Linux debug build of mozilla-central (plus my own patches). This seems to be a recent regression.
Flags: needinfo?(continuation)
Keywords: assertion, crash
Yeah, this looks like bug 1054648 which has been happening on tree herder for a while, though it kicked up in frequency recently. I haven't been able to figure out what is going wrong.
Flags: needinfo?(continuation)
See Also: → 1054648
See Also: 1054648
I'll use this as the bug, given that the intermittent is too noisy. In certain circumstances, we need to tidy up various gray bits when we start the GC. We don't want to mess with mark bits while we're in the middle of a GC, so we assert in that case. We actually try to maintain a stronger invariant, that we never run the CC at all in the middle of an IGC, but there's no actual assertion for it. To try to enforce this, we call FinishAnyIncrementalGC() before we start a CC. So, either this is failing to actually finish the GC in some case, or something else that runs after the start of the CC begins a new CC. I'll add some more asserts to get more information about when the invariant is failing to hold, though that may just cause more orange.
Assignee: nobody → continuation
Blocks: 1054648
A try run with assertions at the end of FinishAnyIncrementalGC() didn't trigger, so we must be restarting a GC inside the CC, I guess?
My current theory is that we end up running JS inside BeginCollection() itself, which ends up starting a GC, which attempts to finish off the current CC (this won't do anything, because we're currently in the CC and we have reentrance guards), then we come back to BeginCollection and continue, and hit the assertion. We haven't really done anything yet at this point in the CC, so if that's accurate, the best course of action is to check for this failure condition explicitly (we probably want to move it after the call to mListener->Begin() as that could also trigger a GC) before we call FixGrayBits(), and then somehow just bail out of the current CC.
Blocks: 1159066
I did a try run with additional assertions. It looks like that at least mJSRuntime->BeginCycleCollectionCallback() can begin an IGC. I'll think about what the fix should be. https://treeherder.mozilla.org/#/jobs?repo=try&revision=c332887e6b7d
Oh, another thing I found is that when we start a CC from shutdown, there can be an IGC in progress. I'm not sure why that doesn't hit the existing assertion.
A data point. On a local PC, I performedwas running C-C Thunderbird |make mozmill| test suite by running TB binary under valgrind/memcheck to weed out any remaining memory issues. Running TB under valgrind/memcheck slows down memory intensive operation so much, it can uncover hidden assumptions about ordering of events as such, whichs need to be enfornced by hihg-level locking, etc. I think I found a bug of this nature, and found that it has already been reported. C-C TB running under valgrind/memcheck crashed in one instance (of about 1100 tests). Obviously this is hightly timing dependent. Thrown assertion is: Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.), at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1031 Looking at the stack in the backtrace left in the log (quoted below), I think one of the functions below ought to realize that the PREVIOUS(?) incremental GC is not finished (the next incremental GC is invoked by timer event, but due to the slowdown by valgrind/memcheck, I think the previous incremental GC was not finished.). nsCycleCollector::FixGrayBits(bool) (nsCycleCollector.cpp:3496) nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3771) nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3604) nsCycleCollector_collectSlice(js::SliceBudget&, bool) (nsCycleCollector.cpp:4105) nsJSContext::RunCycleCollectorSlice() (nsJSEnvironment.cpp:1558) CCTimerFired(nsITimer*, void*) (nsJSEnvironment.cpp:1894) If the previous incremental GC has not finished, we should not invoke another one, I suspect. Excerpt from the log: [...] TEST-START | /REF-COMM-CENTRAL/comm-central/mail/test/mozmill/quick-filter-bar/test-display-issues.js | teardownModule [23571] WARNING: NS_ENSURE_SUCCESS(EnsureScriptEnvironment(), nullptr) failed with result 0x80040111: file /REF-COMM-CENTRAL/comm-central/mozilla/docshell/base/nsDocShell.cpp, line 4595 [23571] WARNING: No inner window available!: file /REF-COMM-CENTRAL/comm-central/mozilla/dom/base/nsGlobalWindow.cpp, line 9779 [23571] WARNING: No inner window available!: file /REF-COMM-CENTRAL/comm-central/mozilla/dom/base/nsGlobalWindow.cpp, line 9779 Assertion failure: !JS::IsIncrementalGCInProgress(mJSRuntime) (Don't call FixWeakMappingGrayBits during a GC.), at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1031 #01: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xf99e3e] #02: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xf9b75a] #03: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa379c] #04: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa4c8c] #05: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0xfa5b45] #06: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x2154b53] #07: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x2155b14] #08: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x10250c3] #09: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1026ee8] #10: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x101fbab] #11: NS_InvokeByIndex[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1036492] #12: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1a25120] #13: ???[/REF-OBJ-DIR/objdir-tb3/dist/bin/libxul.so +0x1a299fd] #14: ??? (???:???) ==23571== Thread 1: ==23571== Invalid write of size 4 ==23571== at 0x721FE3E: mozilla::CycleCollectedJSRuntime::FixWeakMappingGrayBits() const (CycleCollectedJSRuntime.cpp:1030) ==23571== by 0x7221759: nsCycleCollector::FixGrayBits(bool) (nsCycleCollector.cpp:3496) ==23571== by 0x722979B: nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3771) ==23571== by 0x722AC8B: nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3604) ==23571== by 0x722BB44: nsCycleCollector_collectSlice(js::SliceBudget&, bool) (nsCycleCollector.cpp:4105) ==23571== by 0x83DAB52: nsJSContext::RunCycleCollectorSlice() (nsJSEnvironment.cpp:1558) ==23571== by 0x83DBB13: CCTimerFired(nsITimer*, void*) (nsJSEnvironment.cpp:1894) ==23571== by 0x72AB0C2: nsTimerImpl::Fire() (nsTimerImpl.cpp:630) ==23571== by 0x72ACEE7: nsTimerEvent::Run() (nsTimerImpl.cpp:723) ==23571== by 0x72A5BAA: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:868) ==23571== by 0x72BC491: NS_InvokeByIndex (xptcinvoke_x86_64_unix.cpp:176) ==23571== by 0x7CAB11F: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:2080) ==23571== by 0x7CAF9FC: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:1141) ==23571== by 0x2865A5AE: ??? ==23571== by 0x3AD1CF2F: ??? ==23571== by 0x4204D20: ??? ==23571== by 0xAF36997: EnterBaseline(JSContext*, js::jit::EnterJitData&) (BaselineJIT.cpp:125) ==23571== by 0xAF4990B: js::jit::EnterBaselineMethod(JSContext*, js::RunState&) (BaselineJIT.cpp:156) ==23571== by 0xAD64845: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:667) ==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746) ==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783) ==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77) ==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289) ==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391) ==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697) ==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235) ==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720) ==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955) ==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677) ==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746) ==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783) ==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77) ==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289) ==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391) ==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697) ==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235) ==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720) ==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955) ==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677) ==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746) ==23571== by 0xAD667C7: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:783) ==23571== by 0xB30DE3F: js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (DirectProxyHandler.cpp:77) ==23571== by 0xB316F89: js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (CrossCompartmentWrapper.cpp:289) ==23571== by 0xB321ABE: js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (Proxy.cpp:391) ==23571== by 0xB321BDB: js::proxy_Call(JSContext*, unsigned int, JS::Value*) (Proxy.cpp:697) ==23571== by 0xAD78D4D: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:235) ==23571== by 0xAD64F27: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:720) ==23571== by 0xAD5EFA8: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2955) ==23571== by 0xAD645EA: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:677) ==23571== by 0xAD64D03: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:746) ==23571== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==23571== [...] TIA
Blocks: 1060875
Blocks: 1160685
Blocks: 1161689
Blocks: 1162763
(In reply to ISHIKAWA, Chiaki from comment #8) > I think one of the functions below ought to realize that the PREVIOUS(?) > incremental GC is not finished (the next incremental GC is invoked by timer event, but > due to the slowdown by valgrind/memcheck, I think the previous incremental GC was not > finished.). Before we go into the cycle collector (CC), we finish off any existing incremental garbage collection (GC). The problem is that there are a few places we can end up starting a new GC while we're in the middle of running the CC.
Attached patch Test. (obsolete) — Splinter Review
This test triggers the assertion, by adding an observer for cycle-collector-begin that starts an incremental GC, then starting a CC.
Attached patch Test.Splinter Review
Attachment #8603477 - Attachment is obsolete: true
I filed bug 1163663 for additional checks that we don't run the GC during CC.
Blocks: 1164002
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6e75af4955d Sorry for dumping all of these CC reviews on you. :)
Attachment #8605860 - Flags: review?(bugs)
Various parts of the first half of BeginCollection() can start an incremental GC. This is bad because running the GC and CC at the same time can cause the CC to end up with pointers to dead JS objects. To avoid this, we finish any incremental GC in progress in BeginCollection. This is slow, but hopefully it is rare.
Attachment #8605861 - Flags: review?(bugs)
Attachment #8605861 - Attachment is obsolete: true
Attachment #8605861 - Flags: review?(bugs)
I just forgot to remove the old version of an assertion when I did clean up.
Attachment #8605869 - Flags: review?(bugs)
(In reply to Andrew McCreight [:mccr8] from comment #13) > Sorry for dumping all of these CC reviews on you. :) It is always fun to look at cycle collector code.
Attachment #8605860 - Flags: review?(bugs) → review+
Attachment #8605869 - Flags: review?(bugs) → review+
Some time early next week I should backport this. RyanVM said Aurora, esr38 and b2g37 would be good to have this on. The latter two because they are going to be long-lived. The risk of this patch is mostly in causing performance problems, but I feel like if we actually were getting into that situation in the wild we may crash so this patch is likely an improvement.
Flags: needinfo?(continuation)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
I don't really want to back port this until I have some telemetry data, but apparently that's broken right now.
Depends on: 1156857
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)
Comment on attachment 8605869 [details] [diff] [review] part 2 - Finish incremental GCs in progress in BeginCollection() and ShutdownCollect(). This approval is for both parts. When uplifting, please uplift the version that landed on m-c, not the attached version. I was hoping to get some telemetry to verify it wasn't causing any regressions, but telemetry is apparently broken. [Approval Request Comment] Bug caused by (feature/regressing bug #): the underlying issue is old, but something more recent has started causing this to happen more in test automation. User impact if declined: possible crashes. I'm mostly nominating this for uplift to fix all of the intermittent test failures at the request of RyanVM (see the many bugs blocked by this one). Testing completed: This code runs frequently. Risk to taking this patch (and alternatives if risky): The main risk is some kind of minor performance problem, but it is probably better than the status quo of possible memory corruption in that situation. String or UUID changes made by this patch: none [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: this fixes a source of frequent intermittent failures on TreeHerder (see the many blocking bugs), and RyanVM requested I uplift this for a long-living branch.
Flags: needinfo?(continuation)
Attachment #8605869 - Flags: approval-mozilla-esr38?
Attachment #8605869 - Flags: approval-mozilla-b2g37?
Attachment #8605869 - Flags: approval-mozilla-aurora?
Andrew, if you request the uplift to esr38, I guess 39 (aka beta) is affected too. Correct? I can approve it in aurora but I would like to wait for telemetry to be back before accepting it to 38.
Attachment #8605869 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Sylvestre Ledru [:sylvestre] from comment #22) > Andrew, if you request the uplift to esr38, I guess 39 (aka beta) is > affected too. Correct? > I can approve it in aurora but I would like to wait for telemetry to be back > before accepting it to 38. Yes, 39 is affected, but I don't think it is a big enough issue to necessarily be worth backporting. I requested esr38 because it will be such a long living branch and there's bunch of related orange bugs.
Attachment #8605869 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
This failure has been happening on all desktop platforms since this landed on b2g37: https://treeherder.mozilla.org/logviewer.html#?job_id=132242&repo=mozilla-b2g37_v2_2
Flags: needinfo?(ryanvm)
Flags: needinfo?(continuation)
The problem is here: 14:05:23 INFO - JavaScript error: http://mochi.test:8888/tests/js/xpconnect/tests/mochitest/test_bug1158558.html, line 33: TypeError: SpecialPowers.Cu.getJSTestingFunctions(...).startgc is not a function I'm guessing the way I'm triggering startgc is not valid in b2g37. You can just disable the test for now and I'll try to figure that out.
Flags: needinfo?(ryanvm)
Flags: needinfo?(continuation)
Yeah, I'm not sure off hand of any other way to trigger an IGC from JS, so I think I'll just leave it disabled. We may get some recurrence of the intermittent orange on b2g37 if somehow it starts failing there.
Comment on attachment 8605869 [details] [diff] [review] part 2 - Finish incremental GCs in progress in BeginCollection() and ShutdownCollect(). Taking it in esr 38 to avoid the orange.
Attachment #8605869 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: