Last ditch GCs on the spidermonkey side are too aggressive
Categories
(Core :: JavaScript: GC, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox68 | --- | fixed |
People
(Reporter: pbone, Assigned: jonco)
References
Details
Attachments
(1 file)
In the DOM when we get a memory pressure event we attempt a last ditch GC, and then if the pressure is ongoing we don't do it again. The reasoning is if the user is low on memory we'd only lag/jank Firefox/their system and probably end up crashing anyway. On the SpiderMonkey side this is not the case, but probably should be :bz has a Firefox process stuck in a loop attempting last ditch GCs 10:30 < bz_away> One is via the array allocation 10:30 < bz_away> Which does JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)1>(JSContext*, js::gc::AllocKind, unsigned long) 10:30 < bz_away> Which does the GCRuntime::collect() call and 10:47 < bz> pbone: In particular just the nsObserverService::NotifyObservers garbage-collection-statistics bit We should probably avoid two last ditch GCs in a row (and also within a certain time window). It may be a better user experience just to fail the allocation and let the calling JS (in this case browser chrome) deal with it. In this case bz's allocation sites seem to be in browser chrome JS, but the memory leak could be elsewhere.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
I considered putting a check in for the amount of heap freed by the last ditch GC, but I found out that we can also do this for chunk allocation failure (actual OOM) as well as hitting the heap size limit, so I went with the simpler option.
Depends on D25636
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0199e2beb6f9 Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink
Comment 3•5 years ago
|
||
Backed out 2 changesets (Bug 1505622, Bug 1540719) for spidermonkey bustages in Allocator.cpp
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239002452&repo=mozilla-inbound&lineNumber=5890
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d3896c97b076b5784b4f3831aee796a5b1e311f
Assignee | ||
Comment 4•5 years ago
•
|
||
(In reply to Noemi Erli[:noemi_erli] from comment #3)
This is complaining that lastDitchGC can GC:
Function '_ZN2js2gc9GCRuntime18tryNewTenuredThingI17JSFatInlineStringLNS_7AllowGCE1EEEPT_P9JSContextNS0_9AllocKindEm$JSFatInlineString* js::gc::GCRuntime::tryNewTenuredThing(JSContext*, uint8, uint64) [with T = JSFatInlineString; js::AllowGC allowGC = (js::AllowGC)1u; size_t = long unsigned int]' has unrooted 't' of type 'JSFatInlineString*' live across GC call '_ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)' at js/src/gc/Allocator.cpp:276
Allocator.cpp:273: Call(5,6, t := refillFreeListFromAnyThread(cx*,kind*))
Allocator.cpp:275: Call(6,7, __temp_3 := __builtin_expect(null(t*),0))
Allocator.cpp:275: Assume(7,8, (__temp_3* != 0), true)
Allocator.cpp:276: Call(8,9, __temp_6 := cx*.runtime())
Allocator.cpp:276: Call(9,10, __temp_5 := __temp_6*.gc.lastDitchGC(cx*)) [[GC call]]
Allocator.cpp:276: Assume(10,12, __temp_5*, false)
Allocator.cpp:276: Assign(12,13, __temp_4 := 0)
Allocator.cpp:276: Assume(13,15, __temp_4*, false)
Allocator.cpp:279: Assume(15,16, null(t*), true)
GC Function: _ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)
void js::gc::GCRuntime::gc(uint32, int32)
void js::gc::GCRuntime::collect(uint8, js::SliceBudget, int32)
GC
That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.
Steve, any idea how to proceed here?
Comment 5•5 years ago
•
|
||
This is subtle.
That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard
It doesn't take the null-ness into account. What it sees is:
if (MOZ_UNLIKELY(!t)) {
if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
}
if (!t) { ... }
}
MOZ_UNLIKELY(!t) just establishes that t is live at that point. Then it does a lastDitchGC, invalidating t -- which wouldn't be a problem if the old t were never used, but it is in the if (!t). (Note that there is only a hazard path when allowGC is true and lastDitchGC fails. If it succeeded, then t would be overwritten and the fact that it got invalidated first wouldn't matter.)
and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.
Previously, the code was
if (MOZ_UNLIKELY(!t && allowGC)) {
if (!cx->helperThread()) {
lastDitchGC(); // well, the equivalent
t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
}
if (!t) { ...report OOM... }
Here, lastDitchGC() again invalidates t, but its invalid value is unconditionally overwritten with tryNewTenuredThing() so there's no problem.
Steve, any idea how to proceed here?
The analysis fix would be to allow "invalidated" pointers to be checked for null. (Hm... I wonder how far that could be loosened up? Clearly dereferencing a pointer is bad, and passing them to a function is bad. Comparing the pointer value is ok. Ugh... but hashing the pointer value is bad, and that would be hard to distinguish. So never mind.)
The code fix would be to prevent the old invalidated t from being used after lastDitchGC is called. Some ways to do it:
if (MOZ_UNLIKELY(!t)) {
if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
} else {
t = nullptr;
}
if (!t) { ...report OOM... }
}
or closer to the original:
if (MOZ_UNLIKELY(!t) && allowGC) {
if (cx->runtime()->gc.lastDitchGC(cx)) {
t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
} else {
t = nullptr;
}
if (!t) { ...report OOM... }
}
or whatever. I guess the root of the issue here is the new boolean return value of lastDitchGC.
Comment 6•5 years ago
|
||
Sadly, the analysis fix is not as easy as I thought -- for some reason, the test if (!t)
is producing a CFG that looks like if (*t == nullptr)
. Or maybe sixgill is fabricating the dereference, though this is reminiscent of the const Value& case where gcc was just lying, so this might be similarly problematic. At any rate, it requires diving into sixgill again, so it's not the 5-minute fix I was hoping it could be.
Assignee | ||
Comment 7•5 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
Ah, it's about the fact that the old value is still used, and the analysis doesn't know that it's always null in this case. Thanks for the explanation.
Don't worry about fixing the analysis, I'll rearrange the code so that the old value is no longer live.
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c6317f06ed07 Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink
Comment 9•5 years ago
|
||
bugherder |
Description
•