Last ditch GCs on the spidermonkey side are too aggressive

RESOLVED FIXED in Firefox 68

Status

()

enhancement
P3
normal
RESOLVED FIXED
8 months ago
2 months ago

People

(Reporter: pbone, Assigned: jonco)

Tracking

unspecified
mozilla68
Points:
---

Firefox Tracking Flags

(firefox68 fixed)

Details

Attachments

(1 attachment)

Reporter

Description

8 months ago
In the DOM when we get a memory pressure event we attempt a last ditch GC, and then if the pressure is ongoing we don't do it again. The reasoning is if the user is low on memory we'd only lag/jank Firefox/their system and probably end up crashing anyway.

On the SpiderMonkey side this is not the case, but probably should be :bz has a Firefox process stuck in a loop attempting last ditch GCs 

10:30 < bz_away> One is  via the array allocation
10:30 < bz_away> Which does JSObject* 
                 js::gc::GCRuntime::tryNewTenuredThing<JSObject, 
                 (js::AllowGC)1>(JSContext*, js::gc::AllocKind, unsigned long)
10:30 < bz_away> Which does the GCRuntime::collect() call

and

10:47 < bz> pbone: In particular just the  nsObserverService::NotifyObservers 
            garbage-collection-statistics bit

We should probably avoid two last ditch GCs in a row (and also within a certain time window).  It may be a better user experience just to fail the allocation and let the calling JS (in this case browser chrome) deal with it.  In this case bz's allocation sites seem to be in browser chrome JS, but the memory leak could be elsewhere.
Assignee

Updated

6 months ago
Priority: -- → P3
Assignee

Updated

3 months ago
Assignee: nobody → jcoppeard
Assignee

Comment 1

3 months ago

I considered putting a check in for the amount of heap freed by the last ditch GC, but I found out that we can also do this for chunk allocation failure (actual OOM) as well as hitting the heap size limit, so I went with the simpler option.

Depends on D25636

Comment 2

3 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0199e2beb6f9
Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink
Assignee

Comment 4

3 months ago

(In reply to Noemi Erli[:noemi_erli] from comment #3)

This is complaining that lastDitchGC can GC:

Function '_ZN2js2gc9GCRuntime18tryNewTenuredThingI17JSFatInlineStringLNS_7AllowGCE1EEEPT_P9JSContextNS0_9AllocKindEm$JSFatInlineString* js::gc::GCRuntime::tryNewTenuredThing(JSContext*, uint8, uint64) [with T = JSFatInlineString; js::AllowGC allowGC = (js::AllowGC)1u; size_t = long unsigned int]' has unrooted 't' of type 'JSFatInlineString*' live across GC call '_ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)' at js/src/gc/Allocator.cpp:276
    Allocator.cpp:273: Call(5,6, t := refillFreeListFromAnyThread(cx*,kind*))
    Allocator.cpp:275: Call(6,7, __temp_3 := __builtin_expect(null(t*),0))
    Allocator.cpp:275: Assume(7,8, (__temp_3* != 0), true)
    Allocator.cpp:276: Call(8,9, __temp_6 := cx*.runtime())
    Allocator.cpp:276: Call(9,10, __temp_5 := __temp_6*.gc.lastDitchGC(cx*)) [[GC call]]
    Allocator.cpp:276: Assume(10,12, __temp_5*, false)
    Allocator.cpp:276: Assign(12,13, __temp_4 := 0)
    Allocator.cpp:276: Assume(13,15, __temp_4*, false)
    Allocator.cpp:279: Assume(15,16, null(t*), true)
GC Function: _ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)
    void js::gc::GCRuntime::gc(uint32, int32)
    void js::gc::GCRuntime::collect(uint8, js::SliceBudget, int32)
    GC

That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.

Steve, any idea how to proceed here?

Flags: needinfo?(jcoppeard) → needinfo?(sphink)

This is subtle.

That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard

It doesn't take the null-ness into account. What it sees is:

if (MOZ_UNLIKELY(!t)) {
  if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  }
  if (!t) { ... }
}

MOZ_UNLIKELY(!t) just establishes that t is live at that point. Then it does a lastDitchGC, invalidating t -- which wouldn't be a problem if the old t were never used, but it is in the if (!t). (Note that there is only a hazard path when allowGC is true and lastDitchGC fails. If it succeeded, then t would be overwritten and the fact that it got invalidated first wouldn't matter.)

and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.

Previously, the code was

if (MOZ_UNLIKELY(!t && allowGC)) {
  if (!cx->helperThread()) {
    lastDitchGC(); // well, the equivalent
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  }
  if (!t) { ...report OOM... }

Here, lastDitchGC() again invalidates t, but its invalid value is unconditionally overwritten with tryNewTenuredThing() so there's no problem.

Steve, any idea how to proceed here?

The analysis fix would be to allow "invalidated" pointers to be checked for null. (Hm... I wonder how far that could be loosened up? Clearly dereferencing a pointer is bad, and passing them to a function is bad. Comparing the pointer value is ok. Ugh... but hashing the pointer value is bad, and that would be hard to distinguish. So never mind.)

The code fix would be to prevent the old invalidated t from being used after lastDitchGC is called. Some ways to do it:

if (MOZ_UNLIKELY(!t)) {
  if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  } else {
    t = nullptr;
  }
  if (!t) { ...report OOM... }
}

or closer to the original:

if (MOZ_UNLIKELY(!t) && allowGC) {
  if (cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  } else {
    t = nullptr;
  }
  if (!t) { ...report OOM... }
}

or whatever. I guess the root of the issue here is the new boolean return value of lastDitchGC.

Flags: needinfo?(sphink)

Sadly, the analysis fix is not as easy as I thought -- for some reason, the test if (!t) is producing a CFG that looks like if (*t == nullptr). Or maybe sixgill is fabricating the dereference, though this is reminiscent of the const Value& case where gcc was just lying, so this might be similarly problematic. At any rate, it requires diving into sixgill again, so it's not the 5-minute fix I was hoping it could be.

Assignee

Comment 7

3 months ago

(In reply to Steve Fink [:sfink] [:s:] from comment #5)

Ah, it's about the fact that the old value is still used, and the analysis doesn't know that it's always null in this case. Thanks for the explanation.

Don't worry about fixing the analysis, I'll rearrange the code so that the old value is no longer live.

Comment 8

3 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c6317f06ed07
Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink

Comment 9

3 months ago
bugherder
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Regressions: 1545369
You need to log in before you can comment on or make changes to this bug.