Closed Bug 1505622 Opened 6 years ago Closed 6 years ago

Last ditch GCs on the spidermonkey side are too aggressive

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla68

Tracking Flags:

Tracking

Status

firefox68

---

fixed

People

(Reporter: pbone, Assigned: jonco)

References

Details

Attachments

(1 file)

Bug 1505622 - Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute 6 years ago Jon Coppeard (:jonco) 47 bytes, text/x-phabricator-request		Details \| Review

Paul Bone [:pbone]

Reporter

Description

•

6 years ago

In the DOM when we get a memory pressure event we attempt a last ditch GC, and then if the pressure is ongoing we don't do it again. The reasoning is if the user is low on memory we'd only lag/jank Firefox/their system and probably end up crashing anyway. On the SpiderMonkey side this is not the case, but probably should be :bz has a Firefox process stuck in a loop attempting last ditch GCs 10:30 < bz_away> One is via the array allocation 10:30 < bz_away> Which does JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)1>(JSContext*, js::gc::AllocKind, unsigned long) 10:30 < bz_away> Which does the GCRuntime::collect() call and 10:47 < bz> pbone: In particular just the nsObserverService::NotifyObservers garbage-collection-statistics bit We should probably avoid two last ditch GCs in a row (and also within a certain time window). It may be a better user experience just to fail the allocation and let the calling JS (in this case browser chrome) deal with it. In this case bz's allocation sites seem to be in browser chrome JS, but the memory leak could be elsewhere.

Jon Coppeard (:jonco)

Assignee

Updated

•

6 years ago

Priority: -- → P3

Jon Coppeard (:jonco)

Assignee

Updated

•

6 years ago

Assignee: nobody → jcoppeard

Jon Coppeard (:jonco)

Assignee

Comment 1

•

6 years ago

Attached file Bug 1505622 - Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute — Details

I considered putting a check in for the amount of heap freed by the last ditch GC, but I found out that we can also do this for chunk allocation failure (actual OOM) as well as hitting the heap size limit, so I went with the simpler option.

Depends on D25636

Pulsebot

Comment 2

•

6 years ago

Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0199e2beb6f9 Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink

Noemi Erli[:noemi_erli]

Comment 3

•

6 years ago

Backed out 2 changesets (Bug 1505622, Bug 1540719) for spidermonkey bustages in Allocator.cpp

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=0199e2beb6f94dda62383ff2f7b0d401b148d16a&selectedJob=239002452

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239002452&repo=mozilla-inbound&lineNumber=5890

Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d3896c97b076b5784b4f3831aee796a5b1e311f

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Assignee

Comment 4

•

6 years ago

•

Edited

(In reply to Noemi Erli[:noemi_erli] from comment #3)

This is complaining that lastDitchGC can GC:

Function '_ZN2js2gc9GCRuntime18tryNewTenuredThingI17JSFatInlineStringLNS_7AllowGCE1EEEPT_P9JSContextNS0_9AllocKindEm$JSFatInlineString* js::gc::GCRuntime::tryNewTenuredThing(JSContext*, uint8, uint64) [with T = JSFatInlineString; js::AllowGC allowGC = (js::AllowGC)1u; size_t = long unsigned int]' has unrooted 't' of type 'JSFatInlineString*' live across GC call '_ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)' at js/src/gc/Allocator.cpp:276
    Allocator.cpp:273: Call(5,6, t := refillFreeListFromAnyThread(cx*,kind*))
    Allocator.cpp:275: Call(6,7, __temp_3 := __builtin_expect(null(t*),0))
    Allocator.cpp:275: Assume(7,8, (__temp_3* != 0), true)
    Allocator.cpp:276: Call(8,9, __temp_6 := cx*.runtime())
    Allocator.cpp:276: Call(9,10, __temp_5 := __temp_6*.gc.lastDitchGC(cx*)) [[GC call]]
    Allocator.cpp:276: Assume(10,12, __temp_5*, false)
    Allocator.cpp:276: Assign(12,13, __temp_4 := 0)
    Allocator.cpp:276: Assume(13,15, __temp_4*, false)
    Allocator.cpp:279: Assume(15,16, null(t*), true)
GC Function: _ZN2js2gc9GCRuntime11lastDitchGCEP9JSContext$uint8 js::gc::GCRuntime::lastDitchGC(JSContext*)
    void js::gc::GCRuntime::gc(uint32, int32)
    void js::gc::GCRuntime::collect(uint8, js::SliceBudget, int32)
    GC

That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.

Steve, any idea how to proceed here?

Flags: needinfo?(jcoppeard) → needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 5

•

6 years ago

•

Edited

This is subtle.

That is certainly the case, but we're in a region where 't' is known to be null and this was already the case before this patch, so I don't know a) why the analysis thinks this is a hazard

It doesn't take the null-ness into account. What it sees is:

if (MOZ_UNLIKELY(!t)) {
  if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  }
  if (!t) { ... }
}

MOZ_UNLIKELY(!t) just establishes that t is live at that point. Then it does a lastDitchGC, invalidating t -- which wouldn't be a problem if the old t were never used, but it is in the if (!t). (Note that there is only a hazard path when allowGC is true and lastDitchGC fails. If it succeeded, then t would be overwritten and the fact that it got invalidated first wouldn't matter.)

and b) if it does consider this to be a hazard why it didn't already report a hazard for this method.

Previously, the code was

if (MOZ_UNLIKELY(!t && allowGC)) {
  if (!cx->helperThread()) {
    lastDitchGC(); // well, the equivalent
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  }
  if (!t) { ...report OOM... }

Here, lastDitchGC() again invalidates t, but its invalid value is unconditionally overwritten with tryNewTenuredThing() so there's no problem.

Steve, any idea how to proceed here?

The analysis fix would be to allow "invalidated" pointers to be checked for null. (Hm... I wonder how far that could be loosened up? Clearly dereferencing a pointer is bad, and passing them to a function is bad. Comparing the pointer value is ok. Ugh... but hashing the pointer value is bad, and that would be hard to distinguish. So never mind.)

The code fix would be to prevent the old invalidated t from being used after lastDitchGC is called. Some ways to do it:

if (MOZ_UNLIKELY(!t)) {
  if (allowGC && cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  } else {
    t = nullptr;
  }
  if (!t) { ...report OOM... }
}

or closer to the original:

if (MOZ_UNLIKELY(!t) && allowGC) {
  if (cx->runtime()->gc.lastDitchGC(cx)) {
    t = tryNewTenuredThing<T, NoGC>(cx, kind, thingSize);
  } else {
    t = nullptr;
  }
  if (!t) { ...report OOM... }
}

or whatever. I guess the root of the issue here is the new boolean return value of lastDitchGC.

Flags: needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 6

•

6 years ago

Sadly, the analysis fix is not as easy as I thought -- for some reason, the test if (!t) is producing a CFG that looks like if (*t == nullptr). Or maybe sixgill is fabricating the dereference, though this is reminiscent of the const Value& case where gcc was just lying, so this might be similarly problematic. At any rate, it requires diving into sixgill again, so it's not the 5-minute fix I was hoping it could be.

Jon Coppeard (:jonco)

Assignee

Comment 7

•

6 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #5)

Ah, it's about the fact that the old value is still used, and the analysis doesn't know that it's always null in this case. Thanks for the explanation.

Don't worry about fixing the analysis, I'll rearrange the code so that the old value is no longer live.

Pulsebot

Comment 8

•

6 years ago

Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c6317f06ed07 Skip last ditch GC and fail the allocation if we already did last ditch GC within the last minute r=sfink

Daniel Varga [:dvarga]

Comment 9

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c6317f06ed07

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

6 years ago

Regressions: 1545369

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Last ditch GCs on the spidermonkey side are too aggressive

Categories

(Core :: JavaScript: GC, enhancement, P3)

Tracking

()

People

(Reporter: pbone, Assigned: jonco)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Attachment

General

Description

File Name

Content Type