Apply the delay crashing on OOM idea to the JS GC.
Categories
(Core :: JavaScript: GC, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox134 | --- | fixed |
People
(Reporter: pbone, Assigned: jonco)
References
(Blocks 2 open bugs)
Details
Attachments
(4 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1786451 - Part 3: Add a parameter to MapAlignedPages to allow stall and retry on failure r?sfink
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
In Bug 1716727 gsvelto and rkraesig implemented a delay before crashing from OOM to give windows a chance to find more virtual memory. This seems to be working and since there are VIrtualAlloc
calls in JSGC we should probably do the same there.
Alternatively some time ago glandium suggested that we consolidate our calls to VirtualAlloc, especially where we cache memory blocks since JSGC and jemalloc have some similar requirements for these blocks.
Comment 1•3 years ago
|
||
More discussion on this topic is happening in bug 1784033 where we proposed to hook VirtualAlloc()
(but it might not be a good idea).
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
(In reply to Paul Bone [:pbone] [PTO until 8/8/22] from comment #0)
Looks like this is very effective at moving crashes from the parent process to content process, but doesn't necessarily reduce the number of overall crashes.
Nevertheless, content process crashes are a better user experience so we should consider doing this.
Comment 3•4 months ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #2)
Looks like this is very effective at moving crashes from the parent process to content process, but doesn't necessarily reduce the number of overall crashes.
It's worth noting that this has turned out not to be true: it measurably does reduce the number of overall crashes. (Physical memory is effectively a zero-sum game, but delaying often gives Windows a chance to add more virtual memory.)
Assignee | ||
Comment 4•4 months ago
|
||
(In reply to Ray Kraesig [:rkraesig] from comment #3)
It's worth noting that this has turned out not to be true: it measurably does reduce the number of overall crashes.
That's even better then!
Assignee | ||
Comment 5•4 months ago
|
||
This will allow this function to be used by the JS engine too.
Assignee | ||
Comment 6•4 months ago
|
||
We always allocate pages with read/write protection so we can remove this unused parameter.
Assignee | ||
Comment 7•4 months ago
|
||
This uses MozVirtualAlloc in MapAlignedPagesLastDitch which is called when
other allocation methods fail.
Assignee | ||
Comment 8•4 months ago
|
||
Normally we trigger a GC on chunk allocation failure (either last ditch major
GC or minor GC when trying to extend the nursery). We can't recover from
allocation failure during GC however so this adds the stall and retry behaviour
there.
Comment 10•4 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ba7e6787c58a
https://hg.mozilla.org/mozilla-central/rev/bcf2f6626b82
https://hg.mozilla.org/mozilla-central/rev/23494a028c18
https://hg.mozilla.org/mozilla-central/rev/6b63840889b6
Comment 11•4 months ago
|
||
It would be interesting to measure the impact of this, I'll make a note to compare the OOM crash rate of major releases once this hits the release channel and we have enough data.
Comment 12•2 months ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
It would be interesting to measure the impact of this, I'll make a note to compare the OOM crash rate of major releases once this hits the release channel and we have enough data.
Apologies if I'm a bit over eager but is there enough data to see if this has helped a lot or not?
Comment 13•1 month ago
|
||
I haven't looked at telemetry in a while, this should be abundantly visible now. NI? myself so I don't forget to check as soon as I have a minute.
Comment 14•1 month ago
|
||
Looking at legacy telemetry there's been a fairly visible decline in OOM crashes affecting child processes. The decline is not as marked for the main process but given that this should affect JS execution I think it's reasonable that we'd see most of the effect in child processes:
Comment 15•1 month ago
|
||
This per-version query makes the effect more visible:
https://sql.telemetry.mozilla.org/queries/105101
We might check back in a month but it looks like this had a measurable impact.
Description
•