Integrate nsThread::SaveMemoryReportNearOOM() with the available memory tracker

RESOLVED FIXED in Firefox 63

Status: RESOLVED FIXED (enhancement)

People: (Reporter: gsvelto, Assigned: gsvelto)

Tracking: (Blocks 1 bug); Platform: x86_64 Windows; Target Milestone: mozilla63; Firefox Tracking Flags: (firefox63 fixed)

Attachments: (1 attachment)

+++ This bug was initially created as a clone of Bug #1451005 +++

As :dmajor pointed out in bug 1451005 comment 11, we periodically check for low-memory scenarios in the main event loop machinery so we can grab memory reports before we crash. This code is ripe for integration with the available memory tracker, since both periodically check for available memory.

Integrating this code would solve multiple issues:

- Checking for available memory would happen in a single place, using shared thresholds. Currently this code uses a different threshold than the available memory tracker and only checks for low virtual-memory scenarios [1]. On 64-bit Windows builds, where we'll never run out of virtual memory, this code is useless; checking for low physical memory and low commit space would be useful.

- The existing threshold is too low [2]. Looking at OOM crash pings, most crashes happen when there's more free memory than the threshold specifies, so the chances of this code actually running are slim. To make matters worse, the available memory tracker has a higher threshold at which it starts sending memory-pressure notifications, which lower memory usage, making it even more unlikely for a report to ever be grabbed.

- Because this code also runs in child processes, it needs to do a magic IPC dance through the ContentParent object [3] to inform the main process that it needs to grab a memory report. This is probably useless too: if we're low on memory, the main process is likely to realize it quite a bit sooner than a child process. Telemetry data seems to confirm this, as I can't find recorded instances of the DOM_CONTENTPROCESS_TROUBLED_DUE_TO_MEMORY scalar. This code can probably be removed.

- Last but not least, this is called every time we spin the main thread, in every process, which is a horrible waste of CPU cycles in my eyes (even though it's only a handful, think of it multiplied by every event-loop spin for every Firefox user).

[1] https://searchfox.org/mozilla-central/rev/c0d81882c7941c4ff13a50603e37095cdab0d1ea/xpcom/threads/nsThread.cpp#514
[2] https://searchfox.org/mozilla-central/rev/c0d81882c7941c4ff13a50603e37095cdab0d1ea/xpcom/threads/nsThread.cpp#510
[3] https://searchfox.org/mozilla-central/rev/c0d81882c7941c4ff13a50603e37095cdab0d1ea/dom/ipc/ContentParent.cpp#5105
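The single-place check proposed above could be sketched roughly as follows. The struct, function names, and the 384MiB threshold used in the example are illustrative, not Firefox's actual code; on Windows the three inputs would come from GlobalMemoryStatusEx(), whose ullAvailVirtual, ullAvailPhys and ullAvailPageFile fields report available virtual memory, physical memory and commit space.

```cpp
#include <cstdint>

// Hypothetical snapshot of the three quantities we care about, all in
// MiB. On Windows these map to GlobalMemoryStatusEx()'s
// ullAvailVirtual, ullAvailPhys and ullAvailPageFile.
struct MemoryStatus {
  uint64_t availVirtualMiB;
  uint64_t availPhysicalMiB;
  uint64_t availCommitMiB;
};

// Check all three low-memory conditions in one place with a shared
// threshold. On 64-bit builds the virtual-memory check never fires in
// practice, so physical memory and commit space carry the signal there.
bool IsMemoryLow(const MemoryStatus& s, uint64_t thresholdMiB, bool is64Bit) {
  if (!is64Bit && s.availVirtualMiB < thresholdMiB) {
    return true;
  }
  return s.availPhysicalMiB < thresholdMiB || s.availCommitMiB < thresholdMiB;
}
```
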
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
> - The existing threshold is too low [2]. Looking at OOM crash pings most
> crashes happen when there's more free memory than specified so the chances
> of this code actually running are slim.

It may be helpful if I give some explanation (as the author of the code in question). This was written circa 2014 when Win64 Firefox wasn't really a thing, and address-space-OOMs were a huge source of crashes on Win32. We were trying to collect more data on hopelessly-low-address-space machines. The idea was not "if you hit an OOM crash, then you should have collected a memory report" but the converse: "if you are collecting a memory report, that means you are so low on memory that you're pretty much guaranteed to crash very soon." 200MiB was a pretty good number for that. I think part of the reason was to avoid janking users who are in survivable memory situations. Also, engineers can only look at so many reports, so focusing on the most extreme cases helps the problems stand out more.

Another factor here is that we release the gBreakpadReservedVM ballast before collecting the GlobalMemoryStatusEx for the crash report. So let's say VM drops to exactly 200MiB and we collect a memory report, and immediately crash; then the crash report is likely to report some value between 200 and 280MiB.
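The over-reporting described here can be put in numbers. Note the ~80MiB ballast size below is inferred from the 200-280MiB range in the comment, not a value quoted from the code:

```cpp
#include <cstdint>

// Inferred from the comment above: the approximate size of the
// gBreakpadReservedVM ballast (the 200-280MiB range implies roughly
// 80MiB; treat this as an assumption, not a measured value).
constexpr uint64_t kBallastMiB = 80;

// Free virtual memory as seen by GlobalMemoryStatusEx() after the
// ballast has been released: the value at trigger time plus up to the
// ballast size.
uint64_t MaxReportedFreeVMMiB(uint64_t freeAtTriggerMiB) {
  return freeAtTriggerMiB + kBallastMiB;
}
```
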
(In reply to David Major [:dmajor] from comment #1)
> It may be helpful if I give some explanation (as the author of the code in
> question). This was written circa 2014 when Win64 Firefox wasn't really a
> thing, and address-space-OOMs were a huge source of crashes on Win32. We
> were trying to collect more data on hopelessly-low-address-space machines.
> The idea was not "if you hit an OOM crash, then you should have collected a
> memory report" but the converse: "if you are collecting a memory report,
> that means you are so low on memory that you're pretty much guaranteed to
> crash very soon." 200MiB was a pretty good number for that. I think part of
> the reason was to avoid janking users who are in survivable memory
> situations. Also, engineers can only look at so many reports, so focusing on
> the most extreme cases helps the problems stand out more.

Makes sense.

> Another factor here is that we release the gBreakpadReservedVM ballast
> before collecting the GlobalMemoryStatusEx for the crash report. So let's
> say VM drops to exactly 200MiB and we collect a memory report, and
> immediately crash; then the crash report is likely to report some value
> between 200 and 280MiB.

I see, thanks for the explanation. Until this week I would have thought that both the thresholds we used in the available memory tracker and this one were sound, but spending a couple of days on our telemetry data showed me that my gut feeling about them was wrong. OOM crash pings for 32-bit Windows show >70% of OOM crashes happening with more than 256MiB of virtual memory available so we'll have to revise both the available memory tracker threshold and this one to make them effective.

I must admit I haven't looked at older data, so it's possible that a few years ago these thresholds did work well. After all, on early Firefox OS devices we had low-memory detection set to a ridiculously low amount, something like 20MiB, and it worked quite well.
This moves the code that detects very-low-memory scenarios and grabs memory
reports from the main-thread event loop to the available memory tracker.
Besides removing the overhead of the check from the event-loop code, this
increases the likelihood of the reports being gathered, by sampling at a
higher frequency but only once we have already detected a low-memory
scenario. Last but not least, this adds checks for low commit space
alongside low virtual memory.
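The two-rate polling scheme the patch describes can be sketched roughly as follows; the function name, intervals and thresholds are illustrative, not the real available memory tracker's:

```cpp
#include <cstdint>

// Illustrative intervals: poll slowly in normal conditions, faster once
// a low-memory condition is seen, so a near-OOM report is more likely
// to be captured before the crash. Not the actual tracker's values.
constexpr uint32_t kNormalPollMs = 30000;
constexpr uint32_t kLowMemoryPollMs = 1000;

// Returns the delay until the next memory check, and sets *aSaveReport
// when available memory is so low that a memory report should be saved
// now for use in a potential crash report.
uint32_t NextPollDelayMs(uint64_t availMiB, uint64_t lowThresholdMiB,
                         uint64_t criticalMiB, bool* aSaveReport) {
  // Near-OOM: grab a memory report before we most likely crash.
  *aSaveReport = availMiB < criticalMiB;
  // Already under memory pressure: sample faster to catch the report.
  return availMiB < lowThresholdMiB ? kLowMemoryPollMs : kNormalPollMs;
}
```
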
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Comment on attachment 9002130 [details]
Bug 1459212 - Save memory reports for use in crash reports when low on memory

David Major [:dmajor] has approved the revision.
Attachment #9002130 - Flags: review+
Comment on attachment 9002130 [details]
Bug 1459212 - Save memory reports for use in crash reports when low on memory

Andrew McCreight [:mccr8] has approved the revision.
Attachment #9002130 - Flags: review+
To add another datapoint, right now only 20-25% of OOM crashes come with a memory report. My hope is to raise that to at least 50%.

https://sql.telemetry.mozilla.org/queries/56039
On crash-stats the percentage is also much, much lower (like less than 1%). I'm not sure why though.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c2422757c912
Save memory reports for use in crash reports when low on memory r=dmajor,mccr8
https://hg.mozilla.org/mozilla-central/rev/c2422757c912
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Since we landed this the number of OOM crashes that include a memory report has more than doubled:

https://sql.telemetry.mozilla.org/queries/56039#151881

Before, only 10-15% of the OOM crashes had a memory report attached; now we're up at 30-35%. In particular, Win64 builds never had memory reports attached, while now they include one almost half of the time. This is calculated using crash pings; it would be interesting to see how this is reflected on crash-stats.
On crash-stats the difference is night and day: before we had one memory report every ~50 OOM|small crashes, now we have more than two out of five.