Breakpad should collect/detect OOM

RESOLVED FIXED

Status

()

Toolkit
Crash Reporting
--
enhancement
RESOLVED FIXED
8 years ago
5 years ago

People

(Reporter: Jesse Ruderman, Assigned: ted)

Tracking

(Blocks: 2 bugs)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 -)

Details

(Whiteboard: [crashkill][crashkill-metrics])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

8 years ago
It would be nice if Breakpad collected information to distinguish OOM crashes from other crashes.  With this information, it will be more clear how many users are hitting terrible memory leaks, and crashes due to OOM will be less confusing.

My guesses are:
1) virtual memory used by the Firefox process
2) disk space free
but I'm probably wrong about what to measure ;)
(Reporter)

Comment 1

8 years ago
Fun fact: there were 215 crash reports corresponding to OOM crash bug 502687 in the last week.

Some Linux users have ulimits set, which means Breakpad can't determine whether we're in an OOM condition just by looking at the amount of virtual memory used.  I wonder if there's another way.

FWIW, the xmalloc work might obsolete the need for Breakpad changes.
(Assignee)

Comment 2

8 years ago
Jesse: if you can tell me exactly what you'd like to see collected here on all 3 platforms, then I can make it happen.
(Reporter)

Comment 3

8 years ago
I'm not really sure :(  Maybe we should try measuring virtual memory usage, and try it with a known OOM crash to see if it works reliably.  Or maybe dbaron has other ideas.

Updated

8 years ago
Whiteboard: [crashkill][crashkill-metrics]

Comment 4

8 years ago
free(1) on Linux seems to give reasonable output for swap

so, if the process is still alive, you should be able to read /proc/{pid}/limits

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             16384                16384                processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       16384                16384                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    

that handles getting limits on Linux.

Comment 5

8 years ago
on os x, vm_stat(1) seems like Linux's free(1). For Solaris, it sounds like that's swap -l/swap -s.

for os x/solaris, it's probably best to just use getrlimit at startup and dump that data in advance, or just rely on the fact that the spawned crash reporter will inherit the limits and have it look them up.

For windows, probably:
http://msdn.microsoft.com/en-us/library/ms683227(VS.85).aspx
GetProcessWorkingSetSizeEx

http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx
GlobalMemoryStatusEx

Comment 6

7 years ago
The thing is (for me) that breakpad does not seem to catch some OOM crashes (such as i reported in bug 590674 ), instead windows 7 just shows the "problem with application" dialog offering me to close it or search for solutions without opening the crash reporter window.

So it's possible that they're under-reported.
(Assignee)

Comment 7

7 years ago
Yes, it's quite possible. Since we write browser dumps in-process, if we're OOM we might not have enough memory left to write a dump. I filed bug 587729 on making us do all dump generation out of process, which should help this.
(Assignee)

Comment 8

7 years ago
(In reply to comment #5)
> http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx
> GlobalMemoryStatusEx

This looks pretty useful, the resulting structure has ullTotalVirtual / ullAvailVirtual fields, which I think would be pretty indicative of whether we're crashing due to OOM. (Although if it's VM fragmentation, it might not be as obvious.)
(Assignee)

Updated

7 years ago
Assignee: nobody → ted.mielczarek
(Assignee)

Comment 9

7 years ago
Created attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

This patch adds three extra annotations to Windows crash reports:
SystemMemoryUsePercentage: the percentage of physical memory in use by the system (in bytes)
TotalVirtualMemory: the total amount of virtual memory available to the process (in bytes)
AvailableVirtualMemory: the current amount of virtual memory available to the process (in bytes)

The first should let us estimate if we're hitting swap on the system, and the latter two should let us determine if we're OOMing.
(Assignee)

Updated

7 years ago
blocking2.0: --- → ?
(Assignee)

Updated

7 years ago
Attachment #497701 - Flags: review?(benjamin)
blocking2.0: ? → -
Attachment #497701 - Flags: review?(benjamin) → review+
(Assignee)

Comment 10

7 years ago
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Low-risk patch that should give us more info about OOM crashes on Windows.
Attachment #497701 - Flags: approval2.0?
Attachment #497701 - Flags: approval2.0? → approval2.0+
(Assignee)

Comment 11

7 years ago
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/b84f528e2e10

I'll file a followup on doing something useful on Mac/Linux, but I think most of our problems were on Windows.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Comment 12

7 years ago
Created attachment 499530 [details] [diff] [review]
Report some memory information with Windows crash reports. r+

Simple patch, just switches to use the alternate constructor for ExceptionHandler that allows passing in the MINIDUMP_TYPE parameter. The other two parameters have to do with out-of-process dump generation, passing them as NULL makes them effectively ignored. The fewer-argument constructor we were previously using would just pass them down as NULL internally anyway.
Attachment #499530 - Flags: review?(timeless)
(Assignee)

Comment 13

7 years ago
Comment on attachment 499530 [details] [diff] [review]
Report some memory information with Windows crash reports. r+

Oops, user error on my part with bzexport. :)
Attachment #499530 - Attachment is obsolete: true
Attachment #499530 - Flags: review?(timeless)

Comment 14

7 years ago
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

it would be helpful if we could have this on the 1.9.2 branch. this is safe and localized.
Attachment #497701 - Flags: approval1.9.2.15?
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Approved for 1.9.2.15, a=dveditz
Attachment #497701 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Didn't make the code-freeze for non-blockers, you can try again next time.
Attachment #497701 - Flags: approval1.9.2.17+ → approval1.9.2.17-
(Reporter)

Updated

6 years ago
Blocks: 528657
(Reporter)

Updated

5 years ago
Blocks: 733262
(Reporter)

Updated

5 years ago
Blocks: 669108
You need to log in before you can comment on or make changes to this bug.