Closed Bug 493779 Opened 15 years ago Closed 14 years ago

Breakpad should collect/detect OOM

Categories

(Toolkit :: Crash Reporting, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: jruderman, Assigned: ted)

References

(Blocks 1 open bug)

Details

(Whiteboard: [crashkill][crashkill-metrics])

Attachments

(1 file, 1 obsolete file)

It would be nice if Breakpad collected information to distinguish OOM crashes from other crashes.  With this information, it will be more clear how many users are hitting terrible memory leaks, and crashes due to OOM will be less confusing.

My guesses are:
1) virtual memory used by the Firefox process
2) disk space free
but I'm probably wrong about what to measure ;)
Fun fact: there were 215 crash reports corresponding to OOM crash bug 502687 in the last week.

Some Linux users have ulimits set, which means Breakpad can't determine whether we're in an OOM condition just by looking at the amount of virtual memory used.  I wonder if there's another way.

FWIW, the xmalloc work might obsolete the need for Breakpad changes.
Jesse: if you can tell me exactly what you'd like to see collected here on all 3 platforms, then I can make it happen.
I'm not really sure :(  Maybe we should try measuring virtual memory usage, and try it with a known OOM crash to see if it works reliably.  Or maybe dbaron has other ideas.
Whiteboard: [crashkill][crashkill-metrics]
free(1) on Linux seems to give reasonable output for swap

so, if the process is still alive, you should be able to read /proc/{pid}/limits

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             16384                16384                processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       16384                16384                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    

that handles getting limits on Linux.
on os x, vm_stat(1) seems like Linux's free(1). For Solaris, it sounds like that's swap -l/swap -s.

for os x/solaris, it's probably best to just use getrlimit at startup and dump that data in advance, or just rely on the fact that the spawned crash reporter will inherit the limits and have it look them up.

For windows, probably:
http://msdn.microsoft.com/en-us/library/ms683227(VS.85).aspx
GetProcessWorkingSetSizeEx

http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx
GlobalMemoryStatusEx
The thing is (for me) that breakpad does not seem to catch some OOM crashes (such as i reported in bug 590674 ), instead windows 7 just shows the "problem with application" dialog offering me to close it or search for solutions without opening the crash reporter window.

So it's possible that they're under-reported.
Yes, it's quite possible. Since we write browser dumps in-process, if we're OOM we might not have enough memory left to write a dump. I filed bug 587729 on making us do all dump generation out of process, which should help this.
(In reply to comment #5)
> http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx
> GlobalMemoryStatusEx

This looks pretty useful, the resulting structure has ullTotalVirtual / ullAvailVirtual fields, which I think would be pretty indicative of whether we're crashing due to OOM. (Although if it's VM fragmentation, it might not be as obvious.)
Assignee: nobody → ted.mielczarek
This patch adds three extra annotations to Windows crash reports:
SystemMemoryUsePercentage: the percentage of physical memory in use by the system (in bytes)
TotalVirtualMemory: the total amount of virtual memory available to the process (in bytes)
AvailableVirtualMemory: the current amount of virtual memory available to the process (in bytes)

The first should let us estimate if we're hitting swap on the system, and the latter two should let us determine if we're OOMing.
blocking2.0: --- → ?
Attachment #497701 - Flags: review?(benjamin)
blocking2.0: ? → -
Attachment #497701 - Flags: review?(benjamin) → review+
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Low-risk patch that should give us more info about OOM crashes on Windows.
Attachment #497701 - Flags: approval2.0?
Attachment #497701 - Flags: approval2.0? → approval2.0+
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/b84f528e2e10

I'll file a followup on doing something useful on Mac/Linux, but I think most of our problems were on Windows.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Simple patch, just switches to use the alternate constructor for ExceptionHandler that allows passing in the MINIDUMP_TYPE parameter. The other two parameters have to do with out-of-process dump generation, passing them as NULL makes them effectively ignored. The fewer-argument constructor we were previously using would just pass them down as NULL internally anyway.
Attachment #499530 - Flags: review?(timeless)
Comment on attachment 499530 [details] [diff] [review]
Report some memory information with Windows crash reports. r+

Oops, user error on my part with bzexport. :)
Attachment #499530 - Attachment is obsolete: true
Attachment #499530 - Flags: review?(timeless)
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

it would be helpful if we could have this on the 1.9.2 branch. this is safe and localized.
Attachment #497701 - Flags: approval1.9.2.15?
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Approved for 1.9.2.15, a=dveditz
Attachment #497701 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment on attachment 497701 [details] [diff] [review]
Report some memory information with Windows crash reports

Didn't make the code-freeze for non-blockers, you can try again next time.
Attachment #497701 - Flags: approval1.9.2.17+ → approval1.9.2.17-
Blocks: 528657
Blocks: 733262
Blocks: 669108
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: