Closed Bug 1174250 Opened 5 years ago Closed 5 years ago
Annotate OOM size for pages
_commit (physical memory exhaustion)
The crash signature "moz_abort | pages_commit" indicates a physical memory exhaustion. https://hg.mozilla.org/mozilla-central/file/46dde6cfd303/memory/mozjemalloc/jemalloc.c#l2053 As more pre-release channel users move to 64-bit builds, we're seeing address-space OOMs disappear and physical OOMs rise as physical memory and pagefile become the limiting factors. We should record the size of the failed allocations in crash reports, like we do for address space OOMs. The naive fix is to call mozalloc_handle_oom() when pages_commit fails, but that didn't link, so I'll need some build-wrangling help. (I don't have the build log in front of me right now; I'll attach it when I do)
Here's my first attempt and the resulting error.
Glandium, sorry... you proposed a workaround a few weeks ago but I don't remember what it was.
My irc logs say that I told you I don't care how you hook things up because jemalloc3 doesn't have this code path. Nothing about build failures.
I'd appreciate help in getting this build working.
I should revisit this now that we have jemalloc4. I think the new equivalent is: [@ JemallocInit::CommitHook ]
After a fresh look, I'm starting to understand why I had trouble building. The proper "jemalloc code" knows nothing about our OOM reporting system. It's only concerned with returning a pointer or null. The translation from null to a crash happens further up, in the wrappers in memory/mozalloc/mozalloc.cpp. My attempt to call mozalloc's abort handler from the guts of jemalloc was a layering violation, so no surprise that it didn't work.
So, for MEM_COMMIT failures, instead of crashing deep inside jemalloc, it would be nice to bubble up a null pointer and let the infallibility-wrappers do the crashing. That would also conveniently handle the size annotation. And it would allow fallible allocations to survive a MEM_COMMIT failure. Glandium: I notice that the CommitHook returns a bool. Instead of crashing, could we use that return value to indicate failure?
I'm somewhat scared of this change, because I have no way to see its effects. There's no test coverage, since any test whose behavior this changes would have previously crashed. At first glance jemalloc _looks_ like it can handle this failure, but I don't know the code very well. (What happens to that run?) If this messes up some state, how will that appear? Random crashes later?
Attachment #8660132 - Flags: review?(mh+mozilla) → review+
You need to log in before you can comment on or make changes to this bug.