Last Comment Bug 772338 - jemalloc doesn't check the result of VirtualAlloc(MEM_COMMIT) in pages_commit
: jemalloc doesn't check the result of VirtualAlloc(MEM_COMMIT) in pages_commit
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Memory Allocator (show other bugs)
: Trunk
: x86 Windows XP
: -- normal (vote)
: mozilla16
Assigned To: Justin Lebar (not reading bugmail)
:
:
Mentors:
Depends on:
Blocks: 706442
  Show dependency treegraph
 
Reported: 2012-07-09 20:58 PDT by Justin Lebar (not reading bugmail)
Modified: 2012-07-10 15:46 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch, v1 (783 bytes, patch)
2012-07-09 21:03 PDT, Justin Lebar (not reading bugmail)
mh+mozilla: review+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2012-07-09 20:58:24 PDT
jemalloc doesn't check the result of VirtualAlloc(MEM_COMMIT) in pages_commit.  When VirutalAlloc fails, jemalloc happily returns uncommitted memory to its caller, who then segfaults.

Fixing this so that a pages_commit failure causes us to return NULL looks somewhat involved.  Two of the pages_commit calls are simple to guard for failure, but one call happens in arena_run_split, and that looks hard to make fallible.

We could simply abort when the commit fails, I guess...

(See security bug 706442, crash signature [@ js::LifoAlloc::getOrCreateChunk(unsigned int) ], where we discovered that many of the crashes [1, 2, 3] were occurring with plenty of available memory, but very low available page file.)

[1] https://crash-stats.mozilla.com/report/index/d41a602f-015b-4dbb-9263-f687e2120709
[2] https://crash-stats.mozilla.com/report/index/90fcf4e0-db21-4d0b-abf1-8c6cb2120709
[3] https://crash-stats.mozilla.com/report/index/82cce1b7-2ede-42dc-b2f5-6c1312120709
Comment 1 Justin Lebar (not reading bugmail) 2012-07-09 21:03:22 PDT
Created attachment 640495 [details] [diff] [review]
Patch, v1
Comment 2 Justin Lebar (not reading bugmail) 2012-07-09 21:11:01 PDT
To plagiarize myself from bug 706442 comments 21-22:

The common thread I see in all the crash reports I looked at is low "available page file".  That number is ullAvailPageFile from MEMORYSTATUSEX, "The maximum amount of memory the current process can commit, in bytes." [1]

It's pretty weird in some cases [2, 3, 4] that we have less than 5MB of available page file and 100+ MB of "available physical memory" (that's ullAvailPhys: "amount of physical memory currently available, in bytes. This is the amount of physical memory that can be immediately reused without having to write its contents to disk first.").

Looking at some random, unrelated crash reports, there's always much more available page file than available physical memory, so I think this is an anomalous situation.

I'm not sure, but I think this may mean that the system has run out of space in its pagefile -- that is, the pagefile is too small, and Windows can't grow it, perhaps because the system is out of disk space.  Windows doesn't overcommit -- I understand that this means that if a process allocates a bunch of MEM_COMMIT virtual memory but doesn't touch it, that space is reserved and must fit either in core or the page file.  So if something (perhaps Firefox) on the user's machine is eating up a lot of MEM_COMMIT vmem, it's possible we could get into this state where there's a lot of physical memory available (because the pages haven't /actually/ been committed yet), but no pagefile available.

We never check the return value in jemalloc's pages_commit (VirtualAlloc(MEM_COMMIT)).  I wouldn't be surprised if that's what's failing here.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366770%28v=vs.85%29.aspx
[2] https://crash-stats.mozilla.com/report/index/d41a602f-015b-4dbb-9263-f687e2120709
[3] https://crash-stats.mozilla.com/report/index/90fcf4e0-db21-4d0b-abf1-8c6cb2120709
[4] https://crash-stats.mozilla.com/report/index/82cce1b7-2ede-42dc-b2f5-6c1312120709
Comment 3 Justin Lebar (not reading bugmail) 2012-07-10 06:32:58 PDT
Upon reflection, I don't think we can do much more than crash here, in most cases.  Whatever allocation causes this abort is less than 1MB -- "huge" allocations 1MB and larger are mmap'ed separately, and never make use of decommitted memory.  So this is a relatively small allocation failing.

Even if Firefox recovered successfully from malloc returning NULL here, it would be likely to die from some NULL malloc soon thereafter.

The one thing is: I kind of wish we could display a message to the user. "Firefox crashed because your computer is out of memory.  See this SUMO article."  This crash in particular is very likely not our fault.

Anyway, inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/6c8ea05c83ad
Comment 4 Mike Hommey [:glandium] 2012-07-10 06:35:24 PDT
(In reply to Justin Lebar [:jlebar] from comment #3)
> The one thing is: I kind of wish we could display a message to the user.
> "Firefox crashed because your computer is out of memory.  See this SUMO
> article."  This crash in particular is very likely not our fault.

That would be something very useful to have. But it would require freeing memory before being able to do so.
Comment 5 Justin Lebar (not reading bugmail) 2012-07-10 06:36:17 PDT
> That would be something very useful to have. But it would require freeing memory before being able 
> to do so.

We could display it in the crash reporter?
Comment 6 Ryan VanderMeulen [:RyanVM] 2012-07-10 15:46:24 PDT
https://hg.mozilla.org/mozilla-central/rev/6c8ea05c83ad

Note You need to log in before you can comment on or make changes to this bug.