jemalloc doesn't check the result of VirtualAlloc(MEM_COMMIT) in pages_commit. When VirutalAlloc fails, jemalloc happily returns uncommitted memory to its caller, who then segfaults. Fixing this so that a pages_commit failure causes us to return NULL looks somewhat involved. Two of the pages_commit calls are simple to guard for failure, but one call happens in arena_run_split, and that looks hard to make fallible. We could simply abort when the commit fails, I guess... (See security bug 706442, crash signature [@ js::LifoAlloc::getOrCreateChunk(unsigned int) ], where we discovered that many of the crashes [1, 2, 3] were occurring with plenty of available memory, but very low available page file.)  https://crash-stats.mozilla.com/report/index/d41a602f-015b-4dbb-9263-f687e2120709  https://crash-stats.mozilla.com/report/index/90fcf4e0-db21-4d0b-abf1-8c6cb2120709  https://crash-stats.mozilla.com/report/index/82cce1b7-2ede-42dc-b2f5-6c1312120709
Created attachment 640495 [details] [diff] [review] Patch, v1
To plagiarize myself from bug 706442 comments 21-22: The common thread I see in all the crash reports I looked at is low "available page file". That number is ullAvailPageFile from MEMORYSTATUSEX, "The maximum amount of memory the current process can commit, in bytes."  It's pretty weird in some cases [2, 3, 4] that we have less than 5MB of available page file and 100+ MB of "available physical memory" (that's ullAvailPhys: "amount of physical memory currently available, in bytes. This is the amount of physical memory that can be immediately reused without having to write its contents to disk first."). Looking at some random, unrelated crash reports, there's always much more available page file than available physical memory, so I think this is an anomalous situation. I'm not sure, but I think this may mean that the system has run out of space in its pagefile -- that is, the pagefile is too small, and Windows can't grow it, perhaps because the system is out of disk space. Windows doesn't overcommit -- I understand that this means that if a process allocates a bunch of MEM_COMMIT virtual memory but doesn't touch it, that space is reserved and must fit either in core or the page file. So if something (perhaps Firefox) on the user's machine is eating up a lot of MEM_COMMIT vmem, it's possible we could get into this state where there's a lot of physical memory available (because the pages haven't /actually/ been committed yet), but no pagefile available. We never check the return value in jemalloc's pages_commit (VirtualAlloc(MEM_COMMIT)). I wouldn't be surprised if that's what's failing here.  http://msdn.microsoft.com/en-us/library/windows/desktop/aa366770%28v=vs.85%29.aspx  https://crash-stats.mozilla.com/report/index/d41a602f-015b-4dbb-9263-f687e2120709  https://crash-stats.mozilla.com/report/index/90fcf4e0-db21-4d0b-abf1-8c6cb2120709  https://crash-stats.mozilla.com/report/index/82cce1b7-2ede-42dc-b2f5-6c1312120709
Upon reflection, I don't think we can do much more than crash here, in most cases. Whatever allocation causes this abort is less than 1MB -- "huge" allocations 1MB and larger are mmap'ed separately, and never make use of decommitted memory. So this is a relatively small allocation failing. Even if Firefox recovered successfully from malloc returning NULL here, it would be likely to die from some NULL malloc soon thereafter. The one thing is: I kind of wish we could display a message to the user. "Firefox crashed because your computer is out of memory. See this SUMO article." This crash in particular is very likely not our fault. Anyway, inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/6c8ea05c83ad
(In reply to Justin Lebar [:jlebar] from comment #3) > The one thing is: I kind of wish we could display a message to the user. > "Firefox crashed because your computer is out of memory. See this SUMO > article." This crash in particular is very likely not our fault. That would be something very useful to have. But it would require freeing memory before being able to do so.
> That would be something very useful to have. But it would require freeing memory before being able > to do so. We could display it in the crash reporter?