Closed Bug 611123 Opened 14 years ago Closed 1 year ago

malloc should be infallible

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- -

People

(Reporter: jruderman, Unassigned)

References

Details

(Keywords: sec-want, Whiteboard: [sg:want P1])

So far we've made the "new" operator infallible, but not the "malloc" function, which is also widely used by both Firefox and system libraries.

Lately we've been getting a lot of security bug reports that boil down to "Here's a testcase that makes Firefox run out of memory.  One time, I saw a stack trace with something other than an allocation-abort or null deref".  These bugs are nearly impossible to fix except with a guarantee that all the allocation sites have been made to abort on failure.
This is the plan, but... the problem with this is that we have many known locations which can turn into aborts very very easily, which we do currently null-check. We really need to find allocation sites where the size is variable, and convert those to use the fallible allocator.
Even variable-size mallocs should usually be infallable.

We should only use the fallible allocator sites where huge allocations are extremely likely AND failure can be propagated sanely.  For most of our data structures, including strings and arrays, those things aren't true.
If you want to reduce OOM crashes on mobile, you'll have to do clever things when memory is *low*, such as posting a memory-pressure event or refusing to open new tabs.  Making a bunch of "variable-size" mallocs fallible will just add untestable attack surface and scatter the crashes.
Note that we have some current string consumers that DO check for OOM and would end up aborting instead of (correctly) throwing.  And they're not hard to hit: the new File apis will run into trouble if they try to read a large file into memory, for example.

So even for strings we'd want a non-aborting API, alongside the aborting one....
blocking2.0: --- → ?
(In reply to comment #1)
> This is the plan, but... the problem with this is that we have many known
> locations which can turn into aborts very very easily, which we do currently
> null-check.

(In reply to comment #4)
> Note that we have some current string consumers that DO check for OOM and would
> end up aborting instead of (correctly) throwing.

On systems that overcommit, I suspect these checks are of limited value.
> On systems that overcommit, I suspect these checks are of limited value.

Please read bug 604561?  Sure seems to work on Windows, at least.
Yes, Microsoft's VirtualAlloc (the way jemalloc uses it) does not commit more
than is available in ram and page files.

On Linux, overcommit protection was disabled in bug 465127.

With the default Linux policy of OVERCOMMIT_GUESS, only each individual mmap
allocation is checked against currently unused ram and swap (where "unused"
means untouched rather than unallocated).  Two consecutive allocations, for
example, will see the same amount of untouched memory, and so may both
succeed, but the app can still receive a SIGKILL when it tries to use the pages.
__vm_enough_memory is the relevant function at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/mmap.c;h=b179abb1474ae41bff47060b5241045d3b1b12ad;hb=HEAD#l119

A comment in jemalloc.c says "OS X over-commits" but I don't know whether or not that still applies in the situation when jemalloc is not used.
Even on systems that do not commit more than is available in ram and swap,
that is not really the right test for us in general.

Much of our memory is regularly garbage collected, so the assumption that it
can exist in swap (without disabling GC) is incorrect.

We already have a DoS if GCed memory is being forced to swap.

One thing a malloc failure check is useful for on all OSes is to check against address-space limits, which is a useful check for 32-bit processes.
(In reply to comment #8)
> Much of our memory is regularly garbage collected, so the assumption that it
> can exist in swap (without disabling GC) is incorrect.
> 
> We already have a DoS if GCed memory is being forced to swap.
Does that mean that unused portions of FF (such as background tabs) won't be swapped, i.e. that FF consumes more physical ram than necessary?
(In reply to comment #9)
> Does that mean that unused portions of FF (such as background tabs) won't be
> swapped

Some unused portions, yes (effectively).

For background tabs that have timers running, there is little that can be done.

For tabs that are really inactive, there are techniques that can improve the situation by keeping GC'd (and cycle collected) objects in separate pools of pages and grouping related objects into a single pool (compartment).  I haven't looked at how much this is done or how much it is helping.
blocking2.0: ? → -
See bug 709952 for some ideas on recovering from OOM (with less complexity than having every malloc caller check for null).
Depends on: 804492
Depends on: 606198
No longer depends on: 804492
Severity: normal → S3

I think malloc is infallible by default already.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.