Closed Bug 610823 Opened 9 years ago Closed 9 years ago

nsTArray should default to infallible

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jruderman, Assigned: sicking)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sg:want P2][hardblocker])

Attachments

(2 files)

Bug 550611 added the option to make nsTArray infallible.

We should make this the default, because it's safer and the resulting crashes are easier to understand.  And we should do it soon, because security researchers have been all over our OOM bugs lately.

The risk of making this change is introducing new topcrashes in a small number of locations. The risk of not making this change is continuing to have hard-to-debug topcrashes and security bug reports.
blocking2.0: --- → ?
I agree, except that the risk is new topcrashes in an unknown but probably small number of locations.  If we do this it should be soon so as to smoke those out.
Also, fennec would be a great source of OOM data, if we got crash reporting hooked up for content processes.
We _definitely_ have callsites that depend on nsTArray returning OOM properly.

How long is the list of AppendElements() callers?  Can we just audit them?

I don't think we have good enough code coverage, fwiw, to really rely on topcrash reporting as a way of finding the crash bugs.
(In reply to comment #3)
> We _definitely_ have callsites that depend on nsTArray returning OOM properly.

"Depend on" in what way?

We have many, many more callsites that don't deal properly.

> How long is the list of AppendElements() callers?  Can we just audit them?

Over a hundred AppendElements(), over a thousand AppendElement(), ...

> I don't think we have good enough code coverage, fwiw, to really rely on
> topcrash reporting as a way of finding the crash bugs.

I don't understand this comment.  We have enough topcrash coverage to find topcrashes, by definition.  We don't need to find "all" the OOM crash "bugs".
One strategy would be to make the switch right now, in the beginning of a beta cycle, and see how nightlies fare. It's easy enough to revert before beta if we're worried still at that point.
Jesse convinced me that we should give this a try.

While I can believe that we have places that can deal with OOM for nsTArrays, in all places that I can think of, we should never end up enough entries in a nsTArray that the array runs out of memory in non-malicious usage.

I'm sure they are there, and we should try to track them down and fix them, but I think it's worth trying to flick the switch here for now.
blocking2.0: ? → beta8+
bz: please do comment on cases that you can think of that we should flag as fallable? Could very well be more than I can think of so I would love to get some ideas to seed my brain.
> "Depend on" in what way?

Knowingly allocate content-controlled TArrays and expect to throw OOM to the web page if that fails.

I'm not worried overmuch about AppendElement; AppendElements should be the places to worry about.

> non-malicious usage.

I'm obviously assuming malicious usage here...

Jesse, my point is that topcrash coverage won't tell us about sg:dos bugs we introduce.

I'll see if I can find the callsites I'm thinking of.
Some obvious places that at least seem to check for AppendElements OOM and create arrays of web-page-controlled size:

nsMediaList::Clone (this handles it correctly as far as I know), nsTextFrameThebes multiple callsites, nsCSSRuleProcessor::AppendFontFaceRules (though I wouldn't mind this one being infallible, really).

RestyleTracker::ProcessRestyles, though on OOM here we just put off processing restyles till later, which may well not work at all, of course.

But yeah, some other callers are just on crack (e.g. FirePageHideEvent in nsFrameLoader.cpp)....
> > non-malicious usage.
> 
> I'm obviously assuming malicious usage here...

But if the concern is malicious usage here, isn't the "worst" they can do to cause the browser to exit? I would have thought there are plenty of ways to do that already (how many null-deref bugs do we have?)

My main concern was new topcrashers due to us now exiting where we used to properly handle OOM. That should only be the case when a single array grows very large so that even if a single nsTArray buffer-growth allocation fails, we still aren't low enough on memory to cause problems elsewhere.
blocking2.0: beta8+ → betaN+
Shaver pointed out that this could be considered an API change.

One possible way to deal with that might be to use macro-magic such that if embedding code uses nsTArray their arrays don't default to infallable.

I don't have an opinion either way.
The glue is statically linked, so if you're building against an old glue you don't get a behavior change. I don't think the argument is relevant here.
Blocks: 427099
Oh, also, if we're concerned about malicious usage, it seems like it'd be a good tradeoff to change "likely exploitable crashes" into "known dos".

And we can always fix the known dos ones both on m-c and in dot releases as we find them.
Blocks: 585943
Blocks: 578914
Comment on attachment 489335 [details] [diff] [review]
Make nsTArray infallible by default.

Just realized you need to switch the default for nsAutoTArray as well.

Is there any movement here? We should have landed this early in the beta8 cycle, now it is probably too late for beta8, but maybe we can still do it in beta9?
Attachment #489335 - Flags: review+ → review-
(In reply to comment #15)
> Comment on attachment 489335 [details] [diff] [review]
> Make nsTArray infallible by default.
> 
> Just realized you need to switch the default for nsAutoTArray as well.
> 

This patch does that.  nsAutoTArray uses nsTArrayDefaultAllocator too.
Comment on attachment 489335 [details] [diff] [review]
Make nsTArray infallible by default.

You're totally right.
Attachment #489335 - Flags: review- → review+
Assignee: nobody → jones.chris.g
Blocks: 616421
Assignee: jones.chris.g → jonas
So it appears the font code uses tarrays as generic memory buffers all over the place, so for now it might be safer to let them continue to use fallible arrays.

So I basically changed all nsTArray<PRUint8> over to explicitly use a fallible type.

John: The result of this patch should be nothing. I.e. the fonts code will continue to use the same type as it always has. The only difference is that it explicitly does so, so that when we change nsTArray, the font code remains unchanged. If you think that some of this code should use infallible buffers feel free to let me know the review. Or you can always switch back later.
Attachment #499103 - Flags: review?(jdaggett)
Pulling this one in to beta9. If we're gonna do this we should do it sooner rather than later.
blocking2.0: betaN+ → beta9+
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if  you believe it  MUST be in the next beta (ie: trunk is in an unshippable state without this)
No longer depends on: 550611
blocking2.0: beta9+ → betaN+
Depends on: 550611
Whiteboard: [sg:want P2] → [sg:want P2] hardblocker
Comment on attachment 499103 [details] [diff] [review]
Change a few uses to fallible

The patch seems fine but I'm unsure about the consistency within the font code.  There are lots of places that check the result of AppendElements that will now use the infallible (i.e. abort on OOM) when the code now will simply handle the OOM.  Jonas, can you explain a little more the principle you used to decide which arrays should be fallible and which don't need to be?
I wanted to err on the "cautious" side by replacing all cases which currently use nsTArray<PRUint8>, which is currently implicitly fallible, to use FallibleTArray<PRUint8>, which is explicitly fallible.

I figured that would cover all cases where nsTArray is used to hold buffers.

Note that the places changed are the places whose behavior will *not* change. I.e. they will have to continue to check for OOM since they will *continue* to use the fallible allocator.

The places that still use nsTArray can and should eventually remove their OOM checks. That can happen after the first patch in this bug has landed and is known to stick. But we should probably do that using some automated code rewriting tools since that's a huge tree-wide change. So leaving that to a different bug.
Attachment #499103 - Flags: review?(jdaggett) → review+
Whiteboard: [sg:want P2] hardblocker → [sg:want P2][hardblocker]
You need to log in before you can comment on or make changes to this bug.