Closed
Bug 595593
Opened 11 years ago
Closed 10 years ago
Repeated use of getContext on canvas elements causes crash
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
People
(Reporter: pleasestand, Assigned: cjones)
References
()
Details
(Keywords: crash, testcase, Whiteboard: [sg:dos (trunk)][sg:critical? (1.9 branches)])
Attachments
(2 files)
52.00 KB,
text/plain
|
Details | |
1.48 KB,
patch
|
vlad
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100824 Firefox/3.6.9 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100824 Firefox/3.6.9 Creating new canvas elements and obtaining the "2d" contexts in a tight loop causes both Firefox 3.6.9 and the 11-Sep-2010 nightly to crash. Reproducible: Always Steps to Reproduce: 1. Open the javascript: URL provided. Actual Results: After several seconds, Firefox/Minefield crashes. Expected Results: After several more seconds, the script should have timed out, which would have made stopping it possible.
![]() |
||
Comment 1•11 years ago
|
||
As it's seems it's an OOM Crash looking at my Windows XP Task Manager with the Page File being occupied and a WinDBG Stacktrace. 0012d010 781399cd MOZCRT19!arena_malloc_small(struct arena_s * arena = 0xffffffff, unsigned int size = 0x20, int zero = 0)+0x6c [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\memory\jemalloc\crtsrc\jemalloc.c @ 3750] 0012d020 004d105a MOZCRT19!malloc(unsigned int size = 0x1057da64)+0x3d [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\memory\jemalloc\crtsrc\jemalloc.c @ 5873]
![]() |
||
Comment 2•10 years ago
|
||
This is similar to that GLContext thing, but with just pure memory exhaustion.... _This_ case really needs to be handled via GC and the like, no?
Target Milestone: --- → flash10
If we had xpcomgc, maybe; I don't see how JS GC can handle it without it being involved in our OOM handling -- we only use a tiny amount of the JS heap. Only way we could solve this is if we were able to do a GC during malloc() failure. (Note that canvas defaults to 300x200 [i have no idea why] if no dimensions are specified, so we're using 300*200*4 ~= 1/4MB per canvas element here.)
Target Milestone: flash10 → ---
![]() |
||
Comment 4•10 years ago
|
||
js gc has an "allocated memory" tracker. We could tell it about the memory we're allocating here like built-in js objects do.
> Note that canvas defaults to 300x200 [i have no idea why]
It's 300x150, which is the default replaced element size in CSS, because that's how big IE4/NS4 made iframes by default.
Comment 5•10 years ago
|
||
Yeah, and we are really trying to get rid of that allocated memory counter because its total clown shoes. It doesn't count fragmentation and is mostly just guessing for realloc(). What we really want is actual working set measurement and management. I have an old patch for that if anyone cares. For this specific bug, while I agree that good working set management is great to avoid OOM, OOM on large allocations also should do direct failure handling. While do we crash here? Whats the full stack?
Status: UNCONFIRMED → NEW
Ever confirmed: true
![]() |
||
Comment 6•10 years ago
|
||
XtC4UaLL, want to attach the full stack?
![]() |
||
Comment 7•10 years ago
|
||
I made this Log against Mozilla/5.0 (Windows NT 5.1; rv:2.0b10pre) Gecko/20110124 Firefox/4.0b10pre ID:20110124030332 (http://hg.mozilla.org/mozilla-central/rev/80266029824b). Alternatively I provide a Breakpad Crash ID: bp-fb3059b5-7fbd-4a87-889c-446eb2110124 [@ mozalloc_handle_oom() ]
Comment 8•10 years ago
|
||
Large/variable allocations must be fallible_t and check. /be
![]() |
||
Comment 9•10 years ago
|
||
That seems to be crashing during the constructor of the nsTArray<ContextState> mStyleStack; member. This is an infallible array initialized thusly: mStyleStack(20) in the nsCanvasRenderingContext2D constructor (why is this not an auto array? Who knows.) In any case, sizeof(ContextState) is 88 bytes over here on Mac 64-bit; I'm not sure whether the 0x79 in that log is the correct size on Windows or just the debugger lying. In any case, that's a 1760 byte allocation. Does that count as "large"? :(
Comment 10•10 years ago
|
||
Not large, obviously, but then why did we crash? /be
![]() |
||
Comment 11•10 years ago
|
||
The testcase runs an infinite loop doing the following: 1) createElement("canvas"): 80 bytes plus whatever allocations js-wrapping does. 2) getContext("2d"): 68 bytes for the object itself 1760 bytes for the ContextState array 180,000 bytes for the graphics surface backing store plus whatever allocations js-wrapping does The image surface allocation is null-checked (e.g. see the fallible allocation in gfxImageSurface::gfxImageSurface); failure to allocate the image buffer is silent and doesn't throw back into JS. Perhaps it should! Of the remaining allocations, the ContextState array allocation is most likely to fail, no? So it would be the most likely place to crash. The 180KB allocations make sure that we don't create enough JS garbage in the process of doing all this to trigger GC, I assume.
Assignee | ||
Comment 12•10 years ago
|
||
Do we know that the 3.6.x crash is safe? It's not using infallible tarray.
Assignee | ||
Comment 13•10 years ago
|
||
A last resort could be to alloc and init the canvas context in a fallible factory function, and use fallible tarray for mStyleStack.
![]() |
||
Comment 14•10 years ago
|
||
I suspect that on trunk making image surface init failure throw to JS would be quite enough. I'm not sure what the answer to comment 12 is. :(
Comment 15•10 years ago
|
||
Alright, so I think the real fix here is working set management. The 2d context is one example of a DOM node that allocates a small JS object header and with it a crap ton of C++ storage. The latter is never accounted for. I am sure there are other ways to trigger this. Just loop over something that allocates a WN that happens to be backed by a large-ish C++ struct. If the test is changed to keep the objects all alive, what do we do then? A GC couldn't save us. Again, I really think we need true working set management. I will refresh my patch. Triaging to XPConnect and taking.
Comment 16•10 years ago
|
||
Actually JS engine is probably more appropriate.
Assignee: nobody → general
Component: Canvas: 2D → JavaScript Engine
QA Contact: canvas.2d → general
Updated•10 years ago
|
Assignee: general → gal
![]() |
||
Comment 17•10 years ago
|
||
> If the test is changed to keep the objects all alive, what do we do then? Run out of memory in an honest-to-God way. And if the script catches OOM exceptions, even my proposal from comment 11 doesn't help and we crash. That's why we need content processes... A web page that really wants to can _always_ eat up RAM, unless we start throwing uncatchable exceptions and disabling event delivery or some such.
Comment 18•10 years ago
|
||
Actually we should hide this bug. The private comments give away that something is up and there is a public test case. If its safe, we can re-open.
Group: core-security
Comment 19•10 years ago
|
||
We're not redoing GC memory pressure feedback at this stage of the release! That is guaranteed to cause latent bugs to bite. /be
![]() |
||
Comment 20•10 years ago
|
||
Who said anything about "the release"? This isn't a blocker; it's not even nominated. We need to answer the question in comment 12, but that's all we need to do for 2.0 here. I wouldn't even make the "image surface init fail should throw to JS" change for 2.0 at this point!
Comment 21•10 years ago
|
||
bz: why wouldn't we do what you proposed in comment 14 if we think there's a risk of exploitability here? If we don't, let's not make this bug s-s! /be
![]() |
||
Comment 22•10 years ago
|
||
> bz: why wouldn't we do what you proposed in comment 14 Because of compat worries? This bug is s-s because we don't know the answer to your question in comment 21!
Comment 23•10 years ago
|
||
Someone needs to find out. If we can't find out, we should do what comment 14 says. We should probably just do that anyway instead of going back and forth on it here! Something does not add up. This bug is a crash bug. It seems to have a specific fix: fallible allocation and OOM failure propagation. It *might* go away with a dreamy GC scheduling architecture from the future -- but it might not. We do not know yet. So we're to sit on an s-s bug that may have already leaked until next Firefox release? Let's please just fix the allocation site. /be
Comment 24•10 years ago
|
||
I do not understand "compat worries". We have OOM as uncatchable failure already. What code could possibly demand that it not happen for this bug's test? /be
![]() |
||
Comment 25•10 years ago
|
||
Brendan, we know this bug is not a security issue on trunk.
I just looked at the code. Failure of that mStyleStack buffer allocation is ok for 1.9.2 in and of itself. However failure of the append in InitializeWithSurface is not; that needs to be fixed. Also, failure of the append in Save() seems to be unsafe; it won't die right there, but after that access to CurrenState() will index out of bounds.
> What code could possibly demand that it not happen for this bug's test?
If we're unlucky, the answer is "facebook". ;) Seriously...
In any case, we need a 1.9.2 patch here to throw on those append failures. I'd be willing to consider a patch to make trunk and maybe branch throw on image surface alloc failure. It'd need to land before Friday.
Chris, want to work on this?
blocking1.9.2: --- → ?
Comment 26•10 years ago
|
||
bz: thanks for diagnosing. I still don't understand how OOMing can break a site, where indexing out of bounds makes that site work. Seems this is spot-fixable, in any case, and not something requiring GC-from-the-future -- confirm or deny. /be
![]() |
||
Comment 27•10 years ago
|
||
Yeah, I think fixing this particular thing doesn't require magic.
Comment 28•10 years ago
|
||
Dammit, what happened to bugzilla? I can't reassign to Chris. Andreas, can you do it? /be
Updated•10 years ago
|
Assignee: gal → jones.chris.g
Comment 29•10 years ago
|
||
Wow, lots of excitement in this bug. I just made a patch to feed this one allocation into the engine's memory pressure heuristic (mallocBytes). But if there is a WebGL only fix, even better. Handing over to Chris.
Assignee | ||
Comment 30•10 years ago
|
||
I didn't see any other problems beyond what Boris pointed out in comment 25. In SetDimensions(), if CheckSurfaceSize() fails, the context soldiers on with the invalid width/height values. That makes me a bit nervous but it looks like the mValid check is done properly where the dims are used, so no problems there. This code has a curious mix of vanilla operator new() with null-checking sometimes and sometimes not (doesn't matter, the null checks are pointless), but also use of operator new(nothrow) with meaningful null checks. Just for kicks, I ran the testcase in an rlimit-ed setup a few times to see what cropped up. I saw - crashing below nsMemoryImpl::FlushMemory, below a failed nsTArray::SetCapacity (for mStyleStack) - std::bad_alloc from NS_NewCSSParser (for mCSSParser) - std::bad_alloc from gfxPlatformGtk::CreateOffscreenSurface (below SetDimensions()) from ~10 runs. Based on failed SetCapacity() not being only theoretical, I'd definitely blocking+ this for 1.9.2, if it wasn't otherwise planned to be. I don't particularly care about the other two for 1.9.2.
Attachment #507055 -
Flags: review?(vladimir)
Comment 31•10 years ago
|
||
What about the 1.9.1 branch? Looks like the patched code is the same.
Assignee | ||
Comment 32•10 years ago
|
||
Yes, we should take this on both.
Updated•10 years ago
|
blocking1.9.1: ? → .18+
status1.9.1:
--- → wanted
Whiteboard: [sg:dos (trunk)][sg:critical? (1.9 branches)]
Comment on attachment 507055 [details] [diff] [review] Check for failed tarray realloc Looks fine, though please nuke the bogus { }'s around the single return :-) Weird that this particular nsTArray is an issue; this feels like wallpaper a bit, but better than nothing.
Attachment #507055 -
Flags: review?(vladimir) → review+
Assignee | ||
Comment 34•10 years ago
|
||
Alloc failures for that particular tarray will cause nsCanvasRenderingContext2D to read outside the array bounds. There are gazillion other OOM sites here that are just safe crashers.
Updated•10 years ago
|
blocking2.0: --- → -
![]() |
||
Comment 35•10 years ago
|
||
Note that I do think we should have a separate patch (in a separate bug?) for throwing on image surface alloc failure.
Assignee | ||
Comment 36•10 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/59a1a0a2d9ff http://hg.mozilla.org/releases/mozilla-1.9.2/rev/aff9538a38fd
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 37•10 years ago
|
||
Boris or Andreas, one of you guys want to file the followup on the surface allocs? I'm not sure what the final plan is for that.
![]() |
||
Comment 38•10 years ago
|
||
I filed bug 630465.
Comment 39•10 years ago
|
||
This was checked in on branches a while ago (comment 36)
Updated•10 years ago
|
Group: core-security
Updated•8 years ago
|
Flags: in-testsuite-
You need to log in
before you can comment on or make changes to this bug.
Description
•