Closed Bug 1272123 Opened 8 years ago Closed 8 years ago

Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | mozilla::dom::CanvasRenderingContext2D::Save

Categories

(Core :: Graphics: Canvas2D, defect)

41 Branch
x86
Windows
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla51
Tracking Status
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- verified
firefox51 --- verified

People

(Reporter: u279076, Assigned: eflores)

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-45a44ef8-2fea-4a73-83af-cdd8d2160511.
=============================================================
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	mozglue.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp:46
2 	mozglue.dll 	moz_xrealloc 	memory/mozalloc/mozalloc.cpp:107
3 	xul.dll 	nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) 	xpcom/glue/nsTArray-inl.h:183
4 	xul.dll 	mozilla::dom::CanvasRenderingContext2D::Save() 	dom/canvas/CanvasRenderingContext2D.cpp:1734
5 	xul.dll 	mozilla::dom::CanvasRenderingContext2DBinding::save 	obj-firefox/dom/bindings/CanvasRenderingContext2DBinding.cpp:1960
6 	xul.dll 	mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) 	dom/bindings/BindingUtils.cpp:2717
7 	xul.dll 	mozilla::SimpleTimer::Cancel() 	dom/media/VideoUtils.cpp:380
8 		@0x2fc6305f
=============================================================
More reports: https://crash-stats.mozilla.com/signature/?product=Firefox&signature=OOM+%7C+large+%7C+mozalloc_abort+%7C+mozalloc_handle_oom+%7C+moz_xrealloc+%7C+nsTArray_base%3CT%3E%3A%3AEnsureCapacity%3CT%3E+%7C+mozilla%3A%3Adom%3A%3ACanvasRenderingContext2D%3A%3ASave 	

These crashes show up going back as far as Firefox 41 so I don't think it's a recent regression but it does seem to be rising by a significant margin starting around March 23rd.
Multiple comments mention WeTransfer.com so that might be a lead. There does not appear to be a strong correlation to a particular system configuration (OS, GPU, Driver, etc).
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #2)
> Multiple comments mention WeTransfer.com so that might be a lead. There does
> not appear to be a strong correlation to a particular system configuration
> (OS, GPU, Driver, etc).

Do you mean going to WeTransfer.com in browser can reproduce it?
Flags: needinfo?(anthony.s.hughes)
(In reply to Vincent Liu[:vliu] from comment #3)
> Do you mean going to WeTransfer.com in browser can reproduce it?

For me personally, no. All I mean is that multiple users who report this crash have a page on wetransfer.com open when the crash occurs. I cannot say for certain that wetransfer.com is the cause. I only mention it in case people are looking for possible leads to test.
Flags: needinfo?(anthony.s.hughes)
Managed to reproduce runaway memory usage on WeTransfer.com.

One of their animations (progress bar, I think) calls CanvasRenderingContext2D.save() on every frame without a matching restore().

This causes our style state stack at [1] to grow slowly, but without bound.

We should probably bound the size of the stack and make it circular.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/canvas/CanvasRenderingContext2D.cpp#1782
Assignee: nobody → edwin
I'm not sure that this is the best course of action. We could also just ignore the save() call.

Does this page modify any canvas state after the save call? If not, we don't have to actually allocate a new array element for the state. We don't have any optimization for that case but we could add one if it helps this page. It's what Chrome seems to do; look for BaseRenderingContext2D::realizeSaves(). (I didn't find any state array length checks in Chrome code.)
(In reply to Markus Stange [:mstange] from comment #7)
> I'm not sure that this is the best course of action. We could also just
> ignore the save() call.

I considered that, but this way it's more likely that the result will be correct.

> Does this page modify any canvas state after the save call? If not, we don't
> have to actually allocate a new array element for the state. We don't have
> any optimization for that case but we could add one if it helps this page.
> It's what Chrome seems to do; look for
> BaseRenderingContext2D::realizeSaves(). (I didn't find any state array
> length checks in Chrome code.)

The style does appear to be modified in between save() calls, but always goes back to the same state just before the call is made. We might be able to get away with comparing the two elements at the top of the stack.

I compared to Chromium, which doesn't exhibit the same runaway memory usage. AFAICT they *should* have the same problem; currently investigating why the don't.
Comment on attachment 8765973 [details] [diff] [review]
1272123.patch

Removing review request until you've found out why Chrome doesn't have this problem.
Attachment #8765973 - Flags: review?(mstange)
Crash volume for signature 'OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | mozilla::dom::CanvasRenderingContext2D::Save':
 - nightly (version 50): 5 crashes from 2016-06-06.
 - aurora  (version 49): 54 crashes from 2016-06-07.
 - beta    (version 48): 676 crashes from 2016-06-06.
 - release (version 47): 8073 crashes from 2016-05-31.
 - esr     (version 45): 983 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          1          1          0          1          2          0
 - aurora           6          9         10          8          9          8          1
 - beta           106        112        101        115        108         79         19
 - release       1326       1616       1125       1191       1180        885        255
 - esr            137        101        114        125        109        101         69

Affected platforms: Windows, Linux
Ack. My bad.

Chrome does share this problem; it just didn't show up in my minimal test case (which didn't dirty the state in between save() calls) because of that optimisation.
Is there any way for a user to work around this problem until we get a good long term fix?
Flags: needinfo?(edwin)
Comment on attachment 8765973 [details] [diff] [review]
1272123.patch

AFAICT it's either this or comparing the whole structure on every possibly-dirty save(). Re-requesting review.
Flags: needinfo?(edwin)
Attachment #8765973 - Flags: review?(mstange)
Comment on attachment 8765973 [details] [diff] [review]
1272123.patch

Review of attachment 8765973 [details] [diff] [review]:
-----------------------------------------------------------------

Ok. Can you add a crashtest?
Attachment #8765973 - Flags: review?(mstange) → review+
> Ok. Can you add a crashtest?

I'm not sure we can crash fast enough. This is a super slow OOM that takes about 30 to 60s to crash on my machine; probably much, much slower on test infra.
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac598dfc00b2
Limit the size of CanvasRenderingContext2D::mStyleStack - r=mstange
https://hg.mozilla.org/mozilla-central/rev/ac598dfc00b2
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Hi Edwin, could you please nominate for uplift to Aurora50?
Flags: needinfo?(edwin)
We might want that uplifted to beta too as the patch seems safe
Too late for 49, do you want to request uplift to aurora/50?
Flags: needinfo?(edwin)
mstange, eflores, This still could make it into 50.
Flags: needinfo?(mstange)
Flags: needinfo?(edwin)
Comment on attachment 8765973 [details] [diff] [review]
1272123.patch

Approval Request Comment
[Feature/regressing bug #]: Canvas 2D.
[User impact if declined]: Crashes on some tablets when viewing 2D canvas content.
[Describe test coverage new/current, TreeHerder]: Been on m-c for about a month.
[Risks and why]: Very low risk, if any.
[String/UUID change made/needed]: None.
Flags: needinfo?(mstange)
Flags: needinfo?(edwin)
Attachment #8765973 - Flags: approval-mozilla-aurora?
Comment on attachment 8765973 [details] [diff] [review]
1272123.patch

Crash fix, Aurora50+
Attachment #8765973 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This has started spiking recently for users with the ru locale (~42% of crashes with this signature have useragent_locale=ru vs ~9% overall in 48).
Probably because of this website: http://kinogo.club/
Flags: qe-verify+
I couldn't reproduce the crash on Microsoft Surface Pro with Windows 10 64-bit.
Still, there are no new crashes in Socorro after the fix landed so I'm marking this as verified based on that.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.