Closed Bug 881634 Opened 7 years ago Closed 6 years ago

crash in mozilla::gfx::DrawTargetSkia::DrawSurface

Categories

(Core :: Canvas: 2D, defect, critical)

24 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla25
Tracking Status
firefox23 --- unaffected
firefox24 + wontfix
firefox25 --- verified

People

(Reporter: scoobidiver, Assigned: mattwoodrow)

References

Details

(4 keywords)

Crash Data

Attachments

(2 files, 3 obsolete files)

It's the Windows (mainly XP) version of bug 880842 but this one doesn't occur at startup.

It first showed up in 24.0a1/20130607. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=204de5b7e0a6&tochange=dc8e78ed8c44
It's likely a regression from bug 848482.

Signature 	mozilla::gfx::DrawTargetSkia::DrawSurface(mozilla::gfx::SourceSurface*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::DrawSurfaceOptions const&, mozilla::gfx::DrawO... More Reports Search
UUID	250e5cef-5a15-4238-944a-0c70c2130610
Date Processed	2013-06-10 22:40:38
Uptime	9153
Last Crash	2.5 hours before submission
Install Age	2.5 hours since version was first installed.
Install Time	2013-06-10 20:09:49
Product	Firefox
Version	24.0a1
Build ID	20130610031147
Release Channel	nightly
OS	Windows NT
OS Version	5.1.2600 Service Pack 2
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 6 model 8 stepping 1
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x5964, AdapterSubsysID: 2073148c, AdapterDriverVersion: 6.14.10.6462
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers- 
Processor Notes 	sp-processor02_phx1_mozilla_com_18030:2012; SignatureTool: signature truncated due to length
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x5964
Total Virtual Memory	2147352576
Available Virtual Memory	1372684288
System Memory Use Percentage	98
Available Page File	8904704
Available Physical Memory	25456640

Frame 	Module 	Signature 	Source
0 	gkmedias.dll 	mozilla::gfx::DrawTargetSkia::DrawSurface 	gfx/2d/DrawTargetSkia.cpp:275
1 	xul.dll 	mozilla::dom::CanvasRenderingContext2D::DrawWindow 	content/canvas/src/CanvasRenderingContext2D.cpp:3229
2 	xul.dll 	mozilla::dom::CanvasRenderingContext2DBinding::drawWindow 	obj-firefox/dom/bindings/CanvasRenderingContext2DBinding.cpp:3084
3 	xul.dll 	mozilla::dom::CanvasRenderingContext2DBinding::genericMethod 	obj-firefox/dom/bindings/CanvasRenderingContext2DBinding.cpp:4021
4 	mozjs.dll 	js::Invoke 	js/src/vm/Interpreter.cpp:389
5 	mozjs.dll 	js::Interpret 	js/src/vm/Interpreter.cpp:2211
6 	mozjs.dll 	js::RunScript 	js/src/vm/Interpreter.cpp:346
7 	mozjs.dll 	SendToGenerator 	js/src/jsiter.cpp:1557
8 	mozjs.dll 	generator_send 	js/src/jsiter.cpp:1646
9 	mozjs.dll 	js::Invoke 	js/src/vm/Interpreter.cpp:389
10 	mozjs.dll 	js::Invoke 	js/src/vm/Interpreter.cpp:435
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Agfx%3A%3ADrawTargetSkia%3A%3ADrawSurface%28mozilla%3A%3Agfx%3A%3ASourceSurface*%2C+mozilla%3A%3Agfx%3A%3ARectTyped%3Cmozilla%3A%3Agfx%3A%3AUnknownUnits%3E+const%26%2C+mozilla%3A%3Agfx%3A%3ARectTyped%3Cmozilla%3A%3Agfx%3A%3AUnknownUnits%3E+const%26%2C+mozilla%3A%3Agfx%3A%3ADrawSurfaceOptions+const%26%2C+mozilla%3A%3Agfx%3A%3ADrawO%2E%2E%2E
Depends on: 880842
It's #19 browser crasher in 24.0a2 and 25.0a1.

According to comments, it's related to PDF Viewer.
It's now #10 browser crasher in 24.0a2 and #38 in 25.0a1.
Keywords: topcrash
OS: Windows XP → Windows 7
QA Contact: jbecerra
QA Contact: jbecerra
Adding qawanted to see if they can hepl reproduce this.

:Snorp given this may be a fallout bug 848482 and  Windows (mainly XP) version of bug 880842 , anything that's obvious here from your investigation ?
Assignee: nobody → snorp
Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20130724 Firefox/24.0
Mozilla/5.0 (Windows NT 5.1; rv:25.0) Gecko/20130723 Firefox/25.0

I was able to crash Aurora 24.0a2 (Build ID: 20130724004003) everytime with this signature, but took some time (~ 15 minutes).

STR:
1. Open http://www.ada.org/sections/educationAndCareers/pdfs/dat_test_sampleitems.pdf in 9 separate tabs.
2. Switch tabs, zoom in/out, scroll.

Results: at step 2, Firefox crashed; before the crash, there are some rendering issues.

Notes:
1. I've also crashed Nightly 25.0a1 (Build ID: 20130723030205), but with [@ EMPTY: no crashing thread identified; corrupt dump ] signature. The same issue occured on Windows 7 x32 bit: able to crash Nightly 25.0a1 for 4 times with [@ EMPTY: no crashing thread identified; corrupt dump ] signature. Reports:
https://crash-stats.mozilla.com/report/list?signature=EMPTY%3A+no+crashing+thread+identified%3B+corrupt+dump&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A25.0a1&hang_type=any&date=2013-07-24+14%3A00%3A00&range_value=1#reports
QA Contact: alexandra.lucinet
Keywords: reproducible
Bas, could you take a look, we're trying to keep :snorp on SkiaGL.
Assignee: snorp → bas
I'll have a look, George is probably a more obvious candidate though as I do not know the Skia Moz2D backend at all.
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> I'll have a look, George is probably a more obvious candidate though as I do
> not know the Skia Moz2D backend at all.

Understood, just trying to keep the mobile going for SkiaGL, so if you can help, great.
I have no idea if this fixes the bug, but its a good cleanup and should make the issue easier to understand if it doesn't
Attachment #783277 - Flags: review?(bas)
Comment on attachment 783277 [details] [diff] [review]
skia-snapshot-cleanup

Review of attachment 783277 [details] [diff] [review]:
-----------------------------------------------------------------

This is better anyway, we'll see if it fixes it.

::: gfx/2d/DrawTargetSkia.cpp
@@ +101,2 @@
>  
> +  return snapshot;

Interesting, we don't do this temporary refptr thing in D2D (although it's obviously correct) .. I strongly suspect that's causing us to never actually re-use the snapshot in D2D :) (as the forget will null out mSnapshot in the D2D case).
Attachment #783277 - Flags: review?(bas) → review+
Let's get this landed and see if it helps us.
I *think* this is fixed now, there's only been one crash on 20130806 or newer, and that has a different signature.

Might wait another day or two to be sure.
(In reply to Matt Woodrow (:mattwoodrow) from comment #15)
> I *think* this is fixed now, there's only been one crash on 20130806 or
> newer, and that has a different signature.
There's still only one post-fix crash but it has the same stack trace: bp-d63aa865-ae4e-4717-bc80-cbdd92130807.

It has dropped in volume since 25.0a1/20130717 so before the patch landing. The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5976b9c673f8&tochange=0888e29c83a3
That one looks like a different crash address. It's crashing on mCanvas->Restore() instead of MarkChanged().

I also see a lot of reports up until 20130806, and nothing since. I think that's enough to say that this is fixed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Whiteboard: [leave-open]
It's #14 browser crasher in 24.0b1 and #12 in 24.0b2. It should be uplifted to Beta after 9 m-c builds of baking.
needinfo on :bas to help with uplift.
Flags: needinfo?(bas)
(In reply to bhavana bajaj [:bajaj] from comment #19)
> needinfo on :bas to help with uplift.

What may I help you with? :) I'm not the patch author although I'm certainly willing to help?
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #20)
> (In reply to bhavana bajaj [:bajaj] from comment #19)
> > needinfo on :bas to help with uplift.
> 
> What may I help you with? :) I'm not the patch author although I'm certainly
> willing to help?

whoops, I thought you were the author given the assignee on the bug ..nvm, needinfoing Matt here so he can help with uplift on beta.
Flags: needinfo?(matt.woodrow)
Assignee: bas → matt.woodrow
Flags: needinfo?(matt.woodrow)
Comment on attachment 783277 [details] [diff] [review]
skia-snapshot-cleanup

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 848482
User impact if declined: Crashes
Testing completed (on m-c, etc.): Been on m-c for a few weeks.
Risk to taking this patch (and alternatives if risky): Fairly low risk, any issues should have been caught by now.
String or IDL/UUID changes made by this patch: None
Attachment #783277 - Flags: approval-mozilla-beta?
Comment on attachment 783277 [details] [diff] [review]
skia-snapshot-cleanup

Low risk patch helps with a top-crasher.Please make sure to land asap to get this into tomorrow's beta.
Attachment #783277 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs a branch-specific patch for uplift.
Flags: needinfo?(matt.woodrow)
Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Windows NT 5.1; rv:26.0) Gecko/20100101 Firefox/26.0

Still reproducible on Firefox 24 beta 6 (Build ID: 20130826142034) with STR from comment 4; Here are the crash reports:
http://goo.gl/cvgGmq
http://goo.gl/Si4OR5

Couldn't reproduce with latest Nightly (Build ID: 20130826190748).
In Socorro there are crashes with this signature in the latest builds: http://goo.gl/H8NtDW, but not after 2013-08-22.

Any thoughts why it's still crashing?
(In reply to Alexandra Lucinet [QA] from comment #26)
> Any thoughts why it's still crashing?

Matt?
Flags: needinfo?(matt.woodrow)
(In reply to Alexandra Lucinet [QA] from comment #26)
> Still reproducible on Firefox 24 beta 6 (Build ID: 20130826142034) with STR
> from comment 4

Crash stats agree, this is still #6 on 24.0b6, see https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/24.0b6?days=7
Attached patch Check for null surface (obsolete) — Splinter Review
This looks like a pretty standard OOM crash. Opening an enormous pdf document 9 times, and then scrolling around means that pdf.js loads a lot of data.

Memory usage steadily builds until we run out, and rendering starts to break down. Eventually we hit this path and the function fails to return a valid surface and we crash since DrawTargetSkia::DrawSurface doesn't null check aSurface (and I don't think it should).

I have *no* idea how my other patch could have prevented this from happening on the other branches.

I don't think we're actually leaking anything, since using 'Minimize memory usage' from about:memory drops the memory usage back to reasonable levels.

I'm going on PTO as of now, so I'd appreciate if someone else could handle landing this on all branches. It's very low risk, just null checking.

As for the TODO I added, I think the right solution is to create a SourceSurfaceRawData (Factory::CreateWrappingDataSourceSurface) and implement proper support for those in all the backends (DrawTargetSkia at least won't draw with them). Or add another method/parameter so that we can have CreateSourceSurfaceFromData without requiring a copy.
Attachment #797670 - Flags: review?(ncameron)
Flags: needinfo?(matt.woodrow)
needinfo'ing nrc here to request help with review and landings here of Matt's patch here as appropriate, given we would like to uplift this asap ?
Flags: needinfo?(ncameron)
Comment on attachment 797670 [details] [diff] [review]
Check for null surface

Review of attachment 797670 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/CanvasRenderingContext2D.cpp
@@ +3249,5 @@
>                                             FORMAT_B8G8R8A8);
> +    if (data) {
> +      mgfx::Rect rect(0, 0, w, h);
> +      mTarget->DrawSurface(data, rect, rect);
> +      mTarget->Flush();

I think we should probably be throwing an error rather than silently failing
(In reply to bhavana bajaj [:bajaj] from comment #30)
> needinfo'ing nrc here to request help with review and landings here of
> Matt's patch here as appropriate, given we would like to uplift this asap ?

I'll get this landed NZ Monday so its ready for the uplift.
Flags: needinfo?(ncameron)
Attachment #797670 - Attachment is obsolete: true
Attachment #797670 - Flags: review?(ncameron)
Attachment #798338 - Flags: review?(roc)
This time with less Monday morning.
Attachment #798338 - Attachment is obsolete: true
Attachment #798338 - Flags: review?(roc)
Attachment #798340 - Flags: review?(roc)
Third time lucky. Sorry for the dumbness.
Attachment #798340 - Attachment is obsolete: true
Attachment #798340 - Flags: review?(roc)
Attachment #798341 - Flags: review?(roc)
Comment on attachment 798341 [details] [diff] [review]
Check for null surface and throw an error

Review of attachment 798341 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/CanvasRenderingContext2D.cpp
@@ +3247,5 @@
>                                             IntSize(size.width, size.height),
>                                             img->Stride(),
>                                             FORMAT_B8G8R8A8);
> +    if (!data) {
> +      error.Throw(NS_ERROR_FAILURE);

NS_ERROR_OUT_OF_MEMORY
Attachment #798341 - Flags: review?(roc) → review+
Tested on 24.0b8 on a fairly resource challenged XP machine.

I loaded the large pdf in comment #4 in 10 tabs.  Fx was slow, but it wasn't 'til I attempted to zoom in one of the tabs that performance went bad very quickly, browser was "not responding".  Eventually, I got the slow script warning (which also went "not responding").  Eventually the browser crashed but with the empty stack: https://crash-stats.mozilla.com/report/index/015b1eee-ce6b-480a-bd66-4d4462130903, which is filed as bug 711568.
tested with Nightly on the same machine and could not reproduce.  Perf was better.   Also tried again on 24beta8. perf slow but didn't crash.
The crash signature has disappeared from b8 topcrashes at least, yes.
Tracy/Kairo, it was told it crashkill that the patch here is not fixing the top-crash entirely, but may just be delaying the crash instead, if I heard it right ?Can you confirm ?
(In reply to bhavana bajaj [:bajaj] from comment #41)
> Tracy/Kairo, it was told it crashkill that the patch here is not fixing the
> top-crash entirely, but may just be delaying the crash instead, if I heard
> it right ?Can you confirm ?

I'm basing that on comment #38 here, mostly.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #42)
> (In reply to bhavana bajaj [:bajaj] from comment #41)
> > Tracy/Kairo, it was told it crashkill that the patch here is not fixing the
> > top-crash entirely, but may just be delaying the crash instead, if I heard
> > it right ?Can you confirm ?
> 
> I'm basing that on comment #38 here, mostly.

Tracy, In that case it will be good to understand and compare current situation with pre Fx24b6 and report back how exactly the patch help or does not.
The fix here doesn't improve user experience at all.  Fx crashes just a well pre-fix and post-fix.
Clarifying.  
This crash is fixed. Just retested on the Same resource challenged XP machine and did not crash.

I have encountered the EMPTY stack crash at least once trying the same STR's here.  However, that is a different crash, probably OOM, can't tell.
This bug is definitely OOM related.

My patch fixed one case where we were crashing when a large allocation failed and so will buy us more time.

It doesn't do anything about the actual increasing memory usage though, so we will still eventually reach a limit where even small allocations fail, and we have no possible way of handling all of those without crashing.

I think the only way forward here is to modify pdf.js to be less aggressive in it's use of resources, and possibly find a way to get it to respond to low memory notifications.
This also needs to be backed out of https://hg.mozilla.org/releases/mozilla-esr24 - Matt can you get that?
Flags: needinfo?(matt.woodrow)
RyanVM got it in the merge from m-b: http://hg.mozilla.org/releases/mozilla-esr24/rev/59babbe18671
Flags: needinfo?(matt.woodrow)
Flagging for verification in Firefox 25.
Keywords: verifyme
Mozilla/5.0 (Windows NT 5.1; rv:25.0) Gecko/20100101 Firefox/25.0

Every time I load the PDFs from comment 4 and zooming in/out, Firefox 25 beta 2 crashes (Build ID: 20130923194050) with empty stack: https://crash-stats.mozilla.com/report/index/1c1ecf5b-924a-4a3c-b581-b78902130924. Since the new crash signature is filled as bug 711568 and also per comment 46, how could I help further on?
Flags: needinfo?(matt.woodrow)
Alexandra, since you got to the Empty signature crash, this can be verified fixed for 25. Thanks for testing.
Flags: needinfo?(matt.woodrow)
You need to log in before you can comment on or make changes to this bug.