Last Comment Bug 614772 - crash with taskbar preview [@ gfxContext::NewPath() ] mainly with Firefox Showcase
: crash with taskbar preview [@ gfxContext::NewPath() ] mainly with Firefox Sho...
Status: RESOLVED FIXED
: crash
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Windows 7
: -- critical with 2 votes (vote)
: mozilla22
Assigned To: Brian R. Bondy [:bbondy]
:
Mentors:
: 644124 692980 (view as bug list)
Depends on:
Blocks: 559613
  Show dependency treegraph
 
Reported: 2010-11-25 02:04 PST by Scoobidiver (away)
Modified: 2013-03-23 16:05 PDT (History)
15 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
-
-
-


Attachments
Possible crash fix for taskbar previews (910 bytes, patch)
2011-09-01 11:30 PDT, Brian R. Bondy [:bbondy]
jmuizelaar: review-
Details | Diff | Review
Patch for documenting the cairo patch. (2.30 KB, patch)
2011-09-06 06:49 PDT, Brian R. Bondy [:bbondy]
jmuizelaar: review-
Details | Diff | Review
Extension that should crash quite often (206.92 KB, application/octet-stream)
2012-02-14 05:37 PST, Josep del Rio
no flags Details
Patch v2. (886 bytes, patch)
2013-03-11 06:48 PDT, Brian R. Bondy [:bbondy]
jmuizelaar: review+
Details | Diff | Review

Description Scoobidiver (away) 2010-11-25 02:04:54 PST
It is a residual crash signature that exist in 3.0, 3.6 and trunk builds.
It is #225 top crasher in 4.0b7 for the last week.

Two comments say:
"clicked a close (X) button on the windows 7 taskbar tab preview of the leftmost tab"
"something with windows 7 taskbar preview and multiple tabs. it just quit unexpectedly."

Signature	gfxContext::NewPath()
UUID	ad59b146-9156-41b7-ba7d-7a2352101124
Time 	2010-11-24 18:13:05.623157
Uptime	2410
Install Age	2410 seconds (40.2 minutes) since version was first installed.
Product	Firefox
Version	4.0b8pre
Build ID	20101124042634
Branch	2.0
OS	Windows NT
OS Version	6.1.7600
CPU	x86
CPU Info	GenuineIntel family 6 model 23 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x4
App Notes 	AdapterVendorID: 10de, AdapterDeviceID: 06e4

Crashing Thread
Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	gfxContext::NewPath 	gfx/thebes/gfxContext.cpp:113
1 	xul.dll 	PresShell::RenderDocument 	layout/base/nsPresShell.cpp:5301
2 	xul.dll 	nsCanvasRenderingContext2D::DrawWindow 	content/canvas/src/nsCanvasRenderingContext2D.cpp:3636
3 	xul.dll 	nsIDOMCanvasRenderingContext2D_DrawWindow 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:3566
4 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4748
5 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:657
6 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:737
7 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:858
8 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:4973
9 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1694
10 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:588
11 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
12 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
13 	xul.dll 	mozilla::widget::TaskbarPreview::DrawBitmap 	widget/src/windows/TaskbarPreview.cpp:404

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=gfxContext%3A%3ANewPath%28%29
Comment 1 Rob Arnold [:robarnold] 2010-11-26 21:43:18 PST
It looks like the thebes surface on the canvas is null. This is probably due to bug 559613 but I'm not sure exactly how.
Comment 2 Ed Morley [:emorley] 2011-03-23 07:53:42 PDT
*** Bug 644124 has been marked as a duplicate of this bug. ***
Comment 3 Ed Morley [:emorley] 2011-03-23 07:55:16 PDT
See the newly duped bug 644124 for STR :-)
Comment 4 Brian R. Bondy [:bbondy] 2011-09-01 08:46:05 PDT
This crash happened to me as I was mouse hovering over each of the tab previews back and forth.  I can't reproduce anymore though.

This happens in thebes context when mCairo is not created correctly.
I.e. in the constructor of gfxContext we call:

> mCairo = cairo_create(surface->CairoSurface());

Trying currently to find a way in code to reproduce without using the debugger to manually set NULL on mCairo.  Perhaps by continually creating a surface.
Comment 5 Josh Matthews [:jdm] 2011-09-01 08:58:41 PDT
cairo_create should always be giving back a valid cairo object which either performs no-ops or correct behaviour.
Comment 6 Brian R. Bondy [:bbondy] 2011-09-01 09:03:08 PDT
Ya I seen the comment in cairo_create and I seen the calls to cairo_create_in_error on error. 

Currently debugging to see if there is another way to get it to crash at the same location other than having a NULL mCairo.
Comment 7 Brian R. Bondy [:bbondy] 2011-09-01 09:44:25 PDT
The only way I can reproduce (manually with the debugger) is with a NULL or invalid mCairo.  Not with a NULL surface nor context. 

The noop object we return is one per error code from 0 to CAIRO_STATUS_LAST_STATUS.  I think perhaps we're offsetting that by too much and getting passed the array and returning an invalid cairo noop pointer.

Still debugging though.
Comment 8 Brian R. Bondy [:bbondy] 2011-09-01 11:30:43 PDT
Created attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

As per Comment 7:

I couldn't find a case where the cairo status is less than 0 or more than CAIRO_STATUS_LAST_STATUS but I think this is the cause of the crash.  Perhaps because of an invalid cairo surface passed to cairo_create or perhaps -1 is being returned as a status somewhere.

In any case I think the fix is worth it and we can see if the crashes go away on Nightly or not for new builds.
The crashes happen and are reported every day a few times, and happen in FF6 dozens of times per day.

It's better than what we currently do in cairo_create_in_error: blindly index into _cairo_nil__objects with the status error code.
Comment 9 Joe Drew (not getting mail) 2011-09-02 13:54:20 PDT
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Jeff's a better person for this, I think. At minimum, you're going to want to put a patch in cairo/ and add it to our list of patches in the README.
Comment 10 Brian R. Bondy [:bbondy] 2011-09-06 06:49:35 PDT
Created attachment 558463 [details] [diff] [review]
Patch for documenting the cairo patch.

As per Joe's comment 9.
Comment 11 Jeff Muizelaar [:jrmuizel] 2011-09-07 06:41:05 PDT
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

I thought I had reviewed this already, but I guess I forgot to submit.

I don't think we should do this it feels too much like a band-aid. Can you find the caller that's setting this and fix it there?
Comment 12 Brian R. Bondy [:bbondy] 2011-09-07 06:45:37 PDT
I'll try to find a root cause but if found, I think in addition to that fix we should still do this check since it's better than a memory violation if ever a bad error code is returned from anything in cairo.
Comment 13 Jeff Muizelaar [:jrmuizel] 2011-09-23 14:39:05 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #12)
> I'll try to find a root cause but if found, I think in addition to that fix
> we should still do this check since it's better than a memory violation if
> ever a bad error code is returned from anything in cairo.

It seems like that is more likely to postpone problems. If we do add a check, I think that it ensure that we crash rather than avoid it.
Comment 14 Brian R. Bondy [:bbondy] 2011-09-26 07:51:49 PDT
I can't reproduce this crash ever anymore for the past few weeks.  Could we do something like add telemetry to notify us if the status code I suspect is out of range is indeed out of range?  It would basically just be a patch we put on nightly temporary and revert it out after we have the data?
Comment 15 Jeff Muizelaar [:jrmuizel] 2011-09-26 08:20:45 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #14)
> I can't reproduce this crash ever anymore for the past few weeks.  Could we
> do something like add telemetry to notify us if the status code I suspect is
> out of range is indeed out of range?  It would basically just be a patch we
> put on nightly temporary and revert it out after we have the data?

Definitely. I've done something like this in the past:
http://mxr.mozilla.org/mozilla-central/ident?i=CrashSpline
Comment 16 Brian R. Bondy [:bbondy] 2011-09-26 08:22:47 PDT
Great, I'll approach it that way and re-submit a patch.  Will probably be next week as I'm on several other things this week. Thanks!
Comment 17 Mats Palmgren (:mats) 2011-10-07 17:21:35 PDT
*** Bug 692980 has been marked as a duplicate of this bug. ***
Comment 18 Scoobidiver (away) 2012-01-13 10:33:20 PST
It's correlated in 9.0.1 with Firefox Showcase: 
99% (134/136) vs.   0% (220/101230) {89506680-e3f4-484c-a2c0-ed711d481eda} (Firefox Showcase, https://addons.mozilla.org/addon/1810)
Comment 19 Verdi [:verdi] 2012-01-18 08:21:32 PST
(In reply to Scoobidiver from comment #18)
> It's correlated in 9.0.1 with Firefox Showcase: 
> 99% (134/136) vs.   0% (220/101230) {89506680-e3f4-484c-a2c0-ed711d481eda}
> (Firefox Showcase, https://addons.mozilla.org/addon/1810)

I just had a user confirm that disabling the Showcase extension fixed their crash problem - https://support.mozilla.org/en-US/questions/913408
Comment 20 Brian R. Bondy [:bbondy] 2012-01-18 08:48:50 PST
Awesome, I'll try to fix it with this new info this week or next.
Comment 21 Brian R. Bondy [:bbondy] 2012-01-18 11:26:35 PST
I downloaded the extension, opened over 100 tabs and tried using each of the buttons available with that extension, but unfortunately could still not reproduce.

Likely the crashes are seen with this extension because it provides much more previews than is available via the taskbar previews option which is capped by Windows for the number of previews that are shown.

It anyone has more information on how to reproduce please let me know.
Comment 22 Josep del Rio 2012-01-18 14:15:18 PST
(In reply to Brian R. Bondy [:bbondy] from comment #21)
> I downloaded the extension, opened over 100 tabs and tried using each of the
> buttons available with that extension, but unfortunately could still not
> reproduce.
> 
> Likely the crashes are seen with this extension because it provides much
> more previews than is available via the taskbar previews option which is
> capped by Windows for the number of previews that are shown.
> 
> It anyone has more information on how to reproduce please let me know.

I'm Showcase's author; the users that have contacted me with this issue reported that it was triggered by enabling the "Thumbnail caching" option; this option can be found in the "Advanced" panel, and is disabled by default.
Comment 23 Brian R. Bondy [:bbondy] 2012-01-18 17:15:45 PST
Thanks for the info, I tried that option but cannot reproduce still.
Comment 24 Josep del Rio 2012-01-18 18:39:22 PST
It instantly crashed for me :)

This is what I did:
- Have some tabs loaded with real websites (blank thumbnails are special cases).
- Press F12 to bring Showcase.
- Close it
- Go to the Showcase settings panel, "Advanced" tab.
- Toggle the cache option.

At this point, it crashed; if it doesn't in your computer, try reloading a tab, it appears to be what triggers the crash.

I can provide any information you need; to start with, the user agent is:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0
browser.startup.homepage_override.buildID 20120111092507
Comment 25 Josep del Rio 2012-01-18 18:42:00 PST
OK, that's odd... now it doesn't crash following the same instructions. I'll be trying different things, see if I can reliably reproduce the issue.
Comment 26 Josep del Rio 2012-01-18 20:09:58 PST
After trying different things, it appears that the crashing happens, but it's random in most cases; however, opening http://www.terra.es/ seemed to reliably crash Firefox when the "thumbnail cache" in Showcase was enabled.

My feeling is that the problem happens when a tab is painted with "drawWindow" multiple times at once; a testcase like that should be trivial to implement.
Comment 27 Brian R. Bondy [:bbondy] 2012-01-19 10:38:19 PST
I tried with that site and with thumbnail cache enabled, but no crashing.
I tried reloading several times as well as opening over 100 other tabs.

I'll investigate further through code since I can't reproduce.
Comment 28 Josep del Rio 2012-01-19 14:23:00 PST
Yes, I just tried the website I referred earlier, and now it doesn't crash; perhaps it was something that was present in that website at that moment.

I also tried to modify the code to paint the thumbnail multiple times, but is not crashing either.

I'll try to create a proper test case that causes the crash, but it appears to be tricky to reproduce.
Comment 29 Josep del Rio 2012-01-19 15:05:45 PST
It appears to crash quite consistently when opening flash games; for instance, try this one (which is very basic and small):
http://armorgames.com/play/21/phit
Comment 30 Brian R. Bondy [:bbondy] 2012-01-19 16:03:18 PST
No crash yet, I have about 20 tabs open with that page.
I noticed that the extension has an export settings option. Could you export your settings to me so I can import them?
Comment 31 Josep del Rio 2012-01-19 16:57:51 PST
I was quite sure I was using the default configuration, but just in case I reset to defaults, and only change the thumbnail caching option; the crash still happens.

Are you using any Flash/ad blocker?
Comment 32 Brian R. Bondy [:bbondy] 2012-01-20 05:50:11 PST
OK so we have the same config, no I don't have a flash / ad blocker.
Comment 33 Josep del Rio 2012-01-20 18:17:55 PST
Just tried the "Enable cache" option with 10.0 beta 5, with hardware acceleration enabled and disabled, and the crash still happens very often.

It's not about opening a lot of windows, it seems to be about browsing; also, it happens to crash a lot on "session restore" with GMail.

If there's any information or test I can provide that would help you, let me know.
Comment 34 Josep del Rio 2012-01-20 20:14:07 PST
One more piece of information; I had that thumbnail caching option enabled and running without problems when toggling the following option:
extensions.showcase.updateThumbnailWhenScroll (from the default value, enabled, to disabled)

In the code, this will disable the thumbnail refreshing when scrolling; this is done through a nsITimer instance, to prevent a large number of thumbnail refreshing; maybe it would be a good idea to correlate this code with the one that is failing in the Firefox side to see if they've anything in common.
Comment 35 Josep del Rio 2012-02-13 21:07:35 PST
I was trying different things to try to fix the problem, and I've come up with a change in my extension that seems to increase quite a lot the crashing frequency; should I upload it here?
Comment 36 Brian R. Bondy [:bbondy] 2012-02-14 04:27:46 PST
Yes please, more of the same crashes would be great
Comment 37 Josep del Rio 2012-02-14 05:37:29 PST
Created attachment 596987 [details]
Extension that should crash quite often
Comment 38 Josep del Rio 2012-02-14 05:46:15 PST
You still need to set the "Enable thumbnail caching" from the "Advanced" tab of this extension's settings panel.

The difference between this version and the previous one is that between creating the canvas and setting its size, it rendered the window using a "setTimeout" with 0ms delay; doing it immediately increases the amount of crashes in a noticeable way; it even crashed when trying to bring the "Add-ons" panel.

Things that I also tried, but without any effect, is to set safe parameters in the drawWindow call, ensure the size of the canvas is valid, and use a "setTimeout" instead of a nsITimer.

Also, I confirmed that the crash happens in the drawWindow call; commenting that operation gets rid of the crashes.

Let me know if you're able to reproduce the problem with this version.
Comment 39 Brian R. Bondy [:bbondy] 2012-02-14 16:27:15 PST
Sorry I can't reproduce even with the new extension.  Perhaps I can do a remote login to your machine with gotomeeting, logmein, or something similar to get a call stack?
Comment 40 Brian R. Bondy [:bbondy] 2012-02-15 11:02:56 PST
OK I can reproduce now with the new extension attached above.
I can reproduce when is set to gfx.canvas.azure.enabled; false and/or when I disable hardware acceleration using this tool:
"C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\dxcpl.exe"

It is most easy to reproduce with many tabs open and when you turn on session restore.  When you click the restore button it crashes right then.

Strangely enough I can reproduce with a conditional breakpoint that
nsCanvasRenderingContext2D::DrawWindow's mThebes is non null but once it is inside RenderDocument the parameter is NULL.

>    rv = presShell->RenderDocument(r, renderDocFlags, bgColor, mThebes);


I'm not quite sure what's going on but it sounds like stack corruption of some sort.
I do notice that nsCanvasRenderingContext2D::Reset() is called after the call to nsCanvasRenderingContext2D::DrawWindow only when it crashes from a SetDimensions call.
I'm not sure why the value is non null before the call and null once inside the call though.

I'm going to unassign myself in hopes that someone who knows this code better will take this bug.
If anyone has any ideas for me I can re-take the bug.
Comment 41 Brian R. Bondy [:bbondy] 2012-07-26 18:35:04 PDT
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Review of attachment 557582 [details] [diff] [review]:
-----------------------------------------------------------------

While investigating bug 777703 I found several dozen instances where we store cairo_int_status_t which has max value 106 inside of cairo_status_t which has max value 40. 

The nil object array we index in cairo_create_in_error has a max size of CAIRO_STATUS_LAST_STATUS which is 40.

Would you reconsider landing this patch? I think we're causing memory corruption that could have been fixed for almost a year since this patch was suggested.

How about if I add an assert(status < CAIRO_STATUS_LAST_STATUS) before the changed lines?
I will file a follow up bug and I'll take it to actually fix the references I can find.  But I think that bug 777703 proves that having this fix is a really good idea.  C enums are not typesafe and so errors like this are very common.  I'd rather have a debug assertion than some random stack corruption.  And users would rather not crash.
Comment 42 Brian R. Bondy [:bbondy] 2012-07-26 18:49:50 PDT
Preferring stack corruption over simply adding a debug assertion and fixing the safety of the code seems bad from a security standpoint as well.  Especially now that we have proof that cairo_status_t does store values that are out of bound, it's not just based on my suspicion as it was earlier.
Comment 43 Robert Kaiser (not working on stability any more) 2012-07-27 05:22:32 PDT
Given comment #41 and #42 I'm nominating this for tracking on all active branches.

It sounds to me as if Brian is pretty sure there's memory corruption happening because of this, and we've had topcrashers all the time that have some memory corruption happening, so any potential source of corruptions should be fixed where possible (with memory/stack corruption crashes, it's usually impossible to know from our minidumps what caused the corruption, we only know we crashed because something is corrupted).
Comment 44 Brian R. Bondy [:bbondy] 2012-07-27 06:15:10 PDT
I'm not sure there is stack corruption but I suspected that values out of bound were passed into cairo_create_in_error before.  If such values are passed then an index is blindly offset and a cairo object is returned from it. 

I recently found a few dozen cases of such errors.  Whether any of them ever make it up to cairo_create_in_error or not I'm not sure, but the cairo_status_t and cairo_int_status_t are used a lot interchangeably when they shouldn't.  The cairo_int_sattus_t values start at 100 and the array I mentioned earlier only goes up to 40.
Comment 45 Alex Keybl [:akeybl] 2012-07-30 09:37:43 PDT
Not new to recent or upcoming releases, and not directly suspected of causing any recent top crashers. Tracking flags mean something very specific - an effort around fixing a long tail of possible memory corruption issues would need to be handled like crit-smash, crash-kill, or snappy.

If this was likely to be causing significant issues, we'd treat it differently. Please re-nominate if that's the case.
Comment 46 Jeff Muizelaar [:jrmuizel] 2012-09-20 05:39:43 PDT
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Review of attachment 557582 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather we just crash in _cairo_create_in_error and try to track down where these things are coming from. That should help us separate out the different cases and perhaps fix them.
Comment 47 Brian R. Bondy [:bbondy] 2012-09-20 07:48:18 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #46)
> Comment on attachment 557582 [details] [diff] [review]
> Possible crash fix for taskbar previews
> 
> Review of attachment 557582 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd rather we just crash in _cairo_create_in_error and try to track down
> where these things are coming from. That should help us separate out the
> different cases and perhaps fix them.

I don't follow.  assert (status != CAIRO_STATUS_SUCCESS); will only be hit on debug builds. And I don't see a check there to abort if the value is out of range.

Would you accept a patch that did something like this instead?

+ /* Sanity check */
+ if (status < 0 || status > CAIRO_STATUS_LAST_STATUS) {
+   abort();
+ }
Comment 48 Mats Palmgren (:mats) 2012-11-30 17:55:49 PST
This crash fix seems to have stalled waiting for a reply from Jeff.
Comment 49 Jeff Muizelaar [:jrmuizel] 2013-03-07 08:03:05 PST
(In reply to Brian R. Bondy [:bbondy] from comment #47)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #46)
> > Comment on attachment 557582 [details] [diff] [review]
> > Possible crash fix for taskbar previews
> > 
> > Review of attachment 557582 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > I'd rather we just crash in _cairo_create_in_error and try to track down
> > where these things are coming from. That should help us separate out the
> > different cases and perhaps fix them.
> 
> I don't follow.  assert (status != CAIRO_STATUS_SUCCESS); will only be hit
> on debug builds. And I don't see a check there to abort if the value is out
> of range.
> 
> Would you accept a patch that did something like this instead?
> 
> + /* Sanity check */
> + if (status < 0 || status > CAIRO_STATUS_LAST_STATUS) {
> +   abort();
> + }

Yes that would be better. Sorry about the delay.

-Jeff
Comment 50 Brian R. Bondy [:bbondy] 2013-03-11 06:48:07 PDT
Created attachment 723426 [details] [diff] [review]
Patch v2.

> Yes that would be better. Sorry about the delay.

No problem, at least it was within a year ;)  New patch with abort() in the place I suspect stack corruption happens.
Comment 51 Jeff Muizelaar [:jrmuizel] 2013-03-11 13:07:42 PDT
Comment on attachment 723426 [details] [diff] [review]
Patch v2.

Review of attachment 723426 [details] [diff] [review]:
-----------------------------------------------------------------

We'll need to be careful with this as it could cause unexpected aborts but those will still be interesting to see.
Comment 52 Brian R. Bondy [:bbondy] 2013-03-11 18:26:21 PDT
If it does that's better than unexpected stack corruptions, and if we are in that case then I think we should land the first submitted patch after we know it happens.

Do I need a documentation patch similar to this one before landing? ( https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=614772&attachment=558463 )
Comment 53 Brian R. Bondy [:bbondy] 2013-03-22 06:42:10 PDT
Request timed out.
Comment 55 Joe Drew (not getting mail) 2013-03-23 16:05:44 PDT
https://hg.mozilla.org/mozilla-central/rev/d50dbae0b538

Note You need to log in before you can comment on or make changes to this bug.