Closed Bug 614772 Opened 14 years ago Closed 11 years ago

crash with taskbar preview [@ gfxContext::NewPath() ] mainly with Firefox Showcase

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox15 - ---
firefox16 - ---
firefox17 - ---
firefox-esr10 - ---

People

(Reporter: scoobidiver, Assigned: bbondy)

References

Details

(Keywords: crash)

Crash Data

Attachments

(4 files)

It is a residual crash signature that exist in 3.0, 3.6 and trunk builds.
It is #225 top crasher in 4.0b7 for the last week.

Two comments say:
"clicked a close (X) button on the windows 7 taskbar tab preview of the leftmost tab"
"something with windows 7 taskbar preview and multiple tabs. it just quit unexpectedly."

Signature	gfxContext::NewPath()
UUID	ad59b146-9156-41b7-ba7d-7a2352101124
Time 	2010-11-24 18:13:05.623157
Uptime	2410
Install Age	2410 seconds (40.2 minutes) since version was first installed.
Product	Firefox
Version	4.0b8pre
Build ID	20101124042634
Branch	2.0
OS	Windows NT
OS Version	6.1.7600
CPU	x86
CPU Info	GenuineIntel family 6 model 23 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x4
App Notes 	AdapterVendorID: 10de, AdapterDeviceID: 06e4

Crashing Thread
Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	gfxContext::NewPath 	gfx/thebes/gfxContext.cpp:113
1 	xul.dll 	PresShell::RenderDocument 	layout/base/nsPresShell.cpp:5301
2 	xul.dll 	nsCanvasRenderingContext2D::DrawWindow 	content/canvas/src/nsCanvasRenderingContext2D.cpp:3636
3 	xul.dll 	nsIDOMCanvasRenderingContext2D_DrawWindow 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:3566
4 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4748
5 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:657
6 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:737
7 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:858
8 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:4973
9 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1694
10 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:588
11 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
12 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
13 	xul.dll 	mozilla::widget::TaskbarPreview::DrawBitmap 	widget/src/windows/TaskbarPreview.cpp:404

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=gfxContext%3A%3ANewPath%28%29
It looks like the thebes surface on the canvas is null. This is probably due to bug 559613 but I'm not sure exactly how.
Blocks: 559613
See the newly duped bug 644124 for STR :-)
Crash Signature: [@ gfxContext::NewPath() ]
Assignee: nobody → netzen
This crash happened to me as I was mouse hovering over each of the tab previews back and forth.  I can't reproduce anymore though.

This happens in thebes context when mCairo is not created correctly.
I.e. in the constructor of gfxContext we call:

> mCairo = cairo_create(surface->CairoSurface());

Trying currently to find a way in code to reproduce without using the debugger to manually set NULL on mCairo.  Perhaps by continually creating a surface.
cairo_create should always be giving back a valid cairo object which either performs no-ops or correct behaviour.
Ya I seen the comment in cairo_create and I seen the calls to cairo_create_in_error on error. 

Currently debugging to see if there is another way to get it to crash at the same location other than having a NULL mCairo.
The only way I can reproduce (manually with the debugger) is with a NULL or invalid mCairo.  Not with a NULL surface nor context. 

The noop object we return is one per error code from 0 to CAIRO_STATUS_LAST_STATUS.  I think perhaps we're offsetting that by too much and getting passed the array and returning an invalid cairo noop pointer.

Still debugging though.
As per Comment 7:

I couldn't find a case where the cairo status is less than 0 or more than CAIRO_STATUS_LAST_STATUS but I think this is the cause of the crash.  Perhaps because of an invalid cairo surface passed to cairo_create or perhaps -1 is being returned as a status somewhere.

In any case I think the fix is worth it and we can see if the crashes go away on Nightly or not for new builds.
The crashes happen and are reported every day a few times, and happen in FF6 dozens of times per day.

It's better than what we currently do in cairo_create_in_error: blindly index into _cairo_nil__objects with the status error code.
Attachment #557582 - Flags: review?(joe)
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Jeff's a better person for this, I think. At minimum, you're going to want to put a patch in cairo/ and add it to our list of patches in the README.
Attachment #557582 - Flags: review?(joe) → review?(jmuizelaar)
As per Joe's comment 9.
Attachment #558463 - Flags: review?(jmuizelaar)
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

I thought I had reviewed this already, but I guess I forgot to submit.

I don't think we should do this it feels too much like a band-aid. Can you find the caller that's setting this and fix it there?
Attachment #557582 - Flags: review?(jmuizelaar) → review-
I'll try to find a root cause but if found, I think in addition to that fix we should still do this check since it's better than a memory violation if ever a bad error code is returned from anything in cairo.
(In reply to Brian R. Bondy [:bbondy] from comment #12)
> I'll try to find a root cause but if found, I think in addition to that fix
> we should still do this check since it's better than a memory violation if
> ever a bad error code is returned from anything in cairo.

It seems like that is more likely to postpone problems. If we do add a check, I think that it ensure that we crash rather than avoid it.
Attachment #558463 - Flags: review?(jmuizelaar) → review-
I can't reproduce this crash ever anymore for the past few weeks.  Could we do something like add telemetry to notify us if the status code I suspect is out of range is indeed out of range?  It would basically just be a patch we put on nightly temporary and revert it out after we have the data?
(In reply to Brian R. Bondy [:bbondy] from comment #14)
> I can't reproduce this crash ever anymore for the past few weeks.  Could we
> do something like add telemetry to notify us if the status code I suspect is
> out of range is indeed out of range?  It would basically just be a patch we
> put on nightly temporary and revert it out after we have the data?

Definitely. I've done something like this in the past:
http://mxr.mozilla.org/mozilla-central/ident?i=CrashSpline
Great, I'll approach it that way and re-submit a patch.  Will probably be next week as I'm on several other things this week. Thanks!
Crash Signature: [@ gfxContext::NewPath() ] → [@ gfxContext::NewPath() ] [@ gfxContext::NewPath]
It's correlated in 9.0.1 with Firefox Showcase: 
99% (134/136) vs.   0% (220/101230) {89506680-e3f4-484c-a2c0-ed711d481eda} (Firefox Showcase, https://addons.mozilla.org/addon/1810)
(In reply to Scoobidiver from comment #18)
> It's correlated in 9.0.1 with Firefox Showcase: 
> 99% (134/136) vs.   0% (220/101230) {89506680-e3f4-484c-a2c0-ed711d481eda}
> (Firefox Showcase, https://addons.mozilla.org/addon/1810)

I just had a user confirm that disabling the Showcase extension fixed their crash problem - https://support.mozilla.org/en-US/questions/913408
Summary: crash with taskbar preview [@ gfxContext::NewPath() ] → crash with taskbar preview [@ gfxContext::NewPath() ] mainly with Firefox Showcase
Awesome, I'll try to fix it with this new info this week or next.
I downloaded the extension, opened over 100 tabs and tried using each of the buttons available with that extension, but unfortunately could still not reproduce.

Likely the crashes are seen with this extension because it provides much more previews than is available via the taskbar previews option which is capped by Windows for the number of previews that are shown.

It anyone has more information on how to reproduce please let me know.
(In reply to Brian R. Bondy [:bbondy] from comment #21)
> I downloaded the extension, opened over 100 tabs and tried using each of the
> buttons available with that extension, but unfortunately could still not
> reproduce.
> 
> Likely the crashes are seen with this extension because it provides much
> more previews than is available via the taskbar previews option which is
> capped by Windows for the number of previews that are shown.
> 
> It anyone has more information on how to reproduce please let me know.

I'm Showcase's author; the users that have contacted me with this issue reported that it was triggered by enabling the "Thumbnail caching" option; this option can be found in the "Advanced" panel, and is disabled by default.
Thanks for the info, I tried that option but cannot reproduce still.
It instantly crashed for me :)

This is what I did:
- Have some tabs loaded with real websites (blank thumbnails are special cases).
- Press F12 to bring Showcase.
- Close it
- Go to the Showcase settings panel, "Advanced" tab.
- Toggle the cache option.

At this point, it crashed; if it doesn't in your computer, try reloading a tab, it appears to be what triggers the crash.

I can provide any information you need; to start with, the user agent is:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0
browser.startup.homepage_override.buildID 20120111092507
OK, that's odd... now it doesn't crash following the same instructions. I'll be trying different things, see if I can reliably reproduce the issue.
After trying different things, it appears that the crashing happens, but it's random in most cases; however, opening http://www.terra.es/ seemed to reliably crash Firefox when the "thumbnail cache" in Showcase was enabled.

My feeling is that the problem happens when a tab is painted with "drawWindow" multiple times at once; a testcase like that should be trivial to implement.
I tried with that site and with thumbnail cache enabled, but no crashing.
I tried reloading several times as well as opening over 100 other tabs.

I'll investigate further through code since I can't reproduce.
Yes, I just tried the website I referred earlier, and now it doesn't crash; perhaps it was something that was present in that website at that moment.

I also tried to modify the code to paint the thumbnail multiple times, but is not crashing either.

I'll try to create a proper test case that causes the crash, but it appears to be tricky to reproduce.
It appears to crash quite consistently when opening flash games; for instance, try this one (which is very basic and small):
http://armorgames.com/play/21/phit
No crash yet, I have about 20 tabs open with that page.
I noticed that the extension has an export settings option. Could you export your settings to me so I can import them?
I was quite sure I was using the default configuration, but just in case I reset to defaults, and only change the thumbnail caching option; the crash still happens.

Are you using any Flash/ad blocker?
OK so we have the same config, no I don't have a flash / ad blocker.
Just tried the "Enable cache" option with 10.0 beta 5, with hardware acceleration enabled and disabled, and the crash still happens very often.

It's not about opening a lot of windows, it seems to be about browsing; also, it happens to crash a lot on "session restore" with GMail.

If there's any information or test I can provide that would help you, let me know.
One more piece of information; I had that thumbnail caching option enabled and running without problems when toggling the following option:
extensions.showcase.updateThumbnailWhenScroll (from the default value, enabled, to disabled)

In the code, this will disable the thumbnail refreshing when scrolling; this is done through a nsITimer instance, to prevent a large number of thumbnail refreshing; maybe it would be a good idea to correlate this code with the one that is failing in the Firefox side to see if they've anything in common.
I was trying different things to try to fix the problem, and I've come up with a change in my extension that seems to increase quite a lot the crashing frequency; should I upload it here?
Yes please, more of the same crashes would be great
You still need to set the "Enable thumbnail caching" from the "Advanced" tab of this extension's settings panel.

The difference between this version and the previous one is that between creating the canvas and setting its size, it rendered the window using a "setTimeout" with 0ms delay; doing it immediately increases the amount of crashes in a noticeable way; it even crashed when trying to bring the "Add-ons" panel.

Things that I also tried, but without any effect, is to set safe parameters in the drawWindow call, ensure the size of the canvas is valid, and use a "setTimeout" instead of a nsITimer.

Also, I confirmed that the crash happens in the drawWindow call; commenting that operation gets rid of the crashes.

Let me know if you're able to reproduce the problem with this version.
Sorry I can't reproduce even with the new extension.  Perhaps I can do a remote login to your machine with gotomeeting, logmein, or something similar to get a call stack?
OK I can reproduce now with the new extension attached above.
I can reproduce when is set to gfx.canvas.azure.enabled; false and/or when I disable hardware acceleration using this tool:
"C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\dxcpl.exe"

It is most easy to reproduce with many tabs open and when you turn on session restore.  When you click the restore button it crashes right then.

Strangely enough I can reproduce with a conditional breakpoint that
nsCanvasRenderingContext2D::DrawWindow's mThebes is non null but once it is inside RenderDocument the parameter is NULL.

>    rv = presShell->RenderDocument(r, renderDocFlags, bgColor, mThebes);


I'm not quite sure what's going on but it sounds like stack corruption of some sort.
I do notice that nsCanvasRenderingContext2D::Reset() is called after the call to nsCanvasRenderingContext2D::DrawWindow only when it crashes from a SetDimensions call.
I'm not sure why the value is non null before the call and null once inside the call though.

I'm going to unassign myself in hopes that someone who knows this code better will take this bug.
If anyone has any ideas for me I can re-take the bug.
Assignee: netzen → nobody
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Review of attachment 557582 [details] [diff] [review]:
-----------------------------------------------------------------

While investigating bug 777703 I found several dozen instances where we store cairo_int_status_t which has max value 106 inside of cairo_status_t which has max value 40. 

The nil object array we index in cairo_create_in_error has a max size of CAIRO_STATUS_LAST_STATUS which is 40.

Would you reconsider landing this patch? I think we're causing memory corruption that could have been fixed for almost a year since this patch was suggested.

How about if I add an assert(status < CAIRO_STATUS_LAST_STATUS) before the changed lines?
I will file a follow up bug and I'll take it to actually fix the references I can find.  But I think that bug 777703 proves that having this fix is a really good idea.  C enums are not typesafe and so errors like this are very common.  I'd rather have a debug assertion than some random stack corruption.  And users would rather not crash.
Attachment #557582 - Flags: review- → review?(jmuizelaar)
Preferring stack corruption over simply adding a debug assertion and fixing the safety of the code seems bad from a security standpoint as well.  Especially now that we have proof that cairo_status_t does store values that are out of bound, it's not just based on my suspicion as it was earlier.
Given comment #41 and #42 I'm nominating this for tracking on all active branches.

It sounds to me as if Brian is pretty sure there's memory corruption happening because of this, and we've had topcrashers all the time that have some memory corruption happening, so any potential source of corruptions should be fixed where possible (with memory/stack corruption crashes, it's usually impossible to know from our minidumps what caused the corruption, we only know we crashed because something is corrupted).
I'm not sure there is stack corruption but I suspected that values out of bound were passed into cairo_create_in_error before.  If such values are passed then an index is blindly offset and a cairo object is returned from it. 

I recently found a few dozen cases of such errors.  Whether any of them ever make it up to cairo_create_in_error or not I'm not sure, but the cairo_status_t and cairo_int_status_t are used a lot interchangeably when they shouldn't.  The cairo_int_sattus_t values start at 100 and the array I mentioned earlier only goes up to 40.
Not new to recent or upcoming releases, and not directly suspected of causing any recent top crashers. Tracking flags mean something very specific - an effort around fixing a long tail of possible memory corruption issues would need to be handled like crit-smash, crash-kill, or snappy.

If this was likely to be causing significant issues, we'd treat it differently. Please re-nominate if that's the case.
Comment on attachment 557582 [details] [diff] [review]
Possible crash fix for taskbar previews

Review of attachment 557582 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather we just crash in _cairo_create_in_error and try to track down where these things are coming from. That should help us separate out the different cases and perhaps fix them.
Attachment #557582 - Flags: review?(jmuizelaar) → review-
(In reply to Jeff Muizelaar [:jrmuizel] from comment #46)
> Comment on attachment 557582 [details] [diff] [review]
> Possible crash fix for taskbar previews
> 
> Review of attachment 557582 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd rather we just crash in _cairo_create_in_error and try to track down
> where these things are coming from. That should help us separate out the
> different cases and perhaps fix them.

I don't follow.  assert (status != CAIRO_STATUS_SUCCESS); will only be hit on debug builds. And I don't see a check there to abort if the value is out of range.

Would you accept a patch that did something like this instead?

+ /* Sanity check */
+ if (status < 0 || status > CAIRO_STATUS_LAST_STATUS) {
+   abort();
+ }
This crash fix seems to have stalled waiting for a reply from Jeff.
Assignee: nobody → netzen
Flags: needinfo?(jmuizelaar)
(In reply to Brian R. Bondy [:bbondy] from comment #47)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #46)
> > Comment on attachment 557582 [details] [diff] [review]
> > Possible crash fix for taskbar previews
> > 
> > Review of attachment 557582 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > I'd rather we just crash in _cairo_create_in_error and try to track down
> > where these things are coming from. That should help us separate out the
> > different cases and perhaps fix them.
> 
> I don't follow.  assert (status != CAIRO_STATUS_SUCCESS); will only be hit
> on debug builds. And I don't see a check there to abort if the value is out
> of range.
> 
> Would you accept a patch that did something like this instead?
> 
> + /* Sanity check */
> + if (status < 0 || status > CAIRO_STATUS_LAST_STATUS) {
> +   abort();
> + }

Yes that would be better. Sorry about the delay.

-Jeff
Flags: needinfo?(jmuizelaar)
Attached patch Patch v2.Splinter Review
> Yes that would be better. Sorry about the delay.

No problem, at least it was within a year ;)  New patch with abort() in the place I suspect stack corruption happens.
Attachment #723426 - Flags: review?(jmuizelaar)
Comment on attachment 723426 [details] [diff] [review]
Patch v2.

Review of attachment 723426 [details] [diff] [review]:
-----------------------------------------------------------------

We'll need to be careful with this as it could cause unexpected aborts but those will still be interesting to see.
Attachment #723426 - Flags: review?(jmuizelaar) → review+
If it does that's better than unexpected stack corruptions, and if we are in that case then I think we should land the first submitted patch after we know it happens.

Do I need a documentation patch similar to this one before landing? ( https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=614772&attachment=558463 )
Flags: needinfo?(jmuizelaar)
Request timed out.
Flags: needinfo?(jmuizelaar)
https://hg.mozilla.org/mozilla-central/rev/d50dbae0b538
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: