Closed Bug 715401 Opened 13 years ago Closed 11 years ago

Firefox Crash @ nsCanvasRenderingContext2DAzure::ClearRect with Yahoo Toolbar

Categories

(Core :: Graphics: Canvas2D, defect)

Version: 11 Branch
Hardware: x86
OS: Windows 7
Type: defect
Priority: Not set
Severity: critical

Tracking

Status: RESOLVED WORKSFORME
Target Milestone: mozilla13
Tracking Status
firefox11 + affected
firefox12 - affected
firefox13 - affected

People

(Reporter: marcia, Assigned: bas.schouten)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [qa+][startupcrash])

Attachments

(1 file)

Seen while looking at Aurora crash stats.  https://crash-stats.mozilla.com/report/list?signature=nsCanvasRenderingContext2DAzure::ClearRect%28float,%20float,%20float,%20float%29. Signature also seen on trunk.

https://crash-stats.mozilla.com/report/index/6f226ed0-23bd-47c3-ae5c-217ea2111227

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsCanvasRenderingContext2DAzure::ClearRect 	content/canvas/src/nsCanvasRenderingContext2DAzure.cpp:2058
1 	xul.dll 	nsIDOMCanvasRenderingContext2D_ClearRect 	obj-firefox/js/xpconnect/src/dom_quickstubs.cpp:2003
2 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:625
3 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3506
4 	mozjs.dll 	js::ContextStack::pushInvokeFrame 	js/src/vm/Stack.cpp:690
5 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:643
6 	mozjs.dll 	mozjs.dll@0x834f 	
7 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5213
8 	mozjs.dll 	mozjs.dll@0x834f 	
9 		@0xffffff86
It's #10 top browser crasher in 11.0a2 with many dupes and #133 in 12.0a1.

It first appeared in 12.0a1/20111220 and 11.0a2/20111221.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d75ebb37080e&tochange=2afd7ae68e8b
Component: Canvas: WebGL → Canvas: 2D
Keywords: regression
QA Contact: canvas.webgl → canvas.2d
Hardware: x86_64 → x86
Version: 12 Branch → 11 Branch
92% of crashes happen within one minute.
It's currently #1 top crasher in 11.0b1.
Keywords: topcrash
Whiteboard: startupcrash
roc - would you mind finding somebody to take a look at this bug? It's a top startup crasher with a regression range provided in comment#1. Thanks!
Assignee: nobody → roc
This does look bad...

The crash is at ClearRect in this code:

NS_IMETHODIMP
nsCanvasRenderingContext2DAzure::ClearRect(float x, float y, float w, float h)
{
  if (!FloatValidate(x,y,w,h)) {
    return NS_OK;
  }
  mTarget->ClearRect(mgfx::Rect(x, y, w, h));

The most likely explanation is that mTarget is null. But we try pretty hard to make sure there is never a live context with null mTarget.
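
A minimal, self-contained sketch of the null-mTarget hypothesis above. FakeDrawTarget and FakeContext2D are hypothetical stand-ins, not the Gecko classes, and the null guard is only an assumption about how such a call could be made safe; it is not the patch that landed.

```cpp
#include <cmath>
#include <memory>

// Hypothetical stand-in for a draw target (not mozilla::gfx::DrawTarget).
struct FakeDrawTarget {
  int clearCalls = 0;
  void ClearRect(float, float, float, float) { ++clearCalls; }
};

// Hypothetical stand-in for the canvas 2D context.
struct FakeContext2D {
  std::unique_ptr<FakeDrawTarget> mTarget;  // null if target creation failed

  // Returns true only if the clear was forwarded to the draw target.
  bool ClearRect(float x, float y, float w, float h) {
    // Mirrors the FloatValidate early return: ignore non-finite input.
    if (!std::isfinite(x) || !std::isfinite(y) ||
        !std::isfinite(w) || !std::isfinite(h)) {
      return false;
    }
    // Without this check, the call below would dereference a null pointer,
    // which is the suspected source of the access violation.
    if (!mTarget) {
      return false;
    }
    mTarget->ClearRect(x, y, w, h);
    return true;
  }
};
```

With a default-constructed FakeContext2D (null target), ClearRect returns false instead of crashing; once a target is assigned, the call is forwarded.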

None of the checkins in the regression window look related to me. How about you Jeff?
Any more progress on this?
There's a strong correlation with Yahoo Toolbar:
  nsCanvasRenderingContext2DAzure::ClearRect(float, float, float, float)|EXCEPTION_ACCESS_VIOLATION_READ (768 crashes)
     99% (764/768) vs.  10% (3127/30240) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
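
For readers unfamiliar with the correlation format: the first figure is the share of reports with this signature that have the add-on installed, and the second is the share in the wider sample. A rough sketch of that arithmetic follows. Percent and Lift are hypothetical helper names, and rounding to the nearest whole percent is an assumption that happens to match the figures quoted in this bug.

```cpp
#include <cmath>

// Share of `hits` in `total` as a whole-number percentage, rounded to nearest.
inline long Percent(long hits, long total) {
  if (total <= 0) return 0;
  return std::lround(100.0 * static_cast<double>(hits) / total);
}

// Ratio of the add-on's rate among reports with this signature to its rate
// in the wider sample. Values well above 1 suggest a correlation.
inline double Lift(long sigHits, long sigTotal, long allHits, long allTotal) {
  double sigRate = static_cast<double>(sigHits) / sigTotal;
  double allRate = static_cast<double>(allHits) / allTotal;
  return sigRate / allRate;
}
```

For the numbers above, Percent(764, 768) and Percent(3127, 30240) give the 99% and 10% shown, and the lift is a little under 10.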
Summary: Firefox Crash [@ nsCanvasRenderingContext2DAzure::ClearRect(float, float, float, float) ] → Firefox Crash @ nsCanvasRenderingContext2DAzure::ClearRect with Yahoo Toolbar
Yahoo Toolbar does some slightly strange things ... it has a JS module instantiating a new XUL document, creates an HTML canvas in that document, then draws into it (including calls to clearRect). My attempts to write a similar testcase didn't work.

Maybe someone could try installing that toolbar on some Windows 7 machines with D3D10 layers enabled and then try to reproduce the crash? At least so we can get a decent stack?
Keywords: qawanted
I installed the version of the Toolbar in Comment 6 and played around with it a bit, but have not yet been able to generate a crash on this Win 7 machine.

The signature summary shows 8066 crashes for Beta 1 - seems like a lot for a beta.
Note that 39% have the addon hotfix installed as well: (131/336) vs.  31% (6814/22290) firefox-hotfix@mozilla.org
Here are some comments that can help reproducing it:
"It always crash when I enter mail.yahoo.com"
"Closed Yahoo mail and it Mozilla crashed."

Note: According to comments, it seems that many users are unaware they are using a Beta version.
I tried many operations in Yahoo mail (ajax version) and so far no crash - entering, exiting, composing, etc.
Here are some more recent correlations:

nsCanvasRenderingContext2DAzure::ClearRect(float, float, float, float)|EXCEPTION_ACCESS_VIOLATION_READ (467 crashes)
    100% (467/467) vs.   9% (2888/32450) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
     11% (50/467) vs.   1% (345/32450) {4ED1F68A-5463-4931-9384-8FFF5ED91D92}
     12% (56/467) vs.   3% (856/32450) m3ffxtbr@mywebsearch.com
      7% (35/467) vs.   2% (567/32450) {99079a25-328f-4bd4-be04-00955acaa0a7}
(In reply to Marcia Knous [:marcia] from comment #10)
> I tried many operations in Yahoo mail (ajax version) and so far no crash -
> entering, exiting, composing, etc.
Did you have Yahoo Toolbar enabled?
Yes - I have been explicitly testing by launching the items from the actual toolbar that is installed as well as from the menu item that gets added when you install the toolbar.

(In reply to Scoobidiver from comment #12)
> (In reply to Marcia Knous [:marcia] from comment #10)
> > I tried many operations in Yahoo mail (ajax version) and so far no crash -
> > entering, exiting, composing, etc.
> Did you have Yahoo Toolbar enabled?
Was the affected Yahoo Toolbar version released around 11/20/2011?

Is there a newer version of the Yahoo Toolbar that's absent from crash-stats?

If either of these is true, we should perform outreach to Yahoo and get them involved - they may have already fixed the issue, which means we could add-on blocklist this older version.
(In reply to Alex Keybl [:akeybl] from comment #14)
> Is there a newer version of the Yahoo Toolbar that's absent from
> crash-stats?
No. See:
    100% (411/411) vs.   9% (2790/32012) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
          0% (1/411) vs.   0% (1/32012) 2.3.11.20110727115843
          1% (4/411) vs.   0% (9/32012) 2.4.5.20111209014555
          9% (38/411) vs.   1% (247/32012) 2.4.6.20120111022845
         90% (368/411) vs.   8% (2491/32012) 2.4.6.20120119024823
(In reply to Scoobidiver from comment #15)
> (In reply to Alex Keybl [:akeybl] from comment #14)
> > Is there a newer version of the Yahoo Toolbar that's absent from
> > crash-stats?
> No. See:
>     100% (411/411) vs.   9% (2790/32012)
> {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar,
> https://addons.mozilla.org/addon/2032)
>           0% (1/411) vs.   0% (1/32012) 2.3.11.20110727115843
>           1% (4/411) vs.   0% (9/32012) 2.4.5.20111209014555
>           9% (38/411) vs.   1% (247/32012) 2.4.6.20120111022845
>          90% (368/411) vs.   8% (2491/32012) 2.4.6.20120119024823

Thanks Scoobidiver - I've also confirmed that there was not a new release of the toolbar around the first instance on crash-stats (https://addons.mozilla.org/en-US/firefox/addon/yahoo-toolbar/versions/).

I'm just very confused by the fact that in https://bugzilla.mozilla.org/show_bug.cgi?id=715401#c1 "It first appeared in 12.0a1/20111220 and 11.0a2/20111221."

Why wouldn't this issue have been apparent earlier on trunk? Can we cross reference the new changes in both of those builds to try to find a common landed patch in both?
(In reply to Scoobidiver from comment #1)
> The regression range is:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=d75ebb37080e&tochange=2afd7ae68e8b
It's wrong (I didn't check the build hour). See below.

(In reply to Alex Keybl [:akeybl] from comment #16)
> Why wouldn't this issue have been apparent earlier on trunk? Can we cross
> reference the new changes in both of those builds to try to find a common
> landed patch in both?
In fact, it first appeared in 12.0a1/20111220112824 (the previous Nightly build is 11.0a1/20111220031159) and in 11.0a2/20111221172848 (the previous Aurora build is 10.0a2/20111221042035) at the worst moment.
So the regression range is between 11.0a1/20111220031159 and Aurora 11, i.e.:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2afd7ae68e8b&tochange=a8506ab2c654
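
Build IDs in this bug have the form YYYYMMDDhhmmss, so two builds can share a date and still differ by hours. A small sketch of ordering them follows; BuiltBefore and BuildDate are hypothetical helpers, not part of any Mozilla tool.

```cpp
#include <string>

// True if build ID `a` is earlier than build ID `b`. For equal-length digit
// strings in YYYYMMDDhhmmss form, lexicographic order is chronological order.
inline bool BuiltBefore(const std::string& a, const std::string& b) {
  return a.size() == b.size() && a < b;
}

// Date portion (YYYYMMDD) of a build ID.
inline std::string BuildDate(const std::string& id) {
  return id.substr(0, 8);
}
```

Comparing only BuildDate would treat 20111220031159 and 20111220112824 as the same build day, while BuiltBefore separates them.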
roc, so we were not able to reproduce this but scoobidiver did take a stab at identifying a regression range. Anything in there that you think is suspect?
(In reply to Sheila Mooney from comment #18)
> roc, so we were not able to reproduce this but scoobidiver did take a stab
> at identifying a regression range. Anything in there that you think is
> suspect?

The motivation behind this question is to take speculative backouts for Firefox 11 beta 4 (go-to-build this coming Tuesday, 2/21).
(In reply to Sheila Mooney from comment #18)
> roc, so we were not able to reproduce this but scoobidiver did take a stab
> at identifying a regression range. Anything in there that you think is
> suspect?

No, nothing specifically related to the crash area :-(. How confident are we that that is the true range?
Wait! I think I see it --- bug 704143. Changeset 857f872e4d7d. Try backing that out.
There is a way to "skin" the toolbar, and I tried applying some skins as well but no crash yet.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (away February 21 to March 17) from comment #21)
> Wait! I think I see it --- bug 704143. Changeset 857f872e4d7d.
Yes, it seems likely.

> Try backing that out.
bug 704143 was #8 top crasher in 11.0a2, this bug is #6 top crasher in 11.0b2. The null check has only shifted the crash signature, so backing out is not the solution. The real cause of bug 704143 must be fixed.
Here's what I think is happening.

The fact that bug 704143 had an effect means that we must hit situations where nsCanvasRenderingContext2DAzures are created, but CreateOffscreenDrawTarget fails (probably because SupportsAzure is false, since OOM on such small objects is unlikely). When CreateOffscreenDrawTarget fails during an UpdateContext call made by nsHTMLCanvasElement::GetContext, the error is propagated out and the new context can't be used, so that can't be the problem here. But UpdateContext can also be called when nsHTMLCanvasElement::SetAttr changes the width/height of the canvas. In that case JS can keep holding a reference to a context with a null mTarget, which will crash when it is later used.

Indeed, Yahoo toolbar's YTBGradientGenerator creates a canvas and gets a 2D context (probably early-ish during startup) and later on at various times (maybe during the skinning process) the gradients get rebuilt with the canvas size being changed to various widths and heights.

I don't know why we start being able to create an nsCanvasRenderingContext2DAzure with SupportsAzure true, then later discover that SupportsAzure has become false. But I can see at least one reason: gfxWindowsPlatform::CreatePlatformFontList can see InitFontList failing and decide to switch to GDI mode (the comment in the code says this can happen, see bug 594865). If we already created the canvas context by then, SupportsAzure will suddenly have changed from true to false.
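
The scenario described in this comment can be modelled with a short sketch. All types and names here (FakeDrawTarget, FakeContext2D, FakeCanvas, the supportsAzure flag, the 300x150 default size) are hypothetical simplifications for illustration, not the Gecko implementation.

```cpp
#include <memory>

// Hypothetical stand-in for a draw target.
struct FakeDrawTarget {
  int width;
  int height;
};

// Hypothetical factory: fails whenever acceleration is reported unavailable.
inline std::unique_ptr<FakeDrawTarget> CreateTarget(bool supportsAzure,
                                                    int w, int h) {
  if (!supportsAzure) return nullptr;
  return std::unique_ptr<FakeDrawTarget>(new FakeDrawTarget{w, h});
}

struct FakeContext2D {
  std::unique_ptr<FakeDrawTarget> mTarget;

  // Models SetDimensions: replaces the target and reports whether one exists.
  bool SetDimensions(bool supportsAzure, int w, int h) {
    mTarget = CreateTarget(supportsAzure, w, h);
    return mTarget != nullptr;
  }
};

struct FakeCanvas {
  std::shared_ptr<FakeContext2D> mContext;

  // Models GetContext: if the first initialization fails, no context is
  // handed out, so script never sees a context with a null target.
  std::shared_ptr<FakeContext2D> GetContext(bool supportsAzure) {
    if (!mContext) {
      auto ctx = std::make_shared<FakeContext2D>();
      if (!ctx->SetDimensions(supportsAzure, 300, 150)) return nullptr;
      mContext = ctx;
    }
    return mContext;
  }

  // Models SetAttr on width/height: a failure here leaves the reference
  // that script already holds pointing at a context with a null target.
  bool Resize(bool supportsAzure, int w, int h) {
    return mContext ? mContext->SetDimensions(supportsAzure, w, h) : true;
  }
};
```

If GetContext succeeds while supportsAzure is true and a later Resize runs after it has flipped to false, the context obtained earlier is still reachable but its mTarget is null.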
Attached patch fix?
I think this will fix it --- see comment in the patch.

If this guess is right, then this will stop us from crashing but Yahoo toolbar's rendering will be messed up since its gradient generator won't work properly.
Attachment #598518 - Flags: review?(bas.schouten)
Fixing the toolbar rendering would probably require us to understand why Azure is enabled and later disabled. CreatePlatformFontList seems to be called during gfxPlatform::Init so I don't see how extension code could run before that.
FWIW, some users have complained in the French Support forum that they can't click links in the upper part of any page. It was caused by Yahoo Toolbar.
Comment on attachment 598518
fix?

Review of attachment 598518:
-----------------------------------------------------------------

::: content/canvas/src/nsCanvasRenderingContext2DAzure.cpp
@@ +1291,5 @@
>  
>    mWidth = width;
>    mHeight = height;
>  
> +  // This first time this is called on this object is via

The first
Attachment #598518 - Flags: review?(bas.schouten) → review+
Attachment #598518 - Flags: approval-mozilla-beta?
Attachment #598518 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/42c1b4a36c06
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Comment on attachment 598518
fix?

[Triage Comment]
Backing out bug 704143 would re-introduce a top 10 topcrasher, so we'll go with this speculative fix for the time being. Approved for Aurora/Beta. Please land asap.
Attachment #598518 - Flags: approval-mozilla-beta?
Attachment #598518 - Flags: approval-mozilla-beta+
Attachment #598518 - Flags: approval-mozilla-aurora?
Attachment #598518 - Flags: approval-mozilla-aurora+
Whiteboard: startupcrash → [qa+] startupcrash
bug 729116 has STR.
(In reply to Scoobidiver from comment #36)
> bug 729116 has STR.

1. Start Firefox with new profile
2. Visit http://maps.google.com/ and activate the experimental MapsGl.
3. Resize window size so that zooming slider overlaps with satellite icon. 
4. Move zooming slider upward to top-end and release mouse at + icon.
5. If browser does not crash, Reload page and go to Step 4.
Google does not seem to allow me to enable MapsGL in my Windows 7 VM with Firefox 11.0b4. Can somebody else try this?
Assignee: roc → bas.schouten
Did not reproduce on two different machines with the following GPU and driver versions:

Machine 1
GPU Accelerated Windows: 2/2 Direct3D 10
Azure Backend: direct2d
Driver Version: 8.15.10.2321

Machine 2
GPU Accelerated Windows: 0. Blocked for your graphics driver version.
Driver Version: 8.16.11.8934
Thanks Jason. Do you know what graphics cards these computers have? ATI, NVidia, Intel, etc?
Graphics Cards:

Machine 1: Intel(R) HD Graphics Family
Machine 2: NVIDIA ION
I've reached out to Alice to see if she can provide more environment specifics because we can't seem to nail down a reproducible case, even given her STR in comment 37.

In the meantime, Jason is testing if we can even reproduce this on a previous Beta.
Did not see this with either of the above Win 7 machines on FF 11 beta 1.
Thanks Jason. Hopefully Alice can provide some insight here.
Screen cast: http://youtu.be/BU_UmhAYAmM
4. Move zooming slider upward to top-end _quickly_ and release mouse at _street view icon_.

Browser crashes in Win7 Classic and Aero.

Firefox11beta : I cannot reproduce this crash.
Aurora12.0a2 : bp-9ad8e956-3ba2-49a2-a42b-d2bee2120225
Nightly13.0a1 : bp-8b9eb621-4529-44da-ada3-8d6b92120225

Regression window(m-c)
Not crash:
http://hg.mozilla.org/mozilla-central/rev/561771f01881
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0a1) Gecko/13.0a1 Firefox/13.0a1 ID:20120220031231
Crashes
http://hg.mozilla.org/mozilla-central/rev/b8e7474374d5
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0a1) Gecko/13.0a1 Firefox/13.0a1 ID:20120220040949
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=561771f01881&tochange=b8e7474374d5

Regression window(m-i)
Not crash:
http://hg.mozilla.org/integration/mozilla-inbound/rev/d743384fb011
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0a1) Gecko/13.0a1 Firefox/13.0a1 ID:20120218222254
Crashes
http://hg.mozilla.org/integration/mozilla-inbound/rev/fc2b5bfc02a2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0a1) Gecko/13.0a1 Firefox/13.0a1 ID:20120219005449
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d743384fb011&tochange=fc2b5bfc02a2

Suspected:
42c1b4a36c06	Robert O'Callahan — Bug 715401. Instead of trying to create a fallback surface, just leave mTarget alone. r=bas
Just now Firefox 11 was updated automatically.
And I can reproduce now. bp-d27dbf7c-28d2-4459-b2b9-928232120225
http://hg.mozilla.org/releases/mozilla-beta/rev/0338a18c2bc8
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0 ID:20120222074758
Regression window(beta)
Not crash:
http://hg.mozilla.org/releases/mozilla-beta/rev/bb27964243e7
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0 ID:20120215222917
Crashes
http://hg.mozilla.org/releases/mozilla-beta/rev/0338a18c2bc8
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0 ID:20120222074758
Pushlog:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=bb27964243e7&tochange=0338a18c2bc8
It's caused by the patch in this bug.
So, I filed a separate bug, but it is now marked as a duplicate of this one...
Depends on: 729116
I can't reproduce this for some reason, but I believe the following is happening here. In the past when an accelerated canvas was changed in size to a point where it was bigger than we could accelerate, (i.e. from SetDimensions) we'd create a fallback DrawTarget of 1x1. Now we're setting just mValid to false. Since the user doesn't execute a new GetContext they can just continue using the old context, which is now no longer valid.

i.e. Roc's comment in the above patch is incorrect:

'// nsHTMLCanvasElement::GetContext. If target was non-null then mTarget is
// non-null, otherwise we'll return an error here and GetContext won't
// return this context object and we'll never enter this code again.'

This is true for the initial call to InitializeWithTarget but -not- for subsequent calls through resizing of the canvas.

There's a couple of band-aids we could do here to return old behaviour, I'm not sure if we can do a 'real' fix though without using Skia DrawTargets. (which I don't think is an option in beta)

The 'optimal' thing to do for now is probably to do a hybrid between the old behaviour and roc's new behaviour. What roc's patch seems to have changed is that we are now !mValid and we return an error when we -can- get a 1x1 DrawTarget but not the one we needed. We could create a patch that still tries to create a dummy DrawTarget, but -does- report us as !valid. This would mean there's still the possibility of a crash in very unfortunate circumstances, but it should fix both this bug and the new one that was created. Patch coming up.
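
A rough sketch of the hybrid described above, under stated assumptions: StubTarget, HybridContext and the maxSize limit are hypothetical names used only for illustration, and this is not the patch itself.

```cpp
#include <memory>

// Hypothetical stand-in for a draw target.
struct StubTarget {
  int width;
  int height;
};

// Hypothetical factory: refuses any dimension larger than maxSize.
inline std::unique_ptr<StubTarget> CreateStubTarget(int w, int h, int maxSize) {
  if (w > maxSize || h > maxSize) return nullptr;
  return std::unique_ptr<StubTarget>(new StubTarget{w, h});
}

struct HybridContext {
  std::unique_ptr<StubTarget> mTarget;
  bool mValid = false;

  // Tries the requested size first. On failure, falls back to a 1x1 dummy
  // target so later drawing calls have something to dereference, while the
  // context still reports itself as not valid.
  bool SetDimensions(int w, int h, int maxSize) {
    mTarget = CreateStubTarget(w, h, maxSize);
    mValid = (mTarget != nullptr);
    if (!mTarget) {
      mTarget = CreateStubTarget(1, 1, maxSize);  // may itself fail
    }
    return mValid;
  }
};
```

The residual risk mentioned in the comment corresponds to the case where even the 1x1 fallback cannot be created, which leaves mTarget null.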
If my patch didn't fix this bug (comment #34), shouldn't we just back it out?
Going by comments in bug 729116, Bas thinks not, so I'll leave both these bugs in his capable hands :-).
It's not fixed in 11.0b5, 12.0a2 and 13.0a1 after the patch of bug 729116 landed.
The patch in this bug is still a good thing. As far as I can tell, the bug that Yahoo must still be hitting was present from 10 already, I suggest not backing out, unless we have real evidence the patch in this bug made things worse (after the patch to bug 729116 landed), since there should be no regression here and even a little bit of an improved situation.

The real fix here would be to have Skia Azure canvases available on windows. This would allow us to simply switch to a software canvas if for some reason we switched away from D2D in the background. The exact cause of this switch away from D2D that must be causing this bug is still unknown to me. Skia azure canvases should be available on windows in the near future, although I'm not sure what version we're first shipping software Skia.
(In reply to Bas Schouten (:bas) from comment #54)
> The patch in this bug is still a good thing. As far as I can tell, the bug
> that Yahoo must still be hitting was present from 10 already, I suggest not
> backing out, unless we have real evidence the patch in this bug made things
> worse (after the patch to bug 729116 landed), since there should be no
> regression here and even a little bit of an improved situation.

I agree we can leave both patches in at this point since we've had no reported regressions.

This bug will presumably remain a top startup crasher if left unfixed. We have no reason to believe that this bug was being hit in FF10 (based on crash-stats). Do we have any other angles of attack?
(In reply to Alex Keybl [:akeybl] from comment #55)
> (In reply to Bas Schouten (:bas) from comment #54)
> > The patch in this bug is still a good thing. As far as I can tell, the bug
> > that Yahoo must still be hitting was present from 10 already, I suggest not
> > backing out, unless we have real evidence the patch in this bug made things
> > worse (after the patch to bug 729116 landed), since there should be no
> > regression here and even a little bit of an improved situation.
> 
> I agree we can leave both patches in at this point since we've had no
> reported regressions.
> 
> This bug will presumably remain a top startup crasher if left unfixed. We
> have no reason to believe that this bug was being hit in FF10 (based on
> crash-stats). Do we have any other angles of attack?

Not a lot in this code changed from 10 to 11 as far as I know? If someone could reproduce it we could get someone to gather a regression range I suppose?
Alice, since you seemed to be able to reproduce this bug before, is there any insight you can give with regards to a regression range?
I tried reproducing this in B5 using the latest version of Yahoo toolbar which shows up in the correlations (2.4.6.20120119024823). I tried tweaking some of the toolbar settings and skinning the toolbar but so far no luck reproducing on a few different Win 7 machines.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #57)
> Alice, since you seemed to be able to reproduce this bug before, is there
> any insight you can give with regards to a regression range?

I believe Alice was only able to reproduce bug 729116, which was fallout from the initial patch.
(In reply to Alex Keybl [:akeybl] from comment #59)
> I believe Alice was only able to reproduce bug 729116, which was fallout
> from the initial patch.

exactly yes.
Given that QA has been unable to reproduce so far, what more can be done?
Just saw this comment in the crash comments:

"what i know:
* direct2d enabled, tried to make a canvas a little big (~300px x ~8700px) canvas is inside a scrollable container and is that "big" to avoid repaints on scrolling.
* disabling direct2d makes the canvas work perfectly well (actually your direct2d code is broken) canvas.setAttribute("width", "8700") throws NS_ERR_OUTOFMEMORY (this may have been on Aurora 12.0a2 from today) upon trying to paint something else on that canvas ff crashes"
(In reply to Alex Keybl [:akeybl] from comment #62)
> Just saw this comment in the crash comments:
> 
> "what i know:
> * direct2d enabled, tried to make a canvas a little big (~300px x ~8700px)
> canvas is inside a scrollable container and is that "big" to avoid repaints
> on scrolling.
> * disabling direct2d makes the canvas work perfectly well (actually your
> direct2d code is broken) canvas.setAttribute("width", "8700") throws
> NS_ERR_OUTOFMEMORY (this may have been on Aurora 12.0a2 from today) upon
> trying to paint something else on that canvas ff crashes"

Can QA get a minimized testcase so we can do some testing around this scenario?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #63)
> (In reply to Alex Keybl [:akeybl] from comment #62)
> > Just saw this comment in the crash comments:
> > 
> > "what i know:
> > * direct2d enabled, tried to make a canvas a little big (~300px x ~8700px)
> > canvas is inside a scrollable container and is that "big" to avoid repaints
> > on scrolling.
> > * disabling direct2d makes the canvas work perfectly well (actually your
> > direct2d code is broken) canvas.setAttribute("width", "8700") throws
> > NS_ERR_OUTOFMEMORY (this may have been on Aurora 12.0a2 from today) upon
> > trying to paint something else on that canvas ff crashes"
> 
> Can QA get a minimized testcase so we can do some testing around this
> scenario?

In email, Bas mentioned that he doesn't believe that that comment is related to this regression specifically.

Bas also mentioned that there may be a forward patch that will fix this issue. We wouldn't have a good way of verifying that it doesn't carry any regressions ahead of the release, though, and likely won't take the fix for FF11.

1) Backout - through code inspection can we find the culprit?
2) Add-on hotfix - can we disable direct2d temporarily for any users with the Yahoo Toolbar?
I looked at that report - the user provided an email but no URL showed in the crash. We could try emailing that individual to see if he has a public URL that can be accessed.

(In reply to Alex Keybl [:akeybl] from comment #62)
> Just saw this comment in the crash comments:
> 
> "what i know:
> * direct2d enabled, tried to make a canvas a little big (~300px x ~8700px)
> canvas is inside a scrollable container and is that "big" to avoid repaints
> on scrolling.
> * disabling direct2d makes the canvas work perfectly well (actually your
> direct2d code is broken) canvas.setAttribute("width", "8700") throws
> NS_ERR_OUTOFMEMORY (this may have been on Aurora 12.0a2 from today) upon
> trying to paint something else on that canvas ff crashes"
Since there have been discussions outside this bug, I am not sure where we ended up with this. Are we still trying to identify a regression range? Who owns the next step?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #63)
> (In reply to Alex Keybl [:akeybl] from comment #62)
> > Just saw this comment in the crash comments:
> > 
> > "what i know:
> > * direct2d enabled, tried to make a canvas a little big (~300px x ~8700px)
> > canvas is inside a scrollable container and is that "big" to avoid repaints
> > on scrolling.
> > * disabling direct2d makes the canvas work perfectly well (actually your
> > direct2d code is broken) canvas.setAttribute("width", "8700") throws
> > NS_ERR_OUTOFMEMORY (this may have been on Aurora 12.0a2 from today) upon
> > trying to paint something else on that canvas ff crashes"
> 
> Can QA get a minimized testcase so we can do some testing around this
> scenario?

I have a new idea that could possibly help us reproduce this crash. It seems that all crashes that I've been able to find so far have been occurring on hardware/driver combinations that should be blocklisted. Hence the cause of this regression might be the same as the cause of bug 711656. We should check whether we can reproduce this issue when running the Yahoo toolbar with an old driver version (possibly a vendor/driver version found in that bug).

The cause of us switching away from D2D might be some form of driver crash, this is actually the main reason we're blacklisting these drivers.
What are the blocklist configurations we should be checking? From what I can tell, bug 711656 only mentions "Intel GMA X4500HD, 4500MHD, and HD Graphics up to 8.15.10.2141"
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #68)
> What are the blocklist configurations we should be checking? From what I can
> tell, bug 711656 only mentions "Intel GMA X4500HD, 4500MHD, and HD Graphics
> up to 8.15.10.2141"

I believe that's because the symptom there (that particular crash) is specific to Intel drivers. For this bug any ancient graphics driver should do it. The best way would probably be a GMA 4500 with as old a graphics driver as possible: confirm using about:support that it's not getting blacklisted, and then use canvas or the Yahoo toolbar.
I'm not sure QA has immediate access to the qualifying hardware. I'll send an email to the team to see what we can do.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #68)
> What are the blocklist configurations we should be checking? From what I can
> tell, bug 711656 only mentions "Intel GMA X4500HD, 4500MHD, and HD Graphics
> up to 8.15.10.2141"
This bug is not specifically related to Intel GPUs, but to old graphics drivers that are no longer blocklisted:
55% (100/181) vs.   5% (1673/34679) igd10umd32.dll (Intel)
          9% (16/181) vs.   0% (30/34679) 8.14.10.1930
          9% (17/181) vs.   0% (22/34679) 8.15.10.1749
          1% (1/181) vs.   0% (1/34679) 8.15.10.1808
          1% (1/181) vs.   0% (1/34679) 8.15.10.1851
          2% (4/181) vs.   0% (5/34679) 8.15.10.1855
          8% (14/181) vs.   0% (16/34679) 8.15.10.1883
          2% (3/181) vs.   0% (4/34679) 8.15.10.1892
          1% (2/181) vs.   0% (2/34679) 8.15.10.1968
          2% (4/181) vs.   0% (8/34679) 8.15.10.1986
          2% (3/181) vs.   0% (4/34679) 8.15.10.2025
          1% (1/181) vs.   0% (1/34679) 8.15.10.2040
          3% (5/181) vs.   0% (5/34679) 8.15.10.2057
          3% (5/181) vs.   0% (6/34679) 8.15.10.2086
          3% (6/181) vs.   0% (6/34679) 8.15.10.2104
          3% (6/181) vs.   0% (91/34679) 8.15.10.2125
          1% (2/181) vs.   0% (2/34679) 8.15.10.2182
          1% (1/181) vs.   0% (26/34679) 8.15.10.2253
          2% (3/181) vs.   0% (50/34679) 8.15.10.2266
          3% (5/181) vs.   0% (100/34679) 8.15.10.2291
          1% (1/181) vs.   0% (80/34679) 8.15.10.2342
38% (68/181) vs.   4% (1349/34679) atidxx32.dll (ATI)
          2% (4/181) vs.   0% (4/34679) 8.15.10.163
         17% (31/181) vs.   0% (45/34679) 8.15.10.212
          1% (2/181) vs.   0% (2/34679) 8.15.10.229
          2% (4/181) vs.   0% (4/34679) 8.17.10.240
          2% (3/181) vs.   0% (5/34679) 8.17.10.247
          1% (1/181) vs.   0% (1/34679) 8.17.10.252
          1% (1/181) vs.   0% (1/34679) 8.17.10.256
          3% (6/181) vs.   0% (9/34679) 8.17.10.261
          4% (7/181) vs.   0% (8/34679) 8.17.10.269
          1% (1/181) vs.   0% (5/34679) 8.17.10.279
          1% (1/181) vs.   0% (76/34679) 8.17.10.286
          1% (1/181) vs.   0% (33/34679) 8.17.10.331
          3% (6/181) vs.   0% (45/34679) 8.17.10.342
11% (20/181) vs.   4% (1538/34679) nvwgf2um.dll (NVIDIA)
          2% (4/181) vs.   0% (5/34679) 8.15.11.8593
          1% (2/181) vs.   0% (2/34679) 8.15.11.8644
          1% (1/181) vs.   0% (1/34679) 8.15.11.8652
          2% (3/181) vs.   0% (3/34679) 8.15.11.8664
          1% (2/181) vs.   0% (3/34679) 8.16.11.8766
          1% (1/181) vs.   0% (1/34679) 8.16.11.8783
          2% (4/181) vs.   0% (4/34679) 8.16.11.8990
          2% (3/181) vs.   0% (4/34679) 8.16.11.9107
If I am reading this correctly, the highest correlation is to ATI's driver v8.15.10.212?
See bug 711656 comment 69: a backout of the presumed culprit from mozilla-beta is on try:

  https://tbpl.mozilla.org/?tree=Try&rev=b9c53783cfa9
The back-out is landed on beta. See bug 711656 for details.
Just to be clear, with the patch we're expecting crash rates to drop to roughly those of the analogous crash on beta:

https://crash-stats.mozilla.com/report/list?signature=nsCanvasRenderingContext2DAzure%3A%3AInitializeWithTarget%28mozilla%3A%3Agfx%3A%3ADrawTarget*%2C+int%2C+int%29

That is essentially the 10.x version of this crash.
Comment on attachment 598518
fix?

This appears to be accurately checked into Gecko 12 despite the bug's obvious "affected" status for Gecko 12.
Attachment #598518 - Flags: checkin+
(In reply to Justin Wood (:Callek) from comment #76)
> This appears to be accurately checked into Gecko 12 despite the bug's obvious
> "affected" status for Gecko 12
The patch landed in Beta and Aurora but it hasn't fixed the issue in these channels. See crash reports for Aurora:
https://crash-stats.mozilla.com/report/list?version=Firefox%3A12.0a2&query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=nsCanvasRenderingContext2DAzure%3A%3AClearRect%28float%2C%20float%2C%20float%2C%20float%29
Whiteboard: [qa+] startupcrash → [qa+][startupcrash]
We've done some local testing and unfortunately it's not as broad as we would have liked.  

Here is what we found:
Intel 8.15.10.1749 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
Intel 8.15.10.1930 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
NVidia 8.17.11.9539 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
NVidia 8.17.12.7533 - Fx10: Enabled, Fx11: Enabled, No crash

We had extreme difficulty trying to track down drivers as listed in comment 71. If we can get some help tracking down these drivers we can try expanding on this testing for Monday.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #78)
> Intel 8.15.10.1749 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> Intel 8.15.10.1930 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> NVidia 8.17.11.9539 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> NVidia 8.17.12.7533 - Fx10: Enabled, Fx11: Enabled, No crash
Which feature is blocklisted?
Indeed, there are several graphic feature blocklists: D2D, D3D9, D3D10.
Crash reports show D2D is enabled and D3D10 is disabled and sometimes D3D9 also.
This was specifically looking at "Direct2d Enabled" blocklisting.
(In reply to Scoobidiver from comment #79)
> (In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #78)
> > Intel 8.15.10.1749 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> > Intel 8.15.10.1930 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> > NVidia 8.17.11.9539 - Fx10: Blocklisted, Fx11: Blocklisted, No crash
> > NVidia 8.17.12.7533 - Fx10: Enabled, Fx11: Enabled, No crash
> Which feature is blocklisted?
> Indeed, there are several graphic feature blocklists: D2D, D3D9, D3D10.
> Crash reports show D2D is enabled and D3D10 is disabled and sometimes D3D9
> also.

D3D10 is first enabled, then disabled; the fact that D3D10 becomes 'disabled' is the actual cause of this bug. D2D is disabled at runtime, presumably because of a graphics driver crash/reset caused by drivers not being blocklisted properly.

We've backed out the blocklisting-breakage bug on beta, let's see what happens there.
Blocks: 704143
Depends on: 711656, 706908
After the latest patch in bug 711656 landed, there are still crashes in 11.0 build 2 for unblocked graphics driver versions:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A11.0&build_id=20120312181643&signature=nsCanvasRenderingContext2DAzure%3A%3AClearRect%28float%2C%20float%2C%20float%2C%20float%29

* Intel driver versions: 8.15.10.2202, 8.15.10.2302, 8.15.10.2509
* NVIDIA driver versions: 8.17.12.9573 (bp-c2eaba7b-5f88-4e0b-8aec-8e0a62120314, D2D blocked)
Untracking for FF12 and FF13 since this signature has dropped off the top crash list.
The volume on this has gone way down. Only 64 in FF11 so far. Removing the top crash keyword.
Keywords: topcrash
Firefox 12.0
Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
BuildID: 20120420145725

Firefox 15.0.1
Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1
BuildID: 20120905151427

Latest Nightly
Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20130728 Firefox/25.0
BuildID: 20130728030204

Latest Aurora
Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20130729 Firefox/24.0
BuildID: 20130729004005

Firefox 23 beta 9
Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0
BuildID: 20130725195523

Firefox 22RC
Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0
BuildID: 20130618035212

Tested on two different machines with Intel HD 2000 and AMD Radeon HD 6450 graphic cards.

I was unable to reproduce the crash with the signature [@ nsCanvasRenderingContext2DAzure::ClearRect(float, float, float, float) ] on the above builds.
In the last week in Socorro there are 16 crashes in total but only on old builds. I think we can close the issue until there are more recent crashes on recent builds. If someone disagrees please reopen the bug.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Keywords: qawanted
Resolution: --- → WORKSFORME