Bug 773097 - crash in gfxContext::~gfxContext()
Status: RESOLVED FIXED
Whiteboard: [startupcrash][advisory-tracking+] [qa?]
Keywords: crash, sec-high, topcrash
Product: Core
Classification: Components
Component: Graphics
Version: unspecified
Platform: x86 Windows XP
Importance: -- critical with 1 vote
Target Milestone: mozilla17
Assigned To: Joe Drew (not getting mail)
QA Contact: Marcia Knous [:marcia - use ni]
Duplicates: 734921 780904
Depends on: 787623 787892
Blocks: 591358 704762
Reported: 2012-07-11 17:08 PDT by Jason Smith [:jsmith]
Modified: 2012-11-06 06:45 PST
djcater+bugzilla: in-testsuite?
Crash Signature:
[@ gfxContext::~gfxContext() ]
[@ _moz_cairo_destroy ]
[@ moz_cairo_destroy ]
[@ _moz_cairo_surface_get_reference_count ]
[@ _cairo_gstate_restore ]
[@ gfxASurface::Release() ]
[@ je_free | mozilla::`anonymous namespace''::ContainerState::FindThebesLayerFor(nsDisplayItem*, nsIntRect const&, nsIntRect const&, mozilla::FrameLayerBuilder::Clip const&, nsIFrame*) ]
[@ mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder*, mozilla::layers::LayerManager*, nsIFrame*, nsDisplayItem*, nsDisplayList const&, mozilla::FrameLayerBuilder::ContainerParameters const&, gfx3DMatrix const*) ]
[@ nsTArray<nsGenericHTMLFormElement*, nsTArrayDefaultAllocator>::Clear() | nsTArray<nsCSSStyleSheet*, nsTArrayDefaultAllocator>::~nsTArray<nsCSSStyleSheet*, nsTArrayDefaultAllocator>() | gfxContext::AzureState::~AzureState() ]
[@ nsTArray<PtrInfo*, nsTArrayDefaultAllocator>::Clear() | nsTArray<float, nsTArrayDefaultAllocator>::~nsTArray<float, nsTArrayDefaultAllocator>() | gfxContext::AzureState::~AzureState() ]
[@ mozilla::RefCounted<mozilla::gfx::ScaledFont>::Release() ]
[@ PL_DHashTableOperate | mozilla::`anonymous namespace''::ContainerState::ProcessDisplayItems(nsDisplayList const&, mozilla::FrameLayerBuilder::Clip&, unsigned int) ]
[@ PL_DHashTableOperate | mozilla::`anonymous namespace''::ContainerState::ProcessDisplayItems(nsDisplayList const&, mozilla::FrameLayerBuilder::Clip&) ]
[@ PL_DHashTableOperate | nsTHashtable<mozilla::FrameLayerBuilder::ThebesLayerItemsEntry>::PutEntry(mozilla::layers::ThebesLayer*) ]
[@ nsTArray_base<nsTArrayDefaultAllocator>::SwapArrayElements<nsTArrayDefaultAllocator>(nsTArray_base<nsTArrayDefaultAllocator>&, unsigned int, unsigned int) | nsTHashtable<mozilla::FrameLayerBuilder::DisplayItemDataEntry>::s_CopyEntry(PLDHashTable*, PLDH... ]
[@ mozilla::FrameLayerBuilder::GetLeafLayerFor(nsDisplayListBuilder*, mozilla::layers::LayerManager*, nsDisplayItem*) ]
[@ @0x0 | PL_DHashTableOperate | mozilla::`anonymous namespace''::ContainerState::ProcessDisplayItems(nsDisplayList const&, mozilla::FrameLayerBuilder::Clip&, unsigned int) ]
[@ je_free | ChangeTable ]
[@ mozilla::RefCounted<mozilla::gfx::DrawTarget>::Release() ]


Attachments
about support from machine (3.34 KB, text/plain)
2012-08-08 18:12 PDT, Marcia Knous [:marcia - use ni]
no flags
Testcase (376 bytes, text/html)
2012-08-09 03:04 PDT, Daniel Cater
no flags
don't create the surface from GetCanvasLayer (1.19 KB, patch)
2012-08-09 15:58 PDT, Joe Drew (not getting mail)
roc: review+
matt.woodrow: review+
lukasblakk+bugs: approval-mozilla-aurora+
lukasblakk+bugs: approval-mozilla-beta+
lukasblakk+bugs: approval-mozilla-esr10+
somewhat ineffective test (2.39 KB, patch)
2012-08-10 14:13 PDT, Joe Drew (not getting mail)
ehsan: review+

Description Jason Smith [:jsmith] 2012-07-11 17:08:00 PDT
This bug was filed from the Socorro interface and is 
report bp-0101e134-9558-4f01-97f2-18c332120710 .
============================================================= 

Found when looking at crash stats for the webapp runtime. Don't have reproduction steps for this.
Comment 1 Marcia Knous [:marcia - use ni] 2012-07-12 14:21:32 PDT
Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxContext::~gfxContext 	gfx/thebes/gfxContext.cpp:121
1 	xul.dll 	gfxContext::`scalar deleting destructor' 	
2 	xul.dll 	nsWindow::OnPaint 	widget/windows/nsWindowGfx.cpp:427
3 	xul.dll 	nsWindow::ProcessMessage 	widget/windows/nsWindow.cpp:4738
4 	xul.dll 	nsCOMPtr_base::assign_from_qi 	obj-firefox/xpcom/build/nsCOMPtr.cpp:58
5 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp:32
6 	xul.dll 	nsWindow::WindowProc 	widget/windows/nsWindow.cpp:4267
7 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp:38
8 	user32.dll 	StringDuplicateW
Comment 2 Scoobidiver (away) 2012-08-07 05:28:58 PDT
It's #9 top browser crasher in 14.0.1, #5 in 15.0b3, #8 in 16.0a2 and #54 in 17.0a1.

This spike is likely related to the one in bug 734921.

Here are correlations per module:
  gfxContext::~gfxContext()|EXCEPTION_ACCESS_VIOLATION_READ (10195 crashes)
    100% (10193/10195) vs.  37% (95988/258368) d3d9.dll
    100% (10173/10195) vs.  37% (95823/258368) d3d8thk.dll
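
For anyone re-checking the correlation figures, here is a small sketch that recomputes the rounded percentages from the raw counts in the listing. The helper name correlationPercent is made up for illustration; only the counts come from the report above.

```javascript
// Hypothetical helper: share of crash reports that have a given module
// loaded, rounded to a whole percent as in the correlation listing.
function correlationPercent(withModule, total) {
  return Math.round((100 * withModule) / total);
}

// Counts copied from the listing: [reports with module, all reports].
const correlationRows = [
  { module: 'd3d9.dll', signature: [10193, 10195], overall: [95988, 258368] },
  { module: 'd3d8thk.dll', signature: [10173, 10195], overall: [95823, 258368] },
];

for (const row of correlationRows) {
  const inSignature = correlationPercent(...row.signature);
  const overall = correlationPercent(...row.overall);
  console.log(`${row.module}: ${inSignature}% for this signature vs. ${overall}% overall`);
}
```

Both modules are loaded in nearly every report with this signature but in only about a third of all reports, which is what points at the D3D9 code path.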
Comment 3 Robert Kaiser 2012-08-07 07:02:07 PDT
The signature in this bug, the one in bug 734921, as well as some other gfx-related ones have been radically exploding in yesterday's data (2012-08-06), see https://crash-analysis.mozilla.com/rkaiser/2012-08-06/2012-08-06.firefox.14.explosiveness.html
Comment 4 Robert Kaiser 2012-08-07 07:05:37 PDT
Top 10 URLs:

331 	http://www.google.com.tr/
243 	http://www.google.de/
201 	about:blank
188 	http://www.google.co.id/
173 	http://www.google.fr/
143 	http://www.google.com.au/
120 	http://www.google.pl/
112 	http://www.google.es/
112 	http://www.google.com.vn/
100 	http://www.google.co.th/

This is followed by a long list of other Google URLs. Is there some Google Doodle today or yesterday that could be causing this?
Comment 5 bloodflash 2012-08-07 08:52:15 PDT
@Robert Kaiser:

My reply to this Bug is here:
https://bugzilla.mozilla.org/show_bug.cgi?id=734921#c10
Comment 6 Joe Drew (not getting mail) 2012-08-07 12:36:18 PDT
I can reproduce this on an Intel Integrated Windows 7 machine here with D3D9 layers turned on, but only on Aurora and Beta (not Nightly). I'm going to bisect now.
Comment 7 Joe Drew (not getting mail) 2012-08-07 13:13:36 PDT
Note: you frequently need to clear the cache in order for this to reliably crash.
Comment 8 Robert Kaiser 2012-08-07 13:19:20 PDT
Joe, FYI, marcia says in bug 734921 that there's trunk signatures fitting the same kind of case. Still, having something reproducible is already a good step. :)
Comment 9 Joe Drew (not getting mail) 2012-08-07 13:45:34 PDT
This is fixed by turning on the Azure canvas implementation by default on Windows (i.e., bug 773460). Unfortunately, fixing it properly requires me to work out the difference between how Azure's canvas works and how the Thebes canvas works.
Comment 10 Mike Graboski 2012-08-07 15:10:41 PDT
Joe, are you able to reproduce the issue with older Google doodles?  Ones you can try:

http://www.google.com/logos/bunsen.html
http://www.google.com/logos/2011/calder11.html

Thanks,
Mike
Comment 11 Joe Drew (not getting mail) 2012-08-07 15:45:48 PDT
Nope, neither of these crash for me. What's more, I find it harder to reliably crash using http://www.google.com/doodles/hurdles-2012 too; http://www.google.com seems to be the easiest and most reliable crashing URL for me.
Comment 12 Mike Graboski 2012-08-07 17:05:27 PDT
Does it crash more reliably if you use: http://www.google.com/logos/2012/hurdles-2012-hp.html

That URL has the doodle, without the surrounding gallery elements.  

Not sure if this helps with debugging, but the older doodles used setTimeout, whereas the sports games use requestAnimationFrame if it's available.
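
To make the setTimeout versus requestAnimationFrame difference concrete, here is a minimal sketch of feature detection with a timer fallback. It is not the doodle's actual code, which is not shown in this bug; pickFrameScheduler and the stand-in objects are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: choose a frame scheduler from a window-like object.
function pickFrameScheduler(win) {
  const raf = win.requestAnimationFrame || win.mozRequestAnimationFrame;
  if (typeof raf === 'function') {
    return { kind: 'requestAnimationFrame', schedule: (cb) => raf.call(win, cb) };
  }
  // Fallback in the style of the older doodles: a plain timer, here at
  // roughly 60 frames per second (the 16 ms interval is an assumption).
  return { kind: 'setTimeout', schedule: (cb) => win.setTimeout(() => cb(Date.now()), 16) };
}

// Stand-in objects so the sketch also runs outside a browser.
const withRaf = { mozRequestAnimationFrame: (cb) => cb(0) };
const withoutRaf = { setTimeout: (fn, ms) => setTimeout(fn, ms) };

console.log(pickFrameScheduler(withRaf).kind);    // requestAnimationFrame
console.log(pickFrameScheduler(withoutRaf).kind); // setTimeout
```

In a page, the call would be pickFrameScheduler(window).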
Comment 13 Robert Kaiser 2012-08-08 06:59:25 PDT
Those crashes are literally killing us. In yesterday's data, this signature has slightly over 10% of all 14.0.1 crashes and is #2 while bug 734921 has over 50% of all crashes and is #1. This doodle made our crash rates increase by a factor of ~2.5 or so.
Comment 14 Joe Drew (not getting mail) 2012-08-08 08:30:52 PDT
FWIW, since the hurdles doodle is no longer on Google's homepage, this crash should jump way down in today's stats.
Comment 15 Daniel Cater 2012-08-08 08:46:15 PDT
(In reply to Joe Drew (:JOEDREW!) from comment #14)
> FWIW, since the hurdles doodle is no longer on Google's homepage, this crash
> should jump way down in today's stats.

I've seen this crash on today's basketball doodle as well so I don't think that will be the case. I expect there will also be similar game-type doodles from now until Sunday and given that the first 2 cause crashes, there's a high chance that the others will too.
Comment 16 Mike Graboski 2012-08-08 08:53:02 PDT
Just a few questions:

*  Is the crash rate info (browser version, number of crashes, URL before crash) publicly available?
*  When will we know if today's basketball doodle is causing as many crashes as yesterday's hurdles doodle?
*  What is the raw number (and/or what is the %) of all Firefox browsers that we estimate have crashed, both due to the hurdles doodle and the basketball doodle?
Comment 17 Scoobidiver (away) 2012-08-08 08:59:54 PDT
(In reply to Mike Graboski from comment #16)
> *  Is the crash rate info (browser version, number of crashes, URL before
> crash) publicly available?
Except URLs, they are available:
https://crash-stats.mozilla.com/report/list?signature=gfxContext%3A%3A~gfxContext%28%29

> *  When will we know if today's basketball doodle is causing as many crashes
> as yesterday's hurdles doodle?
There are crashes on August 8.

> *  What is the raw number (and/or what is the %) of all Firefox browsers
> that we estimate have crashed, both due to the hurdles doodle and the
> basketball doodle?
See comment 13.
Comment 18 Robert Kaiser 2012-08-08 09:17:00 PDT
(In reply to Mike Graboski from comment #16)
> *  Is the crash rate info (browser version, number of crashes, URL before
> crash) publicly available?

As Scoobidiver has pointed out, version, number, etc. are available on crash-stats. URLs are only internal as they often can contain privacy-related information (and even plaintext passwords at times).

> *  When will we know if today's basketball doodle is causing as many crashes
> as yesterday's hurdles doodle?

We're only doing daily aggregations, so we'll know exact numbers tomorrow, but crash-stats allows us to do advanced searches over hours to get some feel.

> *  What is the raw number (and/or what is the %) of all Firefox browsers
> that we estimate have crashed, both due to the hurdles doodle and the
> basketball doodle?

We don't have unique IDs per installation/user for privacy reasons, so we can't tell for sure. What we know is that yesterday our crash rate for 14.0.1 rose by ~3 crashes per 100 active daily installations. We can attribute that rise entirely to the signatures triggered by the doodle, mainly bug 734921 and this one, and that's quite significant. Graphs of this are on the crash-stats front page: https://crash-stats.mozilla.com/products/Firefox

For the current release, i.e. 14.0.1, this was 2.5 million crashes at a throttle rate of 10%, i.e. 25 million crashes per day more than usual!
Comment 19 Robert Kaiser 2012-08-08 09:27:25 PDT
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #18)
> For the current release, i.e. 14.0.1, this was 2.5 million crashes at a
> throttle rate of 10%, i.e. 25 million crashes per day more than usual!

Bah, sorry, my bad, that 2.5 million number already includes the throttling. It's 250k at 10% throttling and 2.5 million total.
Sorry for overdoing it, but it's still quite a lot.
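
The corrected arithmetic can be written out as a tiny sketch. The function name estimateSubmittedCrashes is hypothetical; the inputs are the 250k processed reports and the 10% throttle rate quoted above.

```javascript
// Hypothetical helper: scale the number of processed reports up to the
// estimated number of submitted crashes, given the throttle percentage
// (the share of submitted reports that actually gets processed).
function estimateSubmittedCrashes(processedReports, throttlePercent) {
  if (throttlePercent <= 0 || throttlePercent > 100) {
    throw new RangeError('throttlePercent must be in (0, 100]');
  }
  return (processedReports * 100) / throttlePercent;
}

console.log(estimateSubmittedCrashes(250000, 10)); // 2500000
```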
Comment 20 Robert Kaiser 2012-08-08 09:58:39 PDT
And, FYI, I just checked if things improved today compared to yesterday, using bug 734921's signature as it's the main one:

Results within 3 hours of 08/07/2012 16:00:00, where the crash signature contains '_moz_cairo_destroy', and the product is one of Firefox and the crashing process was of any type (including unofficial release channels): 48960
https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=3&range_unit=hours&date=08%2F07%2F2012+16%3A00%3A00&query_search=signature&query_type=contains&query=_moz_cairo_destroy&reason=&build_id=&process_type=any&hang_type=any&do_query=1

Results within 3 hours of 08/08/2012 16:00:00, where the crash signature contains '_moz_cairo_destroy', and the product is one of Firefox and the crashing process was of any type (including unofficial release channels): 40786
https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=3&range_unit=hours&date=08%2F08%2F2012+16%3A00%3A00&query_search=signature&query_type=contains&query=_moz_cairo_destroy&reason=&build_id=&process_type=any&hang_type=any&do_query=1

This implies that today's doodle is just as bad as yesterday's in the crash rate.
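
As a rough check on the two query results, this sketch computes the relative change between the counts. The helper name relativeChangePercent is made up for illustration; the two counts are the ones returned by the queries above.

```javascript
// Hypothetical helper: percentage change from one count to another.
function relativeChangePercent(before, after) {
  return ((after - before) / before) * 100;
}

const hurdlesWindow = 48960;    // 08/07/2012 query result
const basketballWindow = 40786; // 08/08/2012 query result

const change = relativeChangePercent(hurdlesWindow, basketballWindow);
console.log(change.toFixed(1) + '%'); // -16.7%
```

That is a drop of about 17%, so the second day's count is still of the same order of magnitude as the first.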
Comment 21 Joe Drew (not getting mail) 2012-08-08 15:58:18 PDT
I have tried several memory debugging tools, and none of them has yielded much to go on for debugging this.

At this point the best thing for us to move forward with debugging this is a reduced testcase. Unfortunately, doing the usual "Save as" on the hurdles doodle didn't give me much to go on.

If Google could send me a copy of this doodle privately (joe@mozilla.com) I'd really appreciate it.
Comment 22 Marcia Knous [:marcia - use ni] 2012-08-08 18:03:07 PDT
We can reproduce this consistently in the QA lab with a Win 7 machine with a GeForce GT520 card.

The variables that seem to consistently cause the crash are:

* gfx.canvas.azure.enabled=false
* layers.prefer.d3d9=false
* clear cache right before restarting
* google.com or the google doodles page as your startup page
* https versus http seems to make a difference

Geo postulates it might be some kind of race condition.

We tried several requestAnimationFrame demos and we were not able to reproduce the crash with those demos.
Comment 23 Marcia Knous [:marcia - use ni] 2012-08-08 18:10:09 PDT
Correction to Comment 22:

layers.prefer.d3d9=true, not false as stated previously.
Comment 24 Marcia Knous [:marcia - use ni] 2012-08-08 18:12:22 PDT
Created attachment 650404 [details]
about support from machine

Attached is the about:support configuration from the machine we crashed on. We also were able to reproduce the crash on the lab Vista machine.
Comment 25 Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2012-08-08 18:18:21 PDT
Definitely seeing the crash happen during whatever happens between loading the basic Doodle and loading the Chrome promo, on the home page. Disabling the Chrome promo didn't suppress the crash, so it's not that, but it does establish a timeline.

I think it's probably doing a lazy JS request of some animation info (maybe pushing it into a graphics buffer of some kind?) and crashing doing that. But since cache definitely affected it, it'd be some sort of cached data.

As an external effect, you can see the Chrome promo appear immediately when the animation is cached, and after a short delay when not. There's a similar delay on the Doodle archive page before the "more doodles" part loads.

Crash is also reproducible by disabling Direct2D. Think it was gfx.disable.direct2d=true, but I don't have the about:support that supplied the testcase in front of me.

Also, I was primarily working on GT520, but it repro'd on an older 8xxx card, and the about:support was ATI. It did not repro on a machine w/ blacklisted video drivers, and also didn't repro with hardware acceleration disabled.
Comment 26 Robert Kaiser 2012-08-08 18:25:56 PDT
Ever since Joe mentioned in comment #7 that clearing the cache makes a difference this sounded a lot like a race condition thing to me (i.e. something's causing this when it loads with some delay and doesn't when it's loaded instantly by being cached, maybe some layout is changing by lazy-loading something and that causes a graphics crash). What puzzles me is how this causes a whole bunch of different crash signatures, almost as if that load would change something in memory and we crash when we access it in an unexpected way afterwards from different places (and which place that is can vary). As I have no clue about the code at all and am not even a C/C++ coder, this is all just hunches from all the crazy things I have heard so far when dealing with crashes. ;-)
Comment 27 Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2012-08-08 18:48:02 PDT
Oh, one other thing that was maybe interesting, gfx.canvas.azure.enabled=false did give us reliable reproduction as mentioned above, but gfx.content.azure.enabled=false did not repro.
Comment 28 Mike Graboski 2012-08-08 18:51:22 PDT
Ok we've been able to pare down the code to a minimal test case.  Here's a tiny reproducing case.  

We are investigating if there's a workaround to this bug that we can push soon; else we may push a change that shows a static Google logo instead of the animated sports doodles for Firefox 14 and above.

1. Paste the HTML below into an html file.
2. Open the file in Firefox.
3. Set that file as the homepage (Tools -> Options -> General -> "Use Current Pages")
4. Clear the cache (Tools -> Clear Recent History)
5. Restart Firefox and it will crash.


<!doctype html>
<head>
<script>
function load() {
 var context = document.getElementById('canvas').getContext('2d');
 var update = function() {
   context.fillRect(0, 0, 500, 200);
   mozRequestAnimationFrame(update);
 }
 mozRequestAnimationFrame(update);
}
</script>
</head>
<body onload='load()'>
 <canvas id='canvas' width='500px' height='200px'></canvas>
</body>
</html>
Comment 29 Joe Drew (not getting mail) 2012-08-08 18:54:13 PDT
Hah, yes, I should say that's quite minimal. 

Thanks so much, Mike.
Comment 30 Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2012-08-08 18:55:09 PDT
Our explicit STR, btw:

Firefox 15.x on Windows Vista/7

1. Create new profile
2. about:config
3. Set gfx.canvas.azure.enabled=false
4. Set layers.prefer.d3d9=true
5. Restart
6. Navigate to http://www.google.com/doodles/basketball-2012

// no crash

7. Preferences, set start page to current
8. Clear all content (cache in particular, but just clear everything)
9. Restart

// crash between doodle and "more doodles" loading

3 + 4 must be set together. Either alone did not repro. 

Alternately, set gfx.disable.direct2d=true instead of changing either of them, repros as well with same signature.

Top level signature was actually _moz_cairo_destroy from Bug 734921. Next stack level was this one.

This worked reliably on a couple of different machines. Since I suspect race, network speed/latency (i.e. fast) may have been a factor.
Comment 31 Geo Mealer [:geo] -- This account is inactive after 2015-07-07 2012-08-08 19:12:31 PDT
Just tried the minimal case. Worked the same, same requirements for the prefs. Just substitute into #6 for STR.

I did look closer at the crash reports, though. 

The one with the two prefs as #3 + #4 is _moz_cairo_destroy at top level and gfxContext::~gfxContext just below, as I mention above.

However, when you disable direct2d instead of setting those prefs, you get a crash with gfxContext::~gfxContext as the top level.

This may help with Bug 734921, or at least help differentiate from this one.
Comment 32 Daniel Cater 2012-08-09 03:04:23 PDT
Created attachment 650485 [details]
Testcase

Here is the testcase from comment 28. My colleague was hitting this crash on the basketball homepage so I asked him to try this testcase and it also causes a crash (although the homepage crash signature was this one, and the testcase crash signature was bug 734921). They seem closely related though.

It didn't crash every time. The first time the browser was already open and then opening the file didn't crash. Then closing the browser and opening it by double-clicking the file caused the crash. Not sure if that helps at all. There was no cache clearing involved.

Crash: https://crash-stats.mozilla.com/report/index/cf99bc73-4946-4b2b-a0f3-541532120809

about:support graphics section:

Adapter Description: Mobile Intel(R) 965 Express Chipset Family
Vendor ID: 0x8086
Device ID: 0x2a12
Adapter RAM: Unknown
Adapter Drivers: igdumdx32 igd10umd32
Driver Version: 8.15.10.1930
Driver Date: 9-23-2009
Direct2D Enabled: Blocked for your graphics driver version.
DirectWrite Enabled: false (6.1.7601.17789)
ClearType Parameters: ClearType parameters not found
WebGL Renderer: Google Inc. -- ANGLE (Mobile Intel(R) 965 Express Chipset Family) -- OpenGL ES 2.0 (ANGLE 1.0.0.1041)
GPU Accelerated Windows: 1/1 Direct3D 9
Comment 33 Daniel Cater 2012-08-09 03:07:41 PDT
(In reply to Daniel Cater from comment #32)
> There was no cache clearing involved.

Actually, there was no manual cache clearing involved. Of course the cache could have been nuked via bug 105843 though.
Comment 34 Robert Kaiser 2012-08-09 06:33:16 PDT
*** Bug 734921 has been marked as a duplicate of this bug. ***
Comment 35 Robert Kaiser 2012-08-09 06:34:32 PDT
Moving over all the signatures from bug 734921 as the actual discussions are going on over here, so let's consolidate this here.
Comment 36 Marcia Knous [:marcia - use ni] 2012-08-09 07:03:06 PDT
*** Bug 780904 has been marked as a duplicate of this bug. ***
Comment 37 Mike Graboski 2012-08-09 09:03:11 PDT
Can anyone try reproducing this bug today with your homepage set to google.com?  Changes to the Google sports doodles were pushed last night, and we expect that the crash issues may be resolved.
Comment 38 Scoobidiver (away) 2012-08-09 09:12:08 PDT
The spike stopped on August 9 after 08:20 UTC.
Comment 39 Mike Graboski 2012-08-09 09:16:08 PDT
Perfect, this is minutes after the doodle fix was pushed.
Comment 40 [:jberkus] Josh Berkus 2012-08-09 09:42:59 PDT
Crashes are now down to the normal level of around 250,000 per hour, as opposed to the 430,000 per hour we were seeing yesterday.
Comment 41 Joe Drew (not getting mail) 2012-08-09 11:24:56 PDT
I have made this testcase even smaller. And even scarier.

<canvas id='canvas' width='1px' height='1px'></canvas>
<script>
 var ctx = document.getElementById('canvas').getContext('2d');
</script>
Comment 42 Marcia Knous [:marcia - use ni] 2012-08-09 11:30:45 PDT
Mike: I don't see a crash today using the STR in Comment 30. I am testing on the same machine that we were able to reproduce the crash on last evening.

In looking at crash stats, I do still see some recent crashes with UTC times just a few minutes ago, so it appears it is still happening to some users.

(In reply to Mike Graboski from comment #37)
> Can anyone try reproducing this bug today with your homepage set to
> google.com?  Changes to the Google sports doodles were pushed last night,
> and we expect that the crash issues may be resolved.
Comment 43 Joe Drew (not getting mail) 2012-08-09 13:55:32 PDT
Those may very well be my crashes, Marcia. :) Check the URL.
Comment 44 Caboosey 2012-08-09 14:14:09 PDT
My experience with the Google doodle:
Firefox loaded google.com fine as the homepage prior to August 8, 2012.
Firefox still crashes most of the time, even in safe mode, with google.com set as the homepage.
Firefox does not crash if google.com isn't the homepage and I tell Firefox to go to google.com. The doodle loads fine, with no crash.

My personal conclusion:
The issue seems to be the initial loading of the doodle together with Firefox startup when the user has google.com as the homepage.

I reported the Google Doodle issue to Google and referenced this bug.

https://productforums.google.com/d/topic/websearch/S5AjuXNtczo/discussion
Comment 45 Joe Drew (not getting mail) 2012-08-09 15:58:24 PDT
Created attachment 650708 [details] [diff] [review]
don't create the surface from GetCanvasLayer

This was a bit of "fun" to figure out.

On Windows, we start up Firefox with a software-only layer manager, to hide the latency of creating a D3D device. Five seconds after startup, we switch to the accelerated layer manager (if it's possible to). This works just fine, most of the time.

However, canvases always want to use an accelerated layer manager, because that way we're sure to get good performance out of them (and it won't matter whether you open your canvas before or after the 5 second interval). Therefore, before they create their surface, they ask for the "persistent" layer manager, which causes us to destroy the old, software-only layer manager, and create a new accelerated layer manager. Again, most of the time this works fine.

Unfortunately, bug 591358 made us defer construction of the canvas' backing surface until it's actually going to be used. And, in this case, "used" includes being drawn, even before it has any contents.

So what happened is that, during a paint, the layers system went through and asked a canvas to draw, which then caused us to destroy the software-only layer manager (which we were using to paint) and create the accelerated layer manager.

Because we simply hold on to pointers to layer managers, we had a stale pointer which we then happily used to corrupt most of the heap.

https://tbpl.mozilla.org/?tree=Try&rev=5c6fe721e239 is a try run for this patch.
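
A toy model of the sequence described in this comment, for readers who want to see the ordering problem in isolation. This is not Gecko code: every class and function name below is hypothetical, and a thrown error stands in for what would be a use of freed memory in C++. The createSurfaceIfNeeded flag models the behaviour before the patch (true) and the idea in the patch title, not creating the surface from GetCanvasLayer (false).

```javascript
// Toy layer manager: "destroyed" marks memory that C++ would have freed.
class LayerManager {
  constructor(kind) { this.kind = kind; this.destroyed = false; }
  destroy() { this.destroyed = true; }
  endTransaction() {
    if (this.destroyed) throw new Error(`used destroyed ${this.kind} layer manager`);
    return `painted with ${this.kind} layer manager`;
  }
}

// Toy widget: starts with the software manager, as at Windows startup.
class Widget {
  constructor() { this.manager = new LayerManager('software'); }
  // Asking for the persistent manager tears down the software one.
  getPersistentLayerManager() {
    if (this.manager.kind !== 'accelerated') {
      this.manager.destroy();
      this.manager = new LayerManager('accelerated');
    }
    return this.manager;
  }
}

// Toy canvas with a lazily created backing surface.
class Canvas {
  constructor(widget) { this.widget = widget; this.surface = null; }
  ensureSurface() {
    if (!this.surface) {
      this.surface = { owner: this.widget.getPersistentLayerManager() };
    }
  }
  getCanvasLayer(createSurfaceIfNeeded) {
    if (createSurfaceIfNeeded) this.ensureSurface();
    return this.surface; // null means there is nothing to draw yet
  }
}

function paint(widget, canvas, createSurfaceIfNeeded) {
  const manager = widget.manager;               // reference taken at start of paint
  canvas.getCanvasLayer(createSurfaceIfNeeded); // may swap the widget's manager
  return manager.endTransaction();              // stale if the swap happened
}

function tryPaint(createSurfaceIfNeeded) {
  const widget = new Widget();
  try {
    return paint(widget, new Canvas(widget), createSurfaceIfNeeded);
  } catch (e) {
    return e.message;
  }
}

console.log(tryPaint(true));  // used destroyed software layer manager
console.log(tryPaint(false)); // painted with software layer manager
```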
Comment 46 Joe Drew (not getting mail) 2012-08-09 15:59:49 PDT
Also, I filed bug 781679 to help us make sure that this doesn't happen again.
Comment 47 Marcia Knous [:marcia - use ni] 2012-08-09 16:02:21 PDT
(In reply to Joe Drew (:JOEDREW!) from comment #43)
> Those may very well be my crashes, Marcia. :) Check the URL.

I rechecked and there still are crashes on 14.0.1 in [@ _moz_cairo_destroy ] - see

https://crash-stats.mozilla.com/report/index/578f98a5-6502-464b-96a5-a07322120809 as an example.  The UTC time stamp shows 21:33 - so either we are still catching up on processing of crashes from yesterday or people are still crashing.
Comment 48 Robert Kaiser 2012-08-09 16:23:14 PDT
(In reply to Marcia Knous [:marcia] from comment #47)
> The UTC time stamp shows 21:33 - so either we
> are still catching up on processing of crashes from yesterday or people are
> still crashing.

Catching up doesn't show current times, we always show the time we have received the crash from the user. Also, we were able to handle all crashes that were incoming over the last days.
So those are crashes happening now. Still, since Aug 09, 2012 08:17 UTC this has slowed down very significantly.
Comment 49 Robert Kaiser 2012-08-09 16:34:42 PDT
To clarify more: This is not in the top 100 any more for 14.0.1 in the last 12 hours.

There is still a low volume of crashes happening, but given that joe found out that this is caused by just initializing a canvas at all, it doesn't take a lot to run into it somehow - and even the fixed version is still a canvas-based game, so it can probably trigger it in some way.
The real fix is to get the patch to our code landed.

Thanks, joe, for figuring that out, sounds like a real pain to find. :)
Comment 50 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-09 16:47:24 PDT
Comment on attachment 650708 [details] [diff] [review]
don't create the surface from GetCanvasLayer

Review of attachment 650708 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think we need Matt's review here.
Comment 51 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-09 16:49:15 PDT
To be clear, this could only be triggered when loading a page using a 2D canvas within 5 seconds of starting the browser, on Windows using D3D9. The page would also have to call getContext("2d") but not draw into the canvas.
Comment 52 Joe Drew (not getting mail) 2012-08-09 17:30:38 PDT
> The page would also have to call getContext("2d") but not draw into the canvas.

Clarification: *before* drawing into the canvas. I.e., always. :)
Comment 53 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-09 17:50:24 PDT
If the page actually draws into the canvas, then EnsureSurface will be called and the persistent layer manager will be switched to at that point, which will not be during painting so we wouldn't crash. I think.
Comment 54 Joe Drew (not getting mail) 2012-08-09 18:28:16 PDT
Er, right. I forgot the getContext() call wasn't what crashed, it just created the context so it could cause the crash.

If the page doesn't draw into the canvas before we draw it the first time, though, we'll crash.
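
The conditions discussed in comments 51 to 54 can be summarised as a small predicate. This is only a sketch of the commenters' stated understanding, not a check that exists in the code; couldHitCrash and its field names are invented for illustration.

```javascript
// Hypothetical predicate: true when all the conditions named in the
// discussion hold for a page load.
function couldHitCrash(load) {
  return load.platform === 'windows' &&
    load.layersBackend === 'd3d9' &&
    load.secondsSinceStartup < 5 &&   // before the switch to the accelerated manager
    load.calledGetContext2d &&        // a 2D context exists...
    !load.drewBeforeFirstPaint;       // ...but nothing was drawn before the first paint
}

// Example: a homepage with an untouched 2D canvas, loaded at startup.
const startupHomepage = {
  platform: 'windows',
  layersBackend: 'd3d9',
  secondsSinceStartup: 1,
  calledGetContext2d: true,
  drewBeforeFirstPaint: false,
};

console.log(couldHitCrash(startupHomepage)); // true
console.log(couldHitCrash({ ...startupHomepage, secondsSinceStartup: 30 })); // false
```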
Comment 55 Joe Drew (not getting mail) 2012-08-09 18:29:10 PDT
Comment on attachment 650708 [details] [diff] [review]
don't create the surface from GetCanvasLayer

Because this was roc's idea, Matt should look at this too.
Comment 56 farluiz 2012-08-10 01:21:48 PDT
Perhaps this information can help.

Firefox 14.0.1 on my Windows XP SP3 does not crash in any scenario proposed here.

a) Reproducing case comment 28:
   no crash;
   works fine;

b) Reproducing case comment 30:
   no crash;
   works fine;

c) http://www.google.com/doodles/basketball-2012 and
   *ALL* other Google doodles as a homepage:
   no crash;
   works fine;

My computer.

- XP SP3 2GB memory;
- Firefox 14.0.1;
- Flash 11.3.300.268;
- Next Generation Java Plug-in 10.5.1 for Mozilla browsers;
- GPU AMD Radeon HD 3850 with Catalyst 11.12;
- Hardware Acceleration always enabled;
- Hardware Acceleration working perfectly!
- GPU acceleration of the Google doodles works very well (great!);

- I tested in all scenarios:
  New profile, current profile, cache clear,
  cache unclear, gfx.canvas.azure.enabled=false and
  layers.prefer.d3d9=true, gfx.canvas.azure.enabled=true and
  layers.prefer.d3d9=false, etc
 
I've tried everything to crash Firefox and nothing! Google doodles work fine!

Does the crash only occur on NVIDIA cards?
Comment 57 Scoobidiver (away) 2012-08-10 01:30:13 PDT
farluiz, thanks for your help, but we already have a testcase and a fix.
Comment 58 Daniel Cater 2012-08-10 02:21:55 PDT
(In reply to Joe Drew (:JOEDREW!) from comment #45)
> On Windows, we start up Firefox with a software-only layer manager, to hide
> the latency of creating a D3D device. Five seconds after startup, we switch
> to the accelerated layer manager (if it's possible to). This works just
> fine, most of the time.

Is it possible to write a test that runs within 5 seconds of startup in order to test this?
Comment 59 :Ehsan Akhgari 2012-08-10 08:26:31 PDT
(In reply to comment #58)
> (In reply to Joe Drew (:JOEDREW!) from comment #45)
> > On Windows, we start up Firefox with a software-only layer manager, to hide
> > the latency of creating a D3D device. Five seconds after startup, we switch
> > to the accelerated layer manager (if it's possible to). This works just
> > fine, most of the time.
> 
> Is it possible to write a test that runs within 5 seconds of startup in order
> to test this?

Of course, the test just needs to run at the initialization time of one of our test frameworks.
Comment 60 Joe Drew (not getting mail) 2012-08-10 09:14:03 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccbcf3438328

I'll request approval to land this patch on other branches once it's baked a little while.
Comment 61 Joe Drew (not getting mail) 2012-08-10 09:14:55 PDT
And I'll also try to work up a test for this.
Comment 62 Daniel Veditz [:dveditz] 2012-08-10 12:59:53 PDT
Tight timing window but potentially exploitable. Have no idea how easy or hard it might be to make a reliable exploit, but hiding anyway because the testcases here can only give someone a head start on doing so.
Comment 63 Joe Drew (not getting mail) 2012-08-10 14:13:16 PDT
Created attachment 650998 [details] [diff] [review]
somewhat ineffective test

This test *should* work, but at least on my very fast machine, on a debug build, it doesn't crash without my patch - presumably because with the reftest addon it takes more than 5 seconds to start up Firefox.

We might need to add a manual test for this.
Comment 64 Jason Smith [:jsmith] 2012-08-10 14:24:48 PDT
We're using in-moztrap? now, not in-litmus?
Comment 65 Ryan VanderMeulen [:RyanVM] 2012-08-11 19:59:30 PDT
(In reply to Joe Drew (:JOEDREW!) from comment #60)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ccbcf3438328
> 

https://hg.mozilla.org/mozilla-central/rev/ccbcf3438328
Comment 66 Joe Drew (not getting mail) 2012-08-14 10:33:53 PDT
Comment on attachment 650708 [details] [diff] [review]
don't create the surface from GetCanvasLayer

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 591358
User impact if declined: Possibility for further crashes if Google or another major site that people have as their homepage includes canvases.
Testing completed (on m-c, etc.): On m-c for a few days. Local testing too.
Risk to taking this patch (and alternatives if risky): Could potentially break canvases in some very edge casey ways.
String or UUID changes made by this patch: none
Comment 67 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-14 11:40:30 PDT
Comment on attachment 650708 [details] [diff] [review]
don't create the surface from GetCanvasLayer

Will track this for ESR 10.0.7 as well, please land to branches.
Comment 68 Joe Drew (not getting mail) 2012-08-14 12:58:52 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/fe13653f05ec
https://hg.mozilla.org/releases/mozilla-aurora/rev/c45599eff49f
https://hg.mozilla.org/releases/mozilla-esr10/rev/e6961914db3b

I presume we will set this to fx14:wontfix, but I'll leave that change to someone else.
Comment 69 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-08-27 12:22:55 PDT
(In reply to Marcia Knous [:marcia] from comment #22)
> We can reproduce this consistently in the QA lab with a Win 7 machine with a
> GeForce GT520 card.

Marcia, I'm having a hard time reproducing this locally. Can you please try reproducing this in the QA lab again and, if you can, verify the fix for Firefox 15, 10.0.7esr, 16.0a2 and 17.0a1 (in order of priority)?

Thanks
Comment 70 Mihaela Velimiroviciu (:mihaelav) 2012-11-05 07:10:25 PST
Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0
Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/17.0 Firefox/17.0

I could not reproduce the crash on the above builds (latest beta - 17b4) with the scenarios from comments 28 and 30.
However, there are many crashes with this signature in Socorro on builds newer than the fixes, but their stack traces are not similar.
There are crash reports for the other signatures tracked here:
* https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=gfxASurface%3A%3ARelease%28%29&reason_type=contains&date=11%2F05%2F2012%2014%3A42%3A24&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=gfxASurface%3A%3ARelease%28%29
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=_moz_cairo_surface_get_reference_count&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=_moz_cairo_surface_get_reference_count
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=gfxASurface%3A%3ARelease%28%29&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=gfxASurface%3A%3ARelease%28%29
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=mozilla%3A%3AFrameLayerBuilder%3A%3ABuildContainerLayerFor%28nsDisplayListBuilder%2A%2C%20mozilla%3A%3Alayers%3A%3ALayerManager%2A%2C%20nsIFrame%2A%2C%20nsDisplayItem%2A%2C%20nsDisplayList%20const%26amp%3B%2C%20mozilla%3A%3AFrameLayerBuilder%3A%3AContainerParameters%20const%26amp%3B%2C%20gfx3DMatrix%20const%2A%29&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=30&range_unit=days&hang_type=any&process_type=any&do_query=1&signature=mozilla%3A%3AFrameLayerBuilder%3A%3ABuildContainerLayerFor%28nsDisplayListBuilder%2A%2C%20mozilla%3A%3Alayers%3A%3ALayerManager%2A%2C%20nsIFrame%2A%2C%20nsDisplayItem%2A%2C%20nsDisplayList%20const%26%2C%20mozilla%3A%3AFrameLayerBuilder%3A%3AContainerParameters%20const%26%2C%20gfx3DMatrix%20const%2A%29
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=%400x0%20%7C%20PL_DHashTableOperate%20%7C%20mozilla%3A%3A%60anonymous%20namespace%27%27%3A%3AContainerState%3A%3AProcessDisplayItems%28nsDisplayList%20const%26amp%3B%2C%20mozilla%3A%3AFrameLayerBuilder%3A%3AClip%26amp%3B%2C%20unsigned%20int%29&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=30&range_unit=days&hang_type=any&process_type=any&do_query=1&signature=%400x0%20%7C%20PL_DHashTableOperate%20%7C%20mozilla%3A%3A%60anonymous%20namespace%27%27%3A%3AContainerState%3A%3AProcessDisplayItems%28nsDisplayList%20const%26%2C%20mozilla%3A%3AFrameLayerBuilder%3A%3AClip%26%2C%20unsigned%20int%29
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=je_free%20%7C%20ChangeTable&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=30&range_unit=days&hang_type=any&process_type=any&do_query=1&signature=je_free%20%7C%20ChangeTable
* https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A17.0b4&version=Firefox%3A17.0b3&version=Firefox%3A17.0b2&version=Firefox%3A17.0b1&platform=windows&query_search=signature&query_type=exact&query=mozilla%3A%3ARefCounted%26lt%3Bmozilla%3A%3Agfx%3A%3ADrawTarget%26gt%3B%3A%3ARelease%28%29&reason_type=contains&date=11%2F05%2F2012%2014%3A54%3A01&range_value=30&range_unit=days&hang_type=any&process_type=any&do_query=1&signature=mozilla%3A%3ARefCounted%3Cmozilla%3A%3Agfx%3A%3ADrawTarget%3E%3A%3ARelease%28%29

Joe, do you think these are related to this issue?
