Closed Bug 805406 Opened 12 years ago Closed 8 years ago

crash in gfxContext::PushClipsToDT with Direct2D 1.1 (d3d11.dll 6.2 or 6.3)

Categories

(Core :: Graphics, defect)

All
Windows 7
defect
Not set
critical

Tracking

RESOLVED FIXED
mozilla30
Tracking Status
firefox19 - ---
firefox20 + wontfix
firefox21 + wontfix
firefox22 + wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox26 - wontfix
firefox27 - wontfix
firefox28 + wontfix
firefox29 + wontfix
firefox30 --- affected
relnote-firefox --- 27+

People

(Reporter: scoobidiver, Assigned: bas.schouten)

References

(Depends on 1 open bug)

Details

(Keywords: crash, reproducible, topcrash-win, Whiteboard: [latest in comment 39])

Attachments

(5 files)

This bug tracks crashes not fixed by bug 758531 and bug 803949.
In today's build, 85% of crashes happen on Windows 8 and with various GPUs and graphics driver versions. We need to monitor to see whether it's a confirmed tendency.

Signature 	gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)
UUID	22050036-2520-4515-bff7-82b3c2121024
Date Processed	2012-10-24 22:58:06
Uptime	363
Install Age	8.4 hours since version was first installed.
Install Time	2012-10-25 04:33:51
Product	Firefox
Version	19.0a1
Build ID	20121024030643
Release Channel	nightly
OS	Windows NT
OS Version	6.2.8250
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x0116, AdapterSubsysID: 05061025, AdapterDriverVersion: 9.17.10.2584
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x0116
Total Virtual Memory	2147352576
Available Virtual Memory	1672577024
System Memory Use Percentage	63
Available Page File	2341400576
Available Physical Memory	715902976

Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxContext::PushClipsToDT 	gfx/thebes/gfxContext.cpp:2029
1 	xul.dll 	gfxContext::PushGroup 	gfx/thebes/gfxContext.cpp:1460
2 	xul.dll 	gfxContext::PushGroupAndCopyBackground 	gfx/thebes/gfxContext.cpp:1548
3 	xul.dll 	mozilla::layers::BasicLayerManager::PushGroupForLayer 	gfx/layers/basic/BasicLayerManager.cpp:86
4 	xul.dll 	mozilla::layers::BasicThebesLayer::PaintThebes 	gfx/layers/basic/BasicThebesLayer.cpp:131
5 	xul.dll 	mozilla::layers::BasicLayerManager::PaintSelfOrChildren 	gfx/layers/basic/BasicLayerManager.cpp:825

More reports at:
https://crash-stats.mozilla.com/report/list?signature=gfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADrawTarget*%29
(In reply to Scoobidiver from comment #0)
> In today's build, 85% of crashes happen on Windows 8 and with various GPUs
> and graphics driver versions. We need to monitor to see whether it's a
> confirmed tendency.
It's now 93%.
Summary: crash in gfxContext::PushClipsToDT → crash in gfxContext::PushClipsToDT mainly on Windows 8
Whiteboard: [Win8]
@Bas I think http://hg.mozilla.org/mozilla-central/annotate/93cc1ee94291/gfx/thebes/gfxContext.cpp#l2029 is the code responsible, but that would mean mDT is already a null pointer wherever PushClipsToDT is called with mDT as its parameter. Since the calling functions access mDT->something after the call to PushClipsToDT, just null-checking inside PushClipsToDT won't fix it, in my opinion. Ideas?
Flags: needinfo?(bas)
Yes, mDT being NULL is bad. I don't have any ideas. One might expect this in an OOM situation, but I can't explain why that would be more frequent on Win8, so there might be another issue.
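
The point made above, that a null check inside PushClipsToDT only protects one stack frame while callers still dereference mDT afterwards, can be illustrated with a small stand-alone sketch. FakeDrawTarget and FakeContext are hypothetical stand-ins invented for this illustration; this is not the real gfxContext code and not the patch attached to this bug. It assumes the failure is handled where the new target is created, by keeping the previous target current and reporting the failure to the caller.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for mozilla::gfx::DrawTarget.
struct FakeDrawTarget {
  int clipDepth = 0;
  void PushClip() { ++clipDepth; }
};

// Hypothetical stand-in for gfxContext; only the shape of the problem is modelled.
class FakeContext {
 public:
  explicit FakeContext(std::unique_ptr<FakeDrawTarget> aDT)
      : mDT(std::move(aDT)) {}

  // A null check here only protects this one frame of the stack.
  static void PushClipsToDT(FakeDrawTarget* aDT, int aClips) {
    if (!aDT) {
      return;
    }
    for (int i = 0; i < aClips; ++i) {
      aDT->PushClip();
    }
  }

  // aAllocOk == false simulates the group surface allocation failing.
  // On failure the previous target stays current, so later mDT-> accesses
  // in callers remain valid, and the caller is told that no group was pushed.
  bool PushGroup(bool aAllocOk, int aClips) {
    std::unique_ptr<FakeDrawTarget> newDT;
    if (aAllocOk) {
      newDT = std::make_unique<FakeDrawTarget>();
    }
    if (!newDT) {
      return false;
    }
    mSaved.push_back(std::move(mDT));
    mDT = std::move(newDT);
    PushClipsToDT(mDT.get(), aClips);
    return true;
  }

  bool HasTarget() const { return mDT != nullptr; }
  int ClipDepth() const { return mDT ? mDT->clipDepth : -1; }

 private:
  std::unique_ptr<FakeDrawTarget> mDT;
  std::vector<std::unique_ptr<FakeDrawTarget>> mSaved;
};
```

With this shape, a failed PushGroup leaves the context drawing into its old target, which may render incorrectly but does not leave mDT null for the next access.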
Flags: needinfo?(bas)
> Total Virtual Memory		4294836224
> Available Virtual Memory	2891964416
> System Memory Use Percentage	41
> Available Page File		25678143488
> Available Physical Memory	10019131392

I have quite a lot of memory, so I doubt it was an OOM problem in my case. I was using a problematic beta version of the graphics driver at the time, though (it caused other crashes, which is why it got blacklisted).

bp-6e7dda3e-2802-41fe-966d-d93fe2121113
(In reply to [Baboo] from comment #4)
> bp-6e7dda3e-2802-41fe-966d-d93fe2121113
It may be bug 758531 or bug 793175 that are not fixed in Firefox 17.
Please update to Firefox 17.0.1.
It's #8 top browser crasher in the first hours of 18.0 (it's the only open bug for this version).

Two comments talk about upgrading drivers (Windows 7 part).
Keywords: topcrash
(In reply to Scoobidiver from comment #5)
> Please update to Firefox 17.0.1.
Just happened on the newest beta (19.0), graphics driver restarted right before because of Windows' "Timeout Detection and Recovery" feature.

bp-b66a143b-a436-4a0f-80c6-1bdfc2130109
It dropped off after a few days, now only #35 browser crasher in 18.0 and #47 in 19.0b1.
Keywords: topcrash
It's still #3 on Win8-specific topcrashes though.
Noticed over 500 crashes for F19 beta 1 and 2 (last 2 weeks) while looking over bug 758531. Should this be tracked in F19 now given the volume of crashes?
It's #61 browser crasher in 19.0b2 so not a top crasher.
Yes, an absolute number doesn't say much. For example, the #10 topcrash has over 1000 crashes in a week in 19 beta 2 alone. That's what you get for having active daily installations in the range of millions. ;-)
Not a top crasher, and we're past Win8 release so we don't expect this to be a high volume any time soon.
Also note that the signature isn't *only* Win8, it just dominates it by far - right now, ~82% are Win8, ~18% Win7, and 1 crash on Vista (over the last week).
Easily reproducible for me now by simply installing a GPU driver while Firefox is running and judging from the comments I am not the only one. Could it be that some change in Win8's DWrite made Firefox fail to handle GPU hardware resets? Sort of a regression of the device reset handling, i.e. bug 604271? (BTW, shouldn't bug 553089 be marked as fixed?)
Depends on: 839805
It started spiking on Windows 7 after the release of KB2670838 on February 27 (see https://crash-analysis.mozilla.com/rkaiser/2013-02-28/2013-02-28.firefox.19.explosiveness.html):
Windows 8 	58.46 %
Windows 7 	41.409 %

There are no longer crash correlations (see bug 836671), but I assume it's related to Direct2D DLL 6.2.
Summary: crash in gfxContext::PushClipsToDT mainly on Windows 8 → crash in gfxContext::PushClipsToDT with d2d11.dll 6.2
Whiteboard: [Win8]
It's #4-5 top crasher in 20.0b1 for Windows 8 only. With the new rules in https://wiki.mozilla.org/CrashKill/Topcrash, that qualifies it for the topcrash keyword.
Keywords: topcrash
Crashed on win7 x64 also. I can relate to Baboo comment there: https://bugzilla.mozilla.org/show_bug.cgi?id=805406#c15  . I was also installing latest driver, using FF for some time afterwards and then it crashed. 20.0b2
The majority of crashes are now on Windows 7 after the KB2670838 MS hotfix:
Windows 7 	75.408 %
Windows 8 	24.565 %

In addition to being a topcrash on Windows 8 specifically, it's also an overall top crash: #20 top browser crasher in 19.0 and 20.0b2, #11 in 21.0a2 and #4 in 22.0a1.

I don't know whether it's transient (related to graphics driver update?) or permanent.

Still no crash correlations to confirm (see bug 836671).
OS: Windows 8 → Windows 7
(In reply to [Baboo] from comment #15)
> Easily reproducible for me now by simply installing a GPU driver while
> Firefox is running and judging from the comments I am not the only one.
> Could it be that some change in Win8's DWrite made Firefox fail to handle
> GPU hardware resets? Sort of a regression of the device reset handling, i.e.
> bug 604271? (BTW, shouldn't bug 553089 be marked as fixed?)

(In reply to Alexander Doborshhuk from comment #18)
> Crashed on win7 x64 also. I can relate to Baboo comment there:
> https://bugzilla.mozilla.org/show_bug.cgi?id=805406#c15  . I was also
> installing latest driver, using FF for some time afterwards and then it
> crashed. 20.0b2

What graphics card and graphics drivers do you all use? We're going to try to reproduce in QA by updating graphics drivers and Win7 updates while updating to KB2670838, but the more info the better.

Bas, hoping you're the best person on the gfx team to help with the investigation as we gather more info.
Assignee: nobody → bas
Keywords: qawanted
(In reply to Alex Keybl [:akeybl] from comment #20)

> What graphics card and graphics drivers do you all use? We're going to try
> to reproduce in QA 

Well, i guess there are 2 most popular choices for topcrash - AMD and nvidia :).
For me the case was official nvidia whql drivers, english, GTX 550 Ti card.
Juan, can you see if we have any NVidia GTX 550 Ti cards in our lab? If not, can you see if you can order one in?
QA Contact: jbecerra
(In reply to Alexander Doborshhuk from comment #21)
> (In reply to Alex Keybl [:akeybl] from comment #20)
> 
> > What graphics card and graphics drivers do you all use? We're going to try
> > to reproduce in QA 
> 
> Well, i guess there are 2 most popular choices for topcrash - AMD and nvidia
> :).
> For me the case was official nvidia whql drivers, english, GTX 550 Ti card.

Can you check the exact GPU driver version in Windows Device Manager?
Alexander noted over email that it's version "9.18.13.1407, date 09.02.2013"
(In reply to Alex Keybl [:akeybl] from comment #24)
> Alexander noted over email that it's version "9.18.13.1407, date 09.02.2013"

Based on what I know about NVidia driver nomenclature I think this is GeForce 314.07.
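
The mapping mentioned above can be sketched in code. This assumes the commonly described NVIDIA convention (not confirmed anywhere in this bug beyond the 9.18.13.1407 / 314.07 pair): strip the dots from the Windows driver version, take the last five digits, and put a decimal point after the third. The function name is hypothetical.

```cpp
#include <string>

// Converts a Windows-style NVIDIA driver version such as "9.18.13.1407"
// into the marketing version "314.07". Returns an empty string when the
// input contains fewer than five digits.
std::string NvidiaMarketingVersion(const std::string& aWindowsVersion) {
  std::string digits;
  for (char c : aWindowsVersion) {
    if (c >= '0' && c <= '9') {
      digits.push_back(c);
    }
  }
  if (digits.size() < 5) {
    return "";
  }
  const std::string tail = digits.substr(digits.size() - 5);
  return tail.substr(0, 3) + "." + tail.substr(3);
}
```

For example, NvidiaMarketingVersion("9.18.13.1407") yields "314.07".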
See this crash too on Win8, 64Bit
NVidia GeForce 210
Driver version 310.70
A way to reproduce it, probably triggered by the animation between screens:

http://dev.vaadin.com/ticket/11346
André, interesting. I can reproduce it every time on FF 20b5 (e.g. bp-d106a725-12e7-4028-be4b-c64a02130315), but I think it's a different bug from this one.
(In reply to André Schild from comment #28)

Also, I have not changed anything except swapping out the nvidia card for an AMD one with the corresponding drivers, and now I can't reproduce your issue with vaadin.
I came across this recently. It can also be caused by faulty software that crashes the graphics driver, so that Win7 reloads it. In my case it is World of Warcraft. Here are two crash reports: ca943703-3bdc-4d0a-b513-f90fb2130316, 3aa20ad1-7425-4304-916b-2d6cd2130314. From what I found on the Internet, this bug in WoW appeared about half a year ago and is still not fixed.
QA Contact: jbecerra
QA Contact: jbecerra
I've been testing this on a Windows 8 machine with an nVidia GeForce GT 530 card with driver version 314.07 (9.18.13.1407) with IE10 (KB2809289 - latest as of today) using the steps in comment #15 and the link in comment #28.

I was not able to crash 19.0.2/20.0b5 while upgrading the graphics driver.

I was able to crash 19.0.2, 20.0b5, and nightly - but not 20.b4 (funnel) - using the link in comment #28 every time, although it isn't the same crash signature, but it seems related:

https://crash-stats.mozilla.com/report/index/bp-3ad0ee39-c3f2-4550-b643-c1d6a2130319

We don't have a 550 graphics card. I'll order one from Amazon if the above information doesn't help.
I forgot to mention that disabling hardware acceleration made the crash go away (the crash triggered by following the steps in the link in comment #28).
Adding needinfo for Bas to see if :juanb's STR would help with further investigation here. The signature Juan is crashing with is different, but I wanted to check whether it could be a morphed or related crash.

In case this does not help, please let Juan know so we can order a 550 graphics card to progress on this with QA help.
Flags: needinfo?(bas)
Still waiting for card to arrive.
Hmm, I'm kind of able to reproduce this crash. Here is one of my latest reports; I was able to crash Nightly (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130328 Firefox/22.0) about 10 times: https://crash-stats.mozilla.com/report/index/bp-6b71dc49-f3bd-4c10-9e5f-f39dc2130329

  Graphics

        Adapter Description	NVIDIA GeForce GTX 670
        Adapter Drivers	nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
        Adapter RAM	2047
        Device ID	0x1189
        Direct2D Enabled	true
        DirectWrite Enabled	true (6.2.9200.16492)
        Driver Date	3-14-2013
        Driver Version	9.18.13.1422
        GPU #2 Active	false
        GPU Accelerated Windows	1/1 Direct3D 10
        Vendor ID	0x10de
        WebGL Renderer	Google Inc. -- ANGLE (NVIDIA GeForce GTX 670)
        AzureCanvasBackend	direct2d
        AzureContentBackend	direct2d
        AzureFallbackCanvasBackend	cairo

STR in my case:
1. install this addon: https://addons.mozilla.org/pl/firefox/addon/stratiform/
2. enter about:addons
3. Open "Options" for Stratiform (sometimes you need to open and close the options window a few times to trigger the crash)
4. Nightly crashes

Latest forced crash: https://crash-stats.mozilla.com/report/index/bp-dbc38084-02b9-4d0f-89b8-e6f1b2130329
(In reply to juan becerra [:juanb] from comment #35)
> Still waiting for card to arrive.

Any luck getting the card to work in one of your systems?
FYI, this is topcrash #12 overall on Firefox 20 right now, on Win8 it's #5 (or even higher, depending on which day of stats I look at).
It's happening on both Win7 and Win8, but with higher frequency on the latter.
We discussed this today in a meeting with the graphics team. We're going to add a D2D version alongside driver version for remote blocklisting for FF21 - bug 861233.
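
Matching on a D2D version alongside a driver version comes down to comparing dotted version strings numerically rather than as text. The following is only a sketch of that comparison; ParseVersion, BlockRule, IsBlocked and the thresholds used in the usage note are hypothetical and do not describe Firefox's actual blocklist format or the change in bug 861233.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Four numeric components, e.g. 6.2.9200.16492. std::array compares
// lexicographically, which is the ordering wanted for versions.
using Version = std::array<std::uint32_t, 4>;

// Parses "a.b.c.d"; missing components are treated as 0.
// std::stoul throws if a component is not numeric.
Version ParseVersion(const std::string& aText) {
  Version v{};
  std::istringstream in(aText);
  std::string part;
  for (std::size_t i = 0; i < v.size() && std::getline(in, part, '.'); ++i) {
    v[i] = static_cast<std::uint32_t>(std::stoul(part));
  }
  return v;
}

// Hypothetical rule: block a driver only when both conditions hold.
struct BlockRule {
  Version maxDriver;  // drivers up to and including this version
  Version minD2D;     // and a D2D/D3D11 DLL at least this new
};

bool IsBlocked(const Version& aDriver, const Version& aD2D,
               const BlockRule& aRule) {
  return aDriver <= aRule.maxDriver && aD2D >= aRule.minD2D;
}
```

With a made-up rule of maxDriver 9.18.13.1422 and minD2D 6.2.0.0, driver 9.18.13.1407 would be blocked together with DLL version 6.2.9200.16492 but not together with a 6.1.x DLL.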

We're also going to land a speculative null check on nightly, to watch and see if overall crash volume drops by the correct amount.
Blocks: KB2670838
Depends on: 861233
Whiteboard: [latest in comment 39]
By removing this flag we should be reducing memory usage somewhat in the presence of large canvases or large thebes layers. Hopefully this will help this crash. It might be nice to let this sit on central by itself for 1 or 2 days to see if it reduces the amount of crashes.
Attachment #738476 - Flags: review?(jmuizelaar)
Flags: needinfo?(bas)
(In reply to Alex Keybl [:akeybl] from comment #39)
> We discussed this today in a meeting with the graphics team. We're going to
> add a D2D version alongside driver version for remote blocklisting for FF21
> - bug 861233.
> 
> We're also going to land a speculative null check on nightly, to watch and
> see if overall crash volume drops by the correct amount.

Just to be clear, since I may have done a poor job at the meeting, a speculative null check on nightly by itself is -extremely- unlikely to reduce crash volume. There's something slightly more involved we're planning on doing which will get rendering into a completely messed up state, and may very well result in crashes soon after, but it also may not. That's what we're going to look at here.
Dropping qawanted since we've had no success in reproducing this crash. Once something lands that needs our verification (fix or blocklisting) please add the verifyme keyword.
Keywords: qawanted
Attachment #738476 - Flags: review?(jmuizelaar) → review+
Attachment #738486 - Flags: review?(jmuizelaar) → review+
Comment on attachment 738486 [details] [diff] [review]
Attempt to deal with group surface allocation failing

Review of attachment 738486 [details] [diff] [review]:
-----------------------------------------------------------------

Add a ReportFailure() in the bad case.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #44)
> Comment on attachment 738486 [details] [diff] [review]
> Attempt to deal with group surface allocation failing
> 
> Review of attachment 738486 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Add a ReportFailure() in the bad case.

This is in gfxContext where we don't really have this :-(
  gfx::LogFailure(msg);
(In reply to Alex Keybl [:akeybl] from comment #23)
> 
> Can you check the exact GPU driver version in Windows Device Manager?

AMD Radeon HD 7570M 
driver version: 12.100.17.0 (13.4)

Device reset because I updated Intel chipset software today on latest Beta:
bp-8dcbaa2f-814d-43c2-b81b-a12452130503
Just to clarify, I meant that I updated the Intel chipset drivers while I was running the latest Firefox beta in the background.

BTW, how is this dependent on bug 861233? From what I've seen AMD and nVidia are equally affected, do you plan on blocking hardware acceleration for the majority if not all Win8 users?
Which Intel chipset drivers did you update? Was this for a CPU, integrated GPU, or a motherboard chipset? It would also be good to know specifically what chipset you have and the URL to the Intel chipset download so we can try to replicate.
Blocks: 859377
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #51)
> Which Intel chipset drivers did you update? Was this for a CPU, integrated
> GPU, or a motherboard chipset?

Motherboard chipset, but I don't think it matters since the crash happens with any device reset, no matter what caused it, be it chipset update or GPU update (which is only AMD in my case since my integrated Intel is disabled at the lowest level).
I can reproduce the problem on this website http://pcottle.github.io/learnGitBranching/index.html?demo using current nightly Mozilla/5.0 (Windows NT 6.2; WOW64; rv:24.0) Gecko/20130521 Firefox/24.0 on Windows 8 Pro x64 using a Nvidia Quadro 2000M with 320.00 drivers.

https://crash-stats.mozilla.com/report/index/bp-a9773f83-003f-421c-b2e1-ecc262130521
https://crash-stats.mozilla.com/report/index/bp-93c928ef-0589-40ec-a28c-f1bab2130521
Me too! Seth/Bas, is there additional information you want? I can provide VMMap info or a full dump if that would be helpful.
Flags: needinfo?(bas)
I can't reproduce crashes with the STR of comment 53 on Windows 7 (D2D11.dll 6.2) and Intel GPU.
Keywords: reproducible
(In reply to Scoobidiver from comment #55)
> I can't reproduce crashes with the STR of comment 53 on Windows 7 (D2D11.dll
> 6.2) and Intel GPU.

Did you mean you "can" reproduce crashes?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #56)
> Did you mean you "can" reproduce crashes?
I mean can't. If I add reproducible as keyword, it's because two users can reproduce it.
Thanks for that clarification, Scoobidiver.
This is a topcrasher across all current branches:
#9 on Release (Fx21)
#5 on Beta (Fx22b)
#5 on Aurora (Fx23a)
#5 on Nightly (Fx24a)
Flags: needinfo?(seth)
Is there specific information you need from me?
Flags: needinfo?(seth)
Here are interesting correlations (bug 836671 and bug 867342 killed us) that explain why I don't crash with a DirectX-10 GPU and MoinMan/Semtex crash with a DirectX-11 GPU:
     99% (194/195) vs.  12% (4414/36680) d3d11.dll
          5% (10/195) vs.   0% (161/36680) 6.2.9200.16384
         24% (47/195) vs.   2% (601/36680) 6.2.9200.16420
          1% (1/195) vs.   0% (3/36680) 6.2.9200.16440
         69% (135/195) vs.  10% (3624/36680) 6.2.9200.16492
          1% (1/195) vs.   0% (1/36680) 6.3.9364.0
Summary: crash in gfxContext::PushClipsToDT with d2d11.dll 6.2 → crash in gfxContext::PushClipsToDT with DirectX-11 GPUs (d3d11.dll 6.2 or 6.3)
(In reply to Scoobidiver from comment #61)
> Here are interesting correlations (bug 836671 and bug 867342 killed us) that
> explain why I don't crash with a DirectX-10 GPU and MoinMan/Semtex crash
> with a DirectX-11 GPU:
>      99% (194/195) vs.  12% (4414/36680) d3d11.dll
>           5% (10/195) vs.   0% (161/36680) 6.2.9200.16384
>          24% (47/195) vs.   2% (601/36680) 6.2.9200.16420
>           1% (1/195) vs.   0% (3/36680) 6.2.9200.16440
>          69% (135/195) vs.  10% (3624/36680) 6.2.9200.16492
>           1% (1/195) vs.   0% (1/36680) 6.3.9364.0

That isn't actually related to the GPU, but rather to the API being used. With Direct2D 1.1, D2D switched from using D3D10 to D3D11 as its underlying API.
Flags: needinfo?(bas)
Summary: crash in gfxContext::PushClipsToDT with DirectX-11 GPUs (d3d11.dll 6.2 or 6.3) → crash in gfxContext::PushClipsToDT with Direct2D 1.1 (d3d11.dll 6.2 or 6.3)
given that I can reproduce this, what additional information would be useful?
Flags: needinfo?(seth)
You should probably be asking that question to Bas.
Flags: needinfo?(seth) → needinfo?(bas)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #63)
> given that I can reproduce this, what additional information would be useful?

Could you see if the patch on this bug makes it go away? (I've not been able to find the cause of the reftest failure yet, but I suspect it wasn't this patch, it'd be good to know if the patch is useful at all though)
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #65)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #63)
> > given that I can reproduce this, what additional information would be useful?
> 
> Could you see if the patch on this bug makes it go away? (I've not been able
> to find the cause of the reftest failure yet, but I suspect it wasn't this
> patch, it'd be good to know if the patch is useful at all though)

Hrm, never mind, on another machine I can reproduce this myself. I'll look into it! It doesn't look like this has anything to do with memory so far, fwiw.
I'm facing a crash which could be possibly due to this bug. Documented here : https://bugzilla.mozilla.org/show_bug.cgi?id=877629#c3
The reproducible bug here is actually something else, but it triggers this bug, which generally seems to be D2D 1.1's D3D11 device going away for some reason and us not being able to deal with that properly. I have a patch up for the reproducible bug here, and it might also help me find a proper solution to the D3D11 device removal issue.
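
As a rough illustration of what "dealing with the device going away" usually involves, here is a portable sketch that does not use the real Windows headers. The constants mirror the documented values of DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET and D2DERR_RECREATE_TARGET; HResult, IsDeviceLost and DrawWithRecovery are hypothetical names for this example, and nothing here is taken from the patches on this bug or on bug 877700.

```cpp
#include <cstdint>
#include <functional>

// Local stand-in for HRESULT so the sketch compiles without Windows headers.
using HResult = std::uint32_t;

constexpr HResult kOk = 0;
constexpr HResult kDxgiDeviceRemoved = 0x887A0005u;  // DXGI_ERROR_DEVICE_REMOVED
constexpr HResult kDxgiDeviceReset = 0x887A0007u;    // DXGI_ERROR_DEVICE_RESET
constexpr HResult kD2DRecreateTarget = 0x8899000Cu;  // D2DERR_RECREATE_TARGET

bool IsDeviceLost(HResult aResult) {
  return aResult == kDxgiDeviceRemoved || aResult == kDxgiDeviceReset ||
         aResult == kD2DRecreateTarget;
}

enum class DrawResult { Drawn, Recreated, Failed };

// Runs aDraw once. If it reports a lost device, asks aRecreate to rebuild
// the device-dependent resources and retries exactly once. Any other error,
// or a failed rebuild, is reported as Failed instead of being ignored.
DrawResult DrawWithRecovery(const std::function<HResult()>& aDraw,
                            const std::function<bool()>& aRecreate) {
  const HResult first = aDraw();
  if (first == kOk) {
    return DrawResult::Drawn;
  }
  if (!IsDeviceLost(first) || !aRecreate()) {
    return DrawResult::Failed;
  }
  return aDraw() == kOk ? DrawResult::Recreated : DrawResult::Failed;
}
```

The design point is that every draw result is checked and a lost device leads to recreation rather than to later use of a target that no longer exists.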
Depends on: 877700
As I thought the patch suggested here to wallpaper over the problem only moves the crash somewhere else, I'll attempt to deal with the other crashes as well and come back with a patch that provides a more solid solution for D2D failing on us.
Depends on: 877629
Got reproducible crashes on 22b4 on outlook.com.
Login to inbox, search for any mail and open message, then use browser back button and voila

bp-daf1114e-7c75-4158-bf3f-614a32130605
bp-51ae9a08-ed37-45dc-a1de-f24022130605
(In reply to Bas Schouten (:bas.schouten) from comment #69)
> As I thought the patch suggested here to wallpaper over the problem only
> moves the crash somewhere else, I'll attempt to deal with the other crashes
> as well and come back with a patch that provides a more solid solution for
> D2D failing on us.

Given this, we'll no longer track for release. Sounds like we need a larger solution here. Any updates Bas?
MS has released an update, http://support.microsoft.com/kb/2834140 which updates D3D11.dll, this might or might not fix the various issues related to KB2670838 update.
See Also: → 891542
We should see a serious drop in incidence with the resolving of bug 877700.
(In reply to Bas Schouten (:bas.schouten) from comment #73)
> We should see a serious drop in incidence with the resolving of bug 877700.
It seems so. Comparing the two latest builds at the same time of the day, the older one had 34 crashes and the newer one has 9 crashes. Nevertheless, we should wait a little longer before shouting hurrah.
Comparing the last five builds at the same time of the day (11:00 UTC), here are the results:
20130727: 31
20130728: 26
20130729: 34
20130730: 9
20130731: 10
So bug 877700 has fixed 2/3 of crashes.
On a larger time window and without a time cutoff, the crash volume per build.day is:
20130725	54
20130726	53
20130727	64      Average: 68 crashes
20130728	86
20130729	85   
20130730	34      <-- patch of bug 877700
20130731	36
20130801	43
20130802	43
20130803	31      Average: 34 crashes
20130804	33
20130805	19

So the current volume is closer to half the previous volume than one third.
(In reply to Scoobidiver from comment #76)
> So the current volume is closer to half the previous volume than one third.

Thanks for that analysis. As bug 877700 has landed on 24, we should also get quite useful data from 24 Beta 1 on how much it helps in a higher-volume sample. :)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #77)
> Thanks for that analysis. As bug 877700 has landed on 24, we should also get
> quite useful data from 24 Beta 1 on how much it helps in a higher-volume
> sample. :)
It's #4 top browser crasher in 23.0 and #12 in 24.0b1 so the crash volume dropped down.
Blocks: 897420
Or in other numbers:
When 23 was still in beta (last days of July), this signature was around 200 crashes per million ADI. Since 24 has made it to beta, it's around 80 crashes per million ADI.
So on beta, bug 877700 has apparently fixed ~60% of crashes with this signature. \o/
I'm not sure if this is the same bug or not, but I do have reproducible steps that cause Firefox to crash:

------------------------------
BUG REPORT:
Certain animations in Firefox (specifically, sending emails in Outlook.com) make Firefox close unexpectedly when Hardware Acceleration is enabled.

Steps to reproduce:
1) Install Firefox
2) Ensure Hardware Acceleration is enabled:
Options -> Advanced -> General -> Use hardware acceleration when available
(Make sure the option is checked; restart Firefox if you just checked that option)
3) Go to Outlook.com and log in
4) Send an email (ie: a test email to yourself)
5) Firefox crashes

If it doesn't crash on the first attempt, just keep sending emails until it crashes.

Broken for me, on:
Windows 8.1 Preview x64 (latest; fully patched)
Firefox 23.0.1 (latest; fully patched)
nVidia Drivers 326.80 Beta (latest)
GTX 660 Ti (with 2 other GPUs not connected to any monitor; GTX 460, GTS 240)

Workaround:
1) Disable Hardware Acceleration:
Options -> Advanced -> General -> Use hardware acceleration when available
(Make sure the option is unchecked)
2) Restart Firefox
------------------------------

Is this problem a Firefox problem, or is it an nVidia driver problem?

Additionally, here are the 2 most recent crashes inside my about:crashes display

https://crash-stats.mozilla.com/report/index/ffb2162b-bb03-4147-be9e-f94152130820
https://crash-stats.mozilla.com/report/index/cef9d463-4c09-4e37-82d8-632652130819
Hmm... I just installed Firefox 24.0 Beta 4, and I cannot reproduce the problem. Did the patch(s) from this bug report, make it into 24.0 Beta 4? It seems my particular issue is fixed.
(In reply to Jacob W. Klein from comment #80)
> I'm not sure if this is the same bug or not, but I do have reproducible
> steps that cause Firefox to crash:
> 
> ------------------------------
> BUG REPORT:
> Certain animations in Firefox (specifically, sending emails in Outlook.com)
> make Firefox close unexpectedly when Hardware Acceleration is enabled.
> 
> Steps to reproduce:
> 1) Install Firefox
> 2) Ensure Hardware Acceleration is enabled:
> Options -> Advanced -> General -> Use hardware acceleration when available
> (Make sure the option is checked; restart Firefox if you just checked that
> option)
> 3) Go to Outlook.com and log in
> 4) Send an email (ie: a test email to yourself)
> 5) Firefox crashes
> 
> If it doesn't crash on the first attempt, just keep sending emails until it
> crashes.
> 
> Broken for me, on:
> Windows 8.1 Preview x64 (latest; fully patched)
> Firefox 23.0.1 (latest; fully patched)
> nVidia Drivers 326.80 Beta (latest)
> GTX 660 Ti (with 2 other GPUs not connected to any monitor; GTX 460, GTS 240)
> 
> Workaround:
> 1) Disable Hardware Acceleration:
> Options -> Advanced -> General -> Use hardware acceleration when available
> (Make sure the option is unchecked)
> 2) Restart Firefox
> ------------------------------
> 
> Is this problem a Firefox problem, or is it an nVidia driver problem?
> 
> Additionally, here are the 2 most recent crashes inside my about:crashes
> display
> 
> https://crash-stats.mozilla.com/report/index/ffb2162b-bb03-4147-be9e-
> f94152130820
> https://crash-stats.mozilla.com/report/index/cef9d463-4c09-4e37-82d8-
> 632652130819

Hrm, I happen to use an outlook.com account, but neither my NVidia or AMD machine are affected by this. But this particular bug is somewhat timing sensitive. I wonder if the fix for bug 877700 is in beta 4?
(In reply to Jacob W. Klein from comment #81)
> Hmm... I just installed Firefox 24.0 Beta 4, and I cannot reproduce the
> problem. Did the patch(s) from this bug report, make it into 24.0 Beta 4? It
> seems my particular issue is fixed.

In this case, what you are seeing is a case of bug 877700, which we fixed in 24. Thanks for the steps and for testing this, though!
(In reply to Bas Schouten (:bas.schouten) from comment #82)
> Hrm, I happen to use an outlook.com account, but neither my NVidia or AMD
> machine are affected by this. But this particular bug is somewhat timing
> sensitive. I wonder if the fix for bug 877700 is in beta 4?

Well... Here is another set of steps that more-reliably produces the issue.
On my Windows 8.1 Preview x64 PC, nVidia GTX 660 Ti, GeForce 326.80 x64 beta drivers... I can readily produce the problem using FF 23.0.1, but cannot produce the problem at all on FF 24.0 beta 4.

1) Install Firefox
2) Ensure Hardware Acceleration is enabled:
Options -> Advanced -> General -> Use hardware acceleration when available
(Make sure the option is checked; restart Firefox if you just checked that option)
3) Go to Outlook.com and log in
4) Click New to start a new email
5) Type a character in the subject or the body, so that the website thinks you've started writing the email
6) Click Cancel
7) When prompted to save/delete the draft, click delete
8) Firefox closes unexpectedly

For me, sending an email would only occasionally crash FF 23, but it appears that deleting a new draft always crashes it.
(In reply to Waka_Flocka_Flame (Going hard in the paint since April) from comment #85)
> Still the Number 2 browser crasher in the current 23.0.1 release.

We know that. It has decreased a lot in volume on 24 though, due to the fix to bug 877700.


There are still cases happening in 24 and higher, though, and reproducible steps would be helpful there. bp-b3c000be-8236-42c6-97ac-2f1b22130825 was seen by an NVidia user with the JS-based OpenStreetMap iD editor. Maybe this person can give us steps (or someone else can find reliable ones), in which case we should create another split-off bug with those and get that case fixed as well.
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrash → topcrash-win
#3 topcrasher on Fx27
crash volume of this signature combined with the volume in Bug 927413 - crash in mozilla::gfx::DrawTargetSkia::DrawSurface would put gfx crashes at #1 topcrasher on Fx27 above the Empty crash
gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*) is the #4 topcrasher on Fx28 with 461/10792 crashes in the last 7 days.   This crash signature spiked for the 2013111408 build and then went down again. 

It is the #5 topcrash for Fx27 with 461/10792 crashes in the last 7 days.
No change in the topcrash ranking from previous releases, sad to see this unresolved, but it's not a critical issue for upcoming releases.
This signature has been spiking significantly in the last two days (since 2013-11-19) on Nightly, though, I wonder what might have triggered that.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #92)
> This signature has been spiking significantly in the last two days (since
> 2013-11-19) on Nightly, though, I wonder what might have triggered that.

This occurs on driver resets(i.e. crashes) on devices that have Direct2D 1.1 installed (until we can switch to using D2D 1.1, which is still dependent on removing some cairo code). It could be MS marked shipped D2D 1.1 as a forced update last tuesday?

If this is the case we may be forced to look for a better solution.
Many comments from the crash reports today for gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*) mention updating an Nvidia driver; two report that it is Nvidia Graphics driver R331.
If NVidia released a new driver, that could explain why it's spiking right now.

(In reply to Bas Schouten (:bas.schouten) from comment #93)
> This occurs on driver resets(i.e. crashes) on devices that have Direct2D 1.1
> installed (until we can switch to using D2D 1.1, which is still dependent on
> removing some cairo code). It could be MS marked shipped D2D 1.1 as a forced
> update last tuesday?

Not this week (where it spiked) as Patch Tuesday was at least a week earlier. But I think that IE10 has been made an "important" update for a while now on Win7 and it depends on D2D 1.1 IIRC, so I'd guess the vast majority of Win7 users have this update installed nowadays.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #96)
> If NVidia released a new driver, that could explain why it's spiking right
> now.
> 
> (In reply to Bas Schouten (:bas.schouten) from comment #93)
> > This occurs on driver resets(i.e. crashes) on devices that have Direct2D 1.1
> > installed (until we can switch to using D2D 1.1, which is still dependent on
> > removing some cairo code). It could be MS marked shipped D2D 1.1 as a forced
> > update last tuesday?
> 
> Not this week (where it spiked) as Patch Tuesday was at least a week
> earlier. But I think that IE10 has been made an "important" update for a
> while now on Win7 and it depends on D2D 1.1 IIRC, so I'd guess the vast
> majority of Win7 users have this update installed nowadays.

Could be a driver update then? The combination of installing a driver update and having D2D 1.1 installed would also trigger this crash in theory.
(In reply to Bas Schouten (:bas.schouten) from comment #97)
> Could be a driver update then? The combination of installing a driver update
> and having D2D 1.1 installed would also trigger this crash in theory.

Yes, I expect that. I remember we had spikes of this when driver updates have been released. I'd really prefer if we can find a solution that doesn't make us crash there, esp. as we need to assume most Win7 users have this version installed by now, but I understand that might not be trivial.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #98)
> (In reply to Bas Schouten (:bas.schouten) from comment #97)
> > Could be a driver update then? The combination of installing a driver update
> > and having D2D 1.1 installed would also trigger this crash in theory.
> 
> Yes, I expect that. I remember we had spikes of this when driver updates
> have been released. I'd really prefer if we can find a solution that doesn't
> make us crash there, esp. as we need to assume most Win7 users have this
> version installed by now, but I understand that might not be trivial.

When we switch to D2D 1.1 (which I'd much rather see happen sooner than later) it should be pretty easy. Right now it would be a little tricky, but if we think this becomes a high enough priority it's certainly not impossible.
(In reply to Bas Schouten (:bas.schouten) from comment #99)
> When we switch to D2D 1.1 (which I'd much rather see happen sooner than
> later) it should be pretty easy.

What timeframe would that roughly be?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #100)
> (In reply to Bas Schouten (:bas.schouten) from comment #99)
> > When we switch to D2D 1.1 (which I'd much rather see happen sooner than
> > later) it should be pretty easy.
> 
> What timeframe would that roughly be?

I'm hopeful that we're talking January, i.e. +/- 2 months.
Current Rank:
* Nightly: #3 @ 5.15%
* Aurora:  #2 @ 3.96%
* Beta:    #4 @ 0.98%
* Release: #5 @ 1.42%

Renominating for tracking given the volume.
Anthony, given that 1) this has been in such ranks for quite a while (and spiking when any graphics driver releases new versions) and 2) we are looking to see this fixed by Bas in January with the switch to D2D 1.1 but have no other immediate way to fix this, I do not think that tracking this crash makes too much sense.

That said, I'll be cheering when it goes away! :)
Agreed, not tracking. Thank you for calling attention though.
Just updating the ranks as of today since it's been almost a month.

Firefox 29: #1 @ 7.78%
Firefox 28: #1 @ 6.36%
Firefox 27: #2 @ 3.63%
Firefox 26: #3 @ 1.98%

I know we said we aren't tracking this and that we're waiting to ship D2D 1.1 to fix this but I think it's prudent to call out that this is increasing in volume as time goes on.
Bas, you said in comment #101 that you hoped we'd switch to D2D 1.1 in January, what is the status on this now and do we have a bug to track that?
Flags: needinfo?(bas)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #106)
> Bas, you said in comment #101 that you hoped we'd switch to D2D 1.1 in
> January, what is the status on this now and do we have a bug to track that?

Bug 902952
Flags: needinfo?(bas)
I don't want to add fuel to the fire, but ... FF is causing my computer to hang fully once or twice a day, requiring me to hard-reboot and causing my RAID setup to rebuild the drive cluster.

This behaviour **** me off.

If it wasn't for the configurable cookie handling ("keep until FF is closed") and the Search Immediately On Key Press option I would have been using some other browser now. Downloaded others but they don't provide this option.

Well, if I seriously think about switching to another browser, other users with less patience and less demands could do, too. ... Just a thought ...

I noticed that FF frequently (but not always) crashes in the Outlook.com (former hotmail.com) web application when the application returns to the inbox after having sent an e-mail. Perhaps this does give clues?
(In reply to Axel from comment #108)
> I don't want to put oil into the fire, but ... FF is causing my computer to
> hang fully once or twice a day, requiring me hard-reboot and causing my RAID
> setup to rebuild the drive cluster.

Do you have any reason to believe that hang/reboot is related to this specific bug (I can't see how it would be)? If not it would probably be good to open up a separate bug for it.

> This behaviour **** me off.
> 
> If it wasn't for the configurable cookie handling ("keep until FF is
> closed") and the Search Immediately On Key Press option I would have been
> using some other browser now. Downloaded others but they don't provide this
> option.
> 
> Well, if I seriously think about switching to another browser, other users
> with less patience and less demands could do, too. ... Just a thought ...
> 
> I noticed that FF frequently (but not always) crashes in the Outlook.com
> (former hotmail.com) web application when the application returns to the
> inbox after having sent an e-mail. Perhaps this does give clues?

Are you seeing this specific crash when that happens? Because that would suggest on your configuration your driver is crashing frequently on Outlook.com. -If- so it would be great if you could open a separate bug for that as well and post the information from the graphics section of 'about:support'. It sounds like we might want to blacklist your graphics card/driver combination for acceleration if it causes so many problems.
Flags: needinfo?(brille1)
> Do you have any reason to believe that hang/reboot is related to this
> specific bug (I can't see how it would be)? If not it would probably be good
> to open up a separate bug for it.

My original Bug 912137 has been routed as being a duplicate of this bug.

> Are you seeing this specific crash when that happens? Because that would
> suggest on your configuration your driver is crashing frequently on
> Outlook.com. -If- so it would be great if you could open a separate bug for
> that as well and post the information from the graphics section from
> 'about:support' it sounds like we might want to blacklist your graphics
> card/driver combination for acceleration if it causes so many problems.

My problem is that my computer just hangs. So I unfortunately can't create a bug of anything.

Occasionally my computer doesn't hang; instead my graphics driver (latest NVIDIA driver on NVIDIA GTS 450 graphics card) crashes and is able to recover from it a few seconds later. These are the occasions when I have a chance to save my changes before the great hanging comes in. But all this only occurs while I'm running Firefox.
Flags: needinfo?(brille1)
I'm currently not at my office. I'll add the about:support information when I'm back at my machine again.
I've added my about:support export file to my original bug.

HTH
Update to the latest Nvidia GeForce 332.21 WHQL drivers, it fixes issues for Fermi GeForce 400/500 GPUs.

http://www.geforce.com/whats-new/articles/nvidia-geforce-332-21-whql-drivers-released

"[Fermi-class GPU]: Browser freezes and crashes. [1358403]"
(In reply to NVD from comment #113)
> Update to the latest Nvidia GeForce 332.21 WHQL drivers, it fixes issues for
> Fermi GeForce 400/500 GPUs.
> 
> http://www.geforce.com/whats-new/articles/nvidia-geforce-332-21-whql-drivers-
> released
> 
> "[Fermi-class GPU]: Browser freezes and crashes. [1358403]"

Can those of you following this bug who are able to reproduce this crash please confirm this to be true?

Lukas, should we relnote or do something on SUMO?
Flags: needinfo?(lsblakk)
Let's verify that the new version helps before considering how hard we want to promote this to users. We can promote via automatic crash-stats emails if it helps a lot, for example.

This crash is a symptom of the graphics driver resetting: what we're doing here is finding the causes of those resets. This driver update may fix some major causes, but certainly won't fix all of them: Windows update resets the driver when installing updates. So let's get some manual testing and volume data and then see how well we're doing.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #115)
> Let's verify that the new version helps before considering how hard we want
> to promote this to users. 

Agreed. Unfortunately we'll need to rely on users following this bug to do that verification for us. QA does not have reliable steps to reproduce this crash internally and I'm not sure crash-stats will tell us the whole story, especially if users aren't updating their drivers en-masse. I was more asking the question above to Lukas as a follow-up to if comment 114 can be proven by any means available.
For the record here, it looks like some other things may come up delaying Direct2D 1.1 a little bit. So this might well become early February rather than January as mentioned before.
So the last time I upgraded my Nvidia drivers in December I quickly downgraded again when I was getting constant driver resets.  I experimented again today with upgrading and noticed severe issues.  Nvidia drivers 327.33 and 331.58/65/82 all consistently cause problems with Firefox, with driver restarts, browser crashes, and very frequently complete system lockups.  The last good driver was 320.49.  The only crash reports I actually got out of it were
bp-5af3552d-dfc5-4e99-957d-0cd9b2140117
bp-280b9eff-0e3d-43f4-9fe6-a27642140117
since the lockup was the most frequent fate.  However, it appears as if 332.21 is problem free: I've been running it now for several minutes without issues (and before, the problems were near instantaneous).
My GPU is a G105M.
If anyone would like me to attempt to gather some more data I would be willing.
FWIW, I can reliably reproduce this crash on my laptop when changing the graphics adapter. Perhaps these are not really OOM conditions, but actual device resets.
An interesting tidbit here is that this crash signature jumped from 110-130 crashes per million ADI in Firefox 26 beta to 400-500 crashes per million ADI in Firefox 27 beta.

This huge increase tells us that this is surely not a driver-only issue but changes in our code are making this much worse. I'm not sure what landed in 27 that would influence this but I think we probably want to take a look at that.
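To put a number on that jump, here is a minimal Python sketch of the implied multiplier. The helper name `increase_range` is made up for illustration, and the inputs are only the rough per-million-ADI ranges quoted above, not fresh crash-stats data.

```python
def increase_range(before, after):
    """Return the (smallest, largest) multiplier implied by two (low, high) ranges."""
    before_low, before_high = before
    after_low, after_high = after
    # Smallest jump: highest old rate versus lowest new rate.
    smallest = after_low / before_high
    # Largest jump: lowest old rate versus highest new rate.
    largest = after_high / before_low
    return smallest, largest


fx26_beta = (110, 130)  # crashes per million ADI, Firefox 26 beta (from this comment)
fx27_beta = (400, 500)  # crashes per million ADI, Firefox 27 beta (from this comment)

smallest, largest = increase_range(fx26_beta, fx27_beta)
print(f"{smallest:.1f}x to {largest:.1f}x")
```

So even taking the most favorable ends of both ranges, the rate roughly tripled between the two betas.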
Bhavana, it's late in 27 Beta but this isn't currently tracking so I want to make sure you see comment 120 and am marking relnote ? in case this should in fact go into our Known Issues section for FF27 release notes at the very least - though looking into any potential backouts this last week of Beta would also be good.
relnote-firefox: --- → ?
Flags: needinfo?(lsblakk) → needinfo?(bbajaj)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #120)
> An interesting tidbit here is that this crash signature jumped from 110-130
> crashes per million ADI in Firefox 26 beta to 400-500 crashes per million
> ADI in Firefox 27 beta.

Could we get more information on the build dates to see when it really spiked? This might help narrow the regression range.
> 
> This huge increase tells us that this is surely not a driver-only issue but
> changes in our code are making this much worse. I'm not sure what landed in
> 27 that would influence this but I think we probably want to take a look at
> that.

Also NI on :bas to see if there is anything that landed in Fx27 which may have caused this increase.
Flags: needinfo?(bbajaj)
Flags: needinfo?(bas)
Looking at the last month of crash data:
https://crash-stats.mozilla.com/report/list?product=Firefox&query_type=contains&range_unit=days&process_type=any&hang_type=any&signature=gfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADrawTarget*%29&date=2014-01-22+18%3A00%3A00&range_value=28

Firefox 26 reports 47239 crashes per 46330 installs in Release or 1.02 crashes per install
Firefox 27 reports 33443 crashes per 26988 installs in Beta or 1.24 crashes per install
Firefox 25 and below reports crashes in the hundreds.

Based on this data I would say the spike in Firefox 26 was much more severe than the increase we've seen in Firefox 27. It would also seem that most users are only crashing once.
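For reference, the per-install figures above are plain division. A minimal sketch in Python, where the helper name `crashes_per_install` is hypothetical and the counts are just the ones quoted in this comment:

```python
def crashes_per_install(crashes, installs):
    """Mean number of crash reports per reporting install."""
    return crashes / installs


# (crashes, installs) as quoted above for the last month of data.
reports = {
    "Firefox 26 (Release)": (47239, 46330),
    "Firefox 27 (Beta)": (33443, 26988),
}

for label, (crashes, installs) in reports.items():
    rate = crashes_per_install(crashes, installs)
    print(f"{label}: {rate:.2f} crashes per install")
```

Keep in mind this is only a mean over installs that reported at least one crash; it does not show how crashes are distributed across users.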
Anthony, I'm not sure that data has much meaning because we're comparing across channels.

I pulled some data from nightly and aurora:

https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly.csv
https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly.svg

https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-aurora.csv
https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-aurora.svg

From the Aurora data we're clearly doing worse after the merge, but the Nightly data is noisy enough that I can't point at a clear regression range.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #123)
> Looking at the last month of crash data:
> https://crash-stats.mozilla.com/report/
> list?product=Firefox&query_type=contains&range_unit=days&process_type=any&han
> g_type=any&signature=gfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADr
> awTarget*%29&date=2014-01-22+18%3A00%3A00&range_value=28
> 
> Firefox 26 reports 47239 crashes per 46330 installs in Release or 1.02
> crashes per install
> Firefox 27 reports 33443 crashes per 26988 installs in Beta or 1.24 crashes
> per install
> Firefox 25 and below reports crashes in the hundreds.
> 
> Based on this data I would say the spike in Firefox 26 was much more severe
> than the increase we've seen in Firefox 27. It would also seem that most
> users are only crashing once.

I wouldn't trust the analysis in that way. For one thing, you compare beta to release, which are different populations. For another, you look at 25 data from the last week, and there are almost no users left on 25 at this point.
Note that gfxContext::PushGroupAndCopyBackground(gfxContentType) also jumped from 0 to ~100 crashes per million ADI from 26 to 27 on beta, we have bug 798274 filed for that signature but it probably is closely related to this one.
(In reply to Laurentiu Nicola from comment #119)
> FWIW, I can reliably reproduce this crash on my laptop when changing the
> graphics adapter. Perhaps these are not really OOM conditions, but actual
> device resets.

Yes, these are most likely device resets.
Flags: needinfo?(bas)
(In reply to bhavana bajaj [:bajaj] from comment #122)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #120)
> > An interesting tidbit here is that this crash signature jumped from 110-130
> > crashes per million ADI in Firefox 26 beta to 400-500 crashes per million
> > ADI in Firefox 27 beta.
> 
> Could we get more information on the build dates to see when it really
> spiked. This might help narrow the regression range .
> > 
> > This huge increase tells us that this is surely not a driver-only issue but
> > changes in our code are making this much worse. I'm not sure what landed in
> > 27 that would influence this but I think we probably want to take a look at
> > that.
> 
> Also NI on :bas to see if there is anything that landed in Fx27 which may
> have caused this increase.

It's unlikely, but far from impossible, that something changed causing us to trigger more device resets, as a side-effect that would make this crash occur significantly more.
So, this is the main reason why Firefox 27 will be less stable than 26, from all that we have looked at. In 28 (and 29), it's even worse, it's the #1 crash by far (more than double of the #2), in the average over 4 weeks, this signature plus the related gfxContext::PushGroupAndCopyBackground (bug 798274) make up 7.4% of all Aurora crashes at this point: https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/28.0a2?days=28

Given that the volume regression looks pretty bad in 28 and 29 so far and the high impact a fix could have there, I'm nominating for tracking there.

Bas, you said in comment #99 that it would be "certainly not impossible" to find a solution before a D2D 1.1 switch if priority is high enough. Between the volume of the crashes and D2D 1.1 support AFAIK not coming before the Firefox 30 cycle, I think the priority of this is becoming pretty high now. Bas, any way we can paper over the problems for 28/29?
Flags: needinfo?(bas)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #129)
> So, this is the main reason why Firefox 27 will be less stable than 26, from
> all that we have looked at. In 28 (and 29), it's even worse, it's the #1
> crash by far (more than double of the #2), in the average over 4 weeks, this
> signature plus the related gfxContext::PushGroupAndCopyBackground (bug
> 798274) make up 7.4% of all Aurora crashes at this point:
> https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/28.
> 0a2?days=28
> 
> Given that the volume regression looks pretty bad in 28 and 29 so far and
> the high impact a fix could have there, I'm nominating for tracking there.
> 
> Bas, you said in comment #99 that it would be "certainly not impossible" to
> find a solution before a D2D 1.1 switch if priority is high enough. Between
> the volume of the crashes and D2D 1.1 support AFAIK not coming before the
> Firefox 30 cycle, I think the priority of this is becoming pretty high now.
> Bas, any way we can paper over the problems for 28/29?

If we dedicate resources to it, yes, there is such a way :). It's hard for me to say how much work it would be, it could be anything from 2 days to 2 weeks. Note that we're also discussing raising the priority on the work to get D2D 1.1 in to take care of this problem.
Flags: needinfo?(bas)
Keywords: topcrash-metro
I cannot reproduce the crash, however I can reproduce circumstances which I believe may lead to this crash under some circumstances. This patch should mitigate those circumstances. It should be noted that in this event we'll be in danger of going into another situation that presents some problems. I'm working on a patch for this as well in a separate bug.
Attachment #8372511 - Flags: review?(jmuizelaar)
Flags: needinfo?(benjamin)
Attachment #8372511 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/mozilla-central/rev/098775e8ff7c
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
The patch here does not seem to have improved the crash rate:

https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly-140210.svg

PushClipsToDT crashes on the 9-feb nightly:
https://crash-stats.mozilla.com/search/?build_id=20140209030203&signature=%3DgfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADrawTarget*%29&version=30.0a1&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=user_comments

One thing that is surprising to me about this is that we're seeing crashes on Linux also, and at the same location as on Windows:

Linux: bp-58ef909d-d048-4b97-95de-3dfb82140209
Windows: bp-5f2ce188-acf5-43fd-9bc6-5f8532140209
both crash at http://hg.mozilla.org/mozilla-central/annotate/c8cd1f6b6d2d/gfx/thebes/gfxContext.cpp#l2110
(all null-derefs)

I've been focusing on the fact that these crashes happen on Windows TDR, but perhaps the bug is somewhere else?

Vlad said he could write a program to force a TDR, which might make this easier to reproduce.
Status: RESOLVED → REOPENED
Flags: needinfo?(benjamin) → needinfo?(vladimir)
Resolution: FIXED → ---
I'm still getting crashes and corruption when changing the graphics adapter. Should I open a more specific bug?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #134)
> The patch here does not seem to have improved the crash rate:
> 
> https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly-140210.svg
> 
> PushClipsToDT crashes on the 9-feb nightly:
> https://crash-stats.mozilla.com/search/
> ?build_id=20140209030203&signature=%3DgfxContext%3A%3APushClipsToDT%28mozilla
> %3A%3Agfx%3A%3ADrawTarget*%29&version=30.
> 0a1&_columns=date&_columns=signature&_columns=product&_columns=version&_colum
> ns=build_id&_columns=platform&_columns=user_comments

I have to disagree...

https://crash-stats.mozilla.com/search/?build_id=20140208030207&signature=%3DgfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADrawTarget*%29&version=30.0a1&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=user_comments

shows 86 results, as opposed to 29 on your search query, which suggests a considerable drop in the crash rate.
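
As a rough sketch of what that drop amounts to (Python; `relative_drop` is a made-up helper, and the inputs are simply the two result counts above, not normalized by how many users ran each nightly or for how long):

```python
def relative_drop(before, after):
    """Fractional decrease going from `before` to `after`."""
    return (before - after) / before


# Result counts for the 2014-02-08 and 2014-02-09 nightly builds.
drop = relative_drop(86, 29)
print(f"{drop:.0%}")
```

That is about two thirds fewer reports, with the caveat that a newer build has had less time to accumulate crashes.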

> 
> One thing that is surprising to me about this is that we're seeing crashes
> on Linux also, and at the same location as on Windows:

This could occur on any sort of DrawTarget creation error (X render surface allocation perhaps?)

> 
> Linux: bp-58ef909d-d048-4b97-95de-3dfb82140209
> Windows: bp-5f2ce188-acf5-43fd-9bc6-5f8532140209
> both crash at
> http://hg.mozilla.org/mozilla-central/annotate/c8cd1f6b6d2d/gfx/thebes/
> gfxContext.cpp#l2110
> (all null-derefs)
> 
> I've been focusing on the fact that these crashes happen on Windows TDR, but
> perhaps the bug is somewhere else?
> 
> Vlad said he could write a program to force a TDR, which might make this
> easier to reproduce.

Causing a TDR isn't really the problem, but none of my machines (I tried 4 different ones) crash on a TDR, sadly.
Attachment #8374169 - Flags: review?(jmuizelaar) → review+
Flags: needinfo?(benjamin)
https://hg.mozilla.org/mozilla-central/rev/2812fd3a3213
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
This kills all the last references to gfxD2DSurface in widget, giving us more certainty we'll never hit these codepaths now or in the future.
Attachment #8374884 - Flags: review?(jmuizelaar)
Attachment #8374884 - Flags: review?(jmuizelaar) → review+
Can we get uplift nominations for this?  Looks like something we'd want to land sooner than later in Beta if low risk enough to take that high.
(In reply to Lukas Blakk [:lsblakk] from comment #140)
> Can we get uplift nominations for this?  Looks like something we'd want to
> land sooner than later in Beta if low risk enough to take that high.

I wanted to wait a few days first to confirm that this helped on trunk. Last I checked, it looked like it probably did, but I'll be able to tell you more on Monday (er, Tuesday as I'm out Monday as well).
Bas is going to comment on the risk of the patch.
Flags: needinfo?(vladimir)
FWIW, I'd still like to have the app vlad mentioned which can force a TDR.
Flags: needinfo?(benjamin) → needinfo?(vladimir)
The first patch appears to help reduce the rate: the second patch does not appear to have affected anything:

https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly-140217.svg

Here is a link to all the crashes since the 13-Feb nightly which contains the patch from comment 138.
Not to distract from the effort going on here to resolve this on Windows but I'm seeing many reports of this signature on Linux as well. Here is an example report:
https://crash-stats.mozilla.com/report/index/77e8e96f-f5fe-4f51-98e5-52c542140216

Should this get a new bug report?
See Also: → 974656
Firefox 30.0a1: 804 crashes in the last week.
Firefox 29.0a2: 969 crashes in the last week.
Firefox 28.0b4: 3211 crashes in the last week.

I'm not sure this should be called fixed based on this information.

Source:
https://crash-stats.mozilla.com/report/list?signature=gfxContext%3A%3APushClipsToDT%28mozilla%3A%3Agfx%3A%3ADrawTarget%2A%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-02-24+21%3A00%3A00&range_value=1#reports
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #144)
> The first patch appears to help reduce the rate: the second patch does not
> appear to have affected anything:
> 
> https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly-140217.svg

Does that mean we should get at least the first patch uplifted to beta?
Flags: needinfo?(benjamin)
Bas can make the risk assessment. https://crash-analysis.mozilla.com/bsmedberg/PushClipsToDT-nightly-140225.svg is not very clear about how much the patch actually helped and how much is just noise. The variation is pretty huge in general.
Flags: needinfo?(benjamin)
Bas, see the last two comments, should we uplift something from this to beta?

That said, this didn't cause as much a spike in overall crash volume as I'd feared before 27 hitting release and 28 hitting beta - but still, it's the #1 crash signature overall on both channels with >5% of all our crashes coming down to this right now.
Flags: needinfo?(bas)
I'm guessing nothing will land on beta today, and after that, we only have the RC left for this release, so I'll call this a wontfix for 28.
Crash Signature: [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] → [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] [@ gfxContext::PushGroupAndCopyBackground(gfxASurface::gfxContentType)] [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)]
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #150)
> Bas, see the last two comments, should we uplift something from this to beta?
> 
> That said, this didn't cause as much a spike in overall crash volume as I'd
> feared before 27 hitting release and 28 hitting beta - but still, it's the
> #1 crash signature overall on both channels with >5% of all our crashes
> coming down to this right now.

Sorry about this. I got severely distracted by B2G issues and overlooked this.

We should probably uplift some of this. In addition, we should probably create a separate bug, since it looks like a lot of the remaining crashes are not related to D2D (1.1). At least 2 crashes I looked at were on machines that didn't even have D2D enabled (or even loaded into the process space, so it wasn't enabled in the past on this run either):

https://crash-stats.mozilla.com/report/index/eb1df18c-5013-42e3-a5df-4973c2140307
https://crash-stats.mozilla.com/report/index/826011e6-76b3-4431-b80c-d79b62140307

So it seems there's a separate bug (maybe just an OOM symptom?) altogether.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #153)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #150)
> > Bas, see the last two comments, should we uplift something from this to beta?
> > 
> > That said, this didn't cause as much a spike in overall crash volume as I'd
> > feared before 27 hitting release and 28 hitting beta - but still, it's the
> > #1 crash signature overall on both channels with >5% of all our crashes
> > coming down to this right now.
> 
> > Sorry about this. I got severely distracted by B2G issues and overlooked this.
> 
> We should probably uplift some of this

Unfortunately the waiting did cost us the time to still catch the 28 train. Can you request uplifts to Aurora for 29, though? (Setting ni? for that.)


> in addition we should probably
> create a separate bug, it looks like a lot of the remaining crashes are not
> related to D2D (1.1).
> [...]
> So it seems there's a separate bug (maybe just an OOM symptom?) altogether.

OK, let's file a followup for those, then.
Flags: needinfo?(bas)
Comment on attachment 8374169 [details] [diff] [review]
Never draw with Direct2D directly to a window

[Approval Request Comment]
Bug caused by (feature/regressing bug #): D2D 1.1 upgrade
User impact if declined: Crashes when driver resets occur
Testing completed (on m-c, etc.): extensive nightly testing
Risk to taking this patch (and alternatives if risky): Very low
String or IDL/UUID changes made by this patch: None
Attachment #8374169 - Flags: approval-mozilla-aurora?
Flags: needinfo?(bas)
Comment on attachment 8374884 [details] [diff] [review]
Remove all references to gfxD2DSurface in widget

[Approval Request Comment]
Bug caused by (feature/regressing bug #): D2D 1.1 upgrade
User impact if declined: Crashes when driver resets occur
Testing completed (on m-c, etc.): extensive nightly testing
Risk to taking this patch (and alternatives if risky): Very low
String or IDL/UUID changes made by this patch: None
Attachment #8374884 - Flags: approval-mozilla-aurora?
Comment on attachment 8372511 [details] [diff] [review]
Do not validate layers when our device is removed

[Approval Request Comment]
Bug caused by (feature/regressing bug #): D2D 1.1 upgrade
User impact if declined: Crashes when driver resets occur
Testing completed (on m-c, etc.): extensive nightly testing
Risk to taking this patch (and alternatives if risky): Very low
String or IDL/UUID changes made by this patch: None
Attachment #8372511 - Flags: approval-mozilla-aurora?
Bas, did they land in m-c ? thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #158)
> Bas, did they land in m-c ? thanks

At least some of them did in comment #133 and comment #138 but I actually can't get the full picture of all the patches in here myself, I hope Bas can. ;-)
Flags: needinfo?(bas)
(In reply to Sylvestre Ledru [:sylvestre] from comment #158)
> Bas, did they land in m-c ? thanks

The three that I requested uplift for did.
Flags: needinfo?(bas)
Comment on attachment 8372511 [details] [diff] [review]
Do not validate layers when our device is removed

OK. Thanks Bas.
Attachment #8372511 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8374169 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8374884 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Just hit this on a 2014-03-03 nightly, after I updated Windows drivers in the middle of a session (though well after: about half an hour of browser usage later).
Flags: needinfo?(vladimir)
Depends on: 983202
No longer depends on: 983202
(In reply to Bas Schouten (:bas.schouten) from comment #139)
> Created attachment 8374884 [details] [diff] [review]
> Remove all references to gfxD2DSurface in widget

This never landed on m-c AFAICT, so skipping this one for now.

https://hg.mozilla.org/releases/mozilla-aurora/rev/869524719690
https://hg.mozilla.org/releases/mozilla-aurora/rev/0f8e3ca9fd4a
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #163)
> (In reply to Bas Schouten (:bas.schouten) from comment #139)
> > Created attachment 8374884 [details] [diff] [review]
> > Remove all references to gfxD2DSurface in widget
> 
> This never landed on m-c AFAICT, so skipping this one for now.
> 
> https://hg.mozilla.org/releases/mozilla-aurora/rev/869524719690
> https://hg.mozilla.org/releases/mozilla-aurora/rev/0f8e3ca9fd4a

Thanks, I'll land the last one; it appears I put the wrong patch on the bug for this one. This was https://hg.mozilla.org/mozilla-central/rev/e34283cf0b42, where I seem to have -also- messed up the bug number, causing it to get wrongly reported.
Symptoms for me include the page area of FF going black.

System info: http://valid.x86.fr/7tuqqv

Tested with HW acceleration disabled in regular mode.
Tested with HW acceleration enabled in safe mode.

Related crash reports: 
https://crash-stats.mozilla.com/report/index/3f23f303-9332-4034-8ff8-a56672140326
https://crash-stats.mozilla.com/report/index/2d8ade03-6364-45d7-9495-e7f5a2140326

Reproduce with "endless scrolling" websites, e.g. 500px.com
http://500px.com/flow to be precise

also tested with nightly 2014-03-26
Steps to reproduce:
1. login to 500px.com
2. go to 500px.com/flow
3. scroll down until there is a "more" button and click it
4. repeat step 3, 20+ times
Haven't read all the comments here, but I get frequent crashes after a few days of having the browser open.

I start to see black painting, then frequent unresponsiveness, then it crashes.
Here is my most recent crash report.
https://crash-stats.mozilla.com/report/index/cb917cc6-e1a2-4cda-ad70-5aa762140325

I'm on win7, 64 bit, Aurora 30.0a2 (2014-03-23).
Been seeing this since at least January I believe and I frequent reddit a goodly amount.
AMA
(In reply to Caspy7 from comment #169)
> Haven't read all the comments here, but I get frequent crashes after a few
> days of having the browser open.
> 
> I start to see black painting, then frequent unresponsiveness, then it
> crashes.
> Here is my most recent crash report.
> https://crash-stats.mozilla.com/report/index/cb917cc6-e1a2-4cda-ad70-
> 5aa762140325
> 
> I'm on win7, 64 bit, Aurora 30.0a2 (2014-03-23).
> Been seeing this since at least January I believe and I frequent reddit a
> goodly amount.
> AMA

This is probably an OOM since we use fallible allocations, could you check your memory usage around the time this happens?
I'll try to catch it next time it happens, but it is usually above 2 GB and possibly closer to 3 GB.
(In reply to Jan [:Jan\] from comment #167)
> http://500px.com/flow to be precise
> 
> also tested with nightly 2014-03-26

Just to specify: in nightly the error is an OOM. The issue happens with FF 28 (stable channel).
Depends on: 988862
Depends on: 994432
No longer depends on: 994432
On WinXP bp-5a6c62e2-a760-440b-b121-d3f422140417 and bp-b2ee17be-1c4e-40fd-8ecd-0d4562140417 . I did have some memory and made a comment in the BP.

Socorro sends me to Bug 974656 Comment 3, which sends me here.

Strangely I get the 'Windows Error Reporter Popup' @ Address 002c:12cb (along with the Mozilla Error Reporter). This happens occasionally during the past few weeks. Prior to a month ago the WER did not pop up, only the MCR.


PS: Today I also had this BP which, while not "gfxContext::PushClipsToDT", is an auxiliary BP.

gfxContext::PushGroupAndCopyBackground(gfxContentType)
bp-80147586-6980-4b2f-a57e-84a812140417

.
From my Socorro Report: bp-7ff4551b-900e-4bdb-a4b2-4ae0a2140417 

User Comments: "Closing Tabs. Browser was sitting stable for hours, closed half a dozen Tabs, closed a few more and it crashed. Windows Error Reporter popup up with MCR." .

Looking at WinXp Task Manager there was a big peak in Kernel activity (as evidenced by checking [View][Show Kernel Times] ) right before the crash. WTM shows memory usage as increasing and decreasing slightly (cyclically) during 'idle period'.

Browser had sat stable (with over 100 Tabs open) for more than 6 hours, closing Tabs (freeing memory) essentially caused the crash.
If it helps any, here's an author who apparently gets this crash using CSS transforms:

https://stackoverflow.com/questions/23140854/css-transforms-spaced-out-with-settimeout-firefox-crash
WinXP Nightly. Attempting to open this URL (which results in a 4K Video playing, since my YT settings allow auto-play AND I had my resolution set to 4K) results in an instant crash, 100% of the time.

https://www.youtube.com/user/Panasonic4K .


These BPs are for [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*) ] :
bp-99e59577-07a5-4585-8740-318de2140420 , 
bp-859b9b13-b2ad-4f22-9680-94c112140420 ,
bp-79a9bb29-dbfe-444d-90d1-acfb22140420 .


BTW: Playing extremely short 4K Videos on YouTube results in a 'Green Screen' (in the Player's Window), but switching to less than 4K resolution allows the Video to play correctly. I am currently investigating IF that is a different or related Bug.

Note: My Monitor does not actually permit me to play 4K Videos, but I have played them before (on YT, at 4K resolution) and previously they played mostly acceptably (but very slowly, and erratically). Now I can not play them at 4K at all.

Speculation: Something may be wrong with the way the Flash Player is scaling Videos. Perhaps the Browser has an overflow Bug and does not expect 4K so it accepts the Flash Player's decision, to render at 4K when that much Memory is not being allocated.

That may be why I get a Green Screen on _very_ short 4K Videos and the (from Socorro) 'Crash Reason: EXCEPTION_ACCESS_VIOLATION_READ , Crash Address 0x0' when a normal (more common) length Video is attempted to be played.


I have the newest Flash (Debug Version).
(In reply to Rob from comment #176)
> WinXP Nightly. Attempting to open this URL (which results in a 4K Video
> playing, since my YT settings allow auto-play AND I had my resolution set to
> 4K) results in an instant crash, 100% of the time.

It would seem that after rebooting that issue has gone away. That confirms a previous Theory I had related to this, a problem that seems to crop up from time to time (for me). 

A precondition to the prior Comment 176 is that YouTube's Flash Player's Fonts must be fouled. You will notice that circumstance by trying to display (I)nfo or change the resolution (in which case the Fonts will be transparent and the Font's Pane will be the wrong shape (often full width of the Player)).

That circumstance most often occurs (for me, on WinXP) when I am almost OOM. The new Allocator does a great job at keeping me from crashing, but I think the Garbage Collection is leaving a bit of trash(ing) behind (corrupting something).
We're not getting any traction here for FF29 so have to wontfix (again).  Can we get an update of what state FF30 is in with regards to this?
Flags: needinfo?(bas)
(In reply to Lukas Blakk [:lsblakk] from comment #178)
> We're not getting any traction here for FF29 so have to wontfix (again). 
> Can we get an update of what state FF30 is in with regards to this?

What's left of these stack traces, as far as we can tell, is simply OOM. There's nothing we can do about this except bring our memory usage down.
Flags: needinfo?(bas)
A friend of mine is hitting this crash reasonably frequently. She says:
"Occurs only when using Google Maps Streetview on my second screen (only since Maps upgraded about a month or two? ago). When moving down a street (or doing a 360° scan) with repeated mouse clicks, the peripheral view will sometimes go fuzzy/dark before crashing. Has occurred reasonably frequently, but not everytime that I have used Google Maps."
I'm guessing WebGL...
Right after upgrading to FF29 stable, I got these 2 crashes of the same error code (and before the upgrade I had no crashes for some months).
Using Windows 7 x86 SP1 with 3GB RAM and E5200 CPU, with 150~ tabs open (yeah, I'm crazy) BUT most of them are in the background.
I noticed it happened while FF was totally idle (not in use) for some hours, just like Rob said in https://bugzilla.mozilla.org/show_bug.cgi?id=805406#c174
If it helps, I usually end the firefox.exe process so I don't have to wait for the session to be saved (using the Session Manager addon). I know it isn't healthy, but it did the trick in the last 10 FF versions WITHOUT crashes.
Plus, I noticed that the new version of FF takes about twice as much RAM as it usually does with the same usage (many tabs open, most of them in the background).

The reported bugs from my system:
https://crash-stats.mozilla.com/report/index/9af68b2e-d775-4521-902b-722102140506
https://crash-stats.mozilla.com/report/index/1efd0f41-331d-4865-902c-ff47b2140505

If more info is needed please tell.

ET.
Another crash, same behavior:
https://crash-stats.mozilla.com/report/index/04e15942-3500-482a-8ab9-633f42140507
This time I noticed it happened when Firefox was in the background for a long time but not minimized. When I switched to Firefox it hung for a few seconds and then gave this crash.
Depends on: 1011348
Depends on: 1011864
My SO's Firefox 29.0.1 crashed with seemingly this signature.

bp-9b24bb20-2ce0-4edd-995b-ff5a52140606	06/06/2014	03:54 p.m.

Let me know if there's anything of interest that shall be collected from the host or profile.
Also see [https://support.mozilla.org/en-US/questions/1005505]
User crash stack: [https://crash-stats.mozilla.com/report/index/bp-6fa1a51a-44ef-4076-aaf5-4ac992140611]
Problem signature:

Problem Event Name:	APPCRASH
Application Name:	plugin-container.exe
Application Version:	30.0.0.5262
Application Timestamp:	5387f37a
Fault Module Name:	mozalloc.dll
Fault Module Version:	30.0.0.5262
Fault Module Timestamp:	5387c49f
Exception Code:	80000003
Exception Offset:	0000141b
OS Version:	6.1.7601.2.1.0.256.1
Locale ID:	3081
Additional Information 1:	0a9e
Additional Information 2:	0a9e372d3b4ad19135b953a78882e789
Additional Information 3:	0a9e
Additional Information 4:	0a9e372d3b4ad19135b953a78882e789
Blocks: 1139018
I believe this bug can be closed. At this point there have been no crashes reported recently beyond Firefox/Fennec 31 apart from [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)]. For that signature there are 4 crashes with the most recent releases (40.0.*/41.0.*) but that represents ~1.5% of the total volume for this signature. The remaining crashes are all on much older versions of Firefox/Fennec which are no longer supported. For this reason I am closing this bug as INCOMPLETE. Please reopen if this becomes a more serious problem in a supported release.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → INCOMPLETE
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #189)

Nowadays issues like this are getting reported as OOM|small from PushNewDT, like bp-49319b77-2969-4bf0-8728-2619c2151002.

I don't like that we dump these all in the OOM bucket. Yes, some of them are real OOMs, but others show plenty of available memory of all types. Blending them into OOM|small (a signature that we pretty much ignore) also means we don't notice spikes. Despite this, I haven't been able to get support to change the current annotation, and I've pretty much given up.
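A rough sketch of how one might separate reports that plausibly ran out of memory from those that show plenty of headroom. The field names mirror the memory fields listed in crash reports (Available Virtual Memory, Available Page File, Available Physical Memory); the dictionary layout, the helper names, and the 256 MiB threshold are assumptions for illustration only, not how Socorro classifies anything.

```python
# Hypothetical helper: flag OOM|small reports that still show ample free memory.
# The report layout and the threshold are illustrative assumptions.

MEMORY_FIELDS = (
    "available_virtual_memory",
    "available_page_file",
    "available_physical_memory",
)

LOW_MEMORY_THRESHOLD = 256 * 1024 * 1024  # bytes; arbitrary cut-off for this sketch


def looks_like_real_oom(report, threshold=LOW_MEMORY_THRESHOLD):
    """Return True if any known memory field in the report is below the threshold."""
    values = [report[field] for field in MEMORY_FIELDS if field in report]
    return any(value < threshold for value in values)


def split_reports(reports, threshold=LOW_MEMORY_THRESHOLD):
    """Partition reports into (plausible OOMs, reports with plenty of memory)."""
    real, suspicious = [], []
    for report in reports:
        if looks_like_real_oom(report, threshold):
            real.append(report)
        else:
            suspicious.append(report)
    return real, suspicious
```

Tracking the second bucket separately is one way a spike in non-OOM failures could become visible instead of disappearing into the overall OOM|small volume.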
(In reply to David Major [:dmajor] from comment #190)
> I haven't been able to get support to change the current annotation, and I've pretty much given up.

I agree that OOM-bucketing is a problem but it's something I need to understand better before offering creative solutions. Can you email me off this bug to get me up to speed on what you think we need to do and what you've tried to champion so far?
Yeah, let's keep it open while talking about this.  We know these are not "real" OOMs.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #191)
I don't really have any more context to add. Comment 190 pretty much captures my concerns. I had occasionally brought this up on IRC or in bugs whose numbers I no longer remember.
I see no point in keeping this bug report going if there is no expectation that we'll change hearts and minds about how we deal with OOMs.
Milan pointed out via email that bug 1205771 is probably an OOM|small related to this bug.
Status: REOPENED → NEW
Crash Signature: [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] [@ gfxContext::PushGroupAndCopyBackground(gfxASurface::gfxContentType)] [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)] → [@ OOM|small] [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] [@ gfxContext::PushGroupAndCopyBackground(gfxASurface::gfxContentType)] [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)]
Depends on: 1205771
Keywords: topcrash-metro
QA Contact: jbecerra → anthony.s.hughes
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #194)
> I see no point in keeping this bug report going if there is no expectation
> that we'll change hearts and minds about how we deal with OOMs.

Just a follow-up to this. The only way we'll make progress here is if we can annotate the frames in the crashing thread. Using comment 190 as the usecase, gfxContext::PushNewDT(gfxContentType) shows up in Frame 1 of the Crashing thread. If I could use supersearch or the Socorro API to query this data (I can't currently) I'd be able to estimate the volume and pull out other topcrashes.

Anecdotally, counting the latest 60 OOM|small reports for 41.0.1 by hand reveals that 20 (one-third) are in PushNewDT. It'd probably be safe to assume that this is still a topcrash and that there's probably at least one more topcrash hiding in OOM|small. Unfortunately there's no way to know for sure without going through each and every report by hand at this point. At a rate of ~14K reports per day, that's just impossible. 
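The hand count above amounts to the following tally, sketched here under the assumption that each report is available as a list of frame signature strings for the crashing thread. That input format and the function name are hypothetical; no such query was available through supersearch or the Socorro API at the time of this comment.

```python
# Hypothetical tally: what fraction of reports have a given function in the
# first few frames of the crashing thread? The input format is an assumption.

def fraction_with_frame(reports, function_name, max_depth=None):
    """Return the share of reports whose crashing thread mentions function_name.

    reports: list of reports, each a list of frame signature strings (frame 0 first).
    max_depth: if given, only the first max_depth frames are inspected.
    """
    if not reports:
        return 0.0
    hits = 0
    for frames in reports:
        window = frames if max_depth is None else frames[:max_depth]
        if any(function_name in frame for frame in window):
            hits += 1
    return hits / len(reports)
```

With 60 sampled reports of which 20 contain gfxContext::PushNewDT in the inspected frames, the function returns 20/60, matching the one-third estimate.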

I think the next logical step is to get a blocker bug on file against Socorro describing what we need to get better visibility into OOM|small. We'd also need a champion who can push hard on this as it's likely costing us users.

David, Milan, any thoughts on this?
The OOM|small pseudo-signature is deliberately designed to be opaque. For "regular" OOM|small, the stack is irrelevant at best and misleading at worst. The true cause is a complete exhaustion of memory resources, and the callsite is just a victim. So in general we shouldn't go looking for stacks in OOM|small.

PushNewDT is an exception because it's (sometimes) mis-using the OOM annotations. So I think it does make sense to see what fraction of stacks it accounts for. Bug 1208129 will allow full-stack searches using the proto_signature field. It's only on the stage server right now, with limited data, but anecdotally I also see roughly one-third of OOM|small coming from PushNewDT.
So what are the next steps for solving PushNewDT? What's needed to move this forward so we're not just playing whack-a-mole with issues like bug 1205771?
Crash Signature: [@ OOM|small] [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] [@ gfxContext::PushGroupAndCopyBackground(gfxASurface::gfxContentType)] [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)] → [@ OOM|small] [@ gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)] [@ gfxContext::PushGroupAndCopyBackground(gfxASurface::gfxContentType)] [@ gfxContext::PushGroupAndCopyBackground(gfxContentType)] [@ gfxContext::PushClipsToDT] [@ gfxContext::Pus…
Outside of OOM|small this accounts for 13 crashes last month in Firefox 42.
Inside of OOM|small this accounts for 6735 crashes over the last week in Firefox 42.

That would make this our #5 topcrash in Firefox 42 with ~1.71% of the total crash volume, and about 18.8% of the OOM|small volume which remains our #1 issue by far.
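As a quick sanity check of these figures, the weekly totals they imply can be back-computed. This sketch assumes both percentages refer to the same one-week window and the same Firefox 42 population, which the comment does not state explicitly; the function and variable names are illustrative.

```python
# Back-of-the-envelope check of the shares quoted above.
# Assumption: both percentages cover the same one-week Firefox 42 window.

def implied_total(count, share_percent):
    """Total volume implied by a count that makes up share_percent of it."""
    return count / (share_percent / 100.0)


pushnewdt_weekly = 6735  # crashes inside OOM|small attributed to this bug

oom_small_weekly = implied_total(pushnewdt_weekly, 18.8)    # roughly 35.8K per week
all_crashes_weekly = implied_total(pushnewdt_weekly, 1.71)  # roughly 394K per week

# Share of all Firefox 42 crashes that fall under OOM|small, in percent.
oom_small_share = 100.0 * oom_small_weekly / all_crashes_weekly
```

Under that assumption, OOM|small would account for roughly 9% of the total crash volume.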
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #198)
> So what are the next steps for solving PushNewDT? What's needed to move this
> forward so we're not just playing whack-a-mole with issues like bug 1205771?

At this point, the activity on this topic is in bug 1107792.
As of 44, is this still a top crash?
Flags: needinfo?(anthony.s.hughes)
(In reply to Milan Sreckovic [:milan] from comment #202)
> As of 44, is this still a top crash?

Currently ranks #33 in 44.0 with 478 crashes reported in the last week, so no it technically is not a topcrash.
Flags: needinfo?(anthony.s.hughes)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #203)
> (In reply to Milan Sreckovic [:milan] from comment #202)
> > As of 44, is this still a top crash?
> 
> Currently ranks #33 in 44.0 with 478 crashes reported in the last week, so
> no it technically is not a topcrash.

I was actually told that a number of the really-high-ranked OOM|small crashes are this issue and not real OOMs. If that's the case, it surely is a topcrash.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #204)
> I actually was told that a number of the really-high-ranked OOM|small
> crashes are actually this issue and not OOM actually. If that's the case, it
> surely is a topcrash.

Right - we're running into a bit of a limitation on how we track crashes.  The same signature shows up when we're using accelerated (D2D1.1, this and also bug 1243112) and non-accelerated content (bug 1107792.)

We believe we took care of the non-accelerated case in bug 1107792, at least to the point where we do not misreport an error as an OOM condition.  That could be what you're referencing?

This, and bug 1243112 are about the accelerated case.  We haven't figured out what the cause is, although we're continuing the conversation in bug 1243112.

The problem is that the stack looks remarkably similar between the two scenarios, and it is only the examination of the pre-crash annotations that helps us distinguish between the two.  For example, the non-accelerated crash https://crash-stats.mozilla.com/report/index/9e42b7f1-0d08-4723-b6d1-596822151122 and the accelerated one https://crash-stats.mozilla.com/report/index/82ce712c-7ba6-4fc3-8266-93f722160119
This should be fixed on nightly by bug 1220629.
Status: NEW → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED