Bug 664764 - Firefox crash @ mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface
Status: RESOLVED FIXED
[qa-:needs bug 768831]
Keywords: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: Graphics
Version: 7 Branch
Platform: x86 Windows 7
Importance: -- critical
Target Milestone: mozilla14
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
URL: http://paulmhecht.com/new/live-traini...
Duplicates: 709463
Depends on: 768831
Blocks: 532972
Reported: 2011-06-16 10:18 PDT by Marcia Knous [:marcia - use ni]
Modified: 2012-08-27 12:24 PDT (History)
fixed
14+
fixed


Attachments
Part 1: change assertion to warning since it can be triggered by resource exhaustion (1.25 KB, patch)
2012-03-18 15:37 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
matt.woodrow: review+
akeybl: approval‑mozilla‑beta-
akeybl: approval‑mozilla‑esr10+
Part 2: make PushGroupWithCachedSurface handle failure to get cached surface (2.10 KB, patch)
2012-03-18 15:38 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
matt.woodrow: review+
akeybl: approval‑mozilla‑beta-
akeybl: approval‑mozilla‑esr10+
Part 3: don't count memory usage for invalid gfxWindowsSurfaces (1.36 KB, patch)
2012-03-18 15:39 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
matt.woodrow: review+
akeybl: approval‑mozilla‑beta-
akeybl: approval‑mozilla‑esr10+

Description Marcia Knous [:marcia - use ni] 2011-06-16 10:18:11 PDT
Seen while reviewing trunk crash stats. Low volume Windows only trunk crash that started appearing using the 2011060100 build. 

https://crash-stats.mozilla.com/report/list?signature=mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface%28gfxContext*,%20gfxASurface::gfxContentType%29

Possible pushlog regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d475b322365&tochange=1619d8fc6416. Looks as if a merge did happen in that changeset.

https://crash-stats.mozilla.com/report/index/92548d50-c5e0-4844-8d66-417fa2110616

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface 	gfx/layers/basic/BasicLayers.cpp:1277
1 	xul.dll 	mozilla::layers::BasicLayerManager::PushGroupForLayer 	gfx/layers/basic/BasicLayers.cpp:604
2 	xul.dll 	mozilla::layers::BasicLayerManager::PaintLayer 	gfx/layers/basic/BasicLayers.cpp:1638
3 	xul.dll 	mozilla::layers::BasicLayerManager::EndTransactionInternal 	gfx/layers/basic/BasicLayers.cpp:1525
4 	xul.dll 	mozilla::layers::BasicShadowLayerManager::EndTransaction 	gfx/layers/basic/BasicLayers.cpp:3043
5 	xul.dll 	nsDisplayList::PaintForFrame 	layout/base/nsDisplayList.cpp:629
6 	xul.dll 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1636
7 	xul.dll 	PresShell::Paint 	layout/base/nsPresShell.cpp:6240
8 	xul.dll 	PresShell::FlushPendingNotifications 	layout/base/nsPresShell.cpp:4889
9 		@0x2214753 	
10 	xul.dll 	nsViewManager::DispatchEvent 	view/src/nsViewManager.cpp:921
11 	mozcrt19.dll 	imalloc 	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:3749
12 	xul.dll 	_cairo_array_grow_by 	gfx/cairo/cairo/src/cairo-array.c:153
13 	xul.dll 	AttachedHandleEvent 	view/src/nsView.cpp:192
14 	xul.dll 	nsWindow::DispatchEvent 	widget/src/windows/nsWindow.cpp:3548
15 	xul.dll 	nsWindow::DispatchWindowEvent 	widget/src/windows/nsWindow.cpp:3576
16 	xul.dll 	nsWindow::OnPaint 	widget/src/windows/nsWindowGfx.cpp:428
17 	xul.dll 	nsWindow::ProcessMessage 	widget/src/windows/nsWindow.cpp:4842
18 	xul.dll 	nsWindow::WindowProcInternal 	widget/src/windows/nsWindow.cpp:4401
19 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp:65
20 	xul.dll 	nsWindow::WindowProc 	widget/src/windows/nsWindow.cpp:4343
21 	user32.dll 	InternalCallWinProc 	
22 	user32.dll 	UserCallWinProcCheckWow 	
23 	user32.dll 	DispatchClientMessage 	
24 	user32.dll 	__fnDWORD 	
25 	ntdll.dll 	KiUserCallbackDispatcher 	
26 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp:71
27 	user32.dll 	DispatchMessageW 	
28 	xul.dll 	nsAppShell::ProcessNextNativeEvent 	widget/src/windows/nsAppShell.cpp:327
29 	nspr4.dll 	PR_IntervalNow 	nsprpub/pr/src/misc/prinrval.c:77
30 	xul.dll 	nsBaseAppShell::OnProcessNextEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:324
31 	nspr4.dll 	PR_ExitMonitor 	nsprpub/pr/src/threads/prmon.c:132
32 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:582
33 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c:308
34 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
35 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
36 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
37 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:189
38 	xul.dll 	xul.dll@0xb6866f 	
39 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:222
40 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3700
41 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:106
42 	firefox.exe 	__tmainCRTStartup 	obj-firefox/memory/jemalloc/crtsrc/crtexe.c:591
43 	kernel32.dll 	BaseProcessStart
Comment 1 Joe Drew (not getting mail) 2011-11-07 10:08:41 PST
Bas, there are comments here about driver crashes, and this is a null deref; maybe you can take a look?
Comment 2 Scoobidiver (away) 2011-12-12 08:58:12 PST
*** Bug 709463 has been marked as a duplicate of this bug. ***
Comment 3 Robert Kaiser (not working on stability any more) 2011-12-16 12:30:33 PST
This interestingly made a jump in yesterday's 8 release and 9 beta data, going up to #7 and #14 in topcrash ranks there.
Comment 4 Bob Clary [:bc:] 2012-02-09 04:02:15 PST
Debug build on Windows 7: Null pointer Crash [@ gfxContext::SetMatrix() ]

1. http://paulmhecht.com/new/live-training/index.html

2. Crash Debug at gfxContext::SetMatrix(gfxMatrix const&) mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface(gfxContext*, gfxASurface::gfxContentType) 

###!!! ASSERTION: You can't dereference a NULL nsRefPtr with operator->().: 'mRawPtr != 0', file c:\work\mozilla\builds\nightly\mozilla\firefox-debug\dist\include\nsAutoPtr.h, line 1056

During subsequent testing I also saw

ABORT: State invariants violated: 'mState == UNINITIALIZED || mState == CLONED', file c:/work/mozilla/builds/nightly/mozilla/widget/windows/AudioSession.cpp, line 210

bp-4ff17253-bd90-46fc-8b76-610642120209
Firefox 13.0a1 Crash Report [@ mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface(gfxContext*, gfxASurface::gfxContentType) ] 

bp-63c7bea2-996e-4bf8-80b6-049fa2120209
Firefox 12.0a2 Crash Report [@ gfxContext::SetMatrix(gfxMatrix const&) ] 

bp-b1318b43-b9ef-47a8-a13e-df0ee2120209
Firefox 11.0 Crash Report [@ mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface(gfxContext*, gfxASurface::gfxContentType) ]
Comment 5 Bob Clary [:bc:] 2012-02-12 01:19:13 PST
Another example: http://www.dkp.hu/news/osszes
Comment 6 Bob Clary [:bc:] 2012-03-04 07:15:06 PST
I can still reproduce with http://paulmhecht.com/new/live-training/index.html on Beta/11, Aurora/12 on Windows XP in automation. I was able to reproduce a crash on Nightly debug locally however. There have been about 3000 crashes with this signature in the last week though only a handful with Nightly/13. This is #42 on the top Firefox 10 crashers, and #42 on the top Firefox 11 crashers and doesn't appear in the top crashers for Firefox 12.

Testing an opt Nightly on WinXP hangs but doesn't crash.

I can reproduce locally and will start a reduction.
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 14:10:30 PDT
On that page in a debug build I see us calling RecordMemoryUsedForSurfaceType on an invalid surface. Looks like a gfxWindowsSurface that failed to be initialized properly.
Comment 8 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 14:20:09 PDT
And that's happening because we're running out of DCs. Win32 CreateCompatibleDC is failing as we try to create a canvas. Need to figure out where those DCs are going.
Comment 9 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 14:51:23 PDT
Problem seems to be that the page uses cufon-yui to create a very large number of canvases, at least 5000. With BasicLayers on Windows, each one consumes an HDC and that causes us to run out of DCs and that causes things to start failing. Then I guess that causes us to crash along some code path that doesn't handle failure adequately.
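
The failure mode described above can be illustrated with a small portable model. This is only a sketch: `HandlePool`, `Canvas`, and `CountFailedCreations` are hypothetical names, not Gecko or Win32 APIs. The pool stands in for the finite supply of GDI device contexts, and `Acquire` returning false stands in for Win32 `CreateCompatibleDC` returning NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a finite OS resource such as the pool of GDI DCs.
class HandlePool {
public:
  explicit HandlePool(std::size_t aLimit) : mLimit(aLimit), mInUse(0) {}

  // Returns false once the limit is reached (models CreateCompatibleDC
  // returning NULL).
  bool Acquire() {
    if (mInUse >= mLimit) {
      return false;
    }
    ++mInUse;
    return true;
  }

  void Release() {
    if (mInUse > 0) {
      --mInUse;
    }
  }

  std::size_t InUse() const { return mInUse; }

private:
  std::size_t mLimit;
  std::size_t mInUse;
};

// Hypothetical canvas that holds exactly one handle for its whole lifetime.
class Canvas {
public:
  // Returns nullptr when no handle is available.
  static std::unique_ptr<Canvas> Create(HandlePool& aPool) {
    if (!aPool.Acquire()) {
      return nullptr;
    }
    return std::unique_ptr<Canvas>(new Canvas(aPool));
  }

  ~Canvas() { mPool.Release(); }

private:
  explicit Canvas(HandlePool& aPool) : mPool(aPool) {}
  HandlePool& mPool;
};

// Tries to create aCount canvases, keeps the successful ones alive in aOut,
// and returns how many creations failed.
std::size_t CountFailedCreations(HandlePool& aPool, std::size_t aCount,
                                 std::vector<std::unique_ptr<Canvas>>& aOut) {
  std::size_t failed = 0;
  for (std::size_t i = 0; i < aCount; ++i) {
    std::unique_ptr<Canvas> canvas = Canvas::Create(aPool);
    if (canvas) {
      aOut.push_back(std::move(canvas));
    } else {
      ++failed;
    }
  }
  return failed;
}
```

With a pool limit of 3 and 5 requested canvases, two creations fail. Any caller that uses the result of `Create` without a null check would then dereference a null pointer, which is the shape of the crash in this bug.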
Comment 10 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 15:37:32 PDT
Created attachment 607017
Part 1: change assertion to warning since it can be triggered by resource exhaustion
Comment 11 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 15:38:36 PDT
Created attachment 607018
Part 2: make PushGroupWithCachedSurface handle failure to get cached surface

This is the actual fix for this bug.
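
For readers without the patch at hand, the general pattern of handling a failed cached-surface lookup might look like the sketch below. It is not the attached patch: `Surface`, `Context`, `SurfaceCache`, `PushGroupWithCachedSurfaceSketch`, and `PaintLayerSketch` are hypothetical stand-ins, and skipping the paint on failure is an assumed policy chosen only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical surface type; `valid` models a usable drawing target.
struct Surface {
  bool valid;
};

// Hypothetical drawing context that targets one surface.
struct Context {
  explicit Context(std::shared_ptr<Surface> aTarget)
    : target(std::move(aTarget)) {}
  std::shared_ptr<Surface> target;
};

// Hypothetical cache: Get() returns nullptr when no surface can be created,
// for example because the OS ran out of device contexts.
struct SurfaceCache {
  bool exhausted = false;

  std::shared_ptr<Surface> Get() {
    if (exhausted) {
      return nullptr;
    }
    return std::make_shared<Surface>(Surface{true});
  }
};

// Returns a context for the cached surface, or nullptr if none is available.
std::unique_ptr<Context> PushGroupWithCachedSurfaceSketch(SurfaceCache& aCache) {
  std::shared_ptr<Surface> surface = aCache.Get();
  if (!surface) {
    // The lookup result is checked before anything dereferences it.
    return nullptr;
  }
  return std::unique_ptr<Context>(new Context(surface));
}

// Caller side: treat a missing group context as "nothing to paint".
bool PaintLayerSketch(SurfaceCache& aCache) {
  std::unique_ptr<Context> group = PushGroupWithCachedSurfaceSketch(aCache);
  if (!group) {
    return false;
  }
  return group->target->valid;
}
```

The point of the sketch is that both the producer and the caller agree that a null result is a legal outcome, so resource exhaustion degrades painting instead of crashing.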
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 15:39:29 PDT
Created attachment 607019
Part 3: don't count memory usage for invalid gfxWindowsSurfaces
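
The idea behind Part 3 can be sketched as follows, under stated assumptions: `SurfaceSketch` and `MemoryReporterSketch` are hypothetical types, and the byte count is a simple width × height × bytes-per-pixel estimate rather than whatever the real reporter computes. A surface that failed to initialize is skipped, so it never contributes to the total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical surface description; inError models a surface that failed
// to initialize (for example, because no DC could be created).
struct SurfaceSketch {
  bool inError;
  int width;
  int height;
  int bytesPerPixel;
};

// Hypothetical memory accounting that ignores invalid surfaces.
class MemoryReporterSketch {
public:
  void RecordSurface(const SurfaceSketch& aSurface) {
    if (aSurface.inError) {
      // An invalid surface owns no pixel data, so there is nothing to count.
      return;
    }
    mBytes += static_cast<std::int64_t>(aSurface.width) *
              aSurface.height * aSurface.bytesPerPixel;
  }

  std::int64_t Bytes() const { return mBytes; }

private:
  std::int64_t mBytes = 0;
};
```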
Comment 13 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-18 15:42:57 PDT
Seems to me we should also do something about every canvas on Windows BasicLayers consuming a DC. Maybe after a certain number of canvases have been allocated (e.g. 500), we allocate the rest using gfxImageSurfaces? Any better ideas?
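
One way to picture the proposal above is a small allocation policy. This is a sketch of the idea only, not code from the tree: `BackingKind` and `CanvasBackingPolicy` are hypothetical names, and the cap is a constructor parameter so that the 500 mentioned above is just one possible value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical choice between a DC-backed surface and a plain in-memory
// image surface.
enum class BackingKind { DeviceContext, ImageSurface };

// Hypothetical policy: hand out DC-backed canvases up to a cap, then fall
// back to image surfaces that consume no DC.
class CanvasBackingPolicy {
public:
  explicit CanvasBackingPolicy(std::size_t aMaxDeviceBacked)
    : mMaxDeviceBacked(aMaxDeviceBacked), mDeviceBacked(0) {}

  // Decides the backing for the next canvas and records the decision.
  BackingKind Choose() {
    if (mDeviceBacked < mMaxDeviceBacked) {
      ++mDeviceBacked;
      return BackingKind::DeviceContext;
    }
    return BackingKind::ImageSurface;
  }

  // Called when a DC-backed canvas is destroyed, freeing one slot.
  void ReleaseDeviceBacked() {
    if (mDeviceBacked > 0) {
      --mDeviceBacked;
    }
  }

  std::size_t DeviceBackedCount() const { return mDeviceBacked; }

private:
  std::size_t mMaxDeviceBacked;
  std::size_t mDeviceBacked;
};
```

A policy constructed as `CanvasBackingPolicy policy(500);` would give the first 500 live canvases a DC and every further one an image surface, so a page that creates thousands of canvases could no longer drain the DC supply on its own.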
Comment 16 Bob Clary [:bc:] 2012-03-22 09:21:50 PDT
tested using Nightly/14, Aurora/13, Beta/12 on Linux, OS X 10.6, Windows XP/7

http://www.dkp.hu/news/osszes
http://paulmhecht.com/new/live-training/index.html

only reproduced the gfxContext::SetMatrix crash on http://paulmhecht.com/new/live-training/index.html with Firefox Beta on Windows XP.
Comment 17 Scoobidiver (away) 2012-06-13 11:25:37 PDT
It's currently #6 top browser crasher in 10.0.5 ESR.
Comment 18 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-14 15:20:54 PDT
[Triage Comment]
Looks like a pretty serious regression/top crasher so I'd be fine with sending this out in the next ESR alongside FF14, please nominate.
Comment 19 Alex Keybl [:akeybl] 2012-06-24 12:11:18 PDT
What's the risk profile here?
Comment 20 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-06-24 16:54:13 PDT
Low. This is simple code, the patch has baked long, and there have been no reported regressions.
Comment 21 Scoobidiver (away) 2012-06-24 23:52:30 PDT
It's already in Beta 14, as its target milestone is mozilla14.
Comment 22 Alex Keybl [:akeybl] 2012-06-26 09:34:56 PDT
Comment on attachment 607017
Part 1: change assertion to warning since it can be triggered by resource exhaustion

[Triage Comment]
Already in 14, so no need on beta. Approving for ESR given this is a top crasher on the branch.
Comment 24 Paul Silaghi, QA [:pauly] 2012-06-27 01:18:57 PDT
1. Open http://paulmhecht.com/new/live-training/index.html in a couple of tabs
2. Reload
Actual results:
Crash on FF 14b9, Nightly 16.0a1 (2012-06-26) on Win 7 32-bit.
Comment 25 Scoobidiver (away) 2012-06-27 01:30:03 PDT
Nightly crashes with the STR in comment 24 but with an empty crashing thread signature.
Comment 26 XtC4UaLL [:xtc4uall] 2012-06-27 04:25:04 PDT
(In reply to Scoobidiver from comment #25)
> Nightly crashes with the STR in comment 24 but with an empty crashing thread
> signature.

Filed Bug 768831 for the Nightly Crash with a Windbg Log attached.
Comment 27 Paul Silaghi, QA [:pauly] 2012-06-27 05:49:31 PDT
(In reply to Scoobidiver from comment #25)
> Nightly crashes with the STR in comment 24 but with an empty crashing thread
> signature.

Same behavior on 2012-06-27-mozilla-beta-debug. Sometimes I get the "Unresponsive Script" warning when opening the link in a single tab.
Comment 28 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-10 08:23:04 PDT
Paul, can you do me a favour and test this also on Firefox 10.0.6esrpre builds? Also, please provide links to any crash reports. This should have been fixed for Firefox 14 and 10.0.6esr.
Comment 29 Paul Silaghi, QA [:pauly] 2012-07-11 06:00:27 PDT
The STR in comment 24 makes Firefox crash with an empty signature (reproducible on FF 14b12, FF 10.0.6esrpre and Nightly). But this was already filed in bug 768831.
About the assertion this bug was initially filed for, I think we should wait until bug 768831 is fixed and then check again.
Comment 30 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-11 14:38:07 PDT
(In reply to Paul Silaghi [QA] from comment #29) 
> I think we should wait until bug 768831 is fixed and then check again.

Agreed, adding that dependency.
Comment 31 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-08-27 12:24:13 PDT
Marking qa- for the time being since we are still waiting on bug 768831. Please reset to qa+ once that has been resolved.
