Crash in mozilla::layers::CompositorChild::Destroy()

VERIFIED FIXED in Firefox 37

Status

()

Core
Graphics: Layers
--
critical
VERIFIED FIXED
3 years ago
3 years ago

People

(Reporter: whimboo, Assigned: sotaro, NeedInfo)

Tracking

({crash, topcrash-win})

38 Branch
mozilla38
All
Windows 7
crash, topcrash-win
Points:
---

Firefox Tracking Flags

(firefox37+ verified, firefox38+ verified)

Details

(Whiteboard: [mozmill][tbird crash], crash signature, URL)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

3 years ago
Running one of our endurance tests with Mozmill we see lots of crashes on Windows 7 machines with the signature like bp-e739b432-19f7-4797-8bcc-077552150216.

Crash Reason 	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 	0x0

First 10 frames of the stack:

0 	xul.dll 	mozilla::layers::CompositorChild::Destroy() 	gfx/layers/ipc/CompositorChild.cpp
1 	xul.dll 	nsBaseWidget::DestroyCompositor() 	widget/nsBaseWidget.cpp
2 	xul.dll 	nsWindow::OnPaint(HDC__*, unsigned int) 	widget/windows/nsWindowGfx.cpp
3 	xul.dll 	nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) 	widget/windows/nsWindow.cpp
4 	xul.dll 	nsWindow::WindowProcInternal(HWND__*, unsigned int, unsigned int, long) 	widget/windows/nsWindow.cpp
5 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp
6 	xul.dll 	nsWindow::WindowProc(HWND__*, unsigned int, unsigned int, long) 	widget/windows/nsWindow.cpp
7 	user32.dll 	InternalCallWinProc 	
8 	user32.dll 	GetRealWindowOwner 	
9 	user32.dll 	DispatchClientMessage 	
10 	user32.dll 	__fnDWORD

The endurance test we are running is http://hg.mozilla.org/qa/mozmill-tests/file/default/firefox/tests/endurance/testBookmarks_OpenAndClose/test1.js

This is only happening for Firefox 38 so far, no single crash with Firefox 37.

Crash details as reported by crash-stats

Windows 7 	99.52 %	207
Windows 8.1 	0.48 %	1

Comment 1

3 years ago
This started in build 20150214030238 and has been the #1 nightly crash since then.

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2f5c5ec1a24b&tochange=a7c177546ca0
The regression range includes:
	7d4f76e0411b	Nicolas Silva — Bug 1120331 - "crash in mozilla::ipc::MessageChannel::Send(IPC::Message*)". r=sotaro

And again this one is all Win7 and all D3D10Warp.dll. Seems like that crash just moved signatures. Nical do you agree?
tracking-firefox38: --- → ?
Flags: needinfo?(nical.bugzilla)
(Assignee)

Comment 2

3 years ago
nical is PTO until end of February. I am going take a bug.
(Assignee)

Updated

3 years ago
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 3

3 years ago
(In reply to David Major [:dmajor] (UTC+13) from comment #1)
> This started in build 20150214030238 and has been the #1 nightly crash since
> then.
> 
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=2f5c5ec1a24b&tochange=a7c177546ca0
> The regression range includes:
> 	7d4f76e0411b	Nicolas Silva — Bug 1120331 - "crash in
> mozilla::ipc::MessageChannel::Send(IPC::Message*)". r=sotaro
> 
> And again this one is all Win7 and all D3D10Warp.dll. Seems like that crash
> just moved signatures. Nical do you agree?

It seems just moved the signature. But the change seems to make clear that it is not a problem of ClientLayerManager.

Comment 4

3 years ago
This bug just bite me on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150218030228 CSet: 9696d1c4b3ba sans e10s.

Report ID 	Date Submitted
bp-eaeaefd8-63cb-42f5-92de-ed7ba2150218
	2/18/2015	11:39 AM

Comment 5

3 years ago
me bp-7ad742b2-3c5c-43f9-b304-f54492150218 thunderbird 20150216030224
I had just mouse clicked on a different folder

FWIW, bp-c434f094-8506-4eed-bc18-a668b2150217 comment " always crashing when opening down the url line (most visited links) or try to copy an url with mouse, not ctrl + c"
Keywords: topcrash-win
Whiteboard: [mozmill] → [mozmill][tbird crash]

Comment 6

3 years ago
re comment 3 - for the 3 user email addresses with the most crashes, each with a long history of mozilla::ipc::MessageChannel::Send crashes, each user's rate of crashing for mozilla::layers::CompositorChild::Destroy is the SAME as for mozilla::ipc::MessageChannel::Send(IPC::Message*) for a similar time period

Comment 7

3 years ago
Alex, since you're seeing this (and the other IPC crash) frequently -- is there any common pattern? Is it more frequent on some websites? Or after specific actions that you do?
Flags: needinfo?(alex_mayorga)

Comment 8

3 years ago
Sadly I haven't fount any rhythm or rime as to what causes the crashes.

It seems to have something to do with mousing into the "Awesome bar".

Here are my crashes from the past week or so FWIW:

Report ID 	Date Submitted
bp-7784571b-3787-4578-b302-1462f2150219
	2/19/2015	10:54 AM
bp-889f82eb-4905-4163-83c4-c7f8d2150219
	2/19/2015	10:39 AM
bp-fbf1e11a-a312-4875-bf34-cc38e2150219
	2/19/2015	10:38 AM
bp-7102fcb2-68fd-499a-a14d-fde2c2150219
	2/19/2015	10:38 AM
bp-05734df1-cb76-45f0-ad8e-021e32150219
	2/19/2015	10:18 AM
bp-eaeaefd8-63cb-42f5-92de-ed7ba2150218
	2/18/2015	11:39 AM
bp-2a63e10c-8353-4488-ac76-9b10f2150213
	2/13/2015	4:33 PM
bp-be5088b3-ca30-4d91-8401-2d26f2150213
	2/13/2015	4:31 PM
bp-5c3c2e9c-36c2-48ed-a986-f5f562150213
	2/13/2015	4:30 PM
bp-f364e215-bf49-4826-8aa6-1be1e2150213
	2/13/2015	11:19 AM
bp-1f003cbd-078a-42fc-a1c6-f38a02150212
	2/12/2015	2:49 PM
bp-0efe8407-73b4-47c2-aea7-aa5162150212
	2/12/2015	12:01 PM

This Nightly is updated almost daily and the system runs on VMware (VMCI sockets STREAM : 0 : 1 : c:\Program Files\VMware\VMware Tools\VSock SDK\bin\win32\vsocklib.dll) and it is accessed through http://www.citrix.com/go/receiver.html

Kindly let me know if there's anything I could provide to help solve this bug.
Flags: needinfo?(alex_mayorga) → needinfo?(dmajor)
(Assignee)

Comment 9

3 years ago
I locally modified the code as to often call nsBaseWidget::DestroyCompositor() from nsWindow::OnPaint(). By this change, I could reproduce the almost same crash.

Comment 10

3 years ago
(In reply to alex_mayorga from comment #8)
> Kindly let me know if there's anything I could provide to help solve this bug.

Sotaro knows this code area better than me... Can Alex do anything to help?
Flags: needinfo?(sotaro.ikeda.g)

Updated

3 years ago
Flags: needinfo?(dmajor)
(Assignee)

Comment 11

3 years ago
(In reply to alex_mayorga from comment #8)
> Sadly I haven't fount any rhythm or rime as to what causes the crashes.
> 
> It seems to have something to do with mousing into the "Awesome bar".

nsWindow::OnPaint() is called when drawing browser's chrome part. Therefore, "Awesome bar" seems to affect to the crash. nsBaseWidget::DestroyCompositor() is called when gfxWindowsPlatform::DidRenderingDeviceReset() detects rendering device's reset. It might say that D3D Device on the machine might be unstable. Then, it might be that graphic driver might be unstable.
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 12

3 years ago
gfxWindowsPlatform::DidRenderingDeviceReset() is the following.

https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxWindowsPlatform.cpp#1143
(Assignee)

Comment 13

3 years ago
Created attachment 8566910 [details]
call stack of artifically created crash

call stack of a crash that I re-produced by locally modifying the code. This bug's crash and Bug 1120331 seems to be caused by same problem.
(Assignee)

Comment 14

3 years ago
nsBaseWidget::DestroyCompositor() causes ClientLayerManager's destrutor. And it causes nsWindow::GetLayerManager() call. It request new LayerManager creation, new CompositorChild's and new CompositorParent creation. And also current CompositorChild's destruction and current CompositorParent's destruction without correct IPC shutdown.
(Assignee)

Comment 15

3 years ago
Created attachment 8566917 [details] [diff] [review]
patch - Remove DidComposite() from ClientLayerManager's destrutor
(Assignee)

Comment 16

3 years ago
Comment on attachment 8566917 [details] [diff] [review]
patch - Remove DidComposite() from ClientLayerManager's destrutor

matt.woodrow, can we remove DidComposite() from ClientLayerManager's destrutor? It trigger LayerManager creation and causes the problem like Comment 14.
Attachment #8566917 - Flags: review?(matt.woodrow)
I don't think this is ok.

This happens when the layer manager has promised the refresh driver that it's doing a composite, but we're destroying the layer manager before it completes.

We lie and send the DidComposite message anyway, so that the refresh driver doesn't hang waiting for the response.

I think we should call mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead, but this might still result in LayerManager creation.

Why are we creating a new layer manager if we've just destroyed the last one? What triggered the destruction?

And why does that crash?
(Assignee)

Comment 18

3 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> I don't think this is ok.
> 
> This happens when the layer manager has promised the refresh driver that
> it's doing a composite, but we're destroying the layer manager before it
> completes.
> 
> We lie and send the DidComposite message anyway, so that the refresh driver
> doesn't hang waiting for the response.
> 
> I think we should call
> mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead,
> but this might still result in LayerManager creation.
> 
> Why are we creating a new layer manager if we've just destroyed the last
> one? What triggered the destruction?

RenderingDeviceReset detect in nsWindow::OnPaint() triggers compositor's destruction. The destruction is done by nsBaseWidget::DestroyCompositor(). Comment 13, Comment 14 might help to understand it.

https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindowGfx.cpp#194
 
> And why does that crash?

And DidComposite() from ClientLayerManager causes current using CompositorChild and CompositorParent destruction without correct shutdown. It creates dangling pointers.
(Assignee)

Updated

3 years ago
Attachment #8566917 - Flags: review?(matt.woodrow)
(Assignee)

Comment 19

3 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> I don't think this is ok.
> 
> I think we should call
> mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead,
> but this might still result in LayerManager creation.

Yes, it resulted to LayerManager creation and caused the crash.
(Assignee)

Comment 20

3 years ago
Created attachment 8566949 [details] [diff] [review]
patch - Care about new CompositorChild and CompositorParent creation.

This seems better than previous one.
Attachment #8566917 - Attachment is obsolete: true
(Assignee)

Comment 21

3 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #20)
> Created attachment 8566949 [details] [diff] [review]
> patch - Care about new CompositorChild and CompositorParent creation.
> 
> This seems better than previous one.

The patch at least seems to fix the problem.
(Reporter)

Comment 22

3 years ago
I can perfectly reproduce this crash with our Mozmill tests. Just found the time to check that locally. 

Here the steps:
1. Clone http://hg.mozilla.org/qa/mozmill-tests
2. Install Mozmill 2.0.10: https://pypi.python.org/pypi/mozmill
3. Run the tests via `mozmill -b %path/to/firefox% -m mozmill-tests/firefox/tests/functional/testPrivateBrowsing/manifest.ini`

On Windows 7 the tests always when executing the test testPrivateDownloadPanel.js.
(Assignee)

Comment 23

3 years ago
whimboo, can you check if attachment 8566949 [details] [diff] [review] fix the problem?
Flags: needinfo?(hskupin)
(Reporter)

Comment 25

3 years ago
I run this build a couple of times with the tests and yes, the crash is gone! Yay! Thanks!
Flags: needinfo?(hskupin)
(Assignee)

Comment 26

3 years ago
Good! Thanks for the confirmation.
(Assignee)

Comment 27

3 years ago
Comment on attachment 8566949 [details] [diff] [review]
patch - Care about new CompositorChild and CompositorParent creation.

Jeff, can you look into the patch?
Attachment #8566949 - Flags: review?(jmuizelaar)
Created attachment 8567129 [details] [diff] [review]
How about this instead

Sotaro, how about this instead?
(Assignee)

Updated

3 years ago
Attachment #8566949 - Flags: review?(jmuizelaar)
(Assignee)

Comment 29

3 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #28)
> Created attachment 8567129 [details] [diff] [review]
> How about this instead
> 
> Sotaro, how about this instead?

Yhea, it is simpler than mine.
(Assignee)

Comment 30

3 years ago
Jeff, as we talked, can you check the correctness of what the patch is doing?
Flags: needinfo?(jmuizelaar)
Attachment #8567129 - Flags: review+
Comment on attachment 8567129 [details] [diff] [review]
How about this instead

Nical, can do you a post commit review of this?
Flags: needinfo?(jmuizelaar)
Attachment #8567129 - Flags: review?(nical.bugzilla)
(Assignee)

Updated

3 years ago
Attachment #8566949 - Attachment is obsolete: true
(Reporter)

Comment 34

3 years ago
[Tracking Requested - why for this release]: Firefox 37 is indeed affected given that this is just a signature change from bug 1120331. So it would be great to get this fixed for Firefox 37 before we have the first beta. If not I will have to turn off all Mozmill tests on Windows 7 for beta, given that this would fail horribly on us while testing all locales.
status-firefox37: unaffected → affected
tracking-firefox37: --- → ?
https://hg.mozilla.org/mozilla-central/rev/b28fd46a1174
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox38: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38

Comment 36

3 years ago
Between this and bug 1120331, what's the status of this issue on Beta 37?
Given how huge this was on Aurora 37, I'm really worried about shipping betas with this crash.
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 37

3 years ago
Comment on attachment 8567129 [details] [diff] [review]
How about this instead

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 1107718
User impact if declined: crash could happen.
Testing completed: locally tested and comment 25.
Risk to taking this patch (and alternatives if risky): relatively low risk.
String or UUID changes made by this patch: none.
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8567129 - Flags: approval-mozilla-b2g37?
(Assignee)

Comment 38

3 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #36)
> Between this and bug 1120331, what's the status of this issue on Beta 37?
> Given how huge this was on Aurora 37, I'm really worried about shipping
> betas with this crash.

It could be uplifted to Aurora 37. I just put "Approval Request".

Comment 39

3 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> It could be uplifted to Aurora 37. I just put "Approval Request".

You asked for it to be uplifted to B2G only. What you you want for 37 right now is approval-beta?
(Assignee)

Comment 40

3 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #39)
> (In reply to Sotaro Ikeda [:sotaro] from comment #38)
> > It could be uplifted to Aurora 37. I just put "Approval Request".
> 
> You asked for it to be uplifted to B2G only. What you you want for 37 right
> now is approval-beta?

Sorry, it was my mistake. Uplift for b2g is not necessary.
(Assignee)

Comment 41

3 years ago
Comment on attachment 8567129 [details] [diff] [review]
How about this instead

Approval Request Comment
[Feature/regressing bug #]: Bug 1107718
[User impact if declined]: crash could happen.
[Describe test coverage new/current, TreeHerder]: unchanged
[Risks and why]: locally tested and comment 25.
[String/UUID change made/needed]: none
Attachment #8567129 - Flags: approval-mozilla-b2g37? → approval-mozilla-beta?
Do we need the patch from bug 1120331 in addition to the fix that has been nommed for uplift here?
tracking-firefox37: ? → +
tracking-firefox38: ? → +
Flags: needinfo?(sotaro.ikeda.g)
Comment on attachment 8567129 [details] [diff] [review]
How about this instead

Thrilled to see the source of this crash identified and the confirmation of the fix in comment 25. Let's get this onto Beta and confirm with crash data from Beta 2. Beta+
Attachment #8567129 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Assignee)

Comment 44

3 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #42)
> Do we need the patch from bug 1120331 in addition to the fix that has been
> nommed for uplift here?

We do not need the patch from bug 1120331. It actually does not fix the crash.
Flags: needinfo?(sotaro.ikeda.g)

Updated

3 years ago
Attachment #8567129 - Flags: review?(nical.bugzilla) → review+

Updated

3 years ago
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 46

3 years ago
I have re-enabled our Mozmill tests for Firefox beta. None of our testruns on Windows 7 showed a crash anymore. The same applies for the other branches. Marking as verified fixed. Thanks!
Status: RESOLVED → VERIFIED
status-firefox37: fixed → verified
status-firefox38: fixed → verified
(Reporter)

Comment 47

3 years ago
I'm sorry to say but something has been re-introduced this crash with yesterdays Nightly builds. I will file a new bug for it.
(Reporter)

Updated

3 years ago
Blocks: 1142939
100% reproducible per above (ok, 4 for 4):
about:config; add New boolean media.peerconnection.video.vp9_enabled = true
https://mozilla.github.io/webrtc-landing/pc_test.html
Start
NI to make sure Sotaro/Matt see this
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 51

3 years ago
It is a regression of Bug 1125848. Possible cause of change was just backed out. Bug 1125848 Comment 55.
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 52

3 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #51)
> It is a regression of Bug 1125848. Possible cause of change was just backed
> out. Bug 1125848 Comment 55.

jesup, can you confirm if the backout fix the problem?
Flags: needinfo?(rjesup)
(Assignee)

Updated

3 years ago
Flags: needinfo?(rjesup)
You need to log in before you can comment on or make changes to this bug.