Closed
Bug 1133426
Opened 8 years ago
Closed 8 years ago
Crash in mozilla::layers::CompositorChild::Destroy()
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla38
People
(Reporter: whimboo, Assigned: sotaro, NeedInfo)
References
()
Details
(Keywords: crash, topcrash-win, Whiteboard: [mozmill][tbird crash])
Crash Data
Attachments
(2 files, 2 obsolete files)
4.75 KB,
text/plain
|
Details | |
2.63 KB,
patch
|
jrmuizel
:
review+
nical
:
review+
lmandel
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Running one of our endurance tests with Mozmill we see lots of crashes on Windows 7 machines with the signature like bp-e739b432-19f7-4797-8bcc-077552150216. Crash Reason EXCEPTION_ACCESS_VIOLATION_READ Crash Address 0x0 First 10 frames of the stack: 0 xul.dll mozilla::layers::CompositorChild::Destroy() gfx/layers/ipc/CompositorChild.cpp 1 xul.dll nsBaseWidget::DestroyCompositor() widget/nsBaseWidget.cpp 2 xul.dll nsWindow::OnPaint(HDC__*, unsigned int) widget/windows/nsWindowGfx.cpp 3 xul.dll nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) widget/windows/nsWindow.cpp 4 xul.dll nsWindow::WindowProcInternal(HWND__*, unsigned int, unsigned int, long) widget/windows/nsWindow.cpp 5 xul.dll CallWindowProcCrashProtected xpcom/base/nsCrashOnException.cpp 6 xul.dll nsWindow::WindowProc(HWND__*, unsigned int, unsigned int, long) widget/windows/nsWindow.cpp 7 user32.dll InternalCallWinProc 8 user32.dll GetRealWindowOwner 9 user32.dll DispatchClientMessage 10 user32.dll __fnDWORD The endurance test we are running is http://hg.mozilla.org/qa/mozmill-tests/file/default/firefox/tests/endurance/testBookmarks_OpenAndClose/test1.js This is only happening for Firefox 38 so far, no single crash with Firefox 37. Crash details as reported by crash-stats Windows 7 99.52 % 207 Windows 8.1 0.48 % 1
This started in build 20150214030238 and has been the #1 nightly crash since then. https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2f5c5ec1a24b&tochange=a7c177546ca0 The regression range includes: 7d4f76e0411b Nicolas Silva — Bug 1120331 - "crash in mozilla::ipc::MessageChannel::Send(IPC::Message*)". r=sotaro And again this one is all Win7 and all D3D10Warp.dll. Seems like that crash just moved signatures. Nical do you agree?
tracking-firefox38:
--- → ?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 2•8 years ago
|
||
nical is PTO until end of February. I am going take a bug.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → sotaro.ikeda.g
Assignee | ||
Comment 3•8 years ago
|
||
(In reply to David Major [:dmajor] (UTC+13) from comment #1) > This started in build 20150214030238 and has been the #1 nightly crash since > then. > > https://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=2f5c5ec1a24b&tochange=a7c177546ca0 > The regression range includes: > 7d4f76e0411b Nicolas Silva — Bug 1120331 - "crash in > mozilla::ipc::MessageChannel::Send(IPC::Message*)". r=sotaro > > And again this one is all Win7 and all D3D10Warp.dll. Seems like that crash > just moved signatures. Nical do you agree? It seems just moved the signature. But the change seems to make clear that it is not a problem of ClientLayerManager.
Comment 4•8 years ago
|
||
This bug just bite me on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150218030228 CSet: 9696d1c4b3ba sans e10s. Report ID Date Submitted bp-eaeaefd8-63cb-42f5-92de-ed7ba2150218 2/18/2015 11:39 AM
Comment 5•8 years ago
|
||
me bp-7ad742b2-3c5c-43f9-b304-f54492150218 thunderbird 20150216030224 I had just mouse clicked on a different folder FWIW, bp-c434f094-8506-4eed-bc18-a668b2150217 comment " always crashing when opening down the url line (most visited links) or try to copy an url with mouse, not ctrl + c"
Keywords: topcrash-win
Whiteboard: [mozmill] → [mozmill][tbird crash]
Comment 6•8 years ago
|
||
re comment 3 - for the 3 user email addresses with the most crashes, each with a long history of mozilla::ipc::MessageChannel::Send crashes, each user's rate of crashing for mozilla::layers::CompositorChild::Destroy is the SAME as for mozilla::ipc::MessageChannel::Send(IPC::Message*) for a similar time period
Alex, since you're seeing this (and the other IPC crash) frequently -- is there any common pattern? Is it more frequent on some websites? Or after specific actions that you do?
Flags: needinfo?(alex_mayorga)
Comment 8•8 years ago
|
||
Sadly I haven't fount any rhythm or rime as to what causes the crashes. It seems to have something to do with mousing into the "Awesome bar". Here are my crashes from the past week or so FWIW: Report ID Date Submitted bp-7784571b-3787-4578-b302-1462f2150219 2/19/2015 10:54 AM bp-889f82eb-4905-4163-83c4-c7f8d2150219 2/19/2015 10:39 AM bp-fbf1e11a-a312-4875-bf34-cc38e2150219 2/19/2015 10:38 AM bp-7102fcb2-68fd-499a-a14d-fde2c2150219 2/19/2015 10:38 AM bp-05734df1-cb76-45f0-ad8e-021e32150219 2/19/2015 10:18 AM bp-eaeaefd8-63cb-42f5-92de-ed7ba2150218 2/18/2015 11:39 AM bp-2a63e10c-8353-4488-ac76-9b10f2150213 2/13/2015 4:33 PM bp-be5088b3-ca30-4d91-8401-2d26f2150213 2/13/2015 4:31 PM bp-5c3c2e9c-36c2-48ed-a986-f5f562150213 2/13/2015 4:30 PM bp-f364e215-bf49-4826-8aa6-1be1e2150213 2/13/2015 11:19 AM bp-1f003cbd-078a-42fc-a1c6-f38a02150212 2/12/2015 2:49 PM bp-0efe8407-73b4-47c2-aea7-aa5162150212 2/12/2015 12:01 PM This Nightly is updated almost daily and the system runs on VMware (VMCI sockets STREAM : 0 : 1 : c:\Program Files\VMware\VMware Tools\VSock SDK\bin\win32\vsocklib.dll) and it is accessed through http://www.citrix.com/go/receiver.html Kindly let me know if there's anything I could provide to help solve this bug.
Flags: needinfo?(alex_mayorga) → needinfo?(dmajor)
Assignee | ||
Comment 9•8 years ago
|
||
I locally modified the code as to often call nsBaseWidget::DestroyCompositor() from nsWindow::OnPaint(). By this change, I could reproduce the almost same crash.
![]() |
||
Comment 10•8 years ago
|
||
(In reply to alex_mayorga from comment #8) > Kindly let me know if there's anything I could provide to help solve this bug. Sotaro knows this code area better than me... Can Alex do anything to help?
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to alex_mayorga from comment #8) > Sadly I haven't fount any rhythm or rime as to what causes the crashes. > > It seems to have something to do with mousing into the "Awesome bar". nsWindow::OnPaint() is called when drawing browser's chrome part. Therefore, "Awesome bar" seems to affect to the crash. nsBaseWidget::DestroyCompositor() is called when gfxWindowsPlatform::DidRenderingDeviceReset() detects rendering device's reset. It might say that D3D Device on the machine might be unstable. Then, it might be that graphic driver might be unstable.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 12•8 years ago
|
||
gfxWindowsPlatform::DidRenderingDeviceReset() is the following. https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxWindowsPlatform.cpp#1143
Assignee | ||
Comment 13•8 years ago
|
||
call stack of a crash that I re-produced by locally modifying the code. This bug's crash and Bug 1120331 seems to be caused by same problem.
Assignee | ||
Comment 14•8 years ago
|
||
nsBaseWidget::DestroyCompositor() causes ClientLayerManager's destrutor. And it causes nsWindow::GetLayerManager() call. It request new LayerManager creation, new CompositorChild's and new CompositorParent creation. And also current CompositorChild's destruction and current CompositorParent's destruction without correct IPC shutdown.
Assignee | ||
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8566917 [details] [diff] [review] patch - Remove DidComposite() from ClientLayerManager's destrutor matt.woodrow, can we remove DidComposite() from ClientLayerManager's destrutor? It trigger LayerManager creation and causes the problem like Comment 14.
Attachment #8566917 -
Flags: review?(matt.woodrow)
Comment 17•8 years ago
|
||
I don't think this is ok. This happens when the layer manager has promised the refresh driver that it's doing a composite, but we're destroying the layer manager before it completes. We lie and send the DidComposite message anyway, so that the refresh driver doesn't hang waiting for the response. I think we should call mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead, but this might still result in LayerManager creation. Why are we creating a new layer manager if we've just destroyed the last one? What triggered the destruction? And why does that crash?
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #17) > I don't think this is ok. > > This happens when the layer manager has promised the refresh driver that > it's doing a composite, but we're destroying the layer manager before it > completes. > > We lie and send the DidComposite message anyway, so that the refresh driver > doesn't hang waiting for the response. > > I think we should call > mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead, > but this might still result in LayerManager creation. > > Why are we creating a new layer manager if we've just destroyed the last > one? What triggered the destruction? RenderingDeviceReset detect in nsWindow::OnPaint() triggers compositor's destruction. The destruction is done by nsBaseWidget::DestroyCompositor(). Comment 13, Comment 14 might help to understand it. https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindowGfx.cpp#194 > And why does that crash? And DidComposite() from ClientLayerManager causes current using CompositorChild and CompositorParent destruction without correct shutdown. It creates dangling pointers.
Assignee | ||
Updated•8 years ago
|
Attachment #8566917 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 19•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #17) > I don't think this is ok. > > I think we should call > mTransactionIdAllocator->RevokeTransactionId(mLatestTransactionId); instead, > but this might still result in LayerManager creation. Yes, it resulted to LayerManager creation and caused the crash.
Assignee | ||
Comment 20•8 years ago
|
||
This seems better than previous one.
Attachment #8566917 -
Attachment is obsolete: true
Assignee | ||
Comment 21•8 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #20) > Created attachment 8566949 [details] [diff] [review] > patch - Care about new CompositorChild and CompositorParent creation. > > This seems better than previous one. The patch at least seems to fix the problem.
Reporter | ||
Comment 22•8 years ago
|
||
I can perfectly reproduce this crash with our Mozmill tests. Just found the time to check that locally. Here the steps: 1. Clone http://hg.mozilla.org/qa/mozmill-tests 2. Install Mozmill 2.0.10: https://pypi.python.org/pypi/mozmill 3. Run the tests via `mozmill -b %path/to/firefox% -m mozmill-tests/firefox/tests/functional/testPrivateBrowsing/manifest.ini` On Windows 7 the tests always when executing the test testPrivateDownloadPanel.js.
Assignee | ||
Comment 23•8 years ago
|
||
whimboo, can you check if attachment 8566949 [details] [diff] [review] fix the problem?
Flags: needinfo?(hskupin)
Assignee | ||
Comment 24•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=065e63ac1eea
Reporter | ||
Comment 25•8 years ago
|
||
I run this build a couple of times with the tests and yes, the crash is gone! Yay! Thanks!
Flags: needinfo?(hskupin)
Assignee | ||
Comment 26•8 years ago
|
||
Good! Thanks for the confirmation.
Assignee | ||
Comment 27•8 years ago
|
||
Comment on attachment 8566949 [details] [diff] [review] patch - Care about new CompositorChild and CompositorParent creation. Jeff, can you look into the patch?
Attachment #8566949 -
Flags: review?(jmuizelaar)
Comment 28•8 years ago
|
||
Sotaro, how about this instead?
Assignee | ||
Updated•8 years ago
|
Attachment #8566949 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 29•8 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #28) > Created attachment 8567129 [details] [diff] [review] > How about this instead > > Sotaro, how about this instead? Yhea, it is simpler than mine.
Assignee | ||
Comment 30•8 years ago
|
||
Jeff, as we talked, can you check the correctness of what the patch is doing?
Flags: needinfo?(jmuizelaar)
Updated•8 years ago
|
Attachment #8567129 -
Flags: review+
Comment 31•8 years ago
|
||
Comment on attachment 8567129 [details] [diff] [review] How about this instead Nical, can do you a post commit review of this?
Flags: needinfo?(jmuizelaar)
Attachment #8567129 -
Flags: review?(nical.bugzilla)
Assignee | ||
Updated•8 years ago
|
Attachment #8566949 -
Attachment is obsolete: true
Assignee | ||
Comment 32•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=39019a085436
Assignee | ||
Comment 33•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b28fd46a1174
Reporter | ||
Comment 34•8 years ago
|
||
[Tracking Requested - why for this release]: Firefox 37 is indeed affected given that this is just a signature change from bug 1120331. So it would be great to get this fixed for Firefox 37 before we have the first beta. If not I will have to turn off all Mozmill tests on Windows 7 for beta, given that this would fail horribly on us while testing all locales.
tracking-firefox37:
--- → ?
Comment 35•8 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b28fd46a1174
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
![]() |
||
Comment 36•8 years ago
|
||
Between this and bug 1120331, what's the status of this issue on Beta 37? Given how huge this was on Aurora 37, I'm really worried about shipping betas with this crash.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 37•8 years ago
|
||
Comment on attachment 8567129 [details] [diff] [review] How about this instead NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 1107718 User impact if declined: crash could happen. Testing completed: locally tested and comment 25. Risk to taking this patch (and alternatives if risky): relatively low risk. String or UUID changes made by this patch: none.
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8567129 -
Flags: approval-mozilla-b2g37?
Assignee | ||
Comment 38•8 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #36) > Between this and bug 1120331, what's the status of this issue on Beta 37? > Given how huge this was on Aurora 37, I'm really worried about shipping > betas with this crash. It could be uplifted to Aurora 37. I just put "Approval Request".
![]() |
||
Comment 39•8 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #38) > It could be uplifted to Aurora 37. I just put "Approval Request". You asked for it to be uplifted to B2G only. What you you want for 37 right now is approval-beta?
Assignee | ||
Comment 40•8 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #39) > (In reply to Sotaro Ikeda [:sotaro] from comment #38) > > It could be uplifted to Aurora 37. I just put "Approval Request". > > You asked for it to be uplifted to B2G only. What you you want for 37 right > now is approval-beta? Sorry, it was my mistake. Uplift for b2g is not necessary.
Assignee | ||
Comment 41•8 years ago
|
||
Comment on attachment 8567129 [details] [diff] [review] How about this instead Approval Request Comment [Feature/regressing bug #]: Bug 1107718 [User impact if declined]: crash could happen. [Describe test coverage new/current, TreeHerder]: unchanged [Risks and why]: locally tested and comment 25. [String/UUID change made/needed]: none
Attachment #8567129 -
Flags: approval-mozilla-b2g37? → approval-mozilla-beta?
Comment 42•8 years ago
|
||
Do we need the patch from bug 1120331 in addition to the fix that has been nommed for uplift here?
Comment 43•8 years ago
|
||
Comment on attachment 8567129 [details] [diff] [review] How about this instead Thrilled to see the source of this crash identified and the confirmation of the fix in comment 25. Let's get this onto Beta and confirm with crash data from Beta 2. Beta+
Attachment #8567129 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 44•8 years ago
|
||
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #42) > Do we need the patch from bug 1120331 in addition to the fix that has been > nommed for uplift here? We do not need the patch from bug 1120331. It actually does not fix the crash.
Flags: needinfo?(sotaro.ikeda.g)
Updated•8 years ago
|
Attachment #8567129 -
Flags: review?(nical.bugzilla) → review+
Updated•8 years ago
|
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 46•8 years ago
|
||
I have re-enabled our Mozmill tests for Firefox beta. None of our testruns on Windows 7 showed a crash anymore. The same applies for the other branches. Marking as verified fixed. Thanks!
Reporter | ||
Comment 47•8 years ago
|
||
I'm sorry to say but something has been re-introduced this crash with yesterdays Nightly builds. I will file a new bug for it.
Comment 48•8 years ago
|
||
https://crash-stats.mozilla.com/report/index/bp-0881274e-1dc4-4f30-8b0c-d8cac2150313 https://crash-stats.mozilla.com/report/index/bp-4fb0828c-fa40-470e-a734-388252150313 https://crash-stats.mozilla.com/report/index/403fb6cc-86ad-4477-a3af-303022150313
Comment 49•8 years ago
|
||
100% reproducible per above (ok, 4 for 4): about:config; add New boolean media.peerconnection.video.vp9_enabled = true https://mozilla.github.io/webrtc-landing/pc_test.html Start
Comment 50•8 years ago
|
||
NI to make sure Sotaro/Matt see this
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 51•8 years ago
|
||
It is a regression of Bug 1125848. Possible cause of change was just backed out. Bug 1125848 Comment 55.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 52•8 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #51) > It is a regression of Bug 1125848. Possible cause of change was just backed > out. Bug 1125848 Comment 55. jesup, can you confirm if the backout fix the problem?
Flags: needinfo?(rjesup)
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(rjesup)
You need to log in
before you can comment on or make changes to this bug.
Description
•