Closed Bug 1311644 Opened 8 years ago Closed 8 years ago

Startup crash in mozilla::ipc::MessageChannel::AssertWorkerThread | mozilla::layers::PTextureChild::SendDestroy

Categories

(Core :: Graphics: Layers, defect, P3)

50 Branch
x86
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 + wontfix
firefox51 + fixed
firefox52 + fixed

People

(Reporter: philipp, Assigned: pchang)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-dc68b6ea-411c-4728-a591-0b93e2161020.
=============================================================
Crashing Thread (37)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ipc::MessageChannel::AssertWorkerThread() 	obj-firefox/dist/include/mozilla/ipc/MessageChannel.h:451
1 		@0x39 	
2 	xul.dll 	mozilla::layers::PTextureChild::SendDestroy() 	obj-firefox/ipc/ipdl/PTextureChild.cpp:58
3 	xul.dll 	mozilla::layers::PTextureChild::SendDestroy() 	obj-firefox/ipc/ipdl/PTextureChild.cpp:63
4 	xul.dll 	mozilla::layers::DeallocateTextureClient(mozilla::layers::TextureDeallocParams) 	gfx/layers/client/TextureClient.cpp:357
5 	xul.dll 	RunnableFunction<void (*)(mozilla::layers::TextureDeallocParams), mozilla::Tuple<mozilla::layers::TextureDeallocParams> >::Run() 	ipc/chromium/src/base/task.h:336
6 	xul.dll 	MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) 	ipc/chromium/src/base/message_loop.cc:346
7 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:354
8 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:429
9 	xul.dll 	base::MessagePumpDefault::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_default.cc:36
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:205
12 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc:180
13 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:28
14 	kernel32.dll 	BaseThreadInitThunk 	
15 	ntdll.dll 	__RtlUserThreadStart 	
16 	ntdll.dll 	_RtlUserThreadStart

this crash is regressing/spiking in firefox 50 beta builds - reports are annotated with "MOZ_RELEASE_ASSERT(mWorkerLoopID == MessageLoop::current()->id()) (not on worker thread!)" which was added by bug 1146471.

so far the crash is happening on win7-10 and 94% on startup - on 50.0b7 it's about 0.1% of all browser crashes.

correlations from 50.0b:
(100.0% in signature vs 00.10% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(mWorkerLoopID == MessageLoop::current()->id()) (not on worker thread!)
(96.92% in signature vs 21.50% overall) "DXVA2D3D9+" in app_notes = true
(96.92% in signature vs 26.75% overall) "DXVA2D3D9?" in app_notes = true
(100.0% in signature vs 34.94% overall) reason = EXCEPTION_BREAKPOINT
(40.00% in signature vs 03.87% overall) cpu_info = GenuineIntel family 6 model 60 stepping 3 | 4
(26.15% in signature vs 02.00% overall) adapter_device_id = 0x041e

there are 2 single reports generated from 49.0.1 installations as well though:
https://crash-stats.mozilla.com/report/index/c09cd2ea-a14e-49dd-80a4-41b8e2161006
https://crash-stats.mozilla.com/report/index/cdbda033-9d5d-4e27-9464-827222161004
Tracking 52+ for this startup crash regression.
Philipp, do the 2 reports from 49.0.1 indicate this is a carry-over regression in 50 and thus Version should be set to 49?
Flags: needinfo?(madperson)
i'm not sure at all. 

in general when i see a distribution of crashes among firefox versions like this i'd suspect there is some new feature pref'd off by default (that a couple of release users have switched on manually), and that pref change is regularly riding the trains in pre-release builds...
Flags: needinfo?(madperson)
a strange thing timing-wise would be that the first recorded instance of this signature was on 2016-09-29, whereas the first builds of 50.0b were already pushed to users a week earlier than that. so maybe some external factor is playing into this as well?
Flags: needinfo?(nical.bugzilla)
Priority: -- → P3
Whiteboard: [gfx-noted]
I just discussed this with nical last week. I will work on the patch for this.
Assignee: nobody → howareyou322
Flags: needinfo?(nical.bugzilla)
I think this bug was caused by de-allocating a texture client with null mAllocator in [1]. Therefore, we can't get correct ipdl thread to dispatch message.

After checking more, I found only SurfaceFactory_Basic doesn't have mAllocator[2].
If my assumption is true, it is possible to hit the condition in [3]. 

nical, I think we just add early return if allocator is NULL. How do you think?

[1]http://searchfox.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#354 
[2]http://searchfox.org/mozilla-central/source/gfx/gl/SharedSurfaceGL.cpp#103
[3]http://searchfox.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#391
Flags: needinfo?(nical.bugzilla)
SurfaceFactory should only be used with WebGL, which is main-thread only right now. This crash is happening off the main thread so I suspect it is something else using the ImageBridge (probably video-related). The problem with early-returning if the allocator is null is that if the IPDL actor exists it is leaked, so I would rather we find out what exactly is causing the allocator to be null and make sure it does not happen.
Flags: needinfo?(nical.bugzilla)
Tracking 51+ as startup regression.
Hey Peter, any updates on this?
Flags: needinfo?(howareyou322)
It might be too late for 50, if we have a fix that seems safe we can consider uplifting for 50.1.0
Here are two places that we might get null allocator, TextureClient::CreateForYCbCr and TextureClient::CreateForYCbCrWithBufferSize. 

And there was a comment mentioned that " // The only reason we allow aAllocator to be null is for gtests".
Just uploaded a patch to pass valid allocator in GTest, and then we could force caller to pass valid allocator when calling CreateForYCbCr and CreateForYCbCrWithBufferSize.
Flags: needinfo?(howareyou322)
Just fixed compile error on windows.

The following is the latest try result.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=98d021ba76795ce7dc8b512bf326679aba964ce1
Comment on attachment 8808505 [details]
Bug 1311644 - Create YCbCr texture client with valid allocator,

https://reviewboard.mozilla.org/r/91344/#review91668

If the gtest that was expecting a null allocator causes issues, it's fine to remove it. It's not a great test, it hasn't ever found a bug that I am aware of since I added it and it doesn't test a real-world situation very well.
Attachment #8808505 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8808506 [details]
Bug 1311644 - Pass ImageBridge as allocator to create YCbCr texture client,

https://reviewboard.mozilla.org/r/91346/#review91672

Sure, if that works out. If you want to remove the test it's fine as well as I mentioned in the other review.
Attachment #8808506 - Flags: review?(nical.bugzilla) → review+
(In reply to Peter Chang[:pchang] from comment #15)
> Just fixed compile error on windows.
> 
> The following is the latest try result.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=98d021ba76795ce7dc8b512bf326679aba964ce1

I tried to revert my changes and submit another try. And I still got same failures. Therefore, these failures were not related to my changes.
(In reply to Nicolas Silva [:nical] from comment #18)
> Comment on attachment 8808506 [details]
> Bug 1311644 - Pass ImageBridge as allocator to create YCbCr texture client,
> 
> https://reviewboard.mozilla.org/r/91346/#review91672
> 
> Sure, if that works out. If you want to remove the test it's fine as well as
> I mentioned in the other review.

I think we could still keep it because of the testing coverage.
Pushed by pchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/405843a8c845
Create YCbCr texture client with valid allocator, r=nical
https://hg.mozilla.org/integration/autoland/rev/ea948a05bfa4
Pass ImageBridge as allocator to create YCbCr texture client, r=nical
https://hg.mozilla.org/mozilla-central/rev/405843a8c845
https://hg.mozilla.org/mozilla-central/rev/ea948a05bfa4
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment on attachment 8808505 [details]
Bug 1311644 - Create YCbCr texture client with valid allocator,

Approval Request Comment
[Feature/regressing bug #]:none
[User impact if declined]:has chance to call TextureDestroy with null allocator and hit the wrong workerthread assertion again.
[Describe test coverage new/current, TreeHerder]:treeherder
[Risks and why]: low risk because the changes add more error checking to prevent abnormal condition.  
[String/UUID change made/needed]:none
Attachment #8808505 - Flags: approval-mozilla-beta?
Attachment #8808505 - Flags: approval-mozilla-aurora?
Comment on attachment 8808505 [details]
Bug 1311644 - Create YCbCr texture client with valid allocator,

Fix a crash. Aurora51+.
Attachment #8808505 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
has conflicts 

grafting 374110:405843a8c845 "Bug 1311644 - Create YCbCr texture client with valid allocator, r=nical"
merging gfx/layers/client/TextureClient.cpp
warning: conflicts while merging gfx/layers/client/TextureClient.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(howareyou322)
Upload patch for aurora/beta
Flags: needinfo?(howareyou322)
Tomcat, please upload attachment 8810316 [details] [diff] [review] for aurora/beta.
Flags: needinfo?(cbook)
Comment on attachment 8808505 [details]
Bug 1311644 - Create YCbCr texture client with valid allocator,

Moving uplift request to beta/release since we're now after merge day
Attachment #8808505 - Flags: approval-mozilla-release?
Attachment #8808505 - Flags: approval-mozilla-beta?
Attachment #8808505 - Flags: approval-mozilla-beta+
Attachment #8808505 - Flags: approval-mozilla-aurora+
This seems like a low volume fix on 50, ~30 instances and the same on 50 RC. Wontfix for 50.
Comment on attachment 8808505 [details]
Bug 1311644 - Create YCbCr texture client with valid allocator,

I don't see the urgency of taking this fix since it is a pretty low volume crash signature. Let this one ride the 51 train.
Attachment #8808505 - Flags: approval-mozilla-release? → approval-mozilla-release-
Flags: needinfo?(cbook)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: