Closed
Bug 1314543
Opened 8 years ago
Closed 8 years ago
Crash in mozilla::gfx::FeatureChange::AssertSanity | "unexpected type tag"
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox52 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | fixed |
firefox54 | --- | fixed |
firefox55 | --- | fixed |
People
(Reporter: kanru, Assigned: vliu)
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(1 file, 2 obsolete files)
10.57 KB,
patch
|
dvander
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-5abc044b-91fe-4bba-a8f8-71a002161102.
=============================================================
9 crashes since build 20161031030202
Looks like GPU process related.
Flags: needinfo?(dvander)
I can't easily see how this assertion is possible. There's only two possible values for the tag (null_t, or not null_t), and we do a null_t check before unwrapping the union.
Keeping this on my radar - will check back in a week to see if we have more reports.
Not seeing any more crashes this week.
Flags: needinfo?(dvander)
Updated•8 years ago
|
Whiteboard: [gfx-noted]
Comment 3•8 years ago
|
||
it's still around around and accounting for about 0.4% of browser crashes in early data for 53.0b1.
status-firefox52:
--- → unaffected
status-firefox53:
--- → affected
status-firefox54:
--- → affected
status-firefox55:
--- → affected
Flags: needinfo?(dvander)
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
Comment 4•8 years ago
|
||
Vincent, please take a look at this bug. I saw around 62 crashes on FF53, 3 crashes on FF55.
https://crash-stats.mozilla.com/report/index/9f90899b-8912-47cf-b4cf-81a822170311
|[0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476) |[1][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476) |[2][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476) |[4][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476) |[5][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=100476)
Assignee: nobody → vliu
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Peter Chang[:pchang] from comment #4)
> Vincent, please take a look at this bug. I saw around 62 crashes on FF53, 3
> crashes on FF55.
>
> https://crash-stats.mozilla.com/report/index/9f90899b-8912-47cf-b4cf-
> 81a822170311
>
From the mini-dump, the crash happens at [1]. From [2], AssertSanity() passed TFeatureFailure to it, meaning it expected mType in [1] was also TFeatureFailure, but apparently the mismatch happens.
[1]: https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/_ipdlheaders/mozilla/gfx/GraphicsMessages.h#476
[2]:https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/_ipdlheaders/mozilla/gfx/GraphicsMessages.h#549
More observing frame 4 in mini-dump, GPUDeviceData was gotten from [3], the type would be T_None(0). In [4], it seems that if the mType was Tnull_t, gecko won't go into get_FeatureFailure()
d3d11Compositing_.mType = T_None(0)
oglCompositing_.mType = T_None(0)
gpuDevice_.mType = T_None(0)
[3]: https://dxr.mozilla.org/mozilla-central/source/gfx/ipc/GPUChild.cpp#83
[4]: https://dxr.mozilla.org/mozilla-central/source/gfx/config/gfxConfig.cpp#264
(In reply to David Anderson [:dvander] from comment #1)
> I can't easily see how this assertion is possible. There's only two possible
> values for the tag (null_t, or not null_t), and we do a null_t check before
> unwrapping the union.
>
> Keeping this on my radar - will check back in a week to see if we have more
> reports.
Hi David,
Is it possible gecko got the wrong type in [3]? Does it should return Tnull_t?
Assignee | ||
Comment 6•8 years ago
|
||
I also did an experiment. I tried to kill GPU process manually after firefox was launched. in [3] in Comment 5, mType is Tnull_t. I think that's why I think gecko should return Tnull_t in [3]. Or maybe I was wrong because I am not familiar with this part.
Assignee | ||
Comment 7•8 years ago
|
||
If GPU process was killed at the moment when sync ipc call GetDeviceStatus had sent in child side in [5] but not yet to return normally, mType would keep T_None and then hit the same crash point in [6]. I think it would be the possible reason to cause this crash.
[5]: https://dxr.mozilla.org/mozilla-central/source/gfx/ipc/GPUChild.cpp#83
[6]: https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/_ipdlheaders/mozilla/gfx/GraphicsMessages.h#476
Assignee | ||
Comment 8•8 years ago
|
||
Hi David,
Based on Comment 7, a patch was attached intends to make early return if IPC call fail. Can you please have a review? We may observe if this patch can drop the crash rate.
Attachment #8847037 -
Flags: review?(dvander)
Comment on attachment 8847037 [details] [diff] [review]
0001-Bug-1314543-Make-a-early-return-when-SendGetDeviceSt.patch
Review of attachment 8847037 [details] [diff] [review]:
-----------------------------------------------------------------
Great catch! It looks like callers of EnsureGPUReady don't expect it to fail either. It would be good to fix that now that the function is fallible.
Could you make it have a bool return, and make callers early-return on failure? This would replace patterns like:
> EnsureGPUReady();
>
> if (!mGPUChild) {
> return...
With:
> if (!EnsureGPUReady()) {
> return...
Attachment #8847037 -
Flags: review?(dvander) → feedback+
Assignee | ||
Comment 10•8 years ago
|
||
Hi David,
Thanks for your good suggestion on Comment 9. Could you please have a review to see if it meets your expectation?
Attachment #8847037 -
Attachment is obsolete: true
Attachment #8847479 -
Flags: review?(dvander)
Comment on attachment 8847479 [details] [diff] [review]
0001-Bug-1314543-Make-EnsureGPUReady-returns-bool-to-make.patch
Review of attachment 8847479 [details] [diff] [review]:
-----------------------------------------------------------------
Looks great, thanks!
Attachment #8847479 -
Flags: review?(dvander) → review+
Flags: needinfo?(dvander)
Assignee | ||
Comment 12•8 years ago
|
||
Comment 13•8 years ago
|
||
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6780188f669
Make EnsureGPUReady() returns bool to make sure GPU process is readay. r=dvander
Comment 14•8 years ago
|
||
vicent told me that this not realted to failures we see bug https://hg.mozilla.org/integration/mozilla-inbound/rev/a6780188f66986608a220efc6095e93daaa9e45e indicate frequent test failures in windows 8 xpc test that are perma failing , so backing this out because this change might have caused this
Flags: needinfo?(vliu)
Comment 15•8 years ago
|
||
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/62cd40511419
Backed out changeset a6780188f669 for suspicion this cause perma windows 7 xpcshell failures
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #14)
> vicent told me that this not realted to failures we see bug
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> a6780188f66986608a220efc6095e93daaa9e45e indicate frequent test failures in
> windows 8 xpc test that are perma failing , so backing this out because this
> change might have caused this
Thanks for the help. I'll look into my patch to figure it out.
Flags: needinfo?(vliu)
Assignee | ||
Comment 17•8 years ago
|
||
Hi David,
It seem that my previous r+ed patch caused xpcshell failure. After looked into my patch, the code modification in GPUProcessManager::CreateContentCompositorBridge() by me may change the original behavior. Here to attach my patch. In this patch. There were some changes in GPUProcessManager::CreateContentCompositorBridge() from previous one, others were the same. Could you please have a review? Thanks.
Attachment #8847479 -
Attachment is obsolete: true
Attachment #8849008 -
Flags: review?(dvander)
Assignee | ||
Comment 18•8 years ago
|
||
Here is the xpcshell test try result based on the patch on Comment 17
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc6684a2d9de525e18a823d89d69b67ef9696aac
Attachment #8849008 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 19•8 years ago
|
||
Comment 20•8 years ago
|
||
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/50907f304e24
Make EnsureGPUReady() returns bool to make sure GPU process is readay. r=dvander
Comment 21•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 22•8 years ago
|
||
I believe this needs an Aurora approval request but not Beta?
Comment 23•8 years ago
|
||
Whoops, I think we need this for 53 as well.
Comment 24•8 years ago
|
||
david, want to request uplift on this for aurora and beta?
Flags: needinfo?(dvander)
Comment on attachment 8849008 [details] [diff] [review]
0001-Bug-1314543-Make-EnsureGPUReady-returns-bool-to-make.patch
Approval Request Comment
[Feature/Bug causing the regression]: GPU process
[User impact if declined]: potential UI crashes if the GPU process fails to start
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: no
[Why is the change risky/not risky?]: This simply ensures that IPC is working before we decide that the GPU process has started successfully. In most cases it is working, and when it isn't, we had a chance of crashing before this patch.
[String changes made/needed]:
Flags: needinfo?(dvander)
Attachment #8849008 -
Flags: approval-mozilla-beta?
Attachment #8849008 -
Flags: approval-mozilla-aurora?
Comment 26•8 years ago
|
||
Comment on attachment 8849008 [details] [diff] [review]
0001-Bug-1314543-Make-EnsureGPUReady-returns-bool-to-make.patch
Prevents a browser crash, let's uplift for beta 10.
Attachment #8849008 -
Flags: approval-mozilla-beta?
Attachment #8849008 -
Flags: approval-mozilla-beta+
Attachment #8849008 -
Flags: approval-mozilla-aurora?
Attachment #8849008 -
Flags: approval-mozilla-aurora+
Comment 27•8 years ago
|
||
bugherder uplift |
Comment 28•8 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 29•8 years ago
|
||
Clear the ni? flag since David has done with this. Thanks David.
Flags: needinfo?(vliu)
You need to log in
before you can comment on or make changes to this bug.
Description
•