Closed
Bug 1333056
Opened 8 years ago
Closed 8 years ago
content process crash in mozilla::gfx::GfxPrefValue::AssertSanity
Categories
(Core :: Graphics, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: benjamin, Assigned: rhunt)
References
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(2 files)
|
938 bytes,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
|
1.19 KB,
patch
|
dvander
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-a358ca80-5de8-45de-af9c-b96852170123.
I experienced this crash on nightly this morning. I don't exactly understand what is asserting/aborting here; is this related to a particular pref I have set, or is this more a communication error between several processes? Is it likely that this is somehow related to the GPU process? (setting NI?anthony so he's aware and can hook up the dependencies appropriately).
I see 12 of these on nightly, so I'm not the only one but this isn't high volume yet.
Flags: needinfo?(anthony.s.hughes)
Going back further in the crash data this starts showing up with Firefox 52.0a1 20161103030205 which predates GPU Process so I think we can rule that out. That build ID correlates to https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e73fd638e687a4d7f46613586e5156b8e2af846&tochange=ade8d4a63e57560410de106450f37b50ed71cca5 which *might* implicate bug 1300678. Flagging Matt Woodrow for follow up.
status-firefox51:
--- → unaffected
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Flags: needinfo?(anthony.s.hughes) → needinfo?(matt.woodrow)
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
Whiteboard: [gfx-noted]
Version: unspecified → 52 Branch
Comment 2•8 years ago
|
||
This crash is happening in the GPU process.
It looks like the indices from the gfxPrefs list don't match up between the UI process and the GPU process (the index references a bool pref, but the union serialized wasn't a bool).
I can't see how this might happen though, it looks like gfxPrefs should be initialized the same regardless of process. Any ideas David?
Flags: needinfo?(matt.woodrow) → needinfo?(dvander)
I'm at a loss as to how this could happen, unless we're somehow launching a very different firefox.exe. Unfortunately the code for MOZ_RELEASE_ASSERT is involved enough that registers are trashed, so I can't guess as to which pref is misaligned. If we start seeing this in volume I can add instrumentation.
Flags: needinfo?(dvander)
Comment 4•8 years ago
|
||
The GPU process is riding the 53 train (hopefully).
status-firefox54:
--- → affected
Comment 5•8 years ago
|
||
Too late for a fix for 53, as we are in the last week of the 53 beta cycle.
Comment 6•8 years ago
|
||
There have been 23 Nightly crashes in the last week. Can we try adding that instrumentation mentioned in comment 3? I would think that our asserts for mismatched content and parent processes would catch this, but maybe the GPU process isn't going through those codepaths?
status-firefox55:
--- → affected
Flags: needinfo?(dvander)
Heh. That assert is only used for the content process, not the GPU process.
Flags: needinfo?(dvander)
Assignee: nobody → dvander
Status: NEW → ASSIGNED
This adds the build ID check to the GPU process. I'm not sure exactly how reliable this will be... AssertSanity will crash pretty soon after, and the parent might not get the message. But it's worth trying.
Attachment #8862959 -
Flags: review?(continuation)
Comment 9•8 years ago
|
||
Comment on attachment 8862959 [details] [diff] [review]
diagnostics
Review of attachment 8862959 [details] [diff] [review]:
-----------------------------------------------------------------
Cool. Yeah I was idly wondering if this would be needed. I suppose it makes sense that the GPU process might restart and/or be started earlier than the content process.
Attachment #8862959 -
Flags: review?(continuation) → review+
Comment 10•8 years ago
|
||
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/50998e4fbb31
Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Whiteboard: [gfx-noted] → [gfx-noted][keep-open]
Comment 11•8 years ago
|
||
Backed out for crashing in Windows xpcshell tests, e.g. test_console_filtering.js and test_gpuprocess.js:
https://hg.mozilla.org/integration/mozilla-inbound/rev/23c8fd3467f41344d8804fbf055f9850a72c16c7
Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=50998e4fbb31abfdf8d4501edea7650436d54dc8&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=95304160&repo=mozilla-inbound
12:55:23 INFO - TEST-START | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_index_object_cursors.js
12:55:24 INFO - mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp
12:55:24 INFO - mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.extra
12:55:24 WARNING - PROCESS-CRASH | devtools/shared/tests/unit/test_console_filtering.js | application crashed [@ mozilla::ipc::MessageChannel::SendBuildID()]
12:55:24 INFO - Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\xpc-other-ialrz5\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp
12:55:24 INFO - Operating system: Windows NT
12:55:24 INFO - 6.2.9200
12:55:24 INFO - CPU: amd64
12:55:24 INFO - family 6 model 30 stepping 5
12:55:24 INFO - 8 CPUs
12:55:24 INFO - GPU: UNKNOWN
12:55:24 INFO - Crash reason: EXCEPTION_BREAKPOINT
12:55:24 INFO - Crash address: 0x7f901f41ac5
12:55:24 INFO - Assertion: Unknown assertion type 0x00000000
12:55:24 INFO - Process uptime: 3 seconds
12:55:24 INFO - Thread 0 (crashed)
12:55:24 INFO - 0 xul.dll!mozilla::ipc::MessageChannel::SendBuildID() [MessageChannel.cpp:50998e4fbb31 : 961 + 0x1b]
12:55:24 INFO - rax = 0x0000000000000000 rdx = 0x000000f563d3f938
12:55:24 INFO - rcx = 0x00000000ffffffff rbx = 0x00000000000003c1
12:55:24 INFO - rsi = 0x000000f564281190 rdi = 0x000000f56420b230
12:55:24 INFO - rbp = 0x000000f563d3f990 rsp = 0x000000f563d3f920
12:55:24 INFO - r8 = 0x000000f563d3f930 r9 = 0x000000f563d3f928
12:55:24 INFO - r10 = 0x0000000000000000 r11 = 0x000000f563d3ae60
12:55:24 INFO - r12 = 0x0000000000000000 r13 = 0x0000000000000001
12:55:24 INFO - r14 = 0x0000000000001740 r15 = 0x000007f9063faf50
12:55:24 INFO - rip = 0x000007f901f41ac5
12:55:24 INFO - Found by: given as instruction pointer in context
12:55:24 INFO - 1 xul.dll!mozilla::gfx::GPUParent::Init(unsigned long,MessageLoop *,IPC::Channel *) [GPUParent.cpp:50998e4fbb31 : 95 + 0x10]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3f9b0 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f9028eba78
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 2 xul.dll!XRE_InitChildProcess(int,char * * const,XREChildData const *) [nsEmbedFunctions.cpp:50998e4fbb31 : 677 + 0x38]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fa00 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f9054fcf5e
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 3 plugin-container.exe!content_process_main(mozilla::Bootstrap *,int,char * * const) [plugin-container.cpp:50998e4fbb31 : 64 + 0x13]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fc20 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e1283
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 4 plugin-container.exe!NS_internal_main(int,char * *) [MozillaRuntimeMain.cpp:50998e4fbb31 : 25 + 0xa]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fc60 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e10de
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 5 plugin-container.exe!wmain [nsWindowsWMain.cpp:50998e4fbb31 : 111 + 0xb]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fc90 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e1516
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 6 plugin-container.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22]
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fcf0 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b3262f1
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 7 kernel32.dll!BaseThreadInitThunk + 0x1a
12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990
12:55:24 INFO - rsp = 0x000000f563d3fd30 r12 = 0x0000000000000000
12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740
12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f920bc167e
12:55:24 INFO - Found by: call frame info
12:55:24 INFO - 8 ntdll.dll!RtlUserThreadStart + 0x21
12:55:24 INFO - rsp = 0x000000f563d3fd60 rip = 0x000007f9234fc3f1
12:55:24 INFO - Found by: stack scanning
12:55:24 INFO - 9 KERNELBASE.dll!GetLegacyComposition + 0x1180
12:55:24 INFO - rsp = 0x000000f563d3fd90 rip = 0x000007f9205609d0
12:55:24 INFO - Found by: stack scanning
Flags: needinfo?(dvander)
Comment 12•8 years ago
|
||
The number of crashes is low. Mark 54 won't fix.
Updated•8 years ago
|
Priority: -- → P3
Comment 13•8 years ago
|
||
It seems like crashes went way up in 56b6. Milan, could someone take a look again?
Ryan, can you take a look? Maybe redo the diagnostic?
Flags: needinfo?(rhunt)
Flags: needinfo?(milan)
Flags: needinfo?(dvander)
| Assignee | ||
Comment 15•8 years ago
|
||
I will try and figure out why David's patch got backed out.
Assignee: dvander → rhunt
Flags: needinfo?(rhunt)
| Assignee | ||
Comment 16•8 years ago
|
||
The problem here is from xpcshell creating and killing the GPU process faster than it can initialize. In some cases this can cause the channel to be in error or closed by the time SendBuildID happens.
I don't think it makes sense to crash in this case, because if the channel isn't connected the parent doesn't care what the build ID is. I'll add a patch that changes the assert.
| Assignee | ||
Comment 17•8 years ago
|
||
Attachment #8911367 -
Flags: review?(dvander)
(In reply to Ryan Hunt [:rhunt] from comment #16)
> The problem here is from xpcshell creating and killing the GPU process
That sounds all kinds scary to me. Can we not do that?
| Assignee | ||
Comment 19•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #18)
> (In reply to Ryan Hunt [:rhunt] from comment #16)
> > The problem here is from xpcshell creating and killing the GPU process
>
> That sounds all kinds scary to me. Can we not do that?
I found one case we could mitigate: bug 1402500.
But in the general case, the GPU process is created when gfxPlatform is accessed, and that can happen randomly in gecko. If it happens near the end of the test (or if the test is really short because they spawn separate processes), it seems like it can trigger this.
We could disable the GPU process for most xpcshell tests. Or we could explicitly initialize it at the beginning of each test, which would probably also prevent this. I don't know what's better.
(In reply to Milan Sreckovic [:milan] from comment #18)
> (In reply to Ryan Hunt [:rhunt] from comment #16)
> > The problem here is from xpcshell creating and killing the GPU process
>
> That sounds all kinds scary to me. Can we not do that?
Why is it scary?
Attachment #8911367 -
Flags: review?(dvander) → review+
Comment 21•8 years ago
|
||
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ad52ca20b048
Don't crash in SendBuildID when MessageChannel isn't connected. (bug 1333056, r=dvander)
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0673fb534d0
Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Comment 22•8 years ago
|
||
| bugherder | ||
Comment 23•8 years ago
|
||
Comment on attachment 8911367 [details] [diff] [review]
send-build-id.patch
Approval Request Comment
[Feature/Bug causing the regression]: bug 1345978
[User impact if declined]: bug 1352458, which has a lot of crashes, but with not a lot of different users.
[Is this code covered by automated tests?]: This particular error path is not.
[Has the fix been verified in Nightly?]: This crash is not present in Nightly any more.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: I don't think it should be.
[Why is the change risky/not risky?]: It just removes a crash that I added not really knowing what I was doing. It should be okay to just keep going instead, as if we hit this state it is probably going to shut down nicely anyways.
[String changes made/needed]: none
There are two patches in this bug but only this one is needed to fix the crash.
Attachment #8911367 -
Flags: approval-mozilla-beta?
Comment 24•8 years ago
|
||
This patch has been in Nightly for a few weeks without any obvious regressions, which I think reduces the risk.
Comment on attachment 8911367 [details] [diff] [review]
send-build-id.patch
Crash fix, Bet57+
Attachment #8911367 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 26•8 years ago
|
||
| bugherder uplift | ||
Comment 27•8 years ago
|
||
Seems like the keep-open was a leftover from a long time ago and this is actually fixed now?
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Whiteboard: [gfx-noted][keep-open] → [gfx-noted]
Target Milestone: --- → mozilla58
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•