Closed
Bug 1333056
Opened 7 years ago
Closed 7 years ago
content process crash in mozilla::gfx::GfxPrefValue::AssertSanity
Categories
(Core :: Graphics, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: benjamin, Assigned: rhunt)
References
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(2 files)
938 bytes,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
1.19 KB,
patch
|
dvander
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-a358ca80-5de8-45de-af9c-b96852170123. I experienced this crash on nightly this morning. I don't exactly understand what is asserting/aborting here; is this related to a particular pref I have set, or is this more a communication error between several processes? Is it likely that this is somehow related to the GPU process? (setting NI?anthony so he's aware and can hook up the dependencies appropriately). I see 12 of these on nightly, so I'm not the only one but this isn't high volume yet.
Flags: needinfo?(anthony.s.hughes)
Going back further in the crash data this starts showing up with Firefox 52.0a1 20161103030205 which predates GPU Process so I think we can rule that out. That build ID correlates to https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e73fd638e687a4d7f46613586e5156b8e2af846&tochange=ade8d4a63e57560410de106450f37b50ed71cca5 which *might* implicate bug 1300678. Flagging Matt Woodrow for follow up.
status-firefox51:
--- → unaffected
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Flags: needinfo?(anthony.s.hughes) → needinfo?(matt.woodrow)
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
Whiteboard: [gfx-noted]
Version: unspecified → 52 Branch
Comment 2•7 years ago
|
||
This crash is happening in the GPU process. It looks like the indices from the gfxPrefs list don't match up between the UI process and the GPU process (the index references a bool pref, but the union serialized wasn't a bool). I can't see how this might happen though, it looks like gfxPrefs should be initialized the same regardless of process. Any ideas David?
Flags: needinfo?(matt.woodrow) → needinfo?(dvander)
I'm at a loss as to how this could happen, unless we're somehow launching a very different firefox.exe. Unfortunately the code for MOZ_RELEASE_ASSERT is involved enough that registers are trashed, so I can't guess as to which pref is misaligned. If we start seeing this in volume I can add instrumentation.
Flags: needinfo?(dvander)
Comment 4•7 years ago
|
||
The GPU process is riding the 53 train (hopefully).
status-firefox54:
--- → affected
Comment 5•7 years ago
|
||
Too late for a fix for 53, as we are in the last week of the 53 beta cycle.
Comment 6•7 years ago
|
||
There have been 23 Nightly crashes in the last week. Can we try adding that instrumentation mentioned in comment 3? I would think that our asserts for mismatched content and parent processes would catch this, but maybe the GPU process isn't going through those codepaths?
status-firefox55:
--- → affected
Flags: needinfo?(dvander)
Heh. That assert is only used for the content process, not the GPU process.
Flags: needinfo?(dvander)
Assignee: nobody → dvander
Status: NEW → ASSIGNED
This adds the build ID check to the GPU process. I'm not sure exactly how reliable this will be... AssertSanity will crash pretty soon after, and the parent might not get the message. But it's worth trying.
Attachment #8862959 -
Flags: review?(continuation)
Comment 9•7 years ago
|
||
Comment on attachment 8862959 [details] [diff] [review] diagnostics Review of attachment 8862959 [details] [diff] [review]: ----------------------------------------------------------------- Cool. Yeah I was idly wondering if this would be needed. I suppose it makes sense that the GPU process might restart and/or be started earlier than the content process.
Attachment #8862959 -
Flags: review?(continuation) → review+
Comment 10•7 years ago
|
||
Pushed by danderson@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/50998e4fbb31 Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Whiteboard: [gfx-noted] → [gfx-noted][keep-open]
Comment 11•7 years ago
|
||
Backed out for crashing in Windows xpcshell tests, e.g. test_console_filtering.js and test_gpuprocess.js: https://hg.mozilla.org/integration/mozilla-inbound/rev/23c8fd3467f41344d8804fbf055f9850a72c16c7 Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=50998e4fbb31abfdf8d4501edea7650436d54dc8&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=95304160&repo=mozilla-inbound 12:55:23 INFO - TEST-START | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_index_object_cursors.js 12:55:24 INFO - mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp 12:55:24 INFO - mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.extra 12:55:24 WARNING - PROCESS-CRASH | devtools/shared/tests/unit/test_console_filtering.js | application crashed [@ mozilla::ipc::MessageChannel::SendBuildID()] 12:55:24 INFO - Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\xpc-other-ialrz5\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp 12:55:24 INFO - Operating system: Windows NT 12:55:24 INFO - 6.2.9200 12:55:24 INFO - CPU: amd64 12:55:24 INFO - family 6 model 30 stepping 5 12:55:24 INFO - 8 CPUs 12:55:24 INFO - GPU: UNKNOWN 12:55:24 INFO - Crash reason: EXCEPTION_BREAKPOINT 12:55:24 INFO - Crash address: 0x7f901f41ac5 12:55:24 INFO - Assertion: Unknown assertion type 0x00000000 12:55:24 INFO - Process uptime: 3 seconds 12:55:24 INFO - Thread 0 (crashed) 12:55:24 INFO - 0 xul.dll!mozilla::ipc::MessageChannel::SendBuildID() [MessageChannel.cpp:50998e4fbb31 : 961 + 0x1b] 12:55:24 INFO - rax = 0x0000000000000000 rdx = 0x000000f563d3f938 12:55:24 INFO - rcx = 0x00000000ffffffff rbx = 0x00000000000003c1 12:55:24 INFO - rsi = 0x000000f564281190 rdi = 0x000000f56420b230 12:55:24 INFO - rbp = 0x000000f563d3f990 rsp = 0x000000f563d3f920 12:55:24 INFO - r8 = 0x000000f563d3f930 r9 = 0x000000f563d3f928 12:55:24 INFO - r10 = 0x0000000000000000 r11 = 0x000000f563d3ae60 12:55:24 INFO - r12 = 0x0000000000000000 r13 = 0x0000000000000001 12:55:24 INFO - r14 = 0x0000000000001740 r15 = 0x000007f9063faf50 12:55:24 INFO - rip = 0x000007f901f41ac5 12:55:24 INFO - Found by: given as instruction pointer in context 12:55:24 INFO - 1 xul.dll!mozilla::gfx::GPUParent::Init(unsigned long,MessageLoop *,IPC::Channel *) [GPUParent.cpp:50998e4fbb31 : 95 + 0x10] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3f9b0 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f9028eba78 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 2 xul.dll!XRE_InitChildProcess(int,char * * const,XREChildData const *) [nsEmbedFunctions.cpp:50998e4fbb31 : 677 + 0x38] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fa00 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f9054fcf5e 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 3 plugin-container.exe!content_process_main(mozilla::Bootstrap *,int,char * * const) [plugin-container.cpp:50998e4fbb31 : 64 + 0x13] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fc20 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e1283 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 4 plugin-container.exe!NS_internal_main(int,char * *) [MozillaRuntimeMain.cpp:50998e4fbb31 : 25 + 0xa] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fc60 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e10de 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 5 plugin-container.exe!wmain [nsWindowsWMain.cpp:50998e4fbb31 : 111 + 0xb] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fc90 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b2e1516 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 6 plugin-container.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22] 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fcf0 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f72b3262f1 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 7 kernel32.dll!BaseThreadInitThunk + 0x1a 12:55:24 INFO - rbx = 0x00000000000003c1 rbp = 0x000000f563d3f990 12:55:24 INFO - rsp = 0x000000f563d3fd30 r12 = 0x0000000000000000 12:55:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000001740 12:55:24 INFO - r15 = 0x000007f9063faf50 rip = 0x000007f920bc167e 12:55:24 INFO - Found by: call frame info 12:55:24 INFO - 8 ntdll.dll!RtlUserThreadStart + 0x21 12:55:24 INFO - rsp = 0x000000f563d3fd60 rip = 0x000007f9234fc3f1 12:55:24 INFO - Found by: stack scanning 12:55:24 INFO - 9 KERNELBASE.dll!GetLegacyComposition + 0x1180 12:55:24 INFO - rsp = 0x000000f563d3fd90 rip = 0x000007f9205609d0 12:55:24 INFO - Found by: stack scanning
Flags: needinfo?(dvander)
Comment 12•7 years ago
|
||
The number of crashes is low. Mark 54 won't fix.
Updated•7 years ago
|
Priority: -- → P3
Comment 13•7 years ago
|
||
It seems like crashes went way up in 56b6. Milan, could someone take a look again?
Ryan, can you take a look? Maybe redo the diagnostic?
Flags: needinfo?(rhunt)
Flags: needinfo?(milan)
Flags: needinfo?(dvander)
Assignee | ||
Comment 15•7 years ago
|
||
I will try and figure out why David's patch got backed out.
Assignee: dvander → rhunt
Flags: needinfo?(rhunt)
Assignee | ||
Comment 16•7 years ago
|
||
The problem here is from xpcshell creating and killing the GPU process faster than it can initialize. In some cases this can cause the channel to be in error or closed by the time SendBuildID happens. I don't think it makes sense to crash in this case, because if the channel isn't connected the parent doesn't care what the build ID is. I'll add a patch that changes the assert.
Assignee | ||
Comment 17•7 years ago
|
||
Attachment #8911367 -
Flags: review?(dvander)
(In reply to Ryan Hunt [:rhunt] from comment #16) > The problem here is from xpcshell creating and killing the GPU process That sounds all kinds scary to me. Can we not do that?
Assignee | ||
Comment 19•7 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #18) > (In reply to Ryan Hunt [:rhunt] from comment #16) > > The problem here is from xpcshell creating and killing the GPU process > > That sounds all kinds scary to me. Can we not do that? I found one case we could mitigate: bug 1402500. But in the general case, the GPU process is created when gfxPlatform is accessed, and that can happen randomly in gecko. If it happens near the end of the test (or if the test is really short because they spawn separate processes), it seems like it can trigger this. We could disable the GPU process for most xpcshell tests. Or we could explicitly initialize it at the beginning of each test, which would probably also prevent this. I don't know what's better.
(In reply to Milan Sreckovic [:milan] from comment #18) > (In reply to Ryan Hunt [:rhunt] from comment #16) > > The problem here is from xpcshell creating and killing the GPU process > > That sounds all kinds scary to me. Can we not do that? Why is it scary?
Attachment #8911367 -
Flags: review?(dvander) → review+
Comment 21•7 years ago
|
||
Pushed by rhunt@eqrion.net: https://hg.mozilla.org/integration/mozilla-inbound/rev/ad52ca20b048 Don't crash in SendBuildID when MessageChannel isn't connected. (bug 1333056, r=dvander) https://hg.mozilla.org/integration/mozilla-inbound/rev/d0673fb534d0 Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Comment 22•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ad52ca20b048 https://hg.mozilla.org/mozilla-central/rev/d0673fb534d0
Comment 23•7 years ago
|
||
Comment on attachment 8911367 [details] [diff] [review] send-build-id.patch Approval Request Comment [Feature/Bug causing the regression]: bug 1345978 [User impact if declined]: bug 1352458, which has a lot of crashes, but with not a lot of different users. [Is this code covered by automated tests?]: This particular error path is not. [Has the fix been verified in Nightly?]: This crash is not present in Nightly any more. [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: I don't think it should be. [Why is the change risky/not risky?]: It just removes a crash that I added not really knowing what I was doing. It should be okay to just keep going instead, as if we hit this state it is probably going to shut down nicely anyways. [String changes made/needed]: none There are two patches in this bug but only this one is needed to fix the crash.
Attachment #8911367 -
Flags: approval-mozilla-beta?
Comment 24•7 years ago
|
||
This patch has been in Nightly for a few weeks without any obvious regressions, which I think reduces the risk.
Comment on attachment 8911367 [details] [diff] [review] send-build-id.patch Crash fix, Bet57+
Attachment #8911367 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 26•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/208bf788e80c
Comment 27•7 years ago
|
||
Seems like the keep-open was a leftover from a long time ago and this is actually fixed now?
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Whiteboard: [gfx-noted][keep-open] → [gfx-noted]
Target Milestone: --- → mozilla58
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•