Closed Bug 1333056 Opened 3 years ago Closed 2 years ago

content process crash in mozilla::gfx::GfxPrefValue::AssertSanity

Categories

(Core :: Graphics, defect, P3, critical)

52 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox51 --- unaffected
firefox52 --- disabled
firefox-esr52 --- disabled
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: benjamin, Assigned: rhunt)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-a358ca80-5de8-45de-af9c-b96852170123.

I experienced this crash on nightly this morning. I don't exactly understand what is asserting/aborting here; is this related to a particular pref I have set, or is this more a communication error between several processes? Is it likely that this is somehow related to the GPU process? (setting NI?anthony so he's aware and can hook up the dependencies appropriately).

I see 12 of these on nightly, so I'm not the only one but this isn't high volume yet.
Flags: needinfo?(anthony.s.hughes)
Going back further in the crash data this starts showing up with Firefox 52.0a1 20161103030205 which predates GPU Process so I think we can rule that out. That build ID correlates to https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e73fd638e687a4d7f46613586e5156b8e2af846&tochange=ade8d4a63e57560410de106450f37b50ed71cca5 which *might* implicate bug 1300678. Flagging Matt Woodrow for follow up.
Flags: needinfo?(anthony.s.hughes) → needinfo?(matt.woodrow)
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
Whiteboard: [gfx-noted]
Version: unspecified → 52 Branch
This crash is happening in the GPU process.

It looks like the indices from the gfxPrefs list don't match up between the UI process and the GPU process (the index references a bool pref, but the union serialized wasn't a bool).

I can't see how this might happen though, it looks like gfxPrefs should be initialized the same regardless of process. Any ideas David?
Flags: needinfo?(matt.woodrow) → needinfo?(dvander)
I'm at a loss as to how this could happen, unless we're somehow launching a very different firefox.exe. Unfortunately the code for MOZ_RELEASE_ASSERT is involved enough that registers are trashed, so I can't guess as to which pref is misaligned. If we start seeing this in volume I can add instrumentation.
Flags: needinfo?(dvander)
The GPU process is riding the 53 train (hopefully).
Too late for a fix for 53, as we are in the last week of the 53 beta cycle.
There have been 23 Nightly crashes in the last week.  Can we try adding that instrumentation mentioned in comment 3?  I would think that our asserts for mismatched content and parent processes would catch this, but maybe the GPU process isn't going through those codepaths?
Flags: needinfo?(dvander)
Heh. That assert is only used for the content process, not the GPU process.
Flags: needinfo?(dvander)
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Attached patch diagnosticsSplinter Review
This adds the build ID check to the GPU process. I'm not sure exactly how reliable this will be... AssertSanity will crash pretty soon after, and the parent might not get the message. But it's worth trying.
Attachment #8862959 - Flags: review?(continuation)
Comment on attachment 8862959 [details] [diff] [review]
diagnostics

Review of attachment 8862959 [details] [diff] [review]:
-----------------------------------------------------------------

Cool. Yeah I was idly wondering if this would be needed. I suppose it makes sense that the GPU process might restart and/or be started earlier than the content process.
Attachment #8862959 - Flags: review?(continuation) → review+
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/50998e4fbb31
Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Whiteboard: [gfx-noted] → [gfx-noted][keep-open]
Backed out for crashing in Windows xpcshell tests, e.g. test_console_filtering.js and test_gpuprocess.js:

https://hg.mozilla.org/integration/mozilla-inbound/rev/23c8fd3467f41344d8804fbf055f9850a72c16c7

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=50998e4fbb31abfdf8d4501edea7650436d54dc8&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=95304160&repo=mozilla-inbound

12:55:23     INFO -  TEST-START | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_index_object_cursors.js
12:55:24     INFO -  mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp
12:55:24     INFO -  mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\f572cf7c-d549-42b5-9d87-bd7af49cf33f.extra
12:55:24  WARNING -  PROCESS-CRASH | devtools/shared/tests/unit/test_console_filtering.js | application crashed [@ mozilla::ipc::MessageChannel::SendBuildID()]
12:55:24     INFO -  Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\xpc-other-ialrz5\f572cf7c-d549-42b5-9d87-bd7af49cf33f.dmp
12:55:24     INFO -  Operating system: Windows NT
12:55:24     INFO -                    6.2.9200
12:55:24     INFO -  CPU: amd64
12:55:24     INFO -       family 6 model 30 stepping 5
12:55:24     INFO -       8 CPUs
12:55:24     INFO -  GPU: UNKNOWN
12:55:24     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
12:55:24     INFO -  Crash address: 0x7f901f41ac5
12:55:24     INFO -  Assertion: Unknown assertion type 0x00000000
12:55:24     INFO -  Process uptime: 3 seconds
12:55:24     INFO -  Thread 0 (crashed)
12:55:24     INFO -   0  xul.dll!mozilla::ipc::MessageChannel::SendBuildID() [MessageChannel.cpp:50998e4fbb31 : 961 + 0x1b]
12:55:24     INFO -      rax = 0x0000000000000000   rdx = 0x000000f563d3f938
12:55:24     INFO -      rcx = 0x00000000ffffffff   rbx = 0x00000000000003c1
12:55:24     INFO -      rsi = 0x000000f564281190   rdi = 0x000000f56420b230
12:55:24     INFO -      rbp = 0x000000f563d3f990   rsp = 0x000000f563d3f920
12:55:24     INFO -       r8 = 0x000000f563d3f930    r9 = 0x000000f563d3f928
12:55:24     INFO -      r10 = 0x0000000000000000   r11 = 0x000000f563d3ae60
12:55:24     INFO -      r12 = 0x0000000000000000   r13 = 0x0000000000000001
12:55:24     INFO -      r14 = 0x0000000000001740   r15 = 0x000007f9063faf50
12:55:24     INFO -      rip = 0x000007f901f41ac5
12:55:24     INFO -      Found by: given as instruction pointer in context
12:55:24     INFO -   1  xul.dll!mozilla::gfx::GPUParent::Init(unsigned long,MessageLoop *,IPC::Channel *) [GPUParent.cpp:50998e4fbb31 : 95 + 0x10]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3f9b0   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f9028eba78
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   2  xul.dll!XRE_InitChildProcess(int,char * * const,XREChildData const *) [nsEmbedFunctions.cpp:50998e4fbb31 : 677 + 0x38]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fa00   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f9054fcf5e
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   3  plugin-container.exe!content_process_main(mozilla::Bootstrap *,int,char * * const) [plugin-container.cpp:50998e4fbb31 : 64 + 0x13]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fc20   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f72b2e1283
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   4  plugin-container.exe!NS_internal_main(int,char * *) [MozillaRuntimeMain.cpp:50998e4fbb31 : 25 + 0xa]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fc60   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f72b2e10de
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   5  plugin-container.exe!wmain [nsWindowsWMain.cpp:50998e4fbb31 : 111 + 0xb]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fc90   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f72b2e1516
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   6  plugin-container.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22]
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fcf0   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f72b3262f1
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   7  kernel32.dll!BaseThreadInitThunk + 0x1a
12:55:24     INFO -      rbx = 0x00000000000003c1   rbp = 0x000000f563d3f990
12:55:24     INFO -      rsp = 0x000000f563d3fd30   r12 = 0x0000000000000000
12:55:24     INFO -      r13 = 0x0000000000000001   r14 = 0x0000000000001740
12:55:24     INFO -      r15 = 0x000007f9063faf50   rip = 0x000007f920bc167e
12:55:24     INFO -      Found by: call frame info
12:55:24     INFO -   8  ntdll.dll!RtlUserThreadStart + 0x21
12:55:24     INFO -      rsp = 0x000000f563d3fd60   rip = 0x000007f9234fc3f1
12:55:24     INFO -      Found by: stack scanning
12:55:24     INFO -   9  KERNELBASE.dll!GetLegacyComposition + 0x1180
12:55:24     INFO -      rsp = 0x000000f563d3fd90   rip = 0x000007f9205609d0
12:55:24     INFO -      Found by: stack scanning
Flags: needinfo?(dvander)
The number of crashes is low. Mark 54 won't fix.
It seems like crashes went way up in 56b6. Milan, could someone take a look again?
Flags: needinfo?(milan)
Ryan, can you take a look?  Maybe redo the diagnostic?
Flags: needinfo?(rhunt)
Flags: needinfo?(milan)
Flags: needinfo?(dvander)
I will try and figure out why David's patch got backed out.
Assignee: dvander → rhunt
Flags: needinfo?(rhunt)
The problem here is from xpcshell creating and killing the GPU process faster than it can initialize. In some cases this can cause the channel to be in error or closed by the time SendBuildID happens.

I don't think it makes sense to crash in this case, because if the channel isn't connected the parent doesn't care what the build ID is. I'll add a patch that changes the assert.
Attachment #8911367 - Flags: review?(dvander)
(In reply to Ryan Hunt [:rhunt] from comment #16)
> The problem here is from xpcshell creating and killing the GPU process

That sounds all kinds scary to me.  Can we not do that?
(In reply to Milan Sreckovic [:milan] from comment #18)
> (In reply to Ryan Hunt [:rhunt] from comment #16)
> > The problem here is from xpcshell creating and killing the GPU process
> 
> That sounds all kinds scary to me.  Can we not do that?

I found one case we could mitigate: bug 1402500.

But in the general case, the GPU process is created when gfxPlatform is accessed, and that can happen randomly in gecko. If it happens near the end of the test (or if the test is really short because they spawn separate processes), it seems like it can trigger this.

We could disable the GPU process for most xpcshell tests. Or we could explicitly initialize it at the beginning of each test, which would probably also prevent this. I don't know what's better.
(In reply to Milan Sreckovic [:milan] from comment #18)
> (In reply to Ryan Hunt [:rhunt] from comment #16)
> > The problem here is from xpcshell creating and killing the GPU process
> 
> That sounds all kinds scary to me.  Can we not do that?

Why is it scary?
Attachment #8911367 - Flags: review?(dvander) → review+
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ad52ca20b048
Don't crash in SendBuildID when MessageChannel isn't connected. (bug 1333056, r=dvander)
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0673fb534d0
Assert that the GPU process build ID matches the UI process. (bug 1333056, r=mccr8)
Comment on attachment 8911367 [details] [diff] [review]
send-build-id.patch

Approval Request Comment
[Feature/Bug causing the regression]: bug 1345978 
[User impact if declined]: bug 1352458, which has a lot of crashes, but with not a lot of different users.
[Is this code covered by automated tests?]: This particular error path is not.
[Has the fix been verified in Nightly?]: This crash is not present in Nightly any more.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: I don't think it should be.
[Why is the change risky/not risky?]:  It just removes a crash that I added not really knowing what I was doing. It should be okay to just keep going instead, as if we hit this state it is probably going to shut down nicely anyways.
[String changes made/needed]: none

There are two patches in this bug but only this one is needed to fix the crash.
Attachment #8911367 - Flags: approval-mozilla-beta?
This patch has been in Nightly for a few weeks without any obvious regressions, which I think reduces the risk.
Comment on attachment 8911367 [details] [diff] [review]
send-build-id.patch

Crash fix, Bet57+
Attachment #8911367 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Seems like the keep-open was a leftover from a long time ago and this is actually fixed now?
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Whiteboard: [gfx-noted][keep-open] → [gfx-noted]
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.