Content process crash in IPC code during PLayerTransaction construction

NEW
Unassigned

Status

()

Core
Graphics: Layers
7 months ago
5 months ago

People

(Reporter: gerard, Unassigned)

Tracking

55 Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

7 months ago
Starting with the Nightly from one or two days ago, the GPU process has been behaving erratically:
 - crashing (https://crash-stats.mozilla.com/report/index/bp-273d8c4f-90d7-4734-95d1-347480170507, https://crash-stats.mozilla.com/report/index/aa08ac63-fc35-41f5-8868-a8fd00170507)
 - failing to render, with logs filling like this:
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 206, 987
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1863, 43
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 1863, 0, 64, 987
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 64, 187
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for -5, 11, 186, 487
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 514, 55, 821, 3529
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for -1, 12, 310, 469
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 525, 333, 478, 3245
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 206, 987
> May  8 14:48:57 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1863, 43
> May  8 14:49:02 portable-alex Firefox-Nightly.desktop[3563]: [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1875, 987

The issue seems to be intermittent; after a restart it behaves properly. So far, it has been hard to provide STR for the issue.

System is Ubuntu 16.10. GPU is Intel HD 5500.

Bug 1300310 and bug 1300635 report similar error messages.
The GPU process is not enabled on Linux. The two crashes you linked in comment 0 are possibly regressions from bug 1350634. dvander, can you take a look?
Blocks: 1350634
Component: Layout: View Rendering → Graphics: Layers
Flags: needinfo?(dvander)
Version: unspecified → 55 Branch
Summary: GPU process hangs, crash or fails to render → Content process crash in IPC code during PLayerTransaction construction
(Reporter)

Comment 2

7 months ago
Okay, one more braindump from IRC, since :kats says it could be relevant. The only times I could reproduce the issue were precisely on the browser restart immediately after applying an update. The issue could never be reproduced once I did a proper kill & restart of Nightly.
Unfortunately we don't seem to have the IPC error in the crash metadata, and some of the crash stack is obscured, so it's hard to know what went wrong. Maybe this is another problem where build versions don't match.

Alexandre, does this happen with every update, or only an update that crosses bug 1350634's landing?
Flags: needinfo?(lissyx+mozillians)
(Reporter)

Comment 4

7 months ago
I would be able to assert this properly if we had a proper history of builds being installed. Given we don't have that, I cannot be 100% certain that the first update where I saw this was one that includes bug 1350634.
Flags: needinfo?(lissyx+mozillians)
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | libxul.so@0xe1ef62 | mozilla::dom::TabChild::InitRenderingState]
See Also: → bug 1363306
(Reporter)

Comment 5

7 months ago
FYI, I had no issue upgrading today to https://hg.mozilla.org/mozilla-central/rev/b21b974d60d3075ae24f6fb1bae75d0f122f28fc
(Reporter)

Comment 6

7 months ago
I performed an upgrade today; there was high CPU usage, so I killed and restarted Firefox. During the shutdown, I spotted these:

> May 12 15:01:08 portable-alex Firefox-Nightly.desktop[21422]: windows.onFocusChanged event fired after context unloaded.
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: OKSandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6119.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6124.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash8463.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6142.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6164.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6200.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6222.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6236.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6269.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6251.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6290.extra
> May 12 15:01:12 portable-alex Firefox-Nightly.desktop[21422]: Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash6308.extra
> May 12 15:01:12 portable-alex kernel: [557299.797713] Chrome_ChildThr[6121]: segfault at 0 ip 00007fa1d4eeac79 sp 00007fa1d1ba9ba0 error 6 in libxul.so (deleted)[7fa1d4c2a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.797878] Chrome_ChildThr[6139]: segfault at 0 ip 00007ff08a4eac79 sp 00007ff0871a9ba0 error 6 in libxul.so (deleted)[7ff08a22a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798048] Chrome_ChildThr[8465]: segfault at 0 ip 00007fb0651eac79 sp 00007fb061ea9ba0 error 6 in libxul.so (deleted)[7fb064f2a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798063] Chrome_ChildThr[6147]: segfault at 0 ip 00007f74d7ceac79 sp 00007f74d4b63ba0 error 6 in libxul.so (deleted)[7f74d7a2a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798225] Chrome_ChildThr[6176]: segfault at 0 ip 00007fdad9feac79 sp 00007fdad6ca9ba0 error 6 in libxul.so (deleted)[7fdad9d2a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798261] Chrome_ChildThr[6202]: segfault at 0 ip 00007f644f9eac79 sp 00007f644c6a9ba0 error 6 in libxul.so (deleted)[7f644f72a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798439] Chrome_ChildThr[6233]: segfault at 0 ip 00007f31547eac79 sp 00007f31514a9ba0 error 6 in libxul.so (deleted)[7f315452a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798613] Chrome_ChildThr[6248]: segfault at 0 ip 00007f21d40eac79 sp 00007f21d0da9ba0 error 6 in libxul.so (deleted)[7f21d3e2a000+3f89000]
> May 12 15:01:12 portable-alex kernel: [557299.798623] Chrome_ChildThr[6284]: segfault at 0 ip 00007f676a8eac79 sp 00007f67675a9ba0 error 6
> May 12 15:01:12 portable-alex kernel: [557299.798623] Chrome_ChildThr[6256]: segfault at 0 ip 00007f72428eac79 sp 00007f723f5a9ba0 error 6
> May 12 15:01:12 portable-alex kernel: [557299.798627]  in libxul.so (deleted)[7f676a62a000+3f89000] in libxul.so (deleted)[7f724262a000+3f89000]

Kats, would the line referring to a deleted libxul be consistent with that issue of re-using the wrong code at upgrade time?
Flags: needinfo?(bugmail)
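
For reading the kernel segfault lines quoted in comment 6, here is a minimal sketch that pulls out the faulting address, the instruction pointer, and the mapping base, then computes the offset of the instruction pointer inside the mapped module. The function names `parse_segfault` and `describe_error` are hypothetical helpers, not part of any Mozilla tooling, and the error-code decoding assumes the x86 page-fault bit layout (bit 0 = page present, bit 1 = write, bit 2 = user mode). Lines where the kernel output was interleaved and the `in libxul.so ...` part landed elsewhere are returned without an offset.

```python
import re

# Matches the kernel's segfault report as it appears in the syslog lines above.
# The trailing " in <module>[<base>+<size>]" part is optional because some
# lines in the log were interleaved and lost it.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<module>.+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def describe_error(err):
    """Decode the low bits of an x86 page-fault error code (assumed layout)."""
    return {
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 1 means the access was a write
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16) if m.group("base") else None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "ip": ip,
        "error": describe_error(int(m.group("err"))),
        "module": m.group("module"),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base if base is not None else None,
    }
```

Feeding each quoted kernel line through `parse_segfault` makes it easy to compare the `offset` values across the crashed child processes and see whether they all stopped at the same instruction.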
I don't think that libxul.so (deleted) is relevant here. That looks a lot like output from /proc/*/maps which refers to regions in memory.
Flags: needinfo?(bugmail)
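
To check whether a running process still maps files that have since been removed or replaced on disk (for example by an update), one option on Linux is to scan `/proc/<pid>/maps` for path names carrying the ` (deleted)` suffix. The following is a minimal sketch under that assumption; `deleted_mappings` and `deleted_mappings_for_pid` are hypothetical helper names, and the script only reports what is mapped without saying anything about why a crash happened.

```python
def deleted_mappings(maps_text):
    """Return sorted paths from /proc/<pid>/maps text that are marked '(deleted)'."""
    suffix = " (deleted)"
    found = set()
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.endswith(suffix):
            found.add(path[: -len(suffix)])
    return sorted(found)


def deleted_mappings_for_pid(pid):
    """Read /proc/<pid>/maps (Linux only) and list mapped files that were deleted."""
    with open(f"/proc/{pid}/maps") as f:
        return deleted_mappings(f.read())
```

Calling `deleted_mappings_for_pid` on the PID of a content process after an update would list any library, such as `libxul.so`, that the process still has mapped from a file no longer present on disk.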
If you were going through the updater, this is almost certainly the bug where our updater replaces the content process while the parent process is still running. A Windows fix is in the works (bug 1112937), but I don't know about Linux.
Flags: needinfo?(dvander)
(Reporter)

Comment 9

5 months ago
I had some of those spurious crashes again today: https://crash-stats.mozilla.com/report/index/b99e54e9-df35-4c32-90cc-ff9440170707
(Reporter)

Comment 10

5 months ago
Created attachment 8884202 [details]
two GPUs hangs?