Closed Bug 1460127 Opened 6 years ago Closed 6 years ago

Failed to connect GPU process

Categories

(Core :: IPC, defect)

x86_64
Linux
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- unaffected
firefox62 --- verified
firefox63 --- verified

People

(Reporter: tgnff242, Assigned: spohl)

References

(Blocks 1 open bug)

Details

(Keywords: nightly-community, regression)

Crash Data

Attachments

(3 files)

Attached file about_support
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180508231737

Steps to reproduce:

1. Create a new profile on a Linux system.
2. Open about:config and create a new boolean variable named layers.gpu-process.enabled and set it to true.
3. Restart Firefox.


Actual results:

The Graphics section of about:support shows that the GPU process has not been initialized.


Expected results:

The GPU process should initialize when it is force-enabled. It isn't enabled by default on Linux, but this is a regression, so I thought it would be better to report it.

Mozregression result points to Bug 1366808.

 3:29.86 INFO: No more inbound revisions, bisection finished.
 3:29.86 INFO: Last good revision: fb2b32cae6e816e4f3da342203e8ea31bb840c13
 3:29.86 INFO: First bad revision: eb036f55167d9369d12c131ae019a1a633986009
 3:29.86 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fb2b32cae6e816e4f3da342203e8ea31bb840c13&tochange=eb036f55167d9369d12c131ae019a1a633986009

It registers as a crash. Example report: https://crash-stats.mozilla.com/report/index/1a9be4bc-9279-43ea-bf89-663970180509
Blocks: 1366808
Crash Signature: [@ libc-2.27.so@0x135df6 ]
Has Regression Range: --- → yes
Has STR: --- → yes
Component: Untriaged → IPC
Flags: needinfo?(spohl.mozilla.bugs)
Keywords: regression
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
Thanks for the report. Will take a look tomorrow morning.
Debian Testing, KDE, Radeon RX480

"Failed to connect GPU process". WebRender does not start. Nightly falls back to OpenGL. For stability reasons I prefer using the GPU process.

> failures	[GFX1-]: Failed to connect GPU process
> 
> Decision Log
> GPU_PROCESS	
> 	disabled by default: Disabled by default
> 	force_enabled by user: User force-enabled via pref
> 	failed by runtime: Failed to connect GPU process
> WEBRENDER	
> 	opt-in by default: WebRender is an opt-in feature
> 	available by user: Force enabled by pref
> 	unavailable by runtime: GPU Process is disabled
> OMTP	
> 	disabled by default: Disabled by default
Severity: normal → major
Status: UNCONFIRMED → NEW
Crash Signature: [@ libc-2.27.so@0x135df6 ] → [@ libc-2.27.so@0x135df6 ] [@ libc-2.27.so@0x134d96 ]
Ever confirmed: true
The issue here is that the GPU process does not know the parent's build ID, which results in the GPU process exiting to prevent buildID mismatches. It looks like the GPU process is launched by a child content process, which seems to indicate that the parent process properly sent its build ID to the child. If this is the case, the child is currently failing to pass the parent's build ID to the GPU process during launch. I haven't been able to verify this yet since the call stacks in the reports are not fully symbolicated and I don't currently know where in the code this is occurring. Still looking into it.
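To illustrate the kind of check described above, here is a minimal sketch, assuming the child compares the build ID it received from its parent against its own. The function name `ShouldContinueStartup` and its parameters are hypothetical and are not Gecko's actual API.

```cpp
#include <cstring>

// Hypothetical check; names are illustrative and not taken from Gecko.
// Returns true only when the parent's build ID is known and equals ours.
// A missing (null) parent build ID is treated like a mismatch, which is
// the situation that would make a child process exit early.
bool ShouldContinueStartup(const char* parentBuildID, const char* ownBuildID) {
  if (!parentBuildID || !ownBuildID) {
    return false;
  }
  return std::strcmp(parentBuildID, ownBuildID) == 0;
}
```

Under this sketch, a child that never receives the parent's build ID fails the check in the same way as a child that receives a different one.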
Concretization of comment 2:
1. Fresh Profile on Linux: Create layers.gpu-process.enabled;true. Set gfx.webrender.all to true.
2. Restart and open about:support: "Compositing" is now mostly OpenGL instead of WebRender. The failure log now contains "Failed to connect GPU process".

(Just want to mention it in case they have a common underlying issue:
layers.gpu-process.enabled;true + gfx.webrender.all;true + extensions.webextensions.remote;true = Crash = bug 1406230.
But OOP Webextensions (bug 1357487) do work with WebRender if the GPU process is left disabled.)
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)
> Concretization of comment 2:
> 1. Fresh Profile on Linux: Create layers.gpu-process.enabled;true. Set
> gfx.webrender.all to true.
> 2. Restart and open about:support: "Compositing" is now mostly OpenGL
> instead of WebRender. The failure log now contains "Failed to connect GPU
> process".

I just built locally, set the two prefs, but I don't seem to be able to reproduce the issue. Compositing shows "WebRender" and the GPU process seems to be up and running. What is the "failure log" that you're referring to, and how might I be able to find it?

Also, I've checked how the processes are launched via `ps` and all content processes appear to get the -parentBuildID param passed. From what I can tell, the GPU process is launched as `-contentproc` as well.
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(jan)
Attached video 2018-05-10_18-36-26.mp4
mozregression --launch 2018-05-10 --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support'
Flags: needinfo?(jan)
I just used https://superuser.com/a/222924 to check whether a GPU process was started at all, because ps and ksysguard do not show one.

With a good Nightly I see
> type=EXECVE msg=audit(11.05.2018 03:50:01.053:3538411) : argc=12 a0=/tmp/tmpdQEV1r/firefox/firefox a1=-contentproc a2=-greomni a3=/tmp/tmpdQEV1r/firefox/omni.ja a4=-appomni a5=/tmp/tmpdQEV1r/firefox/browser/omni.ja a6=-appdir a7=/tmp/tmpdQEV1r/firefox/browser a9=4401 a10=true a11=gpu 
but not with a bad one. (I have never used this tool before, so I might be wrong.)

-----

> $ mozregression --launch 2018-05-09 --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support' -B debug
> **********
> You should use a config file. Please use the --write-config command line flag to help you create one.
> **********
> 
> 0:03.54 INFO: Downloading build from: https://queue.taskcluster.net/v1/task/PDxyFCc4TIKyrTdP0vNTqg/runs/0/artifacts/public%2Fbuild%2Ftarget.tar.bz2
> ===== Downloaded 100% =====
> 0:21.10 INFO: Running mozilla-inbound build built on 2018-05-09 23:56:12.353000, revision a8c2b116
> 0:40.24 INFO: Launching /tmp/tmpgpswm3/firefox/firefox
> 0:40.24 INFO: Application command: /tmp/tmpgpswm3/firefox/firefox -profile /tmp/tmpNwnia5.mozrunner
> 0:40.26 INFO: application_buildid: 20180509233616
> 0:40.26 INFO: application_changeset: a8c2b11687fcb76bbd256314abeeba72af760146
> 0:40.26 INFO: application_name: Firefox
> 0:40.26 INFO: application_repository: https://hg.mozilla.org/integration/mozilla-inbound
> 0:40.26 INFO: application_version: 62.0a1
> 0:42.78 INFO: [5600, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2910
> 0:43.06 INFO: [5600, Main Thread] WARNING: GLX_swap_control unsupported, ASAP mode may still block on buffer swaps.: file /builds/worker/workspace/build/src/gfx/gl/GLContextProviderGLX.cpp, line 219
> 0:43.09 INFO: [GLX] window 4ea has VisualID 0x21
> 0:43.11 INFO: [5600, GLXVsyncThread] WARNING: robust_buffer_access_behavior marked as unsupported: file /builds/worker/workspace/build/src/gfx/gl/GLContextFeatures.cpp, line 915
> 0:43.15 INFO: [5600, Gecko_IOThread] WARNING: pipe error (45): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
> 0:43.26 INFO: ++DOCSHELL 0x7fe9f27c0000 == 1 [pid = 5600] [id = {f677b474-e92a-4115-be5f-4df9e1d733a9}]
> 0:43.26 INFO: ++DOMWINDOW == 1 (0x7fe9f0987600) [pid = 5600] [serial = 1] [outer = (nil)]
> 0:43.34 INFO: ++DOMWINDOW == 2 (0x7fe9f0a63c00) [pid = 5600] [serial = 2] [outer = 0x7fe9f0987600]
> 0:43.64 INFO: ++DOCSHELL 0x7fe9da1a7000 == 2 [pid = 5600] [id = {a02b5a22-b0c7-45da-8a12-e42504b3f7f2}]
> 0:43.64 INFO: ++DOMWINDOW == 3 (0x7fe9da1c1800) [pid = 5600] [serial = 3] [outer = (nil)]
> 0:43.64 INFO: ++DOMWINDOW == 4 (0x7fe9da2e9c00) [pid = 5600] [serial = 4] [outer = 0x7fe9da1c1800]
> 0:43.66 INFO: [5600, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/worker/workspace/build/src/netwerk/protocol/res/SubstitutingProtocolHandler.cpp, line 338
> 0:43.67 INFO: [GFX1-]: Failed to connect GPU process
Using https://github.com/brendangregg/perf-tools.git, I was able to determine that a GPU process was launched. The command line was:

firefox-bin -contentproc -parentBuildID 20180510100205 -greomni /home/spohl/Desktop/firefox/omni.ja -appomni /home/spohl/Desktop/firefox/browser/omni.ja -appdir /home/spohl/Desktop/firefox/browser  58334 true gpu

As can be seen, the parentBuildID was properly passed to this process. However, it crashes in GPUProcessImpl::Init during strcmp. My suspicion right now is that aArgc is not valid for some reason. I'm building Firefox locally to investigate further.
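For checking captured command lines like the one above, a small sketch might help. It assumes, based only on the command lines quoted in this bug, that a GPU process is started with `-contentproc` and has `gpu` as its last argument. `SplitArgs`, `ChildInfo` and `InspectCommandLine` are hypothetical names, and the whitespace splitting does not handle paths containing spaces.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a captured command line on whitespace (no quoting support).
std::vector<std::string> SplitArgs(const std::string& commandLine) {
  std::istringstream in(commandLine);
  std::vector<std::string> args;
  for (std::string arg; in >> arg;) {
    args.push_back(arg);
  }
  return args;
}

struct ChildInfo {
  bool isGpuProcess;      // -contentproc present and last argument is "gpu"
  bool hasParentBuildID;  // -parentBuildID present and followed by a value
};

// Hypothetical helper: classify a child process by its command line.
ChildInfo InspectCommandLine(const std::string& commandLine) {
  const std::vector<std::string> args = SplitArgs(commandLine);
  ChildInfo info{false, false};
  bool isContentProc = false;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "-contentproc") {
      isContentProc = true;
    }
    if (args[i] == "-parentBuildID" && i + 1 < args.size()) {
      info.hasParentBuildID = true;
    }
  }
  info.isGpuProcess = isContentProc && !args.empty() && args.back() == "gpu";
  return info;
}
```

Passing the command line quoted above to `InspectCommandLine` would report both a GPU process and a parent build ID, which matches the observation that the argument was passed and the failure happens later, during init.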
Attached patch Patch
Well, this turned out to be a failure on multiple fronts. The code for the `for` loop in GPUProcessImpl::Init[1] (introduced in bug 1366808) was copied from ContentChild::Init[2]. However, the copied code did not include the null check[3] at the top of the `for` loop. The null check is necessary because the loop currently passes aArgv[aArgc], which is expected to be null, to strcmp. This should fail on other platforms as well, but whether it does may depend on the way XRE_InitCommandLine initializes the command line arguments: aArgc may be less than the actual number of arguments in aArgv.

Furthermore, the loop was changed in bug 1447246 and I did not pick up on the changes before landing bug 1366808. This has been corrected as well.

[1] https://dxr.mozilla.org/mozilla-central/rev/0cd106a2eb78aa04fd481785257e6f4f9b94707b/gfx/ipc/GPUProcessImpl.cpp#35
[2] https://dxr.mozilla.org/mozilla-central/rev/6cffa8738ca5/dom/ipc/ContentProcess.cpp#119
[3] https://dxr.mozilla.org/mozilla-central/rev/6cffa8738ca5/dom/ipc/ContentProcess.cpp#120-122
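The failure mode described in this comment can be sketched as follows. This is not the landed patch: `FindFlagValue` is a hypothetical helper, and it assumes the argument array is laid out as for `main`, with `argv[argc]` being a null pointer. The loop starts at `argc`, so the first entry it looks at is that null terminator, and calling `strcmp` on it without the guard is undefined behavior.

```cpp
#include <cstring>

// Hypothetical helper, not the actual Gecko code. Scans argv from the end
// and returns the value following `flag`, or nullptr if it is absent.
// Assumes argv has argc + 1 entries and argv[argc] is null, as for main().
const char* FindFlagValue(int argc, char* const argv[], const char* flag) {
  for (int idx = argc; idx > 0; idx--) {
    // Guard: argv[argc] is null, and strcmp must never see a null pointer.
    if (!argv[idx]) {
      continue;
    }
    if (std::strcmp(argv[idx], flag) == 0) {
      // Only report a value if one actually follows the flag.
      if (idx + 1 < argc && argv[idx + 1]) {
        return argv[idx + 1];
      }
      return nullptr;
    }
  }
  return nullptr;
}
```

Removing the `if (!argv[idx])` guard reproduces the shape of the bug: the very first iteration would hand a null pointer to `strcmp`.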
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Attachment #8974882 - Flags: review?(jmathies)
(In reply to Stephen A Pohl [:spohl] from comment #10)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c02c30a268894768cd2de099d5ed674d48a2af

mozregression --repo try --launch 49c02c30a268894768cd2de099d5ed674d48a2af --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support'
> Decision Log
> GPU_PROCESS	
>	disabled by default: Disabled by default
>	available by user: Enabled via layers.gpu-process.enabled
> WEBRENDER	
>	opt-in by default: WebRender is an opt-in feature
>	available by user: Force enabled by pref

ps auxf
> /tmp/tmpUpsOiS/firefox/firefox -contentproc -parentBuildID 20180511043951 -greomni /tmp/tmpUpsOiS/firefox/omni.ja -appomni /tmp/tmpUpsOiS/firefox/browser/omni.ja -appdir /tmp/tmpUpsOiS/firefox/browser  10728 true gpu

Thanks, the fix works! :)
Attachment #8974882 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8ed4fa46e959caf384a535aacc01d8e1a466fb2
Bug 1460127: Fix null-deref to ensure that GPU processes launch and init properly. r=jimm
Blocks: 1347793
https://hg.mozilla.org/mozilla-central/rev/f8ed4fa46e95
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Flags: qe-verify+
I successfully reproduced the issue on an Intel integrated video chipset (it did not reproduce on AMD), using the STR from Comment 0, on Nightly (2018-05-08) under Ubuntu 16.04 (x64).
The issue no longer reproduces on the latest Nightly 63.0a1 (2018-07-15) or on Firefox 62.0b8.
Status: RESOLVED → VERIFIED
Flags: qe-verify+