Closed
Bug 1460127
Opened 6 years ago
Closed 6 years ago
Failed to connect GPU process
Categories
(Core :: IPC, defect)
Tracking
()
VERIFIED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | --- | unaffected |
firefox62 | --- | verified |
firefox63 | --- | verified |
People
(Reporter: tgnff242, Assigned: spohl)
References
(Blocks 1 open bug, )
Details
(Keywords: nightly-community, regression)
Crash Data
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180508231737

Steps to reproduce:
1. Create a new profile on a Linux system.
2. Open about:config, create a new boolean pref named layers.gpu-process.enabled, and set it to true.
3. Restart Firefox.

Actual results:
Check the Graphics section of about:support. The GPU process hasn't been initialized.

Expected results:
The GPU process isn't enabled by default on Linux, but this is a regression, so I thought it would be better to report it.

Mozregression result points to Bug 1366808.
3:29.86 INFO: No more inbound revisions, bisection finished.
3:29.86 INFO: Last good revision: fb2b32cae6e816e4f3da342203e8ea31bb840c13
3:29.86 INFO: First bad revision: eb036f55167d9369d12c131ae019a1a633986009
3:29.86 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fb2b32cae6e816e4f3da342203e8ea31bb840c13&tochange=eb036f55167d9369d12c131ae019a1a633986009

It registers as a crash. Example report: https://crash-stats.mozilla.com/report/index/1a9be4bc-9279-43ea-bf89-663970180509
Updated•6 years ago
|
Blocks: 1366808
Crash Signature: [@ libc-2.27.so@0x135df6 ]
Has Regression Range: --- → yes
Has STR: --- → yes
Component: Untriaged → IPC
Flags: needinfo?(spohl.mozilla.bugs)
Keywords: regression
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
Assignee | ||
Comment 1•6 years ago
|
||
Thanks for the report. Will take a look tomorrow morning.
Comment 2•6 years ago
|
||
Debian Testing, KDE, Radeon RX480
"Failed to connect GPU process". WebRender does not start. Nightly falls back to OpenGL. For stability reasons I prefer using the GPU process.
> failures [GFX1-]: Failed to connect GPU process
>
> Decision Log
> GPU_PROCESS
> disabled by default: Disabled by default
> force_enabled by user: User force-enabled via pref
> failed by runtime: Failed to connect GPU process
> WEBRENDER
> opt-in by default: WebRender is an opt-in feature
> available by user: Force enabled by pref
> unavailable by runtime: GPU Process is disabled
> OMTP
> disabled by default: Disabled by default
Severity: normal → major
Status: UNCONFIRMED → NEW
Crash Signature: [@ libc-2.27.so@0x135df6 ] → [@ libc-2.27.so@0x135df6 ]
[@ libc-2.27.so@0x134d96 ]
status-firefox62:
--- → affected
Ever confirmed: true
Keywords: nightly-community
Assignee | ||
Comment 3•6 years ago
|
||
The issue here is that the GPU process does not know the parent's build ID, which results in the GPU process exiting to prevent buildID mismatches. It looks like the GPU process is launched by a child content process, which seems to indicate that the parent process properly sent its build ID to the child. If this is the case, the child is currently failing to pass the parent's build ID to the GPU process during launch. I haven't been able to verify this yet since the call stacks in the reports are not fully symbolicated and I don't currently know where in the code this is occurring. Still looking into it.
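The build ID guard described in this comment can be pictured with a minimal sketch. This is not the Gecko implementation: the function name `BuildIDsMatch` and its parameters are hypothetical, and the only assumption taken from the comment is that a child process which does not know the parent's build ID, or sees a different one, must not continue.

```cpp
#include <string>

// Hypothetical helper, not actual Gecko code.
// Returns true only when the child knows the parent's build ID and it
// equals the child's own build ID. A missing (null) parent build ID is
// treated the same as a mismatch, so the caller would exit in both cases.
bool BuildIDsMatch(const char* aParentBuildID, const std::string& aOwnBuildID) {
  if (!aParentBuildID) {
    return false;  // parent build ID unknown: refuse to continue
  }
  return aOwnBuildID == aParentBuildID;
}
```

A caller would typically check this once during process startup and bail out early when it returns false.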
Comment 4•6 years ago
|
||
Concretization of comment 2:
1. Fresh Profile on Linux: Create layers.gpu-process.enabled;true. Set gfx.webrender.all to true.
2. Restart and open about:support: "Compositing" is now mostly OpenGL instead of WebRender. The failure log now contains "Failed to connect GPU process".

(Just want to mention it in case they have a common underlying issue: layers.gpu-process.enabled;true + gfx.webrender.all;true + extensions.webextensions.remote;true = Crash = bug 1406230. But OOP Webextensions (bug 1357487) do work with WebRender if the GPU process is left disabled.)
Assignee | ||
Comment 5•6 years ago
|
||
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)
> Concretization of comment 2:
> 1. Fresh Profile on Linux: Create layers.gpu-process.enabled;true. Set gfx.webrender.all to true.
> 2. Restart and open about:support: "Compositing" is now mostly OpenGL instead of WebRender. The failure log now contains "Failed to connect GPU process".

I just built locally, set the two prefs, but I don't seem to be able to reproduce the issue. Compositing shows "WebRender" and the GPU process seems to be up and running. What is the "failure log" that you're referring to, and how might I be able to find it?

Also, I've checked how the processes are launched via `ps` and all content processes appear to get the -parentBuildID param passed. From what I can tell, the GPU process is launched as `-contentproc` as well.
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(jan)
Comment 6•6 years ago
|
||
mozregression --launch 2018-05-10 --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support'
Flags: needinfo?(jan)
Comment 7•6 years ago
|
||
I just used https://superuser.com/a/222924 to check whether a GPU process was started at all, because ps / ksysguard do not show one. With a good Nightly I see
> type=EXECVE msg=audit(11.05.2018 03:50:01.053:3538411) : argc=12 a0=/tmp/tmpdQEV1r/firefox/firefox a1=-contentproc a2=-greomni a3=/tmp/tmpdQEV1r/firefox/omni.ja a4=-appomni a5=/tmp/tmpdQEV1r/firefox/browser/omni.ja a6=-appdir a7=/tmp/tmpdQEV1r/firefox/browser a9=4401 a10=true a11=gpu
but not with a bad one. (I have never used this tool before, so I might be wrong.)

-----

> $ mozregression --launch 2018-05-09 --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support' -B debug
> **********
> You should use a config file. Please use the --write-config command line flag to help you create one.
> **********
>
> 0:03.54 INFO: Downloading build from: https://queue.taskcluster.net/v1/task/PDxyFCc4TIKyrTdP0vNTqg/runs/0/artifacts/public%2Fbuild%2Ftarget.tar.bz2
> ===== Downloaded 100% =====
> 0:21.10 INFO: Running mozilla-inbound build built on 2018-05-09 23:56:12.353000, revision a8c2b116
> 0:40.24 INFO: Launching /tmp/tmpgpswm3/firefox/firefox
> 0:40.24 INFO: Application command: /tmp/tmpgpswm3/firefox/firefox -profile /tmp/tmpNwnia5.mozrunner
> 0:40.26 INFO: application_buildid: 20180509233616
> 0:40.26 INFO: application_changeset: a8c2b11687fcb76bbd256314abeeba72af760146
> 0:40.26 INFO: application_name: Firefox
> 0:40.26 INFO: application_repository: https://hg.mozilla.org/integration/mozilla-inbound
> 0:40.26 INFO: application_version: 62.0a1
> 0:42.78 INFO: [5600, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2910
> 0:43.06 INFO: [5600, Main Thread] WARNING: GLX_swap_control unsupported, ASAP mode may still block on buffer swaps.: file /builds/worker/workspace/build/src/gfx/gl/GLContextProviderGLX.cpp, line 219
> 0:43.09 INFO: [GLX] window 4ea has VisualID 0x21
> 0:43.11 INFO: [5600, GLXVsyncThread] WARNING: robust_buffer_access_behavior marked as unsupported: file /builds/worker/workspace/build/src/gfx/gl/GLContextFeatures.cpp, line 915
> 0:43.15 INFO: [5600, Gecko_IOThread] WARNING: pipe error (45): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
> 0:43.26 INFO: ++DOCSHELL 0x7fe9f27c0000 == 1 [pid = 5600] [id = {f677b474-e92a-4115-be5f-4df9e1d733a9}]
> 0:43.26 INFO: ++DOMWINDOW == 1 (0x7fe9f0987600) [pid = 5600] [serial = 1] [outer = (nil)]
> 0:43.34 INFO: ++DOMWINDOW == 2 (0x7fe9f0a63c00) [pid = 5600] [serial = 2] [outer = 0x7fe9f0987600]
> 0:43.64 INFO: ++DOCSHELL 0x7fe9da1a7000 == 2 [pid = 5600] [id = {a02b5a22-b0c7-45da-8a12-e42504b3f7f2}]
> 0:43.64 INFO: ++DOMWINDOW == 3 (0x7fe9da1c1800) [pid = 5600] [serial = 3] [outer = (nil)]
> 0:43.64 INFO: ++DOMWINDOW == 4 (0x7fe9da2e9c00) [pid = 5600] [serial = 4] [outer = 0x7fe9da1c1800]
> 0:43.66 INFO: [5600, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/worker/workspace/build/src/netwerk/protocol/res/SubstitutingProtocolHandler.cpp, line 338
> 0:43.67 INFO: [GFX1-]: Failed to connect GPU process
Assignee | ||
Comment 8•6 years ago
|
||
I was able to determine that a GPU process was launched using https://github.com/brendangregg/perf-tools.git. The command line was:

firefox-bin -contentproc -parentBuildID 20180510100205 -greomni /home/spohl/Desktop/firefox/omni.ja -appomni /home/spohl/Desktop/firefox/browser/omni.ja -appdir /home/spohl/Desktop/firefox/browser 58334 true gpu

As can be seen, the parentBuildID was properly passed to this process. However, it crashes in GPUProcessImpl::Init during strcmp. My suspicion right now is that aArgc is not valid for some reason. I'm building firefox locally to investigate further.
Assignee | ||
Comment 9•6 years ago
|
||
Well, this turned out to be a failure on multiple fronts. The code for the `for` loop in GPUProcessImpl::Init[1] (introduced in bug 1366808) was copied from ContentChild::Init[2]. However, the copied code did not include the null check[3] at the top of the `for` loop. The null check is necessary because we are currently dereferencing aArgv[aArgc] in strcmp, which is expected to be null. This should fail on other platforms as well, but may depend on the way XRE_InitCommandLine initializes the command line arguments. aArgc may be less than the actual number of arguments in aArgv.

Furthermore, the loop was changed in bug 1447246 and I did not pick up on the changes before landing bug 1366808. This has been corrected as well.

[1] https://dxr.mozilla.org/mozilla-central/rev/0cd106a2eb78aa04fd481785257e6f4f9b94707b/gfx/ipc/GPUProcessImpl.cpp#35
[2] https://dxr.mozilla.org/mozilla-central/rev/6cffa8738ca5/dom/ipc/ContentProcess.cpp#119
[3] https://dxr.mozilla.org/mozilla-central/rev/6cffa8738ca5/dom/ipc/ContentProcess.cpp#120-122
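For readers following along, the null check discussed in this comment can be illustrated with a small sketch. It is not the patch that landed and not the code at the links above: `FindParentBuildID` is a hypothetical helper, and the loop shape (counting down from aArgc) is an assumption chosen so that the first entry visited is aArgv[aArgc], the slot that is expected to be null.

```cpp
#include <cstring>

// Hypothetical helper, not actual Gecko code.
// Scans the argument vector for "-parentBuildID" and returns the entry
// that follows it, or nullptr if the flag is absent.
// Assumption: aArgv is a null-terminated array, as with a normal argv.
const char* FindParentBuildID(int aArgc, char* aArgv[]) {
  for (int idx = aArgc; idx > 0; idx--) {
    // Skip null entries. Without this check, the first iteration would
    // pass aArgv[aArgc] (normally the null terminator) to strcmp.
    if (!aArgv[idx]) {
      continue;
    }
    if (std::strcmp(aArgv[idx], "-parentBuildID") == 0) {
      // The next slot holds either the value or the null terminator.
      return aArgv[idx + 1];
    }
  }
  return nullptr;
}
```

Dropping the `if (!aArgv[idx])` guard from this sketch reproduces the kind of null dereference inside strcmp that the crash signatures in this bug point at.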
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Attachment #8974882 -
Flags: review?(jmathies)
Assignee | ||
Comment 10•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c02c30a268894768cd2de099d5ed674d48a2af
Comment 11•6 years ago
|
||
(In reply to Stephen A Pohl [:spohl] from comment #10)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c02c30a268894768cd2de099d5ed674d48a2af

mozregression --repo try --launch 49c02c30a268894768cd2de099d5ed674d48a2af --pref gfx.webrender.all:true layers.gpu-process.enabled:true startup.homepage_welcome_url:'about:support'

> Decision Log
> GPU_PROCESS
> disabled by default: Disabled by default
> available by user: Enabled via layers.gpu-process.enabled
> WEBRENDER
> opt-in by default: WebRender is an opt-in feature
> available by user: Force enabled by pref

ps auxf
> /tmp/tmpUpsOiS/firefox/firefox -contentproc -parentBuildID 20180511043951 -greomni /tmp/tmpUpsOiS/firefox/omni.ja -appomni /tmp/tmpUpsOiS/firefox/browser/omni.ja -appdir /tmp/tmpUpsOiS/firefox/browser 10728 true gpu

Thanks, the fix works! :)
Updated•6 years ago
|
Attachment #8974882 -
Flags: review?(jmathies) → review+
Updated•6 years ago
|
status-firefox60:
--- → unaffected
status-firefox61:
--- → unaffected
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → unaffected
Assignee | ||
Comment 12•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8ed4fa46e959caf384a535aacc01d8e1a466fb2
Bug 1460127: Fix null-deref to ensure that GPU processes launch and init properly. r=jimm
Comment 13•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f8ed4fa46e95
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Updated•6 years ago
|
Flags: qe-verify+
Comment 14•6 years ago
|
||
I successfully reproduced the issue on an Intel integrated video chipset (it did not reproduce on AMD), using the STR from Comment 0, on Nightly (2018-05-08) under Ubuntu 16.04 (x64). The issue no longer reproduces on the latest Nightly 63.0a1 (2018-07-15) or on Firefox 62.0b8.