gnome - start dbus prior to run-task
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(firefox72 fixed)
People
(Reporter: egao, Assigned: egao)
Attachments
(1 file, 1 obsolete file)
In a Slack group chat, @glandium noted that gnome-session-binary was having issues launching in the Docker image:
gnome-session-binary[30]: ERROR: Failed to connect to system bus: Could not connect: No such file or directory
aborting...
He was able to get gnome-shell and gnome-session-binary to launch after making some changes to the Docker image:
https://hg.mozilla.org/try/rev/d9346bb7139dd8020f77ef0c196b66a1fe175c63
It appears that having dbus running on the system is critical for GNOME Shell based systems such as Debian 10: as the error above shows, gnome-session-binary aborts when it cannot connect to the system bus. Without dbus running, an unmanageable number of failures crop up on almost all test suites.
For example, compare the following:
Without dbus: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=fcc9dddb3631762427e13bfb2f1964e5d43e204c&selectedJob=274976282
With dbus: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=446b072c64d67ee4cb2938fba495504efbc0158e&selectedJob=276318105
The number of failures in mochitest-browser-chrome-e10s- drops from 40+ to just 1. Execution time of the chunk also drops from 60 min to 17 min.
The goal of this investigation is to find a way to start the dbus daemon in the Docker container without affecting existing tasks (e.g. Ubuntu 16.04).
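The error quoted above ("Failed to connect to system bus: Could not connect: No such file or directory") is what one would expect when the system bus socket does not exist. As a rough sketch, a pre-flight check along these lines could confirm that inside a container. The path `/var/run/dbus/system_bus_socket` is the conventional default and is an assumption here, as is the helper name `system_bus_available`; neither comes from this bug.

```python
import os
import stat

# Conventional default location of the D-Bus system bus socket.
# Assumption: the image uses the default location; this bug does not say so.
DEFAULT_SYSTEM_BUS_SOCKET = "/var/run/dbus/system_bus_socket"


def system_bus_available(path=DEFAULT_SYSTEM_BUS_SOCKET):
    """Return True if `path` exists and is a Unix domain socket.

    This only checks that the socket file is present; it does not prove
    that a dbus daemon is actually listening on it.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path (or unreadable parent directory): no usable bus
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    print("system bus socket present:", system_bus_available())
```

A check like this is cheap enough to run at the start of a task, so a missing bus shows up as one clear message rather than as scattered test failures.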
Comment 1 (Assignee)•5 years ago
Various approaches were tried:
privileged container
Did not work.
Linux64/opt build succeeds but test container cannot be created.
[taskcluster 2019-11-13 02:00:49.123Z] Image 'public/image.tar.zst' from task 'RwP-gkzjROKtCb_nYCADQg' loaded. Using image ID sha256:9d7efd65177d3339cbe56007775038a93d43bd3fbaa55c517a20f093defba7c0.
[taskcluster:error] Docker configuration could not be created. This may indicate an authentication error when validating scopes necessary for running the task.
Error: Cannot run task using docker privileged mode. Worker must be enabled to allow running of privileged tasks.
at runAsPrivileged (/home/ubuntu/docker_worker/src/lib/task.js:122:11)
at Task.dockerConfig (/home/ubuntu/docker_worker/src/lib/task.js:327:26)
at Task.run (/home/ubuntu/docker_worker/src/lib/task.js:887:33)
at Task.start (/home/ubuntu/docker_worker/src/lib/task.js:700:17)
at TaskListener.runTaskset (/home/ubuntu/docker_worker/src/lib/task_listener.js:519:9)
at async Promise.all (index 0)
[taskcluster 2019-11-13 02:00:49.285Z] Unsuccessful task run with exit code: -1 completed in 340.784 seconds
entrypoint in docker image
Did not work.
The container simply runs the dbus initialization and exits:
[taskcluster 2019-11-12 20:29:17.784Z] === Task Starting ===
[....] Starting system message bus: dbus ok 8
[taskcluster 2019-11-12 20:29:18.909Z] === Task Finished ===
[taskcluster 2019-11-12 20:29:19.028Z] Artifact "public/logs/" not found at "/builds/worker/workspace/logs/"
[taskcluster 2019-11-12 20:29:19.235Z] Artifact "public/test_info/" not found at "/builds/worker/workspace/build/blobber_upload_dir/"
[taskcluster 2019-11-12 20:29:19.415Z] Successful task run with exit code: 0 completed in 323.077 seconds
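The log above is consistent with an entrypoint that starts the service and then returns without ever running the task command. A common pattern is for an entrypoint to do its setup and then replace itself with the command it was given. The following is a minimal sketch of that pattern, not the change that was tried in this bug; `run_entrypoint`, `start_service`, and `exec_fn` are hypothetical names introduced for illustration.

```python
import os


def run_entrypoint(argv, start_service, exec_fn=os.execvp):
    """Run setup, then hand control to the task command in `argv`.

    `start_service` is any callable that performs the setup step
    (hypothetical here; for example, starting dbus).
    `exec_fn` defaults to os.execvp and is injectable for testing.
    """
    start_service()
    if not argv:
        # Nothing to hand off to: the container would exit right after setup
        raise SystemExit("entrypoint: no command given")
    # Replace this process with the task command so the container keeps running it
    exec_fn(argv[0], argv)
```

In a real entrypoint script this would be called with `sys.argv[1:]`, so that whatever command the worker passes to the container still runs after setup.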
Comment 2•5 years ago
Running systemd required a privileged container, but starting dbus alone doesn't. It only requires root, which run-task drops before running linux-test.sh.
Comment 3 (Assignee)•5 years ago
multiple CMD calls in docker
Did not work.
Docker image can be built successfully, and Linux64/opt build also completes successfully.
However, the task then ended with an exception, for a reason that was not identified.
Modify the command in run_task.py
Did not work.
It causes a docker failure:
[taskcluster 2019-11-14 00:32:53.845Z] Image 'public/image.tar.zst' from task 'RwP-gkzjROKtCb_nYCADQg' loaded. Using image ID sha256:9d7efd65177d3339cbe56007775038a93d43bd3fbaa55c517a20f093defba7c0.
[taskcluster 2019-11-14 00:32:54.102Z] === Task Starting ===
[taskcluster:error] Failure to properly start execution environment.
[taskcluster:error] (HTTP code 400) unexpected - OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"bash -c etc/init.d/dbus start;\": stat bash -c etc/init.d/dbus start;: no such file or directory": unknown
[taskcluster 2019-11-14 00:32:55.474Z] === Task Finished ===
[taskcluster 2019-11-14 00:32:55.588Z] Artifact "public/test_info/" not found at "/builds/worker/workspace/build/blobber_upload_dir/"
[taskcluster 2019-11-14 00:32:55.800Z] Artifact "public/logs/" not found at "/builds/worker/workspace/logs/"
[taskcluster 2019-11-14 00:32:55.931Z] Unsuccessful task run with exit code: -1 completed in 297.666 seconds
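The OCI error above shows the whole string `bash -c etc/init.d/dbus start;` being treated as the name of a single executable, which suggests the command reached the runtime as one argument rather than as separate ones. Below is a small sketch of the difference, using Python's `subprocess` as a stand-in for the container runtime (an assumption made purely for illustration). `wrap_in_shell` and `try_exec` are hypothetical helpers, not part of run_task.py.

```python
import subprocess


def wrap_in_shell(snippet):
    """Build an argv list that runs `snippet` through bash."""
    # Each argument is its own list element; only the snippet is one string
    return ["bash", "-c", snippet]


def try_exec(argv):
    """Try to run `argv` without a shell and report what happened."""
    try:
        result = subprocess.run(argv, check=False)
    except FileNotFoundError as exc:
        # argv[0] is looked up as an executable name, spaces and all
        return f"exec failed: {exc.strerror}"
    return f"exited with {result.returncode}"


if __name__ == "__main__":
    # One list element holding the whole command line, as in the log above
    print(try_exec(["bash -c etc/init.d/dbus start;"]))
    # Properly split form. Note the absolute path; the log shows a
    # relative `etc/init.d/dbus`, which depends on the working directory.
    print(wrap_in_shell("/etc/init.d/dbus start"))
```

The first call fails in the same way as the task did, because no executable with that literal name exists.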
Comment 4 (Assignee)•5 years ago
What appears to have worked:
Modify run_task to initialize dbus prior to task start
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=276262415&revision=e7f2244a4595a70d6e028be1d85a580c5d8d32f0
https://hg.mozilla.org/try/rev/d711e96fcc08259c02485b18493c58f342f5b455
This approach produced proper test outcomes.
test-linux64/opt-mochitest-e10s-1 running on Debian 10 produces a green run, which matches what :glandium observed with his manual edits to the task.
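The approach described in this comment can be sketched roughly as follows. This is not the actual patch (the linked revision has that); it only illustrates the ordering: start dbus while the process is still root, and skip the step when the init script is absent so that images without it are unaffected. The function name `maybe_start_dbus`, its parameters, and the skip conditions are assumptions for illustration. The init script path is based on the one visible in the log in comment 3.

```python
import os
import subprocess

# Based on the path seen in the comment 3 log; assumed to be absolute here.
DBUS_INIT_SCRIPT = "/etc/init.d/dbus"


def maybe_start_dbus(init_script=DBUS_INIT_SCRIPT, euid=None, runner=subprocess.run):
    """Start dbus if we are still root and the image ships the init script.

    Intended to be called before privileges are dropped, since starting
    the system bus requires root. `euid` and `runner` are injectable so
    the logic can be exercised without root or a real dbus install.
    """
    if euid is None:
        euid = os.geteuid()
    if euid != 0:
        return "skipped: not root"
    if not os.path.exists(init_script):
        # Images without the init script are left untouched
        return "skipped: no init script"
    runner([init_script, "start"], check=True)
    return "started"
```

Returning a status string instead of raising keeps the step optional, which fits the stated goal of not affecting existing tasks.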
Comment 5 (Assignee)•5 years ago
(In reply to Mike Hommey [:glandium] (high latency) from comment #2)
Running systemd required a privileged container, but starting dbus alone doesn't. It only requires root, which run-task drops before running linux-test.sh.
Yes. I arrived at a solution that may not be ideal, but it works and does not impact other existing tasks (e.g. Ubuntu 16.04, toolchain, etc.). I did want to document the approaches I tried.
Comment 6 (Assignee)•5 years ago
Comment 7 (Assignee)•5 years ago
It turns out that the fix for this is also applicable to Ubuntu 18.04.
Note the following pushes:
Ubuntu 18.04 (straight, fixes only to ubuntu1604-test-system-setup.sh): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=fcc9dddb3631762427e13bfb2f1964e5d43e204c
Ubuntu 18.04 (with dbus fix applied): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=446b072c64d67ee4cb2938fba495504efbc0158e
For example, note the test mochitest-e10s-1.
The original 18.04 push has the same number of failures, in the same tests, as the original Debian 10 image.
The fixed 18.04 push is now green, matching the results produced by the fixed Debian 10 image.
It is the same story with the test mochitest-chrome-1proc-1:
Original 18.04 push has multiple failures;
Fixed 18.04 push is green.
For other cases such as mochitest-browser-chrome-e10s-1, failures remain but:
Original 18.04 push has 50+ failures;
Fixed 18.04 push has 1 failure.
Performance-wise, the differences between Ubuntu 18.04 and Debian 10 are minor, with some chunks in favor of Ubuntu 18.04 and others in favor of Debian 10.
The only clear win for Debian 10 is the Docker test image build task, where Debian 10 is 50% faster than Ubuntu 18.04.
Comment 8•5 years ago
Comment 10•5 years ago
bugherder
Comment 11 (Assignee)•5 years ago
Changed title to refer to GNOME, as the issue is not specific to Debian 10.