Closed Bug 1596526 Opened 5 years ago Closed 5 years ago

gnome - start dbus prior to run-task

Categories

(Firefox Build System :: Task Configuration, task)

Tracking

(firefox72 fixed)

RESOLVED FIXED
mozilla72

People

(Reporter: egao, Assigned: egao)

Attachments

(1 file, 1 obsolete file)

In a Slack group chat, @glandium noted that gnome-session-binary was failing to launch in the docker image:

gnome-session-binary[30]: ERROR: Failed to connect to system bus: Could not connect: No such file or directory
aborting...

He got gnome-shell and gnome-session-binary to launch after making some changes to the docker image:
https://hg.mozilla.org/try/rev/d9346bb7139dd8020f77ef0c196b66a1fe175c63

It appears that a running dbus is critical on GNOME Shell based systems such as Debian 10: as the error above shows, gnome-session-binary aborts when it cannot connect to the system bus. Without dbus running, an unmanageable number of failures crops up in almost all test suites.

For example, compare the following:
Without dbus: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=fcc9dddb3631762427e13bfb2f1964e5d43e204c&selectedJob=274976282
With dbus: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=446b072c64d67ee4cb2938fba495504efbc0158e&selectedJob=276318105

The number of failures in mochitest-browser-chrome-e10s- drops from 40+ to just 1, and the execution time of the chunk drops from 60 min to 17 min.

The goal of this investigation is to find a way to start the dbus daemon in the docker container without affecting existing tasks (e.g. Ubuntu1604).

Various approaches were tried:

privileged container

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=813b0071a058668ba7c7d689580338d3d86cb285

Did not work.
The Linux64/opt build succeeds, but the test container cannot be created because the worker is not enabled to run privileged tasks:

[taskcluster 2019-11-13 02:00:49.123Z] Image 'public/image.tar.zst' from task 'RwP-gkzjROKtCb_nYCADQg' loaded.  Using image ID sha256:9d7efd65177d3339cbe56007775038a93d43bd3fbaa55c517a20f093defba7c0.

[taskcluster:error] Docker configuration could not be created.  This may indicate an authentication error when validating scopes necessary for running the task. 
 Error: Cannot run task using docker privileged mode.  Worker must be enabled to allow running of privileged tasks.
    at runAsPrivileged (/home/ubuntu/docker_worker/src/lib/task.js:122:11)
    at Task.dockerConfig (/home/ubuntu/docker_worker/src/lib/task.js:327:26)
    at Task.run (/home/ubuntu/docker_worker/src/lib/task.js:887:33)
    at Task.start (/home/ubuntu/docker_worker/src/lib/task.js:700:17)
    at TaskListener.runTaskset (/home/ubuntu/docker_worker/src/lib/task_listener.js:519:9)
    at async Promise.all (index 0)
[taskcluster 2019-11-13 02:00:49.285Z] Unsuccessful task run with exit code: -1 completed in 340.784 seconds

entrypoint in docker image

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=733a6527dd7574210d2ab189ceebd38502963570

Did not work.
The container simply runs the dbus initialization and exits without running the task:

[taskcluster 2019-11-12 20:29:17.784Z] === Task Starting ===
[....] Starting system message bus: dbus ok 8
[taskcluster 2019-11-12 20:29:18.909Z] === Task Finished ===
[taskcluster 2019-11-12 20:29:19.028Z] Artifact "public/logs/" not found at "/builds/worker/workspace/logs/"
[taskcluster 2019-11-12 20:29:19.235Z] Artifact "public/test_info/" not found at "/builds/worker/workspace/build/blobber_upload_dir/"
[taskcluster 2019-11-12 20:29:19.415Z] Successful task run with exit code: 0 completed in 323.077 seconds
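
One plausible reading of the log above is that the entrypoint ran the dbus init step and returned without handing control to the task command. A minimal sketch of an entrypoint that does both is below. It is an illustration only: the function name run_entrypoint and the way the task command is forwarded are assumptions, not the code from the try push.

```python
import subprocess
import sys
from typing import Sequence


def run_entrypoint(init_cmd: Sequence[str], task_cmd: Sequence[str]) -> int:
    """Run a one-off init step, then the task command.

    Returns the task command's exit code, or 0 if no task command was given.
    """
    # Init step; in the image this might be ["/etc/init.d/dbus", "start"]
    # (assumed path, taken from the command seen in the logs in this bug).
    subprocess.run(list(init_cmd), check=False)

    if not task_cmd:
        # Nothing forwarded: the container would exit right after init,
        # which is what the failed attempt looked like.
        return 0

    # Hand over to the real task and propagate its exit status.
    return subprocess.run(list(task_cmd), check=False).returncode
```

A script using this would typically call sys.exit(run_entrypoint(init, sys.argv[1:])). A production entrypoint would more likely replace itself with the task via os.execvp so that signals and the exit status go straight to the task process.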

Running systemd required a privileged container, but starting dbus alone doesn't. It only requires root, which run-task drops before running linux-test.sh.

multiple CMD calls in docker

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=276262415&revision=259673e8569038f7b243527d4f6f24d4d43b2447

Did not work.
The docker image builds successfully, and the Linux64/opt build also completes successfully.
However, the task then ended in an exception, and the cause was not identified.

Modify the command in run_task.py

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=276262415&revision=abd1c83e335675a49e10b2321696adaae396dbe9

Did not work.
Docker fails to start the container because it treats the whole command string as the name of a single executable:

[taskcluster 2019-11-14 00:32:53.845Z] Image 'public/image.tar.zst' from task 'RwP-gkzjROKtCb_nYCADQg' loaded.  Using image ID sha256:9d7efd65177d3339cbe56007775038a93d43bd3fbaa55c517a20f093defba7c0.
[taskcluster 2019-11-14 00:32:54.102Z] === Task Starting ===

[taskcluster:error] Failure to properly start execution environment.

[taskcluster:error] (HTTP code 400) unexpected - OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"bash -c etc/init.d/dbus start;\": stat bash -c etc/init.d/dbus start;: no such file or directory": unknown 
[taskcluster 2019-11-14 00:32:55.474Z] === Task Finished ===
[taskcluster 2019-11-14 00:32:55.588Z] Artifact "public/test_info/" not found at "/builds/worker/workspace/build/blobber_upload_dir/"
[taskcluster 2019-11-14 00:32:55.800Z] Artifact "public/logs/" not found at "/builds/worker/workspace/logs/"
[taskcluster 2019-11-14 00:32:55.931Z] Unsuccessful task run with exit code: -1 completed in 297.666 seconds
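
The OCI error above is what happens when a shell command line is passed as one argv element instead of being split into separate arguments. The sketch below shows the difference using shlex from the Python standard library. The helper names are hypothetical, and the absolute path /etc/init.d/dbus in the second command is an assumption (the log shows the path without a leading slash).

```python
import shlex
from typing import List


def as_single_element(command: str) -> List[str]:
    """Wrap the whole string as argv[0]; exec then looks for a binary with that name."""
    return [command]


def as_argv(command: str) -> List[str]:
    """Split a shell-style command line into separate argv elements."""
    return shlex.split(command)


# Matches the failing exec in the log: one element containing spaces.
broken = as_single_element("bash -c etc/init.d/dbus start;")

# bash receives "-c" and the script as separate arguments.
fixed = as_argv("bash -c '/etc/init.d/dbus start'")
```

Here broken has one element, so the runtime tries to stat a file literally named "bash -c etc/init.d/dbus start;", while fixed has three elements: bash, -c, and the script to run.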

What appears to have worked:

modify run_task to initialize dbus prior to task start

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=276262415&revision=e7f2244a4595a70d6e028be1d85a580c5d8d32f0
https://hg.mozilla.org/try/rev/d711e96fcc08259c02485b18493c58f342f5b455

This approach produced correct test outcomes.

test-linux64/opt-mochitest-e10s-1 running on debian10 produces a green run, matching what :glandium observed after manually editing the task.
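
As noted elsewhere in this bug, starting dbus only requires root, and run-task drops root before running linux-test.sh, so the start has to happen before the privilege drop. The sketch below illustrates that ordering on a POSIX system. It is not the actual run-task change: the function name and the injectable run/getuid/setgid/setuid parameters are hypothetical and exist only so the ordering can be exercised without being root.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def start_service_then_drop(
    service_cmd: Sequence[str],
    uid: int,
    gid: int,
    run: Callable[[List[str]], object] = subprocess.check_call,
    getuid: Callable[[], int] = os.getuid,
    setgid: Callable[[int], None] = os.setgid,
    setuid: Callable[[int], None] = os.setuid,
) -> bool:
    """Start a root-only service, then drop to uid/gid.

    Returns True if the service was started, False if we were not root.
    """
    if getuid() != 0:
        # Not root: the service cannot be started and there is nothing to drop.
        return False

    # Must happen while still root, e.g. ["/etc/init.d/dbus", "start"] (assumed).
    run(list(service_cmd))

    # Group first: after setuid() the process can no longer change its gid.
    setgid(gid)
    setuid(uid)
    return True
```

A real implementation would also need to handle supplementary groups and decide what to do when the service fails to start; both are left out here.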

(In reply to Mike Hommey [:glandium] (high latency) from comment #2)

Running systemd required a privileged container, but starting dbus alone doesn't. It only requires root, which run-task drops before running linux-test.sh.

Yes. I arrived at a solution that may not be ideal, but it works and does not affect other existing tasks (e.g. Ubuntu 1604, toolchain, etc.). I also wanted to document the approaches I tried.

Attachment #9108856 - Attachment description: Bug 1596526 - if linux, start dbus in run-task to support debian 10 tests → Bug 1596526 - if linux, start dbus as root in run-task

It turns out that this fix also applies to Ubuntu 18.04.

Note the following pushes:
Ubuntu 18.04 (straight, fixes only to ubuntu1604-test-system-setup.sh): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=fcc9dddb3631762427e13bfb2f1964e5d43e204c
Ubuntu 18.04 (with dbus fix applied): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=446b072c64d67ee4cb2938fba495504efbc0158e

For example, take the test mochitest-e10s-1.
The original 18.04 push has the same number of failures, in the same tests, as the original Debian 10 image.
The fixed 18.04 push is now green, just like the fixed Debian 10 image.

It is the same story with the test mochitest-chrome-1proc-1.
Original 18.04 push has multiple failures;
Fixed 18.04 push is green.

For other cases such as mochitest-browser-chrome-e10s-1, failures remain but:
Original 18.04 push has 50+ failures;
Fixed 18.04 push has 1 failure.


Performance-wise, the variation between Ubuntu 18.04 and Debian 10 is minor, with some chunks favoring Ubuntu 18.04 and others favoring Debian 10.

The only clear win for Debian 10 is the docker test image build task, where Debian 10 is 50% faster than Ubuntu 18.04.

Attachment #9108856 - Attachment description: Bug 1596526 - if linux, start dbus as root in run-task → Bug 1596526 - if docker-worker and test system is linux, prepend call to initialize dbus
Attachment #9108856 - Attachment description: Bug 1596526 - if docker-worker and test system is linux, prepend call to initialize dbus → Bug 1596526 - use docker entrypoint to initialize dbus for Ubuntu 18.04 test image and set desktop session parameters accordingly in test-linux.sh
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f03b4ad0f580
use docker entrypoint to initialize dbus for Ubuntu 18.04 test image and set desktop session parameters accordingly in test-linux.sh r=jmaher
Attachment #9110422 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Assignee: nobody → egao

Changed the title to refer to GNOME, since the issue is not specific to Debian 10.

Summary: debian 10 - start dbus prior to run-task → gnome - start dbus prior to run-task