Update desktop1604-test image to use a newer release
Categories: Firefox Build System :: Task Configuration, task
Tracking: Not tracked
People: Reporter: egao, Assigned: egao
References: Depends on 4 open bugs
Attachments: 7 files, 1 obsolete file
Update the docker image currently used for Ubuntu 16.04 tests to use Ubuntu 18.04.
Comment 1•5 years ago
I know you spent time on this, but I don't think we should do this. We should instead switch the test image to Debian. Why? Because we can actually reproducibly build Debian images, but we can't do that for Ubuntu. Not being able to do that means things can break whenever some unrelated change triggers a new docker image build, which happened a lot until we finally bailed and used a hack (bug 1503756).
Comment 2•5 years ago (Assignee)
:glandium - thanks for your input. I did spend some time on this, but not a significant amount (it was on the side while I did other things).
To confirm, you are proposing that we migrate our linux32/linux64 testing to run on Debian, correct? That doesn't sound like too much work, though I might be underestimating the difficulty here.
:jmaher - what do you think? Migrating from one Linux distro to another is not a small change. Who signs off on these sorts of decisions?
Comment 3•5 years ago
I agree getting a reliable Ubuntu image via docker is more difficult than it should be. The question is what we need for test coverage. It is my understanding that Firefox is the default browser for Ubuntu distributions; it is hard to tell for Fedora. I wasn't able to easily get a breakdown of usage by Linux distribution, which could help drive the decision.
A few things to consider:
- is it easy to get x86 libraries installed in Fedora as we currently do in Ubuntu?
- are there issues getting DRM installed in order to test media playback?
- what version of Fedora would we use?
- how often do we need to update it? - Fedora only supports software for ~13 months, whereas Ubuntu LTS supports for 5 years
- we should have parity with our hardware installs (driver support, install scripts) - need relops
- possibly bitbar android docker container could be the same? maybe packet.net host os?
Currently we upgrade every few years, which isn't ideal, but to upgrade on a more regular schedule we would need a team to officially support this, both as a docker image and as hardware installs.
:dhouse, in terms of installation on physical hardware in the datacenter (currently the moonshots) do you have preferences or concerns with either Ubuntu or Fedora?
Comment 4•5 years ago
Why are you talking of Fedora when I was talking about Debian?
Comment 5•5 years ago
Sorry, Debian has a 2-year LTS support window, which is better than Fedora but not as good as Ubuntu. The rest of my questions still stand.
Comment 6•5 years ago
Debian actually has longer LTS support, and it's always possible to stay on older versions; for example, we're currently using Debian 7 for build tasks.
Comment 7•5 years ago
Also note that Ubuntu's LTS updates are what have broken us on many occasions, because they like to upgrade stuff. Because of that, in practice, we haven't updated the still-supported Ubuntu 16.04 image in more than 6 months.
Comment 8•5 years ago
BTW, extended LTS support for Debian 7 finished this year, after 6 years. The just released Debian 10 is set to be supported for 5 years.
Comment 9•5 years ago (Assignee)
One of the issues :jmaher and I identified in our discussion is that, as far as I am aware, there isn't a single point of authority that decides which distribution, and which version of that distribution, to use.
I see two factors that will affect the decision to use a particular distribution/version of an operating system:
- ease of automation - deterministic builds, package availability, driver stability, etc.
- representativeness - based on usage numbers for the given distribution/version
Personally, I am inclined to value the latter, which would mean sacrificing some of the advantages of using Debian (noted by :glandium). Several factors play a role in my reasoning.
Popular usage
Let me preface this by noting that Linux distribution statistics are difficult, if not impossible, to find, so much of what is known about the market share of various distributions is a combination of anecdotal evidence, sample polls and general interest.
With that said, it's generally agreed that Ubuntu represents the dominant Linux distribution. This is backed up by various data sources:
- https://thecloudmarket.com/stats#/by_platform_definition
- https://trends.google.com/trends/explore?q=%2Fm%2F03x5qm,Debian,Mint%20Linux
It isn't known whether 18.04 holds the majority within Ubuntu, but that is likely a fair assumption since it's the latest LTS release. I was not able to retrieve data on this, despite spending an hour or so with telemetry data.
Familiarity
Current CI seems to have been designed and written assuming Ubuntu, so there's a lot of familiarity with using Ubuntu as the base image. Debian is similar, but different enough that it may cause issues.
Driver support
Admittedly, I am not 100% certain about this point, but if tests are ever run on physical hardware or GPU-accelerated machines, differences in driver availability might be problematic.
Package support
There are Debian equivalents to some of the required packages, but Ubuntu packages them nicely in metapackages. For example, see multiverse and ubuntu-restricted-extras.
I'll be starting a discussion on mozilla.dev.platform in the coming days to gather feedback and comments regarding this proposal.
Comment 10•5 years ago (Assignee)
I think, from the lack of response at https://groups.google.com/forum/#!topic/mozilla.dev.platform/HCYoPiBUi8M, I take it that there isn't really a strong case to be made for switching the distribution from Ubuntu despite its pitfalls.
I will wait another week before making a call, given that there seems to be no one responsible for making a final decision.
Comment 11•5 years ago
(In reply to Edwin Gao (:egao) from comment #10)
I think, from the lack of response at https://groups.google.com/forum/#!topic/mozilla.dev.platform/HCYoPiBUi8M, I take it that there isn't really a strong case to be made for switching the distribution from Ubuntu despite its pitfalls.
I'd draw the opposite conclusion from that: no one has strong opinions, and glandium already expressed his here (which I agree with).
I'm ni?ing :RyanVM for input from release management.
Comment 12•5 years ago
I don't have a strong opinion on this either assuming Debian is able to run all the test suites we need it to run.
In general, I think our standards for Linux testing have been lower under the assumption that most users are getting their builds from distros anyway and the number of combinations of components in the wild is mind-bogglingly huge. Also, I don't recall the decision for changing the base OS for Linux tests in the past being one that went outside the various teams responsible for maintaining our automated test infrastructure.
So I guess my tl;dr is to say that going with whatever makes the most sense and is easiest to maintain going forward sounds like the reasonable option here and I don't see any reason to avoid the change based on what's been said here and (not) said on dev.platform.
Comment 13•5 years ago (Assignee)
Intent
My intent is to spend two weeks (maximum) to bring the Debian tests to a similar state as Ubuntu 18.04.
Current state of Debian push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=cd2e7656a270507ca2beb6a7373c0ec6b334c3eb
Current state of Ubuntu 18.04: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=35e5ff46e184fff0305a891a216c31e41aac6895&selectedJob=257973741
Reason
After a similar amount of effort spent bootstrapping Ubuntu 18.04 and Debian 10 images, there is a discrepancy in how suitable the resulting images are for running tests.
Some of the challenges faced are:
- the Debian image can be built, but lots of dependencies are missing (e.g. alsa-base, ubuntu-restricted-extras)
- tweaks are needed in the test harnesses and scripts (e.g. test-linux.sh)
- there is no window manager
Once the underlying issues (resulting from switching to Debian) are resolved, the image can be considered on an equal footing with Ubuntu 18.04.
What may happen
If I am not able to produce a working image that is ready to run tests within two weeks of full-time work, all work will revert to Ubuntu 18.04, which provides a nearly ready-to-use image after a couple of hours of work.
Comment 14•5 years ago
FWIW, ubuntu-restricted-extras doesn't provide much that is useful to Firefox. Only libavcodec-extra, that it depends on, is.
Firefox doesn't support alsa anymore, so alsa-base shouldn't be necessary.
Comment 15•5 years ago (Assignee)
So far, mixed outcomes.
Initial focus has been on getting the mochitest suites to run tests. The initial baseline push https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=257490869&revision=cd2e7656a270507ca2beb6a7373c0ec6b334c3eb shows that nearly all mochitest suites fail due to either:
- the pactl list short modules subprocess call returning an error, or
- the pactl load-module module-null-sink call in test-linux.sh returning an error
The former scenario means the mochitest test harness has initialized, parsed the manifest and performed TEST-SKIP on annotated tests. We're getting further with this scenario.
The latter scenario is something I cannot seem to resolve. I've ensured that pulseaudio is installed, but it seems to have difficulty initializing.
An example push can be seen here: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=10fa9e6091f5592dd290312dfea25e9b1a0494af
Comment 16•5 years ago
pulseaudio apparently starts, and the first pactl load-module module-null-sink works. So something might be killing pulseaudio later.
It's also worth noting that a few things are missing: dbus-launch (package dbus-x11), gnome-keyring-daemon (package gnome-keyring), and compiz (but I would advise using something other than compiz, because it's not in prominent use anymore).
The script is also setting DESKTOP_SESSION=ubuntu, not sure what side effect that might have...
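Missing packages like these only surface as failures deep inside a test run. A minimal Python sketch of an up-front check, assuming the executables named above (plus pactl) are expected on PATH; the helper `missing_binaries` and the list `REQUIRED_BINARIES` are hypothetical and not part of test-linux.sh or the harness:

```python
import shutil

# Executables the test image is assumed to need (illustrative list based on
# the packages mentioned above, plus pactl for the audio setup).
REQUIRED_BINARIES = ("dbus-launch", "gnome-keyring-daemon", "pactl")


def missing_binaries(names=REQUIRED_BINARIES, path=None):
    """Return the names that shutil.which cannot find.

    `path` is an os.pathsep-separated search path; None means use $PATH.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("missing binaries: " + ", ".join(missing))
```

Checking by executable rather than by package name keeps the check the same on Debian and Ubuntu, where the package names differ.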
Comment 17•5 years ago
Also, you'll probably want to do something about bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8). The locales package is installed, but nothing is set in /etc/locale.gen. So you need something like echo en_US.UTF-8 UTF-8 > /etc/locale.gen ; dpkg-reconfigure --frontend=noninteractive locales.
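The echo above overwrites /etc/locale.gen. A minimal Python sketch of the same step that only appends the entry when it is missing, assuming Debian's default path and "locale charset" line format; `ensure_locale_gen` is a hypothetical helper, and dpkg-reconfigure --frontend=noninteractive locales still has to run afterwards to actually generate the locale:

```python
from pathlib import Path


def ensure_locale_gen(path="/etc/locale.gen", entry="en_US.UTF-8 UTF-8"):
    """Ensure `entry` is present (uncommented) in locale.gen.

    Returns True if the file was created or modified, False if the entry
    was already there. Existing lines, including commented ones, are kept.
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    # A commented line such as "# en_US.UTF-8 UTF-8" does not count.
    if entry in (line.strip() for line in lines):
        return False
    lines.append(entry)
    p.write_text("\n".join(lines) + "\n")
    return True
```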
Comment 18•5 years ago
pulseaudio is exiting on its own:
I: [pulseaudio] module-suspend-on-idle.c: Sink null idle for too long, suspending ...
D: [pulseaudio] sink.c: null: suspend_cause: (none) -> IDLE
D: [pulseaudio] sink.c: null: state: IDLE -> SUSPENDED
D: [pulseaudio] source.c: null.monitor: suspend_cause: (none) -> IDLE
D: [pulseaudio] source.c: null.monitor: state: IDLE -> SUSPENDED
D: [pulseaudio] core.c: Hmm, no streams around, trying to vacuum.
I: [pulseaudio] module-device-restore.c: Synced.
I: [pulseaudio] core.c: We are idle, quitting...
I: [pulseaudio] main.c: Daemon shutdown initiated.
There's an --exit-idle-time option that could be passed to avoid this.
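When chasing an unexpected exit like this, it can help to pull the shutdown lines out of a captured pulseaudio log automatically. A minimal Python sketch; the marker strings are copied from the log excerpt above, and `exit_reasons` is a hypothetical helper:

```python
# Message fragments PulseAudio logged when it shut itself down (see log above).
IDLE_EXIT_MARKERS = ("We are idle, quitting", "Daemon shutdown initiated")


def exit_reasons(log_text):
    """Return the log lines that explain why the pulseaudio daemon exited."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in IDLE_EXIT_MARKERS)
    ]
```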
Comment 19•5 years ago (Assignee)
I found that the Debian image was having issues setting the LC_ALL=en_US.UTF-8 locale, which caused other cascading failures.
After installing required dependencies such as gnome-keyring and dbus-x11, and making the docker container generate and then set the locales, at least a couple of the mochitest subsuites now run to completion.
Comment 20•5 years ago (Assignee)
I will try moving the pulseaudio-related initialization in test-linux.sh back into a function, and adding the extra argument --exit-idle-time with a value of 600.
Comment 21•5 years ago (Assignee)
That seems to help, though a bunch of media tests still fail due to pactl not being initialized when the tests are run.
However, just prior to the pactl failure in the mochitest harness, there is this peculiar error that is not observed on Ubuntu 16.04:
(gst-launch-1.0:866): GStreamer-CRITICAL **: 05:09:26.190: gst_object_unref: assertion '((GObject *) object)->ref_count > 0' failed
I have reason to suspect that gstreamer1.0, which is the current version, is not compatible or has significantly changed behavior.
Currently I am attempting to restore some of the Debian jessie repositories so that I can install gstreamer 0.10 and see how the tests behave.
Comment 22•5 years ago
You should probably use -1 as a value for exit-idle-time.
Comment 23•5 years ago (Assignee)
Since the last comment, the following was done:
- enabled the jessie repositories in apt
- installed the 0.10 version of gstreamer and relevant libraries
Initially, the mochitest harness was not able to locate the gst-launch-0.10 binary, because in a couple of places (such as this line) it was looking for gst-launch-0.1 (note the missing 0), leading to:
[task 2019-08-01T18:07:13.280Z] 18:07:13 INFO - usage: runtests.py [options] [test paths]
[task 2019-08-01T18:07:13.280Z] 18:07:13 INFO - runtests.py: error: Missing gst-launch-{0.1,1.0}, required for --use-test-media-devices
[task 2019-08-01T18:07:13.336Z] 18:07:13 ERROR - Return code: 2
[task 2019-08-01T18:07:13.338Z] 18:07:13 ERROR - No checks run.
[task 2019-08-01T18:07:13.339Z] 18:07:13 ERROR - No suite end message was emitted by this harness.
[task 2019-08-01T18:07:13.339Z] 18:07:13 INFO - TinderboxPrint: mochitest-mochitest-plain-chunked<br/><em class="testfail">T-FAIL</em>
[task 2019-08-01T18:07:13.340Z] 18:07:13 ERROR - # TBPL FAILURE #
[task 2019-08-01T18:07:13.341Z] 18:07:13 WARNING - setting return code to 2
[task 2019-08-01T18:07:13.341Z] 18:07:13 ERROR - The mochitest suite: mochitest-plain-chunked ran with return status: FAILURE
Once the string is corrected, the issue of gstreamer not being found is resolved.
That brings the status back to mochitest suites failing with pactl not found errors.
I will take another look at the dependencies and how pulseaudio is being initialized.
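The root cause here was a single wrong character in a binary name. A minimal Python sketch of the lookup, assuming the harness only needs the path of the first available gst-launch binary and that 0.10 is preferred over 1.0 (an assumption based on the order in the error message above); `find_gst_launch` is hypothetical, not the harness's actual code:

```python
import shutil

# Candidate binary names. The 0.10 series installs "gst-launch-0.10", so a
# lookup for "gst-launch-0.1" never matches anything.
GST_LAUNCH_CANDIDATES = ("gst-launch-0.10", "gst-launch-1.0")


def find_gst_launch(candidates=GST_LAUNCH_CANDIDATES, path=None):
    """Return the full path of the first gst-launch binary found, or None."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None
```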
Comment 24•5 years ago (Assignee)
So far, I haven't had any luck getting pulseaudio/pactl to remain initialized when the test harness is run.
When lucky, pactl initialization passes and the harness begins running tests:
[task 2019-08-06T20:04:38.368Z] 20:04:38 INFO - Running manifest: browser/components/extensions/test/mochitest/mochitest.ini
[task 2019-08-06T20:04:38.765Z] 20:04:38 INFO - Setting pipeline to PAUSED ...
[task 2019-08-06T20:04:38.765Z] 20:04:38 INFO - Pipeline is PREROLLING ...
[task 2019-08-06T20:04:38.766Z] 20:04:38 INFO - Pipeline is PREROLLED ...
[task 2019-08-06T20:04:38.766Z] 20:04:38 INFO - Setting pipeline to PLAYING ...
[task 2019-08-06T20:04:38.766Z] 20:04:38 INFO - New clock: GstSystemClock
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Got EOS from element "pipeline0".
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Execution ended after 33416930 ns.
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Setting pipeline to PAUSED ...
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Setting pipeline to READY ...
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Setting pipeline to NULL ...
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - Freeing pipeline ...
[task 2019-08-06T20:04:38.802Z] 20:04:38 INFO - /usr/bin/pactl
[task 2019-08-06T20:04:38.809Z] 20:04:38 INFO - 0 module-device-restore
[task 2019-08-06T20:04:38.810Z] 20:04:38 INFO - 1 module-stream-restore
[task 2019-08-06T20:04:38.810Z] 20:04:38 INFO - 2 module-card-restore
[task 2019-08-06T20:04:38.811Z] 20:04:38 INFO - 3 module-augment-properties
[task 2019-08-06T20:04:38.811Z] 20:04:38 INFO - 4 module-udev-detect
[task 2019-08-06T20:04:38.812Z] 20:04:38 INFO - 6 module-native-protocol-unix
[task 2019-08-06T20:04:38.812Z] 20:04:38 INFO - 7 module-default-device-restore
[task 2019-08-06T20:04:38.813Z] 20:04:38 INFO - 8 module-rescue-streams
[task 2019-08-06T20:04:38.813Z] 20:04:38 INFO - 9 module-always-sink
[task 2019-08-06T20:04:38.814Z] 20:04:38 INFO - 11 module-intended-roles
[task 2019-08-06T20:04:38.814Z] 20:04:38 INFO - 12 module-suspend-on-idle
[task 2019-08-06T20:04:38.815Z] 20:04:38 INFO - 13 module-position-event-sounds
[task 2019-08-06T20:04:38.815Z] 20:04:38 INFO - 14 module-filter-heuristics
[task 2019-08-06T20:04:38.816Z] 20:04:38 INFO - 15 module-filter-apply
[task 2019-08-06T20:04:38.816Z] 20:04:38 INFO - 16 module-switch-on-port-available
[task 2019-08-06T20:04:38.816Z] 20:04:38 INFO - 17 module-null-sink
[task 2019-08-06T20:04:38.822Z] 20:04:38 INFO - 0 module-device-restore
[task 2019-08-06T20:04:38.823Z] 20:04:38 INFO - 1 module-stream-restore
[task 2019-08-06T20:04:38.823Z] 20:04:38 INFO - 2 module-card-restore
[task 2019-08-06T20:04:38.823Z] 20:04:38 INFO - 3 module-augment-properties
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 4 module-udev-detect
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 6 module-native-protocol-unix
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 7 module-default-device-restore
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 8 module-rescue-streams
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 9 module-always-sink
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 11 module-intended-roles
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 12 module-suspend-on-idle
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 13 module-position-event-sounds
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 14 module-filter-heuristics
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 15 module-filter-apply
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 16 module-switch-on-port-available
[task 2019-08-06T20:04:38.826Z] 20:04:38 INFO - 17 module-null-sink
[task 2019-08-06T20:04:39.161Z] 20:04:39 INFO - pk12util: PKCS12 IMPORT SUCCESSFUL
Normally, however, the following occurs:
[task 2019-08-06T21:23:01.106Z] 21:23:01 INFO - Running manifest: browser/components/extensions/test/mochitest/mochitest.ini
[task 2019-08-06T21:23:01.489Z] 21:23:01 INFO - Setting pipeline to PAUSED ...
[task 2019-08-06T21:23:01.489Z] 21:23:01 INFO - Pipeline is PREROLLING ...
[task 2019-08-06T21:23:01.490Z] 21:23:01 INFO - Pipeline is PREROLLED ...
[task 2019-08-06T21:23:01.490Z] 21:23:01 INFO - Setting pipeline to PLAYING ...
[task 2019-08-06T21:23:01.491Z] 21:23:01 INFO - New clock: GstSystemClock
[task 2019-08-06T21:23:01.527Z] 21:23:01 INFO - Got EOS from element "pipeline0".
[task 2019-08-06T21:23:01.527Z] 21:23:01 INFO - Execution ended after 33426459 ns.
[task 2019-08-06T21:23:01.527Z] 21:23:01 INFO - Setting pipeline to PAUSED ...
[task 2019-08-06T21:23:01.528Z] 21:23:01 INFO - Setting pipeline to READY ...
[task 2019-08-06T21:23:01.528Z] 21:23:01 INFO - Setting pipeline to NULL ...
[task 2019-08-06T21:23:01.528Z] 21:23:01 INFO - Freeing pipeline ...
[task 2019-08-06T21:23:01.528Z] 21:23:01 INFO - /usr/bin/pactl
[task 2019-08-06T21:23:01.535Z] 21:23:01 INFO - Connection failure: Connection refused
[task 2019-08-06T21:23:01.536Z] 21:23:01 INFO - pa_context_connect() failed: Connection refused
[task 2019-08-06T21:23:01.536Z] 21:23:01 INFO - Traceback (most recent call last):
[task 2019-08-06T21:23:01.537Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3176, in <module>
[task 2019-08-06T21:23:01.537Z] 21:23:01 INFO - sys.exit(cli())
[task 2019-08-06T21:23:01.537Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3172, in cli
[task 2019-08-06T21:23:01.538Z] 21:23:01 INFO - return run_test_harness(parser, options)
[task 2019-08-06T21:23:01.538Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3157, in run_test_harness
[task 2019-08-06T21:23:01.538Z] 21:23:01 INFO - result = runner.runTests(options)
[task 2019-08-06T21:23:01.539Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2660, in runTests
[task 2019-08-06T21:23:01.539Z] 21:23:01 INFO - res = self.runMochitests(options, tests_in_manifest)
[task 2019-08-06T21:23:01.540Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2440, in runMochitests
[task 2019-08-06T21:23:01.540Z] 21:23:01 INFO - result = self.doTests(options, testsToRun)
[task 2019-08-06T21:23:01.540Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2721, in doTests
[task 2019-08-06T21:23:01.540Z] 21:23:01 INFO - devices = findTestMediaDevices(self.log)
[task 2019-08-06T21:23:01.541Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 814, in findTestMediaDevices
[task 2019-08-06T21:23:01.541Z] 21:23:01 INFO - if not null_sink_loaded():
[task 2019-08-06T21:23:01.541Z] 21:23:01 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 810, in null_sink_loaded
[task 2019-08-06T21:23:01.542Z] 21:23:01 INFO - [pactl, 'list', 'short', 'modules'])
[task 2019-08-06T21:23:01.542Z] 21:23:01 INFO - File "/usr/lib/python2.7/subprocess.py", line 223, in check_output
[task 2019-08-06T21:23:01.543Z] 21:23:01 INFO - raise CalledProcessError(retcode, cmd, output=output)
[task 2019-08-06T21:23:01.543Z] 21:23:01 INFO - subprocess.CalledProcessError: Command '['/usr/bin/pactl', 'list', 'short', 'modules']' returned non-zero exit status 1
[task 2019-08-06T21:23:01.571Z] 21:23:01 ERROR - Return code: 1
The condition that differentiates the lucky instance from the normal instance is unknown. Even within the same revision on try, one instance of the test may successfully initialize pactl while another instance fails and throws the subprocess error.
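One way to make the lucky and unlucky runs easier to compare is to separate parsing the `pactl list short modules` output from actually running pactl. A minimal Python sketch, assuming the whitespace-separated "index name arguments" layout that pactl prints (visible as `17 module-null-sink` in the lucky log above); `null_sink_listed` is a hypothetical helper, not the harness's null_sink_loaded:

```python
def null_sink_listed(output, module="module-null-sink"):
    """Return True if `pactl list short modules` output contains `module`.

    Each line is assumed to be: index, module name, then optional arguments,
    separated by whitespace. Only the name column is compared, so a module
    that merely mentions the null sink in its arguments does not count.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == module:
            return True
    return False
```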
Comment 25•5 years ago
Your attempts that I found on try don't set the exit timeout for pulseaudio.
Comment 26•5 years ago (Assignee)
(In reply to Mike Hommey [:glandium] from comment #25)
Your attempts that I found on try don't set the exit timeout for pulseaudio.
I've had many pushes; in the recent ones (today) I removed the exit timeout to go back to a known state where it somewhat worked.
Comment 27•5 years ago
The default idle exit timeout is too short. See comment 22.
Comment 28•5 years ago (Assignee)
The problem does not appear to be that the exit timeout is too short, since I did try -1 as a value last week, but it failed to keep pactl and pulseaudio initialized. For an example from last week, see https://hg.mozilla.org/try/rev/2e099b2a9f8d71e8c7e36b5abd778f4402b116f8.
What worked was :tomprince's suggestion to use 0 as the timeout in test-linux.sh, possibly combined with modifications to mochitest/runtests.py.
Changes involved in that file:
- wrap the subprocess.check_call call in a try/except block that returns a boolean
- an additional call to start, daemonize and set the exit timer prior to pactl load-module module-null-sink
Ideally, instead of initializing again in runtests.py, the better practice would be to check the status using pulseaudio --check, and execute required actions based on the outcome of that call.
Regardless, with this change I've gotten some mochitest suites to green status: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=a55741c38bce60fe642d78b9213703296869bfc5
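The runtests.py changes and the pulseaudio --check idea described in this comment can be sketched roughly as follows. This is a minimal Python sketch, not the actual patch: `command_succeeds` and `load_null_sink` are hypothetical names, the `run` parameter exists only so the logic can be exercised without a running daemon, and this version reports failure when pulseaudio --check fails rather than starting a second daemon:

```python
import subprocess


def command_succeeds(argv):
    """Run argv and return True on exit status 0.

    Returns False (instead of raising) when the command fails or the binary
    is missing, e.g. when pactl reports "Connection refused".
    """
    try:
        subprocess.check_call(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, OSError):
        return False
    return True


def load_null_sink(pulseaudio="pulseaudio", pactl="pactl", run=command_succeeds):
    """Load module-null-sink only if `pulseaudio --check` finds a daemon.

    `pulseaudio --check` exits with status 0 when a daemon is running.
    """
    if not run([pulseaudio, "--check"]):
        return False
    return run([pactl, "load-module", "module-null-sink"])
```

Returning a boolean lets the caller decide whether a missing daemon should fail the run or trigger a restart, instead of the traceback seen in comment 24.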
Comment 29•5 years ago (Assignee)
Using mozilla-central 2f9fcfd57416a8424ff12a11c9734ee9a2fb6ed0 as the baseline, roughly half the tests are green.
This will provide a good starting point for filing bugs for developers to address.
Comment 31•5 years ago
Note bug 1562627 is a similar-ish pulseaudio issue that's already happening on Ubuntu 16.04.
Comment 32•5 years ago
(In reply to Ryan VanderMeulen [:RyanVM][PTO Aug 5-9] from comment #12)
I don't have a strong opinion on this either assuming Debian is able to run all the test suites we need it to run.
In general, I think our standards for Linux testing have been lower under the assumption that most users are getting their builds from distros anyway and the number of combinations of components in the wild is mind-bogglingly huge. Also, I don't recall the decision for changing the base OS for Linux tests in the past being one that went outside the various teams responsible for maintaining our automated test infrastructure.
So I guess my tl;dr is to say that going with whatever makes the most sense and is easiest to maintain going forward sounds like the reasonable option here and I don't see any reason to avoid the change based on what's been said here and (not) said on dev.platform.
+1. Sorry I missed the NI about Ubuntu vs. {insert base distro here} on hardware.
I think we picked Ubuntu for the large user base and the simplicity of getting the drivers and packages. But I don't think we are using anything Ubuntu-specific, and we could move upstream to Debian for the hardware as well (I'd like to use the same distro for docker-worker and hardware so that we can share knowledge and config).
jmaher, what do you think? Would it benefit us to switch for perf tests to match the docker worker?
Comment 33•5 years ago
I would prefer all our linux test machines use the same image. This means:
- 32-bit Linux AWS
- 64-bit Linux AWS
- Linux hardware in the datacenter
- (possibly bitbar)
- (possibly the packet.net base image for emulators)
Comment 35•5 years ago (Assignee)
I have a prototype patch attached to this bug that adds the necessary piping for Debian 10 to be built and used on CI as a drop-in replacement for Ubuntu 16.04.
Please note that since this is still a prototype, I'm sure there are inefficiencies and unnecessary steps taken in the scripts, but this patch will have our CI infra in a state where Debian 10 can be built, used as test images and have some test suites pass.
My goal is to have this patch in mozilla-central so that I can focus on greening test suites, filing bugs for developers and directing them to make a few changes to enable them to test on Debian 10. Once a stable, green baseline is achieved the Dockerfile and test system setup script can be optimized so that unnecessary packages, files and such are not included.
Comment 37•5 years ago (Assignee)
:kinetik - I'm not sure who else I could reach out to regarding the pulseaudio issues I'm still running into; I cannot seem to get it to start reliably and stay running in my test image, even after the fixes in bug 1572311 (setting pulseaudio --exit-idle-time=-1). I've kept that bug closed since this is not a GTest-specific question.
GTest - https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=fb9298e8a6a5811b82cdb60aa8448ec6088d597a
Other suites (mochitest in particular): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=263525493&revision=3618c4ddabcba50c66b937a2be770354455a941b
When test-linux.sh initializes pulseaudio, the following is logged:
[task 2019-08-26T19:34:43.889Z] + pulseaudio --fail --daemonize --start -vvvv
[task 2019-08-26T19:34:43.912Z] D: [pulseaudio] conf-parser.c: Parsing configuration file '/etc/pulse/client.conf'
[task 2019-08-26T19:34:43.912Z] D: [pulseaudio] conf-parser.c: Parsing configuration file '/etc/pulse/client.conf.d/00-disable-autospawn.conf'
[task 2019-08-26T19:34:43.952Z] I: [pulseaudio] main.c: Daemon startup successful.
[task 2019-08-26T19:34:43.953Z] + pulseaudio --check
[task 2019-08-26T19:34:43.960Z] + '[' 0 -eq 0 ']'
[task 2019-08-26T19:34:43.960Z] + echo 'Pulseaudio successfully initialized'
[task 2019-08-26T19:34:43.960Z] Pulseaudio successfully initialized
[task 2019-08-26T19:34:43.960Z] + pactl load-module module-null-sink
[task 2019-08-26T19:34:43.969Z] 16
When the test is then run in the harness, despite my added code to check for pulseaudio and call the same initialization as in test-linux.sh if it is not running, the following is the output:
[task 2019-08-26T19:35:58.029Z] 19:35:58 INFO - Running manifest: accessible/tests/browser/browser.ini
[task 2019-08-26T19:35:58.306Z] 19:35:58 INFO - Setting pipeline to PAUSED ...
[task 2019-08-26T19:35:58.306Z] 19:35:58 INFO - libv4l2: error getting pixformat: Invalid argument
[task 2019-08-26T19:35:58.307Z] 19:35:58 INFO - Pipeline is PREROLLING ...
[task 2019-08-26T19:35:58.307Z] 19:35:58 INFO - Pipeline is PREROLLED ...
[task 2019-08-26T19:35:58.307Z] 19:35:58 INFO - Setting pipeline to PLAYING ...
[task 2019-08-26T19:35:58.307Z] 19:35:58 INFO - New clock: GstSystemClock
[task 2019-08-26T19:35:58.335Z] 19:35:58 INFO - Got EOS from element "pipeline0".
[task 2019-08-26T19:35:58.335Z] 19:35:58 INFO - Execution ended after 33401858 ns.
[task 2019-08-26T19:35:58.335Z] 19:35:58 INFO - Setting pipeline to PAUSED ...
[task 2019-08-26T19:35:58.335Z] 19:35:58 INFO - Setting pipeline to READY ...
[task 2019-08-26T19:35:58.335Z] 19:35:58 INFO - Setting pipeline to NULL ...
[task 2019-08-26T19:35:58.336Z] 19:35:58 INFO - Freeing pipeline ...
[task 2019-08-26T19:35:58.344Z] 19:35:58 INFO - Connection failure: Connection refused
[task 2019-08-26T19:35:58.344Z] 19:35:58 INFO - pa_context_connect() failed: Connection refused
[task 2019-08-26T19:35:58.351Z] 19:35:58 INFO - D: [pulseaudio] conf-parser.c: Parsing configuration file '/etc/pulse/client.conf'
[task 2019-08-26T19:35:58.352Z] 19:35:58 INFO - D: [pulseaudio] conf-parser.c: Parsing configuration file '/etc/pulse/client.conf.d/00-disable-autospawn.conf'
[task 2019-08-26T19:35:58.353Z] 19:35:58 INFO - N: [pulseaudio] main.c: User-configured server at {689cfabc30776e6bfe2e7477e81eaa6d}unix:/tmp/pulse-qrpFpnpvYwVl/native, which appears to be local. Probing deeper.
[task 2019-08-26T19:35:58.354Z] 19:35:58 INFO - I: [pulseaudio] main.c: Daemon startup successful.
[task 2019-08-26T19:35:58.360Z] 19:35:58 INFO - Connection failure: Connection refused
[task 2019-08-26T19:35:58.360Z] 19:35:58 INFO - pa_context_connect() failed: Connection refused
[task 2019-08-26T19:35:58.360Z] 19:35:58 INFO - Traceback (most recent call last):
[task 2019-08-26T19:35:58.360Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3191, in <module>
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - sys.exit(cli())
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3187, in cli
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - return run_test_harness(parser, options)
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3172, in run_test_harness
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - result = runner.runTests(options)
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2675, in runTests
[task 2019-08-26T19:35:58.361Z] 19:35:58 INFO - res = self.runMochitests(options, tests_in_manifest)
[task 2019-08-26T19:35:58.362Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2454, in runMochitests
[task 2019-08-26T19:35:58.362Z] 19:35:58 INFO - result = self.doTests(options, testsToRun)
[task 2019-08-26T19:35:58.364Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2736, in doTests
[task 2019-08-26T19:35:58.364Z] 19:35:58 INFO - devices = findTestMediaDevices(self.log)
[task 2019-08-26T19:35:58.365Z] 19:35:58 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 830, in findTestMediaDevices
[task 2019-08-26T19:35:58.366Z] 19:35:58 INFO - 'module-null-sink'
[task 2019-08-26T19:35:58.367Z] 19:35:58 INFO - File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
[task 2019-08-26T19:35:58.368Z] 19:35:58 INFO - raise CalledProcessError(retcode, cmd)
[task 2019-08-26T19:35:58.368Z] 19:35:58 INFO - subprocess.CalledProcessError: Command '['/usr/bin/pactl', 'load-module', 'module-null-sink']' returned non-zero exit status 1
[task 2019-08-26T19:35:58.389Z] 19:35:58 ERROR - Return code: 1
Would you have any ideas or hints for me to try out? As noted in bug 1572311, changing the timer to -1 seems to fix GTest, but only about 50% of the time. If you know someone who might be better suited to help, let me know; this pulseaudio issue has been plaguing my efforts since the beginning, as this bug thread shows.
Comment 38•5 years ago
I don't have any specific advice, but starting PA from multiple places seems like the wrong approach. If test-linux.sh is responsible for setting up the environment, PA should be started there, once, and nowhere else. If PA exits after that, the reason should appear in its logs: configure PA to log verbosely to a known location, then examine the logs to find out what caused the exit and address that. We should be able to use the same code path on Debian and Ubuntu. Using the same command line and config everywhere should then normalize any differences in PA behaviour.
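A minimal sketch of the "start once, log verbosely, one command line everywhere" idea follows. It assumes startup is driven from Python, as mozharness is, rather than from test-linux.sh. The helper names and the log path are hypothetical. The flags (--start, --exit-idle-time, --log-level, --log-target=file:PATH) are documented pulseaudio(1) options. The -1 default assumes the "timer" mentioned above is the idle-exit timer. `pactl info` serves only as a readiness probe.

```python
import shutil
import subprocess
import time


def pulseaudio_start_command(log_path, exit_idle_time=-1):
    """Build one canonical pulseaudio command line (hypothetical helper).

    Using the same argument list on every image keeps PA behaviour
    comparable, and the file log records why the daemon exits.
    """
    return [
        "pulseaudio",
        "--start",  # daemonize, unless a daemon is already running
        "--exit-idle-time={}".format(exit_idle_time),  # negative disables idle exit
        "--log-level=4",  # 4 = debug, the most verbose level
        "--log-target=file:{}".format(log_path),  # keep the log for post-mortems
    ]


def wait_until(probe, timeout=10.0, interval=0.5):
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def pulseaudio_ready():
    """Readiness probe: `pactl info` exits 0 once the server accepts clients."""
    if shutil.which("pactl") is None:
        return False
    return subprocess.call(["pactl", "info"],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0


def start_pulseaudio_once(log_path="/tmp/pulseaudio.log"):
    """Start PA a single time and fail loudly if it never becomes reachable."""
    subprocess.check_call(pulseaudio_start_command(log_path))
    if not wait_until(pulseaudio_ready):
        raise RuntimeError("pulseaudio did not come up; see " + log_path)
```

Waiting for `pactl info` to succeed before running `pactl load-module module-null-sink` avoids racing the daemon. That race looks like the "Connection refused" failure in the traceback above.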
Comment 39•5 years ago
Comment 40•5 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4d4271a0ccad add new dockerfile for debian 10 (buster) test image and add necessary piping without switching the main CI pipeline from ubuntu 16.04 r=jmaher
Comment 41•5 years ago
Backed out changeset 4d4271a0ccad (bug 1565332) for Android Mochitest failures on a CLOSED TREE.
Backout link: https://hg.mozilla.org/integration/autoland/rev/f6714b862df3110ec4435d152ddbdbe6f3555527
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=4d4271a0ccad1e5a561ca93bd4144767cf0503a4&selectedJob=264342207
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=264342207&repo=autoland&lineNumber=525
Log snippet:
[task 2019-08-30T22:33:31.146Z] script.py exitcode 127
[taskcluster 2019-08-30T22:33:31.160Z] Exit Code: 1
[taskcluster 2019-08-30T22:33:31.160Z] User Time: 307.239ms
[taskcluster 2019-08-30T22:33:31.160Z] Kernel Time: 113.971ms
[taskcluster 2019-08-30T22:33:31.160Z] Wall Time: 11.883975031s
[taskcluster 2019-08-30T22:33:31.160Z] Result: FAILED
[taskcluster 2019-08-30T22:33:31.160Z] === Task Finished ===
[taskcluster 2019-08-30T22:33:31.161Z] Task Duration: 11.885089203s
[taskcluster 2019-08-30T22:33:31.682Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/Gr-uUzz9TTidG7B3E_go7Q/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-08-29T22:28:11.106Z
[taskcluster:error] exit status 1
Comment 42•5 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a425cc70e4de add new dockerfile for debian 10 (buster) test image and add necessary piping without switching the main CI pipeline from ubuntu 16.04 r=jmaher
Comment 43•5 years ago
bugherder
Assignee | ||
Comment 44•5 years ago
The base debian10 patch is now merged into mozilla-central.
Two odd issues remain:
- failed suites report success (non-zero exit codes are overridden to 0); this is addressed with Attachment 9090622.
- GTest still suffers from intermittent "error initializing cubeb library" errors despite pactl, pulseaudio and pacmd functioning; this occurs in roughly 30% of the pushes.
Other concerns:
- desktop environment differences may be responsible for multiple failures
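The first issue above is the general problem of a wrapper discarding a child's exit status. Below is a minimal sketch of the behaviour a wrapper should have, assuming it runs several suite commands in sequence from Python. The function name is hypothetical, and this is not the fix in Attachment 9090622. It keeps running after a failure but never lets a later success reset the status to 0.

```python
import subprocess
import sys


def run_all(commands):
    """Run every command in order, even after a failure, and return the
    first non-zero exit status seen (0 only if all commands succeeded)."""
    status = 0
    for cmd in commands:
        rc = subprocess.call(cmd)
        if rc != 0:
            print("command {} exited with {}".format(cmd, rc), file=sys.stderr)
            if status == 0:
                status = rc  # keep the first failure; never overwrite it with 0
    return status
```

A caller would end with `sys.exit(run_all(...))` so that the task itself reports the failure.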
Assignee | ||
Comment 45•5 years ago
Comment 46•5 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7c9167e4f8fb add option to toggle linux desktop tests to run on debian 10 r=ahal
Comment 47•5 years ago
bugherder
Comment 48•5 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ccf8a603df02 restore set -e in the debian-specific block in test-linux.sh r=gbrown
Assignee | ||
Comment 49•5 years ago
Based on my work in bug 1572311 comment 24 and a few other try pushes, we can reasonably conclude the following:
- pulseaudio is initialized much more reliably from mozharness than from test-linux.sh
- not all tests require pulseaudio, yet we require it across the board: NEED_PULSEAUDIO is initialized to always be true in this line
So the next steps are:
- investigate whether pulseaudio should be stripped from test-linux.sh;
  - if yes, determine where it should be placed instead:
    - run-task
    - desktop_unittest.py
    - elsewhere
- investigate the list of tests that actually require pulseaudio, and restrict pulseaudio initialization to just those tests
  - this is a task under bug 1518930
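Below is a minimal sketch of the second step: deciding per suite whether to start pulseaudio, instead of a NEED_PULSEAUDIO flag that is always true. The suite names in the set are hypothetical placeholders, and the real list is what the bug 1518930 investigation would produce. The function name and the override parameter are also illustrative assumptions.

```python
# Hypothetical placeholder list; the real set would come out of bug 1518930.
SUITES_NEEDING_PULSEAUDIO = frozenset({"gtest", "mochitest-media"})


def needs_pulseaudio(suite, override=None):
    """Decide whether to start pulseaudio for a given suite.

    override=True restores the old blanket behaviour, override=False
    disables audio entirely, and None means "decide from the list".
    """
    if override is not None:
        return bool(override)
    # Match a suite family by prefix so chunked names like "mochitest-media-2" count.
    return any(suite == name or suite.startswith(name + "-")
               for name in SUITES_NEEDING_PULSEAUDIO)
```

Keeping the override means the blanket behaviour can be restored quickly if a suite turns out to need audio after all.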
Comment 50•5 years ago
bugherder
Assignee | ||
Comment 51•5 years ago
Comment 53•5 years ago
Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/8b5c572d7695 Pin pip to 19.2.3 to avoid breaking docker image. a=bustage-fix CLOSED TREE
Comment 54•5 years ago
bugherder
Comment 55•5 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd0d2f340380 change how pulseaudio is initialized for Debian 10 test image without affecting existing Ubuntu 16.04 process r=jlund,dustin
Comment 56•5 years ago
bugherder
Assignee | ||
Comment 57•5 years ago
I did a try push that runs xrandr during the mochitest runtests.py setup phase to verify that the screen resolution is as expected, and this is the output:
[task 2019-10-21T23:38:16.417Z] xrandr
[task 2019-10-21T23:38:16.417Z] + xrandr
[task 2019-10-21T23:38:16.442Z] xrandr: Failed to get size of gamma for output screen
[task 2019-10-21T23:38:16.442Z] Screen 0: minimum 1 x 1, current 1600 x 1200, maximum 1600 x 1200
[task 2019-10-21T23:38:16.443Z] screen connected 1600x1200+0+0 0mm x 0mm
[task 2019-10-21T23:38:16.444Z] 1600x1200 0.00*
This matches what I expect, yet we still see screen-resolution-related failures scattered across various suites, so something else may be involved.
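Below is a minimal sketch of turning the manual xrandr check into an assertion that could run during setup. It parses the "current W x H" field of the "Screen N:" line. The 1600x1200 default is an assumption taken from the output above, and the helper names are hypothetical. On a worker, the input would be the raw output of `subprocess.check_output(["xrandr"], universal_newlines=True)`, without the [task ...] log prefixes.

```python
import re

# Matches e.g. "Screen 0: minimum 1 x 1, current 1600 x 1200, maximum 1600 x 1200"
_CURRENT_RE = re.compile(r"^Screen \d+:.*?\bcurrent (\d+) x (\d+)", re.MULTILINE)


def current_resolution(xrandr_output):
    """Return (width, height) from the first 'Screen N:' line, or None."""
    match = _CURRENT_RE.search(xrandr_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_resolution(xrandr_output, expected=(1600, 1200)):
    """Raise RuntimeError if the reported resolution differs from expected."""
    actual = current_resolution(xrandr_output)
    if actual != expected:
        raise RuntimeError(
            "unexpected screen resolution: {} (expected {})".format(actual, expected))
    return actual
```

Failing early here would at least separate "wrong resolution" from other causes of the resolution-related test failures.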
Assignee | ||
Comment 58•4 years ago
Comment 59•4 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f75d0d696161 remove gnome-initial-setup and games frmo ubuntu1804 image r=jmaher
Comment 60•4 years ago
bugherder
Assignee | ||
Comment 61•4 years ago
Comment 62•4 years ago
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d65525be59d clean up references to debian10-test in-tree r=jmaher
Comment 63•4 years ago
bugherder
Assignee | ||
Comment 64•4 years ago
This task has been finished.
Description