Closed
Bug 1171033
(tc-linux64-debug)
Opened 9 years ago
Closed 9 years ago
[tracking] Schedule linux64 desktop tests via TaskCluster on try and enable with tier2 status
Categories
(Taskcluster :: General, defect)
Taskcluster
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ahal, Assigned: armenzg)
References
(Blocks 1 open bug)
Details
(Whiteboard: [bb2tc] [milestone1][leave-open])
Attachments
(1 file, 1 obsolete file)
Linux64 opt builds are currently scheduled on try (but hidden). It's a good time to look into scheduling its tests as well.
This will double the load until we can decommission the buildbot scheduled tests, so I'll try and get them scheduled not by default. Apparently this might be hard in TC.
Assignee | ||
Comment 1•9 years ago
|
||
Perhaps we can add a special flag to the TC try parser to activate non-default jobs.
I don't know.
Blocks: bb-to-tc
Reporter | ||
Comment 2•9 years ago
|
||
(In reply to Armen Zambrano G. (:armenzg - Toronto) from comment #1)
> Perhaps we can add a special flag to the TC try parser to activate
> non-default jobs.
> I don't know.
I realized for the time being, I can just keep pushing the change that schedules the tests to try. If it turns out they need a lot of greening up and multiple people start working on it, then we might need to land it permanently, but until then, another problem for another time.
Reporter | ||
Comment 3•9 years ago
|
||
I got mochitests scheduled but they're failing because they can't find the tests.zip. I haven't looked into it yet.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d83136a649e9&exclusion_profile=false
Reporter | ||
Comment 4•9 years ago
|
||
Morgan, do you think this is caused by that artifact problem you mentioned in the meeting? If so, is there a bug I can follow along?
Flags: needinfo?(winter2718)
Reporter | ||
Comment 5•9 years ago
|
||
Talked to Morgan on irc, it's likely the same problem. Tests.zip is an artifact in taskcluster, so it isn't getting uploaded by the builds yet due to bug 1172107.
Depends on: 1172107
Flags: needinfo?(winter2718)
Reporter | ||
Comment 6•9 years ago
|
||
Bug 1171033 - Schedule linux64 mochitest with taskcluster
Reporter | ||
Comment 7•9 years ago
|
||
Quick update:
Tests.zip is now found fine and harness runs, but firefox can't seem to start due to the following error:
firefox: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /home/worker/build/application/firefox/firefox)
I added "sudo apt-get update && sudo apt-get install -y libc6" to the test image's Dockerfile, but that didn't seem to work, or else I'm not testing it properly. Here's the latest try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fa1f76ee1bd&exclusion_profile=false
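For anyone debugging the same error, here is a minimal sketch of the version comparison behind the "GLIBC_2.17 not found" message. The helper names are hypothetical and the "2.15" value is only a stand-in for an older image. On a glibc-based Linux system, `os.confstr("CS_GNU_LIBC_VERSION")` typically returns a string of the form used here.

```python
import re


def parse_glibc_version(text):
    """Extract (major, minor) from strings like 'GLIBC_2.17' or 'glibc 2.15'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def satisfies(available, required):
    """True if the available glibc is at least the required version."""
    return parse_glibc_version(available) >= parse_glibc_version(required)


# Example values only: the binary asks for GLIBC_2.17, and 'glibc 2.15'
# is an assumed stand-in for what an older test image might report.
print(satisfies("glibc 2.15", "GLIBC_2.17"))
print(satisfies("glibc 2.19", "GLIBC_2.17"))
```

Comparing tuples rather than strings avoids treating "2.9" as newer than "2.17".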
Reporter | ||
Comment 8•9 years ago
|
||
Ok I have a better understanding of what needs to happen regarding the glibc issue now. There are two ways to solve the problem:
1. Link to the proper glibc in the tester or builder. The quick hack workaround is to download/unzip 2.17 in the tester and add it at the beginning of LD_LIBRARY_PATH. The proper solution is to fix bug 1179818. Adding that as a blocker because even if I do hack around the issue, it's probably a pre-requisite to scheduling live and un-hidden.
2. Upgrade the tester image to 14.04. This might be tricky however, as I suspect it will cause a lot of test failures. I'd like to not worry about this at the same time as migrating to taskcluster. That being said, it's probably worth at least pushing to try and seeing how things go.
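The LD_LIBRARY_PATH hack from option 1 could look roughly like the sketch below. The helper name and the unpack directory are hypothetical; the only point is that the extra directory goes at the front of the search path handed to the test process.

```python
import os


def env_with_library_path(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Put the extra directory first so it is searched before the
    # directories that were already listed.
    parts = [extra_dir] + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# '/home/worker/glibc-2.17/lib' is a made-up unpack location.
env = env_with_library_path(
    "/home/worker/glibc-2.17/lib", {"LD_LIBRARY_PATH": "/usr/lib"}
)
print(env["LD_LIBRARY_PATH"])
```

The returned dict can be passed as `env=` to `subprocess.Popen` so that only the test process sees the modified path.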
Depends on: 1179818
Updated•9 years ago
|
Component: TaskCluster → General
Product: Testing → Taskcluster
Updated•9 years ago
|
Blocks: q3-bb-tc-migration
Comment 9•9 years ago
|
||
Andrew -- I think this is ready to jump back into. We have green builds! They produce artifacts with simple names, and there's already code in the `mach taskcluster-graph` command for turning those simple names into artifact URLs for test tasks. This is how B2G links its builds and tests.
I think the tricky bit will be getting an operating system image that we're happy with, but IIRC you'd already made some progress on that front?
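As an illustration of turning a simple artifact name into a URL, here is a rough sketch. It is not the code in `mach taskcluster-graph`; the helper name, the base URL and the task id are assumptions for the example.

```python
from urllib.parse import quote

# Assumed base URL for the queue service; adjust for the actual deployment.
QUEUE_BASE = "https://queue.taskcluster.net/v1"


def artifact_url(task_id, artifact_name, base=QUEUE_BASE):
    """Build a URL for a named artifact of a task (hypothetical helper)."""
    # quote() leaves '/' alone by default, so artifact paths stay readable.
    return "{}/task/{}/artifacts/{}".format(
        base, quote(task_id), quote(artifact_name)
    )


# 'abc123' is a made-up task id.
print(artifact_url("abc123", "public/build/test_packages.json"))
```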
Flags: needinfo?(ahalberstadt)
Reporter | ||
Comment 10•9 years ago
|
||
Woohoo, thanks!
I doubt the progress I made on the image will be very relevant anymore since the os has changed, but it wasn't very much anyway. From here, just scheduling and pushing to try to see what happens is the best way to go. I'm trying to finish something else up this quarter, but I'm sure I'll get started on them again in Q4.
Flags: needinfo?(ahalberstadt)
Comment 11•9 years ago
|
||
The *build* image changed. The tests can run in whatever OS you prefer.
Reporter | ||
Comment 12•9 years ago
|
||
Right, for some reason I thought I'd need to change the tester image to match the toolchain of the build.. but that's what you just fixed by making the build image match the test.
In that case yes, I have a slightly modified image I can work off.
Updated•9 years ago
|
No longer blocks: q3-bb-tc-migration
Reporter | ||
Comment 13•9 years ago
|
||
On Friday I got some tests running:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=844213339596
The tests all passed, but the job is still orange because of some 'pactl' (pulseaudio related tool) failure. I don't understand why it's happening. Hopefully just a matter of updating some config in the image.
The good news is that it's mochitest specific, so it shouldn't block other test suites. The code path that causes the failure can also be turned off, if we wanted, by not passing in --use-test-media-devices.
Assignee | ||
Comment 15•9 years ago
|
||
ahal: I'm trying to land this piece of code [1], which allows test jobs to work even if --read-buildbot-config is used (since Buildbot jobs can only be run with that action).
For TC tasks, I assume that the builds specify ['extra']['locations']['build'].
I assume you will be defining the call to Mozharness with --installer-url; nevertheless, I wanted to let you know what I'm doing.
[1] https://reviewboard.mozilla.org/r/21137
if parent_task['extra'].get('locations'):
    # Build tasks generated under TC specify where they upload their builds
    installer_path = parent_task['extra']['locations']['build']
    self.set_artifacts(
        self.url_to_artifact(parent_id, installer_path),
        self.url_to_artifact(parent_id, 'public/build/test_packages.json'),
        self.url_to_artifact(parent_id, 'public/build/target.crashreporter-symbols.zip')
    )
Reporter | ||
Comment 16•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Schedule linux64 mochitest with taskcluster
Updated•9 years ago
|
Whiteboard: [bb2tc] [milestone1]
Reporter | ||
Comment 17•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Schedule linux64 mochitest with taskcluster
Reporter | ||
Comment 18•9 years ago
|
||
Here's a try run with the latest patch that also schedules mochitest bc, dt and gl:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6210c226e2e0
They all fail with the pactl issue (bug 1214194), but tests still seem to be run and passing.
Reporter | ||
Comment 19•9 years ago
|
||
Also note the need for a bit of a creative try syntax. Fixing that is bug 1213314.
Reporter | ||
Comment 20•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Schedule linux64 taskcluster tests on try
Attachment #8621613 -
Attachment description: MozReview Request: Bug 1171033 - Schedule linux64 mochitest with taskcluster → MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Reporter | ||
Comment 21•9 years ago
|
||
Here's a new try run with reftest and xpcshell:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6529e38db5d2
Some notes:
1. A green reftest job! The other jobs all have test failures, but seemingly no harness-wide issues.
2. These are chunked based on debug builds in buildbot.. still have to figure out how to distinguish opt vs debug chunking in the configs (or ignore opt for now like originally planned).
3. This patch will likely need to be refactored once :dustin's image refactor lands. Results may also vary on the new image.
4. There's some info missing in the "Job details" pane that shows up for buildbot jobs (i.e passed/failed/skipped, artifacts uploaded, etc..). Though I think this is a wider taskcluster issue.
Depends on: 1213325
Assignee | ||
Comment 22•9 years ago
|
||
Sweet!
For #4, is there a bug filed that you're aware of? Last quarter we managed to get some decent traction on raising the sheriffability level and I think this is one of those items.
Reporter | ||
Comment 23•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Schedule linux64 taskcluster tests on try
Comment 24•9 years ago
|
||
FYI, at least one of the xpcshell errors you're seeing looks like a misconfigured system encoding:
TEST-UNEXPECTED-FAIL | dom/plugins/test/unit/test_bug455213.js | run_test - [run_test : 75] "Plug-in for testing purposes.â„¢ (हिनà¥\x8Dदी ä¸æ–‡ العربية)" == "Plug-in for testing purposes.™ (हिन्दी 中文 العربية)"
You might just need LANG=en_US.UTF-8 in the environment.
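One plausible mechanism for the garbled string, sketched below: UTF-8 bytes decoded with a single-byte codec. The choice of cp1252 is an assumption, since the codec actually in play on the test image isn't known from the log.

```python
# The trademark sign is written as an escape so the example does not
# depend on the encoding of this text itself.
expected = "Plug-in for testing purposes.\u2122"

raw = expected.encode("utf-8")   # the bytes as they would sit on disk
garbled = raw.decode("cp1252")   # decoded with the wrong (assumed) codec

# ascii() keeps the output printable even on a non-UTF-8 terminal.
print(ascii(garbled))
print(raw.decode("utf-8") == expected)
```

The three bytes of the trademark sign come out as three separate characters, which matches the start of the mangled text in the failure message.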
Reporter | ||
Comment 25•9 years ago
|
||
I want to get the task configs landed minus the branch configs to actually schedule them, so that:
a) they don't bitrot
b) it's easier for other people to push them to try
c) when we're ready to enable them, it's just a small and easy to understand patch that needs to land
Here's an actually pretty green looking try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b1d20fc2248
Aside from the orange jobs, we'll need to fix:
1. Add --use-test-media-devices back (dustin has the fix, we just need to test it out)
2. There are failures in the debug bc5 job, investigate why those aren't turning the job orange.
3. xpcshell tasks seem to just abruptly end with no output indicating why.
I'll file new bugs to tackle these problems in time.
Reporter | ||
Comment 26•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Add taskcluster linux64 test configs, r=dustin
This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.
This does *not* schedule these tests anywhere just yet.
Attachment #8621613 -
Attachment description: MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try → MozReview Request: Bug 1171033 - Add taskcluster linux64 test configs, r=dustin
Attachment #8621613 -
Flags: review?(dustin)
Reporter | ||
Comment 27•9 years ago
|
||
Schedule taskcluster linux64 tests on try
Reporter | ||
Comment 28•9 years ago
|
||
Also to note, that patch cargo cults a lot of how the b2g configs were set up. E.g, we may want to start organizing the configs into subdirectories, and/or try to use less inheritance.
It also currently doesn't handle different build types. E.g, if you want opt to have different chunking from debug, you currently have to override that in the branch configs. There's no way to set it directly in the test configs.
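To make the missing feature concrete, here is a sketch of what a per-build-type override in the test configs could look like. The structure, the "by-build-type" key and the chunk counts are all hypothetical; nothing like this exists in the patch.

```python
# Hypothetical suite config: a default chunk count plus optional
# per-build-type overrides. None of these keys come from the real configs.
SUITES = {
    "mochitest-plain": {"chunks": 5, "by-build-type": {"debug": {"chunks": 8}}},
    "xpcshell": {"chunks": 4},
}


def chunks_for(suite, build_type, suites=SUITES):
    """Return the chunk count for a suite, honouring any build-type override."""
    config = suites[suite]
    override = config.get("by-build-type", {}).get(build_type, {})
    return override.get("chunks", config["chunks"])


print(chunks_for("mochitest-plain", "debug"))
print(chunks_for("mochitest-plain", "opt"))
```

With something along these lines the branch configs would only need to say which suites to run, not how to chunk them.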
Comment 29•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
https://reviewboard.mozilla.org/r/11013/#review20493
This looks good! A few comments, but I'd be happy to land this as-is, perhaps farming these out to low-priority bugs?
::: testing/taskcluster/tasks/test.yml:27
(Diff revision 6)
> loopbackAudio: true
It'd be nice to not have these defined for every job, if they're not required..
::: testing/taskcluster/tasks/test.yml:33
(Diff revision 6)
> tc-vcs: '/home/worker/.tc-vcs'
I'm confident that we don't need the tc-vcs cache for firefox tests. I don't know about B2G, but maybe this can move to the b2g base file?
As for linux-cache -- I have no idea what that's for. Is it necessary/useful in this case? Caching brings risks of cache poisoning and the potential need to clobber, so unless this is a known win I think we should leave it out.
Attachment #8621613 -
Flags: review?(dustin) → review+
Comment 30•9 years ago
|
||
To the note -- yeah, it's kind of a mess. I've been avoiding modifying a lot of the B2G stuff, because I don't know how it works or want to get involved in maintaining it; and I think that once we've loaded our task definitions in using this system, we will have a better idea of the requirements for a system to generate them in a more maintainable fashion.
Reporter | ||
Comment 31•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Add taskcluster linux64 test configs, r=dustin
This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.
This does *not* schedule these tests anywhere just yet.
Reporter | ||
Comment 32•9 years ago
|
||
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try
Schedule taskcluster linux64 tests on try
Reporter | ||
Comment 33•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Add taskcluster linux64 test configs, r=dustin
This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.
This does *not* schedule these tests anywhere just yet.
Reporter | ||
Comment 34•9 years ago
|
||
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try
Schedule taskcluster linux64 tests on try
Reporter | ||
Comment 35•9 years ago
|
||
Fixed review comments. Here's a try run proving that b2g emulator and mulet didn't get broken, which is all I care about for now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4517d152da44
The tc mochitest failures are because I didn't remove --use-test-media-devices this time, and the image hasn't been updated with the fix yet.
Reporter | ||
Updated•9 years ago
|
Whiteboard: [bb2tc] [milestone1] → [bb2tc] [milestone1][leave-open]
Comment 36•9 years ago
|
||
Reporter | ||
Comment 37•9 years ago
|
||
Had to back this out for breaking the decision task in:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d351ee79b4e4
Still investigating why.
Reporter | ||
Updated•9 years ago
|
Attachment #8621613 -
Attachment description: MozReview Request: Bug 1171033 - Add taskcluster linux64 test configs, r=dustin → MozReview Request: Bug 1171033 - Add taskcluster linux64 test configs (but not scheduled anywhere yet), r=dustin
Reporter | ||
Comment 38•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Add taskcluster linux64 test configs (but not scheduled anywhere yet), r=dustin
This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.
This does *not* schedule these tests anywhere just yet.
Reporter | ||
Comment 39•9 years ago
|
||
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try
Schedule taskcluster linux64 tests on try
Reporter | ||
Comment 40•9 years ago
|
||
Pretty sure I found and fixed the issue, but I was never able to reproduce by running |mach taskcluster-graph| locally :/. Here's a try run that includes gaia_build_tests this time:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3c25591967ee
Comment 41•9 years ago
|
||
Comment 42•9 years ago
|
||
Reporter | ||
Comment 43•9 years ago
|
||
Here's the latest try run with --use-test-media-devices added back in and dustin's fix for the pactl issue. It seems to work now, though it looks considerably less green than it did before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b84d34d7166a
Also, the jobs got scheduled twice for some reason?
But either way, we're at a point where it's worth talking about how we want to get these things turned on. E.g, it looks like we could move mochitest-gl over immediately if we want. Do we run it side-by-side with buildbot for awhile? Do we just disable the buildbot job right away? Is there anything else (i.e sheriff wise) blocking us from turning it on?
Comment 45•9 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #43)
> But either way, we're at a point where it's worth talking about how we want
> to get these things turned on. E.g, it looks like we could move mochitest-gl
> over immediately if we want. Do we run it side-by-side with buildbot for
> awhile? Do we just disable the buildbot job right away? Is there anything
> else (i.e sheriff wise) blocking us from turning it on?
\o/ I am overjoyed by this question! There are no sheriff blockers.
I suggest we make these Tier2 initially -- for our evaluation period.
The idea was to run jobs side-by-side for a while. I don't think we had a fixed period identified, so looping :jgriffin in.
Flags: needinfo?(sdeckelmann) → needinfo?(jgriffin)
Comment 46•9 years ago
|
||
I agree about running them side-by-side as Tier 2 for a while. I think two weeks should be enough time to compare failure rates to give us confidence that we can turn the buildbot jobs off. We should let the sheriffs know what we're doing, so they can look for problems during that window as well.
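A rough sketch of the comparison over such a window, assuming per-job results can be exported as simple pass/fail lists. The function names and the sample data are made up for illustration.

```python
def failure_rate(results):
    """Fraction of runs that did not pass; results is a list of 'pass'/'fail'."""
    if not results:
        raise ValueError("no results to compare")
    return sum(1 for r in results if r != "pass") / len(results)


def rate_difference(buildbot_results, taskcluster_results):
    """Positive values mean the TaskCluster job failed more often."""
    return failure_rate(taskcluster_results) - failure_rate(buildbot_results)


# Made-up data: 20 runs of the same job on each system.
buildbot = ["pass"] * 19 + ["fail"]
taskcluster = ["pass"] * 17 + ["fail"] * 3
print(round(rate_difference(buildbot, taskcluster), 2))
```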
Flags: needinfo?(jgriffin)
Reporter | ||
Updated•9 years ago
|
Alias: tc-linux64
Summary: Schedule linux64 desktop tests via TaskCluster on try → [tracking] Schedule linux64 desktop tests via TaskCluster on try
Reporter | ||
Comment 47•9 years ago
|
||
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Bug 1171033 - Schedule linux64 taskcluster tests on try
Attachment #8621613 -
Attachment description: MozReview Request: Bug 1171033 - Add taskcluster linux64 test configs (but not scheduled anywhere yet), r=dustin → MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try
Reporter | ||
Updated•9 years ago
|
Attachment #8677585 -
Attachment is obsolete: true
Reporter | ||
Comment 48•9 years ago
|
||
(crap, latest mozreview patch overwrote the one that already landed)
For anyone wanting to help green up the tests, here's the latest state:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=603b29218a37
Just push the attached patch to try along with test fix ups or disablings.
Assignee | ||
Comment 49•9 years ago
|
||
How relevant are these?
[dix] Could not init font path element /usr/share/fonts/X11/100dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/Type1, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi, removing from list!
[dix] Could not init font path element /var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType, removing from list!
Assignee | ||
Comment 51•9 years ago
|
||
It seems that my push is using my images.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=30eb0ab890a2&filter-searchStr=TC <- armenzg
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f28fce077ec4&filter-searchStr=TC <- jmaher
The results are looking similar (good!).
A weird thing which I have noticed is that we have to filter jobs with "TC".
The reason is that the Linux jobs we cancelled through TH's UI are not actually being cancelled (bug 1213520).
Assignee | ||
Comment 52•9 years ago
|
||
FTR, ignore my previous push as it was missing my docker changes. I'm currently working on bug 1223123 and more try pushes will happen there.
Current status summary
######################
Tier-2 blockers are bugs that if fixed will make jobs green.
Tier-1 blockers are bugs which will block switching equivalent Buildbot jobs off.
Tier-2 blockers:
* Bug 1222162 - browser/components/search/test has a few tests which are failing when run on task cluster
** might be a dupe of bug 1223123
* Bug 1223123 - We need a window manager of some type while running test jobs on linux
Tier-1 blockers:
* Bug 1218537 - Taskcluster jobs don't print information into the treeherder "Job details" pane
* Bug 1221553 - TaskCluster test jobs are skipping blobber uploads
* Bug 1221661 - task cluster test results seem to fail on crash reporter tests- probably a common cause
Assignee | ||
Comment 53•9 years ago
|
||
We're probably going to need Mesa as well (bug 1220658).
Assignee | ||
Comment 54•9 years ago
|
||
Feedback wanted in this comment. Thanks :)
This is the current set of results:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=52bcfd8df0a1,96923331f0ef,042f329cbbb1&group_state=expanded
The first one shows Buildbot jobs, the second one has *most* Buildbot jobs and most harnesses work.
The last push addresses any remaining suites which the second push failed to have addressed correctly.
NOTE:
* I don't know of a *clean* way to run one chunk of a suite by default without running all the other ones
* We will probably need to separate yml files into opt/debug to control chunking separately (since debug runs so slow)
* I have needed to chunk a lot more than on Buildbot
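For reference, a minimal sketch of even chunking, assuming tests are split by position in a flat list. The real harnesses may chunk by directory or by runtime instead, so treat this only as an illustration of why the chunk count matters for job length.

```python
def chunk(items, this_chunk, total_chunks):
    """Return the slice of items belonging to 1-based chunk this_chunk."""
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    # Spread the remainder over the first chunks so sizes differ by at most 1.
    base, extra = divmod(len(items), total_chunks)
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    size = base + (1 if this_chunk <= extra else 0)
    return items[start:start + size]


tests = list(range(10))  # stand-in for a list of test paths
for n in (1, 2, 3):
    print(n, chunk(tests, n, 3))
```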
I think I will prep patches to enable jobs which are green by default and run them with --times 10 to see how stable they are.
After that I think I will focus on fixing the upload of artifacts.
My current worry is bug 1221661 with the crashreporter not working.
I'm also considering creating a tier1 tracker bug to help separate issues which are not blockers to make this a tier2.
Suites that came green:
* cpp
* Jit1&2
* mochitest-push
* crashtest opt
* marionette opt
* jsreftest
* wr
Unknown (waiting on last push):
* luciddream
* mochitest-other
Assignee | ||
Comment 55•9 years ago
|
||
Should we aim to run the "opt" test jobs which come back green? (Since all my pushes have already been doing that)
Even if in Q1 we will only be swapping the debug builds between Buildbot and TC.
The counter argument to this would be that we would be running jobs side-by-side without clear intent of replacing the opt builds.
I need to know this so I can adjust my patches if we won't.
Assignee | ||
Comment 56•9 years ago
|
||
Since I started looking at this in November, here's where we stand.
Anyone reading, please let me know if you have any questions about where we currently stand.
NOTE: Not all test issues have been filed as some of them will have shared issues. In January we should press the pedal down to file everything and get lots of developer involvement. We will then have all *known* remaining docker image issues ironed out.
Latest clean push (still running atm):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbe188bab4b0
* We're running 8 jobs side by side on the integration trees: http://mzl.la/1NH8qCA
** Cpp, Jit, mochitest gl/push, JsReftest, web platform reftest, xpcshell (4 chunks) and crashtest
** Crashtests will show up in the next merge from inbound to central
* bug 1221553 (dustin) - fix upload of artifacts
** It shows under the task inspector; not TH (bug 1218537 & bug 1218537)
* bug 1221661 (dustin) - allow enabling ptrace for worker in order for crash reporter to work
** dumping of crashes still needs work (bug 1233716)
* bug 1223123 (armenzg) - we added a window manager
* jmaher helped fixing various test failures and filed bugs for developers to help investigate
** FIXED: bug 1224641, bug 1232316 and bug 1232979
* bug 1227637 (dustin) - install latest patched mesa
* bug 1227657 (armenzg) - Removed Ubuntu's update prompt as it would still focus
* bug 1228289 (glandium) - Avoid l10n-check overwriting final package when MOZ_SIMPLE_PACKAGE_NAME is set
* bug 1228416 (dustin) - Redirect gnome-session's output into its own artifact to reduce intertwined noise with Mozharness' execution
* bug 1230330 (armenzg) - Switch from b2gtest worker to desktop-test worker type
** b2gtest workers were running with capacity of 4 which seemed to be affecting tests
Current issues:
* Bug 1231618 - tc-vcs should change paths.default in repository's hgrc
* Bug 1232407 (armenzg) - Allow starting desktop-test images with VNC if requested
* Bug 1233716 (armenzg) - Fix dumping of crashes in docker containers
Test failures (these will have to be reviewed to determine if they're still valid):
* Bug 1222162 - bc1 issue
* Bug 1224724 - reftest issue
* Bug 1226282 - probably dupe of 1221661
* Bug 1226751 - intermittent issue (hopefully to be backed out)
* Bug 1232981 - bidi/83958-1*.html reftests fail on new linux64 docker container
* Bug 1232983 - border-radius/clipping-6.html is failing on linux64 docker container
* Bug 1232985 - /bugs/321402-4|5|6.xul fail on linux64 when running from a docker container
* Bug 1233054 - Luciddream jobs failing in desktop-test container to load libfreetype.so.6
* Bug 1232980 - many bidi/with-first-letter-*.html reftests fail on new linux64 docker container
* Bug 1233554 - Linux x64 debug crashtest e10s crash in docker image
Innocuous:
* bug 1227652 - pygtk import issue (probably from another utility using python's logging)
Assignee | ||
Comment 57•9 years ago
|
||
Analysis per suite [1]:
* Web platform tests
** focus issues and time outs
* Luciddream (bug 1233054) - issues with packages
* Mochitests (plain, browser-chrome, devtools, other)
** focus issues and time outs
* Reftests
** lots of issues with pixels differing
There are also crashes happening in some of those jobs.
Similar situations are found for e10s equivalent jobs.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbe188bab4b0&group_state=expanded
Assignee | ||
Comment 58•9 years ago
|
||
Current plan (timeline 1 week)
* (dustin) We're going to try to run docker-worker on m1.medium
** Bug to be filed
** We want to compare results against a setup that matches Buildbot's current one
** m1.medium is single core. xpcshell tests will start working again
** In the future, we will be able to run some jobs on multi-core versus single-core if we would like to
** NOTE: We don't know what we will get when running on m1.medium
* (armenzg) Switch from gnome-session to xsession to match Buildbot set up
** Also reported better VNC results
** Not too much change wrt test results [1]
* (armenzg) fix crash that mochitests are hitting (file bug)
* (armenzg) downgrade mesa
* (armenzg) disable screen saver and locking
* (armenzg) Fix dumping of crashes in docker containers
[1] https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=050d6b6dd77d,c63b24f60dd
Assignee | ||
Comment 59•9 years ago
|
||
For the curious, this is where we currently stand:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=3c9a54d68c95
Last week's plan (7 days ago)
* [DONE] (dustin) Run jobs on m1.medium (par with releng)
* [DONE] (armenzg) Switch from gnome-session to xsession to match Buildbot set up
* [DONE] (armenzg) Disable apport crash reporter (stealing mochitest focus)
* [DONE] (armenzg) downgrade mesa
* [DONE] (armenzg) disable screen saver and locking
* [DONE] (armenzg) Fix dumping of crashes in docker containers
Fixed in the last week:
* [DONE] (armenzg) Split mochitests into 8 chunks
* reviewed current dependencies
Current plan (1-2 weeks timeline)
* Find root issue for reftests failing (we assume some more docker image work)
* Investigate and file bugs for current test issues
* Aim to get green all the way with m1.medium
Ongoing:
* Split mochitest-other into mochitest-a11y and mochitest-chrome
* Add gtest
* Filed a11y issues - bug 1239301
* m-8 test-alerts.html - bug 1236036
* e10s crash in crashtest - bug 1233554
* Luciddream (this is an opt *only* job) - bug 1233054
* R1 - bug 1232985
* Build tier1 issue - bug 1231618
* Intermittent m-3 (I don't see it happening anymore; removing dep) - bug 1197642
* Intermittent bc7 (I don't see it happening anymore; removing dep) - bug 1011171
Comment 60•9 years ago
|
||
I just did a scan of the tester-specific stuff that we do in PuppetAgain:
* need to start the window manager session (duh, but it did take us a long time to figure this out)
* tweaks::fonts
---> bug 1240056
* clean::appstate
---> only on OS X
* EDID data
---> only for GPUs
* gnome-settings-daemon upgrade to 3.4.2-0ubuntu0.6.2 (bug 846348)
---> we are already running 3.4.2-0ubuntu0.6.6
* disable jockey-gtk and deja-dup-monitor (bug 9849444)
---> bug 1240084
Assignee | ||
Comment 61•9 years ago
|
||
Here's where we are with greenness:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e96bb453f67
We've enabled a lot of jobs on inbound.
I need to spend time filing bugs for R3, R-e10s2 and R-e10s3.
C-e10s is already filed (bug 1233554).
Assignee | ||
Comment 62•9 years ago
|
||
selena: we need to determine the scope of this bug.
A - make them run on try
B - run side by side *green* as tier-2 on all trunk trees
C - run side by side *green* as tier-1 on all trunk trees
D - replace the Buildbot jobs on trunk trees
At some point, we have to determine when this project is no longer an engineering productivity effort but a releng/taskcluster effort.
I'm going to file a bug for D and assume that we're doing B in here [as per dustin [1]].
[1] <dustin> armenzg: for your part, I think it'd be OK if you left things with the tests enabled in TC at tier 2
Current known blockers for debug builds being tier1:
* bug 1174263
* bug 1231320
* bug 1234929
* bug 1231618
Flags: needinfo?(sdeckelmann)
Comment 63•9 years ago
|
||
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #62)
> selena: we need to determine the scope of this bug.
>
> A - make them run on try
> B - run side by side *green* as tier-2 on all trunk trees
> C - run side by side *green* as tier-1 on all trunk trees
> D - replace the Buildbot jobs on trunk trees
>
> At any point, we have to determine at which point this project is no more an
> engineering productivity effort but a releng/taskcluster effort.
>
> I'm going to file a bug for D and assume that we're doing B in here [as per
> dustin [1]].
Sounds good! Thank you for going through the details there.
I'll file a bug for C and make the bugs you mentioned blockers.
Flags: needinfo?(sdeckelmann)
Updated•9 years ago
|
Summary: [tracking] Schedule linux64 desktop tests via TaskCluster on try → [tracking] Schedule linux64 desktop tests via TaskCluster on try and enable with tier2 status
Assignee | ||
Comment 64•9 years ago
|
||
This is the current breakdown:
To be discussed:
* Pending jobs reports for sheriffs [1]
* Proper integration under Treeherder -> Infra menu [2]
* SETA support
(armenzg) Make TC Linux64 *debug* test jobs tier2
* bug 1232985: /bugs/321402-4|5|6.xul fail on linux64 when running from a docker container
* bug 1233554: Linux x64 debug crashtest e10s crash in docker image (tests/reftest/tests/dom/canvas/crashtests/780392-1.html)
* bug 1241297: wpt-3 e10s always takes as long as the max runtime allows it to
* bug 1238435: Intermittent e10s TEST-UNEXPECTED-TIMEOUT | /html/dom/reflection-forms.html, /html/dom/reflection-embedded.html, /html/dom/reflection-grouping.html followed by busting the whole run
* bug 1242033: Linux x64 debug e10s reftest 3 - element-paint-native-widget.html element-paint-native-widget-ref.html
* bug 1242682: Separate dom/media into its own subsuite
Make TC Linux64 *debug* test jobs tier1 (dropping associated Buildbot builders)
* bug 1218537: It's not possible to submit multiple "Job Info" artifacts
* bug 1241280: Fix web platform tests grouping
* bug 1242023: Cannot schedule buildbot bridge builds
* This would allow us to schedule the jobs on the Buildbot generated build without having to wait for L64 debug builds to replace the Buildbot one
Make TC Linux64 *OPT* test jobs tier1
* bug 1233054: [opt] Luciddream jobs failing in desktop-test container to load libfreetype.so.6
Make TC Linux64 *debug* _BUILD_ jobs tier1
* bug 1231618: (tier1 issue) tc-vcs should change paths.default in repository's hgrc
* bug 1241111: Allow overriding SOURCE_REV_URL, SOURCE_REPO, SOURCE_CHANGESET
Optimizations:
* Evaluate switching to m3 once we are completely green on m1
* bug 1235889: time to run taskcluster jobs take 20% longer than buildbot peers
[1] http://builddata.pub.build.mozilla.org/reports/pending/pending.html
[2] http://people.mozilla.org/~armenzg/sattap/99d8ad71.png
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 65•9 years ago
|
||
The list of dependencies is now up to date.
Comment 66•9 years ago
|
||
we need to verify runtimes and intermittent rates as compared to buildbot.
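A minimal sketch of how such a comparison could be computed, assuming we already have per-job runtimes (in seconds) and pass/fail results from repeated runs of the same revision on both systems. The function names and the sample numbers are made up for illustration; they are not taken from real Buildbot or TaskCluster data, and counting every failed rerun as "intermittent" is a simplification.

```python
from statistics import median


def relative_slowdown(buildbot_secs, taskcluster_secs):
    """Fractional increase of the median TaskCluster runtime over Buildbot's."""
    base = median(buildbot_secs)
    return (median(taskcluster_secs) - base) / base


def intermittent_rate(results):
    """Share of failed runs, given an iterable of booleans (True = passed)."""
    results = list(results)
    return sum(1 for passed in results if not passed) / len(results)


if __name__ == "__main__":
    # Hypothetical runtimes for one job, repeated three times on each system.
    buildbot = [1000, 1100, 1200]
    taskcluster = [1200, 1320, 1440]
    print(f"slowdown: {relative_slowdown(buildbot, taskcluster):.0%}")

    # Hypothetical results of four retriggers of the same job.
    print(f"failure rate: {intermittent_rate([True, True, False, True]):.0%}")
```

Using the median rather than the mean keeps a single slow or hung run from dominating the comparison.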
Assignee | ||
Updated•9 years ago
|
Alias: tc-linux64 → tc-linux64-debug
Assignee | ||
Comment 67•9 years ago
|
||
After the experiment from this weekend, jmaher discovered a few discrepancies from what we should actually be running.
For instance, a lot of e10s jobs were *not* running as e10s due to a bug in the task definitions (the payload was defined twice).
Due to this, our greenness has gone up quite a bit [1]
In the same fashion as crashtest e10s, we're seeing crashes for:
* m-4
* d-t{1,2,9}
New issues:
* marionette - lots of test issues
* dt8 - it used to take 30 minutes; it now times out
* Wr - various test failures
e10s issues:
* m10 - test_alerts.html is back
* bc1 - test/alerts/browser_notification_close.js
* bc6 - (intermittent) sessionstore/test/browser_crashedTabs.js and sessionstore/test/browser_579879.js
* dt6 - inspector/test/browser_inspector_initialization.js
Hopefully to be fixed in my next push:
* JP - wrong values in the test definitions
* m-other - I'm increasing the max run time
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9397ac87d91&filter-searchStr=e10s&group_state=expanded
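To illustrate how a key defined twice can silently drop a flag, here is a small sketch. It uses JSON and Python's standard json module purely as an example; the bug does not say which format or loader the task definitions go through, and the field names and commands below are hypothetical. With Python's json module, the last duplicate wins without any warning, and an object_pairs_hook can be used to turn that into an error.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in task definition: {key!r}")
        seen[key] = value
    return seen


# Hypothetical task definition; the field names are illustrative only.
TASK = """
{
  "task": {
    "payload": {"command": ["run-tests", "--e10s"]},
    "payload": {"command": ["run-tests"]}
  }
}
"""

# A plain load keeps only the last "payload", silently dropping --e10s.
lenient = json.loads(TASK)
print(lenient["task"]["payload"]["command"])  # prints ['run-tests']

# A strict load surfaces the mistake instead of hiding it.
try:
    json.loads(TASK, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)
```

A check like this could run as a lint step over the task definitions so that a job which looks like e10s on Treeherder cannot quietly run without the flag.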
Assignee | ||
Comment 68•9 years ago
|
||
Bug 1245243 might fix most of the e10s issues we're seeing.
Comment 69•9 years ago
|
||
catlee asked me to see if I can help with the issues you guys are running into. Can someone please tell me where to look for the latest state of things? I have been looking at this bug and bug 1237024 but getting details straight in my mind is pretty difficult... Thanks!
Comment 70•9 years ago
|
||
thanks Ehsan! Right now we are waiting on bug 1245243, we suspect this will fix some of the e10s failures we are seeing (at the very least crashtest in bug 1233554). On top of that there is a list of failures to figure out:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba896a9c4e2
We were waiting for the shared mem mount increase and then planned on retesting. Keep in mind the m-e10s(dt*) are scheduled on tc, but they are not greened up on buildbot, so ignore those.
Outside of greening up jobs, we need to resolve how to get stats for the sheriffs on wait times and other machine related info. Maybe you could allocate a day next week to help look at any remaining mochitest/browser-chrome tests that are still failing after we fix bug 1245243.
Assignee | ||
Comment 71•9 years ago
|
||
Removing bug 1242682 and bug 1243080 as they're not real blockers but improvements.
Comment 72•9 years ago
|
||
Sadly I only have tomorrow and then will go on vacation. Can you please ping me when I get back?
Assignee | ||
Comment 73•9 years ago
|
||
(In reply to :Ehsan Akhgari (Away 2/10-2/19) from comment #72)
> Sadly I only have tomorrow and then will go on vacation. Can you please
> ping me when I get back?
We will; however, we hope to be done by then. Enjoy your break!
Assignee | ||
Comment 74•9 years ago
|
||
Latest push [1]
Status summary:
* Since last week we properly pass --e10s flag to e10s jobs
** This is why we had a sudden increase of oranges
* We are now using a newer version of docker which allows us to modify /dev/shm
** This takes away a couple of crashes on e10s
* Wr is now passing (it is rather intermittent)
* Wr-e10s is now being scheduled and running green
Remaining issues
* Marionette still has a bunch of test failures - bug 1246283
* s/bc1/bc4/ - test/alerts/browser_notification_close.js (bug 1244936)
* Bumping mochitest's timeout from 45 seconds to 90 seconds fixes some tests - bug 1246152 (still to land)
** This is because running mochitests inside of docker is slower
* We will be increasing the global timeout for devtools (fixes dt8 time out - bug 1246279)
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d8d791a8e5d&exclusion_profile=false&group_state=expanded
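For anyone reproducing the /dev/shm crashes locally, a quick sanity check inside the container is to look at how large the shared memory mount actually is. The sketch below uses only the Python standard library; the helper name and the 256 MiB threshold are assumptions for illustration, not the values used by the TaskCluster workers.

```python
import os
import shutil


def has_enough_shared_memory(path="/dev/shm", min_bytes=256 * 1024 * 1024):
    """Return True if the filesystem at `path` is at least `min_bytes` in total size."""
    total, _used, _free = shutil.disk_usage(path)
    return total >= min_bytes


if __name__ == "__main__":
    # /dev/shm only exists on Linux hosts and containers, so guard the lookup.
    if os.path.exists("/dev/shm"):
        total = shutil.disk_usage("/dev/shm").total
        print(f"/dev/shm total: {total // (1024 * 1024)} MiB")
        print("large enough:", has_enough_shared_memory())
    else:
        print("/dev/shm not present on this system")
```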
Assignee | ||
Comment 75•9 years ago
|
||
* m10 e10s issue - bug 1246019
Assignee | ||
Comment 76•9 years ago
|
||
Remaining test issues:
* m10 e10s - test_alerts.html - bug 1246019 - bug 1227730
* bc4 e10s - test/alerts/browser_notification_close.js (bug 1244936)
* wpt reftests - bdi-paragraph-level-container.html - bug 1247033
Fixed:
* Marionette - bug 1246283
* Some higher mochitest intermittency - mount workspace under hosts' SSD - bug 1246947
Assignee | ||
Comment 77•9 years ago
|
||
Status summary:
* bug 1227730 fixes all remaining perma failures; we're waiting on dev to get the patch reviewed and landed
* bug 1227637 releng hosts have a new mesa and we have to match it
This weekend we would like to run another experiment to compare Buildbot to TaskCluster/docker since we've mounted ~/workspace to the hosts' SSD disk.
Assignee | ||
Comment 78•9 years ago
|
||
It seems that mochitest-push has become a mess both for Buildbot and TC
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=mochitest-push&fromchange=ee60dc3d0655&exclusion_profile=false&tochange=c5d6c3e00c91
For TC, it was green until d320678c4fab (4th push in that view).
They're currently all hidden.
Assignee | ||
Comment 79•9 years ago
|
||
I think we should start looking into enabling mochitest-plain and mochitest-browser-chrome until kits gets bug 1227730 fixed.
We can hide m10 and bc4 until then.
This is our current latest *greenest* run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55098f406c33&group_state=expanded
Assignee | ||
Comment 80•9 years ago
|
||
Everything is running side by side and visible on Treeherder.
The only work left in here is finishing up the documentation and improving how to run this locally.
Depends on: 1245254
Assignee | ||
Comment 81•9 years ago
|
||
9 months later and all dependencies have been solved.
We're now running L64 debug test jobs as tier-2 on most integration/trunk repos (except a few - bug 1252471).
We can now close this and deal with other platforms.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 82•9 years ago
|
||
Congrats! And you managed to get it done before you turned into a pumpkin. :)
Reporter | ||
Comment 83•9 years ago
|
||
Awesome, that is an impressive dependency tree!