Bug 1171033 - (tc-linux64-debug) [tracking] Schedule linux64 desktop tests via TaskCluster on try and enable with tier2 status
Status: RESOLVED FIXED
Whiteboard: [bb2tc] [milestone1][leave-open]
Product: Taskcluster
Classification: Other
Component: General
Version: unspecified
Platform: Unspecified Unspecified
Importance: -- normal
Target Milestone: ---
Assigned To: Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
Duplicates: 1209064
Depends on: 1171140 1171390 1172107 1175938 1176031 1179818 1182142 1184084 1187047 1189892 1209064 1212881 1213314 1213325 1214194 1214809 1217396 1218542 1218791 1218841 1221553 1221661 1222162 1223123 1224641 1224724 1225484 1226282 1227637 1227657 1227730 1228289 1228416 1228632 1229893 1230330 1232070 1232316 1232407 1232796 1232979 1232980 1232981 1232983 1232985 1233044 1233554 1233716 1233725 1234352 1235892 1235898 1236036 1236047 1236076 1236081 1236086 1237068 1237663 1238739 1238948 1239301 1239327 1239766 1240056 1240084 1240171 1241277 1241280 1241297 1241506 1241942 1241979 1242033 1242502 1243005 1243039 1244233 1244720 1244936 1245243 1245254 1246019 1246152 1246279 1246283 1246947 1247033 1247382 1248028 1251693 1251734 1270885
Blocks: bb-to-tc 1235889 1240062 1243024
Reported: 2015-06-03 07:52 PDT by Andrew Halberstadt [:ahal]
Modified: 2016-05-06 09:16 PDT
CC List: 10 users
Attachments
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try (40 bytes, text/x-review-board-request)
2015-06-12 08:04 PDT, Andrew Halberstadt [:ahal]
dustin: review+
MozReview Request: Schedule taskcluster linux64 tests on try (40 bytes, text/x-review-board-request)
2015-10-22 11:00 PDT, Andrew Halberstadt [:ahal]
no flags

Description Andrew Halberstadt [:ahal] 2015-06-03 07:52:40 PDT
Linux64 opt builds are currently scheduled on try (but hidden). It's a good time to look into scheduling their tests as well.

This will double the load until we can decommission the buildbot-scheduled tests, so I'll try to get them scheduled but not run by default. Apparently this might be hard in TC.
Comment 1 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-06-03 11:52:29 PDT
Perhaps we can add a special flag to the TC try parser to activate non-default jobs.
I don't know.
Comment 2 Andrew Halberstadt [:ahal] 2015-06-04 06:14:46 PDT
(In reply to Armen Zambrano G. (:armenzg - Toronto) from comment #1)
> Perharps we can add a special flag to the TC try parser to activate
> non-default jobs.
> I don't know.

I realized that for the time being I can just keep pushing the change that schedules the tests to try. If it turns out they need a lot of greening up and multiple people start working on it, then we might need to land it permanently, but until then, that's a problem for another time.
Comment 3 Andrew Halberstadt [:ahal] 2015-06-05 09:04:51 PDT
I got mochitests scheduled but they're failing because they can't find the tests.zip. I haven't looked into it yet.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d83136a649e9&exclusion_profile=false
Comment 4 Andrew Halberstadt [:ahal] 2015-06-05 11:08:57 PDT
Morgan, do you think this is caused by that artifact problem you mentioned in the meeting? If so, is there a bug I can follow along?
Comment 5 Andrew Halberstadt [:ahal] 2015-06-05 12:36:22 PDT
Talked to Morgan on IRC; it's likely the same problem. tests.zip is an artifact in TaskCluster, and the builds aren't uploading it yet due to bug 1172107.
Comment 6 Andrew Halberstadt [:ahal] 2015-06-12 08:04:43 PDT
Created attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 mochitest with taskcluster
Comment 7 Andrew Halberstadt [:ahal] 2015-06-18 07:45:17 PDT
Quick update:
tests.zip is now found and the harness runs, but Firefox can't seem to start due to the following error:
firefox: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /home/worker/build/application/firefox/firefox)

I added "sudo apt-get update && sudo apt-get install -y libc6" to the test image's Dockerfile, but that didn't seem to work, or else I'm not testing it properly. Here's the latest try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fa1f76ee1bd&exclusion_profile=false
Comment 8 Andrew Halberstadt [:ahal] 2015-07-10 09:29:49 PDT
OK, I now have a better understanding of what needs to happen regarding the glibc issue. There are two ways to solve the problem:

1. Link to the proper glibc in the tester or builder. The quick hack workaround is to download/unzip glibc 2.17 in the tester and add it to the beginning of LD_LIBRARY_PATH. The proper solution is to fix bug 1179818. I'm adding that as a blocker because, even if I do hack around the issue, it's probably a prerequisite to scheduling the tests live and unhidden.

2. Upgrade the tester image to Ubuntu 14.04. This might be tricky, however, as I suspect it will cause a lot of test failures. I'd rather not worry about this at the same time as migrating to TaskCluster. That being said, it's probably worth at least pushing to try and seeing how things go.
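A minimal Python sketch of the LD_LIBRARY_PATH part of the hack in option 1, assuming glibc 2.17 has been unpacked into a hypothetical `/home/worker/glibc-2.17/lib` directory in the tester. It only shows how the environment for the Firefox process would be built; whether a newer libc actually loads this way alongside the image's own dynamic loader was not verified in this bug.

```python
import os
import subprocess

# Hypothetical location of the unpacked glibc 2.17 libraries in the tester image.
GLIBC_DIR = "/home/worker/glibc-2.17/lib"


def prepend_library_path(env, lib_dir):
    """Return a copy of env with lib_dir placed first on LD_LIBRARY_PATH."""
    new_env = dict(env)
    existing = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    # Drop any existing copy of lib_dir so it appears exactly once, at the front.
    parts = [lib_dir] + [p for p in existing if p != lib_dir]
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


def launch_firefox(binary, args=()):
    """Start the Firefox binary with the patched library path (illustrative only)."""
    env = prepend_library_path(os.environ, GLIBC_DIR)
    return subprocess.Popen([binary, *args], env=env)
```

For example, `launch_firefox("/home/worker/build/application/firefox/firefox", ["--version"])` would start the binary from the error above with the extra directory searched first. The caller's environment is copied rather than mutated, so the harness's own process is unaffected.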
Comment 9 Dustin J. Mitchell [:dustin] 2015-09-21 12:12:50 PDT
Andrew -- I think this is ready for you to jump back in. We have green builds! They produce artifacts with simple names, and there's already code in the `mach taskcluster-graph` command for turning those simple names into artifact URLs for test tasks. This is how B2G links its builds and tests.

I think the tricky bit will be getting an operating system image that we're happy with, but IIRC you'd already made some progress on that front?
Comment 10 Andrew Halberstadt [:ahal] 2015-09-22 10:09:14 PDT
Woohoo, thanks!

I doubt the progress I made on the image will be very relevant anymore since the OS has changed, but it wasn't much anyway. From here, just scheduling the tests and pushing to try to see what happens is the best way to go. I'm trying to finish something else up this quarter, but I'm sure I'll get started on this again in Q4.
Comment 11 Dustin J. Mitchell [:dustin] 2015-09-22 10:35:22 PDT
The *build* image changed.  The tests can run in whatever OS you prefer.
Comment 12 Andrew Halberstadt [:ahal] 2015-09-22 11:12:41 PDT
Right, for some reason I thought I'd need to change the tester image to match the toolchain of the build, but that's what you just fixed by making the build image match the tester image.

In that case yes, I have a slightly modified image I can work off.
Comment 13 Andrew Halberstadt [:ahal] 2015-10-05 11:31:39 PDT
On Friday I got some tests running:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=844213339596

The tests all passed, but the job is still orange because of a failure in 'pactl' (a PulseAudio-related tool). I don't understand why it's happening; hopefully it's just a matter of updating some config in the image.

The good news is that it's mochitest-specific, so it shouldn't block other test suites. The code path that causes the failure can also be turned off, if we want, by not passing --use-test-media-devices.
Comment 14 Andrew Halberstadt [:ahal] 2015-10-06 10:58:51 PDT
*** Bug 1209064 has been marked as a duplicate of this bug. ***
Comment 15 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-06 11:35:40 PDT
ahal: I'm trying to land this piece of code [1], which allows test jobs to work even if --read-buildbot-config is used (since Buildbot jobs can only be run with that action).

For TC tasks, I assume that the builds specify ['extra']['locations']['build'].

I assume you will be calling Mozharness with --installer-url; nevertheless, I wanted to let you know what I'm doing.

[1] https://reviewboard.mozilla.org/r/21137
if parent_task['extra'].get('locations'):
    # Build tasks generated under TC specify where they upload their builds
    installer_path = parent_task['extra']['locations']['build']

    self.set_artifacts(
        self.url_to_artifact(parent_id, installer_path),
        self.url_to_artifact(parent_id, 'public/build/test_packages.json'),
        self.url_to_artifact(parent_id, 'public/build/target.crashreporter-symbols.zip')
    )
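For readers who want to try this lookup outside Mozharness, here is a self-contained Python sketch of the same logic as plain functions. `artifact_url` is a hypothetical stand-in for `self.url_to_artifact`, the queue URL layout it builds is an assumption, and returning `None` when `extra.locations` is missing is a simplification of falling back to the Buildbot config path.

```python
# Assumed TaskCluster queue URL layout; url_to_artifact in the real code may differ.
QUEUE_TASK_URL = "https://queue.taskcluster.net/v1/task"


def artifact_url(task_id, name):
    """Build a public artifact URL for a task (stand-in for url_to_artifact)."""
    return "{}/{}/artifacts/{}".format(QUEUE_TASK_URL, task_id, name)


def artifacts_for_parent(parent_id, parent_task):
    """Return (installer, test_packages, symbols) URLs, or None for non-TC builds."""
    locations = parent_task.get("extra", {}).get("locations")
    if not locations:
        # Not a TC-generated build: the caller keeps using the Buildbot config.
        return None
    installer_path = locations["build"]
    return (
        artifact_url(parent_id, installer_path),
        artifact_url(parent_id, "public/build/test_packages.json"),
        artifact_url(parent_id, "public/build/target.crashreporter-symbols.zip"),
    )
```

Unlike the excerpt, this version tolerates a parent task with no `extra` key at all, which keeps the check safe for tasks that were not created by the TC build definitions.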
Comment 16 Andrew Halberstadt [:ahal] 2015-10-07 09:17:41 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 mochitest with taskcluster
Comment 17 Andrew Halberstadt [:ahal] 2015-10-09 08:04:56 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 mochitest with taskcluster
Comment 18 Andrew Halberstadt [:ahal] 2015-10-13 06:44:35 PDT
Here's a try run with the latest patch that also schedules mochitest bc, dt and gl:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6210c226e2e0

They all fail with the pactl issue (bug 1214194), but tests still seem to be run and passing.
Comment 19 Andrew Halberstadt [:ahal] 2015-10-13 06:45:35 PDT
Also note that this needs somewhat creative try syntax. Fixing that is bug 1213314.
Comment 20 Andrew Halberstadt [:ahal] 2015-10-15 12:38:34 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 taskcluster tests on try
Comment 21 Andrew Halberstadt [:ahal] 2015-10-15 12:39:19 PDT
Here's a new try run with reftest and xpcshell:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6529e38db5d2

Some notes:
1. A green reftest job! The other jobs all have test failures, but seemingly no harness-wide issues.
2. These are chunked the way debug builds are chunked in Buildbot; I still have to figure out how to distinguish opt vs. debug chunking in the configs (or ignore opt for now, as originally planned).
3. This patch will likely need to be refactored once :dustin's image refactor lands. Results may also vary on the new image.
4. Some info that shows up for Buildbot jobs is missing from the "Job details" pane (e.g. passed/failed/skipped counts, uploaded artifacts, etc.), though I think this is a wider TaskCluster issue.
Comment 22 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-15 14:11:55 PDT
Sweet!

For #4, is there a bug filed that you're aware of? Last quarter we got decent traction on improving sheriffability, and I think this is one of those issues.
Comment 23 Andrew Halberstadt [:ahal] 2015-10-16 12:05:59 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 taskcluster tests on try
Comment 24 Ted Mielczarek [:ted.mielczarek] 2015-10-20 03:49:53 PDT
FYI, at least one of the xpcshell errors you're seeing looks like a misconfigured system encoding:
 TEST-UNEXPECTED-FAIL | dom/plugins/test/unit/test_bug455213.js | run_test - [run_test : 75] "Plug-in for testing purposes.â„¢ (हिनà¥\x8Dदी 中文 العربية)" == "Plug-in for testing purposes.™ (हिन्दी 中文 العربية)" 

You might just need LANG=en_US.UTF-8 in the environment.
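The `â„¢` in the failure above is what UTF-8 bytes look like when they are decoded with a legacy single-byte codec instead of UTF-8. The Python sketch below reproduces that for the ™ sign (cp1252 is used only because it reproduces the sequence exactly; which codec the test environment actually applied is not known from this log) and shows one way to hand child processes `LANG=en_US.UTF-8`, assuming that locale is generated in the image. The helper names are hypothetical.

```python
import locale
import os


def mojibake(text, wrong_codec="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


def utf8_env(base=None):
    """Copy an environment mapping and set a UTF-8 locale for child processes."""
    env = dict(os.environ if base is None else base)
    # Note: LC_ALL, if it is set, takes precedence over LANG.
    env["LANG"] = "en_US.UTF-8"
    return env


print(mojibake("testing purposes.\u2122"))  # testing purposes.â„¢
# The encoding Python picks up by default follows the LANG/LC_* settings.
print(locale.getpreferredencoding(False))
```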
Comment 25 Andrew Halberstadt [:ahal] 2015-10-22 10:51:01 PDT
I want to land the task configs without the branch configs that would actually schedule them, so that:

a) they don't bitrot
b) it's easier for other people to push them to try
c) when we're ready to enable them, it's just a small and easy to understand patch that needs to land

Here's a try run that actually looks pretty green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b1d20fc2248

Aside from the orange jobs, we'll need to fix:
1. Add --use-test-media-devices back (dustin has the fix, we just need to test it out)
2. There are failures in the debug bc5 job; investigate why they aren't turning the job orange.
3. xpcshell tasks seem to end abruptly, with no output indicating why.

I'll file new bugs to tackle these problems in time.
Comment 26 Andrew Halberstadt [:ahal] 2015-10-22 11:00:27 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Add taskcluster linux64 test configs, r=dustin

This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.

This does *not* schedule these tests anywhere just yet.
Comment 27 Andrew Halberstadt [:ahal] 2015-10-22 11:00:29 PDT
Created attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try

Schedule taskcluster linux64 tests on try
Comment 28 Andrew Halberstadt [:ahal] 2015-10-22 11:21:37 PDT
Also note that the patch cargo-cults a lot of how the b2g configs were set up. For example, we may want to start organizing the configs into subdirectories and/or try to use less inheritance.

It also currently doesn't handle different build types. For example, if you want opt to have different chunking from debug, you currently have to override that in the branch configs. There's no way to set it directly in the test configs.
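One way the test configs could express build-type-specific chunking directly, sketched in Python over an already-parsed config dict. The `chunks` and `chunks_by_build_type` keys are hypothetical, not fields of the actual taskcluster YAML; the point is only that a per-build-type override falls back to a shared default instead of living in the branch configs.

```python
def resolve_chunks(test_config, build_type):
    """Return the chunk count for build_type, falling back to the default."""
    overrides = test_config.get("chunks_by_build_type", {})
    return overrides.get(build_type, test_config.get("chunks", 1))


def chunk_labels(test_config, build_type):
    """List 'this/total' chunk labels for one suite and build type."""
    total = resolve_chunks(test_config, build_type)
    return ["{}/{}".format(i, total) for i in range(1, total + 1)]


# Hypothetical mochitest config: debug runs slower, so it gets more chunks.
mochitest = {"chunks": 5, "chunks_by_build_type": {"debug": 10}}
print(chunk_labels(mochitest, "opt"))      # ['1/5', '2/5', '3/5', '4/5', '5/5']
print(resolve_chunks(mochitest, "debug"))  # 10
```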
Comment 29 Dustin J. Mitchell [:dustin] 2015-10-22 11:37:01 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

https://reviewboard.mozilla.org/r/11013/#review20493

This looks good!  A few comments, but I'd be happy to land this as-is, perhaps farming these out to low-priority bugs?

::: testing/taskcluster/tasks/test.yml:27
(Diff revision 6)
>          loopbackAudio: true

It'd be nice not to have these defined for every job, if they're not required.

::: testing/taskcluster/tasks/test.yml:33
(Diff revision 6)
>        tc-vcs: '/home/worker/.tc-vcs'

I'm confident that we don't need the tc-vcs cache for firefox tests.  I don't know about B2G, but maybe this can move to the b2g base file?

As for linux-cache -- I have no idea what that's for.  Is it necessary/useful in this case?  Caching brings risks of cache poisoning and the potential need to clobber, so unless this is a known win I think we should leave it out.
Comment 30 Dustin J. Mitchell [:dustin] 2015-10-22 11:44:45 PDT
To the note -- yeah, it's kind of a mess.  I've been avoiding modifying a lot of the B2G stuff, because I don't know how it works or want to get involved in maintaining it; and I think that once we've loaded our task definitions in using this system, we will have a better idea of the requirements for a system to generate them in a more maintainable fashion.
Comment 31 Andrew Halberstadt [:ahal] 2015-10-22 12:49:03 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Add taskcluster linux64 test configs, r=dustin

This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.

This does *not* schedule these tests anywhere just yet.
Comment 32 Andrew Halberstadt [:ahal] 2015-10-22 12:49:04 PDT
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try

Schedule taskcluster linux64 tests on try
Comment 33 Andrew Halberstadt [:ahal] 2015-10-22 13:07:50 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Add taskcluster linux64 test configs, r=dustin

This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.

This does *not* schedule these tests anywhere just yet.
Comment 34 Andrew Halberstadt [:ahal] 2015-10-22 13:07:51 PDT
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try

Schedule taskcluster linux64 tests on try
Comment 35 Andrew Halberstadt [:ahal] 2015-10-23 07:18:27 PDT
Fixed review comments. Here's a try run proving that b2g emulator and mulet didn't get broken, which is all I care about for now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4517d152da44

The tc mochitest failures are because I didn't remove --use-test-media-devices this time, and the image hasn't been updated with the fix yet.
Comment 37 Andrew Halberstadt [:ahal] 2015-10-23 08:29:07 PDT
Had to back this out for breaking the decision task in:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d351ee79b4e4

Still investigating why.
Comment 38 Andrew Halberstadt [:ahal] 2015-10-23 10:55:30 PDT
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Add taskcluster linux64 test configs (but not scheduled anywhere yet), r=dustin

This adds test configs for desktop linux64 unittests, including: mochitest-plain,
mochitest-browser-chrome, mochitest-devtools-chrome, reftest and xpcshell. It
also does a minor refactor of the b2g configs to remove some b2g-specific logic
from the base 'test.yml' config.

This does *not* schedule these tests anywhere just yet.
Comment 39 Andrew Halberstadt [:ahal] 2015-10-23 10:55:31 PDT
Comment on attachment 8677585 [details]
MozReview Request: Schedule taskcluster linux64 tests on try

Schedule taskcluster linux64 tests on try
Comment 40 Andrew Halberstadt [:ahal] 2015-10-23 10:56:39 PDT
Pretty sure I found and fixed the issue, but I was never able to reproduce it by running |mach taskcluster-graph| locally :/. Here's a try run that includes gaia_build_tests this time:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3c25591967ee
Comment 42 Carsten Book [:Tomcat] 2015-10-26 03:37:13 PDT
https://hg.mozilla.org/mozilla-central/rev/53b63a378f81
Comment 43 Andrew Halberstadt [:ahal] 2015-10-26 13:05:46 PDT
Here's the latest try run with --use-test-media-devices added back in and dustin's fix for the pactl issue. It seems to work now, though it looks considerably less green than it did before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b84d34d7166a

Also, the jobs got scheduled twice for some reason?

But either way, we're at a point where it's worth talking about how we want to get these things turned on. For example, it looks like we could move mochitest-gl over immediately if we want. Do we run it side-by-side with buildbot for a while? Do we just disable the buildbot job right away? Is there anything else (e.g. sheriff-wise) blocking us from turning it on?
Comment 44 Dustin J. Mitchell [:dustin] 2015-10-26 13:20:41 PDT
I think that's a question for Selena.
Comment 45 Selena Deckelmann :selenamarie :selena use ni? 2015-10-26 13:49:56 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #43)

> But either way, we're at a point where it's worth talking about how we want
> to get these things turned on. E.g, it looks like we could move mochitest-gl
> over immediately if we want. Do we run it side-by-side with buildbot for
> awhile? Do we just disable the buildbot job right away? Is there anything
> else (i.e sheriff wise) blocking us from turning it on?

\o/  I am overjoyed by this question!  There are no sheriff blockers.

I suggest we make these Tier2 initially -- for our evaluation period.

The idea was to run jobs side-by-side for a while. I don't think we had a fixed period identified, so looping :jgriffin in.
Comment 46 Jonathan Griffin (:jgriffin) 2015-10-26 13:52:02 PDT
I agree about running them side-by-side as Tier 2 for a while. I think two weeks should be enough time to compare failure rates to give us confidence that we can turn the buildbot jobs off. We should let the sheriffs know what we're doing, so they can look for problems during that window as well.
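The side-by-side comparison can be as simple as computing each system's failure rate per suite over the evaluation window. A Python sketch, assuming job results have already been exported as hypothetical `(system, suite, passed)` records; the record shape and the "TaskCluster no worse than Buildbot plus a tolerance" rule are illustrative assumptions, not an agreed criterion.

```python
from collections import defaultdict


def failure_rates(records):
    """Map (system, suite) -> fraction of runs that failed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for system, suite, passed in records:
        totals[(system, suite)] += 1
        if not passed:
            failures[(system, suite)] += 1
    return {key: failures[key] / totals[key] for key in totals}


def tc_not_worse(records, suite, tolerance=0.01):
    """True if TaskCluster's failure rate is within tolerance of Buildbot's."""
    rates = failure_rates(records)
    return rates[("taskcluster", suite)] <= rates[("buildbot", suite)] + tolerance
```

Keeping the rates keyed by both system and suite lets the same data answer the per-suite question raised above (for example, whether mochitest-gl can move over) without waiting on the slower suites.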
Comment 47 Andrew Halberstadt [:ahal] 2015-11-04 06:39:05 PST
Comment on attachment 8621613 [details]
MozReview Request: Bug 1171033 - Schedule linux64 taskcluster tests on try

Bug 1171033 - Schedule linux64 taskcluster tests on try
Comment 48 Andrew Halberstadt [:ahal] 2015-11-04 06:41:20 PST
(crap, latest mozreview patch overwrote the one that already landed)

For anyone wanting to help green up the tests, here's the latest state:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=603b29218a37

Just push the attached patch to try along with test fix ups or disablings.
Comment 49 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-11-04 12:14:19 PST
How relevant are these?
[dix] Could not init font path element /usr/share/fonts/X11/100dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/Type1, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi, removing from list!
[dix] Could not init font path element /var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType, removing from list!
Comment 50 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-11-06 08:34:17 PST
I will be grabbing this.
Comment 51 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-11-12 09:57:58 PST
It seems that my push is using my images.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=30eb0ab890a2&filter-searchStr=TC <- armenzg
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f28fce077ec4&filter-searchStr=TC <- jmaher

The results are looking similar (good!).

A weird thing which I have noticed is that we have to filter jobs with "TC".
The reason is that the Linux jobs we cancelled through TH's UI are not actually being cancelled (bug 1213520).
Comment 52 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-11-12 11:46:04 PST
FTR, ignore my previous push as it was missing my docker changes. I'm currently working on bug 1223123 and more try pushes will happen there.

Current status summary
######################
Tier-2 blockers are bugs that if fixed will make jobs green.
Tier-1 blockers are bugs which will block switching equivalent Buildbot jobs off.

Tier-2 blockers:
* Bug 1222162 - browser/components/search/test has a few tests which are failing when run on task cluster
** might be a dupe of bug 1223123
* Bug 1223123 - We need a window manager of some type while running test jobs on linux

Tier-1 blockers:
* Bug 1218537 - Taskcluster jobs don't print information into the treeherder "Job details" pane
* Bug 1221553 - TaskCluster test jobs are skipping blobber uploads
* Bug 1221661 - task cluster test results seem to fail on crash reporter tests- probably a common cause
Comment 53 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-11-12 11:51:51 PST
We're probably going to need Mesa as well (bug 1220658).
Comment 54 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-12-01 13:10:50 PST
Feedback wanted in this comment. Thanks :)

This is the current set of results:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=52bcfd8df0a1,96923331f0ef,042f329cbbb1&group_state=expanded

The first one shows the Buildbot jobs; the second one has *most* of the Buildbot jobs, and most harnesses work.
The last push addresses any remaining suites which the second push failed to have addressed correctly.

NOTE:
* I don't know of a *clean* way to run one chunk of a suite by default without also running all the other chunks
* We will probably need to separate the yml files into opt/debug to control chunking separately (since debug runs so slowly)
* I have needed to chunk a lot more than on Buildbot

I think I will prep patches to enable the jobs that are green by default, and run them with --times 10 to see how stable they are.
After that I think I will focus on fixing the upload of artifacts.
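Once a push has been retriggered N times (e.g. with `--times 10`), picking the jobs that are stable enough to enable comes down to checking that every run of a job was green. A Python sketch, assuming results have been collected into a hypothetical dict mapping job names to lists of Treeherder result strings such as `"success"` and `"testfailed"`:

```python
def stable_jobs(results, min_runs=10):
    """Return job names whose every run succeeded, given at least min_runs runs."""
    stable = []
    for job, runs in sorted(results.items()):
        if len(runs) >= min_runs and all(r == "success" for r in runs):
            stable.append(job)
    return stable


def flaky_jobs(results):
    """Return job names with a mix of green and non-green runs (intermittents)."""
    return sorted(
        job for job, runs in results.items()
        if "success" in runs and any(r != "success" for r in runs)
    )
```

Requiring a minimum number of runs keeps a job that was only triggered once or twice from being counted as stable by accident; jobs that show up in `flaky_jobs` are the ones that would need intermittent bugs filed before being enabled.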

My current worry is bug 1221661 with the crashreporter not working.

I'm also considering creating a tier-1 tracker bug to help separate out issues that don't block making these jobs tier-2.

Suites that came back green:
* cpp
* Jit1&2
* mochitest-push
* crashtest opt
* marionette opt
* jsreftest
* wr

Unknown (waiting on last push):
* luciddream
* mochitest-other
Comment 55 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-12-01 14:07:12 PST
Should we aim to run the "opt" test jobs that come back green? (All my pushes have already been doing that.)

That's even though in Q1 we will only be swapping the debug builds between Buildbot and TC.

The counter argument to this would be that we would be running jobs side-by-side without clear intent of replacing the opt builds.

I need to know this so I can adjust my patches if we won't.
Comment 56 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-12-21 08:38:27 PST
Since I started looking at this in November, here's where we stand.

Anyone reading, please let me know if you have any questions about where we currently stand.

NOTE: Not all test issues have been filed, as some of them share underlying issues. In January we should press the pedal down to file everything and get lots of developer involvement. By then we will have all *known* remaining docker image issues ironed out.

Latest clean push (still running atm):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbe188bab4b0

* We're running 8 jobs side by side on the integration trees: http://mzl.la/1NH8qCA
** Cpp, Jit, mochitest gl/push, JsReftest, web platform reftest, xpcshell (4 chunks) and crashtest
** Crashtests will show up in the next merge from inbound to central
* bug 1221553 (dustin) - fix upload of artifacts
** It shows under the task inspector; not TH (bug 1218537 & bug 1218537)
* bug 1221661 (dustin) - allow enabling ptrace for worker in order for crash reporter to work
** dumping of crashes still needs work (bug 1233716)
* bug 1223123 (armenzg) - we added a window manager
* jmaher helped fixing various test failures and filed bugs for developers to help investigate
** FIXED: bug 1224641, bug 1232316 and bug 1232979
*  bug 1227637 (dustin) - install latest patched mesa
*  bug 1227657 (armenzg) - Removed Ubuntu's update prompt as it would still focus
*  bug 1228289 (glandium) - Avoid l10n-check overwriting final package when MOZ_SIMPLE_PACKAGE_NAME is set
*  bug 1228416 (dustin) - Redirect gnome-session's output into its own artifact to reduce intertwined noise with Mozharness' execution
*  bug 1230330 (armenzg) - Switch from b2gtest worker to desktop-test worker type
** b2gtest workers were running with capacity of 4 which seemed to be affecting tests

Current issues:
* Bug 1231618 - tc-vcs should change paths.default in repository's hgrc
* Bug 1232407 (armenzg) - Allow starting desktop-test images with VNC if requested
* Bug 1233716 (armenzg) - Fix dumping of crashes in docker containers

Test failures (these will have to be reviewed to determine if they're still valid):
* Bug 1222162 - bc1 issue
* Bug 1224724 - reftest issue
* Bug 1226282 - probably dupe of 1221661
* Bug 1226751 - intermittent issue (hopefully to be backed out)
* Bug 1232981 - bidi/83958-1*.html reftests fail on new linux64 docker container
* Bug 1232983 - border-radius/clipping-6.html is failing on linux64 docker container
* Bug 1232985 - /bugs/321402-4|5|6.xul fail on linux64 when running from a docker container
* Bug 1233054 - Luciddream jobs failing in desktop-test container to load libfreetype.so.6
* Bug 1232980 - many bidi/with-first-letter-*.html reftests fail on new linux64 docker container
* Bug 1233554 - Linux x64 debug crashtest e10s crash in docker image

Innocuous:
* bug 1227652 - pygtk import issue (probably from another utility using python's logging)
Comment 57 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-12-22 12:58:02 PST
Analysis per suite [1]:
* Web platform tests
** focus issues and time outs
* Luciddream (bug 1233054) - issues with packages
* Mochitests (plain, browser-chrome, devtools, other)
** focus issues and time outs
* Reftests
** lots of issues with pixels differing

There are also crashes happening in some of those jobs.
Similar situations are found for e10s equivalent jobs.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbe188bab4b0&group_state=expanded
Comment 58 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-07 09:14:59 PST
Current plan (timeline 1 week)
* (dustin) We're going to try to run docker-worker on m1.medium
** Bug to be filed
** We want to compare results against Buildbot's current setup
** m1.medium is single-core. xpcshell tests will start working again
** In the future, we will be able to run some jobs on multi-core versus single-core if we would like to
** NOTE: We don't know what we will get when running on m1.medium
* (armenzg) Switch from gnome-session to xsession to match Buildbot setup
** Also reported better VNC results
** Not much change with respect to test results [1]
* (armenzg) fix crash that mochitests are hitting (file bug)
* (armenzg) downgrade mesa
* (armenzg) disable screen saver and locking
* (armenzg) Fix dumping of crashes in docker containers


[1] https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=050d6b6dd77d,c63b24f60dd
Comment 59 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-14 07:03:30 PST
For the curious, this is where we currently stand:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=3c9a54d68c95

Last week's plan (7 days ago)
* [DONE] (dustin) Run jobs on m1.medium (par with releng)
* [DONE] (armenzg) Switch from gnome-session to xsession to match Buildbot setup
* [DONE] (armenzg) Disable apport crash reporter (stealing mochitest focus)
* [DONE] (armenzg) downgrade mesa
* [DONE] (armenzg) disable screen saver and locking
* [DONE] (armenzg) Fix dumping of crashes in docker containers

Fixed in the last week:
* [DONE] (armenzg) Split mochitests into 8 chunks
* reviewed current dependencies

Current plan (1-2 weeks timeline)
* Find root issue for reftests failing (we assume some more docker image work)
* Investigate and file bugs for current test issues
* Aim to get green all the way with m1.medium

Ongoing:
* Split mochitest-other into mochitest-a11y and mochitest-chrome
* Add gtest
* Filed a11y issues - bug 1239301
* m-8 test-alerts.html - bug 1236036
* e10s crash in crashtest - bug 1233554
* Luciddream (this is an opt *only* job) - bug 1233554
* R1 - bug 1232985
* Build tier1 issue - bug 1231618
* Intermittent m-3 (I don't see it happening anymore; removing dep) - bug 1197642
* Intermittent bc7 (I don't see it happening anymore; removing dep) - bug 1011171
Comment 60 Dustin J. Mitchell [:dustin] 2016-01-15 10:06:10 PST
I just did a scan of the tester-specific stuff that we do in PuppetAgain:

 * need to start the window manager session (duh, but it did take us a long time to figure this out)
 * tweaks::fonts
   ---> bug 1240056
 * clean::appstate
   ---> only on OS X
 * EDID data
   ---> only for GPUs
 * gnome-settings-daemon upgrade to 3.4.2-0ubuntu0.6.2 (bug 846348)
   ---> we are already running 3.4.2-0ubuntu0.6.6
 * disable jockey-gtk and deja-dup-monitor (bug 9849444)
   ---> bug 1240084
Comment 61 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-21 07:51:54 PST
Here's where we are with greenness:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e96bb453f67

We've enabled a lot of jobs on inbound.

I need to spend time filing bugs for R3, R-e10s2 and R-e10s3.
C-e10s is already filed (bug 1233554).
Comment 62 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-22 09:59:36 PST
selena: we need to determine the scope of this bug.

A - make them run on try
B - run side by side *green* as tier-2 on all trunk trees
C - run side by side *green* as tier-1 on all trunk trees
D - replace the Buildbot jobs on trunk trees

At some point, we have to determine at which point this project is no longer an engineering productivity effort but a releng/taskcluster effort.

I'm going to file a bug for D and assume that we're doing B in here [as per dustin [1]].

[1] <dustin> armenzg: for your part, I think it'd be OK if you left things with the tests enabled in TC at tier 2

Current known blockers for debug builds being tier1:
* bug 1174263
* bug 1231320
* bug 1234929
* bug 1231618
Comment 63 Selena Deckelmann :selenamarie :selena use ni? 2016-01-25 07:37:05 PST
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #62)
> selena: we need to determine the scope of this bug.
> 
> A - make them run on try
> B - run side by side *green* as tier-2 on all trunk trees
> C - run side by side *green* as tier-1 on all trunk trees
> D - replace the Buildbot jobs on trunk trees
> 
> At some point, we have to determine at which point this project is no longer an
> engineering productivity effort but a releng/taskcluster effort.
> 
> I'm going to file a bug for D and assume that we're doing B in here [as per
> dustin [1]].

Sounds good!  Thank you for going through the details there. 

I'll file a bug for C and make the bugs you mentioned blockers.
Comment 64 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-26 07:50:45 PST
This is the current breakdown:

To be discussed:
* Pending jobs reports for sheriffs [1]
* Proper integration under Treeherder -> Infra menu [2]
* SETA support

(armenzg) Make TC Linux64 *debug* test jobs tier2
* bug 1232985: /bugs/321402-4|5|6.xul fail on linux64 when running from a docker container
* bug 1233554: Linux x64 debug crashtest e10s crash in docker image (tests/reftest/tests/dom/canvas/crashtests/780392-1.html)
* bug 1241297: wpt-3 e10s always takes as long as the max runtime allows it to
   * bug 1238435: Intermittent e10s TEST-UNEXPECTED-TIMEOUT | /html/dom/reflection-forms.html, /html/dom/reflection-embedded.html, /html/dom/reflection-grouping.html followed by busting the whole run
* bug 1242033: Linux x64 debug e10s reftest 3 - element-paint-native-widget.html element-paint-native-widget-ref.html
* bug 1242682: Separate dom/media into its own subsuite

Make TC Linux64 *debug* test jobs tier1 (dropping associated Buildbot builders)
* bug 1218537: It's not possible to submit multiple "Job Info" artifacts
* bug 1241280: Fix web platform tests grouping
* bug 1242023: Cannot schedule buildbot bridge builds
   * This would allow us to schedule the jobs on the Buildbot-generated build without having to wait for the TC L64 debug builds to replace the Buildbot ones

Make TC Linux64 *OPT* test jobs tier1
* bug 1233054: [opt] Luciddream jobs failing in desktop-test container to load libfreetype.so.6

Make TC Linux64 *debug* _BUILD_ jobs tier1
* bug 1231618: (tier1 issue) tc-vcs should change paths.default in repository's hgrc
   * bug 1241111: Allow overriding SOURCE_REV_URL, SOURCE_REPO, SOURCE_CHANGESET

Optimizations:
* Evaluate switching to m3 once we are completely green on m1
   * bug 1235889: time to run taskcluster jobs take 20% longer than buildbot peers


[1] http://builddata.pub.build.mozilla.org/reports/pending/pending.html
[2] http://people.mozilla.org/~armenzg/sattap/99d8ad71.png
Comment 65 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-01-26 09:10:32 PST
The list of dependencies is now up to date.
Comment 66 Joel Maher ( :jmaher) 2016-01-26 09:11:46 PST
we need to verify runtimes and intermittent rates as compared to buildbot.
Comment 67 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-01 14:10:06 PST
After the experiment from this weekend, jmaher discovered a few discrepancies between what we were running and what we should actually be running.
For instance, a lot of e10s jobs were *not* running as e10s due to a bug in the task definitions (the payload was defined twice).
Now that this is fixed, our results have become noticeably less green [1]
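A "payload defined twice" bug is easy to miss because Python's json module (like many YAML loaders) silently keeps only the last value for a duplicate key. Below is a minimal sketch, assuming a task definition serialized as JSON; the real in-tree task definitions may use a different format, and the `run-tests` command is hypothetical. It shows the silent override and a strict loader that rejects duplicates via `json.loads`'s `object_pairs_hook`:

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of letting a later key win."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Hypothetical task definition fragment with "payload" defined twice:
# the first payload has the --e10s flag, the second one does not.
task_json = (
    '{"payload": {"command": ["run-tests", "--e10s"]},'
    ' "payload": {"command": ["run-tests"]}}'
)

# Default parsing: the second "payload" silently replaces the first one.
naive = json.loads(task_json)
print(naive["payload"]["command"])  # ['run-tests'] (the --e10s flag is gone)

# Strict parsing: the duplicate is reported at load time.
try:
    json.loads(task_json, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate key: 'payload'
```

Failing at parse time turns a silently mis-configured job (an e10s job quietly running without e10s) into an immediate, visible error.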

In the same fashion as crashtest e10s, we're seeing crashes for:
* m-4
* d-t{1,2,9}

New issues:
* marionette - lots of test issues
* dt8 - it used to take 30 mins. It now times out
* Wr - various test failures

e10s issues:
* m10 - test_alerts.html is back
* bc1 - test/alerts/browser_notification_close.js
* bc6 - (intermittent) sessionstore/test/browser_crashedTabs.js and sessionstore/test/browser_579879.js
* dt6 - inspector/test/browser_inspector_initialization.js

Hopefully to be fixed in my next push:
* JP - wrong values in the test definitions
* m-other - I increased the max run time

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9397ac87d91&filter-searchStr=e10s&group_state=expanded
Comment 68 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-02 12:25:19 PST
Bug 1245243 might fix most of the e10s issues we're seeing.
Comment 69 :Ehsan Akhgari 2016-02-03 14:52:33 PST
catlee asked me to see if I can help with the issues you guys are running into.  Can someone please tell me where to look for the latest state of things?  I have been looking at this bug and bug 1237024 but getting details straight in my mind is pretty difficult...  Thanks!
Comment 70 Joel Maher ( :jmaher) 2016-02-03 16:52:35 PST
thanks Ehsan!  Right now we are waiting on bug 1245243; we suspect this will fix some of the e10s failures we are seeing (at the very least crashtest in bug 1233554).  On top of that there is a list of failures to figure out:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba896a9c4e2

We were waiting for the shared memory mount increase and then planned on retesting.  Keep in mind the m-e10s(dt*) jobs are scheduled on tc, but they are not greened up on buildbot, so ignore those.

Outside of greening up jobs, we need to resolve how to get stats for the sheriffs on wait times and other machine related info.  Maybe you could allocate a day next week to help look at any remaining mochitest/browser-chrome tests that are still failing after we fix bug 1245243.
Comment 71 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-04 07:37:41 PST
Removing bug 1242682 and bug 1243080 as they're not real blockers, just improvements.
Comment 72 :Ehsan Akhgari 2016-02-08 11:54:51 PST
Sadly I only have tomorrow and then will go on vacation.  Can you please ping me when I get back?
Comment 73 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-08 11:59:09 PST
(In reply to :Ehsan Akhgari (Away 2/10-2/19) from comment #72)
> Sadly I only have tomorrow and then will go on vacation.  Can you please
> ping me when I get back?

We will; however, we hope to be done by then. Enjoy your break!
Comment 74 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-08 14:28:12 PST
Latest push [1]

Status summary:
* Since last week we properly pass the --e10s flag to e10s jobs
** This is why we had a sudden increase of oranges
* We are now using a newer version of docker which allows us to modify /dev/shm
** This takes away a couple of crashes on e10s
* Wr is now passing (it is rather intermittent)
* Wr-e10s is now being scheduled and running green

Remaining issues
* Marionette still has a bunch of test failures - bug 1246283
* s/bc1/bc4/ - test/alerts/browser_notification_close.js (bug 1244936)
* Bump mochitest's timeout from 45 seconds to 90 seconds fixes some tests - bug 1246152 (still to land)
** This is because running mochitests inside docker is slower
* We will be increasing the global timeout for devtools (fixes dt8 time out - bug 1246279)

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d8d791a8e5d&exclusion_profile=false&group_state=expanded
Comment 75 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-08 14:34:10 PST
* m10 e10s issue - bug 1246019
Comment 76 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-11 06:46:24 PST
Remaining test issues:
* m10 e10s - test_alerts.html - bug 1246019 - bug 1227730
* bc4 e10s - test/alerts/browser_notification_close.js (bug 1244936)
* wpt reftests - bdi-paragraph-level-container.html - bug 1247033

Fixed:
* Marionette - bug 1246283
* Higher mochitest intermittency - mount the workspace on the hosts' SSD - bug 1246947
Comment 77 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-18 11:26:25 PST
Status summary:
* bug 1227730 fixes all remaining perma failures; we're waiting on dev to get the patch reviewed and landed
* bug 1227637: releng hosts have a new mesa and we have to match it

This weekend we would like to run another experiment to compare Buildbot to TaskCluster/docker, since we've mounted ~/workspace on the hosts' SSDs.
Comment 78 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-19 08:42:40 PST
It seems that mochitest-push has become a mess both for Buildbot and TC
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=mochitest-push&fromchange=ee60dc3d0655&exclusion_profile=false&tochange=c5d6c3e00c91

For TC, it was green until d320678c4fab (the 4th push in that view).

They're currently all hidden.
Comment 79 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-22 11:05:48 PST
I think we should start looking into enabling mochitest-plain and mochitest-browser-chrome until kits gets bug 1227730 fixed.
We can hide m10 and bc4 until then.

This is our current latest *greenest* run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55098f406c33&group_state=expanded
Comment 80 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-02-24 11:16:38 PST
Everything is running side by side and visibly on Treeherder.

The only work left in here is finishing up the documentation and improving how to run this locally.
Comment 81 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-03-08 06:34:52 PST
9 months later and all dependencies have been solved.
We're now running L64 debug test jobs as tier-2 on most integration/trunk repos (except a few - bug 1252471).

We can now close this and deal with other platforms.
Comment 82 Ted Mielczarek [:ted.mielczarek] 2016-03-08 07:58:00 PST
Congrats! And you managed to get it done before you turned into a pumpkin. :)
Comment 83 Andrew Halberstadt [:ahal] 2016-03-08 08:12:27 PST
Awesome, that is an impressive dependency tree!
