Add talos builders for linux64-qr builds

RESOLVED FIXED in mozilla57

Status

Testing
Talos
RESOLVED FIXED
9 months ago
9 months ago

People

(Reporter: kats, Assigned: kmoir)

Tracking

Trunk
mozilla57
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 1 obsolete attachment)

We'd like to allow running talos on the linux64-qr test platform (Linux 64-bit with QuantumRender enabled). I did a try push with the taskcluster changes in bug 1383149 but it didn't work, presumably because buildbot couldn't find the necessary configuration on its end. I'm not sure what needs to be done here to make this work but basically we need builders with strings like "Ubuntu HW 12.04 x64 try qr talos chromez-e10s" (same for all the talos tests) which behave exactly the same as regular linux64 talos jobs, but also set the MOZ_WEBRENDER=1 environment variable.
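The request boils down to a naming scheme plus one extra environment variable. Here is a minimal Python sketch of that idea. It assumes a builder name is the platform prefix "Ubuntu HW 12.04 x64", then the branch, then a variant tag such as "qr" or "stylo", then "talos", then the suite. The helper names (`talos_builder_name`, `qr_env`) and the suite list are hypothetical illustrations, not code from buildbot-configs.

```python
# Hypothetical sketch: the helper names and suite list are illustrative
# assumptions, not part of buildbot-configs.

PLATFORM_PREFIX = "Ubuntu HW 12.04 x64"
# Only chromez-e10s is named in this bug; the other talos suites are assumed
# to follow the same pattern and would be appended here.
TALOS_SUITES = ["chromez-e10s"]


def talos_builder_name(branch: str, variant: str, suite: str) -> str:
    """Return a name like 'Ubuntu HW 12.04 x64 try qr talos chromez-e10s'."""
    return f"{PLATFORM_PREFIX} {branch} {variant} talos {suite}"


def qr_env(base_env: dict) -> dict:
    """Copy a regular linux64 talos environment and enable WebRender."""
    env = dict(base_env)  # leave the original linux64 environment untouched
    env["MOZ_WEBRENDER"] = "1"
    return env


if __name__ == "__main__":
    for suite in TALOS_SUITES:
        print(talos_builder_name("try", "qr", suite))
```

Swapping the variant tag from "qr" to "stylo" yields the corresponding stylo builder name. Apart from the added environment variable, the qr jobs are meant to behave exactly like the regular linux64 ones.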

One point of concern here is that webrender has previously failed to work on Ubuntu 12.04 - for reftests and mochitests we had to wait until the machines were running 16.04 before we could get the tests running properly. We might have the same problem with talos, assuming the "Ubuntu HW 12.04" in the buildername is correct and the talos machines are running 12.04.
Joel, do you know what I would need to do to get this working?
Flags: needinfo?(jmaher)
(Reporter)

Updated

9 months ago
Summary: Add talos builders for linux64-qr buidls → Add talos builders for linux64-qr builds
:kats, we run talos on 12.04. Could we run on Windows instead? Could we verify via try server, by hacking up the existing linux64 talos tests, that we can run on 12.04?

This is not an easy fix. We had planned on upgrading to 16.04 when we upgrade hardware, which will take place in the new datacenter scheduled for Q4. If this is needed now, it will derail a few other plans, so please see if we can test on another platform or if we can run on 12.04 at all.
Flags: needinfo?(jmaher) → needinfo?(bugmail)
I did a try push with QR enabled via pref on linux 12.04 and windows: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ef70274663d3a1da7b580201cc0120e904cce96

So far it's looking like it actually does run fine on Linux 12.04 which is promising. Perhaps the hardware/graphics support is different (compared to the mochitest/reftest machines) and has the stuff webrender needs. I'll wait for the windows results to come back too. We care about windows more so if we can get good data from that, it might be preferable.
There are some "BROWSER FAILED TO GENERATE MOZAFTERPAINT IN 5 SECONDS" errors on g5 and o on Windows. On Linux, g4 is producing Infinity and NaNs out the wazoo, and g1/g2 are timing out or crashing; I'm not sure what the problem there is. But these also seem like issues with webrender/gecko rather than the harness, so we can probably enable the ones that work and then try to get the rest working as well.
Flags: needinfo?(bugmail)
Joel: so can we set up the builders for Linux x64 Ubuntu 12.04 and Windows 10? Then I can make the taskcluster changes to enable the working talos tests on those platforms and continue investigation. For now we just need them on the m-c and try branches.
Flags: needinfo?(jmaher)
I didn't have much luck hacking on this yesterday (despite figuring out a couple of other buildbot-related changes). I will have more help available tomorrow, so I will try again then! Leaving the ni for now.
:Callek, could you help me with the buildbot changes here (or find someone who can)? I would like to have 2 new platforms, linux64-qr and win64-qr, which run talos tests only on mozilla-central and try. I spent a couple of hours on this with no luck (list_builder_differences.sh didn't show the new platforms).

I will be on PTO for the next few days.
Flags: needinfo?(jmaher) → needinfo?(bugspam.Callek)

Comment 8

9 months ago
I'm not as familiar with buildbot changes as I'd like these days. Kim, do you or buildduty think you can help Joel here?
Flags: needinfo?(bugspam.Callek) → needinfo?(kmoir)
(Assignee)

Comment 9

9 months ago
Bug 1338871 is an example of these changes. You need to enable the jobs in taskcluster since they will be scheduled via bbb (the buildbot bridge).

The buildbot changes are just the bare minimum needed to enable the builders in buildbot on the required branches. The name has to exactly match the string of the bbb task that is created in taskcluster.

For example, define a platform
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l56
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l180
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l237
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l342
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l1224

enable the platform on the appropriate branches
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l2880

use the existing pool of machines for these tests (the platform has to use a different name because of duplicate builder name issues in buildbot)
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/production_config.py#l21
https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/production_config.py#l74
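The "define a platform, then point it at an existing pool under a new name" steps can be sketched in Python. This sketch assumes each platform is a plain dict and that pools are a mapping from slave-platform name to machines. Every key, platform name, and machine name below is hypothetical and does not mirror the real layout of config.py or production_config.py.

```python
import copy

# Hypothetical template modelled on a stylo-style platform entry; the keys are
# illustrative only and do not match buildbot-configs exactly.
STYLO_TEMPLATE = {
    "name": "linux64-stylo",
    "slave_platform": "ubuntu64_hw_stylo",
    "env": {},
    "branches": ["mozilla-central", "try"],
}
# Hypothetical mapping from slave-platform name to physical machines.
POOLS = {"ubuntu64_hw": ["machine-001", "machine-002"]}


def clone_platform(template, name, slave_platform, extra_env):
    """Copy a template platform and give it its own names and environment."""
    new = copy.deepcopy(template)  # never mutate the template in place
    new["name"] = name
    new["slave_platform"] = slave_platform
    new["env"].update(extra_env)
    return new


def alias_pool(pools, existing, alias):
    """Register a new slave-platform name that reuses an existing pool.

    The distinct name avoids duplicate builder names in buildbot, while the
    machines themselves stay the same.
    """
    result = dict(pools)
    result[alias] = pools[existing]
    return result


qr = clone_platform(STYLO_TEMPLATE, "linux64-qr", "ubuntu64_hw_qr",
                    {"MOZ_WEBRENDER": "1"})
pools = alias_pool(POOLS, "ubuntu64_hw", "ubuntu64_hw_qr")
```

The deep copy matters here. A shallow copy would share the template's `env` dict, so adding MOZ_WEBRENDER to the new platform would silently enable it for the template platform too.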

Let me know if you have questions, it is kind of a pain.

Once the builders are enabled on buildbot, you can do a try run and see if they work. If the task has an exception, the reason is usually that the names in buildbot and taskcluster don't match.
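A name mismatch is easy to spot mechanically. The minimal Python sketch below compares the buildernames that taskcluster tasks request against the builders buildbot knows about. It assumes both lists have already been collected as plain strings; the function name `unmatched_buildernames` is hypothetical.

```python
def unmatched_buildernames(buildbot_builders, taskcluster_buildernames):
    """Return taskcluster buildernames with no exact match in buildbot.

    A bbb task whose buildername is unknown to buildbot typically ends in an
    exception, so every name returned here is a likely culprit.
    """
    known = set(buildbot_builders)
    return sorted(name for name in set(taskcluster_buildernames)
                  if name not in known)


if __name__ == "__main__":
    buildbot = ["Ubuntu HW 12.04 x64 try stylo talos chromez-e10s"]
    requested = [
        "Ubuntu HW 12.04 x64 try stylo talos chromez-e10s",
        "Ubuntu HW 12.04 x64 try qr talos chromez-e10s",
    ]
    print(unmatched_buildernames(buildbot, requested))
```

The comparison is deliberately exact. Even a whitespace or case difference counts as a mismatch, which reflects the requirement that the two strings be identical.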
Flags: needinfo?(kmoir)
Created attachment 8892975 [details] [diff] [review]
Patch for buildbot-configs

Is there a way to test this change without landing it? I basically did a bunch of copy/paste at the line numbers you provided using stylo as the template. It seems straightforward enough so I hope I didn't do anything wrong but if there's a way to test it that would be great.
Assignee: nobody → bugmail
Attachment #8892975 - Flags: review?(kmoir)
(Assignee)

Comment 11

9 months ago
I looked at them and have some new modifications plus additional patches.
(Assignee)

Comment 12

9 months ago
Created attachment 8893555 [details] [diff] [review]
bug1383712puppet.patch
(Assignee)

Comment 13

9 months ago
Created attachment 8893556 [details] [diff] [review]
bug1383712tools.patch
(Assignee)

Comment 14

9 months ago
Created attachment 8893558 [details] [diff] [review]
bug1383712bb.patch
Attachment #8892975 - Attachment is obsolete: true
Attachment #8892975 - Flags: review?(kmoir)
(Assignee)

Updated

9 months ago
Attachment #8893555 - Flags: review?(spacurar)
(Assignee)

Updated

9 months ago
Attachment #8893556 - Flags: review?(spacurar)
(Assignee)

Updated

9 months ago
Attachment #8893558 - Flags: review?(spacurar)
thanks for the help on this :kmoir!
Attachment #8893555 - Flags: review?(spacurar) → review+
Attachment #8893556 - Flags: review?(spacurar) → review+
Attachment #8893558 - Flags: review?(spacurar) → review+
(Assignee)

Updated

9 months ago
Attachment #8893555 - Flags: checked-in+
(Assignee)

Updated

9 months ago
Attachment #8893556 - Flags: checked-in+
(Assignee)

Updated

9 months ago
Attachment #8893558 - Flags: checked-in+
Do you know if there is anything else that needs to be done here? I kicked off another try push trying to use these builders and got the same "malformed payload" error in taskcluster [1]. The payload for one of the tasks [2] looks fine to me, and the buildername is "Ubuntu HW 12.04 x64 try qr talos chromez-e10s" which is what I would expect. For comparison, I looked at a stylo talos push [3] which has the exact same buildername but with "qr" replaced with "stylo" as I would expect. That task completed successfully.

[1] https://tools.taskcluster.net/groups/SFkFyCkeTm2yBNIbvySROg/tasks/EoeFNepiSkiZuM85DPq75Q/runs/0
[2] https://tools.taskcluster.net/groups/SFkFyCkeTm2yBNIbvySROg/tasks/EoeFNepiSkiZuM85DPq75Q/details
[3] https://tools.taskcluster.net/groups/R_qD1_JyRdOwBZs_XFMK6w/tasks/Y9RsgEsjQpywiRYCKcUB5Q/details
Flags: needinfo?(kmoir)
(Assignee)

Comment 17

9 months ago
Created attachment 8893952 [details] [diff] [review]
bug1383712intree.patch

You need this patch to ensure the builder names are the same on buildbot and taskcluster. (We still run talos tests on buildbot through the buildbot bridge; they are scheduled through taskcluster.) The malformed payload error usually means the buildernames on the two systems don't match.
Flags: needinfo?(kmoir)
That seems to work, thanks!
I'm going to close this bug since the non-mozilla-central patches are all landed. I'll merge the m-c patch from comment 17 into the rest of the stuff that I'll be landing in bug 1383149.
Assignee: bugmail → kmoir
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57