Make it possible to run gaia try jobs *without* doing a build

RESOLVED FIXED

Status

defect
RESOLVED FIXED
5 years ago
Last year

People

(Reporter: jgriffin, Assigned: aki)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(11 attachments, 11 obsolete attachments)

757 bytes, patch (nthomas: review+)
761 bytes, patch (RyanVM: review+)
29.42 KB, patch (catlee: review+)
2.28 KB, patch (catlee: review+)
5.70 KB, patch (jgriffin: review+)
689 bytes, patch (aki: review+)
983 bytes, patch (aki: review+, nthomas: checked-in+)
5.24 KB, patch (nthomas: review+)
4.09 KB, patch (rail: review+)
wah: 4.54 KB, patch (rail: review+)
1.39 KB, patch (catlee: review+)
The gaia team would like to be able to trigger gaia try jobs against "the latest B2G desktop builds available", in order to reduce turnaround time.

In this mode, gaia try would behave similarly to Travis: it just finds the most recent B2G desktop build that's available on ftp.mozilla.org, and runs tests against it.

We could add logic to the mozharness scripts that handle these test jobs to pull down the most recent build from m-c to run against, but we'd need a way to schedule the test jobs in the absence of a build job.  In buildbot lingo, I think this means we'd need to trigger a sendchange either with a null build url, or with the 'latest' build url.

Potentially, this behavior could be gated on something in the commit message, similar to how try syntax is handled.

:catlee, can you estimate how hard this might be?
Talked with aki about this in the b2g-cross-functional meeting.  It seemed like a viable approach might be to:

- make a 'dummy' no-op build that just serves to kick off tests
- add a 'latest' link for B2G desktop builds that test jobs could use to locate the latest build

According to gwagner, this is medium-high priority, since it's the last blocker to moving off of Travis. I've volunteered myself and ahal to help if there's anything we can do.
Blocks: 914632
Assignee: nobody → aki
Comment on attachment 8399743 [details] [diff] [review]
(needs testing) post_upload

This patch works, but it created a mozilla-central-linux32_gecko/1396323260/latest softlink that points to mozilla-central-linux32_gecko/1396323260/en-US, rather than a mozilla-central-linux32_gecko/latest softlink that points to mozilla-central-linux32_gecko/1396323260.
Attachment #8399743 - Attachment is obsolete: true
Posted patch post_upload2 (obsolete) — Splinter Review
This patch:

* adds a --release-to-latest-tinderbox-builds option, although I debated making this the fixed behavior instead of an option
* adds an avoid_race_condition option to rel_symlink()
* removes shadow central stuff
Attachment #8400255 - Flags: review?(catlee)
Call post_upload.py with --release-to-latest-tinderbox-builds.
This turns it on for all MBF-based builds (including b2g desktop)... not sure how much we should worry about earlier builds finishing later.
Attachment #8400256 - Flags: review?(catlee)
Isn't this use case exactly what the taskcluster work is aiming to do?
Taskcluster is still a month or two out technically (and depends on sheriffs using treeherder which may be even longer). James Lal said if doing this wasn't too difficult it would be good to land it in the meantime, otherwise if it was a big complicated change, we could just wait for taskcluster.
Using TaskCluster to do this instead has a lot of dependencies...all the B2G-related tests (including emulator ones) have to be stood up in TaskCluster, and we have to run them side-by-side for a while to verify we're getting consistent results and a similar number of intermittents.

TaskCluster currently only runs the tests we don't run in TBPL...namely, gaia-build tests and linter tests.

Treeherder has to be deployed, and the sheriffs need to go through some iterations on it to iron out the rough edges, and then sheriffing has to be transitioned to it.

As ahal says, this is likely to be a couple of months of work, realistically, and it's hoped that we can use gaia try in the interim, if that's less work.
Comment on attachment 8400255 [details] [diff] [review]
post_upload2

>diff --git a/stage/post_upload.py b/stage/post_upload.py
>+def rel_symlink(_to, _from, avoid_race_condition=False):
...
>+        if os.path.exists(_from):
>+            if os.path.isdir(_from):
>+                shutil.rmtree(_from)
>+            else:
>+                os.unlink(_from)

There's theoretically an issue if two builds try to update the latest symlink at the same time, e.g. unlink() fails because a very slightly earlier job has just done that. We should only hit it if I/O operations are slow, though. Locking or a retry could be used to avoid that.
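One way to sidestep the unlink()/symlink() window described above is to build the new link under a temporary name and rename it over the old one. This is only a sketch, not what post_upload.py does: atomic_rel_symlink is a hypothetical helper, it assumes Python 3 (os.replace) and a POSIX filesystem, and it assumes the existing "latest" entry is either absent or a symlink/file, not a real directory.

```python
import os
import tempfile


def atomic_rel_symlink(target_dir, link_path):
    """Point link_path at target_dir with a relative symlink.

    The new link is created under a temporary name and then renamed over
    link_path, so readers see either the old link or the new one.
    """
    link_path = os.path.abspath(link_path)
    link_dir = os.path.dirname(link_path)
    rel_target = os.path.relpath(os.path.abspath(target_dir), link_dir)

    # A private temp directory on the same filesystem gives us a name that
    # no other uploader can collide with.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(rel_target, tmp_link)
        # rename() replaces an existing symlink atomically on POSIX.
        os.replace(tmp_link, link_path)
    finally:
        # Clean up whether or not the rename succeeded.
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)
```

With this shape two concurrent uploaders cannot fail on unlink(); whichever rename lands last wins.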
Attachment #8400255 - Flags: review?(catlee) → review+
Attachment #8400256 - Flags: review?(catlee) → review+
Comment on attachment 8400255 [details] [diff] [review]
post_upload2

Pretty sure this is no-op until the buildbotcustom piece is live.

http://hg.mozilla.org/build/tools/rev/c616fa4b85e0

Checked into productdelivery:
Sending        files/bin/post_upload.py
Transmitting file data .
Committed revision 85578.
Attachment #8400255 - Flags: checked-in+
This was delayed a bit while we debated whether to try getting this in Task Cluster...

Next steps:
* Figure out how best to trigger jobs from the commit hook.  Leaning towards a sendchange but self-serve may also be a solution.
** jhford is blocked on me for this.
* We need a way to see these results.  https://bugzilla.mozilla.org/show_bug.cgi?id=914632#c2 says we need an hg repo, which might be overkill since we won't be creating builds here, but would possibly lead to a gaia-inbound (though I don't know if the patches slated for gaia-try are going to be riskier or more experimental than what would belong in gaia-inbound).
** hg repo
** configs for that repo in buildbot-configs/mozilla and mozilla-tests
** tbpl support for that repo

I think that's it, but it's certainly possible I'm missing something.
Also juggling Flame builds as I haven't found another person to own one of these bugs yet.
Depends on: 991961
Live in production.
Depends on: 992212
I turned the post_upload.py into a no-op because of bustage:
09:55 < nagios-releng> Fri 06:55:50 PDT [4446] buildbot-master111.srv.releng.scl3.mozilla.com:Command Queue is CRITICAL: 5 dead items (http://m.mozilla.org/Command+Queue)
09:55 < nagios-releng> Fri 06:55:50 PDT [4447] buildbot-master112.srv.releng.scl3.mozilla.com:Command Queue is CRITICAL: 5 dead items (http://m.mozilla.org/Command+Queue)
09:55 < nagios-releng> Fri 06:55:50 PDT [4448] buildbot-master51.srv.releng.use1.mozilla.com:Command Queue is CRITICAL: 1 dead item (http://m.mozilla.org/Command+Queue)
09:55 < nagios-releng> Fri 06:55:59 PDT [4449] buildbot-master53.srv.releng.usw2.mozilla.com:Command Queue is CRITICAL: 2 dead items (http://m.mozilla.org/Command+Queue)
09:58 < catlee> hm
09:59 -!- bholley [bholley@B08EEEEB.6BBCE916.7B974E06.IP] has joined #buildduty
09:59 < catlee> OSError: [Errno 40] Too many levels of symbolic links: '/home/ftp/pub/firefox/tinderbox-builds/mozilla-inbound-linux/1396616987/tmpurkgMy'
09:59 < catlee> is that from aki's stuff?


https://hg.mozilla.org/build/tools/rev/f64669f52207
We were getting symlinks that were pointing at themselves. e.g.
lrwxrwxrwx 1 ffxbld firefox 10 Apr  4 07:29 ./b2g-inbound-linux/1396613985 -> 1396613985
Ugh, sorry.  I did test.  I'm not sure how that happened, but I'll try to figure it out.
Stale latest links have been removed in firefox, mobile, and b2g tinderbox-builds/* dirs.
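For anyone needing to hunt for this kind of breakage again, a rough sketch of a checker follows. find_self_links is a hypothetical helper, not part of any landed patch; it only looks at entries directly under one directory and only reports links whose target resolves back to the link itself.

```python
import os


def find_self_links(root):
    """Yield symlinks directly under root whose target is the link itself."""
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.islink(path):
            continue
        # Resolve a relative target against the directory holding the link;
        # os.path.join keeps an absolute target unchanged.
        target = os.path.normpath(os.path.join(root, os.readlink(path)))
        if target == os.path.normpath(path):
            yield path
```

Running it over a directory such as b2g-inbound-linux would flag an entry like 1396613985 -> 1396613985.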
Posted patch best effort softlink (obsolete) — Splinter Review
I tested with:

* existing softlink (new softlink works)
* existing file (softlink works)
* existing non-writable file (softlink works if directory is writable)
* existing directory (softlink is created inside the directory, which is wrong, but probably harmless)
* existing non-writable directory (softlink fails with an STDERR message)

I'm thinking a best effort softlink is our best option here.  Thoughts?
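To illustrate what "best effort" means here, a minimal sketch, assuming the goal is that a failed link never fails the upload. best_effort_symlink is a hypothetical name and this is not the attached patch. Note one difference from the test results listed above: for an existing real directory this sketch refuses and reports on stderr rather than creating the link inside the directory.

```python
import os
import sys


def best_effort_symlink(target, link_path):
    """Try to (re)create link_path -> target.

    Returns True on success. On failure, writes a message to stderr and
    returns False instead of raising.
    """
    try:
        # Replace an existing symlink or plain file; leave directories alone.
        if os.path.islink(link_path) or os.path.isfile(link_path):
            os.unlink(link_path)
        os.symlink(target, link_path)
        return True
    except OSError as exc:
        sys.stderr.write(
            "Could not link %s -> %s: %s\n" % (link_path, target, exc))
        return False
```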
Attachment #8400255 - Attachment is obsolete: true
Attachment #8402993 - Flags: review?(nthomas)
Posted patch gaia-try-tbpl
Attachment #8403041 - Flags: review?(ryanvm)
Comment on attachment 8403041 [details] [diff] [review]
gaia-try-tbpl

Review of attachment 8403041 [details] [diff] [review]:
-----------------------------------------------------------------

Let's put it under Try for consistency.
Attachment #8403041 - Flags: review?(ryanvm) → review+
Comment on attachment 8402993 [details] [diff] [review]
best effort softlink

Leaning towards latest nightlies, which already have a latest dir.
Attachment #8402993 - Flags: review?(nthomas)
If we go back to latest for tinderbox-builds, then we should take into account builds finishing out of sequence compared to pushes. I think we'd expect latest to mean latest code, instead of latest build.
That would be different than our latest assumption for both nightlies and releases, where it means latest build, not necessarily latest code.
Since these try jobs will be running against somewhat arbitrary gecko, I'm not sure the difference matters here.  Latest build is probably fine.
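If we ever did want "latest" to never move backwards, the check could look roughly like the sketch below. It is an illustration only: should_update_latest is a hypothetical helper, and it assumes the numeric directory names (e.g. 1396323260) order builds the way we care about, which may not match push order.

```python
def should_update_latest(current_target, new_build_dir):
    """Return True if new_build_dir should replace current_target as latest.

    current_target is the directory name the link points at now, or None
    if there is no link yet.
    """
    if current_target is None:
        return True
    try:
        return int(new_build_dir) > int(current_target)
    except ValueError:
        # Non-numeric names: fall back to "last writer wins".
        return True
```

The "latest build is probably fine" position amounts to skipping this check and always updating the link.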
Posted patch gaia-try-configs wip (obsolete) — Splinter Review
I'm triggering tests!
Now working on getting them to not be perma-red wastes of infrastructure time...
Attachment #8407341 - Attachment is obsolete: true
Posted patch gaia-try-mh (obsolete) — Splinter Review
This isn't completely blowing up.
Gaia integration isn't finding the mozbase requirements txt.  I hope we're looking at the test zip, and not looking at the actual tree, because gaia-try isn't going to have an actual tree.
Gaia-try uses test.zip; it doesn't look for an actual tree.
Posted patch gaia-try-mh
This patch:

* adds a config file for gaia-try
* allows for specifying a different location for gaia.json (optional; we could also have the file exist in a b2g/config/ dir in gaia-try, but I figured having the file at the root level would be clearer to the casual observer)
* fixes the |git remote -v| directory
* allows for setting installer_url and test_url via commandline instead of requiring they live in buildprops.json.
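The last point, command-line values taking precedence over buildprops.json, can be sketched as follows. This is not the mozharness code: resolve_urls is a hypothetical function, the option names are illustrative, and the flat JSON layout is an assumption made for the example rather than the real buildprops.json structure.

```python
import argparse
import json


def resolve_urls(argv, buildprops_path=None):
    """Return (installer_url, test_url), preferring command-line values."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--installer-url")
    parser.add_argument("--test-url")
    args = parser.parse_args(argv)

    props = {}
    if buildprops_path is not None:
        with open(buildprops_path) as fh:
            # Assumed layout: {"installer_url": ..., "test_url": ...}
            props = json.load(fh)

    # Command line wins; the properties file is only a fallback.
    installer_url = args.installer_url or props.get("installer_url")
    test_url = args.test_url or props.get("test_url")
    return installer_url, test_url
```

A job with no build step (and so no build-provided URLs) can then pass both URLs explicitly.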

I tested an earlier version of this patch on ash and it didn't look like it was causing issues.  I'll still want a good cycle on Cypress on default after landing, though.

I got further with gaia-integration by allowing for setting self.test_url even though gaia-integration has require_test_zip=False for some reason [?].  Now it gets past the mozbase requirements file and is dying on npm commands on a build slave, which is probably a good sign.
Attachment #8408451 - Flags: review?(jgriffin)
Attachment #8408003 - Flags: review?(catlee)
Attachment #8408002 - Attachment description: gaia-try-configs wip → gaia-try-configs
Attachment #8408002 - Flags: review?(catlee)
Attachment #8408090 - Attachment is obsolete: true
Attachment #8408003 - Attachment description: gaia-try-custom wip → gaia-try-custom
Comment on attachment 8408451 [details] [diff] [review]
gaia-try-mh

Review of attachment 8408451 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm!
Attachment #8408451 - Flags: review?(jgriffin) → review+
Attachment #8402993 - Attachment is obsolete: true
Attachment #8400255 - Attachment is obsolete: false
Depends on: 1000304
Attachment #8408003 - Flags: review?(catlee) → review+
Attachment #8408002 - Flags: review?(catlee) → review+
TBPL part in production :-)
buildbot-config patch is in production: http://hg.mozilla.org/build/buildbot-configs/rev/46353eebf660
buildbotcustom patch is in production: http://hg.mozilla.org/build/buildbotcustom/rev/22ee9ed82c3b

:)
Attachment #8400255 - Flags: checked-in+ → checked-in-
Posted patch fix_gaia-try_uploads (obsolete) — Splinter Review
Attachment #8414923 - Flags: review?(nthomas)
I see a green test run!
http://buildbot-master04.srv.releng.usw2.mozilla.com:8201/builders/b2g_ubuntu32_vm%20gaia-try%20opt%20test%20reftest/builds/0
Hopefully this latest patch will fix the log upload so it appears on tbpl properly.
Posted patch self-serve
And I think this adds it to self-serve?
Attachment #8414940 - Flags: review?(catlee)
Comment on attachment 8414923 [details] [diff] [review]
fix_gaia-try_uploads

This patch seems more related to our traditional way of uploading builds, or pulling packages, which may not apply to gaia-try.

For postrun failures, I think you need to duplicate the block starting at
 http://hg.mozilla.org/build/puppet/file/5f842bcfcacd/modules/buildmaster/templates/postrun-default.cfg.erb#l20
for gaia-try. I tried that on bm106 and this was uploaded:
 http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/jford@mozilla.com-3d288cf7101e/gaia-try-macosx64_gecko/gaia-try_mountainlion-b2gdt_test-gaia-ui-test-bm106-tests1-macosx-build0.txt.gz
Attachment #8414923 - Flags: review?(nthomas)
Attachment #8414923 - Attachment is obsolete: true
Attachment #8415412 - Flags: review?(aki)
Comment on attachment 8415412 [details] [diff] [review]
[puppet] Add gaia-try ssh config

We could do that, or maybe

(bb08)deathduck:/src/gaia-try/buildbotcustom/bin [10:28:22] (default)
711$ hg diff
diff --git a/bin/postrun.py b/bin/postrun.py
--- a/bin/postrun.py
+++ b/bin/postrun.py
@@ -63,17 +63,17 @@ class PostRunner(object):
                        self.config['statusdb.master_name']]
         if "nightly" in builder.name:
             upload_args.append("--nightly")
         if builder.name.startswith("release-"):
             upload_args.append("--release")
             upload_args.append(
                 "%s/%s" % (info.get('version'), info.get('build_number')))

-        if branch and 'try' in branch:
+        if branch and branch.startswith('try'):
             upload_args.append("--try")
         elif branch == 'shadow-central':
             upload_args.append("--shadow")

         if 'l10n' in builder.name:
             upload_args.append("--l10n")

         if product:

?
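To spell out what the one-line change above would do, here is a standalone sketch of the two conditions. The helper names are made up for the example; only the two expressions come from the diff.

```python
def matches_substring(branch):
    """Old condition: any branch name containing 'try'."""
    return bool(branch and 'try' in branch)


def matches_prefix(branch):
    """Proposed condition: only branch names starting with 'try'."""
    return bool(branch and branch.startswith('try'))


for branch in ('try', 'gaia-try', 'mozilla-central', None):
    print(branch, matches_substring(branch), matches_prefix(branch))
```

With the substring test gaia-try is treated as a try branch and gets --try; with the prefix test it is not.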
Attachment #8415412 - Flags: review?(aki) → review+
Comment on attachment 8414940 [details] [diff] [review]
self-serve

jlund irc review
Attachment #8414940 - Flags: review?(catlee) → review+
Comment on attachment 8415412 [details] [diff] [review]
[puppet] Add gaia-try ssh config

Landed as is, 
  https://hg.mozilla.org/build/puppet/rev/cd7fdf51bc1b

because we're already getting --try but also the wrong keys, eg

/builds/buildbot/tests1-macosx/bin/python /builds/buildbot/tests1-macosx/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py -r 2 -t 10 --master-name bm106-tests1-macosx --try --product b2g --platform macosx64_gecko --branch gaia-try --user ffxbld -i /home/cltbld/.ssh/ffxbld_dsa stage.mozilla.org /builds/buildbot/tests1-macosx/master/gaia-try_mountainlion-b2gdt_test-gaia-ui-test 1

and that is teh fail.
Attachment #8415412 - Flags: checked-in+
https://tbpl.mozilla.org/?tree=Gaia-Try is now showing logs/jobs properly.
Self serve has is_done status: https://secure.pub.build.mozilla.org/buildapi/self-serve/gaia-try/rev/697bc885c5f5139b213e4b72be05e8fe7de0ea0c/is_done

Thanks Nick and Catlee!

I think we're done here.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
We don't upload crashreporter symbols to latest-nightly/ so debug tests (and any crashes in opt tests most likely) are burning.

I'm probably going to force Nick to review my post_upload.py patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8400255 - Attachment is obsolete: true
Posted patch post_upload4
Back to best-effort softlink!
Attachment #8416016 - Flags: review?(nthomas)
Attachment #8416016 - Flags: review?(nthomas) → review+
Comment on attachment 8416016 [details] [diff] [review]
post_upload4

https://hg.mozilla.org/build/tools/rev/1fb7364360a1
(bb08)deathduck:/src/clean/svn/productdelivery/files/bin [12:21:15]
758$ svn ci
Sending        post_upload.py
Transmitting file data .
Committed revision 86794.
Attachment #8416016 - Flags: checked-in+
Posted patch use_tinderbox-builds-latest (obsolete) — Splinter Review
I'm going to give this a whirl in staging.
Comment on attachment 8416071 [details] [diff] [review]
use_tinderbox-builds-latest

Crap, this doesn't account for debug.
This is going to be fragile around merge day; ideally I fix bug 979554 before next merge day and we upload the gecko/crashreporter/tests files without version numbers.

I tested as far as downloading/extracting the installer, test zip, and crashreporter symbols, which works.
Attachment #8416199 - Attachment is obsolete: true
Attachment #8416618 - Flags: review?(rail)
Fix the linux32 links
Attachment #8416618 - Attachment is obsolete: true
Attachment #8416618 - Flags: review?(rail)
Attachment #8416645 - Flags: review?(rail)
Posted patch wah
Attachment #8416645 - Attachment is obsolete: true
Attachment #8416645 - Flags: review?(rail)
Attachment #8416648 - Flags: review?(rail)
Attachment #8416619 - Flags: review?(rail) → review+
Attachment #8416648 - Flags: review?(rail) → review+
Comment on attachment 8416619 [details] [diff] [review]
(custom) deal with opt_extra_args / debug_extra_args

https://hg.mozilla.org/build/buildbotcustom/rev/0122c7db5121
Attachment #8416619 - Flags: checked-in+
afaict, coallesce_jobs does nothing.
enable_merging specifies whether coalescing happens.
Attachment #8417563 - Flags: review?(catlee)
Attachment #8417563 - Flags: review?(catlee) → review+
Merged and deployed to production.
I don't see coalescing anymore.
The symbol downloads should work ondemand for opt tests; the debug tests were turned off.  Resolving fixed again.
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: General Automation → General