Closed
Bug 1132151
Opened 10 years ago
Closed 9 years ago
Adapt mozregression to use builds on S3
Categories
(Testing :: mozregression, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jgriffin, Assigned: parkouss)
Attachments
(1 file)
Releng is working on switching build storage from ftp to S3; at some point ftp will be going away. This means we'll need to adapt mozregression to be able to find and download files from S3.
Reporter
Comment 1•10 years ago
mshal, lightsofapollo, can one of you provide instructions we can use to locate/download builds from S3, so we can begin looking at how we can adapt mozregression?
Flags: needinfo?(mshal)
Flags: needinfo?(jlal)
Assignee
Comment 2•10 years ago
According to jgriffin, it's likely that CDN won't work for nightlies after the switch to s3 is made. (bug 1113072)
Comment 3•10 years ago
I suspect we'll need to figure out bug 1133074 before trying anything meaningful here, but here is a rough outline of what it might look like using taskcluster-client.py (https://github.com/taskcluster/taskcluster-client.py/) -
import taskcluster
queue = taskcluster.Queue()
index = taskcluster.client.Index()
# The actual index here will depend on bug 1133074
task = index.findTask('buildbot.branches.mozilla-central.linux')
url = queue.buildUrl('getLatestArtifact', task['taskId'], 'public/build/firefox-38.0a1.en-US.linux-i686.tar.bz2')
# Fetch URL
print("URL: %s" % url)
Flags: needinfo?(mshal)
Comment 4•10 years ago
Ok, I talked with :mshal, :jonasfj and a few others on irc about this, see https://bugzilla.mozilla.org/show_bug.cgi?id=1133074#c6
Here's what I think needs to happen:
1. Let :mshal, :jonasfj or whoever implement the indexing described in my comment bug 1133074 (specifically the part about being able to get builds within some kind of date range)
2. Start adapting mozregression to use taskcluster for getting lists of inbound builds, using the API (apparently taskcluster is only currently uploading builds for desktop to S3, but we can start prototyping before everything is converted over)
3. Once everything is moved over to being uploaded to S3 and indexed with taskcluster, remove the old ftp-crawling code and use the new taskcluster API for everything.
I believe the taskcluster API provides everything we wanted in bug 1095756 in terms of faster download times, so we should probably close that one in favor of working on this.
Flags: needinfo?(jlal)
Reporter
Comment 5•10 years ago
Mozdownload is going to need the same treatment. Does it make sense to try to unify these things since we are going to have to make a significant refactor?
Flags: needinfo?(wlachance)
Comment 6•10 years ago
IMO, no. mozdownload and mozregression are pretty different animals that share only a small subset of functionality. I don't think much is to be gained from trying to unify them.
Flags: needinfo?(wlachance)
Comment 7•10 years ago
mozdownload will switch to the CDN for now to speed up downloads for people outside of the US, especially for beta and release builds. Once S3 is set up we will switch to it as well. Until then, please see https://github.com/mozilla/mozdownload/issues/252 for the CDN case.
Comment 8•10 years ago
(In reply to Michael Shal [:mshal] from comment #3)
> I suspect we'll need to figure out bug 1133074 before trying anything
> meaningful here, but here is a rough outline of what it might look like
> using taskcluster-client.py
> (https://github.com/taskcluster/taskcluster-client.py/) -
>
> import taskcluster
>
> queue = taskcluster.Queue()
> index = taskcluster.client.Index()
>
> # The actual index here will depend on bug 1133074
> task = index.findTask('buildbot.branches.mozilla-central.linux')
> url = queue.buildUrl('getLatestArtifact', task['taskId'],
> 'public/build/firefox-38.0a1.en-US.linux-i686.tar.bz2')
A little late to the party, but it would be nice if the index was designed such that you didn't need to know the version number. Whether that's a totally-generic thing like:
url = queue.buildUrl('getLatestArtifact', task['taskId'], type='package')
or just some wildcard-based fuzziness like:
url = queue.buildUrl('getLatestArtifact', task['taskId'], 'public/build/firefox-*.tar.bz2')
that would be a lot more usable.
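Until something like that exists on the index side, a client could apply the wildcard itself once it has a task's artifact names. A minimal sketch, assuming the names have already been fetched by some other means; the listing below is made up for illustration, and pick_artifact is a hypothetical helper, not part of taskcluster-client.py:

```python
from fnmatch import fnmatchcase


def pick_artifact(names, pattern="public/build/firefox-*.tar.bz2"):
    """Return the one artifact name matching pattern, or raise LookupError."""
    matches = [name for name in names if fnmatchcase(name, pattern)]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one match for %r, got %d" % (pattern, len(matches))
        )
    return matches[0]


# Made-up artifact listing for a single task (illustration only).
names = [
    "public/build/firefox-38.0a1.en-US.linux-i686.tar.bz2",
    "public/build/firefox-38.0a1.en-US.linux-i686.tests.zip",
    "public/build/firefox-38.0a1.en-US.linux-i686.txt",
]
print(pick_artifact(names))
```

Raising when zero or several names match keeps a loose pattern from silently picking the wrong file.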
Comment 9•10 years ago
@ted,
There is bug 1137562 to get redirect artifacts with a static name for artifacts that have a version number in their name. This would make automation much easier in many places.
For now this is best implemented at the task level, i.e. in mozharness.
Note, the type='package' syntax is nice. There have been discussions about artifact types that carry some semantics, but so far we haven't found a good solution.
Comment 10•10 years ago
If we could get the packaging code to stop embedding the version number in the filename, this problem would also go away.
Reporter
Comment 11•10 years ago
So it seems like this is pretty straightforward using the TaskCluster client (https://github.com/taskcluster/taskcluster-client.py).
Here's a gist which demonstrates the APIs: https://gist.github.com/jonallengriffin/6f21f9963771cb711303
The only limitations:
1 - we can't non-magically distinguish a build url from a tests.zip url, but that's no different than the current situation
2 - we can't get a list of revisions from most recent to older, as we could indirectly by scraping the FTP site; we'll probably have to use https://hg.mozilla.org/integration/mozilla-inbound/json-pushes or similar to get that
Julien, is that enough for mozregression's needs?
Flags: needinfo?(j.parkouss)
Assignee
Comment 12•10 years ago
To summarize, mozregression needs two ways of accessing build information:
1. by date (for nightlies)
2. by changesets (for inbound)
As far as I can tell from the gist you linked (https://gist.github.com/jonallengriffin/6f21f9963771cb711303#file-gistfile1-py-L37), there is a way to get a task by "revision" (I tried on mozilla-inbound; "revision" seems to be the first 12 characters of a changeset). As we already use json-pushes in this case to order builds, there is no problem here.
So point 2 should be good already.
About point 1: :jgriffin, if you are suggesting that we can use json-pushes to get the revisions between two dates (as suggested in bug 1133074 comment 9 point C), then I think it's OK in theory. This seems to query mozilla-central, for example (looking at http://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html#query-parameters):
https://hg.mozilla.org/mozilla-central/json-pushes?startdate=2015-02-01&enddate=2015-03-01
But I am not able to get a result for a revision found in the json result using:
index.findTask('buildbot.revisions.2ed663b8bc05.mozilla-central.linux')
I got an "Indexed task not found" error, so I suppose:
1. the url https://hg.mozilla.org/mozilla-central/json-pushes is not what I think
2. The syntax I use in index.findTask is wrong (but this is good for mozilla-inbound)
3. something else ?
So once this point is cleared up - that is, once we can get revisions between two dates for mozilla-central | mozilla-aurora | ..., and can use one of these revisions with taskcluster to get build info - I think it will be enough for mozregression's needs.
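The date-to-revision path described in this comment can be sketched roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, the sample response is made up (placeholder hashes) and merely mimics the json-pushes layout of push id mapped to a "changesets" list, and no network request is made.

```python
from urllib.parse import urlencode


def pushlog_url(repo_path, startdate, enddate):
    """Build a json-pushes query URL for pushes between two dates."""
    query = urlencode({"startdate": startdate, "enddate": enddate})
    return "https://hg.mozilla.org/%s/json-pushes?%s" % (repo_path, query)


def tip_changesets(pushes):
    """Return the last changeset of each push, ordered by ascending push id."""
    return [pushes[key]["changesets"][-1] for key in sorted(pushes, key=int)]


def revision_namespace(changeset, branch, platform):
    """Build an index namespace in the form tried in this comment."""
    return "buildbot.revisions.%s.%s.%s" % (changeset[:12], branch, platform)


# Made-up response with placeholder hashes (illustration only).
sample = {
    "101": {"changesets": ["a" * 40, "b" * 40], "date": 1422777600},
    "100": {"changesets": ["c" * 40], "date": 1422774000},
}

print(pushlog_url("mozilla-central", "2015-02-01", "2015-03-01"))
for changeset in tip_changesets(sample):
    print(revision_namespace(changeset, "mozilla-central", "linux"))
```

Each resulting namespace would then be passed to index.findTask; whether a task is actually indexed under it is exactly the open question above.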
Flags: needinfo?(j.parkouss)
Assignee
Comment 13•10 years ago
By the way, TaskCluster seems awesome! It will be really great for mozregression. :)
@wlach
Maybe we could start using it for inbound builds? This may be a good starting point for TaskCluster integration in mozregression - and would probably speed up regression hunting.
Reporter
Comment 14•10 years ago
> But I am not able to get a result for a revision found in the json result using:
>
> index.findTask('buildbot.revisions.2ed663b8bc05.mozilla-central.linux')
That's because 2ed663b8bc05 is from Feb 1, 2015, and S3 uploads were not turned on until somewhere around the end of March.
The fate of older data on the FTP server is still unclear; it may be moved to S3 as well, or elsewhere. For now it probably makes sense to adapt mozregression to use newer data, and we can figure out what to do about older data later. In the interim mozregression may need to poll TC for new builds but use the old FTP-scraping logic for older builds. :(
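The interim split could look something like the sketch below. The cutoff date is an assumption (the comment only says "around the end of March"), and build_source is a hypothetical helper name, not existing mozregression code.

```python
from datetime import date

# Assumed cutoff: S3 uploads are only said to have started "around the end
# of March" 2015, so the exact day here is a guess.
S3_CUTOFF = date(2015, 4, 1)


def build_source(build_date, cutoff=S3_CUTOFF):
    """Pick where to look for a build made on build_date."""
    return "taskcluster" if build_date >= cutoff else "ftp"


print(build_source(date(2015, 2, 1)))
print(build_source(date(2015, 5, 1)))
```

Keeping the cutoff as a parameter makes it easy to adjust once the real switch-over date, or the fate of the older FTP data, is known.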
Reporter
Comment 15•10 years ago
See also bug 1147107 as a reference.
Comment 16•10 years ago
> 2 - we can't get a list of revisions from most recent to older, as we could indirectly by scraping the FTP site; we'll
> probably have to use https://hg.mozilla.org/integration/mozilla-inbound/json-pushes or similar to get that
Julien already seems to have a solution; however, I wanted to share that for mozci we created a module which uses pushlog's API:
https://github.com/armenzg/mozilla_ci_tools/blob/master/mozci/sources/pushlog.py
http://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html
For reference.
Comment 17•10 years ago
With these taskcluster bugs, we should be able to just grab artifacts directly using the index parameters - something like:
wget http://.../artifacts/mozilla-central.linux64.nightly/firefox.tar.bz2
or:
wget http://.../artifacts/2015.04.27.linux64.nightly/firefox.tar.bz2
(just rough examples - the actual URLs are TBD :)
That should mean mozregression and other clients can grab artifacts without needing the taskcluster python client. The python client can be used today in order to try things out, but this should be even simpler in the near future.
Updated•10 years ago
Assignee
Comment 18•10 years ago
Ok, so after thinking more about this, and after some discussions on #taskcluster, #treeherder and #releng, in the end I can see two problems:
1. the syntax 'buildbot.revisions.2ed663b8bc05.mozilla-central.linux' is not universal. According to :pmoore we can't use it for b2g, and it won't work for firefox desktop and fennec soon (bug 1135206, bug 1118394). So we need to have a clean and universal way to find a build given a revision / branch / platform [/ app_name] (tracked by bug 1159700).
2. Dates. With json-pushes, we have the push dates. Previously we were crawling this kind of url:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015/01/2015-01-03-03-02-15-mozilla-central/
And we used the date in the url as reference. I asked on #releng, and that date in the url is the timestamp of the build (buildbot buildid property).
So basically we need to be able to get build information from a build date (so not using json-pushes here). This is tracked by bug 1133074.
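For reference, pulling the timestamp and branch out of such a nightly directory URL can be sketched like this. parse_nightly_dir is a hypothetical helper written for illustration, not the actual mozregression code, and it makes no claim about what the timestamp represents.

```python
import re
from datetime import datetime

# <YYYY-MM-DD-HH-MM-SS>-<branch>, e.g. 2015-01-03-03-02-15-mozilla-central
NIGHTLY_DIR = re.compile(r"^(\d{4}(?:-\d{2}){5})-(.+)$")


def parse_nightly_dir(url):
    """Return (timestamp, branch) parsed from a nightly directory URL."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    match = NIGHTLY_DIR.match(name)
    if match is None:
        raise ValueError("not a nightly build directory: %r" % url)
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
    return stamp, match.group(2)


url = (
    "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/"
    "2015/01/2015-01-03-03-02-15-mozilla-central/"
)
print(parse_nightly_dir(url))
```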
Assignee
Comment 19•10 years ago
Ok, so according to :catlee, my assumption in point 2 is wrong - the date is the date of the push (see bug 1133074 comment 17).
So it seems we could use the dates from json-pushes - in which case bug 1133074 is not a requirement here.
I'll give this a try when I get the time.
Comment 20•10 years ago
(In reply to Julien Pagès from comment #18)
> 1. the syntax 'buildbot.revisions.2ed663b8bc05.mozilla-central.linux' is not
> universal. According to :pmoore we can't use it for b2g, and it won't work
> for firefox desktop and fennec soon (bug 1135206, bug 1118394). So we need
> to have a clean and universal way to find a build given a revision / branch
> / platform [/ app_name] (tracked by bug 1159700).
By "not universal" you mean because it has "buildbot" in the name? catlee and I were just talking about that this morning - we should be able to just do a separate index without "buildbot" so that both taskcluster and buildbot jobs can share the same revisions namespace and such.
Comment 21•10 years ago
Will inbound-archive get some speed-love, too?
Today I hit a new low-score: 27KiB/s. :(
Downloading the same inbound-build (four weeks old) from download.cdn.mozilla.net = 1.5MiB/s
Assignee
Comment 22•9 years ago
So after some discussion on irc with :jgriffin, it appears that using archive.mozilla.org instead of ftp.mozilla.org is one of the possible long-term approaches (for now :)) to get builds the S3 way.
So this should be the only thing we need to change to use s3 with the nightlies. For info this is what was done for mozdownload, see Bug 1136822.
I gave this a try, and it seems to work well (for recent builds, and I also tried old builds from 2010).
Assignee
Comment 23•9 years ago
This just uses https://archive.mozilla.org instead of http://ftp.mozilla.org.
And we should now be s3 compatible with that change.
Note that it looks a lot faster for me (in Europe) than the old ftp. If that holds, this is awesome!
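In code terms, the change amounts to swapping the scheme and host of the base URL. A rough sketch, not the actual patch: to_archive_url is a hypothetical helper, and keeping the path unchanged is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def to_archive_url(url):
    """Rewrite an ftp.mozilla.org URL to https://archive.mozilla.org."""
    parts = urlsplit(url)
    if parts.netloc != "ftp.mozilla.org":
        return url  # leave unrelated URLs untouched
    return parts._replace(scheme="https", netloc="archive.mozilla.org").geturl()


old = "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015/01/"
print(to_archive_url(old))
```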
Comment 24•9 years ago
Comment on attachment 8640590
use builds on S3 for nightlies
Awesome! :)
Attachment #8640590 -
Flags: review?(wlachance) → review+
Assignee
Comment 25•9 years ago
Merged with https://github.com/mozilla/mozregression/commit/577940fe80b51bf47f8ce2df2010cbaca276e26a.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 26•9 years ago
archive.mozilla.org is only a short-term solution for Firefox. We're planning on migrating away from it in Q4/Q1.
Assignee
Comment 27•9 years ago
ok, thanks :catlee for the comment.
So for Firefox and Firefox OS (mozregression does not support Firefox OS for now, but support will probably be added), this is not the long-term approach. We will have to use taskcluster indexes - and probably keep archive.m.o for old builds.
Reporter
Comment 28•9 years ago
I'm going to call this done; we can open a new bug for any follow-up work that may be needed in Q4/Q1.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED