Closed Bug 1344936 Opened 8 years ago Closed 7 years ago

archive.m.o no longer has tinderbox builds for linux jobs

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: erahm, Unassigned)

Details

As of January 18th there are no longer entries for Linux tinderbox builds on https://archive.mozilla.org. The assumption is that we turned off buildbot builds and taskcluster builds don't have entries in archive.m.o.

This has broken our memory regression tracking tool, AreWeSlimYet. It has also broken mozdownload (which AreWeSlimYet uses, so it's doubly broken). In theory, it has broken anything that relies on mozdownload for tinderbox builds.

It would be great if:
  a) We could store taskcluster builds on archive.m.o
  b) Backfill all missing logs

There is further discussion on dev-platform: https://groups.google.com/d/msg/mozilla.dev.platform/r1J4b496XiM/42RpNfPnAwAJ
It looks like bug 1330680 / bug 1253312 / others in that dep tree are where the builds were switched off in the buildbot-configs.
:bz also brought up concerns that our taskcluster build storage doesn't necessarily have the same lifetime guarantees as our archive.m.o storage did.
Chris, can you please help clarify whether that was an intended step or just a side effect of turning off buildbot jobs? Thanks.
Flags: needinfo?(catlee)
Maybe someone else from Releng can help get at least a discussion started. CC'ing Kim and Alin, who both worked on the patches for disabling builds on Buildbot.
Flags: needinfo?(kmoir)
Flags: needinfo?(aselagea)
So the artifacts for each build can be seen in the Taskcluster artifacts index. For instance, this is the latest Firefox linux-opt build on m-i:

https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-inbound.latest.firefox/gecko.v2.mozilla-inbound.latest.firefox.linux-opt

You can also look by revision

https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-inbound.revision.0009a43ad49a316c86cdb52bcb44006b32011170.firefox/gecko.v2.mozilla-inbound.revision.0009a43ad49a316c86cdb52bcb44006b32011170.firefox.linux64-opt

or pushdate

https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-inbound/gecko.v2.mozilla-inbound
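For anyone scripting lookups against the index instead of browsing it, the namespaces in the links above follow a visible pattern: `gecko.v2.<branch>.latest.firefox.<platform>` and `gecko.v2.<branch>.revision.<rev>.firefox.<platform>`. The sketch below only rebuilds those strings and the browser links; the helper names are hypothetical, and the pattern is inferred from these two examples rather than from any documented spec (the pushdate link, for example, does not follow the `parent/leaf` form).

```python
# Hypothetical helpers: rebuild Taskcluster index namespaces using the
# gecko.v2.<branch>.<kind>... pattern visible in the links in this comment.


def latest_namespace(branch: str, platform: str, product: str = "firefox") -> str:
    """Namespace for the most recent build of `platform` on `branch`."""
    return f"gecko.v2.{branch}.latest.{product}.{platform}"


def revision_namespace(branch: str, revision: str, platform: str,
                       product: str = "firefox") -> str:
    """Namespace for the build of one specific changeset on `branch`."""
    return f"gecko.v2.{branch}.revision.{revision}.{product}.{platform}"


def browse_url(namespace: str) -> str:
    """tools.taskcluster.net index-browser link for a leaf namespace.

    The quoted links put the parent namespace before the slash and the
    full leaf namespace after it; this helper reproduces that form only.
    """
    parent = namespace.rsplit(".", 1)[0]
    return f"https://tools.taskcluster.net/index/artifacts/#{parent}/{namespace}"
```

For example, `browse_url(latest_namespace("mozilla-inbound", "linux-opt"))` reproduces the first link in this comment.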

As for the Taskcluster artifact retention policy, it appears to be 28 days for tasks on try and 1 year for other artifacts:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#819

(see bug 1304180)
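To make the practical effect of these numbers concrete for a regression-hunting tool, here is a rough sketch that estimates when a build's artifacts would expire. It assumes only the two periods stated in this bug (28 days for try, 1 year otherwise), models "1 year" as 365 days, and ignores release artifacts, which live in a separate bucket; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Retention periods as described in this bug. Treating "1 year" as
# exactly 365 days is an assumption made for this sketch.
TRY_RETENTION = timedelta(days=28)
DEFAULT_RETENTION = timedelta(days=365)


def artifact_expiry(branch: str, created: datetime) -> datetime:
    """Approximate date after which an artifact from `branch` may be gone."""
    retention = TRY_RETENTION if branch == "try" else DEFAULT_RETENTION
    return created + retention


def still_available(branch: str, created: datetime, now: datetime) -> bool:
    """True if, under the stated policy, the artifact should still exist."""
    return now < artifact_expiry(branch, created)
```

Under this model, an inbound build from January 18th, 2017 would remain fetchable until roughly January 18th, 2018, while a try build from the same day would be gone by mid-February 2017.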

Release artifacts are moved to another bucket with a different retention policy.

I'm not sure if there are plans to move the artifacts to archive.m.o; let me find out.
Flags: needinfo?(kmoir)
Actually, I should have asked in the earlier comment, what are your retention requirements for the artifacts for AreWeSlimYet and mozdownload?

As Callek mentioned in the dev-platform thread, the nightly builds from Taskcluster are copied here, but not the per-commit ones:

http://archive.mozilla.org/pub/firefox/nightly/
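Because the nightlies that are copied here are organized by date rather than by revision, a scraper like the one AWSY uses ends up keying builds on timestamps. The sketch below assumes a `<year>/<month>/<YYYY-MM-DD-HH-MM-SS>-<branch>/` directory layout under the nightly path; that layout is an assumption for illustration, not something stated in this bug, and both helper names are hypothetical.

```python
import re
from datetime import datetime

ARCHIVE_NIGHTLY = "https://archive.mozilla.org/pub/firefox/nightly/"

# ASSUMED layout: <year>/<month>/<YYYY-MM-DD-HH-MM-SS>-<branch>/
_DIR_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(.+?)/?$")


def month_url(year: int, month: int) -> str:
    """Directory listing assumed to hold one month's nightly builds."""
    return f"{ARCHIVE_NIGHTLY}{year:04d}/{month:02d}/"


def parse_build_dir(name: str):
    """Split a timestamped directory name into (timestamp, branch).

    Returns None for entries that do not match the assumed pattern,
    such as "latest-*" directories.
    """
    m = _DIR_RE.match(name)
    if m is None:
        return None
    *parts, branch = m.groups()
    return datetime(*map(int, parts)), branch
```

Note that nothing in the directory name identifies the changeset, which is one reason moving such a tool to the revision-keyed Taskcluster index is not a drop-in change.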
Basically for regression testing it would always be good to be able to go back as far as possible. We had a lot of situations in the past when older nightly builds were not enough and tinderbox builds were necessary.

So the retention policy of Taskcluster artifacts would be the limiting factor here if nothing is mirrored.
So would it be possible to change the URL that the AreWeSlimYet and mozdownload projects use to the Taskcluster index? Going forward, we'll be moving more platforms to Taskcluster.
(In reply to Kim Moir [:kmoir] from comment #8)
> So would it be possible to change the URL that the AreWeSlimYet and
> mozdownload projects use to the Taskcluster index?

tl;dr - For AWSY, no.

AWSY currently scrapes archive.m.o for new builds and uses timestamps rather than revisions to identify builds throughout (it's old; I didn't make this choice). Upgrading that would be a big task, and we don't have anyone to do it. If we had had some warning, we probably could have planned for this.

The second issue is that AWSY also depends on mozdownload which would need to be updated. I can't speak to how quickly that can be done, but again I believe there aren't resources dedicated to it this quarter. Henrik can probably fill in here.

> Going forward, we'll be moving more platforms to Taskcluster.

This is understood, and I'm fine with it; we just need a grace period. We're working on getting AWSY running in Taskcluster (rather than as a standalone server), which would make this a non-issue in the next 4 weeks or so. Given this is already in progress, reworking the current AWSY to be version- and Taskcluster-based is probably a non-starter.

The issue right now is we have no memory regression tracking in the interim (since Jan 18th, continuing with all new builds), and that's problematic. We need to backfill this and pretty much the only way to do that is with builds on archive.m.o.
(In reply to Henrik Skupin (:whimboo) from comment #3)
> Chris, can you please help clarify whether that was an intended step or just a
> side effect of turning off buildbot jobs? Thanks.

Yes, this was intended. The Taskcluster index is a much more scalable and flexible way of accessing this data. I'm sorry that you were caught by surprise here.

How is AWSY affected, aside from mozdownload? Is your code available somewhere we can look at?

(In reply to Henrik Skupin (:whimboo) from comment #7)
> Basically for regression testing it would always be good to be able to go
> back as far as possible. We had a lot of situations in the past when older
> nightly builds were not enough and tinderbox builds were necessary.
> 
> So the retention policy of Taskcluster artifacts would be the limiting
> factor here if nothing is mirrored.

The retention policy for Taskcluster artifacts is 1 year for all branches except try, which is 28 days. This is much longer retention than we had for tinderbox builds.

Nightly builds are also kept for 1 year on Taskcluster, which is part of the reason why we copy them to archive.m.o so we can keep them forever.
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #10)
> Is your code available somewhere we can look at?

I'm presuming this is it:
https://github.com/mozilla/areweslimyet/blob/master/benchtester/BuildGetter.py
Flags: needinfo?(aselagea)
ni :catlee for AWSY code links.
Flags: needinfo?(catlee)
You're right, it's going to be hard to refactor this code to get away from tinderbox-builds.

I hear there are efforts underway to run AWSY under Taskcluster. How far away is that from happening?
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #14)
> You're right, it's going to be hard to refactor this code to get away from
> tinderbox-builds.
> 
> I hear there are efforts underway to run AWSY under Taskcluster. How far
> away is that from happening?

We have an initial landing; it'll take a few weeks to get enough data to decide whether running in AWS produces stable results. Unfortunately, that still leaves us with a 2-month gap even if everything goes well.
Closing this since I believe AWSY is running as part of regular automation and submitting to perfherder.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Component: General Automation → General