increase space on ftp.m.o so we can save tinderbox builds for ~1 month

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
RESOLVED FIXED
7 years ago
3 years ago

People

(Reporter: Joe Drew (not getting mail), Assigned: mrz)

Tracking

Bug Flags:
needs-treeclosure ?

Details

(Reporter)

Description

7 years ago
Right now we save tinderbox builds for something like 2 days, which is just not enough when trying to find regressions. We should buy enough disk to make it possible for us to save tinderbox builds for a lot longer.

This doesn't have to be fast disk, and it doesn't have to be redundant or backed up. It just needs to be faster to download than to build.

(The graphics team was bitten hard by this when trying to bisect a WebGL regression. It happened sometime in the last 3 days, but we don't save enough tinderbox builds to bisect.)
See also bug 463034, where I asked for this 2 years ago.
http://hourly-archive.localgho.st/hourly-archive2/ is an alternative.
FWIW, the current 24 hour expiry policy has us using ~60G in firefox/tinderbox-builds at the moment. That'll fluctuate depending on how many changes land each day.

We can trim that down if we only want to make the Firefox binaries available, since the tests and symbols archive are what's taking up most of the space. Depends if the need is to be able to re-run test suites multiple days later or if you just want to be able to pull the builds and check them.
(In reply to comment #2)
> We can trim that down if we only want to make the Firefox binaries available,
> since the tests and symbols archive are what's taking up most of the space.
> Depends if the need is to be able to re-run test suites multiple days later or
> if you just want to be able to pull the builds and check them.
Talos runs would be useful, which means we at least need the symbols.  Having only 24 hours to ask for more talos runs makes it difficult sometimes.
(In reply to comment #2)
> FWIW, the current 24 hour expiry policy has us using ~60G in
> firefox/tinderbox-builds at the moment. That'll fluctuate depending on how many
> changes land each day.

At 60G per day it seems reasonable to be able to store at least 2 weeks worth of builds.

Updated

7 years ago
Assignee: server-ops → mrz
(In reply to comment #4)
> (In reply to comment #2)
> > FWIW, the current 24 hour expiry policy has us using ~60G in
> > firefox/tinderbox-builds at the moment. That'll fluctuate depending on how many
> > changes land each day.
> 
> At 60G per day it seems reasonable to be able to store at least 2 weeks worth
> of builds.

Let's increase this by 2TB.

1) We're now posting logs for builds and tests to ftp.m.o, alongside the builds.
2) As agreed in the Tinderbox meeting yesterday, we'd like to keep the builds and their logs on ftp.m.o for ~1 month. (This is a reduction from our current ~60 day retention of *logs* on the tinderbox server, but an increase from our current retention of *builds* on ftp.m.o. It doesn't make much sense to keep logs without builds, so ~30 days is a good compromise. 60GB x 30 = 1.8TB, so rounding this up to 2.0TB.)
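For reference, the sizing arithmetic above can be checked with a small sketch. It assumes the ~60G/day figure from comment #2 holds steady (that comment notes it fluctuates) and uses decimal units (1 TB = 1000 GB); the function name is made up for illustration.

```python
def retention_storage_tb(daily_gb: float, days: int) -> float:
    """Estimate space needed, in decimal terabytes, to keep `days` of builds.

    Assumes a constant daily volume; real usage varies with how many
    changes land each day.
    """
    return daily_gb * days / 1000


# ~60 GB/day kept for 30 days, as proposed in this comment
print(retention_storage_tb(60, 30))  # 1.8
# ~60 GB/day kept for 2 weeks, as suggested in comment #4
print(retention_storage_tb(60, 14))  # 0.84
```

Rounding 1.8 TB up to 2 TB leaves roughly 10% headroom for busier days.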
Severity: minor → normal
Summary: buy enough disk to make it possible to save tinderbox builds for at least 1 week → increase space on ftp.m.o so we can save tinderbox builds for ~1 month
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 614786
This change will consume much of the free space that bug 614786 will create so don't be surprised if you get a request to expand the space on the Netapp #1. Duping kinda obscures that this is asking for more resources.
(In reply to comment #6)
> 
> *** This bug has been marked as a duplicate of bug 614786 ***

This is not a DUP; it's a second request for 2TB of additional space on ftp.m.o. This is to support storing builds and logs for a month. Unrelated to bug#614786.
Status: RESOLVED → REOPENED
Flags: needs-treeclosure?
Resolution: DUPLICATE → ---
Bug 614786 is supplying you with an additional 1.8 TB.  So you need 2 TB more on top of that?
OK, that question was answered on IRC: yes.

09:16:40 < joduinn> 2TB is for keeping trybuilds for longer
09:17:00 < joduinn> other 2TB is for keeping incremental builds for longer
Assuming there's space on the netapp shelves, we could in theory expand the space on the existing partitions once the move is done.
summary after talking with aravind, aki, bhearsum:

1) aravind will add 2TB of space as "firefox/tinderbox-builds". 

2) This diskspace is RAID, but does not have HA heads. 

3) Note: this space can disappear if the head fails. If this happens:
* all ~30 days' worth of builds will be unavailable until the head is replaced. Estimate 1 day. No builds will actually be lost, and they will reappear as soon as the replacement head is online.
* This will *not* close the tree, because the existing high-availability mountpoint will automatically become visible, so new build-on-checkin builds can still post correctly, and still be copied down for testing.
* Bringing the new head back up will require a brief tree closure, while we move over the files generated during the outage.
10.253.0.139:/data/tinderbox-builds is ready for use.
The new mount point and the bind mounts are in place, the old tree is being rsynced into the new one.  We can call this done.  I will delete the old tree once the rsync is done.
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(In reply to comment #14)
> The new mount point and the bind mounts are in place, the old tree is being
> rsynced into the new one.  We can call this done.  I will delete the old tree
> once the rsync is done.

That rsync work is being done in bug#614786
Blocks: 614786
Slight change in *how* we implement this, after IRC discussion with aki, aravind, bhearsum, justdave, joduinn. Posting here, even though the bug is already closed, to keep all interested parties in the loop!

1) aki's point about not filling the HA space just because we have it is valid. It's also shared with release builds, nightlies, etc, and it would be nice not to keep playing "find more space" games.
2) we did promise to keep builds and logs for longer than we currently do, as part of the stop-using-tinderbox-server project.
3) at the time, we said 30 days (arbitrary) - based on the current setup of tinderbox builds kept for 1 day, tinderbox build logs kept for 60 days
4) I'm proposing that we keep 30 days, and that we now do it as 14 days on HA and 16 on nonHA. This means that if the nonHA disk fails, it will not close the tree.
5) aki proposed something about using softlinks to avoid breaking links in bugs, blogs, etc. If that is something aki and justdave can get in place, that would be great.
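The 14/16 split proposed in point 4 could be sketched roughly as below. This is only an illustration of the age-based rule; the tier labels and function name are made up, and nothing here reflects how the storage is actually laid out.

```python
HA_DAYS = 14      # days a build stays on the high-availability storage
TOTAL_DAYS = 30   # total retention; the remaining 16 days are on non-HA


def tier_for_age(age_days: int) -> str:
    """Return which (hypothetical) tier a build of the given age belongs to."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < HA_DAYS:
        return "ha"
    if age_days < TOTAL_DAYS:
        return "non-ha"
    return "expired"


for age in (0, 13, 14, 29, 30):
    print(age, tier_for_age(age))
```

With this rule, a non-HA failure only hides builds older than 14 days, so fresh build-on-checkin builds stay reachable.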
I've adjusted the cron job that cleans up firefox/tinderbox-builds to keep builds for 30 days. [surf:/etc/cron.d/cleanup-hourly-builds]
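The contents of that cron job are not shown in this bug. As a rough sketch of what a 30-day cleanup pass might look like, the function below only lists candidate directories (it deletes nothing) and assumes one directory per build directly under the root, aged by modification time; both the layout and the names are assumptions.

```python
import time
from pathlib import Path

RETENTION_DAYS = 30
SECONDS_PER_DAY = 86400


def expired_build_dirs(root, retention_days=RETENTION_DAYS, now=None):
    """List immediate subdirectories of `root` last modified before the cutoff.

    Dry run only: the caller decides whether to remove anything.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.stat().st_mtime < cutoff
    )
```

Returning the list rather than deleting in place makes it easy to log or review what would be removed before the retention window is changed.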
(In reply to comment #16)
> 4) I'm proposing that we keep 30 days, and we now do it as 14days on HA, and 16
> on nonHA. This means that if the nonHA disk fails, it will not close the tree.
> 5) aki proposed something about using softlinks to avoid breaking links in
> bugs, blogs, etc. If that is something aki and justdave can get in place, that
> would be great.

joduinn, this isn't the case right now. Please follow up if you feel strongly about it.

Comment 19

7 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=614786#c50

Comment 20

7 years ago
I think it is important for links to tryserver builds to stay consistent so that we can paste them in bugs and have them work for the full time period. Do I need to file a new bug about that?
(In reply to comment #20)
There's already one on file - bug 615963.
Product: mozilla.org → mozilla.org Graveyard