clean up b2g builds on pvtbuilds2

Status: RESOLVED FIXED
Product: Infrastructure & Operations
Component: RelOps
Opened: 5 years ago
Closed: 5 years ago

People

(Reporter: catlee, Assigned: digi)

(Reporter)

Description

5 years ago
we'll need to create a regular cleanup job for these builds
(Reporter)

Comment 1

5 years ago
some data points:

each panda build is ~80MB
we currently have 85GB free
estimate 100 builds a day => 8G per day
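(That works out as 100 builds/day × ~80MB ≈ 8GB/day, so the 85GB currently free is only about 10 days of headroom with no cleanup.)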

we're just getting going here, and there aren't any consumers of these builds yet, so I think we can have pretty short retention.

let's say keep dep builds for 48 hours, and try builds for 24 hours.
(Reporter)

Comment 2

5 years ago
how's this?

# as b2gbld
@hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +2 | xargs rm -rf

# as b2gtry
@hourly find /pub/mozilla.org/b2g/try-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +1 | xargs rm -rf
(Reporter)

Comment 3

5 years ago
jhopkins suggested adding -print0 | xargs -0 for protection against spaces in file/directory names

# as b2gbld
@hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +2 -print0 | xargs -0 rm -rf

# as b2gtry
@hourly find /pub/mozilla.org/b2g/try-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +1 -print0 | xargs -0 rm -rf
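
A further hardening worth considering: GNU xargs supports -r (--no-run-if-empty), which skips running rm altogether when find matches nothing. A sketch of the tinderbox line with that added (note -r is a GNU-only flag):

# as b2gbld
@hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +2 -print0 | xargs -0 -r rm -rf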
(Reporter)

Comment 4

5 years ago
relops - please set up cron jobs as per comment #3
Assignee: catlee → server-ops-releng
Component: Release Engineering: Automation (General) → Server Operations: RelEng
QA Contact: catlee → arich
Assignee: server-ops-releng → dustin
Comment 5

5 years ago
Added.  I'll monitor and see how they work.

Comment 6

5 years ago
Seems fine.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment 7

5 years ago
This has filled up again; the vast majority of the space is in /mnt/pvt_builds/pub/mozilla.org/b2g/tinderbox-builds.

:gcox has added some extra space to the NFS mount to avoid bothering anyone late at night, but would like to know if this can be temporary and rolled back.

Checking the cleanup script, it's not finding anything old enough to delete. It seems to be running OK; there's just a lot of stuff there from the last couple of days.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 8

5 years ago
> :gcox [...] would like to know if this can be temporary and rolled back.

... or, if there's not a cleanup, "use your knowledge of the volume to give me a better estimate of what the disk needs are going to be, please and thank you."  I just took a SWAG when I added space (was 100g, bumped to 125g for the overnight).
Comment 9

5 years ago
(In reply to Chris AtLee [:catlee] from comment #3)
> jhopkins suggested adding -print0 | xargs -0 for protection against spaces
> in file/directory names
> 
> # as b2gbld
> @hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2
> -type d -mtime +2 -print0 | xargs -0 rm -rf

This one is working, but each panda job uploads about 145MB of data, and each otoro/unagi about 50MB, so the mozilla-inbound-* directories are huge at 24GB and 11GB/11GB respectively.

catlee, do we need to upload 26MB of crash symbols and 15MB of b2g-20.0a1.en-US.android-arm.tar.gz for each depend build for unagi & otoro? Panda has those as well as the img.bz2's, but I guess we need those for the tests?

> # as b2gtry
> @hourly find /pub/mozilla.org/b2g/try-builds/ -mindepth 2 -maxdepth 2 -type
> d -mtime +1 -print0 | xargs -0 rm -rf

This one is broken: it should be 1 instead of 2 on the min/maxdepth args (I've done this manually).

Having said that, all the old dirs were empty, so the 24GB of space consumed by try-builds/ is very recent. We need to fix this to prevent the space growing unbounded.
Comment 10

5 years ago
Fixed the min/max depth on the try builds:

pir@wedge> svn ci puppet/trunk/manifests/nodes/fuzzing.pp -m "fix as per bug 794566"
Sending        puppet/trunk/manifests/nodes/fuzzing.pp
Transmitting file data .
Committed revision 54003.
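
For the record, the corrected try line should presumably now read (a sketch, assuming only the depth args changed, per comment 9):

# as b2gtry
@hourly find /pub/mozilla.org/b2g/try-builds/ -mindepth 1 -maxdepth 1 -type d -mtime +1 -print0 | xargs -0 rm -rf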
(Reporter)

Comment 11

5 years ago
(In reply to Nick Thomas [:nthomas] from comment #9)
> (In reply to Chris AtLee [:catlee] from comment #3)
> > jhopkins suggested adding -print0 | xargs -0 for protection against spaces
> > in file/directory names
> > 
> > # as b2gbld
> > @hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2
> > -type d -mtime +2 -print0 | xargs -0 rm -rf
> 
> This one is working, but each panda job uploads about 145MB of data, and
> each otoro/unagi about 50MB, so the mozilla-inbound-* directories are huge
> at 24GB and 11GB/11GB respectively. 
> 
> catlee, do we need to upload 26MB of crash symbols and 15MB of
> b2g-20.0a1.en-US.android-arm.tar.gz for each depend build for unagi & otoro?
> Panda has those as well as the img.bz2's, but I guess we need those for the
> tests?

Probably not. I bet we don't need to upload anything for unagi/otoro dep builds except the logs.
Comment 12

5 years ago
This is slightly more urgent than earlier appreciated, since it just alerted again after using up the extra assigned space.

[root@pvtbuilds2.dmz.scl3 tinderbox-builds]# pwd
/mnt/pvt_builds/pub/mozilla.org/b2g/tinderbox-builds
[root@pvtbuilds2.dmz.scl3 tinderbox-builds]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
10.22.74.11:/vol/pvtbuilds
                      125G  112G   14G  90% /mnt/pvt_builds
[root@pvtbuilds2.dmz.scl3 tinderbox-builds]# du -sh .
83G     .

If we could get the uploads changed today, that'd be great (and remove any unneeded files from the mount).
If there's anything I can do to help with that, let me know.

Comment 13

5 years ago
Boosted again to 150g.  :(
Comment 14

5 years ago
Alerted again for space:

< nagios-scl3> | Fri 13:38:31 PST [585] pvtbuilds2.dmz.scl3.mozilla.com:Disk - /mnt/pvt_builds is CRITICAL: DISK CRITICAL - free space: /mnt/pvt_builds 9198MB (5% inode=99%)

nthomas is trying to clean up.
Comment 15

5 years ago
Temporary fix:
 cd /mnt/pvt_builds/pub/mozilla.org/b2g/tinderbox-builds/
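 # the b2g tarballs and zips aren't needed for dep builds (see bug 819543)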
 rm -v *-{otoro,unagi}/*/b2g*tar.gz
 rm -v *-{otoro,unagi}/*/*.zip
Leaving 58GB free (38%). Bug 819543 is filed for RelEng to stop uploading the bits that aren't needed.

Even with that, the basic problem here is that this work is ramping up, and we're going to need more space over the long term. Any estimate I'd make is likely woefully wrong given the way that b2g is scaling, but maybe half a terabyte is a good place to start. How much free capacity do we have on hand?
Depends on: 819543
Comment 16

5 years ago
:gcox - how much space is available here?
Flags: needinfo?(gcox)

Comment 17

5 years ago
There's a lot of space, but the danger of adding space is that people often don't turn a judicious eye towards the usage until their backs are against the wall, and by then it's a mess to go through. The space that has been reclaimed here (e.g. the unneeded uploads fixed in bug 819543) is easier to find while usage is small, and as such we've gone from 112g used to 77g.

Which is to say, we could boost to 500g, but we're already looking better, so my inclination is not to throw gobs of space at something just because we have it to throw, before it's needed. I guess my question would be: ignoring that the volume of b2g builds will ramp, are we now keeping the right things for the builds we have?
Flags: needinfo?(gcox)
Comment 18

5 years ago
Brian, are we having space problems on this volume anymore?  If not, R/F :)
Assignee: dustin → bhourigan
Comment 19

5 years ago
Per the B2G meeting on Tuesday, people were uncomfortable with keeping 14 or fewer days' worth of builds for regression hunting. People would (ideally) like to have 30 days' worth of builds, which (IIRC) is consistent with how long we keep Firefox desktop builds.

Given that, can we:
1) Set up enough space for 30 days' worth of builds (from comment #1's ~8GB/day and back-of-envelope math, I'd swag that at needing 8GB/day × 30 days = 240GB, so maybe 250-300GB to give some breathing room)

2) Set up deletion/retention cron jobs to age off builds over 30 days old.

3) Once these are in place, post a policy note to the newsgroups.

(It's late, so sanity check: did I miss anything?)
Comment 20

5 years ago
(In reply to John O'Duinn [:joduinn] from comment #19)
> Per the B2G meeting on Tuesday, people were uncomfortable with keeping
> 14 or fewer days' worth of builds for regression hunting. People would
> (ideally) like to have 30 days' worth of builds, which (IIRC) is consistent
> with how long we keep Firefox desktop builds.
> 
> Given that, can we:
> 1) Set up enough space for 30 days' worth of builds (from comment #1 and
> back-of-envelope math, I'd swag that at needing 240GB, so maybe 250-300GB
> to give some breathing room)

In the meantime the share has been grown to 500G. Currently 216G of that is used, of which about 6G is fuzzing and the rest b2g. There may be some files in pvt/ that are not getting cleaned up by existing cron jobs.

<BLINK>We are currently not uploading any dep builds for otoro and unagi, because of bug 819543, spawned out of this bug.</BLINK> We do upload all panda builds, and keep them for a couple of days.

If we could clarify exactly which builds people would like to keep for 30 days, that'd be helpful. The 30 days for Firefox isn't quite the same, because those bits are getting tested, whereas unagi/otoro are not.

> 2) Set up deletion/retention cron jobs to age off builds over 30 days old.

Existing cron jobs (see the early comments in this bug) would need the -mtime arg changed, possibly modified by what it's desirable to keep.
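
A sketch of what that might look like, assuming only the -mtime value changes from the comment 3 version:

# as b2gbld, hypothetical 30-day retention variant
@hourly find /pub/mozilla.org/b2g/tinderbox-builds/ -mindepth 2 -maxdepth 2 -type d -mtime +30 -print0 | xargs -0 rm -rf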

> 3) Once these are in place, post a policy note to the newsgroups.

Perhaps you mean a b2g mailing list?
(Assignee)

Comment 21

5 years ago
I migrated the crons from pvtbuilds2 onto the product delivery cluster, changing the retention period to 30 days.

Are otoro builds subject to the same policy?
Comment 22

5 years ago
(In reply to Brian Hourigan [:digi] from comment #21)
> I migrated the crons from pvtbuilds2 onto the product delivery cluster,
> changing the retention period to 30 days.

Thanks.

> Are otoro builds subject to the same policy?
Yes.
(Assignee)

Comment 23

5 years ago
Thanks - I've applied the same retention policy to the Otoro builds.
Comment 24

5 years ago
(In reply to John O'Duinn [:joduinn] from comment #22)
> (In reply to Brian Hourigan [:digi] from comment #21)
> > I migrated the crons from pvtbuilds2 onto the product delivery cluster,
> > changing the retention period to 30 days.
> 
> Thanks.
> 
> > Are otoro builds subject to the same policy?
> Yes.

(In reply to Brian Hourigan [:digi] from comment #23)
> Thanks - I've applied the same retention policy to the Otoro builds.

Thanks :digi.


Anything left to do here, or can we close?
(Assignee)

Comment 25

5 years ago
We're only at 43% on the pvtbuilds share with the retention policy in effect; I think the scope of this bug has been satisfied. Closing.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment 26

5 years ago
(In reply to Brian Hourigan [:digi] from comment #25)
> We're only at 43% on the pvtbuilds share with the retention policy in
> effect; I think the scope of this bug has been satisfied. Closing.

Thanks :digi!
Comment 27

5 years ago
Fixed a missing space in the cron for /pvt/b2g*/tinderbox-builds, but there was still 350G (70%) free anyway.

Sending        manifests/pvtbuilds_cron.pp
Transmitting file data .
Committed revision 58282.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations