Closed
Bug 715840
Opened 12 years ago
Closed 12 years ago
Disk space issues on surf (stage)
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: nthomas)
References
Details
Attachments
(1 file)
7.41 KB,
text/plain
eg [72] surf:disk - /mnt/netapp/stage/archive.mozilla.org/pub/firefox is WARNING: DISK WARNING - free space: /mnt/netapp/stage/archive.mozilla.org/pub/firefox 180264 MB (4% inode=97%): surf:disk - /mnt/netapp/stage/archive.mozilla.org/pub/firefox is WARNING: DISK WARNING - free space: /mnt/netapp/stage/archive.mozilla.org/pub/firefox 180264 MB (4% inode=97%):
Assignee | ||
Comment 1•12 years ago
tl;dr
* we've seen more data than usual arrive on the disks over the last 3 x 24 hours; this is presumably a post-holiday rush
* the rapid release process (particularly doing a beta a week) means we consume disk space more rapidly than we did
* there are some short-term mitigations we can do, but we need to re-examine our retention policies and ask about more space until the scl3 solution arrives
Assignee | ||
Comment 2•12 years ago
Mitigation 1: push bits in mobile/tinderbox-builds/ older than 4 days (i.e. android) onto cm-ixstore. This matches what we do for Firefox. Bug 715706 is setting up the mount points, and I have an initial transfer going, with a cron job to follow.

Strangely, the free space on /mnt/netapp/stage isn't going up, e.g.

         netapp free   cm-ixstore01 used
before   164G          69G
now      162G          112G
Depends on: 715706
Assignee | ||
Comment 3•12 years ago
/mnt/netapp/stage:
------------------
aka the non-firefox partition, aka everything else

I don't understand the lack of improvement in netapp free space, but it's late so perhaps I'm getting confused (I certainly can't spell, and the mounts are tangled with symlinks).

Populating mobile/tinderbox-builds/old/ should save us about 380G on this partition once the transfer finishes (running as ffxbld@stage). It will need bug 715706 to be finished by dustin/IT, specifically setting up the mounts on ftp.m.o, so that the files in old/ are visible everywhere.

The emergency response to very low disk is to remove /mnt/netapp/stage/releases.mozilla.org, which will free about 155G. Prior to that you'd need to ask IT to disable the line
  */15 * * * * root /root/bin/sync-stage1-releases
in stage:/etc/crond/ftp-staging-rw-server so that it doesn't get put back again. AFAIK that is only used for populating /pub/mozilla.org/zz/rsyncd-motd and the nagios checks on surf for the module sizes, but check with justdave if you can raise him.

/mnt/netapp/stage/archive.mozilla.org/pub/firefox:
--------------------------------------------------
aka the firefox partition

I've got a move of firefox/nightly/{8.0,8.0.1}-candidates to ~cltbld/old-candidate-dirs/firefox running, which is 95G more space. That might cause problems if we need to do an 8.0 -> 9.0.x partial.

The emergency response to low disk is to modify stage:~ffxbld/bin/cleanup_tinderbox-builds.sh to change the |-mtime +3| to |-mtime +2| or +1 (3 or 2 days). This will shift files to cm-ixstore01 (non-HA!) earlier than we do now. Note: there is no locking around the cron job, so it would be worth disabling the cron (ffxbld@stage) and running the command manually for the first call after that change.

Trending:
---------
See stage:~nthomas/stage_data.log. The format is
  <date> <free MB on /mnt/netapp/stage> <free MB on firefox partition>
Graph at http://people.mozilla.com/~nthomas/trend-recent.png
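For anyone wanting to turn the trend log into a rough "days until full" number, here is a minimal Python sketch. It assumes the <date> field is an ISO date (YYYY-MM-DD) and that fields are whitespace-separated; the actual date format in stage_data.log is not stated in this bug, and the sample lines below are made up for illustration, not real data. The projection is a plain straight line between the first and last samples, nothing smarter.

```python
from datetime import date

def parse_line(line):
    """Split one log line into (date, free MB on stage, free MB on firefox)."""
    day, stage_mb, firefox_mb = line.split()
    # Assumption: ISO dates. Adjust if the real log uses another format.
    return date.fromisoformat(day), int(stage_mb), int(firefox_mb)

def days_until_full(samples):
    """Project days until free space hits zero from the first and last sample.

    samples is a list of (date, free_mb) pairs in date order.
    Returns None when free space is not shrinking.
    """
    (d0, f0), (d1, f1) = samples[0], samples[-1]
    elapsed = (d1 - d0).days
    if elapsed <= 0 or f1 >= f0:
        return None
    mb_per_day = (f0 - f1) / elapsed
    return f1 / mb_per_day

# Hypothetical sample lines (not taken from the real log).
lines = [
    "2012-01-01 200000 300000",
    "2012-01-03 190000 240000",
    "2012-01-05 180000 180000",
]
rows = [parse_line(line) for line in lines]
stage = [(d, s) for d, s, _ in rows]
firefox = [(d, f) for d, _, f in rows]
print("stage partition:", days_until_full(stage))
print("firefox partition:", days_until_full(firefox))
```

A straight-line fit is crude given the weekly build cycle, so treat the result as an order-of-magnitude hint only.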
Comment 4•12 years ago
Here's a du -hsx of the subdirs of /mnt/netapp/stage/archive.mozilla.org/pub:

15G     addons
166G    calendar
97G     camino
648K    cck
54M     chimera
2.8G    data
93M     directory
3.3G    diskimages
2.5M    extensions
5.6G    firebird
3.5M    grendel
9.4G    js
33M     l10n-kits
455M    labs
2.8G    minimo
1011G   mobile
128G    mozilla
848K    msgsdk
303M    nspr
620K    OJI
382M    phoenix
20M     profiles
1.1T    seamonkey
8.3G    security
49M     static-analysis
1.4T    thunderbird
251M    utilities
762M    webtools
1.4T    xulrunner
44K     zz

The highlights (over 1TB) are xulrunner, thunderbird, seamonkey, and mobile. Are any of those sizes surprising (I'm new to this)?
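Picking the biggest directories out of a listing like this by eye is error-prone because du -h mixes K, M, G and T suffixes. A small Python sketch of how the listing could be ranked follows. The function names are my own, and the sample rows are a handful of entries copied from the listing above.

```python
# du -h reports sizes in powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a du -h style size such as '1011G' or '1.1T' to bytes."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)  # plain byte count with no suffix

def largest(du_output, count=4):
    """Return the `count` biggest (size, name) rows, largest first."""
    rows = []
    for line in du_output.splitlines():
        if not line.strip():
            continue
        size, name = line.split(None, 1)
        rows.append((to_bytes(size), size, name.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(size, name) for _, size, name in rows[:count]]

sample = """\
166G calendar
1011G mobile
1.1T seamonkey
1.4T thunderbird
1.4T xulrunner
44K zz
"""
for size, name in largest(sample):
    print(size, name)
```

Note that 1011G is slightly under 1T in du's binary units, so mobile only just misses a strict "over 1TB" cut.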
Comment 5•12 years ago
This probably isn't news to anyone but me, but nightlies appear to be the unbounded-growth item here:

30M     mobile/1.0b1/
29M     mobile/1.0b2/
31M     mobile/1.0b3/
38M     mobile/1.0b4/
37M     mobile/1.0b5/
38M     mobile/1.0rc1/
38M     mobile/1.0rc3/
19M     mobile/b1rc3/
96G     mobile/candidates/
62M     mobile/dists/
701G    mobile/nightly/
80G     mobile/releases/
2.7G    mobile/repos/
534M    mobile/source/
132G    mobile/tinderbox-builds/

4.0K    seamonkey/bundles/
3.7G    seamonkey/experimental/
791G    seamonkey/nightly/
219G    seamonkey/releases/
46G     seamonkey/tinderbox-builds/

102M    thunderbird/bundles/
9.4M    thunderbird/extensions/
61M     thunderbird/m-builds/
796G    thunderbird/nightly/
527G    thunderbird/releases/
72M     thunderbird/test/
13G     thunderbird/tinderbox-builds/
29G     thunderbird/try-builds/

3.3M    xulrunner/eclipse/
96K     xulrunner/mar-generation-tools/
1.4T    xulrunner/nightly/
45G     xulrunner/releases/
7.8G    xulrunner/tinderbox-builds/
Comment 6•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> 701G mobile/nightly/

Possibly some cleanup to be done here, but this might be accurate.

> 791G seamonkey/nightly/

cc-ing Callek to see whether there's anything to be done for SeaMonkey here.

> 1.4T xulrunner/nightly/

Ugh, all the release candidates dirs are still in there, most of which can probably go away since they're surprisingly large with source+bundles+sdks. Needs similar crontabs setup to ffxbld too, I reckon.
Assignee | ||
Comment 7•12 years ago
Thanks for looking, Dustin. I wasn't aware of the size of xulrunner/ or mobile/, and there may be things that needn't be kept in thunderbird/ or seamonkey/ too. E.g. for Firefox we only keep a zip for windows instead of windows+zip, and delete old nightly mar and test files. I'll follow up to analyse those some more.
Our current state is pretty good:
> Filesystem Size Used Avail Use% Mounted on
> 10.253.0.10:/vol/ftp_stage
> 6.1T 5.5T 595G 91% /mnt/netapp/stage
> 10.253.0.11:/vol/stage
> 4.2T 3.7T 471G 89% /mnt/netapp/stage/archive.mozilla.org/pub/firefox
On the first partition that is mostly the mobile/tinderbox-builds change, shuffling the older builds to cm-ixstore. It took some time for that to show up in df output, which everyone professes to be weird but somehow related to when the netapp actually frees the space (eg during the daily deduping).
On the second (firefox) that seems to be a combination of starting to move a big chunk of a very busy last week off to cm-ixstore01, and lerxst allocating another 100G.
Assignee | ||
Comment 8•12 years ago
xulrunner:
* tinderbox-builds/: static, 2.5G. Copies of the xulrunner nightlies on central, aurora, and 1.9.2
* nightlies/:
  * for each day with code changes, 810M / branch for central, aurora & beta; 220M for 1.9.2
  * unnecessary beta nightlies will be disabled in bug 716775
  * bug 661244 should get central and aurora back under 300M (ie save 500M/day)
* release related:
  * releases/: 1150MB growth per beta or release, permanent storage
  * nightly/: 1250MB temporary per build cycle for nightly/<version>-candidates. Deletable after we release a version
  * bug 661244 would also help here
  * about 60G in old candidates directories, which I propose we just nuke since there's nothing in there that's not in releases/ already. Oh, actually 9.0b5, 9.0b6, and 10.0b1 releases never got copied into releases/, so should double check that assertion and do fixes.
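To put the nightly numbers above into a single daily figure, here is a short worked calculation in Python. It assumes every branch has code changes every day (so it is an upper bound), and it treats "under 300M" as exactly 300M for central and aurora once bug 661244 lands. The variable names are mine.

```python
# Per-branch nightly sizes in MB, as quoted in the comment above.
current = {"central": 810, "aurora": 810, "beta": 810, "1.9.2": 220}

# Expected state after the two bugs land:
#  - bug 716775 disables the beta nightlies entirely
#  - bug 661244 gets central and aurora down to (at most) 300 MB each
after = {"central": 300, "aurora": 300, "1.9.2": 220}

current_per_day = sum(current.values())
after_per_day = sum(after.values())
saving_per_day = current_per_day - after_per_day

print("current MB/day:", current_per_day)
print("after MB/day:  ", after_per_day)
print("saving MB/day: ", saving_per_day)
print("saving GB/30 days:", round(saving_per_day * 30 / 1024, 1))
```

Under those assumptions the worst case drops from 2650 MB/day to 820 MB/day.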
Comment 9•12 years ago
(In reply to Nick Thomas [:nthomas] from comment #8)
> * about 60G in old candidates directories, which I propose we just nuke
> since
> there's nothing in there that's not in releases/ already. Oh, actually
> 9.0b5,
> 9.0b6, and 10.0b1 releases never got copied into releases/, so should
> double check
> that assertion and do fixes.

Nick: were you waiting on feedback before proceeding with the cleanup? I say go for it.
Assignee | ||
Comment 10•12 years ago
After another disk crunch, I've set the cron jobs to keep 3 days of {firefox,mobile}/tinderbox-builds on the netapp disk before moving them to cm-ixstore. The same is happening for firefox/try-builds, but is still running. The normal cron job is disabled while that happens because the first sync is > 1hr.

(In reply to Chris Cooper [:coop] from comment #9)
> Nick: were you waiting on feedback before proceeding with the cleanup? I say go for
> it.

Aki mostly did this a few days ago.
Assignee | ||
Comment 11•12 years ago
try-builds is done and the cron re-enabled.
Comment 12•12 years ago
/mnt/netapp/stage/archive.mozilla.org/pub/firefox is CRITICAL: DISK CRITICAL - free space: /mnt/netapp/stage/archive.mozilla.org/pub/firefox 127766 MB (2% inode=96%)
Assignee | ||
Comment 13•12 years ago
Turns out some locking was added for moving older Firefox try builds over to cm-ixstore, but a stale lock was left behind after setting that up, so the job was effectively disabled. The catchup is running.
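One common way to keep a leftover lock from silently disabling a cron job is to store the owner's PID in the lock file and reclaim the lock when that process no longer exists. The Python sketch below illustrates that idea only; it is not how the actual try-builds script does its locking (that script is not shown in this bug), the lock file name is hypothetical, and it is POSIX-only.

```python
import os
import tempfile

def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True

def acquire_lock(path):
    """Create the lock file atomically; reclaim it if its owner is dead.

    Caveat: an empty or unreadable lock file is treated as stale, which
    can race with a process that has created the file but not yet
    written its PID. Good enough for a sketch, not for production.
    """
    for _ in range(2):  # at most one retry after removing a stale lock
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as f:
                    owner = int(f.read().strip() or 0)
            except (FileNotFoundError, ValueError):
                owner = 0
            if owner > 0 and pid_alive(owner):
                return False  # someone else really is running
            try:
                os.unlink(path)  # stale lock: remove and retry
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False

# Hypothetical lock file name, created in a scratch directory.
lock = os.path.join(tempfile.mkdtemp(), "move-try-builds.lock")
print(acquire_lock(lock))  # first call takes the lock
print(acquire_lock(lock))  # second call finds a live owner (this process)
```

Logging a warning whenever a stale lock is reclaimed would also make this kind of silent stall easier to spot.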
Comment 14•12 years ago
To get this into a bug instead of just email: the next step in alleviating this disk space problem is to split off one or more directories onto their own partitions on the other sjc1 netapp. We have verified that snapshots are off and that there's nothing more we can remove there.

From dparsons:
We have 1TB I can allocate on mpt-netapp-b. It has the same connectivity to the network as mpt-netapp-a, but it has less load on it too. I think splitting the data up is the only way to go, if waiting for scl3 isn't an option.
Comment 15•12 years ago
per releng/it meeting:
1) this netapp-b is also tier1, just like netapp-a.
2) netapp-b has only ~1TB capacity, so RelEng to figure out what subset of the partition can be moved from netapp-a.
3) nthomas/dparsons: can you confirm that this netapp-b is as performant as netapp-a?
Comment 16•12 years ago
:joduinn, mpt-netapp-b has the same specs as mpt-netapp-a. In terms of active load, it's a little less loaded than mpt-netapp-a, so it might be a little faster.
Comment 17•12 years ago
nthomas: bear just sent mail about sync-ing up with you. I think the easiest win here would be to move the xulrunner/nightly/20[05,06,07,08,09,10,11] to cm-ixstore01, or whatever subset will fit. That could buy us up to 1.3TB by my calculations. Should we tackle that here or in bug 725811?
Comment 18•12 years ago
I'm running a heavily nice'd version of du -hsx of the subdirs of /mnt/netapp/stage/archive.mozilla.org/pub to confirm the current levels of usage and will put the output info into https://etherpad.mozilla.org/iDYivvH9c4 From there we can work out what scripts will be needed to perform the moves as moving this much data will probably have to happen overnight to minimize the impact (load and dir changes.)
Comment 19•12 years ago
cm-ixstore01 is going to be decommissioned as part of the move to scl3, and moving data from it to the scl3 netapp is a difficult process. Is there not enough space on mpt-netapp-[b-d]? How much do you need in total?
Comment 20•12 years ago
(In reply to Dan Parsons [:lerxst] from comment #19)
> cm-ixstore01 is going to be decommissioned as part of the move to scl3, and
> moving data from it to the scl3 netapp is a difficult process.

ok, good to know. Also noting from the RelEng/IT meeting this morning that IT does not recommend cm-ixstore for "tier1" files. We really don't want to use cm-ixstore for tier1 files unless we can't find any other way out of this recurring-production-impacting corner while we wait for the magic of scl3.

> Is there not enough space on mpt-netapp-[b-d]?

Oh, didn't know about netapp-c, netapp-d in this morning's meeting. How much space can we have across netapp-b/c/d?
netapp-b has ~1TB free?
netapp-c has ?? free?
netapp-d has ?? free? (from irc, some think there's 500gb free, but not certain).

> How much do you need in total?

per comment #18, Bear is calculating, and we expect to have this data tmrw.
Comment 21•12 years ago
From #infra:
[5:24pm] lerxst: there's a possible alternative here
[5:24pm] lerxst: zimbra_backups is on the same aggr as ftp_stage, and it's taking 1.85TB
[5:25pm] lerxst: if we can move zimbra_backups to scl3, I can give the current ftp_stage a lot more space
[5:25pm] lerxst: let me look into that
[5:32pm] coop: lerxst: would that be an immediate fix?
[5:32pm] lerxst: coop: very possibly yes. looking into it now
[5:32pm] • coop didn't know anything was up in scl3 yet. good to hear
[5:32pm] lerxst: i just built the netapp there yesterday. doing the esx servers there right now
Comment 22•12 years ago
I'm moving zimbra_backups from mpt-netapp-a to scl3-na1a right now. It should be done within a few hours. So, some time later tonight, I should be able to give another 1TB to ftp_stage.
Comment 23•12 years ago
(In reply to Dan Parsons [:lerxst] from comment #22)
> I'm moving zimbra_backups from mpt-netapp-a to scl3-na1a right now. It
> should be done within a few hours. So, some time later tonight, I should be
> able to give another 1TB to ftp_stage.

Is that extra 1TB going to be added to the existing partition, or will this be a new partition, i.e. we'll still need to figure out what to move onto that freed space?
Comment 24•12 years ago
If we *do* still need to move stuff around, I've tried to collate our thoughts in a single spot here, since we're also talking about related issues in bug 708865.

There are two directory structures we're looking at moving, namely the yearly archives under xulrunner/nightly and firefox/nightly. The contents of both dirs are static, i.e. they are dated archives of nightly builds from previous years and will never be modified or added to.

AIUI, the firefox nightly archives are already backed up to tape (https://bugzilla.mozilla.org/show_bug.cgi?id=708865#c1). I don't know whether the xulrunner nightlies are backed up or not.

If tape backup of the firefox nightlies is tested and reliable, I would be fine moving those firefox nightlies to the slower storage of cm-ixstore01, with the understanding that we would need to either transfer those builds over from cm-ixstore01 to scl3, or simply restore from tape again into scl3.

The xulrunner nightly archives could be redistributed around on any of the current netapp[b-d] devices that have sufficient space. They don't all have to be on the same device, provided the dir structure looks like it does now to someone trying to find the builds.

Here are the sizes of the various yearly archives I'm talking about shuffling around:

/pub/mozilla.org/xulrunner/nightly/20??
6.1G    2005
17G     2006
29G     2007
49G     2008
186G    2009
319G    2010
684G    2011

/pub/mozilla/firefox/nightly/20??
13G     2004
25G     2005
104G    2006
80G     2007
112G    2008
168G    2009
181G    2010
???G    2011 (Still running)
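As a quick sanity check on the sizes above, the Python sketch below totals each set of yearly archives and works out which years, taken oldest first, would fit in a given amount of space. The 1024G budget stands in for the ~1TB mentioned for mpt-netapp-b and is an assumption on my part, as are the function and variable names. The firefox 2011 figure is left out because the du was still running.

```python
# Yearly archive sizes in GB, copied from the listing above.
xulrunner = {2005: 6.1, 2006: 17, 2007: 29, 2008: 49,
             2009: 186, 2010: 319, 2011: 684}
firefox = {2004: 13, 2005: 25, 2006: 104, 2007: 80,
           2008: 112, 2009: 168, 2010: 181}  # 2011 not yet measured

def fits_oldest_first(sizes, budget_gb):
    """Take years oldest first until the next one would exceed the budget."""
    chosen, used = [], 0.0
    for year in sorted(sizes):
        if used + sizes[year] > budget_gb:
            break
        chosen.append(year)
        used += sizes[year]
    return chosen, used

xul_total = sum(xulrunner.values())
ff_total = sum(firefox.values())
print("xulrunner total GB:", round(xul_total, 1))
print("firefox 2004-2010 total GB:", ff_total)

years, used = fits_oldest_first(xulrunner, budget_gb=1024)
print("xulrunner years fitting in 1024G:", years, "using", round(used, 1), "G")
```

The xulrunner total comes to about 1290G, in line with the "up to 1.3TB" estimate earlier in this bug, and 2011 alone accounts for more than half of it.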
Comment 25•12 years ago
I know cshields is keen to revisit the decision about whether we actually *need* to keep all these nightlies or not. While I don't disagree in principle, we've already had that public discussion in the past two years: http://coop.deadsquid.com/2010/07/reclaiming-space-on-stage-mozilla-org-space-reclaimed Developers let us know the archives are a useful tool for regression hunting, and yes, some devs have gone back multiple years when bisecting builds. For now, we need to treat the nightly archives as a necessary, legacy artifact and not delete them in the interim because they are inconvenient in the face of a colo move. The newsgroup thread on the issue is here: https://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/67c282b346b3f968/3f118a7411a21712?#3f118a7411a21712
Assignee | ||
Comment 26•12 years ago
(In reply to Chris Cooper [:coop] from comment #24)
> There are two directory structures we're looking at moving, namely the
> yearly archives under xulrunner/nightly and firefox/nightly. The contents of
> both dirs are static, i.e. they are dated archives of nightly builds from
> previous years and will never be modified or added to.

There are some cleanups still pending on firefox/nightly/2011/, e.g. installers and xpi's older than 6 months, but otherwise this is true. They can easily be redone if we are restoring from tape.
Comment 27•12 years ago
:coop, when this copy is done, I'll be able to immediately expand ftp_stage by 1TB. No one will have to move anything to make this space happen. Unfortunately it's going to be a few more hours (so, possibly tomorrow once I wake up again) but when that happens, hey, 1TB for everybody.
Comment 28•12 years ago
From the du run of last night:

/mnt/netapp/stage/archive.mozilla.org/pub/firefox/nightly/20??
13G     2004
25G     2005
100G    2006
77G     2007
108G    2008
161G    2009
174G    2010
556G    2011
419G    2012

and

/mnt/netapp/stage/archive.mozilla.org/pub/
5.1T    total
16G     addons
14M     artwork
4.0K    bouncer
177G    calendar
98G     camino
648K    cck
54M     chimera
2.8G    data
93M     directory
3.3G    diskimages
2.5M    extensions
5.6G    firebird
3.7T    firefox
3.5M    grendel
6.8G    js
33M     l10n-kits
524M    labs
2.8G    minimo
760G    mobile
128G    mozilla
848K    msgsdk
304M    nspr
620K    OJI
382M    phoenix
20M     profiles
952G    seamonkey
8.3G    security
49M     static-analysis
1.5T    thunderbird
251M    utilities
778M    webtools
1.5T    xulrunner
48K     zz

The largest are
3.7T    firefox
1.5T    xulrunner
1.5T    thunderbird
Comment 29•12 years ago
(In reply to Dan Parsons [:lerxst] from comment #27)
> :coop, when this copy is done, I'll be able to immediately expand ftp_stage
> by 1TB. No one will have to move anything to make this space happen.
> Unfortunately it's going to be a few more hours (so, possibly tomorrow once
> I wake up again) but when that happens, hey, 1TB for everybody.

per irc, this copy is still going, revised ETA "a few hours".
Comment 30•12 years ago
The move is done, and I am now expanding the capacity of ftp_stage.

before:
Filesystem Size Used Avail Use% Mounted on
10.253.0.10:/vol/ftp_stage
6.5T 6.3T 172G 98% /mnt/netapp/stage

after:
Filesystem Size Used Avail Use% Mounted on
10.253.0.10:/vol/ftp_stage
7.5T 6.6T 955G 88% /mnt/netapp/stage

It will probably keep growing even bigger for a while.
Comment 31•12 years ago
(In reply to Dan Parsons [:lerxst] from comment #30)
> The move is done, and I am now expanding the capacity of ftp_stage.
>
> before:
> Filesystem Size Used Avail Use% Mounted on
> 10.253.0.10:/vol/ftp_stage
> 6.5T 6.3T 172G 98% /mnt/netapp/stage
> after:
> Filesystem Size Used Avail Use% Mounted on
> 10.253.0.10:/vol/ftp_stage
> 7.5T 6.6T 955G 88% /mnt/netapp/stage
>
> It will probably keep growing even bigger for a while.

w00t! Thanks Dan.

(In reply to John O'Duinn [:joduinn] from comment #20)
> (In reply to Dan Parsons [:lerxst] from comment #19)
...
> > Is there not enough space on mpt-netapp-[b-d]?
> Oh, didnt know about netapp-c,netapp-d in this morning's meeting. How much
> space can we have across netapp-b/c/d ?
> netapp-b has ~1TB free?
> netapp-c has ?? free?
> netapp-d has ?? free? (from irc, some think there's 500gb free, but not
> certain).

per irc w/lerxst:
16:33:13 < lerxst> mpt-netapp-a: aggr0 = 500gb free; aggr1 = 1.1tb free
16:33:38 < lerxst> mpt-netapp-b: aggr0 = 200gb free; aggr1 = 1tb free
16:34:03 < lerxst> mpt-netapp-c: aggr0 = 300gb free; aggr1 = full
16:34:18 < lerxst> mpt-netapp-d: aggr0 = full; aggr1 = full

nthomas, over to you now to start rejuggling as best as possible!
Assignee | ||
Comment 32•12 years ago
Final state:
10.253.0.10:/vol/ftp_stage
7.5T 5.5T 2.1T 73% /mnt/netapp/stage

So 1T more total, 0.8T less used and 1.9T more free. If there was 0.8T to delete then that project picked a funny day to clean up, or the netapp pulled a rabbit out of the hat. Perhaps it was able to be more efficient with some free space to work with? Anyway, it seems unlikely we'll have any issues with that partition before we get to scl3.

Next step is to figure out where to offload some of firefox/nightly/20??/ onto, so that we can get some breathing room on 10.253.0.11:/vol/stage. I'll work up an estimate of how much new usage we can expect in the next few weeks, and then we can figure out if we should carve some of the new space off the existing partition on mpt-netapp-a, or use some of the other free space.
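The three deltas quoted here can be reproduced from the df lines in this bug: the "before" figures posted earlier for ftp_stage (6.5T size, 6.3T used, 172G free) and the final state above. A small Python check follows, assuming df's G and T are powers of 1024 and rounding to one decimal place as the comment does.

```python
# df figures for 10.253.0.10:/vol/ftp_stage, all converted to T.
before = {"size": 6.5, "used": 6.3, "avail": 172 / 1024}  # 172G free
final = {"size": 7.5, "used": 5.5, "avail": 2.1}

# Change in each column, rounded to one decimal like the summary above.
delta = {key: round(final[key] - before[key], 1) for key in before}

print("size change (T): ", delta["size"])
print("used change (T): ", delta["used"])
print("avail change (T):", delta["avail"])
```

The extra free space beyond the 1T expansion corresponds to the drop in used space, which is the part that was unexplained at the time.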
Comment 33•12 years ago
I just gave mpt-netapp-b:/vol/stage another 100GB, hope that helps until you can move stuff.
Assignee | ||
Comment 34•12 years ago
I've moved /pub/mozilla.org/firefox/nightly/{2004..2008} to /pub/mozilla.org/firefox-old-builds, which means from mpt-netapp-b:/vol/stage ('firefox') to mpt-netapp-a:/vol/ftp_stage ('everything else').

Right now we have:
Filesystem Size Used Avail Use% Mounted on
mpt-netapp-a:/vol/ftp_stage
7.5T 5.8T 1.7T 78% /mnt/netapp/stage
mpt-netapp-b:/vol/stage
4.3T 3.5T 764G 83% /mnt/netapp/stage/archive.mozilla.org/pub/firefox

The latter is mid-cycle for the weekly load of firefox/{try,tinderbox}-builds, and we should bottom out at around 450G free. If we need more then we can move another 161G of firefox/nightly/2009.

Thanks to everyone who contributed to providing more space or setting it up.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering