Closed
Bug 858609
Opened 12 years ago
Closed 12 years ago
new NetApp volumes for FTP cluster
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nmaul, Assigned: gcox)
References
Details
This has been discussed via email... just making a bug to track the actual work.
We currently have 4 NetApp volumes for the FTP cluster:
[root@ftp1.dmz.scl3 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 67G 8.3G 56G 14% /
tmpfs 7.8G 4.0K 7.8G 1% /dev/shm
/dev/sda1 97M 82M 11M 89% /boot
10.22.74.11:/vol/ftp_stage/stage_qtree
17T 16T 987G 95% /mnt/netapp/stage
10.22.74.11:/vol/tinderbox_builds
7.9T 6.1T 1.9T 77% /mnt/cm-ixstore01
10.22.74.10:/vol/stage/stage_qtree
15T 13T 2.4T 84% /mnt/netapp/stage/archive.mozilla.org/pub/firefox
10.22.74.11:/vol/pvtbuilds
370G 337G 34G 91% /mnt/pvt_builds
We need to split off some new ones, to hold some of what's in the 10.22.74.11:/vol/ftp_stage/stage_qtree volume.
Please make 5 new volumes:
/mnt/netapp/stage/archive.mozilla.org/pub/thunderbird (3.6TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/xulrunner (2.4TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/seamonkey (2.1TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/mobile (1.4TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/b2g (6.1TB used)
Can we make the usable volume sizes 4.5TB, 3TB, 3TB, 2TB, and 8TB, respectively? Enough to hold them, plus some growth.
Once these are made we'll mount them up and migrate over the data.
CC'ing catlee and joduinn, because I'm not sure exactly who needs to know about this or what needs to be done. The basic process: mount the new volumes somewhere, rsync over the data, quickly move the old directory out of the way and put the new one in its place, then do a final rsync to make sure we got everything.
Does this seem problematic to you two for any of thunderbird, xulrunner, seamonkey, mobile, or b2g?
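The mount/rsync/swap sequence described above can be sketched as a script. This is a minimal local simulation, not the actual runbook: temp directories stand in for the NetApp mounts, `cp -a` stands in for `rsync -a`, and the paths and filenames are hypothetical.

```shell
set -e
# Local stand-ins for the old stage volume and the new NetApp mount.
work=$(mktemp -d)
mkdir -p "$work/old/thunderbird" "$work/new_mount"
echo "build-1" > "$work/old/thunderbird/build-1.txt"

# 1. Bulk copy onto the new volume (rsync -a against the NFS mount in reality).
cp -a "$work/old/thunderbird/." "$work/new_mount/"

# 2. Quick swap: stash the old directory, put the new volume's content in place.
mv "$work/old/thunderbird" "$work/old/thunderbird.migrated"
mkdir "$work/old/thunderbird"
cp -a "$work/new_mount/." "$work/old/thunderbird/"

# 3. Final catch-up pass to pick up anything written during the swap
#    (-n: never overwrite what the new volume already has).
cp -an "$work/old/thunderbird.migrated/." "$work/old/thunderbird/" || true
```

The point of the quick swap in step 2 is to keep the window where clients see a half-populated directory as short as possible; the final pass in step 3 is the "final rsync to make sure we got everything".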
Updated (Reporter) • 12 years ago
Assignee: server-ops → server-ops-storage
Component: Server Operations → Server Operations: Storage
QA Contact: shyam → dparsons
Updated (Assignee) • 12 years ago
Assignee: server-ops-storage → gcox
Comment 1 (Assignee) • 12 years ago
Sized as requested. Same export perms as the existing ftp_stage.
10.22.74.10:/vol/archivemo_thunderbird
10.22.74.10:/vol/archivemo_xulrunner
10.22.74.10:/vol/archivemo_seamonkey
10.22.74.10:/vol/archivemo_mobile
10.22.74.11:/vol/archivemo_b2g
Passing over.
Assignee: gcox → nmaul
Component: Server Operations: Storage → Server Operations: Web Operations
QA Contact: dparsons → nmaul
Summary: new NetApp volumes to for FTP cluster → new NetApp volumes for FTP cluster
Comment 2 (Reporter) • 12 years ago
I have started an rsync on upload1.dmz.scl3 for both thunderbird and mobile. They use separate mount points from the other things on this system (and even separate from each other), which should eliminate contention within Linux and make the operations slightly faster (and independent).
Even so, I expect this to take many hours per volume; we may have to revisit this over the weekend. Conservatively, I'm estimating around a 9 MB/s transfer rate. Unless that speeds up dramatically at some point, I'm not sure we'll finish even the smallest volume before the weekend is up. :/
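For a rough sense of why the 9 MB/s estimate made the weekend look tight: the smallest volume here (mobile, ~1.4 TB) alone works out to nearly two days of transfer. A back-of-envelope check (decimal TB/MB assumed):

```shell
# Time to move 1.4 TB at a sustained 9 MB/s.
est=$(awk 'BEGIN {
    bytes = 1.4 * 1000^4          # 1.4 TB
    rate  = 9 * 1000^2            # 9 MB/s
    hours = bytes / rate / 3600
    printf "%.0f hours (~%.1f days)", hours, hours / 24
}')
echo "$est"
```

That is per volume, so the larger directories (thunderbird, b2g) would run well past the weekend at rsync speeds, which is what motivated trying ndmpcopy later in the bug.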
Comment 3 (Reporter) • 12 years ago
I gave up on rsync for the "mobile" volume and used ndmpcopy on the NetApp command line... this was drastically faster (~100MB/s, vs ~10MB/s), and the sync completed in about 3.5 hours.
The "mobile" mount is now in place. We ran into some problems on the FTP cluster nodes: after mounting the new volume, it wasn't actually "mounted" and no files were visible, and it would not unmount either. I suspect this was caused by the presence of a nested mount underneath "mobile"... a bind mount to cm-ixstore01. We ultimately had to reboot the 6 FTP cluster nodes to get back to normal, but after that things worked properly. This did burn one of the Android trees... thanks to :philor and :Callek for helping out. We did not need to close the tree(s).
After the mount was in place and in use, I took a snapshot of the original volume and began deleting the old "mobile" directory... it will take a while, but we'll get 1.4TB of space back. This should last us through the weekend, and into the start of the week... at which time we can work on another directory.
"xulrunner" would be my next choice.
Comment 4 (Reporter) • 12 years ago
Updated current dir sizes:
2.4T xulrunner
2.7T mobile
2.2T seamonkey
3.7T thunderbird
6.6T b2g
xulrunner has not changed significantly.
mobile is already moved, of course... that's just for reference. Interestingly it has nearly doubled in size since the last check.
seamonkey shows minor growth... no problem there.
thunderbird shows minor growth... no problem there.
b2g shows significant growth (6.1T->6.6T). This seems realistic, and will probably grow more during the b2g work week.
Whiteboard: [reit-ops]
Comment 5 (Reporter) • 12 years ago
Ignore my last re: mobile growing... I forgot there's a bind mount underneath mobile that includes archive data. It's currently at 1.2TB on that volume.
This is maybe a place we could simplify someday, and eliminate the extra mount for mobile archives... but that's a different project. :)
Comment 6 • 12 years ago
(In reply to Jake Maul [:jakem] from comment #4)
> Updated current dir sizes:
> 6.6T b2g
>
> b2g shows significant growth (6.1T->6.6T). This seems realistic, and will
> probably grow more during the b2g work week.
Something seems pathologically wrong here, investigating.
Comment 7 (Reporter) • 12 years ago
Great news....
[15:21:22] <nthomas> jakem: there's 4TB in b2g/tinderbox-builds, most of which can be deleted because it's more than 30 days old. Any objections from a load point of view from me doing that ?
Looks like we may be missing a cleanup cron like the other directories have. Once this is fixed we'll be very well off for the b2g work week, even if we don't get any of the other directories migrated. :)
Comment 8 (Assignee) • 12 years ago
I see a bunch of deletes going through on /vol/tinderbox_builds; that's actually on a different aggr than ftp_stage. So, while the cleanup is definitely appreciated, it doesn't alleviate the ftp_stage cleanup needs.
Comment 9 • 12 years ago
I may not be around tomorrow, so here's some verbosity.
(In reply to Jake Maul [:jakem] from comment #7)
> [15:21:22] <nthomas> jakem: there's 4TB in b2g/tinderbox-builds, most of
> which can be deleted because it's more than 30 days old. Any objections from
> a load point of view from me doing that ?
>
> Looks like we may be missing a cleanup cron like the other directories have.
> Once this is fixed we'll be very well off for the b2g work week, even if we
> don't get any of the other directories migrated. :)
This is from bug 771017 not getting completed when we got hung up on some permissions work. We basically want to run
# Clean up b2g tinderbox-builds
@hourly nice -n 19 find /home/ftp/pub/b2g/tinderbox-builds -mindepth 2 -maxdepth 2 -type d -mtime +30 -name 1????????? -exec rm -rf {} \;
to get back to something sane. Feel free to run that manually. Note that this doesn't help bug 855594; b2g builds go to two different partitions.
Comment 10 (Reporter) • 12 years ago
Running this by hand now... the output (without the -exec) looked good to me.
Will look into adding a cron for this... probably under ffxbld, because that's the user that seems to own all these directories. Dunno why. :)
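The "run it without -exec first" check above is worth exercising on a scratch tree before trusting the cron. This sketch (local temp paths, made-up build IDs) creates one directory older than the 30-day cutoff and one fresh one, and shows the filter from the comment 9 cron line matching only the stale one before deleting it:

```shell
set -e
# Scratch tree standing in for /home/ftp/pub/b2g; branch name is hypothetical.
root=$(mktemp -d)
mkdir -p "$root/tinderbox-builds/mozilla-b2g18/1234567890" \
         "$root/tinderbox-builds/mozilla-b2g18/1999999999"
# Age one build directory past the 30-day cutoff (GNU touch).
touch -d '40 days ago' "$root/tinderbox-builds/mozilla-b2g18/1234567890"

# Dry run (no -exec): list what would be removed.
stale=$(find "$root/tinderbox-builds" -mindepth 2 -maxdepth 2 -type d \
             -mtime +30 -name '1?????????')
echo "$stale"

# Actual removal, as the cron job would do it.
find "$root/tinderbox-builds" -mindepth 2 -maxdepth 2 -type d \
     -mtime +30 -name '1?????????' -exec rm -rf {} \;
```

Both directory names match the `1?????????` pattern (a leading 1 plus nine more characters, i.e. ten-digit epoch-style build IDs); only the mtime test separates old from new, which is exactly what the hourly cron relies on.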
Comment 12 (Reporter) • 12 years ago
This still needs love, but honestly I have zero time to work on it. Unassigning myself and triaging to product delivery.
Assignee: nmaul → server-ops-webops
Component: Server Operations: Web Operations → WebOps: Product Delivery
Product: mozilla.org → Infrastructure & Operations
Comment 13 (Assignee) • 12 years ago
In bug 971684 we broke up the existing stage and ftp_stage volumes into smaller components, much along the lines laid out by the above volumes (after all, they already existed). We'll see more movement later, but the above volumes are done.
Assignee: server-ops-webops → gcox
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
See Also: → 971684
Updated • 9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard