Bug 858609 (Closed) · Opened 12 years ago · Closed 12 years ago

new NetApp volumes for FTP cluster

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)

Platform: All
OS: Other
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nmaul, Assigned: gcox)

References

Details

This has been discussed via email... just making a bug to track the actual work.

We currently have 4 NetApp volumes for the FTP cluster:

[root@ftp1.dmz.scl3 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/sda3                                67G  8.3G   56G  14% /
tmpfs                                   7.8G  4.0K  7.8G   1% /dev/shm
/dev/sda1                                97M   82M   11M  89% /boot
10.22.74.11:/vol/ftp_stage/stage_qtree   17T   16T  987G  95% /mnt/netapp/stage
10.22.74.11:/vol/tinderbox_builds       7.9T  6.1T  1.9T  77% /mnt/cm-ixstore01
10.22.74.10:/vol/stage/stage_qtree       15T   13T  2.4T  84% /mnt/netapp/stage/archive.mozilla.org/pub/firefox
10.22.74.11:/vol/pvtbuilds              370G  337G   34G  91% /mnt/pvt_builds

We need to split off some new ones to hold some of what's in the 10.22.74.11:/vol/ftp_stage/stage_qtree volume. Please make 5 new volumes:

/mnt/netapp/stage/archive.mozilla.org/pub/thunderbird (3.6TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/xulrunner (2.4TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/seamonkey (2.1TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/mobile (1.4TB used)
/mnt/netapp/stage/archive.mozilla.org/pub/b2g (6.1TB used)

Can we make the usable volume sizes 4.5TB, 3TB, 3TB, 2TB, and 8TB, respectively? Enough to hold them, plus some growth. Once these are made we'll mount them up and migrate over the data.

CC'ing catlee and joduinn, because I'm not sure exactly who needs to know about this or what needs to be done. The basic process will be to mount the new points up somewhere, rsync over the data, then (quickly) move the old dir out of the way and put the new one in place, then probably do a final rsync to make sure we got everything. Does this seem problematic to you two for any of thunderbird, xulrunner, seamonkey, mobile, or b2g?
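The copy-then-swap flow described above can be sketched in shell. This is a simplified illustration, not the production procedure: the `migrate` helper and its paths are hypothetical, and in production the final swap would be a mount-point change on the FTP nodes rather than a plain mv.

```shell
# Sketch of the copy-then-swap migration flow (illustrative only).
# migrate OLD NEW: bulk copy while OLD stays live, swap quickly,
# then a final catch-up rsync for anything written mid-copy.
migrate() {
    old=$1 new=$2
    rsync -a "$old/" "$new/"      # 1. bulk copy (slow; old dir stays live)
    mv "$old" "$old.old"          # 2. quickly move the old dir aside...
    mv "$new" "$old"              #    ...and put the new one in place
    rsync -a "$old.old/" "$old/"  # 3. final rsync to catch late writes
}
```

In the real migration, step 2 would remount the new NetApp volume at the old path, and the final rsync would run against the retired volume before any data is deleted.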
Assignee: server-ops → server-ops-storage
Component: Server Operations → Server Operations: Storage
QA Contact: shyam → dparsons
Assignee: server-ops-storage → gcox
Sized as requested. Same export perms as the existing ftp_stage.

10.22.74.10:/vol/archivemo_thunderbird
10.22.74.10:/vol/archivemo_xulrunner
10.22.74.10:/vol/archivemo_seamonkey
10.22.74.10:/vol/archivemo_mobile
10.22.74.11:/vol/archivemo_b2g

Passing over.
Assignee: gcox → nmaul
Component: Server Operations: Storage → Server Operations: Web Operations
QA Contact: dparsons → nmaul
Summary: new NetApp volumes to for FTP cluster → new NetApp volumes for FTP cluster
I have started an rsync on upload1.dmz.scl3 for both thunderbird and mobile. They use separate mount points from the other things on this system (and even from each other), which should eliminate contention within Linux and make the operations slightly faster (and independent).

Even so, I expect this to take several hours even for the smallest volume. We may have to revisit this over the weekend. Conservatively, I'm estimating around a 9MB/sec transfer rate. Unless it speeds up dramatically at some point, I'm not sure we'll be done with even the smallest volume before the weekend is up. :/
I gave up on rsync for the "mobile" volume and used ndmpcopy on the NetApp command line. This was drastically faster (~100MB/s vs ~10MB/s), and the sync completed in about 3.5 hours. The "mobile" mount is now in place.

We ran into some problems with the FTP cluster nodes. After mounting the new volume, it wasn't actually "mounted", and no files were visible. It would not unmount, either. I suspect this was caused by the presence of a nested mount underneath "mobile": a bind mount to cm-ixstore01. We ultimately had to reboot the 6 FTP cluster nodes to get things back to normal, but after doing so everything worked properly. This did burn one of the Android trees... thanks to :philor and :Callek for helping out. We did not need to close the tree(s).

After the mount was in place and in use, I took a snapshot of the original volume and began deleting the old "mobile" directory. It will take a while, but we'll get 1.4TB of space back. This should last us through the weekend and into the start of the week, at which point we can work on another directory. "xulrunner" would be my next choice.
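A back-of-the-envelope check of those transfer rates supports both the weekend worry and the ~3.5 hour result. This is a rough sketch (the `eta_hours` helper is made up here; 1 TB is treated as 1,000,000 MB, and integer math truncates):

```shell
# Rough hours-to-copy estimate: size in GB, sustained rate in MB/s.
# Integer math, so results are truncated; good enough for planning.
eta_hours() {
    size_gb=$1 rate_mb=$2
    echo $(( size_gb * 1000 / rate_mb / 3600 ))
}

eta_hours 1400 9     # ~1.4TB "mobile" at rsync's ~9MB/s -> 43 (hours)
eta_hours 1400 100   # same data at ndmpcopy's ~100MB/s  -> 3 (truncated
                     # from ~3.9h, in line with the observed ~3.5 hours)
```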
Whiteboard: [reit-ops]
Updated current dir sizes:

2.4T xulrunner
2.7T mobile
2.2T seamonkey
3.7T thunderbird
6.6T b2g

xulrunner has not changed significantly. mobile is already moved, of course... that's just for reference. Interestingly, it has nearly doubled in size since the last check. seamonkey shows minor growth... no problem there. thunderbird shows minor growth... no problem there. b2g shows significant growth (6.1T->6.6T). This seems realistic, and will probably grow more during the b2g work week.
Ignore my last re: mobile growing... I forgot there's a bind mount underneath mobile that includes archive data. It's currently at 1.2TB on that volume. This is maybe a place we could simplify someday, and eliminate the extra mount for mobile archives... but that's a different project. :)
(In reply to Jake Maul [:jakem] from comment #4)
> Updated current dir sizes:
> 6.6T b2g
>
> b2g shows significant growth (6.1T->6.6T). This seems realistic, and will
> probably grow more during the b2g work week.

Something seems pathologically wrong here, investigating.
Great news....

[15:21:22] <nthomas> jakem: there's 4TB in b2g/tinderbox-builds, most of which can be deleted because it's more than 30 days old. Any objections from a load point of view from me doing that?

Looks like we may be missing a cleanup cron like the other directories have. Once this is fixed we'll be very well off for the b2g work week, even if we don't get any of the other directories migrated. :)
I see a bunch of deletes going through on /vol/tinderbox_builds; that's actually on a different aggr than ftp_stage. So while the cleanup is definitely appreciated, it doesn't alleviate the ftp_stage cleanup needs.
I may not be around tomorrow, so here's some verbosity.

(In reply to Jake Maul [:jakem] from comment #7)
> [15:21:22] <nthomas> jakem: there's 4TB in b2g/tinderbox-builds, most of
> which can be deleted because it's more than 30 days old. Any objections from
> a load point of view from me doing that?
>
> Looks like we may be missing a cleanup cron like the other directories have.
> Once this is fixed we'll be very well off for the b2g work week, even if we
> don't get any of the other directories migrated. :)

This is from bug 771017 not getting completed when we got hung up on some permissions work. We basically want to run:

# Clean up b2g tinderbox-builds
@hourly nice -n 19 find /home/ftp/pub/b2g/tinderbox-builds -mindepth 2 -maxdepth 2 -type d -mtime +30 -name 1????????? -exec rm -rf {} \;

to get back to something sane. Feel free to run that manually. Note that this doesn't help bug 855594; there are two partitions that b2g builds go to.
Running this by hand now... the output (without the -exec) looked good to me. Will look into adding a cron for this... probably under the ffxbld user, because that's the user that seems to own all these directories. Dunno why. :)
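That find(1) pattern can be rehearsed safely on a scratch tree before trusting it with real data. The directory names below are made up for illustration; only the pattern itself (depth-2 directories named like 10-digit timestamps, older than 30 days) matches the real cron entry, and `touch -d` is the GNU coreutils form.

```shell
# Rehearse the cleanup find on a throwaway tree: only the directory
# with an old mtime should be removed, the fresh one should survive.
WORK=$(mktemp -d)
mkdir -p "$WORK/tinderbox-builds/some-branch/1350000000" \
         "$WORK/tinderbox-builds/some-branch/1360000000"
touch -d '60 days ago' "$WORK/tinderbox-builds/some-branch/1350000000"

find "$WORK/tinderbox-builds" -mindepth 2 -maxdepth 2 -type d \
     -mtime +30 -name '1?????????' -exec rm -rf {} \;
```

Running the same find with `-print` in place of `-exec rm -rf {} \;` first, as was done here, is the cheap way to sanity-check what it will delete.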
Dropping prio, this is no longer urgent.
Severity: major → normal
This still needs love, but honestly I have zero time to work on it. Unassigning myself and triaging to product delivery.
Assignee: nmaul → server-ops-webops
Component: Server Operations: Web Operations → WebOps: Product Delivery
Product: mozilla.org → Infrastructure & Operations
In bug 971684 we broke up the existing stage and ftp_stage volumes into smaller components, much along the lines laid out above (after all, those volumes already existed). We'll see more movement later, but the above volumes are done.
Assignee: server-ops-webops → gcox
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
See Also: → 971684
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard