Closed
Bug 971684
Opened 10 years ago
Closed 10 years ago
Rearrange product delivery mounts to remove nested client-side mounts
Categories
(Infrastructure & Operations :: Change Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gcox, Assigned: gcox)
Having the client (ftp/rsync/upload) mount one volume on top of another creates problems in managing the back-end storage, so for the TCW we want to start fixing that. This involves:

Rearranging these to be mounted from the filer side instead of the client side:
/mnt/netapp/stage/archive.mozilla.org/pub/seamonkey
/mnt/netapp/stage/archive.mozilla.org/pub/mobile

Creating/splitting off, for size manageability:
/mnt/netapp/stage/archive.mozilla.org/pub/b2g
/mnt/netapp/stage/archive.mozilla.org/pub/thunderbird
/mnt/netapp/stage/archive.mozilla.org/pub/xulrunner

Mounting the stage volume underneath the ftp_stage volume, on the filer side.

Again, this is pretty much an administrative move (all data will end up in the same places it started), but it should remove all the client-side locking that would prevent us from doing filer maintenance.

* Date/time: next TCW, probably 2 hours needed just to be safe.
* Affects product delivery's main mounts.
* Impact centers within a TCW.
* Notification is part of the TCW.
* gcox on point.
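To illustrate what "volume on top of volume" means on a client, here is a minimal Python sketch that flags mount points sitting inside another mount point. The function name `nested_mounts` and the sample list are illustrative assumptions; the first path stands in for the enclosing client mount and `/mnt/pvtbuilds` is a made-up unrelated mount. On a real host the list would come from the mount table.

```python
from pathlib import PurePosixPath


def nested_mounts(mount_points):
    """Return (child, parent) pairs where one mount point sits inside another.

    For each mount point, the parent reported is the closest enclosing one.
    """
    paths = sorted(PurePosixPath(p) for p in mount_points)
    pairs = []
    for child in paths:
        # Every other mount point that is an ancestor directory of this one.
        enclosing = [p for p in paths if p != child and p in child.parents]
        if enclosing:
            closest = max(enclosing, key=lambda p: len(p.parts))
            pairs.append((str(child), str(closest)))
    return pairs


# Illustrative sample only (assumption), not the real mount table.
sample = [
    "/mnt/netapp/stage",
    "/mnt/netapp/stage/archive.mozilla.org/pub/seamonkey",
    "/mnt/netapp/stage/archive.mozilla.org/pub/mobile",
    "/mnt/pvtbuilds",
]

for child, parent in nested_mounts(sample):
    print(f"{child} is mounted on top of {parent}")
```

An empty result would mean the client has no nested mounts left, which is the end state this change is aiming for.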
Flags: cab-review?
Comment 1•10 years ago
Approved by the CAB on Feb 12 to be carried out during the TCW on Feb 22nd.
Blocks: 971818
Flags: cab-review? → cab-review+
Updated•10 years ago
Assignee: server-ops → gcox
Comment 2•10 years ago
The simplest way to avoid writes during this period would be to disable the inbound services on upload[12] & upload-cron, since all changes from the build network are made via these 3 hosts. Ideally, we can find a switch to throw to make these hosts inaccessible from the build network during the critical window.

Jake: is it possible to take these hosts offline during the critical window? Or take down their network connections?

If we can't stop access to those hosts, we'll have to kill all running jobs -- almost every build & test step produces output stored on ftp. That would be time-consuming and disruptive (lengthening the TCW).
Flags: needinfo?(nmaul)
Comment 3•10 years ago
My vote is remount read-only on upload1/upload2/upload-cron, final rsync, swap to junction mount. Jobs will fail if they happen to finish in the few mins of the change, but it's a tree closure and they can be rerun. Minor fallout for RelEng uploading logs, easy to fix.
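The ordering proposed above (remount read-only, final rsync, swap) can be sketched as a dry-run plan builder in Python. This is only a sketch under assumptions: the function name `cutover_plan`, the paths passed to it, the use of ssh, and the rsync flags (`-a --delete`) are not from this bug, and the filer-side junction swap is left as a placeholder because the actual command is not recorded here. Nothing is executed; the steps are only printed.

```python
import shlex

# Hosts named in this bug as the only write path from the build network.
UPLOAD_HOSTS = ["upload1", "upload2", "upload-cron"]


def cutover_plan(mount_point, old_path, new_path, hosts=UPLOAD_HOSTS):
    """Build the ordered list of steps as strings, without running anything."""
    steps = []
    # 1. Stop writes: remount the client mount read-only on every upload host.
    for host in hosts:
        remote = f"mount -o remount,ro {shlex.quote(mount_point)}"
        steps.append(f"ssh {host} {shlex.quote(remote)}")
    # 2. Final sync of whatever changed since the last pre-sync.
    #    Trailing slashes make rsync copy directory contents, not the directory.
    src = shlex.quote(old_path.rstrip("/") + "/")
    dst = shlex.quote(new_path.rstrip("/") + "/")
    steps.append(f"rsync -a --delete {src} {dst}")
    # 3. Placeholder: the filer-side swap is not spelled out in this bug.
    steps.append("# filer side: swap to the junction mount")
    return steps


# Hypothetical paths, for illustration only.
for step in cutover_plan("/mnt/example", "/mnt/example/old", "/mnt/example/new"):
    print(step)
```

Keeping the remount ahead of the final rsync is what makes that last sync final: no host can write between the sync and the swap.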
Comment 4•10 years ago
One possible issue I see is that files could "pile up" in /tmp/tmpXXX directories since the move commands (to the final destination) will fail and be retried. Hal suggests we do an 'rm -rf' of those files.
Comment 5•10 years ago
Probably we'll be OK because there's more than 50G free for /tmp and the outage should be fairly short for each partition. Will keep an eye on it during the TCW though.
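One way to "keep an eye on it" is a small read-only check of free space plus a list of leftover upload directories. The Python sketch below assumes the directories are named with a `tmp` prefix (as in /tmp/tmpXXX) and uses a one-hour age threshold; `tmp_report` and both defaults are assumptions. It only reports and never deletes, so any 'rm -rf' stays a manual decision.

```python
import os
import shutil
import time


def tmp_report(tmp_root="/tmp", prefix="tmp", min_age_seconds=3600):
    """Return (free_bytes, stale_dirs) for tmp_root.

    stale_dirs lists directories directly under tmp_root whose name starts
    with prefix and whose mtime is at least min_age_seconds old.
    """
    free_bytes = shutil.disk_usage(tmp_root).free
    now = time.time()
    stale = []
    with os.scandir(tmp_root) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                age = now - entry.stat(follow_symlinks=False).st_mtime
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            if age >= min_age_seconds:
                stale.append(entry.path)
    return free_bytes, sorted(stale)
```

Calling `tmp_report()` every few minutes during the window and comparing the free-byte figure against the roughly 50G of headroom mentioned above would show whether retries are piling up faster than expected.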
Comment 6•10 years ago
:nthomas's solution seems reasonable to me. (commenting just to +1 and remove the NEEDINFO)
Flags: needinfo?(nmaul)
Comment 7•10 years ago
Greg, could you record the work you're planning to do in breaking up firefox/ here?
Comment 8•10 years ago (Assignee)
The vol known as 'stage', /mnt/netapp/stage/archive.mozilla.org/pub/firefox, has 6 subdirs:
bundles candidates nightly releases tinderbox-builds try-builds

* candidates is already a junction mount, no change there.
* nightly, tinderbox-builds, try-builds will be broken out to new junction mounts.
* releases and bundles will sync off to a new volume (internally called archivemo_firefox), where they will be at the root of the volume instead of in a qtree.
* the junction mounts beneath stage will be rehomed to archivemo_firefox.
* archivemo_firefox will become a junction mount beneath the main ftp_stage volume, which will itself be going through a de-qtree'ing process.

This came about because of my own oversight: I forgot that you can't junction-mount a qtree'd point in a subvolume. If we didn't proceed on this, we wouldn't achieve the goal of untangling this mount; the old stage volume would've stayed in place. The breakup of stage and the subjunctioning of volumes are just bonus wins.
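For quick reference, the per-subdirectory plan above can be restated as data. This Python sketch is only a summary aid: the dictionary name, the action labels, and `group_by_action` are illustrative assumptions, and nothing here talks to the filer.

```python
from collections import defaultdict

# Fate of each firefox/ subdir, as described in this comment.
FIREFOX_SUBDIRS = {
    "candidates": "existing junction mount",
    "nightly": "new junction mount",
    "tinderbox-builds": "new junction mount",
    "try-builds": "new junction mount",
    "releases": "sync to root of archivemo_firefox",
    "bundles": "sync to root of archivemo_firefox",
}


def group_by_action(plan):
    """Group subdirectory names by the action planned for them."""
    groups = defaultdict(list)
    for subdir, action in plan.items():
        groups[action].append(subdir)
    return {action: sorted(names) for action, names in groups.items()}


for action, names in group_by_action(FIREFOX_SUBDIRS).items():
    print(f"{action}: {', '.join(names)}")
```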
Comment 9•10 years ago (Assignee)
qtrees eliminated, all mounts folded up into junction mounts, and now just* 'ftp_stage' is mounted by the clients. More shuffling will need to take place in a future window, but this streamlines the process.

* There's also pvtbuilds and ffxbld, but they aren't nested in the stage mount so they don't count.
Comment 10•10 years ago (Assignee)
Also: puppet checkins 82952, 82953, 82954, and 82956 removed the old mount references.
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations
Updated•9 years ago
Change Request: --- → approved
Flags: cab-review+