Closed Bug 728451 Opened 12 years ago Closed 12 years ago

Build new FTP servers, dm-download equivalent in scl3

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: bhourigan)

We'll need a number of hosts to handle file downloads in scl3, similar to dm-ftp01 and ftp*.dmz.sjc1 and pm-zlb-ftp1 (non-bouncer FTP servers, lb) and dm-download{02..04} (bouncer'd FTP servers).  These will all serve files from the build volume (bug 728449).
Dustin:

What is the timeline on this bug?
It will need to finish at about the same time as the surf replacement (bug 728454). Once both are set up, we'll throw the switch to start serving ftp from these servers, at the same time as releng switches to uploading builds to the new surf.
New hosts are up, data migration was started today but suspended due to ff12 release. Data transfer will resume tomorrow AM at noon.
Brian: any update here? 

I heard from Amy/Corey that this might be ready today for releng to start testing.
They're already cc-ed, but nthomas and rail have signed up to help test this from the releng side.
joduinn is organizing the downtime for bug 728451. He's asking pre-approvers for 6 hours on Friday morning, 6am-12pm PST.
rail, nthomas: do you guys have a plan for testing this in advance of the proposed downtime? Do you need help putting something together?
We're working off https://etherpad.mozilla.org/fFq0WCr3Ql, and the two of us are going to start working through that very shortly.
Depends on: 751455
tl;dr status: no show stoppers found, looking solid, more testing to do

Done:
- netflows verified for compile slaves that need ssh access
- some issues with user setup fixed (uids/gids, umask)
- simple test of post_upload.py successful
- initial tests of downloads on scl1 test slaves successful, initial load tests work fine (more needed to simulate actual test load)

Known issues:
- virus scanning for release needs a little work, see bug 751455

To do:
- test rsync
- updating rsync modules now in puppet svn, verify write access works
- test virus scanning once modified
- review crontabs (now in puppet svn)
- look at caching on http side
- rail suggested a buildbot master (universal) with a few build and test slaves with /etc/hosts overrides to the new machines, to test end-to-end coverage for try. Would poll real try repo but upload to firefox/try-builds-test
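
For the /etc/hosts override idea in the last item, a minimal sketch of generating the override lines might look like the following. The hostnames and the IP address here are placeholders (documentation ranges), not the real scl3 machines, and `hosts_override_lines` is a hypothetical helper, not something that exists in puppet or buildbot.

```python
def hosts_override_lines(overrides):
    """Render /etc/hosts lines from a mapping of IP -> list of hostnames.

    Each line has the form "<ip><TAB><name1> <name2> ...", which is the
    standard /etc/hosts layout (address first, then names).
    """
    lines = []
    for ip, names in sorted(overrides.items()):
        lines.append(f"{ip}\t{' '.join(names)}")
    return lines


# Placeholder values only: 192.0.2.0/24 and example.org are reserved for
# documentation and do not refer to any real server.
OVERRIDES = {
    "192.0.2.10": ["ftp.example.org", "stage.example.org"],
}

if __name__ == "__main__":
    # Print the lines for review; appending them to /etc/hosts on a test
    # slave would be a separate, manual step.
    print("\n".join(hosts_override_lines(OVERRIDES)))
```

Printing the lines rather than writing to /etc/hosts directly keeps the sketch safe to run anywhere; only the test slaves would get the overrides, so production slaves keep resolving the current servers.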

The etherpad in comment #8 is the realtime tracking doc.
Also need to write a downtime plan - https://etherpad.mozilla.org/Hd9ruGh2j9
(In reply to Nick Thomas [:nthomas] from comment #10)
> Also need to write a downtime plan - https://etherpad.mozilla.org/Hd9ruGh2j9

Also available via http://tinyurl.com/ftp-downtime-plan-scl3 for awesomebar goodness.
Depends on: 751726
Depends on: 752272
After the weekend (and Monday) work, is there anything left to do here?
IT needs to document this system. Other than that I'm unaware of any other outstanding issues.
Blocks: 753889
Depends on: 754174
Depends on: 754727
Doesn't block scl3 anymore since services are up and running from scl3 now.
No longer blocks: scl3-move
Documentation is completed - migration is completed. Closing bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard