Closed
Bug 601025
Opened 14 years ago
Closed 13 years ago
improve sync time for release builds between stage.m.o and pvt-mirror1/2/n
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Assigned: zandr)
References
Details
(Whiteboard: [q1] [tracking])
During the FF3.6.9 release, we noticed that it took a long time for the release builds to become visible on pvt-mirror1/2/n. As best as I can tell, the timeline was:
13:10:wed: RelEng does "push to mirrors" (aka on stage.m.o, run rsync from 3.6.9-candidates to releases/3.6.9 directories)
14:36:wed: pvt-mirror1 has release bits; mirroring to external sites starts.
pvt-mirror1 is the machine which all external mirror nodes sync from, so this delay held up mirror absorption during the release. From a quick postmortem with mrz, it might be possible to speed this up by more carefully targeting what exactly to rsync over to pvt-mirror1. Currently everything under "/" is being sync'd over. One proposal was to have a different (additional?) sync running which only brought over files under "firefox/releases". This smaller set of files should sync more quickly, which means having releases visible on pvt-mirror1 faster.
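As a rough illustration of the narrower sync proposed above, the sketch below builds an rsync command line restricted to one subtree. It only constructs the argument list and does not run anything. The function name and the source and destination roots are hypothetical placeholders, not the real module or path names; `-a` and `--dry-run` are standard rsync options.

```python
def build_release_sync_cmd(src_root, dest_root, subtree="firefox/releases", dry_run=True):
    """Build an rsync command that copies one subtree instead of everything under "/".

    src_root and dest_root are hypothetical placeholders for the real locations.
    """
    cmd = ["rsync", "-a"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be transferred without copying
    # Trailing slashes tell rsync to copy the directory contents into the destination directory.
    cmd.append(f"{src_root.rstrip('/')}/{subtree}/")
    cmd.append(f"{dest_root.rstrip('/')}/{subtree}/")
    return cmd


if __name__ == "__main__":
    # Hypothetical roots, for illustration only.
    print(" ".join(build_release_sync_cmd("stage.example::pub", "/mirror/pub")))
```

Such a command could run alongside the existing full sync, so the release directories arrive first while everything else follows on the normal schedule.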
Reporter
Updated•14 years ago
Summary: improve sync time between stage.m.o and pvt-mirror1 → improve sync time for release builds between stage.m.o and pvt-mirror1/2/n
Comment 1•14 years ago
Another idea that came up while I was chatting with justdave about this earlier: He said that the slowest part of "start of rsync from candidates -> release" to "mirrors getting files" was the rsync that releng runs, because it copies from one remote NFS partition to another. We came to the conclusion that this could be sped up if the rsync was run on the ftp.mozilla.org machine rather than stage.mozilla.org, though I don't recall exactly why. This is a bit tricky because (rightly) nobody from RelEng has access to this machine, so it would have to be done by IT or triggered through a system that doesn't currently exist.
Updated•14 years ago
Assignee: server-ops → justdave
Comment 2•14 years ago
Yeah, what Ben said. Here's the graphic overview: https://people.mozilla.com/~justdave/MirrorNetwork.pdf The files being copied are on the EqualLogic SAN box. When running the rsync from surf, the files go through dm-ftp01's network interfaces 4 times: once from eql to ftp, once from ftp to stage where the copy is being run, then from stage back to ftp, and then back to the eql. Running the rsync directly on dm-ftp01 would cut the bandwidth used by the rsync process in half, as well as eliminating NFS from the equation (since the eql SAN is iSCSI at that point).
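The halving described above is simple arithmetic on the number of times each byte crosses dm-ftp01's interfaces. A minimal sketch, with a hypothetical function name and an illustrative 10 GB payload rather than a measured figure:

```python
def ftp_interface_traffic_gb(payload_gb, run_on_ftp_host):
    """Estimate how much data crosses dm-ftp01's network interfaces for one copy.

    Run from stage:  eql -> ftp, ftp -> stage, stage -> ftp, ftp -> eql (4 passes).
    Run on dm-ftp01: eql -> ftp, ftp -> eql (2 passes).
    """
    passes = 2 if run_on_ftp_host else 4
    return payload_gb * passes


if __name__ == "__main__":
    payload_gb = 10  # illustrative release size, an assumption
    print(ftp_interface_traffic_gb(payload_gb, run_on_ftp_host=False))
    print(ftp_interface_traffic_gb(payload_gb, run_on_ftp_host=True))
```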
Comment 3•14 years ago
Could we rsync ahead of time to pub-test, and from there symlink from pub to pub-test when we get the go from drivers? I am suggesting this so that the wanted bits are already on stage-rsync.mozilla.org ahead of time and only a symlink change is required. This would cut out all the network transfers at release time. Would this be a viable option?
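A minimal sketch of the symlink idea above, assuming a POSIX filesystem and that the public path is either absent or already a symlink. The function name and directory layout are hypothetical. Whether downstream mirrors would pick up the linked files depends on the rsync options they use, which this sketch does not address.

```python
import os
import tempfile


def publish_via_symlink(staged_dir, public_link):
    """Point public_link at staged_dir by swapping in a new symlink.

    The new link is created under a temporary name and renamed into place,
    so readers see either the old target or the new one.
    """
    tmp_link = public_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an earlier failed run
    os.symlink(staged_dir, tmp_link)
    os.replace(tmp_link, public_link)  # rename is atomic on POSIX filesystems


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    staged = os.path.join(root, "pub-test")  # bits copied ahead of time
    os.mkdir(staged)
    public = os.path.join(root, "pub")       # what mirrors would read
    publish_via_symlink(staged, public)
    print(os.path.islink(public))
```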
Reporter
Comment 4•14 years ago
justdave: There are a few options here so far. Which way do you want to proceed?
Updated•14 years ago
Whiteboard: [q4] [tracking]
Comment 5•14 years ago
Bug 614786 will likely help with this.
Updated•14 years ago
Assignee: justdave → zandr
Whiteboard: [q4] [tracking] → [q1] [tracking]
Assignee | ||
Updated•13 years ago
|
Assignee | ||
Comment 6•13 years ago
(In reply to comment #5)
> Bug 614786 will likely help with this.

Did it? OK to close?
Comment 7•13 years ago
It took 20 minutes to copy 10G into firefox/releases/5.0b2 on May 20, and 17.5 minutes to copy 11.8G into firefox/releases/4.0rc2 on Mar 18. That's about 8 and 12 MB/s respectively, on a path which is netapp-b -> surf -> netapp-b. While not awesome, it's better than when this bug was first filed. I can't find any data for how long it took to get to pv-mirror01, due to assorted automation failures/edge cases for 4.0rc2, 4.0.1 and 5.0b2, but my experience is that it's not dissimilar.
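The copy rates can be checked from the sizes and durations quoted above. A small sketch, assuming 1 GB = 1024 MB; the function name is made up for illustration:

```python
def throughput_mb_per_s(size_gb, minutes, mb_per_gb=1024):
    """Average copy rate in MB/s for size_gb gigabytes copied in the given minutes."""
    return size_gb * mb_per_gb / (minutes * 60)


if __name__ == "__main__":
    print(round(throughput_mb_per_s(10, 20), 1))      # 5.0b2 copy
    print(round(throughput_mb_per_s(11.8, 17.5), 1))  # 4.0rc2 copy
```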
Updated•13 years ago
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations