Closed
Bug 748814
Opened 13 years ago
Closed 13 years ago
Tracking bug for Apr 25 2012 downtime
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rail, Assigned: bear)
References
Details
(Whiteboard: [buildduty][downtime])
The Mozilla IT and RelEng teams need to take a downtime on Wednesday,
April 25th to migrate some services that support the Firefox continuous
integration system to a new data center. The user-facing systems are:
* build API (and builddata)
* clobberer
* OPSI configuration servers for the build farm
* trychooser
The IT team would also like to use the downtime to upgrade some systems
and services to provide better performance and/or scalability, notably:
* re-balancing the ganeti cluster in our scl1 colo
* fixing DHCP in our scl1 colo
* reorganizing our minis and reconfiguring the switch to which they are
attached in our mtv1 colo
* upgrading our rabbitmq installation
* deploying a new pair of databases for buildbot
* moving CVS to the new scl3 colo
* upgrading zimbra
The downtime is scheduled for 3 hours, starting at 09:00 PST. The trees
will be closed during that time. We will open the trees and inform
#developers as soon as possible after the maintenance is complete.
As always, please let RelEng/myself know ASAP if there is any reason we
should not proceed with this downtime.
Assignee
Updated•13 years ago
Comment 1•13 years ago
(Originally announced via all@m.c, Yammer and https://groups.google.com/forum/?fromgroups#!topic/mozilla.dev.planning/rrekazutYPQ )
Assignee
Comment 2•13 years ago
timeline from today's downtime:
0857 rail starts closing trees
0903 bear gives all clear for IT to start
0926 rail is stopping all build masters to allow db change
0923 arr finishes dhcp migration in scl1
0927 arr had to reboot redis01
0928 redis01 up
0929 cruncher has been transitioned by dustin
0934 rabbitmq upgrade started by dustin
0936 confirmed all masters are down and config changes are being made for the db and relengweb01 updates
0941 arr reports all ganeti moves are done
0948 mburns reports production-opsi migrated - networking changes in progress
0957 production-opsi up and running
1001 relengweb1 is cutover and ready for testing
1005 mtv1 minis back online
1032 sheeri reports db cutover done, waiting on catlee's confirmation
1037 dustin reports rabbitmq updated
1053 catlee reports db cutover tested ok
1100 exploring why multiple linux slaves are unable to connect to the scl1 puppet master
1115 arr rebooted puppet master scl1 and it's running, but clients are still timing out
1139 ravi testing firewall rollover
1150 non-scl3 masters are being started
1150 ravi bouncing releng.scl3 vpn
1156 schedulers db needed updating
1159 linux slaves in scl1 are still having puppetd issues - iptables "hack" is helping
1210 downtime done - two issues need post downtime work on the releng side
1219 trees opened
need to file a post-downtime bug for the scl1 puppet problem
Assignee
Comment 3•13 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=748906 filed for post-downtime work
See Also: → 748906
Reporter
Comment 4•13 years ago
All done here, trees are open.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
Product: mozilla.org → Release Engineering