Closed
Bug 1142639
Opened 11 years ago
Closed 10 years ago
MDN Planned down-time for RabbitMQ P2V
Categories: developer.mozilla.org Graveyard :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: groovecoder; Assignee: Unassigned
We are taking MDN down while we perform the physical-to-virtual (P2V) migration of the RabbitMQ servers:
https://bugzilla.mozilla.org/show_bug.cgi?id=1078752
As part of this, we need to:
1. Create site-wide notices of the planned down-time
2. Update the maintenance page (bug 1141314)
3. Redirect all traffic to the maintenance page
Comment 1 • 11 years ago (Reporter)
:cyliang - can you give an approximate start time for the down-time so we can add it to our site-wide notices?
:cyliang - can you help us make sure the redirect will go to *our* down-time page (which includes helpful links for visitors to get MDN docs offline), and not just the generic mozilla hard-hat page?
Flags: needinfo?(cliang)
Comment 2 • 10 years ago
* The last CAB meeting rearranged the work during the tree-closing window so that the P2V work will take place *before* the PHX1 networking work. We're slated to do the P2V of the rabbit servers at 9 AM EDT / 7 AM PDT.
* If the maintenance page has a specific URL, I can handle enabling / disabling a redirect to that page from the load balancer. I just need to know what the maintenance URL should be. =)
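As a quick check on the schedule, here is a small Python sketch converting the 9 AM EDT start to Pacific time. It assumes the fixed US daylight-saving offsets (EDT = UTC-4, PDT = UTC-7) and uses a hypothetical date, since the comment gives only the time of day. Under those assumptions 9 AM EDT is 6 AM PDT, and 7 AM PDT would be 10 AM EDT, so one of the two times quoted above appears to be off by an hour.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for US daylight saving time (assumption: the window
# falls inside DST, so Eastern = EDT and Pacific = PDT).
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")


def eastern_to_pacific(start: datetime) -> datetime:
    """Convert an EDT-aware datetime to the equivalent PDT wall-clock time."""
    return start.astimezone(PDT)


# Hypothetical date; the comment only gives the time of day.
start_edt = datetime(2015, 3, 14, 9, 0, tzinfo=EDT)
start_pdt = eastern_to_pacific(start_edt)

print(start_edt.strftime("%H:%M %Z"))  # 09:00 EDT
print(start_pdt.strftime("%H:%M %Z"))  # 06:00 PDT
```

Fixed offsets are used instead of a time-zone database so the sketch does not depend on tzdata being installed; they are only valid while both zones are on daylight time.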
Flags: needinfo?(cliang)
Comment 3 • 10 years ago (Reporter)
:cyliang - I just merged the updated maintenance page to our repo here:
https://github.com/mozilla/kuma/tree/master/maintenance
Can you grab that and just serve it statically from another web host/server? Or could/should we do it with our own Apache server? Are we taking the entire Apache server down during the RabbitMQ downtime?
Flags: needinfo?(cliang)
Comment 4 • 10 years ago
In addition to showing the maintenance page when a user visits the homepage, can we also show the page for all other paths? In other words, can developer.mozilla.org/* either directly load or temporarily redirect to the page?
Comment 5 • 10 years ago
@groovecoder: I wasn't planning on taking down the MDN Apache web server. If it's easier for me to host the static page on another server, I believe I can do so. If it's easy enough for you to host it in some corner of the MDN web servers, I can point there just as well.
@openjck: Any HTTP / HTTPS request to the developer.mozilla.org IP address will be rewritten to go to the maintenance URL, so it *should* catch all paths.
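A catch-all rule like the one described above can be modelled as a small Python function. This is a hypothetical sketch, not the load balancer's actual rule language: the function name, the `require_test_header` switch (mirroring the `bunnies` header test used before the outage) and the exclusion of `/media/maintenance/` are assumptions. The exclusion matters because the maintenance page is served from the same hostname, so an unconditional redirect of every path would also redirect the maintenance page to itself.

```python
from typing import Mapping, Optional, Tuple

MAINTENANCE_URL = "https://developer.mozilla.org/media/maintenance/"
MAINTENANCE_PATH = "/media/maintenance/"


def maintenance_redirect(
    path: str,
    headers: Mapping[str, str],
    require_test_header: bool = False,
) -> Optional[Tuple[int, str]]:
    """Return (status, Location) if the request should be redirected, else None.

    Hypothetical model of a load-balancer rule, not its real configuration.
    """
    # Never redirect the maintenance page (assumed to live entirely under
    # MAINTENANCE_PATH) to itself; otherwise a catch-all rule would loop.
    if path.startswith(MAINTENANCE_PATH):
        return None
    if require_test_header:
        # Test mode: only requests carrying "bunnies: true" are redirected.
        # HTTP header names are case-insensitive, so normalise them first.
        lowered = {name.lower(): value for name, value in headers.items()}
        if lowered.get("bunnies") != "true":
            return None
    # 302 (temporary) so clients do not keep following the redirect
    # after the outage ends.
    return (302, MAINTENANCE_URL)
```

For example, `maintenance_redirect("/en-US/Firefox/", {})` returns `(302, MAINTENANCE_URL)`, while `maintenance_redirect("/media/maintenance/", {})` returns `None`, so the maintenance page itself is served normally.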
Flags: needinfo?(cliang)
Comment 6 • 10 years ago (Reporter)
I just submitted https://github.com/mozilla/kuma/pull/3116 which puts the maintenance site into our own media/ directory, which would allow us to redirect all MDN traffic to:
https://developer.mozilla.org/media/maintenance/
which will be served by Apache as a static page/site.
Comment 7 • 10 years ago
Commits pushed to master at https://github.com/mozilla/kuma
https://github.com/mozilla/kuma/commit/3421193a134052e002dc7bef80ffcdb547a3610b
bug 1142639 - move maintenance/ to media/ for static serving
https://github.com/mozilla/kuma/commit/260f0290572f56bc33e0545b748be5c61ba9933b
Merge pull request #3116 from groovecoder/move-maintenance-to-media-1142639
bug 1142639 - move maintenance/ to media/ for static serving
Comment 8 • 10 years ago (Reporter)
https://developer.mozilla.org/media/maintenance/ is live, so we can plan to redirect all MDN traffic there and let Apache serve it as a static page.
Comment 9 • 10 years ago
Right now, I have a rule on the load balancer for MDN that says:
If header "bunnies" == "true" then
redirect to maintenance page
This appears to be working (see below). For the outage, I'll just remove the header check; once the outage is over, I'll make the rule inactive.
cliang-07757:~ cliang$ curl -kI https://developer.mozilla.org/
HTTP/1.1 301 MOVED PERMANENTLY
Server: Apache
Vary: Accept-Language, Accept-Encoding
X-Backend-Server: developer1.webapp.scl3.mozilla.com
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Credentials: false
Date: Fri, 13 Mar 2015 20:26:43 GMT
Location: https://developer.mozilla.org/en-US/
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
X-Frame-Options: DENY
Access-Control-Allow-Methods: GET
Connection: Keep-Alive
X-Cache-Info: cached
cliang-07757:~ cliang$ curl -kI https://developer.mozilla.org/en-US/Firefox/
HTTP/1.1 301 MOVED PERMANENTLY
Server: Apache
Vary: Cookie, Accept-Encoding
X-Backend-Server: developer1.webapp.scl3.mozilla.com
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Credentials: false
Date: Fri, 13 Mar 2015 20:33:54 GMT
Location: https://developer.mozilla.org/en-US/Firefox
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
Connection: Keep-Alive
X-Frame-Options: DENY
Access-Control-Allow-Methods: GET
X-Cache-Info: caching
cliang-07757:~ cliang$ curl -H "bunnies: true" -kI https://developer.mozilla.org/
HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Fri, 13 Mar 2015 20:34:05 GMT
Location: https://developer.mozilla.org/media/maintenance/
Connection: Keep-Alive
Content-Length: 0
cliang-07757:~ cliang$ curl -H "bunnies: true" -kI https://developer.mozilla.org/en-US/Firefox/
HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Fri, 13 Mar 2015 20:34:12 GMT
Location: https://developer.mozilla.org/media/maintenance/
Connection: Keep-Alive
Content-Length: 0
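The curl checks above can be scripted so that several paths are tested with and without the test header in one run. Below is a minimal sketch using only the Python standard library; `head` and `is_maintenance_redirect` are hypothetical helper names, and disabling certificate verification only mirrors curl's `-k` flag rather than being a recommendation.

```python
import http.client
import ssl
from typing import Dict, Optional, Tuple

MAINTENANCE_URL = "https://developer.mozilla.org/media/maintenance/"


def head(host: str, path: str,
         headers: Optional[Dict[str, str]] = None) -> Tuple[int, Optional[str]]:
    """Send a HEAD request (like `curl -kI`) and return (status, Location).

    http.client does not follow redirects, so the 301/302 itself is seen.
    """
    # Like curl -k: skip certificate verification (ad-hoc checks only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(host, context=ctx, timeout=10)
    try:
        conn.request("HEAD", path, headers=headers or {})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_maintenance_redirect(status: int, location: Optional[str]) -> bool:
    """True if the response is the temporary redirect to the maintenance page."""
    return status == 302 and location == MAINTENANCE_URL
```

During the header-gated test phase shown above, `head("developer.mozilla.org", "/en-US/Firefox/", {"bunnies": "true"})` would have been expected to return `(302, MAINTENANCE_URL)`, while the same call without the header returned the site's normal 301.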
Comment 11 • 10 years ago (Reporter)
Yup. Here are the emails from the start and end of the down-time ...
Start
=====
We have started the RabbitMQ P2V as part of this maintenance window.
MDN is redirecting to our updated maintenance page.
We have stopped the celery processes on the production cluster.
We see that the stage celery processes use the same RabbitMQ cluster as production :( so we're getting lots of "connection closed unexpectedly" emails from the site re-render task that's running there. For now, we're going to let it run, expecting the P2V to finish in 10-15 minutes, at which point the broker will be back up for those tasks. If the cluster down-time stretches past 15 minutes, we may kill the stage re-render process.
I'll update this thread.
End
===
The RabbitMQ P2V is done. (Thanks cyliang!)
MDN is back up. The RabbitMQ queues (prod & stage) are active again. Production celery tasks are starting and completing. The stage errors have stopped. Total time from first error to last was 25m.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(lcrouch)
Resolution: --- → FIXED
Updated • 5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard