Closed Bug 1205867 Opened 10 years ago Closed 10 years ago

Migrate Pulse/PulseGuardian from phx1 to CloudAMQP/Heroku

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Unassigned)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1793] )

The time has come to move Pulse (the RabbitMQ cluster) and PulseGuardian (the Pulse management web app) out of phx1 and into CloudAMQP and Heroku, respectively. The migration plan is at https://docs.google.com/document/d/1F207nMJUXXxyDNuJuoPDfFzK39RSy0gOqrYMR-21AcQ/edit#

On the day of the migration, I will need someone on hand who can close ports on the Pulse zlb, dump the PulseGuardian database and deliver it to me, dump the RabbitMQ definitions and deliver them to me, and update the pulse.mozilla.org DNS entry. I've been testing everything I can ahead of time to make sure it goes smoothly. Jgriffin has agreed to be around to help with testing and such.
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1793]
:mcote - this will also require new network flows to be added for the new IP addresses to be accessible by buildbot machines. That work doesn't need a TCW afaik, but will block this work. Can you open a bug against releng to add the flows, and include the IP addresses there? (And block this bug, of course) h/t :arr for thinking of this!
Flags: needinfo?(mcote)
:mcote also, could you add the information requested in https://wiki.mozilla.org/IT/ChangeControl#Submitting_a_Change_Request then set the "cab-review" flag please? Holler if you need a hand.
Depends on: 1205892
Thanks! Filed request for netflows and CAB, blocking this bug.
Flags: needinfo?(mcote)
As discussed we'll be doing this outside of a TCW. Current plan is to migrate at least Pulse itself next Wednesday, October 7th, around 5 pm PDT. PulseGuardian may be moved earlier.
Summary: Migration Pulse/PulseGuardian from phx1 to CloudAMQP/Heroku during next TCW → Migration Pulse/PulseGuardian from phx1 to CloudAMQP/Heroku
No longer blocks: 1204963
Depends on: 1210072
Summary: Migration Pulse/PulseGuardian from phx1 to CloudAMQP/Heroku → Migrate Pulse/PulseGuardian from phx1 to CloudAMQP/Heroku
Depends on: 1212044
PulseGuardian was successfully migrated to Heroku on 2015/10/06. We attempted to migrate Pulse to CloudAMQP today, but we neglected to get a proper SSL certificate for pulse.mozilla.org onto the CloudAMQP cluster, so clients with strict hostname checking were failing to connect. We've contacted CloudAMQP support to figure out the best way to handle this, whether that be providing them with our cert or some other mechanism. We should be good to retry next week.
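For anyone wanting to reproduce the strict-hostname failure described above, here is a minimal sketch of the kind of check such clients perform. It assumes the broker accepts TLS on port 5671 (the conventional AMQPS port; the port actually used by Pulse is not stated in this bug), and the helper names `make_strict_context` and `broker_cert_matches` are hypothetical, not part of any Pulse client library.

```python
import socket
import ssl


def make_strict_context() -> ssl.SSLContext:
    """Build a TLS context that verifies the chain and the hostname."""
    ctx = ssl.create_default_context()
    # These are already the defaults; set explicitly to document intent.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def broker_cert_matches(host: str, port: int = 5671, timeout: float = 10.0) -> bool:
    """Return True if the server's certificate validates for `host`.

    Port 5671 is an assumption (the conventional AMQPS port).
    Connection errors unrelated to certificate verification propagate.
    """
    ctx = make_strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake raises if the certificate does not cover `host`.
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

Calling `broker_cert_matches("pulse.mozilla.org")` before flipping DNS would not be meaningful on its own, since DNS still points at the old cluster; one would need to connect to the new cluster's address while passing `pulse.mozilla.org` as `server_hostname`, which this sketch does not do.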
Status: NEW → ASSIGNED
Depends on: 1214636
Blocks: 1214636
No longer depends on: 1214636
The migration today succeeded. The VIPs for the old cluster have been turned off and all traffic is flowing through the CloudAMQP instance. The old cluster is still running but will be decommissioned in a day or two; see bug 1214636.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
The move of the RabbitMQ server totally broke our Mozmill CI because of the IP change (and maybe other changes). As a result, our machines can no longer connect to the broker. We should really announce such changes to all customers and not break them unexpectedly. I will handle our regression via bug 1215464.
(In reply to Hal Wine [:hwine] (use NI) from comment #1)
> :mcote - this will also require new network flows to be added for the new IP
> addresses to be accessible by buildbot machines. That work doesn't need a
> TCW afaik, but will block this work.

This is exactly what was not done for our machines, so we are stranded now. Sadly I cannot open bug 1205889 at all due to permission issues, so I will most likely reference this bug in the one I will file now. I assume we need an identical setup.
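A quick way to tell whether a machine is missing the network flow, as opposed to failing at the TLS or AMQP layer, is a plain TCP connect test. The following is a minimal sketch; `can_reach` is a hypothetical helper, and the host and port to probe are whatever your client is configured with (this bug does not list the new IP addresses).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If this returns False from a CI machine but True from a machine known to work, a missing network flow is a plausible cause, though it does not rule out DNS or local firewall issues.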
The last two comments are not related to this bug. I would suggest you file a new bug for that.
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard