Closed Bug 1307169 Opened 8 years ago Closed 7 years ago

implement emergency shut-off

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: asilva)

References

Details

(Whiteboard: [lang=python])

Currently, Balrog has a long-standing rule without a product or version specified that points at No-Update. It normally sits at priority 0, but can be elevated to a higher one to quickly shut off ALL updates. It was always a bit of a hack, and we've since outgrown it. E.g., using this existing rule shuts off GMP updates as well as Firefox ones, which can break Netflix for users who don't yet have Widevine installed. Additionally, the multiple sign off system that we're going to start working on soon will probably break the current hack. I think we've reached the point where having a more formal emergency shut off is a good idea. It should have product-level, and maybe channel-level, granularity. Shutoff should not require multiple sign off, but it should send a notification that it happened.
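To illustrate why the catch-all rule is such a blunt instrument, here is a minimal sketch of highest-priority-wins rule matching. It is a simplified model, not Balrog's actual matching code: the `Rule` class, `pick_rule`, the priority values other than 0, and the mapping names other than No-Update are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    priority: int
    mapping: str                   # release the rule points at
    product: Optional[str] = None  # None matches every product
    channel: Optional[str] = None  # None matches every channel


def pick_rule(rules: List[Rule], product: str, channel: str) -> Rule:
    """Return the highest-priority rule that matches the request."""
    matching = [
        r for r in rules
        if r.product in (None, product) and r.channel in (None, channel)
    ]
    return max(matching, key=lambda r: r.priority)


# Hypothetical rule set: the catch-all rule plus one rule per product.
catch_all = Rule(priority=0, mapping="No-Update")
rules = [
    catch_all,
    Rule(priority=90, mapping="Firefox-release", product="Firefox", channel="release"),
    Rule(priority=90, mapping="GMP-release", product="GMP", channel="release"),
]

# Normal operation: the product-specific rule outranks the catch-all.
normal = pick_rule(rules, "Firefox", "release").mapping

# The "hack": raise the catch-all's priority above everything else.
catch_all.priority = 1000
firefox_during_shutoff = pick_rule(rules, "Firefox", "release").mapping
gmp_during_shutoff = pick_rule(rules, "GMP", "release").mapping
```

Because the catch-all rule names no product, raising its priority makes it win for GMP requests just as it does for Firefox ones, which is the side effect described above.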
Varun has been doing some thinking about this, and is planning to work on it.
Assignee: nobody → varunj.1011
No longer blocks: 1278974
Priority: -- → P3
Whiteboard: [lang=python]
Varun, are you still planning to look at this?
Flags: needinfo?(varunj.1011)
See Also: → 1369383
Sorry for the delay! Yes, I will look at it soon.
Flags: needinfo?(varunj.1011)
This is becoming more urgent now that multiple signoffs are in place for all channels. I think we should aim to fix this before the end of the year. We should probably have a look at the current design and make sure it's compatible with multiple signoffs before proceeding any further.
Priority: P3 → P2
Priority: P2 → P1
Varun and I talked about this a while back. The idea at that point was to have a separate table that tracked whether or not a given product+channel's updates were currently disabled. This table would be exempt from multiple signoffs, and we'd add some new UI to control it. I suspect we'd probably want a button on the Rules page that turns updates off or on for the currently selected product+channel. One downside to this plan is that you wouldn't need multiple signoff to turn updates back on. I think that's probably OK for an initial implementation; we could always add it in later.
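A rough sketch of the separate-table idea, using an in-memory set in place of a real database table. The names `ShutoffTable` and `resolve_update` are hypothetical and only illustrate the product+channel granularity. This is not the schema that was eventually implemented.

```python
from typing import Optional, Set, Tuple


class ShutoffTable:
    """Tracks which product+channel pairs currently have updates disabled."""

    def __init__(self) -> None:
        self._disabled: Set[Tuple[str, str]] = set()

    def shut_off(self, product: str, channel: str) -> None:
        self._disabled.add((product, channel))

    def restore(self, product: str, channel: str) -> None:
        # discard() is a no-op when the pair is not currently shut off.
        self._disabled.discard((product, channel))

    def is_shut_off(self, product: str, channel: str) -> bool:
        return (product, channel) in self._disabled


def resolve_update(table: ShutoffTable, product: str, channel: str,
                   mapping: str) -> Optional[str]:
    """Return the release to serve, or None when updates are shut off."""
    if table.is_shut_off(product, channel):
        return None
    return mapping


table = ShutoffTable()
table.shut_off("Firefox", "release")
```

With this shape, shutting off Firefox on the release channel leaves GMP, and other Firefox channels, untouched.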
I spoke with Varun - he's not going to have time to look at this anytime soon. Thanks for your brainstorming on this, Varun!
Assignee: varunj.1011 → nobody
Catlee and I talked a little bit about this today. He reminded me that one of the crucial requirements here is that we need to be able to give folks who otherwise don't have access to Balrog the power to make an emergency shutoff. These folks shouldn't be able to turn updates back on, though - that should be left to folks who are more experienced and knowledgeable about Balrog and updates. We also said that the UI for this should be very simple and fast to use. Because we're going to be doing these on a product+channel basis, perhaps it should be integrated into the Rules UI, and use the currently filtered product+channel? Bonus points for shiny big red buttons and playing sounds like http://soundbible.com/1511-Fire-Truck-Siren.html as part of shutoff.
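The asymmetric requirement (many people may shut updates off, few may turn them back on), combined with the notification requirement from the original description, could be sketched roughly as follows. The action names `shut_off` and `restore`, the user names, and the `EmergencyShutoff` class are assumptions for illustration and do not reflect Balrog's real permission model.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical action names, not Balrog's real permission strings.
SHUT_OFF = "shut_off"
RESTORE = "restore"


class EmergencyShutoff:
    def __init__(self, grants: Dict[str, Set[str]],
                 notify: Callable[[str], None]) -> None:
        self._grants = grants    # user -> set of allowed actions
        self._notify = notify    # called with a message on every change
        self._disabled: Set[Tuple[str, str]] = set()

    def _require(self, user: str, action: str) -> None:
        if action not in self._grants.get(user, set()):
            raise PermissionError(f"{user} may not {action}")

    def shut_off(self, user: str, product: str, channel: str) -> None:
        self._require(user, SHUT_OFF)
        self._disabled.add((product, channel))
        self._notify(f"{user} shut off updates for {product}/{channel}")

    def restore(self, user: str, product: str, channel: str) -> None:
        self._require(user, RESTORE)
        self._disabled.discard((product, channel))
        self._notify(f"{user} restored updates for {product}/{channel}")

    def is_shut_off(self, product: str, channel: str) -> bool:
        return (product, channel) in self._disabled


# Collect notifications in a list instead of sending e-mail.
sent: List[str] = []
shutoff = EmergencyShutoff(
    grants={"oncall": {SHUT_OFF}, "releng": {SHUT_OFF, RESTORE}},
    notify=sent.append,
)
```

Here `oncall` can stop updates in an emergency, but only `releng` can turn them back on, and every state change produces a notification.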
Assignee: nobody → allan.tavares
Depends on: 1434280
The backend of this was implemented in https://github.com/mozilla/balrog/pull/450. I'm leaving this bug open to continue to track the frontend work taking place in https://github.com/mozilla/balrog/pull/460
Depends on: 1438299
The UI hit production today. Are we all done with the Balrog side of things, Allan?
Flags: needinfo?(allan.tavares)
Cool! Yeah, all done in Balrog. It is necessary to set the "emergency_shutoff/delete" permission for balrogagent in prod, if it is not already set. We need to give people who have no access to the VPN the ability to shut off updates. In response to an e-mail that I sent to the RelEng team, Catlee and Jlund like the idea of having a site in mozilla/services with auth0 authentication. Also, shutoff information is available in the public API (https://aus5.mozilla.org/api/v1/emergency_shutoff). Nthomas suggests showing shutoff status in the Balrog UI index and the Delivery Dashboard (https://mozilla.github.io/delivery-dashboard). Thanks again for mentoring, bhearsum!
Flags: needinfo?(allan.tavares)
(In reply to Allan [:asilva] from comment #13)
> Cool! Yeah, all done in Balrog.
Okay! I'm going to close this bug out then - we can track alternative UIs and clients in other bugs.
> It is necessary to set the "emergency_shutoff/delete" permission for
> balrogagent in prod, if it is not already set.
Are you sure about this? As far as I can tell, the Agent only goes through the /scheduled_changes/emergency_shutoff endpoint, which eventually ends up checking for the "scheduled_change" "enact" permission (https://github.com/mozilla/balrog/blob/master/auslib/db.py#L1269). I didn't add any other permissions in dev or staging, and it seemed to work OK there.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(allan.tavares)
Resolution: --- → FIXED
Ah, you're right, I forgot about this: https://github.com/mozilla/balrog/blob/master/auslib/db.py#L1309. Validation uses the "Scheduled By" field.
Flags: needinfo?(allan.tavares)
Product: Release Engineering → Release Engineering Graveyard