730008 - [bedrock] Integrate chief/commander to allow devs to push bedrock live

Reporter

Description

•

12 years ago

This is somewhat related to bug 728386.

A critical component of adopting bedrock will be the ability for devs to push content live. It's already difficult enough that engagement can't push stuff live, and if we move this ability down to the IT level, there are 2 layers to go through just to push some content live.

AMO has a big red button to push stuff live (https://github.com/jbalogh/chief). I'd like to integrate this functionality in bedrock. We can figure out the right workflow, and it's fine if it's restricted in certain ways, but as long as we can push the button a few times a week, that will work.

This probably mitigates the need for a 4th site which auto-updated from master, as discussed in bug 728386. If we can push to stage too, then we don't need that.

When we know some heavy changes are coming through, we'd still involve IT and they can use stage to test the rollout.

How does this sound? I'm willing to write the chief script and get it all hooked up but obviously we'll need some help from IT.

James Long (:jlongster)

Reporter

Updated

•

12 years ago

Summary: Integrate chief/commander to allow devs to push bedrock live → [bedrock] Integrate chief/commander to allow devs to push bedrock live

Fred Wenzel [:wenzel]

Comment 1

•

12 years ago

In the process of this, please also reconcile this:
https://github.com/mozilla/bedrock/blob/master/bin/update_site.py

with whatever script we're currently using to do a production push for bedrock. This will keep us from a) losing the script and b) having to guess what the pull/compile/push procedure for bedrock is.

Corey Shields [:cshields]

Comment 2

•

12 years ago

chief was designed with AMO in mind, and recently using it elsewhere has shown that it is not easily portable.  This will take some time as it is not a high priority, but we'll get it done.

Jake Maul [:jakem]

Assignee

Updated

•

12 years ago

Depends on: 737925

James Long (:jlongster)

Reporter

Updated

•

12 years ago

Blocks: 736338

James Long (:jlongster)

Reporter

Comment 3

•

12 years ago

Finding a solution to our push problem is a high priority, though it doesn't have to be in the form of chief. How would you suggest we structure our pushes with IT? We update frequently and also need emergency pushes every now and then when something is mysteriously broken on the live site.

Something like 2 pushes a week (Tuesday and Thursday) would be great, but not sure if that would be a strain on IT. For mozilla.org, there's not even a database, so it's usually a simple push.

James Long (:jlongster)

Reporter

Updated

•

12 years ago

No longer blocks: 736338

Brandon Burton [:solarce]

Comment 4

•

12 years ago

I know there are some other bugs tracking our work on making commander/chief our standard solution. Is this bug still necessary, particularly with the move to bedrock on django?

matthew zeier [:mrz]

Comment 5

•

12 years ago

(In reply to James Long (:jlongster) from comment #3)
> Finding a solution to our push problem is a high priority, though it doesn't
> have to be in the form of chief. How would you suggest we structure our
> pushes with IT? We update frequently and also need emergency pushes every
> now and then when something is mysteriously broken on the live site.

I suggest filing blocker bugs to get emergency fixes pushed.  Planned updates can be coordinated with Ops (who does not sleep) so that you don't feel there are too many levels to go through.

The QA process should catch most/all of those and this shouldn't be something that needs to happen often.

Jeff Vier [:jvier]

Comment 6

•

12 years ago

I agree with mrz.

Privilege & responsibility are coupled -- allowing pushes to be triggered by people who aren't prepared to fix mistakes/errors is asking for outages.  Thus, the team with the pager is the team who should push releases.

With Ops being global, there's no reason why having them do the push should delay anything, and if something *does* go wrong, they're right there to fix it.

Corey Shields [:cshields]

Comment 7

•

12 years ago

(In reply to Jeff Vier [:jvier] from comment #6)
> Privilege & responsibility are coupled -- allowing pushes to be triggered by
> people who aren't prepared to fix mistakes/errors is asking for outages. 
> Thus, the team with the pager is the team who should push releases.

For www.mozilla.org we can't risk any downtime, so I have to agree with this.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

Fred Wenzel [:wenzel]

Comment 8

•

12 years ago

For the record, I don't expect the usual simple content changes on mozilla.org to cause significant problems. Much like AMO and SUMO, simple pushes, simple fixes, are key. Not everything will be 100% perfect all the time. Just last week, the SUMO team gave a brownbag on the issue that was very enlightening.

I respectfully disagree with every simple push needing direct ops action (and if it does, the dev/release process would be broken and should be fixed), but if you can't stomach extending the same privilege to mozilla.org that AMO and SUMO have, that's fine, but expect a *lot* of content pushes.

James Long (:jlongster)

Reporter

Comment 9

•

12 years ago

I agree with Fred, here are a few points:

* The QA process does not catch everything, and it's quite common for something to be broken (a newsletter form not signing up, etc). mozilla.org changes too quickly for QA to catch everything, so regressions are common. We are trying our best to fix this.

* Content pushes need to happen at least 2-3 times a week (just template updates)

* Pushing *would* be restricted to people who are prepared to fix errors. Probably only 1 (maybe 2) core developers would be able to push.

* If an error occurs, it's highly likely that Ops would not be able to fix it, since they don't know the codebase or the history of what was pushed. If you really want people around that can fix the problem, we need to schedule a time that a core developer can be there for the push too.

Like Fred said, I respect the decision you guys come up with, and if you decide to not allow this then we'll need to push several times a week.

Jake Maul [:jakem]

Assignee

Comment 10

•

12 years ago

I want to revisit this. Let's set up a meeting for sometime this month to discuss requirements and commitments. I believe this is something we should work towards, and we're not going to overcome the hurdles if we don't at least start down the track.

The main hurdle to just making it work at all is getting chief set up. However, I believe this will be comparatively minor next to the job you (webdev) will have in order to make this really feasible for most/all use cases. IIRC the SUMO team did a brownbag a while back about how they implemented this... we should re-watch that and get an update from them on anything they've had to change since then.

In particular, I believe they invested a considerable amount of time/effort into automated testing in stage and prod, and in metrics/statistics gathering in prod... so they know before they push that it won't break, they know after the push that it didn't break, and they know all the time that it's not broken. :)

Assignee: server-ops → nmaul

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Fred Wenzel [:wenzel]

Comment 11

•

12 years ago

Thanks Jake, I appreciate the effort. Happy to help! CCing cmore, whose team is in charge of care and feeding of mozilla.org.

Anthony Ricaud (:rik)

Updated

•

12 years ago

Blocks: bedrock-cd

Jake Maul [:jakem]

Assignee

Comment 12

•

12 years ago

This is up and running (finally), although until bug 775689 is resolved there are no pushbots for IRC yet. But the main app is done.

Note a couple things:

1) This doesn't push the PHP side of the app. That's still done via cron, automatically.

2) Stage is actually currently auto-deployed. We probably want to stop doing that.

3) You need a password (and VPN access) to be able to use these. Let me know who should get the password and I'll send it to them.


http://bedrockadm.private.phx1.mozilla.com/chief/bedrock.prod
https://www.mozilla.org/media/revision.txt
http://bedrockadm.private.phx1.mozilla.com/chief/bedrock.prod/logs/


http://bedrockadm.private.phx1.mozilla.com/chief/bedrock.stage
https://www.allizom.org/media/revision.txt
http://bedrockadm.private.phx1.mozilla.com/chief/bedrock.stage/logs/
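
One way to confirm a push landed is to compare the revision.txt URL above with the commit that was pushed. A minimal sketch, assuming revision.txt contains just the deployed commit hash as plain text (not confirmed in this bug); the helper names are hypothetical:

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def deployed_revision(url, fetch=fetch_text):
    """Return the revision reported by the site, without surrounding whitespace.

    Assumption: the file holds a single commit hash and nothing else.
    """
    return fetch(url).strip()


def is_deployed(expected, url, fetch=fetch_text):
    """True if the site reports the expected commit.

    Abbreviated hashes are accepted on either side, so this is a prefix match.
    """
    deployed = deployed_revision(url, fetch)
    if not expected or not deployed:
        return False
    return deployed.startswith(expected) or expected.startswith(deployed)
```

For example, is_deployed("<commit hash>", "https://www.mozilla.org/media/revision.txt") would check prod after a push. The fetch parameter is only there so the check can be exercised without network access.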

Anthony Ricaud (:rik)

Comment 13

•

12 years ago

Thanks. Can you send me the password please? (To anthony@moz)

Jake Maul [:jakem]

Assignee

Comment 14

•

12 years ago

Sent. Closing this out... let me know if anyone else should have it as well. I know you're the primary dev these days, but not sure if you have a backup or anything like that. :)

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Anthony Ricaud (:rik)

Comment 15

•

12 years ago

Reopening for passwords.

Can you send passwords to mkelly and pmac, hereby cced?

Let me know how we should handle other password requests in the future.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Anthony Ricaud (:rik)

Comment 16

•

12 years ago

Passwords sent.

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Anthony Ricaud (:rik)

Comment 17

•

12 years ago

So, looks like this does not work. When I look at the logs, the script only does git fetch but no git merge. So we have all the code, just not in the working directory.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Jake Maul [:jakem]

Assignee

Comment 18

•

12 years ago

This is fixed.

The issue is that Chief currently isn't set up to do a "merge"... it does fetch and then checkout. This works fine if you use an explicit commit tag, or origin/master... it doesn't work if you use just "master", because it's already on the master branch. origin/master causes it to fast-forward, but just 'master' is a no-op.
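
The fetch-then-checkout behaviour described above can be sketched as follows. This is an illustration, not Chief's actual code: the function names and the idea of rewriting a bare branch name to its remote-tracking ref are assumptions, and only the plain `git fetch` and `git checkout` commands are used.

```python
import subprocess


def resolve_deploy_ref(ref, remote="origin", local_branches=("master",)):
    """Map a bare local branch name to its remote-tracking ref.

    After `git fetch`, the local branch still points at the old commit, so
    checking out "master" while already on master changes nothing.
    "origin/master" points at the freshly fetched commit instead.
    """
    if ref in local_branches:
        return f"{remote}/{ref}"
    return ref  # tags, commit hashes and remote refs are used as given


def deploy_commands(ref, remote="origin"):
    """Return the git commands for a fetch-then-checkout deploy."""
    target = resolve_deploy_ref(ref, remote)
    return [
        ["git", "fetch", remote],
        ["git", "checkout", target],
    ]


def run_deploy(ref, repo_dir, remote="origin"):
    """Run the commands in repo_dir, stopping at the first failure."""
    for cmd in deploy_commands(ref, remote):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

With this, pushing "master" and pushing "origin/master" both end up checking out the fetched commit, while an explicit tag or commit hash passes through unchanged.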

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

11 years ago

Component: Server Operations: Web Operations → WebOps: Other

Product: mozilla.org → Infrastructure & Operations

BMO Automation

Updated

•

5 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard