Closed Bug 827473 Opened 11 years ago Closed 11 years ago

Chief for tbpl.mozilla.org

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nmaul, Assigned: bburton)

References

Details

(Whiteboard: [triaged 20130111])

Attachments

(3 files)

According to bug counts, this is one of the most-frequently-pushed sites. Let's get us (WebOps) out of the way on this.
Google didn't return anything for "Chief", is this about "Chef" http://www.opscode.com/chef/ ?

Either way, like the sounds of being able to self-serve deployments :-D
s/sounds/sound/
(In reply to Ed Morley [:edmorley UTC+0] from comment #1)
> Google didn't return anything for "Chief", is this about "Chef"
> http://www.opscode.com/chef/ ?
> 
> Either way, like the sounds of being able to self-serve deployments :-D

Chief is an open source tool that Mozilla IT has written, https://github.com/mozilla/chief 

Some details on how it works are in https://mana.mozilla.org/wiki/display/websites/Chief
What Brandon said... the basic idea is that you get a big red button to push to deploy your code, rather than having to open a bug with IT and wait for us to deploy it for you.
Ah, I looked on wiki.m.o and hg.m.o - which would explain why I didn't find it!
Thank you - sounds great :-)
Priority: -- → P3
Whiteboard: [triaged 20130111]
OS: Mac OS X → All
Hardware: x86 → All
(In reply to Ed Morley [:edmorley UTC+0] from comment #1)
> Google didn't return anything for "Chief", is this about "Chef"
> http://www.opscode.com/chef/ ?
> 
> Either way, like the sounds of being able to self-serve deployments :-D

I wanted to confirm you have access to and have used MPT VPN

Thanks
Yes, I have access :-)
Ok, making progress

You can go to http://genericadm.private.phx1.mozilla.com/chief/tbpl.stage and do a push, but there are no changes to push yet

Next steps are

1. commit a change we can push to stage
2. what irc channel would you like a bot in (optional)
3. PM me for the password
4. adding the update script and settings-dist file to Hg, if I attach them to the bug can you commit them?
(In reply to Brandon Burton [:solarce] from comment #8)
> Ok, making progress
> 
> You can go to http://genericadm.private.phx1.mozilla.com/chief/tbpl.stage
> and do a push, but there are no changes to push yet
> 
> Next steps are
> 
> 1. commit a change we can push to stage
> 2. what irc channel would you like a bot in (optional)
> 3. PM me for the password
> 4. adding the update script and settings-dist file to Hg, if I attach them
> to the bug can you commit them?

Hey Ed, 

Can you let me know on these items so I can get staging finished then get prod setup for you?

Thanks
Flags: needinfo?(emorley)
(In reply to Brandon Burton [:solarce] from comment #8)
> 1. commit a change we can push to stage

Can you define what you mean by stage? TBPL only has tbpl-dev.allizom.org and tbpl.mozilla.org at present - and tbh due to the way each instance has to populate data, I think we'd like to avoid the infra load/DB storage issues (each DB is up to 150GB right before a purge) of a third instance if possible.

> 2. what irc channel would you like a bot in (optional)

What will the bot do? Presumably just notify when a push is taking place? 

If so, my main use cases are:
a) Checking the push has completed when I'm the one doing it (in which case I presume I'll be able to see that via the webapp?)
b) Be notified when someone else does a push and hasn't told me they are doing it (in which case a bot doesn't help when I'm not in the channel -> would email notifications be something that we could use instead?)

> 3. PM me for the password

Sure will do :-)

> 4. adding the update script and settings-dist file to Hg, if I attach them
> to the bug can you commit them?

Yup, I'm happy to commit them.
Flags: needinfo?(emorley)
Attached image Screenshot
To paraphrase discussion on IRC...
* TBPL has a stage instance set up on the admin node for the cluster it's on (but it isn't running afaict, which is fine, since we don't need/want it)
* tbpl-dev will be updated with the 15 min cron (as before), production will now use Chief.
* Email notifications aren't possible (but with the history page, aren't needed now tbh), so I've opted to have the notification bot in #ateam
* To use Chief you need to be connected to MPT-VPN.
* Chief deploy for TBPL uses a static password for all users, which I now have (we'll need to work out who needs access; ideally not many people, so we can avoid situations we've had in the past where people from other teams have filed production push bugs and pushed more than just their own change to production).

In order for solarce to test/work out a few issues, I had to commit a bunch of whitespace/no-op changes to the repo - but it's coming on well now (see screenshot).

Note to self: We'll need to make sure various wiki pages (wiki.m.o + intranet + mana) are updated to reflect that the correct process is to request a push from me/whomever, rather than asking for an IT prod push.

Thank you very much for doing this Brandon! :-D
So

19:44:14    tbpldev | oh nice, solarce is pushing tbpl-dev fedc97a70b7f
19:44:15    tbpldev | watching http://genericadm.private.phx1.mozilla.com/chief/tbpl.dev/logs/fedc97a70b7f.1360889047
19:44:15    tbpldev | http://hg.mozilla.org/webtools/tbpl/compare/<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
19:44:15    tbpldev | <html><head>
19:44:15    tbpldev | <title>401 Authorization Required</title>
19:44:15    tbpldev | </head><body>
19:44:15    tbpldev | <h1>Authorization Required</h1>
19:44:15    tbpldev | <p>This server could not verify that you
19:44:15    tbpldev | are authorized to access the document
19:44:16    tbpldev | requested.  Either you supplied the wrong
19:44:17    tbpldev | credentials (e.g., bad password), or your
19:44:18    tbpldev | browser doesn't understand how to supply
19:44:19    tbpldev | the credentials required.</p>
19:44:20    tbpldev | </body></html>
19:44:21    tbpldev | ...fedc97a70b7f
19:44:22    tbpldev | something terrible happened. check the logs (fedc97a70b7f solarce)

This is because, to do a comparison, it fetches the URL I set for

        /* URL that shows the current revision of the site. */
        revision: 'https://tbpl-dev.allizom.org/cache/revision_info.txt',

and the response it got back was the 401 page rather than a revision.

Does -dev need basic auth?

Could I disable it for that URL?
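
For what it's worth, here is a minimal sketch of how the bot side could guard against this, assuming the revision URL is expected to return either a bare hash or "changeset: N:hash" text as hg log prints it. The helper names and patterns are hypothetical and not taken from the Chief or pushbot code; the compare URL shape is copied from the bot output above.

```python
import re

# Hypothetical helpers; names and patterns are illustrative, not from Chief.
BARE_HASH = re.compile(r"^[0-9a-f]{12,40}$")
CHANGESET_LINE = re.compile(r"^changeset:\s+\d+:([0-9a-f]{12,40})\s*$", re.MULTILINE)


def extract_revision(body):
    """Return the revision hash found in a revision-file body, or None."""
    text = body.strip()
    if BARE_HASH.match(text):
        return text
    found = CHANGESET_LINE.search(text)
    return found.group(1) if found else None


def compare_url(repo_url, revision_body, new_rev):
    """Build '<repo>/compare/<old>...<new>', or None if no revision was found."""
    old_rev = extract_revision(revision_body)
    if old_rev is None:
        return None  # e.g. an HTML error page came back instead of a revision
    return "%s/compare/%s...%s" % (repo_url.rstrip("/"), old_rev, new_rev)
```

With something like this, a 401 body would yield no compare link at all instead of being pasted into the channel line by line.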
Per :philor in #ateam I disabled the auth

Can you coordinate with me on doing one more push to -dev in the morning to confirm the bot is happy, then we'll finish up prod?
(In reply to Brandon Burton [:solarce] from comment #14)
> Per :philor in #ateam I disabled the auth
> 
> Can you coordinate with me on doing one more push to -dev in the morning to
> confirm the bot is happy, then we'll finish up prod?

Sure, ping me when you want me to push (I can't push now or else the 15 min cron will steal it). I'm away from wifi for the weekend from ~1800 UTC+0 today, until Sunday evening.

Alternatively, if you have mercurial level 3 commit access, the repo doesn't need any additional permissions.

Thanks again :-)
Brandon - I just pushed to prod but no notifications in #ateam?

Also, do you have the configs for me to commit to the TBPL repo? :-)
(In reply to Ed Morley [:edmorley UTC+0] from comment #16)
> Brandon - I just pushed to prod but no notifications in #ateam?
> 
> Also, do you have the configs for me to commit to the TBPL repo? :-)

Yes, they were still in #chief. I've pushed the change to make them be in #ateam and see them now.

So for the changes,

in the root of the repo, please make the directories 'scripts/updates' and put the attached files in it
Attached file update.py for tbpl
Note, the actual command-settings.py files are on the filesystem, but are not in the source VCS repo

So -dev and -prod have the proper configs in place

Will I need to do some manual "merging" to replace the files you're adding to Hg and get rid of the ones I manually put on the filesystem?
Attachment #715601 - Attachment mime type: text/x-python-script → text/plain
The tbpl IRC bot output for a push I just performed was:

{
15:08	tbplprod	well then, edmorley is pushing tbpl-prod 6b6b9a561260
15:08	tbplprod	watching http://genericadm.private.phx1.moz...rod/logs/6b6b9a561260.1361804971
15:08	tbplprod	http://hg.mozilla.org/webtools/tbpl/compare/changeset: 1027:b75631c40f54
15:08	tbplprod	tag: tip
15:08	tbplprod	user: Ed Morley <snip%mozilla.com>
15:08	tbplprod	date: Mon Feb 18 16:36:21 2013 +0000
15:08	tbplprod	summary: Backout 02d97a972676 for regressing display of annotated summary when there are multiple log lines
15:08	tbplprod	...6b6b9a561260
15:08	tbplprod	the push is now going to the webheads!! (6b6b9a561260 edmorley)
15:08	tbplprod	edmorley pushed tbpl-prod 6b6b9a561260
}

A few of the people in the channel would like this to be less verbose.

I don't suppose the output could be something like:
tbplprod	well then, edmorley is pushing tbpl-prod 6b6b9a561260. Log: http://genericadm.private.phx1.mozilla.com/chief/tbpl.prod/logs/6b6b9a561260.1361804971
tbplprod	push completed! Changes: https://hg.mozilla.org/webtools/tbpl/pushloghtml?fromchange=FOO&tochange=BAR

Where fromchange in the example above would be b75631c40f54, and tochange would be the new revision specified in the Chief UI.

If that's not possible, could we at least omit the middle block, that lists the old/current repo revision? (Since it's not clear that it is in fact the old revision and not the new).

Many thanks :-)
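
To make the request concrete, a rough sketch of the two-line format, assuming the bot already knows the old and new revisions. The function and its parameters are hypothetical; nothing here is taken from the pushbot code.

```python
def terse_push_messages(user, site, new_rev, log_url, old_rev, repo_url):
    """Return the (start, done) lines for the shorter announcement format."""
    start = "well then, %s is pushing %s %s. Log: %s" % (user, site, new_rev, log_url)
    done = "push completed! Changes: %s/pushloghtml?fromchange=%s&tochange=%s" % (
        repo_url.rstrip("/"),
        old_rev,
        new_rev,
    )
    return start, done
```

For the push above, old_rev would be b75631c40f54 and new_rev would be 6b6b9a561260.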
(Sorry for not yet replying to comment 20 and earlier, we had a ~work-week last week. I'll hold off checking them in until I know if they'll need changing post comment 21)
(In reply to Ed Morley [:edmorley UTC+0] from comment #21)
> The tbpl IRC bot output for a push I just performed was:
> 
> {
> 15:08	tbplprod	well then, edmorley is pushing tbpl-prod 6b6b9a561260
> 15:08	tbplprod	watching
> http://genericadm.private.phx1.moz...rod/logs/6b6b9a561260.1361804971
> 15:08	tbplprod	http://hg.mozilla.org/webtools/tbpl/compare/changeset:
> 1027:b75631c40f54
> 15:08	tbplprod	tag: tip
> 15:08	tbplprod	user: Ed Morley <snip%mozilla.com>
> 15:08	tbplprod	date: Mon Feb 18 16:36:21 2013 +0000
> 15:08	tbplprod	summary: Backout 02d97a972676 for regressing display of
> annotated summary when there are multiple log lines
> 15:08	tbplprod	...6b6b9a561260
> 15:08	tbplprod	the push is now going to the webheads!! (6b6b9a561260
> edmorley)
> 15:08	tbplprod	edmorley pushed tbpl-prod 6b6b9a561260
> }
> 
> A few of the people in the channel would like this to be less verbose.
> 
> I don't suppose the output could be something like:
> tbplprod	well then, edmorley is pushing tbpl-prod 6b6b9a561260. Log:
> http://genericadm.private.phx1.mozilla.com/chief/tbpl.prod/logs/6b6b9a561260.
> 1361804971
> tbplprod	push completed! Changes:
> https://hg.mozilla.org/webtools/tbpl/pushloghtml?fromchange=FOO&tochange=BAR
> 
> Where fromchange in the example above would be b75631c40f54, and tochange
> would be the new revision specified in the Chief UI.
> 
> If that's not possible, could we at least omit the middle block, that lists
> the old/current repo revision? (Since it's not clear that it is in fact the
> old revision and not the new).
> 
> Many thanks :-)

The majority of that output is because the script runs the following

ctx.local("hg incoming -r %s --template '{node|short} | {desc|firstline}\n'" % tag)

If you're ok with removing that or making it less verbose you can reduce the output. A script that does not generate any output from the commands it runs will look like this

17:23:44  mozigniteprod | look, solarce is pushing mozignite-prod master
17:23:45  mozigniteprod | watching http://genericadm.private.phx1.mozilla.com/chief/mozignite.prod/logs/master.1361917405
17:23:53  mozigniteprod | the push is now going to the webheads!! (master solarce)
17:23:54  mozigniteprod | Finished: update_code (8.142s), pre_update (8.142s), update (0.003s)
17:23:59  mozigniteprod | Finished: update_code (3.552s), pre_update (3.552s), update (0.003s)
17:24:14  mozigniteprod | Finished: rsync_project (0.719s), commit_www (13.172s), checkin_changes (13.924s)
17:24:17  mozigniteprod | solarce pushed mozignite-prod master
17:24:19  mozigniteprod | Finished: deploy_app (7.317s), deploy (24.796s)

Anything beyond that is dependent on the update.py contents

Since that script comes from your source repo, you can tweak it as needed and shouldn't need IT help to deploy changes
Assignee: server-ops-webops → bburton
(In reply to Brandon Burton [:solarce] from comment #23)
> The majority of that output is because the script runs the following
> 
> ctx.local("hg incoming -r %s --template '{node|short} | {desc|firstline}\n'"
> % tag)
> 
> If you're ok with removing that or making it less verbose you can reduce the
> output. 

I'd quite like to leave that in, since it's useful in the chief logs.

> A script that does not generate any output from the commands it runs
> will look like this

That example is still a bit too verbose I think - please can we just disable the ircbot, we can manage without it.

Thank you :-)
Checked in the configs:
https://hg.mozilla.org/webtools/tbpl/rev/7fda1eb61e0c

We'll need to check what happens re the conflicts (comment 20) when we try to deploy next. (Testing locally implies that as long as the update.py files are identical, mercurial should cope ok.)
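
A quick way to check the "identical" precondition before the next deploy, as a sketch only: it compares the manually placed file with the copy from the repo byte for byte. The function name is made up for illustration, and the caller supplies the two paths.

```python
import filecmp


def same_contents(on_disk_path, repo_copy_path):
    """True if the two files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(on_disk_path, repo_copy_path, shallow=False)
```

If this returns False for update.py, doing the 'rm' before the push is probably the safer route.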
(In reply to Ed Morley [:edmorley UTC+0] from comment #24)
> (In reply to Brandon Burton [:solarce] from comment #23)
> > The majority of that output is because the script runs the following
> > 
> > ctx.local("hg incoming -r %s --template '{node|short} | {desc|firstline}\n'"
> > % tag)
> > 
> > If you're ok with removing that or making it less verbose you can reduce the
> > output. 
> 
> I'd quite like to leave that in, since it's useful in the chief logs.


> 
> > A script that does not generate any output from the commands it runs
> > will look like this
> 
> That example is still a bit too verbose I think - please can we just disable
> the ircbot, we can manage without it.
> 
> Thank you :-)

Yeah, further modification of the output would require hacking on the Chief and Node Pushbots code, which I'm not qualified to do, though both are on github if you wanted to file issues.

I've pushed a config change to disable the bot for your two chief setups and confirmed they are no longer in #ateam
(In reply to Ed Morley [:edmorley UTC+0] from comment #25)
> Checked in the configs:
> https://hg.mozilla.org/webtools/tbpl/rev/7fda1eb61e0c
> 
> We'll need to check what happens re the conflicts (comment 20) when we try
> to deploy next. (Testing locally implies as long as the update.py files are
> identical, mercurial should cope ok.)

Yeah, it should be fine, but if you want to coordinate me doing an 'rm' with your push, let me know

I'll RF this otherwise
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Thank you for all your work on this - very pleased with our self-deploy ability now! :-)
Depends on: 847865
(Configs in production)
http://genericadm.private.phx1.mozilla.com/chief/tbpl.prod/history is only showing the last chief deploy now - are the logs cleared out regularly or was this unintentional?
That's a known bug / design flaw in Chief. The history panel is built off of in-memory data. Restarts and the like cause it to be cleared out.

You can see the actual log files here:

http://genericadm.private.phx1.mozilla.com/chief/tbpl.prod/logs/
Ok that sounds fine to me :-)
I didn't realize this was the case. I've filed https://github.com/mozilla/chief/issues/8 to think about this and see if it'd be easy to make it read from disk
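
As a sketch of what "read from disk" could look like, assuming the log files keep the "<ref>.<unix timestamp>" naming seen in the log URLs in this bug (e.g. 6b6b9a561260.1361804971). The function is hypothetical and not part of Chief.

```python
import os
from datetime import datetime, timezone


def history_from_logs(log_dir):
    """Rebuild a deploy history from log file names like '<ref>.<unix timestamp>'."""
    entries = []
    for name in os.listdir(log_dir):
        ref, sep, stamp = name.rpartition(".")
        if not sep or not ref or not stamp.isdigit():
            continue  # skip anything that does not match the naming pattern
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        entries.append({"ref": ref, "time": when, "log": name})
    # Newest deploy first
    entries.sort(key=lambda entry: entry["time"], reverse=True)
    return entries
```

This only recovers the ref and the time. Who pushed would have to come from the log contents or from somewhere else.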
Depends on: 881696
Depends on: 886822
Depends on: 917420
Depends on: 917425
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard