Closed Bug 1032978 Opened 10 years ago Closed 9 years ago

Add a standalone process that listens to pulse for release related buildbot messages

Categories

(Release Engineering :: Applications: Shipit, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zeller, Assigned: zeller, Mentored)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2471] [shipit])

Attachments

(4 files, 18 obsolete files)

3.93 KB, patch
bhearsum
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
1.19 KB, patch
bhearsum
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
11.60 KB, patch
bhearsum
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
1015 bytes, patch
bhearsum
: review+
zeller
: checked-in+
Details | Diff | Splinter Review
We need to launch a new standalone process that listens to pulse.m.o for buildbot messages related to releases. I see this process listening for these messages and then using a REST API endpoint on shipit to have them added to new table(s) in the shipit database. This is necessary before we can ship changes to bug 826753
Assignee: nobody → johnlzeller
Blocks: 826753
Depends on: 1032975
Whiteboard: [shipit]
Blocks: 1032985
Attached file shipit-agent.py (obsolete) —
This represents the first version of shipit-agent.py. It needs more love, and won't be ready to ship until we have r+'d the tables to add to the shipit database via bug 1032975. Nevertheless, I could use some feedback here for general things. Some that come to mind already:

 * Using another form of auth other than .netrc (since this isn't a local tool)
 * Filtering for strictly release messages from pulse.m.o
Attachment #8448955 - Flags: feedback?(rail)
Attachment #8448955 - Flags: feedback?(bhearsum)
Attached file releasedata.sql (obsolete) —
I have attached a dump of the schema and 640 release messages from pulse.m.o that *should* represent most, if not all, of the Firefox-31-0b5-build1 release. Please use this dump with "cat releasedata.sql | sqlite3 releasedata.sqlite" to view the messages as rows in a database. It'll help give the questions I have below more context.

Some questions regarding the filtering of messages from pulse.m.o:

 * Should I be filtering for all messages where routing_key = "%release%"? OR, should I be filtering where routing_key = "build.release.%"? OR, some other field like key?
   - It seems like msgs where routing_key contains *release* are related to a release, obviously. But, there are a bunch of additional prefixes that I am not sure how to categorize (unittest.release.*, talos.release.*, etc)
 * I am currently attempting to use the fields product + version + build_number to construct the release name (ie Firefox-31.0b5-build1), but many messages have None in the version and build_number fields. Additionally, sometimes it seems like unrelated products show up like "mobile" or "xulrunner", when I am pretty certain only "Firefox" was running. I could be mistaken... Any advice on what to do with the messages that have None or don't fit the product name?
 * Is there a solid set of messages I should look at to determine each of the following status': Tagging completed, All builds/repacks completed, Updates on betatest, Ready for releasetest, Ready for release, Postrelease? (status steps taken from bug 826753 comment 0

Some of these questions may answer one another, but this was an attempt to put all the unanswered questions out there.
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
(In reply to John Zeller [:zeller] from comment #2)

Removing needinfo? from comment 2. This should have been posted to bug 1032985
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
Comment on attachment 8448955 [details]
shipit-agent.py

This is supposed to run in foreground, correct? We'll need to keep it running with supervisor in this case (in production).
>#!/usr/bin/env python
>
># Relies on .netrc for username and password

it should read the credentials from a file, deployed by puppet.

>import requests
>from mozillapulse import consumers
>import json
>
>common_keys = ['locale', 'testsurl', 'previous_buildid', 'job_number', 'build_number', 
>                'builddate', 'platform', 'version', 'revision', 'status', 'buildtype', 
>                'product', 'slave', 'buildername', 'buildid', 'timestamp', 'key', 
>                'locales', 'logurl', 'repack', 'tree', 'buildurl', 'release', 'talos', 
>                'insertion_time', 'test', 'os', 'tags']
>
>def got_message(data, message):
>    # TODO: Should this be looking for build.release in the data['payload']['key'] field instead?
>    if "build.release-" in data['_meta']['routing_key']: # Only catch and store release msgs
>        misc = {}
>        for key in data['payload'].keys():
>            if key not in common_keys: misc[key] = data['payload'][key]
>
>        payload = {}
>        payload['sent'] = unicode(data['_meta'].get('sent'))

I bet you can iterate common_keys and assign values instead.


>        payload['routing_key'] = unicode(data['_meta'].get('routing_key'))
>        payload['exchange'] = unicode(data['_meta'].get('exchange'))
>        payload['serializer'] = unicode(data['_meta'].get('serializer'))
>        payload['locale'] = unicode(data['payload'].get('locale'))
>        payload['testsurl'] = unicode(data['payload'].get('testsurl'))
>        payload['previous_buildid'] = unicode(data['payload'].get('previous_buildid'))
>        payload['job_number'] = unicode(data['payload'].get('job_number'))
>        payload['build_number'] = unicode(data['payload'].get('build_number'))
>        payload['builddate'] = unicode(data['payload'].get('builddate'))
>        payload['platform'] = unicode(data['payload'].get('platform'))
>        payload['version'] = unicode(data['payload'].get('version'))
>        payload['revision'] = unicode(data['payload'].get('revision'))
>        payload['status'] = unicode(data['payload'].get('status'))
>        payload['buildtype'] = unicode(data['payload'].get('buildtype'))
>        payload['product'] = unicode(data['payload'].get('product'))
>        payload['slave'] = unicode(data['payload'].get('slave'))
>        payload['buildername'] = unicode(data['payload'].get('buildername'))
>        payload['buildid'] = unicode(data['payload'].get('buildid'))
>        payload['timestamp'] = unicode(data['payload'].get('timestamp'))
>        payload['key'] = unicode(data['payload'].get('key'))
>        payload['locales'] = unicode(data['payload'].get('locales'))
>        payload['logurl'] = unicode(data['payload'].get('logurl'))
>        payload['repack'] = unicode(data['payload'].get('repack'))
>        payload['tree'] = unicode(data['payload'].get('tree'))
>        payload['buildurl'] = unicode(data['payload'].get('buildurl'))
>        payload['release'] = unicode(data['payload'].get('release'))
>        payload['talos'] = unicode(data['payload'].get('talos'))
>        payload['insertion_time'] = unicode(data['payload'].get('insertion_time'))
>        payload['test'] = unicode(data['payload'].get('test'))
>        payload['os'] = unicode(data['payload'].get('os'))
>        payload['tags'] = unicode(data['payload'].get('tags'))
>        payload['misc_payload'] = json.dumps(misc)
>
>        url = r'http://127.0.0.1:5000/status/' + getReleaseName(payload['product'], payload['version'], 
>                                                                payload['build_number'])

I assume that the http:// part will be configurable.

>        r = requests.post(url, data=payload)
>        print r.status_code
>    else:
>        pass
>    
>    message.ack()
>
>def getReleaseName(product, version, build_number):
>    return '%s-%s-build%s' % (product.title(), version, str(build_number))

no need for str(), %s does that automagically.

>    
>def main():
>    try:
>        import uuid
>        unique_label = 'quickstart-%s' % uuid.uuid4()
>    except:

Meh, uuid is a must be. kill datetime.

>        from datetime import datetime
>        unique_label = 'quickstart-%s' % datetime.now()
>
>    pulse = consumers.NormalizedBuildConsumer(applabel=unique_label)
>    pulse.configure(topic='#', callback=got_message)

Can you listen to "build.release-*" messages only here?

>    pulse.listen()
>
>if __name__ == "__main__":
>    main()
Attachment #8448955 - Flags: feedback?(rail) → feedback+
Attachment #8448955 - Flags: feedback?(bhearsum)
Attached file shipit-agent.py (obsolete) —
I switched from the NormalizedBuildConsumer to the BuildConsumer, which holds more data about release builds.

I've broken up the scope into 2 phases. Phase 1 will offer coarse grain detail about release status, and phase 2 will delve deeper into which platforms are being built, which are done, etc, etc. Hence, in phase 1 I am not including '.started' pulse messages. In phase 2 we might start saving them to offer this sort of detail.

Questions:
* If 'test' is in a builderName should we save it? ie when a pulse message like 'WINNT 6.2 release-mozilla-release opt test mochitest-5' comes by, should we just ignore these? I am assuming yes because the original bug said nothing about tracking test status.
* Should the applabel being set with pulse (via unique_label) be more unique than 'quickstart-*'?
* Should the pulse connection be set to durable?
Attachment #8448955 - Attachment is obsolete: true
Attachment #8458329 - Flags: feedback?(bhearsum)
Comment on attachment 8458329 [details]
shipit-agent.py

It'd be helpful to see this script in action, too. Meanwhile, some comments below:

> * If 'test' is in a builderName should we save it? ie when a pulse message
> like 'WINNT 6.2 release-mozilla-release opt test mochitest-5' comes by,
> should we just ignore these? I am assuming yes because the original bug said
> nothing about tracking test status.

You're right that we don't care about test builders. (It may be a nice enhancement later, but not something in scope for now.)

However, you need to be careful about how you remove them. We have a builder called "release-mozilla-esr31-ready_for_releasetest_testing" that we do want to track. Maybe looking for " test " would work.

> * Should the applabel being set with pulse (via unique_label) be more unique
> than 'quickstart-*'?
> * Should the pulse connection be set to durable?

Can you spell out the implications for these a bit more? I think you're more familiar with Pulse than me =). (Which probably means someone else should look at this patch at some point, too.)


>#!/usr/bin/env python
>"""shipit-agent.py [options]
>
>Listens for buildbot release messages from pulse and then sends them
>to the shipit REST API."""
># TODO: Run with supervisor if supposed to be run in the foreground
># TODO: Read credentials from a file deployed by puppet, rather than .netrc
>
>import requests
>from mozillapulse import consumers
>import json
>import uuid
>import logging as log
>from optparse import OptionParser
>import urllib2
>
>TARGET_KEYS = (u'sent', u'routing_key', u'message_id', u'results')
>PROPERTIES_KEYS = (u'product', u'version', u'build_number', u'branch',
>                   u'platform')
>
>
>def got_message(data, message):
>        # Grab all PROPERTIES_KEYS from data['build']['properties']
>        for key in PROPERTIES_KEYS:
>            for prop in data['build'].get('properties'):
>                if prop[0] == key:
>                    try:
>                        payload[key] = prop[1]
>                    except IndexError as e:
>                        payload[key] = 'None'
>                        log.error('IndexError: key "{}" was not found in data["build"]["properties"] for pulse message {} - {}'.
>                                  format(key, payload['builderName'], e))

platform won't be present on the updates builder - you may need to adjust this logic a bit.


>        # Determine Status Type
>        status_match = {'_tag': 'tag', '_repack_': 'build/repack',
>                        'update': 'update',

You might need to use regex to get a better match on this. This is going to match all of the update verify builders, which we don't want.

>                        '-ready_for_releasetest': 'releasetest',
>                        '-ready_for_release': 'release',
>                        '-postrelease': 'postrelease'}

It'd be good to define this somewhere more visible - otherwise they're just magic strings.

>        # Set seqNum and seqTotal
>        # TODO: Have pulse add these to pulse messages
>        if payload['status_type'] == 'build/repack' or \
>           payload['status_type'] == 'update':
>            if len(payload['builderName'].split('/')) == 2:

If you stop matching the update builders (see above), it's only the repacks that need this block. However, it may be better to just look for "/" in the builder name rather than hardcode this based on type.

>        # TODO: Utilize the null entry for sqlite
>        for key in payload:
>            if payload.get(key) is None:
>                payload[key] = 'None'

Why is sqlite involved here?

>
>        # Assemble and send the POST request
>        url = options.protocol + r'://' + options.hostname + r'/status/' + \

I'd recommend using the "furl" library to this sort of thing - it will make sure you don't end up with bad URLs. SlaveAPI has some good examples of how to use it: http://git.mozilla.org/?p=build/slaveapi.git;a=blob;f=slaveapi/clients/inventory.py;h=40d894aea98d13332d4015f9ff99b40e10661739;hb=HEAD#l20

>            getReleaseName(payload['product'], payload[
>                           'version'], payload['build_number'])
>        try:
>            r = requests.post(url, data=payload)

These requests probably aren't going to work due to CSRF protection. You should add a new subclass to https://github.com/mozilla/build-tools/blob/master/lib/python/kickoff/api.py (which will automatically deal with CSRF for you) to handle this and any other endpoints you're using. You should be able to model after Release or ReleaseL10n, but let me know if you have any questions.

>    message.ack()

This will never get run if the above code raises an uncaught exception. Should this be in a "finally" block?

>
>def getReleaseName(product, version, build_number):
>    return '{}-{}-build{}'.format(product.title(), version, build_number)
>
>
>def main():
>    parser = OptionParser()
>    parser.set_defaults(
>        protocol='http',
>        hostname='127.0.0.1:5000',
>        logfilename='shipit-agent.log',
>        verbosity=log.INFO
>    )

>    parser.add_option("-p", "--protocol", dest="protocol",
>                      help="protocol for sending POST request to shipit")
>    parser.add_option("-d", "--dest-hostname", dest="hostname",
>                      help="host where shipit resides")

You're better off just combining this into a single option. You should also move all of this config to a .ini to make deployment much easier.

>    parser.add_option("-l", "--log-filename", dest="logfilename")
>
>    # TODO: Better way than global?
>    # TODO: Make the protocol and host configurable via puppet
>    global options

No, don't do this :). Pass functions the things they need instead.


>    # TODO: Make unique_label even more unique?
>    unique_label = 'quickstart-{}'.format(uuid.uuid4())

Yeah...I'l need 

>    pulse = consumers.BuildConsumer(applabel=unique_label)

Hm. I don't think this should necessarily be unique for every instance of this script. How about using something that containts the ship it API URL? This would mean that if we ever ran multiple instances of this script in production, they'd share a queue (which we'd want), but you could still test against ship-it-dev or a local instance without interfering with the production queue.

What I'm worried about here is a configuration where multiple instances of this script are all sending requests to the same ship it instance.

>    # TODO: Should this be configured to be a durable connection?
>    pulse.configure(topic='build.#.finished', callback=got_message)

Are you able to filter on partial names? Eg, something like build.release*.finished? That would prevent a lot of messages that you don't care about from ever reaching the script.
Attachment #8458329 - Flags: feedback?(bhearsum) → feedback+
(In reply to Ben Hearsum [:bhearsum] from comment #6)
> Comment on attachment 8458329 [details]
> shipit-agent.py
> 
> It'd be helpful to see this script in action, too.

Absolutely. How specifically did you want to see that? There are couple more things to tweak to make sure it's not failing on certain inputs, but once that is done, we can schedule something with another release.

> Meanwhile, some comments
> below:
> 
> > * If 'test' is in a builderName should we save it? ie when a pulse message
> > like 'WINNT 6.2 release-mozilla-release opt test mochitest-5' comes by,
> > should we just ignore these? I am assuming yes because the original bug said
> > nothing about tracking test status.
> 
> You're right that we don't care about test builders. (It may be a nice
> enhancement later, but not something in scope for now.)
> 
> However, you need to be careful about how you remove them. We have a builder
> called "release-mozilla-esr31-ready_for_releasetest_testing" that we do want
> to track. Maybe looking for " test " would work.

Shoot, good catch. 

> 
> > * Should the applabel being set with pulse (via unique_label) be more unique
> > than 'quickstart-*'?
> > * Should the pulse connection be set to durable?
> 
> Can you spell out the implications for these a bit more? I think you're more
> familiar with Pulse than me =). (Which probably means someone else should
> look at this patch at some point, too.)

haha no problem. For applabel, the * is a random number generated by uuid.uuid4(). I am unsure of the implications, because I am not sure if the applabel is actually used to identify consumers. If it is actually used (or will be used in the future) I could see changing quickstart to something like shipitagent. 'quickstart-*' is what shows up in the sample script: http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsequickstart/file/default/python/consumer.py#l25

As for durability. There is a pulse option that allows the pulse server to track you as a consumer and queue up messages you may have missed (ie you haven't ACK'd). Could be useful in the event that it falls over.

> 
> 
> >#!/usr/bin/env python
> >"""shipit-agent.py [options]
> >
> >Listens for buildbot release messages from pulse and then sends them
> >to the shipit REST API."""
> ># TODO: Run with supervisor if supposed to be run in the foreground
> ># TODO: Read credentials from a file deployed by puppet, rather than .netrc
> >
> >import requests
> >from mozillapulse import consumers
> >import json
> >import uuid
> >import logging as log
> >from optparse import OptionParser
> >import urllib2
> >
> >TARGET_KEYS = (u'sent', u'routing_key', u'message_id', u'results')
> >PROPERTIES_KEYS = (u'product', u'version', u'build_number', u'branch',
> >                   u'platform')
> >
> >
> >def got_message(data, message):
> >        # Grab all PROPERTIES_KEYS from data['build']['properties']
> >        for key in PROPERTIES_KEYS:
> >            for prop in data['build'].get('properties'):
> >                if prop[0] == key:
> >                    try:
> >                        payload[key] = prop[1]
> >                    except IndexError as e:
> >                        payload[key] = 'None'
> >                        log.error('IndexError: key "{}" was not found in data["build"]["properties"] for pulse message {} - {}'.
> >                                  format(key, payload['builderName'], e))
> 
> platform won't be present on the updates builder - you may need to adjust
> this logic a bit.
> 

Are you sure? This might be where there is a difference between udpate and update_verify. I have been following update_verify messages and those seem to have platform set in them. ie messages with builderNames like release-mozilla-release-linux64_update_verify_1/6 . Here is the json pulse message from this one: https://zeller.pastebin.mozilla.org/5573396 . Platform is found in msg['build']['properties']

> 
> >        # Determine Status Type
> >        status_match = {'_tag': 'tag', '_repack_': 'build/repack',
> >                        'update': 'update',
> 
> You might need to use regex to get a better match on this. This is going to
> match all of the update verify builders, which we don't want.

Aha. So do you have an example of the sort of builderName we are looking for, for updates? Or even better, a regex? ;P

> 
> >                        '-ready_for_releasetest': 'releasetest',
> >                        '-ready_for_release': 'release',
> >                        '-postrelease': 'postrelease'}
> 
> It'd be good to define this somewhere more visible - otherwise they're just
> magic strings.

Like where? I was considering that it'd be awful nice to submit a patch for buildbotcustom/bin/postrun.py that adds a status_type to the pulse_messages. I started writing the patch already. Would something like that be adequete?

> 
> >        # Set seqNum and seqTotal
> >        # TODO: Have pulse add these to pulse messages
> >        if payload['status_type'] == 'build/repack' or \
> >           payload['status_type'] == 'update':
> >            if len(payload['builderName'].split('/')) == 2:
> 
> If you stop matching the update builders (see above), it's only the repacks
> that need this block. However, it may be better to just look for "/" in the
> builder name rather than hardcode this based on type.

This was another concern I had. It makes me a bit uncomfortable to rely on parsing for a simple "/" to determine messages that hold a seqNum and seqTotal. Do you think this would be another good place to add a key to pulse? Perhaps something like 'seq' which looks like '1/6', and is nullable.

> 
> >        # TODO: Utilize the null entry for sqlite
> >        for key in payload:
> >            if payload.get(key) is None:
> >                payload[key] = 'None'
> 
> Why is sqlite involved here?

More of a not for the REST addition. When the POST is grabbed by the REST API, it enters the data into sqlite. some values are nullable. So I just need to utilize that rather than convert them to a string called 'None'.

> 
> >
> >        # Assemble and send the POST request
> >        url = options.protocol + r'://' + options.hostname + r'/status/' + \
> 
> I'd recommend using the "furl" library to this sort of thing - it will make
> sure you don't end up with bad URLs. SlaveAPI has some good examples of how
> to use it:
> http://git.mozilla.org/?p=build/slaveapi.git;a=blob;f=slaveapi/clients/
> inventory.py;h=40d894aea98d13332d4015f9ff99b40e10661739;hb=HEAD#l20
> 
> >            getReleaseName(payload['product'], payload[
> >                           'version'], payload['build_number'])
> >        try:
> >            r = requests.post(url, data=payload)
> 
> These requests probably aren't going to work due to CSRF protection. You
> should add a new subclass to
> https://github.com/mozilla/build-tools/blob/master/lib/python/kickoff/api.py
> (which will automatically deal with CSRF for you) to handle this and any
> other endpoints you're using. You should be able to model after Release or
> ReleaseL10n, but let me know if you have any questions.

Okay I will look into 'furl' and CSRF here.

> 
> >    message.ack()
> 
> This will never get run if the above code raises an uncaught exception.
> Should this be in a "finally" block?

Yes :)

> 
> >
> >def getReleaseName(product, version, build_number):
> >    return '{}-{}-build{}'.format(product.title(), version, build_number)
> >
> >
> >def main():
> >    parser = OptionParser()
> >    parser.set_defaults(
> >        protocol='http',
> >        hostname='127.0.0.1:5000',
> >        logfilename='shipit-agent.log',
> >        verbosity=log.INFO
> >    )
> 
> >    parser.add_option("-p", "--protocol", dest="protocol",
> >                      help="protocol for sending POST request to shipit")
> >    parser.add_option("-d", "--dest-hostname", dest="hostname",
> >                      help="host where shipit resides")
> 
> You're better off just combining this into a single option. You should also
> move all of this config to a .ini to make deployment much easier.

Check will do.

> 
> >    parser.add_option("-l", "--log-filename", dest="logfilename")
> >
> >    # TODO: Better way than global?
> >    # TODO: Make the protocol and host configurable via puppet
> >    global options
> 
> No, don't do this :). Pass functions the things they need instead.

Ideally yes. What I was unsure of was how the heck to pass to the got_message function, since the function itself is set as a callback variable in pulse.configure(). There is surely a way, but I gotta look harder at the implementation of pulse.configure.

> 
> 
> >    # TODO: Make unique_label even more unique?
> >    unique_label = 'quickstart-{}'.format(uuid.uuid4())
> 
> Yeah...I'l need 
> 
> >    pulse = consumers.BuildConsumer(applabel=unique_label)
> 
> Hm. I don't think this should necessarily be unique for every instance of
> this script. How about using something that containts the ship it API URL?
> This would mean that if we ever ran multiple instances of this script in
> production, they'd share a queue (which we'd want), but you could still test
> against ship-it-dev or a local instance without interfering with the
> production queue.
> 
> What I'm worried about here is a configuration where multiple instances of
> this script are all sending requests to the same ship it instance.

Good concerns. Who should I talk to more about this? I am not quite certain how the applabel is used. Does this mean the applabel would be hardcoded in the config? etc etc.

> 
> >    # TODO: Should this be configured to be a durable connection?
> >    pulse.configure(topic='build.#.finished', callback=got_message)
> 
> Are you able to filter on partial names? Eg, something like
> build.release*.finished? That would prevent a lot of messages that you don't
> care about from ever reaching the script.

I tried this without success. It seems like topic is parsed on '.'
Flags: needinfo?(bhearsum)
(In reply to John Zeller [:zeller] from comment #7)
> (In reply to Ben Hearsum [:bhearsum] from comment #6)
> > Comment on attachment 8458329 [details]
> > shipit-agent.py
> > 
> > It'd be helpful to see this script in action, too.
> 
> Absolutely. How specifically did you want to see that? There are couple more
> things to tweak to make sure it's not failing on certain inputs, but once
> that is done, we can schedule something with another release.

Doesn't really matter to me. I'd just like to the script output, and when ready, see the ship it instance it's updating.

> > 
> > > * Should the applabel being set with pulse (via unique_label) be more unique
> > > than 'quickstart-*'?
> > > * Should the pulse connection be set to durable?
> > 
> > Can you spell out the implications for these a bit more? I think you're more
> > familiar with Pulse than me =). (Which probably means someone else should
> > look at this patch at some point, too.)
> 
> haha no problem. For applabel, the * is a random number generated by
> uuid.uuid4(). I am unsure of the implications, because I am not sure if the
> applabel is actually used to identify consumers. If it is actually used (or
> will be used in the future) I could see changing quickstart to something
> like shipitagent. 'quickstart-*' is what shows up in the sample script:
> http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsequickstart/file/
> default/python/consumer.py#l25



> 
> As for durability. There is a pulse option that allows the pulse server to
> track you as a consumer and queue up messages you may have missed (ie you
> haven't ACK'd). Could be useful in the event that it falls over.

Ah! Yes, it sounds like we want this!

> > 
> > 
> > >#!/usr/bin/env python
> > >"""shipit-agent.py [options]
> > >
> > >Listens for buildbot release messages from pulse and then sends them
> > >to the shipit REST API."""
> > ># TODO: Run with supervisor if supposed to be run in the foreground
> > ># TODO: Read credentials from a file deployed by puppet, rather than .netrc
> > >
> > >import requests
> > >from mozillapulse import consumers
> > >import json
> > >import uuid
> > >import logging as log
> > >from optparse import OptionParser
> > >import urllib2
> > >
> > >TARGET_KEYS = (u'sent', u'routing_key', u'message_id', u'results')
> > >PROPERTIES_KEYS = (u'product', u'version', u'build_number', u'branch',
> > >                   u'platform')
> > >
> > >
> > >def got_message(data, message):
> > >        # Grab all PROPERTIES_KEYS from data['build']['properties']
> > >        for key in PROPERTIES_KEYS:
> > >            for prop in data['build'].get('properties'):
> > >                if prop[0] == key:
> > >                    try:
> > >                        payload[key] = prop[1]
> > >                    except IndexError as e:
> > >                        payload[key] = 'None'
> > >                        log.error('IndexError: key "{}" was not found in data["build"]["properties"] for pulse message {} - {}'.
> > >                                  format(key, payload['builderName'], e))
> > 
> > platform won't be present on the updates builder - you may need to adjust
> > this logic a bit.
> > 
> 
> Are you sure? This might be where there is a difference between udpate and
> update_verify. I have been following update_verify messages and those seem
> to have platform set in them. ie messages with builderNames like
> release-mozilla-release-linux64_update_verify_1/6 . Here is the json pulse
> message from this one: https://zeller.pastebin.mozilla.org/5573396 .
> Platform is found in msg['build']['properties']

So...to my surprise, "platform" is set on the updates builder (eg: http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-release-updates/builds/1). That's doesn't make sense though, and should be ignored. The work that this builder does is applicable to all platforms. More importantly "Updates" is the thing that blocks "Updates on betatest". The update verify builders block "Ready for release". Ping me on IRC if you're still confused about this.

> 
> > 
> > >        # Determine Status Type
> > >        status_match = {'_tag': 'tag', '_repack_': 'build/repack',
> > >                        'update': 'update',
> > 
> > You might need to use regex to get a better match on this. This is going to
> > match all of the update verify builders, which we don't want.
> 
> Aha. So do you have an example of the sort of builderName we are looking
> for, for updates? Or even better, a regex? ;P

"release-mozilla-release-updates", where the "mozilla-release" part is different depending on the product/branch. You can get a pretty good idea of the different builders used in a release if you look at a log directory like http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/31.0esr-candidates/build2/logs/.

> 
> > 
> > >                        '-ready_for_releasetest': 'releasetest',
> > >                        '-ready_for_release': 'release',
> > >                        '-postrelease': 'postrelease'}
> > 
> > It'd be good to define this somewhere more visible - otherwise they're just
> > magic strings.
> 
> Like where? I was considering that it'd be awful nice to submit a patch for
> buildbotcustom/bin/postrun.py that adds a status_type to the pulse_messages.
> I started writing the patch already. Would something like that be adequete?

Having them sent along with the Pulse event seems like a good idea. However, I'd love it even more if it could be something defined with the builder, rather than something that postrun.py needs to do. Maybe it can be a property instead? Those are defined in https://github.com/mozilla/build-buildbotcustom/blob/master/process/release.py for release jobs.

> > >        # Set seqNum and seqTotal
> > >        # TODO: Have pulse add these to pulse messages
> > >        if payload['status_type'] == 'build/repack' or \
> > >           payload['status_type'] == 'update':
> > >            if len(payload['builderName'].split('/')) == 2:
> > 
> > If you stop matching the update builders (see above), it's only the repacks
> > that need this block. However, it may be better to just look for "/" in the
> > builder name rather than hardcode this based on type.
> 
> This was another concern I had. It makes me a bit uncomfortable to rely on
> parsing for a simple "/" to determine messages that hold a seqNum and
> seqTotal. Do you think this would be another good place to add a key to
> pulse? Perhaps something like 'seq' which looks like '1/6', and is nullable.

This could probably be another property :). Maybe two properties, one for each side?

> > >        # TODO: Utilize the null entry for sqlite
> > >        for key in payload:
> > >            if payload.get(key) is None:
> > >                payload[key] = 'None'
> > 
> > Why is sqlite involved here?
> 
> More of a not for the REST addition. When the POST is grabbed by the REST
> API, it enters the data into sqlite. some values are nullable. So I just
> need to utilize that rather than convert them to a string called 'None'.

OK, so this is talking about the database that the Ship It app uses, right? In that case, you should probably say "database" instead of sqlite. It uses mysql on the production servers :).


> > >    parser.add_option("-l", "--log-filename", dest="logfilename")
> > >
> > >    # TODO: Better way than global?
> > >    # TODO: Make the protocol and host configurable via puppet
> > >    global options
> > 
> > No, don't do this :). Pass functions the things they need instead.
> 
> Ideally yes. What I was unsure of was how the heck to pass to the
> got_message function, since the function itself is set as a callback
> variable in pulse.configure(). There is surely a way, but I gotta look
> harder at the implementation of pulse.configure.

The normal way of doing this sort of thing is a wrapper. Eg:

def do_got_message(myarg, pulsearg1, pulsearg2):
    # do the work!

if __name__ == "__main__":
    myarg = whatever
    def got_message(*args, **kwargs):
        do_got_message(myarg, *args, **kwargs)

    pulse.blah(got_message)

> 
> > 
> > 
> > >    # TODO: Make unique_label even more unique?
> > >    unique_label = 'quickstart-{}'.format(uuid.uuid4())
> > 
> > Yeah...I'l need 
> > 
> > >    pulse = consumers.BuildConsumer(applabel=unique_label)
> > 
> > Hm. I don't think this should necessarily be unique for every instance of
> > this script. How about using something that containts the ship it API URL?
> > This would mean that if we ever ran multiple instances of this script in
> > production, they'd share a queue (which we'd want), but you could still test
> > against ship-it-dev or a local instance without interfering with the
> > production queue.
> > 
> > What I'm worried about here is a configuration where multiple instances of
> > this script are all sending requests to the same ship it instance.
> 
> Good concerns. Who should I talk to more about this? I am not quite certain
> how the applabel is used. Does this mean the applabel would be hardcoded in
> the config? etc etc.

It could be in the config, or the config could just define the ship it URI, and then the codes combines it with something else. I'm not too fussy here.

> 
> > 
> > >    # TODO: Should this be configured to be a durable connection?
> > >    pulse.configure(topic='build.#.finished', callback=got_message)
> > 
> > Are you able to filter on partial names? Eg, something like
> > build.release*.finished? That would prevent a lot of messages that you don't
> > care about from ever reaching the script.
> 
> I tried this without success. It seems like topic is parsed on '.'

Boo! Alright.
Flags: needinfo?(bhearsum)
(In reply to Ben Hearsum [:bhearsum] from comment #8)
> > > >    # TODO: Make unique_label even more unique?
> > > >    unique_label = 'quickstart-{}'.format(uuid.uuid4())
> > > 
> > > Yeah...I'l need 
> > > 
> > > >    pulse = consumers.BuildConsumer(applabel=unique_label)
> > > 
> > > Hm. I don't think this should necessarily be unique for every instance of
> > > this script. How about using something that containts the ship it API URL?
> > > This would mean that if we ever ran multiple instances of this script in
> > > production, they'd share a queue (which we'd want), but you could still test
> > > against ship-it-dev or a local instance without interfering with the
> > > production queue.
> > > 
> > > What I'm worried about here is a configuration where multiple instances of
> > > this script are all sending requests to the same ship it instance.
> > 
> > Good concerns. Who should I talk to more about this? I am not quite certain
> > how the applabel is used. Does this mean the applabel would be hardcoded in
> > the config? etc etc.
> 
> It could be in the config, or the config could just define the ship it URI,
> and then the codes combines it with something else. I'm not too fussy here.

I am still not real sure what to do here. A few thoughts on if there are multiple instances listening from pulse, and they all have the same applabel (production):
* Would there actually be be a queue? Can pulse even accept multiple app_label's that are the same?
* If it doesn't queue properly, I would be concerned about race conditions. For instance, what if #1 has a message, #2 signs on, #1 ack's to pulse, and then pulse sends another, but both #1 and #2 grab it?

Makes sense in terms of having uniqueness between the prod and dev environments, but I am concerns about how pulse is actually doing these things. I will look into it on my own, but if you have any idea, I'd love some clarity on it.
Flags: needinfo?(bhearsum)
Attached file shipit-agent.py (obsolete) —
I have made changes based on the new schema reviewed in bug 1032975, and the feedback in comment 6 and comment 8. This is still not ready, and I need to test, but this is a good time to grab some more feedback I think.
Attachment #8458329 - Attachment is obsolete: true
Attachment #8462926 - Flags: feedback?(bhearsum)
(In reply to John Zeller [:zeller] from comment #9)
> (In reply to Ben Hearsum [:bhearsum] from comment #8)
> > > > >    # TODO: Make unique_label even more unique?
> > > > >    unique_label = 'quickstart-{}'.format(uuid.uuid4())
> > > > 
> > > > Yeah...I'l need 
> > > > 
> > > > >    pulse = consumers.BuildConsumer(applabel=unique_label)
> > > > 
> > > > Hm. I don't think this should necessarily be unique for every instance of
> > > > this script. How about using something that containts the ship it API URL?
> > > > This would mean that if we ever ran multiple instances of this script in
> > > > production, they'd share a queue (which we'd want), but you could still test
> > > > against ship-it-dev or a local instance without interfering with the
> > > > production queue.
> > > > 
> > > > What I'm worried about here is a configuration where multiple instances of
> > > > this script are all sending requests to the same ship it instance.
> > > 
> > > Good concerns. Who should I talk to more about this? I am not quite certain
> > > how the applabel is used. Does this mean the applabel would be hardcoded in
> > > the config? etc etc.
> > 
> > It could be in the config, or the config could just define the ship it URI,
> > and then the codes combines it with something else. I'm not too fussy here.
> 
> I am still not real sure what to do here. A few thoughts on if there are
> multiple instances listening from pulse, and they all have the same applabel
> (production):
> * Would there actually be be a queue? Can pulse even accept multiple
> app_label's that are the same?

My read of it was that yes, it can - and that it will make sure each message is only sent to one of them.

> * If it doesn't queue properly, I would be concerned about race conditions.
> For instance, what if #1 has a message, #2 signs on, #1 ack's to pulse, and
> then pulse sends another, but both #1 and #2 grab it?

Dunno :). I would hope that Pulse is smart enough to deal with this.

> Makes sense in terms of having uniqueness between the prod and dev
> environments, but I am concerns about how pulse is actually doing these
> things. I will look into it on my own, but if you have any idea, I'd love
> some clarity on it.

I was hoping this would be simple to deal with, but clearly it isn't (not your fault). Let's just go with anything as the app label right now. We'll cross this bridge whenever we get to a point where we want to run multiple pulse agent instances - just add a comment about this being unknown right now. I don't think you should spent any more time on this for the initial landing.
Flags: needinfo?(bhearsum)
Attachment #8448966 - Attachment is obsolete: true
Comment on attachment 8462926 [details]
shipit-agent.py

This is greatly improved. You've noted most of the things that need to be done still with TODOs. I'd like to see the next version of this as a real patch to the build/tools repository, and there's a few small things below:

>#!/usr/bin/env python
>"""shipit-agent.py [options]

Now that I have a better understanding of the full system, I think this needs a better name. "agent" is pretty generic, how about something like "shipit-notifier.py"?

>            # Set chunkNum and chunkTotal
>            # TODO: hardcode based on type, or just on "/"? Using "/" may be
>            # too vague
>            # TODO: define these with the builder as a property

Once these are defined as properties on the builder, I don't think you'll need any sort of hardcoding. You can just look for chunkNum and chunkTotal, and set them in the payload if they're present.

>            # Assemble and send the POST request
>            url = furl(options.api_root)
>            url.path.add('status')
>            url.path.add(getReleaseName(payload['product'], payload[
>                                        'version'], payload['build_number']))
>            auth = (options.username, options.password)
>            log.debug("making request to {}".format(url))
>            try:
>                r = requests.post(str(url), auth=auth, data=payload)
>            except Exception as e:
>                log.error('ERROR: failed to submit release event {} - {}'.
>                          format(payload['event_name'], e))

Just a reminder: this needs to be moved to build/tools/lib/python/kickoff/api.py still.


>def getReleaseName(product, version, build_number):
>    return '{}-{}-build{}'.format(product.title(), version, build_number)

You should just import this from https://github.com/mozilla/build-tools/blob/master/lib/python/release/info.py, seeing as this script is going to live in that repo.

>
>def main():
>    parser = OptionParser()
>    parser.add_option("-c", "--config", dest="config",
>                      help="Configuration file")

Yay :). Please add an example ini when you make a proper patch. https://github.com/mozilla/build-tools/blob/master/buildfarm/release/release-runner.ini.example can probably serve as an example.
Attachment #8462926 - Flags: feedback?(bhearsum) → feedback+
Attached patch pulse.patch (obsolete) — Splinter Review
I am not sure of a better place to put this, and since it directly correlates to the standalone script I am going to put it here.

This patch is for release.py to add chunkTotal, chunkNum, and status_type to the build properties. I think that we could do without adding the releasetest and release status_type's here, but I have added them for good measure.
Attachment #8463685 - Flags: feedback?(bhearsum)
Attachment #8463685 - Attachment description: patch for build-buildbotcustom → pulse.patch
Attachment #8463685 - Attachment is patch: true
Attachment #8463685 - Attachment mime type: text/x-patch → text/plain
Attached file release.py (obsolete) —
Here is the changed release.py file in case it is hard to tell how I placed the additions of status_type's.
Attachment #8463690 - Attachment mime type: text/x-python → text/plain
Comment on attachment 8463685 [details] [diff] [review]
pulse.patch

Review of attachment 8463685 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to John Zeller [:zeller] from comment #14)
> Created attachment 8463690 [details]
> release.py
> 
> Here is the changed release.py file in case it is hard to tell how I placed
> the additions of status_type's.

Thanks, this is helpful! You can also pass "-U8" (or a higher number) to provide more context in the diff.

::: process/release.py
@@ +1010,1 @@
>                  },

I wouldn't consider xulrunner part of "build" status. It's a product wholly aside from Firefox, and not something that QA or RelMan really cares about.

@@ +1275,4 @@
>              properties={
>                  'platform': None,
>                  'branch': 'release-%s' % sourceRepoInfo['name'],
> +                'status_type': 'update',

It's not clear to me whether we want status_type for dummy builders or not. It's going to be hard to know until we're using it. So, let's keep it for now and see how we like it.

@@ +1320,5 @@
>                                 'platform': platform,
>                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> +                               'chunkTotal': int(updateVerifyChunks),
> +                               'chunkNum': int(n),
> +                               'status_type': 'update_verify',

Did we decide to create another status_type for this? I recall talking (or at least thinking) about these maybe being part of the "release" status_type.
Attachment #8463685 - Flags: feedback?(bhearsum) → feedback+
(In reply to Ben Hearsum [:bhearsum] from comment #15)
> ::: process/release.py
> @@ +1320,5 @@
> >                                 'platform': platform,
> >                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> > +                               'chunkTotal': int(updateVerifyChunks),
> > +                               'chunkNum': int(n),
> > +                               'status_type': 'update_verify',
> 
> Did we decide to create another status_type for this? I recall talking (or
> at least thinking) about these maybe being part of the "release" status_type.

You're right, we decided that. I just threw this in as literally what it is. Do you think it should be be named "release" instead? OR did it block "releasetest"?

Also, do you think it's unnecessary to label the "releasetest" and "release" binary steps?
Attached patch pulse.patch (obsolete) — Splinter Review
My only questions are in comment 16
Attachment #8463685 - Attachment is obsolete: true
Attachment #8464153 - Flags: review?(bhearsum)
Attached file bug1032978.patch (obsolete) —
Okay this is the first real patch here. Includes changes from previous shipit-agent.py feedback.

shipit-notifier.conf.example was added as an attempt an a .conf file for supervisord to you. Is there anything in the shipit-notifier.py file that needs to be updated for supervisord?

shipit-notifier.ini.example has been added

api.py changes were added. The class I have added is very barebones at the moment. I am not quite sure what needs to be done to move the furl/request chunk of shipit-notifier.py to the new Status class in api.py
Attachment #8462926 - Attachment is obsolete: true
Attachment #8464220 - Flags: review?(bhearsum)
Attachment #8463690 - Attachment is obsolete: true
Attached patch bug1032978.patch (obsolete) — Splinter Review
Added in the entry of enUSPlatforms to release-runner.py when a release is marked as 'Complete'. All statements and comments still stand from comment 18.
Attachment #8464220 - Attachment is obsolete: true
Attachment #8464220 - Flags: review?(bhearsum)
Attachment #8464234 - Flags: review?(bhearsum)
(In reply to John Zeller [:zeller] from comment #16)
> (In reply to Ben Hearsum [:bhearsum] from comment #15)
> > ::: process/release.py
> > @@ +1320,5 @@
> > >                                 'platform': platform,
> > >                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> > > +                               'chunkTotal': int(updateVerifyChunks),
> > > +                               'chunkNum': int(n),
> > > +                               'status_type': 'update_verify',
> > 
> > Did we decide to create another status_type for this? I recall talking (or
> > at least thinking) about these maybe being part of the "release" status_type.
> 
> You're right, we decided that. I just threw this in as literally what it is.
> Do you think it should be be named "release" instead? OR did it block
> "releasetest"?

"ready for releasetest" (what you're calling the "releasetest" status) means "all prerequisite steps before we can test updates on the releasetest channel have been run"
"ready for release" (what you've been calling the "release" status) means "all prerequisite steps before we can release to users have been run"

Update verify is a verification that we need to do before we're ready to release, so it contributes to the "release" status.

> Also, do you think it's unnecessary to label the "releasetest" and "release"
> binary steps?

What do you mean by "label them binary"?
Comment on attachment 8464153 [details] [diff] [review]
pulse.patch

Review of attachment 8464153 [details] [diff] [review]:
-----------------------------------------------------------------

Let me know if you want to chat today so we can get on the same page.

::: process/release.py
@@ +1009,1 @@
>                  },

Still need to remove status type from the xulrunner builders - they don't contribute to any status that anyone cares about.

@@ +1319,5 @@
>                                 'platform': platform,
>                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> +                               'chunkTotal': int(updateVerifyChunks),
> +                               'chunkNum': int(n),
> +                               'status_type': 'update_verify',

And I still think this is part of the "release" status...
Attachment #8464153 - Flags: review?(bhearsum) → review-
Comment on attachment 8464234 [details] [diff] [review]
bug1032978.patch

Review of attachment 8464234 [details] [diff] [review]:
-----------------------------------------------------------------

There's some things that need fixing but this is getting really far along - it's great to see! When you have the next version ready, I think I'd like to see it running in staging. You'll need your own ship it instance, release runner instance, and shipit-notifier all hooked up to each other for that.

(In reply to John Zeller [:zeller] from comment #18)
> shipit-notifier.conf.example was added as an attempt an a .conf file for
> supervisord to you. Is there anything in the shipit-notifier.py file that
> needs to be updated for supervisord?

I don't think so. We have helpers in Puppet that deal with most of the supervisord details. This is what it looks like for release runner, as an example:
[program:releaserunner]
command=/builds/releaserunner/tools/buildfarm/release/release-runner.sh /builds/releaserunner /builds/releaserunner/release-runner.log /builds/releaserunner/release-runner.ini
autostart=true
autorestart=true
user=cltbld
environment=
exitcodes=0
log_stderr=true
log_stdout=true
redirect_stderr=true
stdout_logfile=/builds/releaserunner/release-runner.log
autorestart=true
logfile_maxbytes=50MB
logfile_backups=10
stopasgroup=true
killasgroup=true

You can drop this file in your next patch, it's not something we'll need in this repo when it lands.

> shipit-notifier.ini.example has been added

Great!

> api.py changes were added. The class I have added is very barebones at the
> moment. I am not quite sure what needs to be done to move the furl/request
> chunk of shipit-notifier.py to the new Status class in api.py

Have a look at how release-runner.py uses the existing classes there. Eg:
https://github.com/mozilla/build-tools/blob/master/buildfarm/release/release-runner.py#L63
https://github.com/mozilla/build-tools/blob/master/buildfarm/release/release-runner.py#L83

I don't think you need a wrapper like it has, but the above should show you how it instantiates and calls the update method. You should be doing similar.

::: buildfarm/release/release-runner.py
@@ +421,5 @@
>              rr.update_status(release, 'Running sendchange command')
> +            cfgFile = getReleaseConfigName(
> +                release['product'], path.basename(release['branch']),
> +                release['version'], staging)
> +            rr.start_release_automation(release, sendchange_master, cfgFile)

You should be passing along the enUSPlatform here rather than the config file. I know we pass along cfgFile to bump_configs, but that's because it's doing a job that requires knowing the filename. These changes look fine otherwise.

::: buildfarm/release/shipit-notifier.ini.example
@@ +3,5 @@
> +username: admin
> +password: password
> +
> +[shipit-agent]
> +logfile: shipit-notifier.log

It turns out I misled you when I said log details should be in here. Turns out that scripts like this just log to stdout/stderr, and supervisord deals with the output from there. Sorry about that!

::: buildfarm/release/shipit-notifier.py
@@ +1,1 @@
> +#!/usr/bin/env python

This script is much improved - almost done, I think. Once you integrate the new Status object from api.py I'll want to see it running.

@@ +22,5 @@
> +
> +def receive_message(options, data, message):
> +    try:
> +        if (data['payload']['build'].get('builderName').startswith('release-')) \
> +            and (' test ' not in data['payload']['build'].get('builderName')):

This is a case where it's probably better to use "break" rather than big long conditionals followed by a big long intended block. Ie:
if not data['payload']['build'].get('builderName').startswith('release-'):
  return

@@ +25,5 @@
> +        if (data['payload']['build'].get('builderName').startswith('release-')) \
> +            and (' test ' not in data['payload']['build'].get('builderName')):
> +
> +            data.update(data.pop('_meta'))
> +            data.update(data.pop('payload'))

This is fine, but what's it's purpose? Probably deserves a short comment.

@@ +35,5 @@
> +                payload[key] = data.get(key)
> +            payload[u'event_name'] = data['build'].get(u'builderName')
> +
> +            for key in PROPERTIES_KEYS:
> +                for prop in data['build'].get('properties'):

Yay, these are nice loops :)

@@ +42,5 @@
> +                            payload[key] = prop[1]
> +                        except IndexError as e:
> +                            payload[key] = None
> +                            log.error('{} not in build properties for {} - {}'.
> +                                      format(key, payload['event_name'], e))

Are these really errors? Things that get logged as errors are usually things that break the running program (https://docs.python.org/2/howto/logging.html#logging-basic-tutorial). If that was the case here, I'd expect this method to abort. I'm only guessing, but this seems like it's more of a warning.

@@ +64,5 @@
> +                log.debug('successful POST request for release event {}'.
> +                          format(payload['event_name']))
> +            else:
> +                log.error('ERROR {}: failed POST request for release event {}'.
> +                          format(r.status_code, payload['event_name']))

This should end up as something like:
status_api = Status(...)
status_api.update(releaseName, **payload)

The API base class will take care of all of the details and error handling.

@@ +76,5 @@
> +                      help="Configuration file")
> +    options = parser.parse_args()[0]
> +
> +    def got_message(*args, **kwargs):
> +        receive_message(options, *args, **kwargs)

I don't think this works anymore. OptionParser() isn't going to parse the configuration file for you. You need to use ConfigParser for that.

Also, please pass the specific things (ie, api_root) needed. Having receive_message depend directly on a command line interface object makes it difficult to re-use.
Attachment #8464234 - Flags: review?(bhearsum)
Attachment #8464234 - Flags: review-
Attachment #8464234 - Flags: feedback+
Attached patch bug1032978.patch (obsolete) — Splinter Review
Very close! :D

* The supervisord shipit-notifier.conf.example file is already included in this patch. Were you suggesting to leave it in this patch, or add to the next?
* I set the logfile name to be sys.stdout. Not sure if this is the right way to go about it. Depends how supervisord deals with the output I suppose.
Attachment #8464234 - Attachment is obsolete: true
Attachment #8466022 - Flags: review?(bhearsum)
Attached patch pulse.patch (obsolete) — Splinter Review
(In reply to Ben Hearsum [:bhearsum] from comment #20)
> (In reply to John Zeller [:zeller] from comment #16)
> > (In reply to Ben Hearsum [:bhearsum] from comment #15)
> > > ::: process/release.py
> > > @@ +1320,5 @@
> > > >                                 'platform': platform,
> > > >                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> > > > +                               'chunkTotal': int(updateVerifyChunks),
> > > > +                               'chunkNum': int(n),
> > > > +                               'status_type': 'update_verify',
> > > 
> > > Did we decide to create another status_type for this? I recall talking (or
> > > at least thinking) about these maybe being part of the "release" status_type.
> > 
> > You're right, we decided that. I just threw this in as literally what it is.
> > Do you think it should be be named "release" instead? OR did it block
> > "releasetest"?
> 
> "ready for releasetest" (what you're calling the "releasetest" status) means
> "all prerequisite steps before we can test updates on the releasetest
> channel have been run"
> "ready for release" (what you've been calling the "release" status) means
> "all prerequisite steps before we can release to users have been run"
> 
> Update verify is a verification that we need to do before we're ready to
> release, so it contributes to the "release" status.
> 
> > Also, do you think it's unnecessary to label the "releasetest" and "release"
> > binary steps?
> 
> What do you mean by "label them binary"?

I mean that there is only one of each message, so it might be a bit pointless to name them. I have here for consistencies sake, but I am offering veto :)

(In reply to Ben Hearsum [:bhearsum] from comment #21)
> Comment on attachment 8464153 [details] [diff] [review]
> pulse.patch
> 
> Review of attachment 8464153 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Let me know if you want to chat today so we can get on the same page.

Let's tag up today now that the patches are close to settled.

> 
> @@ +1319,5 @@
> >                                 'platform': platform,
> >                                 'branch': 'release-%s' % sourceRepoInfo['name'],
> > +                               'chunkTotal': int(updateVerifyChunks),
> > +                               'chunkNum': int(n),
> > +                               'status_type': 'update_verify',
> 
> And I still think this is part of the "release" status...

My thinking here is to keep it explicit, rather than the implied relation of update_verify and release. Given that, I think it makes sense to keep it as is. What do you think given that?
Attachment #8464153 - Attachment is obsolete: true
Attachment #8466029 - Flags: review?(bhearsum)
Comment on attachment 8466022 [details] [diff] [review]
bug1032978.patch

Review of attachment 8466022 [details] [diff] [review]:
-----------------------------------------------------------------

Super super close now. I really would like to this running before it lands though.

(In reply to John Zeller [:zeller] from comment #23)
> Created attachment 8466022 [details] [diff] [review]
> bug1032978.patch
> 
> Very close! :D
> 
> * The supervisord shipit-notifier.conf.example file is already included in
> this patch. Were you suggesting to leave it in this patch, or add to the
> next?

When I said "drop" I meant "remove". It's not needed at all, because Puppet deals with it.

> * I set the logfile name to be sys.stdout. Not sure if this is the right way
> to go about it. Depends how supervisord deals with the output I suppose.

You don't need to set it at all, actually. This is how release-runner does it:
 logging.basicConfig(format="%(asctime)s - %(levelname)s - %(message)s",
level=log_level)

::: buildfarm/release/shipit-notifier.py
@@ +25,5 @@
> +def receive_message(config, data, message):
> +    try:
> +        if not (data['payload']['build'].get('builderName').startswith('release-')) \
> +                and not (' test ' not in data['payload']['build'].get('builderName')):
> +            return

You're using a double negative in the second part of this, "and not (... not ...)". I think this means that test builders will make it through this condition. This is probably better to break into two blocks, ie:
if (is not a release builder):
  return
if (is a test builder):
  return

@@ +29,5 @@
> +            return
> +
> +        # Concat the _meta and payload keys of the original pulse message
> +        data.update(data.pop('_meta'))
> +        data.update(data.pop('payload'))

This comment is better, but can you answer the why this code exists, rather than just say what it does?

@@ +49,5 @@
> +                                    format(key, payload['event_name'], e))
> +
> +        name = getReleaseName(payload.pop('product'), 
> +                              payload.pop('version'), 
> +                              payload.pop('build_number'))

Are you sure you want "pop" here? It removes the key from the dict. Perhaps you just need payload["product"]?

::: lib/python/kickoff/api.py
@@ +51,1 @@
>          log.debug('Data sent: %s' % data)

This file looks fine! I think it truly is as simple as it seems.
Attachment #8466022 - Flags: review?(bhearsum)
Attachment #8466022 - Flags: review-
Attachment #8466022 - Flags: feedback+
Comment on attachment 8466029 [details] [diff] [review]
pulse.patch

Per our Vidyo conversation, status_type is getting a new name.
Attachment #8466029 - Attachment is obsolete: true
Attachment #8466029 - Flags: review?(bhearsum)
Status_type changed to event_group
Attachment #8469475 - Flags: review?(bhearsum)
(In reply to Ben Hearsum [:bhearsum] from comment #25)
> Comment on attachment 8466022 [details] [diff] [review]
> bug1032978.patch
> 
> Review of attachment 8466022 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +49,5 @@
> > +                                    format(key, payload['event_name'], e))
> > +
> > +        name = getReleaseName(payload.pop('product'), 
> > +                              payload.pop('version'), 
> > +                              payload.pop('build_number'))
> 
> Are you sure you want "pop" here? It removes the key from the dict. Perhaps
> you just need payload["product"]?

Yup! payload is playing doubleduty as a placeholder for these variables. We don't want them in the POST to the API, so I am popping them to remove them once we have the name.
(In reply to John Zeller [:zeller] from comment #28)
> (In reply to Ben Hearsum [:bhearsum] from comment #25)
> > Comment on attachment 8466022 [details] [diff] [review]
> > bug1032978.patch
> > 
> > Review of attachment 8466022 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > @@ +49,5 @@
> > > +                                    format(key, payload['event_name'], e))
> > > +
> > > +        name = getReleaseName(payload.pop('product'), 
> > > +                              payload.pop('version'), 
> > > +                              payload.pop('build_number'))
> > 
> > Are you sure you want "pop" here? It removes the key from the dict. Perhaps
> > you just need payload["product"]?
> 
> Yup! payload is playing doubleduty as a placeholder for these variables. We
> don't want them in the POST to the API, so I am popping them to remove them
> once we have the name.

In that case, you may want to use a different data structure than "payload" to pass when you call update(). If you don't, any new data that gets added to payload will get sent to ship it by default - is that what you want?
(In reply to Ben Hearsum [:bhearsum] from comment #29)
> (In reply to John Zeller [:zeller] from comment #28)
> > (In reply to Ben Hearsum [:bhearsum] from comment #25)
> > > Comment on attachment 8466022 [details] [diff] [review]
> > > bug1032978.patch
> > > 
> > > Review of attachment 8466022 [details] [diff] [review]:
> > > -----------------------------------------------------------------
> > > @@ +49,5 @@
> > > > +                                    format(key, payload['event_name'], e))
> > > > +
> > > > +        name = getReleaseName(payload.pop('product'), 
> > > > +                              payload.pop('version'), 
> > > > +                              payload.pop('build_number'))
> > > 
> > > Are you sure you want "pop" here? It removes the key from the dict. Perhaps
> > > you just need payload["product"]?
> > 
> > Yup! payload is playing doubleduty as a placeholder for these variables. We
> > don't want them in the POST to the API, so I am popping them to remove them
> > once we have the name.
> 
> In that case, you may want to use a different data structure than "payload"
> to pass when you call update(). If you don't, any new data that gets added
> to payload will get sent to ship it by default - is that what you want?

My assumption is exactly that, that anything present in payload at the time of calling update() will be sent to shipit. However, I suppose it's more explicit to have another data structure to hold none 'payload' items. I will do that instead I think
Attached patch bug1032978.patch (obsolete) — Splinter Review
Changes made based on feedback in previous comment, and changed status_type to reflect event_group and group.
Attachment #8466022 - Attachment is obsolete: true
Attachment #8469503 - Flags: review?(bhearsum)
Attachment #8469475 - Flags: review?(bhearsum) → review+
Comment on attachment 8469503 [details] [diff] [review]
bug1032978.patch

Review of attachment 8469503 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the two things below fixed. Feel free to carry forward the r+ to the next patch.

However, please don't land this one yet - I'd like to figure out some way for me to see this in action before it lands, it's not super high risk, but the release runner bit has potential for bustage.

::: buildfarm/release/shipit-notifier.py
@@ +41,5 @@
> +                    log.warning('{} not in build properties for {} - {}'.
> +                                format(key, payload['event_name'], e))
> +        name = getReleaseName(payload.pop('product'), 
> +                              payload.pop('version'), 
> +                              payload.pop('build_number'))

There's no reason to pop here - just use payload["product"]

@@ +43,5 @@
> +        name = getReleaseName(payload.pop('product'), 
> +                              payload.pop('version'), 
> +                              payload.pop('build_number'))
> +
> +        payload = {}

And use a new name for this to avoid confusion, and subtle bugs in the future.
Attachment #8469503 - Flags: review?(bhearsum) → review+
Attachment #8469475 - Flags: checked-in+
Attached patch bug1032978.patch (obsolete) — Splinter Review
Attachment #8469503 - Attachment is obsolete: true
Attachment #8470995 - Flags: review?(bhearsum)
Merged to production, and deployed.
comment 34 refers specifically to pulse.patch
Attached patch bug1032978.patch (obsolete) — Splinter Review
Ran a random test on this with junk randoms and mostly valid randoms. Nothing seems to break things now. There is at least 1 more change that could possibly be made, and that is whether we want to catch all Exceptions in the receive_message function. The way it is setup now, it won't fall over from an Exception in that function, but it makes sure to call log.error when an Exception is raised. Here is the hacky test I used to generate random POSTs: http://zeller.pastebin.mozilla.org/5935935

shipit-notifier.py output: http://zeller.pastebin.mozilla.org/5935819
shipit app output: http://zeller.pastebin.mozilla.org/5935822

Ran into a weird error in api.py where requests.session.request() did not recognize being sent config. I checked the requests source and sure enough it doesn't seem to accept it. I am not sure if this is something that has come up before or not, but for testing locally I just simply removed the config=self.config in the 2 locations in exists in api.py.

One more question, but it's for REST API BUG 1032985
Attachment #8470995 - Attachment is obsolete: true
Attachment #8470995 - Flags: review?(bhearsum)
Attachment #8471267 - Flags: review?(bhearsum)
Comment on attachment 8471267 [details] [diff] [review]
bug1032978.patch

John found a couple of bugs in this during a live demo yesterday - a new patch will be needed here.
Attachment #8471267 - Attachment is obsolete: true
Attachment #8471267 - Flags: review?(bhearsum)
One thing that was pointed out to me yesterday is that we need to make sure that failed builds are accounted for correctly. For example, one chunk of repacks could fail (results would be 2 instead of 0). As it stands right now, that would get passed along to ship it, and ship it would consider it "complete". I think you need to make a decision about how you want to handle that. Two options could be:
1) Only pass along events for builds that were successful.
2) Pass along all events for builds and update Ship It to allow for multiple events with the same name+event_name, and check "results" when deciding upon status.
(In reply to Ben Hearsum [:bhearsum] from comment #37)
> Comment on attachment 8471267 [details] [diff] [review]
> bug1032978.patch
> 
> John found a couple of bugs in this during a live demo yesterday - a new
> patch will be needed here.

You referring to the config part?
(In reply to Ben Hearsum [:bhearsum] from comment #38)
> One thing that was pointed out to me yesterday is that we need to make sure
> that failed builds are accounted for correctly. For example, one chunk of
> repacks could fail (results would be 2 instead of 0). As it stands right
> now, that would get passed along to ship it, and ship it would consider it
> "complete". I think you need to make a decision about how you want to handle
> that. Two options could be:
> 1) Only pass along events for builds that were successful.
> 2) Pass along all events for builds and update Ship It to allow for multiple
> events with the same name+event_name, and check "results" when deciding upon
> status.

I actually thought we were already doing this, must have been a previous version. Nevertheless, yes definitely a concern. I think we should go with 2, that way we have the data for previous releases that can be used in the future.
(In reply to John Zeller [:zeller] from comment #40)
> (In reply to Ben Hearsum [:bhearsum] from comment #38)
> > One thing that was pointed out to me yesterday is that we need to make sure
> > that failed builds are accounted for correctly. For example, one chunk of
> > repacks could fail (results would be 2 instead of 0). As it stands right
> > now, that would get passed along to ship it, and ship it would consider it
> > "complete". I think you need to make a decision about how you want to handle
> > that. Two options could be:
> > 1) Only pass along events for builds that were successful.
> > 2) Pass along all events for builds and update Ship It to allow for multiple
> > events with the same name+event_name, and check "results" when deciding upon
> > status.
> 
> I actually thought we were already doing this, must have been a previous
> version. Nevertheless, yes definitely a concern. I think we should go with
> 2, that way we have the data for previous releases that can be used in the
> future.

Okay this is being handled is the most recent patch represented in Bug 1032985 comment 14 - have at it!
Attached patch bug1032978.patch (obsolete) — Splinter Review
This is tested with the most recent patch in Bug 1032985 comment 16. The only remaining *possible* issue is the error I ran into here concerning config, see comment 36. Copying error description here for ease.

from comment 36:
"Ran into a weird error in api.py where requests.session.request() did not recognize being sent config. I checked the requests source and sure enough it doesn't seem to accept it. I am not sure if this is something that has come up before or not, but for testing locally I just simply removed the config=self.config in the 2 locations in exists in api.py."
Attachment #8474711 - Flags: review?(bhearsum)
Attached patch bug1032978.patch (obsolete) — Splinter Review
Changes shipit-agent to shipit-notifier in a couple remaining places. Removed the test pulse host.

Questions from comment 42 still need an answer.
Attachment #8474711 - Attachment is obsolete: true
Attachment #8474711 - Flags: review?(bhearsum)
Attachment #8474786 - Flags: review?(bhearsum)
(In reply to John Zeller [:zeller] from comment #42)
> Created attachment 8474711 [details] [diff] [review]
> bug1032978.patch
> 
> This is tested with the most recent patch in Bug 1032985 comment 16. The
> only remaining *possible* issue is the error I ran into here concerning
> config, see comment 36. Copying error description here for ease.
> 
> from comment 36:
> "Ran into a weird error in api.py where requests.session.request() did not
> recognize being sent config. I checked the requests source and sure enough
> it doesn't seem to accept it. I am not sure if this is something that has
> come up before or not, but for testing locally I just simply removed the
> config=self.config in the 2 locations in exists in api.py."

I guess that's it, I don't remember now. If things are working with the current state of the code, there's nothing to worry about.
(In reply to Ben Hearsum [:bhearsum] from comment #44)
> (In reply to John Zeller [:zeller] from comment #42)
> > Created attachment 8474711 [details] [diff] [review]
> > bug1032978.patch
> > 
> > This is tested with the most recent patch in Bug 1032985 comment 16. The
> > only remaining *possible* issue is the error I ran into here concerning
> > config, see comment 36. Copying error description here for ease.
> > 
> > from comment 36:
> > "Ran into a weird error in api.py where requests.session.request() did not
> > recognize being sent config. I checked the requests source and sure enough
> > it doesn't seem to accept it. I am not sure if this is something that has
> > come up before or not, but for testing locally I just simply removed the
> > config=self.config in the 2 locations in exists in api.py."
> 
> I guess that's it, I don't remember now. If things are working with the
> current state of the code, there's nothing to worry about.

Sounds good to me, I figured as much and just removed it for testing.
Comment on attachment 8474786 [details] [diff] [review]
bug1032978.patch

Review of attachment 8474786 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine. Please don't land it until the ship it side is sorted out in bug 1032985 - we should push those changes before this lands.
Attachment #8474786 - Flags: review?(bhearsum) → review+
Comment on attachment 8474786 [details] [diff] [review]
bug1032978.patch

With the necessary ship-it change landed, I also landed this and updated the release runner instance - it should take effect with the next release.
Attachment #8474786 - Flags: checked-in+
Backed out both the tools and buildbotcustom patches:

Sylvestre
08:56:30 is there anyone to see what is wrong with beta9 build ?
08:56:40 [release-runner] failed

pmoore
09:12:53 Sylvestre: i'll take a look

Sylvestre
09:13:18 pmoore, thanks

pmoore
09:18:05 Sylvestre: i'm not too familiar with how release runner works, but i see the problem arose when it tried to make a commit "Update release config for Firefox-32.0b9-build1" to buildbot-configs, but there were no changes to commit
09:18:42 i'll see if i can work out why there were no changes
09:22:14 Sylvestre: this looks suspicious to me:

[cltbld@buildbot-master81.srv.releng.scl3.mozilla.com buildbot-configs]$ pwd
/builds/releaserunner/tools/buildfarm/release/buildbot-configs
[cltbld@buildbot-master81.srv.releng.scl3.mozilla.com buildbot-configs]$ hg status
? mozilla/localconfig.py
? mozilla/thunderbird_localconfig.py
[cltbld@buildbot-master81.srv.releng.scl3.mozilla.com buildbot-configs]$

09:22:40 there are local changes not added, which were not tracked by hg
09:23:19 i'll have a look at historical changesets with similar commit messages to see what typically should be in such a commit ....

Sylvestre
09:23:46 pmoore, Maybe related to bhearsum's intern
09:29:11 pmoore, I think we should just revert the change and restart the build

pmoore
09:32:56 Sylvestre: is that a release runner change?
09:34:52 bug 1032978 ?
09:35:56 Sylvestre: is this the first release since comment https://bugzilla.mozilla.org/show_bug.cgi?id=1032978#c47 was made?
09:36:16 zeller: ^^

Sylvestre
09:39:13 pmoore, probably
09:39:37 last gtb of a beta was last monday

pmoore
09:39:44 ok
09:39:49 i'll backout, and r? you

Sylvestre
09:40:41 ok, however, i don't see what could cause that in the patches

pmoore
09:40:56 yes, looks only related to pulse notifications
09:41:08 maybe i should continue investigating first

Sylvestre
09:43:10 maybe this patch generates these temp files
09:43:18 blocking the hg commit

pmoore
09:43:58 Sylvestre: i don't mind either way - if you like we can try a backout, or i can continue looking

Sylvestre
09:44:25 backout + retry, we will know very soon with that :)

pmoore
09:44:51 is anything downstream consuming these pulse notifications, if we disable them?
09:45:10 do mozmill tests run against betas, for example?

mgerva
09:45:16 mmmh there are 3 "Build of Firefox-32.0b9-build1" and 2 "[release-runner] failed"

pmoore
09:45:31 mgerva: ^^

Sylvestre
09:46:46 mgerva, yes but no email about a potential build

mgerva
09:47:12 Sylvestre: and nothing in /running /pending :(

Sylvestre
09:47:41 fyi, bhearsum warned me about potential issues with the next beta build

pmoore
09:47:55 ok

Sylvestre
09:48:02 I also did some changes but they are before (and seems to work)

pmoore
09:48:08 i'll backout

Sylvestre
09:48:32 ta

pmoore
09:54:05 Sylvestre: i backed out the tools change, wondering if i should also back out the buildbotcustom change
09:54:14 then i'd also need to reconfig

Sylvestre
09:54:27 pmoore, ok
09:54:34 let me know if I can help

pmoore
09:54:38 i'll back out both, to be safe
09:54:52 thanks
Comment on attachment 8469475 [details] [diff] [review]
set event_group property for release builders

Backed out
Attachment #8469475 - Flags: checked-in+ → checked-in-
Comment on attachment 8474786 [details] [diff] [review]
bug1032978.patch

Backed out
Attachment #8474786 - Flags: checked-in+ → checked-in-
Comment on attachment 8474786 [details] [diff] [review]
bug1032978.patch

Updated the release runner checkout for the backout (to rev bcfa18146bc7).

The error we hit was:
[bumps and lands everything, reconfigs masters, then gets to the sendchange ...]
2014-08-21 23:22:22,585 - INFO - updating status for Firefox-32.0b9-build1 to Running sendchange command
2014-08-21 23:22:22,585 - DEBUG - Request to https://ship-it.mozilla.org/releases/Firefox-32.0b9-build1
2014-08-21 23:22:22,585 - DEBUG - Data sent: {'status': 'Running sendchange command', 'csrf_token': ...
2014-08-21 23:22:22,587 - INFO - Resetting dropped connection: ship-it.mozilla.org
2014-08-21 23:22:22,877 - DEBUG - "POST /releases/Firefox-32.0b9-build1 HTTP/1.1" 200 0
2014-08-21 23:22:22,878 - INFO - updating status for Firefox-32.0b9-build1 to Sendchange failed
2014-08-21 23:22:22,878 - DEBUG - Request to https://ship-it.mozilla.org/releases/Firefox-32.0b9-build1
2014-08-21 23:22:22,878 - DEBUG - Data sent: {'status': 'Sendchange failed', 'csrf_token': ...
2014-08-21 23:22:22,930 - DEBUG - "POST /releases/Firefox-32.0b9-build1 HTTP/1.1" 200 0
2014-08-21 23:22:22,930 - ERROR - Sendchange failed for {'status': 'Pending', 'comment': 'Bug 1057061 is still outstanding for beta9 as the patch was leaking ...
'32.0b8build1,32.0b7build1', 'ready': True, 'mozillaRevision': '58eb677e55f3', 'name': 'Firefox-32.0b9-build1'}:
Traceback (most recent call last):
  File "release-runner.py", line 428, in main
    enUSPlatforms = readReleaseConfig(cfgFile)['enUSPlatforms']
  File "/builds/releaserunner/tools/lib/python/release/info.py", line 66, in readReleaseConfig
    return readConfig(configfile, keys=['releaseConfig'], required=required)
  File "/builds/releaserunner/tools/lib/python/release/info.py", line 84, in readConfig
    execfile(configfile, c)
IOError: [Errno 2] No such file or directory: 'release-firefox-mozilla-beta.py'
We also didn't get mail for this error.
Attached patch bug1032978.patch (obsolete) — Splinter Review
Looks like the issue was because cfgFile was *only* the filename, when it needed to be the entire path concated with the file name. I am going to work out a way to test this change.

pulse.patch should still be good, I don't think that has any issues.
Attachment #8474786 - Attachment is obsolete: true
Attachment #8477637 - Flags: review?(bhearsum)
Comment on attachment 8477637 [details] [diff] [review]
bug1032978.patch

Review of attachment 8477637 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/release/release-runner.py
@@ +424,5 @@
> +            cfgFile = path.join(cfgDir, cfgFile)
> +            cfgFile = getReleaseConfigName(
> +                release['product'], path.basename(release['branch']),
> +                release['version'], staging)
> +            enUSPlatforms = readReleaseConfig(cfgFile)['enUSPlatforms']

I don't think this is going to change anything. You set cfgFile, and then you overwrite it with what getReleaseConfigName returns. You need to join these up. You can also use a structure like:
cfgFile = path.join(
    configs_workdir,
    "mozilla",
    ....
)

which might be easier to read than temporary variables
Attachment #8477637 - Flags: review?(bhearsum) → review-
Attached patch bug1032978.patch (obsolete) — Splinter Review
Aha, yeah I did overwrite it didn't I, opps! Fixed it all up and used mgerva's staging-release to run a fake release with the changes and it works! :)
Attachment #8477637 - Attachment is obsolete: true
Attachment #8481635 - Flags: review?(bhearsum)
Testing the patch above with staging-release highlighted the need for a shipit tweak (forms and releases view). These have been tested as well and are needed for the above patch to work :)
Attachment #8481640 - Flags: review?(bhearsum)
Comment on attachment 8481635 [details] [diff] [review]
bug1032978.patch

This looks fine, but it doesn't have the shipit notifier bits now. Please reattach a patch with those included.
Attachment #8481635 - Flags: review?(bhearsum)
Comment on attachment 8481640 [details] [diff] [review]
track enUSPlatforms in ship it

Review of attachment 8481640 [details] [diff] [review]:
-----------------------------------------------------------------

This seems important indeed...
Attachment #8481640 - Flags: review?(bhearsum) → review+
Opps, good catch
Attachment #8481635 - Attachment is obsolete: true
Attachment #8482893 - Flags: review?(bhearsum)
Attachment #8482893 - Flags: review?(bhearsum) → review+
Comment on attachment 8481640 [details] [diff] [review]
track enUSPlatforms in ship it

I landed this and pushed it to production.
Attachment #8481640 - Flags: checked-in+
Comment on attachment 8482893 [details] [diff] [review]
send enUSPlatforms to ship it when starting a release

I landed this and updated the production release runner to use it.
Attachment #8482893 - Flags: checked-in+
Simple patch. Just changes applabel to be 'shipit-notifier' rather than 'quickstart-{}'.format(uuid.uuid4()). Necessary for using durable connection :)
Attachment #8487460 - Flags: review?(bhearsum)
Attachment #8487460 - Flags: review?(bhearsum) → review+
Attachment #8487460 - Flags: checked-in+
So this is landed, but not working yet. We got an error when the first event came in:
2014-09-18 11:09:59,781 msg received - release-mozilla-release-final_verification
2014-09-18 11:09:59,781 u'event_group' - release-mozilla-release-final_verification


Unfortunately, the logging was done with log.error(), so no Traceback was given. We should probably change that log.exception() so it prints one.
Landed a change to shipit_notifier.py that replaces log.error() with log.exception()
Looks like it's because some pulse messages don't have 'event_group' set. I am going to patch this by setting payload[u'group'] to 'other' when the 'event_group' is missing. Hopefully this 'event_group' is not missing in all pulse messages. Did we properly deploy the release_runner changes that added this into the build properties?

Here is the Traceback:
2014-09-18 17:10:27,017 u'event_group' - release-mozilla-beta-firefox_reset_schedulers
Traceback (most recent call last):
  File "/builds/shipit_notifier/tools/buildfarm/release/shipit-notifier.py", line 65, in receive_message
    payload[u'group'] = payload.pop(u'event_group')
KeyError: u'event_group'
(In reply to John Zeller [:zeller] from comment #66)
> Looks like it's because some pulse messages don't have 'event_group' set. I
> am going to patch this by setting payload[u'group'] to 'other' when the
> 'event_group' is missing. Hopefully this 'event_group' is not missing in all
> pulse messages. Did we properly deploy the release_runner changes that added
> this into the build properties?

For the record, release runner isn't involved with build properties. You're talking about a patch to https://github.com/mozilla/build-buildbotcustom/blob/master/process/release.py.

You're right though, it looks like it never got relanded after it was backed out in https://github.com/mozilla/build-buildbotcustom/commit/23a53d0f16ca015f9ccf0eb37b57c8d25e90a8f0.
Attachment #8469475 - Attachment description: pulse.patch → set event_group property for release builders
Attachment #8469475 - Flags: checked-in- → checked-in+
Attachment #8481640 - Attachment description: shipit.patch → track enUSPlatforms in ship it
Attachment #8482893 - Attachment description: bug1032978.patch → send enUSPlatforms to ship it when starting a release
Attachment #8487460 - Attachment description: bug1032978-2.patch → fix pules configuration in ship it notifier
In prod with reconfig on 2014-09-22 08:20 PT
As this blocks bug 826753, this is a relman request for Q4 if possible.
Whiteboard: [shipit] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2471] [shipit]
Mass component change for ship it bugs.
Component: Release Automation → Ship It
Rail fixed up the remaining issues in other bugs. We now have basic status on the Ship It web interface.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Applications: ShipIt (backend) → Applications: ShipIt
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: