Closed Bug 614576 Opened 14 years ago Closed 13 years ago

send build/test events to pulse

Categories

(Release Engineering :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: catlee)

References

Details

(Whiteboard: [pulse])

Attachments

(3 files, 3 obsolete files)

Work on this is already underway as I understand it. Filing bug to track it.
Whiteboard: [pulse]
Attached patch WIP, mostly tested (obsolete) — Splinter Review
The PulseStatus implementation.

This doesn't address some of the naming recommendations made on the list, but I'm not sure this is right place for that.  Bear's idea of an external message translator makes more sense.
Attachment #493015 - Flags: feedback?(dustin)
Attachment #493015 - Flags: feedback?(bhearsum)
Comment on attachment 493015 [details] [diff] [review]
WIP, mostly tested

It seems like the time/utc stuff belongs in a more general purpose file.

What happens if an item of self.ignores isn't a string or a class with a match method? Does that get handled gracefully at runtime? ignores should probably be renamed to ignoredBuilders, too.

Style nit: Move the ignore stuff to the bottom.

I'm not really knowledgeable enough about pulse to know if the routing keys are sensible. Maybe Christian should have a look over those?
Attachment #493015 - Flags: feedback?(bhearsum) → feedback+
Comment on attachment 493015 [details] [diff] [review]
WIP, mostly tested

I think you want d.addBoth here:
        d.addCallbacks(_finished)
furthermore, _finished should return its argument - otherwise it will end up silently ignoring exceptions.

pushEvents is only *sometimes* called in an asynchronous fashion (expecting a deferred), and when it's not, any failure will become an unhandled exception.  So you should probably add
        d.addErrback(log.err)
after d.addBoth(_finished).

Are the message ID's ordered from least to most specific?  It seems like you would want names like
  builder.$builder.$buildnum.started
  builder.$builder.$buildnum.$stepname.started
  builder.$builder.$buildnum.$stepname.log.$logname.started # keep room for substeps
  builder.$builder.$buildnum.$stepname.log.$logname.chunk
  builder.$builder.$buildnum.$stepname.log.$logname.finished
  builder.$builder.$buildnum.$stepname.stopped
  builder.$builder.$buildnum.finished
  slave.$slavename.{dis,}connected
  change.$changenum.added
etc.  Then a wildcard could filter for everything involving, say, "builder.mylinuxbuilder.*"
Attachment #493015 - Flags: feedback?(dustin) → feedback+
Attached patch PulseStatus implementation (obsolete) — Splinter Review
All of your comments have been addressed.  I named the routing keys 'build.X.Y.Z' instead of 'builder.X.Y.Z' because I think that is a better description of what the data is about.

I also removed a lot of redundant data from certain events, particularly log events.
Attachment #493015 - Attachment is obsolete: true
Attachment #493994 - Flags: review?(dustin)
Attachment #493994 - Flags: review?(bhearsum)
(In reply to comment #4)
> Created attachment 493994 [details] [diff] [review]
> PulseStatus implementation
> 
> All of your comments have been addressed.  I named the routing keys
> 'build.X.Y.Z' instead of 'builder.X.Y.Z' because I think that is a better
> description of what the data is about.

+1 to build.X.Y.Z
Comment on attachment 493994 [details] [diff] [review]
PulseStatus implementation

Per IRC, something like "assert hasattr(i, 'match') and callable(i.match)" would be good in __init__, for the ignores, and builderAdded should use try/except just in case match() doesn't take the args we expect it to.

r=me with that change
Attachment #493994 - Flags: review?(bhearsum) → review+
carrying over bhearsum's r+
Attachment #493994 - Attachment is obsolete: true
Attachment #494081 - Flags: review?(dustin)
Attachment #493994 - Flags: review?(dustin)
Comment on attachment 494081 [details] [diff] [review]
PulseStatus implementation

looks good
Attachment #494081 - Flags: review?(dustin) → review+
Blocks: 610915
2. The RelEng side should send a notification email to an email list if it
can't connect to or send the message via Pulse. This makes it so if
pulse.mozilla.org does go down, it won't affect anything RelEng related. We can
later manually craft messages and send them if need be.
(In reply to comment #9)
> 2. The RelEng side should send a notification email to an email list if it
> can't connect to or send the message via Pulse. This makes it so if
> pulse.mozilla.org does go down, it won't affect anything RelEng related. We can
> later manually craft messages and send them if need be.

Whoops, forgot to mention that this is a request from Christian/QA copied from bug 610915.
Also FYI, there is a native twisted library for AMQP if you want to drop down to
that level rather than using my python helper lib.
(In reply to comment #10)
> (In reply to comment #9)
> > 2. The RelEng side should send a notification email to an email list if it
> > can't connect to or send the message via Pulse. This makes it so if
> > pulse.mozilla.org does go down, it won't affect anything RelEng related. We can
> > later manually craft messages and send them if need be.
> 
> Whoops, forgot to mention that this is a request from Christian/QA copied from
> bug 610915.

The masters communicate to pulse in a separate thread, so it shouldn't affect regular operation of the master.  It also buffers up to 10,000 messages by default before dropping them.  Any errors are logged into twisted.log, which is picked up by our exception watcher.  Do we need anything more than this?
If PULSE_PASSWORD is false, we won't load the plugin.

I think we'll need to restart masters to pick up the new carrot/mozillapulse dependencies.
Attachment #496062 - Flags: review?(bhearsum)
Attachment #496062 - Flags: review?(bhearsum) → review+
Same as before, with the addition of PULSE_USERNAME.
Attachment #496062 - Attachment is obsolete: true
Attachment #496213 - Flags: review?(bhearsum)
Attachment #496213 - Flags: review?(bhearsum) → review+
(In reply to comment #12)
> The masters communicate to pulse in a separate thread, so it shouldn't affect
> regular operation of the master.  It also buffers up to 10,000 messages by
> default before dropping them.  Any errors are logged into twisted.log, which is
> picked up by our exception watcher.  Do we need anything more than this?

Nope, sounds great.
mozillapulse + dependencies installed on all our production buildbot masters.
Comment on attachment 496213 [details] [diff] [review]
Turn Pulse on in master_common.py

changeset:   3403:d794e3d8d80b

Checked into default with a minor tweak of using hasattr(passwords, 'PULSE_PASSWORD') instead of if passwords.PULSE_PASSWORD
Attachment #496213 - Flags: checked-in+
Comment on attachment 494081 [details] [diff] [review]
PulseStatus implementation

On default changeset:   1234:bc6013743b17
Attachment #494081 - Flags: checked-in+
Depends on: 618062
Attachment #497378 - Flags: review?(bhearsum)
Attachment #497378 - Flags: review?(bhearsum) → review+
Flags: needs-reconfig?
Comment on attachment 497378 [details] [diff] [review]
Replace spaces with underscores in routing keys

Checked in on default: changeset:   1257:4e63a612d5ac
Attachment #497378 - Flags: checked-in+
Comment on attachment 497378 [details] [diff] [review]
Replace spaces with underscores in routing keys

Landed on default and production-0.8.
Clearing reconfig flag
Flags: needs-reconfig?
This is on everywhere now.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 659776
Blocks: 660105
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: