Closed
Bug 1156296
Opened 9 years ago
Closed 9 years ago
make sure buildbot bridge message acking makes sense
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
References
Details
(Whiteboard: [bbb])
Attachments
(1 file, 1 obsolete file)
13.10 KB,
patch
|
selenamarie
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
Right now, the Buildbot Bridge acks messages from both the Taskcluster and Buildbot queues even if an exception is raised. This probably makes sense for some cases (eg, if a bug in the bridge tries to do something that continually raises an exception, we don't want the message to stick around forever), but maybe not in others. This needs to be given more thought, and the logic adjusted. It might even be best to delegate message acking to the BuildbotListener and TCListener classes, rather than having the ListenerService base do it.
Assignee | ||
Comment 1•9 years ago
|
||
Selena had some good thoughts on this in bug 1135192: > What happens when we get a partial failure between acking a pulse message, > claiming a task and updating a row in mysql? Should that group of events be > wrapped in try/finally blocks to clean up after itself if there's a partial > failure? I think she's right - this (and any other place where we writing changes to multiple systems) should be wrapped, and acks should probably be conditional on successfully completing everything. The only question might be whether or not all of the operations are undo-able....
Assignee | ||
Comment 2•9 years ago
|
||
The philosophy I took here was to ack messages if everything completes successfully, as soon as we do something that's not repeatable (eg, most Taskcluster requests), or if we hit an error in the data (eg, lots of the cases in handleFinished). I also realized that the buildername checking I recently added to BuildbotListener could be done outside of the loops in handleStarted/Finished, so I moved them there.
Attachment #8609375 -
Flags: review?(sdeckelmann)
Comment 3•9 years ago
|
||
Comment on attachment 8609375 [details] [diff] [review] improve message acking Review of attachment 8609375 [details] [diff] [review]: ----------------------------------------------------------------- ::: bbb/services.py @@ +58,5 @@ > + for allowed in self.allowed_builders: > + if re.match(allowed, buildername): > + log.debug("Builder %s matches an allowed pattern", buildername) > + break > + else: Hah! So, if self.allowed_builders is empty do we want this to run? I'm guessing so. That's a funny config issue, but maybe documented? for/else is tricksy. What if we turned this into a function instead? like: def allow_builder(self, buildername): is_allowed = False for allowed in self.allowed_builders: if re.match(allowed, buildername): log.debug("Builder %s matches an allowed pattern", buildername) is_allowed = True break return is_allowed if not self.allow_builder(buildername): log.debug("Builder %s does not match any pattern, ignoring it", buildername) msg.ack() return @@ +134,5 @@ > + for allowed in self.allowed_builders: > + if re.match(allowed, buildername): > + log.debug("Builder %s matches an allowed pattern", buildername) > + break > + else: See previous comment about using a function for this guy. @@ +165,5 @@ > log.info("Buildbot results are %s", results) > if results == SUCCESS: > log.info("Marking task %s as completed", taskid) > self.tc_queue.reportCompleted(taskid, runid) > + msg.ack() Should we be setting acked = True in these clauses? it appears that we'd try to double-ack once we make it through this conditional block. @@ +201,4 @@ > self.bbb_db.deleteBuildRequest(brid) > > + # If everything went well and the message hasn't been acked, do it. This could > + # happen if the "WEIRD" conditions is hit in every iteration of the loop Is this comment correct? From reading, it seems like we could hit this if results == SKIPPED also. @@ +358,4 @@ > log.debug("Builder %s matches an allowed pattern", buildername) > break > else: > log.info("Builder %s does not match any pattern, rejecting it", buildername) See above about for/else :)
Assignee | ||
Comment 4•9 years ago
|
||
(In reply to Selena Deckelmann :selenamarie :selena from comment #3) > Comment on attachment 8609375 [details] [diff] [review] > improve message acking > > Review of attachment 8609375 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: bbb/services.py > @@ +58,5 @@ > > + for allowed in self.allowed_builders: > > + if re.match(allowed, buildername): > > + log.debug("Builder %s matches an allowed pattern", buildername) > > + break > > + else: > > Hah! So, if self.allowed_builders is empty do we want this to run? I'm > guessing so. That's a funny config issue, but maybe documented? for/else is > tricksy. If allowed_builders is empty I think it make sense that we ignore _everything_. With that being the case I think this code is correct. However... > What if we turned this into a function instead? > > like: > > def allow_builder(self, buildername): > is_allowed = False > for allowed in self.allowed_builders: > if re.match(allowed, buildername): > log.debug("Builder %s matches an allowed pattern", buildername) > is_allowed = True > break > return is_allowed > > if not self.allow_builder(buildername): > log.debug("Builder %s does not match any pattern, ignoring it", > buildername) > msg.ack() > return ...this is much more readable anyways, so it's worth doing. > @@ +165,5 @@ > > log.info("Buildbot results are %s", results) > > if results == SUCCESS: > > log.info("Marking task %s as completed", taskid) > > self.tc_queue.reportCompleted(taskid, runid) > > + msg.ack() > > Should we be setting acked = True in these clauses? it appears that we'd try > to double-ack once we make it through this conditional block. > > @@ +201,4 @@ > > self.bbb_db.deleteBuildRequest(brid) > > > > + # If everything went well and the message hasn't been acked, do it. This could > > + # happen if the "WEIRD" conditions is hit in every iteration of the loop > > Is this comment correct? From reading, it seems like we could hit this if > results == SKIPPED also. It's....partially correct :). I'll update it.
Assignee | ||
Comment 5•9 years ago
|
||
I think this addresses all of your concerns. I implemented allow_builders a bit differently, using an early return instead of a break (for short functions this reads better to me). I fixed up the "acked" stuff too, and discovered that I can check msg.acknowledge rather than maintaining a local variable to track it.
Attachment #8609375 -
Attachment is obsolete: true
Attachment #8609375 -
Flags: review?(sdeckelmann)
Attachment #8609514 -
Flags: review?(sdeckelmann)
Updated•9 years ago
|
Attachment #8609514 -
Flags: review?(sdeckelmann) → review+
Assignee | ||
Updated•9 years ago
|
Attachment #8609514 -
Flags: checked-in+
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•9 years ago
|
Whiteboard: [bbb]
Updated•6 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•