Closed Bug 1156296 Opened 9 years ago Closed 9 years ago

make sure buildbot bridge message acking makes sense

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [bbb])

Attachments

(1 file, 1 obsolete file)

Right now, the Buildbot Bridge acks messages from both the Taskcluster and Buildbot queues even if an exception is raised. This probably makes sense for some cases (eg, if a bug in the bridge tries to do something that continually raises an exception, we don't want the message to stick around forever), but maybe not in others. This needs to be given more thought, and the logic adjusted.

It might even be best to delegate message acking to the BuildbotListener and TCListener classes, rather than having the ListenerService base do it.
Selena had some good thoughts on this in bug 1135192:
> What happens when we get a partial failure between acking a pulse message,
> claiming a task and updating a row in mysql? Should that group of events be
> wrapped in try/finally blocks to clean up after itself if there's a partial
> failure?

I think she's right - this (and any other place where we writing changes to multiple systems) should be wrapped, and acks should probably be conditional on successfully completing everything. The only question might be whether or not all of the operations are undo-able....
Attached patch improve message acking (obsolete) — Splinter Review
The philosophy I took here was to ack messages if everything completes successfully, as soon as we do something that's not repeatable (eg, most Taskcluster requests), or if we hit an error in the data (eg, lots of the cases in handleFinished).

I also realized that the buildername checking I recently added to BuildbotListener could be done outside of the loops in handleStarted/Finished, so I moved them there.
Attachment #8609375 - Flags: review?(sdeckelmann)
Comment on attachment 8609375 [details] [diff] [review]
improve message acking

Review of attachment 8609375 [details] [diff] [review]:
-----------------------------------------------------------------

::: bbb/services.py
@@ +58,5 @@
> +        for allowed in self.allowed_builders:
> +            if re.match(allowed, buildername):
> +                log.debug("Builder %s matches an allowed pattern", buildername)
> +                break
> +        else:

Hah! So, if self.allowed_builders is empty do we want this to run? I'm guessing so. That's a funny config issue, but maybe documented? for/else is tricksy. 

What if we turned this into a function instead? 

like: 

def allow_builder(self, buildername):
  is_allowed = False
  for allowed in self.allowed_builders:
     if re.match(allowed, buildername):
        log.debug("Builder %s matches an allowed pattern", buildername)
        is_allowed = True
        break
   return is_allowed

if not self.allow_builder(buildername):
  log.debug("Builder %s does not match any pattern, ignoring it", buildername)
  msg.ack()
  return

@@ +134,5 @@
> +        for allowed in self.allowed_builders:
> +            if re.match(allowed, buildername):
> +                log.debug("Builder %s matches an allowed pattern", buildername)
> +                break
> +        else:

See previous comment about using a function for this guy.

@@ +165,5 @@
>              log.info("Buildbot results are %s", results)
>              if results == SUCCESS:
>                  log.info("Marking task %s as completed", taskid)
>                  self.tc_queue.reportCompleted(taskid, runid)
> +                msg.ack()

Should we be setting acked = True in these clauses? it appears that we'd try to double-ack once we make it through this conditional block.

@@ +201,4 @@
>                  self.bbb_db.deleteBuildRequest(brid)
>  
> +        # If everything went well and the message hasn't been acked, do it. This could
> +        # happen if the "WEIRD" conditions is hit in every iteration of the loop

Is this comment correct? From reading, it seems like we could hit this if results == SKIPPED also.

@@ +358,4 @@
>                  log.debug("Builder %s matches an allowed pattern", buildername)
>                  break
>          else:
>              log.info("Builder %s does not match any pattern, rejecting it", buildername)

See above about for/else :)
(In reply to Selena Deckelmann :selenamarie :selena from comment #3)
> Comment on attachment 8609375 [details] [diff] [review]
> improve message acking
> 
> Review of attachment 8609375 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: bbb/services.py
> @@ +58,5 @@
> > +        for allowed in self.allowed_builders:
> > +            if re.match(allowed, buildername):
> > +                log.debug("Builder %s matches an allowed pattern", buildername)
> > +                break
> > +        else:
> 
> Hah! So, if self.allowed_builders is empty do we want this to run? I'm
> guessing so. That's a funny config issue, but maybe documented? for/else is
> tricksy. 

If allowed_builders is empty I think it make sense that we ignore _everything_. With that being the case I think this code is correct. However...

> What if we turned this into a function instead? 
> 
> like: 
> 
> def allow_builder(self, buildername):
>   is_allowed = False
>   for allowed in self.allowed_builders:
>      if re.match(allowed, buildername):
>         log.debug("Builder %s matches an allowed pattern", buildername)
>         is_allowed = True
>         break
>    return is_allowed
> 
> if not self.allow_builder(buildername):
>   log.debug("Builder %s does not match any pattern, ignoring it",
> buildername)
>   msg.ack()
>   return

...this is much more readable anyways, so it's worth doing.

> @@ +165,5 @@
> >              log.info("Buildbot results are %s", results)
> >              if results == SUCCESS:
> >                  log.info("Marking task %s as completed", taskid)
> >                  self.tc_queue.reportCompleted(taskid, runid)
> > +                msg.ack()
> 
> Should we be setting acked = True in these clauses? it appears that we'd try
> to double-ack once we make it through this conditional block.
> 
> @@ +201,4 @@
> >                  self.bbb_db.deleteBuildRequest(brid)
> >  
> > +        # If everything went well and the message hasn't been acked, do it. This could
> > +        # happen if the "WEIRD" conditions is hit in every iteration of the loop
> 
> Is this comment correct? From reading, it seems like we could hit this if
> results == SKIPPED also.

It's....partially correct :). I'll update it.
I think this addresses all of your concerns. I implemented allow_builders a bit differently, using an early return instead of a break (for short functions this reads better to me). I fixed up the "acked" stuff too, and discovered that I can check msg.acknowledge rather than maintaining a local variable to track it.
Attachment #8609375 - Attachment is obsolete: true
Attachment #8609375 - Flags: review?(sdeckelmann)
Attachment #8609514 - Flags: review?(sdeckelmann)
Attachment #8609514 - Flags: review?(sdeckelmann) → review+
Attachment #8609514 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [bbb]
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: