Closed
Bug 1156301
Opened 9 years ago
Closed 9 years ago
be able to cancel buildbot builds through taskcluster
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
References
Details
Attachments
(2 files)
3.58 KB,
patch
|
dustin
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
22.11 KB,
patch
|
selenamarie
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
As things stand right now, Buildbot jobs that have a corresponding TaskCluster task can be cancelled in TaskCluster, which will put the task in an exception state, but the Buildbot Build will continue to run. The Buildbot bridge should be able to cancel the corresponding Build to avoid unnecessary work (and potential confusion when a "cancelled" job completes). One complication here might be how to attach artifacts for the cancelled job. IIRC, Taskcluster immediately puts the task in an exception state, which means we can't attach any artifacts to it, which Buildbot (and the bridge) would want to do after the Build stops.
Comment 1•9 years ago
|
||
1) a task-exception message is published when a task is cancelled, identified with {reason: 'canceled'}. So being notified shouldn't be an issues. 2) As for artifacts after task resolution see: bug 1148860.
Assignee | ||
Comment 2•9 years ago
|
||
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #1) > 1) a task-exception message is published when a task is cancelled, > identified with {reason: 'canceled'}. > So being notified shouldn't be an issues. > > 2) As for artifacts after task resolution see: bug 1148860. Thanks Jonas!
Depends on: 1148860
Assignee | ||
Comment 3•9 years ago
|
||
I've got a patch in the works for this. I thought this would depend on bug 1156810, but it turns out we can work around it by just setting "X-Remote-User" on the client side.
Assignee | ||
Comment 4•9 years ago
|
||
I have an untested patch for this at https://github.com/bhearsum/bbb/compare/tc-cancel?expand=1
Assignee | ||
Comment 5•9 years ago
|
||
I still need to finish testing my patch for the bridge, but this is certainly needed.
Attachment #8607586 -
Flags: review?(dustin)
Updated•9 years ago
|
Attachment #8607586 -
Flags: review?(dustin) → review+
Assignee | ||
Updated•9 years ago
|
Attachment #8607586 -
Flags: checked-in+
Assignee | ||
Comment 6•9 years ago
|
||
Selena, I'm not sure if you're comfortable reviewing this since it deals with a lot of interactions with BuildAPI, but you're probably the best reviewer for the bridge parts! If you want someone else to validate the buildapi stuff just let me know. I don't have too much to say up front about this patch - there's a lot of inline comments about how it works and the edge cases that I know it won't deal with just yet. The high level overview is that this patch gets the TCListener paying attention to cancellation events sent by Taskcluster (via the task-exception exchange), and cancels the relevant Buildbot Build or BuildRequest upon receiving one. There's a little bit of hackiness around the branch names because "branch" means different things to different systems. Taskcluster generally talks about full URLs to repos, Buildbot generally uses the path part of the URL (eg: integration/mozilla-inbound), and BuildAPI uses the "short" name (eg: mozilla-inbound). So there's some translation that needs to happen at times.
Attachment #8610121 -
Flags: review?(sdeckelmann)
Assignee | ||
Comment 7•9 years ago
|
||
Turns out that bug 1148860 doesn't block us here, but we should make some improvements after it's fixed.
No longer depends on: 1148860
Comment 8•9 years ago
|
||
Comment on attachment 8610121 [details] [diff] [review] support cancellation of buildbot builds/buildrequests through taskcluster Review of attachment 8610121 [details] [diff] [review]: ----------------------------------------------------------------- A couple nits on documentation and TODOs. One question I had was whether BuildAPI returns immedidately for DELETE calls. It's a TODO rather than a refactor if so. ::: bbb/runner.py @@ -76,5 @@ > > signal(SIGTERM, handle_sigterm) > > log.info("Running %s service", args.service[0]) > - # TODO: If we're not going to run with supervisor or something similar, Did something change here? ::: bbb/servicebase.py @@ +30,5 @@ > + # background on this. > + url = "%s/%s" % (self.base_uri, url) > + log.debug("Making %s request to %s", method, url) > + r = requests.request(method, url, headers={"X-Remote-User": "buildbot-bridge"}) > + # TODO: should we raise here? Maybe return the return code so the consumer We are raising-- maybe revise this TODO. Do consumers want/need to explicitly know about a 200? @@ +39,5 @@ > + log.exception("Caught exception:") > + > + def cancelBuild(self, branch, id_): > + url = "%s/build/%s" % (branch, id_) > + self._do_request("DELETE", url) I'm assuming this returns immediately? If so, do we need to check that it worked? If not, could this process get stalled for a while waiting for the task to complete? ::: bbb/services.py @@ +158,5 @@ > expires = arrow.now().replace(weeks=1).isoformat() > + try: > + createJsonArtifact(self.tc_queue, taskid, runid, "public/properties.json", properties, expires) > + except TaskclusterRestFailure: > + # Probably tried to create an artifact for a completed job. This can be reworked Do you want a TODO marker here? @@ +160,5 @@ > + createJsonArtifact(self.tc_queue, taskid, runid, "public/properties.json", properties, expires) > + except TaskclusterRestFailure: > + # Probably tried to create an artifact for a completed job. This can be reworked > + # after bug 1148860 is fixed. > + log.exception("Caught exception when creating an artifact for %s, not retrying...", taskid) If we think this is most likely a completed job, could we also put that into the log message? @@ +207,5 @@ > # TODO: runid might be wrong for the rerun for a period of time because we don't update it > # until the TCListener gets the task-pending event. Maybe we should update it here too/instead? > self.tc_queue.rerunTask(taskid) > elif results == CANCELLED: > + # We could end up in this block for two different reasons: Excellent explanation!
Attachment #8610121 -
Flags: review?(sdeckelmann) → review+
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to Selena Deckelmann :selenamarie :selena from comment #8) > One question I had was whether BuildAPI returns immedidately for DELETE > calls. It's a TODO rather than a refactor if so. Yeah, it returns right away and processes the request in another process. > ::: bbb/runner.py > @@ -76,5 @@ > > > > signal(SIGTERM, handle_sigterm) > > > > log.info("Running %s service", args.service[0]) > > - # TODO: If we're not going to run with supervisor or something similar, > > Did something change here? Naw, just removing this comment because we settled on supervisord, so it's not relevant anymore. > ::: bbb/servicebase.py > @@ +30,5 @@ > > + # background on this. > > + url = "%s/%s" % (self.base_uri, url) > > + log.debug("Making %s request to %s", method, url) > > + r = requests.request(method, url, headers={"X-Remote-User": "buildbot-bridge"}) > > + # TODO: should we raise here? Maybe return the return code so the consumer > > We are raising-- maybe revise this TODO. Do consumers want/need to > explicitly know about a 200? Actually, I don't think we're raising because this code ends up catching the RequestException...
Comment 10•9 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #9) > (In reply to Selena Deckelmann :selenamarie :selena from comment #8) > > ::: bbb/servicebase.py > > @@ +30,5 @@ > > > + # background on this. > > > + url = "%s/%s" % (self.base_uri, url) > > > + log.debug("Making %s request to %s", method, url) > > > + r = requests.request(method, url, headers={"X-Remote-User": "buildbot-bridge"}) > > > + # TODO: should we raise here? Maybe return the return code so the consumer > > > > We are raising-- maybe revise this TODO. Do consumers want/need to > > explicitly know about a 200? > > Actually, I don't think we're raising because this code ends up catching the > RequestException... Oh sorry. Thinko. Thanks. Nothing to add. I haven't thought through whether that would be helpful for the consumer :/
Assignee | ||
Comment 11•9 years ago
|
||
Comment on attachment 8610121 [details] [diff] [review] support cancellation of buildbot builds/buildrequests through taskcluster I improved the comments/log message around the failure to create artifacts in https://github.com/bhearsum/bbb/commit/b94d4b678d8cf16a29cb4f577ced68000074d459. I'm going to bump the version number and get this deployed now!
Attachment #8610121 -
Flags: checked-in+
Assignee | ||
Comment 12•9 years ago
|
||
Deployed in production!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•