Closed Bug 658934 Opened 13 years ago Closed 11 years ago

When a build is canceled via self-serve the builder needs to clobber

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: khuey, Assigned: catlee)

References

Details

(Keywords: buildapi, sheriffing-P1, Whiteboard: [capacity])

Attachments

(3 files, 2 obsolete files)

The self-serve API lets us kill builds, but this appears to leave the slave with broken object directories at times.  We've had a bit of weirdness on m-c recently that was fixed by clobbers that I believe can be attributed to this.

The self-serve API should schedule the builder for a clobber when a build is killed to avoid any broken object directories gumming up later builds.
Alternatively, the interruption could be less brutal than SIGKILL.
(In reply to comment #1)
> Alternatively, the interruption could be less brutal than SIGKILL.

By invoking buildbot's "STOP BUILD" function? Or does that use SIGKILL internally, and is that what you meant / what we use now?
Dustin will come along and give you the absolutely correct info, but IIRC buildbot is as brutal as SIGKILL when you use Stop Build.
Yeah, being less brutal works for me too.
dustin ^
Whiteboard: [selfserve]
Depends on the operating system, among other things, and on the buildslave version.  Clobbering-after-stopping is by far the safer option, at any rate.
Assignee: nobody → catlee
Priority: -- → P3
I cancelled five Android builds; so far, only two of the five have burned their next builds. Of course, only two of the five have *had* next builds...
Blocks: 664858
I chatted with Dustin about this. We think the best approach is to wrap the make invocation with steps that touch a file, and then delete the file afterwards. If the build is cancelled while compiling, the file will exist on the next build run. clobberer.py can then notice this, and run a clobber.
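A minimal sketch of the marker-file idea described above, assuming a hypothetical marker name (`.build-in-progress`) and hypothetical helper names (`run_build`, `needs_clobber`). The real change would live in buildbot steps and clobberer.py; this only illustrates the logic.

```python
import os
import subprocess

# Hypothetical marker name; the bug does not specify one.
MARKER = ".build-in-progress"


def needs_clobber(workdir):
    """Return True if a previous build left its marker behind."""
    return os.path.exists(os.path.join(workdir, MARKER))


def run_build(workdir, cmd):
    """Touch the marker, run the build command, then delete the marker.

    If the process is killed while cmd is running, the delete never
    happens, so the marker is still there on the next run.
    """
    marker = os.path.join(workdir, MARKER)
    open(marker, "w").close()           # "touch" step before make
    rc = subprocess.call(cmd, cwd=workdir)
    os.remove(marker)                   # only reached if we were not killed
    return rc
```

The removal is deliberately not in a `finally` block: an interrupted run should leave the marker in place so the next run can detect it and clobber.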
adding this to the dep tree for bug 697101 which would allow clobbers to happen with API calls.
Assignee: catlee → nobody
This bit us again for the second time in as many weeks today. Would it be possible to bump the priority of this? :-)
Blocks: 764460
Whiteboard: [selfserve] → [selfserve][sheriff-want]
The sheriffs fairly frequently have situations where we could have cancelled unnecessary jobs to reduce infra load, but don't because the "cancel all" button would cause build bustage & cancelling each (non-breaking) job is just too tedious.

Of this bug, bug 666756, and bug 673246, this one looks like the lowest-hanging fruit for reducing infra load.

As such catlee, what would you think about raising the priority on this one? :-)
Whiteboard: [selfserve][sheriff-want] → [selfserve][sheriff-want][capacity]
Priority: P3 → P2
Keywords: buildapi
Keywords: sheriffing-P1
Whiteboard: [selfserve][sheriff-want][capacity] → [capacity]
Attached patch add new API for clobberer (obsolete) — Splinter Review
I'd rather not tie self-serve directly to the clobberer db, and use APIs if possible.

This patch adds a new API to clobberer where you can pass in slave-$slavename=$buildername as POST parameters, and clobberer will create the appropriate entries in the clobber_times table.
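For illustration, the POST body for such a request could be built roughly like this. The `slave-$slavename=$buildername` parameter shape comes from the comment above; the function name and the example slave and builder names are made up, and no endpoint URL is assumed.

```python
from urllib.parse import urlencode


def clobber_post_body(slave_builders):
    """Encode (slavename, buildername) pairs as slave-<name>=<builder>.

    slave_builders is an iterable of 2-tuples; the result is a
    form-encoded string suitable for use as a POST body.
    """
    params = [("slave-%s" % slave, builder) for slave, builder in slave_builders]
    return urlencode(params)


# Hypothetical names, for illustration only.
body = clobber_post_body([("linux-slave01", "mozilla-central linux build")])
print(body)
```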

This API could then be used by the self-serve agent when cancelling a build. The flow would be something like this:
- user clicks cancel on tbpl
- request goes into self-serve, and is received by the self-serve agent
- the agent figures out which master, builder name, and build number is being cancelled. we already have this logic
- *NEW* the agent scrapes the master's build page to figure out which slave the build is running on
- *NEW* the agent uses the builder name and slave name and the new clobberer API to schedule a clobber
- the agent cancels the build via the buildbot interface
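The ordering in the flow above could be sketched as follows. This is not the self-serve agent's actual code: `find_slave`, `schedule_clobber`, and `stop_build` are hypothetical callables standing in for the master-page scrape, the clobberer API call, and the buildbot cancel request.

```python
def cancel_with_clobber(master, builder, build_number,
                        find_slave, schedule_clobber, stop_build):
    """Schedule a clobber for the slave running a build, then cancel it.

    All three callables are injected, so this only captures the order
    of operations described in the comment above.
    """
    # Look up the slave while the build is still running.
    slave = find_slave(master, builder, build_number)
    # Schedule the clobber first so it is recorded before the build stops.
    schedule_clobber(slave, builder)
    # Finally cancel the build through the buildbot interface.
    stop_build(master, builder, build_number)
    return slave
```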
Attachment #716488 - Flags: feedback?(jhopkins)
Comment on attachment 716488 [details] [diff] [review]
add new API for clobberer

Looks good.

Only one nitpick, and that's to avoid "select *" when you can and name the individual fields that are needed.  Mostly, this makes the purpose of the query more self-evident.
Attachment #716488 - Flags: feedback?(jhopkins) → feedback+
Assignee: nobody → catlee
Priority: P2 → P3
Attachment #716488 - Attachment is obsolete: true
Attachment #735161 - Flags: review?(jhopkins)
Attachment #735161 - Flags: review?(jhopkins) → review+
Attachment #735161 - Flags: checked-in+
hoping to test this sometime this week. drive-by comments welcome!
Blocks: 666756
I tested this, and it failed because clobberer requires authentication to trigger a clobber.

We can get around this by using real credentials for self-serve agent here, or by disabling auth for the agent (but then who shows up in the clobberer log?)
Just spoke with dustin, and I think we have a solution here. We can hit the same HTTP API that the build slaves use, and that doesn't require auth.
Attachment #740526 - Flags: review?(jhopkins)
Attachment #740527 - Flags: review?(jhopkins)
Attachment #735410 - Attachment is obsolete: true
Attachment #740526 - Flags: review?(jhopkins) → review+
Attachment #740527 - Flags: review?(jhopkins) → review+
Attachment #740526 - Flags: checked-in+
Attachment #740527 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 867171
Product: mozilla.org → Release Engineering