Closed
Bug 658934
Opened 14 years ago
Closed 12 years ago
When a build is canceled via self-serve the builder needs to clobber
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: khuey, Assigned: catlee)
References
Details
(Keywords: buildapi, sheriffing-P1, Whiteboard: [capacity])
Attachments
(3 files, 2 obsolete files)
4.17 KB, patch | jhopkins: review+ | catlee: checked-in+
25.89 KB, patch | jhopkins: review+ | catlee: checked-in+
724 bytes, patch | jhopkins: review+ | catlee: checked-in+
The self-serve API lets us kill builds, but this appears to leave the slave with broken object directories at times. We've had a bit of weirdness on m-c recently that was fixed by clobbers that I believe can be attributed to this.
The self-serve API should schedule the builder for a clobber when a build is killed to avoid any broken object directories gumming up later builds.
Comment 1•14 years ago
Alternatively, the interruption could be less brutal than SIGKILL.
Comment 2•14 years ago
(In reply to comment #1)
> Alternatively, the interruption could be less brutal than SIGKILL.
By invoking buildbot's "STOP BUILD" function? Or does that use SIGKILL internally, and is that what you meant and what we use now?
Comment 3•14 years ago
Dustin will come along and give you the absolutely correct info, but IIRC buildbot is as brutal as SIGKILL when you use Stop Build.
| Reporter | ||
Comment 4•14 years ago
Yeah, being less brutal works for me too.
Comment 6•14 years ago
Depends on the operating system, among other things, and on the buildslave version. Clobbering-after-stopping is by far the safer option, at any rate.
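For illustration only, a "less brutal" interruption usually means asking the process to exit first and forcing it only if it does not. The sketch below uses Python's standard `subprocess` module; it is not what buildbot or the buildslave does, and the `stop_gently` name and the grace period are assumptions made up for this example.

```python
import subprocess


def stop_gently(proc, grace_seconds=10.0):
    """Ask proc to exit; force-kill only if it ignores the request.

    Returns True if the forced kill was needed, False otherwise.
    """
    proc.terminate()  # SIGTERM on POSIX, so the process may clean up
    try:
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, cannot be caught or ignored
        proc.wait()
        return True
```

Even with a grace period, a compiler or linker interrupted mid-write can leave partial output behind, which is consistent with the point that clobbering after a stop is the safer option.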
Updated•14 years ago
Assignee: nobody → catlee
Priority: -- → P3
Comment 7•14 years ago
I cancelled five Android builds; so far, only two of the five have burned their next builds. Of course, only two of the five have *had* next builds...
| Assignee | ||
Comment 8•14 years ago
I chatted with Dustin about this. We think the best approach is to wrap the make invocation with steps that touch a file, and then delete the file afterwards. If the build is cancelled while compiling, the file will exist on the next build run. clobberer.py can then notice this, and run a clobber.
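A rough sketch of that marker-file idea, assuming a hypothetical marker name and helper functions (none of these names come from clobberer.py or the buildbot configs): the marker is created before the build command runs and removed only if the command finishes, so a leftover marker on the next run signals an interrupted build.

```python
import contextlib
import os
import subprocess

# Hypothetical marker name; the bug does not say what the file is called.
MARKER_NAME = ".build-in-progress"


def needs_clobber(objdir):
    """Return True if a previous build left its marker behind."""
    return os.path.exists(os.path.join(objdir, MARKER_NAME))


@contextlib.contextmanager
def build_marker(objdir):
    """Create the marker before the build; remove it only on a clean exit."""
    marker = os.path.join(objdir, MARKER_NAME)
    with open(marker, "w"):
        pass  # equivalent of `touch`
    yield marker
    # Reached only if the body finished without raising. An interrupted
    # build never gets here, so the marker survives for the next run.
    os.remove(marker)


def run_build(objdir, command):
    """Run the build command (for example make) wrapped by the marker."""
    with build_marker(objdir):
        subprocess.check_call(command, cwd=objdir)
```

Note that this sketch also leaves the marker behind when the command itself fails; whether the real steps should clobber after an ordinary compile failure is not stated in the bug.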
Comment 9•14 years ago
Adding this to the dependency tree for bug 697101, which would allow clobbers to happen with API calls.
| Assignee | ||
Updated•14 years ago
Assignee: catlee → nobody
Comment 10•13 years ago
This bit us again today, for the second time in as many weeks. Would it be possible to bump the priority of this? :-)
Updated•13 years ago
Whiteboard: [selfserve] → [selfserve][sheriff-want]
Comment 12•13 years ago
The sheriffs fairly frequently have situations where we could have cancelled unnecessary jobs to reduce infra load, but don't because the "cancel all" button would cause build bustage & cancelling each (non-breaking) job is just too tedious.
Of this bug, bug 666756, and bug 673246, this one looks like the lowest-hanging fruit for reducing infra load.
As such catlee, what would you think about raising the priority on this one? :-)
Whiteboard: [selfserve][sheriff-want] → [selfserve][sheriff-want][capacity]
| Assignee | ||
Updated•13 years ago
Priority: P3 → P2
Updated•13 years ago
Keywords: sheriffing-P1
Whiteboard: [selfserve][sheriff-want][capacity] → [capacity]
| Assignee | ||
Comment 13•12 years ago
I'd rather not tie self-serve directly to the clobberer db, and use APIs if possible.
This patch adds a new API to clobberer where you can pass in slave-$slavename=$buildername as POST parameters, and clobberer will create the appropriate entries in the clobber_times table.
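As a sketch of what such a request body might look like, the helper below encodes (slave name, builder name) pairs in the `slave-$slavename=$buildername` form using the standard library. The function name and the sample values are hypothetical, and the endpoint URL is deliberately left out because the comment does not give one.

```python
from urllib.parse import urlencode


def clobber_form_body(requests):
    """Encode (slavename, buildername) pairs as slave-<slavename>=<buildername>."""
    return urlencode([("slave-" + slave, builder) for slave, builder in requests])


# Hypothetical slave and builder names, for illustration only.
body = clobber_form_body([("slave01", "linux build")])
```

Passing a list of pairs rather than a dict keeps the order stable and allows several slaves in one POST.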
This API could then be used by the self-serve agent when cancelling a build. The flow would be something like this:
- user clicks cancel on tbpl
- request goes into self-serve, and is received by the self-serve agent
- the agent figures out which master, builder name, and build number is being cancelled (we already have this logic)
- *NEW* the agent scrapes the master's build page to figure out which slave the build is running on
- *NEW* the agent uses the builder name and slave name and the new clobberer API to schedule a clobber
- the agent cancels the build via the buildbot interface
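The ordering of the agent-side steps above could be sketched as follows. The three callables are hypothetical stand-ins for logic the bug only describes (scraping the master's build page, calling the clobberer API, cancelling through the buildbot interface); they are injected so the sketch assumes nothing about the real self-serve code.

```python
def cancel_with_clobber(master, builder, build_number,
                        find_slave, schedule_clobber, cancel_build):
    """Schedule a clobber for the slave running a build, then cancel the build.

    find_slave, schedule_clobber and cancel_build are hypothetical
    callables supplied by the caller; none are real self-serve APIs.
    """
    # Look up the slave while the build is still running.
    slave = find_slave(master, builder, build_number)
    # Schedule the clobber first, so a failure here is seen before the cancel.
    schedule_clobber(slave, builder)
    # Only then interrupt the build.
    cancel_build(master, builder, build_number)
    return slave
```

Scheduling the clobber before cancelling matches the order of the list; it also means the clobber request is already recorded by the time the slave picks up its next job.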
Attachment #716488 -
Flags: feedback?(jhopkins)
Comment 14•12 years ago
Comment on attachment 716488 [details] [diff] [review]
add new API for clobberer
Looks good.
Only one nitpick, and that's to avoid "select *" when you can and name the individual fields that are needed. Mostly, this makes the purpose of the query more self-evident.
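To illustrate the nitpick, here is a small self-contained sketch using an in-memory SQLite database. Only the `clobber_times` table name comes from this bug; the column names and sample row are assumptions invented for the example and may not match the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the table name is taken from the bug.
conn.execute(
    "CREATE TABLE clobber_times ("
    "id INTEGER PRIMARY KEY, slave TEXT, buildername TEXT, "
    "lastclobber INTEGER, who TEXT)"
)
conn.execute(
    "INSERT INTO clobber_times (slave, buildername, lastclobber, who) "
    "VALUES (?, ?, ?, ?)",
    ("slave01", "linux-build", 1366000000, "selfserve"),
)


def last_clobber(conn, slave, buildername):
    """Fetch only the fields the caller needs, instead of SELECT *."""
    return conn.execute(
        "SELECT lastclobber, who FROM clobber_times "
        "WHERE slave = ? AND buildername = ?",
        (slave, buildername),
    ).fetchone()
```

Naming the columns documents what the caller depends on and keeps the result shape stable if columns are later added to the table.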
Attachment #716488 -
Flags: feedback?(jhopkins) → feedback+
| Assignee | ||
Updated•12 years ago
Assignee: nobody → catlee
Priority: P2 → P3
| Assignee | ||
Comment 15•12 years ago
Attachment #716488 -
Attachment is obsolete: true
Attachment #735161 -
Flags: review?(jhopkins)
Updated•12 years ago
Attachment #735161 -
Flags: review?(jhopkins) → review+
| Assignee | ||
Updated•12 years ago
Attachment #735161 -
Flags: checked-in+
| Assignee | ||
Comment 16•12 years ago
hoping to test this sometime this week. drive-by comments welcome!
| Assignee | ||
Comment 17•12 years ago
I tested this, and it failed because clobberer requires authentication to trigger a clobber.
We can get around this by using real credentials for self-serve agent here, or by disabling auth for the agent (but then who shows up in the clobberer log?)
| Assignee | ||
Comment 18•12 years ago
Just spoke with dustin, and I think we have a solution here. We can hit the same HTTP API that the build slaves use, and that doesn't require auth.
Comment 19•12 years ago
| Assignee | ||
Comment 20•12 years ago
Attachment #740526 -
Flags: review?(jhopkins)
| Assignee | ||
Comment 21•12 years ago
Attachment #740527 -
Flags: review?(jhopkins)
| Assignee | ||
Updated•12 years ago
Attachment #735410 -
Attachment is obsolete: true
Updated•12 years ago
Attachment #740526 -
Flags: review?(jhopkins) → review+
Updated•12 years ago
Attachment #740527 -
Flags: review?(jhopkins) → review+
| Assignee | ||
Updated•12 years ago
Attachment #740526 -
Flags: checked-in+
| Assignee | ||
Updated•12 years ago
Attachment #740527 -
Flags: checked-in+
| Assignee | ||
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Product: mozilla.org → Release Engineering