Closed
Bug 658934
Opened 13 years ago
Closed 11 years ago
When a build is canceled via self-serve the builder needs to clobber
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: khuey, Assigned: catlee)
References
Details
(Keywords: buildapi, sheriffing-P1, Whiteboard: [capacity])
Attachments
(3 files, 2 obsolete files)
4.17 KB, patch | jhopkins: review+ | catlee: checked-in+
25.89 KB, patch | jhopkins: review+ | catlee: checked-in+
724 bytes, patch | jhopkins: review+ | catlee: checked-in+
The self-serve API lets us kill builds, but this appears to leave the slave with broken object directories at times. We've had a bit of weirdness on m-c recently that was fixed by clobbers that I believe can be attributed to this. The self-serve API should schedule the builder for a clobber when a build is killed to avoid any broken object directories gumming up later builds.
Comment 1•13 years ago
|
||
Alternatively, the interruption could be less brutal than SIGKILL.
Comment 2•13 years ago
|
||
(In reply to comment #1)
> Alternatively, the interruption could be less brutal than SIGKILL.
By initiating buildbot's "STOP BUILD" function? Or does that use SIGKILL internally, and is that what you meant / what we use now?
Comment 3•13 years ago
|
||
Dustin will come along and give you the absolutely correct info, but IIRC buildbot is as brutal as SIGKILL when you use Stop Build.
Reporter | ||
Comment 4•13 years ago
|
||
Yeah, being less brutal works for me too.
Comment 6•13 years ago
|
||
Depends on the operating system, among other things, and on the buildslave version. Clobbering-after-stopping is by far the safer option, at any rate.
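For reference, a "less brutal" stop as discussed above usually means asking the process to exit first and only force-killing it after a grace period. The following is a minimal sketch of that pattern, not what buildbot or the buildslave actually does; the function name `stop_gently` and the grace period are assumptions made for illustration. Note that it only signals the direct child, so a `make` process tree could still leave compilers running, which is one reason clobbering after a stop remains the safer option.

```python
import subprocess
import sys


def stop_gently(proc, grace_seconds=5.0):
    """Ask a child process to exit; escalate to a hard kill only if needed.

    Hypothetical helper, not part of buildbot. Returns the child's exit code.
    """
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running build command.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit code:", stop_gently(child))
```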
Updated•13 years ago
|
Assignee: nobody → catlee
Priority: -- → P3
Comment 7•13 years ago
|
||
I cancelled five Android builds; so far, only two of the five have burned their next builds. Of course, only two of the five have *had* next builds...
Assignee | ||
Comment 8•13 years ago
|
||
I chatted with Dustin about this. We think the best approach is to wrap the make invocation with steps that touch a file, and then delete the file afterwards. If the build is cancelled while compiling, the file will exist on the next build run. clobberer.py can then notice this, and run a clobber.
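A rough sketch of the marker-file idea, assuming a hypothetical marker name (`.build-in-progress`) and hypothetical helpers `run_build` and `needs_clobber`; the real change would live in buildbot steps and clobberer.py rather than in one script. The marker is created before the build command and removed once the command returns, so it only survives if the process was killed in between.

```python
import os
import subprocess
import sys
import tempfile

MARKER_NAME = ".build-in-progress"  # hypothetical file name


def needs_clobber(objdir):
    """True if a previous build left its marker behind (it was interrupted)."""
    return os.path.exists(os.path.join(objdir, MARKER_NAME))


def run_build(objdir, command):
    """Run `command` in objdir with the marker present for its duration."""
    marker = os.path.join(objdir, MARKER_NAME)
    with open(marker, "w"):
        pass  # equivalent of `touch`
    returncode = subprocess.call(command, cwd=objdir)
    # Only reached if this process was not killed while the build ran.
    os.remove(marker)
    return returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objdir:
        run_build(objdir, [sys.executable, "-c", "pass"])
        print("clobber needed:", needs_clobber(objdir))
```

In this sketch the marker is removed whether the command succeeds or fails, so an ordinary compile error does not force a clobber; only an interrupted run does. Whether that is the right policy is a design choice the comment above leaves open.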
Comment 9•13 years ago
|
||
adding this to the dep tree for bug 697101 which would allow clobbers to happen with API calls.
Assignee | ||
Updated•12 years ago
|
Assignee: catlee → nobody
Comment 10•12 years ago
|
||
This bit us again for the second time in as many weeks today. Would it be possible to bump the priority of this? :-)
Updated•12 years ago
|
Whiteboard: [selfserve] → [selfserve][sheriff-want]
Comment 12•12 years ago
|
||
The sheriffs fairly frequently have situations where we could have cancelled unnecessary jobs to reduce infra load, but don't, because the "cancel all" button would cause build bustage and cancelling each (non-breaking) job individually is just too tedious. Of this bug, bug 666756, and bug 673246, this one looks like the lowest-hanging fruit for reducing infra load. As such, catlee, what would you think about raising the priority on this one? :-)
Whiteboard: [selfserve][sheriff-want] → [selfserve][sheriff-want][capacity]
Assignee | ||
Updated•12 years ago
|
Priority: P3 → P2
Updated•11 years ago
|
Keywords: sheriffing-P1
Whiteboard: [selfserve][sheriff-want][capacity] → [capacity]
Assignee | ||
Comment 13•11 years ago
|
||
I'd rather not tie self-serve directly to the clobberer db; I'd prefer to use APIs where possible. This patch adds a new API to clobberer where you can pass in slave-$slavename=$buildername as POST parameters, and clobberer will create the appropriate entries in the clobber_times table. This API could then be used by the self-serve agent when cancelling a build. The flow would be something like this:
- user clicks cancel on tbpl
- request goes into self-serve, and is received by the self-serve agent
- the agent figures out which master, builder name, and build number is being cancelled. We already have this logic
- *NEW* the agent scrapes the master's build page to figure out which slave the build is running on
- *NEW* the agent uses the builder name and slave name and the new clobberer API to schedule a clobber
- the agent cancels the build via the buildbot interface
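The flow above could be sketched roughly as follows. Only the `slave-$slavename=$buildername` parameter shape comes from the patch description; the function names and the three injected callables (`find_slave`, `schedule_clobber`, `cancel_build`) are hypothetical stand-ins for the agent's real logic, and no clobberer URL is assumed. The point is the ordering: the clobber is scheduled before the build is cancelled.

```python
from urllib.parse import urlencode


def clobber_request_body(targets):
    """Encode (slavename, buildername) pairs as slave-$slavename=$buildername."""
    return urlencode(
        [("slave-" + slave, builder) for slave, builder in targets]
    )


def cancel_with_clobber(find_slave, schedule_clobber, cancel_build,
                        master, builder, build_number):
    """Schedule a clobber for the slave running a build, then cancel it.

    All three callables are hypothetical and supplied by the caller.
    """
    slave = find_slave(master, builder, build_number)
    schedule_clobber(clobber_request_body([(slave, builder)]))
    cancel_build(master, builder, build_number)


if __name__ == "__main__":
    # Example slave and builder names are made up.
    print(clobber_request_body(
        [("linux-ix-slave01", "mozilla-central linux build")]
    ))
```

Passing the side-effecting steps in as callables keeps the sketch testable without a master or a clobberer instance.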
Attachment #716488 -
Flags: feedback?(jhopkins)
Comment 14•11 years ago
|
||
Comment on attachment 716488: add new API for clobberer
Looks good. Only one nitpick: avoid "select *" when you can, and name the individual fields that are needed. Mostly, this makes the purpose of the query more self-evident.
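To illustrate the nitpick, here is a small sketch using an in-memory SQLite database. The table name clobber_times comes from comment 13; the column names (`slave`, `builder`, `lastclobber`, `who`) and the sample row are assumptions for the example and may not match the real clobberer schema.

```python
import sqlite3


def last_clobber(conn, slave, builder):
    """Return (lastclobber, who) for the newest matching row, or None."""
    # Naming the columns documents what the caller actually needs and keeps
    # the result shape stable if columns are later added to the table.
    return conn.execute(
        "SELECT lastclobber, who FROM clobber_times "
        "WHERE slave = ? AND builder = ? "
        "ORDER BY lastclobber DESC LIMIT 1",
        (slave, builder),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clobber_times ("
    "id INTEGER PRIMARY KEY, slave TEXT, builder TEXT, "
    "lastclobber INTEGER, who TEXT)"  # hypothetical schema
)
conn.execute(
    "INSERT INTO clobber_times (slave, builder, lastclobber, who) "
    "VALUES (?, ?, ?, ?)",
    ("s1", "builderA", 1366000000, "self-serve"),
)
print(last_clobber(conn, "s1", "builderA"))
```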
Attachment #716488 -
Flags: feedback?(jhopkins) → feedback+
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → catlee
Priority: P2 → P3
Assignee | ||
Comment 15•11 years ago
|
||
Attachment #716488 -
Attachment is obsolete: true
Attachment #735161 -
Flags: review?(jhopkins)
Updated•11 years ago
|
Attachment #735161 -
Flags: review?(jhopkins) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #735161 -
Flags: checked-in+
Assignee | ||
Comment 16•11 years ago
|
||
hoping to test this sometime this week. drive-by comments welcome!
Assignee | ||
Comment 17•11 years ago
|
||
I tested this, and it failed because clobberer requires authentication to trigger a clobber. We can get around this by using real credentials for the self-serve agent, or by disabling auth for the agent (but then who shows up in the clobberer log?)
Assignee | ||
Comment 18•11 years ago
|
||
Just spoke with dustin, and I think we have a solution here. We can hit the same HTTP API that the build slaves use, and that doesn't require auth.
Comment 19•11 years ago
|
||
docs updated:
https://mana.mozilla.org/wiki/display/IT/BuildAPI
https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=24805981
Assignee | ||
Comment 20•11 years ago
|
||
Attachment #740526 -
Flags: review?(jhopkins)
Assignee | ||
Comment 21•11 years ago
|
||
Attachment #740527 -
Flags: review?(jhopkins)
Assignee | ||
Updated•11 years ago
|
Attachment #735410 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #740526 -
Flags: review?(jhopkins) → review+
Updated•11 years ago
|
Attachment #740527 -
Flags: review?(jhopkins) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #740526 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Attachment #740527 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering