Since we seem to be heading back into another round of "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." it would be nice if we auto-retried, rather than randomly manually deciding whether or not someone might want a particular test result badly enough to poke buildduty about it.
I still haven't gotten a chance to look at this, but I will =\.
Hmmm, been digging into the Buildbot code and it looks like the Build object, which is responsible for deciding on overall Build status, doesn't work very well when an earlier step sets a worse status than a subsequent. In these cases, we have an early-ish step setting RETRY, and then once the slave disconnects, buildbot dumbly tries to run any steps marked as alwaysRun=True. Those fail out with an exception because the slave is already gone, and the Build object chooses to set the overall status as EXCEPTION rather than retry. So, there's two upstream bugs I'm going to address here: - Build object needs to be smarter about setting overall build status - Shouldn't run alwaysRun steps if a build fails out due to a slave disconnecting
Created attachment 481281 [details] [diff] [review] fix buildbot to stop overriding RETRY This patch is a combination of my two upstream buildbot commits that should resolve this bug: http://github.com/buildbot/buildbot/commit/8556b470586c72c5788587fec427959e9fe5a553 http://github.com/buildbot/buildbot/commit/90e38d541ce3114d3cd0faa38d1dfa0b0b713329
Comment on attachment 481281 [details] [diff] [review] fix buildbot to stop overriding RETRY survey of IRC told me I should still get review on this, even though it's upstream. Nick or Rail, either of you feel comfortable doing this?
Comment on attachment 481281 [details] [diff] [review] fix buildbot to stop overriding RETRY I don't have my head in the buildbot codebase so can't review this. However I would be interested in how the test results change with and without the first hunk. Just testStepDoneRetryOverridesAnythingElse ? And with 6 flavors of *On* settings, do you have a enough tests to make sure no combination regresses ?
I'm pretty certain I'm not changing anything subtle based on reading the code, but it's pretty trivial to add more tests -- I'll do that.
Created attachment 481577 [details] [diff] [review] fix regression, add more tests Good idea indeed, Nick. I ended up catching a regression. Here's an all inclusive patch that tests at least the default and non-default states of each of those flags. Testing every possible combination isn't _quite_ possible (5 step results ^ 6 flags ^ 2), but this covers a much better portion than before.
Comment on attachment 481577 [details] [diff] [review] fix regression, add more tests Looks good. The only confusing for me is the description of warnOnWarnings in the documentation: warnOnWarnings when True, a WARNINGS or _FAILURE_ of this build step will mark the overall build as having WARNINGS. The remaining steps will still be executed. I you really want to mark overall result as WARNINGS when warnOnWarnings is set and the step result is FAILURE, you need another "if" in "if result == FAILURE" section.
(In reply to comment #8) > Comment on attachment 481577 [details] [diff] [review] > fix regression, add more tests > > Looks good. The only confusing for me is the description of warnOnWarnings in > the documentation: > > warnOnWarnings > when True, a WARNINGS or _FAILURE_ of this build step will mark the overall > build as having WARNINGS. The remaining steps will still be executed. > > I you really want to mark overall result as WARNINGS when warnOnWarnings is set > and the step result is FAILURE, you need another "if" in "if result == FAILURE" > section. Seems like a documentation error to me. Here's the old code: http://github.com/buildbot/buildbot/blob/a09caa3ce81adc89c61730b57180b7066348a465/master/buildbot/process/base.py#L399 ...which doesn't match the docs afaict. I'm going to guess that somebody copied/pasted from flunkOnWarnings and didn't change it enough. I'll fix them, too.
This is ready to land in the next tree closure window.
Comment on attachment 481577 [details] [diff] [review] fix regression, add more tests Landed w/out the tests, because test_process_base.py doesn't exist in our repo (it was added post-0.8.1). changeset: 95:58225ba72678
I'm in the process of updating our masters for this now.
I just confirmed that this is working on one of the masters w/ the new code.
(for posterity, I tested by kill -9'ing the buildbot process on the slave, which caused the build to finish with RETRY and was automatically restarted)
This should be active on all the 0.8.0 masters now.