Closed Bug 1166414 Opened 10 years ago Closed 9 years ago

SETA does not retrigger for across the board build bustage

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1173822

People

(Reporter: kmoir, Unassigned)

Details

Possible fixes:
1) make it easy for the sheriffs to "run all coalesced jobs" (via arbitrary build api, mozci/treeherder) when the build failures are fixed
2) inspect build results and retrigger if failed with coalescing disabled

irc conversation from today:

<jmaher> kmoir: RyanVM|sheriffduty brings up a valid concern with SETA; specifically if we force coalescing for 9 pushes, what happens if the 10th push is a DONTBUILD or we have a build failure
<kmoir> jmaher: hmm, my understanding would be that it wouldn't run another one to account for the failure.
<kmoir> jmaher: it's just at the level of scheduling, not inspecting results and then changing scheduling
<jmaher> kmoir: assuming we don't have a valid build for the 10th run, would the 11th run attempt it, and would we continue to attempt it until we had a valid build to schedule?
<jmaher> kmoir: sounds like we could get into a state where we don't run some jobs for 20 or 30 pushes
<kmoir> jmaher: I don't think so, but I could do some testing.
<kmoir> jmaher: you mean if there are multiple failures on each of the pushes where the job would run
<jmaher> kmoir: ok- thanks; is there anything I could do to help test this out
<RyanVM|sheriffduty> kmoir: yes, say the 10th push is a DONTBUILD
<RyanVM|sheriffduty> or across the board bustage
<jmaher> kmoir: yes- assuming the 10th push was a DONTBUILD and the 18-23rd pushes are full of build failures
<jmaher> mshal: I honestly have no idea
<kmoir> jmaher: is that a likely scenario?
<RyanVM|sheriffduty> kmoir: across the board bustage is
<jmaher> kmoir: we have a lot of build failures on the tree; dontbuild is not that often, but a couple times a day
<jonasfj> Note, I have no idea what I'm talking about
<RyanVM|sheriffduty> kmoir: it's what got me thinking about it today
<RyanVM|sheriffduty> kmoir: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=build
<RyanVM|sheriffduty> add another 20 pushes to get a feel for what today's been like
* kmoir looks
<kmoir> jmaher RyanVM: so my thinking is that it wouldn't reschedule them the way it is implemented now, would have to do some testing to confirm
<kmoir> this was not a scenario I looked at when testing
<jmaher> kmoir: do you see the concern that RyanVM|sheriffduty has with build failures
<kmoir> jmaher: yes, definitely
<jmaher> kmoir: open to ideas and helping where possible to reduce concerns with that
<kmoir> jmaher: the only thing I can think of is to inspect the results of the last run and if there is a failure, do not coalesce, not sure how to implement this yet
<jmaher> kmoir: another option is to make it easy for the sheriffs to "run all coalesced jobs" (via arbitrary build api, mozci/treeherder) when the build failures are fixed
<kmoir> jmaher: okay I'll open a bug and we can discuss the way forward there
<jmaher> kmoir: cool
I think an easy fix here would be to have the sheriffs click a button and fill all the jobs for a given push; this would be useful when the builds are passing. The problem here is that the button would need to be pushed AFTER all the builds complete for the given revision. Maybe there is a simple hack in buildbot.
Pretty sure that if the 10th build is DONTBUILD, we would test on the 11th. This is because we're not actually counting pushes; we're counting sendchanges from the builds to trigger tests. If we don't get to the point of running sendchange for whatever reason, then that doesn't count towards the pending test count.
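To make the counting rule above concrete, here is a minimal sketch in Python. It is not the buildbot or SETA code: the `Push` class, `schedule_tests`, and the `skip_limit` parameter are hypothetical names invented for illustration, and the model assumes that only a successful build emits a sendchange and that the coalesced test job is skipped for 9 sendchanges before running on the next one.

```python
from dataclasses import dataclass


@dataclass
class Push:
    rev: str
    dontbuild: bool = False  # push marked DONTBUILD: no build is scheduled
    build_ok: bool = True    # False models a failed (red) build


def schedule_tests(pushes, skip_limit=9):
    """Return the revisions on which the coalesced test job would run.

    Hypothetical model: the job is skipped for `skip_limit` sendchanges
    and runs on the next one. Pushes that never produce a sendchange
    (DONTBUILD or a failed build) leave the counter untouched.
    """
    skipped = 0
    ran_on = []
    for push in pushes:
        if push.dontbuild or not push.build_ok:
            continue  # no sendchange, so nothing is counted
        if skipped >= skip_limit:
            ran_on.append(push.rev)  # the job runs on this sendchange
            skipped = 0
        else:
            skipped += 1
    return ran_on


# The 10th push is a DONTBUILD, so the job runs on the 11th instead.
pushes = [Push(f"p{i}", dontbuild=(i == 10)) for i in range(1, 13)]
print(schedule_tests(pushes))  # ['p11']
```

Under the same assumptions, a run of failed builds at pushes 10 through 12 would defer the job to push 13, the first push after the bustage that produces a sendchange.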
That's exactly what I was hoping to hear. Thanks catlee!
Should we close this bug then?
Would we get a sendchange if the entire push was busted across the board?
If a push was busted across the board and the build was broken (red), the tests aren't invoked so no sendchange is invoked. So wouldn't it invoke the tests on the next sendchange when the build was fixed?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard