Closed Bug 1161128 Opened 9 years ago Closed 6 years ago

Autophone - skip jobs to manage work load

Categories

(Testing Graveyard :: Autophone, enhancement)

enhancement
Not set
normal

Tracking

(firefox40 affected)

RESOLVED WONTFIX

People

(Reporter: bc, Assigned: bc)

Details

Autophone has always had problems keeping up with the load, especially on mozilla-inbound, which has recently seen up to 75 builds in one day.

With a limited set of tests (s1s2 local blank, twitter, webappstartup), it is somewhat manageable, but with the addition of nytimes, plus the remote versions of the s1s2 tests, the Mochitest media tests, and perhaps some versions of the Talos tests in the future, it seems we are doomed to either limit the number of tests we run, reduce the run time by reducing iterations, or add more devices.

With the addition of the 2 new Linux hosts we will be able to add more devices, but that is really a stopgap, as I expect more tests in the future.

One possible approach would be to allow Autophone to skip builds, either on a default basis (skip every other build or some other pattern) or in response to the load (skip a build when falling behind). Skipping builds requires that we be able to actually test the skipped builds later if necessary.
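A minimal sketch of the two skipping strategies described above, for illustration only. The class name `SkipPolicy`, its fields, the backlog threshold, and the `plan` helper are all hypothetical and are not part of Autophone. Skipped builds are recorded rather than dropped so that they can still be tested later if needed.

```python
from dataclasses import dataclass


@dataclass
class SkipPolicy:
    """Hypothetical policy deciding whether a build should be skipped."""
    skip_every: int = 2    # skip one build out of every `skip_every`; values <= 1 disable the pattern
    max_pending: int = 10  # assumed backlog size that counts as "falling behind"

    def should_skip(self, build_index: int, pending_jobs: int) -> bool:
        # Load-based rule: skip when the backlog of pending jobs is too long.
        if pending_jobs > self.max_pending:
            return True
        # Pattern-based rule: skip builds at regular positions in the sequence.
        if self.skip_every > 1 and build_index % self.skip_every == 1:
            return True
        return False


def plan(policy, pending_by_build):
    """Split build indices into (tested, skipped).

    `pending_by_build[i]` is the assumed number of pending jobs when build i arrives.
    Skipped indices are returned so those builds can be tested on request.
    """
    tested, skipped = [], []
    for index, pending in enumerate(pending_by_build):
        if policy.should_skip(index, pending):
            skipped.append(index)
        else:
            tested.append(index)
    return tested, skipped
```

With `skip_every=2` the pattern rule tests every other build, and the load rule skips any further build that arrives while the backlog exceeds `max_pending`.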

If I understand the current situation, we have started to do this with the buildbot-managed jobs. The advantage of buildbot-managed jobs is that the buildbot self-serve API can easily trigger the skipped tests when needed.

But for non-buildbot jobs such as Autophone, if we don't initially submit the job to Treeherder, it is not possible for a non-Autophone administrator to trigger the job from the Treeherder UI. If we submit the job as pending and then decide to skip it, we need a job result that represents a completed state in Treeherder but does not represent an error.

Treeherder folks: What do you think of adding a "skipped" job completed state to Treeherder?

Ryan: Would this use of skipped jobs be ok with the sheriffs?

jmaher: Would this work with talos? Or would talos require running on each build?
Flags: needinfo?(ryanvm)
Flags: needinfo?(mdoglio)
Flags: needinfo?(jmaher)
Flags: needinfo?(cdawson)
Doesn't bother me. Sounds like it's more a question for the perf team.
Flags: needinfo?(ryanvm)
bc, great question.  This will not affect talos.  We already do a fair amount of coalescing, which skews results.  Possibly running perf tests every 3rd push would be adequate?  Ideally no less often than every 5th push.
Flags: needinfo?(jmaher)
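One way to read the suggestion above is as two bounds: aim to run perf tests on every 3rd push, and never let the gap grow beyond 5 pushes. The sketch below encodes that reading. The function names, the `falling_behind` flag, and the simulation helper are assumptions made for illustration and do not come from Talos or Autophone.

```python
def should_run_perf(pushes_since_last_run, falling_behind,
                    target_interval=3, max_interval=5):
    """Decide whether to run perf tests on the current push.

    `pushes_since_last_run` counts pushes since the last tested push,
    including the current one.
    """
    # Never let the gap exceed max_interval pushes, whatever the load.
    if pushes_since_last_run >= max_interval:
        return True
    # Otherwise run at the target interval only when there is spare capacity.
    return pushes_since_last_run >= target_interval and not falling_behind


def simulate(load_flags):
    """Return the indices of pushes that get tested for a sequence of load flags."""
    tested = []
    since_last = 0
    for index, behind in enumerate(load_flags):
        since_last += 1
        if should_run_perf(since_last, behind):
            tested.append(index)
            since_last = 0
    return tested
```

Under light load this tests every 3rd push; under sustained heavy load it falls back to every 5th push.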
(In reply to Bob Clary [:bc:] from comment #0)
> Treeherder folks: What do you think of adding a "skipped" job completed
> state to Treeherder?

There is already a 'coalesced' state, would this be sufficient?
emorley: That might work. The only issues I can think of are that the coalesced jobs are hidden by default, and that it might be confusing to overload the state for these two different cases. It appears that the coalesced jobs are left in a pending state, though Treeherder doesn't count them in terms of number of jobs in progress.

I would be marking these as completed then issuing new jobs for the build if they need to be retriggered. If that usage is ok, we can leverage coalesced with the understanding that if we need to retrigger 'skipped' jobs we'll need to show the coalesced jobs.
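The workflow described here (submit as pending, mark a skipped job as completed with a non-error result, and issue a new job when a retrigger is needed) can be sketched with a small in-memory model. This is not the Treeherder API: the `Job` class, its fields, and the `skip` and `retrigger` helpers are hypothetical, and the `"coalesced"` result string is used only because it is the option discussed in this bug.

```python
from dataclasses import dataclass, field
from itertools import count

_job_ids = count(1)  # simple id generator for this sketch


@dataclass
class Job:
    """Hypothetical stand-in for a job submitted to Treeherder."""
    build: str
    test: str
    state: str = "pending"   # "pending" or "completed" in this sketch
    result: str = "unknown"  # result is only meaningful once the job is completed
    job_id: int = field(default_factory=lambda: next(_job_ids))


def skip(job, skip_result="coalesced"):
    """Mark a pending job as completed without reporting an error."""
    job.state = "completed"
    job.result = skip_result
    return job


def retrigger(job):
    """Issue a new pending job for the same build and test; the skipped job is left as-is."""
    return Job(build=job.build, test=job.test)
```

Swapping in a different result, such as a dedicated "skipped" value, would only change the `skip_result` argument in this model.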
There is also a "usercancel" (pink) state.  This seems like a good fit.  But we could always add a new "result" of skipped.  That's fairly low overhead.  It would require changes in the service and UI in several places (just to have the complete list/array).  And then we have to pick a new color.  :)  Somehow the last one would be the hardest...
Flags: needinfo?(cdawson)
+1 to using the coalesced state.
Flags: needinfo?(mdoglio)
Autophone is going away. Resolving these to wontfix.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Testing → Testing Graveyard