Closed Bug 1258480 Opened 10 years ago Closed 9 years ago

Long Inter-Job Timing on t-yosemite-r7

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: ekyle, Unassigned)

In an attempt to simulate the actions of Buildbot [1], we discovered that the time between a job ending on a machine and the next job starting on the same machine is variable, and sometimes large. Let's call this time the "Inter-Job Timing", and let's call the machines that take the longest to start the next job "sleepy machines". The simulator [1] lists the sleepy machines.

Some machines appear to take an hour to start the next job. These long delays appear to happen soon after 11pm GMT (4pm PDT). Please confirm that these long delays actually exist, and that I am not just missing information due to timezone anomalies.

The primary concern is that, throughout the day, the Inter-Job Timing can be 4 minutes or more.

I made a few sheets [3] covering a few machines on Mar 18. Mar 18 is a good pick because the machines were saturated all day, so the time between jobs is not a side effect of there being no work. One of the columns shows the Inter-Job Timing. The data I used ultimately came from the Buildbot JSON logs [4]. I used ActiveData to index and query a particular machine and all the jobs it handled in a day [2].

[1] http://people.mozilla.org/~klahnakoski/temp/Buildbot-Simulator.html#num=200&pool=t-yosemite-r7&date=2016-03-18
[2] http://activedata.allizom.org/tools/query.html#query_id=GGnkMnsw
[3] https://docs.google.com/spreadsheets/d/1MRWcbNre6RBKI_LC0UDpVAoILtzteMmqdZrJ_3F1MjE/edit?usp=sharing
[4] http://builddata.pub.build.mozilla.org/builddata/buildjson/
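For anyone who wants to reproduce the Inter-Job Timing numbers independently, the calculation described above could be sketched roughly as follows. This is only an illustration, not the simulator's or ActiveData's actual code: the record fields `machine`, `start`, and `end` are hypothetical names, not the real Buildbot JSON schema, and the timestamps are assumed to be UTC epoch seconds so that timezone handling cannot distort the gaps. The 240-second default mirrors the 4-minute figure mentioned above.

```python
from collections import defaultdict


def inter_job_gaps(jobs):
    """Map each machine to a list of (prev_end, next_start, gap_seconds).

    `jobs` is an iterable of dicts with hypothetical keys:
    "machine" (str), "start" and "end" (UTC epoch seconds).
    """
    by_machine = defaultdict(list)
    for job in jobs:
        by_machine[job["machine"]].append(job)

    gaps = {}
    for machine, machine_jobs in by_machine.items():
        # Order jobs by start time so consecutive entries are back-to-back runs.
        ordered = sorted(machine_jobs, key=lambda j: j["start"])
        gaps[machine] = [
            (prev["end"], nxt["start"], nxt["start"] - prev["end"])
            for prev, nxt in zip(ordered, ordered[1:])
        ]
    return gaps


def sleepy_machines(jobs, threshold_seconds=240):
    """Return (machine, worst_gap_seconds) pairs, longest gap first.

    Only machines whose largest gap is at least `threshold_seconds`
    are included.
    """
    worst = {
        machine: max(gap for _, _, gap in machine_gaps)
        for machine, machine_gaps in inter_job_gaps(jobs).items()
        if machine_gaps  # a machine with a single job has no gap to measure
    }
    flagged = [(m, g) for m, g in worst.items() if g >= threshold_seconds]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample records, not real data from the pool.
    sample = [
        {"machine": "machine-a", "start": 0, "end": 100},
        {"machine": "machine-a", "start": 400, "end": 500},
        {"machine": "machine-b", "start": 0, "end": 50},
        {"machine": "machine-b", "start": 60, "end": 90},
    ]
    print(sleepy_machines(sample))
```

Comparing the `prev_end` and `next_start` values for the largest gaps against the 11pm GMT window would be one way to check whether the hour-long delays are real or an artifact of missing records.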
Having runner logs in papertrail (bug 1179819) would make this easier to dig into.
Depends on: 1179819
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Component: General Automation → General