Closed
Bug 1036468
Opened 11 years ago
Closed 8 years ago
reduce overhead in job setup in buildbot as well as test scripts + mozharness
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: jmaher, Unassigned)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2506] [capacity][good next bug])
While looking at a new talos job I was experimenting with on try:
https://tbpl-dev.allizom.org/php/getParsedLog.php?id=43046379&tree=Try&full=1
tbpl showed 12 minutes, but the test itself took roughly 2 minutes. Ouch, that is not a good return on investment, so something must be going on.
List of times and buildbot steps:
00:03 set props basedir
00:09 rm -Rf properties
00:32 rm -Rf scripts
00:44 hg clone hg.mozilla.org/build/mozharness scripts
00:08 hg update -C -r production
00:41 hg id -i
00:45 <download to oauth.txt> ?wtf?
07:51 python -u talos_script.py ...
00:10 rm -f oauth.txt
00:11 slave lost reboot
-------------
03:23 time outside of the mozharness script
07:51 time inside the mozharness script
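The two subtotals above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the step durations are copied from the list above, and the helper names (`to_seconds`, `to_mmss`) are made up for illustration.

```python
# Durations (mm:ss) of every buildbot step above except the
# 07:51 talos_script.py step, in the order they were listed.
OUTSIDE_STEPS = ["00:03", "00:09", "00:32", "00:44", "00:08",
                 "00:41", "00:45", "00:10", "00:11"]


def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Format a number of seconds as 'mm:ss'."""
    minutes, seconds = divmod(total_seconds, 60)
    return "%02d:%02d" % (minutes, seconds)


outside = sum(to_seconds(step) for step in OUTSIDE_STEPS)
inside = to_seconds("07:51")

print(to_mmss(outside))           # 03:23
print(to_mmss(outside + inside))  # 11:14
```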
Let's look at what happens inside the mozharness script:
01:32 - download talos.json (timeout 30 seconds, sleep 60 seconds)
00:01 - download firefox build (win32.zip)
00:15 - rm c:/slave/talos-data\\talos
00:15 - clone talos repo
00:02 - download tp5n.zip
00:36 - unzip tp5n.zip (we could cache this - rarely changes)
00:01 - download + unzip flash.zip (we could cache this - rarely changes)
00:13 - virtualenv setup
00:01 - install pyyaml (never changes)
00:10 - install pywin32 (never changes)
00:40 - talos setup.py installation of dependencies
00:01 - unzip win32.zip firefox bundle
02:54 - talos initialization, test, upload results
00:01 - mozharness cleanup, reporting
total 6:42 (1:09 of the 7:51 is unaccounted for, so this isn't 100% accurate)
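The 01:32 spent on talos.json would be consistent with one 30-second timeout followed by the 60-second sleep and then a quick successful retry, although the log summary above does not confirm that. Assuming that is what happened, a shorter timeout with a small, doubling sleep would cut most of that wait. The sketch below is not the mozharness retry code; `fetch_with_retry`, its parameters, and the default values are all hypothetical, and `fetch` stands in for whatever function performs the download.

```python
import time


def fetch_with_retry(fetch, attempts=3, timeout=10, initial_sleep=2,
                     sleep=time.sleep):
    """Call fetch(timeout) until it succeeds or attempts run out.

    Sleeps initial_sleep seconds after the first failure and doubles
    the sleep after each further failure. The last failure is re-raised.
    """
    delay = initial_sleep
    for attempt in range(1, attempts + 1):
        try:
            return fetch(timeout)
        except IOError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

With these illustrative defaults, one timed-out attempt costs at most 10 + 2 = 12 seconds before the retry starts, instead of 30 + 60 = 90 seconds.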
As you can see, there are some things that we should reconsider:
in buildbot:
* figure out how to speed up the download of oauth.txt (possibly save 20-40 seconds)
* don't remove the scripts directory and reclone, just clean it up (possible savings of 30-60 seconds)
* potential savings here could be between 0:50 and 1:40
in mozharness:
* tp5n.zip + flash.zip - keep them static, or compare an md5sum to ensure we have the latest bits (we spend 0:37 here, we could save 15-20 seconds)
* install pyyaml and pywin32 on the system globally (save 10-11 seconds)
* optimize our timeout and retry for talos.json (maybe save 30-60 seconds)
* reuse the talos repo (pull latest, update to <rev>, clean out leftover files, save 10-15 seconds)
* potential savings here could be between 1:05 and 1:46
Doing all of these changes could take an 11-minute job and save between 1:55 and 3:26. Multiply that by hundreds of thousands of jobs and we have a big win.
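For the tp5n.zip and flash.zip suggestion, most of the 0:37 is the unzip step rather than the download, so one option is to remember the checksum of the archive that was last unpacked and skip the unzip when it has not changed. The following is a minimal sketch of that idea, not existing mozharness code; the stamp file and the function names are assumptions made for illustration.

```python
import hashlib
import os


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_unzip(zip_path, stamp_path):
    """True if the archive differs from the one recorded in the stamp file."""
    if not os.path.exists(stamp_path):
        return True
    with open(stamp_path) as handle:
        return handle.read().strip() != md5_of(zip_path)


def record_unzip(zip_path, stamp_path):
    """Store the archive's md5 after a successful unzip."""
    with open(stamp_path, "w") as handle:
        handle.write(md5_of(zip_path))
```

A job would call `needs_unzip()` after downloading, unzip only when it returns True, and then call `record_unzip()`. The same check could be applied before the download if the server published the checksum, which the report above does not say.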
Updated•11 years ago
Component: General Automation → Mozharness
Comment 1•11 years ago
Slave pre-flight tasks (bug 712206) should make sure the tools/scripts repo is always checked out and as up-to-date as possible before the slave even returns to the pool.
Depends on: 712206
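Whether it runs in a pre-flight task or in the job itself, the "don't reclone, just clean up" idea amounts to choosing between a clone and a pull, then updating and purging. A rough sketch of that decision follows. It only builds the hg command lines and does not run them; `checkout_commands` is a hypothetical helper, the `https://` scheme on the repository URL is an assumption, and `purge` relies on Mercurial's bundled purge extension being enabled, which the `--config` option does here.

```python
import os


def checkout_commands(repo_url, dest, rev):
    """Return the hg commands that bring dest to a clean checkout of rev.

    An existing clone is reused (pull) instead of being deleted and
    recloned; untracked and ignored files are removed with purge.
    """
    hg = ["hg", "--config", "extensions.purge="]
    if os.path.isdir(os.path.join(dest, ".hg")):
        first = hg + ["-R", dest, "pull", repo_url]
    else:
        first = hg + ["clone", "--noupdate", repo_url, dest]
    return [
        first,
        hg + ["-R", dest, "update", "-C", "-r", rev],
        hg + ["-R", dest, "purge", "--all"],
    ]


if __name__ == "__main__":
    for command in checkout_commands(
            "https://hg.mozilla.org/build/mozharness", "scripts", "production"):
        print(" ".join(command))
```

Each command list could be passed to `subprocess.check_call` in order. A fallback to a fresh clone would still be needed for the case where the existing repository is corrupt.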
Updated•11 years ago
Whiteboard: [capacity]
Updated•11 years ago
Whiteboard: [capacity] → [capacity][good next bug]
Updated•11 years ago
Whiteboard: [capacity][good next bug] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2506] [capacity][good next bug]
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE