Closed Bug 1036122 (opened 10 years ago, closed 3 years ago)

mozharness should locally cache cloned source repositories

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2098] )

I'm running some mozharness jobs locally. I noticed the desktop unittest scripts are always cloning the mozharness repo from hg.mozilla.org. You can see this in the logs:

https://tbpl.mozilla.org/php/getParsedLog.php?id=43357142&tree=Fx-Team&full=1
10:10:54     INFO - #####
10:10:54     INFO - ##### Running pull step.
10:10:54     INFO - #####
10:10:54     INFO - Running pre-action listener: _resource_record_pre_action
10:10:54     INFO - Running main action method: pull
10:10:54     INFO - Changing directory to /builds/slave/test/build.
10:10:54     INFO - retry: Calling <bound method DesktopUnittest._get_revision of <__main__.DesktopUnittest object at 0x24a41d0>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x23e7c50>, 'tools'), kwargs: {}, attempt #1
10:10:54     INFO - Setting /builds/slave/test/build/tools to https://hg.mozilla.org/build/tools.
10:10:54     INFO - Cloning https://hg.mozilla.org/build/tools to /builds/slave/test/build/tools.
10:10:54     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'https://hg.mozilla.org/build/tools', '/builds/slave/test/build/tools']
10:10:54     INFO - Copy/paste: hg --config ui.merge=internal:merge clone https://hg.mozilla.org/build/tools /builds/slave/test/build/tools
10:10:54     INFO - Calling ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'https://hg.mozilla.org/build/tools', '/builds/slave/test/build/tools'] with output_timeout 1000
10:11:08     INFO -  requesting all changes
10:11:08     INFO -  adding changesets
10:11:08     INFO -  adding manifests
10:11:08     INFO -  adding file changes
10:11:08     INFO -  added 4741 changesets with 9622 changes to 1184 files
10:11:08     INFO -  updating to branch default
10:11:08     INFO -  545 files updated, 0 files merged, 0 files removed, 0 files unresolved
10:11:08     INFO - Return code: 0

IMO we should be using a local clone/cache to avoid re-pulling redundant data for every job. This will make jobs execute faster and it will reduce load on the version control server.

If we're doing close to 100,000 jobs per day and 90% of them are cloning the mozharness repo (no clue if that estimate is accurate), at 14s per clone, that's 350 hours of wall time cloning per day. I'm not sure what our EC2 instance composition and cost is, but assuming $0.20/hr, this is $70/day or ~$25k/year. The mozharness repo is ~2MB in size over the wire. That comes to 180 GB of clone transfer per day. That's a non-trivial amount of load on the Mercurial server as well.
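For anyone who wants to plug in different numbers, here is a minimal sketch of the arithmetic above. The inputs are the estimates from this comment (job count, clone fraction, clone time, instance price, transfer size); the function name and parameter names are made up for illustration, and 1 GB is taken as 1000 MB.

```python
def clone_overhead(jobs_per_day, clone_fraction, seconds_per_clone,
                   dollars_per_hour, mb_per_clone):
    """Estimate daily clone overhead (hypothetical helper, not mozharness code)."""
    clones = jobs_per_day * clone_fraction
    # Wall time spent cloning, converted from seconds to hours.
    hours = clones * seconds_per_clone / 3600.0
    dollars_per_day = hours * dollars_per_hour
    return {
        "clones_per_day": clones,
        "hours_per_day": hours,
        "dollars_per_day": dollars_per_day,
        "dollars_per_year": dollars_per_day * 365,
        # Transfer volume, using 1 GB = 1000 MB.
        "gb_per_day": clones * mb_per_clone / 1000.0,
    }


# Estimates from this comment: 100,000 jobs/day, 90% cloning,
# 14 s per clone, $0.20 per instance-hour, ~2 MB per clone.
estimate = clone_overhead(100000, 0.90, 14, 0.20, 2)
for name, value in sorted(estimate.items()):
    print("%s: %.1f" % (name, value))
```

The result scales linearly with the instance price, so a different hourly rate changes the dollar figures but not the wall time or transfer volume.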

Considering a single engineer costs ~$100/day and this inefficiency is costing us $70/day, I'd say this is worth fixing! I'm just not sure if we're down to fixing $70/day efficiency losses yet.

Err, I meant engineers cost $100/hour :)

You are off by a factor of 25. Testers cost $0.0081/hour. Not worth fixing before TC.

(In reply to Taras Glek (:taras) from comment #2)
> You are off by a factor of 25. Testers cost $0.0081/hour. Not worth fixing
> before TC.

I love when I'm proved wrong by data!

The cost (CPU, bandwidth) to the version control server might still be painful enough for that group though.

I wonder if this bug should be re-examined in light of today's prolonged outage.

In production, many mozharness clones/pulls happen via hgtool, which uses hg share and a clean clone to avoid cloning repos again.

For tools, we now have runner, which pre-clones tools (and practically every other hg repo) onto slaves before starting buildbot. This may have caused, and definitely exacerbated, the hg.m.o issues, but if we end up pointing to that pre-cloned repo instead of re-cloning, we save ourselves a clone at runtime.
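As a rough illustration of the cache-plus-share idea, here is a sketch that only builds the hg command lines and does not run them. It is not how hgtool or runner is actually implemented; the function name, the cache directory, and the destination path are all assumptions. It relies only on standard Mercurial commands: `hg clone --noupdate`, `hg pull`, and `hg share` from the bundled share extension.

```python
import os


def cached_checkout_commands(repo_url, cache_dir, dest):
    """Return hg commands for a shared checkout backed by a local cache.

    Hypothetical helper: the first command refreshes (or creates) the
    cached repository, the second creates a working copy at dest that
    shares the cache's store instead of cloning over the network.
    """
    repo_name = os.path.basename(repo_url.rstrip("/"))
    cache_repo = os.path.join(cache_dir, repo_name)

    if os.path.isdir(os.path.join(cache_repo, ".hg")):
        # Cache exists: fetch only new changesets from its default path.
        refresh = ["hg", "-R", cache_repo, "pull"]
    else:
        # First use on this machine: one full clone, no working copy.
        refresh = ["hg", "clone", "--noupdate", repo_url, cache_repo]

    # Enable the share extension for this one invocation.
    share = ["hg", "--config", "extensions.share=", "share", cache_repo, dest]
    return [refresh, share]


# Assumed paths, for illustration only.
commands = cached_checkout_commands(
    "https://hg.mozilla.org/build/tools",
    "/nonexistent-cache-dir",
    "/builds/slave/test/build/tools",
)
for command in commands:
    print(" ".join(command))
```

With a warm cache, the per-job network cost drops from a full clone to an incremental pull. `hg share` expects the destination not to exist yet, so a real implementation would also have to clean up or reuse an old checkout.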
Blocks: try-tracker
Depends on: 1050109
Blocks: 1096337

I think between run-task, robustcheckout, and VMs that disappear quickly after running tasks, we can resolve this.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INACTIVE