Bug 1096653 Opened 10 years ago Closed 10 years ago

gaia-central is cloned too often

Categories: (Release Engineering :: General, defect, P1)

Tracking: (Not tracked)

Status: RESOLVED FIXED

People: (Reporter: gps, Unassigned)

Attachments: (2 files)

gaia-central accounts for more traffic on hg.mozilla.org than any other repository. I understand gaia is a popular repository. However, an alarmingly high number of `hg pull` / `hg clone` requests appear to be full clones.

17% of "getbundle" requests to gaia-central today were full clones. Contrast with 3% for mozilla-inbound.

The network cannot keep up with the Gaia automation fetching 1.6 GB full clones over and over. Please change the Gaia automation to do incremental pulls instead, like the Firefox desktop automation does.
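For reference, the pattern we're asking for is roughly the following. This is a minimal sketch, assuming the hg "share" extension is enabled; the ensure_repo helper and the share-directory naming are made up for illustration and don't reflect the real automation's code:

  import os
  import subprocess

  def ensure_repo(url, share_base, checkout):
      """Clone once into a persistent shared store, then do cheap
      incremental pulls for every subsequent job (sketch only)."""
      # Map the URL to a directory inside the shared store
      # (naming scheme here is invented for illustration).
      name = url.split('://', 1)[1].strip('/').replace('/', '_')
      shared = os.path.join(share_base, name)
      if not os.path.exists(shared):
          # One-time cost: a full clone, no working copy.
          subprocess.check_call(['hg', 'clone', '--noupdate', url, shared])
      else:
          # Steady state: fetch only the new changesets.
          subprocess.check_call(['hg', 'pull', url], cwd=shared)
      if not os.path.exists(checkout):
          # Lightweight working copy backed by the shared store
          # (requires the 'share' extension).
          subprocess.check_call(['hg', 'share', '--noupdate', shared, checkout])
      subprocess.check_call(['hg', 'update', '--clean'], cwd=checkout)

With something like this, each job pays only for the changesets pushed since the last pull instead of the full 1.6 GB.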

This is a P1 bug because we're running into scaling problems in SCL3 and gaia-central's clone traffic is the #1 contributor from hg.mozilla.org.
Component: Release Automation → General Automation
QA Contact: bhearsum → catlee
FTR, the AWS instances should have the repo in /builds/hg-shared pre-baked since last Friday, see  http://hg.mozilla.org/build/cloud-tools/rev/97e4b068a7e8#l1.10
Across all hgweb nodes on Nov 10, gaia-central accounted for 1,992,687,013,298 bytes of network transfer. On disk, gaia-central is 1,758,767,587 bytes, give or take a few MB. So gaia-central was cloned the equivalent of 1,133 times.

   1,992,687,013,298 bytes
×                  8 bits per byte
= 15,941,496,106,384 bits
÷      1,000,000,000 bits per second (1 Gbps link)
=             15,941 seconds of 1 Gbps network saturation (assuming unrealistic zero overhead)
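For anyone who wants to re-run the numbers, the same arithmetic in Python:

  bytes_transferred = 1_992_687_013_298  # gaia-central bytes served on Nov 10
  repo_size = 1_758_767_587              # on-disk size of gaia-central, in bytes

  clones = bytes_transferred / repo_size  # ~1,133 equivalent full clones
  bits = bytes_transferred * 8            # 15,941,496,106,384 bits
  seconds = bits / 1_000_000_000          # ~15,941 s of saturated 1 Gbps
  print(round(clones), round(seconds), round(seconds / 3600, 1))
  # 1133 15941 4.4 -- well over four hours of a fully saturated gigabit link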

This is absolutely ridiculous. Bug 1096337 puts this further in perspective.
IIRC, our test instances use that repo too.
Worth mentioning: we've observed that gaia-central clones from hg.mozilla.org seem to come in batches. The server will be relatively quiet, then we'll see 3 or 4 gaia-central clones start around the same time. I've observed this on a single hgweb server at a time, and I assume it occurs on multiple hgweb servers when it happens. My theory is that something in Gaia automation land triggers several jobs at once, the jobs start close together, and each in turn clones gaia-central. I would *love* confirmation of this theory.
This sounds exactly like our tests: a single (long-ish) build triggers a bunch of test jobs, and they (some/all?) clone gaia-central.
I landed https://hg.mozilla.org/build/cloud-tools/rev/b8b3934e224e to pre-bake gaia-central hg share on the instances. We'll see the results tomorrow.
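Roughly speaking, the pre-bake amounts to the following, run at image-creation time. This is only a sketch; see the linked cloud-tools revisions for the real implementation, and the exact target path under /builds/hg-shared is an assumption here:

  import subprocess

  # Run while baking the instance image so that every new instance starts
  # with a warm shared store; the first job then only pays for an
  # incremental pull instead of a 1.6 GB clone.
  subprocess.check_call([
      'hg', 'clone', '--noupdate',
      'https://hg.mozilla.org/integration/gaia-central',
      '/builds/hg-shared/integration/gaia-central',  # assumed layout
  ])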
Rail: you are awesome.
Thanks, but not as awesome as you are!!! :D
In the gecko builds (eg b2g_mozilla-central_linux32_gecko build), philor noticed that we end up clobbering our clone of 
  https://hg.mozilla.org/integration/gaia-central
to get
  https://hg.mozilla.org//integration/gaia-central
This would affect the first job with preseeded hg_share, so AWS builders mainly.

The URL comes in part from 
  http://hg.mozilla.org/mozilla-central/file/cbe6afcae26c/b2g/config/gaia.json
which has
  "repo_path": "/integration/gaia-central"
and is used at
  http://hg.mozilla.org/build/buildbotcustom/file/default/process/factory.py#l1269
where it's <host> + '/' + repo_path.
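To illustrate the concatenation bug (hypothetical snippet, variable names mine):

  host = 'https://hg.mozilla.org'
  repo_path = '/integration/gaia-central'  # leading slash straight from gaia.json

  broken = host + '/' + repo_path
  # -> 'https://hg.mozilla.org//integration/gaia-central'
  fixed = host + '/' + repo_path.lstrip('/')
  # -> 'https://hg.mozilla.org/integration/gaia-central'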

This fixes the b2g_bumper side to match what we do everywhere in buildbot config/code.
Attachment #8520350 - Flags: review?(catlee)
Comment on attachment 8520350 [details] [diff] [review]
[mozharness] Strip leading /

Review of attachment 8520350 [details] [diff] [review]:
-----------------------------------------------------------------

assuming we don't have other consumers that depend on the leading /, r+
Attachment #8520350 - Flags: review?(catlee) → review+
Attached patch normalize paths
only brief testing done, but I think this should work?
Attachment #8520362 - Flags: feedback?(nthomas)
Comment on attachment 8520362 [details] [diff] [review]
normalize paths

>diff --git a/lib/python/util/hg.py b/lib/python/util/hg.py
>             dest_sharedPath_data = os.path.normpath(
>                 open(dest_sharedPath).read())
>             norm_sharedRepo = os.path.normpath(os.path.join(sharedRepo, '.hg'))
>-            if dest_sharedPath_data != norm_sharedRepo:
>+            if normalize_path(dest_sharedPath_data) != normalize_path(norm_sharedRepo):

You're normalizing twice here (both values already went through os.path.normpath just above). Otherwise looks good.
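For context, here is what the normalization buys in the comparison above (illustrative values only; the real normalize_path in util/hg.py may do more, e.g. case handling on Windows):

  import os

  a = '/builds/hg-shared//integration/gaia-central/.hg'  # stray double slash
  b = '/builds/hg-shared/integration/gaia-central/.hg'
  assert a != b
  # normpath collapses the internal duplicate slash, so the two compare equal:
  assert os.path.normpath(a) == os.path.normpath(b)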
Attachment #8520362 - Flags: feedback?(nthomas) → feedback+
Comment on attachment 8520350 [details] [diff] [review]
[mozharness] Strip leading /

https://hg.mozilla.org/build/mozharness/rev/8fd85bee31d3
https://hg.mozilla.org/build/mozharness/rev/f73abc0ea6bf

I can't find anything outside of buildbot consuming gaia.json.
Attachment #8520350 - Flags: checked-in+
gps, can you see any improvement?
Yes!

The byte totals for UTC today (it is currently ~1630 UTC):

          bytes  repository
983,601,964,941  build/tools
643,524,840,540  projects/fig/json-pushes
459,745,929,363  integration/gaia-central
229,746,885,068  mozilla-central
123,972,445,776  releases/mozilla-aurora
108,275,973,501  build/mozharness
 95,812,577,194  build/talos
 43,919,864,318  integration/gaia-2_0
 39,484,917,744  integration/gaia-2_1
 18,408,937,380  integration/gaia-1_4
 17,841,191,468  projects/cypress/json-pushes
 16,214,923,480  projects/maple
 13,286,153,245  projects/oak
 10,576,371,771  comm-central

gaia-central is no longer operating at ~2x of build/tools. That's an improvement.

However, we're about to begin a work day. Ask me to pull these numbers again in a few hours, once automation has been humming from west coast load for a bit.
gps, can you take a look at the stats again? If it looks better, should we close the bug?
The data looks terrific! I think we can call this one done.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1181783
Blocks: 1223615
Component: General Automation → General