Closed Bug 1191128 Opened 6 years ago Closed 6 years ago

Generate bundles more efficiently

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

Attachments

(8 files)

Each file: 40 bytes, text/x-review-board-request, fubar: review+
Our bundle generation started with simple cron jobs. As more jobs are added, there is a greater risk that slow bundle generation cascades into CPU exhaustion because the next cron job starts before the previous one has finished. Let's refactor how we generate bundles so we use a single process that generates all bundles sequentially.
ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r?fubar

We want to start executing server processes out of a virtualenv so we
can use Python 2.7 and so we have full control over the Python
environment. Prepare for this by creating a virtualenv with Mercurial
that uses Python 2.7.
Attachment #8643415 - Flags: review?(klibby)
ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

We now have a Python 2.7 virtualenv. Let's put it to use by having the
S3 bundle generation processes run out of it. This is a prerequisite to
introducing new features to the bundle generation script.

We remove the installation of python-boto from system packages and add
the boto package to the virtualenv. The transition should be
transparent.
Attachment #8643416 - Flags: review?(klibby)
scripts: remove subprocess.check_output polyfill (bug 1191128); r?fubar

We are now running from Python 2.7. We don't need to reimplement
subprocess.check_output because Python 2.7 implements it for us.
Attachment #8643417 - Flags: review?(klibby)
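For reference, a minimal sketch of what the built-in provides, not the script's actual call sites: subprocess.check_output (available since Python 2.7) runs a command, returns its stdout, and raises CalledProcessError on a non-zero exit. The helper name run_and_capture and the command are illustrative assumptions; the real script invokes hg.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return its stdout as bytes.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    return subprocess.check_output(args)


# Illustrative command only (hypothetical); it just prints a marker string.
out = run_and_capture([sys.executable, '-c', 'print("bundle ok")'])
print(out.decode('utf-8').strip())
```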
scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

We are going to be refactoring how this code is called in a subsequent
commit. Refactor first to make the subsequent diff easier to comprehend.
Attachment #8643418 - Flags: review?(klibby)
scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar

Previously, bundle generation occurred sequentially, one type after the
other. We would only consume 1 CPU core despite the machine having 12
cores, most of which are not in use at a given instant. This commit
throws a futures ThreadPoolExecutor at the problem to enable concurrent
bundle generation and upload.

The jump from 1 core to 3 for bundle generation seems reasonable. There
will still be 9 cores available. With 10 mirrors, there is potential for
CPU exhaustion. However, Mercurial processes likely won't eat up an
entire core, giving enough headroom for that 10th mirror to have
sufficient CPU.

Use of ThreadPoolExecutor for upload likely has marginal gain, as upload
is performed inside the Python process and the Python GIL will ensure we
don't consume more than 1 core. However, networks are involved and I/O
will release the GIL, so some benefit is expected.
Attachment #8643419 - Flags: review?(klibby)
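A rough sketch of the approach described above, assuming each bundle type can be produced by an independent callable. The names NUM_THREADS, generate_bundles and fake_generate are hypothetical, and apart from 'stream' the bundle type names are illustrative. On Python 2.7 the concurrent.futures module comes from the futures backport; on Python 3 it is in the standard library.

```python
from concurrent import futures

# Hypothetical constant name; 3 matches the worker count used in the commit.
NUM_THREADS = 3


def generate_bundles(bundle_types, generate_one, max_workers=NUM_THREADS):
    """Run generate_one(bundle_type) for each type on a thread pool.

    Returns a dict mapping bundle type to the callable's result. An
    exception raised inside a worker is re-raised by future.result().
    """
    results = {}
    with futures.ThreadPoolExecutor(max_workers) as executor:
        submitted = {executor.submit(generate_one, t): t for t in bundle_types}
        for future in futures.as_completed(submitted):
            results[submitted[future]] = future.result()
    return results


def fake_generate(bundle_type):
    # Stand-in for spawning an hg process. A real worker would block on
    # a subprocess call, and waiting on it releases the GIL.
    return 'bundle.%s.hg' % bundle_type


print(generate_bundles(['gzip', 'bzip2', 'stream'], fake_generate))
```

Because the heavy lifting happens in child hg processes, the threads mostly wait, which is why a thread pool is enough to spread the work across cores.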
scripts: support reading list of repos from a file (bug 1191128); r?fubar

Instead of specifying multiple cron entries which may overlap if
execution is slow, it is safer to have a single process that handles all
repositories serially.

We add support to generate-hg-s3-bundles for reading the list of
repositories from a file. This will make it easy to define long lists of
repositories to generate bundles for.
Attachment #8643420 - Flags: review?(klibby)
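Reading the repository list could look roughly like the sketch below. The function name read_repo_list is hypothetical, and so is the file format: one repository path per line, with blank lines and '#' comments skipped, is an assumption the commit does not spell out. The repository names in the sample are illustrative.

```python
import io


def read_repo_list(fh):
    """Return repository paths from a file object, one per line.

    Blank lines and lines starting with '#' are skipped. That comment
    convention is an assumption, not something the commit specifies.
    """
    repos = []
    for line in fh:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        repos.append(line)
    return repos


sample = io.StringIO(u"# repos to bundle\nmozilla-central\n\nbuild/mozharness\n")
for repo in read_repo_list(sample):
    # A single process walks the list serially, so runs cannot overlap.
    print(repo)
```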
ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

We add a file containing the list of repositories whose bundles to
generate. It should be identical to the list of repositories in the CRON
jobs today.

We install a new, single CRON entry that generates bundles from this
file.

Note: Ansible won't delete the existing CRON entries. We'll want to
manually edit the crontab on hgssh1 when this is deployed.
Attachment #8643421 - Flags: review?(klibby)
Comment on attachment 8643415 [details]
MozReview Request: ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar

https://reviewboard.mozilla.org/r/15079/#review13541

Ship It!
Attachment #8643415 - Flags: review?(klibby) → review+
Comment on attachment 8643416 [details]
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15081/#review13543

::: ansible/tasks/hgmo-bundle-cron.yml:15
(Diff revision 1)
> -        job='/repo/hg/scripts/outputif /repo/hg/scripts/generate-hg-s3-bundles {{ item.repo }}'
> +        job='/repo/hg/scripts/outputif /repo/hg/venv_tools/bin/python repo/hg/scripts/generate-hg-s3-bundles {{ item.repo }}'

Missing a leading '/' on 'repo/hg/scripts/generate-hg-s3-bundles'
Attachment #8643416 - Flags: review?(klibby)
Comment on attachment 8643417 [details]
MozReview Request: scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar

https://reviewboard.mozilla.org/r/15083/#review13545

Ship It!
Attachment #8643417 - Flags: review?(klibby) → review+
Comment on attachment 8643418 [details]
MozReview Request: scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15085/#review13547

::: scripts/generate-hg-s3-bundles:58
(Diff revision 1)
> +    if t == 'stream':

't' should be 'typ', no?  Also on line 61.
Attachment #8643418 - Flags: review?(klibby)
Comment on attachment 8643419 [details]
MozReview Request: scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15087/#review13549

::: scripts/generate-hg-s3-bundles:128
(Diff revision 1)
> +    with futures.ThreadPoolExecutor(3) as e:

How do you feel about setting the number of cores as a variable, e.g. NUM_THREADS or NUM_CORES, so it stands out a bit more at the top if/when we want to change it?
Attachment #8643419 - Flags: review?(klibby) → review+
Comment on attachment 8643420 [details]
MozReview Request: scripts: support reading list of repos from a file (bug 1191128); r=fubar

https://reviewboard.mozilla.org/r/15089/#review13551

Ship It!
Attachment #8643420 - Flags: review?(klibby) → review+
Attachment #8643421 - Flags: review?(klibby)
Comment on attachment 8643421 [details]
MozReview Request: ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15091/#review13553

::: ansible/tasks/hgmo-bundle-cron.yml:10
(Diff revision 1)
> -  cron: name="Generate Mercurial bundles for {{ item.repo }}"
> +  cron: name="Generate Mercurial bundles"

According to the ansible docs, you should be able to nuke the existing cron, e.g.:

- name: remove old s3 bundle cron
  cron: name="Generate Mercurial bundles" state=absent

Then define a new cron entry for the updated job.
https://reviewboard.mozilla.org/r/15087/#review13549

> How do you feel about setting the number of cores as a variable, e.g. NUM_THREADS or NUM_CORES, so it stands out a bit more at the top if/when we want to change it?

Yeah, I don't like magic numbers either.
scripts: create HTML index listing all bundles (bug 1191128); r?fubar

Now that all bundles are generated in the same invocation, this makes
generating an index of all known bundles easy. So do it.

We will enable static website hosting on the S3 buckets so browser
visitors see this index when loading the S3 bucket URL.
Attachment #8643831 - Flags: review?(klibby)
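A minimal sketch of how such an index might be produced once every bundle path is known within one invocation. The function name build_index, the page title and the markup are all assumptions; the real script's output may look different. This version targets Python 3 (html.escape), whereas the deployed script ran on Python 2.7.

```python
import html


def build_index(bundle_paths, title='Mercurial Bundles'):
    """Return an HTML page linking to each bundle path as a relative URL."""
    lines = [
        '<!DOCTYPE html>',
        '<html>',
        '<head><title>%s</title></head>' % html.escape(title),
        '<body>',
        '<h1>%s</h1>' % html.escape(title),
        '<ul>',
    ]
    # Sort so the listing is stable from one run to the next.
    for path in sorted(bundle_paths):
        escaped = html.escape(path, quote=True)
        lines.append('<li><a href="%s">%s</a></li>' % (escaped, escaped))
    lines.extend(['</ul>', '</body>', '</html>'])
    return '\n'.join(lines)


print(build_index([
    'build/mozharness/7f37f95308e7b9752941d3412aea2cfe9e3f57f2.gzip.hg',
]))
```

Uploading the result as the bucket's index document is what would make it appear when static website hosting is enabled.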
Attachment #8643416 - Flags: review?(klibby) → review+
Comment on attachment 8643416 [details]
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15081/#review13579

Ship It!
Comment on attachment 8643418 [details]
MozReview Request: scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15085/#review13581

Ship It!
Attachment #8643418 - Flags: review?(klibby) → review+
Comment on attachment 8643421 [details]
MozReview Request: ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15091/#review13583

Ship It!
Attachment #8643421 - Flags: review?(klibby) → review+
Comment on attachment 8643831 [details]
MozReview Request: scripts: create HTML index listing all bundles (bug 1191128); r?fubar

https://reviewboard.mozilla.org/r/15161/#review13585

Ship It!
Attachment #8643831 - Flags: review?(klibby) → review+
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/1e504004d485eb788022fc06c0b2341c782b5f8e
changeset:  1e504004d485eb788022fc06c0b2341c782b5f8e
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 10:14:30 2015 -0700
description:
ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar

We want to start executing server processes out of a virtualenv so we
can use Python 2.7 and so we have full control over the Python
environment. Prepare for this by creating a virtualenv with Mercurial
that uses Python 2.7.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/28303cb5293a4d2ffebbc80b565aaf937fd1e979
changeset:  28303cb5293a4d2ffebbc80b565aaf937fd1e979
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:49:42 2015 -0700
description:
ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r=fubar

We now have a Python 2.7 virtualenv. Let's put it to use by having the
S3 bundle generation processes run out of it. This is a prerequisite to
introducing new features to the bundle generation script.

We remove the installation of python-boto from system packages and add
the boto package to the virtualenv. The transition should be
transparent.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/1f991f40f21ab64a49cbd94ded4266e3f2be4b2c
changeset:  1f991f40f21ab64a49cbd94ded4266e3f2be4b2c
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 10:15:33 2015 -0700
description:
scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar

We are now running from Python 2.7. We don't need to reimplement
subprocess.check_output because Python 2.7 implements it for us.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/dee8f1805657ad5e5170c3632cb41fbe92c00a9e
changeset:  dee8f1805657ad5e5170c3632cb41fbe92c00a9e
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:49:59 2015 -0700
description:
scripts: extract code for producing bundle file into function (bug 1191128); r=fubar

We are going to be refactoring how this code is called in a subsequent
commit. Refactor first to make the subsequent diff easier to comprehend.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/d76306c367d58185f0224c6de5df1ffd8bfa5438
changeset:  d76306c367d58185f0224c6de5df1ffd8bfa5438
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:50:16 2015 -0700
description:
scripts: use concurrent threads to generate bundles (bug 1191128); r=fubar

Previously, bundle generation occurred sequentially, one type after the
other. We would only consume 1 CPU core despite the machine having 12
cores, most of which are not in use at a given instant. This commit
throws a futures ThreadPoolExecutor at the problem to enable concurrent
bundle generation and upload.

The jump from 1 core to 3 for bundle generation seems reasonable. There
will still be 9 cores available. With 10 mirrors, there is potential for
CPU exhaustion. However, Mercurial processes likely won't eat up an
entire core, giving enough headroom for that 10th mirror to have
sufficient CPU.

Use of ThreadPoolExecutor for upload likely has marginal gain, as upload
is performed inside the Python process and the Python GIL will ensure we
don't consume more than 1 core. However, networks are involved and I/O
will release the GIL, so some benefit is expected.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/3920001130bfa20ee43acbb06077520ddd0ca9e4
changeset:  3920001130bfa20ee43acbb06077520ddd0ca9e4
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 10:17:27 2015 -0700
description:
scripts: support reading list of repos from a file (bug 1191128); r=fubar

Instead of specifying multiple cron entries which may overlap if
execution is slow, it is safer to have a single process that handles all
repositories serially.

We add support to generate-hg-s3-bundles for reading the list of
repositories from a file. This will make it easy to define long lists of
repositories to generate bundles for.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/946bda3ac215594cdd9b593a33a0cfb6c9b81e3a
changeset:  946bda3ac215594cdd9b593a33a0cfb6c9b81e3a
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:50:36 2015 -0700
description:
ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r=fubar

We add a file containing the list of repositories whose bundles to
generate. It should be identical to the list of repositories in the CRON
jobs today.

We install a new, single CRON entry that generates bundles from this
file.

Note: Ansible won't delete the existing CRON entries. We'll want to
manually edit the crontab on hgssh1 when this is deployed.

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/270815304c94a433f9e987dd90a14f9c63fe0bd7
changeset:  270815304c94a433f9e987dd90a14f9c63fe0bd7
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:50:53 2015 -0700
description:
scripts: create HTML index listing all bundles (bug 1191128); r=fubar

Now that all bundles are generated in the same invocation, this makes
generating an index of all known bundles easy. So do it.

We will enable static website hosting on the S3 buckets so browser
visitors see this index when loading the S3 bucket URL.
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/e0d6de7da48a1d70524c4959bda671c8a1252fd6
changeset:  e0d6de7da48a1d70524c4959bda671c8a1252fd6
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 11:54:48 2015 -0700
description:
ansible/hg-ssh: fix YAML syntax errors

These are leftovers from bug 1191128. You'd think I would have caught
these during testing...
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/6dce8ac8597f5725fb3a169d7f430c7e92f90f8e
changeset:  6dce8ac8597f5725fb3a169d7f430c7e92f90f8e
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 12:00:33 2015 -0700
description:
ansible/hg-ssh: fix typo (owne -> owner)

Fixup from bug 1191128.
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/60cd962db46baee91a849a505b610c12eb3cac2d
changeset:  60cd962db46baee91a849a505b610c12eb3cac2d
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 12:16:07 2015 -0700
description:
scripts: fix indentation of bundle generation block

Wrong indentation was causing us to not generate all bundles. Derp. This
is a fixup from bug 1191128.
After a number of fixups, this is deployed and seems to be working. I'm running bundle generation manually to verify everything works. I wouldn't be surprised if there are more minor fixups. But I'm satisfied with calling this resolved.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/0b7935ce4b35aedda1727c9eb2535747e8a1e808
changeset:  0b7935ce4b35aedda1727c9eb2535747e8a1e808
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Aug 05 15:37:46 2015 -0700
description:
scripts: don't include bucket name in relative path

This is more leftover from bug 1191128. This was resulting in URLs like
https://s3-us-west-2.amazonaws.com/moz-hg-bundles-us-west-2/moz-hg-bundles-us-east-1/build/mozharness/7f37f95308e7b9752941d3412aea2cfe9e3f57f2.gzip.hg,
which is obviously wrong.
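To illustrate the class of bug, here is a sketch with a hypothetical helper, bundle_url, and hypothetical parameter names. The object key is relative to the bucket, so the bucket name should appear exactly once, right after the host. The corrected URL shape is inferred from the broken URL quoted above, not taken from the actual fix.

```python
def bundle_url(host, bucket, repo, node, suffix):
    """Build a bundle URL from a bucket and a bucket-relative key."""
    # The key is relative to the bucket, so it must not repeat a
    # bucket name; the bucket appears once, right after the host.
    key = '%s/%s.%s.hg' % (repo, node, suffix)
    return 'https://%s/%s/%s' % (host, bucket, key)


print(bundle_url(
    's3-us-west-2.amazonaws.com',
    'moz-hg-bundles-us-west-2',
    'build/mozharness',
    '7f37f95308e7b9752941d3412aea2cfe9e3f57f2',
    'gzip',
))
```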