Closed
Bug 1191128
Opened 7 years ago
Closed 7 years ago
Generate bundles more efficiently
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: gps)
References
Details
Attachments
(8 files)
40 bytes, text/x-review-board-request | fubar: review+
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
40 bytes, text/x-review-board-request | fubar: review+
Our bundle generation started as a set of simple cron jobs. As more jobs are added, there is a greater risk that slow bundle generation cascades into CPU exhaustion, because the next cron job starts before the previous one has finished. Let's refactor how we generate bundles so we use a single process that generates all bundles sequentially.
Comment 1•7 years ago (Assignee)
ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r?fubar

We want to start executing server processes out of a virtualenv so we can use Python 2.7 and so we have full control over the Python environment. Prepare for this by creating a virtualenv with Mercurial that uses Python 2.7.
Attachment #8643415 - Flags: review?(klibby)
Comment 2•7 years ago (Assignee)
ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

We now have a Python 2.7 virtualenv. Let's put it to use by having the S3 bundle generation processes run out of it. This is a prerequisite to introducing new features to the bundle generation script. We remove the installation of python-boto from system packages and add the boto package to the virtualenv. The transition should be transparent.
Attachment #8643416 - Flags: review?(klibby)
Comment 3•7 years ago (Assignee)
scripts: remove subprocess.check_output polyfill (bug 1191128); r?fubar

We are now running from Python 2.7. We don't need to reimplement subprocess.check_output because Python 2.7 implements it for us.
Attachment #8643417 - Flags: review?(klibby)
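For readers unfamiliar with the function this commit relies on, here is a minimal sketch of calling the standard library's subprocess.check_output directly. The helper name run_and_capture and the example command are hypothetical and are not taken from generate-hg-s3-bundles.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return its stdout as bytes.

    subprocess.check_output raises CalledProcessError when the command
    exits with a non-zero status, so callers do not need to inspect the
    return code themselves.
    """
    return subprocess.check_output(args)


# Hypothetical command, used only for illustration: ask the current
# interpreter to print a fixed string.
output = run_and_capture([sys.executable, '-c', 'print("bundle")'])
```

The function has been part of the standard library since Python 2.7, which is why a hand-written fallback is only needed on older interpreters.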
Comment 4•7 years ago (Assignee)
scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

We are going to be refactoring how this code is called in a subsequent commit. Refactor first to make the subsequent diff easier to comprehend.
Attachment #8643418 - Flags: review?(klibby)
Comment 5•7 years ago (Assignee)
scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar

Previously, bundle generation occurred sequentially, one type after the other. We would only consume 1 CPU core despite the machine having 12 cores, most of which are not in use at a given instant.

This commit throws a futures ThreadPoolExecutor at the problem to enable concurrent bundle generation and upload.

The jump from 1 core to 3 for bundle generation seems reasonable. There will still be 9 cores available. With 10 mirrors, there is potential for CPU exhaustion. However, Mercurial processes likely won't eat up an entire core, giving enough headroom for that 10th mirror to have sufficient CPU.

Use of ThreadPoolExecutor for upload likely has marginal gain, as upload is performed inside the Python process and the Python GIL will ensure we don't consume more than 1 core. However, networks are involved and I/O will release the GIL, so some benefit is expected.
Attachment #8643419 - Flags: review?(klibby)
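The approach described in this commit message could look roughly like the sketch below. It is not the code from generate-hg-s3-bundles: the constant NUM_THREADS, the tuple BUNDLE_TYPES (in particular the 'bzip2' entry), and the stand-in function generate_bundle are assumptions made for illustration. Only futures.ThreadPoolExecutor and a pool size of 3 come from the bug.

```python
from concurrent import futures

# Hypothetical constant: naming the pool size keeps it visible at the
# top of the file instead of hard-coding 3 at the call site.
NUM_THREADS = 3

# Assumed list of bundle types; the real script may use different ones.
BUNDLE_TYPES = ('gzip', 'bzip2', 'stream')


def generate_bundle(repo, bundle_type):
    """Stand-in for the real work, which would invoke Mercurial.

    Returns the file name the bundle would be written to.
    """
    return '%s.%s.hg' % (repo, bundle_type)


def generate_all(repo):
    """Generate every bundle type for one repository concurrently."""
    with futures.ThreadPoolExecutor(NUM_THREADS) as e:
        fs = [e.submit(generate_bundle, repo, t) for t in BUNDLE_TYPES]
        # result() blocks until the task finishes and re-raises any
        # exception raised in the worker thread.
        return [f.result() for f in fs]


names = generate_all('build/mozharness')
```

Threads are sufficient here if, as the message says, the heavy lifting happens in separate Mercurial processes or in network I/O, since both release the GIL while waiting.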
Comment 6•7 years ago (Assignee)
scripts: support reading list of repos from a file (bug 1191128); r?fubar

Instead of specifying multiple cron entries which may overlap if execution is slow, it is safer to have a single process that handles all repositories serially. We add support to generate-hg-s3-bundles for reading the list of repositories from a file. This will make it easy to define long lists of repositories to generate bundles for.
Attachment #8643420 - Flags: review?(klibby)
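A minimal sketch of reading a repository list from a file follows. The one-repository-per-line format, the skipping of blank lines and '#' comments, and the function name read_repos are all assumptions; the bug does not describe the actual file format.

```python
import io


def read_repos(fh):
    """Return repository paths from a file object, one per line.

    Skipping blank lines and '#' comments is an assumption made for
    this sketch, not documented behavior of the real script.
    """
    repos = []
    for line in fh:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        repos.append(line)
    return repos


# In-memory stand-in for the repository list file.
example = io.StringIO(u'mozilla-central\n\n# disabled for now\nbuild/mozharness\n')
repos = read_repos(example)
```

A single process can then loop over the returned list serially, which avoids overlapping runs regardless of how long any one repository takes.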
Comment 7•7 years ago (Assignee)
ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

We add a file containing the list of repositories whose bundles to generate. It should be identical to the list of repositories in the CRON jobs today. We install a new, single CRON entry that generates bundles from this file.

Note: Ansible won't delete the existing CRON entries. We'll want to manually edit the crontab on hgssh1 when this is deployed.
Attachment #8643421 - Flags: review?(klibby)
Comment 8•7 years ago
Comment on attachment 8643415 [details]
MozReview Request: ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar
https://reviewboard.mozilla.org/r/15079/#review13541
Ship It!
Attachment #8643415 - Flags: review?(klibby) → review+
Comment 9•7 years ago
Comment on attachment 8643416 [details]
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15081/#review13543
::: ansible/tasks/hgmo-bundle-cron.yml:15 (Diff revision 1)
> - job='/repo/hg/scripts/outputif /repo/hg/scripts/generate-hg-s3-bundles {{ item.repo }}'
> + job='/repo/hg/scripts/outputif /repo/hg/venv_tools/bin/python repo/hg/scripts/generate-hg-s3-bundles {{ item.repo }}'
Missing a leading '/' on 'repo/hg/scripts/generate-hg-s3-bundles'
Attachment #8643416 - Flags: review?(klibby)
Comment 10•7 years ago
Comment on attachment 8643417 [details]
MozReview Request: scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar
https://reviewboard.mozilla.org/r/15083/#review13545
Ship It!
Attachment #8643417 - Flags: review?(klibby) → review+
Comment 11•7 years ago
Comment on attachment 8643418 [details]
MozReview Request: scripts: extract code for producing bundle file into function (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15085/#review13547
::: scripts/generate-hg-s3-bundles:58 (Diff revision 1)
> + if t == 'stream':
't' should be 'typ', no? Also on line 61.
Attachment #8643418 - Flags: review?(klibby)
Comment 12•7 years ago
Comment on attachment 8643419 [details]
MozReview Request: scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15087/#review13549
::: scripts/generate-hg-s3-bundles:128 (Diff revision 1)
> + with futures.ThreadPoolExecutor(3) as e:
How do you feel about setting the number of cores as a variable, e.g. NUM_THREADS or NUM_CORES, so it stands out a bit more at the top if/when we want to change it?
Attachment #8643419 - Flags: review?(klibby) → review+
Comment 13•7 years ago
Comment on attachment 8643420 [details]
MozReview Request: scripts: support reading list of repos from a file (bug 1191128); r=fubar
https://reviewboard.mozilla.org/r/15089/#review13551
Ship It!
Attachment #8643420 - Flags: review?(klibby) → review+
Updated•7 years ago
Attachment #8643421 - Flags: review?(klibby)
Comment 14•7 years ago
Comment on attachment 8643421 [details]
MozReview Request: ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15091/#review13553
::: ansible/tasks/hgmo-bundle-cron.yml:10 (Diff revision 1)
> - cron: name="Generate Mercurial bundles for {{ item.repo }}"
> + cron: name="Generate Mercurial bundles"
According to the ansible docs, you should be able to nuke the existing cron, e.g.:
- name: remove old s3 bundle cron
  cron: name="Generate Mercurial bundles" state=absent
Then define a new cron entry for the updated job.
Comment 15•7 years ago (Assignee)
https://reviewboard.mozilla.org/r/15087/#review13549
> How do you feel about setting the number of cores as a variable, e.g. NUM_THREADS or NUM_CORES, so it stands out a bit more at the top if/when we want to change it?
Yeah, I don't like magic numbers either.
Comment 16•7 years ago (Assignee)
Comment on attachment 8643415 [details]
MozReview Request: ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar

ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar

We want to start executing server processes out of a virtualenv so we can use Python 2.7 and so we have full control over the Python environment. Prepare for this by creating a virtualenv with Mercurial that uses Python 2.7.
Attachment #8643415 - Attachment description: MozReview Request: ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r?fubar → MozReview Request: ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar
Comment 17•7 years ago (Assignee)
Comment on attachment 8643416 [details]
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar

We now have a Python 2.7 virtualenv. Let's put it to use by having the S3 bundle generation processes run out of it. This is a prerequisite to introducing new features to the bundle generation script. We remove the installation of python-boto from system packages and add the boto package to the virtualenv. The transition should be transparent.
Attachment #8643416 - Flags: review?(klibby)
Updated•7 years ago (Assignee)
Attachment #8643417 - Attachment description: MozReview Request: scripts: remove subprocess.check_output polyfill (bug 1191128); r?fubar → MozReview Request: scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar
Comment 18•7 years ago (Assignee)
Comment on attachment 8643417 [details]
MozReview Request: scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar

scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar

We are now running from Python 2.7. We don't need to reimplement subprocess.check_output because Python 2.7 implements it for us.
Comment 19•7 years ago (Assignee)
Comment on attachment 8643418 [details]
MozReview Request: scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

scripts: extract code for producing bundle file into function (bug 1191128); r?fubar

We are going to be refactoring how this code is called in a subsequent commit. Refactor first to make the subsequent diff easier to comprehend.
Attachment #8643418 - Flags: review?(klibby)
Comment 20•7 years ago (Assignee)
Comment on attachment 8643419 [details]
MozReview Request: scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar

scripts: use concurrent threads to generate bundles (bug 1191128); r?fubar

Previously, bundle generation occurred sequentially, one type after the other. We would only consume 1 CPU core despite the machine having 12 cores, most of which are not in use at a given instant.

This commit throws a futures ThreadPoolExecutor at the problem to enable concurrent bundle generation and upload.

The jump from 1 core to 3 for bundle generation seems reasonable. There will still be 9 cores available. With 10 mirrors, there is potential for CPU exhaustion. However, Mercurial processes likely won't eat up an entire core, giving enough headroom for that 10th mirror to have sufficient CPU.

Use of ThreadPoolExecutor for upload likely has marginal gain, as upload is performed inside the Python process and the Python GIL will ensure we don't consume more than 1 core. However, networks are involved and I/O will release the GIL, so some benefit is expected.
Comment 21•7 years ago (Assignee)
Comment on attachment 8643420 [details]
MozReview Request: scripts: support reading list of repos from a file (bug 1191128); r=fubar

scripts: support reading list of repos from a file (bug 1191128); r=fubar

Instead of specifying multiple cron entries which may overlap if execution is slow, it is safer to have a single process that handles all repositories serially. We add support to generate-hg-s3-bundles for reading the list of repositories from a file. This will make it easy to define long lists of repositories to generate bundles for.
Attachment #8643420 - Attachment description: MozReview Request: scripts: support reading list of repos from a file (bug 1191128); r?fubar → MozReview Request: scripts: support reading list of repos from a file (bug 1191128); r=fubar
Comment 22•7 years ago (Assignee)
Comment on attachment 8643421 [details]
MozReview Request: ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar

We add a file containing the list of repositories whose bundles to generate. It should be identical to the list of repositories in the CRON jobs today. We install a new, single CRON entry that generates bundles from this file.

Note: Ansible won't delete the existing CRON entries. We'll want to manually edit the crontab on hgssh1 when this is deployed.
Attachment #8643421 - Flags: review?(klibby)
Comment 23•7 years ago (Assignee)
scripts: create HTML index listing all bundles (bug 1191128); r?fubar

Now that all bundles are generated in the same invocation, this makes generating an index of all known bundles easy. So do it. We will enable static website hosting on the S3 buckets so browser visitors see this index when loading the S3 bucket URL.
Attachment #8643831 - Flags: review?(klibby)
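As an illustration of the idea, the sketch below builds a very small HTML index from a list of bundle paths. The function name build_index and the page layout are hypothetical; the real index produced by the script may look quite different. The sketch is written for Python 3, where html.escape is available.

```python
import html


def build_index(paths):
    """Return a minimal HTML page linking to each bundle path.

    Paths are sorted so the output is stable between runs, and are
    escaped before being placed in the markup.
    """
    items = []
    for path in sorted(paths):
        escaped = html.escape(path, quote=True)
        items.append('<li><a href="%s">%s</a></li>' % (escaped, escaped))
    return '<html><body><ul>\n%s\n</ul></body></html>\n' % '\n'.join(items)


page = build_index(['b/x.gzip.hg', 'a/y.gzip.hg'])
```

Because every bundle is produced in one invocation, the full list of paths is already in memory when the index is written, which is what makes this step cheap.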
Updated•7 years ago
Attachment #8643416 - Flags: review?(klibby) → review+
Comment 24•7 years ago
Comment on attachment 8643416 [details]
MozReview Request: ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15081/#review13579
Ship It!
Comment 25•7 years ago
Comment on attachment 8643418 [details]
MozReview Request: scripts: extract code for producing bundle file into function (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15085/#review13581
Ship It!
Attachment #8643418 - Flags: review?(klibby) → review+
Comment 26•7 years ago
Comment on attachment 8643421 [details]
MozReview Request: ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15091/#review13583
Ship It!
Attachment #8643421 - Flags: review?(klibby) → review+
Comment 27•7 years ago
Comment on attachment 8643831 [details]
MozReview Request: scripts: create HTML index listing all bundles (bug 1191128); r?fubar
https://reviewboard.mozilla.org/r/15161/#review13585
Ship It!
Attachment #8643831 - Flags: review?(klibby) → review+
Comment 28•7 years ago (Assignee)
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/1e504004d485eb788022fc06c0b2341c782b5f8e
changeset: 1e504004d485eb788022fc06c0b2341c782b5f8e
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 10:14:30 2015 -0700
description:
ansible/hg-ssh: create a Python 2.7 virtualenv (bug 1191128); r=fubar

We want to start executing server processes out of a virtualenv so we can use Python 2.7 and so we have full control over the Python environment. Prepare for this by creating a virtualenv with Mercurial that uses Python 2.7.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/28303cb5293a4d2ffebbc80b565aaf937fd1e979
changeset: 28303cb5293a4d2ffebbc80b565aaf937fd1e979
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:49:42 2015 -0700
description:
ansible/hg-ssh: run S3 bundle generation out of virtualenv (bug 1191128); r=fubar

We now have a Python 2.7 virtualenv. Let's put it to use by having the S3 bundle generation processes run out of it. This is a prerequisite to introducing new features to the bundle generation script. We remove the installation of python-boto from system packages and add the boto package to the virtualenv. The transition should be transparent.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/1f991f40f21ab64a49cbd94ded4266e3f2be4b2c
changeset: 1f991f40f21ab64a49cbd94ded4266e3f2be4b2c
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 10:15:33 2015 -0700
description:
scripts: remove subprocess.check_output polyfill (bug 1191128); r=fubar

We are now running from Python 2.7. We don't need to reimplement subprocess.check_output because Python 2.7 implements it for us.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/dee8f1805657ad5e5170c3632cb41fbe92c00a9e
changeset: dee8f1805657ad5e5170c3632cb41fbe92c00a9e
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:49:59 2015 -0700
description:
scripts: extract code for producing bundle file into function (bug 1191128); r=fubar

We are going to be refactoring how this code is called in a subsequent commit. Refactor first to make the subsequent diff easier to comprehend.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/d76306c367d58185f0224c6de5df1ffd8bfa5438
changeset: d76306c367d58185f0224c6de5df1ffd8bfa5438
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:50:16 2015 -0700
description:
scripts: use concurrent threads to generate bundles (bug 1191128); r=fubar

Previously, bundle generation occurred sequentially, one type after the other. We would only consume 1 CPU core despite the machine having 12 cores, most of which are not in use at a given instant.

This commit throws a futures ThreadPoolExecutor at the problem to enable concurrent bundle generation and upload.

The jump from 1 core to 3 for bundle generation seems reasonable. There will still be 9 cores available. With 10 mirrors, there is potential for CPU exhaustion. However, Mercurial processes likely won't eat up an entire core, giving enough headroom for that 10th mirror to have sufficient CPU.

Use of ThreadPoolExecutor for upload likely has marginal gain, as upload is performed inside the Python process and the Python GIL will ensure we don't consume more than 1 core. However, networks are involved and I/O will release the GIL, so some benefit is expected.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/3920001130bfa20ee43acbb06077520ddd0ca9e4
changeset: 3920001130bfa20ee43acbb06077520ddd0ca9e4
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 10:17:27 2015 -0700
description:
scripts: support reading list of repos from a file (bug 1191128); r=fubar

Instead of specifying multiple cron entries which may overlap if execution is slow, it is safer to have a single process that handles all repositories serially. We add support to generate-hg-s3-bundles for reading the list of repositories from a file. This will make it easy to define long lists of repositories to generate bundles for.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/946bda3ac215594cdd9b593a33a0cfb6c9b81e3a
changeset: 946bda3ac215594cdd9b593a33a0cfb6c9b81e3a
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:50:36 2015 -0700
description:
ansible/hg-ssh: use a single CRON entry for bundle generation (bug 1191128); r=fubar

We add a file containing the list of repositories whose bundles to generate. It should be identical to the list of repositories in the CRON jobs today. We install a new, single CRON entry that generates bundles from this file.

Note: Ansible won't delete the existing CRON entries. We'll want to manually edit the crontab on hgssh1 when this is deployed.

url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/270815304c94a433f9e987dd90a14f9c63fe0bd7
changeset: 270815304c94a433f9e987dd90a14f9c63fe0bd7
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:50:53 2015 -0700
description:
scripts: create HTML index listing all bundles (bug 1191128); r=fubar

Now that all bundles are generated in the same invocation, this makes generating an index of all known bundles easy. So do it. We will enable static website hosting on the S3 buckets so browser visitors see this index when loading the S3 bucket URL.
Comment 29•7 years ago (Assignee)
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/e0d6de7da48a1d70524c4959bda671c8a1252fd6
changeset: e0d6de7da48a1d70524c4959bda671c8a1252fd6
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 11:54:48 2015 -0700
description:
ansible/hg-ssh: fix YAML syntax errors

These are leftovers from bug 1191128. You think I would have caught these during test...
Comment 30•7 years ago (Assignee)
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/6dce8ac8597f5725fb3a169d7f430c7e92f90f8e
changeset: 6dce8ac8597f5725fb3a169d7f430c7e92f90f8e
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 12:00:33 2015 -0700
description:
ansible/hg-ssh: fix typo (owne -> owner)

Fixup from bug 1191128.
Comment 31•7 years ago (Assignee)
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/60cd962db46baee91a849a505b610c12eb3cac2d
changeset: 60cd962db46baee91a849a505b610c12eb3cac2d
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 12:16:07 2015 -0700
description:
scripts: fix indentation of bundle generation block

Wrong indentation was causing us to not generate all bundles. Derp. This is a fixup from bug 1191128.
Comment 32•7 years ago (Assignee)
After a number of fixups, this is deployed and seems to be working. I'm running bundle generation manually to verify everything works. I wouldn't be surprised if there are more minor fixups. But I'm satisfied with calling this resolved.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 33•7 years ago (Assignee)
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/0b7935ce4b35aedda1727c9eb2535747e8a1e808
changeset: 0b7935ce4b35aedda1727c9eb2535747e8a1e808
user: Gregory Szorc <gps@mozilla.com>
date: Wed Aug 05 15:37:46 2015 -0700
description:
scripts: don't include bucket name in relative path

This is more leftover from bug 1191128. This was resulting in URLs like https://s3-us-west-2.amazonaws.com/moz-hg-bundles-us-west-2/moz-hg-bundles-us-east-1/build/mozharness/7f37f95308e7b9752941d3412aea2cfe9e3f57f2.gzip.hg, which is obviously wrong.
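To make the nature of this bug concrete, here is a small sketch of building a bundle URL from a bucket-relative key. The function bundle_url and its parameters are hypothetical and only mirror the URL shape quoted in the commit message; they are not the script's actual code. The point is that the key must not itself start with a bucket name, otherwise a bucket name appears twice in the result.

```python
def bundle_url(region, bucket, key):
    """Build a path-style S3 URL from a bucket-relative key.

    The host pattern follows the URL quoted in the commit message and
    is only assumed to hold for region-specific endpoints like that one.
    """
    return 'https://s3-%s.amazonaws.com/%s/%s' % (region, bucket, key)


# Correct: the key is relative to the bucket.
key = 'build/mozharness/7f37f95308e7b9752941d3412aea2cfe9e3f57f2.gzip.hg'
url = bundle_url('us-west-2', 'moz-hg-bundles-us-west-2', key)

# Buggy: a key that already carries a bucket name reproduces the
# doubled path described above.
bad_url = bundle_url('us-west-2', 'moz-hg-bundles-us-west-2',
                     'moz-hg-bundles-us-east-1/' + key)
```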