Closed Bug 1188379 · Opened 10 years ago · Closed 1 year ago

Cache subdirectory tarballs

Categories: Developer Services :: Mercurial: hg.mozilla.org, defect
Tracking: (Not tracked)
Status: RESOLVED WONTFIX
People: (Reporter: dustin, Unassigned)
From gps in bug 1188054:
---
We can install some HTTP foo that redirects specific URLs to e.g. CloudFront URLs. The hard part is managing the generation of said archives. Some archives take several seconds to generate. If automation starts to become really aggressive about fetching archives, we may need to pre-generate archives on the server at push time, or have a single client obtain an exclusive lock and trigger builds of certain archives, so we don't e.g. have 50 clients all request the same archive at the same time and have 50 processes on the server all generating the archive at once. This could lead to CPU exhaustion if the requested archive is large.
To make this truly useful, you'll probably want *any* requested archive to be cached. If we did this, nobody would have to do anything before requesting a new archive. However, if we do that, we'll probably end up with a lot of random archives stored in the cache. Given the cost of S3, I'm not too worried, however.
---
Comment 1 (Reporter) • 10 years ago
There are two use-cases for subdirs right now.
The first is getting a copy of mozharness so that it can check out the rest of the tree. It seems like the long way around the barn, and it is, but it works with Buildbot. Taskcluster *build* jobs don't do this -- they just check out the whole tree with tc-vcs before running the mozharness script. This is what Jordan has just implemented.
The second isn't happening yet, but will soon: getting a copy of mozharness to run tests. In both TC and Buildbot, we don't want the entire gecko tree, just the mozharness bits that know how to run a test job. So here taskcluster will want to use a subdirectory.
In both cases, there's a substantial thundering-herd problem: all those builds fire up more-or-less simultaneously, and all of them want a copy of mozharness. The tests will use the same copy of mozharness as the build, so those should always hit cache. If you generate on-demand, you'll need some kind of locking so that all requests that come in during the generation can be satisfied by the same result. Archiver has this (although it was difficult!).
Generating on push could be a bit tricky, since in principle we can request any subdirectory, not just mozharness.
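The locking scheme described above (many simultaneous requests for the same archive, all satisfied by a single generation) can be sketched as a small request-coalescing cache. This is an illustrative sketch, not Archiver's actual code; `generate` and the key names are hypothetical stand-ins:

```python
import threading

class ArchiveCache:
    """Coalesce concurrent requests so each archive is generated once.

    Hypothetical sketch of the locking described in this bug, not the
    real Archiver implementation.
    """

    def __init__(self, generate):
        self._generate = generate       # expensive archive-building function
        self._lock = threading.Lock()
        self._results = {}              # key -> finished archive
        self._events = {}               # key -> Event set when generation ends

    def get(self, key):
        with self._lock:
            if key in self._results:
                return self._results[key]        # cache hit
            event = self._events.get(key)
            if event is None:
                # First requester becomes the owner and generates.
                event = self._events[key] = threading.Event()
                owner = True
            else:
                owner = False                    # someone else is generating
        if owner:
            result = self._generate(key)
            with self._lock:
                self._results[key] = result      # publish before waking waiters
                del self._events[key]
            event.set()
            return result
        event.wait()                             # block until the owner finishes
        with self._lock:
            return self._results[key]
```

The key property is that the result is published under the same lock that removes the pending event, so a late requester either sees the cached result or finds an event to wait on; the generator function is never invoked twice for one key.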
Comment 2 • 10 years ago
Obligatory points I'd like to make before implementing this:
1. Directory archives only contain a specific revision of files and not version history. This leads to the next point...
2. As directories grow in size, the overhead for deleting and re-creating the entire directory tree also grows. At some point you are better served by having a version control tool manage the directory, as it knows how to perform incremental updates.
3. https://bitbucket.org/facebook/hg-experimental/src/8127c1a88c7852c1130101e151cb88294c4bee0e/sparse.py?at=default is a Mercurial extension that allows "sparse checkouts." That is, you clone the entire repository but only check out a certain directory. Combined with persistent and shared clones between jobs, this could be a powerful tool in automation.
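For comparison, enabling such an extension is just an hgrc entry; the path below is illustrative, and the exact command names varied across versions of the experimental extension:

```ini
[extensions]
# Hypothetical local path to the hg-experimental sparse extension.
sparse = /path/to/hg-experimental/sparse.py
```

With the extension loaded, something along the lines of `hg sparse --include testing/mozharness` would restrict the working directory to just the mozharness tree, while the full history stays in `.hg` and supports incremental updates.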
Comment 3 (Reporter) • 10 years ago
Sounds like we need to make a decision, then: should tc-vcs use subdirectory tarballs (in which case we need to solve this bug), or should tc-vcs use sparse checkouts (in which case this can be WONTFIX'd, and we may want to consider doing the same for buildbot-driven builds)?
Updated • 1 year ago

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX