Closed Bug 1188054 Opened 10 years ago Closed 8 years ago

Extend TC-VCS to handle subdirectory downloads

Categories

(Taskcluster :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: dustin, Unassigned)

Details

(Whiteboard: [tc-vcs])

https://hg.mozilla.org supports checkouts (well, downloads) of subdirectories of repositories. This is particularly helpful when you only need a subdirectory of a very large checkout, for example:

- scripts required for an ad-hoc task like building gcc
- testing/mozharness, to run tests against a gecko build

The idea would be to specify a subdirectory to tc-vcs in addition to the other parameters, with the expectation that only that subdirectory would be checked out.

Releng has some support for caching the tarballs in S3 via archiver (https://api.pub.build.mozilla.org/docs/usage/archiver/), but perhaps TC-VCS wants to use some other means of caching. I suspect that hitting hg.mozilla.org directly for every tarball would quickly overload it.
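As a sketch of what tc-vcs might construct, hgweb-style servers expose path-filtered archive URLs. The exact `/archive/<rev>.<format>/<subdir>` layout below is an assumption based on hgweb conventions, and the helper name is hypothetical:

```python
def subdir_archive_url(base, repo, rev, subdir, fmt="tar.gz"):
    """Build an hgweb-style archive URL restricted to one subdirectory.

    The /archive/<rev>.<fmt>/<subdir> layout is an assumption based on
    hgweb's path-filtered archive support; verify against the server.
    """
    return "{}/{}/archive/{}.{}/{}".format(base, repo, rev, fmt, subdir.strip("/"))

url = subdir_archive_url("https://hg.mozilla.org", "mozilla-central",
                         "tip", "testing/mozharness")
```

tc-vcs would then download and unpack that tarball instead of performing a full clone.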
Fetching archives from hg.mozilla.org is expensive on the server. If we are to start downloading subdirectories of repositories, it is imperative that requests go through a caching layer (such as archiver). Alternatively, we could build something into the server itself so that requests for archives are answered with an HTTP redirect to S3 or similar. On that note, I'd rather solve caching on the server, because a centralized solution tends to be simpler than N solutions for N clients.
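A minimal sketch of that server-side idea, assuming a cache keyed by (repo, rev, subdirectory) that answers with an HTTP redirect to the cached object, filling the cache on a miss. The cache store and archive generator here are stand-ins, not real hg.mozilla.org internals:

```python
# Hypothetical server-side cache-and-redirect logic; the cache dict and
# generate_and_upload() are stand-ins for S3 and "hg archive" respectively.
CACHE = {}  # maps (repo, rev, subdir) -> public URL of the cached archive

def generate_and_upload(repo, rev, subdir):
    # Stand-in for generating the archive and uploading it to S3/CloudFront;
    # returns the public URL of the uploaded object.
    return "https://cache.example.com/{}/{}-{}.tar.gz".format(
        repo, rev, subdir.replace("/", "-"))

def archive_redirect(repo, rev, subdir):
    """Return (status, location): a 302 to the cached copy, filling on miss."""
    key = (repo, rev, subdir)
    if key not in CACHE:
        CACHE[key] = generate_and_upload(repo, rev, subdir)
    return 302, CACHE[key]
```

Subsequent requests for the same archive hit the cache and never touch the archive generator again.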
Yes, pretty please solve the caching on the server -- archiver is really only a workaround for that lack. I assume some of the bundleclone logic could be re-used?
We can install some HTTP foo that redirects specific URLs to e.g. CloudFront URLs. The hard part is managing the generation of said archives; some archives take several seconds to generate.

If automation starts to become really aggressive about fetching archives, we may need to pre-generate archives on the server at push time, or have a single client obtain an exclusive lock and trigger builds of certain archives, so that we don't e.g. have 50 clients all request the same archive at the same time and 50 processes on the server all generating it at once. That could lead to CPU exhaustion if the requested archive is large.

To make this truly useful, you'll probably want *any* requested archive to be cached; nobody would then have to do anything before requesting a new archive. However, if we do that, we'll probably end up with a lot of random archives stored in the cache. Given the cost of S3, I'm not too worried.
Moved to bug 1188379. I think this bug remains at "teach tc-vcs to get subdirectories from hg.m.o". If bug 1188379 works out, that's all we'll need. Otherwise, we'll have a separate bug for "teach tc-vcs to get subdirectories from archiver", or proxxy, or to do its own caching, or something.
Whiteboard: [tc-vcs]
Component: Tools → General
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE