Closed
Bug 1011095
Opened 11 years ago
Closed 11 years ago
Reconsider S3 CNAMEing for tasks.taskcluster.net to ensure HTTPS support
Categories
(Taskcluster :: General, defect)
Taskcluster
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jonasfj, Unassigned)
Details
Currently we CNAME a few things to S3 buckets, specifically:
schemas.taskcluster.net (schemas are loaded from here on the fly)
references.taskcluster.net (documentation and input for auto-generated client)
tasks.taskcluster.net (tasks, runs, artifacts)
For schemas and references HTTPS access is less critical. But schemas are loaded from schemas.taskcluster.net on the fly, so ideally, we should have HTTPS here.
Things loaded from references will usually be validated manually to some extent.
So let's consider tasks.taskcluster.net for now.
The problem:
We store tasks in us-west-2. Ideally we should have a bucket for each region, at least for artifacts, but for now just using us-west-2 will do.
When we CNAME the S3 bucket, we obviously don't get HTTPS when accessing it through the CNAME. So perhaps we should phase out the CNAMEing altogether.
Furthermore, in order to CNAME a bucket, the bucket name must be the same as the domain name.
Hence, the bucket name is "tasks.taskcluster.net". This is bad because it prevents us from accessing the bucket using HTTPS.
Buckets outside the US standard region must be accessed using the virtual host pattern:
- https://<bucket-name>.s3.amazonaws.com/<key>
If the <bucket-name> contains a dot ".", then the wildcard SSL certificate AWS uses won't work anymore. For details see:
http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
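To make the dot problem concrete, here is a minimal sketch of wildcard hostname matching. It assumes the certificate name is "*.s3.amazonaws.com" (an assumption for illustration) and that a "*" covers exactly one DNS label; the helper names and the dot-free bucket name are hypothetical.

```python
# Assumed certificate name, for illustration only.
CERT_NAME = "*.s3.amazonaws.com"


def virtual_host(bucket: str) -> str:
    """Hostname used by the virtual host pattern for a bucket."""
    return f"{bucket}.s3.amazonaws.com"


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.com' covers hostname.

    A '*' in the left-most label stands for exactly one DNS label,
    so it never spans a dot.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    # All labels after the first must match literally.
    if pattern_labels[1:] != host_labels[1:]:
        return False
    return pattern_labels[0] == "*" or pattern_labels[0] == host_labels[0]


# A bucket named after a domain adds extra labels, so the wildcard no longer matches.
print(wildcard_matches(CERT_NAME, virtual_host("tasks.taskcluster.net")))
# A bucket name without dots stays within a single label.
print(wildcard_matches(CERT_NAME, virtual_host("taskcluster-artifacts")))
```

Under these assumptions, any bucket name that doubles as a domain name fails the check, which is why the CNAME requirement and HTTPS via the virtual host pattern conflict.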
Hence, we should probably reconsider CNAMEing S3 buckets: use region-specific buckets for artifacts, and store task.json, runs.json, logs.json, result.json and resolution.json in a special bucket or Azure blob storage (which, consistency-wise, would be much better suited for this).
Comment 1 (Reporter) • 11 years ago
For current CNAME mapping, see: bug 1007335 (though this might change).
Comment 2 (Reporter) • 11 years ago
Update:
My bad, it seems we can still use HTTPS access for bucket names that contain dots "." outside the US standard region; we just have to use the region-specific end-point (which is usually a smart thing to configure for performance anyway).
Example:
- https://s3-<region>.amazonaws.com/<bucket-name>/<key>
(This will work even with dots "." in the <bucket-name>)
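A minimal sketch of building such a URL, following the pattern given in this comment. The function name and the example key are hypothetical; only the URL shape comes from the example above.

```python
from urllib.parse import quote


def path_style_url(region: str, bucket: str, key: str) -> str:
    """Build https://s3-<region>.amazonaws.com/<bucket-name>/<key>.

    The bucket name ends up in the path rather than the hostname,
    so dots in it do not affect certificate matching.
    """
    # Percent-encode the key but keep '/' so nested keys stay readable.
    return f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


# Hypothetical key layout, for illustration only.
print(path_style_url("us-west-2", "tasks.taskcluster.net", "some-task-id/task.json"))
```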
Nevertheless, we should reconsider the CNAME policy anyway, so people don't build tools that use HTTP.
Comment 3 • 11 years ago
+1 from me ... we should avoid HTTP, and having explicit endpoints from the queue as to where the tasks live is fairly easy given the knowledge we have of the state of the tasks (how we persist that state is another issue).
Comment 4 (Reporter) • 11 years ago
Yeah, perhaps we should make a series of buckets for artifacts in different regions, with a bucket name like: taskcluster-artifacts-<region>
And then store task.json, logs.json, result.json, resolution.json in Azure blob storage. I think we can sign PUT URLs for Azure too.
Artifacts can go anywhere as URLs are in result.json, but the .json files should probably always be accessed through the queue, even if they are deleted from the database. This way we can support secret tasks, as we'll just redirect to signed URLs from the queue.
Comment 5 • 11 years ago
^ I agree with the above, but is there any particular reason why you think we should use Azure blob storage for the .json files? We could just use another S3 bucket.
Azure has pretty nice "signing" support via SAS (http://msdn.microsoft.com/en-US/library/azure/dn140255.aspx); it's at least as flexible as what we can do on AWS for individual blobs.
Comment 6 (Reporter) • 11 years ago
We should consider Azure blob storage over S3 for consistency: resolution.json may be overwritten when a rerun occurs, so atomic updates would already be nice now.
I suspect we're going to commit further crimes like rerun in the future, so getting atomic updates would be great. Imagine a feature that modifies a pending task or something...
Also, it would allow for idempotent task creation. Currently we can't check if a taskId is used; hence, taskIds have to be generated on the server to ensure uniqueness. With Azure blob storage we can check if a taskId is already used and make the post-task API look like: /v1/task/:taskId/create
I.e. poster can decide taskId.
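A rough sketch of what idempotent creation with a poster-chosen taskId could look like. The in-memory store below is only a stand-in for a backend that supports an atomic "create if absent" check; the class and exception names are hypothetical, and this is not the queue's actual implementation.

```python
import threading


class TaskIdConflict(Exception):
    """Raised when a taskId is already used by a different task definition."""


class InMemoryTaskStore:
    """Hypothetical stand-in for storage with an atomic create-if-absent check."""

    def __init__(self):
        self._tasks = {}
        self._lock = threading.Lock()

    def create(self, task_id: str, definition: dict) -> bool:
        """Create the task if the taskId is unused.

        Returns True if the task was newly created and False if an identical
        task already existed, so retrying the same request is harmless.
        Raises TaskIdConflict if the taskId belongs to a different definition.
        """
        with self._lock:  # check and write happen as one atomic step
            existing = self._tasks.get(task_id)
            if existing is None:
                self._tasks[task_id] = dict(definition)
                return True
            if existing == definition:
                return False
            raise TaskIdConflict(task_id)


store = InMemoryTaskStore()
print(store.create("my-task-id", {"payload": "build"}))  # first request creates the task
print(store.create("my-task-id", {"payload": "build"}))  # retry of the same request
```

The point of the sketch is that a retried request with the same taskId and definition is safe, while reusing a taskId for something else is rejected.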
Comment 7 (Reporter) • 11 years ago
This is again no longer relevant. Everything is routed through the queue now, and we only (afaik) use HTTPS end-points (unless some stupid user decides to use baseUrl = http://queue.taskcluster.net).
We might be able to check against that, but I doubt it's necessary.
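If such a check were ever wanted, it could be as small as the sketch below. The function name is hypothetical and this is not part of any existing client; it only illustrates rejecting a non-HTTPS baseUrl.

```python
from urllib.parse import urlparse


def require_https(base_url: str) -> str:
    """Return base_url unchanged if it uses HTTPS, otherwise raise ValueError."""
    parsed = urlparse(base_url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    if parsed.scheme != "https":
        raise ValueError(f"baseUrl must use https, got: {base_url!r}")
    return base_url


print(require_https("https://queue.taskcluster.net"))
```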
Updated (Reporter) • 11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Component: TaskCluster → General
Product: Testing → Taskcluster
Target Milestone: --- → mozilla41
Version: unspecified → Trunk
Comment 8 • 10 years ago
Resetting Version and Target Milestone that accidentally got changed...
Target Milestone: mozilla41 → ---
Version: Trunk → unspecified