Closed Bug 1011095 Opened 11 years ago Closed 11 years ago

Reconsider S3 CNAMEing for tasks.taskcluster.net to ensure HTTPS support

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Unassigned)

Details

Currently we CNAME a few things to S3 buckets, specifically:
 - schemas.taskcluster.net (schemas are loaded from here on the fly)
 - references.taskcluster.net (documentation and input for auto-generated client)
 - tasks.taskcluster.net (tasks, runs, artifacts)

For schemas and references HTTPS access is less critical. But schemas are loaded from schemas.taskcluster.net on the fly, so ideally we should have HTTPS here. Things loaded from references will usually be validated manually to some extent. So let's consider tasks.taskcluster.net for now.

The problem:
We store tasks in us-west-2. Ideally we should have a bucket for each region, at least for artifacts, but for now just using us-west-2 will do. When we CNAME the S3 bucket, we obviously don't get HTTPS when accessing it through the CNAME. So perhaps we should phase out the CNAMEing altogether.

Furthermore, in order to CNAME, the bucket name must be the same as the domain name. Hence the bucket name is "tasks.taskcluster.net", which is bad because it prevents us from accessing the bucket using HTTPS. Buckets outside the US standard region must be accessed using the virtual host pattern:
 - https://<bucket-name>.s3.amazonaws.com/<key>
If the <bucket-name> contains a dot ".", then the wildcard SSL certificate AWS uses won't work anymore. For details see:
http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html

Hence, we should probably reconsider CNAMEing S3 buckets: use region-specific buckets for artifacts, and store task.json, runs.json, logs.json, result.json and resolution.json in a special bucket or Azure blob storage (which consistency-wise would be much better suited for this).
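To illustrate the certificate problem described above, here is a minimal sketch of why a dotted bucket name breaks the virtual host pattern. It assumes the AWS wildcard certificate name is "*.s3.amazonaws.com" and applies the usual rule that a wildcard covers exactly one DNS label. The function names are made up for this example, and "taskcluster-artifacts-us-west-2" is just a sample dot-free bucket name.

```python
def virtual_host_name(bucket: str) -> str:
    """Hostname used by the virtual host pattern: <bucket-name>.s3.amazonaws.com."""
    return f"{bucket}.s3.amazonaws.com"


def wildcard_matches(hostname: str, pattern: str = "*.s3.amazonaws.com") -> bool:
    """Return True if hostname is covered by the wildcard certificate name.

    Assumption: the pattern starts with "*", and that "*" stands in for
    exactly one DNS label, so both names need the same number of labels.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    # The first pattern label is the wildcard; the rest must match literally.
    return pattern_labels[0] == "*" and host_labels[1:] == pattern_labels[1:]


for bucket in ("tasks.taskcluster.net", "taskcluster-artifacts-us-west-2"):
    host = virtual_host_name(bucket)
    print(host, wildcard_matches(host))
```

The dotted bucket name adds extra labels in front of "s3.amazonaws.com", so the single wildcard label can no longer cover it.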
For current CNAME mapping, see: bug 1007335 (though this might change).
Update:
My bad, it seems we can still use HTTPS access for bucket names that contain dots "." outside the US standard region; we just have to use the region-specific endpoint (which is usually a smart thing to configure for performance anyway). Example:
 - https://s3-<region>.amazonaws.com/<bucket-name>/<key>
(This will work even with dots "." in the <bucket-name>.)

Nevertheless, we should reconsider the CNAME policy anyway, so people don't build tools that use HTTP.
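A small sketch of building a URL for the region-specific endpoint pattern quoted above. The helper name and the example key are hypothetical; only the URL shape comes from this comment.

```python
from urllib.parse import quote


def region_endpoint_url(region: str, bucket: str, key: str) -> str:
    """Build https://s3-<region>.amazonaws.com/<bucket-name>/<key>.

    The bucket name sits in the path, not the hostname, so dots in it
    do not affect which certificate name the hostname has to match.
    """
    # Percent-encode the key but keep "/" so nested keys stay readable.
    return f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


# "example-task-id/task.json" is a made-up key for illustration.
print(region_endpoint_url("us-west-2", "tasks.taskcluster.net", "example-task-id/task.json"))
```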
+1 from me ... we should avoid HTTP. Having the queue hand out explicit endpoints for where the tasks live is fairly easy, given the knowledge we have of the state of the tasks (how we persist that state is another issue).
Yeah, perhaps we should make a series of buckets for artifacts in different regions, with a bucket name like:
  taskcluster-artifacts-<region>
And then store task.json, logs.json, result.json and resolution.json in Azure blob storage; I think we can sign PUT URLs for Azure too.

Artifacts can go anywhere, as their URLs are in result.json, but the .json files should probably always be accessed through the queue, even if they are deleted from the database. This way we can support secret tasks, as we'll just redirect to signed URLs from the queue.
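A rough sketch of the idea above: a per-region bucket name plus a queue that redirects to an expiring, signed URL. The signing scheme here is a toy HMAC stand-in invented for illustration; it is not the S3 or Azure signature format, and every function name and query parameter is hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def artifact_bucket(region: str) -> str:
    """Bucket name following the taskcluster-artifacts-<region> proposal."""
    return f"taskcluster-artifacts-{region}"


def sign(url: str, expires: int, secret: bytes) -> str:
    """Hex HMAC-SHA256 over the URL and its expiry time (toy scheme)."""
    return hmac.new(secret, f"{url}\n{expires}".encode(), hashlib.sha256).hexdigest()


def signed_redirect_target(url: str, expires: int, secret: bytes) -> str:
    """URL the queue would redirect to: storage URL plus expiry and signature."""
    query = urlencode({"expires": expires, "sig": sign(url, expires, secret)})
    return f"{url}?{query}"


def is_valid(url: str, expires: int, sig: str, secret: bytes, now: int) -> bool:
    """Storage-side check: the link has not expired and the signature matches."""
    return now <= expires and hmac.compare_digest(sig, sign(url, expires, secret))
```

Because only the queue holds the secret, a client can fetch a secret task's .json file only through a redirect the queue chose to issue, and only until the link expires.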
^ I agree with the above, but is there any particular reason why you think we should use Azure blob storage for the .json files? We could just use another S3 bucket.

Azure has pretty nice "signing" support via SAS:
http://msdn.microsoft.com/en-US/library/azure/dn140255.aspx
It's at least as flexible as what we can do on AWS for individual blobs.
We should consider Azure blob storage over S3 for consistency: resolution.json may be overwritten when a rerun occurs, so atomic updates would be nice now. I suspect we're going to commit further crimes like rerun in the future, so getting atomic updates would be great. Imagine a feature that modifies a pending task or something...

Also, it would allow for idempotent task creation. Currently we can't check if a taskId is used, hence they have to be generated in the server to ensure uniqueness. With Azure blob storage we can check if a taskId is already used and make the post task API look like:
  /v1/task/:taskId/create
I.e. the poster can decide the taskId.
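A minimal sketch of what idempotent creation with a poster-chosen taskId could look like. The in-memory store below is only a stand-in for a blob store that supports an atomic "create only if absent" write; the class, function and exception names are hypothetical, and treating a mismatched retry as a conflict is a design assumption, not something decided in this bug.

```python
class TaskIdConflict(Exception):
    """The taskId is already used by a different task definition."""


class TaskStore:
    """In-memory stand-in for a store with an atomic create-if-absent write."""

    def __init__(self) -> None:
        self._blobs: dict[str, dict] = {}

    def create_if_absent(self, task_id: str, definition: dict) -> bool:
        """Store definition under task_id unless taken; True if newly created."""
        if task_id in self._blobs:
            return False
        self._blobs[task_id] = definition
        return True

    def get(self, task_id: str) -> dict:
        return self._blobs[task_id]


def create_task(store: TaskStore, task_id: str, definition: dict) -> str:
    """Handler sketch for /v1/task/:taskId/create."""
    if store.create_if_absent(task_id, definition):
        return "created"
    # A retry with the same definition is harmless, so report success again.
    if store.get(task_id) == definition:
        return "already-exists"
    raise TaskIdConflict(task_id)
```

With this shape a client that times out can simply resend the same request, since repeating it never creates a second task.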
This is again no longer relevant. Everything is routed through the queue now, and we only (afaik) use HTTPS end-points (unless some stupid user decides to use baseUrl = http://queue.taskcluster.net). We might be able to check against that, but I doubt it's necessary.
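For what it's worth, the baseUrl check mentioned above could be as small as the sketch below. The function name is hypothetical and this is not taken from any actual client code.

```python
from urllib.parse import urlsplit


def require_https(base_url: str) -> str:
    """Return base_url unchanged, or raise if its scheme is not https."""
    if urlsplit(base_url).scheme != "https":
        raise ValueError(f"baseUrl must use https, got {base_url!r}")
    return base_url


print(require_https("https://queue.taskcluster.net"))
```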
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: TaskCluster → General
Product: Testing → Taskcluster
Target Milestone: --- → mozilla41
Version: unspecified → Trunk
Resetting Version and Target Milestone that accidentally got changed...
Target Milestone: mozilla41 → ---
Version: Trunk → unspecified