Closed Bug 1091780 Opened 10 years ago Closed 6 years ago

HTTPS for schemas.taskcluster.net and references.taskcluster.net

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Unassigned)

References

Details

(Keywords: sec-low, wsec-http)

We should configure:
 - schemas.taskcluster.net, and
 - references.taskcluster.net
with cloudfront so that we can add HTTPS support.

I (jonasfj) have the certificates and can almost remember how I configured this for tools.taskcluster.net.
But we might need to modify how we upload files so that they include CacheControl, and probably also a utf-8 charset, just to make sure things are done right.
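
For illustration only, here is a minimal sketch of what "upload with CacheControl and a utf-8 charset" could look like. The function name, the suffix table, and the `max_age` default are assumptions made for this example, not the actual upload code. The returned dictionary uses the parameter names that boto3's `put_object` accepts (`Bucket`, `Key`, `Body`, `ContentType`, `CacheControl`).

```python
import os

# Hypothetical suffix table for this sketch; the real upload code may differ.
CONTENT_TYPES = {
    ".json": "application/json",
    ".html": "text/html",
}


def s3_upload_args(bucket, key, body, max_age=600):
    """Build keyword arguments for an S3 upload with explicit headers.

    max_age=600 is an assumed value, not one taken from the bug.
    """
    suffix = os.path.splitext(key)[1].lower()
    content_type = CONTENT_TYPES.get(suffix, "application/octet-stream")
    # Declare the charset explicitly for the text-like types we know about.
    if suffix in CONTENT_TYPES:
        content_type += "; charset=utf-8"
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ContentType": content_type,
        "CacheControl": f"public, max-age={max_age}",
    }
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("s3").put_object(**s3_upload_args(...))`.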

Note: however we decide to do docs going forward (bug 1056282), we need to do this first. Otherwise, we'll get mixed content errors on the docs site when loading schemas over HTTP.
Component: TaskCluster → General
Product: Testing → Taskcluster
Assignee: nobody → dustin
Blocks: 1269740
Amy:

 schemas.taskcluster.net CNAME dwct33i6dj3i.cloudfront.net
 references.taskcluster.net CNAME d27ltxxyvc62mr.cloudfront.net

plz?
Flags: needinfo?(arich)
both updated in inventory
Flags: needinfo?(arich)
OK, this should be good to go once that DNS propagates.  Jonas, do you want to take a look at the cloudfront configuration and see if there's any room for improvement around caching / encodings?  If not, go ahead and close this.
Flags: needinfo?(jopsen)
Ugh, looks like we need to set `Access-Control-Allow-Origin: *`, which needs to be set on the S3 objects.
OK, that's done, but schemas has internal http references :(
> schemas has internal http references :(
Yes, which is why we need to redirect http -> https (at least for now)

> see if there's any room for improvement around caching / encodings?

I see that for tools.taskcluster.net I didn't use an "S3 origin".
I used the web-hosting of an S3 bucket as a "custom origin".
This is because I wanted it to serve index.html, when someone requested folder/.
Example:
  d2riyrukoaoyvi.cloudfront.net/manual/apis/scopes
Works because you created a file called "manual/apis/scopes"
On tools, because I use the "Static Website Hosting" url for a custom origin, I have:
  "manual/apis/scopes/index.html" served when someone requests:
  "manual/apis/scopes/" or "manual/apis/scopes"
The URL would look like this: "docs.taskcluster.net.s3-website-us-east-1.amazonaws.com"
This could be the only difference between webhosting and not webhosting, but it feels more web-like.
And the static site won't have files without .html that are html files :)
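
To make the difference concrete, here is a rough model of the two lookup behaviours described above. It is a sketch under assumptions: both function names are made up for this example, and the real "Static Website Hosting" endpoint answers a request without the trailing slash with a redirect to the slash form before serving the index document, which this model collapses into one step.

```python
def s3_origin_lookup(keys, path):
    """Plain S3 origin: the request path must match an object key exactly."""
    key = path.lstrip("/")
    return key if key in keys else None


def website_lookup(keys, path, index_document="index.html"):
    """Rough model of the website endpoint's index-document handling."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        # Folder-style request: serve the index document inside it.
        candidate = key + index_document
        return candidate if candidate in keys else None
    if key in keys:
        return key
    # No such object: fall back to folder/index.html
    # (the real endpoint redirects to "folder/" first).
    candidate = key + "/" + index_document
    return candidate if candidate in keys else None
```

With a bucket containing only "manual/apis/scopes/index.html", `website_lookup` finds it for both "/manual/apis/scopes" and "/manual/apis/scopes/", while `s3_origin_lookup` finds nothing for either.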

Note:
  I see that for tools.taskcluster.net, I used the origin as HTTP-only. I think this is a
  limitation of the S3 "Static Website Hosting" thing. But this is the Cloudfront -> S3 part, so
  I think it's okay.
  Nevertheless, please consider using buckets without dots "." in the bucket name.
  Otherwise, we can't use virtual style access and get HTTPS (yes, I know I've violated this).
  Now is not the time to rename schemas and references buckets, docs automation project can do that,
  as we would have to move all components too.
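
A small sketch of why dots in the bucket name get in the way of virtual style access over HTTPS: a certificate wildcard covers a single DNS label, so a dotted bucket name adds labels that a wildcard certificate does not match. The pattern `*.s3.amazonaws.com` and the bucket name `taskcluster-schemas` are assumptions used for illustration only.

```python
def wildcard_matches(pattern, hostname):
    """Single-label wildcard match, as applied to TLS certificate names."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard stands in for exactly one label.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def virtual_hosted_host(bucket):
    """Virtual style hostname for a bucket (default endpoint assumed)."""
    return f"{bucket}.s3.amazonaws.com"


# Hypothetical dot-free bucket name: matches the wildcard.
print(wildcard_matches("*.s3.amazonaws.com", virtual_hosted_host("taskcluster-schemas")))
# Dotted bucket name: extra labels, so no match.
print(wildcard_matches("*.s3.amazonaws.com", virtual_hosted_host("schemas.taskcluster.net")))
```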

Nits:
 - I would override the S3 caching headers, and I think disable caching (or at least control it from cloudfront)
 - Then maybe enable gzip
 - Also I'm prejudiced against S3 buckets in us-east. Read-after-write consistency requires
   special tricks here: https://forums.aws.amazon.com/ann.jspa?annID=3112
   This could be outdated, as the FAQ has changed since then, hence my wording "prejudiced".
   I'm just not sure if the default endpoint of us-east-1 is read-after-write consistent.
Flags: needinfo?(jopsen)
Redirecting isn't enough -- we need to actually fix the http references.

I built the site generation to generate a key/value store rather than a filesystem, so I don't see a lot of reason to have keys named 'index.html' and then configure the web server to add that to the name.  It might be nice to have automatic handling of a trailing / (for example, http://docs.taskcluster.net.s3-website-us-east-1.amazonaws.com/manual/ doesn't work right now), but not at the expense of naming everything 'index.html'.

I'm not sure what you're saying about the dots -- it doesn't matter in this case, since we're using cloudfront.  Unless cloudfront is somehow using unencrypted access to the backend??

I'll override caching and gzip.

I don't think read-after-write matters for docs.
That should help with the mixed content, at least for schema.  We need to also fix the refs in the source.

    auth
        api-docs
    queue
        api-docs
        exchanges
    scheduler
        exchanges
    github
        configuration
        api-docs
        exchanges
    aws-provisioner
        api-docs
        exchanges
    hooks
        api-docs
    index
        api-docs
    purge-cache
        api-docs
    secrets
        api-docs
Well, that got ugly quickly:

 * Using `https://schemas.taskcluster.net` URLs in a $ref in a service caused

   Error: can't resolve reference https://schemas.taskcluster.net/auth/v1/get-role-response.json# from id http://schemas.taskcluster.net/auth/v1/list-roles-response.json#

 * tc-lib-api generates some explicit http:// URLs
 * tc-lib-validate generates some explicit http:// URLs
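
The resolution error above can be reproduced in miniature. This is a sketch assuming the validator keys its loaded schemas by their exact `id` string; the registry and `resolve_ref` here are stand-ins written for this example, not the validator's real API.

```python
def resolve_ref(registry, ref, from_id):
    """Look up a $ref by exact URL; the scheme is part of the key."""
    target = ref.split("#", 1)[0]
    if target not in registry:
        raise LookupError(f"can't resolve reference {ref} from id {from_id}")
    return registry[target]


# Schemas are registered under the http:// ids they declare.
registry = {
    "http://schemas.taskcluster.net/auth/v1/get-role-response.json": {"type": "object"},
}
from_id = "http://schemas.taskcluster.net/auth/v1/list-roles-response.json#"

# Resolves: same scheme as the registered id.
resolve_ref(registry, "http://schemas.taskcluster.net/auth/v1/get-role-response.json#", from_id)

# Fails: an https:// $ref is a different string than the http:// id.
try:
    resolve_ref(registry, "https://schemas.taskcluster.net/auth/v1/get-role-response.json#", from_id)
except LookupError as err:
    print(err)
```

In this model, ids and $refs have to switch scheme together, which is why changing only the $ref in a service is not enough.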

For the moment, I've reverted all of schemas, references, and docs to serve both http and https.  That's going to leave things a little bit broken for those who have cached the 301 redirects, but will at least allow time to update the two libraries and all services in a methodical fashion.

(the revert is still InProgress in the CloudFormation UI)
Commit pushed to master at https://github.com/taskcluster/taskcluster-docs

https://github.com/taskcluster/taskcluster-docs/commit/238323023e5d317831f119099e30f1d55ff64982
Revert "Bug 1091780: forcibly rewrite http schema URLs"

This reverts commit b623835f434c02b06a765ce26df7e269d7deb5ee.
So this schema/references stuff sucks.  The "right" way to fix this is to update all of the documented services to use https.  But it turns out that's complicated: there are three layers of libraries to upgrade, one of which is tc-base, and many of the services are on a very old version of tc-base.  Which means that upgrading would require a more substantial rewriting of those services.

Instead of doing that, we're going to leave bug 1091780 open, and primarily use http://schemas and http://references in production, with cloudfront serving both http and https.  The docs site will rewrite the URLs to https://.. dynamically in the browser.
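
As a sketch of what such a rewrite could look like (written in Python for illustration; the docs site does this in the browser, and this is not its actual code), the idea is to walk a parsed schema or reference document and upgrade any string that points at the two hosts. The function name and the example document shape are assumptions.

```python
HOSTS = ("schemas.taskcluster.net", "references.taskcluster.net")


def to_https(value):
    """Return a copy of a parsed JSON value with http:// URLs for HOSTS upgraded."""
    if isinstance(value, dict):
        return {k: to_https(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_https(v) for v in value]
    if isinstance(value, str):
        for host in HOSTS:
            if value.startswith(f"http://{host}/"):
                return "https://" + value[len("http://"):]
    # Other strings, numbers, booleans and None pass through unchanged.
    return value


# Assumed document shape, for illustration only.
doc = {
    "id": "http://schemas.taskcluster.net/auth/v1/list-roles-response.json#",
    "items": {"$ref": "http://schemas.taskcluster.net/auth/v1/get-role-response.json#"},
    "title": "List Roles",
}
print(to_https(doc)["items"]["$ref"])
```

Because both `id` and `$ref` values are rewritten in the same pass, the rewritten document stays internally consistent.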
No longer blocks: 1269740
Assignee: dustin → nobody
Brian, is this easier now that the translation is done during doc generation, rather than on the frontend?
Flags: needinfo?(bstack)
I think it should be easier. Only problem is that not every service uses this yet. It will take some time to run around and get all of them updated. If it's important, I can spend a day soon updating all of the remaining services.
Flags: needinfo?(bstack)
It's not immediately important that everything be https.  It would be good if, as you get the services updated, all of the internal links became https so that, eventually, we can shut off the http versions.
The HSTS header is missing too:

$ curl -IsSf http://schemas.taskcluster.net/taskcluster-treeherder/v1/task-treeherder-config.json | grep -i 'strict'
$
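
The same check can be expressed against an already-fetched set of response headers. This is a minimal sketch; `hsts_max_age` is a name invented for this example, and the headers are passed in as a plain dict so that no network access is needed.

```python
import re


def hsts_max_age(headers):
    """Return the HSTS max-age in seconds, or None if no usable header is present."""
    for name, value in headers.items():
        if name.lower() == "strict-transport-security":
            match = re.search(r'max-age\s*=\s*"?(\d+)', value, re.IGNORECASE)
            return int(match.group(1)) if match else None
    return None


# No Strict-Transport-Security header, as in the curl output above.
print(hsts_max_age({"Content-Type": "application/json"}))
```

Note that browsers only honor this header on responses received over HTTPS (RFC 6797), so the https:// URL is the one worth checking.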
Blocks: 1351415
This will happen as part of merging docs, tools, homepage, schema, and references, as part of r17y.
See Also: → 1463260
These remain available via http but are universally used as https.  I suspect we could turn off http, but rather than break things let's just wait until bug 1457610, at which point they will only be available (at a different URL) via https.  This bug will be fixed at that point, but I'll close it now to save seeing it in triage a few times before then.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED