HSTS preload list not updating?

RESOLVED FIXED in Firefox -esr60

Status

()

defect
P1
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: mismith, Assigned: keeler)

Tracking

60 Branch
mozilla63
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr6062+ fixed, firefox61 wontfix, firefox62+ wontfix, firefox63+ fixed)

Details

Attachments

(3 attachments)

According to <https://wiki.mozilla.org/SecurityEngineering/HTTP_Strict_Transport_Security_(HSTS)_Preload_List>, the HSTS preload list is supposed to be updated daily.

But looking through <https://hg.mozilla.org/mozilla-central/log/tip/security/manager/ssl/nsSTSPreloadList.inc>, I can't find any new domains added to the list in the past month.

There are domains in Chromium's list <https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json> which are missing and don't appear on the errors list <https://hg.mozilla.org/mozilla-central/raw-file/tip/security/manager/ssl/nsSTSPreloadList.errors>.

Picking an arbitrary example, 'alonetone.com' was added to Chromium's list in May <https://chromium.googlesource.com/chromium/src/net/+/c7fa5723f942f9d4be6c52b5b8bc0574ebca275e%5E%21/> and returns an eligible 'Strict-Transport-Security' header, yet is not in Firefox's list.
I just made a really quick comparison with reduced versions of
https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json?format=TEXT (base64), nsSTSPreloadList.inc and nsSTSPreloadList.errors using https://stackoverflow.com/a/4546712:
It looks like 5643 domains are unknown to Firefox.
Is
> "policy": "bulk-1-year"
making problems?
Flags: needinfo?(dkeeler)
Simon, now that we've moved the jobs that produce these files to... docker? taskcluster? I have no idea where to find the logs. Can you point me in the right direction? Thanks! (Also it looks like we no longer update the .errors file - was that intended?)
Flags: needinfo?(dkeeler) → needinfo?(sfraser)
There's some related comments on the Wikimedia bug tracker task to implement the HSTS preload list for all outgoing links from Wikipedia.  See https://phabricator.wikimedia.org/T200745 (I'm Cyde over there.)
When the HSTS preload script was reworked to use async/await in bug 1436369,
`fetchstatus` would create an asynchronous xml http request and then attempt to
access a response header from it. However, there was nothing to ensure that the
request had completed before this code ran. This patch ensures that the request
has completed before the response header is used.

This patch also replaces a lingering instance of `Ci.nsISSLStatusProvider` that
should have been changed to `Ci.nsITransportSecurityInfo` in bug 1475647.

Finally, this patch removes the old, redundant getHSTSPreloadList.js in
security/manager/tools as well as the unused nsSTSPreloadList.errors file in
security/manager/ssl.
Assignee: nobody → dkeeler
Blocks: 1436369
Priority: -- → P1
Following this fix, how often will the HSTS preload list in central be updated?
(In reply to Ben McIlwain from comment #5)
> Following this fix, how often will the HSTS preload list in central be
> updated?

At least once a week. Maybe daily.
Currently twice a week.
Comment on attachment 8998048 [details]
bug 1479918 - fix HSTS update script to only access XHR headers after each request has completed r?sfraser

Simon Fraser [:sfraser] ⌚️GMT has approved the revision.
Attachment #8998048 - Flags: review+
(In reply to [:keeler] (use needinfo) from comment #2)
> Simon, now that we've moved the jobs that produce these files to... docker?
> taskcluster? I have no idea where to find the logs. Can you point me in the
> right direction? Thanks! (Also it looks like we no longer update the .errors
> file - was that intended?)

Yes, that was intended, as that sort of log should not be in-tree. 

Tasks can be found in treeherder, such as https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=pfu&fromchange=3cb90f16402bc5e1203e2771dc93553b8377fa40&selectedJob=192208505

Logs are available if you inspect the task.
Flags: needinfo?(sfraser)
When can we expect the first update to the HSTS preload list? Those ~5k URLs :darksider mentioned still appear to be missing from the version in mozilla-central.
(Oh - I see now that the patch was just approved hours ago, so it makes sense that there hasn't been an update yet.)
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8015bcc55dae
fix HSTS update script to only access XHR headers after each request has completed r=sfraser
https://hg.mozilla.org/mozilla-central/rev/8015bcc55dae
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
FYI, I forced a run on m-c tip today after this fix was merged, but I'm not seeing any changes to the HSTS list other than a date bump:
https://phabricator.services.mozilla.com/D3333
Flags: needinfo?(dkeeler)
Looks like every request to verify the presence of the HSTS header is failing, with messages like "ERROR: exception making request to kiwiirc.com" in the log.
Confirmed (perfektesgewicht.de). It would be good if the migration to Taskcluster also fixes bug 1309201.
Huh - maybe there's something in the docker environment that's preventing us from connecting to sites (other than to download the source list)? Any ideas, Simon?
Flags: needinfo?(dkeeler) → needinfo?(sfraser)
(In reply to [:keeler] (use needinfo) from comment #18)
> Huh - maybe there's something in the docker environment that's preventing us
> from connecting to sites (other than to download the source list)? Any
> ideas, Simon?

I've re-run the task mentioned above as an interactive task to try and recreate the conditions as much as possible.

kiwiirc.com appears to be ipv4 only.
# openssl s_client -connect kiwiirc.com:443 -crlf
[... openssl certs elided... ]  
GET https://kiwiirc.com HTTP/1.0

HTTP/1.1 400 Bad Request
Date: Wed, 15 Aug 2018 07:34:02 GMT
Content-Type: text/html
Content-Length: 171
Connection: close
Server: cloudflare-nginx
CF-RAY: 44a9f9f669dcbfc7-MAN


perfektesgewicht.de is responding correctly, though:
Strict-Transport-Security: max-age=315360000; includeSubDomains; preload
It was included in the output, presumably due to the check that says 'if we already added it, but couldn't test this time, add it anyway'. Better logging there would be helpful, though.

When we moved it to taskcluster the two variants were producing identical diffs, so I'd need more support that it's due to the migration or worker it's running on.
Flags: needinfo?(sfraser)
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #19)
> kiwiirc.com appears to be ipv4 only.
> # openssl s_client -connect kiwiirc.com:443 -crlf
> [... openssl certs elided... ]  
> GET https://kiwiirc.com HTTP/1.0

You're missing a Host header:

> $ openssl s_client -connect kiwiirc.com:443 -crlf                                                                                                                                                                                                                                                                  > [...snip...]
> GET / HTTP/1.1
> Host: kiwiirc.com
> 
> HTTP/1.1 200 OK
> Date: Wed, 15 Aug 2018 11:55:05 GMT
> [...snip...]
> Strict-Transport-Security: max-age=5256000
> [...snip...]

Can we re-open this, since the HSTS preload list still isn't updating?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla63 → ---
> > $ openssl s_client -connect kiwiirc.com:443 -crlf                                                                                                                                                                                                                                                                  > [...snip...]
> > GET / HTTP/1.1
> > Host: kiwiirc.com
> > 
> > HTTP/1.1 200 OK
> > Date: Wed, 15 Aug 2018 11:55:05 GMT
> > [...snip...]
> > Strict-Transport-Security: max-age=5256000
> > [...snip...]
>

In which case it's likely due to the max age being too low. It's set to 10886400 here:

https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/periodic-updates/scripts/getHSTSPreloadList.js#28
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #21)
> In which case it's likely due to the max age being too low. It's set to
> 10886400 here:
> 
> https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/periodic-
> updates/scripts/getHSTSPreloadList.js#28

I don't think we're even reaching that test, though. The "ERROR: exception making request" message is printed here:

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/taskcluster/docker/periodic-updates/scripts/getHSTSPreloadList.js#181

signifying that the HTTP request itself is failing.

Anyway, I didn't mean to single out kiwiirc.com, here. In the log :RyanVM linked, I'm seeing that same exception message for *every* domain the script checks for HSTS preloading eligibility - *all* of the HTTP requests seem to be failing.
(In reply to Michael Smith [:mismith] (non-bmo mail: mds009@eng.ucsd.edu) from comment #22)

> I don't think we're even reaching that test, though. The "ERROR: exception
> making request" message is printed here:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 4e56a2f51ad739ca52046723448f3129a58f1666/taskcluster/docker/periodic-updates/
> scripts/getHSTSPreloadList.js#181
> 
> signifying that the HTTP request itself is failing.

Curiously, that should also have a colon after the hostname at a minimum, even if the exception is empty. What's going on there?
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #23)

> Curiously, that should also have a colon after the hostname at a minimum,
> even if the exception is empty. What's going on there?

Ah, no, ignore that, the current tip doesn't have that part.
Testing with a smaller number of hosts, I get some errors from inside RedirectAndAuthStopper instead. It seems to be sites like siraweb.org that have a 'moved permanently' response.

JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 151: Error: 2152398880

Adding error handlers has indicated that the errors on the full set of hosts are all timeouts. Perhaps the callbacks can't cope with the rate of host queries?
Instead of having the one script to do everything, it might to be better to split the script into two. 

The first script runs weekly or fortnightly to download the HSTS preload list and commit it into code. 
The second script runs daily or weekly to check if the domains have valid HSTS header, etc. 

Using this method makes sure that the preload list is always current with the Chromium preload list and if the second script breaks for any reason in the future it wouldn't affect the updating of the list.
(In reply to Burton from comment #26)
> Instead of having the one script to do everything, it might to be better to
> split the script into two. 
> 
> The first script runs weekly or fortnightly to download the HSTS preload
> list and commit it into code. 
> The second script runs daily or weekly to check if the domains have valid
> HSTS header, etc. 
> 
> Using this method makes sure that the preload list is always current with
> the Chromium preload list and if the second script breaks for any reason in
> the future it wouldn't affect the updating of the list.

I like the idea of splitting up the task, but I'm unsure if it's a good idea to include the contents of the preload list without checking it first. Would we not lose the value of checking it at all? 

If we split the task up, I think it'd probably be good to divide up the task of checking the hosts, to avoid overloading a particular worker or hitting dns query rate limits.
I'm out of my javascript knowledge, at this point. Any advice?
Flags: needinfo?(dkeeler)
I think we're basically causing the platform (i.e. gecko/necko) to livelock by having 50000 simultaneous XHRs. I wrote a patch to break up the requests in batches of 100.
Flags: needinfo?(dkeeler)
Prior to this patch, getHSTSPreloadList.js would queue an XHR for every preload
list candidate site. This meant that there would be ~50,000 requests in flight
simultaneously. Simply processing these requests caused them to all time out,
and no useful work was done. This patch resolves them in batches of 100 to
avoid this issue.
(In reply to [:keeler] (use needinfo) from comment #30)
> Created attachment 9002019 [details]
> bug 1479918 - batch requests in getHSTSPreloadList.js to avoid livelock
> r?sfraser
> 
> Prior to this patch, getHSTSPreloadList.js would queue an XHR for every
> preload
> list candidate site. This meant that there would be ~50,000 requests in
> flight
> simultaneously. Simply processing these requests caused them to all time out,
> and no useful work was done. This patch resolves them in batches of 100 to
> avoid this issue.

How long does the run take, now? The time limit on the task is set to 30 minutes, if we need to adjust that.
Somewhere around 4-6 hours? I forgot to time it accurately. We could potentially increase the batch size until we start seeing too many failures. In theory each batch could take up to 30 seconds, so some quick math makes it t=417/b, where b is the batch size and t is in hours (assuming 50000 domains to probe).
Ah, shame, I was hoping to not go back to the thing taking such a long time, but it can't be helped. I may adjust the task start times so that they finish around the time they have been, so far. It'll be less surprising for the people landing the changes.
We'll also *really* need to get to the bottom of the random timeout issues too then.
I'm running the patched version locally, I started 6 hours ago, it finished the HSTS probing about an hour ago; so it's still got the error in it which causes the event loop to keep spinning once all the tasks have returned.
Too late for 62 with today's merge to release.
Comment on attachment 9002019 [details]
bug 1479918 - batch requests in getHSTSPreloadList.js to avoid livelock r?sfraser

Simon Fraser [:sfraser] ⌚️GMT has approved the revision.
Attachment #9002019 - Flags: review+
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e0f0d5824dd8
batch requests in getHSTSPreloadList.js to avoid livelock r=sfraser
Backed out for failing eslint at builds/worker/checkouts/gecko/taskcluster/docker/periodic-updates/scripts/getHSTSPreloadList.js:260:43

Push that started the failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=e0f0d5824dd8aaaaf1395e569cec1806b028b12e

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=196434638&repo=autoland&lineNumber=273

Backout: https://hg.mozilla.org/integration/autoland/rev/2d1a1f2205c5875bac3a97031b205abf4af8b610
Flags: needinfo?(dkeeler)
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9c7178279949
Batch requests in getHSTSPreloadList.js to avoid livelock. r=sfraser
Trivial issue, I went ahead and fixed and re-pushed.
Flags: needinfo?(dkeeler)
https://hg.mozilla.org/mozilla-central/rev/9c7178279949
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Dana, it looks like the ESR60 run mostly just removed a bunch of entries and I see differences between the changeset that landed on autoland in comment 43 vs. the one for esr60 in comment 44. Any ideas why that might be?
Flags: needinfo?(dkeeler)
Bug 1475647 removed nsISSLStatusProvider in 63, hence changing the QueryInterface to nsITransportSecurityInfo. we need to keep using nsISSLStatusProvider in the uplifted version, though.
Flags: needinfo?(dkeeler)
Attachment #9005261 - Flags: review+
You need to log in before you can comment on or make changes to this bug.