Closed Bug 836097 Opened 11 years ago Closed 11 years ago

automate updating the in-tree HSTS preload list

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(firefox22+ fixed, firefox23+ fixed)

RESOLVED FIXED

People

(Reporter: keeler, Assigned: coop)

References

Details

Attachments

(5 files)

Spoke with coop about this over email. For anyone coming across this without context, here's what I'd like to happen:

1. Once a week or so, the script at security/manager/tools/getHSTSPreloadList.js gets run automatically
2. Maybe a human looks at the output (nsSTSPreloadList.inc and nsSTSPreloadList.errors) until we're confident it's doing the right thing[0]
3. The output gets checked in as security/manager/boot/src/nsSTSPreloadList.{inc,errors} to mozilla-central

Thanks!

[0] So, for instance, if there's a transient network failure, we shouldn't drop sites off the list.
Just to be more specific, here's what I've been doing to run the script:

1. [build firefox]
2. cd [source root]/security/manager/boot/src
3. LD_LIBRARY_PATH=[path to built libraries] [path to built binaries]/xpcshell ../../tools/getHSTSPreloadList.js

This creates nsSTSPreloadList.inc and nsSTSPreloadList.errors in the current working directory (security/manager/boot/src), which is where we want them.
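As a sketch, the manual steps above could be wrapped in a small script. Note the `dist/bin` layout and the `objdir` default here are assumptions about a typical build, not something stated in this bug; check your own objdir before relying on these paths.

```shell
#!/bin/sh
# Sketch of the manual HSTS-update steps above. SRCDIR and OBJDIR are
# placeholders; dist/bin is an assumed objdir layout.
build_hsts_cmd() {
    objdir="$1"
    # Print the xpcshell invocation from step 3, with the built libraries
    # on LD_LIBRARY_PATH.
    printf 'LD_LIBRARY_PATH=%s/dist/bin %s/dist/bin/xpcshell ../../tools/getHSTSPreloadList.js\n' \
        "$objdir" "$objdir"
}

SRCDIR="${SRCDIR:-$HOME/mozilla-central}"
OBJDIR="${OBJDIR:-$SRCDIR/objdir}"

# Run this from security/manager/boot/src (step 2), since the output
# files land in the current working directory:
build_hsts_cmd "$OBJDIR"
```

The script only prints the command to run; cd into security/manager/boot/src first so nsSTSPreloadList.{inc,errors} end up where they get checked in.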
We have stuff that updates the in-tree blocklist already. Can probably just hook into that automation.
In lieu of an automatic update (if that ends up being tricky to do), adding this as a manual pre-uplift (or post-uplift?) step might be a fine temporary solution.
As Reed says, I'm using the existing blocklist update script as a template.

I would prefer *not* to need to build xpcshell as a prerequisite for the HSTS update. This introduces a whole other suite of build failures, etc. that could prevent the preload list from being updated when it needs to be, or, on the flip side, require an hour-long build only to find out there are no updates to the preload list anyway.

My alternate method right now (which seems to work) is downloading both the firefox browser and tests package, unpacking both appropriately, and then running the bundled xpcshell against a copy of getHSTSPreloadList.js downloaded directly from hgweb. This allows me to generate new versions of nsSTSPreloadList.errors and nsSTSPreloadList.inc that can be compared against existing in-tree versions, also downloaded directly from hgweb. All this happens in <5min without any build or even needing to clone m-c. At that point, we can clone m-c only if there *are* changes that need to land.

Is my approach valid, or am I missing something from the original process you outlined?
Status: NEW → ASSIGNED
(In reply to Chris Cooper [:coop] from comment #4)
> I would prefer *not* to need to build xpcshell as a prerequisite for the
> HSTS update.

Yeah - that would be an unfortunate requirement if it weren't already available. I originally wrote the script in python, but we decided we needed the HTTPS verification to be as close as possible to the currently shipping firefox (and as I understand it, python's default is to do no verification).

> My alternate method right now (which seems to work) is downloading both the
> firefox browser and tests package, unpacking both appropriately, and then
> running the bundled xpcshell against a copy of getHSTSPreloadList.js
> downloaded directly from hgweb. This allows me to generate new versions of
> nsSTSPreloadList.errors and nsSTSPreloadList.inc that can be compared
> against existing in-tree versions, also downloaded directly from hgweb. All
> this happens in <5min without any build or even needing to clone m-c. At
> that point, we can clone m-c only if there *are* changes that need to land.
> 
> Is my approach valid, or am I missing something from the original process
> you outlined?

Well, as long as this process taking on the order of 5 minutes is okay, this should work great. I guess there's the question of which firefox you're downloading - release? nightly? Nightly would probably be best (except if we ever break something in it...) On that note, it would be great if this process would stop and alert someone if the list changed "too much" from one iteration to the next.

Thanks!
(In reply to David Keeler (:keeler) from comment #5)
> Yeah - that would be an unfortunate requirement if it weren't already
> available. I originally wrote the script in python, but we decided we needed
> the HTTPS verification to be as close as possible to the currently shipping
> firefox (and as I understand it, python's default is to do no verification).

That's fair. My understanding is that https resolution in python is a bit of a joke, and I'm not sure we have the libs on the buildfarm to use it anyway.

> Well, as long as this process taking on the order of 5 minutes is okay, this
> should work great. I guess there's the question of which firefox you're
> downloading - release? nightly? Nightly would probably be best (except if we
> ever break something in it...) On that note, it would be great if this
> process would stop and alert someone if the list changed "too much" from one
> iteration to the next.

5 minutes is definitely better than (possibly) an hour just for the build. If there are changes to commit, the clone can take a while, but there are already some speedups baked into the blocklist script that I'm also using here.

Since we're only going to be updating m-c, I'm pulling my build+tests packages from latest-mozilla-central on stage. If we start also updating other branches, I would use their corresponding packages.

I have the script working. I'll post a commit-ready diff that it generated shortly.
(In reply to David Keeler (:keeler) from comment #0)
> 2. Maybe a human looks at the output (nsSTSPreloadList.inc and
> nsSTSPreloadList.errors) until we're confident it's doing the right thing[0]
> 3. The output gets checked in as
> security/manager/boot/src/nsSTSPreloadList.{inc,errors} to mozilla-central

I think this is reasonable. However, I think we should make one modification: It is OK to remove entries automatically if we got a valid HTTP response from the server that does not include the HSTS header. However, if we didn't get a valid HTTP response from the server then we shouldn't remove the entry automatically; instead we should leave that origin's entry unchanged. This way, transient network failures will not result in entries getting dropped from the list.

In particular, I am worried about the idea that some site will be dropped from the preload list due to a networking error and nobody will notice before the release to fix it.
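The retention rule proposed above can be sketched as a small decision function. This is illustrative shell only; the real logic would live in getHSTSPreloadList.js, and the "ok"/"error" and "yes"/"no" inputs are hypothetical encodings of the probe result.

```shell
#!/bin/sh
# Decide what to do with an existing preload-list entry after probing
# the site. Only a *valid* HTTP response that lacks the HSTS header
# justifies removal; any connection or network failure leaves the
# entry unchanged, so transient failures can't drop sites.
#   $1 - "ok" if we got a valid HTTP response, "error" otherwise
#   $2 - "yes" if the response carried an HSTS header, "no" otherwise
preload_entry_action() {
    response="$1"
    has_hsts_header="$2"
    if [ "$response" = "ok" ] && [ "$has_hsts_header" = "no" ]; then
        echo "remove"   # confirmed: site no longer sends HSTS
    else
        echo "keep"     # header still present, or transient failure
    fi
}
```

For example, `preload_entry_action error no` prints `keep`: a site we couldn't reach stays on the list.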
(In reply to Brian Smith (:bsmith) from comment #7) 
> I think this is reasonable. However, I think we should make one
> modification: It is OK to remove entries automatically if we got a valid
> HTTP response from the server that does not include the HSTS header.
> However, if we didn't get a valid HTTP response from the server then we
> shouldn't remove the entry automatically; instead we should leave that
> origin's entry unchanged. This way, transient network failures will not
> result in entries getting dropped from the list.
> 
> In particular, I am worried about the idea that some site will be dropped
> from the preload list due to a networking error and nobody will notice
> before the release to fix it.

Those sound like changes that should be baked into getHSTSPreloadList.js.

I've attached the diff between the current preload list and the one generated by my script. If the diff looks good, I can plug the script into automation pretty quickly.

I would default to running the script once a week on the same schedule as the blocklist update (late night on weekend). Is that OK?
Attachment #719955 - Flags: review?(dkeeler)
Comment on attachment 719955 [details]
Updated HSTS preload lists generated by getHSTSPreloadList.js

Removals (verified by hand):
alpha.irccloud.com - 301 redirect w/o STS header
crypto.is - no STS header
factor.cc - can't connect
surfeasy.com - no STS header
www.surfeasy.com - no STS header

Changes (verified by hand):
crypto.cat - no longer indicates includeSubdomains
cloudsecurityalliance.org - now indicates includeSubdomains

I'm confident the script got the additions right.
Looks good to me!
Attachment #719955 - Flags: review?(dkeeler) → review+
(In reply to David Keeler (:keeler) from comment #0)
> 1. Once a week or so, the script at
> security/manager/tools/getHSTSPreloadList.js gets run automatically
> 2. Maybe a human looks at the output (nsSTSPreloadList.inc and
> nsSTSPreloadList.errors) until we're confident it's doing the right thing[0]
> 3. The output gets checked in as
> security/manager/boot/src/nsSTSPreloadList.{inc,errors} to mozilla-central

We should also check the output into mozilla-aurora at least (and possibly also mozilla-beta).  Otherwise the list will typically be 12-13 weeks old by the time it ships to the release channel.  It would expire 5-6 weeks after that, which means it would expire before we unthrottle updates to the next Firefox release.  (This already happened to Firefox 19, which shipped with a preload list that expired before Firefox 20 was released.)
Here's an example run of all the pieces in staging:

http://dev-master01.build.scl1.mozilla.com:8044/builders/Linux%20x86-64%20mozilla-central%20hsts%20preload%20update

I only had the final push to hg commented out.

Note: this script is *very* similar to the blocklist update script. It's probably worth filing a follow-up bug to factor out the similar bits, both in the tools repo and for the builder creation in buildbotcustom. Possibly coupled with a re-write in mozharness?
Attachment #720315 - Flags: review?(armenzg)
Comment on attachment 720311 [details] [diff] [review]
Script for updating the HSTS preload list

Review of attachment 720311 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/hsts/update_hsts_preload_list.sh
@@ +67,5 @@
> +    USAGE
> +    exit 1
> +fi
> +
> +HGREPO="http://${HGHOST}/${BRANCH}"

http://

@@ +80,5 @@
> +{
> +    echo "INFO: Retrieving current version from hg..."
> +    VERSION_URL_HG="${HGREPO}/raw-file/default/${APP_DIR}/config/version.txt"
> +    rm -f version.txt
> +    ${WGET} --no-check-certificate -O version.txt ${VERSION_URL_HG}

with --no-check-certificate ?

Seems like you could just download the original over ssh:// to begin with...
(In reply to Reed Loden [:reed] from comment #14) 
> with --no-check-certificate ?

Does http://hg.m.o not redirect to https:// automatically? If that's true, then yes, the --no-check-certificate is redundant.
 
> Seems like you could just download the original over ssh:// to begin with...

Not sure what you're suggesting here...if you're suggesting cloning via ssh:// at the outset, we're trying to avoid cloning the entire repo just to get 3 small files. Am I missing something here?
(In reply to Chris Cooper [:coop] from comment #15)
> (In reply to Reed Loden [:reed] from comment #14) 
> > with --no-check-certificate ?
> 
> Does http://hg.m.o not redirect to https:// automatically? If that's true,
> then yes, the --no-check-certificate is redundant.

It would be strange for this feature to be 100% about checking certificates very strictly and yet the process for building the feature doesn't require checking certificates at all.

We should never rely on http://hg.m.o redirecting to https:// because that redirect itself would be insecure.

If you'd like, I can help you configure Mercurial to validate the certificates correctly, and with the minimal risk of us trusting the wrong certs. We should be doing this everywhere we have an automatic checkout of the source code as part of the build.
(In reply to Chris Cooper [:coop] from comment #8)
> Created attachment 719955 [details]
> Updated HSTS preload lists generated by getHSTSPreloadList.js
> 
> (In reply to Brian Smith (:bsmith) from comment #7) 
> > I think this is reasonable. However, I think we should make one
> > modification: It is OK to remove entries automatically if we got a valid
> > HTTP response from the server that does not include the HSTS header.
> > However, if we didn't get a valid HTTP response from the server then we
> > shouldn't remove the entry automatically; instead we should leave that
> > origin's entry unchanged. This way, transient network failures will not
> > result in entries getting dropped from the list.
> > 
> > In particular, I am worried about the idea that some site will be dropped
> > from the preload list due to a networking error and nobody will notice
> > before the release to fix it.
> 
> Those sound like changes that should be baked into getHSTSPreloadList.js.

I agree. My point is that that needs to be done before we deploy this feature. The benefit of the requirement of manual intervention is that we never have to worry about accidentally dropping a site off the list because of a transient failure. The fact that we let the list expire prior to releases is good evidence that we should expect some failures of oversight here.

> I would default to running the script once a week on the same schedule as
> the blocklist update (late night on weekend). Is that OK?

I think that is OK. However, I think it should be run against mozilla-central AND mozilla-aurora.

(In reply to Matt Brubeck (:mbrubeck) from comment #10)
> We should also check the output into mozilla-aurora at least (and possibly
> also mozilla-beta).  Otherwise the list will typically be 12-13 weeks old by
> the time it ships to the release channel.  It would expire 5-6 weeks after
> that, which means it would expire before we unthrottle updates to the next
> Firefox release.  (This already happened to Firefox 19, which shipped with a
> preload list that expired before Firefox 20 was released.)

In another bug, you also mentioned that the 18 week cutoff causes these kinds of issues. When we chose the 18 week cutoff, we did so, IIRC, with the understanding that we would update the list sometime in the aurora phase. We could definitely make the cutoff longer. However, I think 18 weeks from the aurora->beta branch point is a good compromise between freshness, minimizing DoS potential due to lack of updates, and leeway for abnormalities in our release process (e.g. if we delay a release for a week or two due to some external events like holidays).
(In reply to Brian Smith (:bsmith) from comment #16) 
> It would be strange for this feature to be 100% about checking certificates
> very strictly and yet the process for building the feature doesn't require
> checking certificates at all.
> 
> We should never rely on http://hg.m.o redirecting to https:// because that
> redirect itself would be insecure.
> 
> If you'd like, I can help you configure Mercurial to validate the
> certificates correctly, and with the minimal risk of us trusting the wrong
> certs. We should be doing this everywhere we have an automatic checkout of
> the source code as part of the build.

Agreed that a security-related process is only as secure as its least-secure link.

In production, this script will be pulling from one of the internal hg mirrors, but I'm not sure whether those mirrors are set up to serve over https. I'll need to check that.

In the fallback case, I can make sure we clone via ssh:// rather than http://.

(In reply to Brian Smith (:bsmith) from comment #17) 
> I agree. My point is that that needs to be done before we deploy this
> feature. The benefit of the requirement of manual intervention is that we
> never have to worry about accidentally dropping a site off the list because
> of a transient failure. The fact that we let the list expire prior to
> releases is good evidence that we should expect some failures of oversight
> here.

If we don't trust the script yet, then we shouldn't be automating it.

What's the desired interim outcome here: to have the script spit out the diff of what it _would have_ committed until we're confident in the output? I can set the dry-run flag on the script to start, if so desired.

> I think that is OK. However, I think it should be run against
> mozilla-central AND mozilla-aurora.

Sure, that's easy to do.
 
> In another bug, you also mentioned that the 18 week cutoff causes these
> kinds of issues. When we chose the 18 week cutoff, we did so, IIRC, with the
> understanding (IIRC) that we would update the list sometime in the aurora
> phase. We could definitely make the cutoff longer. However, I think 18 weeks
> from the aurora->beta branch point is a good compromise between freshness,
> minimizing DoS potential due to lack of updates, and leeway for
> abnormalities in our release process (e.g. if we delay a release for a week
> or two due to some external events like holidays).

We can always run the script by hand against mozilla-beta (or mozilla-release) if we need to make an exception.
Comment on attachment 720311 [details] [diff] [review]
Script for updating the HSTS preload list

Review of attachment 720311 [details] [diff] [review]:
-----------------------------------------------------------------

r=me - As long as the other concerns are addressed.

::: scripts/hsts/update_hsts_preload_list.sh
@@ +88,5 @@
> +        return ${WGET_STATUS}
> +    fi
> +    VERSION=`cat version.txt | sed 's/[^.0-9]*$//'`
> +    if [ "${VERSION}" == "" ]; then
> +        echo "ERROR: Unable to parse version from version.txt"	

nit: extra whitespace at the end.

@@ +135,5 @@
> +    # The created files should be non-empty.
> +    echo "INFO: Checking whether new HSTS preload list is valid..."
> +    if [ ! -s "${PRELOAD_ERRORS}" ]; then
> +        echo "New HSTS preload error list is empty. I guess that's good?"
> +    fi 

Nit: white space at end of line.

@@ +236,5 @@
> +result=$?
> +if [ ${result} != 0 ]; then
> +    if [ "${DRY_RUN}" == "true" ]; then
> +        echo "INFO: HSTS preload lists differ, but not updating hg in dry-run mode."
> +    else   

Nit: white space.
Attachment #720311 - Flags: review?(armenzg) → review+
Attachment #720313 - Flags: review?(armenzg) → review+
Attachment #720315 - Flags: review?(armenzg) → review+
Attachment #720311 - Flags: checked-in+
Comment on attachment 720313 [details] [diff] [review]
Enable HSTS preload list updating on m-c (buildbot-configs)

https://hg.mozilla.org/build/buildbot-configs/rev/1b9d77a2ff11
Attachment #720313 - Flags: checked-in+
Comment on attachment 720315 [details] [diff] [review]
Add HSTS update builder when enabled (buildbotcustom)

https://hg.mozilla.org/build/buildbotcustom/rev/0c5934341f32
Attachment #720315 - Flags: checked-in+
The releng-side patches are landed now and will go live with our next reconfig. 

The builder will run against mozilla-central and mozilla-aurora, but the |hg push| step of the script is currently commented out until bug 847621 is fixed. Once we're happy with the output from the getHSTSPreloadList.js script, we can uncomment the push step and have things just start working, i.e. no reconfig required.
Merged and reconfiguration completed.
From: root@cruncher.srv.releng.scl3.mozilla.com (Cron Daemon)
To: release@mozilla.com
Subject: Cron <buildduty@cruncher> /home/buildduty/buildfaster/bin/buildfaster_report.sh
Date: Mon, 18 Mar 2013 06:17:25 -0700 (PDT)

Linux x86-64 mozilla-central hsts preload update
Traceback (most recent call last):
  File "/home/buildduty/buildfaster/buildfaster_report.py", line 322, in <module>
    assert False, props['buildername']
AssertionError: Linux x86-64 mozilla-central hsts preload update
(In reply to Nick Thomas [:nthomas] from comment #25)

I'm going to guess this happened because the source file in the Chrome tree changed location. What happened as a result of this failure? Did it cause other failures, or is this just a local warning saying, "we couldn't complete this step"?
Flags: needinfo?(nthomas)
(In reply to Nick Thomas [:nthomas] from comment #25)
> Linux x86-64 mozilla-central hsts preload update
> Traceback (most recent call last):
>   File "/home/buildduty/buildfaster/buildfaster_report.py", line 322, in
> <module>
>     assert False, props['buildername']
> AssertionError: Linux x86-64 mozilla-central hsts preload update

This is on our side. I'll fix the script.
(In reply to Chris Cooper [:coop] from comment #27)
> This is on our side. I'll fix the script.

https://hg.mozilla.org/build/braindump/rev/9f28d70da2cb
Flags: needinfo?(nthomas)
I recently added the absolute path to the current nsSTSPreloadList.inc: 

https://hg.mozilla.org/build/tools/rev/651880ce5cb9

I've uploaded a log of the current script output here:

https://people.mozilla.com/~coop/hsts_script_output.txt

It's currently failing on this:

INFO: Running "LD_LIBRARY_PATH=. ./xpcshell ../getHSTSPreloadList.js /builds/slave/m-cen-l64-hsts-000000000000000/firefox/nsSTSPreloadList.inc"
../getHSTSPreloadList.js:297: NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFileInputStream.init]

This script used to work, so this looks like a new dependency has been introduced.
From what I can see, update_hsts_preload_list.sh is doing this:

132     rm -rf ${PRELOAD_ERRORS} ${PRELOAD_INC}
133     echo INFO: Running \"LD_LIBRARY_PATH=. ./xpcshell ../${PRELOAD_SCRIPT} ${PWD}/${PRELOAD_INC}\"
134     LD_LIBRARY_PATH=. ./xpcshell ../${PRELOAD_SCRIPT} ${PWD}/${PRELOAD_INC}

which removes the current preload list before the script can read it. Line 132 shouldn't be necessary, given that opening the file for writing truncates it by default (although that should probably be made explicit in the script in the future).
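A minimal fix along those lines (a sketch, not the actual patch that landed) would stop deleting the .inc file, since the script now reads it as input, and only remove the pure-output errors file:

```shell
# Sketch of the suggested fix to update_hsts_preload_list.sh:
# keep ${PRELOAD_INC} on disk because getHSTSPreloadList.js now reads
# it as input; only the errors file is safe to delete up front.
rm -f "${PRELOAD_ERRORS}"
echo "INFO: Running \"LD_LIBRARY_PATH=. ./xpcshell ../${PRELOAD_SCRIPT} ${PWD}/${PRELOAD_INC}\""
LD_LIBRARY_PATH=. ./xpcshell "../${PRELOAD_SCRIPT}" "${PWD}/${PRELOAD_INC}"
```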
Nominating for tracking-firefox22 and 23.  The current list expires just four weeks after the Firefox 22 release date.  HSTS preloads will be disabled for release channel users at that time if we don't get an updated list into Firefox 22 (currently mozilla-aurora).

If the automated script is enabled for aurora before the next merge (May 13), then we're set; otherwise we need to push a manual update for Firefox 22.
Here's the output from the now-working script. The actual push step is still commented out.

If you're happy with the output, I can enable this on m-c and aurora.
Awesome! Looks like it's doing exactly what we want it to, so go ahead and enable it. Also, if I could make one more request: it looks like it's not diffing nsSTSPreloadList.errors - if it could do that too, that would be great. Thanks!
Attachment #742435 - Flags: review?(armenzg) → review+
Comment on attachment 742435 [details] [diff] [review]
Diff the HSTS error log and re-enable the final hg push

Review of attachment 742435 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/tools/rev/5aaa1bc1053e
Attachment #742435 - Flags: checked-in+
Tracking - please update the status flags for 22 & 23 to 'fixed' when this is enabled on m-c and aurora.
(In reply to Matt Brubeck (:mbrubeck) from comment #40)
> Those are not nsSTSPreloadList.inc updates.

Oops, my bad. I was searching by the user that runs the scripts only.

Looks like the aurora update is now failing to find a chromium resource that the script needs:

INFO: Generating new HSTS preload list...
INFO: Running "LD_LIBRARY_PATH=. ./xpcshell /builds/slave/m-aurora-l64-hsts-000000000000/getHSTSPreloadList.js /builds/slave/m-aurora-l64-hsts-000000000000/nsSTSPreloadList.inc"
uncaught exception: ERROR: problem downloading 'https://src.chromium.org/viewvc/chrome/trunk/src/net/base/transport_security_state_static.json': status 404

Is this a known issue? Transitory?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
d'oh - I need to uplift the patch in bug 847621 to aurora.
Ok - patch uplifted. This should work on aurora now.
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=487982220b4e
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Chris Cooper [:coop] from comment #8)
> I would default to running the script once a week on the same schedule as
> the blocklist update (late night on weekend). Is that OK?

Only if you don't actually care whether or not either this or the blocklist update happens (and, secondarily, don't care how much the two updates randomly coalesce builds and tests in a random amount and a random direction).

At 03:02 Saturday, we start both a blocklist update, which if it needs to update will clone the tree, patch it, and try to commit with no fallback if it hits a push race, and also an HSTS update, which if it needs to update will clone the tree, patch it, and try to commit with no fallback if it hits a push race.

This week, the two pushes were separated by 6:09 on mozilla-central, and by 2:20 on mozilla-aurora, so they both succeeded, but that's close enough that there's really no reason at all to believe that both, or either, will succeed on either tree on any given week.
(In reply to Phil Ringnalda (:philor) from comment #45) 
> Only if you don't actually care whether or not either this or the blocklist
> update happens (and, secondarily, don't care how much the two updates
> randomly coalesce builds and tests in a random amount and a random
> direction).
> 
> At 03:02 Saturday, we start both a blocklist update, which if it needs to
> update will clone the tree, patch it, and try to commit with no fallback if
> it hits a push race, and also an HSTS update, which if it needs to update
> will clone the tree, patch it, and try to commit with no fallback if it hits
> a push race.
> 
> This week, the two pushes were separated by 6:09 on mozilla-central, and by
> 2:20 on mozilla-aurora, so they both succeeded, but that's close enough that
> there's really no reason at all to believe that both, or either, will
> succeed on either tree on any given week.

They're both running off the same weekly scheduler which is going to fire jobs off at exactly the same time.

What's the solution here?

* add an optional offset to the weekly scheduler somehow
* chain the two processes together somehow, but not make them dependent on each other, i.e. have one weekly updates job that runs both (and in the future, possibly more) scripts <- my preferred option
* add a secondary pull+update to each process after the initial clone. The clone is going to be the long pole of this process, so a secondary pull+update would grab any interim changes from the other builder. Still a potential race here though.
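The second option above (one weekly job that runs each update script in turn, without letting one failure block the others) could be sketched like this; the function name and the idea of counting failures are illustrative, not from any existing releng script:

```shell
#!/bin/sh
# Sketch of a single weekly-updates driver: run each update script in
# sequence so their hg pushes can't race, but record failures instead of
# aborting, so one broken script doesn't block the rest.
run_weekly_updates() {
    failed=0
    for script in "$@"; do
        if ! sh "$script"; then
            echo "WARNING: $script failed; continuing with the rest" >&2
            failed=$((failed + 1))
        fi
    done
    # Report how many scripts failed (0 means all succeeded).
    echo "$failed"
}
```

A driver like this would be invoked with both scripts, e.g. `run_weekly_updates update_blocklist.sh update_hsts_preload_list.sh` (hypothetical names), serializing the clones and pushes that currently race.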
Putting all our eggs in a single weekend_update.py script makes the most sense to me. I did think of making one of them busy-wait for 10 minutes to give the other some room, but that thought went away after I punched myself in the head for a while.
(In reply to Phil Ringnalda (:philor) from comment #47)
> Putting all our eggs in a single weekend_update.py script makes the most
> sense to me. I did think of making one of them busy-wait for 10 minutes to
> give the other some room, but that thought went away after I punched myself
> in the head for a while.

Filed bug 869051.
Depends on: 883582
Depends on: 903762
Product: mozilla.org → Release Engineering
Component: General Automation → General