Migrate HSTS/HPKP/blocklist updates away from buildbot

RESOLVED FIXED

Status

enhancement
RESOLVED FIXED
Last year
11 months ago

People

(Reporter: sfraser, Assigned: sfraser)

Tracking

(Blocks 1 bug)

unspecified
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 fixed, firefox60 fixed, firefox61 fixed, firefox62 fixed)

Details

Attachments

(7 attachments, 4 obsolete attachments)

45 bytes, text/x-phabricator-request
keeler
: review+
Details | Review
45 bytes, text/x-phabricator-request
Callek
: review+
Details | Review
59 bytes, text/x-review-board-request
catlee
: review+
Details
45 bytes, text/x-phabricator-request
keeler
: review+
Details | Review
45 bytes, text/x-phabricator-request
Callek
: review+
Details | Review
45 bytes, text/x-phabricator-request
Callek
: review+
Details | Review
46 bytes, text/x-phabricator-request
jlorenzo
: review+
Details | Review
As part of the taskcluster migration we need to move this task away from buildbot, and run it in taskcluster instead.

A good place would be running inside a container.

Do we want to run the task as part of mozilla-central's CI or nightly? If running in nightly it wouldn't affect that build, but instead the next one, which maybe counter-intuitive, but it does lower the overhead of getting the task in place and finding where it runs when people need to debug it.

What shall we do with the output of this script? The current version commits and pushes to mercurial directly, but this is not considered good practice. Once phabricator's autoland feature arries we could submit the data there.

Need some alerting if it hasn't run for a few days, or the files haven't been updated for a few days.
Assignee: nobody → sfraser
Blocks: 1303847
Given a docker image that can produce these diffs, do we:

1. Run this as part of nightly, and allow some potential confusion that it will affect the next build, not the current one
2. Have a cronjob somewhere that connects to a taskcluster hook


With the output, do we
1. Wait for Phabricator's 'lando'
2. Submit to phabricator and have manual approval & landing
3. Just publish diffs and have it all be manual
4. Port over the current hg permissions
Flags: needinfo?(bugspam.Callek)
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #2)
> Given a docker image that can produce these diffs, do we:
> 
> 1. Run this as part of nightly, and allow some potential confusion that it
> will affect the next build, not the current one

I'd say no to "as part of nightly" but I do say we should replace https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/repo-update/kind.yml#13 with this. It is run on a cron, currently.

> 2. Have a cronjob somewhere that connects to a taskcluster hook

If we were to do a different cron, I'd say we use taskclusters .cron.yml instead of any generic cron connected to a hook. But I think the answer to #1 there is all we need.

> With the output, do we
> 1. Wait for Phabricator's 'lando'
> 2. Submit to phabricator and have manual approval & landing

Both 1 and 2 require publishing to phabricator aiui.

> 3. Just publish diffs and have it all be manual

Publishing diffs and manual is a good medium-term solution, I think.  (maybe we do this in parallel to the buildbot-based hsts for short-term)

> 4. Port over the current hg permissions

I think this is a no-go unless its part of a scriptworker, ala treescript.

I have a slight preference to doing it as part of phabricator, but I'm not sure what sort of secrets are required there or how we'd go about it.

:catlee any thoughts as to the overall plan here?
Flags: needinfo?(bugspam.Callek) → needinfo?(catlee)
(In reply to Justin Wood (:Callek) from comment #3)
> (In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #2)
> > Given a docker image that can produce these diffs, do we:
> > 
> > 1. Run this as part of nightly, and allow some potential confusion that it
> > will affect the next build, not the current one
> 
> I'd say no to "as part of nightly" but I do say we should replace
> https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/repo-update/
> kind.yml#13 with this. It is run on a cron, currently.
> 
> > 2. Have a cronjob somewhere that connects to a taskcluster hook
> 
> If we were to do a different cron, I'd say we use taskclusters .cron.yml
> instead of any generic cron connected to a hook. But I think the answer to
> #1 there is all we need.

The file linked in #1 doesn't seem to run anything, so I'm a bit confused there - it mentions being run out of cron


> > 3. Just publish diffs and have it all be manual
> 
> Publishing diffs and manual is a good medium-term solution, I think.  (maybe
> we do this in parallel to the buildbot-based hsts for short-term)

This is the option I like the least, as it's the largest regression and the most manual work.
In any case, we need to work out who's responsible for these manual tasks. It'll be either:

1. Review on phabricator
2. Land patch

or 
1. Download and apply diffs
2. Review changes
3. Land patch

Once a day at most. Who will be on the hook for this?

> I have a slight preference to doing it as part of phabricator, but I'm not
> sure what sort of secrets are required there or how we'd go about it.

You install a ${HOME}/.arcrc with a token, and then the command line 'Just Works', so I'm happy to automate that as long as we can get an api key that's not tied to a specific person's username
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #4)
> (In reply to Justin Wood (:Callek) from comment #3)
> > (In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #2)
> > > Given a docker image that can produce these diffs, do we:
> > > 
> > > 1. Run this as part of nightly, and allow some potential confusion that it
> > > will affect the next build, not the current one
> > 
> > I'd say no to "as part of nightly" but I do say we should replace
> > https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/repo-update/
> > kind.yml#13 with this. It is run on a cron, currently.
> > 
> > > 2. Have a cronjob somewhere that connects to a taskcluster hook
> > 
> > If we were to do a different cron, I'd say we use taskclusters .cron.yml
> > instead of any generic cron connected to a hook. But I think the answer to
> > #1 there is all we need.
> 
> The file linked in #1 doesn't seem to run anything, so I'm a bit confused
> there - it mentions being run out of cron

https://dxr.mozilla.org/mozilla-central/source/.cron.yml#113 --> https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#622


> 
> > > 3. Just publish diffs and have it all be manual
> > 
> > Publishing diffs and manual is a good medium-term solution, I think.  (maybe
> > we do this in parallel to the buildbot-based hsts for short-term)
> 
> This is the option I like the least, as it's the largest regression and the
> most manual work.
> In any case, we need to work out who's responsible for these manual tasks.
> It'll be either:
> 
> 1. Review on phabricator
> 2. Land patch
> 
> or 
> 1. Download and apply diffs
> 2. Review changes
> 3. Land patch
> 
> Once a day at most. Who will be on the hook for this?

I don't disagree, but we could have ourselves be on the hook with `r=release a=release` until we unblock. We could ask sheriffs, etc.

> > I have a slight preference to doing it as part of phabricator, but I'm not
> > sure what sort of secrets are required there or how we'd go about it.
> 
> You install a ${HOME}/.arcrc with a token, and then the command line 'Just
> Works', so I'm happy to automate that as long as we can get an api key
> that's not tied to a specific person's username

Sure, but how do we want to go about managing a token in TC, we can't install directly in docker easily, we don't want arbitrary in-tree code to steal our token, (especially on try), what/who does said token need to be setup for. And how/who do we flag for review, and/or if lando is ready how does lando know this is safe to autoland for us?
I think pushing to phabricator with an API key stored in TC secrets is reasonable at this point.

We should continue to run the old job in parallel until we have a good way to get the patches landed from phabricator.
Flags: needinfo?(catlee)
Comment on attachment 8952674 [details]
Bug 1436369 Add docker entrypoint for periodic file udpates r=callek

Justin Wood (:Callek) has approved the revision.

https://phabricator.services.mozilla.com/D627
Attachment #8952674 - Flags: review+
Comment hidden (mozreview-request)
Attachment #8952702 - Attachment is obsolete: true
Attachment #8952702 - Flags: review?(catlee)
Attachment #8952703 - Attachment is obsolete: true
Attachment #8952703 - Flags: review?(dkeeler)
Comment hidden (mozreview-request)

Comment 15

Last year
mozreview-review
Comment on attachment 8952704 [details]
Bug 1436369 Add docker entrypoint for periodic file udpates

https://reviewboard.mozilla.org/r/221938/#review227848
Attachment #8952704 - Flags: review?(catlee) → review+
Comment on attachment 8952672 [details]
Bug 1436369 Migrate HSTS and HPKP update scripts to docker image r=keeler

David Keeler [:keeler] (use needinfo) has approved the revision.

https://phabricator.services.mozilla.com/D626
Attachment #8952672 - Flags: review+

Comment 17

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/abbaec76d981
Docker image for periodic file updates r=catlee
https://hg.mozilla.org/integration/mozilla-inbound/rev/f594c12c6441
Migrate HSTS and HPKP update scripts to docker image r=keeler
https://hg.mozilla.org/integration/mozilla-inbound/rev/b35f149bf4ab
Add docker entrypoint for periodic file udpates r=catlee
Blocks: 1171193
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8956405 [details]
Bug 1436369 Move buildbot periodic file updates to clearer name r=callek

Justin Wood (:Callek) has approved the revision.

https://phabricator.services.mozilla.com/D681
Attachment #8956405 - Flags: review+

Comment 22

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b3b9609544ec
Move buildbot periodic file updates to clearer name r=callek
https://hg.mozilla.org/integration/mozilla-inbound/rev/370c9dbdd572
Add taskcluster periodic file updates r=callek
Backed out for Gecko decision task failure. Bug 1436369 and Bug 1436469

backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/3363cb492aa149a43e196e127080b8b5bcf60684

push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=370c9dbdd5729e944b277706fa73785a8fd6fef0

failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=166231289&repo=mozilla-inbound

[task 2018-03-06T15:14:13.976791Z] Traceback (most recent call last):
[task 2018-03-06T15:14:13.976963Z]   File "/builds/worker/checkouts/gecko/taskcluster/mach_commands.py", line 169, in taskgraph_decision
[task 2018-03-06T15:14:13.977050Z]     return taskgraph.decision.taskgraph_decision(options)
[task 2018-03-06T15:14:13.977448Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 137, in taskgraph_decision
[task 2018-03-06T15:14:13.977520Z]     full_task_json = tgg.full_task_graph.to_json()
[task 2018-03-06T15:14:13.977633Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 164, in full_task_graph
[task 2018-03-06T15:14:13.977688Z]     return self._run_until('full_task_graph')
[task 2018-03-06T15:14:13.977778Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 320, in _run_until
[task 2018-03-06T15:14:13.977828Z]     k, v = self._run.next()
[task 2018-03-06T15:14:13.977926Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 244, in _run
[task 2018-03-06T15:14:13.978003Z]     new_tasks = kind.load_tasks(self.parameters, list(all_tasks.values()))
[task 2018-03-06T15:14:13.978093Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 73, in load_tasks
[task 2018-03-06T15:14:13.978153Z]     for task_dict in transforms(trans_config, inputs)]
[task 2018-03-06T15:14:13.978255Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1789, in check_run_task_caches
[task 2018-03-06T15:14:13.978293Z]     for task in tasks:
[task 2018-03-06T15:14:13.978394Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1735, in check_task_dependencies
[task 2018-03-06T15:14:13.978433Z]     for task in tasks:
[task 2018-03-06T15:14:13.978534Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1723, in check_task_identifiers
[task 2018-03-06T15:14:13.978571Z]     for task in tasks:
[task 2018-03-06T15:14:13.978666Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1706, in chain_of_trust
[task 2018-03-06T15:14:13.978704Z]     for task in tasks:
[task 2018-03-06T15:14:13.978795Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1516, in build_task
[task 2018-03-06T15:14:13.978832Z]     for task in tasks:
[task 2018-03-06T15:14:13.978928Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1486, in add_index_routes
[task 2018-03-06T15:14:13.978966Z]     for task in tasks:
[task 2018-03-06T15:14:13.979056Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1315, in validate
[task 2018-03-06T15:14:13.979093Z]     for task in tasks:
[task 2018-03-06T15:14:13.979191Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1303, in task_name_from_label
[task 2018-03-06T15:14:13.979228Z]     for task in tasks:
[task 2018-03-06T15:14:13.979321Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/task.py", line 1264, in set_defaults
[task 2018-03-06T15:14:13.979358Z]     for task in tasks:
[task 2018-03-06T15:14:13.979680Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/job/__init__.py", line 149, in make_task_description
[task 2018-03-06T15:14:13.979806Z]     assert 'implementation' not in worker
[task 2018-03-06T15:14:13.979848Z] AssertionError
[taskcluster 2018-03-06 15:14:14.348Z] === Task Finished ===
[taskcluster 2018-03-06 15:14:15.940Z] Unsuccessful task run with exit code: 1 completed in 160.396 seconds
Flags: needinfo?(sfraser)
Have got a fix, will reapply shortly.
Flags: needinfo?(sfraser)

Comment 26

Last year
bugherder
https://hg.mozilla.org/mozilla-central/rev/b3b9609544ec
https://hg.mozilla.org/mozilla-central/rev/370c9dbdd572
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
Comment on attachment 8956465 [details]
Bug 1436469 Move periodic-file updates docker image r=callek

Justin Wood (:Callek) has approved the revision.

https://phabricator.services.mozilla.com/D685
Attachment #8956465 - Flags: review+

Comment 28

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ba388ca08da
periodic file updates migration, stage 1 r=callek
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 30

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/514feaa0c698
periodic file updates migration, stage 1 r=callek

Comment 32

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d0dcc642e1a
periodic file updates migration, stage 1 r=callek
Flags: needinfo?(sfraser)
https://hg.mozilla.org/mozilla-central/rev/6d0dcc642e1a
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
Comment on attachment 8956400 [details]
Bug 1436369 Clean up hsts preload script r=keeler

David Keeler [:keeler] (use needinfo) has approved the revision.

https://phabricator.services.mozilla.com/D680
Attachment #8956400 - Flags: review+
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8949036 - Attachment is obsolete: true
Attachment #8952666 - Attachment is obsolete: true
Depends on: 1453615
Depends on: 1453703
Component: General Automation → General
I'm not seeing any automated blocklist updates running on Beta/Release. Those still need to be running as they were previously on Buildbot, no? That said, the last run I see on mozilla-release was on January 9th?!
https://hg.mozilla.org/releases/mozilla-release/rev/0a0a804a5b6f
Flags: needinfo?(sfraser)
As best I could tell when I did the migration, they weren't running on beta/release. I thought the changes were just being uplifted. We can add them, though.
Flags: needinfo?(sfraser)
Does it need to include HSTS/HPKP updates when run on beta/release, or just the blocklist?
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #36)
> As best I could tell when I did the migration, they weren't running on
> beta/release. I thought the changes were just being uplifted. We can add
> them, though.

Yeah, I honestly don't know what happened there. Offhand, I'm guessing that it wasn't an intentional change, but I could be out of the loop too.

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #37)
> Does it need to include HSTS/HPKP updates when run on beta/release, or just
> the blocklist?

For feature parity, we would only need the blocklist updates. That said, it's been a point of contention since post-Dawn. Since the removal of the Aurora cycle, RelMan has had to manually bump the pinning expiration date late in each Beta cycle to avoid possibly expiring before there's a new stable release out.

My understanding is that the decision not to do HSTS/HPKP updates on Beta was rooted in concerns for introducing risk into the releases by doing so, but assuming that's the case, I think we should re-evaluate how true that is. For one, we still *are* doing these updates on ESR without any similar concerns, and our Beta releases certainly get more testing during the cycle than our ESR builds do. Also, it's standard practice to close Beta at the end of each cycle during RC week, which should give us some buffer to react if a pinning update were to introduce some major regression. Finally, can anybody recall the last time a pinning update broke something? I can't offhand.

That's a long answer to a short question :). Catlee, what are your thoughts on the above? And if there's others who should be consulted about this, who would they be?
Flags: needinfo?(catlee)
And to be clear, I'm totally on board with leaving the automatic HSTS/HPKP updates off on mozilla-release. My arguments above only pertain to Beta.
the blocklist and remote-settings changes need to happen on beta,
but not the hsts/hpkp updates, so we have to split out the control of what
runs by project.
Comment on attachment 8982485 [details]
Bug 1436369 Run blocklist updates on mozilla-beta r=jlorenzo

Johan Lorenzo [:jlorenzo] has approved the revision.

https://phabricator.services.mozilla.com/D1487
Attachment #8982485 - Flags: review+

Comment 42

Last year
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e942cbff9588
Run blocklist updates on mozilla-beta r=jlorenzo
(In reply to Ryan VanderMeulen [:RyanVM] from comment #38)
> (In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #36)
> > As best I could tell when I did the migration, they weren't running on
> > beta/release. I thought the changes were just being uplifted. We can add
> > them, though.
> 
> Yeah, I honestly don't know what happened there. Offhand, I'm guessing that
> it wasn't an intentional change, but I could be out of the loop too.
> 
> (In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #37)
> > Does it need to include HSTS/HPKP updates when run on beta/release, or just
> > the blocklist?
> 
> For feature parity, we would only need the blocklist updates. That said,
> it's been a point of contention since post-Dawn. Since the removal of the
> Aurora cycle, RelMan has had to manually bump the pinning expiration date
> late in each Beta cycle to avoid possibly expiring before there's a new
> stable release out.
> 
> My understanding is that the decision not to do HSTS/HPKP updates on Beta
> was rooted in concerns for introducing risk into the releases by doing so,
> but assuming that's the case, I think we should re-evaluate how true that
> is. For one, we still *are* doing these updates on ESR without any similar
> concerns, and our Beta releases certainly get more testing during the cycle
> than our ESR builds do. Also, it's standard practice to close Beta at the
> end of each cycle during RC week, which should give us some buffer to react
> if a pinning update were to introduce some major regression. Finally, can
> anybody recall the last time a pinning update broke something? I can't
> offhand.
> 
> That's a long answer to a short question :). Catlee, what are your thoughts
> on the above? And if there's others who should be consulted about this, who
> would they be?

At the very least we should be doing blocklist updates on beta. IIRC, the HSTS/HPKP expiry times at one time were set far enough into the future to account for the time from nightly (-> aurora) -> beta -> release. I'm not sure if that's still the case, but I also don't see any problems with bumping the expiry times on beta.

ESR was different because it's a long-lived release branch, and it wouldn't make sense to keep the pins frozen in time for more than a year.

I'm not aware of any recent test or user visible failures caused by pinning updates, but I also don't pay close attention to it. Perhaps looking at hg history to see if any of the updates have been backed out would tell us?
Flags: needinfo?(catlee)
AFAICT, no automated HPKP or HSTS update has ever been backed out for regressions in the field (going back to July of 2014 when they were first turned on). David, what are your thoughts about enabling automatic HSTS/HPKP updates on Beta during the cycle? Comment 38 covers the rationale for asking.
Flags: needinfo?(dkeeler)
I think you're right - it should be safe to enable these automatic updates on Beta.
Flags: needinfo?(dkeeler)

Comment 46

Last year
bugherder
https://hg.mozilla.org/mozilla-central/rev/e942cbff9588
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.