The HSTS and HPKP automatic update scripts aren't quite fool-proof yet (see e.g. bug 1092606), and they occasionally break the build. It's particularly a bummer when this happens, since they're currently scheduled to run on Saturdays, when nobody is around. Let's re-schedule for something like Thursday mornings.
What do you want to do, among things that are possible, when (not if) it loses a push race? The reason they run Saturday morning is that nobody wants to write the code to deal with someone else pushing between the time the updater pulls and when it pushes.
The scripts update files that only they touch (i.e. normally no human-initiated commit touches those files). If there's a push between the time the scripts check out and when they check in, in the majority of cases an automatic merge should be successful. If not, they can abandon the attempted changes and send an email that they failed or something.
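The retry-and-bail behavior described above could be sketched roughly as follows. This is a hypothetical illustration, not the real updater: the `run_hg` helper, the retry count, and `push_with_retry` itself are all made up here, and the real scripts would need real conflict detection and email notification.

```python
# Sketch of the push-race handling suggested above: try to push; if someone
# else landed in the meantime, pull, attempt an automatic merge, and abandon
# the changes (notifying someone) if the merge fails or we keep losing races.
# All names here are illustrative, not from the actual update scripts.
import subprocess

MAX_ATTEMPTS = 3

def run_hg(*args, cwd="."):
    """Run an hg command in the given repo, returning True on success."""
    return subprocess.run(["hg", *args], cwd=cwd).returncode == 0

def push_with_retry(repo, run=run_hg, notify_failure=print):
    for _ in range(MAX_ATTEMPTS):
        if run("push", cwd=repo):
            return True            # landed cleanly, no race
        # Someone pushed in between: pull their changes and try to merge.
        run("pull", cwd=repo)
        if not run("merge", cwd=repo):
            break                  # merge conflict: give up
        run("commit", "-m", "Merge after losing a push race", cwd=repo)
    # Abandon the attempted changes and let a human know.
    run("update", "--clean", cwd=repo)
    notify_failure(repo)
    return False
```

Since the scripts normally touch files nobody else edits, the `hg merge` step should succeed in the common case; the conflict branch should be rare.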
Can we run these every day instead?
keeler: See comment 3.
Sure, we could (note that the HSTS updater takes on the order of an hour (the HPKP updater is faster)).
OOC, why do they check in this error log? https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/StaticHPKPins.errors Would those be better as logs in the Treeherder job?
Yeah - as long as they're accessible somewhere, I don't think we need to check (either of) the error logs in.
Summary: reschedule HSTS and HPKP automatic updates for Thursday mornings (PST) or something → reschedule HSTS and HPKP automatic updates to run daily, and be visible on treeherder
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
That's a bit of a problem with our plans for autoland, in that we never want actual merges when we "merge" autoland to m-c, which requires that there not be anything on m-c which isn't on autoland below the merge point. Having this land on m-c every day would require that it happen at a time when a sheriff is available to merge it to autoland, and then autoland couldn't be merged back until that push, or whatever push above it has backed out everything busted, had finished PGO builds. We could half-ass around it by just having actual merges from autoland for a while, until the ocean boils and there are hardly any pushes going to mozilla-inbound, at which point this could be switched to push there without much fear of push races. But the ideal would be either teaching this how to deal with push races or, even prettier, teaching it to do whatever it would take to let autoland do its landing for it.
Could you please tell me on which server I can see the logs that are being generated each Saturday?
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/ https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-aurora-linux64/ https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-esr45-linux64/ (they're at the bottom - search for "periodicupdate")
Andrei, you asked earlier in the day about updating treeherder so these jobs appear. I think you need to write a patch to include these jobs here: github.com:mozilla/treeherder-service.git, treeherder/etl/buildbot.py, and update the tests as well (tests/etl/test_buildbot.py). You can probably ask questions in #treeherder if you need more details.
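For the treeherder side, the kind of change needed in treeherder/etl/buildbot.py is a buildername-to-job mapping. The sketch below is hypothetical (I haven't checked the file's actual structure, and the `pfu` job symbol is invented); the buildername format matches the one seen in the buildbot logs, e.g. "Linux x86-64 mozilla-aurora periodic file update".

```python
# Illustrative sketch of a buildername -> job-type mapping in the spirit of
# treeherder/etl/buildbot.py. The regex, function name, and "pfu" symbol are
# assumptions for illustration, not the real treeherder code.
import re

PERIODIC_UPDATE_RE = re.compile(r"periodic file update", re.IGNORECASE)

def extract_job_type(buildername):
    """Classify a buildbot buildername into a treeherder job type."""
    if PERIODIC_UPDATE_RE.search(buildername):
        return {"name": "periodic file update", "job_symbol": "pfu"}
    return {"name": "unknown", "job_symbol": "?"}
```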
Assignee: nobody → aobreja
Created attachment 8798857 [details] [diff] [review] bug1093196_b_custom_notifications.patch The patch for the buildbotcustom repository; it should reschedule the HSTS and HPKP automatic updates to run daily at 3:02 AM.
Attachment #8798857 - Flags: review?(kmoir)
Created attachment 8798858 [details] [diff] [review] bug1093196_treeherder.patch Patch that should make these jobs visible on treeherder. Thanks Kim for the hint.
That's not going to make it visible on treeherder, because just like tbpl before it, what treeherder really is is "a list of pushes, and the jobs that are pending/running/finished on them" rather than "a list of jobs, some of them associated with pushes." Unlike pretty much everything else we run (release tagging in the non-release-promotion world is the only other direct parallel), the periodic update job doesn't start out with a revision that it ran on; it either creates a revision if it succeeds, or doesn't create one if it fails in any way. Even if you altered the script to capture the revision it creates by pushing, and altered the job's final step to suddenly run on that revision rather than on no revision (I have no idea about the feasibility of doing that), you're still only going to be able to usefully make it visible on treeherder when it succeeds, and not when it fails, since when it fails there's no revision where it makes the tiniest bit of sense to display it.
Comment on attachment 8798858 [details] [diff] [review] bug1093196_treeherder.patch Usually I get someone from the treeherder team to review these requests. (edmorley) Also, usually I create a pull request against their github repo which allows you to run the tests and see if they pass.
Attachment #8798857 - Flags: checked-in+
Comment on attachment 8798857 [details] [diff] [review] bug1093196_b_custom_notifications.patch http://hg.mozilla.org/build/buildbotcustom/rev/3fbd15421d2a to remove the accidental commit of misc.py.orig in 3d94b2506858, http://hg.mozilla.org/build/buildbotcustom/rev/c3eb75097fe3 to merge to production.
Note that this still doesn't solve the fundamental problem that caused the recent public stir, namely that changes to our release cycle have invalidated the assumption that skipping these updates on Beta is OK because we release often enough that users will get a newer version before we run out of time. Running every day on Trunk/Aurora isn't going to magically change the fact that we can go 7+ weeks without an update after we go to Beta, and that we sometimes throttle updates to release users for multiple weeks. We still need a better story for not cutting it so close with the expiration date if we want to avoid ever getting stuck in this situation again.
(And note that I already had to do a manual update to the expiration time for Fx50 to avoid the same problem happening again months after the last time)
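The expiration-margin concern above amounts to simple date arithmetic: the preload data baked into a build has to stay valid for the longest stretch users can go without picking up a new build. A back-of-the-envelope sketch, with illustrative numbers only (the 3-week throttle and 2-week margin are assumptions, not policy):

```python
# Illustrative check that a preload list's validity window covers the
# worst-case gap between updates. Numbers are examples from this thread
# and assumptions, not the actual release policy.
from datetime import timedelta

beta_cycle = timedelta(weeks=7)          # "7+ weeks without an update" on Beta
release_throttle = timedelta(weeks=3)    # updates throttled "multiple weeks" (assumed 3)
safety_margin = timedelta(weeks=2)       # assumed buffer

REQUIRED_VALIDITY = beta_cycle + release_throttle + safety_margin  # 12 weeks

def expiry_is_safe(preload_validity):
    """True if the baked-in expiration outlasts the worst-case update gap."""
    return preload_validity >= REQUIRED_VALIDITY
```

Under these assumptions, a validity window shorter than about 12 weeks is exactly the "cutting it too close" situation the comment warns about.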
Created attachment 8804224 [details] periodic_file_update job.PNG By checking the latest logs from (1), I found that we don't have a revision number, which should be set in set_script_properties (see (2), step 8). script_repo_revision gets a value in (2) at step 6 (get_script_repo_revision), but that value is not copied into revision at step 8, so revision ends up empty. (3) is an example of what runs in set_script_properties, where the revision number should be set. My guess is that we should force revision to take the value of script_repo_revision. (1) https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-aurora-linux64/ (2) http://buildbot-master72.bb.releng.usw2.mozilla.com:8001/builders/Linux%20x86-64%20mozilla-aurora%20periodic%20file%20update/builds/4 (3) http://buildbot-master72.bb.releng.usw2.mozilla.com:8001/builders/Linux%20x86-64%20mozilla-aurora%20periodic%20file%20update/builds/4/steps/set_script_properties/logs/stdio
The set_script_properties step is defined here: (4). To set the revision number we could also reuse this part: (5) (lines 52-58). (4) http://hg.mozilla.org/build/buildbotcustom/file/default/process/factory.py (5) http://hg/build/tools/file/tip/scripts/valgrind/valgrind.sh
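The fix comment 26 guesses at, stripped of buildbot plumbing, is just copying one build property into another. A minimal sketch (the `copy_revision` helper is hypothetical; only the property names `script_repo_revision` and `revision` come from the build log):

```python
# Sketch of forcing "revision" to take the value of "script_repo_revision",
# as suggested above, so treeherder has a revision to hang the job on.
# copy_revision is an illustrative helper, not real buildbotcustom code.
def copy_revision(properties):
    """Fill in the revision property from script_repo_revision if missing."""
    if "revision" not in properties and "script_repo_revision" in properties:
        properties["revision"] = properties["script_repo_revision"]
    return properties
```

In real buildbot code this would happen in (or right after) the set_script_properties step, against the build's properties object rather than a plain dict.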
Depends on: 1317254
Andrei, what's the current status of this bug? Perhaps we can discuss in the standup tomorrow morning so we can get it unblocked
I guess this was intended for Andrei, so shifting the ni request to him :)
Flags: needinfo?(aselagea) → needinfo?(aobreja)
I talked to Andrei about this in our standup and he said that the job runs once a day. The problem is that the script doesn't reference a revision, so it doesn't appear on treeherder.
Note explaining the priority level: P5 doesn't mean we've lowered the priority; quite the contrary. However, we're aligning these levels to the buildduty quarterly deliverables, where P1-P3 are taken by our daily waterline KTLO operational tasks.
Priority: -- → P5
Can we close this now that this work has landed?
Yes, this was fixed by scheduling via taskcluster (so the buildbot job has a revision to work from). Bug 1402457
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
(In reply to Justin Wood (:Callek) from comment #26) > Yes, this was fixed by scheduling via taskcluster (so the buildbot job has a > revision to work from). Bug 1402457 Awesome, thanks!