Closed Bug 1426661 Opened 6 years ago Closed 11 months ago

Don't build on automated HSTS preload list updates

Categories

(Release Engineering :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: marco, Unassigned)

References

Details

There are 1-2 commits like this per day, which we build on all platforms (and we run all of the tests on all of those platforms).
We could save some resources by not building these commits, given that the changes they make are very unlikely to cause build/test failures.

We build them because they *have* caused test failures. Plus, aren't we doing relpro everywhere now, so if we DONTBUILD, we don't ship?
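
For context, "DONTBUILD" is an annotation in a push's commit message that the scheduler honors by skipping builds and tests for that push. A minimal sketch of that kind of check, with a hypothetical should_schedule_tasks helper and a made-up commit message; this is not the actual taskgraph code:

    # Illustrative only: roughly how a scheduler could honor DONTBUILD in a
    # push's commit message; not the actual taskgraph implementation.
    def should_schedule_tasks(commit_message):
        """Return False when the push asks CI to skip builds and tests."""
        return "DONTBUILD" not in commit_message

    # Hypothetical commit message for an automated HSTS preload list update
    # landing with DONTBUILD.
    msg = "No bug - Update HSTS preload list, a=automation DONTBUILD"
    assert not should_schedule_tasks(msg)
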
(In reply to Phil Ringnalda (:philor) from comment #1)
> We build them because they *have* caused test failures. Plus, aren't we
> doing relpro everywhere now, so if we DONTBUILD, we don't ship?

Do they cause platform-specific failures? Or can we restrict them to a single platform?
Are they causing failures so often that it's worth making mozilla-central pushes that contain only those changesets? Or can we push them together with other changesets?
Pushing them together with something else (and I don't know how we would do that, or what the something else would be) would defeat the purpose: knowing that a failure came from them. We could push them somewhere cheaper, like autoland, but someone would have to sign up to write better push code. The choice of mozilla-central (and the time they used to land, which used to be earlier, and the day, which used to be Saturday-only before we went to every day to solve something that wasn't in any way solved by going to every day) was designed to avoid push races, since the current push code's solution to being raced is to say "oops, I got raced, I'll just fail." Eventually the shiny all-autoland future will require autoland-the-service to have some way of accepting automated pushes like this, so there's probably already a bug about that.

I don't know of anyone with the knowledge to predict how they can currently fail, or will next fail, well enough to say whether we could build on just one platform and which platform that would be. Our platform differences for testing really are massive: things like "non-e10s is only tested on Win7 debug" (which means some insanely high percentage of devtools tests only run there), or the possibility of something like a platform-specific helper extension that talos installs which we could accidentally blocklist, or something I can't imagine. The previous bustage that sticks in my mind: we were installing Flash even though we didn't actually want it, installing different versions on different platforms, and we wound up blocklisting the version we were installing on one platform, busting tests that didn't actually want Flash running but were surprised to find the blocked-plugin UI.

Personally, my choice would be to declare that the move to scheduling via TC, and putting a "pfu" job on treeherder on whatever happens to be the tip revision when the job starts, is a better solution to the problem we "solved" by running them every day. That problem was that nobody would notice for weeks when the Saturday-only job failed, since at the time failure was only signaled by someone noticing that the job didn't exist. That would mean we could go back to running them just once a week, as long as we lied to ourselves about how our sheriffs would notice a single failed job on a revision they had already completely starred (a revision that might well be three back by the time of the failure), and as long as we persuaded relman to accept responsibility for ensuring that they either have a current update push (the change from Saturday to every day did solve the problem that relman was moving gtb dates around on esr at the time, suddenly going from having a two-day-old periodic update push to having a five- or six-day-old one), or, assuming it's possible for someone to manually trigger one, trigger one before deciding to ship at times when there has been a needed blocklist update since the last periodic push.

I might be missing one, but I think those are the three constraints:

* both blocklist updates and HPKP/HSTS updates are capable of breaking tests, and every time they have, it has been in an incomprehensible way, so we need to run tests on them

* the problem that we need a fresh HPKP/HSTS update on mozilla-central as it leaves for beta, fresh enough that it won't have expired by the time that code is replaced by the next version shipping off mozilla-release, needs to be solved, whether by someone accepting responsibility for checking it before the merge, or by adding periodic updates on beta (perhaps getting around our fear of untested changes by uplifting one manually reviewed, week-old mozilla-central update somewhere in the middle of the beta cycle), or by permanently enshrining what we do now: a very silly manual update of the expiration date, without changing the data, at some point during the beta cycle

* some blocklist updates are "whatever, whenever" and can be picked up by Firefox itself periodically updating the list, but some exist to blocklist an extension that is causing startup crashes and preventing Firefox from running long enough to update the blocklist on its own, so we either need daily mozilla-release and esr updates, or we need a release step where someone looks at the diff between the in-tree blocklist and the current AMO blocklist and decides whether or not it's okay to ship with the current in-tree one (a rough sketch of such a check follows this list)
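
A rough sketch of what that release step might look like, assuming a placeholder URL for the current AMO blocklist and the historical in-tree location browser/app/blocklist.xml; both are assumptions, and this is not an existing relman tool:

    # Illustrative sketch of the "diff the in-tree blocklist against AMO"
    # release step. The URL is a placeholder and the in-tree path is an
    # assumption (the historical location of blocklist.xml).
    import difflib
    import urllib.request

    AMO_BLOCKLIST_URL = "https://example.invalid/blocklist.xml"  # placeholder
    IN_TREE_BLOCKLIST = "browser/app/blocklist.xml"              # assumed path

    def blocklist_diff():
        with urllib.request.urlopen(AMO_BLOCKLIST_URL) as resp:
            amo = resp.read().decode("utf-8").splitlines()
        with open(IN_TREE_BLOCKLIST, encoding="utf-8") as f:
            in_tree = f.read().splitlines()
        # A human reviews this diff and decides whether shipping the current
        # in-tree blocklist is acceptable or a fresh periodic push is needed.
        return list(difflib.unified_diff(in_tree, amo,
                                         fromfile="in-tree", tofile="AMO"))

    if __name__ == "__main__":
        for line in blocklist_diff():
            print(line)
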
Component: General Automation → General
See Also: → 1380189

HSTS and remote-settings updates happen twice a week, and have caught issues before, so I don't think we want them to land with DONTBUILD.

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → WONTFIX