Closed Bug 1822861 Opened 1 year ago Closed 1 year ago

Perma HSTS preload list generation failed

Categories

(Core :: Security Block-lists, Allow-lists, and other State, defect)

Tracking

RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox111 --- unaffected
firefox112 + fixed
firefox113 + fixed

People

(Reporter: noriszfay, Assigned: RyanVM)

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

Parsed log: https://treeherder.mozilla.org/logviewer?job_id=409138019&repo=mozilla-beta&lineNumber=84712
Full log: https://firefoxci.taskcluster-artifacts.net/Waz0cdjBQSmWydXEgMd9Ug/2/public/logs/live_backing.log

JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
ERROR: exception making request to erenimrek.com.tr
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
ERROR: exception making request to geoactivism.org
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 164: NS_ERROR_ENTITY_CHANGED: 
/home/worker/scripts/periodic_file_updates.sh: line 171:    45 Segmentation fault      (core dumped) LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:. ./xpcshell "${HSTS_PRELOAD_SCRIPT}" "${HSTS_PRELOAD_INC_OLD}"
+ echo 'HSTS preload list generation failed'
HSTS preload list generation failed
+ exit 43
[taskcluster 2023-03-16 16:12:58.531Z] === Task Finished ===
[taskcluster 2023-03-16 16:12:58.536Z] Artifact "public/build/StaticHPKPins.h.diff" not found at "/home/worker/artifacts/StaticHPKPins.h.diff": (HTTP code 404) no such container - Could not find the file /home/worker/artifacts/StaticHPKPins.h.diff in container 14658f5f02a2cedf0cdd542199a3908fa44f5075977281388c2d1b07ec082adb 
[taskcluster 2023-03-16 16:12:58.538Z] Artifact "public/build/remote-settings.diff" not found at "/home/worker/artifacts/remote-settings.diff": (HTTP code 404) no such container - Could not find the file /home/worker/artifacts/remote-settings.diff in container 14658f5f02a2cedf0cdd542199a3908fa44f5075977281388c2d1b07ec082adb 
[taskcluster 2023-03-16 16:12:58.540Z] Artifact "public/build/nsSTSPreloadList.diff" not found at "/home/worker/artifacts/nsSTSPreloadList.diff": (HTTP code 404) no such container - Could not find the file /home/worker/artifacts/nsSTSPreloadList.diff in container 14658f5f02a2cedf0cdd542199a3908fa44f5075977281388c2d1b07ec082adb 
[taskcluster 2023-03-16 16:13:00.197Z] Unsuccessful task run with exit code: 43 completed in 9690.188 seconds

I've been poking at this a bit. Unfortunately, the last green run on Beta was prior to the Gecko 112 uplift on Monday. However, we did have a green run on mozilla-central on the last revision prior to the merge, which makes me feel reasonably good that this could be caused by something that landed on 113 and got uplifted to Beta.

Ignoring the 112 merge commit, here's a rough regression range:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=4597c864a7b163089de157fc6eefcaa67afdd42d&tochange=853c78c9c586a11ffbd5279667cf514f0dba2542

From that, only bug 1569405 really seems to stand out to me. The ORB change doesn't seem likely because it presumably would have been causing issues on m-c already if it were the problem.

Hmm, that sounds unlikely to me. That patch really just adds a null check, so if it were relevant here, I'd expect to see a crash without it.

Assignee: nobody → ryanvm
Status: NEW → ASSIGNED

I'm disabling the HSTS updates for now to at least unblock the rest of the job.

Component: Task Configuration → Security Block-lists, Allow-lists, and other State
Keywords: leave-open
Product: Firefox Build System → Core
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/26eb1010f766
Temporarily disable HSTS pinning updates due to crashes during the run. r=jcristau
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/20511d3af52f
Temporarily disable HSTS pinning updates due to crashes during the run. r=jcristau, a=NPOTB DONTBUILD

Right now, my best guess is that we have a latent bug being triggered by a remote change. After doing a bit of Try hackery to force periodic_file_updates.sh to pull a specific build rather than the latest from the index, I can reproduce the crashes even off a rev that previously ran green.

It is interesting that ESR102 isn't crashing. Suggests that it was something recent(ish).

One more wrinkle - of the 8 Try pushes I initially triggered, 3 actually managed to run without crashing.

Dana, I've been striking out trying to bisect this any further on Try. Do you have any ideas how we might be able to try to reproduce these crashes locally so they can get caught in a debugger?

Flags: needinfo?(dkeeler)

You could try taking subsets of the preload list so it doesn't take so long to process. You might even be able to narrow it down to a specific domain. I think modifying hostsToContact would be the easiest way to do this.
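
A rough sketch of what taking subsets could look like, assuming hostsToContact is a plain array of host entries (its real shape in getHSTSPreloadList.js isn't shown here). The helper names sliceHosts and splitInHalf are hypothetical and are not part of the script:

```javascript
// Hypothetical helpers for narrowing down a crash; not part of getHSTSPreloadList.js.
// Assumption: `hosts` is a plain array (the element type does not matter here).

// Return the index-th of `parts` roughly equal, contiguous slices of the list.
function sliceHosts(hosts, index, parts) {
  const size = Math.ceil(hosts.length / parts);
  return hosts.slice(index * size, (index + 1) * size);
}

// Split the list into two halves so a crashing run can be bisected by hand:
// run each half separately and keep whichever one still crashes.
function splitInHalf(hosts) {
  const mid = Math.ceil(hosts.length / 2);
  return [hosts.slice(0, mid), hosts.slice(mid)];
}
```

Something like `hostsToContact = sliceHosts(hostsToContact, 0, 10)` before the probing loop would then process only the first tenth of the list. If the crash isn't tied to a specific domain, bisecting this way won't converge, which would itself be a useful signal.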

Flags: needinfo?(dkeeler)

So far, I've been able to confirm that the crashes happen intermittently, but frequently. They also happen at different points during the run, suggesting that it isn't tied to one specific domain. My guess is that something changed during the 112 cycle that made us more likely to OOM or something, which is how this is manifesting. A stronger bit of evidence towards that is that if I change instance size to xlarge, I can't get any failures on Try. I'm not in a position to try to further narrow down why we seem to be more prone to these issues than we were previously, so I'm going to just submit the patch to bump the instance size.

Talking to Dana more, we might want to consider rethinking our general approach to these jobs, however. We already have bug 1810856 on file for splitting the repo-update job into two different ones so we can more easily trigger remote-settings updates without having to wait hours for the HSTS updates as well. We could potentially take that one step further and split the HSTS part into multiple phases where we probe the sites across multiple jobs running in parallel and then combine the results for a final patch to be produced. Doing so would probably put us in a much better place with respect to how heavyweight this whole process is at the moment.
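
To make the multi-phase idea a bit more concrete, here is a minimal sketch of the shard-then-combine shape. Everything in it is an assumption for illustration: shardHosts and mergeResults are hypothetical names, and the per-host result shape ({ name, ok }) is made up rather than taken from the existing script or task configuration.

```javascript
// Hypothetical sketch of splitting the probe work across parallel jobs.
// Assumption: each job produces an array of { name, ok } result objects.

// Deal hosts round-robin into `shardCount` lists, one per parallel job.
function shardHosts(hosts, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  hosts.forEach((host, i) => shards[i % shardCount].push(host));
  return shards;
}

// Combine the per-job results into one list sorted by host name, so the
// final patch is generated from a stable ordering regardless of which job
// finished first.
function mergeResults(shardResults) {
  return shardResults
    .flat()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

Round-robin dealing is used here only because it keeps the shards within one host of each other in size; any deterministic partition would work as long as the merge step sorts the combined output.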

Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b769d68836a6
Switch the repo-update job to xlarge instances and re-enable HSTS updates. r=jcristau DONTBUILD
Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
See Also: → 1834975
See Also: 1834975