Closed Bug 1694217 Opened 3 years ago Closed 3 years ago

Intermittent periodic file update [taskcluster:error] Task timeout after 18000 seconds. Force killing container.

Categories

(Release Engineering :: Release Automation: Other, defect)

defect

Tracking

(firefox-esr7887+ fixed, firefox86 fixed, firefox87+ fixed, firefox88+ fixed)

RESOLVED FIXED
Tracking Status
firefox-esr78 87+ fixed
firefox86 --- fixed
firefox87 + fixed
firefox88 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: jcristau)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=330707720&repo=mozilla-beta
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/YpyjYCLZSQe97EEa-5_Nng/runs/0/artifacts/public/logs/live_backing.log


@@ -11777,7 +11781,6 @@
 
 // Fastly Inc. : http://www.fastly.com/
 // Submitted by Fastly Security <security@fastly.com>
-edgecompute.app
 fastly-terrarium.com
 fastlylb.net
 map.fastlylb.net
@@ -12614,10 +12617,6 @@
 *.northflank.app
 *.code.run
 
-// Noticeable : https://noticeable.io
-// Submitted by Laurent Pellegrino <security@noticeable.io>
-noticeable.news
-
 // Now-DNS : https://now-dns.com
 // Submitted by Steve Russell <steve@now-dns.com>
 dnsking.ch
+ '[' -s /home/worker/artifacts/effective_tld_names.diff ']'
+ return 0
+ SUFFIX_LIST_UPDATED=true
+ '[' true == false ']'
+ '[' '' == true ']'
+ clone_repo
+ cd /home/worker
+ '[' '!' -d mozilla-beta ']'
+ /usr/bin/hg robustcheckout --sharebase /tmp/hg-store -b default https://hg.mozilla.org/releases/mozilla-beta mozilla-beta
(using Mercurial 4.8.1)
ensuring https://hg.mozilla.org/releases/mozilla-beta@default is available at mozilla-beta
(sharing from new pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)
applying clone bundle from https://s3-external-1.amazonaws.com/moz-hg-bundles-us-east-1/releases/mozilla-beta/55fe52412f1546f6e4550dc1ee64fefce542b527.stream-v2.hg
611975 files to transfer, 3.49 GB of data
transferred 3.49 GB in 127.6 seconds (28.0 MB/sec)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
added 3771 changesets with 23004 changes to 10456 files
new changesets d44181196337:a8a1f0369b83
searching for changes
no changes found
(pulling to obtain default)
(remote resolved default to a8a1f0369b834c74b1445833b2460617bac947f5; result is not deterministic)
(revision already present locally; not pulling)
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
287704 files updated, 0 files merged, 0 files removed, 0 files unresolved
updated to a8a1f0369b834c74b1445833b2460617bac947f5
PERFHERDER_DATA: {"framework": {"name": "vcs"}, "suites": [{"extraOptions": ["m5.4xlarge"], "hgVersion": "4.8.1", "lowerIsBetter": true, "name": "clone", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 156.22249293327332}, {"extraOptions": ["m5.4xlarge"], "hgVersion": "4.8.1", "lowerIsBetter": true, "name": "update", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 58.85510206222534}, {"extraOptions": ["m5.4xlarge"], "hgVersion": "4.8.1", "lowerIsBetter": true, "name": "overall", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 217.2575089931488}, {"extraOptions": ["m5.4xlarge"], "hgVersion": "4.8.1", "lowerIsBetter": true, "name": "overall_clone", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 217.2575089931488}, {"extraOptions": ["m5.4xlarge"], "hgVersion": "4.8.1", "lowerIsBetter": true, "name": "overall_clone_fullcheckout", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 217.2575089931488}]}
+ /usr/bin/hg -R mozilla-beta pull
pulling from https://hg.mozilla.org/releases/mozilla-beta
searching for changes
no changes found
+ /usr/bin/hg -R mozilla-beta update -C default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update'
+ '[' true == true ']'
+ stage_hsts_files
+ cd /home/worker
++ basename /home/worker/data/nsSTSPreloadList.inc
+ cp -f /home/worker/firefox/nsSTSPreloadList.inc mozilla-beta/security/manager/ssl/
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update HSTS'
+ '[' true == true ']'
+ stage_hpkp_files
+ cd /home/worker
+ cp -f /home/worker/data/StaticHPKPins.h.out mozilla-beta/security/manager/ssl/StaticHPKPins.h
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update HSTS HPKP'
+ '[' true == true ']'
+ stage_remote_settings_files
+ cd /home/worker
+ cp -a /home/worker/data/remote-settings.out/blocklists /home/worker/data/remote-settings.out/main /home/worker/data/remote-settings.out/pinning /home/worker/data/remote-settings.out/security-state mozilla-beta/services/settings/dumps
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update HSTS HPKP remote-settings'
+ '[' true == true ']'
+ stage_tld_suffix_files
+ cd /home/worker
+ cp -a public_suffix_list.dat mozilla-beta//netwerk/dns/effective_tld_names.dat
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update HSTS HPKP remote-settings tld-suffixes'
+ '[' false == true ']'
+ '[' false == true ']'
+ '[' true == true ']'
+ COMMIT_MESSAGE='No Bug, mozilla-beta repo-update HSTS HPKP remote-settings tld-suffixes - a=repo-update'
+ /usr/bin/hg -R mozilla-beta commit -u 'ffxbld <ffxbld@mozilla.com>' -m 'No Bug, mozilla-beta repo-update HSTS HPKP remote-settings tld-suffixes - a=repo-update'
+ push_repo
+ cd mozilla-beta
+ '[' '!' -r /home/worker/.arcrc ']'
++ command -v arc
+ ARC=/usr/bin/arc
+ '[' -z ryanvm ']'
++ /usr/bin/arc list
++ grep 'Needs Review'
++ grep -E 'mozilla-beta repo-update'
++ awk 'match($0, /D[0-9]+[^: ]/) { print substr($0, RSTART, RLENGTH)  }'
+ /usr/bin/arc diff --verbatim --reviewers ryanvm
Linting...
No lint engine configured for this project.
Running unit tests...
No unit test engine is configured for this project.
    Diff for 'services/settings/dumps/security-state/intermediates.json' with
    context is 4,198,657 bytes in length. Generally, source changes should
    not be this large. If this file is a huge text file, try using the
    '--less-context' flag. If the file is not a text file, you can mark it
    'binary'. Mark this file as 'binary' and continue? [y/N] 
[taskcluster:error] Task timeout after 18000 seconds. Force killing container.
[taskcluster 2021-02-22 15:06:42.575Z] === Task Finished ===
[taskcluster 2021-02-22 15:06:42.575Z] Unsuccessful task run with exit code: -1 completed in 18009.359 seconds```
Summary: Intermittent [taskcluster:error] Task timeout after 18000 seconds. Force killing container. → Intermittent pfu [taskcluster:error] Task timeout after 18000 seconds. Force killing container.
Component: Task Configuration → Release Automation: Other
Product: Firefox Build System → Release Engineering
QA Contact: mtabara
Summary: Intermittent pfu [taskcluster:error] Task timeout after 18000 seconds. Force killing container. → Intermittent periodic file update[taskcluster:error] Task timeout after 18000 seconds. Force killing container.
Assignee: nobody → aryx.bugmail
Status: NEW → ASSIGNED
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/89d3fb47753d
use --less-context option on submission of periodic file update patches to work around Phabricator limit to changes in file. r=mtabara DONTBUILD

As pointed out by jcristau, the patch is likely insufficient because the changes are in one line. Arcanist has a hardcoded limit of 4MB. moz-phab doesn't list a limit and a switch might be required.

Assignee: aryx.bugmail → nobody
Status: ASSIGNED → NEW
Keywords: leave-open
Summary: Intermittent periodic file update[taskcluster:error] Task timeout after 18000 seconds. Force killing container. → Intermittent periodic file update [taskcluster:error] Task timeout after 18000 seconds. Force killing container.

We need to be able to update this data, so tracking this as a release blocker.

Severity: S4 → --
Priority: P5 → --

The recent spike seems to be caused by:
https://hg.mozilla.org/releases/mozilla-release/rev/755246fd3b355aaa9885c5159523de1a75abed65
https://hg.mozilla.org/releases/mozilla-beta/rev/17c99982b6a9898cd5a69f5d75a0d290f121f428

Failure log: https://treeherder.mozilla.org/logviewer?job_id=331234368&repo=mozilla-beta&lineNumber=77840

...
No unit test engine is configured for this project.
    Diff for 'services/settings/dumps/security-state/intermediates.json' with
    context is 4,198,656 bytes in length. Generally, source changes should
    not be this large. If this file is a huge text file, try using the
    '--less-context' flag. If the file is not a text file, you can mark it
    'binary'. Mark this file as 'binary' and continue? [y/N] 
[taskcluster:error] Task timeout after 18000 seconds. Force killing container.
[taskcluster 2021-02-26 01:35:49.669Z] === Task Finished ===
[taskcluster 2021-02-26 01:35:49.669Z] Unsuccessful task run with exit code: -1 completed in 18009.488 seconds

Mihai, could you, please, take a look?

Flags: needinfo?(mtabara)

(In reply to Bogdan Tara[:bogdan_tara | bogdant] from comment #6)

The recent spike seems to be caused by:
https://hg.mozilla.org/releases/mozilla-release/rev/755246fd3b355aaa9885c5159523de1a75abed65
https://hg.mozilla.org/releases/mozilla-beta/rev/17c99982b6a9898cd5a69f5d75a0d290f121f428

Failure log: https://treeherder.mozilla.org/logviewer?job_id=331234368&repo=mozilla-beta&lineNumber=77840

...
No unit test engine is configured for this project.
    Diff for 'services/settings/dumps/security-state/intermediates.json' with
    context is 4,198,656 bytes in length. Generally, source changes should
    not be this large. If this file is a huge text file, try using the
    '--less-context' flag. If the file is not a text file, you can mark it
    'binary'. Mark this file as 'binary' and continue? [y/N] 
[taskcluster:error] Task timeout after 18000 seconds. Force killing container.
[taskcluster 2021-02-26 01:35:49.669Z] === Task Finished ===
[taskcluster 2021-02-26 01:35:49.669Z] Unsuccessful task run with exit code: -1 completed in 18009.488 seconds

Mihai, could you, please, take a look?

Aware of this, thanks for raising it. We no longer have an owner on this in RelEng, trying to dig to find a suitable solution here.
Assigning this to me for now.

Assignee: nobody → mtabara
Flags: needinfo?(mtabara)

Dana, something changed recently causing our periodic updates of intermediates.json to cause hugely bloated diffs (>4MB), breaking the jobs due to Phabricator limits in diff size. Is there anything you can think of that we could do here to lesson the size of this? I assume they're all on one line for reasons.

Flags: needinfo?(dkeeler)

I've pushed a manual update of intermediates.json to mozilla-release. Mihai is going to manually trigger a new run of the pfu cron job to confirm that it runs green. If so, I'll do the same for the other trees.
https://hg.mozilla.org/releases/mozilla-release/rev/1492ed8d716e5987847b2b8c56861c0ac2d551ce

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/96cb6ca6f536
Manually update intermediates.json to work around arc limitations. r+a=me DONTBUILD

Also, while it may become obsolete if we switch from arc to moz-phab here, it's probably worth noting that we're timing out at the moment because we're not properly noticing the fatal arc error and terminating the job, so instead we just sit there after the error until eventually the harness times out the job instead.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #8)

Dana, something changed recently causing our periodic updates of intermediates.json to cause hugely bloated diffs (>4MB), breaking the jobs due to Phabricator limits in diff size. Is there anything you can think of that we could do here to lesson the size of this? I assume they're all on one line for reasons.

I think they're all on line because of how wget saves the json data: https://searchfox.org/mozilla-central/rev/362676fcadac37f9f585141a244a9a640948794a/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh#307 - my (optimistic) guess is that isn't necessary. Again, optimistically, maybe pretty-printing it would make diffs more manageable?

Flags: needinfo?(dkeeler)

The workaround from comment 11 got the jobs running green again for now, so I'm downgrading the severity as we're no longer blocked. That said, the overall situation is still fragile and we still need prioritize making this more robust before the next change to intermediates.json puts us back here again.

Should we try something like this then:

diff --git a/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh b/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh
--- a/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh
+++ b/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh
@@ -304,7 +304,7 @@ function compare_remote_settings_files {
     remote_records_url="$REMOTE_SETTINGS_SERVER/buckets/${bucket}/collections/${collection}/records"
     local_location_output="$REMOTE_SETTINGS_OUTPUT/${bucket}/${collection}.json"
     mkdir -p "$REMOTE_SETTINGS_OUTPUT/${bucket}"
-    ${WGET} -qO "$local_location_output" "$remote_records_url"
+    ${WGET} -qO- "$remote_records_url" | ${JQ} . > "${local_location_output}"
 
     # 5. Download attachments if needed.
     if [ "${bucket}" = "blocklists" ] && [ "${collection}" = "addons-bloomfilters" ]; then

Mathieu, would remote settings be OK with dumps getting reformatted this way?

Flags: needinfo?(mathieu)

Mathieu, would remote settings be OK with dumps getting reformatted this way?

Yes, sure, no problem. Making the diffs easier to read would probably benefit everyone :)
The loading code is straightforward and since it's fetching from resource:// the overhead should be negligible.

Flags: needinfo?(mathieu)

One other question I guess: is the ordering of entries kept mostly the same when the "intermediates" collection is updated? If not, the diff will probably still be quite big when pretty-printed.

Answering my own question, it looks like entries are sorted by last_modified so we should be good.

This will make the diffs both easier to review, and smaller; the latter
should avoid bumping into arc's hardcoded 4MB size limit for diffs.

Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0a68e4b60e1c
pretty-print remote settings dumps. r=releng-reviewers,bhearsum DONTBUILD

Stealing this bug from Mihai.

Assignee: mtabara → jcristau
Status: NEW → ASSIGNED

This commit was created by running:

for f in services/settings/dumps//.json; do jq . < $f | sponge $f; done

Future updates will automatically be pretty-printed, so this is a
one-time manual update to get us to the transitioned state.

Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4770ba6932ee
manually run all remote-settings dumps through jq. r=RyanVM
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/b64dae93e3b6
enable periodic-update manual cron. r=mtabara,RyanVM

https://phabricator.services.mozilla.com/D106867

The remote settings changes are a 39k diff now :-)

Attachment #9204778 - Attachment is obsolete: true
Attachment #9204778 - Attachment is obsolete: false
Attachment #9204778 - Attachment is obsolete: true
See Also: → 1705639
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: