Closed Bug 1677458 Opened 5 years ago Closed 5 years ago

[autoland closed] Perma pydep abort: No such file or directory: /home/worker/mozilla-central/.hg/store/data/browser/components/newtab/test/unit/asrouter/_a_s_router.test.js.d

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: sheehan)

References

Details

(Keywords: intermittent-failure)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=321880013&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RgAm-P1OTs21Kgw8UjTzyQ/runs/0/artifacts/public/logs/live_backing.log


[taskcluster 2020-11-16 10:09:47.094Z] Task ID: RgAm-P1OTs21Kgw8UjTzyQ
[taskcluster 2020-11-16 10:09:47.094Z] Worker ID: i-048bafb79423860d2
[taskcluster 2020-11-16 10:09:47.094Z] Worker Group: us-east-1
[taskcluster 2020-11-16 10:09:47.094Z] Worker Node Type: m5.4xlarge
[taskcluster 2020-11-16 10:09:47.094Z] Worker Type: b-linux
[taskcluster 2020-11-16 10:09:47.094Z] Public IP: 54.237.218.130
[taskcluster 2020-11-16 10:09:47.094Z] Hostname: ip-10-145-29-26
[taskcluster 2020-11-16 10:09:47.829Z] Downloading artifact "public/image.tar.zst" from task ID: dUmGYuQcR6i1H7F4sg-VWQ.
[taskcluster 2020-11-16 10:09:52.831Z] Download Progress: 55.45%
[taskcluster 2020-11-16 10:09:55.947Z] Downloaded artifact successfully.
[taskcluster 2020-11-16 10:09:55.947Z] Downloaded 216.627 mb
[taskcluster 2020-11-16 10:09:55.947Z] Decompressing downloaded image
[taskcluster 2020-11-16 10:09:57.197Z] Loading docker image from downloaded archive.
[taskcluster 2020-11-16 10:10:04.043Z] Image 'public/image.tar.zst' from task 'dUmGYuQcR6i1H7F4sg-VWQ' loaded.  Using image ID sha256:e4ad6696c9284ad2b9ba7a362e377bd3b0d651412e4dbdd782dfbc9c5bfa0427.
[taskcluster 2020-11-16 10:10:04.209Z] === Task Starting ===
+ test mozilla-central
+ test taskcluster/docker/funsize-update-generator/requirements.in
+ PIP_ARG=-2
+ '[' -n 1 ']'
+ PIP_ARG=-3
+ export ARTIFACTS_DIR=/home/worker/artifacts
+ ARTIFACTS_DIR=/home/worker/artifacts
+ mkdir -p /home/worker/artifacts
+ queue_base=https://firefox-ci-tc.services.mozilla.com/api/queue/v1
+ '[' -n RgAm-P1OTs21Kgw8UjTzyQ ']'
+ curl --location --retry 10 --retry-delay 10 -o /home/worker/task.json https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RgAm-P1OTs21Kgw8UjTzyQ
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  2553  100  2553    0     0  31912      0 --:--:-- --:--:-- --:--:-- 31912
++ jq -r '.scopes[] | select(contains ("arc-phabricator-token"))' /home/worker/task.json
++ awk -F: '{print $3}'
+ ARC_SECRET=project/releng/gecko/build/level-3/arc-phabricator-token
+ '[' -n project/releng/gecko/build/level-3/arc-phabricator-token ']'
+ getent hosts taskcluster
172.17.0.2      taskcluster d552e5b6ea9d quizzical_shtern
+ set +x
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   110  100   110    0     0   1170      0 --:--:-- --:--:-- --:--:--  1157
100   110  100   110    0     0   1170      0 --:--:-- --:--:-- --:--:--  1157
+ chmod 600 /home/worker/.arcrc
+ export HGPLAIN=1
+ HGPLAIN=1
+ /home/worker/scripts/update_pipfiles.sh -b mozilla-central -f taskcluster/docker/funsize-update-generator/requirements.in -3
applying clone bundle from https://s3-external-1.amazonaws.com/moz-hg-bundles-us-east-1/mozilla-central/b9dd8e54c8ae449638635758a6571285655a4d0c.stream-v2.hg
607315 files to transfer, 3.31 GB of data
transferred 3.31 GB in 137.8 seconds (24.6 MB/sec)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
added 3 changesets with 7 changes to 6 files
new changesets efceae9186cd:e22423381bcd
updating to branch default
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
abort: No such file or directory: /home/worker/mozilla-central/.hg/store/data/browser/components/newtab/test/unit/asrouter/_a_s_router.test.js.d
[taskcluster 2020-11-16 10:13:08.611Z] === Task Finished ===
[taskcluster 2020-11-16 10:13:08.669Z] Artifact "public/build/requirements.txt.diff" not found at "/home/worker/artifacts/requirements.txt.diff"
[taskcluster 2020-11-16 10:13:08.810Z] Unsuccessful task run with exit code: 255 completed in 201.717 seconds

This affects the bugbug service used for scheduling pushes on autoland, which is slowed down by it. As a result, the decision task that schedules all other tasks is very slow (20+ minutes) and sometimes even hits its time limit.

Autoland had to be closed because of this.

Severity: S4 → S1
Flags: needinfo?(sheehan)
Priority: P5 → P1
Summary: Perma pydep abort: No such file or directory: /home/worker/mozilla-central/.hg/store/data/browser/components/newtab/test/unit/asrouter/_a_s_router.test.js.d → [autoland closed] Perma pydep abort: No such file or directory: /home/worker/mozilla-central/.hg/store/data/browser/components/newtab/test/unit/asrouter/_a_s_router.test.js.d

Please note this also affects the Code Review bot, which triggers static analysis for different programming languages (C/C++, Python, etc.). We also rely on the Code Review bot to detect build failures in non-unified build environments.

Seems both autoland and central are having problems with cloning from the stream clone bundle today. Rolling back to yesterday's bundle seems to have fixed autoland so I'm going to do the same for central now.

Running hg verify against the repos on hgssh now to try and find a source of the problem.

Assignee: nobody → sheehan
Flags: needinfo?(sheehan)

I rolled back to yesterday's bundles and the issues seem to have disappeared. Looking at the logs, it seems pushes touching the file mentioned in the error message took place around the same time as bundle creation. My only theory at the moment is that these pushes somehow corrupted the bundle generation process for both mozilla-central and autoland.

I created a new stream bundle from autoland using the same command as our automation process and then applied the bundle and updated the working directory without issue, which is enough evidence for the time being that running the bundle generation script overnight will produce working bundles.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

The issue has started to reappear:
2020-11-17T10:58:58.152157+00:00 app[worker.1]: 2020-11-17 10:58:58.139265 [ERROR ] libmozevent.utils: Mercurial robustcheckout failure (out=b'' err=b"\rupdating [ ] 3100/286000 4m55sabort: No such file or directory: '/tmp/pulselistener/mozilla-central-shared/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29/.hg/store/data/browser/components/newtab/test/unit/asrouter/_a_s_router.test.js.d'\n")

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Any idea why this still may persist in an intermittent way?

Flags: needinfo?(sheehan)

Today's clone bundle is probably broken again. Maybe there is a bug in the bundle generation?

(In reply to Andi-Bogdan Postelnicu [:andi] from comment #6)

Any idea why this still may persist in an intermittent way?

Something is busted in the bundle generation process around that specific file. We fixed this problem yesterday by rolling back to bundles from Nov 15, but now the problem is back again after generating new bundles overnight.

Flags: needinfo?(sheehan)

When debugging the problem yesterday I ran the hg debugrebuildfncache command against the autoland repo and the command reported no problems. This led me to believe the source of the bad bundles was a race condition against the repos during the bundle generation process since the fncache for one of the affected repos was seemingly okay.

This morning I recalled that bundles for autoland are actually just copied over from the bundles for central. Running hg debugrebuildfncache against central produces the following, indicating the problem was indeed a corrupt fncache on central:

adding data/browser/components/newtab/test/unit/asrouter/ASRouter.test.js.d
1 items added, 0 removed from fncache
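For anyone comparing the two paths: the fncache entry above (ASRouter.test.js.d) and the file named in the abort message (_a_s_router.test.js.d) refer to the same store file. Mercurial's store encodes an uppercase letter as an underscore followed by its lowercase form, and a literal underscore as a double underscore. The snippet below is only a simplified sketch of those two rules. The function name encode_store_name is made up for illustration, and the real encoding has more cases (reserved characters, very long paths, and so on) that are not covered here.

```python
def encode_store_name(path: str) -> str:
    """Simplified sketch of Mercurial's store filename encoding.

    Only two rules are handled (an assumption made for brevity):
      '_'            -> '__'
      'A'..'Z'       -> '_' + lowercase letter
    Everything else is passed through unchanged.
    """
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


# Entry as printed by `hg debugrebuildfncache` in this bug.
fncache_entry = (
    "data/browser/components/newtab/test/unit/asrouter/ASRouter.test.js.d"
)

# Path relative to .hg/store, as it appears in the abort message.
print(encode_store_name(fncache_entry))
```

So a missing fncache entry for ASRouter.test.js.d is consistent with a stream clone bundle that never shipped the _a_s_router.test.js.d file.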

So tomorrow's bundles should be produced with a fixed fncache. I'm going to file a bug to verify the fncache before generating bundles on all repos, since we've run into this problem a few times now.
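A pre-bundle check along those lines could look roughly like the sketch below. It is not the actual automation. The helper names (rebuild_fncache, fncache_changes, fncache_was_stale) are hypothetical, and the code assumes the summary line has the same format as the output quoted above. Treating a missing summary line as "no changes" is also an assumption.

```python
import re
import subprocess

# Matches the summary line seen in this bug, e.g.
# "1 items added, 0 removed from fncache".
SUMMARY_RE = re.compile(
    r"^(\d+) items added, (\d+) removed from fncache\s*$", re.MULTILINE
)


def rebuild_fncache(repo_path: str) -> str:
    """Run `hg debugrebuildfncache` on repo_path and return its stdout.

    Hypothetical helper: needs `hg` on PATH and write access to the repo.
    """
    result = subprocess.run(
        ["hg", "-R", repo_path, "debugrebuildfncache"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def fncache_changes(output: str) -> tuple[int, int]:
    """Return (added, removed) parsed from the command output.

    Assumption: if no summary line is present, nothing was changed.
    """
    match = SUMMARY_RE.search(output)
    if match is None:
        return (0, 0)
    return int(match.group(1)), int(match.group(2))


def fncache_was_stale(output: str) -> bool:
    """True if the rebuild had to add or remove any entries."""
    added, removed = fncache_changes(output)
    return added > 0 or removed > 0
```

The bundle job would then call fncache_was_stale(rebuild_fncache(path)) for each repo before generating bundles, and log or alert when it returns True, since that means earlier bundles from that repo may have been incomplete.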

Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
See Also: → 1677789