Closed Bug 1858404 Opened 2 years ago Closed 1 years ago

Perma OSX toolchain custom-car tarfile.ReadError: file could not be opened successfully

Categories

(Testing :: Raptor, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=432097053&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZLPEsx2_S5Sa915x2OrCRg/runs/0/artifacts/public/logs/live_backing.log


Hook 'python3 src/build/download_nacl_toolchains.py --mode nacl_core_sdk sync --extract' took 38.24 secs
[task 2023-10-11T10:48:20.244Z] ________ running 'python3 src/build/mac_toolchain.py' in '/private/var/opt/generic-worker/tasks/task_169701582160974/custom_car/chromium'
[task 2023-10-11T10:48:20.244Z] Skipping Mac toolchain installation for mac
[task 2023-10-11T10:48:21.163Z] ________ running 'python3 src/tools/clang/scripts/update.py' in '/private/var/opt/generic-worker/tasks/task_169701582160974/custom_car/chromium'
[task 2023-10-11T10:48:21.163Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-7785-geef35c28-1.tar.xz .......... Done.
[task 2023-10-11T10:48:21.163Z] Traceback (most recent call last):
[task 2023-10-11T10:48:21.163Z]   File "src/tools/clang/scripts/update.py", line 386, in <module>
[task 2023-10-11T10:48:21.163Z]     sys.exit(main())
[task 2023-10-11T10:48:21.163Z]   File "src/tools/clang/scripts/update.py", line 382, in main
[task 2023-10-11T10:48:21.163Z]     return UpdatePackage(args.package, args.host_os, output_dir)
[task 2023-10-11T10:48:21.163Z]   File "src/tools/clang/scripts/update.py", line 289, in UpdatePackage
[task 2023-10-11T10:48:21.163Z]     DownloadAndUnpackPackage(package_file, dir, host_os)
[task 2023-10-11T10:48:21.163Z]   File "src/tools/clang/scripts/update.py", line 199, in DownloadAndUnpackPackage
[task 2023-10-11T10:48:21.163Z]     DownloadAndUnpack(cds_full_url, output_dir)
[task 2023-10-11T10:48:21.163Z]   File "src/tools/clang/scripts/update.py", line 174, in DownloadAndUnpack
[task 2023-10-11T10:48:21.163Z]     t = tarfile.open(mode='r:*', fileobj=f)
[task 2023-10-11T10:48:21.163Z]   File "/private/var/opt/generic-worker/tasks/task_169701582160974/fetches/python/lib/python3.8/tarfile.py", line 1608, in open
[task 2023-10-11T10:48:21.163Z]     raise ReadError("file could not be opened successfully")
[task 2023-10-11T10:48:21.163Z] tarfile.ReadError: file could not be opened successfully
[task 2023-10-11T10:48:21.164Z] Error: Command 'python3 src/tools/clang/scripts/update.py' returned non-zero exit status 1 in /private/var/opt/generic-worker/tasks/task_169701582160974/custom_car/chromium
[task 2023-10-11T10:48:21.164Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-7785-geef35c28-1.tar.xz .......... Done.
[task 2023-10-11T10:48:21.164Z] Traceback (most recent call last):
[task 2023-10-11T10:48:21.164Z]   File "src/tools/clang/scripts/update.py", line 386, in <module>
[task 2023-10-11T10:48:21.164Z]     sys.exit(main())
[task 2023-10-11T10:48:21.164Z]   File "src/tools/clang/scripts/update.py", line 382, in main
[task 2023-10-11T10:48:21.164Z]     return UpdatePackage(args.package, args.host_os, output_dir)
[task 2023-10-11T10:48:21.164Z]   File "src/tools/clang/scripts/update.py", line 289, in UpdatePackage
[task 2023-10-11T10:48:21.164Z]     DownloadAndUnpackPackage(package_file, dir, host_os)
[task 2023-10-11T10:48:21.164Z]   File "src/tools/clang/scripts/update.py", line 199, in DownloadAndUnpackPackage
[task 2023-10-11T10:48:21.164Z]     DownloadAndUnpack(cds_full_url, output_dir)
[task 2023-10-11T10:48:21.164Z]   File "src/tools/clang/scripts/update.py", line 174, in DownloadAndUnpack
[task 2023-10-11T10:48:21.164Z]     t = tarfile.open(mode='r:*', fileobj=f)
[task 2023-10-11T10:48:21.164Z]   File "/private/var/opt/generic-worker/tasks/task_169701582160974/fetches/python/lib/python3.8/tarfile.py", line 1608, in open
[task 2023-10-11T10:48:21.164Z]     raise ReadError("file could not be opened successfully")
[task 2023-10-11T10:48:21.164Z] tarfile.ReadError: file could not be opened successfully
[task 2023-10-11T10:48:21.164Z] 
[taskcluster 2023-10-11T10:48:21.198Z]    Exit Code: 2
[taskcluster 2023-10-11T10:48:21.198Z]    User Time: 8m15.090322s
[taskcluster 2023-10-11T10:48:21.198Z]  Kernel Time: 7m5.696301s
[taskcluster 2023-10-11T10:48:21.198Z]    Wall Time: 9m44.172473s
[taskcluster 2023-10-11T10:48:21.198Z]       Result: FAILED
[taskcluster 2023-10-11T10:48:21.198Z] === Task Finished ===
[taskcluster 2023-10-11T10:48:21.198Z] Task Duration: 9m44.176991s
[taskcluster:error] Uploading error artifact public/build from file public/build with message "Could not read directory '/var/opt/generic-worker/tasks/task_169701582160974/public/build'", reason "file-missing-on-worker" and expiry 2024-01-09T10:34:20.626Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read directory '/var/opt/generic-worker/tasks/task_169701582160974/public/build'
[taskcluster 2023-10-11T10:48:21.607Z] Uploading artifact public/logs/certified.log from file generic-worker/certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2024-01-09T10:34:20.626Z
[taskcluster 2023-10-11T10:48:21.991Z] Uploading artifact public/chain-of-trust.json from file generic-worker/chain-of-trust.json with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2024-01-09T10:34:20.626Z
[taskcluster 2023-10-11T10:48:22.353Z] Uploading artifact public/chain-of-trust.json.sig from file generic-worker/chain-of-trust.json.sig with content encoding "gzip", mime type "application/octet-stream" and expiry 2024-01-09T10:34:20.626Z
[taskcluster 2023-10-11T10:48:22.619Z] [mounts] Preserving cache: Moving "/var/opt/generic-worker/tasks/task_169701582160974/checkouts" to "/var/opt/generic-worker/caches/RZxAo1qyRH2V9PFQQQskvQ"
[taskcluster 2023-10-11T10:48:22.621Z] [mounts] Denying task_169701582160974 access to '/var/opt/generic-worker/caches/RZxAo1qyRH2V9PFQQQskvQ'
[taskcluster 2023-10-11T10:48:22.717Z] Uploading redirect artifact public/logs/live.log to URL https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZLPEsx2_S5Sa915x2OrCRg/runs/0/artifacts/public%2Flogs%2Flive_backing.log with mime type "text/plain; charset=utf-8" and expiry 2024-01-09T10:34:20.626Z
[taskcluster:error] exit status 2
[taskcluster:error] file-missing-on-worker: Could not read directory '/var/opt/generic-worker/tasks/task_169701582160974/public/build'

Flags: needinfo?(kshampur)

Looking into it

Component: Toolchains → Raptor
Flags: needinfo?(kshampur)
Product: Firefox Build System → Testing
Flags: needinfo?(kshampur)

I am suspecting this might be related to the workers not properly being cleaned up.

here is a build in progress ( no failure as above) with the removal of the custom-car dir https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=Xisgf4zvSGS-wvmAr9U9Vg.0&tier=1%2C2%2C3&revision=054834afb599484ce0ce723aaafa745999f9d961

but failure as is https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=b542ed285480d37b1872b2819543f0ad0376f9e3&selectedTaskRun=PhEpYPeKSeKeR1BU5UQdlA.0

but the tar file error doesn't make sense if this were the case. maybe it is just specific workers. trying out a couple more runs

Actually might be machine related? I looked at upstream of depot_tools, nothing stands out in the past 24 hours.

This failure message does not seem to occur on 196 or 138 from what I've seen so far (just noting I intentionally cancelled the latter job here once I saw it made it to the build steps, much past the error bug 1858404,comment 0)

but seems to happen on 200, 199, 139 from this push in m-c and also machine 195

:glandium/:masterwayz might this still be related to Bug 1857748?

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mgoossens)

No, I bet this error is because the toolchain python doesn't support lzma because of bug 1834414.

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mgoossens)
Flags: needinfo?(sguelton)

I'll look how to fix that part.

Flags: needinfo?(sguelton)

:sergesanspaille, not sure if this is adviseable - I tried updating this from python 3.8.10 -> 3.8.18 (and made the appropriate changes here) and it seems to have lzma and seems to help pass the CaR test (I cancelled the one on machine 196 there as soon as I saw it make it to the main build stage, which was previously failing before that on the previous 3.8.10 toolchain e.g.)

(also worth mentioning I tried a 3.9.18 toolchain, but it was not working. Not sure what to make of it, might be related to the mac worker machines not being cleaned.)

anyway, if this 3.8.10->3.8.18 update is fine, I can make a patch?

Flags: needinfo?(sguelton)

So the situation is that some worker have lzma headers and libs installed locally (in /usr/local/...) and some don't, which can lead to situation where either:

  1. python is built on a machine with lzma, and used on a machine with lzma -> everything is fine
  2. python is built on a machine with lzma, and used on a machine without lzma -> error at use time
  3. python is built on a machine without lzma -> warning at build time, error at use time

I don't know if this was the sole reason for that bug, but it's worth fixing, so I reopened https://bugzilla.mozilla.org/show_bug.cgi?id=1834414 and submitted a patch. I triggered a build of the failing job from OP, let see how it goes: https://treeherder.mozilla.org/jobs?repo=try&revision=0853a26f12bffc7e694816cdc2af8bb6db480e0b&selectedTaskRun=LAFFyjQzSgyE2CD5sugxag.0

Flags: needinfo?(sguelton)
Depends on: 1834414
Flags: needinfo?(kshampur)
Status: NEW → RESOLVED
Closed: 1 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.