Intermittent exception [d2g] could not load docker image: exit status 1
Categories
(Infrastructure & Operations :: RelOps: Posix OS, defect, P3)
Tracking
(Not tracked)
People
(Reporter: chorotan, Assigned: jcristau)
References
Details
Attachments
(2 files)
Failure log: https://firefoxci.taskcluster-artifacts.net/bJtKR2NMTU2ILZ5uK2_cKA/0/public/logs/live_backing.log
[taskcluster 2025-09-23T18:07:07.870Z] Task ID: bJtKR2NMTU2ILZ5uK2_cKA
[taskcluster 2025-09-23T18:07:07.870Z] === Task Starting ===
[taskcluster 2025-09-23T18:07:08.117Z] Uploading redirect artifact public/logs/live.log to URL https://firefoxci-websocktunnel.services.mozilla.com/us-central1-a.5637449981301321985.60099/log/HkH23Yb0QAaYLl4sz-STXA with mime type "text/plain; charset=utf-8" and expiry 2025-09-23T19:22:08.063Z
[taskcluster 2025-09-23T18:07:08.299Z] [mounts] Moving existing writable directory cache gecko-level-3-checkouts-hg58-v3-38439bed431008791a5f from /home/generic-worker/caches/V1jpBh5tRKu5vWIxYbE4vA to /home/task_175865082763016/cache0
[taskcluster 2025-09-23T18:07:08.299Z] [mounts] Creating directory /home/task_175865082763016
[taskcluster 2025-09-23T18:07:08.307Z] [mounts] Updating ownership of files inside directory '/home/task_175865082763016/cache0' from task_175864999138420 to task_175865082763016
[taskcluster 2025-09-23T18:07:12.366Z] [mounts] Successfully mounted writable directory cache '/home/task_175865082763016/cache0'
[taskcluster 2025-09-23T18:07:12.366Z] [mounts] Downloading task e2R5awZzQDqHxQcZOa1-4w artifact public/image.tar.zst to /home/generic-worker/downloads/fjTHG07kTy2YJXoTWca8Tw
[taskcluster 2025-09-23T18:07:20.361Z] [mounts] Downloaded 1725268915 bytes with SHA256 99e742e057e327ea4f8f15a1cf58cf647ef9f11d76bb8ded3bf7fd73f42f448f from task e2R5awZzQDqHxQcZOa1-4w artifact public/image.tar.zst to /home/generic-worker/downloads/fjTHG07kTy2YJXoTWca8Tw
[taskcluster:warn 2025-09-23T18:07:20.361Z] [mounts] Download /home/generic-worker/downloads/fjTHG07kTy2YJXoTWca8Tw of task e2R5awZzQDqHxQcZOa1-4w artifact public/image.tar.zst has SHA256 99e742e057e327ea4f8f15a1cf58cf647ef9f11d76bb8ded3bf7fd73f42f448f but task payload does not declare a required value, so content authenticity cannot be verified
[taskcluster 2025-09-23T18:07:20.361Z] [mounts] Creating directory /home/task_175865082763016
[taskcluster 2025-09-23T18:07:20.377Z] [mounts] Copying /home/generic-worker/downloads/fjTHG07kTy2YJXoTWca8Tw to /home/task_175865082763016/dockerimage
[taskcluster 2025-09-23T18:07:33.389Z] [mounts] Granting task_175865082763016 full control of file '/home/task_175865082763016/dockerimage'
[taskcluster 2025-09-23T18:07:34.604Z] [d2g] Loading docker image
[taskcluster:warn 2025-09-23T18:08:41.033Z] [d2g] Artifact "public/remote/pup_errorsummary.json" not found at "/builds/worker/pup_errorsummary.json": exit status 1
[taskcluster:warn 2025-09-23T18:08:41.033Z] Error response from daemon: No such container: taskcontainer_NG3pfVJyT4mGNeLt8xs6Dw
[taskcluster:warn 2025-09-23T18:08:41.033Z]
[taskcluster 2025-09-23T18:08:41.105Z] Uploading error artifact public/remote/pup_errorsummary.json from file artifact0.json with message "Could not read file '/home/task_175865082763016/artifact0.json'", reason "file-missing-on-worker" and expiry 2026-09-23T16:52:37.779Z
[taskcluster 2025-09-23T18:08:41.105Z] [mounts] Preserving cache: Moving "/home/task_175865082763016/cache0" to "/home/generic-worker/caches/V1jpBh5tRKu5vWIxYbE4vA"
[taskcluster 2025-09-23T18:08:41.198Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2026-09-23T16:52:37.779Z
[taskcluster:error] [d2g] could not load docker image: exit status 1
[taskcluster:error]
| Assignee | ||
Comment 1•5 months ago
|
||
From the worker logs:
Sep 23 18:07:33Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw start-worker 2025/09/23 18:07:33 Starting task feature D2G...
Sep 23 18:07:33Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw start-worker 2025/09/23 18:07:33 Loaded file d2g-image-cache.json
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.201958553Z" level=info msg="Layer sha256:53e4dfc804e0c6f716f1f6abab7e52946476239fe59ac49fbf7e111d240e6bfd cleaned up"
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.231717132Z" level=info msg="Layer sha256:337c58fbf55ec1878b83b9a421aaa4196f71af5b82cc927f98b5b26e2f238b76 cleaned up"
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.320021380Z" level=info msg="Layer sha256:107cbdaeec042e6154640c94972c638f4e2fee795902b149e8ce9acbd03d59d7 cleaned up"
Sep 23 18:08:41Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw start-worker 2025/09/23 18:08:41 Stopping task feature D2G...
Matt, any idea why the load might have failed here?
Comment 2•5 months ago
|
||
These worker logs look suspicious
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.201958553Z" level=info msg="Layer sha256:53e4dfc804e0c6f716f1f6abab7e52946476239fe59ac49fbf7e111d240e6bfd cleaned up"
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.231717132Z" level=info msg="Layer sha256:337c58fbf55ec1878b83b9a421aaa4196f71af5b82cc927f98b5b26e2f238b76 cleaned up"
Sep 23 18:08:40Z gecko-t-t-linux-docker-amd-bfjnvrqms8ctz-lwy7rnyw dockerd time="2025-09-23T18:08:40.320021380Z" level=info msg="Layer sha256:107cbdaeec042e6154640c94972c638f4e2fee795902b149e8ce9acbd03d59d7 cleaned up"
But it's hard to say exactly what happened here without being able to see the stderr output. I could patch d2g so we could get that output to help in future issues. Let's see if this issue is prevalent or not.
Comment 3•5 months ago
|
||
I found a couple other tasks that ran on that worker that also depended on the same docker image that ran into the same issue.
https://firefox-ci-tc.services.mozilla.com/tasks/M2l--90iT4ub-nN8MKnkgQ/runs/0 and https://firefox-ci-tc.services.mozilla.com/tasks/Vb58OTZtSsql9MqK7E7jzg/runs/0
https://github.com/taskcluster/taskcluster/pull/7973 landed now and we'll have better logging for future failures once a new version is released/published to FxCI workers.
Also, looking at the dependents of https://firefox-ci-tc.services.mozilla.com/tasks/e2R5awZzQDqHxQcZOa1-4w I'm seeing multiple tasks that had the worker successfully load in this docker image, which leads me to believe there's some strange issue with that individual worker.
| Comment hidden (Intermittent Failures Robot) |
Comment 5•5 months ago
|
||
The latest version has been rolled out to FxCI workers today. Let's look out for similar failures and this time we should have more info on the specific error.
Comment 6•5 months ago
|
||
The severity field is not set for this bug.
:bhearsum, could you have a look please?
For more information, please visit BugBot documentation.
Updated•5 months ago
|
Updated•5 months ago
|
Comment 7•5 months ago
|
||
I discovered a handful more of these issues on that same pool https://firefox-ci-tc.services.mozilla.com/worker-manager/gecko-t%2Ft-linux-docker-amd
Now that they're running an updated version of Generic Worker that includes more d2g logging, I can see a root cause:
Oct 08 03:52:10 PM gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w start-worker 2025/10/08 19:52:10 ERROR(s) encountered: [d2g] could not load docker image: exit status 1
Oct 08 03:52:10 PM gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w start-worker stdout:
Oct 08 03:52:10 PM gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w start-worker stderr: write /usr/lib/x86_64-linux-gnu/libx265.so.199: no space left on device
:hneiva or :jcristau, do you think we should bump the disk size? Running into any other disk space issues similar to this on that pool?
Comment 8•5 months ago
|
||
Comment 9•5 months ago
|
||
I haven't noticed any errors like this, but then again I wasn't specifically looking for these errors.
I've added a PR to bump the disk size for the gecko pool to 120gb (was 75gb).
| Assignee | ||
Comment 10•5 months ago
|
||
Somehow we're not using the local ssd on that worker:
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks /mnt is not using an nvme0n* device. Setting it up...
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Detecting available nvme0n* devices...
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Found 1 nvme0n* devices: /dev/nvme0n1
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Cannot use /dev/nvme0n1: device is partitioned
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Command requires all devices to be found.
| Assignee | ||
Comment 11•5 months ago
•
|
||
Ah. Because the boot disk is on nvme0n1:
Oct 08 17:27:11 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w multipath nvme0n1: uid = nvme.1ae0-6e766d655f636172642d7064-6e766d655f636172642d7064-00000001 (sysfs)
Oct 08 17:27:11 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w systemd-fsck /dev/nvme0n1p15: 12 files, 12499/213663 clusters
Oct 08 17:27:11 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel nvme0n1: p1 p14 p15 p16
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p1): Supports (experimental) DIO atomic writes awu_min: 4096, awu_max: 4096
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p1): mounted filesystem f8b1fd37-145c-44d7-bd5d-420a37a1468c ro with ordered data mode. Quota mode: none.
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p1): re-mounted f8b1fd37-145c-44d7-bd5d-420a37a1468c r/w.
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p16): Supports (experimental) DIO atomic writes awu_min: 4096, awu_max: 4096
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p16): mounted filesystem 44bd5d32-077f-4b29-871d-cfeb7677f0f6 r/w with ordered data mode. Quota mode: none.
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p1): resizing filesystem from 15466235 to 19398395 blocks
Oct 08 17:27:15 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w kernel EXT4-fs (nvme0n1p1): resized filesystem to 19398395
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Found 1 nvme0n* devices: /dev/nvme0n1
Oct 08 17:27:18 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w configure-generic-worker-disks Cannot use /dev/nvme0n1: device is partitioned
Oct 08 18:58:39 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w fstrim /boot/efi: 98.2 MiB (102995968 bytes) trimmed on /dev/nvme0n1p15
Oct 08 18:58:39 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w fstrim /boot: 795 MiB (833662976 bytes) trimmed on /dev/nvme0n1p16
Oct 08 18:58:39 gecko-t-t-linux-docker-amd-jhp5qmujtf2j6ynvs3av2w fstrim /: 16.2 GiB (17419616256 bytes) trimmed on /dev/nvme0n1p1
So where's the local ssd?
| Assignee | ||
Comment 12•5 months ago
|
||
So where's the local ssd?
It's at /dev/nvme1n1, which the configure-generic-worker-disks script ignores.
| Assignee | ||
Comment 13•5 months ago
|
||
| Assignee | ||
Updated•5 months ago
|
| Assignee | ||
Comment 14•5 months ago
|
||
OK this looks better:
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks /mnt is not using a local SSD. Setting it up...
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Detecting available local SSDs...
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Found 1 local SSD devices: /dev/disk/by-id/google-local-nvme-ssd-0
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Physical volume "/dev/disk/by-id/google-local-nvme-ssd-0" successfully created.
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Volume group "instance_storage" successfully created
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Logical volume "lv_instance_storage" created.
Oct 09 16:36:05 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks mke2fs 1.47.0 (5-Feb-2023)
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Discarding device blocks: 0/98302976#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#01027262976/98302976#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#01063438848/98302976#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010done
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Creating filesystem with 98302976 4k blocks and 24576000 inodes
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Filesystem UUID: 474bab00-b63f-4ac0-a4ca-49fca3977c8f
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Superblock backups stored on blocks:
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks #01132768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks #0114096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Allocating group tables: 0/3000#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Writing inode tables: 0/3000#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Creating journal (262144 blocks): done
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Writing superblocks and filesystem accounting information: 0/3000#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks umount: /home: not mounted.
Oct 09 16:36:08 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks umount: /mnt: not mounted.
Oct 09 16:36:12 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks /mnt is now using local SSDs.
Oct 09 16:36:12 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Creating directories for generic-worker
Oct 09 16:36:12 gecko-t-t-linux-docker-amd-alpha-cqv594rxql-7qv-gnwtziq configure-generic-worker-disks Creating docker specific directories
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 16•5 months ago
|
||
Description
•