Open Bug 1528422 Opened 5 years ago Updated 1 year ago

Failure to preserve cache should not be a task failure

Categories

(Firefox Build System :: Task Configuration, task)

Tracking

(Not tracked)

People

(Reporter: glandium, Unassigned)

References

(Depends on 1 open bug)

Details

https://taskcluster-artifacts.net/Q4U57yaWQziPsY4k8u8qXg/0/public/logs/live_backing.log

[taskcluster:error] [mounts] Could not unmount <nil> due to: 'Could not persist cache "level-3-checkouts-sparse" due to mkdir Y:\caches\RCJjhIy6RxWPWOp7McwHhA\src\build\build-clang\build-clang\src\llvm\test\MC\Disassembler\Mips\mips32r6: There is not enough space on the disk.'
[taskcluster 2019-02-16T00:02:26.973Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/Q4U57yaWQziPsY4k8u8qXg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-02-15T23:03:03.157Z
[taskcluster:error] Could not persist cache "level-3-checkouts-sparse" due to mkdir Y:\caches\RCJjhIy6RxWPWOp7McwHhA\src\build\build-clang\build-clang\src\llvm\test\MC\Disassembler\Mips\mips32r6: There is not enough space on the disk.

The task was otherwise successful. The worker's inability to persist the cache is irrelevant to the task's outcome and shouldn't turn the build red, because a red build prevents everything that depends on it from running.

Blocks: 1528155

Pete has good reasons to think otherwise -- I'll let him describe them when he's back.

Flags: needinfo?(pmoore)

The rationale for treating a failure to unmount a cache as a task failure by default, rather than a task exception, is described in bug 1527799 comment 5.

For the issue raised in this bug, however, the inability to persist the cache is indeed the fault of the worker, not the task, and therefore should be a task exception rather than failure. That said, the problem goes away once bug 1526311 is fixed.

Depends on: 1526311
Flags: needinfo?(pmoore)
See Also: → 1527799

> and therefore should be a task exception rather than failure

I don't agree. A task exception may or may not trigger a rerun, depending on the number of retries left, but the fact is that the task was successful and its artifacts have been stored properly. The worker may want to die so that it is not used for subsequent tasks, but the task that just finished should simply be marked successful. A task should not fail (an exception is just another way to fail, one that triggers retries) when the worker is merely unable to do its cleanup.
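
To make the proposal concrete, here is a minimal sketch of the resolution logic being argued for. It is not the worker's actual code: `Resolution`, `resolve_task`, and the `persist_cache` callback are hypothetical names made up for illustration, and the status strings only mirror the three outcomes discussed in this bug (success, failure, exception).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolution:
    """Hypothetical outcome record; not a real worker type."""
    status: str                    # "completed", "failed", or "exception"
    retire_worker: bool            # True if the worker should stop taking tasks
    warning: Optional[str] = None  # cleanup problem to log, if any


def resolve_task(command_succeeded: bool,
                 persist_cache: Callable[[], None]) -> Resolution:
    # The task's status is decided only by the task's own result.
    status = "completed" if command_succeeded else "failed"
    try:
        # Cleanup step: may fail for worker-side reasons such as a full disk.
        persist_cache()
    except OSError as err:
        # Cleanup failed: keep the task's status, log a warning, and let the
        # worker retire instead of turning the task red.
        return Resolution(status, True, f"could not persist cache: {err}")
    return Resolution(status, False)
```

With a `persist_cache` callback that raises `OSError` (for example, because the disk is full), `resolve_task(True, persist_cache)` still reports `"completed"`, while `retire_worker` is set so the worker is not reused for subsequent tasks.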

Severity: normal → S3