Open Bug 1833507 Opened 2 years ago Updated 1 year ago

docker image task expired its docker image before the task expired

Categories

(Firefox Build System :: Task Configuration, defect, P2)

Tracking

(Not tracked)

People

(Reporter: bhearsum, Unassigned)

References

(Depends on 1 open bug)

Details

This caused us to insert the image task into graphs as a cached task, but the tasks that depended on it couldn't pull the image.

Task in question: https://firefox-ci-tc.services.mozilla.com/tasks/J1D9qCm9SHmjbXjs8EH6sA

The task expires on 2024-02-14T05:27:42.688Z, while the docker image artifact expired on 2023-05-15T05:27:42.688Z.

I don't know if this is widespread or not, but it could be. Possibly related to https://bugzilla.mozilla.org/show_bug.cgi?id=1809580 ?

Considering the task you're linking is defined with:

  ...
  "expires": "2024-02-14T05:27:42.688Z",
  ...
  "artifacts": {
    "public/image.tar.zst": {
      "path": "/workspace/image.tar.zst",
      "type": "file",
      "expires": "2023-05-15T05:27:42.688Z"
    }
  },

... pretty sure it's the same underlying issue.

See Also: → 1833488
See Also: → 1809580

A couple ideas we've discussed:

  • adding a verification to ensure that cached tasks' artifacts don't expire until the task itself does
  • having tasks declare the artifacts they need, and make index-search take that into consideration when deciding whether to reuse a task
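A rough sketch of what the first idea could look like, using the expiry values from the task above. This is not the actual taskgraph verification; the function names are made up for illustration, and it assumes the artifacts live under `payload.artifacts` in the task definition (the snippet above elides the surrounding structure).

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Taskcluster timestamps end in "Z"; swap it for an explicit UTC offset
    # so datetime.fromisoformat() accepts it on older Python versions too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def early_expiring_artifacts(task_def: dict) -> list[str]:
    """Return the names of artifacts that expire before the task itself."""
    task_expires = parse_ts(task_def["expires"])
    # Assumption: artifacts are declared under payload.artifacts.
    artifacts = task_def.get("payload", {}).get("artifacts", {})
    return [
        name
        for name, artifact in artifacts.items()
        if "expires" in artifact and parse_ts(artifact["expires"]) < task_expires
    ]


task = {
    "expires": "2024-02-14T05:27:42.688Z",
    "payload": {
        "artifacts": {
            "public/image.tar.zst": {
                "path": "/workspace/image.tar.zst",
                "type": "file",
                "expires": "2023-05-15T05:27:42.688Z",
            }
        }
    },
}
print(early_expiring_artifacts(task))
```

A verification along these lines would only need to run against tasks that are marked as cached, since those are the ones index-search can substitute into later graphs.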

> having tasks declare the artifacts they need, and make index-search take that into consideration when deciding whether to reuse a task

For tasks that use fetch-content (which should be most of them except tests), that's in the MOZ_FETCHES environment variable in the task definition.
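As a sketch of how that could feed into the reuse decision: the code below assumes MOZ_FETCHES is a JSON-encoded list of objects with `task` and `artifact` keys, found under `payload.env`, and that artifact expiry times for the candidate task are available as a plain mapping (for example, gathered from the queue beforehand). `needed_artifacts` and `can_reuse` are hypothetical helpers, not existing index-search code.

```python
import json
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Accept the trailing "Z" used in Taskcluster timestamps.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def needed_artifacts(task_def: dict, dep_task_id: str) -> set[str]:
    """Artifact names that task_def fetches from dep_task_id via MOZ_FETCHES."""
    # Assumption: MOZ_FETCHES is a JSON list of {"task": ..., "artifact": ...}.
    raw = task_def.get("payload", {}).get("env", {}).get("MOZ_FETCHES", "[]")
    return {f["artifact"] for f in json.loads(raw) if f.get("task") == dep_task_id}


def can_reuse(needed: set[str], artifact_expiry: dict[str, str], now: datetime) -> bool:
    """True only if every needed artifact exists and has not expired yet."""
    return all(
        name in artifact_expiry and parse_ts(artifact_expiry[name]) > now
        for name in needed
    )


dependent = {
    "payload": {
        "env": {
            "MOZ_FETCHES": json.dumps(
                [{"task": "cached-task-id", "artifact": "public/build/tool.tar.zst"}]
            )
        }
    }
}
needed = needed_artifacts(dependent, "cached-task-id")
expiry = {"public/build/tool.tar.zst": "2023-05-15T05:27:42.688Z"}
now = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(can_reuse(needed, expiry, now))
```

Tasks that don't use fetch-content (such as the docker image dependency in this bug) would need some other way to declare what they consume.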

Severity: -- → S3
Priority: -- → P2

3 months later, this happened again (because that's the expiry time for the docker images).

... which suggests bug 1809580 was not enough... (looks around) So, in fact, bug 1809580 seems to have been enough, but when it landed, some images (or all of them?) weren't retriggered, so we still had the soon-to-expire docker images from before bug 1809580. Heitor triggered the Rebuild Cached Tasks action, which created new images that expire in a year.

See Also: → 1855814
Depends on: 1860081

I wrote up a verification for this in bug 1860081, but it didn't actually find any problematic tasks. To test it, I modified some cached tasks to expire their artifacts after a day, and the verification detected those without any problem, so I think it's working properly.

So I guess someone must have fixed cached task expiry at some point, and the errors we're seeing now are just from tasks that were created before the fix? If that's the case, we could rebuild cached tasks if we wanted to avoid seeing this again.

See Also: → 1891183