Closed Bug 1908324 Opened 7 months ago Closed 7 months ago

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: chain-of-trust.json when commit hook tries to update clang-tidy

Categories

(Firefox Build System :: Toolchains, defect)

Tracking

(firefox130 fixed)

RESOLVED FIXED
130 Branch

People

(Reporter: Gijs, Assigned: jcristau)

References

(Blocks 1 open bug)

Attachments

(1 file)

$ hg commit -m "<elided>"
✖ 0 problems (0 errors, 1 warning, 0 fixed)
(pass -W/--warnings to see warnings.)
Error running mach:

    mach --log-no-times artifact toolchain --from-task UqfhacfnS0OYspyDimMOLw:public/build/clang-tidy.tar.zst

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You can invoke ``./mach busted`` to check if this issue is already on file. If it
isn't, please use ``./mach busted file artifact`` to report it. If ``./mach busted`` is
misbehaving, you can also inspect the dependencies of bug 1543241.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UqfhacfnS0OYspyDimMOLw/artifacts/public/chain-of-trust.json

  File "path\to\mozilla-unified\python\mozbuild\mozbuild\artifact_commands.py", line 496, in artifact_toolchain
    record = ArtifactRecord(task_id, name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path\to\mozilla-unified\python\mozbuild\mozbuild\artifact_commands.py", line 339, in __init__
    cot.raise_for_status()
  File "path\to\mozilla-unified\third_party\python\requests\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

Sentry event ID: 31ab7ae302b547ab9d1c00acffdeff45
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-Break to quit
created new head

Same error when trying to run ./mach lint --outgoing.

This is on m-c tip from yesterday morning, so any build tasks should have completed long ago...

Aaaannnd mach bootstrap fails with the same error.

Looks like https://firefox-ci-tc.services.mozilla.com/tasks/UqfhacfnS0OYspyDimMOLw has been created but not run?

I don't know why the mach commands are not filtering this out / using only a task that has run.
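For illustration only, here is a minimal sketch of what "using only a task that has run" could look like on the client side. It assumes the queue's task status endpoint and its `status.state` field; the helper names (`status_url`, `is_completed`, `fetch_status`) are hypothetical and this is not what `mach artifact toolchain` does today.

```python
import json
import urllib.request

# Root URL taken from the error message in comment 0.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"


def status_url(task_id):
    """Build the queue status URL for a task (hypothetical helper)."""
    return f"{ROOT_URL}/api/queue/v1/task/{task_id}/status"


def is_completed(payload):
    """Return True only if the status payload reports a completed task."""
    return payload.get("status", {}).get("state") == "completed"


def fetch_status(task_id):
    """Fetch the status payload over the network (hypothetical helper)."""
    with urllib.request.urlopen(status_url(task_id)) as response:
        return json.load(response)
```

A caller could check `is_completed(fetch_status(task_id))` before requesting chain-of-trust.json, and report a clearer error than a bare 404 when the task is still pending or running.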

(In reply to :Gijs (he/him) from comment #3)

> Looks like https://firefox-ci-tc.services.mozilla.com/tasks/UqfhacfnS0OYspyDimMOLw has been created but not run?

It says it started 1 minute later than its creation time. It also says it comes from autoland. You're asking for trouble building off autoland.

> I don't know why the mach commands are not filtering this out / using only a task that has run.

There would be nothing to fall back to.

(In reply to Mike Hommey [:glandium] from comment #4)

> (In reply to :Gijs (he/him) from comment #3)
>
> > Looks like https://firefox-ci-tc.services.mozilla.com/tasks/UqfhacfnS0OYspyDimMOLw has been created but not run?
>
> It says it started 1 minute later than its creation time. It also says it comes from autoland. You're asking for trouble building off autoland.

I didn't build from autoland, I built from central (and in fact, central from yesterday morning: rev https://hg.mozilla.org/mozilla-central/rev/2ed6b77c66d3). There is more context in the scrollback from #developers on Matrix (conversation with jcristau). Can you check again what the right outcome here is supposed to be, with that context?

> > I don't know why the mach commands are not filtering this out / using only a task that has run.
>
> There would be nothing to fall back to.

I don't really understand this part. Even if I manually use mach artifact toolchain to install an earlier clang-tidy (which succeeded), running ./mach lint afterwards once again tries to install that same "broken" task from comment 0. Why not just use the clang-tidy that's already there? Also, from what jcristau said, the taskcluster index "latest" should only reference completed tasks, so if we were using that we should always get a completed copy of clang-tidy to do linting with on a local build.

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #4)

> (In reply to :Gijs (he/him) from comment #3)
>
> > Looks like https://firefox-ci-tc.services.mozilla.com/tasks/UqfhacfnS0OYspyDimMOLw has been created but not run?
>
> It says it started 1 minute later than its creation time

Also, I don't believe this is correct. The task creation time was:

T09:06:04.373Z

and the task start time was:

T10:09:23.261Z

(1 hour and 3 minutes later)

and the task finish time was another 14 minutes or so after that. Certainly, this morning, I ended up submitting a patch without going through clang-tidy (or clang-format) because I was unable to run it locally, despite spending the best part of half an hour, if not longer, trying. That meant additional churn, as I had to update the patch again once it was up on phab and that ran clang-tidy/clang-format for me.

Oh, this is a case of a change on autoland triggering new toolchain builds without a hash change because all in all, the change didn't affect those toolchains. So during the time they're waiting to be built, you're kind of screwed.

> the taskcluster index "latest" should only reference completed tasks

The taskcluster index "latest" may also randomly reference tasks on release branches. But I'm not sure what makes the "latest" index different in terms of when it's updated, because the route definitions are the same. And presumably they're both in a similar situation: already existing index keys that change where they point to. Julien, can you elaborate?

Flags: needinfo?(mh+mozilla) → needinfo?(jcristau)

Oh, is it the opposite, where the decision task preemptively sets the index keys, except for latest?

Right, so the decision task triggered (among others) https://firefox-ci-tc.services.mozilla.com/tasks/PqDurZEjQ_6SfuX_o5aspg ("eager-index-toolchain-win64-clang-tidy"), whose log says:
Inserting UqfhacfnS0OYspyDimMOLw into index (rank 0) under: gecko.cache.level-3.toolchains.v3.win64-clang-tidy.hash.344612d8cea6f5668d6c34de81487fea6920390c6703e9de7d58befe7f8be6cb
AFAICT the problem here is that https://firefox-ci-tc.services.mozilla.com/tasks/UqfhacfnS0OYspyDimMOLw (and thus presumably also previous executions of "toolchain-win64-clang-tidy") also has rank 0. If it didn't, then the eager-index would have been a no-op, and we would have continued pointing at a successful/completed task all along.

Flags: needinfo?(jcristau)

Tier-2 tasks default to index rank 0, which means their index entry can
get overwritten with an unfinished task when the eager-index task runs,
breaking consumers. Use the build_date instead, same as for tier-1
tasks, so that the eager-index task only comes into play when the cache
hash changes.
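To make the rank interaction concrete, here is a toy in-memory model. It assumes the index replaces an existing entry only when the incoming rank is greater than or equal to the stored one; `ToyIndex`, the path string, the task names and the build_date value are all made up for illustration and are not the real index service or its API.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    task_id: str
    rank: int


class ToyIndex:
    """In-memory stand-in for the index; not the real service."""

    def __init__(self):
        self.entries = {}

    def insert(self, path, task_id, rank):
        """Store the task unless the existing entry has a strictly higher rank."""
        current = self.entries.get(path)
        if current is None or rank >= current.rank:
            self.entries[path] = Entry(task_id, rank)
        return self.entries[path].task_id


PATH = "example.toolchain.hash"  # illustrative index path, not a real one

# Tier-2 behaviour: finished task indexed at rank 0, then eager-index at rank 0.
tier2 = ToyIndex()
tier2.insert(PATH, "finished-task", rank=0)
tier2_result = tier2.insert(PATH, "unfinished-task", rank=0)

# Tier-1 behaviour: finished task indexed with a build_date rank.
tier1 = ToyIndex()
tier1.insert(PATH, "finished-task", rank=1721034364)  # hypothetical build_date
tier1_result = tier1.insert(PATH, "unfinished-task", rank=0)

print(tier2_result)  # unfinished-task: the rank-0 tie lets eager-index win
print(tier1_result)  # finished-task: eager-index is a no-op
```

Under that assumption, the eager-index insert only takes effect for a path with no entry yet (a new cache hash) or one whose existing entry is also rank 0, which matches the behaviour described above.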

Assignee: nobody → jcristau
Status: NEW → ASSIGNED
Attachment #9413445 - Attachment description: Bug 1908324 - use non-zero index rank for win64-clang-tidy toolchain task. r?glandium → Bug 1908324 - make win64-clang-tidy toolchain task tier-1. r?glandium

I went digging a bit more. This by-tier default comes from bug 1274311. At the time we had jobs running in buildbot as tier 1 and in taskcluster as tier 2, with the same index path, and we wanted the index to keep pointing at the tier 1 jobs.

A fix here might be to use rank 1 instead of 0 for non-tier 1 jobs; that way they'd still be lower than tier 1 but would take precedence over eager-index.
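A rough sketch of that alternative, assuming tier-1 tasks are ranked by build_date, the eager-index task inserts at rank 0, and an insert replaces an entry whenever its rank is not lower. The function names and the build_date value are hypothetical, not the taskgraph implementation.

```python
EAGER_INDEX_RANK = 0  # rank the eager-index task used in the log quoted earlier


def rank_for(tier, build_date, non_tier1_rank=0):
    """Pick an index rank: build_date for tier 1, a small constant otherwise."""
    return build_date if tier == 1 else non_tier1_rank


def eager_index_overwrites(existing_rank):
    """Assume an insert replaces the entry when its rank is not lower."""
    return EAGER_INDEX_RANK >= existing_rank


build_date = 1721034364  # hypothetical value

current = rank_for(2, build_date)                     # 0: eager-index can overwrite
proposed = rank_for(2, build_date, non_tier1_rank=1)  # 1: eager-index cannot

print(eager_index_overwrites(current))
print(eager_index_overwrites(proposed))
print(rank_for(1, build_date) > proposed)
```

With rank 1, a non-tier-1 entry still loses to any tier-1 entry, but a rank-0 eager-index insert no longer replaces it.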

Component: General → Toolchains
Pushed by jcristau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd4c60ac71c2 make win64-clang-tidy toolchain task tier-1. r=glandium DONTBUILD
Duplicate of this bug: 1720793
See Also: → 1908626
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch