sccache write errors on at least some Windows builds on try
Categories
(Cloud Services :: Operations: Taskcluster, defect)
Tracking
(Not tracked)
People
(Reporter: glandium, Assigned: edunham)
Details
I just noticed by chance that sccache write errors are occurring on some, but not all, Windows builds on try. It's plausible that the bucket permissions are not set up properly, or something along those lines.
The sccache logs say:
Cache write error: Error(Msg("failed to put cache entry in s3"), State { next_error: Some(Error(BadHTTPStatus(400), State { next_error: None, backtrace: InternalBacktrace })), backtrace: InternalBacktrace })
This doesn't happen on autoland.
The few logs I've looked at randomly were all on eu-central-1b or eu-central-1c, using the taskcluster-level-1-sccache-eu-central-1 bucket.
Updated•5 years ago
A 400 error strongly suggests that the problem is client-side; I'd expect to see 5xx from a bucket permissions problem. Is it possible, using your tooling, to inspect an individual cache write request that got an error and compare it to one that didn't?
The sccache buckets are all in the mozilla-taskcluster AWS account, which I think I accidentally broke my access to when I switched my 2-factor auth to a new phone, so I'll get that fixed tomorrow in order to take a closer look at how taskcluster-level-1-sccache-eu-central-1 might differ from the others.
Comment 2•5 years ago
Is this still a cause for concern? I see the treeherder graph shows errors much less frequently now.
I think edunham is slightly wrong, in that we'd expect to see 403 for a permissions problem. However, this error is 400 not 403, which does suggest an issue with the request the client is making.
Is it possible to get the sccache code to report the actual error code returned in the response from S3? Those are listed at https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList , and you can see they let you distinguish between the many possible causes of a 400 response.
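For illustration, here is a minimal sketch of what extracting that code could look like. It is not sccache's actual code: `S3ErrorInfo`, `element_text`, `parse_s3_error` and `describe_failure` are hypothetical names, and it assumes the client has the response body available as a string. S3 error bodies are XML with `<Code>` and `<Message>` elements inside `<Error>`; a real implementation would probably use a proper XML parser rather than the substring search used here.

```rust
/// The `Code` and `Message` fields of an S3 error response body.
/// (Hypothetical type for this sketch; not part of sccache.)
#[derive(Debug, PartialEq)]
pub struct S3ErrorInfo {
    pub code: String,
    pub message: Option<String>,
}

/// Return the trimmed text between the first `<tag>` and the following
/// `</tag>`, or `None` if either is missing. This is a plain substring
/// search, not a real XML parser.
fn element_text<'a>(body: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let start = body.find(open.as_str())? + open.len();
    let end = body[start..].find(close.as_str())? + start;
    Some(body[start..end].trim())
}

/// Extract the S3 error code (and message, if present) from a response body.
pub fn parse_s3_error(body: &str) -> Option<S3ErrorInfo> {
    let code = element_text(body, "Code")?;
    if code.is_empty() {
        return None;
    }
    Some(S3ErrorInfo {
        code: code.to_string(),
        message: element_text(body, "Message").map(str::to_string),
    })
}

/// Build a log line that includes the S3 error code when one is available,
/// falling back to the bare HTTP status otherwise.
pub fn describe_failure(status: u16, body: &str) -> String {
    match parse_s3_error(body) {
        Some(info) => format!(
            "HTTP {}: {} ({})",
            status,
            info.code,
            info.message.as_deref().unwrap_or("no message")
        ),
        None => format!("HTTP {} (no S3 error code in response body)", status),
    }
}
```

With something like this, the log would carry the S3 error code next to the bare `BadHTTPStatus(400)`, which should be enough to tell the possible causes of a 400 apart.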
Comment 3•5 years ago
(In reply to Brian Pitts from comment #2)
Is it possible to get the sccache code to report the actual error code returned in the response from S3? Those are listed at https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList , and you can see they let you distinguish between the many possible causes of a 400 response.
I think this is possible. It would involve either updating sccache's S3 code:
https://github.com/mozilla/sccache/blob/master/src/simples3/s3.rs
or updating the Rusoto patch (assuming Rusoto has better/richer S3 HTTP error handling than the current code):
Comment 4•5 years ago
I'm going to close this since I don't think there's anything we can do to help on the Taskcluster operations side. Feel free to reopen if that changes.