Closed
Bug 1314284
Opened 9 years ago
Closed 9 years ago
cloud-mirror redirecting to S3 URLs that 404
Categories
(Taskcluster :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure)
Filed by: philringnalda [at] gmail.com
https://treeherder.mozilla.org/logviewer.html#?job_id=4038477&repo=mozilla-aurora
https://queue.taskcluster.net/v1/task/b5mkEdlKSTez2kb6yKaQEQ/runs/0/artifacts/public%2Flogs%2Flive_backing.log
"<dustin> looks like the failures are all in us-east-1"
Comment 1 • 9 years ago
We have been in a call investigating this for 3 hours now :)
The error has not recurred beyond this instance: the object eventually stopped returning 404 and started returning 200. We were able to find another object illustrating this issue that was not causing failures but whose behavior we could reproduce with a few simple `curl` requests. We purged this object, and it is now stuck in the copying state.
We have not yet found a root cause, although we have a few possibilities:
* eventual-consistency issue with S3
* bug in cloud-mirror flagging a failed copy as successful
We have also made a change that may help:
* Disabled backfilling (backfilling issues a HEAD request for each S3 object before copying it, and a HEAD before the PUT defeats S3's read-after-write consistency for that key)
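The second hypothesis (a failed copy flagged as successful) can be illustrated with a small in-memory model. This is a hypothetical sketch, not cloud-mirror's actual code: `MirrorModel`, its `cache` dict (standing in for redis) and its `bucket` set (standing in for the mirror's S3 bucket) are all assumptions. It shows how recording "present" without confirming the upload leads to a redirect to a URL that 404s, and how confirming first avoids that.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorModel:
    """Hypothetical stand-in for a caching mirror (not cloud-mirror's real code).

    `cache` plays the role of redis (key -> status); `bucket` plays the role
    of the mirror's S3 bucket (the set of keys that were actually stored).
    """

    verify_upload: bool
    cache: dict = field(default_factory=dict)
    bucket: set = field(default_factory=set)

    def copy(self, key: str, upload_succeeds: bool) -> None:
        if upload_succeeds:
            self.bucket.add(key)
        # Buggy variant (verify_upload=False): mark "present" whatever happened.
        # Defensive variant: mark "present" only once the object is confirmed.
        if not self.verify_upload or key in self.bucket:
            self.cache[key] = "present"
        else:
            # Leave the key uncached so that a later request retries the copy.
            self.cache.pop(key, None)

    def handle_request(self, key: str) -> str:
        """Return what a client would experience for this key."""
        if self.cache.get(key) == "present":
            # The client is redirected to the mirrored object; whether that
            # URL works depends on whether the object really is in the bucket.
            return "redirect-ok" if key in self.bucket else "redirect-404"
        return "start-copy"


buggy = MirrorModel(verify_upload=False)
buggy.copy("image.tar", upload_succeeds=False)
print(buggy.handle_request("image.tar"))  # redirect-404
```

In the buggy variant the bad "present" entry persists until something removes it from the cache, which matches the symptom of a redirect that keeps 404ing.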
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 2 • 9 years ago
Affected objects:
https://cloud-mirror-production-us-east-1.s3.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FZsXD2f8BTq61uQ-T6mkyMw%2F0%2Fpublic%2Fimage.tar
(comment 0 - this one reappeared somehow)
https://cloud-mirror-production-us-east-1.s3.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FD1kBHRulQpqMw5hxVup2wg%2F0%2Fpublic%2Fimage.tar
(comment 1 - reproduced via curl, then purged (deleted from redis and from S3), and is now copying)
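Both affected URLs appear to follow one pattern: the bucket is `cloud-mirror-production-<region>`, and the object key is the entire source URL, percent-encoded. The sketch below reproduces that pattern with Python's `urllib.parse.quote`; the pattern is inferred only from the two URLs listed here, and `mirror_url` is a hypothetical helper, not part of cloud-mirror.

```python
from urllib.parse import quote


def mirror_url(region: str, source_url: str) -> str:
    """Hypothetical helper: rebuild a mirrored-object URL from a source URL.

    Assumes (from the URLs in this bug) that the bucket is named
    cloud-mirror-production-<region> and that the object key is the source
    URL with every reserved character percent-encoded (safe="").
    """
    key = quote(source_url, safe="")
    return f"https://cloud-mirror-production-{region}.s3.amazonaws.com/{key}"


source = ("https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/"
          "ZsXD2f8BTq61uQ-T6mkyMw/0/public/image.tar")
print(mirror_url("us-east-1", source))
```

Encoding `/` as `%2F` presumably keeps each source URL a single flat S3 key rather than a nested path.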
Comment hidden (Intermittent Failures Robot)
Comment 4 • 9 years ago
This is recurring frequently, according to several Papertrail alert emails and a current tree closure.
Blocks: 1302596
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Intermittent [taskcluster:error] Error: Error loading docker image. Could not download artifact "public/image.tar from task "ZsXD2f8BTq61uQ-T6mkyMw" after 1 attempt(s). Error: Not Found → cloud-mirror redirecting to S3 URLs that 404
Comment 5 • 9 years ago
We stopped doing HEAD requests (by removing the backfilling support) on Nov 1.
https://github.com/taskcluster/cloud-mirror/pull/24
I believe Jonas and John landed some error-handling fixes yesterday (they were merged, but I don't know whether they have been deployed):
https://github.com/taskcluster/cloud-mirror/pull/27
https://github.com/taskcluster/cloud-mirror/pull/28
https://github.com/taskcluster/cloud-mirror/pull/29
https://github.com/taskcluster/cloud-mirror/pull/30
Comment 6 • 9 years ago
Regarding S3 consistency:
- we have determined that we are using the read-after-write consistency endpoints everywhere
- we have seen this with mirrored artifacts in us-west-1 as well
- while read-after-write consistency is lost if there is a HEAD request just before the PUT, that effect is short-lived, not the hours-long failures we're seeing
- S3 is only eventually consistent for "existing" objects. Discussion is ongoing as to what "existing" means here.
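The consistency rules under discussion can be summarized as a small decision function. This is a sketch of S3's documented model at the time: a PUT of a new key was read-after-write consistent unless the key had been probed with HEAD or GET beforehand, while overwrite PUTs were only eventually consistent. S3 has since (December 2020) moved to strong read-after-write consistency for all requests, so this reflects the 2016 behavior only. The enum and function names are illustrative, and the sketch does not settle the open question of what counts as "existing".

```python
from enum import Enum


class Consistency(Enum):
    READ_AFTER_WRITE = "read-after-write"
    EVENTUAL = "eventual"


def read_after_put(key_existed: bool, probed_before_put: bool) -> Consistency:
    """Guarantee for a GET issued right after a PUT, under S3's pre-2020 model.

    key_existed: the PUT overwrote an object that already existed.
    probed_before_put: a HEAD or GET for the key was made before the PUT
    (for example, an existence check before copying).
    """
    if key_existed:
        # Overwrite PUTs were only eventually consistent.
        return Consistency.EVENTUAL
    if probed_before_put:
        # A HEAD/GET before creation downgraded a new-object PUT to eventual.
        return Consistency.EVENTUAL
    # New key that was never probed: read-after-write consistent.
    return Consistency.READ_AFTER_WRITE


print(read_after_put(key_existed=False, probed_before_put=True).value)  # eventual
```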
Comment 7 • 9 years ago
The failure today is for ZsXD2f8BTq61uQ-T6mkyMw/public/image.tar
Comment 8 • 9 years ago
I purged that object from us-west-1 and us-east-1:
curl -v -X DELETE https://cloud-mirror.taskcluster.net/v1/purge/s3/us-west-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FZsXD2f8BTq61uQ-T6mkyMw%2F0%2Fpublic%2Fimage.tar
< Date: Fri, 04 Nov 2016 13:18:06 GMT
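The purge request above can be generated for each region rather than typed by hand. This is a minimal sketch assuming the endpoint shape seen in the `curl` command (`/v1/purge/s3/<region>/<percent-encoded source URL>`); `purge_url` and `purge_request` are hypothetical helpers, and the requests are only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Endpoint prefix copied from the curl command in this comment.
PURGE_BASE = "https://cloud-mirror.taskcluster.net/v1/purge/s3"


def purge_url(region: str, source_url: str) -> str:
    """Build the purge URL for one region (shape taken from the curl command)."""
    return f"{PURGE_BASE}/{region}/{quote(source_url, safe='')}"


def purge_request(region: str, source_url: str) -> Request:
    """Prepare (but do not send) a DELETE request; urlopen() would send it."""
    return Request(purge_url(region, source_url), method="DELETE")


source = ("https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/"
          "ZsXD2f8BTq61uQ-T6mkyMw/0/public/image.tar")
for region in ("us-west-1", "us-east-1"):
    req = purge_request(region, source)
    print(req.get_method(), req.full_url)
```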
Comment 9 • 9 years ago
Fixes for cloud-mirror that we hope will correct this issue were deployed at around 09:15 CDT. We'll be monitoring for further failures.
Comment 10 • 9 years ago
I just flushed the redis cache to force removal of any invalid "present" values:
> flushdb
OK
(0.89s)
Comment hidden (Intermittent Failures Robot)
Comment 12 • 9 years ago
That 7-day window began before the fix landed 4 days ago, so *done*.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED