Closed Bug 1364463 Opened 8 years ago Closed 8 years ago

Bad objects being cached in cloud mirror

Categories

(Taskcluster :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: jhford)

References

Details

In https://treeherder.mozilla.org/logviewer.html#?job_id=98688466&repo=mozilla-central&lineNumber=145, we failed to download a partial update from TC. The relevant lines are:

2017-05-12 12:00:09,824 - INFO - Downloading https://queue.taskcluster.net/v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar to /tmp/tmptpwz2Z...
2017-05-12 12:00:09,824 - DEBUG - attempt 1/5
2017-05-12 12:00:09,824 - DEBUG - retry: Calling <function download at 0x7fc3bcaf78c0> with args: ('https://queue.taskcluster.net/v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar', '/tmp/tmptpwz2Z'), kwargs: {}, attempt #1
2017-05-12 12:00:09,825 - DEBUG - Downloading https://queue.taskcluster.net/v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar to /tmp/tmptpwz2Z
2017-05-12 12:00:09,828 - INFO - Starting new HTTPS connection (1): queue.taskcluster.net
2017-05-12 12:00:10,079 - DEBUG - "GET /v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar HTTP/1.1" 303 29
2017-05-12 12:00:10,081 - INFO - Starting new HTTPS connection (1): cloud-mirror.taskcluster.net
2017-05-12 12:00:16,459 - DEBUG - "GET /v1/redirect/s3/us-east-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FOuO2Yft0Qpu5_4yEXUFZjQ%2F0%2Fpublic%2Fenv%2FFirefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar HTTP/1.1" 302 301
2017-05-12 12:00:16,460 - INFO - Starting new HTTPS connection (1): cloud-mirror-production-us-east-1.s3.amazonaws.com
2017-05-12 12:00:16,504 - DEBUG - "GET /https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FOuO2Yft0Qpu5_4yEXUFZjQ%2F0%2Fpublic%2Fenv%2FFirefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar HTTP/1.1" 200 282
2017-05-12 12:00:16,505 - DEBUG - Downloaded 282 bytes
2017-05-12 12:00:16,505 - DEBUG - Content-Length: 282 bytes

The original artifact as fetched from queue.tc.net looks ok:

curl -iL https://queue.taskcluster.net/v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar

HTTP/1.1 303 See Other
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Strict-Transport-Security: max-age=7776000
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
Access-Control-Request-Method: *
Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
Location: https://public-artifacts.taskcluster.net/OuO2Yft0Qpu5_4yEXUFZjQ/0/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar
Vary: Accept
Content-Type: text/plain; charset=utf-8
Content-Length: 29
Date: Fri, 12 May 2017 15:15:25 GMT
Via: 1.1 vegur

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 16954022
Connection: keep-alive
Date: Fri, 12 May 2017 15:15:28 GMT
Last-Modified: Fri, 12 May 2017 11:53:32 GMT
ETag: "13c2fca56a66fc69e9e5c6842102d1a7"
x-amz-version-id: qxFDZLatR5t5JPcmEOuCBp39xce0Bvj.
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Miss from cloudfront
Via: 1.1 392869124c677c4f82415d8ce2dcdd73.cloudfront.net (CloudFront)
X-Amz-Cf-Id: IDTZ0dYFwS2H4bVdte-l3ksMdtox1otfUFDiK0hH5RAKe5b8_8ZsNg==

But fetching the cached version from cloud mirror fails:

curl -iL https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-east-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FOuO2Yft0Qpu5_4yEXUFZjQ%2F0%2Fpublic%2Fenv%2FFirefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar

HTTP/1.1 302 Found
Server: Cowboy
Connection: keep-alive
X-Powered-By: Express
Strict-Transport-Security: max-age=7776000
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
Access-Control-Request-Method: *
Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
Location: https://cloud-mirror-production-us-east-1.s3.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FOuO2Yft0Qpu5_4yEXUFZjQ%2F0%2Fpublic%2Fenv%2FFirefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar
Content-Type: application/json; charset=utf-8
Content-Length: 301
Etag: W/"12d-ea1202a0"
Date: Fri, 12 May 2017 15:16:13 GMT
Via: 1.1 vegur

HTTP/1.1 200 OK
x-amz-id-2: rKtg9RE+ueB3airfWkBLARA4VpGofIoSpYTYjab2KjCJqQ4dsIoof9M+FZsYOX79IP5bwLH024g=
x-amz-request-id: EC1BF649940A6CF3
Date: Fri, 12 May 2017 15:16:14 GMT
Last-Modified: Fri, 12 May 2017 12:00:17 GMT
x-amz-expiration: expiry-date="Sun, 14 May 2017 00:00:00 GMT", rule-id="us-east-1-1-day"
ETag: "7b65e6af6641ddd18bbff71e39eba530"
x-amz-meta-cloud-mirror-upstream-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/OuO2Yft0Qpu5_4yEXUFZjQ/0/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar
x-amz-meta-cloud-mirror-upstream-content-length: <unknown>
x-amz-meta-cloud-mirror-stored: 2017-05-12T12:00:15.839Z
x-amz-meta-cloud-mirror-upstream-etag: <unknown>
x-amz-meta-cloud-mirror-addresses: [{"c":200,"u":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/OuO2Yft0Qpu5_4yEXUFZjQ/0/public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-20170510183715-20170512100218.partial.mar","t":"2017-05-12T12:00:10.717Z"}]
Accept-Ranges: bytes
Content-Type: application/xml
Content-Length: 282
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>F177343936D4C750</RequestId><HostId>qsJdhOO7uH65Ug9v+nMMSBGlkJslzGJcvjMiZH6HWyMGTo44U4/RjMtV0bkPgGs3HE1dHO4mNFY=</HostId></Error>
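For reference, a minimal sketch (not part of the original report) of the check the two curl transcripts above do by hand: follow the redirect chain for the artifact URL and look at what the final hop actually serves. It assumes the Python requests library; the artifact URL is the one from the log, and the 16954022-byte / 282-byte figures come from the headers above.

import requests

ARTIFACT = (
    "https://queue.taskcluster.net/v1/task/OuO2Yft0Qpu5_4yEXUFZjQ/artifacts/"
    "public/env/Firefox-mozilla-central-55.0a1-linux64-es-MX-"
    "20170510183715-20170512100218.partial.mar"
)

def describe(url):
    # Follow the redirects (queue -> cloud-mirror -> S3) without reading the
    # body, and report what the final hop returned.
    resp = requests.get(url, allow_redirects=True, stream=True)
    return {
        "final_url": resp.url,
        "status": resp.status_code,
        "content_type": resp.headers.get("Content-Type"),
        "content_length": resp.headers.get("Content-Length"),
        "etag": resp.headers.get("ETag"),
    }

info = describe(ARTIFACT)
print(info)
# A healthy MAR here is ~16954022 bytes of application/octet-stream; a
# 282-byte application/xml response is the cached S3 InternalError document
# described in this bug.
if info["content_type"] == "application/xml":
    print("Looks like a cached error document, not the artifact")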
Is this a consistent problem, or the usual, very infrequent one?
Assignee: nobody → jhford
It looks like this is not recurring. Please feel free to open a new bug if that's not the case. Generally speaking, this kind of corruption has been caught around 3-4 times in the last two years, and until we have the new SHA256 support in the queue, it's pretty difficult to detect and remedy it automatically. I'm going to mark this bug as FIXED because the invalid file was successfully purged from the cache.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I suspect we only notice it when it breaks nightly or release updates. Should automation be modified to purge the cache in case of errors?
Ideally yes, but when the new artifact API lands we'll be able to detect whether the file is the one that was intended to be created. I also intend to add some headers to cloud-mirror-based files which will let us know whether the corruption occurred there. The ideal outcome would be for cloud-mirror to use the x-amz-meta-{content,transfer}-sha256 values to detect an invalid transfer and reject it, as sketched below.
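A minimal sketch of that verification, assuming the expected digest is exposed as an x-amz-meta-content-sha256 header on the artifact response (the exact header name and when it is available are assumptions based on this comment, not something this bug confirms):

import hashlib
import requests

def download_verified(url, dest):
    resp = requests.get(url, stream=True)
    resp.raise_for_status()
    # Assumed header carrying the expected SHA-256 of the artifact content.
    expected = resp.headers.get("x-amz-meta-content-sha256")

    digest = hashlib.sha256()
    with open(dest, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            digest.update(chunk)
            fh.write(chunk)

    actual = digest.hexdigest()
    if expected is not None and actual != expected:
        # This is where cloud-mirror (or the consumer) could refuse to cache
        # or use the object instead of serving a corrupt copy.
        raise ValueError("SHA-256 mismatch: expected %s, got %s" % (expected, actual))
    return actual

The same comparison could run inside cloud-mirror itself at copy time, so a truncated or error-document upload never makes it into the cache in the first place.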
Depends on: 1433059