Closed
Bug 1306189
Opened 8 years ago
Closed 6 years ago
Determine root cause of truncated zip files ("Check bug 1306189 for details on downloading a truncated zip file." or "[test-linux.sh:error] error unzipping mozharness.zip")
Categories
(Taskcluster :: General, defect)
Taskcluster
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: intermittent-bug-filer, Assigned: jhford)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell infra])
Attachments
(1 file, 1 obsolete file)
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=4176795&repo=autoland https://queue.taskcluster.net/v1/task/RbNBAYbMTNmC5DA9irwPww/runs/0/artifacts/public%2Flogs%2Flive_backing.log
Comment hidden (mozreview-request) |
Comment 2•8 years ago
|
||
This was reproducible if I run "rm -rf build" before running the Mozharness script for testing.
Updated•8 years ago
|
Assignee: nobody → armenzg
Comment 3•8 years ago
|
||
mozreview-review |
Comment on attachment 8796140 [details] Bug 1306189 - Create upload dir before writing to it + more logging. https://reviewboard.mozilla.org/r/82096/#review80618 Can you add the following, too? iff --git a/testing/mozharness/mozharness/base/script.py b/testing/mozharness/mozharness/base/script.py --- a/testing/mozharness/mozharness/base/script.py +++ b/testing/mozharness/mozharness/base/script.py @@ -388,19 +388,21 @@ class ScriptMixin(PlatformMixin): # Bug 1302237 - URLError: <urlopen error [Errno 104] Connection reset by peer> # Bug 1301807 - BadStatusLine: '' response = urllib2.urlopen(request) if parsed_url.scheme in ('http', 'https'): expected_file_size = int(response.headers.get('Content-Length')) self.info('Http code: {}'.format(response.getcode())) - for k in ('Content-Encoding', 'Content-Type', 'via', 'x-amz-cf-id', - 'x-amz-version-id', 'x-cache'): + for k in ('Content-Encoding', 'Content-Type', 'via', 'x-cache'): self.info('{}: {}'.format(k, response.headers.get(k))) + for k in response.headers.keys(): + if k.lower().startswith('x-amz-'): + self.info('{}: {}'.format(k, response.headers.get(k))) file_contents = response.read() obtained_file_size = len(file_contents) self.info('Expected file size: {}'.format(expected_file_size)) self.info('Obtained file size: {}'.format(obtained_file_size)) if obtained_file_size != expected_file_size: raise FetchedIncorrectFilesize( ::: testing/mozharness/mozharness/base/script.py:692 (Diff revision 1) > try: > function(**kwargs) > except zipfile.BadZipfile: > # Bug 1305752 - Sometimes a good download turns out to be a > # corrupted zipfile. Let's upload the file for inspection > - filepath = os.path.join(self.query_abs_dirs()['abs_upload_dir'], url.split('/')[-1]) > + upload_dir = self.query_abs_dirs()['abs_upload_dir'] This is still going to end up in the uploads dir, which is only copied to ~/artifacts on success. Just use a hard-coded /home/worker/artifacts?
Attachment #8796140 -
Flags: review?(dustin) → review-
Assignee | ||
Comment 4•8 years ago
|
||
Any reason to not just log all of the headers? Can the consuming script echo a base64 encoded copy of the file if it's 1024 bytes or fewer, or the first and last 1024 bytes, again as base64 strings? Can it also run du <file> and file <file>? That way we can see whether it's an incomplete file or an S3 error message
Comment 6•8 years ago
|
||
from a test usw1 machine: ubuntu@ip-172-31-9-14:~$ curl -v -L https://queue.taskcluster.net/v1/task/MhzwnYN3QuG9mhnS0ZLl7g/artifacts/public/build/target.crashreporter-symbols.zip > /dev/null * Hostname was NOT found in DNS cache % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 54.235.141.80... * Connected to queue.taskcluster.net (54.235.141.80) port 443 (#0) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSLv3, TLS handshake, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Server hello (2): { [data not shown] * SSLv3, TLS handshake, CERT (11): { [data not shown] * SSLv3, TLS handshake, Server key exchange (12): { [data not shown] * SSLv3, TLS handshake, Server finished (14): { [data not shown] * SSLv3, TLS handshake, Client key exchange (16): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Finished (20): } [data not shown] 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* SSLv3, TLS change cipher, Client hello (1): { [data not shown] * SSLv3, TLS handshake, Finished (20): { [data not shown] * SSL connection using ECDHE-RSA-AES128-GCM-SHA256 * Server certificate: * subject: C=US; ST=California; L=Mountain View; O=Mozilla Corporation; CN=auth.taskcluster.net * start date: 2016-03-17 00:00:00 GMT * expire date: 2019-03-22 12:00:00 GMT * subjectAltName: queue.taskcluster.net matched * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. > GET /v1/task/MhzwnYN3QuG9mhnS0ZLl7g/artifacts/public/build/target.crashreporter-symbols.zip HTTP/1.1 > User-Agent: curl/7.35.0 > Host: queue.taskcluster.net > Accept: */* > < HTTP/1.1 303 See Other * Server Cowboy is not blacklisted < Server: Cowboy < Connection: keep-alive < X-Powered-By: Express < Strict-Transport-Security: max-age=7776000 < Access-Control-Allow-Origin: * < Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT < Access-Control-Request-Method: * < Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin < Location: https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-west-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip < Vary: Accept < Content-Type: text/plain; charset=utf-8 < Content-Length: 29 < Date: Thu, 29 Sep 2016 14:20:33 GMT < Via: 1.1 vegur < * Ignoring the response-body { [data not shown] 100 29 100 29 0 0 35 0 --:--:-- --:--:-- --:--:-- 35 * Connection #0 to host queue.taskcluster.net left intact * Issue another request to this URL: 'https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-west-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip' * Hostname was NOT found in DNS cache * Trying 54.225.222.159... * Connected to cloud-mirror.taskcluster.net (54.225.222.159) port 443 (#1) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSLv3, TLS handshake, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Server hello (2): { [data not shown] * SSLv3, TLS handshake, CERT (11): { [data not shown] * SSLv3, TLS handshake, Server key exchange (12): { [data not shown] * SSLv3, TLS handshake, Server finished (14): { [data not shown] * SSLv3, TLS handshake, Client key exchange (16): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Finished (20): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): { [data not shown] * SSLv3, TLS handshake, Finished (20): { [data not shown] * SSL connection using ECDHE-RSA-AES128-GCM-SHA256 * Server certificate: * subject: C=US; ST=California; L=Mountain View; O=Mozilla Corporation; CN=auth.taskcluster.net * start date: 2016-03-17 00:00:00 GMT * expire date: 2019-03-22 12:00:00 GMT * subjectAltName: cloud-mirror.taskcluster.net matched * issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA * SSL certificate verify ok. > GET /v1/redirect/s3/us-west-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip HTTP/1.1 > User-Agent: curl/7.35.0 > Host: cloud-mirror.taskcluster.net > Accept: */* > < HTTP/1.1 302 Found * Server Cowboy is not blacklisted < Server: Cowboy < Connection: keep-alive < X-Powered-By: Express < Strict-Transport-Security: max-age=7776000 < Access-Control-Allow-Origin: * < Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT < Access-Control-Request-Method: * < Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin < Location: https://cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip < Content-Type: application/json; charset=utf-8 < Content-Length: 259 < Etag: W/"103-b30b5cae" < Date: Thu, 29 Sep 2016 14:20:33 GMT < Via: 1.1 vegur < * Ignoring the response-body { [data not shown] 100 259 100 259 0 0 223 0 0:00:01 0:00:01 --:--:-- 223 * Connection #1 to host cloud-mirror.taskcluster.net left intact * Issue another request to this URL: 'https://cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip' * Hostname was NOT found in DNS cache * Trying 54.231.236.4... * Connected to cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com (54.231.236.4) port 443 (#2) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSLv3, TLS handshake, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Server hello (2): { [data not shown] * SSLv3, TLS handshake, CERT (11): { [data not shown] * SSLv3, TLS handshake, Server key exchange (12): { [data not shown] * SSLv3, TLS handshake, Server finished (14): { [data not shown] * SSLv3, TLS handshake, Client key exchange (16): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Finished (20): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): { [data not shown] * SSLv3, TLS handshake, Finished (20): { [data not shown] * SSL connection using ECDHE-RSA-AES128-GCM-SHA256 * Server certificate: * subject: C=US; ST=Washington; L=Seattle; O=Amazon.com Inc.; CN=*.s3-us-west-1.amazonaws.com * start date: 2016-07-18 00:00:00 GMT * expire date: 2017-10-26 12:00:00 GMT * subjectAltName: cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com matched * issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert Baltimore CA-2 G2 * SSL certificate verify ok. > GET /https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FMhzwnYN3QuG9mhnS0ZLl7g%2F0%2Fpublic%2Fbuild%2Ftarget.crashreporter-symbols.zip HTTP/1.1 > User-Agent: curl/7.35.0 > Host: cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com > Accept: */* > < HTTP/1.1 200 OK < x-amz-id-2: czf57hvnPvvPN+fOlmDWEvf6+xsrpUFudHRsY42lthoEUNUb8yupL0N7ka269ZQ5KCjImPfC/I8= < x-amz-request-id: 5790A2B5710C1FE8 < Date: Thu, 29 Sep 2016 14:20:34 GMT < Last-Modified: Wed, 28 Sep 2016 22:11:45 GMT < x-amz-expiration: expiry-date="Fri, 30 Sep 2016 00:00:00 GMT", rule-id="us-west-1-1-day" < ETag: "dfd4ae0e879f8aa8cb53633fdd6d05a1" < x-amz-meta-addresses: [{"code":200,"url":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/MhzwnYN3QuG9mhnS0ZLl7g/0/public/build/target.crashreporter-symbols.zip","t":"2016-09-28T22:05:48.545Z"}] < x-amz-meta-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/MhzwnYN3QuG9mhnS0ZLl7g/0/public/build/target.crashreporter-symbols.zip < x-amz-meta-upstream-etag: "51c9d15315b6b7ed2d4d25a3c82c0327" < x-amz-meta-stored: 2016-09-28T22:05:48.545Z < Accept-Ranges: bytes < Content-Type: application/zip < Content-Length: 295497 * Server AmazonS3 is not blacklisted < Server: AmazonS3 < { [data not shown] 100 288k 100 288k 0 0 226k 0 0:00:01 0:00:01 --:--:-- 226k * Connection #2 to host cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com left intact so we can reproduce manually
Comment 7•8 years ago
|
||
That output is a zip file. It has no table of contents according to unzip: ubuntu@ip-172-31-9-14:~$ unzip x Archive: x End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of x or x.zip, and cannot find x.ZIP, period. The directory is at the end of a zip file, suggesting this is truncated. It looks like it contains symbols: ubuntu@ip-172-31-9-14:~$ strings -10 x TestJemalloc/E62C52E5C1BE87150466003D2B28254C0/TestJemalloc.symUT ,&T~APTj=s c7L"x}uNh5 8l .kMD(es &]5.`b&]5.`b2 G~)TZ51hJX TestTXMgr/A90500BC60591C71C8D74410D042DC6C0/TestTXMgr.symUT !*cBL_`vI! 3G3>=&S#*rN#> P^UK-8`#4T \V#cdX`|g]E u>1t!^u!^I eEksI6EW4E ^EOV3U\YE9a. TestCOMPtrEq/5DF88692AF1479817997BD15E32BFA4D0/TestCOMPtrEq.symUT TestBlockingSocket/3BD9340DB24DDC8A3BF9C56E42ABF19D0/TestBlockingSocket.symUT 3).]|/1\STs fB] 4:LPfon y:-ve 8S\x M1G-*F_<i6\ TestCOMArray/12771A92A039EC8FFC3A0E39B788A8B30/TestCOMArray.symUT <B6!(+.l3A% Jchg.Wh|F? TestAudioMixer/532992C60DD92D56E877D135859471B80/TestAudioMixer.symUT
Assignee | ||
Comment 8•8 years ago
|
||
the length of the file is 295497 for the broken usw1 copy, 114583749 for the working use1 copy.
Assignee | ||
Comment 9•8 years ago
|
||
Interestingly, the hexdump of these two files is wildly different US-East-1 =============================================== 0000000 4b50 0403 0014 0002 0008 0000 3c21 b151 0000010 3710 1bde 0000 5553 0000 0015 0000 6572 0000020 7466 7365 2f74 7561 6f74 616d 6974 6e6f 0000030 702e c579 6b3c db73 9238 f5df b02b a5cc 0000040 dd44 8cc8 789d 9266 faec 14ee 4e5b 2354 0000050 3a4b 4e49 3666 6293 2451 7164 914c 3e5c US-West-1 =============================================== 0000000 4b50 0403 0014 0000 0008 acf8 493c 6ccd 0000010 3a44 fa54 0000 23b6 0004 003f 001c 6554 0000020 7473 654a 616d 6c6c 636f 452f 3236 3543 0000030 4532 4335 4231 3845 3137 3035 3634 3036 0000040 3330 3244 3242 3238 3435 3043 542f 7365 0000050 4a74 6d65 6c61 6f6c 2e63 7973 556d 0954
Comment hidden (mozreview-request) |
Updated•8 years ago
|
Attachment #8796140 -
Attachment is obsolete: true
Attachment #8796140 -
Flags: review?(dustin)
Comment 11•8 years ago
|
||
I'm not seeing that difference: ubuntu@ip-172-31-9-14:~$ for f in target.crashreporter-symbols.zip-us* corrupt ; do ls -l $f; hexdump $f | head; done -rw-rw-r-- 1 ubuntu ubuntu 114583749 Sep 29 15:16 target.crashreporter-symbols.zip-use1 0000000 4b50 0403 0014 0000 0008 acf8 493c 6ccd 0000010 3a44 fa54 0000 23b6 0004 003f 001c 6554 0000020 7473 654a 616d 6c6c 636f 452f 3236 3543 0000030 4532 4335 4231 3845 3137 3035 3634 3036 0000040 3330 3244 3242 3238 3435 3043 542f 7365 0000050 4a74 6d65 6c61 6f6c 2e63 7973 556d 0954 0000060 0300 38a4 57ec 38a4 57ec 7875 000b 0401 0000070 01f4 0000 f404 0001 d400 5b5c db73 9638 0000080 d77e d0af 6ee3 a6cd 3bc5 5741 5757 b639 0000090 edd3 5c99 b13c b333 2fb5 9029 6d04 644e -rw-rw-r-- 1 ubuntu ubuntu 295497 Sep 29 15:19 target.crashreporter-symbols.zip-usw1 0000000 4b50 0403 0014 0000 0008 acf8 493c 6ccd 0000010 3a44 fa54 0000 23b6 0004 003f 001c 6554 0000020 7473 654a 616d 6c6c 636f 452f 3236 3543 0000030 4532 4335 4231 3845 3137 3035 3634 3036 0000040 3330 3244 3242 3238 3435 3043 542f 7365 0000050 4a74 6d65 6c61 6f6c 2e63 7973 556d 0954 0000060 0300 38a4 57ec 38a4 57ec 7875 000b 0401 0000070 01f4 0000 f404 0001 d400 5b5c db73 9638 0000080 d77e d0af 6ee3 a6cd 3bc5 5741 5757 b639 0000090 edd3 5c99 b13c b333 2fb5 9029 6d04 644e -rw-rw-r-- 1 ubuntu ubuntu 114583749 Sep 29 15:14 target.crashreporter-symbols.zip-usw2 0000000 4b50 0403 0014 0000 0008 acf8 493c 6ccd 0000010 3a44 fa54 0000 23b6 0004 003f 001c 6554 0000020 7473 654a 616d 6c6c 636f 452f 3236 3543 0000030 4532 4335 4231 3845 3137 3035 3634 3036 0000040 3330 3244 3242 3238 3435 3043 542f 7365 0000050 4a74 6d65 6c61 6f6c 2e63 7973 556d 0954 0000060 0300 38a4 57ec 38a4 57ec 7875 000b 0401 0000070 01f4 0000 f404 0001 d400 5b5c db73 9638 0000080 d77e d0af 6ee3 a6cd 3bc5 5741 5757 b639 0000090 edd3 5c99 b13c b333 2fb5 9029 6d04 644e to verify, us-east-1 and us-west-2 are identical: ubuntu@ip-172-31-9-14:~$ diff target.crashreporter-symbols.zip-us{e1,w2} ubuntu@ip-172-31-9-14:~$ where was the us-east-1 download in comment #9 from?
Comment 12•8 years ago
|
||
John's going to check with docker-cloud about how we can hook up container logging to papertrail. The logging we do have from cloud-mirror is this (from the curl output above) < x-amz-meta-addresses: [{"code":200,"url":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/MhzwnYN3QuG9mhnS0ZLl7g/0/public/build/target.crashreporter-symbols.zip","t":"2016-09-28T22:05:48.545Z"}] < x-amz-meta-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/MhzwnYN3QuG9mhnS0ZLl7g/0/public/build/target.crashreporter-symbols.zip < x-amz-meta-upstream-etag: "51c9d15315b6b7ed2d4d25a3c82c0327" < x-amz-meta-stored: 2016-09-28T22:05:48.545Z I verified in us-east-1 that the Etag for the original and the copy will not match, so that difference is not helpful. The timestamp is useful, though, in that it lets us find the time the copy was made. I'm running dustin@lamport ~/tmp/cloud-mirror $ docker run -e DOCKERCLOUD_USER='...' -e DOCKERCLOUD_PASS='...' dockercloud/cli service logs -t0 -f cloud-mirror-copier 2>&1 | multilog s10485760 . so if we see another instance of this where the source task was created after Thu Sep 29 17:11:50 UTC 2016, we should be able to see the copy in the logs.
Comment hidden (mozreview-request) |
Comment 15•8 years ago
|
||
mozreview-review |
Comment on attachment 8796311 [details] Bug 1306189 - Add proper signature + better logging + stop trying to upload the corrupted file. https://reviewboard.mozilla.org/r/82198/#review80818 ::: testing/mozharness/mozharness/base/script.py:692 (Diff revision 1) > try: > function(**kwargs) > except zipfile.BadZipfile: > - # Bug 1305752 - Sometimes a good download turns out to be a > - # corrupted zipfile. Let's upload the file for inspection > - filepath = os.path.join(self.query_abs_dirs()['abs_upload_dir'], url.split('/')[-1]) > + # Bug 1306189 - Sometimes a good download turns out to be a > + # corrupted zipfile. Let's create a signature that is easy to match > + self.fatal('Check bug 1306189 for details on downloading a truncated zip file.') ++
Attachment #8796311 -
Flags: review?(dustin) → review+
Comment 16•8 years ago
|
||
Pushed by armenzg@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/42722bb6b23c Add proper signature + better logging + stop trying to upload the corrupted file. r=dustin
Updated•8 years ago
|
Keywords: leave-open
Comment 17•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/42722bb6b23c
Comment 18•8 years ago
|
||
Passing to dustin to investigate the root issue of zip files being truncated on the server side. Moving to the TaskCluster product and removed intermittent-orange (as the patch landed should create a unique signature).
Assignee: armenzg → dustin
Component: Mozharness → General
Keywords: intermittent-failure,
leave-open
Product: Release Engineering → Taskcluster
QA Contact: jlund
Summary: Intermittent-infra IOError: [Errno 2] No such file or directory: '/home/worker/workspace/build/upload/target.crashreporter-symbols.zip' → Determine root cause of truncated zip files (was Intermittent-infra IOError: [Errno 2] No such file or directory: '/home/worker/workspace/build/upload/target.crashreporter-symbols.zip')
Comment 20•8 years ago
|
||
Just for sanity, did you verify that the contents of the truncated file are exactly the same as the first however many bytes of the full file?
Updated•8 years ago
|
Keywords: intermittent-failure
Summary: Determine root cause of truncated zip files (was Intermittent-infra IOError: [Errno 2] No such file or directory: '/home/worker/workspace/build/upload/target.crashreporter-symbols.zip') → Determine root cause of truncated zip files ("Check bug 1306189 for details on downloading a truncated zip file.")
Comment 21•8 years ago
|
||
Sort of -- John found them to be different (comment 9) and I found them to match (comment 11)
Comment 22•8 years ago
|
||
https://tools.taskcluster.net/task-inspector/#UpYRcM4XROGd7Mfyb5ykGQ/0 failed to fetch https://queue.taskcluster.net/v1/task/Hv4rirrYRtmcYfD974aCTw/artifacts/public/build/mozharness.zip in us-west-1.
Comment 23•8 years ago
|
||
From us-west-2, right now, I get sent directly from the queue to the bucket, with Content-Length: 666179. From us-west-1, I get < HTTP/1.1 200 OK < x-amz-id-2: pIZg6YtCptfu+r/UBhEWTuxVYpU/daX18chBpk0K98qWU4hBSZeZpBTTiuP7zYoJ4ZzUvtK6aJ8= < x-amz-request-id: 9D0ED50924968A1B < Date: Mon, 03 Oct 2016 15:53:14 GMT < Last-Modified: Mon, 03 Oct 2016 13:40:14 GMT < x-amz-expiration: expiry-date="Wed, 05 Oct 2016 00:00:00 GMT", rule-id="us-west-1-1-day" < ETag: "e9dd103fcd876c748b79284623b87fda" < x-amz-meta-addresses: [{"code":200,"url":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip","t":"2016-10-03T13:34:13.563Z"}] < x-amz-meta-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip < x-amz-meta-upstream-etag: "f53b7be128045dd069161606704a3baa" < x-amz-meta-stored: 2016-10-03T13:34:13.563Z < Accept-Ranges: bytes < Content-Type: application/zip < Content-Length: 34380 Examining the first 160 bytes of the files shows the same content, so I expect this is a truncation. The "upstream etag" in the above matches the etag I get with the us-west-2 request.
Comment 24•8 years ago
|
||
The artifact has the correct length in us-east-1 (https://cloud-mirror-production-us-west-1.s3-us-west-1.amazonaws.com/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FHv4rirrYRtmcYfD974aCTw%2F0%2Fpublic%2Fbuild%2Fmozharness.zip) Here's some logging from cloud-mirror for this us-east-1 copy, with other stuff mixed in. I've annotated the lines I *think* are relevant with "*". *2016-10-03T13:34:11.854317269Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main received message! *2016-10-03T13:34:11.854327449Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main this message checks out *2016-10-03T13:34:11.854370392Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-east-1 putting https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:11.909190552Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:s3-storage-provider upload completed 2016-10-03T13:34:11.909262400Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:S3StorageProvider:s3_us-west-1 completed S3 upload 2016-10-03T13:34:11.909620150Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/LvENO_XfTnW4Y31PG_dB2g/0/public/build/target.mochitest.tests.zip 74126733 bytes in 4.097237224000001 seconds *2016-10-03T13:34:11.924573925Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-east-1 creating read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:11.931181904Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production received 0 messages 2016-10-03T13:34:11.931280246Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production handling message efe69f48-50eb-480e-ac00-f53babc0502c 2016-10-03T13:34:11.931660690Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main received message! 2016-10-03T13:34:11.931680480Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main this message checks out 2016-10-03T13:34:11.931818626Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-east-1 putting https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Pvjd56KdRI2lFw4QrshkcQ/0/public/build/fennec-52.0a1.en-US.android-arm.crashreporter-symbols.zip 2016-10-03T13:34:11.939652691Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production received 0 messages 2016-10-03T13:34:11.939721793Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production handling message 7abf1ee7-4517-48e6-812f-a53457ed4d36 2016-10-03T13:34:11.939766585Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main received message! 2016-10-03T13:34:11.939802508Z Mon, 03 Oct 2016 13:34:11 GMT cloud-proxy:main this message checks out 2016-10-03T13:34:11.939854616Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-east-1 putting https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/L1mvNlJTRoaCZNdXO68WNQ/0/public/test_info//mochitest-devtools-chrome-chunked_errorsummary.log *2016-10-03T13:34:11.942201426Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:follow-redirects Follwed all redirects: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip *2016-10-03T13:34:11.942975137Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:CacheManager:s3_us-east-1 created read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip *2016-10-03T13:34:11.944342169Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 starting S3 upload *2016-10-03T13:34:11.944404824Z Mon, 03 Oct 2016 13:34:11 GMT cloud-mirror:s3-storage-provider initiating upload 2016-10-03T13:34:11.994837279Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production removed message be98d6b4-e898-4507-8776-44d47b95856d from queue 2016-10-03T13:34:11.994888790Z Mon, 03 Oct 2016 13:34:11 GMT queue:QueueListener:692406183521/cloud-mirror-production handler successfully processed message be98d6b4-e898-4507-8776-44d47b95856d 2016-10-03T13:34:12.009627957Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:CacheManager:s3_us-east-1 creating read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/L1mvNlJTRoaCZNdXO68WNQ/0/public/test_info//mochitest-devtools-chrome-chunked_errorsummary.log 2016-10-03T13:34:12.011278419Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:CacheManager:s3_us-east-1 creating read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Pvjd56KdRI2lFw4QrshkcQ/0/public/build/fennec-52.0a1.en-US.android-arm.crashreporter-symbols.zip 2016-10-03T13:34:12.037686650Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:follow-redirects Follwed all redirects: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Pvjd56KdRI2lFw4QrshkcQ/0/public/build/fennec-52.0a1.en-US.android-arm.crashreporter-symbols.zip 2016-10-03T13:34:12.038206536Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:CacheManager:s3_us-east-1 created read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Pvjd56KdRI2lFw4QrshkcQ/0/public/build/fennec-52.0a1.en-US.android-arm.crashreporter-symbols.zip 2016-10-03T13:34:12.038293470Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:follow-redirects Follwed all redirects: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/L1mvNlJTRoaCZNdXO68WNQ/0/public/test_info//mochitest-devtools-chrome-chunked_errorsummary.log 2016-10-03T13:34:12.038774397Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:CacheManager:s3_us-east-1 created read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/L1mvNlJTRoaCZNdXO68WNQ/0/public/test_info//mochitest-devtools-chrome-chunked_errorsummary.log 2016-10-03T13:34:12.039792234Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 starting S3 upload 2016-10-03T13:34:12.039919796Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 starting S3 upload 2016-10-03T13:34:12.039923300Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:s3-storage-provider initiating upload 2016-10-03T13:34:12.040095973Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:s3-storage-provider initiating upload 2016-10-03T13:34:12.341981977Z Mon, 03 Oct 2016 13:34:12 GMT statsum Submitting 27 data-points with dimensionality 3 to project: cloud-mirror 2016-10-03T13:34:12.343265117Z Mon, 03 Oct 2016 13:34:12 GMT statsum Submitting data-points to: https://statsum.taskcluster.net/v1/project/cloud-mirror 2016-10-03T13:34:12.601867785Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:s3-storage-provider upload completed 2016-10-03T13:34:12.601929824Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 completed S3 upload 2016-10-03T13:34:12.602083131Z Mon, 03 Oct 2016 13:34:12 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/L1mvNlJTRoaCZNdXO68WNQ/0/public/test_info//mochitest-devtools-chrome-chunked_errorsummary.log 0 bytes in 0.5631460070000001 seconds 2016-10-03T13:34:12.680216542Z Mon, 03 Oct 2016 13:34:12 GMT queue:QueueListener:692406183521/cloud-mirror-production removed message 7abf1ee7-4517-48e6-812f-a53457ed4d36 from queue 2016-10-03T13:34:12.680273554Z Mon, 03 Oct 2016 13:34:12 GMT queue:QueueListener:692406183521/cloud-mirror-production handler successfully processed message 7abf1ee7-4517-48e6-812f-a53457ed4d36 2016-10-03T13:34:13.200873325Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production received 0 messages 2016-10-03T13:34:13.200919578Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production handling message 6fdf775a-4c02-46ca-bfc8-12ecac8b413d 2016-10-03T13:34:13.201003478Z Mon, 03 Oct 2016 13:34:13 GMT cloud-proxy:main received message! 2016-10-03T13:34:13.201025230Z Mon, 03 Oct 2016 13:34:13 GMT cloud-proxy:main this message checks out 2016-10-03T13:34:13.201065105Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 putting https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip *2016-10-03T13:34:13.256635114Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:s3-storage-provider upload completed *2016-10-03T13:34:13.256873767Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 completed S3 upload *2016-10-03T13:34:13.257013780Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 666179 bytes in 1.31393689 seconds
Comment 25•8 years ago
|
||
Here's the same for the us-west-1 copy, at 13:34:13 just as the S3 metadata promised: 2016-10-03T13:34:13.201003478Z Mon, 03 Oct 2016 13:34:13 GMT cloud-proxy:main received message! 2016-10-03T13:34:13.201025230Z Mon, 03 Oct 2016 13:34:13 GMT cloud-proxy:main this message checks out 2016-10-03T13:34:13.201065105Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 putting https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:13.256635114Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:s3-storage-provider upload completed 2016-10-03T13:34:13.256873767Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 completed S3 upload 2016-10-03T13:34:13.257013780Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 666179 bytes in 1.31393689 seconds 2016-10-03T13:34:13.270749360Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 creating read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:13.284721575Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:follow-redirects Follwed all redirects: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:13.285119191Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 created read stream for https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 2016-10-03T13:34:13.286021891Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:S3StorageProvider:s3_us-west-1 starting S3 upload 2016-10-03T13:34:13.286078670Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:s3-storage-provider initiating upload 2016-10-03T13:34:13.338140036Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production removed message 143042b3-ecd9-4d07-8ba5-b36d01b7422a from queue 2016-10-03T13:34:13.338204244Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production handler successfully processed message 143042b3-ecd9-4d07-8ba5-b36d01b7422a 2016-10-03T13:34:13.348434800Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:s3-storage-provider upload completed 2016-10-03T13:34:13.348513742Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:S3StorageProvider:s3_us-east-1 completed S3 upload 2016-10-03T13:34:13.348583441Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/LvENO_XfTnW4Y31PG_dB2g/0/public/build/target.mochitest.tests.zip 74126733 bytes in 4.7369786970000005 seconds 2016-10-03T13:34:13.425401839Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production removed message 5c37f227-e3a5-4855-963f-0357f84f5cb9 from queue 2016-10-03T13:34:13.425428269Z Mon, 03 Oct 2016 13:34:13 GMT queue:QueueListener:692406183521/cloud-mirror-production handler successfully processed message 5c37f227-e3a5-4855-963f-0357f84f5cb9 2016-10-03T13:34:13.664843637Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:s3-storage-provider upload completed 2016-10-03T13:34:13.664907727Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:S3StorageProvider:s3_us-west-1 completed S3 upload 2016-10-03T13:34:13.664994984Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 666179 bytes in 0.379772142 seconds note that it indicates the correct size in the completion message. This overlaps, too, with the us-east-1 copy .. maybe relevant?
Comment 26•8 years ago
|
||
That last log line comes from https://github.com/taskcluster/cloud-mirror/blob/master/src/cache-manager.js#L106 this.debug(`uploaded ${rawUrl} ${m.bytes} bytes in ${duration/1000} seconds`); where m is a stream meter. So it's measuring the total bytes read from the input stream. I don't see any handling of Content-Length, but then the SDK is using multipart uploads, so maybe that's not appropriate. There's a similar report here (but for the PHP SDK): https://github.com/aws/aws-sdk-php/issues/564
Comment 27•8 years ago
|
||
Some broader grepping of the logs shows an interesting set of phenomena: 2016-10-03T13:34:13.257013780Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 666179 bytes in 1.31393689 seconds 2016-10-03T13:34:13.664994984Z Mon, 03 Oct 2016 13:34:13 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 666179 bytes in 0.379772142 seconds 2016-10-03T13:40:13.141331892Z Mon, 03 Oct 2016 13:40:13 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Hv4rirrYRtmcYfD974aCTw/0/public/build/mozharness.zip 34380 bytes in 359.57729642099997 seconds (trolling through treeherder for other instances:) 2016-10-03T13:42:05.009741282Z Mon, 03 Oct 2016 13:42:05 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 159453196 bytes in 7.567680615 seconds 2016-10-03T13:42:05.704680409Z Mon, 03 Oct 2016 13:42:05 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 159453196 bytes in 8.143963533 seconds 2016-10-03T13:42:15.031537867Z Mon, 03 Oct 2016 13:42:15 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 159453196 bytes in 4.500056646 seconds 2016-10-03T13:42:15.269754483Z Mon, 03 Oct 2016 13:42:15 GMT cloud-mirror:CacheManager:s3_us-west-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 159453196 bytes in 4.203620871 seconds 2016-10-03T13:42:34.842992886Z Mon, 03 Oct 2016 13:42:34 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 159453196 bytes in 7.959859654 seconds 2016-10-03T13:47:02.016014542Z Mon, 03 Oct 2016 13:47:02 GMT cloud-mirror:CacheManager:s3_us-east-1 uploaded https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/Wd54dlTSSJmXfPW4kKQimw/0/public/build/target.crashreporter-symbols.zip 0 bytes in 304.693212035 seconds I've summed all of the available data up in https://gist.github.com/djmitche/c2dd3630960f4784e822a7f0c54295c9 So it looks like we do multiple copies of each object within a few minutes of one another, and in some cases the last of those ends with an error after 300s+ with only a truncated file read. But in at least one case (SlX0CIF8QiaNy6oL78cOcA target.web-platform.tests.zip) the stream-meter reports reading the correct number of bytes, but not writing them. In no case do I see "error from stream-meter" in the logs, but I'm not sure if that means there are no errors in the stream, or just no errors in the stream *meter*.
Comment 28•8 years ago
|
||
John, can I toss this over to you? I think we've narrowed the problem to incomplete transfers from S3 (where the logged sizes are wrong) or to S3 (where the logged sizes are correct but the resulting S3 object is truncated). Maybe there's more error-handling possible here? If not, I suspect that the "Content-Length" header from the source stream is reliable enough, so checking that against the other content lengths (from the stream meter and maybe a final HEAD request against the copied object?) would help detect the issue. I see that the SDK's upload method can take a ContentLength parameter, too. That might cause it to do some better error-checking and retrying of uploads. I was surprised to see the multiple copies of the same object to the same region -- does that seem wrong to you, too? I doubt we can reliably test failed uploads in integration tests, but I can help re-create failures manually, if that's useful.
Comment 29•8 years ago
|
||
Whoops, helps to adjust bug fields too! Also, I'm happy to keep working on this, too -- just at the end of where I can be very useful without your (John's) expertise.
Flags: needinfo?(jhford)
Comment 30•8 years ago
|
||
One more thing :) If it would be useful, we could scan all three artifact buckets and look for objects with the same keys but different sizes, to find more instances of this error.
Updated•8 years ago
|
Summary: Determine root cause of truncated zip files ("Check bug 1306189 for details on downloading a truncated zip file.") → Determine root cause of truncated zip files ("Check bug 1306189 for details on downloading a truncated zip file." or "[test-linux.sh:error] error unzipping mozharness.zip")
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 38•8 years ago
|
||
Hi John, any updates about this? Anything we can do in our side?
Updated•8 years ago
|
Assignee: dustin → jhford
Assignee | ||
Comment 39•8 years ago
|
||
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #38) > Hi John, any updates about this? > Anything we can do in our side? Hi Armen, I've been making some code changes which should hopefully make this easier to debug, as well as a couple small improvements which should lessen the impact. They have not landed yet. I'm also still debugging the problem. Right now, it seems as though something really strange is happening for gzip and deflate encoded resources. I've added a new check that verifies that the number of bytes that are uploaded to S3 is the same as the Content-Length: header on the original request. For most files this check is perfect. The problem is that for Gzip and Deflate encoded files, we seem to reliably uploading more bytes than we're downloading, which is weird. In this case, it's 'http://httpbin.org/gzip'. For my tests here, I'm seeing { "gzipped": true, "headers": { "Accept-Encoding": "*", "Host": "httpbin.org" }, "method": "GET", "origin": "178.19.212.229" } We're reliably getting issues. The upstream copy of the file is: localhost:~/taskcluster/cloud-mirror $ hexdump expected.gz | tee expected.hex 0000000 ef1f bdbf 0008 ef07 bdbf bfef 57bd ef02 0000010 bdbf ef25 bdbf 0e31 bfef 20bd 0410 5e7b 0000020 bfef efbd bdbf bfef 12bd 6824 ef12 bdbf 0000030 614b bfef 0fbd bfef 07bd 3812 4501 ef38 0000040 bdbf 4d2f ef2c bdbf 473d ef5c bdbf bfef 0000050 efbd bdbf 4a6e ef6b bdbf bfef 5cbd 062b 0000060 bfef 6bbd 1f6e ef3c bdbf bfef 12bd 033e 0000070 bfef 55bd bfef 26bd ef51 bdbf bfef 7bbd 0000080 bfef 7cbd ef19 bdbf bfef efbd bdbf 144b 0000090 2701 bfef 77bd ce45 72b4 ef27 bdbf bfef 00000a0 efbd bdbf 3172 bfef 22bd bfef d9bd ef8f 00000b0 bdbf 7237 bfef 7ebd d30f 1cb8 ef1b bdbf 00000c0 bfef 63bd ef2e bdbf bfef efbd bdbf bfef 00000d0 efbd bdbf bfef 38bd bfef efbd bdbf 0373 00000e0 bfef 5dbd bfef 00bd ef66 bdbf bfef 64bd 00000f0 bfef 00bd 0000 00000f6 And the Amazon copy of the file is: localhost:~/taskcluster/cloud-mirror $ hexdump actual.gz | tee actual.hex 0000000 ef1f bdbf 0008 ef05 bdbf bfef 57bd ef02 0000010 bdbf ef25 bdbf 0e31 bfef 20bd 0410 5e7b 0000020 bfef efbd bdbf bfef 12bd 6824 ef12 bdbf 0000030 614b bfef 0fbd bfef 07bd 3812 4501 ef38 0000040 bdbf 4d2f ef2c bdbf 473d ef5c bdbf bfef 0000050 efbd bdbf 4a6e ef6b bdbf bfef 5cbd 062b 0000060 bfef 6bbd 1f6e ef3c bdbf bfef 12bd 033e 0000070 bfef 55bd bfef 26bd ef51 bdbf bfef 7bbd 0000080 bfef 7cbd ef19 bdbf bfef efbd bdbf 144b 0000090 2701 bfef 77bd ce45 72b4 ef27 bdbf bfef 00000a0 efbd bdbf 3172 bfef 22bd bfef d9bd ef8f 00000b0 bdbf 7237 bfef 7ebd d30f 1cb8 ef1b bdbf 00000c0 bfef 63bd ef2e bdbf bfef efbd bdbf bfef 00000d0 efbd bdbf bfef 38bd bfef efbd bdbf 0373 00000e0 bfef 5dbd bfef 00bd ef66 bdbf bfef 64bd 00000f0 bfef 00bd 0000 00000f6 And the differences are: localhost:~/taskcluster/cloud-mirror $ diff actual.hex expected.hex 1c1 < 0000000 ef1f bdbf 0008 ef05 bdbf bfef 57bd ef02 --- > 0000000 ef1f bdbf 0008 ef07 bdbf bfef 57bd ef02 I'm not sure what's going on with ef05 vs ef07. I'm about to re-read the zlib spec [1], but from my last time reading it, this was something to do with a difference in the generation time of the archive, though that should not be happening as nothing should be touching the gzip encoding, but I am hitting the /gzip endpoint two different times, and it might correctly be setting the time-of-generation header in the GZIP file. That wouldn't explain the difference in length and bytes uploaded numbers, but is a good first step. [1] http://www.zlib.org/rfc-gzip.html#file-format
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 46•8 years ago
|
||
I notice that this problem seems to happen in batches, affecting only one particular revision and platform at a time. That seems odd to me; does it make sense to anyone else? Examples: Oct 24, fx-team, 215f9686117673a2c914ed207bc7da9bb8d741ad, win32-pgo, 16 failures Oct 21, autoland, 4b2363bd94e72797913817ae8bb9195afd978169, linux64-debug, 32 failures Oct 18, fx-team, dc89484d4b45abf442162e5ea2dd46f9de40197d, linux64-opt, 20 failures
Comment 47•8 years ago
|
||
Actually, maybe it's not odd: I suppose it just suggests that a particular file is not accessible, for a certain time period. (I had been thinking of this bug as an intermittent network problem, a server being down or a connection timing out. Now I see that it is more about a specific file, for a period of time.)
Comment 48•8 years ago
|
||
Right -- once a file is mirrored incorrectly, it will "poison" a bunch of downstream jobs. By the way, John's got a fix for this brewing -- https://github.com/taskcluster/cloud-mirror/pull/23
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 51•8 years ago
|
||
this had a large spike yesterday of 59 failures reported- we might want to focus on this as it is becoming more of a nuisance for developers and sheriffs.
Comment 52•8 years ago
|
||
John Ford is heads down working on this issue and he believes it should be resolved with https://github.com/taskcluster/cloud-mirror/pull/23
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 55•8 years ago
|
||
I've deployed the change that I believe will fix this issue today.
Flags: needinfo?(jhford)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 58•8 years ago
|
||
The 7 days trend looks in the right direction. Good job jhford!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 59•8 years ago
|
||
Armen, we should probably back out the debug logging that's in this bug.
Status: RESOLVED → REOPENED
Flags: needinfo?(armenzg)
Resolution: FIXED → ---
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 63•7 years ago
|
||
jhford, getting this on your radar. Not sure if you're aware of any issues that might have attributed to this on September 27th.
Flags: needinfo?(jhford)
Assignee | ||
Comment 64•7 years ago
|
||
No aware of anything that would've had a specific effect on Sept 27. When we have the new artifact api is being used in production for these builds, finding the system on which the root of this issue occurs should be easy because we'll have consistency failures either when uploading or when downloading the bundles.
Flags: needinfo?(jhford)
Updated•7 years ago
|
Whiteboard: [stockwell infra]
Comment 65•6 years ago
|
||
The immediate issue here is fixed. I think that root-causing this particular issue is too late, and the artifact API will surface any related issues later, at which point we won't remember to look at this bug :)
Status: REOPENED → RESOLVED
Closed: 8 years ago → 6 years ago
Resolution: --- → FIXED
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•