Closed Bug 1306865 Opened 9 years ago Closed 8 years ago

Intermittent-infra Funsize ValueError: No JSON object could be decoded

Categories

(Taskcluster :: Services, defect)

Type: defect, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [stockwell infra])

It feels like it is a cloud-mirror issue:

---request begin---
GET /https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FM2ix_CuUSOm52G49nE76Jg%2F0%2Fpublic%2Fbuild%2Ftarget.test_packages.json HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: cloud-mirror-production-us-east-1.s3.amazonaws.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
x-amz-id-2: PFuos/Q3ppTiXJKGxRmAFjbzqTHSQm4QlEkSSe1khDzYQ2D8gXvdGmHoeBeL74IeIBGsJnDWMVA=
x-amz-request-id: 266DE2A80F6E01FF
Date: Wed, 11 Jan 2017 11:42:54 GMT
Last-Modified: Wed, 11 Jan 2017 07:40:14 GMT
x-amz-expiration: expiry-date="Fri, 13 Jan 2017 00:00:00 GMT", rule-id="us-east-1-1-day"
ETag: "865588d50a8998d378f5afbf8c4c491f"
x-amz-meta-cloud-mirror-upstream-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/M2ix_CuUSOm52G49nE76Jg/0/public/build/target.test_packages.json
x-amz-meta-cloud-mirror-upstream-content-length: <unknown>
x-amz-meta-cloud-mirror-stored: 2017-01-11T07:40:13.119Z
x-amz-meta-cloud-mirror-upstream-etag: <unknown>
x-amz-meta-cloud-mirror-addresses: [{"c":200,"u":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/M2ix_CuUSOm52G49nE76Jg/0/public/build/target.test_packages.json","t":"2017-01-11T07:40:07.984Z"}]
Accept-Ranges: bytes
Content-Type: application/xml
Content-Length: 282
Server: AmazonS3

---response end---
200 OK
Disabling further reuse of socket 4.
Closed 4/SSL 0x0000000001005880
Registered socket 3 for persistent reuse.
Length: 282 [application/xml]
Saving to: `target.test_packages.json.2'

100%[==============================================================================================================>] 282 --.-K/s in 0s

2017-01-11 11:42:53 (6.86 MB/s) - `target.test_packages.json.2' saved [282/282]

root@taskcluster-worker:~/workspace/build# cat target.test_packages.json
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>A690174B7BB0423A</RequestId><HostId>+GWa47hi3/ZgD2bJwuRvCrTBi7/8XROTDQ5q9kVe2HpwrIi3DESwoopdIUAnUtQ66epbvon2k6Q=</HostId></Error>root@taskcluster-worker:~/workspace/build#
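
The log above shows the mirror answering 200 OK with Content-Type: application/xml and an S3 <Error> document as the body, which a JSON decoder will reject with "No JSON object could be decoded". A minimal Python sketch of a more descriptive check follows. The names parse_artifact_json and ArtifactBodyError are hypothetical and are not taken from funsize or cloud-mirror; this only illustrates how a caller might tell the two cases apart.

```python
import json
import xml.etree.ElementTree as ET


class ArtifactBodyError(Exception):
    """Raised when a downloaded artifact is not the JSON we expected (hypothetical)."""


def parse_artifact_json(body, content_type):
    """Decode `body` (bytes) as JSON, or raise ArtifactBodyError describing it."""
    stripped = body.strip()
    # S3 error documents are XML with an <Error> root element; in the log
    # above one arrived with HTTP status 200, so the status alone is not enough.
    if "xml" in content_type.lower() or stripped.startswith(b"<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            root = None
        if root is not None and root.tag == "Error":
            code = root.findtext("Code", default="unknown")
            message = root.findtext("Message", default="")
            raise ArtifactBodyError("S3 error body: %s: %s" % (code, message))
    try:
        return json.loads(stripped.decode("utf-8", errors="replace"))
    except ValueError as exc:
        raise ArtifactBodyError("body is not JSON: %s" % exc)
```

A caller could treat an InternalError code as retryable and a NoSuchKey code as a missing or expired artifact, instead of surfacing a bare ValueError.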
Component: General Automation → Platform and Services
Product: Release Engineering → Taskcluster
QA Contact: catlee
This message in the log suggests to me that the fetch was attempted before the resource was uploaded to the original upstream bucket.

<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/QS3KiICIRfiwGZUn-Bxaxw/0/public/env/manifest.json</Key><RequestId>E5BB45D3FECFB3E3</RequestId><HostId>EvAXe70tgw4SWYWOQVC4L6JVE0BPcJxwgxZIOs/VCvKO5CIvDCmKieVvJDLIQJN45WBwqFeBLvY=</HostId></Error>
+ python /home/worker/bin/funsize-balrog-submitter.py --artifacts-url-prefix https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env --manifest /home/worker/artifacts/manifest.json -a http://balrog/api --signing-cert /home/worker/keys/nightly.pubkey --verbose

From US-East-1, I get the following for the resource that failed to download as:

~ $ curl -L -v -o out https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env/manifest.json
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 54.225.134.170...
* Connected to queue.taskcluster.net (54.225.134.170) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Server hello (2):
{ [data not shown]
* SSLv3, TLS handshake, CERT (11):
{ [data not shown]
* SSLv3, TLS handshake, Server key exchange (12):
{ [data not shown]
* SSLv3, TLS handshake, Server finished (14):
{ [data not shown]
* SSLv3, TLS handshake, Client key exchange (16):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Finished (20):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
{ [data not shown]
* SSLv3, TLS handshake, Finished (20):
{ [data not shown]
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*   subject: C=US; ST=California; L=Mountain View; O=Mozilla Corporation; CN=auth.taskcluster.net
*   start date: 2016-03-17 00:00:00 GMT
*   expire date: 2019-03-22 12:00:00 GMT
*   subjectAltName: queue.taskcluster.net matched
*   issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
*   SSL certificate verify ok.
> GET /v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env/manifest.json HTTP/1.1
> User-Agent: curl/7.35.0
> Host: queue.taskcluster.net
> Accept: */*
>
< HTTP/1.1 404 Not Found
* Server Cowboy is not blacklisted
< Server: Cowboy
< Connection: keep-alive
< X-Powered-By: Express
< Strict-Transport-Security: max-age=7776000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
< Access-Control-Request-Method: *
< Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
< Content-Type: application/json; charset=utf-8
< Content-Length: 37
< Etag: W/"25-c445155e"
< Date: Wed, 11 Jan 2017 12:35:47 GMT
< Via: 1.1 vegur
<
{ [data not shown]
100    37  100    37    0     0     95      0 --:--:-- --:--:-- --:--:--    95
* Connection #0 to host queue.taskcluster.net left intact
~ $ cat out
{
  "message": "Artifact not found"
}~ $
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #2)
> From US-East-1, I get the following for the resource that failed to download
> as:
>
> ~ $ curl -L -v -o out
> https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/
> public/env/manifest.json

From https://tools.taskcluster.net/task-inspector/#QS3KiICIRfiwGZUn-Bxaxw/0 it looks like this is a task for the signing-worker-v1 worker type (of the signing-provisioner-v1 provisioner) from 3 months ago that has not yet expired (it expires in October 2017), yet has no artifacts (not even a log file).

Aki, do you know more about this? Thanks!
Flags: needinfo?(aki)
The task definition for that task points to the manifest https://queue.taskcluster.net/v1/task/H6hLVYKBSAyZkxJmqwHzLg/artifacts/public/env/manifest.json which also appears not to exist at the moment.
Ah, it looks like that manifest probably used to exist, but expired. In task H6hLVYKBSAyZkxJmqwHzLg:

"public/env": {
  "path": "/home/worker/artifacts/",
  "expires": "2016-10-08T16:05:58.680033Z",
  "type": "directory"
}

So at the time task QS3KiICIRfiwGZUn-Bxaxw ran, it did exist. But it is not clear why there are no artifacts attached to QS3KiICIRfiwGZUn-Bxaxw - it could be that these artifacts were set to expire earlier than the task expiry, but that is not part of the task payload, so we can't see that. This would be my guess though - that the artifact(s) of task QS3KiICIRfiwGZUn-Bxaxw expired recently, causing this problem.
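
The expiry reasoning above can be checked mechanically by comparing the "expires" value against the time of a request. The Python sketch below is illustrative only: parse_expires and is_expired are hypothetical helpers, and it assumes the timestamp is always UTC with a trailing "Z" and fractional seconds, as in the payload quoted above.

```python
from datetime import datetime, timezone


def parse_expires(value):
    """Parse an artifact 'expires' string such as 2016-10-08T16:05:58.680033Z."""
    # Assumption: UTC, trailing 'Z', fractional seconds present.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(expires, at):
    """Return True if the artifact had already expired at the moment `at`."""
    return at >= parse_expires(expires)


# The curl request in this bug was answered on 11 Jan 2017 at 12:35:47 GMT.
request_time = datetime(2017, 1, 11, 12, 35, 47, tzinfo=timezone.utc)
print(is_expired("2016-10-08T16:05:58.680033Z", request_time))
```

With the values from this bug, the manifest of task H6hLVYKBSAyZkxJmqwHzLg had expired about three months before the curl request was made.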
I think we can ignore this.

- signing-worker-v1 workers are only just now becoming tier1
- funsize running against signing-worker-v1 workers are only just now becoming tier1
- aiui there was a new release of cloud mirror, though I'm not sure if that happened after oct 1.

Have we seen other instances of this?
Flags: needinfo?(aki)
So far this has only been reported on the 11th, but I'm still not sure why it happened. Task B was requesting an artifact from Task A after artifacts were uploaded for Task A and Task A was marked resolved. This shouldn't have been a timing issue. The artifact exists at the time of me writing this comment too. It's hard to diagnose now that it's a week old (papertrail log searching is only around for 3 days). John is working on improving how we upload/download artifacts so that tasks are only completed successfully once artifacts are uploaded and content is verified on the S3 side.
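
As an illustration of what verifying content could look like from the consumer's side, here is a minimal Python sketch that compares a downloaded body against an expected size and SHA-256 digest. The comment above does not say how the planned verification works, so verify_artifact and both expected values are assumptions made for this example, not the Taskcluster design.

```python
import hashlib


def verify_artifact(body, expected_size, expected_sha256):
    """Return a list of problems found in `body` (bytes); empty means it matches."""
    problems = []
    if len(body) != expected_size:
        problems.append("size %d != expected %d" % (len(body), expected_size))
    digest = hashlib.sha256(body).hexdigest()
    if digest != expected_sha256.lower():
        problems.append("sha256 %s != expected %s" % (digest, expected_sha256))
    return problems
```

A check like this would have flagged the 282-byte XML error body from the first comment as soon as it was downloaded, before any JSON parsing was attempted.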
Whiteboard: [stockwell infra]
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Component: Platform and Services → Services