Closed
Bug 1306865
Opened 9 years ago
Closed 8 years ago
Intermittent-infra Funsize ValueError: No JSON object could be decoded
Categories
(Taskcluster :: Services, defect)
Taskcluster
Services
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: intermittent-bug-filer, Unassigned)
Details
(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [stockwell infra])
Comment 1•8 years ago
It feels like it is a cloud-mirror issue:
---request begin---
GET /https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FM2ix_CuUSOm52G49nE76Jg%2F0%2Fpublic%2Fbuild%2Ftarget.test_packages.json HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: cloud-mirror-production-us-east-1.s3.amazonaws.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
x-amz-id-2: PFuos/Q3ppTiXJKGxRmAFjbzqTHSQm4QlEkSSe1khDzYQ2D8gXvdGmHoeBeL74IeIBGsJnDWMVA=
x-amz-request-id: 266DE2A80F6E01FF
Date: Wed, 11 Jan 2017 11:42:54 GMT
Last-Modified: Wed, 11 Jan 2017 07:40:14 GMT
x-amz-expiration: expiry-date="Fri, 13 Jan 2017 00:00:00 GMT", rule-id="us-east-1-1-day"
ETag: "865588d50a8998d378f5afbf8c4c491f"
x-amz-meta-cloud-mirror-upstream-url: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/M2ix_CuUSOm52G49nE76Jg/0/public/build/target.test_packages.json
x-amz-meta-cloud-mirror-upstream-content-length: <unknown>
x-amz-meta-cloud-mirror-stored: 2017-01-11T07:40:13.119Z
x-amz-meta-cloud-mirror-upstream-etag: <unknown>
x-amz-meta-cloud-mirror-addresses: [{"c":200,"u":"https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/M2ix_CuUSOm52G49nE76Jg/0/public/build/target.test_packages.json","t":"2017-01-11T07:40:07.984Z"}]
Accept-Ranges: bytes
Content-Type: application/xml
Content-Length: 282
Server: AmazonS3
---response end---
200 OK
Disabling further reuse of socket 4.
Closed 4/SSL 0x0000000001005880
Registered socket 3 for persistent reuse.
Length: 282 [application/xml]
Saving to: `target.test_packages.json.2'
100%[==============================================================================================================>] 282 --.-K/s in 0s
2017-01-11 11:42:53 (6.86 MB/s) - `target.test_packages.json.2' saved [282/282]
root@taskcluster-worker:~/workspace/build# cat target.test_packages.json
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>A690174B7BB0423A</RequestId><HostId>+GWa47hi3/ZgD2bJwuRvCrTBi7/8XROTDQ5q9kVe2HpwrIi3DESwoopdIUAnUtQ66epbvon2k6Q=</HostId></Error>
root@taskcluster-worker:~/workspace/build#
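The mirror answered 200 OK, but the 282-byte body is an S3 <Error> document (Content-Type: application/xml), which a JSON parser then rejects with "No JSON object could be decoded". A minimal sketch of a defensive check a consumer could run before parsing, assuming Python 3; parse_json_artifact and MirrorError are hypothetical names, not part of funsize or cloud-mirror:

```python
import json
import xml.etree.ElementTree as ET


class MirrorError(Exception):
    """The downloaded body is not the artifact we asked for (hypothetical)."""


def parse_json_artifact(body: bytes, content_type: str) -> object:
    """Parse a JSON artifact, rejecting S3 error documents served with 200 OK."""
    # The status code alone is not trustworthy here, so sniff the body:
    # an S3 error document is XML whose root element is <Error>.
    if body.lstrip().startswith(b"<"):
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            root = None
        if root is not None and root.tag == "Error":
            code = root.findtext("Code", default="?")
            message = root.findtext("Message", default="")
            raise MirrorError(f"S3 error {code}: {message}")
    try:
        return json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError subclasses ValueError; Python 2's json module
        # raised ValueError("No JSON object could be decoded") at this point.
        raise MirrorError(
            f"body is not JSON (Content-Type: {content_type!r})"
        ) from exc
```

Raising a distinct error with the S3 error code would make a log like this one point straight at the mirror rather than at the JSON consumer.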
Component: General Automation → Platform and Services
Product: Release Engineering → Taskcluster
QA Contact: catlee
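In the wget log, the request path on the mirror bucket is the whole upstream S3 URL percent-encoded into a single path segment, and the x-amz-meta-cloud-mirror-upstream-url header echoes the original URL. A small sketch of that mapping, assuming it is simply quote(upstream_url, safe="") (inferred from this one log, not from cloud-mirror's source); mirror_path and upstream_from_path are hypothetical helpers:

```python
from urllib.parse import quote, unquote


def mirror_path(upstream_url: str) -> str:
    """Return the mirror request path for an upstream artifact URL.

    Assumption inferred from the wget log: the entire upstream URL,
    including ":" and "/", is percent-encoded and used as one segment.
    """
    return "/" + quote(upstream_url, safe="")


def upstream_from_path(path: str) -> str:
    """Inverse of mirror_path: recover the upstream URL from a request path."""
    return unquote(path.lstrip("/"))
```

Being able to go from a failing mirror request back to the upstream URL makes it easy to fetch the same artifact directly from us-west-2 and compare.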
Comment 2•8 years ago
This message in the log suggests to me that something tried to fetch the resource before it had been uploaded to the original upstream bucket.
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/QS3KiICIRfiwGZUn-Bxaxw/0/public/env/manifest.json</Key><RequestId>E5BB45D3FECFB3E3</RequestId><HostId>EvAXe70tgw4SWYWOQVC4L6JVE0BPcJxwgxZIOs/VCvKO5CIvDCmKieVvJDLIQJN45WBwqFeBLvY=</HostId></Error>
+ python /home/worker/bin/funsize-balrog-submitter.py --artifacts-url-prefix https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env --manifest /home/worker/artifacts/manifest.json -a http://balrog/api --signing-cert /home/worker/keys/nightly.pubkey --verbose
From us-east-1, I get the following when fetching the resource that failed to download:
~ $ curl -L -v -o out https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env/manifest.json
* Hostname was NOT found in DNS cache
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Trying 54.225.134.170...
* Connected to queue.taskcluster.net (54.225.134.170) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Server hello (2):
{ [data not shown]
* SSLv3, TLS handshake, CERT (11):
{ [data not shown]
* SSLv3, TLS handshake, Server key exchange (12):
{ [data not shown]
* SSLv3, TLS handshake, Server finished (14):
{ [data not shown]
* SSLv3, TLS handshake, Client key exchange (16):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Finished (20):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
{ [data not shown]
* SSLv3, TLS handshake, Finished (20):
{ [data not shown]
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Mozilla Corporation; CN=auth.taskcluster.net
* start date: 2016-03-17 00:00:00 GMT
* expire date: 2019-03-22 12:00:00 GMT
* subjectAltName: queue.taskcluster.net matched
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
> GET /v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/public/env/manifest.json HTTP/1.1
> User-Agent: curl/7.35.0
> Host: queue.taskcluster.net
> Accept: */*
>
< HTTP/1.1 404 Not Found
* Server Cowboy is not blacklisted
< Server: Cowboy
< Connection: keep-alive
< X-Powered-By: Express
< Strict-Transport-Security: max-age=7776000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
< Access-Control-Request-Method: *
< Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
< Content-Type: application/json; charset=utf-8
< Content-Length: 37
< Etag: W/"25-c445155e"
< Date: Wed, 11 Jan 2017 12:35:47 GMT
< Via: 1.1 vegur
<
{ [data not shown]
100 37 100 37 0 0 95 0 --:--:-- --:--:-- --:--:-- 95
* Connection #0 to host queue.taskcluster.net left intact
~ $ cat out
{
"message": "Artifact not found"
}
~ $
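Here the queue returns a clean 404 with a JSON body ("Artifact not found"), which is a different failure from the mirror's 200-with-XML in comment 1. One way a consumer could keep the two apart is to fail fast on 404 and retry only on 5xx. A sketch, assuming a caller-supplied fetch callable that returns (status, body); the names and the backoff policy are illustrative, not Taskcluster client behaviour:

```python
import json
import time
from typing import Callable, Tuple

# (HTTP status, body) as returned by a caller-supplied fetch function.
Response = Tuple[int, bytes]


class ArtifactNotFound(Exception):
    """The queue reported a 404; retrying will not make the artifact appear."""


def fetch_artifact(fetch: Callable[[], Response], attempts: int = 5,
                   base_delay: float = 1.0) -> bytes:
    """Fetch an artifact, failing fast on 404 and retrying 5xx with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status == 404:
            # e.g. {"message": "Artifact not found"}: expired or never uploaded.
            try:
                payload = json.loads(body)
            except ValueError:
                payload = {}
            message = payload.get("message") if isinstance(payload, dict) else None
            raise ArtifactNotFound(message or "HTTP 404")
        if status >= 500 and attempt < attempts:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
            continue
        break
    raise RuntimeError(f"giving up after HTTP {status} on attempt {attempt}")
```

Failing fast on 404 keeps an expired or missing artifact from being masked by retries, while transient 5xx errors still get a few more chances.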
Comment 3•8 years ago
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #2)
> From US-East-1, I get the following for the resource that failed to download
> as:
>
> ~ $ curl -L -v -o out
> https://queue.taskcluster.net/v1/task/QS3KiICIRfiwGZUn-Bxaxw/artifacts/
> public/env/manifest.json
From https://tools.taskcluster.net/task-inspector/#QS3KiICIRfiwGZUn-Bxaxw/0 it looks like this task ran on the signing-worker-v1 worker type (of the signing-provisioner-v1 provisioner) 3 months ago. The task has not yet expired (it expires in October 2017), yet it has no artifacts, not even a log file.
Aki, do you know more about this? Thanks!
Flags: needinfo?(aki)
Comment 4•8 years ago
The task definition for that task points to the manifest https://queue.taskcluster.net/v1/task/H6hLVYKBSAyZkxJmqwHzLg/artifacts/public/env/manifest.json which also appears not to exist at the moment.
Comment 5•8 years ago
Ah, it looks like that manifest probably used to exist, but expired. In task H6hLVYKBSAyZkxJmqwHzLg:
"public/env": {
"path": "/home/worker/artifacts/",
"expires": "2016-10-08T16:05:58.680033Z",
"type": "directory"
}
So at the time task QS3KiICIRfiwGZUn-Bxaxw ran, the manifest did exist. However, it is not clear why task QS3KiICIRfiwGZUn-Bxaxw has no artifacts attached. Its artifacts may have been set to expire earlier than the task itself; artifact expiry is not part of the task payload, so we can't check that. This would be my guess, though: the artifact(s) of task QS3KiICIRfiwGZUn-Bxaxw expired recently, causing this problem.
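The check described here boils down to comparing an artifact's expires timestamp with the time of the request, or with the expiry of a task that depends on it. A sketch, assuming timestamps use the YYYY-MM-DDTHH:MM:SS[.ffffff]Z form seen in the task definition; the helper names are hypothetical:

```python
from datetime import datetime, timezone


def parse_tc_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as "2016-10-08T16:05:58.680033Z"."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")


def artifact_expired(artifact: dict, now: datetime) -> bool:
    """True if the artifact's expiry is at or before `now`."""
    return parse_tc_timestamp(artifact["expires"]) <= now


def outlived_by_consumer(artifact: dict, consumer_expires: str) -> bool:
    """True if a task that reads this artifact expires after the artifact does."""
    return parse_tc_timestamp(artifact["expires"]) < parse_tc_timestamp(consumer_expires)
```

Applied to the "public/env" entry above, artifact_expired returns True for any time after 2016-10-08T16:05:58Z, which matches the 404 seen in January 2017.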
Comment 6•8 years ago
I think we can ignore this.
- signing-worker-v1 workers are only just now becoming tier1
- funsize tasks running on signing-worker-v1 workers are only just now becoming tier1
- aiui there was a new release of cloud mirror, though I'm not sure if that happened after oct 1.
Have we seen other instances of this?
Flags: needinfo?(aki)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 9•8 years ago
So far this has only been reported on the 11th, but I'm still not sure why it happened.
Task B requested an artifact from Task A after Task A's artifacts had been uploaded and Task A had been marked resolved, so this shouldn't have been a timing issue. The artifact still exists as I write this comment.
It's hard to diagnose now that it's a week old (Papertrail log search only covers the last 3 days). John is working on improving how we upload and download artifacts so that a task only completes successfully once its artifacts are uploaded and their content is verified on the S3 side.
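One possible form of "content verified on the S3 side", sketched as an assumption rather than what John's change actually does: for a single-part upload without SSE-KMS encryption, the S3 ETag is the hex MD5 of the object body, so an uploader can compare it with a locally computed digest. Multipart uploads produce an ETag with a "-N" suffix, which this sketch rejects instead of checking. etag_matches is a hypothetical name:

```python
import hashlib


def etag_matches(content: bytes, etag: str) -> bool:
    """Compare a single-part S3 ETag with the MD5 of the uploaded bytes."""
    # S3 returns the ETag in double quotes, e.g. "865588d50a8998d378f5afbf8c4c491f".
    tag = etag.strip('"')
    if "-" in tag:
        # Multipart ETags are not a plain MD5 of the object; refuse to guess.
        raise ValueError("multipart ETag: cannot verify with a single MD5")
    return hashlib.md5(content).hexdigest() == tag
```

A task would only be reported as completed once this comparison succeeds for every uploaded artifact.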
Updated•8 years ago
Whiteboard: [stockwell infra]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: bulk-close-intermittents
Resolution: --- → INCOMPLETE
Updated•6 years ago
Component: Platform and Services → Services