Closed
Bug 1496906
Opened 7 years ago
Closed 7 years ago
AccessDenied accessing logs for some tasks (shown in Treeherder as "Log parsing failed. Unable to generate failure summary.")
Categories
(Taskcluster :: Workers, defect)
Taskcluster
Workers
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: cbrindusan, Unassigned)
Details
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=pending,running,success,testfailed,busted,exception&selectedJob=203802556&revision=a994947dd760f44b168c870c0c243e284bda4b55 has "Log parsing failed. Unable to generate failure summary." in "Failure Summary".
Raw log doesen't open, the following error is displayed:
"This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>F83C6B6D55229DEB</RequestId>
<HostId></HostId>
</Error>"
| Reporter | ||
Updated•7 years ago
|
Component: Treeherder: Log Parsing & Classification → Operations
Product: Tree Management → Taskcluster
Version: --- → unspecified
| Reporter | ||
Updated•7 years ago
|
Version: unspecified → Trunk
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 3•7 years ago
|
||
Example task where this occurred:
https://tools.taskcluster.net/groups/R8fuQpy0R_GYO4dCroX1gA/tasks/ELnHcEX0S_KGyCRWx0OhUQ/details
The `State` is reported as `failed`.
Both the `live.log` and `live_backing.log` give AccessDenied.
However `certified.log` is accessible:
https://tools.taskcluster.net/groups/R8fuQpy0R_GYO4dCroX1gA/tasks/ELnHcEX0S_KGyCRWx0OhUQ/runs/0/logs/public%2Flogs%2Fcertified.log
Dustin, do you know why the logs are AccessDenied in this case? Is this another failure mode where taskcluster-treeherder should not be setting the log URL, so it would more-correctly appear as "doesn't have log"? :-)
Flags: needinfo?(dustin)
Summary: "Log parsing failed. Unable to generate failure summary." for unknown failure → AccessDenied accessing logs for some tasks (shown in Treeherder as "Log parsing failed. Unable to generate failure summary.")
Comment 4•7 years ago
|
||
That's weird, for sure. I see the AccessDenied even from outside EC2, so it's unlikely to do with mirroring. That task was 7 days ago, which is beyond our log retention period, so I can't see what happened on the worker. But at a guess, the worker failed while trying to upload the logs to S3, but after having gotten a signed PUT URL for the queue - hence the existence of the artifact record, but no actual artifact.
Flags: needinfo?(dustin)
| Comment hidden (Intermittent Failures Robot) |
Comment 6•7 years ago
|
||
These instances are from less than 7 days ago - don't suppose you could take a look at the logs?
https://taskcluster-artifacts.net/dxefp1p3Rjeyd6LbUY7QdQ/0/public/logs/live_backing.log
-> https://tools.taskcluster.net/groups/WAqjD7CUSam-d7q4xni-vw/tasks/dxefp1p3Rjeyd6LbUY7QdQ/runs/0
https://taskcluster-artifacts.net/ewYZKJ2mTLqeTNwXMDOrnw/0/public/logs/live_backing.log
-> https://tools.taskcluster.net/groups/PbmEnMS6TrOJyTTvTJEC6Q/tasks/ewYZKJ2mTLqeTNwXMDOrnw/runs/0
https://taskcluster-artifacts.net/X4lSW1DuQ82fSVtQgz5jmA/0/public/logs/live_backing.log
-> https://tools.taskcluster.net/groups/Mkp0y1dhTt-7a43-4Zm0Nw/tasks/X4lSW1DuQ82fSVtQgz5jmA/runs/0
Flags: needinfo?(dustin)
Comment 7•7 years ago
|
||
Taking the last of those, I see
991294539308367890 2018-10-23T07:37:37 2018-10-23T07:37:38Z 42682693 taskcluster-queue 54.82.21.236 Local7 Info app/web.22 Authorized task-client/X4lSW1DuQ82fSVtQgz5jmA/0/on/us-west-1/i-093c03a3941a8a2fd/until/1540280667.147 for POST access to /v1/task/X4lSW1DuQ82fSVtQgz5jmA/runs/0/artifacts/public%2Flogs%2Flive_backing.log
worker requests and is given a signed S3 PUT URL
991294539576807429 2018-10-23T07:37:38 2018-10-23T07:37:38Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294544882602020 2018-10-23T07:37:39 2018-10-23T07:37:39Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":2}
991294545050374169 2018-10-23T07:37:39 2018-10-23T07:37:39Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294553120215041 2018-10-23T07:37:41 2018-10-23T07:37:41Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":3}
991294553292181504 2018-10-23T07:37:41 2018-10-23T07:37:41Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294564310618112 2018-10-23T07:37:44 2018-10-23T07:37:44Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":4}
991294564478390278 2018-10-23T07:37:44 2018-10-23T07:37:44Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294578684497931 2018-10-23T07:37:47 2018-10-23T07:37:47Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":5}
991294578856464389 2018-10-23T07:37:47 2018-10-23T07:37:47Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294598292869127 2018-10-23T07:37:52 2018-10-23T07:37:52Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":6}
991294598460641289 2018-10-23T07:37:52 2018-10-23T07:37:52Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294618421334028 2018-10-23T07:37:57 2018-10-23T07:37:56Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":7}
991294618593300485 2018-10-23T07:37:57 2018-10-23T07:37:56Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294653150171139 2018-10-23T07:38:05 2018-10-23T07:38:05Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":8}
991294653150171141 2018-10-23T07:38:05 2018-10-23T07:38:05Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294656132321286 2018-10-23T07:38:06 2018-10-23T07:38:05Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"reporting efficiency","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-093c03a3941a8a2fd","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","efficiency":98.82547455295736,"uptime":3635,"totalRunTime":3592306}
991294656304287747 2018-10-23T07:38:06 2018-10-23T07:38:05Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-093c03a3941a8a2fd","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","alive":true,"uptime":3635,"interval":30000}
991294693197385729 2018-10-23T07:38:14 2018-10-23T07:38:14Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":9}
991294693365157910 2018-10-23T07:38:14 2018-10-23T07:38:14Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294746733481990 2018-10-23T07:38:27 2018-10-23T07:38:27Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":10}
991294746901254157 2018-10-23T07:38:27 2018-10-23T07:38:27Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
991294781978218501 2018-10-23T07:38:36 2018-10-23T07:38:35Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-093c03a3941a8a2fd","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","alive":true,"uptime":3665,"interval":30000}
991294803486609409 2018-10-23T07:38:41 2018-10-23T07:38:41Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"retrying artifact upload","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log","attemptNumber":11}
991294803654381597 2018-10-23T07:38:41 2018-10-23T07:38:41Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"artifactName":"public/logs/live_backing.log"}
worker retries uploading the file..
991294804405161998 2018-10-23T07:38:41 2018-10-23T07:38:41Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"error killing states","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-093c03a3941a8a2fd","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","error":"Could not kill states properly. Error: Error: Error calling 'killed' for localLiveLog : Error: Could not upload artifact. Status Code: 403\n at Promise.all.then.results (/home/ubuntu/docker_worker/src/lib/states.js:58:15)\n at <anonymous>\n at process._tickDomainCallback (internal/process/next_tick.js:228:7)"}
991294808364584964 2018-10-23T07:38:42 2018-10-23T07:38:42Z 2469960232 docker-worker.aws-provisioner.us-west-1b.ami-a45db0c7.m3-large.i-093c03a3941a8a2fd 13.57.28.215 User Notice docker-worker {"type":"task resolved","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-093c03a3941a8a2fd","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"X4lSW1DuQ82fSVtQgz5jmA","runId":0,"taskState":"failed"}
Worker gives up. In the "Could not kill states" error, we see the status code 403.
So it looks like the PUT URL it got from the queue didn't work. That could potentially be a queue bug, but Queue is basically just calling an AWS API and returning the result, so it's hard to see how that could go wrong at a rate of one in a million.
John, any ideas?
Flags: needinfo?(dustin) → needinfo?(jhford)
Comment 8•7 years ago
|
||
Calling createArtifact doesn't allocate anything in S3, rather it generates a specific signed URL that anyone can post to later to create the artifact. The actual artifact only becomes allocated in S3 when the actual PUT request starts. This is true for both types of S3 uploads.
S3 will also never give 404s for unauthenticated GET requests as it's a security issue to do so. It confirms the presence/absence of a file, so unless you are signing the GET request with the account credentials, you always get a 403 Access Denied, whether the access is denied or the file is not found.
I cannot seem to log into the S3 management console right now to confirm my suspicion, but if I could I'd double check that the artifact is truly not there. I verified that loading a URL through our CDN that's expected not to be there (https://taskcluster-artifacts.net/dxefp1p3Rjeyd6adslkjflasdfjlasdldshlsdaf), the same AccessDenied error is returned.
I think it's possible that the Queue returned an invalid URL or that by the time the URL was used, the signature was invalid, or that S3 was experiencing an outage while these uploads were attempted. Signing is a deterministic action, so I would expect this to be a service issue with S3 or networks before a Queue bug.
Given the lack of information in the logs, like status code of the upload attempt, I can only assume that S3 was returning 500 series errors from the upload attempts. A 400 should stop the upload attempts, since a 400 is not a recoverable error.
Wander, would it be a lot of work to add the status code of the upload attempts as well as logging in the non-public logs (e.g. papertrail) what the actual signed urls given were? Maybe we already log both of these? Can you confirm whether a 400 from S3 will terminate upload attempts?
Component: Operations → Docker-Worker
Flags: needinfo?(jhford) → needinfo?(wcosta)
Comment 10•7 years ago
|
||
Commit pushed to master at https://github.com/taskcluster/docker-worker
https://github.com/taskcluster/docker-worker/commit/b0279dcacde516dda37479b38c6302ab1cdfa2d0
Bug 1496906: Add more logs details to artifact upload
Comment 11•7 years ago
|
||
Now that we've got debugging, let's re-open if we see this again.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
| Assignee | ||
Updated•7 years ago
|
Component: Docker-Worker → Workers
You need to log in
before you can comment on or make changes to this bug.
Description
•