cot_verify intermittent failure: KeyError: 'payload' on task object
Categories
(Release Engineering :: Release Automation, defect)
Tracking
(Not tracked)
People
(Reporter: jlorenzo, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
It occurred twice today:
2020-04-03T04:13:21 INFO - found dependencies: [('signing:l10n:docker-image:parent', 'R_4dO-NQT4aGVyXBSJ0vFw'), ('signing:l10n:docker-image:docker-image', 'FAtibEfnS3u8DWLGzP6QHQ')]
2020-04-03T04:13:21 CRITICAL - Chain of Trust verification error!
Traceback (most recent call last):
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1901, in verify_chain_of_trust
await build_task_dependencies(chain, chain.task, chain.name, chain.task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
await build_link(chain, task_name, task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 581, in build_link
await build_task_dependencies(chain, task_defn, task_name, task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
await build_link(chain, task_name, task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 581, in build_link
await build_task_dependencies(chain, task_defn, task_name, task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
await build_link(chain, task_name, task_id)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 575, in build_link
link.task = task_defn
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 246, in task
self.task_type = guess_task_type(self.name, self.task)
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 389, in guess_task_type
if is_action(task_defn):
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/task.py", line 531, in is_action
if _extract_from_env_in_payload(task, "ACTION_CALLBACK"):
File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/task.py", line 255, in _extract_from_env_in_payload
return task["payload"].get("env", {}).get(key, default)
KeyError: 'payload'
2020-04-03T04:13:21 ERROR - Hit ScriptWorkerException: "'payload'"
https://firefox-ci-tc.services.mozilla.com/tasks/ZEYZxKgeSv6voGQWKoYV4w/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FZEYZxKgeSv6voGQWKoYV4w%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Fchain_of_trust.log#L49
https://firefox-ci-tc.services.mozilla.com/tasks/BsGmrjKqS_2zvNgIU0f8ug/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FBsGmrjKqS_2zvNgIU0f8ug%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Fchain_of_trust.log
| Reporter | ||
Comment 1•6 years ago
|
||
I wonder if queue.task() returned an empty dict[1] instead of raising an error. That might be a bug in the actual queue service and not in the taskcluster library. Does this sound plausible to you, Aki?
| Comment hidden (Intermittent Failures Robot) |
Comment 3•6 years ago
|
||
I think that's plausible. I think it's worth detecting and logging, and potentially retrying, on the scriptworker side. Filing a bug against the queue if it's a queue bug would make sense as well.
Updated•6 years ago
|
Comment 5•6 years ago
|
||
I went looking for the queue.task call from the log snippet above, and I don't see any of the taskIds in that snippet here. Thanks to the Goog for not providing any mechanism but screen-shotting to get that data out. Search URL is
Comment 6•6 years ago
|
||
Oh, those timestamps are UTC-7, lol...
Comment 7•6 years ago
|
||
OK, with the correct timestamps, I see these queue API calls from the scriptworker. All have 200 responses.
Comment 8•6 years ago
|
||
That last /task/ GET, logged at iprepd.
{
"insertId": "wx3kzp20fmeosm6bo",
"jsonPayload": {
"referrer": "",
"request": "GET /api/queue/v1/task/R_4dO-NQT4aGVyXBSJ0vFw HTTP/1.1",
"status": "200",
"remote_addr": "63.245.211.193",
"user_agent": "Python/3.7 aiohttp/4.0.0a0",
"proxy_host": "queue",
"remote_user": "",
"request_time": 0.082,
"bytes_sent": 4361,
"trace": "8d153a1c0eb7b70b817bbd94274ea9f5/2125829794290070311",
"x_forwarded_for": "63.245.211.193, 35.190.5.182",
"x_forwarded_proto": "https"
},
"resource": {
"type": "k8s_container",
"labels": {
"namespace_name": "firefoxcitc-taskcluster",
"location": "us-west1",
"project_id": "moz-fx-taskcluster-prod-4b87",
"pod_name": "iprepd-nginx-firefoxcitc-taskcluster-768bd898f9-z49cw",
"cluster_name": "taskcluster-firefoxcitc-v1",
"container_name": "nginx"
}
},
"timestamp": "2020-04-03T11:13:21.588166397Z",
"severity": "INFO",
"labels": {
"k8s-pod/pod-template-hash": "768bd898f9",
"k8s-pod/app_kubernetes_io/version": "0.1.13",
"k8s-pod/jenkins-build-id": "1",
"k8s-pod/app_kubernetes_io/name": "iprepd-nginx",
"k8s-pod/app_kubernetes_io/instance": "firefoxcitc",
"k8s-pod/app_kubernetes_io/managed-by": "jenkins",
"k8s-pod/app_kubernetes_io/part-of": "iprepd-nginx",
"k8s-pod/fullname": "iprepd-nginx-firefoxcitc-taskcluster",
"k8s-pod/app_kubernetes_io/component": "nginx"
},
"logName": "projects/moz-fx-taskcluster-prod-4b87/logs/stdout",
"receiveTimestamp": "2020-04-03T11:13:23.448443386Z"
}
A little local experimenting shows that the bytes_sent is approximately the bytes in the response, and would have been compressed in this case. For example:
dustin@rubin ~ $ curl -s -H "Accept-Encoding: gzip" https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/R_4dO-NQT4aGVyXBSJ0vFw | wc -c
4348
shows bytes_sent as 4361 in the logs, precisely matching the log entry pasted above. So Taskcluster responded with the full task definition. Something got lost between the HTTP response leaving the load balancer and scriptworker/task.py:255..
Description
•