Closed Bug 1627256 Opened 6 years ago Closed 6 years ago

cot_verify intermittent failure: KeyError: 'payload' on task object

Categories

(Release Engineering :: Release Automation, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1618731

People

(Reporter: jlorenzo, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

It occurred twice today:

2020-04-03T04:13:21     INFO - found dependencies: [('signing:l10n:docker-image:parent', 'R_4dO-NQT4aGVyXBSJ0vFw'), ('signing:l10n:docker-image:docker-image', 'FAtibEfnS3u8DWLGzP6QHQ')]
2020-04-03T04:13:21 CRITICAL - Chain of Trust verification error!
Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1901, in verify_chain_of_trust
    await build_task_dependencies(chain, chain.task, chain.name, chain.task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
    await build_link(chain, task_name, task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 581, in build_link
    await build_task_dependencies(chain, task_defn, task_name, task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
    await build_link(chain, task_name, task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 581, in build_link
    await build_task_dependencies(chain, task_defn, task_name, task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 606, in build_task_dependencies
    await build_link(chain, task_name, task_id)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 575, in build_link
    link.task = task_defn
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 246, in task
    self.task_type = guess_task_type(self.name, self.task)
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 389, in guess_task_type
    if is_action(task_defn):
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/task.py", line 531, in is_action
    if _extract_from_env_in_payload(task, "ACTION_CALLBACK"):
  File "/builds/scriptworker/lib/python3.7/site-packages/scriptworker/task.py", line 255, in _extract_from_env_in_payload
    return task["payload"].get("env", {}).get(key, default)
KeyError: 'payload'
2020-04-03T04:13:21    ERROR - Hit ScriptWorkerException: "'payload'"

https://firefox-ci-tc.services.mozilla.com/tasks/ZEYZxKgeSv6voGQWKoYV4w/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FZEYZxKgeSv6voGQWKoYV4w%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Fchain_of_trust.log#L49
https://firefox-ci-tc.services.mozilla.com/tasks/BsGmrjKqS_2zvNgIU0f8ug/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FBsGmrjKqS_2zvNgIU0f8ug%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Fchain_of_trust.log

I wonder if queue.task() returned an empty dict[1] instead of raising an error. That might be a bug in the actual queue service and not in the taskcluster library. Does this sound plausible to you, Aki?

[1] https://github.com/mozilla-releng/scriptworker/blob/532c2794bd80a443e776621c4019c591c374c91c/src/scriptworker/cot/verify.py#L1930

Flags: needinfo?(aki)

I think that's plausible. I think it's worth detecting and logging, and potentially retrying, on the scriptworker side. Filing a bug against the queue if it's a queue bug would make sense as well.

Flags: needinfo?(aki)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Blocks: 1628088

Oh, those timestamps are UTC-7, lol...

Attached image image.png

OK, with the correct timestamps, I see these queue API calls from the scriptworker. All have 200 responses.

That last /task/ GET, logged at iprepd.

{
  "insertId": "wx3kzp20fmeosm6bo",
  "jsonPayload": {
    "referrer": "",
    "request": "GET /api/queue/v1/task/R_4dO-NQT4aGVyXBSJ0vFw HTTP/1.1",
    "status": "200",
    "remote_addr": "63.245.211.193",
    "user_agent": "Python/3.7 aiohttp/4.0.0a0",
    "proxy_host": "queue",
    "remote_user": "",
    "request_time": 0.082,
    "bytes_sent": 4361,
    "trace": "8d153a1c0eb7b70b817bbd94274ea9f5/2125829794290070311",
    "x_forwarded_for": "63.245.211.193, 35.190.5.182",
    "x_forwarded_proto": "https"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "firefoxcitc-taskcluster",
      "location": "us-west1",
      "project_id": "moz-fx-taskcluster-prod-4b87",
      "pod_name": "iprepd-nginx-firefoxcitc-taskcluster-768bd898f9-z49cw",
      "cluster_name": "taskcluster-firefoxcitc-v1",
      "container_name": "nginx"
    }
  },
  "timestamp": "2020-04-03T11:13:21.588166397Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/pod-template-hash": "768bd898f9",
    "k8s-pod/app_kubernetes_io/version": "0.1.13",
    "k8s-pod/jenkins-build-id": "1",
    "k8s-pod/app_kubernetes_io/name": "iprepd-nginx",
    "k8s-pod/app_kubernetes_io/instance": "firefoxcitc",
    "k8s-pod/app_kubernetes_io/managed-by": "jenkins",
    "k8s-pod/app_kubernetes_io/part-of": "iprepd-nginx",
    "k8s-pod/fullname": "iprepd-nginx-firefoxcitc-taskcluster",
    "k8s-pod/app_kubernetes_io/component": "nginx"
  },
  "logName": "projects/moz-fx-taskcluster-prod-4b87/logs/stdout",
  "receiveTimestamp": "2020-04-03T11:13:23.448443386Z"
}

A little local experimenting shows that the bytes_sent is approximately the bytes in the response, and would have been compressed in this case. For example:

dustin@rubin ~ $ curl -s -H "Accept-Encoding: gzip" https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/R_4dO-NQT4aGVyXBSJ0vFw | wc -c 
4348

shows bytes_sent as 4361 in the logs, precisely matching the log entry pasted above. So Taskcluster responded with the full task definition. Something got lost between the HTTP response leaving the load balancer and scriptworker/task.py:255..

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: