Closed Bug 1595838 Opened 5 years ago Closed 5 years ago

blob artifacts cause 500 error in completeArtifact in firefox-ci deployment

Categories

(Taskcluster :: Services, defect)


RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

(and probably the community deployment too)

What we're seeing is

2019-11-12 16:34:44.444260 [ERROR   ] code_review_bot.cli: Static analysis failure (revision=PHID-DIFF-26zmu6ynjyeg5qheaw7b error=TaskclusterAuthFailure('ext.certificate.expiry < now\n\n---\n\n* method:     completeArtifact\n* errorCode:  AuthenticationFailed\n* statusCode: 401\n* time:       2019-11-12T16:34:44.495Z'))
2019-11-12 16:34:45.880340 [INFO    ] code_review_bot.revisions: Updated HarborMaster status (state=<BuildState.Fail: 'fail'>)
Traceback (most recent call last):
  File "/usr/local/bin/code-review-bot", line 11, in <module>
    load_entry_point('code-review-bot==1.0.5', 'console_scripts', 'code-review-bot')()
  File "/usr/local/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/site-packages/code_review_bot-1.0.5-py3.8.egg/code_review_bot/cli.py", line 128, in main
    w.run(revision)
  File "/usr/local/lib/python3.8/site-packages/code_review_bot-1.0.5-py3.8.egg/code_review_bot/workflow.py", line 93, in run
    self.publish(revision, issues)
  File "/usr/local/lib/python3.8/site-packages/code_review_bot-1.0.5-py3.8.egg/code_review_bot/workflow.py", line 107, in publish
    patch.publish(self.queue_service)
  File "/usr/local/lib/python3.8/site-packages/code_review_bot-1.0.5-py3.8.egg/code_review_bot/revisions.py", line 59, in publish
    self.url = create_blob_artifact(
  File "/usr/local/lib/python3.8/site-packages/code_review_tools-0.1.0-py3.8.egg/code_review_tools/taskcluster.py", line 183, in create_blob_artifact
    queue_service.completeArtifact(
  File "/usr/local/lib/python3.8/site-packages/taskcluster-21.0.0-py3.8.egg/taskcluster/generated/queue.py", line 450, in completeArtifact
    return self._makeApiCall(self.funcinfo["completeArtifact"], *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/taskcluster-21.0.0-py3.8.egg/taskcluster/client.py", line 271, in _makeApiCall
    response = self._makeHttpRequest(entry['method'], _route, payload)
  File "/usr/local/lib/python3.8/site-packages/taskcluster-21.0.0-py3.8.egg/taskcluster/client.py", line 532, in _makeHttpRequest
    raise exceptions.TaskclusterAuthFailure(
taskcluster.exceptions.TaskclusterAuthFailure: ext.certificate.expiry < now

The error message there is from bug 1595843, but it suggests that the proxy was getting 500s. I can find the 500s in the logs, but without any incidentIds and without any associated error logging (bug 1580803).
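For context, the `ext.certificate.expiry < now` in the log is the auth failure tracked in bug 1595843: the temporary credentials' certificate had already expired by the time the call was made. A minimal sketch of that check, assuming the millisecond-epoch `expiry` field that taskcluster temporary credentials carry (illustrative, not the auth service's actual code):

```python
import time

def certificate_expired(certificate, now_ms=None):
    """Return True when a temporary-credentials certificate has expired.

    Taskcluster temporary credentials embed a certificate whose `expiry`
    is a milliseconds-since-epoch timestamp; the auth service rejects any
    request whose certificate expiry is in the past, producing the 401
    ("ext.certificate.expiry < now") seen in the log above.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return certificate["expiry"] < now_ms
```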

A possible culprit:

services/queue/src/artifacts.js

      if (headRes.statusCode >= 300 || headRes.statusCode < 200) {
        return res.reportError('InternalServerError', [
          `When attempting to do a HEAD request for the uploaded artifact ${url}`,
          `a status code of ${headRes.statusCode} was returned.`,
        ].join(' '), {});
      }

That would cause the service to return a 500 to the client, but not log anything on the service side. It should probably be a 400 error, since a failed HEAD of the uploaded object points at the caller's upload rather than at the queue.

In my dev environment, my blob configuration is even less correct, so I get a 500 calling createArtifact, and that is properly logged with an exception object and matching incidentIds.

Regarding support for blob artifacts in general -- this appears to be the first broken thing. I don't think they are correctly configured in firefox-ci either. So that's a nice data point for bug 1577785.

(and the original complaint here has been rectified by switching to S3 artifacts).
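For contrast, the S3 path the bot switched to is a shorter handshake: `createArtifact` with `storageType: "s3"` returns a signed `putUrl`, the content is PUT once, and there is no `completeArtifact` step to fail. A hedged sketch of the request payload (field names as I recall the queue API; treat as illustrative):

```python
import datetime

TASKCLUSTER_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def s3_artifact_payload(content_type, ttl):
    """Build a createArtifact payload for an S3 artifact.

    Unlike the blob payload used below, no contentSha256/contentLength
    is required, and the response carries a single signed putUrl.
    """
    return {
        "storageType": "s3",
        "expires": (datetime.datetime.utcnow() + ttl).strftime(
            TASKCLUSTER_DATE_FORMAT
        ),
        "contentType": content_type,
    }
```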

Confirmed with a Python client, in staging:

Traceback (most recent call last):
  File "x.py", line 98, in <module>
    main()
  File "x.py", line 96, in main
    create_blob_artifact(queue_service, taskId, 0, 'public/README.md', open('README.md', 'rb').read(), 'text/plain', datetime.timedelta(hours=1))
  File "x.py", line 77, in create_blob_artifact
    task_id, run_id, path, {"etags": [push.headers["ETag"]]}
  File "/home/dustin/p/taskcluster/clients/client-py/taskcluster/generated/queue.py", line 450, in completeArtifact
    return self._makeApiCall(self.funcinfo["completeArtifact"], *args, **kwargs)
  File "/home/dustin/p/taskcluster/clients/client-py/taskcluster/client.py", line 271, in _makeApiCall
    response = self._makeHttpRequest(entry['method'], _route, payload)
  File "/home/dustin/p/taskcluster/clients/client-py/taskcluster/client.py", line 543, in _makeHttpRequest
    superExc=None
taskcluster.exceptions.TaskclusterRestFailure: When attempting to do a HEAD request for the uploaded artifact https://cloudopsstage-public-blobs.s3.amazonaws.com/fEnr--VzQHeEJNnQ18kdww/0/public/README.md a status code of 403 was returned.

Here's x.py, fwiw:

import datetime
import hashlib
import taskcluster
import requests
import json

TASK = json.loads('''{
    "provisionerId": "null-provider",
    "workerType": "testing",
    "schedulerId": "-",
    "dependencies": [],
    "requires": "all-completed",
    "routes": [],
    "priority": "lowest",
    "retries": 5,
    "created": "2019-11-12T18:53:47.611Z",
    "deadline": "2019-11-12T21:53:47.611Z",
    "expires": "2020-11-12T21:53:47.611Z",
    "scopes": [],
    "payload": {
        "image": "ubuntu:13.10",
        "command": [
            "/bin/bash",
            "-c",
            "for ((i=1;i<=600;i++)); do echo $i; sleep 1; done"
        ],
        "maxRunTime": 630
    },
    "metadata": {
        "name": "Example Task",
        "description": "Markdown description of **what** this task does",
        "owner": "name@example.com",
        "source": "https://dustin.taskcluster-dev.net/tasks/create"
    },
    "tags": {},
    "extra": {}
}''')

TASKCLUSTER_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def create_blob_artifact(
    queue_service, task_id, run_id, path, content, content_type, ttl
):
    """
    Manually create and upload a blob artifact to use a specific content type
    """
    assert isinstance(content, bytes)
    assert isinstance(ttl, datetime.timedelta)

    # Create artifact on Taskcluster
    sha256 = hashlib.sha256(content).hexdigest()
    resp = queue_service.createArtifact(
        task_id,
        run_id,
        path,
        {
            "storageType": "blob",
            "expires": (datetime.datetime.utcnow() + ttl).strftime(
                TASKCLUSTER_DATE_FORMAT
            ),
            "contentType": content_type,
            "contentSha256": sha256,
            "contentLength": len(content),
        },
    )
    assert resp["storageType"] == "blob", "Not a blob storage"
    assert len(resp["requests"]) == 1, "Should only get one request"
    request = resp["requests"][0]
    assert request["method"] == "PUT", "Should get a PUT request"

    # Push the artifact on storage service
    push = requests.put(url=request["url"], headers=request["headers"], data=content)
    push.raise_for_status()

    # Mark artifact as completed
    queue_service.completeArtifact(
        task_id, run_id, path, {"etags": [push.headers["ETag"]]}
    )

    # Build the absolute url
    return f"https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/{task_id}/runs/{run_id}/artifacts/{path}"

def main():
    TASK['created'] = taskcluster.fromNow('0 seconds')
    TASK['deadline'] = taskcluster.fromNow('120 seconds')
    TASK['expires'] = taskcluster.fromNow('3 hours')
    options = taskcluster.optionsFromEnvironment()
    options['maxRetries'] = 0
    queue_service = taskcluster.Queue(options)

    taskId = taskcluster.slugid.nice()
    queue_service.createTask(taskId, TASK)

    queue_service.claimTask(taskId, 0, {'workerId': 'me', 'workerGroup': 'us'})

    create_blob_artifact(queue_service, taskId, 0, 'public/README.md', open('README.md', 'rb').read(), 'text/plain', datetime.timedelta(hours=1))

if __name__ == '__main__':
    main()

(note that this requires a bunch of scopes -- I ran it with root scopes, which is why I only did it on staging and my dev env)

This code was subsequently deleted, but hey, it was fun.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED