Closed Bug 1277568 Opened 8 years ago Closed 8 years ago

Generic worker live log artifacts are unreachable after task completes

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ekyle, Assigned: pmoore)

Details

Attachments

(1 file)

Looking at artifacts [1], the URLs [2] get forwarded to bad "servers".  Here is what the error looks like on my side:
 
> ERROR: HTTPSConnectionPool(
>     host='gq275tyaaaavkecitvprfvkbx4nfmuwsh4pn2xddtjjw4aon.taskcluster-worker.net', 
>     port=60023
> ): Max retries exceeded with url: /log/A2IV1T3zRQKo9ykszrRrNQ (
>     Caused by NewConnectionError(
>         '<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f4b84adfc90>: 
>         Failed to establish a new connection: [Errno -5] No address associated with hostname',
>     )
>  )

My sample for the past hour seems to indicate that only the "command_000000.log.live" is affected.

[1] https://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts
[2] http://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts/public/logs/command_000000.log.live
These are NSS tasks - Tim/Pete -- any idea about what's going on here?
Component: General → Task Configuration
These are livelog artifacts which connect you to the worker while the job is running. Once the task has completed, the worker stops serving the livelogs, and the URLs no longer work.

I'll check with the team how these get fixed in docker worker...
Maybe there is no problem:  The expiry shows them as expired.

> "expires": "2016-06-02T08:44:02.271Z"
docker-worker streams the task output to two places: the live log service running alongside the task, and a "backing" log that is a temp file on disk.

When the task first starts, the live log artifact that's created is a redirect artifact that redirects to the live log endpoint.

Upon completion of the task, the "backing" log artifact is created from the temp file that holds the task output, and the live log redirect artifact is then re-pointed at that "backing" artifact, so in the end both artifacts point to the same underlying file.  Redirect artifacts can be recreated to point to a different URL as long as the expiration, content type, and artifact path are the same.  This is how the live log redirect artifact can be pointed at the backing log URL after the task completes [1].

[1] https://github.com/taskcluster/docker-worker/blob/master/lib/features/local_live_log.js#L207
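
For illustration, here is a minimal in-memory sketch of the re-pointing rule described above. It is not the queue's or docker-worker's actual code: `ToyQueue`, `RedirectArtifact`, and the `example.invalid` URLs are hypothetical names invented for this sketch. The only behaviour taken from the comment is that a redirect artifact may be recreated with a new URL when its expiration, content type, and artifact path are unchanged.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RedirectArtifact:
    """Hypothetical model of a redirect artifact (not the real queue schema)."""
    path: str          # artifact path, e.g. "public/logs/live.log"
    url: str           # where the redirect currently points
    content_type: str
    expires: str       # ISO 8601 timestamp, kept as a plain string here


class ToyQueue:
    """Toy stand-in for the queue; stores redirect artifacts by path."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, RedirectArtifact] = {}

    def create_redirect(self, artifact: RedirectArtifact) -> None:
        existing = self._artifacts.get(artifact.path)
        # Recreating is allowed only if everything except the URL matches.
        if existing is not None and (
            existing.content_type != artifact.content_type
            or existing.expires != artifact.expires
        ):
            raise ValueError("a redirect artifact may only change its URL")
        self._artifacts[artifact.path] = artifact

    def resolve(self, path: str) -> str:
        return self._artifacts[path].url


queue = ToyQueue()
expires = "2017-06-02T08:44:02.271Z"  # made-up value for the sketch

# While the task runs: the live log artifact redirects to the worker.
queue.create_redirect(RedirectArtifact(
    "public/logs/live.log", "https://worker.example.invalid/log/abc",
    "text/plain", expires))

# After the task completes: same path, content type, and expiry; new URL.
queue.create_redirect(RedirectArtifact(
    "public/logs/live.log", "https://queue.example.invalid/backing.log",
    "text/plain", expires))
```

After the second call, `queue.resolve("public/logs/live.log")` returns the backing log URL, so the live log name keeps working once the worker stops serving it.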
Awesome, I'll do the same in generic worker then!

Thanks Greg.

I think it might be more intuitive if a redirect artifact could be replaced by an s3 artifact, so that we'd have only one artifact for the log, and it would be mostly transparent to the user whether it was a redirect to the worker or a download from s3. At the moment, having two artifacts with slightly different names that point to the same thing seems confusing. Previously I thought the opposition to this idea was that artifacts should be immutable, but given that redirect artifacts can be replaced with different redirect artifacts, maybe we can do it after all.

What are your thoughts, Greg and Jonas?
Flags: needinfo?(jopsen)
Flags: needinfo?(garndt)
This was part of the motivation for the publish/unpublish discussion: when the live artifact is no longer relevant, it could be unpublished, and the new artifact for the backing log file uploaded and published.
Flags: needinfo?(garndt)
Summary: Lots of bad artifact URLs → Generic worker live log artifacts are unreachable after task completes
This should mirror docker worker behaviour, so that after a live log is no longer available, the livelog artifact is replaced by an alternative redirect artifact which points to the underlying log file.

The only difference is that we have one livelog per command in the Generic Worker, rather than just a single live log per task.
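
A rough sketch of what "one livelog per command" could look like when the redirects are re-pointed. The live log name `public/logs/command_000000.log.live` and the queue URL layout come from the links earlier in this bug; the backing log name (the same name without the `.live` suffix) and the helper functions are assumptions made for this sketch, not generic-worker's actual code.

```python
from typing import Dict

QUEUE_BASE = "https://queue.taskcluster.net/v1"


def live_log_name(command_index: int) -> str:
    """Live log artifact name for a command, e.g. command_000000.log.live."""
    return f"public/logs/command_{command_index:06d}.log.live"


def backing_log_name(command_index: int) -> str:
    # ASSUMPTION: the thread does not state the backing log's name;
    # dropping the ".live" suffix is a guess made for illustration.
    return f"public/logs/command_{command_index:06d}.log"


def final_redirects(task_id: str, num_commands: int) -> Dict[str, str]:
    """Map each live log artifact to the URL it would redirect to once
    its command has completed (the corresponding backing log artifact)."""
    return {
        live_log_name(i):
            f"{QUEUE_BASE}/task/{task_id}/artifacts/{backing_log_name(i)}"
        for i in range(num_commands)
    }
```

For a task with two commands, `final_redirects(task_id, 2)` yields one entry for `command_000000.log.live` and one for `command_000001.log.live`, each pointing at its own backing log.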
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8759785 - Flags: review?(garndt)
Attachment #8759785 - Flags: review?(garndt) → review+
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/7e8d86367fe35294267c47996eb14064b481c275
Bug 1277568: redirect livelog artifact to underlying log when command completes and livelog is no longer available

https://github.com/taskcluster/generic-worker/commit/29808f82315d3a831565ce3cb1c18ab1e9175193
Merge pull request #8 from taskcluster/bug1277568

Bug 1277568: redirect livelog artifact to underlying log when task command completes
I think it is perfectly acceptable to set the "expires" property, rather than redirect.
(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> I believe it may be worse now:
> 
> http://queue.taskcluster.net/v1/task/ZQxYgHVQQDiUx-AuL4RTvA/artifacts/public/logs/command_000000.log.live
> 
> ERROR: Exceeded 30 redirects.

Hi Kyle,

That livelog redirect was from an integration test run before this bug had been resolved. It shouldn't redirect like that now. I'll be landing a new release today and we can confirm it functions the same as docker worker. Agreed, the redirect you've highlighted is a bad one, but the code that generated it has been fixed.
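
For context on the quoted error: HTTP clients typically cap the number of redirects they will follow, and "Exceeded 30 redirects." is what a client with a cap of 30 reports when a chain never terminates. The sketch below models that with a plain dictionary. The thread does not say where the bad redirect actually pointed, so the self-referencing entry is an assumption, and `follow` and `TooManyRedirects` are names defined only for this sketch.

```python
from typing import Mapping


class TooManyRedirects(Exception):
    """Raised when a redirect chain exceeds the allowed number of hops."""


def follow(redirects: Mapping[str, str], url: str,
           max_redirects: int = 30) -> str:
    """Follow `url` through `redirects` until a non-redirecting URL is
    reached, giving up after `max_redirects` hops."""
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            raise TooManyRedirects(f"Exceeded {max_redirects} redirects.")
        url = redirects[url]
        hops += 1
    return url


# Healthy case: the live log name redirects once to the backing log.
healthy = {"command_000000.log.live": "command_000000.log"}
final_url = follow(healthy, "command_000000.log.live")

# ASSUMED failure mode: a redirect that points back at itself never resolves.
looping = {"command_000000.log.live": "command_000000.log.live"}
try:
    follow(looping, "command_000000.log.live")
    loop_error = None
except TooManyRedirects as exc:
    loop_error = str(exc)
```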
New AMIs have been created for us-west-1, us-west-2, and us-east-1, and worker types win2012r2 and ttaubert-win2012r2 have been updated. This change is now live.

See https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dc87f618c6f440745c65fdaa781fee1fbceb83d as an example.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I think we briefly discussed this on IRC. But it might be a thing for London too.
Flags: needinfo?(jopsen)
Product: TaskCluster → Firefox Build System