Generic worker live log artifacts are unreachable after task completes

RESOLVED FIXED

Status

RESOLVED FIXED
2 years ago
2 months ago

People

(Reporter: ekyle, Assigned: pmoore)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Looking at artifacts [1], the URLs [2] get forwarded to bad "servers". Here is what the error looks like on my side:
 
> ERROR: HTTPSConnectionPool(
>     host='gq275tyaaaavkecitvprfvkbx4nfmuwsh4pn2xddtjjw4aon.taskcluster-worker.net', 
>     port=60023
> ): Max retries exceeded with url: /log/A2IV1T3zRQKo9ykszrRrNQ (
>     Caused by NewConnectionError(
>         '<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f4b84adfc90>: 
>         Failed to establish a new connection: [Errno -5] No address associated with hostname',
>     )
>  )

My sample for the past hour seems to indicate that only the "command_000000.log.live" is affected.

[1] https://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts
[2] http://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts/public/logs/command_000000.log.live
These are NSS tasks - Tim/Pete -- any idea about what's going on here?
Component: General → Task Configuration
(Assignee)

Comment 2

2 years ago
These are livelog artifacts which connect you to the worker while the job is running. Once the task has completed, the worker stops serving the livelogs, and the URLs no longer work.

I'll check with the team how these get fixed in docker worker...
(Reporter)

Comment 3

2 years ago
Maybe there is no problem:  The expiry shows them as expired.

> "expires": "2016-06-02T08:44:02.271Z"
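
A quick way to check this for any artifact entry is to compare its "expires" value against the current time. This is only a sketch: the helper name is_expired and the fixed "now" value are made up for illustration, and the timestamp format is assumed from the value quoted above.

```python
from datetime import datetime, timezone


def is_expired(expires: str, now: datetime) -> bool:
    """Return True if the 'expires' timestamp is at or before `now`."""
    # Assumed format, taken from the value quoted in this comment:
    # 2016-06-02T08:44:02.271Z (UTC, millisecond precision).
    expiry = datetime.strptime(expires, "%Y-%m-%dT%H:%M:%S.%fZ")
    return expiry.replace(tzinfo=timezone.utc) <= now


# Hypothetical "current time" one day after the quoted expiry.
now = datetime(2016, 6, 3, tzinfo=timezone.utc)
print(is_expired("2016-06-02T08:44:02.271Z", now))
```

In real use you would pass datetime.now(timezone.utc) instead of a fixed time.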

Comment 4

2 years ago
docker-worker streams the task output to two places: the live log service running alongside the task, and a "backing" log, which is a temp file on disk.

When the task first starts, the live log artifact that's created is a redirect artifact that redirects to the live log endpoint.

Upon completion of the task, the "backing" log artifact is created from the temp file that holds the task output, and the live log redirect artifact is then re-pointed at that "backing" artifact, so in the end both artifacts point to the same underlying file. Redirect artifacts can be recreated to point to a different URL as long as the expiration, content type, and artifact path are the same. This is how the live log redirect artifact can be pointed to the backing log URL after the task completes [1].

[1] https://github.com/taskcluster/docker-worker/blob/master/lib/features/local_live_log.js#L207
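
To make the re-pointing rule concrete, here is a toy in-memory model of it. It is not the queue's real API: ArtifactStore, RedirectArtifact, the artifact names, the example.invalid URLs, and the "text/plain" content type are all assumptions made for illustration. The only behaviour taken from the description above is that a redirect artifact may be re-created with a new URL when its path, content type, and expiration are unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RedirectArtifact:
    name: str          # artifact path, used as the key
    url: str           # where the redirect currently points
    content_type: str
    expires: str


class ArtifactStore:
    """Toy stand-in for the queue's artifact records (hypothetical)."""

    def __init__(self):
        self._artifacts = {}

    def create_redirect(self, artifact: RedirectArtifact) -> None:
        existing = self._artifacts.get(artifact.name)
        # Re-creating is allowed only if nothing but the URL changes.
        if existing is not None and (
            existing.content_type != artifact.content_type
            or existing.expires != artifact.expires
        ):
            raise ValueError("only the URL of a redirect artifact may change")
        self._artifacts[artifact.name] = artifact

    def resolve(self, name: str) -> str:
        return self._artifacts[name].url


store = ArtifactStore()
live = RedirectArtifact(
    name="public/logs/live.log",                      # illustrative name
    url="https://worker.example.invalid/log/abc",     # hypothetical live endpoint
    content_type="text/plain",
    expires="2017-06-02T08:44:02.271Z",
)
store.create_redirect(live)

# Task completed: same path, content type and expiry, new target URL.
backing_url = "https://queue.example.invalid/public/logs/live_backing.log"
store.create_redirect(replace(live, url=backing_url))
print(store.resolve("public/logs/live.log"))
```

Keying the store on the artifact path means the "same path" condition holds automatically, so only content type and expiry need an explicit check.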
(Assignee)

Comment 5

2 years ago
Awesome, I'll do the same in generic worker then!

Thanks Greg.

I think it might be more intuitive if a redirect artifact could be replaced by an s3 artifact, so that we'd only have one artifact for the log, and it would be mostly transparent to the user whether it was a redirect to the worker or a download from s3. At the moment, having two artifacts with slightly different names that point to the same thing seems confusing. Previously I thought the opposition to this idea was that artifacts should be immutable, but given that redirect artifacts can be replaced with different redirect artifacts, maybe we can do it after all.

What are your thoughts, Greg and Jonas?
Flags: needinfo?(jopsen)
Flags: needinfo?(garndt)

Comment 6

2 years ago
This was part of the motivation for the discussion of publish/unpublish: when the live artifact is no longer relevant, it could be unpublished, and the new artifact for the backing log file uploaded and published.
Flags: needinfo?(garndt)
(Assignee)

Updated

2 years ago
Summary: Lots of bad artifact URLs → Generic worker live log artifacts are unreachable after task completes
(Assignee)

Comment 7

2 years ago
Created attachment 8759785
Github Pull Request for generic-worker

This should mirror docker worker behaviour, so that after a live log is no longer available, the livelog artifact is replaced by an alternative redirect artifact which points to the underlying log file.

The only difference is that we have one livelog per command in the Generic Worker, rather than just a single live log per task.
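
As a rough sketch of what "one livelog per command" means for the redirects: each command gets its own live log artifact, and each one is re-pointed when that command finishes. The live log naming below follows the command_000000.log.live name seen in the description. The backing log name (the same path without the ".live" suffix) and the function names are assumptions for illustration, not necessarily what the pull request does.

```python
def live_log_name(command_index: int) -> str:
    """Live log artifact name for the given command (zero-padded to 6 digits)."""
    return f"public/logs/command_{command_index:06d}.log.live"


def backing_log_name(live_name: str) -> str:
    """Assumed backing log name: the live log path without '.live'."""
    suffix = ".live"
    if not live_name.endswith(suffix):
        raise ValueError(f"not a live log artifact: {live_name}")
    return live_name[: -len(suffix)]


def redirects_after_completion(command_count: int) -> dict:
    """Map every live log artifact to the backing log it should point to."""
    return {
        live_log_name(i): backing_log_name(live_log_name(i))
        for i in range(command_count)
    }


for live, backing in redirects_after_completion(2).items():
    print(live, "->", backing)
```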
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8759785 - Flags: review?(garndt)

Updated

2 years ago
Attachment #8759785 - Flags: review?(garndt) → review+

Comment 8

2 years ago
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/7e8d86367fe35294267c47996eb14064b481c275
Bug 1277568: redirect livelog artifact to underlying log when command completes and livelog is no longer available

https://github.com/taskcluster/generic-worker/commit/29808f82315d3a831565ce3cb1c18ab1e9175193
Merge pull request #8 from taskcluster/bug1277568

Bug 1277568: redirect livelog artifact to underlying log when task command completes
(Reporter)

Comment 10

2 years ago
I think it is perfectly acceptable to set the "expires" property, rather than redirect.
(Assignee)

Comment 11

2 years ago
(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> I believe it may be worse now:
> 
> http://queue.taskcluster.net/v1/task/ZQxYgHVQQDiUx-AuL4RTvA/artifacts/public/
> logs/command_000000.log.live
> 
> ERROR: Exceeded 30 redirects.

Hi Kyle,

That livelog redirect was from an integration test run before this bug had been resolved. It shouldn't redirect like that now. I'll be landing a new release today and we can confirm it functions the same as docker worker. Agreed, the redirect you've highlighted is a bad one, but the code that generated it has been fixed.
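
For anyone wanting to sanity-check redirect chains offline, a small sketch follows. It walks a plain dict of redirects and fails on a loop or on too many hops. The cause of the bad redirect isn't spelled out here, so the self-referencing example is only one plausible shape of it; the function name and the dict representation are made up for illustration, and the default of 30 hops simply matches the limit in the quoted error.

```python
def follow_redirects(redirects: dict, start: str, max_hops: int = 30) -> str:
    """Follow `redirects` from `start` and return the final target.

    Raises RuntimeError on a redirect loop or after more than `max_hops` hops.
    """
    visited = [start]
    current = start
    while current in redirects:
        if len(visited) > max_hops:
            raise RuntimeError(f"exceeded {max_hops} redirects")
        current = redirects[current]
        if current in visited:
            chain = " -> ".join(visited + [current])
            raise RuntimeError(f"redirect loop: {chain}")
        visited.append(current)
    return current


# Healthy case: the live log points at the backing log (names illustrative).
healthy = {"command_000000.log.live": "command_000000.log"}
print(follow_redirects(healthy, "command_000000.log.live"))

# Hypothetical bad case: the live log redirect points back at itself.
broken = {"command_000000.log.live": "command_000000.log.live"}
try:
    follow_redirects(broken, "command_000000.log.live")
except RuntimeError as err:
    print(err)
```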
(Assignee)

Comment 12

2 years ago
New AMIs have been created for us-west-1, us-west-2 and us-east-1, and worker types win2012r2 and ttaubert-win2012r2 have been updated. This change is now live.

See https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dc87f618c6f440745c65fdaa781fee1fbceb83d as an example.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
I think we briefly discussed this on IRC. But it might be a thing for London too.
Flags: needinfo?(jopsen)

Updated

7 months ago
Product: TaskCluster → Firefox Build System