Closed Bug 1444362 Opened 8 years ago Closed 5 years ago

Report papertrail url of latest system log line in task log header

Categories

(Taskcluster :: Workers, enhancement, P5)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pmoore, Unassigned)

Details

When trying to find the system logs associated with a given task, we currently have to perform quite expensive and inefficient papertrail queries, e.g. searching for taskIds across all logging systems, or across large worker type pools. If we can find a way to get a url that points to the current latest log line of the worker's own system logs, it will be much easier to immediately pull up the worker system logs, at the correct location, for a given task. It probably makes sense to do this just once, at the start of the task. We could optionally also do it when an error line is added to the task log (but probably not for every error line, perhaps just the first).
Priority: -- → P5
The construction of the papertrail url turns out to be pretty trivial. For example, looking at https://tools.taskcluster.net/groups/RuJmle5CTwm8b8tHQQ-kzA/tasks/Ns-vM-8QR3Sqs0bbXSevLA/runs/0/logs/public%2Flogs%2Flive.log we see:

[taskcluster 2018-10-16T09:35:01.770Z] Worker Type (gecko-3-b-win2012) settings:
[taskcluster 2018-10-16T09:35:01.770Z] {
[taskcluster 2018-10-16T09:35:01.770Z]   "aws": {
[taskcluster 2018-10-16T09:35:01.770Z]     "ami-id": "ami-07486aea011cdb861",
[taskcluster 2018-10-16T09:35:01.770Z]     "availability-zone": "us-west-2c",
[taskcluster 2018-10-16T09:35:01.770Z]     "instance-id": "i-0adf7b07aa71c30f9",
[taskcluster 2018-10-16T09:35:01.770Z]     "instance-type": "c4.4xlarge",
[taskcluster 2018-10-16T09:35:01.770Z]     "local-ipv4": "10.144.57.90",
[taskcluster 2018-10-16T09:35:01.770Z]     "public-hostname": "ec2-34-217-123-25.us-west-2.compute.amazonaws.com",
[taskcluster 2018-10-16T09:35:01.770Z]     "public-ipv4": "34.217.123.25"
[taskcluster 2018-10-16T09:35:01.770Z]   },
[taskcluster 2018-10-16T09:35:01.770Z]   "config": {
[taskcluster 2018-10-16T09:35:01.770Z]     "deploymentId": "4e4f36e00a6f",
[taskcluster 2018-10-16T09:35:01.770Z]     "runTasksAsCurrentUser": false
[taskcluster 2018-10-16T09:35:01.770Z]   },
[taskcluster 2018-10-16T09:35:01.770Z]   "generic-worker": {
[taskcluster 2018-10-16T09:35:01.770Z]     "go-arch": "amd64",
[taskcluster 2018-10-16T09:35:01.770Z]     "go-os": "windows",
[taskcluster 2018-10-16T09:35:01.770Z]     "go-version": "go1.10.3",
[taskcluster 2018-10-16T09:35:01.770Z]     "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.11.2",
[taskcluster 2018-10-16T09:35:01.770Z]     "revision": "4b47d5d9f45eabc283947da0a6b1c02eddd0a7fe",
[taskcluster 2018-10-16T09:35:01.770Z]     "source": "https://github.com/taskcluster/generic-worker/commits/4b47d5d9f45eabc283947da0a6b1c02eddd0a7fe",
[taskcluster 2018-10-16T09:35:01.770Z]     "version": "10.11.2"
[taskcluster 2018-10-16T09:35:01.770Z]   },
[taskcluster 2018-10-16T09:35:01.770Z]   "machine-setup": {
[taskcluster 2018-10-16T09:35:01.770Z]     "ami-created": "2018-10-15 16:22:48.555Z",
[taskcluster 2018-10-16T09:35:01.770Z]     "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/4e4f36e00a6f6b427a43b5ed8c426e2dc9389da5/userdata/Manifest/gecko-3-b-win2012.json"
[taskcluster 2018-10-16T09:35:01.770Z]   }
[taskcluster 2018-10-16T09:35:01.770Z] }
[taskcluster 2018-10-16T09:35:01.770Z] Task ID: Ns-vM-8QR3Sqs0bbXSevLA
[taskcluster 2018-10-16T09:35:01.770Z] === Task Starting ===

From here we can deduce that the task started at 2018-10-16T09:35:01.770Z, which rounded down to the nearest second is 1539682501 in unix-seconds-since-epoch format. The papertrail system name is i-0adf7b07aa71c30f9.gecko-3-b-win2012.usw2.mozilla.com. On this system, the worker logs as program "generic-worker". Putting this information together we can build the URL to the server logs:

> https://papertrailapp.com/systems/i-0adf7b07aa71c30f9.gecko-3-b-win2012.usw2.mozilla.com/events?q=program%3Ageneric-worker&time=1539682501

Obtaining these three pieces of data on the worker is relatively straightforward: the timestamp can be determined by querying the current time, the papertrail system name is configured in the nxlog configuration as the hostname, and the program name for the generic-worker log provider is statically configured in the nxlog configuration.

So the hard part here is gathering this information at the moment the task starts, rather than when the task user is created, which can be a long time before a task is claimed. Once we've constructed the URL, we also need to be able to write it to the task log.

The only obvious way I can think of to implement this at the moment is to allow a callback to be configured in the worker, which the worker calls each time a task starts. This callback could then return information that should be added to the log header. An alternative is to implement this in a script which runs as the very first step of the task.
Note that not all workers log to papertrail, so this should be implemented by the host configuration tool rather than in the worker codebase. However, the worker may need to provide hooks so that the host can inject this information into the task logs.
Component: Generic-Worker → Workers

Note that there is a new tool that somewhat removes the need for this, although this may still be useful at some point.

But for now, see:

QA Whiteboard: [lang=go]
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX