docker-worker has funny hostname in logs in GCP
Categories: Taskcluster :: Workers, defect
Tracking: Not tracked
People: Reporter: dustin, Assigned: dustin
Details
Jul 30 21:50:52 docker-worker.aws-provisioner.<!DOCTYPE kernel: [ 49.015051] init: docker main process ended,
Comment 1•6 years ago (Assignee)
Based on Brian's work, it looks like deploy/template/var/lib/cloud/scripts/per-boot/init.sh changes the logging hostname and restarts the logger. That means the syslog for a worker is split between the logs from before that point and those after it, which explains why the logs don't appear in papertrail when worker startup fails. In fact, searching for ip-10-144-55-254 finds the first half of the logs for i-05a0eea0dd561896a.
On the other hand, after the fact, it's going to be very difficult to determine the ec2 hostname for the host (since the IP is lost), whereas the instance-id is pretty easy to find (since it matches the workerId).
In general, though, we don't know that instance IDs will match workerIds. So I think the "real" fix for this, to enable finding logs after the fact, is to do one of the following:
- tc-worker-runner logs workerGroup / workerId on startup; or
- workers log hostname at the beginning of a task
and then to stop resetting the hostname during startup.
In the interim, I'll stop resetting the hostname in GCP, but leave the hostnames as they are in AWS until one or both of the above are complete.
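To illustrate the first option, a worker could emit one line at startup that ties its identity to whatever hostname syslog is currently reporting. This is only a minimal Node.js sketch, not the actual tc-worker-runner code; `formatIdentityLine`, the log format, and the `example-group` workerGroup are hypothetical, while the workerId and hostname are the ones mentioned above.

```javascript
const os = require('os');

// Hypothetical helper: builds one searchable log line linking the worker's
// identity to the hostname that syslog currently reports for this host.
function formatIdentityLine({workerGroup, workerId, hostname = os.hostname()}) {
  return `worker identity: workerGroup=${workerGroup} workerId=${workerId} hostname=${hostname}`;
}

// Example call; workerGroup is made up for illustration.
console.log(formatIdentityLine({
  workerGroup: 'example-group',
  workerId: 'i-05a0eea0dd561896a',
  hostname: 'ip-10-144-55-254',
}));
```

With a line like this in the log stream, a search for the workerId would turn up the hostname, and a second search for that hostname would find the rest of the host's syslog.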
Comment 2•6 years ago (Assignee)
Comment 3•6 years ago (Assignee)
> tc-worker-runner logs workerGroup / workerId on startup; or
https://github.com/taskcluster/taskcluster-worker-runner/pull/17
Comment 4•6 years ago (Assignee)
On a rando AWS instance:
root@ip-172-31-35-60:~# node
> require('os').hostname()
'ip-172-31-35-60'
so, I think we could just log that at task startup.
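A rough sketch of what logging the hostname at task startup might look like in a Node-based worker follows. The function name `logTaskStart`, the JSON field names, and the injectable `log` parameter are assumptions for illustration, not docker-worker's actual API; only `os.hostname()` is taken from the session above.

```javascript
const os = require('os');

// Hypothetical hook: call once when a task begins, so the task's log can
// later be matched to the host's syslog stream by hostname.
function logTaskStart(taskId, runId, log = console.log) {
  const entry = {
    event: 'task-start',
    taskId,
    runId,
    hostname: os.hostname(), // same call as in the REPL session above
  };
  log(JSON.stringify(entry));
  return entry;
}

// Example call with a made-up task ID.
logTaskStart('example-task-id', 0);
```

Because the hostname is read when each task starts, the line reflects whatever name the host has at that moment, whether or not the hostname was reset during boot.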
Comment 5•6 years ago (Assignee)
Updated•6 years ago (Assignee)