Closed Bug 1602637 Opened 6 years ago Closed 6 years ago

packet.net: idle workers

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aerickson, Unassigned)

The following packet.net workers aren't reporting to Taskcluster (TC), per https://firefox-ci-tc.services.mozilla.co/provisioners/terraform-packet/worker-types/gecko-t-linux:

['machine-0', 'machine-11', 'machine-20', 'machine-42', 'machine-44', 'machine-46']

I have access to the packet.net console and can reboot the hosts, but I'm not sure whether there's any debugging to do first.

The queue is pretty heavily loaded currently. I have an alert for 600+ jobs for 4+ hours, and it's firing: https://earthangel-b40313e5.influxcloud.net/d/wIJoZ4HWk/android-queues?orgId=1&fullscreen&panelId=10&refresh=5m
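For anyone unfamiliar with that kind of alert, the condition can be sketched roughly as below. This is only an illustration, not the actual dashboard rule: it assumes the "jobs" are pending jobs, and the function name `alert_firing` and the `(timestamp, pending_count)` sample format are hypothetical.

```python
from datetime import datetime, timedelta


def alert_firing(samples, threshold=600, window=timedelta(hours=4)):
    """Return True if the queue has stayed at or above `threshold` for `window`.

    `samples` is a hypothetical list of (timestamp, pending_count) tuples,
    sorted by timestamp ascending. The defaults mirror the 600+ jobs /
    4+ hours condition mentioned above.
    """
    if not samples:
        return False

    latest = samples[-1][0]
    breach_start = None

    # Walk backwards from the newest sample until the count drops below
    # the threshold; breach_start ends up at the start of the current streak.
    for timestamp, pending in reversed(samples):
        if pending < threshold:
            break
        breach_start = timestamp

    return breach_start is not None and latest - breach_start >= window
```

With hourly samples, five consecutive readings at or above 600 span four hours and would trip the alert; a single dip below 600 restarts the streak.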

Thanks,
Andy

I rebooted all faulty machines, but I believe the real problem is that 60 machines is no longer enough.

['machine-0', 'machine-11', 'machine-20', 'machine-42', 'machine-44', 'machine-46'] are working again.

machine-23 is still quarantined. I'll follow up in that ticket.

I think we are very close to needing more instances. I'll keep an eye on the graphs.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED