Closed Bug 1596892 Opened 5 years ago Closed 5 years ago

packet.net machine-23 is very slow

Categories

(Infrastructure & Operations :: RelOps: Hardware, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: aerickson)

References

(Blocks 1 open bug)

Details

Component: General → RelOps: Hardware
Product: Taskcluster → Infrastructure & Operations

Is this still happening? Yesterday I deleted some old, duplicated instances.

Flags: needinfo?(gbrown)

I changed the CPU governor to performance on the instance. Please keep an eye on it to see if the situation changes.

Still happening:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277970936&repo=autoland&lineNumber=38695

Can we just quarantine machine-23? (I am not authorized.)

I've quarantined the host.

Wander, do you think your recent reboot of this instance fixed things? Should I un-quarantine and monitor it?

Thanks,
Andy

Flags: needinfo?(wcosta)

(In reply to Andrew Erickson [:aerickson] from comment #7)

> Wander, do you think your recent reboot of this instance fixed things? Should I un-quarantine and monitor it?
>
> Thanks,
> Andy

Yes, I think so.

Flags: needinfo?(wcosta)

OK.

I've run into an issue while trying to remove the host from quarantine (https://github.com/taskcluster/taskcluster/issues/2166).

Assignee: nobody → aerickson

https://github.com/taskcluster/taskcluster/issues/2166 has been fixed, but the fix won't ship for a few weeks, so I have to remove the worker from quarantine via the API instead.

The relevant call is quarantineWorker, either through the CLI (taskcluster api queue quarantineWorker ..) or as documented at https://docs.taskcluster.net/docs/reference/platform/queue/api#quarantineWorker

Worker removed from quarantine.

Script used:

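#!/usr/bin/env python
#
# Lift the quarantine on worker machine-23 by setting its quarantineUntil
# timestamp to a time in the past (here, one year ago).
#
# Requires the Taskcluster Python client: https://pypi.org/project/taskcluster/
#

import taskcluster

# Credentials for a client that is permitted to call queue.quarantineWorker.
creds = {
    "clientId": "project/releng/quarantine-worker-aerickson",
    "accessToken": "INSERT_VALID_TOKEN",  # placeholder: substitute a real token
}
queue = taskcluster.Queue(
    {"rootUrl": "https://firefox-ci-tc.services.mozilla.com", "credentials": creds}
)

# Positional arguments identify the worker; the last one is the request payload.
queue.quarantineWorker(
    "terraform-packet",  # provisionerId
    "gecko-t-linux",  # workerType
    "packet-sjc1",  # workerGroup
    "machine-23",  # workerId
    {"quarantineUntil": taskcluster.fromNow("-1 year")},
)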
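Quarantining a host and releasing it both go through the same quarantineWorker call. The only difference is whether quarantineUntil lies in the future or the past. The following is a minimal sketch of that symmetry. It assumes `queue` is a taskcluster.Queue configured as in the script above. Since the helpers only need an object with a quarantineWorker method, they can be exercised without network access. The helper names, the 7-day default, and the one-year release offset are illustrative choices, not taken from this bug. The hand-formatted ISO 8601 timestamp stands in for taskcluster.fromNow.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Worker identity in the same order the script uses:
# (provisionerId, workerType, workerGroup, workerId).
MACHINE_23 = ("terraform-packet", "gecko-t-linux", "packet-sjc1", "machine-23")

def _iso_utc(when: datetime) -> str:
    """Format an aware datetime as an ISO 8601 UTC string with milliseconds."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

def set_quarantine_until(queue, worker, until: datetime):
    """Call quarantineWorker for `worker` with the given quarantineUntil time."""
    provisioner_id, worker_type, worker_group, worker_id = worker
    payload = {"quarantineUntil": _iso_utc(until)}
    return queue.quarantineWorker(
        provisioner_id, worker_type, worker_group, worker_id, payload
    )

def quarantine(queue, worker, days: int = 7, now: Optional[datetime] = None):
    """Hypothetical helper: keep the worker out of rotation for `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    return set_quarantine_until(queue, worker, now + timedelta(days=days))

def release(queue, worker, now: Optional[datetime] = None):
    """Hypothetical helper: a past timestamp ends the quarantine immediately."""
    if now is None:
        now = datetime.now(timezone.utc)
    return set_quarantine_until(queue, worker, now - timedelta(days=365))
```

For example, quarantine(queue, MACHINE_23, days=3) would take the host out of rotation for three days, and release(queue, MACHINE_23) does the same thing as the script above.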

machine-23 seems to be doing pretty well. I'll keep watching it.

gecko-t-linux.machine-23  {sr: [========  ]  81.2%, suc: 13, cmp: 16, exc:  0, rng:  4, notes: ['No jobs in queue.'], alerts: ['Low health (less than 0.85)!']}
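The bug does not name the tool that printed this health line. One reading is that sr is the success rate over completed tasks (suc / cmp = 13 / 16 = 81.25%) and that the alert fires when that rate drops below 0.85. The sketch below is based on that assumption. It is a hypothetical reconstruction of the arithmetic, not the actual monitoring code, and it ignores the exc and rng fields.

```python
from typing import List, Optional, Tuple

def worker_health(successes: int, completed: int,
                  threshold: float = 0.85) -> Tuple[Optional[float], List[str]]:
    """Return (success_rate, alerts) for a worker's recently completed tasks."""
    if completed == 0:
        return None, ["No completed tasks."]
    rate = successes / completed
    alerts = []
    if rate < threshold:
        alerts.append(f"Low health (less than {threshold})!")
    return rate, alerts

rate, alerts = worker_health(successes=13, completed=16)
print(f"sr: {rate:.1%}, alerts: {alerts}")
# prints: sr: 81.2%, alerts: ['Low health (less than 0.85)!']
```

Under this reading, 3 failures out of 16 completed tasks are enough to trip the alert. That is why even a handful of reftest failures shows up as low health.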

I think all of the failures (original and new) are reftests; maybe something is odd about the GPU?

Placed back in quarantine.

Wander, how do we swap out the hardware on this instance? If you just delete and recreate will it give us the same host?

Flags: needinfo?(wcosta)

(In reply to Andrew Erickson [:aerickson] from comment #15)

> Placed back in quarantine.
>
> Wander, how do we swap out the hardware on this instance? If you just delete and recreate will it give us the same host?

Done!

Flags: needinfo?(wcosta)

Thanks Wander. :)

Removed from quarantine.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED