Closed Bug 1473358 Opened 7 years ago Closed 5 years ago

Problems with yosemite machines

Categories

(Infrastructure & Operations :: DCOps, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: apop, Unassigned)

Details

We have tried to log on to, reboot, and reimage the following machines. On most of them we couldn't connect, either because of errors like "Stdio forwarding request failed: Session open refused by peer ssh_exchange_identification: Connection closed by remote host" or because the machine asked for a password for the root account. The faulty machines are: t-yosemite-r7-394 t-yosemite-r7-280 t-yosemite-r7-189 t-yosemite-r7-127 t-yosemite-r7-093 t-yosemite-r7-426 t-yosemite-r7-349 t-yosemite-r7-327 t-yosemite-r7-260. Some of these already have bugs open. I'll add them to the ticket.
Depends on: 1473589
Adding the following: t-yosemite-r7-121 t-yosemite-r7-124 t-yosemite-r7-130 t-yosemite-r7-356 Made problem tracking bug for each of 'em.
Adding t-yosemite-r7-442 ...
t-yosemite-r7-414 wins its place there ...
t-yosemite-r7-357 goes here as well ...
> because it was asking password for the root account.
quite a few of these were reimaged in previously opened tickets, i.e. 260 and 327. is there a puppet issue? :dhouse, are you able to log in to 260 and 327?
Flags: needinfo?(dhouse)
This may be a puppet issue. :apop I'm going to review this, since it may be a problem across the osx machines (for example, when we are reimaging). I will first check the deploystudio result emails for these machines, then look for their logs in papertrail. After that I may run puppet manually on another machine to make sure we aren't missing a warning that could keep the ssh keys from being added on the first puppet run.
Flags: needinfo?(dhouse)
I still need to check for the deploystudio emails, but first I tried dns/ping/ssh: I'm seeing different results from the machines listed in this bug. So there may be a few different problems on these:
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com moonshot]$ for I in 093 121 124 127 130 189 260 280 327 349 356 357 394 414 426 442 ; do for dc in t-yosemite-r7-${I}.test.releng.mdc{1..2}.mozilla.com; do host $dc >/dev/null && (ping=$(ping -q -c1 -w5 $dc 2>&1 | grep -o "[0-9]*% packet loss") && ssh=$(ssh -o LogLevel=ERROR -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null root@${dc} hostname 2>&1 | grep -o "t-yose.*$"); echo -e "$I $ping\t$ssh"); done; done
093 0% packet loss t-yosemite-r7-093.test.releng.mdc2.mozilla.com
121 0% packet loss t-yosemite-r7-121.test.releng.mdc2.mozilla.com
124 100% packet loss t-yosemite-r7-124.test.releng.mdc2.mozilla.com port 22: Connection timed out
127 0% packet loss t-yosemite-r7-127.test.releng.mdc2.mozilla.com port 22: Connection refused
130 100% packet loss t-yosemite-r7-130.test.releng.mdc2.mozilla.com port 22: Connection timed out
189 100% packet loss t-yosemite-r7-189.test.releng.mdc2.mozilla.com port 22: Connection timed out
260 100% packet loss t-yosemite-r7-260.test.releng.mdc1.mozilla.com port 22: Connection timed out
280 0% packet loss
327 100% packet loss t-yosemite-r7-327.test.releng.mdc1.mozilla.com port 22: Connection timed out
349 100% packet loss t-yosemite-r7-349.test.releng.mdc1.mozilla.com port 22: Connection timed out
356 0% packet loss t-yosemite-r7-356.test.releng.mdc1.mozilla.com port 22: Connection refused
357 100% packet loss t-yosemite-r7-357.test.releng.mdc1.mozilla.com port 22: Connection timed out
394 0% packet loss
414 0% packet loss
426 100% packet loss t-yosemite-r7-426.test.releng.mdc1.mozilla.com port 22: Connection timed out
442 0% packet loss
```
So, dns resolves for all of them. And results are:
A. no ping/ssh problems found with 093, 121, 280(loaner), 394, 414, 442
B. connection refused by 127, 356
C. timeout from 124, 130, 189, 260, 327, 349, 357, 426
(In reply to Dave House [:dhouse] from comment #7)
> 280 0% packet loss
> 394 0% packet loss
> 414 0% packet loss
> 442 0% packet loss
>
> So, dns resolves for all of them. And results are:
> A. no ping/ssh problems found with 093, 121, 280(loaner), 394, 414, 442
> B. connection refused by 127, 356
> C. timeout from 124, 130, 189, 260, 327, 349, 357, 426

My summary missed that 280, 394, 414, 442 were pingable and did not time out on ssh, but refused my ssh key.

A. no ping/ssh problems found with 093, 121
B. connection refused by 127, 356
C. timeout from 124, 130, 189, 260, 327, 349, 357, 426
D. ssh failed to 280, 394, 414, 442 (denied/password prompt)

093 - active and running tasks (closed bug https://bugzilla.mozilla.org/show_bug.cgi?id=t-yosemite-r7-093): https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc2/t-yosemite-r7-093
121 - is quarantined (bug https://bugzilla.mozilla.org/show_bug.cgi?id=t-yosemite-r7-121 depends on this bug): https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc2/t-yosemite-r7-121
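The A/B/C/D grouping above was done by eye, and the first pass misfiled four hosts. As a rough sketch only, the same bucketing could be automated from the two values the loop already collects. The function name `classify_host` and its output labels are hypothetical, not part of any tooling mentioned in this bug. It also assumes, based only on the output pasted in comment #7, that an empty ssh result on a pingable host means the key was rejected.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug): bucket one host into A/B/C/D.
#   $1 = packet-loss string from ping, e.g. "0% packet loss"
#   $2 = filtered ssh output: the hostname on success, the hostname plus an
#        error on refused/timeout, or empty when nothing matched the grep
classify_host() {
  case "$2" in
    # Check error text first: those lines also begin with the hostname.
    *"Connection refused"*)   echo "B refused" ;;
    *"Connection timed out"*) echo "C timeout" ;;
    t-yosemite-*)             echo "A ok" ;;
    *)
      # Assumption: pingable but no ssh output means key denied / password prompt.
      if [ "$1" = "0% packet loss" ]; then
        echo "D auth-failed"
      else
        echo "unknown"
      fi
      ;;
  esac
}

# Example calls using values copied from the output in comment #7.
classify_host "0% packet loss" "t-yosemite-r7-093.test.releng.mdc2.mozilla.com"
classify_host "0% packet loss" ""
```

Inside the original loop this would be called as `classify_host "$ping" "$ssh"` in place of the final `echo`, so hosts like 280 and 394 land in their own bucket instead of being read as healthy.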
was it a puppet issue? should i proceed with having QTS reimage the previously mentioned minis?
Depends on: t-yosemite-r7-246
Depends on: t-yosemite-r7-168
Depends on: t-yosemite-r7-142
Depends on: t-yosemite-r7-120
Depends on: t-yosemite-r7-045
Depends on: t-yosemite-r7-087
Depends on: t-yosemite-r7-039
Depends on: t-yosemite-r7-306
Depends on: t-yosemite-r7-092
Depends on: t-yosemite-r7-110
Depends on: t-yosemite-r7-161
Depends on: t-yosemite-r7-378
Depends on: t-yosemite-r7-457
Depends on: t-yosemite-r7-080
Depends on: t-yosemite-r7-075
Type: task → defect
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX