Closed
Bug 778922
(tegra-182)
Opened 12 years ago
Closed 11 years ago
tegra-182 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Unassigned)
References
()
Details
(Whiteboard: [buildduty][buildslaves][capacity][badslave])
Last Job 4 days, 19:59:36 ago
Reporter | ||
Comment 1•12 years ago
Tried to PDU reboot, but the device never lost ping connectivity. Can't connect remotely:

bash-3.2$ telnet tegra-182 20701
Trying 10.250.50.92...
telnet: connect to address 10.250.50.92: Connection refused
telnet: Unable to connect to remote host
bash-3.2$

Please ensure that (a) this is the correct PDU info:

"pdu": "pdu5.df202-1.build.mtv1.mozilla.com",
"pduid": ".AA10"

(b) this tegra has no battery attached, and (c) its switches are properly set. Lastly, please reimage once more.
Depends on: 780798
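For anyone repeating the connectivity check from comment 1 without telnet, a minimal sketch in Python is below. The host name and port 20701 come from the telnet session above; the helper name `port_is_open` and the 5-second timeout are assumptions made for illustration, not part of any existing tooling.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only tests whether something accepts the
    connection, not whether the agent behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name-resolution failures.
        return False


# Example call mirroring the telnet attempt (not executed here):
# port_is_open("tegra-182", 20701)
```

A refused connection, as seen in comment 1, would make this return False; it does not distinguish a refused connection from a device that is completely unreachable.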
Comment 2•12 years ago
Hi Justin,

The PDU info was wrong; it should have been AA2. Question, though: should it be listed in inventory as AA2 or A2? It was previously listed as just A10 in inventory, so I changed it to A2.

Thanks,
Van
Reporter | ||
Comment 3•12 years ago
I think this tegra is also having issues a la bug 782495. I'm running ./stop_cp.sh on its foopy for now, pending ATeam direction on that bug.
Blocks: 782495
Comment 4•12 years ago
This machine is in the production pool, though I'm not sure it's supposed to be.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 5•12 years ago
Well, it's done 101 jobs since it snuck back in, of which only 26 were failures, but 12 failures in the last 25 make me think it's having issues a la "this tegra is busted crap that makes Android tests look like something that should be ignored."
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•12 years ago
Make that 15 of the last 25. Where I'm seeing it frequently is in bug 689856, which I think is probably just the stealthy way of doing bug 782495 without leaving evidence.

Rather than just taking it out for a few days by running stop_cp and then letting it come back, can we actually disable it? Do we even have a way of disabling broken tegras? We seem to be doing this in bug after bug after bug: "this tegra is broken", "ran stop_cp", "this tegra is back in production", "this tegra is broken and failing all the time", "ran stop_cp", "this tegra is back in production", "this tegra is broken and failing all the time", "ran stop_cp".
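The comparison being made in comments 5 and 6 (overall failure rate versus the rate over the most recent 25 jobs) can be sketched as below. This is only an illustration: the helper name `failure_rate` is hypothetical, and the job history is synthetic, arranged solely so that its totals match the counts quoted in comment 5 (101 jobs, 26 failures, 12 of them in the last 25). The real ordering of results is not known from this bug.

```python
from typing import Optional, Sequence


def failure_rate(results: Sequence[bool], window: Optional[int] = None) -> float:
    """Fraction of failed jobs, optionally over only the last `window` jobs.

    `results` is ordered oldest first; True means the job failed.
    """
    recent = results if window is None else results[-window:]
    if not recent:
        raise ValueError("no job results to evaluate")
    return sum(recent) / len(recent)


# Synthetic history (assumption): 76 older jobs with 14 failures,
# followed by the last 25 jobs with 12 failures.
history = [True] * 14 + [False] * 62 + [False] * 13 + [True] * 12

overall = failure_rate(history)            # 26 / 101, roughly 26%
last_25 = failure_rate(history, window=25)  # 12 / 25, i.e. 48%
```

With comment 6's updated count, the recent-window figure becomes 15 / 25, or 60%, which is why the recent window is a better signal than the overall rate for a device that has only lately gone bad.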
Comment 7•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=15246213&tree=Mozilla-Inbound for which I nearly filed an invalid bug
Comment 8•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=15253875&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=15250682&tree=Mozilla-Inbound
Updated•12 years ago
Summary: tegra-182 problem tracking → [disable me] tegra-182 problem tracking
Comment 9•12 years ago
Comment #6 clearly needs addressing, but in order to staunch the flakiness stop_cp has been run.
Summary: [disable me] tegra-182 problem tracking → tegra-182 problem tracking
Reporter | ||
Comment 10•12 years ago
ATeam thinks this is a device issue; let's recover it.
Depends on: 792316
Reporter | ||
Comment 11•12 years ago
IT handled this; start_cp has now been run.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 12•12 years ago
Whatever its current state is, it isn't FIXED.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 13•12 years ago
Had to PDU reboot, but the PDU settings were wrong (see comment 2). I just fixed them, and the device is up and looks to be functioning properly.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 16•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=15459701&tree=Mozilla-Inbound
Comment 17•12 years ago
SUTAgent not present; time for recovery
Comment 18•12 years ago
Do we have two images that we use for tegra recovery, "the good image" and "the borken image"? This slave appears to have been in pretty good shape, and it continued to do jobs both before and after comment 17, but since the probable time of that recovery (guessing based on when there was a long enough gap between runs) it has turned simply awful: 18% green.
Updated•12 years ago
Reporter | ||
Comment 20•12 years ago
Should be back up.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 25•11 years ago
Those duplicates would be the four times that someone wasn't as lucky as I was in comment 7, so they went to all the trouble of filing a bug when the summary could have been "tegra-182 is broken" and the description could have been "tegra-182 is broken."
Severity: normal → major
Status: RESOLVED → REOPENED
Priority: P3 → --
Resolution: FIXED → ---
Whiteboard: [buildduty][buildslaves][capacity] → [buildduty][buildslaves][capacity][badslave]
Reporter | ||
Comment 26•11 years ago
running stop_cp based on c#25
Reporter | ||
Comment 27•11 years ago
(mass change: filter on tegraCallek02reboot2013)

I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back tomorrow to see if it did; if it did not, I'll triage the next step manually on a per-device basis.

---
Command I used (with a manual patch to the fabric script to allow this command):

(fabric)[jwood@dev-master01 fabric]$ python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046 048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra

The command does the reboots one at a time from the foopy each device is connected to, with one ssh connection per foopy.
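The backtick loop in the command above only expands the ID list into repeated `-D tegra-NNN` arguments. A rough Python equivalent of that expansion is sketched below; the function name `build_reboot_command` is hypothetical, and the only flags used are the ones that appear in the original command line (`-j15`, `-f devices.json`, `-D`, and the `reboot_tegra` action). It builds the argument list without running anything.

```python
from typing import Iterable, List


def build_reboot_command(
    device_ids: Iterable[str],
    jobs: int = 15,
    devices_file: str = "devices.json",
) -> List[str]:
    """Build the manage_foopies.py argument list for a batch reboot.

    IDs are passed as strings so leading zeros (e.g. "021") are preserved.
    """
    argv = ["python", "manage_foopies.py", f"-j{jobs}", "-f", devices_file]
    for device_id in device_ids:
        # One "-D tegra-NNN" pair per device, as the shell loop echoes.
        argv += ["-D", f"tegra-{device_id}"]
    argv.append("reboot_tegra")
    return argv


# Shortened ID list for illustration; the real run used the full list above.
example = build_reboot_command(["021", "032", "182"])
```

Passing the resulting list to something like `subprocess.run` would avoid shell quoting issues, though the manual patch to the fabric script mentioned in the comment would still be needed for the `reboot_tegra` action to be accepted.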
Reporter | ||
Updated•11 years ago
Reporter | ||
Comment 28•11 years ago
Back from recovery
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•11 years ago
Assignee | ||
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard