Closed Bug 785172 Opened 12 years ago Closed 12 years ago

Hands on needed for many new tegras

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: sal)

Many of the new tegras have issues that prevented me from bringing them into production service. I'll outline actions for IT here, and below those actions I'll copy/paste the <trimmed> e-mail I sent to ATeam about this, just for historic/public reasons. I'm doing this in two stages so that IT has clear actions and I have clear reasons, without confusion.


* Check PDU connection/ports (making sure it also powers up) on: 
tegra-366 
tegra-370

* Swap SDCards on:
tegra-320
tegra-322
tegra-329
tegra-333
tegra-345
tegra-352
tegra-353
tegra-357

* Reimage:
***All above*** ^ and:
tegra-308
tegra-315
tegra-327
tegra-328
tegra-335
tegra-336
tegra-337
tegra-341
tegra-344
tegra-348
tegra-350
tegra-354
tegra-355
tegra-358
tegra-360
tegra-363
tegra-368
tegra-369



==============
EMAIL follows
==============

Hey Guys,

First off, on the reftest issue we had seen: even after the new Styrofoam was put in place, we have still had a few reftest failures.

Second off, <irrelevant here>

Third, after running verify, I have determined that I cannot YET put many of the tegras into production; first I need to file an IT bug to try reimaging them, swapping SDCards, etc. <>

The issues fall into a few categories:

* PDU Reboot didn't bring the tegra back to pingable (possibly dead tegra, or just needs a reimage)
* PDU Reboot brought it back to pingable, but no connection via SUTAgent possible (dead/bad image is most likely)
* Bad SDCard (PDU reboot was fine, but unable to verify the watcher.ini *or* unable to write to the SDCard; I checked many of these manually and they have no mountpoint for the SDCard)
* "##AGENT-WARNING## [PK] command is currently not implemented."
** This one is the most confusing to me. It appeared on MANY of these new ones, and occurred during the attempt to upgrade from SUTAgent 1.00 to 1.11.
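
The categories above amount to a small decision procedure. Here is a minimal sketch in Python, assuming hypothetical inputs: whether the host pings after a PDU reboot, whether a SUTAgent connection succeeds, whether the SDCard is mounted and writable, and the agent's response text during the update attempt. None of these names come from the actual verify tooling; only the warning string is quoted from this bug.

```python
from enum import Enum


class Issue(Enum):
    HOST_DOWN = "Host is down"
    NO_SUTAGENT = "no SUTAgent connection"
    BAD_SDCARD = "no SDCard mounted"
    PK_NOT_IMPLEMENTED = "[PK] command is currently not implemented"
    OK = "no issue found"


# Warning text quoted from the notes in this bug.
PK_WARNING = "##AGENT-WARNING## [PK] command is currently not implemented."


def classify(pingable, agent_connects, sdcard_ok, update_response=""):
    """Return the first matching issue category for one tegra.

    All four parameters are hypothetical stand-ins for the results of
    the manual checks described above, gathered after a PDU reboot.
    """
    if not pingable:
        return Issue.HOST_DOWN
    if not agent_connects:
        return Issue.NO_SUTAGENT
    if not sdcard_ok:
        return Issue.BAD_SDCARD
    if PK_WARNING in update_response:
        return Issue.PK_NOT_IMPLEMENTED
    return Issue.OK
```

The checks run in order of severity, so a host that does not even ping is never reported as having a bad SDCard.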

The list, straight from my own raw notes, follows:

BAD

= Host is Down
366 - even after PDU
370 - even after PDU

= no SUTAgent Connection
308 - PDU cycled [@9:45p 8/22]
315 - PDU cycled [@9:45p 8/22]

= no SDCard mounted [anymore in some cases]
320 - PDU cycled [@9:50p 8/22]
322 - PDU cycled [@9:50p 8/22]
329 - PDU cycled
333 - PDU cycled
345 - PDU cycled
352 - PDU cycled
353 - PDU cycled
357 - PDU cycled

= response: ##AGENT-WARNING## [PK] command is currently not implemented.
327 - PDU cycled
328 - PDU cycled
335 - PDU cycled
336 - PDU cycled
337 - PDU cycled
341 - PDU cycled
344 - PDU cycled
348 - PDU cycled
350 - PDU cycled
354 - PDU cycled
355 - PDU cycled
358 - PDU cycled
360 - PDU cycled
363 - PDU cycled
368 - PDU cycled
369 - PDU cycled
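
For cross-checking, the notes above can be turned back into the three IT action lists at the top of this bug. This is a minimal Python sketch: the host numbers are copied from the notes, while the dictionary layout and the rule that every listed host gets a reimage are one reading of this bug, not an existing tool.

```python
# Host numbers copied from the raw notes, grouped by failure category.
NOTES = {
    "host_down": [366, 370],
    "no_sutagent": [308, 315],
    "no_sdcard": [320, 322, 329, 333, 345, 352, 353, 357],
    "pk_warning": [327, 328, 335, 336, 337, 341, 344, 348,
                   350, 354, 355, 358, 360, 363, 368, 369],
}


def action_lists(notes):
    """Derive the IT actions requested in this bug from the notes."""
    def names(nums):
        return [f"tegra-{n}" for n in sorted(nums)]

    every_host = [n for hosts in notes.values() for n in hosts]
    return {
        "check_pdu": names(notes["host_down"]),
        "swap_sdcard": names(notes["no_sdcard"]),
        # Every host in the notes is also on the reimage list.
        "reimage": names(every_host),
    }


actions = action_lists(NOTES)
for action, hosts in actions.items():
    print(f"{action} ({len(hosts)}): {' '.join(hosts)}")
```

Keeping the categories as the single source of data means the PDU, SDCard, and reimage lists cannot drift apart if a host is added or moved.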

We'll head up to mtv1 and work on this ASAP.


van

Work has been completed.

The only thing is that tegra-352, tegra-353, and tegra-357 are using new Kingston 8GB SD cards because we ran out of the 16GB SanDisk cards.


Let us know if anything else is needed.

tegra-366 - DOA (https://bugzilla.mozilla.org/show_bug.cgi?id=767447#c54)
tegra-370 - the NIC is dead. We tried formatting/reflashing and using different switch ports/cables but we're not receiving a link light.

This work is done, so I believe this bug can be closed.  Yes?

(In reply to Melissa O'Connor [:melissa] from comment #4)
> This work is done, so I believe this bug can be closed.  Yes?

Yes. We may still need more hands-on work for a list of the new tegras (many of them are currently failing SDCard checks shortly after having been OK/up).

But I'll get a new bug/batch on file for that next week, after I am able to triage more on my end.

Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 786966
Product: mozilla.org → Infrastructure & Operations