Closed
Bug 808437
Opened 12 years ago
Closed 12 years ago
Something has broken in tegra recovery
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
Details
Last week, we had three tegra recovery bugs: bug 806950, bug 807663 and bug 807965.
Every single tegra involved in those now has its tracking bug reopened because it is broken: failing more than 50% of the time, and doing so in suspicious ways, like timing out in reftests and mochitests and failing to even initialize the browser in talos.
Particularly telling are tegra-093 and tegra-182, because from looking at buildapi/recent, there's no evidence I can see that they had a problem at all, but then they got recovered, and have turned into broken tegras.
The one before those was bug 802655, which did three tegras. I persuaded Callek that two of them were bad hardware and should be scrapped; a day later, I regretted not having included the third in that tar-brushing.
The one before that was bug 792692, which only did two of the tegras in the 300s, which are all awful and thus hard to tell about, but one of the two, tegra-336, seems to have been restarted on November 1st and to be running okay.
So, what could have changed between 2012-09-21, the last time we did a successful reimage, and now?
Comment 1•12 years ago (Reporter)
tegra-064 got reimaged in bug 807963 rather than a tegra-recovery bug, but it's broken just the same.
Blocks: tegra-064
Comment 2•12 years ago
> So, what could have changed between 2012-09-21, the last time we did a successful reimage, and now?
> tegra-064 got reimaged in bug 807963 rather than a tegra-recovery bug, but it's broken just the same.
:philor, is there a way to check whether the latest image was used on tegra-064? I know there are several images on the imaging netbook, and I want to confirm that the latest image has been used since 2012-09-21.
Comment 3•12 years ago (Reporter)
s/philor/Callek/, since I'm a volunteer who looks at logs after test jobs finish, not a releng employee with access to anything.
Flags: needinfo?(bugspam.Callek)
Comment 4•12 years ago (Reporter)
tegra-057 got a reimage in bug 807962, and is also busted.
Blocks: tegra-057
Comment 5•12 years ago
Hi,
I just reimaged tegra-057 and tegra-064 with what is supposed to be the correct image. Is there a way you can run tests on them to confirm they're working as normal?
Thanks,
Van
Comment 6•12 years ago
(In reply to Van Le [:van] from comment #5)
> I just reimaged tegra-057 and tegra-064 with what is supposed to be the
> correct image. Is there a way you can run tests on them to confirm they're
> working as normal?
Apparently we crossed streams, and I never took down 057 first, but no big worry there: it will pick up a new job soon. I've just started up 064 as well.
*leaving* their problem tracking bugs open for now
Flags: needinfo?(bugspam.Callek)
Comment 7•12 years ago (Reporter)
The fact that it's difficult to say whether or not 057 and 064 got another "bad image" doesn't bode well for that email thread about verifying that tegras are in good shape before putting them back in service.
064 still has a busted SD card, whether because one bad card got replaced by another, or it has a busted slot, or something less imaginable: every other test run was failing by not being able to write to the card. Only four runs got past that; only one of those failed, but in a suspicious way.
057 is probably busted in the bad-image way: it's done 15 green runs and 12 non-green, which is a bit higher than the average success rate for broken ones, but well below the average for unbroken ones.
But if the eventual post-image verification process takes two days to be sure a tegra is healthy, that's not going to be very handy.
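To make the arithmetic behind those counts concrete, here is a minimal sketch that turns them into a green rate. The function name `green_rate` is hypothetical and not part of any releng tooling; the only inputs are the run counts quoted in this comment, and no threshold for "healthy" is assumed because none is stated here.

```python
def green_rate(green: int, non_green: int) -> float:
    """Return the fraction of finished runs that were green."""
    total = green + non_green
    if total == 0:
        raise ValueError("no finished runs to judge from")
    return green / total


# tegra-057: 15 green runs and 12 non-green, as quoted in this comment.
rate_057 = green_rate(15, 12)

# tegra-064: of the four runs that got past the SD card failure, one failed.
rate_064 = green_rate(3, 1)

print(f"tegra-057: {rate_057:.1%} green over 27 runs")
print(f"tegra-064: {rate_064:.1%} green over 4 runs")
```

With only 4 or 27 runs behind each figure, a single extra failure moves the rate noticeably, which is part of why it takes a while to be sure.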
Comment 8•12 years ago (Reporter)
057 is busted in the bad-image way, but it's ugly that it takes this long to be sure.
Comment 9•12 years ago
From what I can tell, the system is working as intended, and nothing unexpected changed. This bug is closeable.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard