Closed Bug 797245 (talos-r4-lion-063) Opened 12 years ago Closed 11 years ago

talos-r4-lion-063 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

(Whiteboard: [buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything)

Attachments

(2 files)

Please disable this slave and have hardware diagnostics run on it, particularly RAM tests. We've had the bug 795215 "pink pixel of death" problem for a while: we get intermittent failures in a ton of reftests, and the difference is that one canvas has a single pixel that is pink rather than white (a single bit of difference). Once I filed it so we could see where it was happening, it turned out to be talos-r4-lion-063 twice in a row. Hello, bad RAM.
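A "single bit of difference" between the reference and the rendered pixel can be confirmed by XOR-ing the two values and counting the set bits. The following is a minimal sketch that assumes pixels are packed as 32-bit 0xRRGGBBAA integers. The function names and the PINK value are hypothetical illustrations and do not come from the reftest harness or from the failing logs.

```python
def differing_bits(expected: int, actual: int) -> int:
    """Count how many bits differ between two packed 32-bit pixel values."""
    return bin((expected ^ actual) & 0xFFFFFFFF).count("1")


def find_single_bit_flips(expected: list[int], actual: list[int]) -> list[int]:
    """Return the indices of pixels that differ from the reference by exactly one bit."""
    if len(expected) != len(actual):
        raise ValueError("canvases must have the same number of pixels")
    return [
        i for i, (e, a) in enumerate(zip(expected, actual))
        if differing_bits(e, a) == 1
    ]


WHITE = 0xFFFFFFFF  # opaque white, 0xRRGGBBAA
# Hypothetical flip: clearing the top bit of the green channel (0xFF -> 0x7F)
# gives (255, 127, 255), a pink/magenta tint.
PINK = WHITE & ~(1 << 23)

reference = [WHITE] * 16
rendered = list(reference)
rendered[5] = PINK
print(find_single_bit_flips(reference, rendered))  # [5]
```

A reftest compares whole canvases. A per-pixel bit count like this one only tells you whether a mismatch looks like a flipped bit rather than a rendering difference.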
Blocks: 795215
Summary: talos-r4-lion-063 problem tracking → [disable me] talos-r4-lion-063 problem tracking
Depends on: 798380
Summary: [disable me] talos-r4-lion-063 problem tracking → talos-r4-lion-063 problem tracking
Putting back into the pool.
This slave might have issues even if hardware diagnostics did not catch anything.
If it gives us trouble, we should dig into the problem and try to find a test case.
If such a test case is found, we should put the slave in bug 712206.
At that point we can ask for re-imaging and try again.
If that fails again, we should decommission the slave.
Whiteboard: [buildduty][buildslaves][capacity][badslave?] → [buildduty][buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
https://tbpl.mozilla.org/php/getParsedLog.php?id=17848078&tree=Mozilla-Aurora is the same old pink pixel of death. So is https://tbpl.mozilla.org/php/getParsedLog.php?id=17878263&tree=Ionmonkey.

Test case? Write the same large number of values to memory twice, read them, make sure they are still the same when you read them. Although there's something in the reftest harness about "reusing canvases" that may mean it's partially more like writing a big block of zeros, overwriting a small part with non-zero data, then reading it over and over again and making sure none of the zeros become non-zero.
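The two test ideas above can be sketched as follows. This is only an illustration of the checking logic and is not a reliable RAM test. A userspace process sees virtual memory, cannot choose which physical pages it gets, and its re-reads are likely served from CPU caches, so a dedicated memory tester remains the right tool. The function names, sizes, and payload are hypothetical. The sketch also assumes Python 3.9+ for `Random.randbytes`.

```python
import random


def pattern_test(size: int, seed: int = 0) -> list[int]:
    """Write the same pseudo-random bytes into two separate buffers, read both
    back, and return the offsets where either copy no longer matches."""
    reference = random.Random(seed).randbytes(size)
    first = bytearray(reference)   # first write of the values
    second = bytearray(reference)  # second write of the same values
    if first == second == reference:  # fast path: everything still matches
        return []
    return [i for i in range(size)
            if first[i] != reference[i] or second[i] != reference[i]]


def zero_block_test(size: int, offset: int, payload: bytes,
                    passes: int = 10) -> list[int]:
    """Fill a block with zeros, overwrite a small region with non-zero data,
    then re-read the block several times and return the offsets outside that
    region that have become non-zero."""
    end = offset + len(payload)
    if offset < 0 or end > size:
        raise ValueError("payload must fit inside the block")
    block = bytearray(size)  # bytearray(n) is zero-filled
    block[offset:end] = payload
    bad = set()
    for _ in range(passes):
        # Only the bytes before and after the payload must stay zero.
        for start, stop in ((0, offset), (end, size)):
            region = block[start:stop]
            if any(region):
                bad.update(start + i for i, b in enumerate(region) if b)
    return sorted(bad)


print(pattern_test(1 << 16))                             # [] on healthy hardware
print(zero_block_test(1 << 16, 4096, b"\xff" * 64, 3))   # [] on healthy hardware
```

The second function follows the "reusing canvases" reading of the harness: a large zeroed area and a small written region, read repeatedly.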
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Nice, now it has burned 12 jobs in a row, "hdiutil: attach failed - Device not configured"
Severity: normal → major
[16:15]	<philor>	who's on sledgehammerduty?
[16:17]	<philor>	that damn talos-r4-lion-063, which has hardware problems no matter how many inadequate runs of inadequate hardware diagnostics it gets, has burned its last dozen runs

Disabled in slavealloc
Severity: major → normal
(In reply to Aki Sasaki [:aki] from comment #9)
> Disabled in slavealloc

And it managed to dodge the sledgehammer:

talos-r4-lion-063:~ cltbld$ uptime
11:26  up 19:51, 3 users, load averages: 0.66 0.58 0.53

I manually restarted the slave now, so it should properly come back disabled.
We should decomm this slave. Hardware diagnostics show no problems with it, but it keeps burning jobs.
Depends on: 828608
Attachment #699979 - Flags: review?(rail) → review+
Comment on attachment 699981 [details] [diff] [review]
remove from buildbot

I think it'll conflict with https://bugzilla.mozilla.org/attachment.cgi?id=699975&action=edit
Attachment #699981 - Flags: review?(rail) → review-
(In reply to Rail Aliiev [:rail] from comment #18)
> Comment on attachment 699981 [details] [diff] [review]
> remove from buildbot
> 
> I think it'll conflict with
> https://bugzilla.mozilla.org/attachment.cgi?id=699975&action=edit

I was planning to resolve that when I land.
Comment on attachment 699981 [details] [diff] [review]
remove from buildbot

r+ in this case.

BTW, maybe it'll be easier to read the configs if we have something like this:

'lion': dict([("talos-r4-lion-%03i" % x, {}) for x in \
      set(range(4,85)) - set([10, 63, 83]) ]),
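The reviewer's one-liner can be tried as a standalone script. This is a minimal sketch that assumes a top-level dict named `SLAVES`, which is a hypothetical name because the real buildbot config's structure is not shown in this bug. It uses a dict comprehension (Python 2.7+/3) instead of `dict([...])`. The excluded numbers are the ones the review comment subtracts.

```python
# Hypothetical stand-in for the buildbot slave list; only the 'lion' entry
# from the review comment is reproduced.
EXCLUDED_LION = {10, 63, 83}  # numbers subtracted in the review comment

SLAVES = {
    'lion': {"talos-r4-lion-%03i" % x: {}
             for x in sorted(set(range(4, 85)) - EXCLUDED_LION)},
}

print(len(SLAVES['lion']))  # 78
```

Keeping the excluded numbers in one named set means decommissioning another slave is a one-number change rather than an edit to a long literal list.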
Attachment #699981 - Flags: review- → review+
Not a buildduty need for tracking anymore

Ben, can you please make sure all the patches have landed properly?
Flags: needinfo?(bhearsum)
Whiteboard: [buildduty][buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything → [buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Flags: needinfo?(bhearsum)
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard