Closed
Bug 842461
(t-snow-r4-0064)
Opened 12 years ago
Closed 11 years ago
t-snow-r4-0064 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
()
Details
(Whiteboard: [buildduty][buildslaves][capacity])
https://tbpl.mozilla.org/php/getParsedLog.php?id=19863634&tree=Mozilla-Inbound is a pink pixel of death reftest failure (actually a single pixel that's 255,255,247 instead of 255,255,255, more of an off-white pixel of death). Once a few more happen, I'll reopen, we'll run diagnostics that won't find anything, rinse, repeat.
Reporter
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter
Comment 1•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=19888311&tree=Fx-Team is exactly the sort of GC crash that started us down the road of (unsuccessfully) blaming slaves with bad RAM for having bad RAM.
Disable, diagnostics that won't show anything, reimage, restart the cycle of blame, please.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 2•12 years ago
Disabled in slavealloc.
Comment 3•11 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Reporter
Comment 4•11 years ago
Bad-RAM-caused GC crash in https://bugzilla.mozilla.org/show_bug.cgi?id=856612#c1
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5•11 years ago
Disabled in slavealloc due to failing to clone hg repos:
could not lookup DNS configuration info service: (ipc/send) invalid destination port
abort: error: nodename nor servname provided, or not known
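The second error line above looks like the standard getaddrinfo failure message, so a job could in principle check name resolution before attempting a clone. A minimal Python sketch of such a preflight check, not what the build steps here actually did: `can_resolve` and its `resolver` parameter are hypothetical names, and the hostname in the note below is an assumption because the log does not name the host.

```python
import socket


def can_resolve(hostname, port=443, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the resolver reports an error.

    `resolver` is injectable (hypothetical parameter) so the check can be
    exercised without touching the network.
    """
    try:
        resolver(hostname, port)
        return True
    except socket.gaierror:
        # Raised for address-resolution failures such as an unknown node name.
        return False
```

Calling something like `can_resolve("hg.mozilla.org")` before the clone step would let a slave in this state fail fast with a clear "DNS is broken" message instead of a clone error.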
Comment 6•11 years ago
Also note that the reboot step fails when in this state because tools aren't cloned:
python: can't open file 'tools/buildfarm/maintenance/count_and_reboot.py': [Errno 2] No such file or directory
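The failure above is a missing-file error, so one way to keep the reboot step working when the clone has failed is to check for the script first. A rough Python sketch under stated assumptions: `reboot_command` is a hypothetical helper, the fallback command is a guess at what a plain reboot would look like on the slave, and the script's real arguments are not shown in this bug, so none are passed.

```python
import os
import sys

# Path taken from the error message in comment 6.
REBOOT_SCRIPT = "tools/buildfarm/maintenance/count_and_reboot.py"


def reboot_command(script=REBOOT_SCRIPT, fallback=("sudo", "reboot")):
    """Return the command list to run for the reboot step.

    Uses the tools script when it exists on disk; otherwise returns the
    fallback (an assumed plain reboot) so the slave can still restart.
    """
    if os.path.isfile(script):
        # The script's real arguments are not shown in the bug; omitted here.
        return [sys.executable, script]
    return list(fallback)
```

The helper only builds the command and does not execute it, which keeps the decision separate from the side effect of actually rebooting.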
Comment 7•11 years ago
After manual reboot (thanks to :jhopkins), DNS looks okay.
Re-enabled in slavealloc; will monitor for a clean job or two.
Comment 8•11 years ago
Two successful jobs with reboots, declaring victory.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Reporter
Comment 9•11 years ago
Mildly curious that the same thing would afflict the same slave again, but it's exactly as in comment 5: busted DNS that will heal once someone's around to reboot it. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•11 years ago
Memtest requested in bug 933889.
I didn't see anyone dealing with comment 4.
Comment 11•11 years ago
2013-03-25 diagnostics requested
2013-04-30 same issues + DNS issues
2013-11-11 memtest does not find memory issues
I'm putting this into production as I need logs to do anything in here.
Updated•11 years ago
Assignee: nobody → armenzg
Comment 12•11 years ago
1 job failed in the last 50 jobs.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Reporter
Comment 13•11 years ago
Score for the last 100 jobs is three expected failures from bad checkins, and three suspicious GC crashes that make me think I'll be coming back to disable it before too long.
Reporter
Comment 14•11 years ago
And another GC crash.
Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 15•11 years ago
Memory replacement requested.
Comment 16•11 years ago
Rebooted into production.
Comment 17•11 years ago
It is looking good.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Reporter
Comment 18•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=31631470&tree=Mozilla-Aurora
14:00:40 INFO - firefox-bin(955,0x7fff70221cc0) malloc: *** error for object 0x120e7b008: incorrect checksum for freed object - object was probably modified after being freed.
14:00:40 INFO - *** set a breakpoint in malloc_error_break to debug
14:01:15 WARNING - PROCESS-CRASH | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/content/events/crashtests/recursive-DOMNodeInserted.html | application crashed [@ libSystem.B.dylib + 0x4f0b6]
Comment 19•11 years ago
2013-04 - disk diagnostics fixed some bad sectors - bug 854544
2013-11 - memtest requested - bug 933889
2013-11 - memory replaced - bug 944520
Keeping an eye on philor reporting more issues.
Reporter
Comment 20•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=31995503&tree=Mozilla-Central - "Error in collecting counter: Private Bytes, pid: 958, exception: list index out of range," and then a socket error traceback, rather like Python got loaded into the bad memory this time.
Updated•11 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Assignee: armenzg → nobody
QA Contact: armenzg → bugspam.Callek
Updated•10 years ago
Alias: talos-r4-snow-066 → t-snow-r4-0064
Summary: talos-r4-snow-066 problem tracking → t-snow-r4-0064 problem tracking
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard