Closed
Bug 547131
Opened 14 years ago
Closed 14 years ago
talos-r3 master or slaves unwell
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: nthomas, Assigned: anodelman)
Attachments (3 files)
We've had several batches of talos runs fail when the slaves lose contact with the master, in particular at 15:22 (5 slaves), 15:37 (17), and 17:30-18:00 (50+). Possibly fallout from bug 546731? The master is also using 1.5G of memory right now, but CPU usage is OK.
Reporter
Comment 1•14 years ago
There are lots of "talos dirty" runs that are stuck at "downloading to dirtyMaxDBs.zip". I feel like restarting the master.
Reporter
Comment 2•14 years ago
These are all of today's builds that ended in exception, meaning the connection between slave and master was lost.
Reporter
Comment 3•14 years ago
dmoore, do the times correlate with any work you were doing?
Reporter
Comment 4•14 years ago
The talos-r3 master was stopped and restarted during a quiet period, with a purge_events for good measure. talos-pool has a similar problem with stalled builds, but I haven't touched that.
Comment 5•14 years ago
The last time we had machines fail in downloading dirtyDBs.zip it was because the slaves had a bad version of Twisted IIRC.
Reporter
Comment 6•14 years ago
There were 4 more premature disconnects at 22:11 PST on the 18th, two each on fedora and leopard boxes. Nothing else has happened between then and now. It seems to me there could be a network congestion issue between the slaves in MV and the master in MPT, so I think we should leave this open to see what happens when MV is back at work. Alternatively, twice as many slaves connecting to talos-master may be saturating the network connection used to send logs and download files. Filed bug 547602 to add munin monitoring.

Note that there are other issues causing problems for r3 jobs:
* timing out downloading symbols on mac; should be fixed by bug 546939
* "talos dirty" jobs for XP and Leopard are timing out retrieving dirtyMaxDBs.zip from the master, bug 547600
Reporter
Comment 8•14 years ago
Some failures are still occurring. This is all the 'exception' results since those in attachment 427704, including the 4 I mentioned in comment 6. I verified that a couple of them are lost connections between master and slave.
Assignee
Comment 9•14 years ago
Is this still occurring now that rev3 master is in production?
Reporter
Comment 10•14 years ago
I don't know which of these line up with our recent problems.
Assignee
Comment 11•14 years ago
Nothing new here since the 9th - still an issue?
Reporter
Comment 12•14 years ago
One yesterday (tracemonkey-xp-v8 218 at 2010-03-23 00:24:51 UTC), then lots a week ago, otherwise fine.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Updated•11 years ago
Product: mozilla.org → Release Engineering