Lots of ondemand_update failures with various Java IO Exceptions

RESOLVED WORKSFORME

Status

Mozilla QA
Infrastructure

People

(Reporter: ashughes, Unassigned)

(Reporter)

Description

4 years ago
Today, while triggering the ondemand_update testruns for Firefox 23.0b2 on betatest, I encountered several Java IO Exceptions (18/120 testruns).

Here are some of the failures:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13456
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13449
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13448
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13447

I'm not sure what happened but rebuilding the failed testruns seems to have fixed it. It's a bit concerning that this happened so many times though.
(Reporter)

Comment 1

4 years ago
I did not re-encounter these issues with releasetest or beta channel updates so I'm assuming this was a one-off, though it did occur 18 times initially. I'm resolving this bug WORKSFORME but feel free to reopen if you feel this warrants further investigation.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME
It would appear that a network issue caused the nodes to temporarily become unavailable. I'm unsure what further investigation can be done in this case. The following is from the log for one of the nodes:

ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2577)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

The same can be seen in many of the logs. This appears to have occurred at 11:31 (Pacific) on July 2nd, which correlates with the time of the failures.
FYI, to find the node logs, SSH to mm-ci-master.qa.scl3.mozilla.com and look under /home/mozauto/mozmill-ci/jenkins-master/slave-mm-*
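
For anyone who wants to check how many node logs show the same disconnect, here is a rough Python sketch that searches them for the error strings quoted above. It is an illustration only: the function names are made up for this example, and it assumes each slave-mm-* entry is either a log file or a directory containing plain-text logs, which I have not verified on the master.

```python
from pathlib import Path

# Strings taken from the log excerpt in this bug.
SIGNATURES = (
    "Connection terminated",
    "Unexpected termination of the channel",
    "java.io.EOFException",
)


def find_channel_errors(lines):
    """Return (line_number, text) pairs for lines containing a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(signature in line for signature in SIGNATURES):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_node_logs(root, pattern="slave-mm-*"):
    """Scan files matching the pattern under root (hypothetical helper).

    Assumption: a match is either a log file itself or a directory whose
    regular files are text logs. Adjust if the real layout differs.
    """
    results = {}
    for entry in sorted(Path(root).glob(pattern)):
        if entry.is_file():
            files = [entry]
        else:
            files = [p for p in sorted(entry.rglob("*")) if p.is_file()]
        for path in files:
            # errors="replace" keeps the scan going past undecodable bytes.
            with path.open(errors="replace") as handle:
                hits = find_channel_errors(handle)
            if hits:
                results[str(path)] = hits
    return results


if __name__ == "__main__":
    # Path as given in this bug; run this on the CI master itself.
    logs = scan_node_logs("/home/mozauto/mozmill-ci/jenkins-master")
    for path, hits in logs.items():
        print(f"{path}: {len(hits)} matching line(s)")
```

Comparing the timestamps around the matching lines across nodes should show whether they all dropped at the same moment, which would support the network-issue theory.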