Today, while triggering the ondemand_update testruns for Firefox 23.0b2 on betatest, I encountered Java IOExceptions in 18 of 120 testruns. Here are some of the failures:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13456
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13449
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13448
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_update/13447
I'm not sure what happened, but rebuilding the failed testruns seems to have fixed it. It's a bit concerning that this happened so many times, though.
I did not encounter these issues again with releasetest or beta channel updates, so I'm assuming this was a one-off, even though it occurred 18 times initially. I'm resolving this bug as WORKSFORME, but feel free to reopen it if you think this warrants further investigation.
It would appear that a network issue caused the nodes to become temporarily unavailable. I'm unsure what further investigation can be done in this case. The following is from the log for one of the nodes:

ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2577)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

The same can be seen in many of the logs. This appears to have occurred at 11:31 (Pacific) on July 2nd, which correlates with the time of the failures.
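For context on the "Caused by" line: the trace shows the node's reader thread sitting in ObjectInputStream.readObject(), waiting for the next serialized object from the other end. If the connection goes away, the stream simply ends, readObject() throws EOFException from peekByte(), and remoting reports that as "Unexpected termination of the channel". The sketch below is a standalone illustration using plain JDK streams, not the Jenkins remoting code; an in-memory byte array stands in for the network connection, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ChannelEofDemo {

    /**
     * Hypothetical demo: the "master" writes one object and then the connection ends.
     * Returns true if the first read succeeds and the next read fails with EOFException.
     */
    public static boolean eofAfterLastObject() throws IOException, ClassNotFoundException {
        // Stand-in for the network connection between master and node.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject("command from master");
        } // nothing more is ever written: the "peer" is gone

        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            Object first = in.readObject(); // delivered normally
            if (!"command from master".equals(first)) {
                return false;
            }
            try {
                in.readObject(); // waits for the next object, but the stream has ended
                return false;
            } catch (EOFException e) {
                return true; // same exception as the "Caused by" line in the node log
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("EOFException after stream ended: " + eofAfterLastObject());
    }
}
```

This only shows why an abrupt disconnect surfaces as EOFException; it says nothing about what caused the network drop itself.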
FYI, the node logs can be found by SSHing to mm-ci-master.qa.scl3.mozilla.com and looking in /home/mozauto/mozmill-ci/jenkins-master/slave-mm-*.
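To see how many nodes hit this, something like the following could list every occurrence of the channel error across those directories. It is a hypothetical helper, not an existing mozmill-ci tool, and it assumes every regular file under a slave-mm-* directory is a plain-text log (the actual file layout inside those directories is an assumption). It does not filter by timestamp, so correlating hits with 11:31 on July 2nd is still a manual step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class NodeLogScan {

    static final String MARKER = "Unexpected termination of the channel";

    /** Hypothetical helper: returns "file:lineNumber" for each line under root/slave-mm-* containing MARKER. */
    public static List<String> findChannelTerminations(Path root) throws IOException {
        List<String> hits = new ArrayList<>();
        // Assumption: one directory per node, named slave-mm-*.
        try (DirectoryStream<Path> nodes = Files.newDirectoryStream(root, "slave-mm-*")) {
            for (Path node : nodes) {
                List<Path> files = new ArrayList<>();
                try (Stream<Path> walk = Files.walk(node)) {
                    walk.filter(Files::isRegularFile).forEach(files::add);
                }
                for (Path file : files) {
                    // ISO-8859-1 accepts any byte sequence, so odd bytes in a log cannot abort the scan.
                    List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
                    for (int i = 0; i < lines.size(); i++) {
                        if (lines.get(i).contains(MARKER)) {
                            hits.add(file + ":" + (i + 1));
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/home/mozauto/mozmill-ci/jenkins-master");
        findChannelTerminations(root).forEach(System.out::println);
    }
}
```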