Failures in some Amazon EC2 B2G builds ("Connection to the other side was lost in a non-clean fashion.")

RESOLVED FIXED

Status

Release Engineering
General
--
major
RESOLVED FIXED
6 years ago
19 days ago

People

(Reporter: emorley, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [briarpatch][ec2])

(Reporter)

Description

6 years ago
This might just be a transient network issue, but since the EC2 builds are still pretty new, I figured it wouldn't hurt to file in case we can improve logging output or reliability somehow.

B2G gb_armv7a_gecko-debug mozilla-inbound build on 2012-07-17 10:04:38 PDT for push b2aeb8be3ded
slave: bld-linux64-ec2-005
https://tbpl.mozilla.org/php/getParsedLog.php?id=13612195&tree=Mozilla-Inbound

B2G gb_armv7a_gecko mozilla-inbound build on 2012-07-17 10:04:38 PDT for push b2aeb8be3ded
slave: bld-linux64-ec2-004
https://tbpl.mozilla.org/php/getParsedLog.php?id=13612197&tree=Mozilla-Inbound

B2G gb_armv7a_gecko mozilla-inbound build on 2012-07-17 10:21:38 PDT for push 2b31d5caf239
slave: bld-linux64-ec2-002
https://tbpl.mozilla.org/php/getParsedLog.php?id=13612193&tree=Mozilla-Inbound

I'm not sure who has been dealing with the AWS work, other than what I read in http://oduinn.com/blog/2012/07/11/releng-production-systems-go-hybrid-now-available-on-aws/, so I don't really know who to CC.
ed: thanks for this. Yes, this is all very new, so we're very interested in shaking out any surprises.

catlee: any ideas?
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
(Assignee)

Comment 2

6 years ago
Those look like "regular" connection timeouts. The builds were running for a very long time before finally losing their connections.

Comment 3

6 years ago
This is from the monitoring code: it records that the slave is idle and then, a bit later, goes to do a shutdown.

Twice now, the slave has started a job in the minutes between being marked idle and the shutdown request.

I have disabled this code for the short term until I can test and fix it.
(Assignee)

Updated

6 years ago
Assignee: nobody → bear
Whiteboard: [briarpatch][ec2]
(Assignee)

Updated

6 years ago
Assignee: bear → catlee
(Assignee)

Comment 4

6 years ago
I can't promise this will never happen again, but the immediate cause of this particular case is fixed. We now verify that buildbot has shut down before turning off the VM.
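
For anyone reading later, the check described above could be sketched roughly as follows. This is only an illustration, not the actual monitoring code: FakeSlave, request_graceful_shutdown, stop_vm and shutdown_if_safe are hypothetical names, and the assumption that a graceful shutdown request leaves buildbot running while a job is in progress is mine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSlave:
    """Hypothetical stand-in for an EC2 build slave (not a real buildbot or AWS API)."""
    name: str
    buildbot_running: bool = True
    job_in_progress: bool = False
    vm_running: bool = True
    log: List[str] = field(default_factory=list)

    def request_graceful_shutdown(self) -> None:
        # Assumption: buildbot only exits if no job is currently running.
        self.log.append("graceful shutdown requested")
        if not self.job_in_progress:
            self.buildbot_running = False

    def stop_vm(self) -> None:
        self.log.append("VM stopped")
        self.vm_running = False


def shutdown_if_safe(slave: FakeSlave) -> bool:
    """Stop the VM only after confirming that buildbot has actually exited.

    Returns True if the VM was stopped, False if it was left running.
    """
    slave.request_graceful_shutdown()
    if slave.buildbot_running:
        # A job started after the slave was marked idle, so leave the VM alone.
        slave.log.append("buildbot still running, skipping VM stop")
        return False
    slave.stop_vm()
    return True
```

The point of the sketch is that the decision to stop the VM is based on buildbot's state at shutdown time, rather than on the earlier "idle" entry, which may be stale by the time the shutdown request is made.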
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
Product: Release Engineering → Release Engineering