Try Linux talos boxes take a really long time to give back results

RESOLVED FIXED

Status

defect
P2
normal
RESOLVED FIXED
10 years ago
6 years ago

People

(Reporter: jrmuizel, Assigned: anodelman)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

I started my build today at 10:00 am EST and still don't have Linux talos results (as of 3:51 pm EST). There are also builds that started at 7:14 am EST that don't have results yet.
Do you have links to the log where your build was generated?
Is this try talos only?
Assignee: nobody → anodelman
Priority: -- → P2
Summary: Linux talos boxes take a really long time to give back results → Try Linux talos boxes take a really long time to give back results
There are a lot of failed build downloads in the log.  Investigating.
(In reply to comment #3)
> Is this try talos only?

Yes, sorry, try talos only.
Depends on: 549473
Pretty much all the Linux try slaves are in a stuck state; I can't even log onto them to investigate.  Once they are given a kick I can find out more.
This has been broken since Feb. 27th. I wonder what has changed in the slaves or the network setup.

wget --progress=dot:mega -N --no-check-certificate http://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2
 in dir /home/mozqa/talos-slave/linux-cold/../talos-data (timeout 1200 secs)
 watching logfiles {}
 argv: ['wget', '--progress=dot:mega', '-N', '--no-check-certificate', u'http://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2']
 environment: {'GNOME_DESKTOP_SESSION_ID': 'Default', 'LOGNAME': 'mozqa', 'WINDOWID': '39845984', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games', 'HOME': '/home/mozqa', 'DISPLAY': ':0.0', 'SSH_AGENT_PID': '5706', 'LANG': 'en_US.UTF-8', 'TERM': 'xterm', 'SHELL': '/bin/bash', 'XAUTHORITY': '/home/mozqa/.Xauthority', 'SESSION_MANAGER': 'local/qm-ref-ubuntu:/tmp/.ICE-unix/5657', 'XDG_DATA_DIRS': '/usr/local/share/:/usr/share/:/usr/share/gdm/', 'WINDOWPATH': '7', 'USERNAME': 'mozqa', 'GDM_XSERVER_LOCATION': 'local', 'COLORTERM': 'gnome-terminal', 'SSH_AUTH_SOCK': '/tmp/ssh-XVPOmk5657/agent.5657', 'GNOME_KEYRING_SOCKET': '/tmp/keyring-Tj8gRk/socket', 'GDMSESSION': 'default', 'DBUS_SESSION_BUS_ADDRESS': 'unix:abstract=/tmp/dbus-pxYKgfbHfP,guid=09f76fe7b00a7beadf6290004b88f1cc', 'XDG_SESSION_COOKIE': 'd16c023c06e46dc1bca5ea0046b85100-1267265994.817081-1722945', 'DESKTOP_SESSION': 'default', 'GDM_LANG': 'en_US.UTF-8', 'PWD': '/home/mozqa', 'GTK_RC_FILES': '/etc/gtk/gtkrc:/home/mozqa/.gtkrc-1.2-gnome2', 'USER': 'mozqa'}
--07:59:33--  http://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2
           => `try-8bd857f27b57-linux.tar.bz2'
Resolving build.mozilla.org... 10.2.74.128
Connecting to build.mozilla.org|10.2.74.128|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2 [following]
--07:59:33--  https://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2
           => `try-8bd857f27b57-linux.tar.bz2'
Connecting to build.mozilla.org|10.2.74.128|:443... connected.
OpenSSL: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
Unable to establish SSL connection.
program finished with exit code 1
Status: NEW → ASSIGNED
OS: Mac OS X → Linux
This is what it needs to work again:
 --secure-protocol=SSLv2
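
For reference, a minimal sketch of the download command from the failing log with the suggested flag added. It only prints the command and does not run it. `build_wget_cmd` is a hypothetical helper name, not part of the talos harness, and the sketch assumes a wget build that still accepts `SSLv2` as a value for `--secure-protocol`:

```shell
#!/bin/sh
# Sketch only: prints (does not execute) the wget invocation from the
# failing log above with the suggested --secure-protocol flag added.

# URL copied from the failing log in this bug.
TRY_URL='http://build.mozilla.org/tryserver-builds/masayuki@d-toybox.com-try-8bd857f27b57/try-8bd857f27b57-linux.tar.bz2'

# build_wget_cmd is a hypothetical helper, not part of the talos harness.
# It takes a URL and prints the full command line as a single string.
build_wget_cmd() {
    printf '%s\n' "wget --progress=dot:mega -N --no-check-certificate --secure-protocol=SSLv2 $1"
}

build_wget_cmd "$TRY_URL"
```

Running the printed command by hand on a stuck slave would show whether forcing the protocol gets past the "bad record mac" handshake failure. Whether that is an acceptable long-term change for the slaves is a separate question.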
(In reply to comment #7)
> This has been broken since Feb. 27th. I wonder what has changed in the slaves
> or the network setup.
...
Connecting to build.mozilla.org|10.2.74.128|:443... connected.
OpenSSL: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
Unable to establish SSL connection.

mrz, dmoore: did anything change with the networks on Feb. 27th?
Severity: normal → critical
Maybe related to the ssl renegotiation config? See e.g. bug 545329 / bug 549109.
Depends on: 549710
The wget error stopped happening after the slaves were rebooted.  I've pulled a sick slave (qm-pubuntu-try13) out of the pool that was consistently failing.

The latest results look good to me.
In regard to comment #7, that error doesn't appear on the mozillatest tinderbox anymore (since I had the machines rebooted). I believe that it is now a red herring.
Switching to 'normal' as the current tests look good.  Will keep this bug open to track getting the sick talos box back in the pool.  

jrmuizel is going to also keep an eye on some new builds that he has in the queue.
Severity: critical → normal
qm-pubuntu-try13 back up and connected.

I'm seeing lots of green Linux try runs, so we're good to go here.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Attachment #429813 - Flags: review?(anodelman)
Product: mozilla.org → Release Engineering