Closed Bug 1317987 Opened 9 years ago Closed 9 years ago

Intermittent Caught exception: <urlopen error [Errno 1] _ssl.c:504: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

seems this hit autoland and other trees
closed integration trees now
Severity: normal → blocker
who is working on this? if the trees are closed for this bug I would expect to see a link to a bug, or a ni? or a comment about a specific person working on something.
pinged people on irc but no response so far, so trying here
Flags: needinfo?(coop)
Flags: needinfo?(catlee)
Flags: needinfo?(bugspam.Callek)
This is affecting Mac testers and all three flavors of Windows testers.
Flags: needinfo?(coop)
The actual error appears to be coming from contacting queue.taskcluster.net similar to in bug 1260584.
Flags: needinfo?(garndt)
Flags: needinfo?(catlee)
Flags: needinfo?(bugspam.Callek)
Bug 1260548 was about 5xx errors from cloudfront
I have opened up a support ticket with Heroku to find out if they have changed anything with our SSL endpoint that we were not aware of that could break this. It appears to be an issue with hosts using sslv3 to connect. 9:14 AM <@daveio> and as expected openssl s_client -ssl3 -connect queue.taskcluster.net:443 fails 9:16 AM <@daveio> the build job needs to be changed to use at least tlsv1, ideally tlsv1.2 Nothing has changed with these jobs recently, so it points to something on the heroku end.
There is no SSLv3 on this system; don't know if there was before: 1 ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 ECDH,P-256,256bits prime256v1 2 ECDHE-RSA-AES128-SHA256 TLSv1.2 ECDH,P-256,256bits prime256v1 3 ECDHE-RSA-AES128-SHA TLSv1,TLSv1.1,TLSv1.2 ECDH,P-256,256bits prime256v1 4 ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 ECDH,P-256,256bits prime256v1 5 ECDHE-RSA-AES256-SHA384 TLSv1.2 ECDH,P-256,256bits prime256v1 6 ECDHE-RSA-AES256-SHA TLSv1,TLSv1.1,TLSv1.2 ECDH,P-256,256bits prime256v1 7 AES128-GCM-SHA256 TLSv1.2 None 8 AES128-SHA256 TLSv1.2 None 9 AES128-SHA TLSv1,TLSv1.1,TLSv1.2 None 10 AES256-GCM-SHA384 TLSv1.2 None 11 AES256-SHA256 TLSv1.2 None 12 AES256-SHA TLSv1,TLSv1.1,TLSv1.2 None 13 DES-CBC3-SHA TLSv1,TLSv1.1,TLSv1.2 None
Retriggered jobs are green. Joel was able to successfully make a request on a 10.10 loaner using urllib. I don't think this is SSL-related. I think it was a transient network issue.
I /think/ that "SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure" message is a bit misleading. As far as protocol selection is concerned in OpenSSL, there are constants with "SSLv23" or "SSL23" in them that actually mean "use the newest version SSL+TLS that both ends support." If we're actually using SSL (and not TLS) in automation, that's a security issue and any services still using SSL deserve to break.
Heroku has investigated and confirmed that nothing has changed with our SSL endpoint recently. Also, they confirmed that SSLv3 is indeed disabled just to avoid confusion in the future. There analysis points to a network issue. If we have any additional logs to provide them, please let me know. These failures seem concentrated 100% to windows and mac running out of scl3 or route their traffic through scl3 (like the win7vm instances).
Flags: needinfo?(garndt)
Also, trees have been reopened as of 2016-11-16T18:05:57+00:00
Severity: blocker → critical
No longer blocks: t-yosemite-r7-0038
No longer blocks: t-yosemite-r7-0182
No other occurrence since Nov 16, so it likely was a temporary network issue. Will mark as resolved for now, please feel free to reopen if needed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.