Firefox TLS handshake issue - Timed Out

RESOLVED FIXED in Firefox 62

Status

()

defect
P2
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: petr.sumbera, Assigned: petr.sumbera)

Tracking

unspecified
mozilla62
Points:
---

Firefox Tracking Flags

(firefox62 fixed)

Details

(Whiteboard: [necko-triaged])

Attachments

(3 attachments)

Posted image firefox-timedout.png
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20180323154952

Steps to reproduce:

This is issue on Solaris (intel) with latest trunk Firefox (last version I have fully working is version 52 ESR).

Please see attached screen shot.

Firefox started with removed profile and this was first request.

Few observations:
- there is no such issue when accessing plain HTTP site (no HTTPS)
- sometime reload gets me little bit further
- sometime I get into state where everything seem to work

On console there is no obvious error.

Any idea how to investigate this further?
Can you reach that server via other means (e.g. in another browser, or via the "ping" command)?
Flags: needinfo?(petr.sumbera)
I can ping server. And I can use older Firefox 52 to connect to the site without any issue.
Flags: needinfo?(petr.sumbera)
I used Wireshark to investigate it further and it seems to be issue during TLS 1.2 handshake on client side.

With Firefox 52 I can see that after receiving "New Session Ticket, Change Cipher Spec, Finished" from server, Firefox sends HTTP GET packet. Where with version 61 it sends instead [FIN, ACK] and server responds with "Alert (Level: Warning, Description: Close Notify)"

See attached textual output of the communication.
Summary: Firefox issue - Timed Out → Firefox TLS handshake issue - Timed Out
Following is end of debug output (SSLTRACE=127 SSLDEBUG=1):

5903: SSL3[-1270945232]: handle session_ticket handshake
5903: SSL[-1270945232]: consume bytes: [Len: 4]
   00 00 01 2c                                       ...,
5903: SSL[-1270945232]: consume bytes: [Len: 2]
   00 d0                                             ..
5903: SSL3[-1270945232]: gather state 1 (need 5 more)
5903: SSL[-1270945232]: raw gather data: [Len: 5]
   14 03 03 00 01                                    .....
5903: SSL3[-1270945232]: gather state 2 (need 1 more)
5903: SSL[-1270945232]: raw gather data: [Len: 1]
   01                                                .
5903: SSL[-1270945232]: got record of 1 bytes
5903: SSL[-1270945232]: ciphertext: [Len: 1]
   01                                                .
5903: SSL[-1270945232]: cleartext: [Len: 1]
   01                                                .
5903: SSL: frag hash1: header [Len: 13]
   00 00 00 00 00 00 00 05 14 03 03 00 01            .............
5903: SSL: frag hash1: input [Len: 1]
   01                                                .
5903: SSL3[-1270945232]: handle change_cipher_spec record
5903: SSL3[-1270945232] Set Current Read Cipher Suite to Pending
5903: SSL[-]: decrement refct for read spec -1486985904. epoch=0 new ct = 0
5903: SSL[-]: Freeing read spec -1486985904. epoch=0
5903: SSL3[-1270945232]: gather state 1 (need 5 more)
5903: SSL[-1270945232]: raw gather data: [Len: 5]
   16 03 03 00 28                                    ....(
5903: SSL3[-1270945232]: gather state 2 (need 40 more)
5903: SSL[-1270945232]: raw gather data: [Len: 40]
   53 03 78 68 7b fa c8 51 94 88 10 fa da 29 33 51   S.xh{..Q.....)3Q
   ea 7a b5 40 a2 7d 2d 12 5b 4c ef bb 6b dc 48 62   .z.@.}-.[L..k.Hb
   16 5f 75 0d 54 d1 b3 65                           ._u.T..e
5903: SSL[-1270945232]: got record of 40 bytes
5903: SSL[-1270945232]: ciphertext: [Len: 40]
   53 03 78 68 7b fa c8 51 94 88 10 fa da 29 33 51   S.xh{..Q.....)3Q
   ea 7a b5 40 a2 7d 2d 12 5b 4c ef bb 6b dc 48 62   .z.@.}-.[L..k.Hb
   16 5f 75 0d 54 d1 b3 65                           ._u.T..e
5903: SSL3[-1270945232]: handle handshake message: finished     (20)
5903: SSL3[-1270945232]: handle finished handshake
5903: SSL[-1270945232]: Hash handshake message: [Len: 12]
   c8 ac 68 50 f3 fe 18 43 77 65 03 43               ..hP...Cwe.C
5903: SSL[0]: new read spec -1486989552 ct=1
5903: SSL[0]: new write spec -1486985904 ct=1
5903: SSL[-1180946768]: connect failed, errno=-5934
5903: SSL[-1180946768]: secure connect completed, rv == -1
5903: SSL[0]: new read spec -1486985600 ct=1
5903: SSL[0]: new write spec -1486984688 ct=1
5903: SSL[-1485601744]: connect failed, errno=-5934
5903: SSL[-1485601744]: secure connect completed, rv == -1
5903: SSL3[-1270945232]: ssl3_GatherCompleteHandshake
5903: SSL[-1270945232]: handshake gathering, rv=-1
SSL: destroy sid: sid=0xa30ba7e0 cached=0
5903: SSL[-]: Freeing write spec -1486984992. epoch=1
5903: SSL[-]: Freeing read spec -1486985296. epoch=1
5903: SSL[-1516546592]: closing, rv=0 errno=-5998
++DOMWINDOW == 5 (7fffa4ae9c00) [pid = 5908] [serial = 7] [outer = 7fffb29b8800]
5903: SSL[-]: Freeing write spec -1579091744. epoch=0
5903: SSL[-]: Freeing read spec -1579092048. epoch=0
5903: SSL[-1597998112]: closing, rv=0 errno=-5978
5903: SSL[-]: Freeing write spec -1579100256. epoch=0
5903: SSL[-]: Freeing read spec -1579099952. epoch=0
5903: SSL[-1264619360]: closing, rv=0 errno=-5978
Maybe the issue is here:

5903: SSL3[-1270945232]: ssl3_GatherCompleteHandshake
5903: SSL[-1270945232]: handshake gathering, rv=-1

And in code it's probably here:

https://searchfox.org/mozilla-central/source/security/nss/lib/ssl/ssl3gthr.c#439

        if (ss->ssl3.hs.restartTarget) {
            ssl_ReleaseSSL3HandshakeLock(ss);
            PORT_SetError(PR_WOULD_BLOCK_ERROR);
            return (int)SECFailure;                 <===
        }
Component: Untriaged → Security
Assignee: nobody → nobody
Component: Security → Libraries
Product: Firefox → NSS
Version: Trunk → other
Is this issue specific to Solaris or can you reproduce it with other systems? Is there a publicly accessible server that exhibits this behaviour that could be used to test against?
Flags: needinfo?(petr.sumbera)
I haven't tried any other system but I'm pretty sure it's Solaris specific (otherwise someone on Linux or Windows already complained).

It can be reproduced on public servers too. But there comes other things to consider. Like that I'm behind firewall so I need to set proxy (where I used to removed ~/.mozilla directory).

For example:

https://www.idnes.cz/

It usually loads only parts of page (usually some pictures are missing beside articles).

On the other hand https://www.mozilla.org/ and https://www.bbc.co.uk/ seems to work well.

Since it's not very deterministic I chose that internal server where I can exhibit the problem everytime.
Flags: needinfo?(petr.sumbera)
I was able to build Firefox on Linux so I can now compare debug messages:

Following (two unsucesfull tries I see only on Solaris):

7321: SSL3[-579810480]: ssl3_GatherCompleteHandshake
7321: SSL3[-579810480]: gather state 1 (need 5 more)
7321: SSL3[-579810480]: recv error -5998
7321: SSL[-579810480]: handshake gathering, rv=-1
7321: SSL3[-579810480]: ssl3_GatherCompleteHandshake
7321: SSL3[-579810480]: gather state 1 (need 5 more)
7321: SSL3[-579810480]: recv error -5998

But finally it passes (on Linux it runs just once):

7321: SSL3[-579810480]: ssl3_GatherCompleteHandshake
7321: SSL3[-579810480]: gather state 1 (need 5 more)
7321: SSL[-579810480]: raw gather data: [Len: 5]
   16 03 03 00 41                                    ....A
   
Interesting is that on Solaris there are values displayed as signed (not just for SSL[num] but also for spec?):
   
7321: SSL[-579810480]: saving 6 bytes of data (81 total saved so far)
7321: SSL[-]: decrement refct for write spec -788074960. epoch=0 new ct = 0
7321: SSL[-]: Freeing write spec -788074960. epoch=0   

Again on Solaris one failure (but following one is ok):

7321: SSL[-579810480]: saving 45 bytes of data (126 total saved so far)
7321: SSL[-579810480]: sending 126 bytes of saved data
7321: SSL3[-579810480]: gather state 1 (need 5 more)
7321: SSL3[-579810480]: recv error -5998
7321: SSL3[-579810480]: ssl3_GatherCompleteHandshake

One more signed spec value on Solaris:

7321: SSL[-]: decrement refct for read spec -788077392. epoch=0 new ct = 0
7321: SSL[-]: Freeing read spec -788077392. epoch=0

And finally this is last common part:

7321: SSL3[-579810480]: handle handshake message: finished     (20)
7321: SSL3[-579810480]: handle finished handshake
7321: SSL[-579810480]: Hash handshake message: [Len: 12]
   66 41 11 2f 5a 84 84 dc 2b 46 50 20               fA./Z...+FP
7321: SSL[-579810480]: handshake gathering, rv=-2
7321: SSL[-579810480]: handshake blocked (need 0)

On Solaris it continues like this:

SSL: destroy sid: sid=0xfd488de0 cached=0
7321: SSL[-]: Freeing write spec -788074352. epoch=1
7321: SSL[-]: Freeing read spec -788074656. epoch=1
7321: SSL[-788443184]: closing, rv=0 errno=-5998

And linux continues like this:

32763: SSL3[1899437552]: ssl3_GatherCompleteHandshake
32763: SSL[1889267232]: ssl_Poll flags 6 -> 6
32763: SSL[1840593408]: ssl_Poll flags 6 -> 6
Still trying to find differences against Linux (using some debug printfs). The issue seems to be related to following:

SSL handshake is not completed because socket is closed (nsNSSSocketInfo::CloseSocketAndDestroy) before nsNSSSocketInfo::SetCertVerificationResult() is called (mFd is null).
The socket (see above) is closed while there are following debug log messages:

2018-06-14 10:53:44.106593 UTC - [19406:Socket Thread]: V/nsHttp canceling transaction: tls handshake takes too long: tls handshake last 791ms, timeout is 30000ms.
2018-06-14 10:53:44.106703 UTC - [19406:Socket Thread]: V/nsHttp nsHttpConnection::CloseTransaction[this=d13b7800 trans=d13b6c00 reason=804b000e]
2018-06-14 10:53:44.106812 UTC - [19406:Socket Thread]: V/nsHttp nsHttpTransaction::Close [this=d13b6c00 reason=804b000e]
..
2018-06-14 08:36:42.471689 UTC - [19069:Socket Thread]: D/nsSocketTransport nsSocketInputStream::Read [this=ca4dee58 count=4000]
2018-06-14 08:36:42.472126 UTC - [19069:Socket Thread]: D/nsSocketTransport   calling PR_Read [count=4000]
2018-06-14 08:36:42.472867 UTC - [19069:Socket Thread]: V/pipnss [d378db50] read -1 bytes
2018-06-14 08:36:42.473082 UTC - [19069:Socket Thread]: D/nsSocketTransport   PR_Read returned [n=-1]

Note "tls handshake takes too long: tls handshake last 791ms, timeout is 30000ms" !!!

source/netwerk/protocol/http/nsHttpConnection.cpp:

      if (initialTLSDelta > gHttpHandler->TLSHandshakeTimeout()) {
        LOG(("canceling transaction: tls handshake takes too long: tls handshake "

This compares time ticks with ms.
Component: Libraries → Networking: HTTP
Product: NSS → Core
Version: other → unspecified
Attachment #8985591 - Flags: review?(dd.mozilla)
Assignee: nobody → petr.sumbera
Priority: -- → P2
Whiteboard: [necko-triaged]
Comment on attachment 8985591 [details] [diff] [review]
Bug1452130.patch

Review of attachment 8985591 [details] [diff] [review]:
-----------------------------------------------------------------

thanks.
Attachment #8985591 - Flags: review?(dd.mozilla) → review+
Keywords: checkin-needed
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4789a350ec7f
nsHttpConnection.cpp should compare time intervals in the same units r=dragana
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/4789a350ec7f
Status: UNCONFIRMED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.