Google Talk (GTalk) cannot reconnect due to SSL session issues

RESOLVED WORKSFORME

Status

defect
--
major
RESOLVED WORKSFORME
4 years ago
4 years ago

People

(Reporter: clokep, Assigned: clokep)

Tracking

({regression})

Thunderbird Tracking Flags

(thunderbird38- unaffected)

Details

(Whiteboard: [1.6-blocking])

Attachments

(3 attachments)

(Assignee)

Description

4 years ago
Posted file Debug log
If a Google Talk account disconnects, it fails to reconnect: the server closes the connection with a net reset, and no data gets sent from the server to the client.

I've attached a debug log.

Florian found that disabling SSL sessions seems to fix the issue (similar to bug 954724, although the ability to disable SSL sessions didn't exist until Mozilla 35). SSL sessions can be disabled by setting security.ssl.disable_session_identifiers to true.

Updated

4 years ago
Keywords: regression
Whiteboard: [1.6-blocking]

Comment 1

4 years ago
For STR, switching your status to Offline and then back to Available is enough.

Comment 2

4 years ago
Thanks for posting on this bug.  I will be using an alternative until this is resolved.

Comment 3

4 years ago
I wonder if there is anything obvious we should be looking at here? Or is there a way to force a fresh SSL session? (The socket code is http://mxr.mozilla.org/comm-central/source/chat/modules/socket.jsm)
Flags: needinfo?(dkeeler)

Comment 4

4 years ago
Some NSPR logs that might help are nsHttp, pipnss, and certverifier (see e.g. https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging ). You might also use wireshark or something to capture some packets to see what actual network traffic is occurring. You could just use security.ssl.disable_session_identifiers, but that's just a workaround - it would be better to find out what's going wrong.
Flags: needinfo?(dkeeler)

Comment 5

4 years ago
Duplicate of this bug: 1125463
(Assignee)

Comment 6

4 years ago
Posted file NSS log
I created an NSS log with:
set NSPR_LOG_MODULES=timestamp,nsHttp:5,pipnss:5,certverifier:5
set NSPR_LOG_FILE=%TEMP%\nss-log.txt

After which I started Instantbird, connected to GTalk (which is successful), immediately disconnected and then attempted to reconnect again, which successfully reproduced the bug. Unfortunately, this log looks like gibberish to me. :)
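
The two `set` commands above are Windows cmd syntax. A rough cross-platform equivalent, as a sketch only: the helper names `nspr_log_env` and `launch_with_logging` and any binary path passed in are hypothetical; only the two environment variables and the module list come from this comment.

```python
import os
import subprocess

# Module list taken verbatim from the comment above.
DEFAULT_MODULES = "timestamp,nsHttp:5,pipnss:5,certverifier:5"


def nspr_log_env(log_file, modules=DEFAULT_MODULES, base=None):
    """Return a copy of the environment with NSPR logging enabled."""
    env = dict(os.environ if base is None else base)
    env["NSPR_LOG_MODULES"] = modules
    env["NSPR_LOG_FILE"] = log_file
    return env


def launch_with_logging(binary, log_file):
    """Start the client with logging enabled (binary path is an assumption)."""
    return subprocess.Popen([binary], env=nspr_log_env(log_file))
```

Calling `launch_with_logging` with the path to the Instantbird executable and a log file name should produce the same kind of log as the one attached here.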
Flags: needinfo?(dkeeler)

Comment 7

4 years ago
I don't see anything obviously wrong in that log. Maybe a packet trace would help?
Flags: needinfo?(dkeeler)

Comment 8

4 years ago
I meant to post this other observation, just in case it provides some clearance:

During a session, if the client disconnects, it would not reconnect unless I restart InstantBird, and even though it is  a gamble that it would connect by that method, it seems to have a better chance as opposed to not restarting the client.

Comment 9

4 years ago
(In reply to deOmega from comment #8)
> I meant to post this other observation, just in case it provides some
> clearance:
> 
> During a session, if the client disconnects, it would not reconnect unless I
> restart InstantBird, and even though it is  a gamble that it would connect
> by that method, it seems to have a better chance as opposed to not
> restarting the client.

Right, we know restarting Instantbird makes reconnecting possible. If you want to work around the problem, you can go to about:config (type "/about config" in a conversation), create the boolean preference security.ssl.disable_session_identifiers and set it to true.
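
For anyone who would rather keep the workaround in a file than click through about:config: Mozilla applications read a user.js file from the profile directory at startup. A minimal sketch that renders the line to put there, assuming Instantbird honours user.js like other Mozilla applications; `render_user_js` and `WORKAROUND` are names made up for this note.

```python
import json


def render_user_js(prefs):
    """Return user.js text with one user_pref(...) line per preference."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields JavaScript-compatible literals (true/false, "strings").
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


# The workaround pref named in this bug.
WORKAROUND = {"security.ssl.disable_session_identifiers": True}

if __name__ == "__main__":
    print(render_user_js(WORKAROUND), end="")
```

Removing the line from user.js and resetting the pref in about:config should undo the workaround.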

Comment 10

4 years ago
(In reply to Florian Quèze [:florian] [:flo] from comment #9)
> (In reply to deOmega from comment #8)
> > I meant to post this other observation, just in case it provides some
> > clearance:
> > 
> > During a session, if the client disconnects, it would not reconnect unless I
> > restart InstantBird, and even though it is  a gamble that it would connect
> > by that method, it seems to have a better chance as opposed to not
> > restarting the client.
> 
> Right, we know restarting Instantbird makes reconnecting possible. If you
> want to work around the problem, you can go to about:config (type "/about
> config" in a conversation), create the boolean preference
> security.ssl.disable_session_identifiers and set it to true.

This would require users to update to a version above 1.5, correct?

Comment 11

4 years ago
Hmm, possibly, yes. I was just mentioning this for you, I wasn't thinking about "users" in general.

Comment 12

4 years ago
I understood that you were...  thank you.

Comment 13

4 years ago
OK, that is the same string I tried unsuccessfully while I was on IRC recently.

Using that, I actually cannot establish even a temporary connection unless I remove the string AND revert to the 1.5 version of Instantbird.

String input:
http://oi62.tinypic.com/fcnes.jpg

error image example:

http://oi61.tinypic.com/35jjzp0.jpg
(Assignee)

Comment 14

4 years ago
David, I have a few packet captures:
1. The initial connection
2. A reconnection with disable_session_identifiers set to true (this seems to fail to reconnect once, then actually reconnects on a second try)
3. A reconnection with disable_session_identifiers set to false (which never actually reconnects)

I'd rather not post packet captures in a public forum like this though, would I be able to email them to you? Thanks.
Flags: needinfo?(dkeeler)

Comment 15

4 years ago
Hi Patrick,

Emailing them to me works. I believe you can also add attachments as private, so only members of the core-security group will be able to view them.
Flags: needinfo?(dkeeler)
(Assignee)

Comment 16

4 years ago
I went ahead and sent David the captures in an email.

Comment 17

4 years ago
I was finally able to get the string to work and as of now, both accounts are connected.  Hopefully this holds, but again, this is the first time I am able to establish a connection while using the string.


Thanks for all the guidance!
(Assignee)

Comment 18

4 years ago
I'm not sure what else to do here besides set the pref by default...
Assignee: nobody → patrick+mozilla-bugzilla
Status: NEW → ASSIGNED
Attachment #8574990 - Flags: review?(florian)
(Assignee)

Updated

4 years ago

Comment 19

4 years ago
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

This is fine for Instantbird. For Thunderbird, I think someone else should have a look too. I don't know if there could be unexpected consequences.
Attachment #8574990 - Flags: review?(florian) → review+
(Assignee)

Comment 20

4 years ago
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

Review of attachment 8574990 [details] [diff] [review]:
-----------------------------------------------------------------

Joshua, not sure if you're the right person to look at this? (Maybe rkent?)
Attachment #8574990 - Flags: review?(Pidgeot18)

Comment 21

4 years ago
So one issue that definitely needs to be addressed is the question of what the unintended consequences of disabling session identifiers are, since this is a big hammer that hits everybody. I don't know what this TLS feature is, nor do I know why Google Talk's servers seem to be broken with it enabled. If it's a problem with Google Talk, then quite possibly it's best fixed by having Google fix their servers.

It sounds like David already has a packet trace and could provide some answer as to the question of what the impact of this change would be.
Flags: needinfo?(dkeeler)

Comment 22

4 years ago
Changing that pref disables TLS session tickets and the session cache, meaning that session resumption can't happen. Session resumption is a performance optimization. Without it, each time a client wants to connect to a server, they have to negotiate a full handshake (which can involve many more TCP round-trips). If connections are long-lived, this probably won't be noticeable. However, if the client makes many short connections (and closes them each time), the user could notice a slowdown.
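
To put rough numbers on that trade-off: in TLS 1.2 a full handshake takes two round trips before application data can flow, and a resumed (abbreviated) handshake takes one. A back-of-the-envelope sketch, ignoring False Start and the TCP handshake; the function name and the example figures are illustrative assumptions, not measurements from this bug.

```python
# Round trips before application data can flow (TLS 1.2, no False Start).
FULL_HANDSHAKE_RTTS = 2
RESUMED_HANDSHAKE_RTTS = 1


def extra_handshake_ms(connections, rtt_ms):
    """Extra latency when connections that could have resumed do a full handshake."""
    return connections * (FULL_HANDSHAKE_RTTS - RESUMED_HANDSHAKE_RTTS) * rtt_ms


if __name__ == "__main__":
    # One chat reconnect at an assumed 100 ms round-trip time.
    print(extra_handshake_ms(1, 100))
    # Fifty short-lived connections at the same round-trip time.
    print(extra_handshake_ms(50, 100))
```

For a single long-lived chat connection the cost is one extra round trip per reconnect; it only adds up for clients that open many short connections.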
Flags: needinfo?(dkeeler)

Comment 23

4 years ago
Ryan, Karl: can you comment on potential issues on Google's end from comment 21?
Flags: needinfo?(ryan.sleevi)
Flags: needinfo?(kdubost)

Comment 24

4 years ago
I pinged people at Google about the issue.
Flags: needinfo?(kdubost)

Comment 25

4 years ago
I'm not sure what the question is for Google.
Flags: needinfo?(ryan.sleevi)

Comment 26

4 years ago
Patrick, 

could you provide here for the record what would need to be changed on Google servers?
(or at least a summary of the technical issue, and a step by step to reproduce it, so Google engineers can work out eventually a solution).

Thanks.
Flags: needinfo?(clokep)
(Assignee)

Comment 27

4 years ago
(In reply to Karl Dubost :karlcow from comment #26)
> could you provide here for the record what would need to be changed on
> Google servers?

I don't think I can offer any useful information here: I don't have any idea how Google has their servers configured (nor do I know much about setting up SSL in general). We've had other issues with the SSL used on the Google Talk servers, however; see bug 1092701.

> (or at least a summary of the technical issue, and a step by step to
> reproduce it, so Google engineers can work out eventually a solution).

STR are (using Instantbird or Thunderbird):
1. Create a Google Talk account.
2. Connect the Google Talk account. (This should succeed, download contacts, etc.)
3. Disconnect the Google Talk account.
4. Attempt to (Re-)Connect the Google Talk account and it gets stuck in "Initializing stream..." (i.e. connection never succeeds).

I've attached NSS logs and our debug logs which might shed some further light on this.

Please let me know if I can provide any other information (or more detailed steps to reproduce) or if someone at Google would like to work directly with me. Thanks for the help so far!
Flags: needinfo?(clokep)
(Assignee)

Comment 28

4 years ago
Karl, is there any further information you need from me? Have we had anything from Google engineers? Thanks!
Flags: needinfo?(kdubost)

Comment 29

4 years ago
Patrick,
No thanks. That was useful.
We rarely get answers from Google engineers.
We notify them about the issue.
Best scenario: someone acks and works on it.
Likely scenario: someone acks and gives us the issue number at Google.
Most likely scenario: blackhole.
Flags: needinfo?(kdubost)

Comment 30

4 years ago
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

I don't claim to have investigated disable_session_identifiers, but it seems very risky to me to change a setting like this globally when only one particular account type is affected. Wouldn't it make more sense to set this only when someone has a Google Talk account configured?

Comment 31

4 years ago
I haven't seen any argument why this should block, so removing tb38 blocking.

Comment 32

4 years ago
jcranmer: Not taking this patch would be rather painful for gtalk users, so a review would be appreciated.

Comment 33

4 years ago
I've talked with my colleagues, and none of us can find any evidence to support this as an issue.

That is, we can't find anything to suggest that it's not a bug in your code.

Considering other clients don't have this issue, it may help for you to debug further, or to wireshark. The attached logs are insufficient for providing any useful diagnostic capabilities.
(Assignee)

Comment 34

4 years ago
Ryan, I've sent a variety of WireShark captures to David Keeler, but he was unable to find anything wrong with them. I'd be happy to provide them to you as well. (I didn't attach them to this bug since they contain IP addresses, etc.)
Flags: needinfo?(ryan.sleevi)

Comment 35

4 years ago
A Wireshark capture plus the matching SSLKEYLOGFILE, sent by email, is ideal ( https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Key_Log_Format ), but of course, that will expose all of the decrypted contents (e.g. usernames and passwords, not just IP addresses). If you had a testing account you could work with, great.

Alternatively, you could take a stab at debugging it yourself, since the above would help you show the full TLS handshake state machine in Wireshark.

Just from talking with our server folks, we don't believe there's an issue at our end.
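
Before emailing a key log, it can help to check that the file actually contains key material. A small sketch that counts entries per label, assuming the NSS key log layout of a label followed by two hex fields, with `#` starting a comment line; `summarize_key_log` is a hypothetical helper and the sample values are dummies, not real secrets.

```python
from collections import Counter


def summarize_key_log(text):
    """Count key log entries per label; unparsable lines count as '<malformed>'."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 3:
            counts["<malformed>"] += 1
            continue
        label, first, second = parts
        try:
            # Both fields are expected to be hex-encoded.
            bytes.fromhex(first)
            bytes.fromhex(second)
        except ValueError:
            counts["<malformed>"] += 1
            continue
        counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = "# dummy data\nCLIENT_RANDOM " + "ab" * 32 + " " + "cd" * 48 + "\n"
    print(dict(summarize_key_log(sample)))
```

An empty result would suggest the client was started without SSLKEYLOGFILE set, or that no TLS handshake happened during the capture.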
Flags: needinfo?(ryan.sleevi)
(Assignee)

Updated

4 years ago

Comment 36

4 years ago
Duplicate of this bug: 1166159

Comment 37

4 years ago
(In reply to Ryan Sleevi from comment #35)
> Just from talking with our server folks, we don't believe there's an issue
> at our end.

And we may have identified a possibly related issue; it'd be nice to know
1) If the issue still reproduces
2) If the issue reproduces if false start is disabled ( SSL_OptionSet(nss_fd, SSL_ENABLE_FALSE_START, PR_FALSE) )
Flags: needinfo?(clokep)
(Assignee)

Comment 38

4 years ago
(In reply to Ryan Sleevi from comment #37)
> (In reply to Ryan Sleevi from comment #35)
> > Just from talking with our server folks, we don't believe there's an issue
> > at our end.
> 
> And we may have identified a possibly related issue; it'd be nice to know
> 1) If the issue still reproduces

Definitely able to still reproduce it! (I've even seen it a few times after using the workaround in comment #0).

> 2) If the issue reproduces if false start is disabled (
> SSL_OptionSet(nss_fd, SSL_ENABLE_FALSE_START, PR_FALSE) )

I toggled the pref security.ssl.enable_false_start to false (it defaulted to true) and I seemed to get an immediate reconnect on one of my accounts. The other account did not reconnect.

The settings I was able to disconnect/reconnect with for a handful of times in a row (on both my accounts) are:
security.ssl.disable_session_identifiers set to true
security.ssl.enable_false_start set to false
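
For anyone retesting, the two prefs above give four combinations. A tiny sketch that enumerates them so none gets skipped; `PREFS` and `pref_matrix` are names invented for this note, and nothing here talks to the client.

```python
from itertools import product

# The two prefs discussed in this bug, in a fixed order.
PREFS = (
    "security.ssl.disable_session_identifiers",
    "security.ssl.enable_false_start",
)


def pref_matrix():
    """Return every true/false combination of the prefs as a list of dicts."""
    return [dict(zip(PREFS, values)) for values in product((False, True), repeat=len(PREFS))]


if __name__ == "__main__":
    for combo in pref_matrix():
        print(combo)
```

Each combination can be set in about:config, followed by the disconnect/reconnect steps from comment 27.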

Sorry I never got you the Wireshark capture from comment 35; I was changing jobs and didn't have a development computer set up. I can provide this pretty easily now if you need it.
Flags: needinfo?(clokep) → needinfo?(ryan.sleevi)
(Assignee)

Comment 39

4 years ago
Ryan, is there anything I can do to help with this? Have you been able to narrow this down?
(Assignee)

Comment 40

4 years ago
Florian notified me he hasn't been having this issue any longer. I can also no longer reproduce this! I'm assuming there were server side changes made.

Please reopen if you're seeing this issue.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME
(Assignee)

Comment 41

4 years ago
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

Clearing review.
Attachment #8574990 - Flags: review?(Pidgeot18)

Updated

4 years ago
Flags: needinfo?(ryan.sleevi)