Closed Bug 1123801 Opened 9 years ago Closed 9 years ago

Google Talk (GTalk) cannot reconnect due to SSL session issues

Categories: Chat Core :: XMPP (type: defect; priority: Not set; severity: major)

Status: RESOLVED WORKSFORME
Tracking Status: thunderbird38 - unaffected

People: Reporter: clokep, Assigned: clokep

Details: Keywords: regression, Whiteboard: [1.6-blocking]

Attachments: 3 files

Attached file Debug log
If a Google Talk account disconnects, it fails to reconnect: the server closes the connection with a net reset, and no data is sent from the server to the client.

I've attached a debug log.

Florian found that disabling SSL sessions seems to fix the issue (similar to bug 954724, although the ability to disable SSL sessions didn't exist until Mozilla 35). Sessions can be disabled by setting security.ssl.disable_session_identifiers to true.
Keywords: regression
Whiteboard: [1.6-blocking]
For STR (steps to reproduce), switching your status to Offline and then back to Available is enough.
Thanks for posting on this bug.  I will be using an alternative until this is resolved.
I wonder if there is anything obvious we should be looking at here? Or is there a way to force a fresh SSL session? (The socket code is http://mxr.mozilla.org/comm-central/source/chat/modules/socket.jsm)
Flags: needinfo?(dkeeler)
Some NSPR logs that might help are nsHttp, pipnss, and certverifier (see e.g. https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging ). You might also use Wireshark or a similar tool to capture packets and see what network traffic is actually occurring. You could just use security.ssl.disable_session_identifiers, but that's just a workaround; it would be better to find out what's going wrong.
Flags: needinfo?(dkeeler)
Attached file NSS log
I created an NSS log with:
set NSPR_LOG_MODULES=timestamp,nsHttp:5,pipnss:5,certverifier:5
set NSPR_LOG_FILE=%TEMP%\nss-log.txt
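
On Linux or macOS, the same logging setup can be sketched in a POSIX shell. The module list is copied from the commands above; the log path and the `instantbird` binary name are assumptions for illustration and are not confirmed anywhere in this bug.

```shell
# POSIX-shell equivalent of the Windows `set` commands above.
# The module list is copied verbatim from the comment.
export NSPR_LOG_MODULES="timestamp,nsHttp:5,pipnss:5,certverifier:5"

# Assumption: any writable path works; this location is only an example.
export NSPR_LOG_FILE="${TMPDIR:-/tmp}/nss-log.txt"

# Start the client from this same shell so it inherits both variables.
# "instantbird" is an assumed binary name; adjust it for your install.
#   instantbird
```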

After which I started Instantbird, connected to GTalk (which is successful), immediately disconnected and then attempted to reconnect again, which successfully reproduced the bug. Unfortunately, this log looks like gibberish to me. :)
Flags: needinfo?(dkeeler)
I don't see anything obviously wrong in that log. Maybe a packet trace would help?
Flags: needinfo?(dkeeler)
I meant to post this other observation, just in case it provides some clarity:

During a session, if the client disconnects, it would not reconnect unless I restart Instantbird, and even though it is a gamble that it would connect by that method, it seems to have a better chance as opposed to not restarting the client.
(In reply to deOmega from comment #8)
> I meant to post this other observation, just in case it provides some
> clarity:
> 
> During a session, if the client disconnects, it would not reconnect unless I
> restart Instantbird, and even though it is a gamble that it would connect
> by that method, it seems to have a better chance as opposed to not
> restarting the client.

Right, we know restarting Instantbird makes reconnecting possible. If you want to work around the problem, you can go to about:config (type "/about config" in a conversation), create the boolean preference security.ssl.disable_session_identifiers and set it to true.
(In reply to Florian Quèze [:florian] [:flo] from comment #9)
> (In reply to deOmega from comment #8)
> > I meant to post this other observation, just in case it provides some
> > clarity:
> > 
> > During a session, if the client disconnects, it would not reconnect unless I
> > restart Instantbird, and even though it is a gamble that it would connect
> > by that method, it seems to have a better chance as opposed to not
> > restarting the client.
> 
> Right, we know restarting Instantbird makes reconnecting possible. If you
> want to work around the problem, you can go to about:config (type "/about
> config" in a conversation), create the boolean preference
> security.ssl.disable_session_identifiers and set it to true.

This would require users to update to a version above 1.5, correct?
Hmm, possibly, yes. I was just mentioning this for you, I wasn't thinking about "users" in general.
I understood that you were...  thank you.
OK, that is the same string I tried unsuccessfully while I was on IRC recently.

Using that, I actually cannot establish even a temporary connection unless I remove the string AND revert to the 1.5 version of Instantbird.

String input:
http://oi62.tinypic.com/fcnes.jpg

error image example:

http://oi61.tinypic.com/35jjzp0.jpg
David, I have a few packet captures:
1. The initial connection
2. A reconnection with disable_session_identifiers set to true (this seems to fail to reconnect once, then actually reconnects on a second try)
3. A reconnection with disable_session_identifiers set to false (which never actually reconnects)

I'd rather not post packet captures in a public forum like this though, would I be able to email them to you? Thanks.
Flags: needinfo?(dkeeler)
Hi Patrick,

Emailing them to me works. I believe you can also add attachments as private, so only members of the core-security group will be able to view them.
Flags: needinfo?(dkeeler)
I went ahead and sent David the captures in an email.
I was finally able to get the string to work and as of now, both accounts are connected.  Hopefully this holds, but again, this is the first time I am able to establish a connection while using the string.


Thanks for all the guidance!
I'm not sure what else to do here besides set the pref by default...
Assignee: nobody → patrick+mozilla-bugzilla
Status: NEW → ASSIGNED
Attachment #8574990 - Flags: review?(florian)
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

This is fine for Instantbird. For Thunderbird, I think someone else should have a look too. I don't know if there could be unexpected consequences.
Attachment #8574990 - Flags: review?(florian) → review+
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

Review of attachment 8574990 [details] [diff] [review]:
-----------------------------------------------------------------

Joshua, not sure if you're the right person to look at this? (Maybe rkent?)
Attachment #8574990 - Flags: review?(Pidgeot18)
So one issue that definitely needs to be addressed is the question of what the unintended consequences of disabling session identifiers are, since this is a big hammer that hits everybody. I don't know what this TLS feature is, nor do I know why Google Talk's servers seem to be broken with it enabled. If it's a problem with Google Talk, then quite possibly it's best fixed by having Google fix their servers.

It sounds like David already has a packet trace and could provide some answer as to the question of what the impact of this change would be.
Flags: needinfo?(dkeeler)
Changing that pref disables TLS session tickets and the session cache, meaning that session resumption can't happen. Session resumption is a performance optimization. Without it, each time a client wants to connect to a server, they have to negotiate a full handshake (which can involve many more TCP round-trips). If connections are long-lived, this probably won't be noticeable. However, if the client makes many short connections (and closes them each time), the user could notice a slowdown.
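
One way to observe whether a server actually resumes a session, independent of Instantbird, is sketched below with `openssl s_client`: save the session from a first connection with `-sess_out`, offer it on a second connection with `-sess_in`, and check whether the handshake summary line starts with `Reused,` or `New,`. The helper names are hypothetical, and `-tls1_2` is an assumption made only to keep the output predictable; this is a diagnostic sketch, not what the client itself does.

```shell
#!/bin/sh
# classify_handshake: read `openssl s_client` output on stdin and print
# "resumed", "full", or "unknown" based on the handshake summary line.
classify_handshake() {
  awk '
    /^Reused, / { kind = "resumed" }
    /^New, /    { kind = "full" }
    END { if (kind == "") kind = "unknown"; print kind }
  '
}

# probe_resumption HOST:PORT [extra s_client args...]
# Hypothetical helper: connect twice and report the second handshake's kind.
probe_resumption() {
  target=$1
  shift
  sess=$(mktemp) || return 1
  # First connection: full handshake; save the negotiated session.
  openssl s_client -connect "$target" -tls1_2 "$@" \
    -sess_out "$sess" </dev/null >/dev/null 2>&1
  # Second connection: offer the saved session and classify the result.
  openssl s_client -connect "$target" -tls1_2 "$@" \
    -sess_in "$sess" </dev/null 2>/dev/null | classify_handshake
  rm -f "$sess"
}
```

For example, `probe_resumption xmpp.example.org:5222 -starttls xmpp` (a placeholder host) would print `resumed` if the server accepted the saved session, `full` if it fell back to a full handshake, and `unknown` if the second connection failed before a summary line was printed.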
Flags: needinfo?(dkeeler)
Ryan, Karl: can you comment on potential issues on Google's end from comment 21?
Flags: needinfo?(ryan.sleevi)
Flags: needinfo?(kdubost)
I pinged people at Google about the issue.
Flags: needinfo?(kdubost)
I'm not sure what the question is for Google.
Flags: needinfo?(ryan.sleevi)
Patrick, 

could you provide here for the record what would need to be changed on Google servers?
(or at least a summary of the technical issue, and a step by step to reproduce it, so Google engineers can work out eventually a solution).

Thanks.
Flags: needinfo?(clokep)
(In reply to Karl Dubost :karlcow from comment #26)
> could you provide here for the record what would need to be changed on
> Google servers?

I don't think I can offer any useful information here: I don't have any idea how Google has their servers configured (nor do I know much about setting up SSL in general). We've had other issues with the SSL used on the Google Talk servers, however; see bug 1092701.

> (or at least a summary of the technical issue, and a step by step to
> reproduce it, so Google engineers can work out eventually a solution).

STR are (using Instantbird or Thunderbird):
1. Create a Google Talk account.
2. Connect the Google Talk account. (This should succeed, download contacts, etc.)
3. Disconnect the Google Talk account.
4. Attempt to reconnect the Google Talk account; it gets stuck in "Initializing stream..." (i.e. the connection never succeeds).

I've attached NSS logs and our debug logs which might shed some further light on this.

Please let me know if I can provide any other information (or more detailed steps to reproduce) or if someone at Google would like to work directly with me. Thanks for the help so far!
Flags: needinfo?(clokep)
Karl, is there any further information you need from me? Have we had anything from Google engineers? Thanks!
Flags: needinfo?(kdubost)
Patrick. 
No thanks. That was useful.
We rarely get answers from Google Engineers. 
We notify them about the issue.
Best scenario: someone acks and works on it.
Likely scenario: someone acks and gives us the issue number at Google.
Most likely scenario: blackhole.
Flags: needinfo?(kdubost)
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

I don't claim to have investigated disable_session_identifiers, but it seems to me to be very risky to change a setting like this globally when only a particular account type is affected. Wouldn't it make sense to set this only when someone has a Google Talk account configured?
I haven't seen any argument why this should block, so removing tb38 blocking.
jcranmer: Not taking this patch would be rather painful for gtalk users, so a review would be appreciated.
I've talked with my colleagues, and none of us can find any evidence to support this as an issue.

That is, we can't find anything to suggest that it's not a bug in your code.

Considering other clients don't have this issue, it may help for you to debug further, or to capture the traffic with Wireshark. The attached logs are insufficient for any useful diagnosis.
Ryan, I've sent a variety of WireShark captures to David Keeler, but he was unable to find anything wrong with them. I'd be happy to provide them to you as well. (I didn't attach them to this bug since they contain IP addresses, etc.)
Flags: needinfo?(ryan.sleevi)
A Wireshark capture together with an SSLKEYLOGFILE, sent by email, is ideal ( https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Key_Log_Format ), but of course that will include all of the decrypted contents (e.g. usernames and passwords, not just IP addresses). If you have a testing account you can work with, great.

Alternatively, you could take a stab at debugging it yourself, since the above would help you show the full TLS handshake state machine in Wireshark.
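
A minimal sketch of that setup in a POSIX shell, assuming the client's NSS build honors `SSLKEYLOGFILE`. The file paths, the `instantbird` binary name, and the port in the commented capture command are assumptions. Use a throwaway test account, because the key log lets anyone who holds the capture decrypt the session.

```shell
# Assumption: NSS writes the TLS session secrets to the file named here.
export SSLKEYLOGFILE="${TMPDIR:-/tmp}/tls-keys.log"

# Capture traffic in a second terminal while reproducing the problem.
# 5222 is the standard XMPP client port; adjust it if the account uses
# another one. `-i any` is Linux-specific.
#   tcpdump -i any -w /tmp/gtalk-reconnect.pcap port 5222

# Start the client from this shell so it inherits the variable.
# "instantbird" is an assumed binary name.
#   instantbird
```

In Wireshark, pointing the "(Pre)-Master-Secret log filename" preference of the TLS protocol (named SSL in older versions) at that file should decrypt the handshake and the application data.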

Just from talking with our server folks, we don't believe there's an issue at our end.
Flags: needinfo?(ryan.sleevi)
(In reply to Ryan Sleevi from comment #35)
> Just from talking with our server folks, we don't believe there's an issue
> at our end.

And we may have identified a possibly related issue; it'd be nice to know
1) If the issue still reproduces
2) If the issue reproduces if false start is disabled ( SSL_OptionSet(nss_fd, SSL_ENABLE_FALSE_START, PR_FALSE) )
Flags: needinfo?(clokep)
(In reply to Ryan Sleevi from comment #37)
> (In reply to Ryan Sleevi from comment #35)
> > Just from talking with our server folks, we don't believe there's an issue
> > at our end.
> 
> And we may have identified a possibly related issue; it'd be nice to know
> 1) If the issue still reproduces

Definitely able to still reproduce it! (I've even seen it a few times after using the workaround in comment #0).

> 2) If the issue reproduces if false start is disabled (
> SSL_OptionSet(nss_fd, SSL_ENABLE_FALSE_START, PR_FALSE) )

I toggled the pref security.ssl.enable_false_start to false (it defaulted to true) and I seemed to get an immediate reconnect on one of my accounts. The other account did not reconnect.

The settings I was able to disconnect/reconnect with for a handful of times in a row (on both my accounts) are:
security.ssl.disable_session_identifiers set to true
security.ssl.enable_false_start set to false
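
For reference, the same two settings can be sketched as `user.js` entries instead of being toggled in about:config. This assumes the client reads `user.js` from its profile directory at startup, as Mozilla-based applications generally do; `PROFILE_DIR` and the `set_bool_pref` helper are hypothetical.

```shell
# set_bool_pref FILE NAME VALUE: append one user_pref line to FILE.
set_bool_pref() {
  printf 'user_pref("%s", %s);\n' "$2" "$3" >> "$1"
}

# Hypothetical placeholder; point this at the real profile directory.
PROFILE_DIR="${PROFILE_DIR:-./profile}"
mkdir -p "$PROFILE_DIR"

# The combination reported above to reconnect a handful of times in a row.
set_bool_pref "$PROFILE_DIR/user.js" security.ssl.disable_session_identifiers true
set_bool_pref "$PROFILE_DIR/user.js" security.ssl.enable_false_start false
```

Running it twice appends duplicate lines, which is harmless but untidy. Both settings remain workarounds rather than a fix.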

Sorry I never got you the Wireshark session from comment 35; I was changing jobs and didn't have a development computer set up. I can provide this pretty easily now if you need it.
Flags: needinfo?(clokep) → needinfo?(ryan.sleevi)
Ryan, is there anything I can do to help with this? Have you been able to narrow this down?
Florian notified me he hasn't been having this issue any longer. I can also no longer reproduce this! I'm assuming there were server side changes made.

Please reopen if you're seeing this issue.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Comment on attachment 8574990 [details] [diff] [review]
Disable session IDs

Clearing review.
Attachment #8574990 - Flags: review?(Pidgeot18)
Flags: needinfo?(ryan.sleevi)