Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] if you have a chat account
Categories
(Chat Core :: XMPP, defect, P1)
Tracking
(thunderbird_esr115 unaffected, thunderbird118 wontfix, thunderbird122 wontfix, thunderbird127 fixed)
Tracking | Status | |
---|---|---|
thunderbird_esr115 | --- | unaffected |
thunderbird118 | --- | wontfix |
thunderbird122 | --- | wontfix |
thunderbird127 | --- | fixed |
People
(Reporter: wsmwk, Assigned: valentin)
References
(Depends on 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(3 files)
#2 crash for Thunderbird 118.0b1
Crash report: https://crash-stats.mozilla.org/report/index/d71470d0-1d0e-4df5-9829-d864f0230901
User comment "quit before completing typing password" after updating
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mOwningThread == PR_GetCurrentThread())
Top 10 frames of crashing thread:
0 XUL NSSSocketControl::StartTLS security/manager/ssl/NSSSocketControl.cpp:288
1 XUL NS_InvokeByIndex
2 XUL CallMethodHelper::Invoke js/xpconnect/src/XPCWrappedNative.cpp:1627
2 XUL CallMethodHelper::Call js/xpconnect/src/XPCWrappedNative.cpp:1180
2 XUL XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1126
3 XUL XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966
4 XUL CallJSNative js/src/vm/Interpreter.cpp:486
4 XUL js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:580
5 XUL InternalCall js/src/vm/Interpreter.cpp:647
5 XUL js::CallFromStack js/src/vm/Interpreter.cpp:652
First crash NSSSocketControl::StartTLS | NS_InvokeByIndex
bp-2d0c8cf3-9560-4c7c-a02d-932d20230814 118.0a1 buildid 20230810105704
First crash NSSSocketControl::StartTLS | XPTC__InvokebyIndex bp-e75bc93f-ba0b-416f-bc33-a30f80230831 118.0b1 buildid 20230828190302
Comment 1•1 year ago
|
||
Unfortunately not much to go on from the stack.
Reporter | ||
Updated•1 year ago
|
Comment 2•1 year ago
|
||
The reason for the crash is clear.
We're running into a failure of a self-test. We have code running on a thread that doesn't own the respected connection object.
This is an inallowed inconsistency, as per strict rules, enforced by the Mozilla core code.
We had such issues in the past, and we thought we had fixed them, but apparently we haven't completely.
Updated•1 year ago
|
Comment 3•1 year ago
•
|
||
I case you need help in reproducing this, just use the following build of the sieve webextension
https://dev.azure.com/thsmi/1164eec3-b870-4973-8110-5e275d208aac/_apis/build/builds/19977/artifacts?artifactName=XPI%20-%20Thunderbird%20WebExtension&api-version=7.1&%24format=zip
Whenever you connect to a manage sieve server the thunderbird crashes reproducible when start TLS is invoked.
As it crashes before authenticating you can use the host "mx2fea.netcup.net" for testing.
And the crash signature is identical with the one in this ticket
Comment 4•11 months ago
•
|
||
I got to this crash bug by a crash report that happened when trying to connect an XMPP account (after entering its password...). The crash for that case would be at https://searchfox.org/comm-central/rev/8f7c552caee92e39beb2f2d5fb848537c9311f28/chat/modules/socket.sys.mjs#263-267 and is likely due to the socket changes we battled earlier for email, and chat has seemed mostly fine. I guess we should convert the chat socket to a TCPSocket so we have the abstraction that handles all the socket thread stuff for us.
So this would then be caused by bug 1847260
Comment 5•11 months ago
|
||
...and I think the sieve extension might have the same issue: https://github.com/thsmi/sieve/blob/8a1af600e42bff2440e959a268afcab3385418a1/src/wx/api/sieve/SieveSocketApi.js#L238-L253
Comment 6•11 months ago
|
||
(In reply to Martin Giger [:freaktechnik] from comment #4)
I guess we should convert the chat socket to a TCPSocket so we have the abstraction that handles all the socket thread stuff for us.
I tried to convert our IM socket.sys.mjs to use TCPSocket
, however that's currently not directly possible since the socket needs the more complicated https://searchfox.org/mozilla-central/rev/42acdc9cd5ae89222bdceeeaed7bacac755be48f/netwerk/base/nsISocketTransportService.idl#155 to support SRV records in XMPP, where we check the certificate of the domain the SRV record is for, while opening a socket to a different host.
Not sure what exactly our best option forward here is, we'd probably need either a custom native entry point that lets us do this in c-c, or expand the TCPSocket interface to support this use case.
Comment 7•11 months ago
|
||
Exciting!
I read bug 787369. Bug 787369 comment 56 says SRV would not be used if the pref is set. And if the pref is set we don't use nsIRoutedSocketTransportService atm.
It seems like a pretty odd usecase, and especially if setting a pref for server/host can work around it, we could potentially just drop that use (which must be very small by any metric.)
Reporter | ||
Comment 8•10 months ago
|
||
#5 crash for 121 beta. Although based on the graph it has a strange crash rate cycle.
Comment 9•10 months ago
•
|
||
(In reply to Magnus Melin [:mkmelin] from comment #7)
The actual usage numbers must be small as well. There's like 10k xmpp users in total, so my spitball estimate is the users of this special feature are likely under 3 digits.
Reporter | ||
Comment 11•9 months ago
|
||
The oldest buildid I find is 20230810105704 which is 118.0a1 bp-2d0c8cf3-9560-4c7c-a02d-932d20230814
Updated•9 months ago
|
Updated•9 months ago
|
Updated•8 months ago
|
Comment 12•7 months ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #7)
Exciting!
I read bug 787369. Bug 787369 comment 56 says SRV would not be used if the pref is set. And if the pref is set we don't use nsIRoutedSocketTransportService atm.
Magnus, which pref are you referring to?
It seems like a pretty odd usecase, and especially if setting a pref for server/host can work around it, we could potentially just drop that use (which must be very small by any metric.)
What do you want to drop? Support for XMPP ?
Comment 13•7 months ago
|
||
Are you referring to chat.dns.srv.disable ?
Is it reasonable to disable that pref, when there are servers that might need that pref?
For example, I have configured those prefs for my own xmpp server:
$ dig +short srv _xmpp-server._tcp.kuix.de.
0 5 5269 kuix.de.
$ dig +short srv _xmpp-client._tcp.kuix.de.
0 5 5222 kuix.de.
Comment 14•7 months ago
|
||
I also crash with chat.dns.srv.disable set to true.
Comment 15•7 months ago
|
||
Right, chat.dns.srv.disable.
Yes, you'd also crash without it, as is. But AIUI, dropping that (and with that, createRoutedTransport usage) we could switch to TCPSocket for chat (instead of homegrown socket.sys.mjs). Martin apparently had a WIP patch for that.
Again, AIUI, what would generally no longer work is autoconfigure XMPP using SRV records where there is mismatch on the certificate (if user specified hostname example.com and SRV says im.example.com, certificate should be checked against example.com)
Comment 16•7 months ago
|
||
Thanks for the explanation.
socket.sys.mjs is code in the chat module, so I'm reassigning the component.
I think this either needs to be fixed in time for the summer 2024 release, in a way that doesn't degrade today's functionality,
or support for XMPP should be disabled by default.
Comment 17•7 months ago
|
||
We can't get around that calling tlsSocketControl.StartTLS()
isn't going to work anymore.
Somewhat ugly solution: you could have TCPSocket try connecting with StartTLS, and when getting the security error event about wrong name on the certificate, check if it's the one for the user hostname (as per rfc3920), and if so add an exception for it like we do for self signed certs. Then reconnect.
Comment 18•6 months ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #15)
Again, AIUI, what would generally no longer work is autoconfigure XMPP using SRV records where there is mismatch on the certificate (if user specified hostname example.com and SRV says im.example.com, certificate should be checked against example.com)
I'm fairly certain this is a common setup for XMPP, unfortunately. (And why SRV records are generally confusing and poorly supported in many libraries.)
Is there an upstream bug for supporting a separate domain to check the certificate against for TCPSocket
?
Comment 19•6 months ago
|
||
Filed bug 1889865.
Updated•6 months ago
|
Assignee | ||
Comment 20•6 months ago
|
||
Thunderbird uses nsITLSSocketControl.startTLS to drive the XMPP
connection, but since we started enforcing proper thread safety in
bug 1847260, it's asserting when called on the main thread.
This patch marks StartTLS and proxyStartTLS as [noscript] and adds
asyncStartTLS. This new method dispatches a runnable to the socket thread
to call startTLS, then posts the result back to the original (main)
thread.
Updated•6 months ago
|
Assignee | ||
Comment 21•6 months ago
|
||
Hi folks,
I posted a patch to get around the startTLS crash. It's still untested, but let me know how it works.
startTLS() {
this.transport.tlsSocketControl
.QueryInterface(Ci.nsITLSSocketControl)
.StartTLS();
},
should become something like this:
startTLS() {
this.transport.tlsSocketControl
.QueryInterface(Ci.nsITLSSocketControl)
.asyncStartTLS((rv) => { console.log(rv) });
},
Otherwise you can turn startTLS into a promise instead.
startTLS() {
return new Promise(resolve => {
this.transport.tlsSocketControl
.QueryInterface(Ci.nsITLSSocketControl)
.asyncStartTLS((rv) => { resolve(rv) });
}
},
Same here
Comment 22•6 months ago
|
||
Comment 23•6 months ago
|
||
Thanks, with your patch plus the WIP I just posted startTLS for XMPP indeed seems to work fine.
Assignee | ||
Comment 24•6 months ago
|
||
(In reply to Martin Giger [:freaktechnik] from comment #23)
Thanks, with your patch plus the WIP I just posted startTLS for XMPP indeed seems to work fine.
Excellent. Thank you for confirming it works.
FYI - I addressed the review comments and now asyncStartTLS returns a promise which should simplify your patch a bit.
Comment 25•6 months ago
|
||
Updated•6 months ago
|
Comment 26•6 months ago
|
||
bugherder |
Comment 27•6 months ago
|
||
Thunderbird side crash fix not landed yet.
Updated•6 months ago
|
Updated•6 months ago
|
Assignee | ||
Comment 28•6 months ago
|
||
Updated•6 months ago
|
Comment 29•6 months ago
|
||
Updated•6 months ago
|
Comment 30•6 months ago
|
||
bugherder |
Comment 31•6 months ago
|
||
Pushed by brendan@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/bc5ccf2686da
Use asyncStartTLS in chat socket. r=clokep,mkmelin
Updated•6 months ago
|
Reporter | ||
Comment 32•3 months ago
|
||
AFAICT Crashes are totally gone after the 127 checkin.
Description
•