Closed Bug 1851294 Opened 1 year ago Closed 6 months ago

Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] if you have a chat account

Categories

(Chat Core :: XMPP, defect, P1)

Thunderbird 118
Unspecified
All

Tracking

(thunderbird_esr115 unaffected, thunderbird118 wontfix, thunderbird122 wontfix, thunderbird127 fixed)

VERIFIED FIXED
127 Branch
Tracking Status
thunderbird_esr115 --- unaffected
thunderbird118 --- wontfix
thunderbird122 --- wontfix
thunderbird127 --- fixed

People

(Reporter: wsmwk, Assigned: valentin)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

#2 crash for Thunderbird 118.0b1

Crash report: https://crash-stats.mozilla.org/report/index/d71470d0-1d0e-4df5-9829-d864f0230901

User comment "quit before completing typing password" after updating

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mOwningThread == PR_GetCurrentThread())

Top 10 frames of crashing thread:

0  XUL  NSSSocketControl::StartTLS  security/manager/ssl/NSSSocketControl.cpp:288
1  XUL  NS_InvokeByIndex  
2  XUL  CallMethodHelper::Invoke  js/xpconnect/src/XPCWrappedNative.cpp:1627
2  XUL  CallMethodHelper::Call  js/xpconnect/src/XPCWrappedNative.cpp:1180
2  XUL  XPCWrappedNative::CallMethod  js/xpconnect/src/XPCWrappedNative.cpp:1126
3  XUL  XPC_WN_CallMethod  js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966
4  XUL  CallJSNative  js/src/vm/Interpreter.cpp:486
4  XUL  js::InternalCallOrConstruct  js/src/vm/Interpreter.cpp:580
5  XUL  InternalCall  js/src/vm/Interpreter.cpp:647
5  XUL  js::CallFromStack  js/src/vm/Interpreter.cpp:652

First crash NSSSocketControl::StartTLS | NS_InvokeByIndex
bp-2d0c8cf3-9560-4c7c-a02d-932d20230814 118.0a1 buildid 20230810105704

First crash NSSSocketControl::StartTLS | XPTC__InvokebyIndex bp-e75bc93f-ba0b-416f-bc33-a30f80230831 118.0b1 buildid 20230828190302

Flags: needinfo?(mkmelin+mozilla)

Unfortunately not much to go on from the stack.

Flags: needinfo?(mkmelin+mozilla)
Severity: S2 → S3

The reason for the crash is clear.

We're running into a failure of a self-test. We have code running on a thread that doesn't own the respected connection object.

This is an inallowed inconsistency, as per strict rules, enforced by the Mozilla core code.

We had such issues in the past, and we thought we had fixed them, but apparently we haven't completely.

Priority: -- → P1

I case you need help in reproducing this, just use the following build of the sieve webextension
https://dev.azure.com/thsmi/1164eec3-b870-4973-8110-5e275d208aac/_apis/build/builds/19977/artifacts?artifactName=XPI%20-%20Thunderbird%20WebExtension&api-version=7.1&%24format=zip

Whenever you connect to a manage sieve server the thunderbird crashes reproducible when start TLS is invoked.
As it crashes before authenticating you can use the host "mx2fea.netcup.net" for testing.

And the crash signature is identical with the one in this ticket

I got to this crash bug by a crash report that happened when trying to connect an XMPP account (after entering its password...). The crash for that case would be at https://searchfox.org/comm-central/rev/8f7c552caee92e39beb2f2d5fb848537c9311f28/chat/modules/socket.sys.mjs#263-267 and is likely due to the socket changes we battled earlier for email, and chat has seemed mostly fine. I guess we should convert the chat socket to a TCPSocket so we have the abstraction that handles all the socket thread stuff for us.
So this would then be caused by bug 1847260

(In reply to Martin Giger [:freaktechnik] from comment #4)

I guess we should convert the chat socket to a TCPSocket so we have the abstraction that handles all the socket thread stuff for us.

I tried to convert our IM socket.sys.mjs to use TCPSocket, however that's currently not directly possible since the socket needs the more complicated https://searchfox.org/mozilla-central/rev/42acdc9cd5ae89222bdceeeaed7bacac755be48f/netwerk/base/nsISocketTransportService.idl#155 to support SRV records in XMPP, where we check the certificate of the domain the SRV record is for, while opening a socket to a different host.

Not sure what exactly our best option forward here is, we'd probably need either a custom native entry point that lets us do this in c-c, or expand the TCPSocket interface to support this use case.

Exciting!
I read bug 787369. Bug 787369 comment 56 says SRV would not be used if the pref is set. And if the pref is set we don't use nsIRoutedSocketTransportService atm.
It seems like a pretty odd usecase, and especially if setting a pref for server/host can work around it, we could potentially just drop that use (which must be very small by any metric.)

#5 crash for 121 beta. Although based on the graph it has a strange crash rate cycle.

See Also: → 1869230

(In reply to Magnus Melin [:mkmelin] from comment #7)
The actual usage numbers must be small as well. There's like 10k xmpp users in total, so my spitball estimate is the users of this special feature are likely under 3 digits.

Duplicate of this bug: 1869230

The oldest buildid I find is 20230810105704 which is 118.0a1 bp-2d0c8cf3-9560-4c7c-a02d-932d20230814

Crash Signature: [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] [@ NSSSocketControl::StartTLS | XPTC__InvokebyIndex] → [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] [@ NSSSocketControl::StartTLS | XPTC__InvokebyIndex] [@ NSSSocketControl::StartTLS | _NS_InvokeByIndex ]
Summary: Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] → Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] in you have a chat account
Summary: Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] in you have a chat account → Crash in [@ NSSSocketControl::StartTLS | NS_InvokeByIndex] if you have a chat account

(In reply to Magnus Melin [:mkmelin] from comment #7)

Exciting!
I read bug 787369. Bug 787369 comment 56 says SRV would not be used if the pref is set. And if the pref is set we don't use nsIRoutedSocketTransportService atm.

Magnus, which pref are you referring to?

It seems like a pretty odd usecase, and especially if setting a pref for server/host can work around it, we could potentially just drop that use (which must be very small by any metric.)

What do you want to drop? Support for XMPP ?

Flags: needinfo?(mkmelin+mozilla)

Are you referring to chat.dns.srv.disable ?

Is it reasonable to disable that pref, when there are servers that might need that pref?

For example, I have configured those prefs for my own xmpp server:

$ dig +short srv _xmpp-server._tcp.kuix.de.
0 5 5269 kuix.de.
$ dig +short srv _xmpp-client._tcp.kuix.de.
0 5 5222 kuix.de.

I also crash with chat.dns.srv.disable set to true.

Right, chat.dns.srv.disable.
Yes, you'd also crash without it, as is. But AIUI, dropping that (and with that, createRoutedTransport usage) we could switch to TCPSocket for chat (instead of homegrown socket.sys.mjs). Martin apparently had a WIP patch for that.

Again, AIUI, what would generally no longer work is autoconfigure XMPP using SRV records where there is mismatch on the certificate (if user specified hostname example.com and SRV says im.example.com, certificate should be checked against example.com)

Flags: needinfo?(mkmelin+mozilla)

Thanks for the explanation.

socket.sys.mjs is code in the chat module, so I'm reassigning the component.

I think this either needs to be fixed in time for the summer 2024 release, in a way that doesn't degrade today's functionality,
or support for XMPP should be disabled by default.

Component: Security → XMPP
Product: Thunderbird → Chat Core

We can't get around that calling tlsSocketControl.StartTLS() isn't going to work anymore.

Somewhat ugly solution: you could have TCPSocket try connecting with StartTLS, and when getting the security error event about wrong name on the certificate, check if it's the one for the user hostname (as per rfc3920), and if so add an exception for it like we do for self signed certs. Then reconnect.

(In reply to Magnus Melin [:mkmelin] from comment #15)

Again, AIUI, what would generally no longer work is autoconfigure XMPP using SRV records where there is mismatch on the certificate (if user specified hostname example.com and SRV says im.example.com, certificate should be checked against example.com)

I'm fairly certain this is a common setup for XMPP, unfortunately. (And why SRV records are generally confusing and poorly supported in many libraries.)

Is there an upstream bug for supporting a separate domain to check the certificate against for TCPSocket?

Depends on: 1889865
Severity: S3 → S1

Thunderbird uses nsITLSSocketControl.startTLS to drive the XMPP
connection, but since we started enforcing proper thread safety in
bug 1847260, it's asserting when called on the main thread.

This patch marks StartTLS and proxyStartTLS as [noscript] and adds
asyncStartTLS. This new method dispatches a runnable to the socket thread
to call startTLS, then posts the result back to the original (main)
thread.

Assignee: nobody → valentin.gosu
Status: NEW → ASSIGNED

Hi folks,

I posted a patch to get around the startTLS crash. It's still untested, but let me know how it works.

https://searchfox.org/comm-central/rev/677e9ab6d1238f8d7006038cba593148fe78f8ff/chat/modules/socket.sys.mjs#263-267

startTLS() {
  this.transport.tlsSocketControl
    .QueryInterface(Ci.nsITLSSocketControl)
    .StartTLS();
},

should become something like this:

startTLS() {
  this.transport.tlsSocketControl
    .QueryInterface(Ci.nsITLSSocketControl)
    .asyncStartTLS((rv) => { console.log(rv) });
},

Otherwise you can turn startTLS into a promise instead.

startTLS() {
  return new Promise(resolve => {
    this.transport.tlsSocketControl
    .QueryInterface(Ci.nsITLSSocketControl)
    .asyncStartTLS((rv) => { resolve(rv) });
  }
},

Same here

Thanks, with your patch plus the WIP I just posted startTLS for XMPP indeed seems to work fine.

(In reply to Martin Giger [:freaktechnik] from comment #23)

Thanks, with your patch plus the WIP I just posted startTLS for XMPP indeed seems to work fine.

Excellent. Thank you for confirming it works.
FYI - I addressed the review comments and now asyncStartTLS returns a promise which should simplify your patch a bit.

Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/78c2695361e6 Add nsITLSSocketControl.asyncStartTLS r=jschanck,necko-reviewers,kershaw
Attachment #9397096 - Attachment description: WIP: Bug 1851294 - Use asyncStartTLS in chat socket. → Bug 1851294 - Use asyncStartTLS in chat socket. r=clokep,#thunderbird-reviewers
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 127 Branch

Thunderbird side crash fix not landed yet.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: 1869230
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/5a83d0473294 Use nsMainThreadPtrHolder<Promise> to avoid assertion r=necko-reviewers,kershaw

Pushed by brendan@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/bc5ccf2686da
Use asyncStartTLS in chat socket. r=clokep,mkmelin

Status: REOPENED → RESOLVED
Closed: 6 months ago6 months ago
Keywords: leave-open
Resolution: --- → FIXED

AFAICT Crashes are totally gone after the 127 checkin.

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: