Open Bug 1667047 Opened 4 years ago Updated 5 months ago

Crash in [@ ssl3_SendClientHello | ssl_BeginClientHandshake | ssl_SecureRecv | ssl_Recv | PSMRecv]

Categories

(NSS :: Libraries, defect, P3)

Unspecified
All

Tracking

(Not tracked)

People

(Reporter: gsvelto, Unassigned)

References

Details

(Keywords: crash, csectype-nullptr, Whiteboard: [necko-triaged][necko-monitor])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/3e4be348-0a4b-40dc-bdc2-203a40200923

Top 10 frames of crashing thread:

0 nss3.dll ssl3_SendClientHello security/nss/lib/ssl/ssl3con.c:5151
1 nss3.dll ssl_BeginClientHandshake security/nss/lib/ssl/sslcon.c:189
2 nss3.dll ssl_SecureRecv security/nss/lib/ssl/sslsecur.c:810
3 nss3.dll ssl_Recv security/nss/lib/ssl/sslsock.c:3135
4 xul.dll PSMRecv security/manager/ssl/nsNSSIOLayer.cpp:1232
5 xul.dll mozilla::net::nsSocketTransport::IsAlive netwerk/base/nsSocketTransport2.cpp:2714
6 xul.dll mozilla::net::nsHttpConnection::IsAlive netwerk/protocol/http/nsHttpConnection.cpp:1208
7 xul.dll mozilla::net::nsHttpConnection::CanReuse netwerk/protocol/http/nsHttpConnection.cpp:1146
8 xul.dll mozilla::net::nsHttpConnection::CanDirectlyActivate netwerk/protocol/http/nsHttpConnection.cpp:1171
9 xul.dll mozilla::net::nsHttpConnectionMgr::UpdateCoalescingForNewConn netwerk/protocol/http/nsHttpConnectionMgr.cpp:853

This is a NULL-pointer dereference crash, happening on both Linux and Windows.

Dragana, you reviewed bug 1556491 which has the same signature. Is there any relation?

Flags: needinfo?(dd.mozilla)

Thy are not related. Something very strange is happening here, I will take a look.

Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(dd.mozilla)
Severity: -- → S4
Priority: -- → P2
Whiteboard: [necko-triaged]

Hi Thyla,

most probably the reason we are crashing is in necko, but I need help to understand what conditions can lead to this crash.

Is this the same as the analysis from this comment. The crash in that bug is old so I do not have stack trace to look at.
Can you find someone to look at it?

thanks!

Flags: needinfo?(tvandermerwe)

Hi Dragana,

I'm going to ask Kevin to take a look -- he'll be back next week (and I'll be out for a bit).

Thyla

Flags: needinfo?(tvandermerwe) → needinfo?(kjacobs.bugzilla)

To my eyes, this looks like the same issue as bug 1556491 making the analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=1556491#c4 still valid. The higher stack frames differ, but the conditions look the same.

Dragana, are you sure they're not related?

There is one thing I don't understand from Martin's analysis: SSL_ResetHandshake doesn't reset ss->ssl3.cwSpec->macDef. While it will be overwritten by the next handshake, ISTM we have a problem unless we either

  1. Add a check to tolerate a NULL macDef (and don't reset it in SSL_ResetHandshake) to follow the 1.3 case, OR
  2. Reset it to ssl_mac_null in SSL_ResetHandshake so that's never NULL in ssl3_SendClientHello.
Flags: needinfo?(kjacobs.bugzilla) → needinfo?(dd.mozilla)
Assignee: dd.mozilla → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(dd.mozilla)

Probably should re-triage this

Severity: S4 → --
Priority: P2 → --
Severity: -- → S4
Priority: -- → P3

(In reply to Kevin Jacobs [:kjacobs] from comment #5)

To my eyes, this looks like the same issue as bug 1556491 making the analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=1556491#c4 still valid. The higher stack frames differ, but the conditions look the same.

Dragana, are you sure they're not related?

There is one thing I don't understand from Martin's analysis: SSL_ResetHandshake doesn't reset ss->ssl3.cwSpec->macDef. While it will be overwritten by the next handshake, ISTM we have a problem unless we either

  1. Add a check to tolerate a NULL macDef (and don't reset it in SSL_ResetHandshake) to follow the 1.3 case, OR
  2. Reset it to ssl_mac_null in SSL_ResetHandshake so that's never NULL in ssl3_SendClientHello.

Dana, is this something we can fix in NSS?

Flags: needinfo?(dkeeler)

I imagine so. John?

Flags: needinfo?(dkeeler) → needinfo?(jschanck)

Looks like some relevant changes landed in 105 as part of Bug 1772201. I see 4 instances of this crash signature in 105 (with unusual stacks) and none for 106. Are we sure this is still a problem?

Flags: needinfo?(jschanck)

Maybe you're right. All of the 105 instances have tmmon.dll on the stack. It's possible there's nothing we can do here anymore.

Crash Signature: [@ ssl3_SendClientHello | ssl_BeginClientHandshake | ssl_SecureRecv | ssl_Recv | PSMRecv] → [@ ssl3_SendClientHello | ssl_BeginClientHandshake | ssl_SecureRecv | ssl_Recv | PSMRecv] [@ ssl3_SendClientHello | ssl_BeginClientHandshake | ssl_Do1stHandshake | ssl_SecureRecv | ssl_Recv | PSMRecv]

This is still lurking around across all platforms. Can we please re-triage?

Severity: S4 → --
Priority: P3 → --

Not totally certain on the severity, but marking for review. Please let me know if this is more urgent/severe.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-review]

Only perhaps 1/2 the current instances seem to have tmmon on the stack (tmmon.dll or tmmon64.dll)

The others have near the base of the stack:
mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>, void ()(int, void, void*)>::operator()(int&, void*&, void*&) const
From: toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h:150

For reference, tmmon64.dll is part of Trend Micro. See https://bugs.chromium.org/p/chromium/issues/detail?id=882982
Chrome did block them for a short while in 2019, but no longer: https://source.chromium.org/chromium/chromium/src/+/main:chrome/chrome_elf/third_party_dlls/hardcoded_blocklist.cc

Flags: needinfo?(kershaw)

John, sorry that I am going to ask again: could we try to fix this in NSS?
As per comment#13, this crash is still happening and half of the reports don't have tmmon.dll.

I think bug 1556491 should have already fixed the problem at the necko side, so I am out of ideas here. Could you take a look? Thanks.

Flags: needinfo?(kershaw) → needinfo?(jschanck)

(In reply to christopher.staite from bug 1848013 comment #6)

I have created a server so you can reproduce the crash, however I believe this is a separate issue and should have a separate bug report.

Steps to reproduce crash:
Set HTTP/HTTPS proxy to h2.menlotest.com port 3128
Navigate to https://h2.menlotest.com/websocket-test/
Note that Firefox crashes immediately

Interestingly, if the page is loaded on startup it doesn't crash, only if you navigate to it.

See Also: → 1848013

Following Christopher's instructions, I was able to reproduce the crash in a debug build. I hit an assertion failure in ssl_SetClientHelloSpecVersion.

[Parent 1261615: Socket Thread]: V/nsHttp proxy CONNECT succeeded! endtoendssl=1 onlyconnect=0
[Parent 1261615: Socket Thread]: V/nsHttp TlsHandshaker::InitSSLParams [mOwner=7fe05ad7da00] connectingToProxy=0
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::GetTLSSocketControl trans=7fe052c4d600 socket=7fe059345d90
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::SetupSSL Allow SPDY NPN selection
[Parent 1261615: Socket Thread]: I/nsHttp Http2Session::ALPNCallback sslsocketcontrol=7fe052e3aa00
[Parent 1261615: Socket Thread]: I/nsHttp Http2Session::ALPNCallback version=304
[Parent 1261615: Socket Thread]: V/nsHttp TlsHandshaker::SetupNPNList 7fe05ad7da00 0
[Parent 1261615: Socket Thread]: V/nsHttp InitSSLParams Setting up SPDY Negotiation OK mOwner=7fe05ad7da00
[Parent 1261615: Socket Thread]: V/nsHttp InitSSLParams [rv=0]
[Parent 1261615: Socket Thread]: V/nsHttp OutputStreamTunnel::AsyncWait [this=7fe056f76610]
[Parent 1261615: Socket Thread]: I/nsHttp Http2Session::TransactionHasDataToWrite 7fe052c4cc00 stream=7fe059345c40 ID=0x11
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::ResumeSend [this=7fe0538b4100]
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::ForceSend [this=7fe0538b4100]
[Parent 1261615: Socket Thread]: V/nsHttp resetting transaction's response head
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpResponseHead::Reset
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 7fe05ad7da00 trans->ws rv=0 n=0 socketin=0
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::GetTLSSocketControl trans=7fe052c4d600 socket=7fe059345d90
[Parent 1261615: Socket Thread]: E/nsHttp nsHttpTransaction::OnSocketStatus [this=7fe052c4d600 status=4b000c progress=0]
[Parent 1261615: Socket Thread]: V/nsHttp TlsHandshaker::EnsureNPNComplete [mOwner=7fe05ad7da00] drive TLS handshake
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 7fe05ad7da00 return due to inactive tunnel setup but incomplete NPN state
[Parent 1261615: Socket Thread]: I/nsHttp Http2StreamBase::WriteSegments 0
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 7fe0538b4100 trans->ws rv=0 n=0 socketin=0
[Parent 1261615: Socket Thread]: I/nsHttp Http2Session::WriteSegments 7fe052c4cc00 InternalState 1
[Parent 1261615: Socket Thread]: V/nsHttp nsHttpConnection::ResumeSend [this=7fe0538b4100]
1261615: SSL[1368674848]: sending client-hello
1261615: SSL[1368674848]: using external token
1261615: SSL3[1368674848]: send initial ClientHello handshake
Assertion failure: spec->cipherDef->cipher == cipher_null, at /home/john/repos/mozilla-unified-2/security/nss/lib/ssl/ssl3con.c:5010

The log shows a call to TlsHandshaker::InitSSLParams. Stepping through in a debugger, I see that call has proxyStartSSL set to true, and this ultimately causes SSL_ResetHandshake to be called. This puts us on the path to sending a client hello, but SSL_ResetHandshake does not reset ss->ssl3.cwSpec (spec in the assertion failure).

I'm not sure what's supposed to happen here. Either we shouldn't be calling SSL_ResetHandshake, or SSL_ResetHandshake isn't doing its job.

Dennis, any thoughts?

Flags: needinfo?(jschanck) → needinfo?(djackson)

Hey Anna, it seems this issue is becoming more problematic. Given that Dennis will realistically not be able to take a look in the near term future, can you please try to write a patch based on Comment 17?

Flags: needinfo?(nkulatova)
See Also: → 1869579

FYI
I am working a patch in bug 1848013, which should be able to fix this crash. Not sure if we want to do something in NSS to avoid this crash. If not, we can close this bug until 1848013 is fixed.

We shouldn't be crashing here, even if the caller is making a mistake, so we'll land a patch here eventually. I will drop the priority, however.

Priority: P2 → P3

I agree with John, we should work on the patch to check the behaviour.

For now let's see if we have any more crashes with https://phabricator.services.mozilla.com/D196271 landed :)

Flags: needinfo?(nkulatova)

The fix is in 115.8.0esr
I think the crash volume should start coming down. A check in NSS to avoid crashing wouldn't hurt.

Assignee: nobody → nobody
Component: Networking: HTTP → Libraries
Product: Core → NSS
Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-monitor]

Agree we should be erroring rather than crashing in this circumstance but its good to see the necko patch has largely addressed the crash. It will need some investigation of the handshake reset code to establish the correct behaviour and add the defence in depth in NSS.

Flags: needinfo?(djackson)
You need to log in before you can comment on or make changes to this bug.