Assertion failure: sid->cached == in_client_cache, at /builds/worker/workspace/build/src/security/nss/lib/ssl/sslnonce.c:1221
Categories
(NSS :: Libraries, defect, P2)
Tracking
(firefox-esr68- wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix, firefox69 wontfix, firefox70 wontfix, firefox71 wontfix, firefox75 wontfix, firefox76 wontfix, firefox77 fixed)
People
(Reporter: bc, Assigned: jcj)
References
()
Details
(Keywords: assertion)
Attachments
(1 file)
Bughunter found this on Linux/Windows nightlies. I reproduced on Beta/Nightly Linux. Opt Nightly didn't crash fwiw.
s-s just in case.
Beta = NSS 3.42.1
Nightly = NSS 3.43 Beta
Assertion failure: sid->cached == in_client_cache, at /builds/worker/workspace/build/src/security/nss/lib/ssl/sslnonce.c:1221
Redirecting call to abort() to mozalloc_abort
Hit MOZ_CRASH() at /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
Program /home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox (pid = 32209) received signal 11.
Stack:
#01: WasmTrapHandler(int, siginfo*, void*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#02: __restore_rt (sigaction.c:?)
#03: mozalloc_abort (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox)
#04: ??? (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox)
#05: _PR_MapOptionName (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libnspr4.so)
#06: ssl3_SetSIDSessionTicket (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#07: ssl3_FinishHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#08: ssl3_HandlePostHelloHandshakeMessage (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#09: ssl3_HandleHandshakeMessage (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#10: ssl3_HandleNonApplicationData (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#11: ssl3_HandleRecord (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#12: ssl3_GatherCompleteHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#13: SSL_ForceHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#14: nsNSSSocketInfo::DriveHandshake() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#15: mozilla::net::nsHttpConnection::EnsureNPNComplete(nsresult&, unsigned int&) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#16: mozilla::net::nsHttpConnection::OnSocketWritable() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#17: mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#18: non-virtual thunk to mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#19: mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#20: mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#21: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#22: mozilla::net::nsSocketTransportService::Run() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#23: non-virtual thunk to mozilla::net::nsSocketTransportService::Run() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#24: nsThread::ProcessNextEvent(bool, bool*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
Comment 1•6 years ago
|
||
:mt, we weren't sure how to rate this -- is this just a DoS crash, or is the assert protecting some more complex state?
Comment 2•6 years ago
|
||
Let's be cautious. This indicates that there is a logic error that might result in completely different code being hit. There's a good amount of digging required to understand what is going on here. It might be related to Bug 1489945, but no telling yet.
Assignee | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 4•6 years ago
|
||
Following up in email with jcj.
Comment 5•6 years ago
|
||
:mt, do you still think this might be related to bug 1489945?
Assignee | ||
Comment 6•6 years ago
|
||
I appreciate the abundance of caution here.
The assertion failure here is protecting the write lock on the session cache, in the event the cache appears of a different type than expected. Given the timing of when this bug appeared, in nightly, my immediate guess is that the cache might be due to Bug 1399439's addition of in_external_cache
as a type, and maybe this assertion should permit that style of caching -- maybe. I haven't dug into the implications of that.
So down this path we have a server sending us a new session ticket, and us attempting to cache it. We have apparently already gone down this road before, otherwise sid->u.ssl3.lock
would have been unset, so we received a duplicate session ticket. For TLS 1.2, when we construct sid->u.ssl3.lock
we set the cache to be in_client_cache
. However, for TLS 1.3 the cache can be set to in_external_cache
while setting up the session ticket. The site in the bug negotiates TLS 1.2 (or 1.1), and besides which I don't know how a downgrade would happen here, or whether the session ID would remain stable through one. I'm guessing not, so that's probably a false start.
So that leaves us with asking what other paths could change the cache from in_client_cache
to another option, and what they may portend -- whether there was some other source of corruption that we're seeing as the assert.
- The session ticket could have expired.
ssl_LookupSID
will return an expired session token, withinvalid_cache
. This seems quite possible. - The session cache could have been cleared by the user while the connection was in progress (
SSL_ClearSessionCache
). There's a lock on it, but the lock won't stop all the caches from being set as invalid. - Maybe it moved to the external token, but I don't see how that would have happened.
All-in-all, once a session ID has been looked up LookupSID
it's up to the caller to ensure it's valid. It doesn't look like we ever really do that in a thorough way though. So I think this is just a logic error, not worthy of sec-high.
Perhaps this should instead lead to an audit of the correctness of sid
types and mark this down in severity?
Comment 7•6 years ago
|
||
The only thing that makes me hesitate on a downgrade here is that, while this didn't cause a crash and the immediate code just cleanly exits, we're into uncharted (or XKCD 2200) territory. This is some of the most complex code we have, and while JC has taken the first steps, this will take some serious digging to understand the issue properly.
Updated•6 years ago
|
Comment 8•6 years ago
|
||
This is old enough that we need to make some progress. Flagging Thyla for prioritization.
Comment 9•6 years ago
|
||
The NSS team is going to look at this next week.
Comment 10•6 years ago
|
||
Here's a trace log of the issue. The first number in brackets is the [log line number], followed by a [socket descriptor].
... Socket 36 receives a SH with a new SID, 0x7432a5c0 with a session ticket. This is cached after FIN:
[11] 29813: SSL3[173350336]: handle server_hello handshake
[13] SSL3[173350336]: !sid_match against 0x7432aaa0
[14] SSL[173350336]: Uncache: zap=0x7432aaa0 cached=0 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=0 cipherSuite=0
[15] SSL: destroy sid: sid=0x7432aaa0 cached=0
[16] SSL3[173350336]: New sid: sid=0x7432a5c0
[36] 29813: SSL3[173350336]: handle finished handshake
[38] SSL: Cache: sid=0x7432a5c0 cached=0 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=3cd6013e cached=0
Socket 0 and 88 start new connections with this SID (with matching, non-empty session IDs and session tickets):
[368] 29813: SSL[1188368000]: sending client-hello with sid 0x7432a5c0 (Session ID x, Session Ticket y)
[369] 29813: SSL3[1188368000]: send initial ClientHello handshake
...
[379] 29813: SSL[1188167488]: sending client-hello with sid 0x7432a5c0 (Session ID x, Session Ticket y)
[380] 29813: SSL3[1188167488]: send initial ClientHello handshake
Socket 0 receives a SH (Session ID x, Session Ticket <empty>) and uncaches 0x7432a5c0 (sid->cache
becomes invalid_cache
):
[401] 29813: SSL3[1188368000]: handle server_hello handshake
[403] SSL3[1188368000]: !sid_match against 0x7432a5c0
SSL3[1188368000]: sid 0x7c6c1900 SessionID comparison:
SSL: Sent sid->u.ssl3.sessionID: [Len: 32]
00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9 ..p.c0i..m.%....
3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8 ="C.4.*...;./kj.
SSL: Received sidBytes: [Len: 0]
<NULL>
[404] SSL[1188368000]: Uncache: zap=0x7432a5c0 cached=1 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=3cd6013e cipherSuite=57
[405] SSL3[1188368000]: New sid: sid=0x45268720
RFC 5246: "The server may return an empty session_id to indicate that the session will not be cached and therefore cannot be resumed."
NOTE: The lack of a destroy trace indicates that there is no UAF here.
Socket 88 receives SH with a non-empty and matching Session ID and a ticket. After FIN, we ssl3_SetSIDSessionTicket
and hit the assertion:
[422] 29813: SSL3[1188167488]: handle server_hello handshake
[423] 29813: SSL3[1188167488]: sid_match against 0x7432a5c0 (early exit from `ssl3_HandleServerHelloPart2`)
SSL: Sent sid->u.ssl3.sessionID: [Len: 32]
00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9 ..p.c0i..m.%....
3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8 ="C.4.*...;./kj.
SSL: Received sidBytes: [Len: 32]
00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9 ..p.c0i..m.%....
3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8 ="C.4.*...;./kj.
[424] 29813: SSL3[1188167488]: handle session_ticket handshake
[425] 29813: SSL3[1188167488]: receivedNewSessionTicket=True (in ssl3_HandleNewSessionTicket)
[430] 29813: SSL3[1188167488]: handle finished handshake
[431] 29813: SSL3[1188167488]: send change_cipher_spec record
[434] 29813: SSL3[1188167488] Set Current Write Cipher Suite to Pending
[435] 29813: SSL3[1188167488]: send finished handshake
[436] 29813: SSL3[1188167488] SendRecord type: handshake (22) nIn=16
[439] Assertion failure: sid->cached == in_client_cache, at /Users/kjacobs/repos/mozilla-unified/security/nss/lib/ssl/sslnonce.c:1210
So, the simplest fix is to permit invalid_cache
in ssl3_SetSIDSessionTicket
. The SID is already out of the cache (and won't be inserted again) but not yet freed. I'll note that a comment seems to address this possibility, but the code does not.
I also don't see a security issue in this, and it is not caused by bug 1399439. @mt, any thoughts on the issue or possibility of changing the assertion?
Comment 11•6 years ago
|
||
Thanks for the analysis. Let me try to restate a few things, to clarify my understanding, having come to this relatively cold after all this time.
So when we uncache, we remove the item from the cache, but only decrement the refcount. So if there are other people who might be using that SID, they will find their value is invalid_cache
(out from under them, but that is why we have locking...).
So when they attempt to put the value back in cache, they shouldn't have any opinion about the state of the SID.
(Note that the offending assertion is bad anyway because it is reading information guarded by the lock from outside the lock...)
What this code should then care about is what it needs to do in order to ensure that the cached value is updated correctly. The code we have seems fine without the assertion. And remove the one a few lines down too. We might assert that it isn't never_cached
instead.
So in the end, Sockets 0 and 88 share a SID, so they think they have the same ticket (whichever of the two overwrote the value last), but that won't be used in future because Socket 0 threw out the session. Makes me wonder why we bother copying the ticket in at that point. Better perhaps to throw it out. Unless there is code somewhere that relies on the value being set. Might be worth checking, because if any code does read that, they could be using data from a different connection.
I took a brief look and it looks like the worst we do is check some values when we generate socket information. But that is TLS 1.3-related info that should be the same for every TLS <=1.2 connection. It's probably OK then. TLS 1.3 makes a new SID for every connection, so I'm not concerned about TLS 1.3 so much.
Comment 12•6 years ago
|
||
Oh, just to add to this. I can confirm Kevin's analysis:
In a release build, these assertions won't fire, and that is OK.
Comment 13•6 years ago
|
||
So when we uncache, we remove the item from the cache, but only decrement the refcount. So if there are other people who might be using that SID, they will find their value is invalid_cache (out from under them, but that is why we have locking...).
Yes, exactly.
So when they attempt to put the value back in cache, they shouldn't have any opinion about the state of the SID.
Rather, when they attempt to store a session ticket into the sslSessionID, they shouldn't presume it to still be cached. An sslSessionID will only be cached if its state is never_cached
.
In the end, Socket 0 has a new sslSessionID which would have the same sid->u.ssl3.sessionID
via the copy, except !sid_match
led us here. This new sslSessionID has a different session_id and a different ticket, and will eventually be cached after Finished.
Socket 88's (uncached) sslSessionID will get a new ticket, but only this socket and any outstanding shared references will continue to use it.
Comment 15•5 years ago
|
||
This patch relaxes an overzealous assertion for the case where:
- Two sockets start connections with a shared SID.
- One receives an empty session ticket in the SH, and evicts the SID from cache.
- The second socket receives a new session ticket, and attempts to set it in the SID.
We currently assert that the sid is in_client_cache
at 3), but clearly it cannot be. The outstanding reference remains valid despite the eviction.
This also solves a related assertion failure after https://hg.mozilla.org/mozilla-central/rev/c5a8b641d905 where the same scenario occurs, but instead of being in_client_cache
or evicted, the SID is in_external_cache
.
Comment 16•5 years ago
|
||
Comment 17•5 years ago
|
||
[Tracking Requested - why for this release]: Do we need to backport this to NSS 3.44 for 68.9esr given its sec rating?
Assignee | ||
Comment 18•5 years ago
|
||
Actually, this is not even a security bug, just an overzealous assertion. It is not appropriate to be marked as sec-*, and the patch does not need to be considered secret.
Further, there's no reason to backport this, as the assertion only affects non-release builds.
Removing the sec- flag, and I intend to unhide this bug later this week if there's no dissent.
Updated•5 years ago
|
Description
•