Closed Bug 1531906 Opened 2 years ago Closed 4 months ago

Assertion failure: sid->cached == in_client_cache, at /builds/worker/workspace/build/src/security/nss/lib/ssl/sslnonce.c:1221

Categories

(NSS :: Libraries, defect, P2)

3.42
defect

Tracking

(firefox-esr68- wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix, firefox69 wontfix, firefox70 wontfix, firefox71 wontfix, firefox75 wontfix, firefox76 wontfix, firefox77 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr68 - wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- fixed

People

(Reporter: bc, Assigned: jcj)

References

()

Details

(Keywords: assertion)

Attachments

(1 file)

Bughunter found this on Linux/Windows nightlies. I reproduced on Beta/Nightly Linux. Opt Nightly didn't crash fwiw.

s-s just in case.

Beta = NSS 3.42.1
Nightly = NSS 3.43 Beta

  1. https://titantv.com/Default.aspx?r=t
Assertion failure: sid->cached == in_client_cache, at /builds/worker/workspace/build/src/security/nss/lib/ssl/sslnonce.c:1221
Redirecting call to abort() to mozalloc_abort

Hit MOZ_CRASH() at /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33

Program /home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox (pid = 32209) received signal 11.
Stack:
#01: WasmTrapHandler(int, siginfo*, void*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#02: __restore_rt (sigaction.c:?)
#03: mozalloc_abort (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox)
#04: ??? (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/firefox)
#05: _PR_MapOptionName (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libnspr4.so)
#06: ssl3_SetSIDSessionTicket (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#07: ssl3_FinishHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#08: ssl3_HandlePostHelloHandshakeMessage (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#09: ssl3_HandleHandshakeMessage (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#10: ssl3_HandleNonApplicationData (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#11: ssl3_HandleRecord (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#12: ssl3_GatherCompleteHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#13: SSL_ForceHandshake (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libssl3.so)
#14: nsNSSSocketInfo::DriveHandshake() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#15: mozilla::net::nsHttpConnection::EnsureNPNComplete(nsresult&, unsigned int&) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#16: mozilla::net::nsHttpConnection::OnSocketWritable() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#17: mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#18: non-virtual thunk to mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#19: mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#20: mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#21: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#22: mozilla::net::nsSocketTransportService::Run() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#23: non-virtual thunk to mozilla::net::nsSocketTransportService::Run() (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#24: nsThread::ProcessNextEvent(bool, bool*) (/home/bclary/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)

:mt, we weren't sure how to rate this -- is this just a DoS crash, or is the assert protecting some more complex state?

Flags: needinfo?(martin.thomson)

Let's be cautious. This indicates that there is a logic error that might result in completely different code being hit. There's a good amount of digging required to understand what is going on here. It might be related to Bug 1489945, but no telling yet.

Flags: needinfo?(martin.thomson)

calling "sec-high" based on "Let's be cautious"

Keywords: sec-high
Assignee: nobody → jjones

Following up in email with jcj.

:mt, do you still think this might be related to bug 1489945?

Flags: needinfo?(mt)

I appreciate the abundance of caution here.

The assertion failure here is protecting the write lock on the session cache, in the event the cache appears of a different type than expected. Given the timing of when this bug appeared, in nightly, my immediate guess is that the cache might be due to Bug 1399439's addition of in_external_cache as a type, and maybe this assertion should permit that style of caching -- maybe. I haven't dug into the implications of that.

So down this path we have a server sending us a new session ticket, and us attempting to cache it. We have apparently already gone down this road before, otherwise sid->u.ssl3.lock would have been unset, so we received a duplicate session ticket. For TLS 1.2, when we construct sid->u.ssl3.lock we set the cache to be in_client_cache. However, for TLS 1.3 the cache can be set to in_external_cache while setting up the session ticket. The site in the bug negotiates TLS 1.2 (or 1.1), and besides which I don't know how a downgrade would happen here, or whether the session ID would remain stable through one. I'm guessing not, so that's probably a false start.

So that leaves us with asking what other paths could change the cache from in_client_cache to another option, and what they may portend -- whether there was some other source of corruption that we're seeing as the assert.

  1. The session ticket could have expired. ssl_LookupSID will return an expired session token, with invalid_cache. This seems quite possible.
  2. The session cache could have been cleared by the user while the connection was in progress (SSL_ClearSessionCache). There's a lock on it, but the lock won't stop all the caches from being set as invalid.
  3. Maybe it moved to the external token, but I don't see how that would have happened.

All-in-all, once a session ID has been looked up LookupSID it's up to the caller to ensure it's valid. It doesn't look like we ever really do that in a thorough way though. So I think this is just a logic error, not worthy of sec-high.

Perhaps this should instead lead to an audit of the correctness of sid types and mark this down in severity?

The only thing that makes me hesitate on a downgrade here is that, while this didn't cause a crash and the immediate code just cleanly exits, we're into uncharted (or XKCD 2200) territory. This is some of the most complex code we have, and while JC has taken the first steps, this will take some serious digging to understand the issue properly.

Flags: needinfo?(mt)

This is old enough that we need to make some progress. Flagging Thyla for prioritization.

Flags: needinfo?(tvandermerwe)

The NSS team is going to look at this next week.

Flags: needinfo?(tvandermerwe)

Here's a trace log of the issue. The first number in brackets is the [log line number], followed by a [socket descriptor].

... Socket 36 receives a SH with a new SID, 0x7432a5c0 with a session ticket. This is cached after FIN:

[11] 29813: SSL3[173350336]: handle server_hello handshake
[13] SSL3[173350336]: !sid_match against 0x7432aaa0
[14] SSL[173350336]: Uncache: zap=0x7432aaa0 cached=0 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=0 cipherSuite=0
[15] SSL: destroy sid: sid=0x7432aaa0 cached=0
[16] SSL3[173350336]: New sid: sid=0x7432a5c0
[36] 29813: SSL3[173350336]: handle finished handshake
[38] SSL: Cache: sid=0x7432a5c0 cached=0 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=3cd6013e cached=0

Socket 0 and 88 start new connections with this SID (with matching, non-empty session IDs and session tickets):

[368] 29813: SSL[1188368000]: sending client-hello with sid 0x7432a5c0 (Session ID x, Session Ticket y)
[369] 29813: SSL3[1188368000]: send initial ClientHello handshake
...
[379] 29813: SSL[1188167488]: sending client-hello with sid 0x7432a5c0 (Session ID x, Session Ticket y)
[380] 29813: SSL3[1188167488]: send initial ClientHello handshake

Socket 0 receives a SH (Session ID x, Session Ticket <empty>) and uncaches 0x7432a5c0 (sid->cache becomes invalid_cache):

[401] 29813: SSL3[1188368000]: handle server_hello handshake
[403] SSL3[1188368000]: !sid_match against 0x7432a5c0
SSL3[1188368000]: sid 0x7c6c1900 SessionID comparison:
SSL: Sent sid->u.ssl3.sessionID: [Len: 32]
   00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9   ..p.c0i..m.%....
   3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8   ="C.4.*...;./kj.
SSL: Received sidBytes: [Len: 0]
   <NULL>
[404] SSL[1188368000]: Uncache: zap=0x7432a5c0 cached=1 addr=0x0000000000000000ffff0000c9db2b42 port=0xbb01 time=3cd6013e cipherSuite=57
[405] SSL3[1188368000]: New sid: sid=0x45268720

RFC 5246: "The server may return an empty session_id to indicate that the session will not be cached and therefore cannot be resumed."

NOTE: The lack of a destroy trace indicates that there is no UAF here.

Socket 88 receives SH with a non-empty and matching Session ID and a ticket. After FIN, we ssl3_SetSIDSessionTicket and hit the assertion:

[422] 29813: SSL3[1188167488]: handle server_hello handshake
[423] 29813: SSL3[1188167488]: sid_match against 0x7432a5c0 (early exit from `ssl3_HandleServerHelloPart2`)
SSL: Sent sid->u.ssl3.sessionID: [Len: 32]
   00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9   ..p.c0i..m.%....
   3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8   ="C.4.*...;./kj.
SSL: Received sidBytes: [Len: 32]
   00 a6 70 9d 63 30 69 b7 fd 6d f8 25 ff 83 b3 a9   ..p.c0i..m.%....
   3d 22 43 d2 34 a5 2a 11 c4 11 3b e0 2f 6b 6a e8   ="C.4.*...;./kj.
[424] 29813: SSL3[1188167488]: handle session_ticket handshake
[425] 29813: SSL3[1188167488]: receivedNewSessionTicket=True (in ssl3_HandleNewSessionTicket)
[430] 29813: SSL3[1188167488]: handle finished handshake
[431] 29813: SSL3[1188167488]: send change_cipher_spec record
[434] 29813: SSL3[1188167488] Set Current Write Cipher Suite to Pending
[435] 29813: SSL3[1188167488]: send finished handshake
[436] 29813: SSL3[1188167488] SendRecord type: handshake  (22) nIn=16
[439] Assertion failure: sid->cached == in_client_cache, at /Users/kjacobs/repos/mozilla-unified/security/nss/lib/ssl/sslnonce.c:1210

So, the simplest fix is to permit invalid_cache in ssl3_SetSIDSessionTicket. The SID is already out of the cache (and won't be inserted again) but not yet freed. I'll note that a comment seems to address this possibility, but the code does not.

I also don't see a security issue in this, and it is not caused by bug 1399439. @mt, any thoughts on the issue or possibility of changing the assertion?

Flags: needinfo?(mt)

Thanks for the analysis. Let me try to restate a few things, to clarify my understanding, having come to this relatively cold after all this time.

So when we uncache, we remove the item from the cache, but only decrement the refcount. So if there are other people who might be using that SID, they will find their value is invalid_cache (out from under them, but that is why we have locking...).

So when they attempt to put the value back in cache, they shouldn't have any opinion about the state of the SID.

(Note that the offending assertion is bad anyway because it is reading information guarded by the lock from outside the lock...)

What this code should then care about is what it needs to do in order to ensure that the cached value is updated correctly. The code we have seems fine without the assertion. And remove the one a few lines down too. We might assert that it isn't never_cached instead.

So in the end, Sockets 0 and 88 share a SID, so they think they have the same ticket (whichever of the two overwrote the value last), but that won't be used in future because Socket 0 threw out the session. Makes me wonder why we bother copying the ticket in at that point. Better perhaps to throw it out. Unless there is code somewhere that relies on the value being set. Might be worth checking, because if any code does read that, they could be using data from a different connection.

I took a brief look and it looks like the worst we do is check some values when we generate socket information. But that is TLS 1.3-related info that should be the same for every TLS <=1.2 connection. It's probably OK then. TLS 1.3 makes a new SID for every connection, so I'm not concerned about TLS 1.3 so much.

Flags: needinfo?(mt)

Oh, just to add to this. I can confirm Kevin's analysis:

In a release build, these assertions won't fire, and that is OK.

So when we uncache, we remove the item from the cache, but only decrement the refcount. So if there are other people who might be using that SID, they will find their value is invalid_cache (out from under them, but that is why we have locking...).

Yes, exactly.

So when they attempt to put the value back in cache, they shouldn't have any opinion about the state of the SID.

Rather, when they attempt to store a session ticket into the sslSessionID, they shouldn't presume it to still be cached. An sslSessionID will only be cached if its state is never_cached.

In the end, Socket 0 has a new sslSessionID which would have the same sid->u.ssl3.sessionID via the copy, except !sid_match led us here. This new sslSessionID has a different session_id and a different ticket, and will eventually be cached after Finished.

Socket 88's (uncached) sslSessionID will get a new ticket, but only this socket and any outstanding shared references will continue to use it.

Duplicate of this bug: 1630864

This patch relaxes an overzealous assertion for the case where:

  1. Two sockets start connections with a shared SID.
  2. One receives an empty session ticket in the SH, and evicts the SID from cache.
  3. The second socket receives a new session ticket, and attempts to set it in the SID.

We currently assert that the sid is in_client_cache at 3), but clearly it cannot be. The outstanding reference remains valid despite the eviction.

This also solves a related assertion failure after https://hg.mozilla.org/mozilla-central/rev/c5a8b641d905 where the same scenario occurs, but instead of being in_client_cache or evicted, the SID is in_external_cache.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 3.52

[Tracking Requested - why for this release]: Do we need to backport this to NSS 3.44 for 68.9esr given its sec rating?

Actually, this is not even a security bug, just an overzealous assertion. It is not appropriate to be marked as sec-*, and the patch does not need to be considered secret.

Further, there's no reason to backport this, as the assertion only affects non-release builds.

Removing the sec- flag, and I intend to unhide this bug later this week if there's no dissent.

Flags: needinfo?(jjones)
Keywords: sec-high

Not a security bug.

Group: crypto-core-security
You need to log in before you can comment on or make changes to this bug.