Closed Bug 167756 Opened 22 years ago Closed 22 years ago

NSS needs to poll for smartcard/hardware token removal

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Bill.Burns, Assigned: rrelyea)

Details

Attachments

(1 file, 2 obsolete files)

I couldn't find an existing bug, apologies if this is a duplicate. I noticed that once I authenticate using SSL client-auth to a website or mail server I can unplug the smartcard reader or remove the smartcard and STILL get access to these services. I think it weakens the security of Mozilla to be able to do this. I'd like to see a way for NSS to recognize when the token is no longer available and destroy any SSL session cache information reliant on that token.
Assigned the bug to Bob.
Assignee: wtc → relyea
Status: UNCONFIRMED → NEW
Ever confirmed: true
Sigh, NSS does pull the smart card, but sometimes the clients do not generate new handshakes if keep alive is turned on. Bill can you point to sites where this happens so that I can verify that the NSS code is still working correctly? bob
obob.netscape.com (SSL/IMAP or HTTPS): after removing my card I can keep checking for new messages. This presumably applies to nsmail-1 and nsmail-2 as well.
As I recall from the last time I looked into this sort of complaint: When an SSL client goes to do an SSL/TLS handshake (e.g. typically on a new TCP connection), and there is an SSL session (master secret) in the cache, on which client authentication was done (using a smart card) when the session was originally established, SSL/TLS will NOT "restart" (reuse) that old session IF the "token" that was used to do the client authentication was removed and reinserted since the session was established. However, IIRC, SSL only notices that the token was removed/reinserted when a handshake is done. So, if the original SSL/TCP connection is still going continuously since the original client authentication was done, and no new handshakes are done on that connection, that connection will continue to work, even after the client auth token is removed and reinserted. IMAP clients are notorious for establishing a TCP/SSL connection and leaving that same connection up for days at a time. Since no additional handshakes are done while the original connection remains open, removal of the token goes unnoticed. Bob, does your memory about this differ from mine?
Oh, your using SSL/IMAP. This is a client issue. The client opens a single connection to the IMAP server and goes. We never again get a new handshake request. bob
Yes, I believe Nelson is correct (as you can surmise from my previous post). bob
Target 3.7 for now until we decide where we want to fix this. My assumption is we need to periodically timeout the IMAP connection at the client level. bob
Target Milestone: --- → 3.7
Priority: -- → P1
The only places to make the check are in SSL, or have a timeout in the application. We think we can do this in SSL, since the check for Token presense is now has a timer so it won't actually hit the token every time it's called. This means there may be a small delay in detecting the pulled token (order of 10 sec or so). NOTE: that NSS has not control over the data once it's been sent. If the application redisplays data out of the cache, there's not way for NSS to prevent that.
I think we have to check for more than token presence. We also have to check to see if the token has become logged out. If the token "series" number gets bumped on logout, then we should check that because it's so cheap to check. But if the series only gets bumped on login, then we need a different check.
After implementing this, it will be necessary to measure the performance impact. I'd suggest trying to implement this so that the test is done at most once per call to PR_Recv or PR_Send, not once per SSL record that is written.
Login state does not change the series. PK11_IsLoggedIn() does have a 1 second delay over which the last state is cached. We could add a login series type flag, but that would mean logging out, then logging back in would cause SSL sessions to fail (like when we import our keys we logout then log back in). I think we just want to know "am I logged in now?" which the PK11_IsLoggedIn() does. One side effect is that the 'logout out after X minutes' option will timeout on existing connections. I submit this is probably more correct behavior, but it is a change from our past behavior. bob
re comment 10. Agreed. It only make sense to implement this at the SSL layer if it does not adversely affect performance. We should test against some of the more recalcitrant pkcs #11 devices.
About comment 11 : I think that, ideally, we want to ask "Am I logged in now to the same token as the one I used for authentication?" Assuming that the existing series does change every time a token is removed and reinserted, then it is probably sufficient to ask "am I logged in now, and is the slot's series number the same as it was when I authenticated?"
Yes, checking "I am I logged in" and "has my series changes" should be the correct tests for this case. bob
Moved to target milestone 3.8 because the original NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
I tested this with SMTP over SSL using nsmail-1, and it does work as expected when I remove the token. I think this is because Mozilla opens a new TCP connection to the SMTP server for each message. Therefore, there is a restart handshake, and NSS gets an opportunity to check the token presence. This makes me believe that there is already a function written somewhere that performs the check that we can invoke. If that is true, then the only remaining question is when, and how, often to invoke it. It seems to me that the desirable behavior here depends on the application, especially in light of the likely impact on performance this change will have. While the SSL layer is the correct place to do the token presence and login check, I would be in favor of having the check be configurable by the application. Here is what I propose : We would not do the check for each PR_Recv or PR_Send, as I think this would be too expensive. Instead, for each PR_Recv or PR_Send, we would take a timestamp using an inexpensive call provided by NSPR (Wan-Teh, would PR_Now() be adequate ?). We would also save the timestamp of the last checks. Then, using these timestamps, we would only do the token check periodically, with a configurable period. I would suggest 5 or 10 seconds as a default. Applications could override the setting to change the checking period. A period of 0 would mean to check every time, -1 to never do the check (ie. only on handshake). Obviously, whenever a handshake happens, we would reset the timestamps.
I agree the desirable behavior depends on the application. I can imagine under some circumstances people may want an SSL connection with client auth to persist after they pull out their smartcard. PR_IntervalNow() is the right function to get a timestamp if the intervals in question are at most 6 hours. If an interval may be longer than 6 hours, PR_Now() is the only function you can use. PR_Now() is always a system call. PR_IntervalNow() is a system call on some platforms but can be implemented cheaper on other platforms.
I don't know how many keep-alive TCP connections are expected to last more than 6 hours without activity. If we choose to limit the interval to 6 hours by way of API, what will happen if somebody actually leaves a connection idle for longer than 6 hours is that the check will not be performed when it should be. I don't think we can afford the expense of PR_Now() on every PR_Recv / PR_Send if it is a system call. Nevertheless, we need some configurability. In server applications with hardware tokens we definitely don't want to do this check, so the default value for server sockets should be -1 (never check token presence).
Julien, I think the answer is to just use PR_IntervalNow(), regardless of its rollover period. The value returned by PR_IntervalNow() is in units that vary from platform to platform. The value rolls over on all platforms. The amount of time required varies from a few hours to (I think) a few days). On some platforms, the value can roll over in as little as 6 hours. Big deal. In any case, if your algorithm does not recheck the token until (say) 10 seconds have passed since the last check, by determining if (PR_IntervalNow() - lastCheckIntervalTime > PR_SecondsToInterval(10)) (assuming all values are unsigned 32-bit values) then there will be a 10 second window, every rollover period, during which the token will not be checked. That is, if the token was last checked at time t, then the token will not be checked again if the next time sample is in the window t..t+10s, or t+r..t+r+10s or in general t+kr..t+kr_10s, where r is the rollover period, k is any non-negative integer, and 10s is 10 seconds. I think this is likely to be good enough.
Some notes about this patch: ssl_ClientAuthTokenPresent short circuits if no clAuthValid is false. This means severs will should always sort circuit the test. There is a delay of 10 seconds between checks to prevent 'pounding the card' continuously.
I've checked the code into the tip to facilitate review... particularly to see where the ssl_ClientAuthTokenPresent() call is made.
Attached patch Better version of the patch... (obsolete) — Splinter Review
In this patch I've added 2 things: 1) we remember the last removal state, so we still get errors even on retries in the last 10 seconds. 2) set the PORT_ERROR code. This patch incorporates both sets of changes. Things to I'd look for in a review: is the SecurityInfo structure the right place for the lastTime and lastState values. Do I need to worry about copying these values? Is there a better error code that I should set, should we define a new one? This, of course, on top of correctness. bob
Attachment #112413 - Attachment is obsolete: true
Review comments: 0. Indentation is goofy. Did you edit this with an editor with tabstops set to every 4 spaces instead of every 8? 1. If this function is really only for use in ssl3, then its name should begin with ssl3_ (note the 3 and lower case) to be consistent with all other functions in this file, and it should be static with a forward declaration if there are calls to it before it is defined in the file. Otherwise, it should not be in ssl3con.c, and it should be declared in sslimpl.h. 2. The last update time should really be per slot, not per socket. NSS should poll the slot once every 10 seconds, not once every 10 seconds for each and every socket that used it for client auth. Suggestion: add a lastUpdateIntervalTime value to the slot table. Update it in SECMOD_LookupSlot or some function that it calls. Create a new accessor function for it, similar to PK11_GetSlotSeries, e.g. PK11_GetSlotPollTime. 3. There's code in ssl3_SendClientHello that looks remarkably similar to your new function. Perhaps that code should be replaced with a call to your new function. (?) > is the SecurityInfo structure the right place for the lastTime and lastState > values. For lastTime, I'd say no, it belongs in the slot I think. For lastState, Hmmm. I think that info belongs in the SSL SID, because it really applies to all connections sharing the SSL session (the master secret). But there are some issues with that idea. Today, the sid is created once and never modified thereafter (except to be destroyed), so it isn't locked. Maybe it wouldn't need to be locked for this purpose either. > Do I need to worry about copying these values? If the value is in ss->sec or one of its member structs, then the function ssl_CopySecurityInfo should at least assign a value to the new copy. > Is there a better error code that I should set, > should we define a new one? That error seems perfect.
This patch incorporates Nelson's comments. The main one is the per slot time delay verse a per socket time delay. Since IsPresent and IsLoggedIn already have per slot time values, the fix is to simply not do any time delay at the SSL layer.
Attachment #112447 - Attachment is obsolete: true
We should also consider this patch for 3.7.2.
Attachment #114349 - Flags: superreview?(nelsonb)
Attachment #114349 - Flags: review?(jpierre)
Target Milestone: 3.8 → 3.7.2
Comment on attachment 114349 [details] [diff] [review] Incorporate Nelson's comments. This patch looks OK as far as it goes. However, in reviewing it, I found a small problem in PK11_IsLoggedIn (actually in pk11_InDelayPeriod) that will cause it to sometimes poll more often than once per second. I'll add another comment below shortly.
Attachment #114349 - Flags: superreview?(nelsonb) → superreview+
pk11_InDelayPeriod doesn't handle wrap-around of the PRIntervalTime properly. Anytime there's a wrap around, it returns false. But the formula that it uses when there is no wrap around, namely: ((time-lastTime) < delayTime); is correct whether there's a wrap around or not, because time, lastTime, and delayTime are all unsigned ints of the same size. So, the solution is to remove the test for wrap around. e.g. time = PR_IntervalNow(); -return (PRBool) (lastTime) && (time > lastTime) && - ((time-lastTime) < delayTime); +return (PRBool) (lastTime) && ((time-lastTime) < delayTime);
Comment on attachment 114349 [details] [diff] [review] Incorporate Nelson's comments. I've backed out this patch from the NSS 3.7 branch. It is still in the tip. This patch has two issues. 1. It may break code that depends on the current behavior that the removable token only needs to be present for the SSL client-auth handshake. 2. It will close an SSL client-auth connection if the token is automatically logged out after a timeout. If either of these issues is valid, we should consider making the new behavior an SSL socket option that is disabled by default.
OS: MacOS X → All
Hardware: Macintosh → All
The patch is in the NSS TIP (3.8) and NSS_3_7_BRANCH (3.7.2) now. Bob, could you look at the issue that Nelson pointed out in comment 27? Are we sure that pk11_InDelayPeriod will be called at least once every six hours? If not, you can't use PRIntervalTime and will need to use PRTime.
Reguarding both yours and Nelson's comments. The delay is simply to short circuit and expensive lookup when called repeatedly in a short period of time. The typical usage would see several repeated calls in a very short period. Getting a single false negative (it is not in the delay period when it is). Is not a bit problem because going through the more expensive lookup is not an error. Which is why I coded the wrap around case as I did (I knew the wrap case would generate a false negative, but wasn't worried about it). Since Nelson's patch is both simpler and more correct, there is no reason not to include it. The next question is false positives. What happens if we miss the fact that the card is pulled for the delay period. I believe the answer is "not much". We will already miss a card pull for about 10 seconds after the initial check. If we miss the card pull 6 hours later because the code went to sleep for 6 hours then suddenly, at just the right time. With the current code it means you have to go exactly 6 hours without doing any operation on the token (including looking up certificates). Once you start, the code basically gives you 10 seconds, then the normal check kicks in. I would be happy to include code that could improve this if it doesn't increase the cost to do the check (what is the cost of PR_Now() verse PR_IntervalNow()?). bob bob
Marked the bug fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This bug is marked fixed, but comments 27 29 and 30 are not yet addressed. As I understand comment 30, Bob agrees to implement the changes I proposed in comment 27. So I thought Bob would fix that before closing this bug. Is a separate bug report needed for that issue?
Nelson, I needed to mark this bug fixed because NSS 3.7.2 has been released. Please open a separate bug for the remaining issue. Thanks.
Oops, I thought Nelson had checked in the fix suggested in comment 27 . I see wtc did last night in the tip. Nelson is correct my intention was that it should be fixed before closing this bug. bob
I don't think the fix suggested in comment 27 is critical, so I only checked it into the tip. If you think it's important we can check it in on the 3.7 branch for 3.7.3. (We already missed 3.7.3 Beta 1.)
In response to comment 33, I opened bug 197147 to cover comment 27. Wan-Teh has since checked in the fix for that bug on the trunk. I agree that fixing it on the trunk is sufficient.
Attachment #114349 - Flags: review?(jpierre)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: