Closed Bug 167756 Opened 22 years ago Closed 21 years ago

NSS needs to poll for smartcard/hardware token removal

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Bill.Burns, Assigned: rrelyea)

Details

Attachments

(1 file, 2 obsolete files)

I couldn't find an existing bug, apologies if this is a duplicate.

I noticed that once I authenticate using SSL client-auth to a website or mail
server I can unplug the smartcard reader or remove the smartcard and STILL get
access to these services.

I think it weakens the security of Mozilla to be able to do this.  I'd like to
see a way for NSS to recognize when the token is no longer available and destroy
any SSL session cache information reliant on that token.
Assigned the bug to Bob.
Assignee: wtc → relyea
Status: UNCONFIRMED → NEW
Ever confirmed: true
Sigh, NSS does pull the smart card, but sometimes the clients do not generate
new handshakes if keep alive is turned on.

Bill can you point to sites where this happens so that I can verify that the NSS
code is still working correctly?

bob
obob.netscape.com (SSL/IMAP or HTTPS):  after removing my card I can keep
checking for new messages.

This presumably applies to nsmail-1 and nsmail-2 as well.
As I recall from the last time I looked into this sort of complaint:

When an SSL client goes to do an SSL/TLS handshake (e.g. typically on a 
new TCP connection), and there is an SSL session (master secret) in the
cache, on which client authentication was done (using a smart card) when 
the session was originally established, SSL/TLS will NOT "restart" (reuse)
that old session IF the "token" that was used to do the client authentication
was removed and reinserted since the session was established.

However, IIRC, SSL only notices that the token was removed/reinserted when
a handshake is done.  So, if the original SSL/TCP connection is still going 
continuously since the original client authentication was done, and no new
handshakes are done on that connection, that connection will continue to 
work, even after the client auth token is removed and reinserted.

IMAP clients are notorious for establishing a TCP/SSL connection and leaving
that same connection up for days at a time.  Since no additional handshakes
are done while the original connection remains open, removal of the token 
goes unnoticed.

Bob, does your memory about this differ from mine?
Oh, your using SSL/IMAP. This is a client issue. The client opens a single
connection to the IMAP server and goes. We never again get a new handshake request.

bob
Yes, I believe Nelson is correct (as you can surmise from my previous post).

bob
Target 3.7 for now until we decide where we want to fix this. My assumption is
we need to periodically timeout the IMAP connection at the client level.

bob
Target Milestone: --- → 3.7
Priority: -- → P1
The only places to make the check are in SSL, or have a timeout in the application. 

We think we can do this in SSL, since the check for Token presense is now has a
timer so it won't actually hit the token every time it's called. This means
there may be a small delay in detecting the pulled token (order of 10 sec or so).

NOTE: that NSS has not control over the data once it's been sent. If the
application redisplays data out of the cache, there's not way for NSS to prevent
that. 
I think we have to check for more than token presence.  We also have to 
check to see if the token has become logged out.  If the token "series"
number gets bumped on logout, then we should check that because it's so 
cheap to check.  But if the series only gets bumped on login, then we 
need a different check.
After implementing this, it will be necessary to measure the performance impact. 

I'd suggest trying to implement this so that the test is done at most once per 
call to PR_Recv or PR_Send, not once per SSL record that is written.
Login state does not change the series. PK11_IsLoggedIn() does have a 1 second
delay over which the last state is cached. We could add a login series type
flag, but that would mean logging out, then logging back in would cause SSL
sessions to fail (like when we import our keys we logout then log back in). I
think we just want to know "am I logged in now?" which the PK11_IsLoggedIn() does.

One side effect is that the 'logout out after X minutes' option will timeout on
existing connections. I submit this is probably more correct behavior, but it is
a change from our past behavior.

bob
re comment 10. Agreed. It only make sense to implement this at the SSL layer if
it does not adversely affect performance. We should test against some of the
more recalcitrant pkcs #11 devices.
About comment 11 :  

I think that, ideally, we want to ask "Am I logged in now to the same 
token as the one I used for authentication?"  

Assuming that the existing series does change every time a token is removed
and reinserted, then it is probably sufficient to ask "am I logged in now,
and is the slot's series number the same as it was when I authenticated?"
Yes, checking "I am I logged in" and "has my series changes" should be the
correct tests for this case.

bob
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
I tested this with SMTP over SSL using nsmail-1, and it does work as expected
when I remove the token. I think this is because Mozilla opens a new TCP
connection to the SMTP server for each message. Therefore, there is a restart
handshake, and NSS gets an opportunity to check the token presence.

This makes me believe that there is already a function written somewhere that
performs the check that we can invoke. If that is true, then the only remaining
question is when, and how, often to invoke it.

It seems to me that the desirable behavior here depends on the application,
especially in light of the likely impact on performance this change will have.

While the SSL layer is the correct place to do the token presence and login
check, I would be in favor of having the check be configurable by the application.

Here is what I propose :

We would not do the check for each PR_Recv or PR_Send, as I think this would be
too expensive. Instead, for each PR_Recv or PR_Send, we would take a timestamp
using an inexpensive call provided by NSPR (Wan-Teh, would PR_Now() be adequate
?). We would also save the timestamp of the last checks. Then, using these
timestamps, we would only do the token check periodically, with a configurable
period. I would suggest 5 or 10 seconds as a default. Applications could
override the setting to change the checking period. A period of 0 would mean to
check every time, -1 to never do the check (ie. only on handshake).

Obviously, whenever a handshake happens, we would reset the timestamps.
I agree the desirable behavior depends on the application.
I can imagine under some circumstances people may want an
SSL connection with client auth to persist after they pull
out their smartcard.

PR_IntervalNow() is the right function to get a timestamp
if the intervals in question are at most 6 hours.  If an
interval may be longer than 6 hours, PR_Now() is the only
function you can use.  PR_Now() is always a system call.
PR_IntervalNow() is a system call on some platforms but
can be implemented cheaper on other platforms.
I don't know how many keep-alive TCP connections are expected to last more than
6 hours without activity. If we choose to limit the interval to 6 hours by way
of API, what will happen if somebody actually leaves a connection idle for
longer than 6 hours is that the check will not be performed when it should be. I
don't think we can afford the expense of PR_Now() on every PR_Recv / PR_Send if
it is a system call.

Nevertheless, we need some configurability. In server applications with hardware
tokens we definitely don't want to do this check, so the default value for
server sockets should be -1 (never check token presence).
Julien, I think the answer is to just use PR_IntervalNow(), regardless of
its rollover period.  The value returned by PR_IntervalNow() is in units
that vary from platform to platform.  The value rolls over on all platforms.  
The amount of time required varies from a few hours to (I think) a few days).  
On some platforms, the value can roll over in as little as 6 hours.  Big deal.  

In any case, if your algorithm does not recheck the token until (say) 10 
seconds have passed since the last check, by determining 
  if (PR_IntervalNow() - lastCheckIntervalTime > PR_SecondsToInterval(10))
(assuming all values are unsigned 32-bit values) then there will be a 10 second 
window, every rollover period, during which the token will not be checked.  
That is, if the token was last checked at time t, then the token will not be 
checked again if the next time sample is in the window t..t+10s, or 
t+r..t+r+10s or in general t+kr..t+kr_10s, where r is the rollover period, 
k is any non-negative integer, and 10s is 10 seconds.  I think this is 
likely to be good enough.
Some notes about this patch: ssl_ClientAuthTokenPresent short circuits if no
clAuthValid is false. This means severs will should always sort circuit the
test.

There is a delay of 10 seconds between checks to prevent 'pounding the card'
continuously.
I've checked the code into the tip to facilitate review... particularly to see
where the ssl_ClientAuthTokenPresent() call is made.
Attached patch Better version of the patch... (obsolete) — Splinter Review
In this patch I've added 2 things: 1) we remember the last removal state, so we
still get errors even on retries in the last 10 seconds. 2) set the PORT_ERROR
code.

This patch incorporates both sets of changes.

Things to I'd look for in a review: is the SecurityInfo structure the right
place for the lastTime and lastState values. Do I need to worry about copying
these values? Is there a better error code that I should set, should we define
a new one?

This, of course, on top of correctness.

bob
Attachment #112413 - Attachment is obsolete: true
Review comments:

0. Indentation is goofy.  Did you edit this with an editor with tabstops set
to every 4 spaces instead of every 8?

1. If this function is really only for use in ssl3, then its name should 
begin with ssl3_ (note the 3 and lower case) to be consistent with all other 
functions in this file, and it should be static with a forward declaration if
there are calls to it before it is defined in the file.  Otherwise, it should 
not be in ssl3con.c, and it should be declared in sslimpl.h.

2. The last update time should really be per slot, not per socket.  
NSS should poll the slot once every 10 seconds, not once every 10 seconds 
for each and every socket that used it for client auth.

Suggestion: add a lastUpdateIntervalTime value to the slot table.  
Update it in SECMOD_LookupSlot or some function that it calls.
Create a new accessor function for it, similar to PK11_GetSlotSeries, 
e.g. PK11_GetSlotPollTime.

3. There's code in ssl3_SendClientHello that looks remarkably similar to 
your new function.  Perhaps that code should be replaced with a call to your
new function. (?)

> is the SecurityInfo structure the right place for the lastTime and lastState 
> values. 

For lastTime, I'd say no, it belongs in the slot I think.

For lastState, Hmmm.  I think that info belongs in the SSL SID, because it 
really applies to all connections sharing the SSL session (the master secret).
But there are some issues with that idea.  Today, the sid is created once and
never modified thereafter (except to be destroyed), so it isn't locked.  
Maybe it wouldn't need to be locked for this purpose either.

> Do I need to worry about copying these values? 

If the value is in ss->sec or one of its member structs, then the function 
ssl_CopySecurityInfo should at least assign a value to the new copy.

> Is there a better error code that I should set, 
> should we define a new one?

That error seems perfect.
This patch incorporates Nelson's comments. The main one is the per slot time
delay verse a per socket time delay. Since IsPresent and IsLoggedIn already
have per slot time values, the fix is to simply not do any time delay at the
SSL layer.
Attachment #112447 - Attachment is obsolete: true
We should also consider this patch for 3.7.2.
Attachment #114349 - Flags: superreview?(nelsonb)
Attachment #114349 - Flags: review?(jpierre)
Target Milestone: 3.8 → 3.7.2
Comment on attachment 114349 [details] [diff] [review]
Incorporate Nelson's comments.

This patch looks OK as far as it goes.	However, in reviewing it, I found a
small problem in PK11_IsLoggedIn (actually in pk11_InDelayPeriod) that will
cause it to sometimes poll more often than once per second.  I'll add another
comment below shortly.
Attachment #114349 - Flags: superreview?(nelsonb) → superreview+
pk11_InDelayPeriod doesn't handle wrap-around of the PRIntervalTime properly.
Anytime there's a wrap around, it returns false.  
But the formula that it uses when there is no wrap around, namely:
    ((time-lastTime) < delayTime);
is correct whether there's a wrap around or not, because time, lastTime, and
delayTime are all unsigned ints of the same size.  So, the solution is to 
remove the test for wrap around.  e.g.

 time = PR_IntervalNow();
-return (PRBool) (lastTime) && (time > lastTime) && 
-                               ((time-lastTime) < delayTime);
+return (PRBool) (lastTime) && ((time-lastTime) < delayTime);
Comment on attachment 114349 [details] [diff] [review]
Incorporate Nelson's comments.

I've backed out this patch from the NSS 3.7 branch.
It is still in the tip.

This patch has two issues.

1. It may break code that depends on the current behavior
that the removable token only needs to be present for the
SSL client-auth handshake.

2. It will close an SSL client-auth connection if the token
is automatically logged out after a timeout.

If either of these issues is valid, we should consider making
the new behavior an SSL socket option that is disabled by
default.
OS: MacOS X → All
Hardware: Macintosh → All
The patch is in the NSS TIP (3.8) and NSS_3_7_BRANCH (3.7.2) now.

Bob, could you look at the issue that Nelson pointed out in comment
27?  Are we sure that pk11_InDelayPeriod will be called at least
once every six hours?  If not, you can't use PRIntervalTime and
will need to use PRTime.
Reguarding both yours and Nelson's comments.

The delay is simply to short circuit and expensive lookup when called repeatedly
in a short period of time. The typical usage would see several repeated calls in
a very short period.

Getting a single false negative (it is not in the delay period when it is). Is
not a bit problem because going through the more expensive lookup is not an
error. Which is why I coded the wrap around case as I did (I knew the wrap case
would generate a false negative, but wasn't worried about it). Since Nelson's
patch is both simpler and more correct, there is no reason not to include it.

The next question is false positives. What happens if we miss the fact that the
card is pulled for the delay period. I believe the answer is "not much". We will
already miss a card pull for about 10 seconds after the initial check. If we
miss the card pull 6 hours later because the code went to sleep for 6 hours then
suddenly, at just the right time. With the current code it means you have to go
exactly 6 hours without doing any operation on the token (including looking up
certificates). Once you start, the code basically gives you 10 seconds, then the
normal check kicks in. I would be happy to include code that could improve this
if it doesn't increase the cost to do the check (what is the cost of PR_Now()
verse PR_IntervalNow()?).

bob

bob

Marked the bug fixed.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This bug is marked fixed, but comments 27 29 and 30 are not yet addressed.
As I understand comment 30, Bob agrees to implement the changes I proposed
in comment 27.  So I thought Bob would fix that before closing this bug.
Is a separate bug report needed for that issue?
Nelson,

I needed to mark this bug fixed because NSS 3.7.2 has
been released.  Please open a separate bug for the
remaining issue.  Thanks.
Oops, I thought Nelson had checked in the fix suggested in comment 27 . I see
wtc did last night in the tip. Nelson is correct my intention was that it should
be fixed before closing this bug.

bob
I don't think the fix suggested in comment 27 is critical,
so I only checked it into the tip.  If you think it's
important we can check it in on the 3.7 branch for 3.7.3.
(We already missed 3.7.3 Beta 1.)
In response to comment 33, I opened bug 197147 to cover comment 27.
Wan-Teh has since checked in the fix for that bug on the trunk.
I agree that fixing it on the trunk is sufficient.
Attachment #114349 - Flags: review?(jpierre)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: