Closed Bug 364684 Opened 18 years ago Closed 17 years ago

NSS crashes when slot's session handle counter overflows

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.6

People

(Reporter: nelson, Assigned: nelson)

Details

Attachments

(2 files, 5 obsolete files)

sftk_HandleObject uses the following code to assign new handles 
to session objects.  

1818     /* PKCS #11 object ID's are unique for all objects on a
1819      * token */
1820     PZ_Lock(slot->objectLock);
1821     object->handle = slot->tokenIDCount++;
1822     PZ_Unlock(slot->objectLock);

It actually does this for all new objects, but then the handle is
typically changed again for token objects.  

Anyway, when the value of this tokenIDCount counter goes from 0x7fffffff
to 0x80000000, bad things happen, specifically crashes.  This is because
the sign bit tells softoken that this is a token object handle, not a 
session object handle.  This is actually seen to happen about every 90 
days in busy servers.

It's obviously easy enough to change that line of code to AND off the 
high order bit, e.g. 

1821     object->handle = slot->tokenIDCount++ & ~SFTK_TOKEN_MASK;

But that won't solve the problem of duplicated session handles.  
So, the challenge is to find a better way of assigning unique session 
handles.  I can think of several ideas, including a free list of handle
values.  Good ideas welcome.
Attached patch patch v1 (obsolete) — Splinter Review
Bob, in your review, please tell me if the understanding reflected 
in the new comments is incorrect, and if so, please enlighten me.
Assignee: nobody → nelson
Status: NEW → ASSIGNED
Attachment #249605 - Flags: review?(rrelyea)
Priority: -- → P1
Target Milestone: --- → 3.11.5
Attachment #249605 - Flags: review?(wtchang)
It's unclear to me how we want to target this bug fix.
Do we intend to hold of all softoken fixes for the remainder of NSS 3.11.x ?
Do all softoken fixes go into 3.12 (and later) at this point? 
Or might we produce 3.11.x releases of softoken (with x > 5) that are 
not FIPS validated and contain fixes not in softoken 3.11.4 ?
Until I know, this bug will be targetted at 3.12.
Target Milestone: 3.11.5 → 3.12
Comment on attachment 249605 [details] [diff] [review]
patch v1

I see a few typos in my patch that I will fix before checkin, e.g.

>+     * At this point, all new objects are structured session objects.
Should read: structured as session objects.

>+     * sftk_narrowToXxxObject uses soft_isToken,
Should read: uses sftk_isToken

I should initialize duplicateObject to NULL at the beginning of this
new loop below.
>+    do {
>+	PRUint32 index;
>+
>+	PZ_Lock(slot->objectLock);
>+	handle = slot->tokenIDCount;
>+	slot->tokenIDCount = (handle + 1U) & ~SFTK_TOKEN_MASK;
>+	/* Is there already a session object with this handle? */
>+	index = sftk_hash(handle, slot->tokObjHashSize);
>+	sftkqueue_find2(duplicateObject, handle, index, slot->tokObjects);
>+	PZ_Unlock(slot->objectLock);
>+    } while (duplicateObject != NULL);
>+    object->handle = handle;
Attached patch patch v2 (obsolete) — Splinter Review
This patch contains all the fixes I described in previous comments,
plus one enhancement.  It only starts checking for duplicate object IDs
after the object ID counter has wrapped around once.  

With this patch, the counter counts from 0x00000001 up to 0x7fffffff,
then keeps going up through 0x80000000 and eventually through 0xffffffff.
We mask off the high order bit to get the actual object handle value,
so the session object handles are always in the range 0x1-0x7fffffff.

Once the high-order "sign" bit is set in the counter, it stays set.  
When the counter overflows from 0xffffffff to zero, it gets set to 0x80000001.
This way, we remember that it has wrapped around (gone past 0x7fffffff) 
and we avoid using a zero object handle.  The sign bit also serves to 
remind us to check for (and avoid) duplicate object handles.
Attachment #249605 - Attachment is obsolete: true
Attachment #249825 - Flags: superreview?(wtchang)
Attachment #249825 - Flags: review?
Attachment #249605 - Flags: review?(wtchang)
Attachment #249605 - Flags: review?(rrelyea)
Attachment #249825 - Flags: review? → review?(rrelyea)
Attached patch patch v3 (obsolete) — Splinter Review
This patch renames "tokenIDCount" (which was not a count of token IDs)
to "sessionObjectHandleCount" which accurately describes its purpose.
It also uses a variable named minSessionObjectHandle, rather than a 
hard-coded decimal constant 1, to initialize the sessionObjectHandleCount
at NSS Init time, and also upon counter overflow.  This allows me to test
the overflow code by setting the value of this variable to a large value,
such as 0x7fffff80 in the debugger.
Attachment #249825 - Attachment is obsolete: true
Attachment #249976 - Flags: superreview?(wtchang)
Attachment #249976 - Flags: review?(rrelyea)
Attachment #249825 - Flags: superreview?(wtchang)
Attachment #249825 - Flags: review?(rrelyea)
Comment on attachment 249976 [details] [diff] [review]
patch v3

We already solved this problem for session handles in bug 125149.
So I encourage you to make the code that tests for a duplicate object
handle as close as possible to the corresponding code in NSC_OpenSession.
In particular, a sftkqueue_find call could replace the sftk_hash and
sftkqueue_find2 calls in this patch.

I'd not rename the tokenIDCount member, or would do a more complete job.
To see why, please read this comment in pkcs11i.h:

>                   ...,  objectLock protects all elements of the token
>- * object hash table (tokObjects[], tokenIDCount, and tokenHashTable),
>+ * object hash table (tokObjects[], sessionObjectHandleCount, and tokenHashTable),

We renamed one member, the the comment still says "token object hash table",
and there are still members named tokObjects and tokObjHashSize.

In pkcs11.c, I'd declare minSessionObjectHandle as const and only manually remove
the const before building the file for a debugging session.

If this bug is critical and there is no workaround, we need to fix it in a NSS
3.11.x patch release, which will need to be revalidated.  (How the revalidation
should be done is another question.)
Attachment #249976 - Flags: superreview?(wtchang) → superreview+
This patch is the same as v3, plus the following changes:

a) now uses a sftkqueue_find call instead of the sftk_hash and sftkqueue_find2 
calls used in previous patch.

b) more complete renaming of variables related to the slot's object hash table
and the slot's session object handle counter.  The changed names were/are:

   tokObjects      -> slotObjects               (new in this patch)
   tokObjHashSize  -> objectHashSize            ( "   "   "    "  )
   tokenHashTable  -> objectHashTable           ( "   "   "    "  )
   tokenIDCount    -> sessionObjectHandleCount  (also in patch v3)

This renaming affects only 3 files in softoken, but it changes many lines
in those 3 files.  Because of the magnitude of this change, I'm asking 
for review of this patch, especially from Bob Relyea (even though patch v3 
got sr+).
Attachment #250338 - Flags: superreview?(wtchang)
Attachment #250338 - Flags: review?(rrelyea)
Wan-Teh, in comment 6 you wrote:
> If this bug is critical and there is no workaround, we need to fix it in 
> a NSS 3.11.x patch release, which will need to be revalidated. 

This bug is critical for NSS servers that stay up long enough for this 
counter to overflow.  That is less than 90 days for some busy servers.
I'm genuinely surprised that we haven't seen this 4+ year old bug before now.
We generate at least 5 new objects for each SSL connection.  It overflows  
after ~429.5 million connections.  At 55 connections/second, that's ~90 days.  
At 100 connections/second, that's less than 50 days.  There is no workaround, 
other than stopping and restarting the server before the counter overflows.  
That workaround is unacceptable to some server owners.
Comment on attachment 250338 [details] [diff] [review]
patch v4, incorporating Wan-Teh's comments

I strongly suggest that you create a version of this patch without
the renaming for the NSS_3_11_BRANCH.  This patch is inundated with
noise that it's hard to see the bug fix.

>+static PRUint32       minSessionObjectHandle = 1U;

I still think we should make this variable 'const'.

>+	    sftkqueue_find(duplicateObject, handle, slot->slotObjects, \
>+	                   slot->objectHashSize);

Is it necessary to continue the line here with a backslash (\)?
It's not necessary in C, but since sftkqueue_find is a C preprocessor
macro, I am not sure what the rule is.

>-	slot->tokObjHashSize = SPACE_TOKEN_OBJECT_HASH_SIZE;
>+	slot->objectHashSize = SPACE_TOKEN_OBJECT_HASH_SIZE;

>-	slot->tokObjHashSize = TIME_TOKEN_OBJECT_HASH_SIZE;
>+	slot->objectHashSize = TIME_TOKEN_OBJECT_HASH_SIZE;

This is another reason that I think renaming should be done in a separate
patch.  We see "TOKEN_OBJECT" here again.
Attachment #250338 - Flags: superreview?(wtchang) → superreview+
Alright, I'll make a separate patch for the branch.
This patch is waiting for Bob Relyea's review, even on the trunk.
Target Milestone: 3.12 → 3.11.5
(In reply to comment #7)
>
>    tokenHashTable  -> objectHashTable           ( "   "   "    "  )

The new name may be incorrect.  We seem to store token key objects
in this hash table.
Comment on attachment 250338 [details] [diff] [review]
patch v4, incorporating Wan-Teh's comments

>+ * (head[]->refCount),  objectLock protects all elements of the slot's

It may be better to change "token's" to "slot's".

>+    SFTKObject		**slotObjects;		/* variable - reset */
>+    unsigned int	objectHashSize;		/* invariant */

We seem to only store session object handles in this hash table.  If I'm
right, the new names probably should reflect that.

The fact that only session object handles are stored in this hash table
is used in proving the correctness of the following line in this patch:

+	    sftkqueue_find(duplicateObject, handle, slot->slotObjects, \
+	                   slot->objectHashSize);
Comment on attachment 250338 [details] [diff] [review]
patch v4, incorporating Wan-Teh's comments

r+ = relyea
Attachment #250338 - Flags: review?(rrelyea) → review+
Wan-Teh, 
Now that you've pointed out that the NSS-style hash table of objects
is really exclusively a table of session objects, I am convinced more
than ever that these variable names MUST be changed, even on the branch.
There were two variables whose names CLEARLY identified them as pertaining
to token objects, that were in fact exclusively for session objects.  
That's as wrong as it can be, as wrong as a logic error in the code, and
completely misleading to developers (like me!).  I think this new set of
names clears it up.  Please let me know if you disagree.

So, this patch is the same as the previous one, except for the new names

   tokenIDCount    -> sessionObjectHandleCount  (also in patch v3)
   tokenHashTable  -> tokObjHashTable           (new in this patch)
   tokObjects      -> sesObjHashTable           ( "   "   "    "  )
   tokObjHashSize  -> sesObjHashSize            ( "   "   "    "  )
Attachment #249976 - Attachment is obsolete: true
Attachment #249976 - Flags: review?(rrelyea)
I also meant to fix the misnamed _TOKEN_OBJECT_HASH_SIZE symbols,
which are actually SESSION_OBJECT_HASH_SIZEs.
Attachment #250541 - Attachment is obsolete: true
Attachment #250545 - Flags: review?(wtchang)
Comment on attachment 250545 [details] [diff] [review]
patch v6 - alternative variable renaming (take two)

Nelson, Bob should review the new names.  I was just pointing
out that some of the new names you chose before were inaccurate,
and that there are other members/macros that also need to be
renamed.  I don't know the code well enough to do a good review
quickly.

I would change 'sesObj' to 'sessObj' to be consistent with
the existing member named 'sessHashSize'.

Note that
-- NSS developers read both the (patched) code and the diffs
-- non-NSS developers only read the diffs
So smaller, simpler diffs have its advantages.  Imagine the FIPS
testing lab or a Federal government consultant review this patch
to assess the differences from the version of softoken that we
submitted for FIPS validation.

Simply adding comments to pkcs11i.h to explain what each
structure member is would accomplish most of the benefits of
renaming.

I don't want to block the checkin of your fix, so you can check
in this patch.  (I didn't review it.)
Attachment #250545 - Flags: review?(wtchang) → review?(rrelyea)
Comment on attachment 250545 [details] [diff] [review]
patch v6 - alternative variable renaming (take two)

r+, though wtc is correct, sess is more consistant than ses for abreviation for session. The existing code has lots of occurances of sessObject.

Also, in the line fixing my typo of declare, it misses the typo of formatting.
Attachment #250545 - Flags: review?(rrelyea) → review+
This patch incorporates the suggestion to use an "sess" prefix, rather than
"ses".  This was committed on the trunk some time ago.  I plan to commit it
to the 3_11 BRANCH today.
Attachment #250545 - Attachment is obsolete: true
Committed on branch. Checkin comment:

Bug 364684. Don't let the counter whose values are used for session object
handles overflow the sign bit.  Rename several softoken variables to
correctly identify their meaning.  r=rrelyea,wtchang
  tokenIDCount    -> sessionObjectHandleCount
  tokenHashTable  -> tokObjHashTable
  tokObjects      -> sessObjHashTable
  tokObjHashSize  -> sessObjHashSize

Checking in pkcs11.c;  new revision: 1.112.2.21; previous revision: 1.112.2.20
Checking in pkcs11i.h; new revision: 1.41.2.4; previous revision: 1.41.2.3
Checking in pkcs11u.c; new revision: 1.69.2.4; previous revision: 1.69.2.3
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Target Milestone: 3.11.5 → 3.11.6
You need to log in before you can comment on or make changes to this bug.