Last Comment Bug 406120 - Allow application to specify OCSP timeout
: Allow application to specify OCSP timeout
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: All All
: -- normal (vote)
: 3.12
Assigned To: nobody
:
Mentors:
Depends on:
Blocks: 404059
  Show dependency treegraph
 
Reported: 2007-11-29 22:17 PST by Kai Engert (:kaie)
Modified: 2007-12-04 23:41 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch v1 (4.94 KB, patch)
2007-11-29 22:33 PST, Kai Engert (:kaie)
no flags Details | Diff | Review
Patch v2 (3.98 KB, patch)
2007-11-30 23:04 PST, Kai Engert (:kaie)
nelson: superreview+
Details | Diff | Review
Patch v3 (4.29 KB, patch)
2007-12-03 15:16 PST, Kai Engert (:kaie)
nelson: review+
Details | Diff | Review

Description Kai Engert (:kaie) 2007-11-29 22:17:30 PST
NSS currently uses a hardcoded 60 seconds timeout value for OCSP requests.
See mozilla/security/nss/lib/certhigh/ocsp.c :

    if ((*hcv1->createFcn)(
            pServerSession,
            "http",
            path,
            "POST",
            PR_TicksPerSecond() * 60,
            &pRequestSession) != SECSuccess) {
        PORT_SetError(SEC_ERROR_OCSP_SERVER_ERROR);
        goto loser;
    }


This can cause problems in WIFI environments, where the server cert used on a secure signup page triggers the application to contact a blocked OCSP server (blocked, because not yet paid).

Clearly, this is a misconfiguration of a WIFI hotspot, but that's what seems to happen in real life.


In bug 404059 it has been proposed to lower the timeout to 10 seconds.

Right now, the application is unable to control the timeout that NSS will request for OCSP requests.


This bug proposes a new NSS API function that allows to set the timeout value.
Comment 1 Kai Engert (:kaie) 2007-11-29 22:33:12 PST
Created attachment 290832 [details] [diff] [review]
Patch v1

What about this?
Comment 2 Nelson Bolyard (seldom reads bugmail) 2007-11-29 23:40:39 PST
Comment on attachment 290832 [details] [diff] [review]
Patch v1

>+SECStatus
>+CERT_SetOCSPTimeout(PRUint32 seconds)
>+{
>+    PR_EnterMonitor(OCSP_Global.monitor);
>+    OCSP_Global.timeoutSeconds = seconds;
>+    PR_ExitMonitor(OCSP_Global.monitor);
>+    return SECSuccess;
>+}

Using a PRMonitor to guard the updates to a single 32-bit
variable seems like overkill and inefficient.  
We're not trying to keep this value in sync with other 
variables that are protected as a set by that monitor.
I think the only purpose accomplished by this monitor 
in this case is to ensure that all CPUs see the change 
immediately on systems that can reorder stores from 
write-back caches.  If that's the only purpose, I think
it can be accomplished with PR_AtomicSet and PR_AtomicGet.
Let's ask Julien.  He has strong opinions about this subject.

Other than that, this appears OK to me.
Comment 3 Kai Engert (:kaie) 2007-11-30 00:01:18 PST
I would like to note this is a call to a global configuration value, which probably gets called rarely, so I think: If the call to the monitor isn't wrong, it shouldn't hurt.

It's also in synch with all other accesses to the global OCSP config variables, which use the same monitor.
Comment 4 Nelson Bolyard (seldom reads bugmail) 2007-11-30 00:39:54 PST
The use of the monitor in function fetchOcspHttpClientV1 is called more
often.  
Comment 5 Kai Engert (:kaie) 2007-11-30 01:36:58 PST
(In reply to comment #4)
> The use of the monitor in function fetchOcspHttpClientV1 is called more
> often.  

That's correct.
Could we agree to use the monitor/lock for write access to the variable in CERT_SetOCSPTimeout ?

Should we omit locking for read access of the variable?

If you think it's unnecessary, I'm ok to remove the monitor lock in fetchOcspHttpClientV1.


(In reply to comment #2)
> I think
> it can be accomplished with PR_AtomicSet and PR_AtomicGet.


There is no PR_AtomicGet function.

There is a PR_AtomicSet function. Looking at its implementation, it locks a mutex, too. Is it really that much of a difference whether we use our usual OCSP protection monitor or the mutex used by AtomicSet?


Please let's not start a lengthy discussion on this simple subject :-)
I think, either we are consistent and use the same lock for all grouped data, or we don't lock.

What happens when the monitor is locked and we start to use other locks like the one used by AtomicSet? Let's not complicate things and stick to a single lock.
Comment 6 Julien Pierre 2007-11-30 15:14:57 PST
Kai,

You must have looked at only one implementation of PR_AtomicSet. It only uses a mutex on platforms that don't have real atomic instructions. Most do. And on those platforms, the atomic operation is much cheaper. There is no need for a PR_AtomicGet because in general a CPU, even one that reorders reads and writes, will not split the writes to smaller than a 32 bit value. So I think using PR_AtomicSet and a plain read is fine here. This is what it was designed to do. As soon as you want to deal with anything more than that one variable however, you will need a lock or monitor.
Comment 7 Kai Engert (:kaie) 2007-11-30 23:04:56 PST
Created attachment 290960 [details] [diff] [review]
Patch v2

ok, I'm convined
Comment 8 Nelson Bolyard (seldom reads bugmail) 2007-12-01 03:26:40 PST
Comment on attachment 290960 [details] [diff] [review]
Patch v2

sr=nelson
Comment 9 Wan-Teh Chang 2007-12-01 06:56:19 PST
Comment on attachment 290960 [details] [diff] [review]
Patch v2

Kai, it's not necessary to use PR_AtomicSet.  Just say

    OCSP_Global.timeoutSeconds = seconds;

See CERT_SetSlopTime and CERT_GetSlopTime.  The purpose
of PR_AtomicSet is to read the old value and set the new
value atomically.  You just need to set the new value
atomically.  On 32-bit and 64-bit processors, we can
safely assume that setting a 32-bit unsigned integer can
be done with a single machine instruction.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2007-12-01 10:37:13 PST
On all single-CPU systems, we can assume that a single store will do the 
job.  On most (but not all) multi-CPU systems, we can assume that too.
There are, however, some lame multi-CPU systems that do not assure that 
writes to memory happen when expected. In such systems, an instruction 
on one CPU can attempt to store a value to memory, and afterwards, another
CPU in the system can read that memory location and still see the old value,
because the first CPU's actual write to memory was delayed.  (This may be
due to write-back caches, and not necessarily the CPU itself.)
IINM, PR_AtomicSet is intended to solve that problem, too, by taking the necessary steps to ensure that the "set" value actually gets written out.  
Comment 11 Wan-Teh Chang 2007-12-01 14:01:21 PST
Memory barriers must be used on both the writer and reader sides.
It is pointless to use a memory barrier on the writer side only.

It is fine for the reader to read a stale value as long as the
fresh value will be updated eventually.  Without something else
as a reference, the reader can't even determine if the value is
fresh or stale.  Let me explain this with an example:

We have wo threads, Writer and Reader.  Writer sets timeoutSeconds
to 60 seconds.  Reader reads timeoutSeconds.  The initial value of
timeoutSeconds is 30 seconds.

The sequence of read and write events is allowed to happen in either
order:

Legal event sequence 1:

Writer                  Reader

timeoutSeconds=60

                        read timeoutSeconds(60)

Legal event sequence 2:

Writer                  Reader

                        read timeoutSeconds(30)

timeoutSeconds=60

Do you see why it's pointless to ensure visibility of memory
writes when the read and write events are allowed to occur
in either order?  You're trying to ensure that in
event sequence 1, Reader gets timeoutSeconds==60.
But since event sequence 2 is also legal, it is legal for
Reader to get timeoutSeconds==30 anyway.
Comment 12 Kai Engert (:kaie) 2007-12-03 09:35:40 PST
I absolutely agree that Patch v1 would be saver.
If I understand correctly, Wan-Teh recommends Patch v1.

But I'm happy to go with either patch.
It shouldn't matter much for this scenario, it should be ok for the code to use the old pref value when there's a race. This is just a preference, not a state.
Comment 13 Kai Engert (:kaie) 2007-12-03 09:36:58 PST
Julien, the patch is waiting for your review.

You have the final word on this, please either mark v1 or v2 as reviewed.
Thanks
Comment 14 Wan-Teh Chang 2007-12-03 10:16:25 PST
Patch v1:
- correct use of memory barriers
- has locking overhead 

Patch v2:
- incorrect use of memory barriers (missing on reader side,
  may be missing on writer side)
- low overhead (atomic instruction)

But both patches are trying to ensure a property that the reader
and writer can't verify without synchronization, as I explained
in comment 11.  That's why I said we don't even need to use
PR_AtomicSet.

Like Kai, I'm happy to go with either patch.
Comment 15 Nelson Bolyard (seldom reads bugmail) 2007-12-03 14:42:15 PST
Kai, why don't you write a patch v3 that uses no locks or atomic ops and
merely does stores and loads.  Such code will work fine on single CPU
systems and on most (but not all) multi-CPU systems.  However, I think
the likelihood of anyone bringing one of those lame multi-CPU systems to 
a wireless hotspot is sufficiently close to zero that we can ignore that
concern altogether.  

Under the circumstances, which are that reader will always run AFTER the
writer ATTEMPTS to store the value, comment 11 appears to be irrelevant.
The concern in this case is that, even though the writer has executed a 
store instruction, to store the value in memory, the actual memory cycle
to store the value is delayed for so long that the subsequent reader 
thread, running on another CPU, doesn't see the change, even though the
write instruction was performed before the reader started.  

I just don't want to see a monitor (or lock) being used unconditionally on 
all platforms, including those on which it is never necessary for this case.
Well designed systems should not pay a performance penalty to accomodate 
lame ones.  
Comment 16 Kai Engert (:kaie) 2007-12-03 15:16:14 PST
Created attachment 291311 [details] [diff] [review]
Patch v3
Comment 17 Kai Engert (:kaie) 2007-12-03 15:23:00 PST
(In reply to comment #15)
> Kai, why don't you write a patch v3 that uses no locks or atomic ops and
> merely does stores and loads.

done


> Such code will work fine on single CPU
> systems and on most (but not all) multi-CPU systems.  However, I think
> the likelihood of anyone bringing one of those lame multi-CPU systems to 
> a wireless hotspot is sufficiently close to zero that we can ignore that
> concern altogether.  

Even if someone does that, it probably will not have an effect.

The call to set the ocsp timeout will probably be used once during an init phase, only.


I have one more comment.

When I read a book about multithreading in the past, the author claimed that our scenario

- thread 1 writes a 32 bit value to memory (4 bytes)
- thread 2 reads the 32 bit value / 4 bytes from memory

the following can happen:
- 1, 2, 3 bytes from thread 1 arrive in memory
- at the time thread 2 attempts to read the value, it get's a mix of some bytes from the old value, and some bytes from the new value

So, let's say the old value was 0x100 and the new value is 0xff
In the worst case, the consumer might see a value of 0x1ff.


Possible or impossible?

If possible, we should do something to avoid it.
Comment 18 Nelson Bolyard (seldom reads bugmail) 2007-12-04 16:59:57 PST
Comment on attachment 291311 [details] [diff] [review]
Patch v3

I say, go ahead and commit this.  We can always fine-tune the implementation later, but in the mean time, this establishes the API that PSM can use.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2007-12-04 17:03:19 PST
(In reply to comment #17)

> When I read a book about multithreading in the past, the author claimed that
> our scenario
> 
> - thread 1 writes a 32 bit value to memory (4 bytes)
> - thread 2 reads the 32 bit value / 4 bytes from memory
> 
> the following can happen:
> - 1, 2, 3 bytes from thread 1 arrive in memory
> - at the time thread 2 attempts to read the value, it get's a mix of 
>   some bytes from the old value, and some bytes from the new value
> 
> So, let's say the old value was 0x100 and the new value is 0xff
> In the worst case, the consumer might see a value of 0x1ff.
> 
> Possible or impossible?

I believe it is true that, with all current 32-bit or 64-bit CPUs, 
a 32-bit write to a 4-byte aligned address (i.e. address%4==0) will
always be done with a single indivisible memory cycle.  
So that problem is not possible today, provided the alignment is correct.
Comment 20 Kai Engert (:kaie) 2007-12-04 23:41:34 PST
checked in.

Checking in certhigh/ocsp.c;
/cvsroot/mozilla/security/nss/lib/certhigh/ocsp.c,v  <--  ocsp.c
new revision: 1.45; previous revision: 1.44
done
Checking in nss/nss.def;
/cvsroot/mozilla/security/nss/lib/nss/nss.def,v  <--  nss.def
new revision: 1.182; previous revision: 1.181
done

Note You need to log in before you can comment on or make changes to this bug.