406120 - Allow application to specify OCSP timeout

Reporter

Description

•

18 years ago

NSS currently uses a hardcoded 60 seconds timeout value for OCSP requests. See mozilla/security/nss/lib/certhigh/ocsp.c : if ((*hcv1->createFcn)( pServerSession, "http", path, "POST", PR_TicksPerSecond() * 60, &pRequestSession) != SECSuccess) { PORT_SetError(SEC_ERROR_OCSP_SERVER_ERROR); goto loser; } This can cause problems in WIFI environments, where the server cert used on a secure signup page triggers the application to contact a blocked OCSP server (blocked, because not yet paid). Clearly, this is a misconfiguration of a WIFI hotspot, but that's what seems to happen in real life. In bug 404059 it has been proposed to lower the timeout to 10 seconds. Right now, the application is unable to control the timeout that NSS will request for OCSP requests. This bug proposes a new NSS API function that allows to set the timeout value.

Kai Engert [:KaiE:]

Reporter

Updated

•

18 years ago

Target Milestone: --- → 3.12

Version: 3.12 → trunk

Kai Engert [:KaiE:]

Reporter

Comment 1

•

18 years ago

Attached patch Patch v1 (obsolete) — Details — Splinter Review

What about this?

Attachment #290832 - Flags: superreview?(rrelyea)

Attachment #290832 - Flags: review?(nelson)

Nelson Bolyard (seldom reads bugmail)

Comment 2

•

18 years ago

Comment on attachment 290832 [details] [diff] [review] Patch v1 >+SECStatus >+CERT_SetOCSPTimeout(PRUint32 seconds) >+{ >+ PR_EnterMonitor(OCSP_Global.monitor); >+ OCSP_Global.timeoutSeconds = seconds; >+ PR_ExitMonitor(OCSP_Global.monitor); >+ return SECSuccess; >+} Using a PRMonitor to guard the updates to a single 32-bit variable seems like overkill and inefficient. We're not trying to keep this value in sync with other variables that are protected as a set by that monitor. I think the only purpose accomplished by this monitor in this case is to ensure that all CPUs see the change immediately on systems that can reorder stores from write-back caches. If that's the only purpose, I think it can be accomplished with PR_AtomicSet and PR_AtomicGet. Let's ask Julien. He has strong opinions about this subject. Other than that, this appears OK to me.

Attachment #290832 - Flags: review?(julien.pierre.boogz)

Kai Engert [:KaiE:]

Reporter

Comment 3

•

18 years ago

I would like to note this is a call to a global configuration value, which probably gets called rarely, so I think: If the call to the monitor isn't wrong, it shouldn't hurt. It's also in synch with all other accesses to the global OCSP config variables, which use the same monitor.

Nelson Bolyard (seldom reads bugmail)

Comment 4

•

18 years ago

The use of the monitor in function fetchOcspHttpClientV1 is called more often.

Kai Engert [:KaiE:]

Reporter

Comment 5

•

18 years ago

(In reply to comment #4) > The use of the monitor in function fetchOcspHttpClientV1 is called more > often. That's correct. Could we agree to use the monitor/lock for write access to the variable in CERT_SetOCSPTimeout ? Should we omit locking for read access of the variable? If you think it's unnecessary, I'm ok to remove the monitor lock in fetchOcspHttpClientV1. (In reply to comment #2) > I think > it can be accomplished with PR_AtomicSet and PR_AtomicGet. There is no PR_AtomicGet function. There is a PR_AtomicSet function. Looking at its implementation, it locks a mutex, too. Is it really that much of a difference whether we use our usual OCSP protection monitor or the mutex used by AtomicSet? Please let's not start a lengthy discussion on this simple subject :-) I think, either we are consistent and use the same lock for all grouped data, or we don't lock. What happens when the monitor is locked and we start to use other locks like the one used by AtomicSet? Let's not complicate things and stick to a single lock.

Julien Pierre

Comment 6

•

18 years ago

Kai, You must have looked at only one implementation of PR_AtomicSet. It only uses a mutex on platforms that don't have real atomic instructions. Most do. And on those platforms, the atomic operation is much cheaper. There is no need for a PR_AtomicGet because in general a CPU, even one that reorders reads and writes, will not split the writes to smaller than a 32 bit value. So I think using PR_AtomicSet and a plain read is fine here. This is what it was designed to do. As soon as you want to deal with anything more than that one variable however, you will need a lock or monitor.

Kai Engert [:KaiE:]

Reporter

Comment 7

•

18 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

ok, I'm convined

Attachment #290832 - Attachment is obsolete: true

Attachment #290960 - Flags: superreview?(nelson)

Attachment #290960 - Flags: review?(julien.pierre.boogz)

Attachment #290832 - Flags: superreview?(rrelyea)

Attachment #290832 - Flags: review?(nelson)

Attachment #290832 - Flags: review?(julien.pierre.boogz)

Nelson Bolyard (seldom reads bugmail)

Comment 8

•

18 years ago

Comment on attachment 290960 [details] [diff] [review] Patch v2 sr=nelson

Attachment #290960 - Flags: superreview?(nelson) → superreview+

Wan-Teh Chang

Comment 9

•

18 years ago

Comment on attachment 290960 [details] [diff] [review] Patch v2 Kai, it's not necessary to use PR_AtomicSet. Just say OCSP_Global.timeoutSeconds = seconds; See CERT_SetSlopTime and CERT_GetSlopTime. The purpose of PR_AtomicSet is to read the old value and set the new value atomically. You just need to set the new value atomically. On 32-bit and 64-bit processors, we can safely assume that setting a 32-bit unsigned integer can be done with a single machine instruction.

Nelson Bolyard (seldom reads bugmail)

Comment 10

•

18 years ago

On all single-CPU systems, we can assume that a single store will do the job. On most (but not all) multi-CPU systems, we can assume that too. There are, however, some lame multi-CPU systems that do not assure that writes to memory happen when expected. In such systems, an instruction on one CPU can attempt to store a value to memory, and afterwards, another CPU in the system can read that memory location and still see the old value, because the first CPU's actual write to memory was delayed. (This may be due to write-back caches, and not necessarily the CPU itself.) IINM, PR_AtomicSet is intended to solve that problem, too, by taking the necessary steps to ensure that the "set" value actually gets written out.

Wan-Teh Chang

Comment 11

•

18 years ago

Memory barriers must be used on both the writer and reader sides. It is pointless to use a memory barrier on the writer side only. It is fine for the reader to read a stale value as long as the fresh value will be updated eventually. Without something else as a reference, the reader can't even determine if the value is fresh or stale. Let me explain this with an example: We have wo threads, Writer and Reader. Writer sets timeoutSeconds to 60 seconds. Reader reads timeoutSeconds. The initial value of timeoutSeconds is 30 seconds. The sequence of read and write events is allowed to happen in either order: Legal event sequence 1: Writer Reader timeoutSeconds=60 read timeoutSeconds(60) Legal event sequence 2: Writer Reader read timeoutSeconds(30) timeoutSeconds=60 Do you see why it's pointless to ensure visibility of memory writes when the read and write events are allowed to occur in either order? You're trying to ensure that in event sequence 1, Reader gets timeoutSeconds==60. But since event sequence 2 is also legal, it is legal for Reader to get timeoutSeconds==30 anyway.

Kai Engert [:KaiE:]

Reporter

Comment 12

•

18 years ago

I absolutely agree that Patch v1 would be saver. If I understand correctly, Wan-Teh recommends Patch v1. But I'm happy to go with either patch. It shouldn't matter much for this scenario, it should be ok for the code to use the old pref value when there's a race. This is just a preference, not a state.

Kai Engert [:KaiE:]

Reporter

Comment 13

•

18 years ago

Julien, the patch is waiting for your review. You have the final word on this, please either mark v1 or v2 as reviewed. Thanks

Wan-Teh Chang

Comment 14

•

18 years ago

Patch v1: - correct use of memory barriers - has locking overhead Patch v2: - incorrect use of memory barriers (missing on reader side, may be missing on writer side) - low overhead (atomic instruction) But both patches are trying to ensure a property that the reader and writer can't verify without synchronization, as I explained in comment 11. That's why I said we don't even need to use PR_AtomicSet. Like Kai, I'm happy to go with either patch.

Nelson Bolyard (seldom reads bugmail)

Comment 15

•

18 years ago

Kai, why don't you write a patch v3 that uses no locks or atomic ops and merely does stores and loads. Such code will work fine on single CPU systems and on most (but not all) multi-CPU systems. However, I think the likelihood of anyone bringing one of those lame multi-CPU systems to a wireless hotspot is sufficiently close to zero that we can ignore that concern altogether. Under the circumstances, which are that reader will always run AFTER the writer ATTEMPTS to store the value, comment 11 appears to be irrelevant. The concern in this case is that, even though the writer has executed a store instruction, to store the value in memory, the actual memory cycle to store the value is delayed for so long that the subsequent reader thread, running on another CPU, doesn't see the change, even though the write instruction was performed before the reader started. I just don't want to see a monitor (or lock) being used unconditionally on all platforms, including those on which it is never necessary for this case. Well designed systems should not pay a performance penalty to accomodate lame ones.

Kai Engert [:KaiE:]

Reporter

Comment 16

•

18 years ago

Attached patch Patch v3 — Details — Splinter Review

Attachment #290960 - Attachment is obsolete: true

Attachment #291311 - Flags: review?(nelson)

Attachment #290960 - Flags: review?(julien.pierre.boogz)

Kai Engert [:KaiE:]

Reporter

Comment 17

•

18 years ago

(In reply to comment #15) > Kai, why don't you write a patch v3 that uses no locks or atomic ops and > merely does stores and loads. done > Such code will work fine on single CPU > systems and on most (but not all) multi-CPU systems. However, I think > the likelihood of anyone bringing one of those lame multi-CPU systems to > a wireless hotspot is sufficiently close to zero that we can ignore that > concern altogether. Even if someone does that, it probably will not have an effect. The call to set the ocsp timeout will probably be used once during an init phase, only. I have one more comment. When I read a book about multithreading in the past, the author claimed that our scenario - thread 1 writes a 32 bit value to memory (4 bytes) - thread 2 reads the 32 bit value / 4 bytes from memory the following can happen: - 1, 2, 3 bytes from thread 1 arrive in memory - at the time thread 2 attempts to read the value, it get's a mix of some bytes from the old value, and some bytes from the new value So, let's say the old value was 0x100 and the new value is 0xff In the worst case, the consumer might see a value of 0x1ff. Possible or impossible? If possible, we should do something to avoid it.

Nelson Bolyard (seldom reads bugmail)

Comment 18

•

18 years ago

Comment on attachment 291311 [details] [diff] [review] Patch v3 I say, go ahead and commit this. We can always fine-tune the implementation later, but in the mean time, this establishes the API that PSM can use.

Attachment #291311 - Flags: review?(nelson) → review+

Nelson Bolyard (seldom reads bugmail)

Comment 19

•

18 years ago

(In reply to comment #17) > When I read a book about multithreading in the past, the author claimed that > our scenario > > - thread 1 writes a 32 bit value to memory (4 bytes) > - thread 2 reads the 32 bit value / 4 bytes from memory > > the following can happen: > - 1, 2, 3 bytes from thread 1 arrive in memory > - at the time thread 2 attempts to read the value, it get's a mix of > some bytes from the old value, and some bytes from the new value > > So, let's say the old value was 0x100 and the new value is 0xff > In the worst case, the consumer might see a value of 0x1ff. > > Possible or impossible? I believe it is true that, with all current 32-bit or 64-bit CPUs, a 32-bit write to a 4-byte aligned address (i.e. address%4==0) will always be done with a single indivisible memory cycle. So that problem is not possible today, provided the alignment is correct.

Kai Engert [:KaiE:]

Reporter

Comment 20

•

18 years ago

checked in. Checking in certhigh/ocsp.c; /cvsroot/mozilla/security/nss/lib/certhigh/ocsp.c,v <-- ocsp.c new revision: 1.45; previous revision: 1.44 done Checking in nss/nss.def; /cvsroot/mozilla/security/nss/lib/nss/nss.def,v <-- nss.def new revision: 1.182; previous revision: 1.181 done

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Patch v1 18 years ago Kai Engert [:KaiE:] 4.94 KB, patch		Details \| Diff \| Splinter Review
Patch v2 18 years ago Kai Engert [:KaiE:] 3.98 KB, patch	nelson : superreview+	Details \| Diff \| Splinter Review
Patch v3 18 years ago Kai Engert [:KaiE:] 4.29 KB, patch	nelson : review+	Details \| Diff \| Splinter Review