Open Bug 951781 Opened 11 years ago Updated 6 months ago

libssl accesses the NSS certificate database during handshake, causing disk I/O to block network activity

Categories

(NSS :: Libraries, defect, P3)

Tracking

(Not tracked)

People

(Reporter: briansmith, Unassigned)

Details

(Keywords: main-thread-io, perf, Whiteboard: [snappy])

Gecko does all its network I/O on a thread we call the "socket transport thread." This is the thread on which the SSL handshake is executed.

ssl3_HandleCertificate, and possibly other places, calls CERT_NewTempCertificate and possibly other functions that access the NSS certificate database. My understanding is that this can cause disk I/O, which we are not supposed to do on the socket transport thread.

We can probably solve this problem in two steps. First, add a new version of the async certificate verification API that passes SECItems to the application instead of CERTCertificate objects. Second, make the DER -> CERTCertificate parsing lazy in other functions like SSL_PeerCertificate. An application that uses the new verification API and avoids functions like SSL_PeerCertificate would then avoid the disk I/O on the networking thread.
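A minimal sketch of the lazy-parsing idea, in Python rather than NSS's C so it stays short. `LazyPeerCert`, `ParsedCert`, and `expensive_parse` are hypothetical names for illustration, not NSS APIs, and the byte strings are placeholders rather than real DER. The point is only that the raw bytes can be handed to the application without triggering the parse (and any database access that comes with it).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ParsedCert:
    """Hypothetical stand-in for a fully parsed certificate object."""
    der: bytes


@dataclass
class LazyPeerCert:
    """Holds raw certificate bytes and parses them only on first request."""
    der: bytes
    parse: Callable[[bytes], ParsedCert]  # hypothetical, possibly slow, parser
    _parsed: Optional[ParsedCert] = field(default=None, init=False)

    def raw(self) -> bytes:
        # Cheap path: no parsing, no database lookup.
        return self.der

    def parsed(self) -> ParsedCert:
        # Expensive path: runs the parser once, then reuses the result.
        if self._parsed is None:
            self._parsed = self.parse(self.der)
        return self._parsed


parse_calls = []  # records every time the expensive parser runs


def expensive_parse(der: bytes) -> ParsedCert:
    parse_calls.append(der)
    return ParsedCert(der)


# Placeholder bytes standing in for the DER certificates sent by the peer.
chain = [
    LazyPeerCert(b"leaf-der-bytes", expensive_parse),
    LazyPeerCert(b"issuer-der-bytes", expensive_parse),
]

# A verification callback that only needs the raw bytes never triggers a parse.
raw_chain = [cert.raw() for cert in chain]
```

Only a caller that asks for `parsed()` pays the parsing cost, and it pays it once per certificate.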
Brian, this would only cause disk I/O in the event of PKCS#11 modules being loaded, and even then, only if they've fired certain events.

NSS maintains an in-memory cache of the contents of both the user DB and the modules, and will only re-scan if a module has gone through a state change or fired an event.
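The caching behaviour described above can be sketched roughly as follows. This is an illustration of the general pattern only, not NSS's actual implementation. `CachedModuleContents`, `slow_scan`, and `notify_state_change` are hypothetical names, and the sketch assumes the first lookup populates the cache.

```python
from typing import Callable, List, Optional


class CachedModuleContents:
    """Caches a module's contents and re-scans only after a state change."""

    def __init__(self, scan: Callable[[], List[str]]) -> None:
        self._scan = scan  # hypothetical slow scan (disk or token I/O)
        self._cache: Optional[List[str]] = None
        self._stale = True  # assumption: the first lookup fills the cache

    def notify_state_change(self) -> None:
        # Called when the module fires an event, e.g. a token was inserted.
        self._stale = True

    def contents(self) -> List[str]:
        if self._stale:
            self._cache = list(self._scan())
            self._stale = False
        return self._cache


scan_count = 0


def slow_scan() -> List[str]:
    global scan_count
    scan_count += 1
    return ["cert-a", "cert-b"]


module = CachedModuleContents(slow_scan)
module.contents()  # first lookup: performs the scan
module.contents()  # served from memory, no scan
scans_before_event = scan_count

module.notify_state_change()
module.contents()  # re-scan triggered by the event
scans_after_event = scan_count
```

Under this model, the slow path is only hit on the first lookup and after an event, which matches the claim that steady-state lookups do not touch the disk.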

Indeed, you're at MUCH greater risk from calling *any* NSS function on the socket transport thread, because those calls may have to go through PKCS#11 modules. If a PKCS#11 module is slow, blocks, or requires physical (e.g. serial or USB) I/O, you'll end up blocked on it. Even if you move the handshake off to a worker, ANY NSS function is at risk of triggering this.

In Chromium's case, we've had to move the entire SSL layer off the IO thread (which handles socket I/O and local IPC) and onto a dedicated thread on Linux and ChromeOS because of this. Linux users may have PKCS#11 modules installed, and on ChromeOS we have a PKCS#11 module that interacts with the TPM. Too many code paths in NSS end up blocked on the PK11Slot/Module locks, even when they're doing nothing with the TPM, so we had to move it wholesale.
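As a rough illustration of that approach (not Chromium's actual code), the sketch below runs a potentially blocking call on a dedicated single-thread executor so the thread that submitted it can keep servicing other work. `blocking_crypto_call` and `token_ready` are hypothetical stand-ins for an operation that waits on a slow PKCS#11 token.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical signal that the slow token has finally responded.
token_ready = threading.Event()


def blocking_crypto_call(data: bytes) -> bytes:
    """Stand-in for a call that may block on a slow module or token."""
    token_ready.wait(timeout=5)
    return data[::-1]  # placeholder result, not real cryptography


# One dedicated thread for everything that may block on the crypto layer.
crypto_thread = ThreadPoolExecutor(max_workers=1, thread_name_prefix="ssl")

# The "IO thread" (here: the main thread) hands the work off and returns at once.
future = crypto_thread.submit(blocking_crypto_call, b"handshake")

# While the crypto call is stuck, the IO thread keeps handling other events.
io_events = ["socket read handled", "IPC message handled"]
still_pending = not future.done()

# Eventually the token responds and the result is delivered.
token_ready.set()
result = future.result(timeout=5)
crypto_thread.shutdown()
```

The trade-off is an extra thread hop on every operation, in exchange for the IO thread never waiting on a module lock.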
Severity: normal → S3
Severity: S3 → S4
Priority: -- → P3