Closed Bug 326482 Opened 19 years ago Closed 17 years ago

NSS ECC performance problems (intel)

Categories

(NSS :: Libraries, defect, P1)

3.9.7
x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrelyea, Assigned: rrelyea)

References

Details

(Keywords: perf, Whiteboard: ECC)

Attachments

(10 files, 9 obsolete files)

1.51 KB, patch
wtc
: review+
nelson
: review+
Details | Diff | Splinter Review
2.13 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
2.01 KB, patch
nelson
: review+
wtc
: review+
Details | Diff | Splinter Review
4.13 KB, patch
nelson
: review-
Details | Diff | Splinter Review
9.16 KB, patch
nelson
: review+
Details | Diff | Splinter Review
7.85 KB, text/plain
Details
1.84 KB, patch
nelson
: review+
julien.pierre
: review+
Details | Diff | Splinter Review
5.72 KB, patch
nelson
: review-
Details | Diff | Splinter Review
11.15 KB, patch
nelson
: review+
Details | Diff | Splinter Review
8.82 KB, patch
rrelyea
: review+
nelson
: review+
Details | Diff | Splinter Review
There are some bugs in NSS that affect ECC performance. This bug is meant to track those issues.
Without this patch, ECDH ends up generating a sensitive key in the database token. It cannot move the key into the crypto token without doing a key exchange. Without this patch we will do one or more extra RSA operations every full handshack, defeating all our performance gains with ECDH
Attachment #211207 - Flags: review?(wtchang)
Attached patch Cache the server's public key (obsolete) — Splinter Review
This patch does 2 things: 1) it rationalizes a very complicated if statement back to something that is more readable (looks like it was the result of an old merge). 2) caches the server's public key so we don't need to keep extracting it. Note: This patch is a test patch. The real version should probably safe the public key from SSL_ConfigureSecureServer. (This patch would leak a PublicKey structure in a race).
This one we need comments on in 2 senses: 1) do we want to do it. 2) is this the right way. This patch is designed to be checked in. It does a threadsafe check (using atomic increment) to prevent an ephemeral leak. Currently Apache with open SSL generates one ephemeral p160r1 key at startup (independent of the signing key). There is/will be a separate bug so NSS creates this key based on the cert signing key.
Unlike the first and 3rd patches, this only gets us a small percent (about 4%), but it's worth doing. Part of the performance gain we got in RSA was copying the token key to a session key. This elliminated lots of DES decryptions of the key itself. ECC is more sensitive to this than RSA was.
Attachment #211215 - Flags: review?(wtchang)
Comment on attachment 211215 [details] [diff] [review] CHECKED IN:Fix a bug where ECC keys were not being copied. Glad you caught that bug where it was using the wrong attribute count value.
Attachment #211215 - Flags: review+
Comment on attachment 211210 [details] [diff] [review] Cache the server's public key You didn't ask for review of this patch, but I reviewed it anyway. :-) You're right that this patch will leak a key in a race. The serverKeyPair should not be altered while the ssl handshake is in progress, but rather at the time it is created. The fundamental problem is that the serverKeyPair structure only has the private key filled in when it is created, and does not also cache the publib key at that time. The solution is to extract the pubkey at the time when the serverKeyPair is created, and cache it in the serverKeyPair at that time. The function that creates the serverKeyPair is SSL_ConfigSecureServer in sslsecur.c. Today it extracts the public key, then frees it. Instead, it should save that pubkey and put it into the serverKeyPair.
Attachment #211210 - Flags: review-
Comment on attachment 211212 [details] [diff] [review] Only generate one ephemeral key, not one per connection Here's another unsolicited review :-) >+ privKey = SECKEY_CreateECPrivateKey(&ecParams, &pubKey, NULL); > PORT_Free(ecParams.data); >- return rv; >+ if (!privKey || !pubKey || !(keyPair = ssl3_NewKeyPair(privKey, pubKey))) { >+ ssl_MapLowLevelError(SEC_ERROR_KEYGEN_FAIL); >+ return SECFailure; >+ } If ssl3_NewKeyPair returns NULL, this error path leaks pubKey and PrivKey.
Attachment #211212 - Flags: review-
This patch changes SSL_ConfigSecureServer to cache the extracted server public key. With this patch in place, the other patch (attachment 212210 [details] [diff] [review]) will only have to stop extracting the key, and start using the cached copy. Bob, you'll want to redi that patch to incorporate this one, I think.
Attachment #211285 - Flags: review?(rrelyea)
Comment on attachment 211285 [details] [diff] [review] CHECKED IN:Supplemental patch to cache the server's pub key yes, this is exactly what I envisioned.
Attachment #211285 - Flags: review?(rrelyea) → review+
Comment on attachment 211207 [details] [diff] [review] Implement the derive sensitive only for those derivation functions that require it. I don't know if this is complete (if it handles all the cases that need it), but it appears correct for those it handles. r+=nelson
Attachment #211207 - Flags: review+
Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.70; previous revision: 1.69 done Patch "Fix a bug where ECC keys were not being copied." checked into tip.
OK, the previous patch was missing the essential change. Not doing the sensitive check global.
Attachment #211207 - Attachment is obsolete: true
Attachment #211396 - Flags: review?(nelson)
Attachment #211207 - Flags: review?(wtchang)
nelson's "Supplemental patch to cache the server's pub key" checked into the tip Checking in sslsecur.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsecur.c,v <-- sslsecur.c new revision: 1.35; previous revision: 1.34 done
This patch fixes the problem nelson identified in the error path. It also fixes sets the ECDHE key strength to something comparable to the signing key strength.
Attachment #211212 - Attachment is obsolete: true
Attachment #211405 - Flags: review?(nelson)
Both this patch and the previous patch requires patch 211285 above (that patch is now in on the tip).
Attachment #211210 - Attachment is obsolete: true
Attachment #211408 - Flags: review?(nelson)
This is the only patch to ECC code in general. This patch preallocates some scratch MPI structures for the lower code to use, removing the need to free and clear the data structures several times in a loop. It also changes some of the calls to multiply where the target is the same as one of the sources to prevent another copy in MPI.
This .s file can replace mpi_x86.s to accellerate boxes with intel processors with SSE. This cuts RSA operations by about half and ECC by 30%
Comment on attachment 211215 [details] [diff] [review] CHECKED IN:Fix a bug where ECC keys were not being copied. r=wtc.
Attachment #211215 - Flags: review?(wtchang) → review+
Comment on attachment 211396 [details] [diff] [review] CHECKED IN version 2: Implement the derive sensitive only for those derivation functions that require it. I verified that this patch only calls sftk_DeriveSensitiveCheck in some cases correctly, but I haven't found out which cases require the derive sensitive check. I'll need to read about the 17 cases in PKCS #11 Chap. 12 "Mechanisms". I'll do that next week.
Comment on attachment 211396 [details] [diff] [review] CHECKED IN version 2: Implement the derive sensitive only for those derivation functions that require it. I believe this patch does what Bob intended for it to do, and does it correctly. I am *NOT* sure that this patch correctly implements the PKCS11 semantics defined for the sensitive attribute in derived keys. I fear that this patch might make softoken produce non-sensitive derivative keys in places where the PKCS11 spec requires them to be sensitive. The downside of such a change is that our code (e.g. PK11wrap) will become dependent on that non-conforming behavior, and will not work with other PKCS11 modules that do conform. Bob, if you're sure this is conformant to the PKCS11 spec, then r=nelson
Attachment #211396 - Flags: review?(nelson) → review+
Comment on attachment 211396 [details] [diff] [review] CHECKED IN version 2: Implement the derive sensitive only for those derivation functions that require it. r=wtc, with the same caveat as Nelson's that I did not check the PKCS #11 standard.
Attachment #211396 - Flags: review+
Whiteboard: ECC
Comment on attachment 211408 [details] [diff] [review] version 2: Cache the server's public key (now use the cached key). >- svrCert = ss->serverCerts[exchKeyType].serverCert; >- svrPubKey = CERT_ExtractPublicKey(svrCert); >+ svrPubKey = ss->serverCerts[exchKeyType].serverKeyPair->pubKey; This new line will crash if ss->serverCerts[exchKeyType].serverKeyPair == NULL >@@ -6105,8 +6098,8 @@ > ssl3_HandleClientKeyExchange(sslSocket *ss, SSL3Opaque *b, PRUint32 length) > { > SECKEYPrivateKey *serverKey = NULL; >- SECStatus rv; > const ssl3KEADef * kea_def; >+ SECStatus rv; Moving that declaration of rv down by one line seems unnecessary. >-#ifdef NSS_ENABLE_ECC >- /* XXX Using SSLKEAType to index server certifiates >- * does not work for (EC)DHE ciphers. Until we have You deleted/moved a large block of code here, and I'm still trying to convince myself that the new code is (or is not) equivalent to the old. Consider this part of the review still not completed. I will continue to look at this. >@@ -6201,8 +6181,7 @@ > (ss->ephemeralECDHKeyPair != NULL)) { > serverPubKey = ss->ephemeralECDHKeyPair->pubKey; > } else { >- serverPubKey = CERT_ExtractPublicKey( >- ss->serverCerts[kt_ecdh].serverCert); >+ serverPubKey = ss->serverCerts[kt_ecdh].serverKeyPair->pubKey; Here again, if ss->serverCerts[kt_ecdh].serverKeyPair is NULL, this new line of code will crash where the previous code did not. > } > if (serverPubKey == NULL) { > /* XXX Is this the right error code? */
Attachment #211408 - Flags: review?(nelson) → review-
Blocks: 328514
Target Milestone: 3.12 → 3.11.1
patch "Fix bug where ECC keys were not being copied" checked into tip and 3.11. (tip is rev 1.70). Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.69.2.1; previous revision: 1.69 done
Comment on attachment 211420 [details] [diff] [review] CHECKED IN 3.12: Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops r=nelson Please correct these spelling errors: >+#if MAX_SCRATCH < 4 >+#error "Scratch array defined to small " too >+#if MAX_SCRATCH < 6 >+#error "Scratch array defined to small " too
Attachment #211420 - Attachment description: Remove mp)init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops → Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops
Attachment #211420 - Flags: review+
Comment on attachment 211405 [details] [diff] [review] version 2: Only generate one ephemeral key, not one per connection Review comments: >- * XXX For now, the elliptic curve is hardcoded to NIST P-256. >+ * XXX For now, the elliptic curve is hardcoded to be the same >+ * strength as the signing certificate (ECC or RSA). s/hardcoded/chosen/ >+/* These should be indexed by curve, and probably some code to >+ * rotate the keys out after 'n' uses. */ >+static PRInt32 globalECDHEKeyFlag = 0; >+static ssl3KeyPair *globalECDHEKeyPair = NULL; Yes, in the world of negotiated curves, we need to pick a curve that is supported by the remote box. For clients and (especially) servers that support ECDHE_RSA_mumble, we will need to generate a different ephemeral key for each curve that we and our peers mutually support. So, IMO, we need an array of global "ephemeral" keys, one per supported curve. The signature of the function below may need to change to support that. > SECStatus > ssl3_CreateECDHEphemeralKeys(sslSocket *ss) >+ /* There's a global key use it */ >+ if (globalECDHEKeyPair != NULL) { Obviously, this function is going to have to be structured to determine the curve for which it needs a key before it gets this far. >+ /* ok, no one has generated a global key yet, do so */ >+ /* pick the cipher based on the server's key */ >+ srvPublicKey = ss->serverCerts[kt_ecdh].serverKeyPair->pubKey; If there's any way that we could have come this far and yet that serverKeyPair pointer is still NULL, then the above statement will crash. >+ /* map the ECC DH key strength to that of the RSA signing cert */ or as close as we can get within the set of negotiated curves. >+ srvPublicKey = ss->serverCerts[kt_rsa].serverKeyPair->pubKey; Same issue here. If serverKeyPair is NULL, boom. Obviously, the rest of this function needs to change to deal with negotiated sets of curves. >+ /* now make sure only one guy sets the global key (at startup, several >+ * threads may be trying to generate global keys */ >+ count = PR_AtomicIncrement(&globalECDHEKeyFlag); >+ /* we win the race, our key becomes global */ >+ if (count == 1) { Should we use PR_RunOnce instead of this scheme?
Attachment #211405 - Flags: review?(nelson) → review-
Implement the derive sensitive only... patch checked into 3.11 branch and tip Checking in pkcs11c.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11c.c,v <-- pkcs11c.c new revision: 1.68.2.4; previous revision: 1.68.2.3 done and tip (pkcs11c.c, revision 1.76)
Comment on attachment 211285 [details] [diff] [review] CHECKED IN:Supplemental patch to cache the server's pub key request second review so this patch can be checked into the 3.11 branch.
Attachment #211285 - Flags: superreview?(wtchang)
"Remove mp_init/mp_clear calls....." patch checked into tip. Checking in ecp_jm.c; /cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_jm.c,v <-- ecp_jm.c new revision: 1.4; previous revision: 1.3 done
Comment on attachment 211285 [details] [diff] [review] CHECKED IN:Supplemental patch to cache the server's pub key r=wtc. While reviewing this patch, I have the folloing observation and question about the SSL_ConfigSecureServer function. The requirement that 'cert' and 'key' must be both NULL or both non-NULL is crucial to the correctness of this function. In particular, 'pubKey' is created in a "if (cert)" block and adopted by sc->serverKeyPair in a "if (key)" block. To prove the property that When the function returns SECSuccess, a created 'pubKey' must have been adopted and therefore doesn't need to be destroyed. one must show that "if (cert)" and "if (key)" are equivalent conditions. Question: this function returns SECFailure directly if the ssl3_CreateRSAStepDownKeys call fails. I'm wondering if that "return SECFailure" statement should be replaced with "goto loser".
Attachment #211285 - Flags: superreview?(wtchang) → superreview+
"Supplemental patch to cache the server's pub key" checked into NSS 3.11 Checking in sslsecur.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsecur.c,v <-- sslsecur.c new revision: 1.34.2.1; previous revision: 1.34 done
Attachment #211405 - Attachment is obsolete: true
Attachment #213674 - Flags: review?(nelson)
Comment on attachment 211285 [details] [diff] [review] CHECKED IN:Supplemental patch to cache the server's pub key Nelson, I found that no NSS code uses serverKeyPair->pubKey, so is this patch for some new code that will use serverKeyPair->pubKey?
Now that all the callers of ssl3_NewKeyPair pass non-NULL privKey and pubKey to it, we should make its invalid argument checking more strict. After all, it creates a key "pair" object. libSSL also has checks like this: svrAuth->serverKeyPair && svrAuth->SERVERKEY With this patch applied, one can prove that svrAuth->serverKeyPair != NULL implies svrAuth->SERVERKEY != NULL, so we only need to test svrAuth->serverKeyPair. But I guess some of us may prefer to keep those tests as defensive programming.
Attachment #213677 - Flags: review?(nelson)
(In reply to comment #32) > Nelson, I found that no NSS code uses serverKeyPair->pubKey, > so is this patch for some new code that will use serverKeyPair->pubKey? Yes, new code in attachment 211408 [details] [diff] [review]
(In reply to comment #29) > The requirement that 'cert' and 'key' must be both NULL > or both non-NULL is crucial to the correctness of this function. > one must show that "if (cert)" and "if (key)" are equivalent conditions. I think that is accomplished by this code at the beginning of the function: if (!cert != !key) { PORT_SetError(SEC_ERROR_INVALID_ARGS); return SECFailure; } > Question: this function returns SECFailure directly > if the ssl3_CreateRSAStepDownKeys call fails. I'm > wondering if that "return SECFailure" statement should > be replaced with "goto loser". If this function returns SECFailure, the caller should close the socket immediately, which will free the relevant resources. So going to loser will cause the resources to be freed a tiny bit sooner. The only other consideration is if the caller ignores the return value from this function and proceeds under the assumption that it succeeded. In that case, we want this function to leave the socket in a state where it will be robust against such an application error. I'm not immediately sure whether releasing the resources now, or later, is more robust.
P1 for 3.11.1 per today's meeting.
Priority: -- → P1
Comment on attachment 213677 [details] [diff] [review] CHECKED IN 3.12: ssl3_NewKeyPair should disallow a null privKey or pubKey I'm not sure what motivates this change, but it seems harmless, because it appears that presently no code in libSSL depends on its present ability to create a pair with only one key. r=nelson
Attachment #213677 - Flags: review?(nelson) → review+
Comment on attachment 213674 [details] [diff] [review] version 3: Only generate one ephemeral key, not one per connection This is much closer than before. And with respect to curve selection is probably the best we can do until the curve negotiation is in place. But I see one more design issue. This scheme populates a static array of pointers with references to key pairs, which in turn hold references to keys. There must be some scheme that releases those resources, or else NSS_Shutdown will fail. I'm afraid to suggest this, because I think it may greatly increase the magnitude of this effort, but I think the resource of a global array of ephemeral key pairs should be managed in libNSS, not in libSSL, and libSSL should access it through some calls to new libNSS functions. Then libnSS can manage the resource release at shutdown time. Obligatory nit picking: (:-) >+ /* convert to strength in bits */ >+ serverKeyStrengthInBits *= 8; I suggest using BPB or similar rather than '8' there. >+ if (serverKeyStrengthInBits <= 1024) { >+ ec_curve = ec_secp160r2; >+ } else if (serverKeyStrengthInBits <= 2048) { >+ ec_curve = ec_secp224r1; >+ } else if (serverKeyStrengthInBits <= 3072) { >+ ec_curve = ec_secp256r1; >+ } else if (serverKeyStrengthInBits <= 7168) { >+ ec_curve = ec_secp384r1; >+ } else { >+ ec_curve = ec_secp521r1; >+ } All these curves are in GFp. Might the peer need GF2m? >+ /* now make sure only one guy sets the global key (at startup, several >+ * threads may be trying to generate global keys */ >+ count = PR_AtomicIncrement(&globalECDHEKeyFlag[ec_curve]); >+ /* we win the race, our key becomes global */ >+ if (count == 1) { >+ PORT_Assert(globalECDHEKeyPair[ec_curve] == NULL); >+ globalECDHEKeyPair[ec_curve] = keyPair; >+ ss->ephemeralECDHKeyPair = >+ ssl3_GetKeyPairRef(globalECDHEKeyPair[ec_curve]); >+ } else { >+ /* we lost, use our key by ourselves */ >+ ss->ephemeralECDHKeyPair = keyPair; >+ } I might suggest this simpler alternative to the above: >+ /* we always use our own key */ >+ ss->ephemeralECDHKeyPair = keyPair; >+ /* now make sure only one guy sets the global key (at startup, several >+ * threads may be trying to generate global keys */ >+ count = PR_AtomicIncrement(&globalECDHEKeyFlag[ec_curve]); >+ if (count == 1) { >+ /* we win the race, our key becomes global */ >+ PORT_Assert(globalECDHEKeyPair[ec_curve] == NULL); >+ globalECDHEKeyPair[ec_curve] = ssl3_GetKeyPairRef(keyPair); >+ }
Attachment #213674 - Flags: review?(nelson) → review-
Comment on attachment 213677 [details] [diff] [review] CHECKED IN 3.12: ssl3_NewKeyPair should disallow a null privKey or pubKey I checked in the ssl3_NewKeyPair patch on the NSS trunk (NSS 3.12). Nelson, the motivation for this patch is code cleanup. Removing unnecessary/unreachable code paths makes it easier to reason about the properties of the code.
> But I see one more design issue. This scheme populates a static > array of pointers with references to key pairs, which in turn hold > references to keys. There must be some scheme that releases those > resources, or else NSS_Shutdown will fail. Arg, I know I responded to this, but I must not have hit commit. Good issue. I see 3 possible solutions: 1) Move all this stuff to somewhere in nss3. 2) Implement shutdown callbacks for nss3 (so shared libraries can register functions to be called on NSS shutdown). 3) clear the keys out in SSL_ClearSessionCache. 3 is probably the easiest, but it may contain pitfalls I'm not aware of (would application be surprised that clearing the SSL cache also clears out the ec ephemeral keys? Is it ok to call SSL_ClearSessionCache while the server is actively handling request, or is it only something to be called when the system is quiessed?)
Attachment #211215 - Attachment description: Fix a bug where ECC keys were not being copied. → CHECKED IN:Fix a bug where ECC keys were not being copied.
Attachment #211285 - Attachment description: Supplemental patch to cache the server's pub key → CHECKED IN:Supplemental patch to cache the server's pub key
Attachment #211396 - Attachment description: version 2: Implement the derive sensitive only for those derivation functions that require it. → CHECKED IN version 2: Implement the derive sensitive only for those derivation functions that require it.
Attachment #211420 - Attachment description: Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops → CHECKED IN 3.12: Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops
Attachment #213677 - Attachment description: ssl3_NewKeyPair should disallow a null privKey or pubKey → CHECKED IN 3.12: ssl3_NewKeyPair should disallow a null privKey or pubKey
This patch is in 2 parts and should be reviewed together. I split the patch in part 1 and part 2 in the hopes that part 1 will diff with version 3 of the patch.
Attachment #213677 - Attachment is obsolete: true
Attachment #215280 - Flags: review?(nelson)
version 4.1 and 4.2 should be reviewed together. I implemented the NSS callback on shutdown method for the following reasons: 1) Clients and servers call different cache cleanup functions. The client version was thread-safe, which presented some challenges to this cleanup function. This made my proposal 3 iffy. 2) the ssl_KeyPair code is part of ssl3, making moving the whole thing down into nss3.dll problematic. 3) The shutdown callback option was always the prefered solution anyway and was relatively simple to implement. bob
Attachment #215283 - Flags: review?(nelson)
Attachment #213674 - Attachment is obsolete: true
Attachment #213677 - Attachment is obsolete: false
version 4.1 and 4.2 should be reviewed together. I implemented the NSS callback on shutdown method for the following reasons: 1) Clients and servers call different cache cleanup functions. The client version was thread-safe, which presented some challenges to this cleanup function. This made my proposal 3 iffy. 2) the ssl_KeyPair code is part of ssl3, making moving the whole thing down into nss3.dll problematic. 3) The shutdown callback option was always the prefered solution anyway and was relatively simple to implement. bob
Attachment #215330 - Flags: review?(nelson)
Comment on attachment 215330 [details] [diff] [review] version 4.2: register/unregister shutdown functions (duplicate patch) This patch was attached twice, obsoleting one of the references.
Attachment #215330 - Attachment is obsolete: true
Attachment #215330 - Flags: review?(nelson)
Comment on attachment 215283 [details] [diff] [review] version 4.2: register/unregister shutdown functions >@@ -184,6 +183,28 @@ > */ > SECStatus NSS_NoDB_Init(const char *configdir); > >+/* >+ * Allow applications and libraries to register with NSS so that they are called >+ * when NSS shuts down. >+ * >+ * Shutdown functions that a void * pointer which is NULL in this release, but is >+ * reserved for future versions of NSS to pass some future status information >+ * back to the shutdown function. If the shutdown function returns SECFailure, >+ * Shutdown will still complete, but NSS_Shutdown() will return SECFailure. >+ */ >+typedef SECStatus (*NSS_ShutdownFunc)(void *); This type takes a void*, yet the registration function provides no way to register a value for this void *, and no way to store the void *. Instead we always pass NULL. This needs to be fixed. Why define it in a way that MUST be fixed? Better might be to delcare this typedef without any formal parameters. e.g. just >+typedef SECStatus (*NSS_ShutdownFunc)(); or >+typedef SECStatus (*NSS_ShutdownFunc)(...); But best to just add the void* pointers into the definition now. >+/* >+ * Register a shutdown function. >+ */ >+SECStatus NSS_RegisterShutdown(NSS_ShutdownFunc sFunc); This function obviously wants an argument for the "completion" value of that that void* that will be passed to the shutdown func. >+/* >+ * Remove an existing shutdown function (you may do this if your library is >+ * complete and going away, but NSS is still running). >+ */ >+SECStatus NSS_UnregisterShutdown(NSS_ShutdownFunc sFunc); Probably needs that same completion value here, too. >Index: nssinit.c >+#define NSS_SHUTDOWN_STEP 10 >+ >+static struct NSSShutdownListStr { >+ PZLock *lock; >+ int dataSize; Apparently this "size" is in units of funtion pointers, not bytes. Call this "maxFuncs" rather than "dataSize". >+ int numFuncs; >+ NSS_ShutdownFunc *funcs; >+} nssShutdownList = { 0 }; >+ >+/* >+ * register a callback to be called when NSS shuts down >+ */ >+SECStatus >+NSS_RegisterShutdown(NSS_ShutdownFunc sFunc) >+{ >+ if (!nss_IsInitted) { >+ PORT_SetError(SEC_ERROR_NSS_NOT_INITIALIZED); >+ return SECFailure; >+ } >+ PORT_Assert(nssShutdownList.lock); >+ PZ_Lock(nssShutdownList.lock); >+ if (nssShutdownList.dataSize == nssShutdownList.numFuncs) { >+ nssShutdownList.funcs = (NSS_ShutdownFunc *)PORT_Realloc >+ (nssShutdownList.funcs, >+ (nssShutdownList.dataSize + NSS_SHUTDOWN_STEP) >+ *sizeof(NSS_ShutdownFunc)); IINM, if this Realloc returns NULL, we not only are unable to register a new shutdown func, but we also lose all the previously registered shutdown func registrations. double-plus-un-good. Use a tmp pointer, so that you don't lose the old values if the new one is NULL. >+ if (!nssShutdownList.funcs) { >+ return SECFailure; >+ } >+ nssShutdownList.dataSize += NSS_SHUTDOWN_STEP; >+ } >+ nssShutdownList.funcs[nssShutdownList.numFuncs++] = sFunc; >+ PZ_Unlock(nssShutdownList.lock); >+ return SECSuccess; >+} >+/* >+ * bring up and shutdown the shutdown list >+ */ >+static SECStatus >+nss_InitShutdownList(void) >+{ >+ nssShutdownList.lock = PZ_NewLock(); >+ if (nssShutdownList.lock == NULL) { >+ return SECFailure; >+ } >+ nssShutdownList.funcs = PORT_NewArray(NSS_ShutdownFunc, NSS_SHUTDOWN_STEP); ZnewArray, I think.
Attachment #215283 - Flags: review?(nelson) → review-
>>+typedef SECStatus (*NSS_ShutdownFunc)(void *); >This type takes a void*, yet the registration function provides no way >to register a value for this void *, and no way to store the void *. >Instead we always pass NULL. This needs to be fixed. >Why define it in a way that MUST be fixed? This is a design change. The comment above clearly states that the pointer is for future versions of NSS to pass any state back to the call backs. It's not a user arg pointer. For this release it is always NULL. I included it so in the future NSS could define data that it might need to pass to the shutdown functions. Do you really think user arg data is necessary or just a nice to have (I am reluctant to add it since it may encourage libraries to register a plethera of shutdown functions rather than a single one). >IINM, if this Realloc returns NULL, we not only are unable to >register a new shutdown func, but we also lose all the previously >registered shutdown func registrations. double-plus-un-good. >Use a tmp pointer, so that you don't lose the old values if the >new one is NULL. I'll take care of this one. I keep forgetting that realloc doesn't free the old memory if it fails. bob
Incorporated review comments. I've kept the single NSS supplied argument (currently NULL).
Attachment #215283 - Attachment is obsolete: true
Attachment #215733 - Flags: review?(nelson)
This is the sse2 version that does not require multiple shared libraries. We do a runtime check in the assembler code to choose our instruction set.
Comment on attachment 215739 [details] [diff] [review] Use SSE2 multiply instructions on intel processors. This should be the last patch in this bug. Note: This patch may require regenerating mpcpucache_x86.s. bob
Attachment #215739 - Flags: review?(nelson)
Comment on attachment 215280 [details] [diff] [review] version 4.1: Only generate one ephemeral key, not one per connection Please review this with 4.1b
Attachment #215280 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 215733 [details] [diff] [review] version 4.2b: register/unregister shutdown functions Please review with version 4.1
Attachment #215733 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 215739 [details] [diff] [review] Use SSE2 multiply instructions on intel processors. Unfortunately this patch is probably not suitable for a new NSS engineer.
Attachment #215739 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 215739 [details] [diff] [review] Use SSE2 multiply instructions on intel processors. Do we not care about SSE2 instructions on Intel CPUs when running Solaris? or when running on Windows? BTW, I think this doesn't require NSS expertise as much as it requires Intel assembler expertise for SSE2. I'm afraid I've forgotten much of what I once knew about SSE2. :-( Are there any other mozilla contributors who know SSE2 who could review this? Otherwise this will have to wait until I get the TLS header extension work done.
Attachment #215733 - Attachment description: version 4.2b: Only generate one ephemeral key, not one per connection → version 4.2b: register/unregister shutdown functions
Comment on attachment 215733 [details] [diff] [review] version 4.2b: register/unregister shutdown functions IIRC, In our weekly concall Thursday, we decided that the means for registering a shutdown callback must allow an argument value to be registered along with it, a value that will be an argument to the callback. We also discussed that the registration function should detect and disallow duplicate registrations. I'm not sure we definitely agreed on how ot handle that. I think we definitely want to disallow registration of a pair {function address, callback argument} that duplicates another already-registered pair exactlyt (that is, duplicates both the function address and the argument).
Attachment #215733 - Flags: review?(nelson) → review-
Comment on attachment 215280 [details] [diff] [review] version 4.1: Only generate one ephemeral key, not one per connection The changes to the shutdown callback registration function will necessitate changes to this code. That's the main reason for r-. I believe the PRCallOnce needs to be cleared out by the shutdown function also, so that NSS/SSL can be started, shutdown, and restarted again, perhaps many times, during the lifetime of a single process. I have lots of other suggested changes below. If you'd like, once you've committed the shutdown registration functionality, I'm willing to make the next rev of *this* patch and ask you to review it. I'm going to suggest using a separate PRCallOnce to "protect" the creation of each of the key pairs for the different curves. I think that simplifies the code and certain reduces the possible occurrence of unnecessary key pair gens. Finally, I'm going to suggest replaceing the several new arrays (each of a different member type) with a single new array, whose member type is a struct that contains all of those types that today are in inidividual arrays. Doing so makes it easier to add more members, and also makes it easier in a debugger to find the corresponding values, because they're all side-by-side in memory. >Index: ssl3ecc.c >+/* arrays of ECDHE KeyPairs */ >+static PRInt32 globalECDHEKeyFlag[ec_pastLastName] = { 0 }; >+static ssl3KeyPair *globalECDHEKeyPair[ec_pastLastName] = { NULL }; Let's declare a struct type containing a flag, an ssk3KeyPair*, and a PRCallOnce, and have a single array of that struct type, rather than separate arrays. > >+/* function to clear out the lists */ >+static SECStatus >+ssl3_ShutdownECDHECurves(void *p) >+{ >+ int i; >+ >+ for(i=0; i < ec_pastLastName; i++) { >+ ssl3KeyPair *keyPair = globalECDHEKeyPair[i]; >+ globalECDHEKeyPair[i] = NULL; >+ globalECDHEKeyFlag[i] = 0; Obviously this has to be rewritten for the new struct proposal. We'd also add code here to clear out the runonce. >+static PRStatus >+ssl3_ECRegister(void) >+{ >+ if (NSS_RegisterShutdown(ssl3_ShutdownECDHECurves) != SECSuccess) { will need to change. If this function returns SECStatus, why not just return it's value here, instead of having if ... return SECFailure; else return SECSuccess; >+#define GET_SERVER_PUBLICKEY(sock, type) \ >+ (ss->serverCerts[type].serverKeyPair ? \ >+ ss->serverCerts[type].serverKeyPair->pubKey : NULL) I think this macro/function belongs in sslimpl.h, probably named SSL_GET_SERVER_PUBLIC_KEY >+ /* find the appropriate curve */ >+ if (ss->ssl3.hs.kea_def->kea == kea_ecdhe_ecdsa) { >+ srvPublicKey = GET_SERVER_PUBLICKEY(ss, kt_ecdh); >+ ec_curve = ec_noName; >+ if (srvPublicKey) { >+ ec_curve = params2ecName(&srvPublicKey->u.ec.DEREncodedParams); >+ } This code above is the code I was looking for. Thanks. >+ if (serverKeyStrengthInBits <= 1024) { >+ ec_curve = ec_secp160r2; >+ } else if (serverKeyStrengthInBits <= 2048) { ... This code above will need to be changed to take into consideration the negotiated/supported curves. But I'll do that after you check this in. I think the code below would be better structured this way: if there's no "global" key for this curve, then use the PRCallOnceType for this curve to create it. (that is, have a function to create it that gets called through a PRCallOnceType of which there is a separate instance for each curve). Then if there's still no "global" key, we're doomed. Else use that (maybe new) global key. That method obviates the atomically incremented flag. >+ /* we always use our own key */ >+ ss->ephemeralECDHKeyPair = keyPair; With the alternative design, we always use the global key, and we never have generated our own non-global key to use instead. >+ /* first curve the registers, also registers the callback function */ curve the registers? What does that mean? :-)
Attachment #215280 - Flags: review?(nelson) → review-
QA Contact: jason.m.reid → libraries
Attachment #215733 - Attachment is obsolete: true
Attachment #216808 - Flags: superreview?(julien.pierre.bugs)
Attachment #216808 - Flags: review?(nelson)
Attachment #215733 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 216808 [details] [diff] [review] version 5:register/unregister shutdown functions Very close. >+ if (i > 0) { >+ PZ_Unlock(nssShutdownList.lock); >+ PORT_SetError(SEC_ERROR_DUPLICATE_FUNCTION); This error code isn't defined in NSS today, and your patch doesnt define it. >+/* >+ * bring up and shutdown the shutdown list >+ */ >+static SECStatus >+nss_InitShutdownList(void) That comment is wrong for this function. This function doesn't shut it down. Please make a separate comment for the shutdown function. >+static SECStatus >+nss_ShutdownShutdownList(void) >+{ >+ SECStatus rv = SECSuccess; >+ int i; >+ >+ /* call all the registerd functions first */ >+ for (i=0; i < nssShutdownList.numFuncs; i++) { This function shuld definitely hold the lock while doing this.
Attachment #216808 - Flags: review?(nelson) → review-
Comment on attachment 215739 [details] [diff] [review] Use SSE2 multiply instructions on intel processors. r=nelson with some comments and suggested changes. >+unsigned long >+s_mpi_is_sse2() >+{ >+ unsigned long eax, ebx, ecx, edx; >+ unsigned long cpuidLevel; Suggest using int instead of long, since these values are always 32 bits (and we can't use NSPR). >+ char string[65]; Looks like this string needn't be longer than 13 or 17 bytes. >+ # -1 means to call s_mpi_is_sse to determine if we support sse >+ # instructions. >+ # 0 means to use x86 instructions >+ # 1 means to use sse2 instructions Several thoughts on this: a) if the initialized value was zero, rather than -1, this could be in bss, rather than in initialized data. b) if this was a pointer, rather than a flag, it could be initialized to point to s_mpi_is_sse, which could change the pointer to point to either s_mpv_mul_d_x86 or to s_mpv_mul_d_sse2. Then the code at label s_mpv_mul_d could be a single jump indirect instruction, rather than a series of 8 instructions. >+# sigh, handle the difference between -fPIC and not PIC >+# default to pic, since this file seems to be exclusively >+# linux right now (solaris uses mpi_i86pc.s and windows uses >+# mpi_x86_asm.c) I think of this file, not as being the "linux" file, but rather as being the "gas" file. I think of the other two files not as being for other OSes, bur as being for other assemblers. This file should be usable on all x86-family boxes that use gas. I'm wondering if this file will be usable on the new X86 MACs with this change, or if it will only be usable on Linux. But if necessary, it can be changed again for x86 MAC, when the time comes. There's some indentation strangeness that occurs repeatedly in several places: >+ mov 12(%ebp),%ecx # ecx = a_len >+ movd 16(%ebp),%mm1 # mm1 = b ^ >+ mov 20(%ebp),%edi >+ movd 0(%esi),%mm0 # mm0 = *a++ >+ add $4,%esi ^^^^^^ >+ pmuludq %mm1,%mm0 # mm0 = b * *a++ >+ movd 0(%esi),%mm0 # mm0 = *a++ >+ add $4,%esi >+ pmuludq %mm1,%mm0 # mm0 = b * *a++ >+ paddq %mm0,%mm2 # add the carry >+ movd 0(%edi),%mm0 That movd instruction could be moved backwards (that is, up) a few lines/instructions, so that the fetch time overlapped the (presently) preceeding paddq and/or pmuludq. Might make it a weensie bit faster. >+ dec %ecx # --a_len >+ jnz 15b # jmp if a_len != 0 I think there's a single instruction that combines dec cx and jnz but it's obviously not crucial. >+ cmp $0,%eax I think there's a "test" instruction that takes only one operand and is equivalent to cmp $0,reg might buy us a few nanojiffies. but again, this is not a crucial change.
Attachment #215739 - Flags: review?(nelson) → review+
Attachment #215283 - Attachment description: version 4.2: Only generate one ephemeral key, not one per connection → version 4.2: register/unregister shutdown functions
Attachment #215330 - Attachment description: version 4.2: Only generate one ephemeral key, not one per connection → version 4.2: register/unregister shutdown functions (duplicate patch)
Comment on attachment 215739 [details] [diff] [review] Use SSE2 multiply instructions on intel processors. r=wtc. Like Nelson, I have some suggested changes and comments on this patch. Note that I didn't review the new SSE2 assembly code. I only reviewed the 8 lines of new assembly code at the beginning of each function, and made sure that the original assembly code is the same as the new assembly code for the "x86" case. 1. mpi/mpcpucache.c >+unsigned long >+s_mpi_is_sse2() >+{ >+ unsigned long eax, ebx, ecx, edx; >+ unsigned long cpuidLevel; In comment 58, Nelson suggested that we change 'long' to 'int'. The reason we use 'long' here is that eax, ebx, ecx, and edx are passed as arguments to cpuid() and that function's prototype uses 'unsigned long'. The 'cpuidLevel' variable is set but not used. I suggest we remove it. >+ char string[65]; I agree with Nelson that this string's size just needs to be 13. >+ if (is386()) { >+ return 0; >+ } if (is486()) { >+ return 0; >+ } This should be written as if (is386() || is486()) { return 0; } This code is copied from the s_mpi_getProcessorLineSize function. In that function, I suggest we put the second 'if' on a separate line from the closing '}' of the first 'if'. >+ *(int *)string = ebx; >+ *(int *)&string[4] = edx; >+ *(int *)&string[8] = ecx; I assume these potentially unaligned 'int' accesses are intentional. I know the x86 processors allow unaligned memory access. 2. mpi/mpi_x86.s >+ # -1 means to call s_mpi_is_sse to determine if we support sse Change the two instances of "sse" to "sse2". >+.ifndef NO_PIC >+.macro GET var,reg >+ movl \var@GOTOFF(%ebx),\reg >+.endm >+.macro PUT reg,var >+ movl \reg,\var@GOTOFF(%ebx) >+.endm >+.else >+.macro GET var,reg >+ movl \var(%ebx),\reg >+.endm >+.macro PUT reg,var >+ movl \reg,\var >+.endm >+.endif >+ > .text Since the GET and PUT macros are used in the .text section, it seems better to move their definitions after the .text directive. Did you test the NO_PIC (second) case? I am wondering if the PUT macro should be defined as movl \reg,\var(%ebx) That is, with (%ebx) after \var. I am curious how you picked the numbered labels. In the original code, the numbered labels are mostly contiguous, with only 9 and 13 missing. In the new code, there are much more "holes" (3,4,7,8,9,12,13,14,17,18,19,24,29,33) in the numbered labels. Why doesn't the s_mpv_div_2dx1d function at the end of this file have an sse2 version?
Attachment #215739 - Flags: superreview?(julien.pierre.bugs) → superreview+
Comment on attachment 213677 [details] [diff] [review] CHECKED IN 3.12: ssl3_NewKeyPair should disallow a null privKey or pubKey I want to eliminate unnecessary libSSL source code differences between trunk and branch, so I am requesting second review on this patch for the branch.
Attachment #213677 - Flags: review?(julien.pierre.bugs)
Comment on attachment 216808 [details] [diff] [review] version 5:register/unregister shutdown functions I applied this patch and attempted to build it, and got these compiler errors: nss/nssinit.c(646) : error C2065: 'SEC_ERROR_NSS_NOT_INITIALIZED' : undeclared identifier nss/nssinit.c(661) : error C2065: 'SEC_ERROR_DUPLICATE_FUNCTION' : undeclared identifier nss/nssinit.c(712) : error C2065: 'SEC_ERROR_FUNCTION_NOT_FOUND' : undeclared identifier nss/nssinit.c(724) : warning C4003 : not enough actual parameters for macro 'PZ_NewLock'
Attachment #216808 - Flags: superreview?(julien.pierre.bugs)
This version of this patch compiles without warnings. It's probably not exactly what Bob wants, but with it I can test and continue development of the ECC code.
Attachment #216808 - Attachment is obsolete: true
Comment on attachment 217535 [details] [diff] [review] register/unregister shutdown functions, v6 (Checked in) Looks good, though I do have the changes to secerr.h if we want more meaningful error codes.
Attachment #217535 - Flags: review+
> b) if this was a pointer, rather than a flag, it could be initialized > to point to s_mpi_is_sse, which could change the pointer to point to > either s_mpv_mul_d_x86 or to s_mpv_mul_d_sse2. Then the code at > label s_mpv_mul_d could be a single jump indirect instruction, > rather than a series of 8 instructions. The 8 instructions are only executed once. It does cost 3 or 4 instructions in the normal case (load is_sse, cmp, then jz or jg). The indirect pointer would 3 instruction, though you would need one for each function. Since we're about to drop into a multiply, I'm not sure I could measure the performance gain;). > >+ dec %ecx # --a_len > >+ jnz 15b # jmp if a_len != 0 > > I think there's a single instruction that combines dec cx and jnz > but it's obviously not crucial. loop I think. I left it from the previous code. > Did you test the NO_PIC (second) case? I am wondering if > the PUT macro should be defined as I had at some point, but I think before I created the PUT and Get Macros. (I was just using defines. Obviously the macros wouldn't have worked with the (%ebx).. > I am curious how you picked the numbered labels. In the > original code, the numbered labels are mostly contiguous, > with only 9 and 13 missing. In the new code, there are > much more "holes" (3,4,7,8,9,12,13,14,17,18,19,24,29,33) > in the numbered labels. Yes, the holes are intentional I basically gave each function a label space (first one 0-9, second 10-19, etc). The later functions ended up taking all the labels allocated to it). > Why doesn't the s_mpv_div_2dx1d function at the end of > this file have an sse2 version? Because there is not sse2 divide function;).
Comment on attachment 217535 [details] [diff] [review] register/unregister shutdown functions, v6 (Checked in) Checked in on trunk. Checking in nss.def; new revision: 1.163; previous revision: 1.162 Checking in nss.h; new revision: 1.48; previous revision: 1.47 Checking in nssinit.c; new revision: 1.72; previous revision: 1.71
Attachment #217535 - Attachment description: register/unregister shutdown functions, v6 → register/unregister shutdown functions, v6 (Checked in on trunk)
Attachment #217535 - Flags: review+
(In reply to comment #55) > (From update of attachment 215280 [details] [diff] [review] [edit]) In comment 55, I suggested using PRCallOnce instead of an AtomicIncrement for each of the 25 different curves. I forgot that PRCallOnce doesn't allow ANY arguments to be passed to the once-called function. :-( So, to do 25 different curves, I'd need 25 different callOne functions. :-( Whatta pain. Nevertheless, I am considering doing just that. There is a bigger issue with attachment 215280 [details] [diff] [review] though. in the ECDHE_RSA case, It attempts to choose the size of the ECDHE key (that is, the curve for that key) based on the strength of the signing key. I think that's the wrong basis for that decision. The basis should be the strength of the ciphers used in the cipher suite. The FIPS rule is that the strength of the key used for key transport (or key agreement) needs to be at least as strong as the strength of the key being transported (IINM), which is not a function of the size of the signature key used for authentication. So, I think it's back to the drawing board for the ECDHE_RSA code. More suggestions or discussion is welcome. I also think this discussion should be in a different bug (since it's not primarily performance work) so I will file a separate bug about that.
PR_CallOnceWithArg (added in NSPR 4.3) allows you to pass one argument to the once-called function.
Attachment #213677 - Flags: review?(julien.pierre.bugs) → review+
Attachment #215280 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 217535 [details] [diff] [review] register/unregister shutdown functions, v6 (Checked in) Checked in on branch Checking in nss.def; new revision: 1.158.2.3; previous revision: 1.158.2.2 Checking in nss.h; new revision: 1.46.2.3; previous revision: 1.46.2.2 Checking in nssinit.c; new revision: 1.69.2.3; previous revision: 1.69.2.2
Attachment #217535 - Attachment description: register/unregister shutdown functions, v6 (Checked in on trunk) → register/unregister shutdown functions, v6 (Checked in)
Comment on attachment 211396 [details] [diff] [review] CHECKED IN version 2: Implement the derive sensitive only for those derivation functions that require it. Bob, Wan-Teh, Is there some reason why we don't want this optimization in NSS 3.11.1 ??
Comment on attachment 211420 [details] [diff] [review] CHECKED IN 3.12: Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops (Please ignore previous comment. It was attached to wrong patch.) Bob, Wan-Teh, this fix in on trunk but not branch. Is there some reason we don't want this fix in NSS 3.11.1 ?
Comment on attachment 211420 [details] [diff] [review] CHECKED IN 3.12: Remove mp_init/mp_clear calls (and potential mallocs,frees and zeros) in tight loops This patch modifies the lib/freebl/ecl library. Checking in this patch on the NSS_3_11_BRANCH will require us to redo the FIPS ECDSA algorithm testing. Since we haven't received the FIPS ECDSA algorithm certificate, we should not need to pay extra for retesting ECDSA.
re: (From update of attachment 211420 [details] [diff] [review] [edit]) There is no other reason for this not to go into 3.11. bob
Target Milestone: 3.11.1 → 3.11.2
Moving the rest of these issues to 3.12 is fine. NOTE: most of the patches are already in the 3.11 branch.
Target Milestone: 3.11.2 → 3.12
Keywords: perf
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: NSS ECC performance problems. → NSS ECC performance problems (intel)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: