Closed
Bug 799267
Opened 12 years ago
Closed 11 years ago
Add telemetry for CertVerification
Categories
(Core :: Security: PSM, defect)
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: cviecco, Assigned: cviecco)
Details
Attachments
(2 files, 10 obsolete files)
11.14 KB,
patch
|
briansmith
:
review+
|
Details | Diff | Splinter Review |
11.16 KB,
patch
|
Details | Diff | Splinter Review |
We need to add telemetry to PSM in order to determine effects of PKIX enabling
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → cviecco
Assignee | ||
Updated•12 years ago
|
Attachment #669311 -
Flags: review?(bsmith)
Comment 2•12 years ago
|
||
Comment on attachment 669311 [details] [diff] [review] implementation v1 (time to check + is libpkix enabled) Review of attachment 669311 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +631,5 @@ > } > > SECStatus > PSM_SSL_PKIX_AuthCertificate(CERTCertificate *peerCert, void * pinarg, > const char * hostname) We cannot use telemetry in this function without locks because it may execute on any of 5 threads in the pool, and telemetry is not thread-safe. We need to record beginTime before we dispatch to the thread pool. ::: toolkit/components/telemetry/Histograms.json @@ +2315,5 @@ > + "SECURITY_NSS_VERIFICATION_FIRST_CONNECTION" : { > + "kind": "exponential", > + "high": "10000", > + "n_buckets": 50, > + "description": "Time spent on NSS doing a successful verification for a SSL connection (ms)" I think you expect that we can look at the data two-dimensionally across two different probes, but we cannot. We need to have a problem "time when libpkix is enabled" and another one "time when libpkix is disabled." Sadly, 10,000ms may not be enough time. Please increase it to 100,000ms. "FIRST_CONNECTION" is a bad name because it doesn't consider the fact that we do certificate validation during renegotiation, and that we sometimes do full handshakes even after the first connection (e.g. if the server doesn't support session resumption and/or if the session has expired). I recommend using the prefix SSL_CERT, and naming the probes SSL_CERT_VALIDATION_TIME_LIBPKIX and SSL_CERT_VALIDATION_TIME_CLASSIC.
Attachment #669311 -
Flags: review?(bsmith) → review-
Assignee | ||
Comment 3•12 years ago
|
||
Comment on attachment 669311 [details] [diff] [review] implementation v1 (time to check + is libpkix enabled) Review of attachment 669311 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +631,5 @@ > } > > SECStatus > PSM_SSL_PKIX_AuthCertificate(CERTCertificate *peerCert, void * pinarg, > const char * hostname) On the mutex OK. If we change the begin time, then shall we include the comparison to the domain name too? ::: toolkit/components/telemetry/Histograms.json @@ +2315,5 @@ > + "SECURITY_NSS_VERIFICATION_FIRST_CONNECTION" : { > + "kind": "exponential", > + "high": "10000", > + "n_buckets": 50, > + "description": "Time spent on NSS doing a successful verification for a SSL connection (ms)" Ok Agreed
Comment 4•12 years ago
|
||
Comment on attachment 669311 [details] [diff] [review] implementation v1 (time to check + is libpkix enabled) Review of attachment 669311 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +631,5 @@ > } > > SECStatus > PSM_SSL_PKIX_AuthCertificate(CERTCertificate *peerCert, void * pinarg, > const char * hostname) It is OK to start measuring right before the call to SSLServerCertVerificationJob::Dispatch. You can stop measuring in SetCertVerificationResult. It would be good if you could add a third dimension to the timing: whether or not we had a failure that caused us to take the error path that checks for overrides, etc. So, we'd have these probes: Certificate validated successfully using classic Certificate validated successfully using libpkix Certificate (initially) failed to validate using classic Certificate (initially) failed to validate using libpkix. My hypothesis is that the error cases take significantly longer than the non-error cases. We are not really concerned about how slow the error cases are.
Assignee | ||
Comment 5•12 years ago
|
||
Attachment #669311 -
Attachment is obsolete: true
Assignee | ||
Comment 6•12 years ago
|
||
Attachment #669716 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #669722 -
Flags: review?(bsmith)
Assignee | ||
Comment 7•12 years ago
|
||
Comment on attachment 669722 [details] [diff] [review] implementaiton v2.1 (separate histograms + mutex) Did not notice a comment sent from brian regaring when to stop measuring. Need a new revision of this.
Attachment #669722 -
Flags: review?(bsmith)
Assignee | ||
Comment 8•12 years ago
|
||
I have the 'feeling' that we should move the failure code to the certerrorrunnable. But this one works as advertised
Attachment #669722 -
Attachment is obsolete: true
Attachment #669769 -
Flags: review?(bsmith)
Comment 9•12 years ago
|
||
Comment on attachment 669769 [details] [diff] [review] implementaiton v3.0 (separate histograms + mutex) Review of attachment 669769 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +111,5 @@ > #include "nsComponentManagerUtils.h" > #include "nsServiceManagerUtils.h" > #include "nsIConsoleService.h" > #include "PSMRunnable.h" > +#include <prlock.h> why? @@ +135,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static mozilla::Mutex SSLVerificationTelemetryMutex("SSLVerificationMutex"); We cannot create a static mutex like this because we must avoid static initializers. See the comment 4 lines above. @@ +1044,5 @@ > // Reset the error code here so we can detect if AuthCertificate fails to > // set the error code if/when it fails. > PR_SetError(0, 0); > SECStatus rv = AuthCertificate(mInfoObject, mCert); > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); Instead of calculating the interval yourself, use Telemetry::AccumulateTimeDelta, to avoid the messy casting and math. @@ +1057,3 @@ > nsRefPtr<SSLServerCertVerificationResult> restart > = new SSLServerCertVerificationResult(mInfoObject, 0); > restart->Dispatch(); Why not pass the Telemetry::ID and interval to the SSLServerCertVerificationResult so that we can avoid the mutex altogether, by calling Telemetry::Accumulate in nsNSSSocketInfo::SetCertVerificationResult, which is always guaranteed to run on the same thread? I'm a little nervous about the performance ramifications of adding a lock here just for telemetry, especially when we're trying to make this code as fast as possible. @@ +1065,5 @@ > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, interval); > + } > + else{ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, interval); Here, it is OK to acquire a mutex if it helps, because this is the error path, where performance isn't as much of a factor. ::: toolkit/components/telemetry/Histograms.json @@ +2311,5 @@ > + "SSL_SUCCESFUL_CERT_VALIDATION_TIME_LIBPKIX" : { > + "kind": "exponential", > + "high": "100000", > + "n_buckets": 50, > + "description": "Time spent on a successful cert verification using LIBPKIX SSL (ms)" s/using LIBPKIX SSL/in libpix mode/. @@ +2317,5 @@ > + "SSL_SUCCESFUL_CERT_VALIDATION_TIME_CLASSIC" : { > + "kind": "exponential", > + "high": "100000", > + "n_buckets": 50, > + "description": "Time spent on a successful cert verification using 'Classic 'SSL (ms)" s/using 'Classic 'SSL/in classic mode/ @@ +2323,5 @@ > + "SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX" : { > + "kind": "exponential", > + "high": "100000", > + "n_buckets": 50, > + "description": "Time spent on an initially failed cert verification using LIBPKIX SSL (ms)" ditto. @@ +2329,5 @@ > + "SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC" : { > + "kind": "exponential", > + "high": "100000", > + "n_buckets": 50, > + "description": "Time spent on an initially failed cert verification using 'Classic 'SSL (ms)" ditto.
Attachment #669769 -
Flags: review?(bsmith) → review+
Comment 10•12 years ago
|
||
Comment on attachment 669769 [details] [diff] [review] implementaiton v3.0 (separate histograms + mutex) Review of attachment 669769 [details] [diff] [review]: ----------------------------------------------------------------- Sorry. Getting close though!
Attachment #669769 -
Flags: review+ → review-
Assignee | ||
Comment 11•12 years ago
|
||
Comment on attachment 669769 [details] [diff] [review] implementaiton v3.0 (separate histograms + mutex) Review of attachment 669769 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +111,5 @@ > #include "nsComponentManagerUtils.h" > #include "nsServiceManagerUtils.h" > #include "nsIConsoleService.h" > #include "PSMRunnable.h" > +#include <prlock.h> Because I forgot to remove it.... removed now. @@ +135,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static mozilla::Mutex SSLVerificationTelemetryMutex("SSLVerificationMutex"); Agreed. Moved init to InitializeSSLServerCertVerificationThreads. @@ +1044,5 @@ > // Reset the error code here so we can detect if AuthCertificate fails to > // set the error code if/when it fails. > PR_SetError(0, 0); > SECStatus rv = AuthCertificate(mInfoObject, mCert); > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); Agreed. @@ +1057,3 @@ > nsRefPtr<SSLServerCertVerificationResult> restart > = new SSLServerCertVerificationResult(mInfoObject, 0); > restart->Dispatch(); This is a lock on a very small piece of code. I dont expect much contention. And we are already doing a bunch of crypto and well implemented locks without contention should take in the order of 10's of nanosecs (a <20 asm instructions). Also it seems very messy to use locks sometimes yes and sometimes not. The mutex is much more clear (I think). But it can be done without locks by doing the accumulate in the socket transport thread.
Assignee | ||
Comment 12•12 years ago
|
||
Attachment #669769 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #670926 -
Flags: review?(bsmith)
Comment 13•12 years ago
|
||
Comment on attachment 670926 [details] [diff] [review] implementaiton v4.0 mutex only for error case. Review of attachment 670926 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +134,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static mozilla::Mutex *SSLVerificationTelemetryMutex = NULL; 1. Instead of static, put it in the unnamed namespace two lines before. 2. Use the gXXX naming convention from the Mozilla Coding Style guide (gCertVerficationTelemetryMutex). 3. s/NULL/nullptr/. i.e. make it look like line 135. @@ +627,5 @@ > : mFdForLogging(fdForLogging) > , mInfoObject(infoObject) > , mCert(CERT_DupCertificate(cert)) > { > + mJobStartTime=TimeStamp::Now(); Add a comment or an assertion about being on the socket transport service thread. @@ +1044,5 @@ > // Reset the error code here so we can detect if AuthCertificate fails to > // set the error code if/when it fails. > PR_SetError(0, 0); > SECStatus rv = AuthCertificate(mInfoObject, mCert); > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); When a function calls SECStatus, you must be very careful about calling any function between then and the time you call PR_GetError. You must either avoid calling any other functions, or you must call PR_GetError() and save the result into a variable for later use. @@ +1049,2 @@ > if (rv == SECSuccess) { > + MutexAutoLock telemetryMutex(*SSLVerificationTelemetryMutex); Either the patch description ("implementaiton v4.0 mutex only for error case") is wrong, or this code is wrong. Note that you could just pass mJobStartTime, TimeStamp::Now(), and nsNSSComponent::globalConstFlagUsePKIXVerification to the SSLServerCertVerificationResult constructor, and then call Telemetry::AccumulateTimeDelta from SSLServerCertVerificationResult::Run, without using the lock. Also, notice the code formatting. @@ +1058,5 @@ > = new SSLServerCertVerificationResult(mInfoObject, 0); > restart->Dispatch(); > return NS_OK; > } > + //The telemetry is on its own block to limit the size of the lock contention This comment is unnecessary because we all know this code pattern very well. ::: toolkit/components/telemetry/Histograms.json @@ +2317,5 @@ > + "SSL_SUCCESFUL_CERT_VALIDATION_TIME_CLASSIC" : { > + "kind": "exponential", > + "high": "100000", > + "n_buckets": 50, > + "description": "Time spent on a successful cert verification in classic mode (ms)" s/ / /
Attachment #670926 -
Flags: review?(bsmith) → review-
Assignee | ||
Comment 14•12 years ago
|
||
Comment 15•12 years ago
|
||
Comment on attachment 670971 [details] [diff] [review] implementaiton v4.1 (the real one) mutex only for error case. Review of attachment 670971 [details] [diff] [review]: ----------------------------------------------------------------- Getting close now. ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +134,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static mozilla::Mutex *SSLVerificationTelemetryMutex = NULL; See previous comment about this. @@ +621,5 @@ > > const void * const mFdForLogging; > const nsRefPtr<TransportSecurityInfo> mInfoObject; > CERTCertificate * const mCert; > + TimeStamp mJobStartTime; const @@ +629,5 @@ > const void * fdForLogging, TransportSecurityInfo * infoObject, > CERTCertificate * cert) > : mFdForLogging(fdForLogging) > , mInfoObject(infoObject) > , mCert(CERT_DupCertificate(cert)) , mJobStartTime(TimeStamp::Now()) @@ +1048,5 @@ > // Reset the error code here so we can detect if AuthCertificate fails to > // set the error code if/when it fails. > PR_SetError(0, 0); > SECStatus rv = AuthCertificate(mInfoObject, mCert); > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); See previous comment about executing code between a function that returns SECStatus and the call to PR_GetError(). @@ +1063,4 @@ > restart->Dispatch(); > return NS_OK; > } > + //The telemetry is on its own block to limit the size of the lock contention Comment is unnecessary. However, a blank line is necessary. @@ +1064,5 @@ > return NS_OK; > } > + //The telemetry is on its own block to limit the size of the lock contention > + { > + This blank line should be removed. @@ +1067,5 @@ > + { > + > + MutexAutoLock telemetryMutex(*SSLVerificationTelemetryMutex); > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, interval); Fix line lengths to be ~80 characters. @@ +1070,5 @@ > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, interval); > + } > + else{ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, interval); Same. Line is too long, so side-by-size diff wraps unless I shrink the font too much. @@ +1245,5 @@ > > SSLServerCertVerificationResult::SSLServerCertVerificationResult( > TransportSecurityInfo * infoObject, PRErrorCode errorCode, > + SSLErrorMessageType errorMessageType, > + mozilla::Telemetry::ID telemetryID , uint32_t telemetryValue ) Do not use the mozilla:: prefix except inside header files. @@ +1271,5 @@ > NS_IMETHODIMP > SSLServerCertVerificationResult::Run() > { > // TODO: Assert that we're on the socket transport thread > + if(mTelemetryID != Telemetry::HistogramCount){ whitespace. This condition is always true, isn't it?
Attachment #670971 -
Flags: review-
Comment 16•12 years ago
|
||
Also, explain in a comment that you take the lock in the error case because the performance doesn't matter and because the error case's control flow is too complicated to bother avoiding the lock in the same way we avoid it for the non-error case.
Assignee | ||
Comment 17•12 years ago
|
||
Comment on attachment 670971 [details] [diff] [review] implementaiton v4.1 (the real one) mutex only for error case. Review of attachment 670971 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +134,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static mozilla::Mutex *SSLVerificationTelemetryMutex = NULL; I dont see the difference, it also a pointer to a null value (resolved at compile time). There is no static intializer/destructor. and even static to avoid polluting the global namespace. @@ +621,5 @@ > > const void * const mFdForLogging; > const nsRefPtr<TransportSecurityInfo> mInfoObject; > CERTCertificate * const mCert; > + TimeStamp mJobStartTime; agreed @@ +629,5 @@ > const void * fdForLogging, TransportSecurityInfo * infoObject, > CERTCertificate * cert) > : mFdForLogging(fdForLogging) > , mInfoObject(infoObject) > , mCert(CERT_DupCertificate(cert)) agreed @@ +1048,5 @@ > // Reset the error code here so we can detect if AuthCertificate fails to > // set the error code if/when it fails. > PR_SetError(0, 0); > SECStatus rv = AuthCertificate(mInfoObject, mCert); > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); thanks fixed @@ +1063,4 @@ > restart->Dispatch(); > return NS_OK; > } > + //The telemetry is on its own block to limit the size of the lock contention removed @@ +1064,5 @@ > return NS_OK; > } > + //The telemetry is on its own block to limit the size of the lock contention > + { > + removed @@ +1067,5 @@ > + { > + > + MutexAutoLock telemetryMutex(*SSLVerificationTelemetryMutex); > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, interval); split now 88 chars... @@ +1070,5 @@ > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, interval); > + } > + else{ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, interval); ditto (88 chars) @@ +1245,5 @@ > > SSLServerCertVerificationResult::SSLServerCertVerificationResult( > TransportSecurityInfo * infoObject, PRErrorCode errorCode, > + SSLErrorMessageType errorMessageType, > + mozilla::Telemetry::ID telemetryID , uint32_t telemetryValue ) done @@ +1271,5 @@ > NS_IMETHODIMP > SSLServerCertVerificationResult::Run() > { > // TODO: Assert that we're on the socket transport thread > + if(mTelemetryID != Telemetry::HistogramCount){ it should be, the condition could be removed, but I am taking defensive measures here.
Assignee | ||
Comment 18•12 years ago
|
||
Attachment #670926 -
Attachment is obsolete: true
Comment 19•12 years ago
|
||
> We cannot use telemetry in this function without locks because it may
> execute on any of 5 threads in the pool, and telemetry is not thread-safe.
I was under the impression that due to its non-thread-safe nature, accumulate needs to run from the main thread. I.e., is it enough to make sure that accumulate calls for a particular probe do not race (which is what your locking does), even if it might race with other probes (which the patch allows)?
Assignee | ||
Comment 20•12 years ago
|
||
Attachment #686722 -
Attachment is obsolete: true
Assignee | ||
Comment 21•12 years ago
|
||
Attachment #686788 -
Attachment is obsolete: true
Assignee | ||
Comment 22•12 years ago
|
||
Comment on attachment 686795 [details] [diff] [review] implementation v5.2 Review of attachment 686795 [details] [diff] [review]: ----------------------------------------------------------------- The only known issue/discussion is the static mutex pointer if it should be static or be part of the unnamed space in that file.
Assignee | ||
Updated•12 years ago
|
Attachment #686795 -
Flags: review?(bsmith)
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Devdatta Akhawe [:devd] from comment #19) > > We cannot use telemetry in this function without locks because it may > > execute on any of 5 threads in the pool, and telemetry is not thread-safe. > > I was under the impression that due to its non-thread-safe nature, > accumulate needs to run from the main thread. I.e., is it enough to make > sure that accumulate calls for a particular probe do not race (which is what > your locking does), even if it might race with other probes (which the patch > allows)? Currently* doing a per probe lock is fine. The telemetry probe is a static array: https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#460 of histogram pointers. The potential race is the binding of the pointer to a new created histogram https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#486 That is why is is safe to do a per probe mutex.
Comment 24•12 years ago
|
||
Comment on attachment 686795 [details] [diff] [review] implementation v5.2 Review of attachment 686795 [details] [diff] [review]: ----------------------------------------------------------------- Hi Camilo. I think it's better to make it part of the unnamed namespace to be consistent with the other code in the file. Also, please add a comment explaining why we created this new mutex instead of using the other mutex we already have. ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +109,1 @@ > #include "mozilla/Assertions.h" When possible, try to keep includes in alphabetical order @@ +135,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static Mutex *SSLVerificationTelemetryMutex = NULL; Please see previous review comments about this; it is good to be consistent with the existing code. Also, please add this comment here: // We avoid using a mutex for the success case to avoid // lock-related performance issues. However, we do use // a lock in the error case to simplify the code, since // performance in the error case is not important. Mutex *gSSLVerificationErrorTelemetryMutex = nullptr; (in the unnamed namespace above). @@ +1070,4 @@ > RefPtr<SSLServerCertVerificationResult> restart( > + new SSLServerCertVerificationResult(mInfoObject, > + 0, > + PlainErrorMessage, One reason for defaulting the errorMessageType argument is to make it clearer that the non-error case (0) is not an error. I suggest that you move the telemetry ID and telemetry interval parameters to be between the infoObject and error code arguments (i.e. make them the second and third arguments). @@ +1083,5 @@ > + { > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); > + MutexAutoLock telemetryMutex(*SSLVerificationTelemetryMutex); > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, Use AccumulateInterval instead of inlining it into this function. Then you can remove the "Note: the interval is not calculated once as PR_GETError MUST be called" comment. @@ +1087,5 @@ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, > + interval); > + } > + else{ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, ditto. @@ +1306,5 @@ > > SSLServerCertVerificationResult::SSLServerCertVerificationResult( > TransportSecurityInfo * infoObject, PRErrorCode errorCode, > + SSLErrorMessageType errorMessageType, > + Telemetry::ID telemetryID , uint32_t telemetryValue ) whitespace between "ID" and "," and before ")" @@ +1316,1 @@ > { // We accumulate telemetry for (only) successful validations // on the main thread to avoid adversely affecting performance // by acquiring the mutex that we use when accumulating the // telemetry for unsuccessful validations. Unsuccessful // validations times are accumulated elsewhere. MOZ_ASSERT(errorCode != 0 || telemetryID == Telemetry::HistogramCount);
Attachment #686795 -
Flags: review?(bsmith) → review-
Comment 25•12 years ago
|
||
(In reply to Camilo Viecco (:cviecco) from comment #23) > (In reply to Devdatta Akhawe [:devd] from comment #19) > > > We cannot use telemetry in this function without locks because it may > > > execute on any of 5 threads in the pool, and telemetry is not thread-safe. > > > > I was under the impression that due to its non-thread-safe nature, > > accumulate needs to run from the main thread. I.e., is it enough to make > > sure that accumulate calls for a particular probe do not race (which is what > > your locking does), even if it might race with other probes (which the patch > > allows)? > > Currently* doing a per probe lock is fine. The telemetry probe is a static > array: > https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/ > Telemetry.cpp#460 > of histogram pointers. The potential race is the binding of the pointer to a > new created histogram > https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/ > Telemetry.cpp#486 FWIW, there's locking done further down so you don't have to worry about this sort of thing: https://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/histogram.cc#1154 Do you really expect cert verification to take nearly 2 minutes? The maximums on the histograms you're adding suggest as much.
Assignee | ||
Comment 26•12 years ago
|
||
Comment on attachment 686795 [details] [diff] [review] implementation v5.2 Review of attachment 686795 [details] [diff] [review]: ----------------------------------------------------------------- thank you. Aggreed on alsmot everything. Watch for version 6.x ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +109,1 @@ > #include "mozilla/Assertions.h" Done @@ +135,5 @@ > // do not use a nsCOMPtr to avoid static initializer/destructor > nsIThreadPool * gCertVerificationThreadPool = nullptr; > } // unnamed namespace > > +static Mutex *SSLVerificationTelemetryMutex = NULL; Agreed. Done @@ +1070,4 @@ > RefPtr<SSLServerCertVerificationResult> restart( > + new SSLServerCertVerificationResult(mInfoObject, > + 0, > + PlainErrorMessage, I see. I did something similar: put the telemetry params as third and fourth arguments, to make it clear on several failure sections that there was no telemetry in place. @@ +1083,5 @@ > + { > + uint32_t interval = (uint32_t) ((TimeStamp::Now() - mJobStartTime).ToMilliseconds()); > + MutexAutoLock telemetryMutex(*SSLVerificationTelemetryMutex); > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, Done @@ +1087,5 @@ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, > + interval); > + } > + else{ > + Telemetry::Accumulate(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, Done @@ +1306,5 @@ > > SSLServerCertVerificationResult::SSLServerCertVerificationResult( > TransportSecurityInfo * infoObject, PRErrorCode errorCode, > + SSLErrorMessageType errorMessageType, > + Telemetry::ID telemetryID , uint32_t telemetryValue ) Done @@ +1316,1 @@ > { Done
Assignee | ||
Comment 27•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #25) > (In reply to Camilo Viecco (:cviecco) from comment #23) > > (In reply to Devdatta Akhawe [:devd] from comment #19) > > > > We cannot use telemetry in this function without locks because it may > > > > execute on any of 5 threads in the pool, and telemetry is not thread-safe. > > > > > > I was under the impression that due to its non-thread-safe nature, > > > accumulate needs to run from the main thread. I.e., is it enough to make > > > sure that accumulate calls for a particular probe do not race (which is what > > > your locking does), even if it might race with other probes (which the patch > > > allows)? > > > > Currently* doing a per probe lock is fine. The telemetry probe is a static > > array: > > https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/ > > Telemetry.cpp#460 > > of histogram pointers. The potential race is the binding of the pointer to a > > new created histogram > > https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/ > > Telemetry.cpp#486 > > FWIW, there's locking done further down so you don't have to worry about > this sort of thing: > > https://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/ > histogram.cc#1154 > > Do you really expect cert verification to take nearly 2 minutes? The > maximums on the histograms you're adding suggest as much. Thank you I would hope that it would take less than 1 sec, but since there can be multiple OCSP requests for a resource it would depend on the timeouts for these connections. Since the tail of this probe will look similar to the SSL_TIME_UNTIL_READY and given that the TIME_UNTIL ready max is near 51k I am adjusting this to 60K as limit.
Assignee | ||
Comment 28•12 years ago
|
||
Attachment #670971 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #691476 -
Flags: review?(bsmith)
Assignee | ||
Comment 29•11 years ago
|
||
Attachment #691476 -
Attachment is obsolete: true
Attachment #691476 -
Flags: review?(bsmith)
Assignee | ||
Comment 30•11 years ago
|
||
Comment on attachment 691560 [details] [diff] [review] implementation v6.1 the assert condition was wrong. Now we actually run with errors.
Attachment #691560 -
Flags: review?(bsmith)
Comment 31•11 years ago
|
||
Comment on attachment 691560 [details] [diff] [review] implementation v6.1 Review of attachment 691560 [details] [diff] [review]: ----------------------------------------------------------------- r+ with nits addressed. ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +137,5 @@ > + > +// We avoid using a mutex for the success case to avoid > +// lock-related performance issues. However, we do use > +// a lock in the error case to simplify the code, since > +// performance in the error case is not important. line length (too short). @@ +239,5 @@ > const PRErrorCode mErrorCode; > const SSLErrorMessageType mErrorMessageType; > + const Telemetry::ID mTelemetryID; > + const uint32_t mTelemetryValue; > + remove blank line. @@ +1064,5 @@ > RefPtr<SSLServerCertVerificationResult> restart( > + new SSLServerCertVerificationResult(mInfoObject, > + 0, > + telemetryID, > + interval)); indention. Nit: I don't think each parameter needs its own line @@ +1076,5 @@ > + { > + MutexAutoLock telemetryMutex(*gSSLVerificationTelemetryMutex); > + if(nsNSSComponent::globalConstFlagUsePKIXVerification){ > + Telemetry::AccumulateTimeDelta(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, > + mJobStartTime,TimeStamp::Now()); line length. @@ +1079,5 @@ > + Telemetry::AccumulateTimeDelta(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_LIBPKIX, > + mJobStartTime,TimeStamp::Now()); > + } > + else{ > + Telemetry::AccumulateTimeDelta(Telemetry::SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_CLASSIC, line length. @@ +1308,5 @@ > +// We accumulate telemetry for (only) successful validations > +// on the main thread to avoid adversely affecting performance > +// by acquiring the mutex that we use when accumulating the > +// telemetry for unsuccessful validations. Unsuccessful > +// validations times are accumulated elsewhere. line length (too short).
Attachment #691560 -
Flags: review?(bsmith) → review+
Updated•11 years ago
|
Attachment #686795 -
Attachment is obsolete: true
Assignee | ||
Comment 32•11 years ago
|
||
Keeping r+ from bsmith (now checking for spaces)
Comment 33•11 years ago
|
||
Backed out for leaks: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=851fd44eeb42 https://hg.mozilla.org/integration/mozilla-inbound/rev/21bd64a6b46d
Assignee | ||
Comment 34•11 years ago
|
||
My bad, used try to test for compilation(for all platforms) but only did full unit testing on Linux.
Assignee | ||
Comment 35•11 years ago
|
||
Green build, pushing post festivites. https://tbpl.mozilla.org/?tree=Try&rev=3a7ae907fdd3
Comment 36•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/da2b24b1fc7c
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
You need to log in
before you can comment on or make changes to this bug.
Description
•