749890 - Intermittent test_spdy.js | test failed (with xpcshell return code: 0), | false == true - See following stack:

Reporter

Description

•

13 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=11282433&tree=Mozilla-Inbound Rev3 Fedora 12x64 mozilla-inbound debug test xpcshell on 2012-04-27 17:00:52 PDT for push df3acd833280 slave: talos-r3-fed64-052 TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | test failed (with xpcshell return code: 0), see following log: >>>>>>> ### XPCOM_MEM_LEAK_LOG defined -- logging leaks to /tmp/tmp0Reeqt/runxpcshelltests_leaks.log TEST-INFO | (xpcshell/head.js) | test 1 pending WARNING: SSL handshake timed out: file /builds/slave/m-in-lnx64-dbg/build/security/manager/ssl/src/nsNSSIOLayer.cpp, line 2118 TEST-INFO | (xpcshell/head.js) | test 2 pending TEST-INFO | (xpcshell/head.js) | test 2 finished TEST-INFO | (xpcshell/head.js) | running event loop TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | [testOnStartRequest : 50] 200 == 200 TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | [testOnStartRequest : 51] true == true TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | [testOnStopRequest : 62] true == true TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | [testOnStopRequest : 63] true == true TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js | false == true - See following stack: JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: do_throw :: line 462 JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: _do_check_eq :: line 556 JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: do_check_eq :: line 577 JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: do_check_true :: line 591 JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_spdy.js :: testOnStopRequest :: line 64 TEST-INFO | (xpcshell/head.js) | exiting test WARNING: cannot post event if not initialized: file /builds/slave/m-in-lnx64-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 213 WARNING: cannot post event if not initialized: file /builds/slave/m-in-lnx64-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 213 WARNING: nsExceptionService ignoring thread destruction after shutdown: file /builds/slave/m-in-lnx64-dbg/build/xpcom/base/nsExceptionService.cpp, line 199 WARNING: OOPDeinit() without successful OOPInit(): file /builds/slave/m-in-lnx64-dbg/build/toolkit/crashreporter/nsExceptionHandler.cpp, line 1998

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 3

•

13 years ago

I'm presuming this is a failure of the new test, not of the tested code. Nick let me know if you feel otherwise.

Whiteboard: [orange] → [orange][spdy]

u408661

Comment 4

•

13 years ago

(In reply to Patrick McManus [:mcmanus] from comment #3) > I'm presuming this is a failure of the new test, not of the tested code. > Nick let me know if you feel otherwise. I'll probably need to instrument the test a bit to see exactly why the failure is occurring. Somewhere along the line, we are getting a response on a connection that either the browser or the server (or both!) doesn't think is SPDY. The first thing that popped into my mind was that perhaps this is the same issue causing the spdy indicator add-on to not show a page as spdy? (Some failure to set the x-firefox-spdy header, perhaps?) Was there ever a resolution on that? I can't find anything in bugzilla, or in email, so maybe I'm just making stuff up in a desperate attempt to not feel bad about causing an intermittent orange? Patrick, do you remember anything like that when the spdy indicator add-on first hit, or am I really just making stuff up? I'll have more time to look into this Monday morning and instrument the test to see why, exactly, a connection is getting labeled as non-spdy in the test.

Patrick McManus [:mcmanus]

Comment 5

•

13 years ago

i don't really know anything about the add-on.. I've used it but that's about it.

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Comment 14

•

13 years ago

I'm running an instrumented version of the test locally in a loop to see if I can figure out which side thinks we're not using spdy. I saw the failure once this morning on my local machine, so I know I can reproduce it, unfortunately it was using the non-instrumented version of the patch when it failed :)

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Comment 16

•

13 years ago

Alright, I was able to duplicate this (after a few hours running test_spdy.js in a loop), and the error is coming because we get an NS_ERROR_NOT_AVAILABLE when trying to get the x-firefox-spdy header off the response to make sure the connection is actually spdy. My best guess is for some reason is that we fail to negotiate spdy with the server, although whether that's a server issue or a client issue remains to be seen. Patrick, what would be the best way to get deep debugging info from the client side to see why negotiating SPDY fails?

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Comment 18

•

13 years ago

(In reply to Nick Hurley [:hurley] from comment #16) > Patrick, what would be the best way to get deep debugging info from the > client side to see why negotiating SPDY fails? I should note, I mean if there's anything other than http logging & packet capture I can do here (I'm already running with those enabled, which I should've done the first time around).

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 21

•

13 years ago

(In reply to Nick Hurley [:hurley] from comment #18) > (In reply to Nick Hurley [:hurley] from comment #16) > > Patrick, what would be the best way to get deep debugging info from the > > client side to see why negotiating SPDY fails? > > I should note, I mean if there's anything other than http logging & packet > capture I can do here (I'm already running with those enabled, which I > should've done the first time around). those are what I would start with. Any SSL or PSM logging would probably help too - but I don't know how to do that. honza or bsmith?

u408661

Comment 22

•

13 years ago

OK, after another 20 hours, I've succeeded in reproducing again, this time with http logging and packet capture! The files are http://people.mozilla.org/~hurley/spdy_test_failure_xpcshell.txt (this is the output of make check-one on the failed run) http://people.mozilla.org/~hurley/spdy_test_failure.log (this is the http log from the failed run) http://people.mozilla.org/~hurley/spdy_test_failure.pcap (this is the packet capture of the failure - it also contains packets from some successful runs, the failures are (obviously) at the end of the capture) From a quick glance at the pcap, it looks like at some point firefox just decides to use SSLv3 instead of TLS, which means we can't use NPN to upgrade to SPDY. I don't know why it makes this decision (ssl/tls debugging is not my strong suit). Patrick, Josh said to hand this off to you for further investigation.

Assignee: nobody → mcmanus

Patrick McManus [:mcmanus]

Comment 23

•

13 years ago

nick - thanks for the data! bsmith - I think I need some help.. I can take it a little farther than nick, but then I'm just guessing at some point.. as nick says for most of the time we tcp connect, offer up a tls v1 client hello with extension 13172 (npn) in it and get back a tls v1 server hello with the npn list of options. you can spot check that on "tcp.stream eq 10" in nick's pcap. but in stream 697 (or packet 77934 if you prefer) we tcp connect and offer up a SSL 3.0 client hello with no NPN extension. The connection proceeds to use HTTP/1.1. the preceeding stream 696 (which thankfully is not overlapped) (packet 77920) starts a TLS handshake but the client sends a fin before it is complete. The whole thing happens in 8ms, so there wasn't a timeout that I can think of.. but closing a socket that we don't need is certainly the kind of thing the network stack does these days. could that failed handshake in 696 (even though the failure was caused by the client) have caused some kind of tls intolerance trigger to downgrade stream 697? 798 is also at v3. Given that intolerance can happen based on timeouts and is therefore racy, for the purposes of the spdy test it might make sense to just check the SSL version.. but if we've got some kind of behavior that is downgrading our ssl level that needs to be figured out.

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Comment 26

•

13 years ago

Attached patch test instrumentation — Details — Splinter Review

Just for reference, here's the patch I used that instruments the test for (slightly) more info on which end is failing (this was in case, for example, my server-side spdy detection was flaky and that's what was causing the test to identify the connection as non-spdy, which is unfortunately not the case here)

u408661

Comment 27

•

13 years ago

Attached file script to run test until failure occurs — Details

Also for reference (and in case anyone wants to try this at home), here's the script I used to run the test over and over again until it failed. Paths will need to be changed to match your setup. This script expects to be run with your cwd = the top-level of your mozilla-central checkout.

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 44

•

13 years ago

First, are we sure that the end of the pcap corresponds to the XPCShell log that was posted? In that log, the TLS intolerance timeout is definitely being hit: WARNING: SSL handshake timed out: file /home/hurley/src/mozilla/m-c/security/manager/ssl/src/nsNSSIOLayer.cpp, line 1015 The code for calculating the timeout is: PRIntervalTime now = PR_IntervalNow(); PRIntervalTime certVerificationTime = mCertVerificationEnded - mCertVerificationStarted; PRIntervalTime totalTime = now - mHandshakeStartTime; PRIntervalTime totalTimeExceptCertVerificationTime = totalTime - certVerificationTime; Patrick is right that the handshake has only taken 8ms, not 25 SECONDS. Is it possible that PR_IntervalNow() isn't monotonically increasing, which could cause these calculations to be wrong? HandshakeTimeout() is only called in nsSSLIOLayerPoll (the PSM implementation of the poll method). So, we cannot be detecting a handshake timeout after we've closed a socket, because that would mean that we're polling the socket after we've closed it, which is wrong. (But, even if we polled after we closed the socket, then the PSM nsNSSIOLayer would have been poppoed off the socket so nsSSLIOLayerPoll would never get called.) Once we've made a successful TLS 1.0 connection to a server, we do not allow fallback to SSL 3.0 for that server for the rest of the browsing session. Perhaps there is a bug caused by the (fully intentional) race between the handshake and the certificate verification, but it seems unlikely. I will look at it more In (tcp.stream eq 695) we close the connection sending a TLS alert first. In (tcp.stream eq 696) we close the connection without sending a TLS alert. The logic for determining whether we send a TLS alert is in ssl_SecureClose: if (ss->version >= SSL_LIBRARY_VERSION_3_0 && /* cond 1 */ !(ss->shutdownHow & ssl_SHUTDOWN_SEND) && /* cond 2 */ ss->firstHsDone && /* cond 3 */ !ss->recvdCloseNotify && /* cond 4 */ ss->ssl3.initialized) { /* cond 5 */ Conditions 1, 4, 5 are definitely true. Is it possible that we have shutdown the socket for sending (Condition 2) before closing it? If condition 3 isn't true, then that would indicate that the certificate verification might not have finished before we closed the connection. But, if certificate verification were still pending, then we would not hit the TLS intolerance timeout because there is an explicit check that ensures we never to the TLS intolerance fallback if cert verification is pending. If there are more packet captures for this, I would love to see them, to see if we consistently have this "no sent alert" pattern when the test fails.

Honza Bambas (:mayhemer)

Comment 45

•

13 years ago

(In reply to Brian Smith (:bsmith) from comment #44) > Is > it possible that PR_IntervalNow() isn't monotonically increasing, which > could cause these calculations to be wrong? I don't believe so. > Is it possible that we have shutdown the socket for sending (Condition 2) > before closing it? We don't shutdown sockets at all (necko). Just closing them.

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 49

•

13 years ago

(In reply to Honza Bambas (:mayhemer) from comment #45) > (In reply to Brian Smith (:bsmith) from comment #44) > > before closing it? > > We don't shutdown sockets at all (necko). Just closing them. in the trace in question when we close the socket there is unread data queued in the kernel.. that generates the tcp rst. I'm wondering if there is some way that manifests in an error that gets interpreted as the intolerance. (you would think that would be more of a problem for the peer - after we issue the close I wouldn't think we'd notice any io events, but its a possibility.)

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Comment 63

•

13 years ago

(In reply to Brian Smith (:bsmith) from comment #44) > First, are we sure that the end of the pcap corresponds to the XPCShell log > that was posted? Yes, there was no other traffic on localhost:4443 on the machine I ran this on, this was the most recent pcap available (I was cycling between 10 different files for posterity), and there would have been no xpcshell runs after the test failed. > If there are more packet captures for this, I would love to see them, to see > if we consistently have this "no sent alert" pattern when the test fails. I will arrange for more pcaps, it just might take a while (given how intermittent this issue can be).

Patrick McManus [:mcmanus]

Comment 64

•

13 years ago

(In reply to Brian Smith (:bsmith) from comment #44) > WARNING: SSL handshake timed out: file > /home/hurley/src/mozilla/m-c/security/manager/ssl/src/nsNSSIOLayer.cpp, line > 1015 > [..] > Patrick is right that the handshake has only taken 8ms, not 25 SECONDS. Is > it possible that PR_IntervalNow() isn't monotonically increasing, which > could cause these calculations to be wrong? all the tbpl reports are on linux, and nick's repro is on linux. (and actually I think we might only be running the test on linux.) So let's look at linux pr_intervalnow() handling.. :) a breakpoint on my desktop shows me: http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prinrval.c#77 which bears the comment: * This version of interval times is based on the time of day * capability offered by system. This isn't valid for two reasons: * 1) The time of day is neither linear nor montonically increasing * 2) The units here are milliseconds. That's not appropriate for our use. */ doh! And stepping through that you can see that it calls gettimeofday() directly.. a little googling on that and you'll easily see that its not guaranteed monotonic. That would be a pretty big problem. Linux does provide a cheap monotonic time source - clock_gettiime(CLOCK_MONOTONIC).. perhaps we should be using that?

u408661

Comment 65

•

13 years ago

(In reply to Patrick McManus [:mcmanus] from comment #64) > (In reply to Brian Smith (:bsmith) from comment #44) > > > WARNING: SSL handshake timed out: file > > /home/hurley/src/mozilla/m-c/security/manager/ssl/src/nsNSSIOLayer.cpp, line > > 1015 > > > [..] > > Patrick is right that the handshake has only taken 8ms, not 25 SECONDS. Is > > it possible that PR_IntervalNow() isn't monotonically increasing, which > > could cause these calculations to be wrong? > > all the tbpl reports are on linux, and nick's repro is on linux. (and > actually I think we might only be running the test on linux.) So let's look > at linux pr_intervalnow() handling.. :) It's true that we're only running the test on linux right now, and linux is where the data above comes from, however, I've also reproduced this on my mac. This does appear to use the same gettimeofday implementation as linux (not surprising, given that they're both unix-ish). Not that this particularly adds any more info to the mix, but I wanted to make sure the whole picture (failures on multiple platforms) is known.

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 67

•

13 years ago

fwiw I have not been able to reproduce it - hundreds of thousands of test_spdy iterations and millions of iterations of a simpler c++ test that just tests the ordering of two pr_intervalnow() calls. but clock issues can be extremely hardware dependent. if it is a problem with unix interval handling, that would also potentially bring doubt to a lot of different code paths on android.. so while its tempting to just try changing the handshaketimeout() methods to use mozilla::timestamp() (which does use clock_gettime) there is more work to do. nick can you try this patch? (obviously not suitable for promotion to nspr as is) and see if it changes things for you? I'll also let tryserver at it for a long series of runs - the CI seems to produce it fairly often. diff --git a/nsprpub/pr/src/Makefile.in b/nsprpub/pr/src/Makefile.in --- a/nsprpub/pr/src/Makefile.in +++ b/nsprpub/pr/src/Makefile.in @@ -160,16 +160,17 @@ ifeq ($(OS_TARGET),Android) # Android has no libpthread.so in NDK OS_LIBS = -ldl else OS_LIBS = -lpthread -ldl endif else OS_LIBS = -ldl endif +OS_LIBS += -lrt endif ifeq ($(OS_ARCH),HP-UX) ifeq ($(USE_PTHREADS), 1) ifeq (,$(filter-out B.10.10 B.10.20,$(OS_RELEASE))) OS_LIBS = -ldce else OS_LIBS = -lpthread -lrt diff --git a/nsprpub/pr/src/md/unix/unix.c b/nsprpub/pr/src/md/unix/unix.c --- a/nsprpub/pr/src/md/unix/unix.c +++ b/nsprpub/pr/src/md/unix/unix.c @@ -3046,22 +3046,22 @@ PR_Now(void) LL_I2L(us, tv.tv_usec); LL_MUL(s, s, s2us); LL_ADD(s, s, us); return s; } PRIntervalTime _PR_UNIX_GetInterval() { - struct timeval time; + struct timespec time; PRIntervalTime ticks; - (void)GETTIMEOFDAY(&time); /* fallicy of course */ + clock_gettime(CLOCK_MONOTONIC, &time); ticks = (PRUint32)time.tv_sec * PR_MSEC_PER_SEC; /* that's in milliseconds */ - ticks += (PRUint32)time.tv_usec / PR_USEC_PER_MSEC; /* so's that */ + ticks += (PRUint32)time.tv_nsec / PR_NSEC_PER_MSEC; /* so's that */ return ticks; } /* _PR_SUNOS_GetInterval */ PRIntervalTime _PR_UNIX_TicksPerSecond() { return 1000; /* this needs some work :) */ }

u408661

Comment 68

•

13 years ago

(In reply to Patrick McManus [:mcmanus] from comment #67) > nick can you try this patch? (obviously not suitable for promotion to nspr > as is) and see if it changes things for you? I'll also let tryserver at it > for a long series of runs - the CI seems to produce it fairly often. OK, I've got this running on my linux machine. I'll report back in a couple days (or sooner, if the problem recurs before then).

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 72

•

13 years ago

Try server continues to have intermittent fails negotiating spdy with the patch from comment 67. so the root cause here is something other than the pr_interval using gettimeofday() https://tbpl.mozilla.org/?tree=Try&rev=a92e6b55c585

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 75

•

13 years ago

I insturmented HandshakeTimeout() to print some stuff and had try reproduce the failure 1.11 1.12 - return totalTimeExceptCertVerificationTime > 1.13 + bool rv = totalTimeExceptCertVerificationTime > 1.14 PR_SecondsToInterval(HANDSHAKE_TIMEOUT_SECONDS); 1.15 + fprintf(stderr,"handshaketimeout:: rv=%d time=%dms state=%d\n", 1.16 + rv, PR_IntervalToMilliseconds(totalTimeExceptCertVerificationTime), 1.17 + mCertVerificationState); 1.18 + return rv; try:: handshaketimeout:: rv=0 time=0ms state=0 handshaketimeout:: rv=1 time=-3ms state=2 WARNING: SSL handshake timed out: file /builds/slave/try-lnx64-dbg/build/security/manager/ssl/src/nsNSSIOLayer.cpp, line 1019 (and then in another test handshaketimeout:: rv=1 time=-2ms state=2) I had expected it to print an absurd elapsed time, indicating something was corrupted or maybe being compared to '0'.. but instead it prints -3ms (or -2ms). I'm kinda surprised that's not -1 if the issue is monotonicity. I wonder about some kind of a rounding propogation problem in this snippet.. (printervaltime is essentially an unisgned number of milliseconds, taken from gtod which does usec, so they are rounded to form the now() value) PRIntervalTime now = PR_IntervalNow(); PRIntervalTime certVerificationTime = mCertVerificationEnded - mCertVerificationStarted; PRIntervalTime totalTime = now - mHandshakeStartTime; PRIntervalTime totalTimeExceptCertVerificationTime = totalTime - certVerificationTime; bool rv = totalTimeExceptCertVerificationTime > PR_SecondsToInterval(HANDSHAKE_TIMEOUT_SECONDS);

Patrick McManus [:mcmanus]

Comment 76

•

13 years ago

the negative time is also consistent with setHandshakeInProgress() being called after setcertverficationwaiting(), though I have nothing else to suggest that is happening.

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 79

•

13 years ago

(In reply to Patrick McManus [:mcmanus] from comment #76) > the negative time is also consistent with setHandshakeInProgress() being > called after setcertverficationwaiting(), though I have nothing else to > suggest that is happening. confirmed. See annotated try failure: https://tbpl.mozilla.org/php/getParsedLog.php?id=11589533&tree=Try#error0 setting certverification started to 2F4DD401, handshake start is 0 setting handshake starttime to 2F4DD407 setting certverification ended to 2F4DD408, handshake start is 2F4DD407 handshaketimeout:: rv=1 time=-6ms state=2 WARNING: SSL handshake timed out: file /builds/slave/try-lnx64-dbg/build/security/manager/ssl/src/nsNSSIOLayer.cpp, line 1024 This is the patch with the printfs: https://hg.mozilla.org/try/rev/8bea3e6f1206 In this case SetHandshakeInProgress() was called 6ms after SetCertVerificationWaiting(), but that's the most extreme example I've seen. I believe Brian says he knows how to fix the race condition here and offered to do so. Thanks!

Assignee: mcmanus → bsmith

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 103

•

13 years ago

https://tbpl.mozilla.org/?tree=Try&rev=2d5b7bbddbb0

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 104

•

13 years ago

Oops. Let's try again: https://tbpl.mozilla.org/?tree=Try&rev=9f8aa02d059e

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 114

•

13 years ago

Attached patch Do not do TLS intolerance handshake timeout after we know we've received the server hello. (obsolete) — Details — Splinter Review

The TLS intolerance timeout is needed for servers that do not respond *at all* to our hello request; see the long comment in checkHandshake() in nsNSSIOLayer.cpp. Once we've received a server hello from the server, there's no need to do the timeout, and it is better NOT to, because any timeouts after we've received a response. By explicitly avoiding those wrong timeouts, we avoid the need for the (incorrect) complicated comparisons of the PRIntervalTimes I introduced, which caused this bug.

Attachment #623210 - Flags: review?(honzab.moz)

Attachment #623210 - Flags: feedback?(mcmanus)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 115

•

13 years ago

Here is the try build for just this test, with ~450 passing runs: https://tbpl.mozilla.org/?tree=Try&rev=908620cfe336 Here is the full try run, just submitted: https://tbpl.mozilla.org/?tree=Try&rev=6380b35ed74b

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 116

•

13 years ago

(In reply to Brian Smith (:bsmith) from comment #114) > Created attachment 623210 [details] [diff] [review] > Do not do TLS intolerance handshake timeout after we know we've received the > server hello. > > The TLS intolerance timeout is needed for servers that do not respond *at > all* > to our hello request; see the long comment in checkHandshake() in > nsNSSIOLayer.cpp. > > Once we've received a server hello from the server, > there's no need to do the timeout, and it is better NOT to, because any > timeouts after we've received a response ... are false positives.

Comment hidden (Legacy TBPL/Treeherder Robot)

Honza Bambas (:mayhemer)

Comment 122

•

13 years ago

Comment on attachment 623210 [details] [diff] [review] Do not do TLS intolerance handshake timeout after we know we've received the server hello. Review of attachment 623210 [details] [diff] [review]: ----------------------------------------------------------------- Brian, please add [diff] showfunc=true to your hgrc. Dropping r because of the first comment. Brian, please fix and retest this patch. ::: security/manager/ssl/src/nsNSSIOLayer.cpp @@ +478,5 @@ > #define HANDSHAKE_TIMEOUT_SECONDS 25 > > bool nsNSSSocketInfo::HandshakeTimeout() > { > + if (!!mAllowTLSIntoleranceTimeout) Rather just a singe '!' ? According this change, it seems like some of your tests of this code were not done or could have misleading results... @@ +498,3 @@ > PRIntervalTime now = PR_IntervalNow(); > + bool result = (now - mHandshakeStartTime) > + > PR_SecondsToInterval(HANDSHAKE_TIMEOUT_SECONDS); You can do: static const kHandshakeTimeout = PR_SecondsToInterval(HANDSHAKE_TIMEOUT_SECONDS); and use that konst rather. Time conversion may be expensive. @@ -502,3 @@ > PRIntervalTime now = PR_IntervalNow(); > - PRIntervalTime certVerificationTime = > - mCertVerificationEnded - mCertVerificationStarted; You can remove these two members now.

Attachment #623210 - Flags: review?(honzab.moz)

Comment hidden (Legacy TBPL/Treeherder Robot)

Patrick McManus [:mcmanus]

Comment 140

•

13 years ago

Comment on attachment 623210 [details] [diff] [review] Do not do TLS intolerance handshake timeout after we know we've received the server hello. I don't have anything to add beyond honza's comments other than PR_SecondsToInterval is (cross-platform) just a multiply so there is no perf concern there.. but const is always good :)

Attachment #623210 - Flags: feedback?(mcmanus)

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Updated

•

13 years ago

Blocks: 738992

Comment hidden (Legacy TBPL/Treeherder Robot)

Josh Aas

Comment 206

•

13 years ago

Why have we been sitting on a patch for a common intermittent orange like this? What is the next step here?

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 209

•

13 years ago

Attached patch Do not do TLS intolerance handshake timeout after we know we've received the server hello [v2] — Details — Splinter Review

Here's the updated patch that fixes the typo and which address the other review comments.

Attachment #623210 - Attachment is obsolete: true

Attachment #626931 - Flags: review?(honzab.moz)

Comment hidden (Legacy TBPL/Treeherder Robot)

Honza Bambas (:mayhemer)

Comment 219

•

13 years ago

Comment on attachment 626931 [details] [diff] [review] Do not do TLS intolerance handshake timeout after we know we've received the server hello [v2] Review of attachment 626931 [details] [diff] [review]: ----------------------------------------------------------------- r=honzab Thanks. ::: security/manager/ssl/src/nsNSSIOLayer.cpp @@ +495,1 @@ > } static const PRIntervalTime handshakeTimeoutInterval = PR_SecondsToInterval(25) is called the first time this line gets executed. It is not a static initializer. Thus, you don't need your lazy init code, compiler does it for you automatically.

Attachment #626931 - Flags: review?(honzab.moz) → review+

Comment hidden (Legacy TBPL/Treeherder Robot)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 228

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1d09d60b4bc1

Target Milestone: --- → mozilla15

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Assignee

Comment 229

•

13 years ago

(In reply to Honza Bambas (:mayhemer) from comment #219) > static const PRIntervalTime handshakeTimeoutInterval = > PR_SecondsToInterval(25) is called the first time this line gets executed. > It is not a static initializer. Thus, you don't need your lazy init code, > compiler does it for you automatically. I fixed this before I pushed: https://hg.mozilla.org/integration/mozilla-inbound/diff/1d09d60b4bc1/security/manager/ssl/src/nsNSSIOLayer.cpp#l1.113

Ed Morley [:emorley]

Comment 230

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/1d09d60b4bc1

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

13 years ago

Keywords: intermittent-failure

Nobody; OK to take it and work on it

Updated

•

13 years ago

Whiteboard: [orange][spdy] → [spdy]

Comment hidden (Legacy TBPL/Treeherder Robot)

u408661

Updated

•

12 years ago

Depends on: 840625

test instrumentation 13 years ago u408661 5.62 KB, patch		Details \| Diff \| Splinter Review
script to run test until failure occurs 13 years ago u408661 630 bytes, application/x-sh		Details
Do not do TLS intolerance handshake timeout after we know we've received the server hello. 13 years ago Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 7.00 KB, patch		Details \| Diff \| Splinter Review
Do not do TLS intolerance handshake timeout after we know we've received the server hello [v2] 13 years ago Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 9.57 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review