Closed Bug 865314 Opened 11 years ago Closed 8 years ago

ssl parallelism to new host restricted to 1

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla49

Tracking Flags:

Tracking

Status

firefox49

---

fixed

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 6 obsolete files)

patch 0 11 years ago Patrick McManus [:mcmanus] 4.55 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review
patch 1 11 years ago Patrick McManus [:mcmanus] 7.71 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review
ssl parallelism for some cert handshakes limited to 1 10 years ago Patrick McManus [:mcmanus] 11.20 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review
ssl parallelism to new host should not be 1 10 years ago Patrick McManus [:mcmanus] 21.11 KB, patch	u408661 : review+	Details \| Diff \| Splinter Review
ssl parallelism to new host should not be 1 10 years ago Patrick McManus [:mcmanus] 20.41 KB, patch	u408661 : review+	Details \| Diff \| Splinter Review
interdiff 10 years ago Patrick McManus [:mcmanus] 3.01 KB, patch		Details \| Diff \| Splinter Review
allow 6 parallel connects to new https origin 8 years ago Patrick McManus [:mcmanus] 27.03 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review

Patrick McManus [:mcmanus]

Assignee

Description

•

11 years ago

on first connect to a new ssl host we restrict the parallelism to 1 until the ssl handshake is complete. We do that in order to identify spdy hosts (which won't use the parallelism).

In retrospect that is optimizing for the wrong thing. It is designed to avoid connection terminations but what we really want to avoid is un-necessary parallel active connections. To fix this we can just lift the restrict-to-1 restriction in the case of hosts which aren't known to speak spdy (hosts known to speak that will still have it in place). If we should happen to have more than 1 active spdy connection to the same host then we will just gracefully shut down the extra ones.

In practice I expect any extra connections to carry 0 transactions - because the first one to establish will be able to dispatch the whole queue pretty much immediately due to the muxxing nature of spdy. but HTTP/1 connections will get more parallelism and therefore better latency properties.

There is a significant side effect here. The first SSL connection to a host is also the one that will trigger OCSP (which is afterwards cached) and also establish resumable SSL sessions. So with this change we can expect to see more OCSP and non-resumed handshakes - though it should all be done in parallel and I would expect not actually add latency.

https://tbpl.mozilla.org/?tree=Try&rev=02d84ac9e07f

Patrick McManus [:mcmanus]

Assignee

Comment 1

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=c3659ec532e4

Patrick McManus [:mcmanus]

Assignee

Comment 2

•

11 years ago

Attached patch patch 0 (obsolete) — Details — Splinter Review

jason r+'d this in person

Attachment #741599 - Flags: review+

Patrick McManus [:mcmanus]

Assignee

Comment 3

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/96a806212cac

Phil Ringnalda (:philor)

Comment 4

•

11 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/83a790e5acd8 as the apparent cause of fairly frequent Linux failures in test_spdy.js, like https://tbpl.mozilla.org/php/getParsedLog.php?id=22224595&tree=Mozilla-Inbound (2 in 10 frequent in the retriggers on your push on Linux64 opt, which seems to have been the most failure-prone).

Phil Ringnalda (:philor)

Comment 5

•

11 years ago

Interestingly enough it also seems, probably, perhaps, I think maybe, to have moved the heinous complex of Android shutdown certificate crashes in bug 761987 and friends away from being shutdown crashes and into being earlier on during test crashes. I triggered a crapload of Android jsreftests on your push in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=96a806212cac and got several of those in-a-test ones; I triggered a crapload on your parent push in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=ba1bd4eccd7c and I'm quite looking forward to seeing whether I can manage to not get any at all there.

Phil Ringnalda (:philor)

Comment 6

•

11 years ago

And indeed, all of the bug 761987 crashes on your parent were during shutdown, so somehow that patch made Android both more likely to crash @nssCertificate_Destroy and made it do it during test runs instead of only at shutdown.

Patrick McManus [:mcmanus]

Assignee

Comment 7

•

11 years ago

dougt, did you say there was a promising mismatched allocator patch for 761987? (see comments 5 and 6 in this bug)

I want to repush this patch when I figure out comment 4, but I'd hate to spread out some orange that at least is centralized right now. The patch will give us a latency win in setting up parallel http/1 ssl connections.

Flags: needinfo?(doug.turner)

Doug Turner (:dougt)

Comment 8

•

11 years ago

I was hopeful that it was bug 850332 which landed a week ago.

Flags: needinfo?(doug.turner)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 9

•

11 years ago

(In reply to Patrick McManus [:mcmanus] from comment #7)
> dougt, did you say there was a promising mismatched allocator patch for
> 761987? (see comments 5 and 6 in this bug)
> 
> I want to repush this patch when I figure out comment 4, but I'd hate to
> spread out some orange that at least is centralized right now. The patch
> will give us a latency win in setting up parallel http/1 ssl connections.

There must be a race condition in the certificate validation logic. I will fix it as soon as I can find it, which I am going to start on today.

Patrick McManus [:mcmanus]

Assignee

Updated

•

11 years ago

Depends on: 866867

Patrick McManus [:mcmanus]

Assignee

Updated

•

11 years ago

Depends on: 761987

Patrick McManus [:mcmanus]

Assignee

Comment 10

•

11 years ago

Attached patch patch 1 (obsolete) — Details — Splinter Review

updated for whitespace and bitrot

Attachment #741599 - Attachment is obsolete: true

Attachment #743233 - Flags: review+

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

11 years ago

Comment 11

•

10 years ago

Attached patch ssl parallelism for some cert handshakes limited to 1 (obsolete) — Details — Splinter Review

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Attachment #8368640 - Flags: review+

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Attachment #743233 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Assignee

Comment 12

•

10 years ago

the new patch is just updated for bitrot.

there is a try build over here
https://tbpl.mozilla.org/?tree=Try&rev=9532f191d24e

I triggered ~20 JS1 runs on android 4.. and got 2 free arenalist shutdown crashes from it.

I more or less read that as situation unchanged - what do you think brian?

Flags: needinfo?(brian)

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Depends on: 963317

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 13

•

10 years ago

Thanks Patrick. We need to fix the underlying issue. I will work with cviecco and keeler to solve it.

FWIW, I suspect the main problem is that the socket transport service thread is not implementing the nsNSSShutDownObject interface, and thus it doesn't know to stop all the SSL threads and avoid starting up new ones when NSS shuts down. If there are any networking team people available to work on that particular part of the problem, please let me know.

Depends on: 934902

Flags: needinfo?(brian)

Patrick McManus [:mcmanus]

Assignee

Comment 14

•

10 years ago

i'd be happy to help myself

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

10 years ago

Depends on: 871575

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 15

•

10 years ago

(In reply to Patrick McManus [:mcmanus] from comment #12)
> I triggered ~20 JS1 runs on android 4.. and got 2 free arenalist shutdown
> crashes from it.
> 
> I more or less read that as situation unchanged - what do you think brian?

Yesterday I ran 775 runs of jreftest J1 and there were 3 such crashes on Android 2.2 opt and 1 such crash on Android 4.0 opt. 4/775 = 5/1000 failures. Also, those failures occur intermittently even without this patch on a different try run. So, I don't think that the crash bugs need to block this bug from landing. What do you think, Patrick?

https://tbpl.mozilla.org/?tree=Try&rev=b70890f16240

No longer depends on: 871575, 934902, 963317

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Assignee

Comment 16

•

10 years ago

dkeeler landed something a few weeks ago that I wondered at the time if it would help with my experience in comment 12.. perhaps that is the improvement you see in comment 15.

let me think for a bit about what that was.. I don't recall exactly off the top of my head.

Flags: needinfo?(mcmanus)

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 17

•

10 years ago

I think it is bug 967629 that you are referring to.

Patrick McManus [:mcmanus]

Assignee

Comment 18

•

10 years ago

(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #17)
> I think it is bug 967629 that you are referring to.

yes - thank you.

imo - let's go forward with this right after we get a fresh ff31 tree, to avoid a crash landing.

Patrick McManus [:mcmanus]

Assignee

Comment 19

•

10 years ago

Attached patch ssl parallelism to new host should not be 1 (obsolete) — Details — Splinter Review

Attachment #8507384 - Flags: review?(hurley)

Patrick McManus [:mcmanus]

Assignee

Comment 20

•

10 years ago

this fell off my radar accidentally - we really want this patch.

I've un-bit rotted it and improved the logic for quickly activating transactions that are attached to secondary connections when a spdy connection is negotiated.

Patrick McManus [:mcmanus]

Assignee

Comment 21

•

10 years ago

https://tbpl.mozilla.org/?tree=Try&rev=0d9e5c76ef00

u408661

Comment 22

•

10 years ago

Comment on attachment 8507384 [details] [diff] [review]
ssl parallelism to new host should not be 1

Review of attachment 8507384 [details] [diff] [review]:
-----------------------------------------------------------------

This LGTM, but let's make sure we get a whole bunch of linux xpcshell runs before landing to make sure this isn't still susceptible to the intermittent orange that philor saw on the original patch (comment 4). I don't *think* it will be, but better safe than sorry.

Attachment #8507384 - Flags: review?(hurley) → review+

Patrick McManus [:mcmanus]

Assignee

Comment 23

•

10 years ago

try is complaining about an unused rv (debug only rv) on opt anyhow.. will respin and rerun tests.

Patrick McManus [:mcmanus]

Assignee

Comment 24

•

10 years ago

Your results will be at https://tbpl.mozilla.org/?tree=Try&rev=c709b8819aea

Patrick McManus [:mcmanus]

Assignee

Comment 25

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d4845054ea18

Patrick McManus [:mcmanus]

Assignee

Comment 26

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5fcda0c42793

Patrick McManus [:mcmanus]

Assignee

Comment 27

•

10 years ago

comments 25 and 26 are the checkin and backout of the patch from comment 21.

the version from comment 24 should have been commited - the only difference is a DebugOnly declaration that is needed to avoid breaking opt builds that error on unused variables.

Patrick McManus [:mcmanus]

Assignee

Comment 28

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/06acd829f970

Phil Ringnalda (:philor)

Comment 29

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/06acd829f970

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla36

Ryan VanderMeulen [:RyanVM]

Comment 30

•

10 years ago

Backed out for causing topcrashes. I'm also respinning nightlies.

https://hg.mozilla.org/mozilla-central/rev/da125623d9cb

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Target Milestone: mozilla36 → ---

u408661

Updated

•

10 years ago

Depends on: 1089638

u408661

Updated

•

10 years ago

Depends on: 1089749

Alice0775 White

Updated

•

10 years ago

Depends on: 1089766

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Version: 18 Branch → Trunk

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Updated

•

10 years ago

No longer depends on: 1089766

Daniel Veditz [:dveditz]

Updated

•

10 years ago

Depends on: 1090238

Patrick McManus [:mcmanus]

Assignee

Comment 31

•

10 years ago

Attached patch ssl parallelism to new host should not be 1 (obsolete) — Details — Splinter Review

Attachment #8528591 - Flags: review?(hurley)

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Attachment #8507384 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Attachment #8368640 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Assignee

Comment 32

•

10 years ago

Attached patch interdiff (obsolete) — Details — Splinter Review

this is the interdiff from the previously reviewed version to this one. fyi

Patrick McManus [:mcmanus]

Assignee

Comment 33

•

10 years ago

The previous version had code that dealt with what happens when we get a confirmed spdy/h2 session and still have other connections pending. It tried to migrate the transactions on those half open connections onto the new one.

That was too clever by half. The transactions are actually listed twice - once as a weak ref on the half open connection and once in the pending queue for the origin connection entry. Because the patch only looked at the former it allowed for the transaction to be executed twice.

I say it was too clever, because all that needed to be done was to Abandon() the half open connection. Because the transaction is still in the pending queue it will automatically be assigned to the new connection the next time the queues are considered (which happens at the end of the function).

The previous patch also had the same problem as I fixed in 1104993 (which led to me filing that bug).

u408661

Updated

•

10 years ago

Attachment #8528591 - Flags: review?(hurley) → review+

Patrick McManus [:mcmanus]

Assignee

Comment 34

•

10 years ago

given that this patch has proven tricky to land I'm going to wait to do so until gecko-37 is created.

Patrick McManus [:mcmanus]

Assignee

Comment 35

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/61ee2e053920

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Target Milestone: --- → mozilla37

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 36

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/61ee2e053920

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Resolution: --- → FIXED

Patrick McManus [:mcmanus]

Assignee

Comment 37

•

10 years ago

this caused crashes (bug 1069838) so I'll backout while its investigated.

m-i is closed right now - so the backout is pending.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Patrick McManus [:mcmanus]

Assignee

Comment 38

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/36d7afaaca18

Patrick McManus [:mcmanus]

Assignee

Comment 39

•

10 years ago

comment 39 is the backout of the comment 35 push

Keywords: leave-open

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 40

•

10 years ago

Backout merged to m-c: https://hg.mozilla.org/mozilla-central/rev/36d7afaaca18

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Depends on: 1073825

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Depends on: 1111217

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Keywords: leave-open

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Whiteboard: [necko-active]

Patrick McManus [:mcmanus]

Assignee

Comment 41

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6e9dcd726f3c

Patrick McManus [:mcmanus]

Assignee

Comment 42

•

8 years ago

failures from dec 14 include (comment 37 had a typo)

https://bugzilla.mozilla.org/show_bug.cgi?id=1089638
which was probably a dup of
https://bugzilla.mozilla.org/show_bug.cgi?id=1111217
which in turn was fixed by cset
https://hg.mozilla.org/integration/mozilla-inbound/rev/511d0d709406

none of that is directly caused by 865314 but because cancel's of h2 spiked with this patch, the above mentioned crashes spiked too. Indeed you can find > 1000 crashes related to 1111217 in the last 4 weeks of crash stats data, but they are pretty much limited to esr 31 (predates the fix).

So nothing should prevent landing this patch, except brad lassey ran a try build of 865314 plus 1111217 and experienced two crashes that had useless stacks. Now useless stacks have not really been a signature part of this bug, so I suspect that perhaps it was more indicative of something about try (or an ephemeral unrelated issue).

Patrick McManus [:mcmanus]

Assignee

Comment 43

•

8 years ago

hey brad, in the past you've had a knack for making this patch crash. I feel like its ready to reland but we still have an unresolved issue from 1111217 where you had a couple crashes with useless stacks that weren't explained. Enough water under the bridge to retest I think. Could you run it for a day and see if its stable for you?

there is a try build in comment 41

the try builds sets network.http.connection-retry-timeout to 1 just to make sure to exercise some code on the try tests. Feel free to set it back to 250 (which is how it will land) or just enjoy the background connections!

Flags: needinfo?(blassey.bugs)

Brad Lassey [:blassey] (use needinfo?)

Comment 44

•

8 years ago

No crashes, but the last two mornings this build has been completely hung. Unfortunately I don't have symbols so I can't get a stack for where we're hung.

Flags: needinfo?(blassey.bugs)

Patrick McManus [:mcmanus]

Assignee

Comment 45

•

8 years ago

Attached patch allow 6 parallel connects to new https origin — Details — Splinter Review

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Attachment #8751206 - Flags: review+

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Attachment #8528594 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Attachment #8528591 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Target Milestone: mozilla37 → mozilla49

Patrick McManus [:mcmanus]

Assignee

Comment 46

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b1e8d6cf54e6e3f46aa64872b3f376cc69c7effe
Bug 865314 - allow 6 parallel connects to new https origin r=hurley

Carsten Book [:Tomcat]

Comment 47

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b1e8d6cf54e6

Status: REOPENED → RESOLVED

Closed: 10 years ago → 8 years ago

status-firefox49: --- → fixed

Resolution: --- → FIXED

Patrick McManus [:mcmanus]

Assignee

Updated

•

8 years ago

Depends on: 1275917

You need to log in before you can comment on or make changes to this bug.