1157301 - Reconsider TCP keep-alive policies for mobile

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Reporter

Description

•

10 years ago

It has come to our attention that TCP keep-alives for persistent HTTP connections are significantly impacting our power consumption. We need to reconsider our behavior here, taking mobile radio performance into account. I think there are a lot of options, among them:

1) Modify the keep-alive interval (currently 10s). Have different intervals depending on the type of interface used (WAN vs. Wi-Fi).
2) Don't run keep-alives at all if the app or tab is backgrounded.
3) Kill keep-alive entirely on mobile.
3a) Maybe keep the short-lived ones?
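A minimal sketch of how options 1 and 2 could be expressed as a policy, written in Python for illustration only. The names `KeepalivePolicy`, `keepalive_interval`, and `apply_keepalive` are hypothetical, and the 60s WAN value is an assumed placeholder; only the 10s figure comes from the description. This is not how Gecko implements keep-alives.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeepalivePolicy:
    # Illustrative values. Only the 10s interval is taken from the
    # description; the WAN value is an assumption.
    wifi_interval_s: int = 10
    wan_interval_s: int = 60
    disable_when_backgrounded: bool = True


def keepalive_interval(policy: KeepalivePolicy, interface: str,
                       backgrounded: bool) -> Optional[int]:
    """Return the keep-alive interval in seconds, or None to disable."""
    if backgrounded and policy.disable_when_backgrounded:
        return None  # option 2: no keep-alives while backgrounded
    if interface == "wan":
        return policy.wan_interval_s  # option 1: longer interval on cellular
    return policy.wifi_interval_s


def apply_keepalive(sock: socket.socket, interval_s: Optional[int]) -> None:
    """Apply the chosen interval to a TCP socket."""
    if interval_s is None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)
        return
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE / TCP_KEEPINTVL are platform-specific (present on Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, interval_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
```

Keeping the decision (`keepalive_interval`) separate from the socket call makes it possible to re-evaluate the policy on a foreground/background or network-change event without touching the socket code.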

Aaron Train [:aaronmt]

Updated

•

10 years ago

tracking-fennec: --- → ?

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

10 years ago

tracking-fennec: ? → 40+

Patrick McManus [:mcmanus]

Comment 1

•

10 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #0)
> It has come to our attention that TCP keep-alives for persistent HTTP
> connections are significantly impacting our power consumption.

It's always hard to quantify this, so I'm curious what you're looking at. In particular, this only happens during the idle period of a socket, and idle sockets get timed out after ~3 minutes max, IIRC.

FWIW, you probably don't want to disable this; it leads to frustrating hanging behavior. Instead, the knobs you would use would be to make keep-alives a little less frequent (up to the order of a minute on the outside) and to just plain close idle connections more aggressively (they currently stay idle on the order of 3 minutes). sworkman can help with config.
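As a rough back-of-envelope for the numbers in this thread, the sketch below counts keep-alive probes sent on one idle connection before it is closed. It assumes a simplified model (one probe per interval for as long as the connection sits idle, every probe answered, no retransmissions); `probes_while_idle` is a hypothetical helper, and the 10s and ~3 minute figures are the ones quoted in the comments, not measured values.

```python
def probes_while_idle(idle_timeout_s: int, interval_s: int) -> int:
    """Number of keep-alive probes sent before an idle connection is closed.

    Simplified model: one probe every `interval_s` seconds of idleness
    until the connection is closed after `idle_timeout_s` seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return idle_timeout_s // interval_s


# 10s interval, connection closed after ~3 minutes idle
print(probes_while_idle(180, 10))  # 18
# Interval raised to about a minute
print(probes_while_idle(180, 60))  # 3
```

Under this model, both knobs mentioned above reduce the count: a longer interval divides it, and a shorter idle timeout caps it.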

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Reporter

Comment 2

•

10 years ago

Steve, what do you think?

Flags: needinfo?(sworkman)

Patrick McManus [:mcmanus]

Comment 3

•

10 years ago

(In reply to Patrick McManus [:mcmanus] from comment #1)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #0)
> > It has come to our attention that TCP keep-alives for persistent HTTP
> > connections are significantly impacting our power consumption.
>
> its always hard to quantify this - so I'm curious what you're looking at. In
> particular this only happens on the idle period of a socket, and they get
> timed out after ~3 minutes max iirc..

I am curious about this part, because there certainly are moments in time that, if extrapolated, have a bad payoff, but it doesn't seem like they should actually add up to much over time.

Steve Workman [:sworkman] (INACTIVE)

Comment 4

•

10 years ago

Hi James!

There has been an obvious tradeoff here between power consumption and reliability of the TCP connections. I would echo Pat's question: where and how have you observed that keepalives are causing significant power loss? Note: I'm not saying that they aren't, because they very well could be, but deciding how to move forward depends on what test was being run.

And then I'd want to know how often the scenario(s) in the test are likely to come up. E.g., how often are long-lived connections made for the general user of Fennec? Do they keep Fennec from being closed in the background? (I might have misunderstood Android's process management here, but they evict background apps after a while, right?)

The ideas presented in the description might work very well. In order to balance power consumption and reliability, however, it's going to be necessary to have the data about which scenarios are the most costly and, like I just wrote, how often those scenarios are likely to come up. So, create some tests, play with the prefs, and gather some data from telemetry about long-lived vs. short-lived connections, effects of eviction, etc.

Flags: needinfo?(sworkman)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Reporter

Comment 5

•

10 years ago

(In reply to Steve Workman [:sworkman] (please use needinfo) from comment #4)
> Hi James!
>
> There has been an obvious tradeoff here between power consumption and
> reliability of the TCP connections. I would echo Pat's question: Where and
> how have you observed that keepalives are causing significant power loss.
> Note: I'm not saying that they aren't, because they very well could be, but
> deciding how to move forward depends on what test was being run.

I forwarded what we have. It's under NDA, so can't post here.

> And then I'd want to know how often the scenario(s) in the test are likely
> to come up. E.g. how often are long-lived connections made for the general
> user of Fennec? Do they keep Fennec from being closed in the background? -
> (I might have misunderstood Android's process management here, but they
> evict background apps after a while, right?)

Android does evict background apps when it needs to free memory. In theory we could (and do, on some devices) run a very long time. That's why I think we should perhaps consider killing keep-alives when the app is backgrounded (which we already have events for...and maybe we do this already?).

> The ideas presented in the description might work very well. In order to
> balance power consumption and reliability, however, it's going to be
> necessary to have the data about which scenarios are the most costly. And
> like I just wrote, how often those scenarios are likely to come up. So,
> create some tests, play with the prefs and gather some data from telemetry
> about long-lived vs short-lived connections, effects of eviction etc.

Yeah. We don't really have the ability to do these power observations, though, so that makes this difficult. I like the idea about telemetry for long-lived connections. Do we have that already, or do we need to add it?

Flags: needinfo?(sworkman)

Steve Workman [:sworkman] (INACTIVE)

Comment 6

•

10 years ago

I'm unsure about that. I don't see anything in Histograms.json with "TCP_" or "HTTP_" in its name that looks like a measurement of long-lived connections. Pat can verify that.

Flags: needinfo?(sworkman) → needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 7

•

10 years ago

We don't have anything related to time. We have some histograms that count the number of transactions on a connection, but that can't tease apart the different patterns.
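For illustration, a time-based measurement of the kind discussed here could bucket connection lifetimes so that short-lived and long-lived patterns show up separately. The sketch below is plain Python, not Gecko telemetry code; the bucket bounds and the names `bucket_label` and `lifetime_histogram` are assumptions made up for this example.

```python
import bisect
from collections import Counter
from typing import Iterable

# Hypothetical bucket lower bounds, in seconds.
BUCKET_BOUNDS_S = [1, 10, 60, 180, 600, 3600]


def bucket_label(lifetime_s: float) -> str:
    """Map a connection lifetime to the label of the bucket it falls in."""
    # bisect_right counts how many bounds are <= lifetime_s.
    idx = bisect.bisect_right(BUCKET_BOUNDS_S, lifetime_s)
    if idx == 0:
        return f"<{BUCKET_BOUNDS_S[0]}s"
    return f">={BUCKET_BOUNDS_S[idx - 1]}s"


def lifetime_histogram(lifetimes_s: Iterable[float]) -> Counter:
    """Count connections per lifetime bucket."""
    return Counter(bucket_label(t) for t in lifetimes_s)
```

For example, `lifetime_histogram([0.5, 12, 200, 200])` counts one connection under `"<1s"`, one under `">=10s"`, and two under `">=180s"`.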

Flags: needinfo?(mcmanus)

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-fennec: 40+ → +

tracking-p11: --- → +

Patrick McManus [:mcmanus]

Updated

•

9 years ago

Whiteboard: [necko-backlog]

Firefox Bug Husbandry Bot

Comment 8

•

8 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P1

Firefox Bug Husbandry Bot

Comment 9

•

8 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: P1 → P3

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Categories: Core :: Networking: HTTP, defect, P3

People: Reporter: snorp, Unassigned

Details: Whiteboard: [necko-backlog]

Security: public
