1359697 - Don't poll for captive portal detection

Reporter

Description

•

9 years ago

I got a new Mac. It complained that Nightly was using lots of power. As well as finding all the usual suspects (add-ons, Test Pilot), I noticed that we were hitting the captive portal endpoint every few minutes. This was happening: - When Firefox was backgrounded. - With no page loads happening. - On a stable wireless network on which this machine has spent 100% of its life. - On an OS with its own reliable captive portal detection. This is just bad enough to keep the machine from going into a low power state, but just useless enough that it's statistically unlikely to beat the OS (or the user!) to detecting a captive portal. To do that we'd need to poll very frequently. Can someone explain why we poll at all? Is this to catch the hotel 24-hour logout case? Possible solutions: - Don't poll; wait for OS network transition events. - Don't poll when Firefox is not the foreground application. - Don't do captive portal detection on Mac OS or modern Windows by default; they have good captive portal detection systems already. - Don't poll on a separate schedule; instead coalesce these polls with a failed telemetry upload, blocklist ping, update check, or one of the countless other network hits we do.

Patrick McManus [:mcmanus]

Comment 1

•

9 years ago

after the initial check passes (i.e. you're not on CP), the algorithm is not expected to continue to poll unless something else suspicious is observed (cert failures iirc).. valentin will take a look. he'll probably need logs.

Flags: needinfo?(valentin.gosu)

Patrick McManus [:mcmanus]

Updated

•

9 years ago

Assignee: nobody → valentin.gosu

Whiteboard: [power] → [power][necko-active]

Richard Newman [:rnewman]

Reporter

Comment 2

•

9 years ago

Definitely happening frequently. I have about 10 sitting in the log right now, after I restarted to flip the pref back. The stack for each is the same: URLFetcher jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:63:3 _startDetection jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:312:22 finishPreparation jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:282:5 _applyDetection jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:301:7 _runNextRequest jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:408:7 _addRequest jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:416:7 checkCaptivePortal jar:file:///Applications/FirefoxNightly.app/Contents/Resources/omni.ja!/components/captivedetect.js:266:5 Let me know what you need, Valentin.

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox55: --- → affected

status-firefox57: affected → ---

Valentin Gosu [:valentin]

Assignee

Comment 3

•

9 years ago

So, the polling logic is like this: The initial polling time is network.captive-portal-service.minInterval If nothing happens to trigger a recheck, this time exponentially grows to network.captive-portal-service.maxInterval So we start polling every 60s the first 10 times, then every 5 minutes 10 more times, and so on until we reach the maxInterval. If something happens that calls RecheckCaptivePortal - redirect to local IP address, cert errors, coming back from stand-by, or network state changes, we start polling using the minInterval. I agree with you that maybe this isn't the most energy efficient way of checking. I am open to suggestions on how to improve this. My first ideas would be to add a pref for how quickly we grow the polling interval. (In reply to Richard Newman [:rnewman] from comment #0) > were hitting the captive portal endpoint every few minutes. Do you know how much this affects your battery life? Does it cut it down by half? 10%? less? > - When Firefox was backgrounded. We would like the CP portal state to be detected by the time the user brings Firefox to the foreground. I don't know if we can do this without polling. > - On an OS with its own reliable captive portal detection. Some CP implementations block the native Mac/Windows detection endpoints, so they don't get detected properly. > This is just bad enough to keep the machine from going into a low power > state, but just useless enough that it's statistically unlikely to beat the > OS (or the user!) to detecting a captive portal. To do that we'd need to > poll very frequently. Our current behaviour is a compromise, to avoid polling too frequently, but still allow detection to happen before a user needs to navigate to something. > Can someone explain why we poll at all? Is this to catch the hotel 24-hour > logout case? Yes. Some CPs lock you out after 30 minutes, or 1 hour, etc. There is currently no way of knowing how much time the user has left. > Possible solutions: > > - Don't poll; wait for OS network transition events. We trigger detection when network changes happen. > - Don't poll when Firefox is not the foreground application. This would reduce the usefulness of our detection. The polling is specifically used to address the background case, when the user isn't actively using the network. > - Don't do captive portal detection on Mac OS or modern Windows by default; This would reduce its usefulness. There are corner cases which Win/Mac doesn't catch, but we do. Figuring out a better strategy seems better than simply disabling it. > - Don't poll on a separate schedule; instead coalesce these polls with a > failed telemetry upload, blocklist ping, update check, or one of the > countless other network hits we do. We kind of do this already. If a background connection happens, and it fails (cert error or redirect) we trigger detection immediately.

Flags: needinfo?(valentin.gosu)

Patrick McManus [:mcmanus]

Comment 4

•

9 years ago

my understanding was that if no CP was ever detected in the session (based on the initial check) we would only poll again based on some kind of signal such as link change or maybe cert fail. but there wouldn't be a polling interval at all. That was for exactly the reason in this bug. That's different than a session that has seen a CP - inevitably its going to have to do some polling.

Richard Newman [:rnewman]

Reporter

Comment 5

•

9 years ago

(In reply to Valentin Gosu [:valentin] from comment #3) > I agree with you that maybe this isn't the most energy efficient way of > checking. I am open to suggestions on how to improve this. Perhaps a better option is to not reduce the polling interval on most of these? That is, it's fine to _bring forward_ the next check if I switch networks or return from idle, but just because I returned from idle doesn't mean we should start polling every minute. For many users that'll effectively cause them to spend most of their time polling every minute. I'm not really sure what the point of frequent initial polling is. Isn't it enough to poll every 30 minutes, and also to poll when the user foregrounds Firefox + shortly after waking from sleep + shortly after a network change? IMO a successful poll should _immediately_ kick us into the longest polling duration. There's no point going "yup, no captive portal! better double-check in 60 seconds". I also wonder if there's a return-from-standby/idle/idle timer bug here. > My first ideas > would be to add a pref for how quickly we grow the polling interval. I don't think adding a pref really helps; users will not open about:config, they'll simply switch away from Firefox when Apple suggests to them that their battery will last four hours longer if they use Safari. I felt a strong temptation to do so myself. Captive portal polling is, I'm sure, not the main offender, but every little helps. > Do you know how much this affects your battery life? Does it cut it down by > half? 10%? less? It's hard to measure. A Firefox with a handful of static pages open in tabs is still often present in macOS's "apps using significant power" list. Certainly polling once per minute or two over the ~5h I'd get out of a 15" MBP will put the wlan card into a high-power mode quite often, and unnecessarily. More importantly, this is just about the worst possible timing for a user tethered to a cellular device: if I were idle on my iPhone's hotspot, Firefox would _kill_ my phone's battery, by keeping it in high-power radio mode for 25-50% of the time just to poll. > We would like the CP portal state to be detected by the time the user brings > Firefox to the foreground. I don't know if we can do this without polling. Isn't it sufficient to check as soon as Firefox is foregrounded? I mean, the user isn't going to interact with the CP page without switching to the browser, and milliseconds after foreground is just as good as a minute before. Perhaps a reasonable compromise, as I suggest above, is using a longer polling interval plus event-based checks. > > - Don't poll on a separate schedule; instead coalesce these polls with a > > failed telemetry upload, blocklist ping, update check, or one of the > > countless other network hits we do. > > We kind of do this already. If a background connection happens, and it fails > (cert error or redirect) we trigger detection immediately. OK, so this is even more evidence that rapid polling is unnecessary. Firefox Sync runs every few minutes, plus on network state changes. Telemetry uploads, blocklist ping, GMP updates, safe browsing downloads, etc. all run, too. It's tempting to say "easy, just wake up and see if we recently made a successful network request, and skip this check". However, ideally a solution doesn't even involve waking up to figure out if we should poll -- that might be enough to kick this big quad-core CPU out of a low-power mode, which is also going to affect my battery life. Do you have telemetry to suggest how often a timed captive portal detection found a CP? Do you have telemetry to suggest what proportion of those would not have been found in time by one of the other mechanisms? My hypothesis is that between background operations like Sync, a long-timer poll, and checks based on user activity and network transitions, the short-timer poll doesn't really help.

Comment hidden (mozreview-request)

Ben Kelly [:bkelly, not reviewing]

Comment 7

•

9 years ago

We shipped captive portal stuff on FF53, right? Will the changes here be upliftable to FF54?

Patrick McManus [:mcmanus]

Comment 8

•

9 years ago

mozreview-review

Comment on attachment 8864445 [details] Bug 1359697 - Make CaptivePortalService not poll for status when it is in the NOT_CAPTIVE state https://reviewboard.mozilla.org/r/136124/#review139352

Attachment #8864445 - Flags: review?(mcmanus) → review+

Valentin Gosu [:valentin]

Assignee

Comment 9

•

9 years ago

(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #7) > We shipped captive portal stuff on FF53, right? Will the changes here be > upliftable to FF54? I think so. We haven't made a lot of changes to the CaptivePortalService since then.

Pulsebot

Comment 10

•

9 years ago

Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/8a1206836aa4 Make CaptivePortalService not poll for status when it is in the NOT_CAPTIVE state r=mcmanus

Carsten Book [:Tomcat]

Comment 11

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8a1206836aa4

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox55: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Valentin Gosu [:valentin]

Assignee

Comment 12

•

9 years ago

Comment on attachment 8864445 [details] Bug 1359697 - Make CaptivePortalService not poll for status when it is in the NOT_CAPTIVE state Approval Request Comment [Feature/Bug causing the regression]: Initial captive portal impl - bug 1048131 [User impact if declined]: Unnecessary network requests will be made leading to extra battery use and extra load on our endpoint. [Is this code covered by automated tests?]: No. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: Yes. Start wireshark on a normal network. Start firefox. Keep watching for 10 minutes. Only one or two requests to the captive portal URL should be made. We should also check the main use cases for captive portal detection. [List of other uplifts needed for the feature/fix]: none. [Is the change risky?]: no. [Why is the change risky/not risky?]: This is a fairly simple change. [String changes made/needed]: none

Attachment #8864445 - Flags: approval-mozilla-beta?

Attachment #8864445 - Flags: approval-mozilla-aurora?

Valentin Gosu [:valentin]

Assignee

Updated

•

9 years ago

Blocks: 1363651

Gerry Chang [:gchang]

Comment 13

•

9 years ago

Hi Brindusa, could you help find someone to verify if this issue was fixed as expected on a latest Nightly build? Thanks!

status-firefox54: --- → affected

Flags: qe-verify+

Flags: needinfo?(brindusa.tot)

Adrian Florinescu [:aflorinescu]

Comment 14

•

9 years ago

(In reply to Gerry Chang [:gchang] from comment #13) > Hi Brindusa, could you help find someone to verify if this issue was fixed > as expected on a latest Nightly build? Thanks! I can verify this fix. (In reply to Valentin Gosu [:valentin] from comment #12) > [Needs manual test from QE? If yes, steps to reproduce]: > Yes. Start wireshark on a normal network. Start firefox. Keep watching for > 10 minutes. Only one or two requests to the captive portal URL should be > made. > We should also check the main use cases for captive portal detection. Valentin, could you please help me out with some hints on what to look for in wireshark? I'm not that experienced at using it. Also, I'd probably include a very brief smoke on the detection on Win/Linux/Mac, since we know that the behavior of the CP varies with the OS. Please let me know if you think this is unnecessary work for this fix.

Flags: needinfo?(valentin.gosu)

Flags: needinfo?(brindusa.tot)

Flags: needinfo?(adrian.florinescu)

Valentin Gosu [:valentin]

Assignee

Comment 15

•

9 years ago

(In reply to Adrian Florinescu [:AdrianSV] from comment #14) > > We should also check the main use cases for captive portal detection. > Valentin, could you please help me out with some hints on what to look for > in wireshark? I'm not that experienced at using it. Filter by http requests, there shouldn't be too many of them. I think you can even apply a filter such as http.host=="detectportal.firefox.com" and it should only show you the requests that interest you. Make sure there are no more than 2 requests when starting Firefox, and that no other requests are made when NOT in a captive portal. Make sure to use a clean profile just in case. > Also, I'd probably include a very brief smoke on the detection on > Win/Linux/Mac, since we know that the behavior of the CP varies with the OS. > Please let me know if you think this is unnecessary work for this fix. I think only one platform should be enough.

Flags: needinfo?(valentin.gosu)

Liz Henry (:lizzard)

Comment 16

•

9 years ago

Sounds like this affects 53, let's track it.

status-firefox53: --- → affected

tracking-firefox53: --- → +

tracking-firefox54: --- → +

Liz Henry (:lizzard)

Comment 17

•

9 years ago

Valentin, if you want to request uplift to mozilla-release here we can try to include this fix in the upcoming 53.0.3 release.

Flags: needinfo?(valentin.gosu)

Benson Wong [:mostlygeek]

Updated

•

9 years ago

Comment 18

•

9 years ago

Comment on attachment 8864445 [details] Bug 1359697 - Make CaptivePortalService not poll for status when it is in the NOT_CAPTIVE state Based on conversation on irc, email, and with :mcmanus let's just take this patch on m-r for the 53 dot release as it sounds pretty safe, and should decrease the frequency that we are polling.

Attachment #8864445 - Flags: approval-mozilla-release+

Attachment #8864445 - Flags: approval-mozilla-beta?

Attachment #8864445 - Flags: approval-mozilla-beta+

Attachment #8864445 - Flags: approval-mozilla-aurora?

Attachment #8864445 - Flags: approval-mozilla-aurora+

Liz Henry (:lizzard)

Updated

•

9 years ago

Attachment #8864445 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora-

Liz Henry (:lizzard)

Comment 19

•

9 years ago

I think this also would affect esr52, but I think the fix could wait for the next ESR (52.2.0) in June.

status-firefox-esr52: --- → ?

tracking-firefox-esr52: --- → ?

Liz Henry (:lizzard)

Comment 20

•

9 years ago

Tomcat, can you land this when you get a chance? Julien, this should land for the dot release.

Flags: needinfo?(jcristau)

Flags: needinfo?(cbook)

Carsten Book [:Tomcat]

Comment 21

•

9 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #20) > Tomcat, can you land this when you get a chance? Julien, this should land > for the dot release. will do it first thing today

Flags: needinfo?(cbook)

Carsten Book [:Tomcat]

Comment 22

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/029eee8618cf

status-firefox54: affected → fixed

Carsten Book [:Tomcat]

Comment 23

•

9 years ago

uplift

https://hg.mozilla.org/releases/mozilla-release/rev/1fe643b3421a27640a61a3b2b8b0986c7234e314

Julien Cristau [:jcristau]

Updated

•

9 years ago

Flags: needinfo?(jcristau)

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox53: affected → fixed

status-firefox-esr52: ? → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Attachment #8864445 - Flags: approval-mozilla-esr52?

Adrian Florinescu [:aflorinescu]

Comment 24

•

9 years ago

Verified the fix on: Windows 10 x64 using: FF Version 53.0.3 20170518000419 FF Version 55.0a1 20170518100156 Also checked basic captive portal detection and everything seems in good shape.

status-firefox53: fixed → verified

status-firefox55: fixed → verified

Flags: needinfo?(adrian.florinescu)

Julien Cristau [:jcristau]

Comment 25

•

9 years ago

Comment on attachment 8864445 [details] Bug 1359697 - Make CaptivePortalService not poll for status when it is in the NOT_CAPTIVE state let's get this fix in esr52 (default branch, for 52.2esr)

Attachment #8864445 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Julien Cristau [:jcristau]

Updated

•

9 years ago

tracking-firefox-esr52: ? → 54+

Ryan VanderMeulen [:RyanVM]

Comment 26

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/ca2b5274549f

status-firefox-esr52: affected → fixed

Julien Cristau [:jcristau]

Comment 27

•

9 years ago

Included in 53.0.3 relnotes.

relnote-firefox: --- → 53+

Bogdan Maris, Desktop Test Engineering

Comment 28

•

8 years ago

I also verified this issue using Firefox 54 beta 10 on Windows 10 64 bit. I could only see 2 requests from http.host=="detectportal.firefox.com" in Wireshark, using bad build 53.0.2 I could see 11 requests.

status-firefox54: fixed → verified

Flags: qe-verify+

Valentin Gosu [:valentin]

Assignee

Updated

•

8 years ago

Flags: needinfo?(valentin.gosu)