1152264 - Push API constantly doing requests

Reporter

Description

•

11 years ago

This is the second device getting into my hands with the same symptom: logcat spam about push.services.mozilla.com. Looking at the prefs, in both cases, I had:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
> user_pref("services.push.adaptive.mobile", "mobile-208-15");
> user_pref("services.push.pingInterval", 0);
> user_pref("services.push.pingInterval.mobile", 0);
> user_pref("services.push.pingInterval.wifi", 0);
In all cases, this was on a device that had been upgraded and had, at some point, been on 2.2. Looks like we may have a bad regression at some point. After removing those prefs, the device stops doing constant requests, and I could not notice any broken feature (tested with Find My Device).

:gerard-majax

Reporter

Updated

•

11 years ago

Component: General → DOM: Push Notifications

Product: Firefox OS → Core

Gregor Wagner [:gwagner]

Comment 1

•

11 years ago

Can QA reproduce?

Keywords: qawanted

Pi Wei Cheng [:piwei] (inactive)

Comment 2

•

11 years ago

Not sure what was being done here. Could you elaborate on what you did and your environmental variables? Device, branch (it says 'upgraded' to 2.2, was it OTA? were you on 2.2 and OTA'ed to 2.2?), what settings were enabled... anything specific that could help QA reproduce the issue.

Flags: needinfo?(lissyx+mozillians)

:gerard-majax

Reporter

Comment 3

•

11 years ago

(In reply to Pi Wei Cheng [:piwei] from comment #2)
> Not sure what was being done here. Could you elaborate on what you did and
> your environmental variables? Device, branch (it says 'upgraded' to 2.2, was
> it OTA? were you on 2.2 and OTA'ed to 2.2?), what settings were enabled...
> anything specific that could help QA reproduce the issue.

As I said, this happened on multiple devices. I cannot give more specifics, since I don't know when it started, I only know when I noticed ... On one device I noticed it while working on Find My Device on current master, on another device that is not mine, it was when the contributor brought it to me because there were a lot of weird issues. So the only common variable between both is that at some point, they got a 2.2 build. They got updated in different ways (sometimes OTA, sometimes flashing boot and system partition), and that should not have an impact. In both cases, those prefs got set. This is probably what you should track: when did those prefs get set ...

Flags: needinfo?(lissyx+mozillians)

Pi Wei Cheng [:piwei] (inactive)

Updated

•

11 years ago

Keywords: qawanted → steps-wanted

La Bonave

Comment 5

•

11 years ago

Hi, I ran into exactly the same problem as Alexandre (ZTE Open C French, with 2.2 OTA'ed multiple times). After noticing serious data usage on mobile and Wi-Fi (roughly a week ago), I first updated B2G 2.2 to an early April release (by sideloading), without any difference. In a shell, after a reboot and while I still had massive data exchange (and the battery dropping), I ran netstat, which only showed two HTTPS connections to an Amazon AWS instance that had a certificate issued to push.services.mozilla.com. I found this bug and removed the following lines from prefs.js:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
> user_pref("services.push.pingInterval", 0);
> user_pref("services.push.pingInterval.mobile", 0);
> user_pref("services.push.pingInterval.wifi", 0);
as well as the userAgentID one. I restarted B2G, and that solved the issue: data consumption dropped to a small 63 kB in 5 minutes instead of nearly 500 kB every 3-4 minutes.

Matěj Cepl

Comment 7

•

11 years ago

Yes, I have
user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
user_pref("services.push.adaptive.mobile", "mobile-230-02");
user_pref("services.push.pingInterval", 0);
user_pref("services.push.pingInterval.mobile", 0);
user_pref("services.push.pingInterval.wifi", 0);
user_pref("services.push.userAgentID", "220a9c0439954e5a89e30ec440a46d68");
What does it mean?

Matěj Cepl

Comment 8

•

11 years ago

(In reply to Matěj Cepl from comment #7)
> Yes, I have

After the removal of the preferences from /data/b2g/mozilla/*.default/prefs.js the connection looks more sane.
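For anyone checking a device for the same symptom, the following is a minimal sketch of scanning the text of a prefs.js file for the ping-interval prefs reported in this bug. The helper name findLowPingIntervals and the 60000 ms threshold are assumptions chosen for illustration; only the user_pref(...) line format and the pref names come from the reports above.

```javascript
// Hypothetical helper (not part of Gecko): report push ping-interval prefs
// whose numeric value is below minMs in the given prefs.js text.
const PREF_LINE = /^user_pref\("([^"]+)",\s*(-?\d+)\);$/;

function findLowPingIntervals(prefsText, minMs) {
  const suspicious = [];
  for (const rawLine of prefsText.split("\n")) {
    const match = PREF_LINE.exec(rawLine.trim());
    if (!match) continue; // string prefs, boolean prefs and comments are skipped
    const name = match[1];
    const value = Number(match[2]);
    const isPingPref =
      name.startsWith("services.push.pingInterval") ||
      name.startsWith("services.push.adaptive.lastGoodPingInterval");
    if (isPingPref && value < minMs) {
      suspicious.push({ name, value });
    }
  }
  return suspicious;
}

// Sample input built from values quoted in this bug.
const samplePrefs = [
  'user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);',
  'user_pref("services.push.adaptive.mobile", "mobile-230-02");',
  'user_pref("services.push.pingInterval", 0);',
  'user_pref("services.push.pingInterval.wifi", 648729);',
].join("\n");

// Assumed threshold of 60000 ms, for illustration only.
console.log(findLowPingIntervals(samplePrefs, 60000));
```

On the sample input this flags the two prefs set to 0 and ignores the string pref and the 648729 ms value.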

:gerard-majax

Reporter

Comment 9

•

11 years ago

Johan, this is the bug we talked about. I have no idea when this started, though.

Flags: needinfo?(jlorenzo)

Panos Astithas (he/him) [:past] (please ni?)

Comment 10

•

11 years ago

When I upgraded (manual build) from 2.3 to 3.0 I made a clean install, so it's not a case of 2.2 prefs surviving the upgrade. After that I've been using backup_restore_profile.py to migrate my data over an update.

Panos Astithas (he/him) [:past] (please ni?)

Comment 11

•

11 years ago

Just resetting the ping interval prefs from WebIDE seems to have fixed it for me.

Johan Lorenzo [:jlorenzo]

Comment 12

•

11 years ago

I couldn't find any exact steps to reproduce. However, I have some suspicions about the function called _calculateAdaptivePing[1]. If I understand the function correctly, there is a way to reduce the ping interval down to 0 if the Web Socket is still down after a certain number of retries. For instance: the default value of pingInterval is 180000 (3 minutes); after 3 retries * 17 iterations with the web socket down, Math.floor() will return 0.

> if (wsWentDown) {
>   debug('The WebSocket was disconnected, calculating next ping');
>
>   // If we have not tried this pingInterval yet, initialize
>   this._pingIntervalRetryTimes[lastTriedPingInterval] =
>     (this._pingIntervalRetryTimes[lastTriedPingInterval] || 0) + 1;
>
>   // Try the pingInterval at least 3 times, just to be sure that the
>   // calculated interval is not valid.
>   if (this._pingIntervalRetryTimes[lastTriedPingInterval] < 2) {
>     debug('pingInterval= ' + lastTriedPingInterval + ' tried only ' +
>       this._pingIntervalRetryTimes[lastTriedPingInterval] + ' times');
>     return;
>   }
>
>   // Latest ping was invalid, we need to lower the limit to limit / 2
>   nextPingInterval = Math.floor(lastTriedPingInterval / 2);

The function was implemented in bug 894879 and hasn't changed a lot since, so we might have had this issue since 2.0. Guillermo, Nikhil, do you think the given example scenario is plausible? If not, do you think we might decrease the ping more than we increase it over a long period of time?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#679
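The halving scenario in this comment can be checked with a minimal sketch. It models only the repeated Math.floor(interval / 2) step, not the retry counter or the rest of _calculateAdaptivePing, and the helper name halvingsUntilZero is hypothetical.

```javascript
// Count how many consecutive halvings with Math.floor() it takes for an
// interval (in ms) to reach 0 when no lower bound is enforced.
function halvingsUntilZero(startIntervalMs) {
  let interval = startIntervalMs;
  let halvings = 0;
  while (interval > 0) {
    interval = Math.floor(interval / 2);
    halvings += 1;
  }
  return halvings;
}

// 180000 ms is the default pingInterval mentioned in the comment.
console.log(halvingsUntilZero(180000)); // prints 18
```

Starting from 180000 ms, the value reaches 1 after 17 halvings and 0 on the 18th, so a long enough run of failures does drive the interval to 0 when nothing enforces a minimum.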

status-b2g-v2.0: --- → ?

status-b2g-v2.1: --- → ?

status-b2g-v2.2: --- → affected

status-b2g-master: --- → affected

Flags: needinfo?(willyaranda)

Flags: needinfo?(nsm.nikhil)

Flags: needinfo?(jlorenzo)

Comment 13

•

11 years ago

I started noticing this this week as well user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0); user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0); user_pref("services.push.adaptive.mobile", "mobile-234-30"); user_pref("services.push.pingInterval", 0); user_pref("services.push.pingInterval.mobile", 0); user_pref("services.push.pingInterval.wifi", 0); on my z3c

Dale Harvey (:daleharvey)

Comment 14

•

11 years ago

As far as I can tell, this was interfering with my ability to actually connect to data networks, I could only connect if I restarted and when I disabled data connection I could never reconnect. It also looks to have used 400MB of my roaming data (£40 worth)

Gregor Wagner [:gwagner]

Updated

•

11 years ago

blocking-b2g: 2.2? → 2.2+

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Comment 15

•

11 years ago

Fernando, I thought we had lower limits. Why are the limits converging to zero then? Comment 12 seems relevant.

Flags: needinfo?(nsm.nikhil) → needinfo?(frsela)

Johan Lorenzo [:jlorenzo]

Comment 16

•

11 years ago

Clearing steps-wanted while we have more information about the limits.

Keywords: steps-wanted

Guillermo López :willyaranda (probably SLOW response)

Comment 17

•

11 years ago

I don't see the pref "services.push.adaptive.gap" (defaults to 60000ms) that should limit our minimum interval to that value. Also, "services.push.adaptive.enabled" is not there, nor "services.push.pingInterval.default". Could you double check? In this case, the value that we use for the ping interval is
> user_pref("services.push.pingInterval", 0);
Changing this to another value should fix the 0 interval.

Flags: needinfo?(willyaranda) → needinfo?(jlorenzo)

:gerard-majax

Reporter

Comment 18

•

11 years ago

And another report from OpenC builds: https://bugzilla.frenchmozilla.org/show_bug.cgi?id=642

Johan Lorenzo [:jlorenzo]

Comment 19

•

11 years ago

services.push.adaptive.enabled and services.push.pingInterval.default are not located in prefs.js. Hence, the default values are used (true and 180000). If I understand the condition using "adaptive.gap"[1] correctly, the only time it will go into the else branch is when nextPingInterval is 60s more than the previous one. In every other case, this._recalculatePing will be set to false. Once that is done, you just need the websocket to be down to fall into [2]. So, if I understand correctly, after 2 tries, you divide this._lastGoodPingInterval by 2, and that's how you manage to set services.push.adaptive.lastGoodPingInterval to 0 (like in every testimonial). Do you think that could explain why the limit is converging to 0, Guillermo?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#769
[2] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#732

Flags: needinfo?(jlorenzo) → needinfo?(willyaranda)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 20

•

11 years ago

We should review this method avoiding 0 intervals. Thank you Nikhil, I'll take a look in the following days.

Flags: needinfo?(frsela)

Marcelino Veiga Tuimil [:sonmarce]

Comment 21

•

11 years ago

Push ping interval should be configured according to target mobile network configuration, depending on how long a TCP connection remains open. Of course a value of "0" is wrong, I do not know why it is included in build, but it should be removed, or replaced by a better one.

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 22

•

11 years ago

Fully agree. These values are used by the push agent without checking a minimum value [1], so 0 is the same as "immediately". Please change the build default preferences to reasonable values. Moreover, the B2G preferences file has 3 minutes for each pingInterval [2]. Closing this bug since this isn't related to Gecko code but to OEM customization. Reopen if you consider it necessary.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#716
[2] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#487

(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> Push ping interval should be configured according to target mobile network
> configuration, depending on how long a TCP connection remains open. Of
> course a value of "0" is wrong, I do not know why it is included in build,
> but it should be removed, or replaced by a better one.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → INVALID

:gerard-majax

Reporter

Comment 23

•

11 years ago

That's unrelated to OEM customization. This happened on multiple different devices and builds. Nobody ever set those prefs by hand, that's the point.

Johan Lorenzo [:jlorenzo]

Comment 24

•

11 years ago

(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> I do not know why it is included in build, but it should be removed, or replaced by a better one.

Sorry, this is not the point of this bug. The main issue is to understand and fix the convergence of services.push.adaptive.lastGoodPingInterval to 0 in prefs.js. b2g.js remained untouched. No OEM customization has been made.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 25

•

11 years ago

(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #24)
> (In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> > I do not know why it is included in build, but it should be removed, or replaced by a better one.
>
> Sorry, this is not the point of this bug. The main issue is to understand
> and fix the convergence of services.push.adaptive.lastGoodPingInterval to 0
> in prefs.js. b2g.js remained untouched. No OEM customization has been made.

Sorry for the misunderstanding.

Bobby Chien

Comment 26

•

10 years ago

Doug, could you help on this?

Flags: needinfo?(dougt)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Updated

•

10 years ago

Assignee: nobody → frsela

Comment hidden (typo)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 28

•

10 years ago

Sorry, I commented on the wrong bug. Please forget comment #27.

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 29

•

10 years ago

Attached patch Bug1152264.patch (obsolete) — Details — Splinter Review

This patch adds a protection which avoids pings lower than 1 minute. Meanwhile I'll study why the algorithm goes to 0

Attachment #8596570 - Flags: feedback?(dougt)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 30

•

10 years ago

Can you provide some log traces of the Push system [1] when this failure happens? [1] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#469

Flags: needinfo?(lissyx+mozillians)

:gerard-majax

Reporter

Comment 31

•

10 years ago

(In reply to Fernando R. Sela (no CC, needinfo please) [:frsela] from comment #30)
> Can you provide some log traces of the Push system [1] when this failure
> happens?
>
> [1] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#469

If you mean log traces captured at the moment the failure starts, I'm afraid we cannot: the issue had been sitting on the devices for at least days before we got to the prefs that were the cause of the data activity. So, again, we have no clear idea when/how this started.

Flags: needinfo?(lissyx+mozillians)

Doug Turner (:dougt)

Comment 32

•

10 years ago

Alexandre, can you run with this patch for a couple days. It looks like it should solve this problem.

Flags: needinfo?(dougt)

:gerard-majax

Reporter

Comment 33

•

10 years ago

(In reply to Doug Turner (:dougt) from comment #32)
> Alexandre, can you run with this patch for a couple days. It looks like it
> should solve this problem.

I already fixed all my devices and those of people who came to me with this issue. I don't really know what I can test for now, the pref value is good :(. Especially until we have a completely documented status of the triggering conditions. Johan already started this work.

Johan Lorenzo [:jlorenzo]

Comment 34

•

10 years ago

As Alexandre suggested offline, one way to make sure this pref is never set to something lower than 60 seconds would be to wrap the setter of the pref and choose the max value between the candidate and 60000ms.

From a QA standpoint, this would also help to lower the complexity of _calculateAdaptivePing(). This function alone currently has a cyclomatic complexity of 21 [1]; adding another "if" in the middle would make it worse. Moreover, _calculateAdaptivePing() is currently not unit tested at all. If I understand the code correctly, this function doesn't depend on bug 1038078 to be unit tested. What do you think, Fernando?

Also, after reading _calculateAdaptivePing() a couple of times, I am under the impression that this function has too many responsibilities. I have some trouble knowing how many scenarios we'd have to test. I think one solution could be to break it down into functions with only 1 responsibility that we could easily unit test.

To sum up, as the testing of this bug is currently nearly impossible, I'd recommend wrapping the pref set and uplifting this small patch. As a second step, I'd add some unit tests for this particular function. Then, break it down and add the tests we forgot before the refactor. I hope this will make it harder for bugs to hide and easier to detect them. Does this sound like a strategy to you guys?

[1] http://jsmeter.info/wv7ocd/1
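The "wrap the setter" idea can be sketched as follows. This is only an illustration under stated assumptions: the Map stands in for the real preference store, and the names MIN_PING_INTERVAL_MS, fakePrefs and setPingIntervalPref are hypothetical, not taken from PushService.jsm.

```javascript
// Assumed floor for any stored ping interval, in milliseconds (60 seconds,
// the value proposed in this comment).
const MIN_PING_INTERVAL_MS = 60000;

// Stand-in for the real preference store; a Map keeps the sketch runnable.
const fakePrefs = new Map();

// Hypothetical wrapper: every write of a ping-interval pref goes through
// here, so no caller can persist a value below the floor.
function setPingIntervalPref(name, candidateMs) {
  const safeCandidate = Number.isFinite(candidateMs) ? candidateMs : 0;
  const clamped = Math.max(safeCandidate, MIN_PING_INTERVAL_MS);
  fakePrefs.set(name, clamped);
  return clamped;
}

setPingIntervalPref("services.push.pingInterval", 0);
setPingIntervalPref("services.push.pingInterval.wifi", 648729);
console.log(fakePrefs.get("services.push.pingInterval"));      // prints 60000
console.log(fakePrefs.get("services.push.pingInterval.wifi")); // prints 648729
```

Keeping the clamp in one small function, rather than adding a branch inside the adaptive calculation, also gives a unit that is easy to test in isolation.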

Flags: needinfo?(lissyx+mozillians)

Flags: needinfo?(frsela)

Flags: needinfo?(dougt)

:gerard-majax

Reporter

Comment 35

•

10 years ago

Whatever we can do to fix this mess up to v2.0 (because 2.0 looks impacted, too) is fine by me.

Flags: needinfo?(lissyx+mozillians)

Doug Turner (:dougt)

Updated

•

10 years ago

Attachment #8596570 - Flags: feedback?(dougt) → review?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Comment 36

•

10 years ago

Comment on attachment 8596570 [details] [diff] [review]
Bug1152264.patch

Review of attachment 8596570 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments.

::: dom/push/PushService.jsm
@@ +783,5 @@
> this._wsWentDownCounter = 0;
> this._recalculatePing = true;
> this._lastGoodPingInterval = Math.floor(lastTriedPingInterval / 2);
> + if (this._lastGoodPingInterval < 60000) {
> + // We set a lower security limit. 1 minute is the less allowed ping interval

Just, "1 minute is the least allowed ping interval"
1 minute sounds a little terrible too. Maybe 5 minutes?
Could you move 60000 to a file level constant at the least. Thanks!

Attachment #8596570 - Flags: review?(nsm.nikhil) → review+

Doug Turner (:dougt)

Updated

•

10 years ago

Flags: needinfo?(dougt)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 37

•

10 years ago

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #36)
> Comment on attachment 8596570 [details] [diff] [review]
> Bug1152264.patch
>
> Review of attachment 8596570 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> r=me with comments.
>
> ::: dom/push/PushService.jsm
> @@ +783,5 @@
> > this._wsWentDownCounter = 0;
> > this._recalculatePing = true;
> > this._lastGoodPingInterval = Math.floor(lastTriedPingInterval / 2);
> > + if (this._lastGoodPingInterval < 60000) {
> > + // We set a lower security limit. 1 minute is the less allowed ping interval
>
> Just, "1 minute is the least allowed ping interval"
> 1 minute sounds a little terrible too. Maybe 5 minutes?

Agree, too short, but in some networks 5 minutes will not work (according to our lab tests, Vivo Brasil [1] cuts the connection after 5 minutes, so it would be failing continuously; I don't know about other networks).

> Could you move 60000 to a file level constant at the least. Thanks!

Fully agree! I'll set it in a constant, but with which minimum value? 3 min?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#696

Flags: needinfo?(frsela)

:gerard-majax

Reporter

Comment 38

•

10 years ago

While checking for something else, I had a look at my prefs.js file, and noticed that the interval is getting low already:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 21357);
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 432486);
> user_pref("services.push.adaptive.mobile", "mobile-208-15");
> user_pref("services.push.pingInterval", 648729);
> user_pref("services.push.pingInterval.mobile", 21357);
> user_pref("services.push.pingInterval.wifi", 648729);
That's just after three weeks of dogfooding.

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 39

•

10 years ago

Attached patch Bug1152264.patch — Details — Splinter Review

r+ (nsm)

Attachment #8596570 - Attachment is obsolete: true

Attachment #8604105 - Flags: review+

Attachment #8604105 - Flags: checkin?

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Updated

•

10 years ago

Attachment #8604105 - Flags: checkin?

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 40

•

10 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8edf92aaea6f

:gerard-majax

Reporter

Comment 41

•

10 years ago

How will we be fixing live devices ? Comment 12 states that this code exists in 2.0, so we can probably get into this state for 2.0, i.e., current release.

Flags: needinfo?(frsela)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 42

•

10 years ago

My suspicions are more related to bug 1100863, which reduces the pingInterval when the websocket is closed, so this patch sets a lower limit for the change made in that bug.

Flags: needinfo?(frsela)

Ben Bangert [:benbangert]

Comment 43

•

10 years ago

We deployed a new Push system today that provides more insight into ongoing Push operations. One thing that stood out in our data was that clients were being dropped for pinging too frequently. Our push system has a minimum ping interval of 20 seconds, any client pinging more frequently than this value will be dropped by our service. Does the ping adaptation have a minimum threshold?

Lina Butler (ex-Mozilla)

Comment 44

•

10 years ago

We just released an update to the Push server to address this. Clients that ping too frequently will no longer be dropped; instead, we'll wait 5 seconds before responding to the ping.

:gerard-majax

Reporter

Comment 45

•

10 years ago

(In reply to Kit Cambridge [:kitcambridge] from comment #44)
> We just released an update to the Push server to address this. Clients that
> ping too frequently will no longer be dropped; instead, we'll wait 5 seconds
> before responding to the ping.

How does that address the network data overconsumption (£40 in roaming for Dale recently, after just a couple of hours) and the induced battery drain?

Flags: needinfo?(kcambridge)

Flags: needinfo?(frsela)

Flags: needinfo?(bbangert)

Lina Butler (ex-Mozilla)

Comment 46

•

10 years ago

It will reduce the data and battery usage from the current behavior. Instead of reconnecting and re-sending the handshake (which uses a lot more data, particularly if the client has lots of channel IDs), the client and server will exchange pings (`{}`). Far from ideal, but less overhead than renegotiating the TLS connection and then handshaking. We can do some additional server work to raise this interval by a few seconds, but the client will time out requests after 10 seconds—at which point it'll reconnect, and we're back to square one. So 10 seconds is the absolute maximum that we can wait before replying (assuming zero-latency, which won't happen even on a local link). In short, the server work is a stopgap until the client can be patched. Even then, not all clients will be updated. Martin Thomson suggested moving the adaptive ping to the server, having it build a database of maximum ping intervals, and send a suggested ping interval to the client...but, again, that requires matching client work, and the same issue of old clients applies.

Flags: needinfo?(kcambridge)

Ben Bangert [:benbangert]

Comment 47

•

10 years ago

Kit has answered most of it. I should note that it seems some of these clients out there that likely have a value of 0 stuck in them for pings have high latency, so by delaying our response, it exceeds the 10 second round-trip time and the client drops, then reconnects. While reducing the cost somewhat, I don't expect it to be more than a 10-30% drop, but that's the most we can help from the server side as long as the client has this bug.

To deal with this as gracefully as possible and reduce the more data-intensive reconnects, an adaptive pong is going out with a Push server deploy in the next day or two which will reduce how fast the server sends pongs such that it sends 1 pong max per 8-second window (2 seconds less than the 10-sec window to account for network slop). This way if a client has a low latency connection to our server, we'll slow down the pong more, but if they're in a higher latency environment, we might reply right away on our side to prevent the client from performing the more expensive reconnect.
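The "1 pong max per 8-second window" rule can be illustrated with a small sketch. This is not the Push server's actual implementation; the names PONG_WINDOW_MS and pongDelay are hypothetical, and the sketch only shows how a delay could be derived from the time of the previous pong.

```javascript
// Window from the comment: at most one pong per 8 seconds, i.e. 2 seconds
// less than the client's 10-second request timeout.
const PONG_WINDOW_MS = 8000;

// Hypothetical helper: how long (in ms) to wait before answering a ping
// that arrived at nowMs, given when the previous pong was sent.
function pongDelay(lastPongMs, nowMs) {
  if (lastPongMs === null) {
    return 0; // first ping on this connection: answer right away
  }
  const elapsed = nowMs - lastPongMs;
  // If the window has already passed (e.g. a high-latency client), reply
  // immediately; otherwise wait out the remainder of the window.
  return Math.max(0, PONG_WINDOW_MS - elapsed);
}

console.log(pongDelay(null, 1000)); // prints 0
console.log(pongDelay(0, 1000));    // prints 7000
console.log(pongDelay(0, 9000));    // prints 0
```

A low-latency client whose next ping arrives 1 second after the last pong waits a further 7 seconds, while a client whose ping arrives after the window has elapsed gets an immediate reply, which matches the behaviour described in the comment.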

Flags: needinfo?(willyaranda)

Flags: needinfo?(frsela)

Flags: needinfo?(bbangert)

Josh Cheng [:josh]

Comment 48

•

10 years ago

Hi Ben, do you have any update after applying the new patch that reduces the server pong speed? Thanks!

Flags: needinfo?(bbangert)

Ben Bangert [:benbangert]

Comment 49

•

10 years ago

It's been deployed, there's still clients that are reconnecting constantly. Churn does seem to have dropped slightly though.

Flags: needinfo?(bbangert)

Bobby Chien

Comment 50

•

10 years ago

[Blocking Requested - why for this release]: Triage meeting: continue to track in 3.0 (or future release). Nominate to 3.0?

blocking-b2g: 2.2+ → 3.0?

Boaz Dodin

Comment 51

•

10 years ago

It's a bad, confirmed regression that according to comment 14 "used 400MB of my roaming data (£40 worth)". Bobby, what is the rationale for leaving it unfixed in 2.2?

Flags: needinfo?(bchien)

Bobby Chien

Comment 52

•

10 years ago

I am still keeping v2.2 as affected. Since v2.2 will be CC soon (landings need approval after the CC release), we proposed to follow this up in v3.0. Fernando, what do you think? I can move it back to v2.2. Thanks.

Flags: needinfo?(bchien) → needinfo?(frsela)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 53

•

10 years ago

(In reply to Bobby Chien [:bchien] from comment #52)
> I still keep v2.2 as affected. According to v2.2 will be CC soon (need
> approval after CC release), we make this to be followed up in v3.0.
>
> Fernando, what do you think? I can move it back in v2.2. Thanks.

Agree. The patch from [1] has also landed in 2.2, so this fix should be included there.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1100863

Flags: needinfo?(frsela)

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Updated

•

10 years ago

Keywords: checkin-needed

Pulsebot

Comment 54

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b9efa70a359a

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 55

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/b9efa70a359a

Status: REOPENED → RESOLVED

Closed: 11 years ago → 10 years ago

status-firefox41: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla41

:gerard-majax

Reporter

Comment 57

•

10 years ago

Just got a report on IRC from someone with OpenC and 2.2 build provided by community, still hitting this issue. Wouldn't the server side issue have helped?

Flags: needinfo?(kcambridge)

jorge alves

Comment 58

•

10 years ago

It's still happening for me and I couldn't "fix" it by messing with the settings. The only reason I didn't complain here before is because the updates were broken and I trusted the fix. When it finally got around to updating, I forgot to come back here. This issue burns through battery life and data plan allowance and it's really obvious. Is no one at Mozilla dogfooding the OS?

Marcelino Veiga Tuimil [:sonmarce]

Comment 59

•

10 years ago

Patch was only landed in mozilla-central, is it already happening there? Or in 2.2?

:gerard-majax

Reporter

Comment 60

•

10 years ago

(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #59)
> Patch was only landed in mozilla-central, is it already happening there? Or
> in 2.2?

That's on 2.2. And that was documented a long time ago. And people said there were some server-side changes to mitigate. Current reports show:
- that it has not reached 2.2 (I don't see anything landed on the b2g37 branch)
- that the server-side fix is not enough or not effective

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 61

•

10 years ago

Comment on attachment 8604105 [details] [diff] [review]
Bug1152264.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
User impact if declined: high data consumption
Testing completed: -
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none

Attachment #8604105 - Flags: approval-mozilla-b2g37?

Lina Butler (ex-Mozilla)

Comment 62

•

10 years ago

(In reply to jorge alves from comment #58)
> It's still happening for me and I couldn't "fix" it by messing with the
> settings. The only reason I didn't complain here before is because the
> updates where broken and I trusted the fix. When it finally got around to
> update I forgot to come back here.

The excessive data usage and battery drain are frustrating, and I'm very sorry this is still an issue for you. You trusted the fix, and we didn't deliver. You have every right to be upset.

> This issue burns through battery life and data plan allowance and it's
> really obvious. Is no one at mozilla dogfooding the OS?

We tested the server patch by setting the phone's preferences to match comment #1 and comment #7, and verifying that the phone was sending a small data packet every 10 seconds to our server. Unfortunately, that's the most we can do without Fernando's client patch...but the fact that the issue is still obvious means the server fix is incomplete.

Could I ask you to post the following prefs from your phone into this ticket? This would help us a lot in figuring out what's going on.

* services.push.serverURL
* services.push.userAgentID
* services.push.pingInterval
* services.push.requestTimeout
* services.push.adaptive.mobile
* services.push.pingInterval.mobile
* services.push.adaptive.lastGoodPingInterval.mobile
* services.push.pingInterval.wifi
* services.push.adaptive.lastGoodPingInterval.wifi

Flags: needinfo?(kcambridge) → needinfo?(jag.alves)

Lina Butler (ex-Mozilla)

Comment 63

•

10 years ago

(In reply to jorge alves from comment #58)
> It's still happening for me and I couldn't "fix" it by messing with the
> settings.

Also, are you using 2.2, or `central`?

jorge alves

Comment 64

•

10 years ago

(In reply to Kit Cambridge [:kitcambridge] from comment #62)

* serverURL: wss://push.services.mozilla.com/
* userAgentID: 8139c49991164abba8860a60dbdae5ad
* pingInterval: 0
* requestTimeout: 10000
* adaptive.mobile: mobile-204-08
* pingInterval.mobile: 0
* adaptive.lastGoodPingInterval.mobile: 0
* pingInterval.wifi: 0
* adaptive.lastGoodPingInterval.wifi: 0

I'm running 2.2 on flame-kk and it's very possible I reset some of them to default when trying to fix it. And no need to apologize, I know what I'm getting into by using pre-release stuff.

Flags: needinfo?(jag.alves)

jorge alves

Comment 65

•

10 years ago

Is there anything else I can help with in order to fix this?

Flags: needinfo?(kcambridge)

Lina Butler (ex-Mozilla)

Comment 66

•

10 years ago

Thanks for the info! I think we can try a more aggressive fix on the server to prevent phones that get into this state from reconnecting. As a consequence, those phones won't receive push notifications until their network state changes (reconnecting to the carrier, or switching between cellular and Wi-Fi)...or until the phone is rebooted. But I think that's an acceptable trade-off for a working phone that doesn't burn through your data. :-) I've opened an issue against our server here, with a more detailed description of the workaround: https://github.com/mozilla-services/autopush/issues/103 We'll try to get to it next week, but the week of June 29th is more realistic.

Flags: needinfo?(kcambridge)

jorge alves

Comment 67

•

10 years ago

Is this workaround acceptable to 2.2 phones sold to end users?

Josh Cheng [:josh]

Comment 68

•

10 years ago

Hi Kit, Hi Fernando, Do we still need the client patch as it seems you try to fix it on server side?

Flags: needinfo?(frsela)

Lina Butler (ex-Mozilla)

Comment 69

•

10 years ago

(In reply to jorge alves from comment #67)
> Is this workaround acceptable to 2.2 phones sold to end users?

It's not good, but it's the best we can do on the server until Fernando's patch is uplifted.

(In reply to Josh Cheng [:josh] from comment #68)
> Hi Kit, Hi Fernando,
> Do we still need the client patch as it seems you try to fix it on server
> side?

We definitely need the client patch. The server workaround is a hack to disable push notifications for devices that get into this state. It's only to mitigate the battery drain and data usage.

Josh Cheng [:josh]

Updated

•

10 years ago

Attachment #8604105 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+

Ryan VanderMeulen [:RyanVM]

Comment 70

•

10 years ago

https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/b153d97d5653

status-b2g-v2.2: affected → fixed

Fernando R. Sela (no CC, needinfo please) [:frsela]

Assignee

Comment 71

•

10 years ago

(In reply to Josh Cheng [:josh] from comment #68)
> Hi Kit, Hi Fernando,
> Do we still need the client patch as it seems you try to fix it on server
> side?

Hi, this patch, as Kit said, is needed, too. Thank you

Flags: needinfo?(frsela)

:gerard-majax

Reporter

Updated

•

10 years ago

Blocks: 1189729

Comment hidden (obsolete)

Josh Cheng [:josh]

Comment 73

•

10 years ago

Changed status-b2g-master to fixed to align with status-firefox41. It will be changed back to "affected" if the patch is backed out.

status-b2g-master: affected → fixed

Michael Henretty [:mikehenrty][:mhenretty]

Comment 74

•

10 years ago

backed out of m-c. We'll leave this bug closed/fixed and allow original patch author to fix both this and bug 1100863 in that one. https://hg.mozilla.org/integration/mozilla-inbound/rev/6379ad0797339835a87913fde9a30d458e64a241

Bug1152264.patch, 10 years ago, Fernando R. Sela (no CC, needinfo please) [:frsela], 1.25 KB, patch, nsm: review+
Bug1152264.patch, 10 years ago, Fernando R. Sela (no CC, needinfo please) [:frsela], 1.91 KB, patch, frsela: review+, jocheng: approval-mozilla-b2g37+