Closed Bug 1152264 Opened 5 years ago Closed 5 years ago

Push API constantly doing requests

Categories

(Core :: DOM: Push Notifications, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
blocking-b2g 2.5?
Tracking Status
firefox41 --- fixed
b2g-v2.0 --- ?
b2g-v2.1 --- ?
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: gerard-majax, Assigned: frsela)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

This is the second device that has come into my hands with the same symptom: logcat spam about push.services.mozilla.com.

Looking at the prefs, in both cases I had:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
> user_pref("services.push.adaptive.mobile", "mobile-208-15");
> user_pref("services.push.pingInterval", 0);
> user_pref("services.push.pingInterval.mobile", 0);
> user_pref("services.push.pingInterval.wifi", 0);

In all cases, this was on devices that had been upgraded and had run 2.2 at some point. It looks like we may have a bad regression somewhere.

After removing those prefs, the device stopped making constant requests, and I did not notice any broken feature (tested with Find My Device).
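For anyone checking their own device, the affected prefs can be spotted by scanning prefs.js. The helper below is a hypothetical sketch, not part of Gecko or any Mozilla tool: it assumes the `user_pref("name", value);` line format quoted above, only looks at integer values, and uses an illustrative 60000 ms threshold (every pref stuck at 0 falls below it).

```javascript
// Hypothetical helper: list services.push ping-interval prefs whose integer
// value is below a floor (0 in the reports in this bug).
// Assumes prefs.js lines of the form: user_pref("name", value);
function findSuspiciousPushPrefs(prefsText, floorMs = 60000) {
  const pattern =
    /^\s*user_pref\("(services\.push\.[^"]*[Pp]ingInterval[^"]*)",\s*(-?\d+)\);/;
  const hits = [];
  for (const line of prefsText.split("\n")) {
    const m = pattern.exec(line);
    if (m && Number(m[2]) < floorMs) {
      hits.push({ name: m[1], value: Number(m[2]) });
    }
  }
  return hits;
}

const sample = [
  'user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);',
  'user_pref("services.push.adaptive.mobile", "mobile-208-15");',
  'user_pref("services.push.pingInterval", 0);',
  'user_pref("services.push.pingInterval.wifi", 180000);',
].join("\n");

// Prints the names of the two zero-valued ping-interval prefs.
console.log(findSuspiciousPushPrefs(sample).map((p) => p.name));
```

It only reports matches; editing prefs.js is still done by hand.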
Component: General → DOM: Push Notifications
Product: Firefox OS → Core
Can QA reproduce?
Keywords: qawanted
Not sure what was being done here. Could you elaborate on what you did and your environmental variables? Device, branch (it says 'upgraded' to 2.2, was it OTA? were you on 2.2 and OTA'ed to 2.2?), what settings were enabled... anything specific that could help QA reproduce the issue.
Flags: needinfo?(lissyx+mozillians)
(In reply to Pi Wei Cheng [:piwei] from comment #2)
> Not sure what was being done here. Could you elaborate on what you did and
> your environmental variables? Device, branch (it says 'upgraded' to 2.2, was
> it OTA? were you on 2.2 and OTA'ed to 2.2?), what settings were enabled...
> anything specific that could help QA reproduce the issue.

As I said, this happened on multiple devices. I cannot give more specifics, since I don't know when it started, I only know when I noticed ...

On one device I noticed it while working on Find My Device on current master, on another device that is not mine, it was when the contributor brought it to me because there were a lot of weird issues.

So the only common factor is that at some point both got a 2.2 build. They were updated in different ways (sometimes OTA, sometimes by flashing the boot and system partitions), which should not have an impact.

In both cases, those prefs got set. This is probably what you should track: when those prefs got set.
Flags: needinfo?(lissyx+mozillians)
Duplicate of this bug: 1104175
Hi, I ran into exactly the same problem as Alexandre. (ZTE Open C French, with 2.2 OTA'ed multiple times)
After noticing heavy data usage on both mobile and Wi-Fi (roughly a week ago), I first updated B2G 2.2 to an early April build (by sideloading), which made no difference.

In a shell, after a reboot and while there was still massive data exchange (and the battery was dropping), I ran netstat, which showed only two HTTPS connections to an Amazon AWS instance with a certificate issued to push.services.mozilla.com.

I found this bug and removed the following lines from prefs.js:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
> user_pref("services.push.pingInterval", 0);
> user_pref("services.push.pingInterval.mobile", 0);
> user_pref("services.push.pingInterval.wifi", 0);
as well as the userAgentID one.

Restarting B2G solved the issue: data consumption dropped to a small 63 kB in 5 minutes, instead of nearly 500 kB every 3-4 minutes.
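To put those figures in perspective, here is a naive extrapolation to a full day. It assumes the short reported windows are representative of steady-state traffic, which is a big assumption (the 5-minute window right after the restart in particular), and uses 1 MB = 1000 kB.

```javascript
// Naive linear extrapolation of the data rates reported above, in kB.
// Assumes the short measurement windows are representative, which may not hold.
function kBPerDay(kB, minutes) {
  return (kB * 60 * 24) / minutes;
}

const before = [kBPerDay(500, 4), kBPerDay(500, 3)]; // "nearly 500 kB every 3-4 minutes"
const after = kBPerDay(63, 5);                        // "63 kB in 5 minutes"
console.log(before, after); // [ 180000, 240000 ] 18144
```

That works out to roughly 180-240 MB per day before the fix versus about 18 MB per day after it, an order of magnitude less.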
Duplicate of this bug: 1151265
Yes, I have

user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);             
user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);               
user_pref("services.push.adaptive.mobile", "mobile-230-02");                    
user_pref("services.push.pingInterval", 0);                                     
user_pref("services.push.pingInterval.mobile", 0);                              
user_pref("services.push.pingInterval.wifi", 0);                                
user_pref("services.push.userAgentID", "220a9c0439954e5a89e30ec440a46d68");     

What does it mean?
(In reply to Matěj Cepl from comment #7)
> Yes, I have

After the removal of the preferences from /data/b2g/mozilla/*.default/prefs.js the connection looks more sane.
Johan, this is the bug we talked about. I have no idea when this started, though.
Flags: needinfo?(jlorenzo)
When I upgraded (manual build) from 2.3 to 3.0 I made a clean install, so it's not a case of 2.2 prefs surviving the upgrade. After that, I've been using backup_restore_profile.py to migrate my data across updates.
Just resetting the ping interval prefs from WebIDE seems to have fixed it for me.
I couldn't find exact steps to reproduce. However, I have some suspicions about the function _calculateAdaptivePing [1]. If I understand it correctly, the ping interval can be reduced all the way down to 0 if the WebSocket is still down after a certain number of retries.

For instance: the default value of pingInterval is 180000 (3 minutes). If the WebSocket keeps going down, each interval is retried and then halved, and after 18 halvings Math.floor() returns 0.

>   if (wsWentDown) {
>     debug('The WebSocket was disconnected, calculating next ping');
> 
>     // If we have not tried this pingInterval yet, initialize
>     this._pingIntervalRetryTimes[lastTriedPingInterval] =
>          (this._pingIntervalRetryTimes[lastTriedPingInterval] || 0) + 1;
> 
>      // Try the pingInterval at least 3 times, just to be sure that the
>      // calculated interval is not valid.
>      if (this._pingIntervalRetryTimes[lastTriedPingInterval] < 2) {
>        debug('pingInterval= ' + lastTriedPingInterval + ' tried only ' +
>          this._pingIntervalRetryTimes[lastTriedPingInterval] + ' times');
>        return;
>      }
> 
>      // Latest ping was invalid, we need to lower the limit to limit / 2
>      nextPingInterval = Math.floor(lastTriedPingInterval / 2);
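The halving step in the excerpt can be checked in isolation. This sketch is not PushService.jsm code: it starts from the 180000 ms default and applies only `Math.floor(interval / 2)` until the result is 0, deliberately ignoring the retry bookkeeping.

```javascript
// Apply only the quoted halving step, Math.floor(last / 2), until it hits 0.
// Retry counting is left out on purpose; this just counts halvings.
function halvingsUntilZero(startMs) {
  let interval = startMs;
  let steps = 0;
  const trace = [interval];
  while (interval > 0) {
    interval = Math.floor(interval / 2);
    steps += 1;
    trace.push(interval);
  }
  return { steps, trace };
}

const { steps, trace } = halvingsUntilZero(180000);
console.log(steps);           // 18
console.log(trace.slice(-4)); // [ 5, 2, 1, 0 ]
```

The 17th halving leaves 1 ms and the 18th yields 0, so any path that keeps halving without a floor eventually stores 0.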

The function has been implemented in bug 894879 and hasn't changed a lot since, so we might have this issue since 2.0.

Guillermo, Nikhil, do you think the example scenario above is plausible? If not, do you think we might lower the ping interval more often than we raise it over a long period of time?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#679
status-b2g-v2.0: --- → ?
status-b2g-v2.1: --- → ?
Flags: needinfo?(willyaranda)
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(jlorenzo)
See Also: → 894879
I started noticing this this week as well

user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 0);
user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 0);
user_pref("services.push.adaptive.mobile", "mobile-234-30");
user_pref("services.push.pingInterval", 0);
user_pref("services.push.pingInterval.mobile", 0);
user_pref("services.push.pingInterval.wifi", 0);

on my z3c
As far as I can tell, this was interfering with my ability to connect to data networks: I could only connect after a restart, and once I disabled the data connection I could never reconnect.

It also looks to have used 400MB of my roaming data (£40 worth)
blocking-b2g: 2.2? → 2.2+
Fernando,

I thought we had lower limits. Why are the limits converging to zero then? Comment 12 seems relevant.
Flags: needinfo?(nsm.nikhil) → needinfo?(frsela)
Clearing steps-wanted while we gather more information about the limits.
Keywords: steps-wanted
I don't see the pref "services.push.adaptive.gap" (defaults to 60000ms) that should limit our minimum interval to that value.

Also, "services.push.adaptive.enabled" is not there, nor "services.push.pingInterval.default". Could you double check?

In this case, the value that we use for the ping interval is

> user_pref("services.push.pingInterval", 0);

Changing this to another value should fix the 0 interval.
Flags: needinfo?(willyaranda) → needinfo?(jlorenzo)
And another report from OpenC builds: https://bugzilla.frenchmozilla.org/show_bug.cgi?id=642
services.push.adaptive.enabled and services.push.pingInterval.default are not present in prefs.js, so their default values (true and 180000) are used.

If I understand the condition using "adaptive.gap" [1] correctly, the only time the code enters the else branch is when nextPingInterval is 60 s greater than the previous one. In every other case, this._recalculatePing is set to false.

Once that is done, the WebSocket only needs to go down for the code to fall into [2]. So, if I understand correctly, after 2 tries this._lastGoodPingInterval is divided by 2, and that is how services.push.adaptive.lastGoodPingInterval ends up at 0 (as in every report).

Do you think that could explain why the limit is converging to 0, Guillermo?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#769
[2] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#732
Flags: needinfo?(jlorenzo) → needinfo?(willyaranda)
We should review this method to avoid 0 intervals.
Thank you Nikhil, I'll take a look in the following days.
Flags: needinfo?(frsela)
Push ping interval should be configured according to the target mobile network configuration, depending on how long a TCP connection remains open. Of course a value of "0" is wrong; I do not know why it is included in the build, but it should be removed or replaced by a better one.
Fully agree,

These values are used by the push agent without checking a minimum value [1], so 0 is the same as "immediately".

Please, change the build default preferences to reasonable values.

Moreover, the B2G preferences file sets each pingInterval to 3 minutes [2].

Closing this bug since it isn't related to Gecko code but to OEM customization. Reopen it if you disagree.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#716
[2] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#487

(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> Push ping interval should be configured according to target mobile network
> configuration, depending on how long a TCP connection remains open. Of
> course a value of "0" is wrong, I do not know why it is included in build,
> but it should be removed, or replaced by a better one.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID
That's unrelated to OEM customization. This happened on multiple different devices and builds. Nobody ever set those prefs by hand; that's the point.
(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> I do not know why it is included in build, but it should be removed, or replaced by a better one.

Sorry, this is not the point of this bug. The main issue is to understand and fix the convergence of  services.push.adaptive.lastGoodPingInterval to 0 in prefs.js. b2g.js remained untouched. No OEM customization has been made.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #24)
> (In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #21)
> > I do not know why it is included in build, but it should be removed, or replaced by a better one.
> 
> Sorry, this is not the point of this bug. The main issue is to understand
> and fix the convergence of  services.push.adaptive.lastGoodPingInterval to 0
> in prefs.js. b2g.js remained untouched. No OEM customization has been made.

Sorry for the misunderstanding
Doug, could you help on this?
Flags: needinfo?(dougt)
Assignee: nobody → frsela
Sorry, I commented on the wrong bug. Please disregard comment #27.
Attached patch Bug1152264.patch (obsolete)
This patch adds a safeguard that prevents ping intervals lower than 1 minute. Meanwhile, I'll study why the algorithm goes to 0.
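The idea of a lower limit on the halving step can be sketched as follows. This is not the patch itself: the function and constant names are hypothetical, and the 60000 ms floor mirrors the "1 minute" value mentioned here (the exact floor was still being debated in this bug).

```javascript
// Hypothetical names; the 1-minute floor mirrors the value described in the patch.
const MIN_PING_INTERVAL_MS = 60000;

// Halve the last tried interval after repeated disconnects,
// but never go below the floor.
function lowerPingInterval(lastTriedMs) {
  return Math.max(Math.floor(lastTriedMs / 2), MIN_PING_INTERVAL_MS);
}

// Apply 20 consecutive "WebSocket went down" reductions starting at 3 minutes.
let interval = 180000;
for (let i = 0; i < 20; i++) {
  interval = lowerPingInterval(interval);
}
console.log(interval); // 60000
```

Instead of converging to 0, repeated reductions settle at the floor after two steps (180000 to 90000, then 60000).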
Attachment #8596570 - Flags: feedback?(dougt)
Can you provide some log traces of the Push system [1] when this failure happens?

[1] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#469
Flags: needinfo?(lissyx+mozillians)
(In reply to Fernando R. Sela (no CC, needinfo please) [:frsela] from comment #30)
> Can you provide some log traces of the Push system [1] when this failure
> happens?
> 
> [1] http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#469

If you mean log traces of the Push system captured when this failure happens, I'm afraid we cannot provide them: the issue had been sitting on the devices for at least several days before we traced the data activity to these prefs.

So, again, we have no clear idea when/how this started.
Flags: needinfo?(lissyx+mozillians)
Alexandre, can you run with this patch for a couple of days? It looks like it should solve this problem.
Flags: needinfo?(dougt)
(In reply to Doug Turner (:dougt) from comment #32)
> Alexandre, can you run with this patch for a couple days. It looks like it
> should solve this problem.

I already fixed all my devices and those of the people who came to me with this issue. I don't really know what I can test right now, since the pref values are correct :(, especially until we have a fully documented set of triggering conditions. Johan has already started this work.
As Alexandre suggested offline, one way to make sure this pref is never set to something lower than 60 seconds would be to wrap the pref setter and use the maximum of the candidate value and 60000 ms.
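A minimal sketch of that wrapper, assuming a hypothetical `rawSet(name, value)` callback standing in for whatever pref-writing call PushService.jsm actually uses (a `Map` is used here so the example runs on its own). All names are illustrative.

```javascript
// Hypothetical sketch of "wrap the pref setter": every write goes through one
// function that keeps the larger of the candidate value and the floor.
function makeClampedPingSetter(rawSet, floorMs = 60000) {
  return function setPingPref(name, candidateMs) {
    // Non-finite values (NaN, Infinity, non-numbers) fall back to the floor.
    const safeMs = Number.isFinite(candidateMs)
      ? Math.max(candidateMs, floorMs)
      : floorMs;
    rawSet(name, safeMs);
    return safeMs;
  };
}

const store = new Map();
const setPingPref = makeClampedPingSetter((name, value) => store.set(name, value));

setPingPref("services.push.pingInterval.mobile", 0);
setPingPref("services.push.pingInterval.wifi", 648729);
console.log(store.get("services.push.pingInterval.mobile")); // 60000
console.log(store.get("services.push.pingInterval.wifi"));   // 648729
```

Putting the check at the single place where the value is written leaves the body of _calculateAdaptivePing() unchanged, so it does not add another branch to that function.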

From a QA standpoint, this would also help keep the complexity of _calculateAdaptivePing() down. This function alone currently has a cyclomatic complexity of 21 [1]; adding another "if" in the middle would make it worse. Moreover, _calculateAdaptivePing() is currently not unit tested at all. If I understand the code correctly, this function doesn't depend on bug 1038078 to be unit tested. What do you think, Fernando?

Also, after reading _calculateAdaptivePing() a couple of times, I am under the impression that this function has too many responsibilities. I have trouble working out how many scenarios we'd have to test. One solution could be to break it down into functions with a single responsibility each, which we could easily unit test.

To sum up, as testing this bug is currently nearly impossible, I'd recommend wrapping the pref setter and uplifting this small patch.
As a second step, I'd add some unit tests for this particular function. Then I'd break it down, adding the tests we had missed before the refactor. I hope this will make bugs harder to hide and easier to detect. Does this sound like a good strategy to you guys?
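One possible shape for that decomposition, as a hypothetical sketch: the function names are illustrative, MIN_TRIES mirrors the `< 2` check in the excerpt quoted in comment 12 (not the "3 times" mentioned in its code comment), and the lowering policy is injected so each piece can be unit tested on its own.

```javascript
// Hypothetical decomposition of the quoted _calculateAdaptivePing excerpt
// into single-responsibility pure functions. Names are illustrative only.
const MIN_TRIES = 2; // mirrors the `< 2` check in the excerpt

// Record one more disconnect for the given interval; returns a new table.
function recordDisconnect(retryTimes, intervalMs) {
  return { ...retryTimes, [intervalMs]: (retryTimes[intervalMs] || 0) + 1 };
}

// An interval is considered invalid once it has failed MIN_TRIES times.
function hasEnoughFailures(retryTimes, intervalMs) {
  return (retryTimes[intervalMs] || 0) >= MIN_TRIES;
}

// Compose the steps; `lower` is injected so the policy can be tested alone.
// nextIntervalMs is null while the current interval is still being retried.
function onWebSocketDown(retryTimes, lastTriedMs, lower) {
  const updated = recordDisconnect(retryTimes, lastTriedMs);
  const nextIntervalMs = hasEnoughFailures(updated, lastTriedMs)
    ? lower(lastTriedMs)
    : null;
  return { retryTimes: updated, nextIntervalMs };
}

const halveWithFloor = (ms) => Math.max(Math.floor(ms / 2), 60000);
let step = onWebSocketDown({}, 180000, halveWithFloor);
console.log(step.nextIntervalMs); // null (first failure, keep retrying)
step = onWebSocketDown(step.retryTimes, 180000, halveWithFloor);
console.log(step.nextIntervalMs); // 90000
```

Each function can then get its own small set of test cases, instead of enumerating every path through one large function.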


[1] http://jsmeter.info/wv7ocd/1
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(frsela)
Flags: needinfo?(dougt)
Whatever we can do to fix this mess up to v2.0 (because 2.0 looks impacted, too) is fine by me.
Flags: needinfo?(lissyx+mozillians)
Attachment #8596570 - Flags: feedback?(dougt) → review?(nsm.nikhil)
Comment on attachment 8596570 [details] [diff] [review]
Bug1152264.patch

Review of attachment 8596570 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments.

::: dom/push/PushService.jsm
@@ +783,5 @@
>          this._wsWentDownCounter = 0;
>          this._recalculatePing = true;
>          this._lastGoodPingInterval = Math.floor(lastTriedPingInterval / 2);
> +        if (this._lastGoodPingInterval < 60000) {
> +          // We set a lower security limit. 1 minute is the less allowed ping interval

Just, "1 minute is the least allowed ping interval"
1 minute sounds a little terrible too. Maybe 5 minutes?
Could you move 60000 to a file level constant at the least. Thanks!
Attachment #8596570 - Flags: review?(nsm.nikhil) → review+
Flags: needinfo?(dougt)
(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #36)
> Comment on attachment 8596570 [details] [diff] [review]
> Bug1152264.patch
> 
> Review of attachment 8596570 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with comments.
> 
> ::: dom/push/PushService.jsm
> @@ +783,5 @@
> >          this._wsWentDownCounter = 0;
> >          this._recalculatePing = true;
> >          this._lastGoodPingInterval = Math.floor(lastTriedPingInterval / 2);
> > +        if (this._lastGoodPingInterval < 60000) {
> > +          // We set a lower security limit. 1 minute is the less allowed ping interval
> 
> Just, "1 minute is the least allowed ping interval"
> 1 minute sounds a little terrible too. Maybe 5 minutes?

Agreed, it's too short, but in some networks a longer floor would fail continuously (according to our lab tests, Vivo Brasil [1] cuts the connection after 5 minutes; I don't know about other networks).

> Could you move 60000 to a file level constant at the least. Thanks!

Fully agree! I'll move it to a constant, but with which minimum value? 3 min?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#696
Flags: needinfo?(frsela)
While checking for something else, I had a look at my prefs.js file, and noticed that the interval is getting low already:
> user_pref("services.push.adaptive.lastGoodPingInterval.mobile", 21357);         
> user_pref("services.push.adaptive.lastGoodPingInterval.wifi", 432486);          
> user_pref("services.push.adaptive.mobile", "mobile-208-15");                    
> user_pref("services.push.pingInterval", 648729);                                
> user_pref("services.push.pingInterval.mobile", 21357);                          
> user_pref("services.push.pingInterval.wifi", 648729);   

That's just after three weeks of dogfooding.
Attached patch Bug1152264.patch
r+ (nsm)
Attachment #8596570 - Attachment is obsolete: true
Attachment #8604105 - Flags: review+
Attachment #8604105 - Flags: checkin?
How will we fix devices already in the field? Comment 12 states that this code exists in 2.0, so devices on 2.0, i.e. the current release, can probably get into this state too.
Flags: needinfo?(frsela)
My suspicion relates more to bug 1100863, which reduces the pingInterval when the WebSocket is closed; this patch sets a lower limit on that reduction.
Flags: needinfo?(frsela)
We deployed a new Push system today that provides more insight into ongoing Push operations. One thing that stood out in our data was that clients were being dropped for pinging too frequently.

Our push system has a minimum ping interval of 20 seconds; any client pinging more frequently than that will be dropped by our service. Does the ping adaptation have a minimum threshold?
We just released an update to the Push server to address this. Clients that ping too frequently will no longer be dropped; instead, we'll wait 5 seconds before responding to the ping.
(In reply to Kit Cambridge [:kitcambridge] from comment #44)
> We just released an update to the Push server to address this. Clients that
> ping too frequently will no longer be dropped; instead, we'll wait 5 seconds
> before responding to the ping.

How does that address the excessive network data consumption (£40 of roaming data for Dale recently, after just a couple of hours) and the resulting battery drain?
Flags: needinfo?(kcambridge)
Flags: needinfo?(frsela)
Flags: needinfo?(bbangert)
It will reduce the data and battery usage from the current behavior. Instead of reconnecting and re-sending the handshake (which uses a lot more data, particularly if the client has lots of channel IDs), the client and server will exchange pings (`{}`). Far from ideal, but less overhead than renegotiating the TLS connection and then handshaking.

We can do some additional server work to raise this interval by a few seconds, but the client will time out requests after 10 seconds—at which point it'll reconnect, and we're back to square one. So 10 seconds is the absolute maximum that we can wait before replying (assuming zero-latency, which won't happen even on a local link).

In short, the server work is a stopgap until the client can be patched. Even then, not all clients will be updated. Martin Thomson suggested moving the adaptive ping to the server, having it build a database of maximum ping intervals, and send a suggested ping interval to the client...but, again, that requires matching client work, and the same issue of old clients applies.
Flags: needinfo?(kcambridge)
Kit has answered most of it. I should note that some of the clients that likely have a ping interval of 0 stuck in their prefs also seem to have high latency, so delaying our response pushes the round trip past 10 seconds and the client drops the connection, then reconnects. While this reduces the cost somewhat, I don't expect more than a 10-30% drop, but that's the most we can do from the server side as long as the client has this bug.

To deal with this as gracefully as possible and reduce the more data-intensive reconnects, an adaptive pong is going out with a Push server deploy in the next day or two. It limits how fast the server sends pongs, to at most 1 pong per 8-second window (2 seconds less than the 10-second window, to account for network slop). This way, if a client has a low-latency connection to our server, we'll slow down the pong more, but if it is in a higher-latency environment, we might reply right away to prevent the client from performing the more expensive reconnect.
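The pacing rule in this comment can be modeled roughly as below. This is a hypothetical sketch, not the autopush server code (which is not shown in this bug): the function and constant names are made up, and it assumes the server has an estimate of each client's round-trip time. It combines the two constraints described: at most one pong per 8-second window, and never holding a pong so long that the client's 10-second timeout fires.

```javascript
// Hypothetical model of the adaptive pong pacing described above.
const CLIENT_TIMEOUT_MS = 10000; // client gives up on a ping after 10 s
const SLOP_MS = 2000;            // margin for network slop
const WINDOW_MS = CLIENT_TIMEOUT_MS - SLOP_MS; // at most 1 pong per 8 s

// How long to hold a pong before sending it.
// lastPongAtMs: when the previous pong went out (null if none yet)
// nowMs: current time; rttMs: estimated round-trip time to this client
function pongDelayMs(lastPongAtMs, nowMs, rttMs) {
  // Spacing rule: wait until WINDOW_MS has elapsed since the previous pong.
  const spacingDelay =
    lastPongAtMs === null ? 0 : Math.max(0, lastPongAtMs + WINDOW_MS - nowMs);
  // Timeout rule: keep delay + RTT within the 8 s window, leaving the 2 s slop
  // before the client's 10 s timeout; high-latency clients get 0 delay.
  const latencyBudget = Math.max(0, WINDOW_MS - rttMs);
  return Math.min(spacingDelay, latencyBudget);
}

// A client pinging 1 s after the previous pong:
console.log(pongDelayMs(0, 1000, 100));  // 7000 (low latency: slowed down)
console.log(pongDelayMs(0, 1000, 9000)); // 0 (high latency: reply right away)
```

Taking the minimum of the two delays is what lets low-latency clients be slowed down while high-latency ones get an immediate reply.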
Flags: needinfo?(willyaranda)
Flags: needinfo?(frsela)
Flags: needinfo?(bbangert)
Hi Ben,
Do you have any update after applying the new patch that reduces the server pong rate?
Thanks!
Flags: needinfo?(bbangert)
It's been deployed; there are still clients reconnecting constantly. Churn does seem to have dropped slightly, though.
Flags: needinfo?(bbangert)
[Blocking Requested - why for this release]:

Triage meeting: continue to track in 3.0 (or future release). Nominate to 3.0?
blocking-b2g: 2.2+ → 3.0?
It's a bad, confirmed regression that according to comment 14 "used 400MB of my roaming data (£40 worth)".

Bobby, what is the rationale for leaving it unfixed in 2.2?
Flags: needinfo?(bchien)
I'm keeping v2.2 marked as affected. Since v2.2 will reach CC soon (landings need approval after the CC release), we're following this up in v3.0.

Fernando, what do you think? I can move it back in v2.2. Thanks.
Flags: needinfo?(bchien) → needinfo?(frsela)
(In reply to Bobby Chien [:bchien] from comment #52)
> I still keep v2.2 as affected. According to v2.2 will be CC soon (need
> approval after CC release), we make this to be followed up in v3.0. 
> 
> Fernando, what do you think? I can move it back in v2.2. Thanks.

Agreed. The patch from [1] has also landed in 2.2, so this fix should be included there.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1100863
Flags: needinfo?(frsela)
https://hg.mozilla.org/mozilla-central/rev/b9efa70a359a
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Duplicate of this bug: 1077107
Just got a report on IRC from someone with an OpenC and a community-provided 2.2 build who is still hitting this issue. Shouldn't the server-side fix have helped?
Flags: needinfo?(kcambridge)
It's still happening for me and I couldn't "fix" it by messing with the settings. The only reason I didn't complain here before is that the updates were broken and I trusted the fix. When the update finally came through, I forgot to come back here.

This issue burns through battery life and data plan allowance and it's really obvious. Is no one at mozilla dogfooding the OS?
The patch has only landed in mozilla-central. Is it still happening there, or in 2.2?
(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #59)
> Patch was only landed in mozilla-central, is it already happening there? Or
> in 2.2?

That's on 2.2, and it was documented a long time ago. People also said there were some server-side changes to mitigate it. Current reports show:
 - that it has not reached 2.2 (I don't see anything landed on b2g37 branch)
 - the server side fix is not enough or not effective
Comment on attachment 8604105 [details] [diff] [review]
Bug1152264.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: high data consumption
Testing completed: -
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8604105 - Flags: approval-mozilla-b2g37?
(In reply to jorge alves from comment #58)
> It's still happening for me and I couldn't "fix" it by messing with the
> settings. The only reason I didn't complain here before is because the
> updates where broken and I trusted the fix. When it finally got around to
> update I forgot to come back here.

The excessive data usage and battery drain are frustrating, and I'm very sorry this is still an issue for you. You trusted the fix, and we didn't deliver. You have every right to be upset.

> This issue burns through battery life and data plan allowance and it's
> really obvious. Is no one at mozilla dogfooding the OS?

We tested the server patch by setting the phone's preferences to match comment #1 and comment #7, and verifying that the phone was sending a small data packet every 10 seconds to our server. Unfortunately, that's the most we can do without Fernando's client patch...but the fact that the issue is still obvious means the server fix is incomplete.

Could I ask you to post the following prefs from your phone into this ticket? This would help us a lot in figuring out what's going on.

* services.push.serverURL
* services.push.userAgentID
* services.push.pingInterval
* services.push.requestTimeout
* services.push.adaptive.mobile
* services.push.pingInterval.mobile
* services.push.adaptive.lastGoodPingInterval.mobile
* services.push.pingInterval.wifi
* services.push.adaptive.lastGoodPingInterval.wifi
Flags: needinfo?(kcambridge) → needinfo?(jag.alves)
(In reply to jorge alves from comment #58)
> It's still happening for me and I couldn't "fix" it by messing with the
> settings.

Also, are you using 2.2, or `central`?
(In reply to Kit Cambridge [:kitcambridge] from comment #62)

* serverURL: wss://push.services.mozilla.com/
* userAgentID: 8139c49991164abba8860a60dbdae5ad
* pingInterval: 0
* requestTimeout: 10000
* adaptive.mobile: mobile-204-08
* pingInterval.mobile: 0
* adaptive.lastGoodPingInterval.mobile: 0
* pingInterval.wifi: 0
* adaptive.lastGoodPingInterval.wifi: 0

I'm running 2.2 on flame-kk and it's very possible I reset some of them to default when trying to fix it.

And no need to apologize, I know what I'm getting into by using pre-release stuff.
Flags: needinfo?(jag.alves)
Anything else I can help with to fix this?
Flags: needinfo?(kcambridge)
Thanks for the info! I think we can try a more aggressive fix on the server to prevent phones that get into this state from reconnecting.

As a consequence, those phones won't receive push notifications until their network state changes (reconnecting to the carrier, or switching between cellular and Wi-Fi)...or until the phone is rebooted. But I think that's an acceptable trade-off for a working phone that doesn't burn through your data. :-)

I've opened an issue against our server here, with a more detailed description of the workaround: https://github.com/mozilla-services/autopush/issues/103

We'll try to get to it next week, but the week of June 29th is more realistic.
Flags: needinfo?(kcambridge)
Is this workaround acceptable to 2.2 phones sold to end users?
Hi Kit, Hi Fernando,
Do we still need the client patch as it seems you try to fix it on server side?
Flags: needinfo?(frsela)
(In reply to jorge alves from comment #67)
> Is this workaround acceptable to 2.2 phones sold to end users?

It's not good, but it's the best we can do on the server until Fernando's patch is uplifted.

(In reply to Josh Cheng [:josh] from comment #68)
> Hi Kit, Hi Fernando,
> Do we still need the client patch as it seems you try to fix it on server
> side?

We definitely need the client patch. The server workaround is a hack to disable push notifications for devices that get into this state. It's only to mitigate the battery drain and data usage.
Attachment #8604105 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
(In reply to Josh Cheng [:josh] from comment #68)
> Hi Kit, Hi Fernando,
> Do we still need the client patch as it seems you try to fix it on server
> side?

Hi, this patch, as Kit said, is needed, too.
Thank you
Flags: needinfo?(frsela)
Blocks: 1189729
Changing status-b2g-master to fixed to align with status-firefox41.
It will be changed to "affected" if the patch is backed out.
Backed out of m-c. We'll leave this bug closed/fixed and let the original patch author fix both this and bug 1100863 in that bug.

https://hg.mozilla.org/integration/mozilla-inbound/rev/6379ad0797339835a87913fde9a30d458e64a241