SimplePush: Shut down all connections when no connectivity is available.

RESOLVED FIXED

Status

Firefox OS
General
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: nsm, Assigned: nsm)

Tracking

unspecified
x86_64
Linux
Dependency tree / graph

Firefox Tracking Flags

(firefox22 wontfix, firefox23 wontfix, firefox24 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

Details

(Whiteboard: [fixed-in-birch])

Attachments

(1 attachment, 3 obsolete attachments)

PushService monitors for network connection changes and tries to reconnect. But a network connection change can also mean no connection available. In such a case PushService should shut down rather than repeatedly trying to reconnect.
Created attachment 748363 [details] [diff] [review]
Monitor for active network change.

network-interface-state-changed is a very noisy event to watch for, and leads to frequent restarts of the websocket, even when the preferred network (i.e. Wi-Fi) has stayed constant and should be used.

Put in a check for offline in beginWSSetup() to prevent reconnect attempts even when the network is down.

This patch also does some clean up of removeObserver() calls.
Attachment #748363 - Flags: review?(justin.lebar+bug)
Created attachment 748369 [details] [diff] [review]
Patch 1: Monitor for active network change.

Modified patch which applies successfully against birch/m-c. Sorry about that.
Attachment #748363 - Attachment is obsolete: true
Attachment #748363 - Flags: review?(justin.lebar+bug)
Attachment #748369 - Flags: review?(justin.lebar+bug)
> +    // This observer is notified only on B2G by
> +    // dom/system/gonk/NetworkManager.js.

This patch looks fine for B2G, but what's your plan for desktop?
Attachment #748369 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #3)
> > +    // This observer is notified only on B2G by
> > +    // dom/system/gonk/NetworkManager.js.
> 
> This patch looks fine for B2G, but what's your plan for desktop?

I am going to use "network:offline-status-changed". I have a patch that works, the problem is that that even fires on B2G leading to two events on B2G. Is it good practice to use Services.appinfo.name == "B2G" to not add observers for both events? (I'm thinking it's not).
> I am going to use "network:offline-status-changed".

Is that right?  What if I switch WiFi networks, or change the active network interface (e.g. plug in ethernet and then disconnect wifi); don't I need to reconnect the websocket?  Or do we simply rely on timeouts in that case?

> Is it good practice to use Services.appinfo.name == "B2G" to not add observers for both events? (I'm 
> thinking it's not).

Hmmm.  It sounds like what you want to test for isn't B2G, but rather for whether you're going to get these NetworkManager-specific notifications.  Maybe you could (somehow) test for the NetworkManager, instead of testing for B2G?

A hack would be to stop listening to offline-status-changed as soon as you hear one of the NetworkManager observer topics, but that assumes that the NetworkManager is always going to fire before offline-status-changed, which is fragile...
(In reply to Justin Lebar [:jlebar] from comment #5)
> > I am going to use "network:offline-status-changed".
> 
> Is that right?  What if I switch WiFi networks, or change the active network
> interface (e.g. plug in ethernet and then disconnect wifi); don't I need to
> reconnect the websocket?  Or do we simply rely on timeouts in that case?

AFAIK aren't any notifications similar to network-active-changed on desktop. So we'll have to rely on the timeout or on the OS destroying all sockets with an error when it changes the network.

> 
> > Is it good practice to use Services.appinfo.name == "B2G" to not add observers for both events? (I'm 
> > thinking it's not).
> 
> Hmmm.  It sounds like what you want to test for isn't B2G, but rather for
> whether you're going to get these NetworkManager-specific notifications. 
> Maybe you could (somehow) test for the NetworkManager, instead of testing
> for B2G?
> 
> A hack would be to stop listening to offline-status-changed as soon as you
> hear one of the NetworkManager observer topics, but that assumes that the
> NetworkManager is always going to fire before offline-status-changed, which
> is fragile...

The NetworkManager topics don't fire when the active network changes to 'none'.
They do fire when it goes from none to a network. So it would be a really dirty hack
tracking the order of the two. I'll check for NetworkManager since that currently only
exists in Gonk.
> we'll have to rely on the timeout or on the OS destroying all sockets with an error when it changes 
> the network.

Okay, that doesn't sound too bad.

> I'll check for NetworkManager since that currently only exists in Gonk.

Neither does this.  :)
Created attachment 749246 [details] [diff] [review]
Patch 2: Watch for offline status change.
Comment on attachment 749246 [details] [diff] [review]
Patch 2: Watch for offline status change.

Add support for offline change as discussed.

At least on my linux system, offline change is not emitted when the network dies. It only occurs when 'Work offline' is toggled in firefox. It's also sad that the OS does not notify/forcibly close sockets when no connections are available, so we keep retrying ridiculously. But that is out of scope.

On B2G the Wi-Fi hotspot going down is detected (data is off) and the phone switches to offline mode, which is good.
Attachment #749246 - Flags: review?(justin.lebar+bug)
Attachment #749246 - Attachment description: Watch for offline status change. → Patch 2: Watch for offline status change.
Attachment #748369 - Attachment description: Monitor for active network change. → Patch 1: Monitor for active network change.
Comment on attachment 749246 [details] [diff] [review]
Patch 2: Watch for offline status change.

Review of attachment 749246 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/src/PushService.jsm
@@ +288,5 @@
>        case "profile-change-teardown":
>          this._shutdown();
>          break;
>        case "network-active-changed":
> +      case "network:offline-status-changed":

Maybe add a comment here that "network-active-changed" is a B2G event and "network:offline-status-changed" is a desktop event?

@@ +432,5 @@
> +    try {
> +      Cc["@mozilla.org/network/manager;1"].getService(Ci.nsINetworkManager);
> +      Services.obs.addObserver(this, "network-active-changed", false);
> +    } catch (e) {
> +      Services.obs.addObserver(this, "network:offline-status-changed", false);

Can you consolidate this NetworkManager detection code from init() and _shutdown() into a helper function that just returns the appropriate event name? Then init() and _shutdown() can simply call `Services.obs.addObserver(this, this._getNetworkEventName(), false)` and `Services.obs.removeObserver(this, this._getNetworkEventName())`.
Created attachment 751714 [details] [diff] [review]
Patch 2: Watch for offline status change.

cpeterson's feedback.
Attachment #749246 - Attachment is obsolete: true
Attachment #749246 - Flags: review?(justin.lebar+bug)
Attachment #751714 - Flags: review?(justin.lebar+bug)
Summary: SimplePush: Shutdown all connections when no connectivity is available. → SimplePush: Shut down all connections when no connectivity is available.
Comment on attachment 751714 [details] [diff] [review]
Patch 2: Watch for offline status change.

Sorry it took me so long to get to this.  I'm (hopefully!) done with tef, so I should be faster now.

This looks OK to me, but I feel totally out of my depth here.  Jason, would you mind signing off on this?
Attachment #751714 - Flags: review?(justin.lebar+bug)
Attachment #751714 - Flags: review?(jduell.mcbugs)
Attachment #751714 - Flags: review+
Comment on attachment 751714 [details] [diff] [review]
Patch 2: Watch for offline status change.

Review of attachment 751714 [details] [diff] [review]:
-----------------------------------------------------------------

Honza, hg log shows you seem to know the online/offline notification code.  Can we be doing any better here for desktop? I.e. it seems like we both are 1) lacking an equivalent to network-active-changed, and 2) on desktop it looks like we don't send offline/online status events by default.

> AFAIK there aren't any notifications similar to network-active-changed on desktop.
> So we'll have to rely on the timeout or on the OS destroying all sockets with
> an error when it changes the network.

I don't know this code well, but I see we have a nsIOService::mManageOfflineStatus (and a pref, "network.manage-offline-status"), but the pref is not set in all.js by default.  There's a bunch of relevant bugs where it went from being on by default except for xpcshell tests (bug 463046) and then got turned off (620472, 720320, 791626).   When I look at the default pref files I see

 browser/app/profile/firefox.js:pref("network.manage-offline-status", false);                                                                                                       
browser/metro/profile/metro.js:pref("network.manage-offline-status", true);                                                                                                        
mobile/android/app/mobile.js:pref("network.manage-offline-status", true);       

I.e. we have it turned off for firefox but on for metro/android, and it's not set for B2G, which means it defaults to false (should we change that?).

One possible issue is that the notification was created AFAICT to determine whether to put firefox in offline mode, which is a very different question than whether the PushServer, etc needs to close/reconnect sockets. So maybe we need to stop overloading that.
Attachment #751714 - Flags: review?(jduell.mcbugs) → review?(honzab.moz)
Asking for leo? since this helps on bug 869922
blocking-b2g: --- → leo?
Flags: needinfo?(dcoloma)
(In reply to Guillermo López (:willyaranda) from comment #14)
> Asking for leo? since this helps on bug 869922

1. That bug has been resolved invalid
2. Nothing in SimplePush can block 1.1
Comment on attachment 748369 [details] [diff] [review]
Patch 1: Monitor for active network change.

willyaranda, I think you want approval-mozilla-b2g18 instead.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 822712
User impact if declined: 
No user facing impact, but extra CPU spinning when the phone is in flight mode or has no internet connections.
Testing completed: 
Tested on devices under both wifi and 3g against multiple push servers. Confirmed by willyaranda. Push connection is always reestablished when possible.
Risk to taking this patch (and alternatives if risky): 
Can't really think of a risk over the existing code.
String or UUID changes made by this patch:
None
Attachment #748369 - Flags: approval-mozilla-b2g18?
blocking-b2g: leo? → ---
Is there any reason we shouldn't wait for this to land on m-c?
Flags: needinfo?(dcoloma)
Patch 1: Monitor for active network change.

http://hg.mozilla.org/projects/birch/rev/850e5446d4a6
Whiteboard: [fixed-in-birch][leave open]
Without this patch I'm having an incorrect behavior on 1.1 (leo). The network connection goes down and it's refreshed every 5 minutes it seems. According to Guillermo that behavior doesn't reproduce with this patch. Can we land this on 1.1? Oh and for what it's worth, I think this should be blocking. The way it's now either I don't get notifications or the battery consumption is going to be quite bad (since it's re-opening the connections every 5 minutes or so).
NVM my last comment, I'm observing the same behavior with this patch applied. Every 5 minutes the socket is closed and reopened, although the timer it's set to 30 minutes:

_processNextRequestInQueue()
06-03 12:36:50.893 I/Gecko   ( 1019): -*- PushService.jsm: Request queue empty
06-03 12:36:50.963 I/Gecko   ( 1019): -*- PushService.jsm: Set alarm 1800000 in the future 9
06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: wsOnStop()
06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: shutdownWS()
06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: Stopped existing alarm 9
06-03 12:41:52.627 I/Gecko   ( 1019): -*- PushService.jsm: Socket error 2152398868
06-03 12:41:52.637 I/Gecko   ( 1019): -*- PushService.jsm: reconnectAfterBackoff()
(In reply to Antonio Manuel Amaya Calvo from comment #20)
> NVM my last comment, I'm observing the same behavior with this patch
> applied. Every 5 minutes the socket is closed and reopened, although the
> timer it's set to 30 minutes:
> 
> _processNextRequestInQueue()
> 06-03 12:36:50.893 I/Gecko   ( 1019): -*- PushService.jsm: Request queue
> empty
> 06-03 12:36:50.963 I/Gecko   ( 1019): -*- PushService.jsm: Set alarm 1800000
> in the future 9
> 06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: wsOnStop()
> 06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: shutdownWS()
> 06-03 12:41:52.617 I/Gecko   ( 1019): -*- PushService.jsm: Stopped existing
> alarm 9
> 06-03 12:41:52.627 I/Gecko   ( 1019): -*- PushService.jsm: Socket error
> 2152398868

This error code is NS_ERROR_NET_RESET. Is this over wifi or 3g. It seems the network drops, which is a legitimate reason to reconnect. You can use this bit to discover the error name:

    for (var i in Components.results)
      if (Components.results[i] == <error code>)
        debug("Status code: " + i);
Flags: needinfo?(amac)
Reasking approval for b2g18. See comment 16.(sorry about the earlier markup akeybl).
This happens over 3G only, and it seems to happen only when we're using the APN with a proxy configured. What it's actually happening (I ran a tcpdump on the device) is that the proxy server is sending a reset packet about 5 minutes after it sent the last ACK. Any server initiated traffic isn't being delivered to the phone on that 5 minutes. 

We're still trying to determine if that's a server problem or if there's something we're transmitting that causes this behavior on the proxy. If we find something that can be done on the client side to avoid the proxy misbehaving (or if it's actually a client problem) we'll file another bug.
Flags: needinfo?(amac)

Updated

5 years ago
Attachment #748369 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Nikhil: are we fine landing just part1 of this for now?  We have at best weak review for part 2 so far (i.e. comment 12 and 13).

If this is a blocker let me know and I can bump it up on honza's todo queue.
(In reply to Jason Duell (:jduell) from comment #25)
> Nikhil: are we fine landing just part1 of this for now?  We have at best
> weak review for part 2 so far (i.e. comment 12 and 13).
> 
> If this is a blocker let me know and I can bump it up on honza's todo queue.

1 is fine for now.
ok.  Split patch 2 into a new bug if it's not landed by the next branch fork (June 24).
https://hg.mozilla.org/releases/mozilla-b2g18/rev/2c15111d2c67
status-b2g18: --- → fixed
status-b2g18-v1.0.0: --- → wontfix
status-b2g18-v1.0.1: --- → wontfix
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/2c15111d2c67
status-b2g-v1.1hd: --- → fixed
status-firefox22: --- → wontfix
status-firefox23: --- → wontfix
status-firefox24: --- → affected
I can see second patch waiting for review during last two weeks. Is it being reviewed? Or just waiting for people availability to review it? 

In Telefónica we have to execute end to end test plan of push, and this bug is blocking all test cases (we have a week delay because of that). Is there any way to speed up review process to land second patch?
Flags: needinfo?(honzab.moz)
(In reply to Marcelino Veiga Tuimil [:sonmarce] from comment #30)
> I can see second patch waiting for review during last two weeks. Is it being
> reviewed? Or just waiting for people availability to review it? 
> 
> In Telefónica we have to execute end to end test plan of push, and this bug
> is blocking all test cases (we have a week delay because of that). Is there
> any way to speed up review process to land second patch?

The second patch does not affect b2g and I won't be asking for approval-b2g18? on it since it isn't required to land on b2g18. Patch 1 which is relevant to b2g has already landed on all branches. If there are automated scripts that rely on this bug to be marked as FIXED, I'd recommend moving Patch 2 to a separate bug and marking this as fixed.
Nikhil,  I agree the 2nd patch should be moved to a new bug.  And you're the best person to summarize that new bug, so could you do it?  :)
Flags: needinfo?(honzab.moz)
Attachment #751714 - Attachment is obsolete: true
Attachment #751714 - Flags: review?(honzab.moz)
Moved patch 2 to 881416.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-birch][leave open] → [fixed-in-birch]
Thanks for clarifying and spliting the bug, now we can start end to end testing

Updated

5 years ago
status-firefox24: affected → fixed
You need to log in before you can comment on or make changes to this bug.