Closed Bug 1173671 Opened 5 years ago Closed 5 years ago

[Aries] OTA updates (and everything else) downloaded using cell network even though I'm connected to WiFi, after I've turned WiFi off and on again

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.5+, firefox39 wontfix, firefox40 wontfix, firefox41 fixed, b2g-master fixed)

RESOLVED FIXED
FxOS-S1 (26Jun)
blocking-b2g 2.5+
Tracking Status
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- fixed
b2g-master --- fixed

People

(Reporter: dholbert, Assigned: jessica)

References

Details

(Whiteboard: [spark][STR in comment 5, modified to not require OTA in comment 26])

Attachments

(7 files, 1 obsolete file)

Both last night and tonight, I checked for & accepted an OTA update on my Xperia Z3C (aka aries aka spark device).

In both cases, I was connected to my home WiFi, and yet the OTA update was downloaded over my cell phone connection.  I could tell by monitoring the usage app, and also due to SMS messages I received from my carrier during the OTA-update download (letting me know last night that I'd used up 90% of my monthly high-speed data cap, and letting me know tonight that I'd used up 100%).

Not sure if this is update-specific, or if all of the phone's traffic is being channeled through my cell connection for some reason.
I'm using the dogfood channel, with this version info from check_versions.py:

Build ID               20150610165040
Gaia Revision          e3eaf72ccd1bfe6d60d37efde6d3b92c1dbc5ff9
Gaia Date              2015-06-10 03:13:33
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/95afddf894e3
Gecko Version          41.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20150528.204812
Firmware Date          Thu May 28 20:48:20 UTC 2015
Bootloader             s1

(This ^ is the version of the OS that I'm currently running, while downloading tonight's OTA update & hitting this bug.  I received this version via an OTA update at work today.)
(My monthly high-speed data cap is 1 GB, which is usually more than enough for me, given that I'm on WiFi most of the time. However, it's nowhere near enough to download >100MB OTA updates each day. :))
QA: I'm going to assume that this wasn't always the case, so it must be a regression.

Dave, Henry, Vincent: Any idea what's happening here? Is there a more appropriate person we could contact?
blocking-b2g: --- → spark+
Flags: needinfo?(hchang)
Flags: needinfo?(dhylands)
Flags: needinfo?(changyihsin)
Priority: -- → P1
Whiteboard: [spark]
Adding Hsin-Yi at Dave's suggestion.
Flags: needinfo?(htsai)
After rebooting, I was able to start re-downloading the OTA update definitely using my WiFi connection (which I could tell because it was proceeding much faster than it had been going over my data connection).

So this doesn't happen 100% of the time for me.

Right now it looks like I can reproduce this reliably if I do the following:
 (1) Reboot phone.
 (2) Allow phone to connect to your WiFi network.
 (3) Check for updates, if you don't already have an OTA prompt.
 (4) Accept update (in notification tray). Watch it start downloading.
 (5) Cancel update after a few seconds (by tapping on it, in notification tray)
 (6) Turn WiFi off & on again.
 (7) Once WiFi has reconnected, re-accept the OTA update in notification tray.

This results in the OTA update resuming, but proceeding at a much slower rate (about .02 MB/sec, which I think means it's using my currently-throttled data service.  When it's downloading over WiFi, it proceeds at about 1 MB/sec)
(Side note: if I try to accept an OTA update *while WiFi is disabled*, I get a nice warning dialog, telling me that updates downloaded with cell data may incur extra charges, etc.  But when I reproduce this bug [consuming cell data despite WiFi being enabled], I get no such dialog.)
Here's "adb logcat -v time" of my STR from comment 5. (from bootup)

I download about 5 MB over the course of a few seconds, and then I cancel it ("AUS:SVC Downloader: cancel" in the logcat), and turn WiFi off and on again, and then resume the download.

Then there's a minute or two of "AUS:SVC Downloader:onStopRequest - socket error, shouldRetrySoon: true" (during which time the update shows as being at 0 MB downloaded in my status tray).

Then the update actually resumes, but at a very slow rate (which I think indicates that I'm using my cell data service).  I cut off the logcat output shortly after that.
I can't tell if the download traffic was via data network from the log.
Could you execute the following command when you suspect the OTA is on
data network?

$ adb shell netcfg
$ adb shell ip route

Thanks!
Flags: needinfo?(hchang)
Attached netcfg/ip route logs, captured while performing comment 5's STR.

"ip route" has this in the first log, but not in the second log:
> default via 192.168.10.1 dev wlan0 

192.168.10.1 is the internal-network IP address of my home router. So it seems to be losing that as the default gateway, after I've turned WiFi off and on again (and allowed it to reconnect & resume downloading an update).
Flags: needinfo?(hchang)
Attachment #8620997 - Attachment description: log of network commands after WiFi has been turned off & on (when I seem to be using data network) → log of network commands when bug is happening (after WiFi has been turned off & on, when I seem to be using data network)
For comparison, here's a log of the network commands when I've turned off WiFi & am intentionally downloading an OTA update over data connection.

The only differences vs. the "when bug is happening" log is:
 * netcfg shows "wlan0    DOWN" instead of UP
 * ip route has no output (whereas in the buggy log, it has a single line with the phone's own IP address, & no default route. Seems like these might be functionally-equivalent states, routing-wise?)
Note from QA: We are depending on bug 1133074 to resolve before we can do regression windows on Aries.
[CC'ing Dragana, as I think she's worked on some networking-related B2G stuff]
Summary: [Aries] OTA updates are downloaded using cell network even though I'm connected to WiFi → [Aries] OTA updates are downloaded using cell network even though I'm connected to WiFi, after I've turned WiFi off and on again
NI Johan on getting regression windows on Aries.
Flags: needinfo?(jlorenzo)
I tried to reproduce on a freshly flash build [1] I wasn't able to repro after 10 tries. I used 2 different Wi-Fi network to connect to:
* Mozilla (WPA2 Enterprise)
* Mozilla Guest (No authentication at all). 

This might be a race condition. Daniel, can you shoot a video? I'll be sure about the timing, then.

In the meantime, scratching regressionwindow-wanted and regression as we currently have no proof.

[1] Build ID               20150611132111
Gaia Revision          68269e7b6510930eb2f644f69d27d456c1bdec75
Gaia Date              2015-06-10 23:36:01
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/9ebd530c5843
Gecko Version          41.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20150611.130754
Firmware Date          Thu Jun 11 13:08:01 UTC 2015
Bootloader             s1
Flags: needinfo?(jlorenzo) → needinfo?(dholbert)
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #16)
> I tried to reproduce on a freshly flash build [1] I wasn't able to repro
> after 10 tries. I used 2 different Wi-Fi network to connect to:
> * Mozilla (WPA2 Enterprise)
> * Mozilla Guest (No authentication at all).

FWIW, my phone was flashed on ~5/28 and has gotten OTA updates ever since then. Your fresh flash probably gave you a different gonk version, and if we're lucky, maybe it's fixed in newer gonk (?) which is why you can't repro.

I've tested this with 3 different Wi-Fi networks:
 * My home WiFi network (WPA2 pre-shared key [standard])
 * openwireless.org at Mozilla MV office (no auth)
 * openwireless.org at Mozilla SF office (no auth)
Summary: [Aries] OTA updates are downloaded using cell network even though I'm connected to WiFi, after I've turned WiFi off and on again → [Aries] OTA updates (and everything else) downloaded using cell network even though I'm connected to WiFi, after I've turned WiFi off and on again
I don't actually have a OTA update available right now, so I can't be 100% sure I'm reproducing this at the moment.

When I do see an OTA update, I'll take a video (and capture more logs with wifi.debugging.enabled and network.debugging.enabled set to true, per gerard-majax's suggestion in IRC).
Daniel, do you recall when the last time you had an OTA was prior to the OTA in comment 0?

You can always get the latest (if you feel like taking the risks) by changing the channel to dogfood-latest.

We're going to be working on making builds for dogfoods so unless you use the dogfood-latest channel it will slow down in terms of updates.
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #19)
> Daniel, do you recall when the last time you had an OTA was prior to the OTA
> in comment 0?

When I posted comment 0:
 - I had received an OTA update the night before, on 6/10 (and I reproduced this bug & got a SMS about having used up 90% of my data allowance)
 - Then I received 2 OTA updates during the day, on 6/11.
 - Then I started receiving one more OTA update on the night of 6/11 (at which time I noticed myself hitting this bug again, & got a SMS message about having no high-speed data remaining, & filed the bug.)

I don't remember when I'd last received an OTA update before 6/10; probably earlier that day, or the night before. I'm not sure it matters, though, because I think I've actually been hitting this bug for a while now (perhaps for as long as I've been dogfooding, going back to around May 30).  I only just *noticed* the bug because I got SMS messages about reaching my data limit.
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #16)
> I tried to reproduce on a freshly flash build [1] I wasn't able to repro
> after 10 tries.

To clarify, when you say "I wasn't able to repro":
 - Did you check our "ip route" command for the difference I noticed in comment 11?
 - Are you sure your update wasn't actually downloading over your data network? (If your data is fast, the download speed difference might not be noticeable. It's only noticeable for me right now because my data connection has been rate-limited.)

(I don't think there's anything particularly special about my timing when I perform the steps in comment 5; I've allowed less/more time in various places when I've triggered this since filing, and it doesn't seem to affect the reproduciblity for me.)
Flags: needinfo?(jlorenzo)
Flags: needinfo?(dholbert)
Whiteboard: [spark] → [spark][tentative STR in comment 5]
I know that the updater listens for network:offline-status-changed events and uses those to determine whether the network is available or not.

There are a bunch of Network files in dom/system/gonk which I believe are responsible for managing this stuff.

It looks like jjong has touched NetworkManager.js alot, so adding ni? for her.
Flags: needinfo?(dhylands) → needinfo?(jjong)
On Firefox OS, wifi and data call are mutually exclusive, if wifi is enabled and connected, data call should be disconnected automatically, unless something went wrong. 

From attachment 8620997 [details], it seems that there is no default route, I am not sure how the download was resumed. If you are able to reproduce it, please set wifi.debugging.enabled, network.debugging.enabled and ril.debugging.enabled to true. You can also enable them in Settings -> Developer. Thanks.

Keeping NI for tracking.
(In reply to Daniel Holbert [:dholbert] from comment #21)
> To clarify, when you say "I wasn't able to repro":
>  - Did you check our "ip route" command for the difference I noticed in
> comment 11?
Yes, I did.

>  - Are you sure your update wasn't actually downloading over your data
> network? (If your data is fast, the download speed difference might not be
> noticeable. It's only noticeable for me right now because my data connection
> has been rate-limited.)
My data connection is fast (H+) indeed, howeverm neither Cost Control nor my carrier told me I downloaded more than 50 MB over the data network.

> (I don't think there's anything particularly special about my timing when I
> perform the steps in comment 5; I've allowed less/more time in various
> places when I've triggered this since filing, and it doesn't seem to affect
> the reproduciblity for me.)
Okay, there must be something I don't do like you. In any case, a video starting at the time you reboot, should help me to repro.
Flags: needinfo?(jlorenzo) → needinfo?(dholbert)
Flags: needinfo?(changyihsin)
Haven't received any OTA update offers since comment 12; will record a video (and save logcat output with prefs from comment 23) once I do.
Since it looks like updates are coming slower now, I'm now using "download a large (>100MB) file from ftp.mozilla.org" as my testcase.

I reproduced this bug once today, using the STR in comment 5 with that tweak. The second time I tried, though -- after rebooting (and starting to record a video -- I couldn't actually get it to use my data connection for the download. I did see it failing to get a default gateway in "ip route", though.  (So, the download just never successfully resumed at all.

Here's adb logcat output from that, if it helps. I imagine the RIL debugging pref will not have added anything useful to this log (since in this case, it didn't fail over to using my data connection; it just ended up with no connection). But maybe there's some useful stuff in here from "networking.debugging.enabled" at least.
Flags: needinfo?(dholbert)
Thanks Daniel for trying to reproduce this. Unfortunately, I can't find network logs in attachment 8622760 [details] (you should find some 'NetworkManager' logs in it if it's enabled). 

Is this issue Aries only? We are getting our device soon, maybe I can have a try then.
(In reply to Jessica Jong [:jjong] [:jessica] from comment #27)
> Unfortunately, I can't find
> network logs in attachment 8622760 [details] (you should find some
> 'NetworkManager' logs in it if it's enabled). 

Darn. I set that pref in WebIDE ("Device Preferences" screen), but it looks like it gets reset every time I reboot.

I'll try to repro again tomorrow, and I'll set that pref first-thing after bootup, I guess.

> Is this issue Aries only? We are getting our device soon, maybe I can have a
> try then.

(Not sure.)
Attachment #8622760 - Attachment is obsolete: true
OK, I reproduced the actual bug (with failing over to using my data connection), and I captured logcat output with debug-logging prefs successfully set (for network, wifi, & ril).

Particularly-relevant portion, after I re-enabled my WiFi:
06-16 15:20:01.404   316   316 I Gecko   : -*- NetworkManager: resetRoutingTable failed
06-16 15:20:01.404   316   316 I Gecko   : -*- NetworkService: dequeue id: 35
06-16 15:20:01.404   316   316 I Gecko   : -*- NetworkService: Id 35 is not on top of the queue
06-16 15:20:01.404   316   316 I Gecko   : -*- NetworkManager: Evaluating whether active network needs to be changed.
06-16 15:20:01.404   316   316 I Gecko   : -*- NetworkService: Going to change default route to rmnet0

I think rmnet0 is my data connection; not sure.

Per attachment description, I started a download over WiFi, and disabled it after a few seconds (and waited for a bit).

Then at ~exactly 15:20:00 (in the log), I re-enabled WiFi.

I tried to resume the download several times. The first time, it failed. The second time, I'm not sure it actually tried to resume. The third time, it did resume, and it was clearly using my data network. (Getting .02 MB/sec, due to my currently-throttled cell data situation.)
The successful download resume (using my cell data connection) happened at around 15:20:51 in the log, and I think it corresponds to this logged message:
> 06-16 15:20:51.744   316   816 I Gecko   : RIL Worker: Received 80 bytes.
One other side note which may be relevant:

When this bug is happening, my status bar simultaneously shows my cell service meter *with a data-type classification* (currently "4G" despite my throttled state), alongside the active-WiFi symbol.

When this bug is *not* happening, I don't think I normally see a data-type classification on my cell service meter. (I just see a number of bars, but no "4G" label.)
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #24)
> Okay, there must be something I don't do like you. In any case, a video
> starting at the time you reboot, should help me to repro.

Here's a video of me reproducing this: https://www.youtube.com/watch?v=qzxv5wlVUOA
(This video was taken immediately after reproducing it & saving the most recent logcat output. i.e. I reproduced twice in a row; capturing logcat w/ debugging logging prefs the first time, and videoing it the second time.)

Relevant moments:
 Initial download progressing quickly:
  https://www.youtube.com/watch?v=qzxv5wlVUOA#t=0m55s
 Disabling wifi:
  https://www.youtube.com/watch?v=qzxv5wlVUOA#t=1m5s
 Successfully resuming, at an extremely-slow (cell-data) rate:
   https://www.youtube.com/watch?v=qzxv5wlVUOA#t=2m5s
Flags: needinfo?(jlorenzo)
(The file-download I'm using is just an arbitrary 125mb file from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ )
Thanks Daniel for reproducing and providing useful logs. I think I know what is happening here...

When you re-enable wifi, network connection is switched from data to wifi; when doing this, it will try to remove data's ipv4 and ipv6 default routes [1], and since Daniel is using TMO, it is default to IPv6, so it has problem removing ipv4 default route and returns error.

> 06-16 15:20:18.054   316  1093 D NetUtils: failed to remove default route for rmnet0: No such process

I think we should just warn if something wrong happens when we try to remove old default routes (usually it is because the route does not exist), just like the ipv6 case [2]. I'll provide a patch to fix this.

Note that this should only happen on Aries + TMO (ipv6), and we have disabled ipv6 on Aries (bug 1171185) recently.


[1] http://hg.mozilla.org/mozilla-central/file/d7c148c84594/dom/system/gonk/NetworkUtils.cpp#l1960
[2] http://hg.mozilla.org/mozilla-central/file/d7c148c84594/dom/system/gonk/NetworkUtils.cpp#l1962
Flags: needinfo?(jjong)
Attached patch patch.Splinter Review
Comment on attachment 8623502 [details] [diff] [review]
patch.

Edgar, may I have your review on this? You can refer to comment 34 for details. Thanks!
Attachment #8623502 - Flags: review?(echen)
(In reply to Jessica Jong [:jjong] [:jessica] from comment #34)
> Daniel is using TMO, it is default to IPv6, so it has problem
> removing ipv4 default route and returns error.

That explains why I couldn't reproduce in France. We don't have any carrier using IPv6. Clearing my NI as the problem has been identified.
Flags: needinfo?(jlorenzo)
Sorry, disregard comment 38; as jjong noted, ipv6 is now disabled in Aries, so this shouldn't reproduce there, with current gonk. (I just verified that I can't reproduce this with spark RC1 which I flashed last night.)
Comment on attachment 8623502 [details] [diff] [review]
patch.

Review of attachment 8623502 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Thank you.

This reminds me one thing: 
The setDefaultRoute() now supports removing default route on old interface and then setting default route on new interface.
But we already have removeDefaultRoute()[1] in nsINetworkService for removing default routing, it seems to me that the setDefaultRoute() could focus on setting default route only.
For removing default route, we could just reuse the removeDefaultRoute().
What do you think? We can discuss this in a separated bug.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/nsINetworkService.idl?from=nsINetworkService.idl#325-341
Attachment #8623502 - Flags: review?(echen) → review+
(In reply to Edgar Chen [:edgar][:echen] from comment #40)
> Comment on attachment 8623502 [details] [diff] [review]
> patch.
> 
> Review of attachment 8623502 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me. Thank you.
> 
> This reminds me one thing: 
> The setDefaultRoute() now supports removing default route on old interface
> and then setting default route on new interface.
> But we already have removeDefaultRoute()[1] in nsINetworkService for
> removing default routing, it seems to me that the setDefaultRoute() could
> focus on setting default route only.
> For removing default route, we could just reuse the removeDefaultRoute().
> What do you think? We can discuss this in a separated bug.
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/
> nsINetworkService.idl?from=nsINetworkService.idl#325-341

Agree, we should remove old default routes explicitly. I'll land this first and let's handle this in bug 1175817, left some comments there.
https://hg.mozilla.org/mozilla-central/rev/e6094f82d2ab
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S1 (26Jun)
blocking-b2g: spark+ → 2.5+
Duplicate of this bug: 1167028
You need to log in before you can comment on or make changes to this bug.