Closed Bug 1007953 Opened 10 years ago Closed 10 years ago

[B2G][Tarako][Geolocation]Random location jumping when using Marketplace App HERE Maps while device is stationary

Categories

(Core :: DOM: Geolocation, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
blocking-b2g 1.3T+
Tracking Status
b2g-v1.3T --- fixed
b2g-v1.4 --- wontfix
b2g-v2.0 --- wontfix

People

(Reporter: mclemmons, Assigned: garvan)

References

()

Details

(Keywords: verifyme)

Attachments

(3 files, 4 obsolete files)

Description: User downloads, installs, and launches Marketplace App HERE Maps. User elects to allow HERE Maps to know one's location. Zoom in several times and user witnesses the green dot jumping several miles from East to West and vice versa everso often. 

Prerequisites:
1. Enable both Wi-Fi and Data Connection
2. Download and Install HERE Maps from Marketplace

Repro Steps
1) Updated Tarako to BuildID: 20140508014002
2) Tap Marketplace App HERE Now and allow HERE Maps access to know location
3) Zoom in or out (press the + or - signs several times) so multiple city names can be viewed
4) Observe green dot for at least two minutes

Actual:
Green dot for current location jumps constantly despite device in fixed location.

Expected:
Green dot for current location remains stationary.

Repro Rate: 10/10 – 100%
Attachments: Firewatch, Logcat, Video Clip = https://www.youtube.com/watch?v=VzkBv6V2iqY (specific jumps occur at 14 second, 19 second, 1:28 and 1:34 marks.)

Environmental Variables:
Device: Tarako 1.3T MOZ
BuildID: 20140508115937
Gaia: 554fd996205fef74d88e988f3596c58dcd05df28
Gecko: cfc406e6ec85
Version: 28.1
Firmware Version: sp6821a-gonk4.0-4-29
Attached file Firewatch Log
blocking-b2g: --- → 1.3T?
This makes geolocation seem like it's broken in a mapping context if it's jumping around this much.
The logcat shows an alternating pattern of requests to the service. In the first if sends WiFi and cell data and receives a result based on WiFi. But than a bit later it sends a request with only cell data in it, and gets a much worse result based on cell data alone. This cycle continues, where it is alternating between cell-only and cell+wifi based requests.

So I think this is a client-side issue and not an issue on the service end.

It also makes quite a number of outbound requests, one about every 5 seconds. That sounds to me like we are missing some sort of local caching here. If we send the exact same data to the service, it is not gonna return a different result for quite some time (a day would likely be safe).
ni? Doug / Erin for further comments
Flags: needinfo?(elancaster)
Flags: needinfo?(dougt)
If this is client-side, dougt and rbarnes should do initial evaluation with Garvan included.
Flags: needinfo?(rlb)
Flags: needinfo?(elancaster)
Trying to understand the NetworkGeolocationProvider.js code, here's what I think happens:

1. The WifiGeoPositionProvider sets up a nsIWifiMonitor during startup and its own onChange method is called whenever the access points change.
2. During startup it gets the current list of access points once.
3. Inside onChange we set up the global gWifiResults to the list of filtered access points we want to use

4. Something asks for a location, which triggers the notify function
5. The notify function calls updateMobileInfo to collect cell data
6. updateMobileInfo works synchronously/blocking and sets up the global gCellResults with the cell data we want to send
7. notify uses both gWifiResults and gCellResults and makes an outbound service request
8. notify sets gWifiResults and gCellResults back to null

So far so good, now the trouble begins:

9. Something asks for a location update again, triggering notify
10. updateMobileInfo is called in a blocking fashion again and sets up gCellResults
11. gWifiResults is still null, since the underlying access point list hasn't changed yet
12. notify makes an outbound request with only cell data

13. The underlying access point list is updated again and gWifiResults set-up correctly once more
14. Start again at 4

So the issue is gWifiResults being nullified and only being updated asynchronously. Or rather access point updates happening much less frequently than something asking for location updates.
blocking-b2g: 1.3T? → 1.3T+
Assignee: nobody → gkeeley
Attached patch bug_1007953.patch (obsolete) — Splinter Review
Wifi scanner as the sole the heartbeat that drives location updates, shorten the wifi scanner wait to 5s to ensure this (also stop and restart the scanner on WifiGeoPositionProvider.startup() to ensure this). Exception to this is if wifi scan is turned off in the prefs, use a short timer to drive the location update. 
In the patch, because sendLocationRequest(wifiData) is called by the wifi scanner, and if the wifi scanner is functioning, then consecutive calls will have a valid wifiData arg, and the will do xhr.send(wifi+cell). In the bug, we are seeing xhr.send(wifi+cell), then xhr.send(cell), the latter providing a poorer location.
Attachment #8426437 - Flags: review?(josh)
Flags: needinfo?(rlb)
Trying to investigate on my Tarako. Built with
BRANCH=v1.3t ./config.sh tarako

I can't use the NetworkGeolocationProvider.js from this patch, as nsGeolocation.cpp does not have the function Geolocation::LocationUpdatePending() (which we try call from the js).

Can someone tell me what branch I should be pulling?
For tarako, 1.3t is the right branch.
Doug, can you clarify for me what I am seeing in comment 8? 

As of today, I have a Tarako with a SIM that I can build to, and am verifying the original bug report, and the effect of the patch. Will report back later today.
Flags: needinfo?(dougt)
Attached patch bug_1007952_1.3t_patch.diff (obsolete) — Splinter Review
Patch that applies cleanly to 1.3t. The other changes in the patch for central (bug_1007952.patch) are not used by Tarako.

Verified the bug as reported on Tarako, applied this patch, verified the fix.

(Aside: comment 8 was a mistake on my part)
Attachment #8427416 - Flags: review?(josh)
Blocks: 1014916
Comment on attachment 8426437 [details] [diff] [review]
bug_1007953.patch

Review of attachment 8426437 [details] [diff] [review]:
-----------------------------------------------------------------

Can you rephrase the commit message a bit? I find it difficult to understand as written.

::: dom/system/NetworkGeolocationProvider.js
@@ +128,5 @@
> +      // wifi thread triggers WifiGeoPositionProvider to proceed, with no wifi, do manual timeout
> +      this.timeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +      this.timeoutTimer.initWithCallback(this,
> +                                        gLocationRequestTimeout,
> +                                        this.timeoutTimer.TYPE_REPEATING_SLACK);

What do we wait for now?

@@ +197,3 @@
>      LOG("updateMobileInfo called");
>      try {
> +// TODO for each radio interface.

nit: indent this please

@@ +213,2 @@
>          "radio": "gsm",
> +// TODO type

nit: indentation

@@ +215,5 @@
>          "mobileCountryCode": iccInfo.mcc,
>          "mobileNetworkCode": iccInfo.mnc,
>          "locationAreaCode": cell.gsmLocationAreaCode,
>          "cellId": cell.gsmCellId,
> +// TODO signal strength? 

nit: trailing whitespace
Attachment #8426437 - Flags: review?(josh) → review+
Also what are the power implications of switching from 60 seconds to 5 seconds now?
Comment on attachment 8427416 [details] [diff] [review]
bug_1007952_1.3t_patch.diff

Review of attachment 8427416 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/system/NetworkGeolocationProvider.js
@@ +101,5 @@
> +      // wifi thread triggers WifiGeoPositionProvider to proceed, with no wifi, do manual timeout
> +      this.timeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +      this.timeoutTimer.initWithCallback(this,
> +                                        gLocationRequestTimeout,
> +                                        this.timeoutTimer.TYPE_REPEATING_SLACK);

Nit: align these lines with the 't' in this.

@@ +195,5 @@
>    notify: function (timeoutTimer) {
> +    // If Wifi scanning is disabled, then we can not depend on that for the
> +    // heartbeat that drives location updates.  Instead, just use a timer which
> +    // will drive the update.
> +    if (gWifiScanningEnabled == false) {

Why is this check here but not in the other patch?
Attachment #8427416 - Flags: review?(josh) → review+
Attached patch bug_1007953_1.3t_patch.diff (obsolete) — Splinter Review
1.3t patch addressing comments from jdm
Attachment #8427416 - Attachment is obsolete: true
Attachment #8430236 - Flags: review?(josh)
Attached patch bug_1007953_moz_central.patch (obsolete) — Splinter Review
mozilla_central patch addressing review from jdm
Attachment #8426437 - Attachment is obsolete: true
Attachment #8430238 - Flags: review?(josh)
Thanks for the review and questions Josh.

Old code: NetworkGeolocationProvider.js waits for 2 events (before sending a geolocation request to GLS/MLS):
* the onChange from the wifi listener thread, and notify from the timer timeout
* old version was driven by the timer timeout, and depended on the wifi list having been reported before the timer
* for getCurrentPostition, this worked ok AFAIK
* In the watch position case, the timer is timing out before the wifis have been reported (sometimes)

New version:
* if wifi is on, NetworkGeolocationProvider only listens for wifi events
* timer timeout is only for when there is no wifi, so that the geoloc. request is still triggered (could be with just cell, or an empty request to get geoip)

The new version fixes the jumping caused by those 2 timers (and also is a bit more deterministic/predictable in behaviour).

The existing B2G wifi thread calls back every 5 sec so no change was made to this. 
But for other platforms, the wifi threads wait timeout was changed to match B2Gs behaviour. 

Very good question regarding the power cost of this. I don't have a good answer. I know from experience (and my own battery test cases) that querying the Android wifi layer for info has no cost tied to the freqency of the query, as long as the wifi isn't in a low-power state. The query to the wifi layer is just returning a cache of the last list of access points, and layers above this cannot control the frequency of the scan, regardless of how often the ask the wifi layer for a list of access points.

That said, what happens on a mac/PC/linux laptop, now that we are querying this layer at a higher freq, does it have a battery drain effect? I am going on the assumption that it is very minimal.
Josh, I don't think I can touch that 5 timer in the non-wifi case without greater code impact. The code is originally designed around that heartbeat idea of 5 sec. I do plan to get to this when the code is refactored to better meet the geolocation spec.
(Although if you have a low-impact idea, I can try it).
Blocks: 1009592
Attachment #8430236 - Flags: review?(josh) → review+
Attachment #8430238 - Flags: review?(josh) → review+
I have a couple of questions:
is bug 1014924 a duplicate of this?

Also I found that geolocation seems to work with SIM data as well but places me in a completely different location ( Sacramento instead of Vallejo area ).  Would these patches help resolve this?
Flags: needinfo?(gkeeley)
Garvan, the 1.3t patch doesn't apply on 1.3t... Also, can you format it as an hg patch? that will make it easier to land.
Comment on attachment 8430236 [details] [diff] [review]
bug_1007953_1.3t_patch.diff

Review of attachment 8430236 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/system/NetworkGeolocationProvider.js
@@ +101,5 @@
> +      // wifi thread triggers WifiGeoPositionProvider to proceed, with no wifi, do manual timeout
> +      this.timeoutTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +      this.timeoutTimer.initWithCallback(this,
> +                                         gLocationRequestTimeout,
> +                                         this.timeoutTimer.TYPE_REPEATING_SLACK);

What if I'm driving in the country where there are no wifi signals at
all, but I do switch cell towers now and then. Will the wifi watcher
trigger anything if nothing changes for wifi?  What if I have wifi
turned off? In my testing it seemed that I never got any updates
in that case. Perhaps scanning was still enabled, but the wifi
radio was off so the watcher was never calling us? Using async
notification of wifi changes seems like the right thing to do, but 
maybe you could pair that with a timer to poll for cell tower changes?
(I'm assuming there is no way to just register a listener for 
cell tower changes.)

It seems to me that we need a timer running all the time. And testing this patch along with the patch in 1009592 indicates that if wifi is off, we never send a location request.  I think wifi can be off even when scanning is enabled or something.
@nhirata:

> is bug 1014924 a duplicate of this?

No.  See comment # 11 in that bug.

> Also I found that geolocation seems to work with SIM data as well but places me in a completely different location ( Sacramento instead of Vallejo area ).  Would these patches help resolve this?

No.  Accuracy is independent of this bug.


@dfj:

http://mxr.mozilla.org/mozilla-central/source/dom/system/NetworkGeolocationProvider.js#66

If you take a look, this pref controls if we startup the timer.  It is not toggled anywhere in the system (it can be via gecko preferences, but typically it is set for the build and not messed around with).   So, basically this timer is always running - as you suggest.
the m-c patch doesn't apply either.  I could clean it up, but not sure if we are missing another patch:

  26 applying attachment.cgi?id=8430238
  27 patching file dom/system/NetworkGeolocationProvider.js
  28 Hunk #5 FAILED at 175
  29 Hunk #6 succeeded at 253 (offset 1 lines).
(In reply to Doug Turner (:dougt) from comment #22)
> 
> 
> @dfj:
> 
> http://mxr.mozilla.org/mozilla-central/source/dom/system/
> NetworkGeolocationProvider.js#66
> 
> If you take a look, this pref controls if we startup the timer.  It is not
> toggled anywhere in the system (it can be via gecko preferences, but
> typically it is set for the build and not messed around with).   So,
> basically this timer is always running - as you suggest.

Doug: if geo.wifi.scan is always true, then this patch means that we never start a timer. And that means that if wifi is turned off, geolocation just won't work.
And, if geo.wifi.scan is always false, then we will always start a timer, but will also always ignore wifi and base our location on cell towers only. If I understand the patch correctly, it only works if geo.wifi.scan accurately reflects whether wifi scanning is available or not. But if its value never changes then this patch can't work correctly.
One other point to consider is that geo.wifi.scan is only checked when the geolocation subsystem initializes.
I think we are on to another bug. My patch did not change the underlying use of the prefs, that I am aware of. The behaviour *should* be the same. 
NetworkGeolocationProvider gets shutdown and restarted when not currently in use. As always, if an app is running a watch, it will stay alive until it stops.
One item of note, is that on Tarako the wifi scanning can't be shut off (see bug 10149824 )
But yet, this should all be fixed as a separate bug.
Flags: needinfo?(gkeeley)
Above, I meant see this bug https://bugzilla.mozilla.org/show_bug.cgi?id=1014924
regarding the wifi shut off issue.
This is a patch for 1.3t only, and this bug now is Tarako-only. Don't clear the global wifi list, so that if the timer timeout fires before the wifi timer has fired, we will have the last wifi results.

We still want to change NetworkGeolocationProvider.js to not run 2 timers on m-c, opening a new bug for that, with the same report as this bug.
Attachment #8430236 - Attachment is obsolete: true
Attachment #8430238 - Attachment is obsolete: true
Attachment #8431732 - Flags: review?(dougt)
Comment on attachment 8431732 [details] [diff] [review]
bug_1007953_1.3t_minimal_patch.diff

Review of attachment 8431732 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good for 1.3T.
Attachment #8431732 - Flags: review?(dougt) → review+
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/869d971df367

Lets mark this bug closed as it is against the Tarako device.  I do not want this patch on m-c.  Instead, we should go with the other approach, fix onerror(), and ensure that the timer is setup during the onerror() callback.  Garvan, can you file these bugs?
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(gkeeley)
Resolution: --- → FIXED
Blocks: 1018379
Filed for mozilla-central: https://bugzilla.mozilla.org/show_bug.cgi?id=1018379
Flags: needinfo?(gkeeley)
No longer blocks: 1018379
Depends on: 1018379
Blocks: 971637
qawanted to ask for testing of this patch on latest 1.3T build.
Keywords: qawanted, verifyme
Note - QA Wanted is used to address some prelanding. If you want something addressed post landing, use verifyme.
Keywords: qawanted
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: