Closed Bug 1071372 Opened 10 years ago Closed 10 years ago

[Telemetry] The usageTime is incorrect in App usage metrics when using a locked SIM card

Categories

(Firefox OS Graveyard :: Gaia::System, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.1+)

RESOLVED DUPLICATE of bug 1072075
blocking-b2g 2.1+

People

(Reporter: gchang, Assigned: djf)

Details

(Whiteboard: ft:media)

Attachments

(2 files)

### Steps:
1. Set the retry interval to 30 sec in gaia/apps/system/js/app_usage_metrics.js
2. Enable wifi in device first, then power off device
3. Put a locked SIM card in device
4. Power on device
5. Enter PIN code in the lock screen
6. Pull down the notification screen
7. disable and re-enable wifi icon in the notification screen
8. The settings app will be opened.
9. Wait until the data is sent to server (The data should be the first one sent by device after device is power on)
------------------------------------------------------------------
Below is the data sent by device

{
    "apps": {
        "app://verticalhome.gaiamobile.org/manifest.webapp": {
            "activities": {},
            "installs": 0,
            "invocations": 1,
            "uninstalls": 0,
            "usageTime": 378772
        }
    },
    "deviceID": "9uhoy3oc",
    "deviceinfo": {
        "developer.menu.enabled": true,
        "deviceinfo.platform_build_id": "20140917160201",
        "deviceinfo.platform_version": "34.0a2",
        "deviceinfo.update_channel": "unknown"
    },
    "locale": "en-US",
    "screen": {
        "devicePixelRatio": 1.5,
        "height": 569,
        "width": 320
    },
    "start": 1411379979666,
    "stop": 1411380106018
}
------------------------------------------------------------------

### Expected:
1. The usageTime should be a reasonable value. 378772 is too large.

### Actual:
1. The usageTime of verticalhome app is too large. It's incorrect.

### Reproduce rate
always

### Version:
Device    flame
Gaia-Rev        379e68fe729a684fa2fcddb30ea1e65508db73e1
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-aurora/rev/7cb113c3ce0c
Build-ID        20140917160201
Version         34.0a2
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.1?
Blocking Reason: In correct usage time recorded. Affects data driven product decisions. 

NI djf/marshall - please investigate if this is a bug in the usage collection changes that you worked on.
blocking-b2g: 2.1? → 2.1+
Component: Gaia → Gaia::System
Flags: needinfo?(marshall)
Flags: needinfo?(dflanagan)
Whiteboard: ft:media
Assignee: nobody → dflanagan
I've never used a locked sim card, and I can't see what that has to do with this bug. I suspect it is changing the behavior of the lock screen so that we don't get the normal wake up or unlock when we start the phone.

The code initializes currentAppStartTime to 0 and seems to rely on that being set to the actual time when the phone wakes up. But if that process is changed by the locked sim, then the first reading we get would be way off...

I'll have to figure out how to lock a sim, I guess.
Flags: needinfo?(dflanagan)
On the other hand, if the bad usage time was because we were using a zero start time, then we'd expect a time like 1411379980 not just 378772.  On the other hand, the 379 in the first number is awfully close to the 378 in the second number... I need to look at how and when the batch start time gets set.
Looking over the code, I can't come up with an explanation that would account for these particular numbers.  This batch of metrics was generated on September 22. 378772 seconds would mean that the homescreen was running since September 17th. That actually is plausible if display blanking was turned off (or if having a locked sim card somehow means we can't detect when we go to the lock screen) and the phone was just sitting on a desk plugged in.

But the batch start and end times are both on sept. 22, so that isn't consistent. Maybe there is some kind of race condition having to do with the wifi being turned off and then back on again at the same time that we're trying to send a batch of data.

Gerry: can you make this happen consistently?  If so, would you attach a logcat, please?  To check whether it is a race condition, would you try again with a 2 minute report time instead of 30 seconds?  Also, does the bug still reproduce if you turn off metrics and then turn it back on.  (Settings->Improve B2G->Submit data)

Thanks
Flags: needinfo?(gchang)
More questions for Gerry:

When you say "locked SIM card" you mean a card with a SIM PIN, right? Or do you mean something like http://en.wikipedia.org/wiki/SIM_lock

In Step 5 of your STR you say "5. Enter PIN code in the lock screen".  But I see the lock screen and then a separate PIN lock screen.  Do you have both enabled?  Or do you use the PIN lock and have the lockscreen disabled?  (If the latter, then it is more likely that you did actually run the homescreen for 378000 seconds.)

In step 8, you write "8. The settings app will be opened."  Do you mean that it opens automatically when you re-enable wifi?  I don't see that, but I'm using master instead of 2.1. Perhaps that flow changed.

Have you tried reproducing this on master? Or is it only a 2.1 bug?
(In reply to David Flanagan [:djf] from comment #6)
> More questions for Gerry:
> 
> When you say "locked SIM card" you mean a card with a SIM PIN, right? Or do
> you mean something like http://en.wikipedia.org/wiki/SIM_lock
> 
Right.
> In Step 5 of your STR you say "5. Enter PIN code in the lock screen".  But I
> see the lock screen and then a separate PIN lock screen.  Do you have both
> enabled?  Or do you use the PIN lock and have the lockscreen disabled?  (If
> the latter, then it is more likely that you did actually run the homescreen
> for 378000 seconds.)
> 
Sorry if I'm not clear in step 5. I just mean enter the PIN code of SIM. When you are back from rebooting device, the PIN code screen will be pop-up, so, you should enter a PIN code in this stage. (I don't disable lockscreen)

> In step 8, you write "8. The settings app will be opened."  Do you mean that
> it opens automatically when you re-enable wifi?  I don't see that, but I'm
> using master instead of 2.1. Perhaps that flow changed.
> 
The behavior should not change. Why can I open settings app, because I'm using a newly flash gaia build and there are no wifi network remembered in device. You can forget all wifi network first. In step 2, the wifi is enabled, but device doesn't join any network.
The step here is to to force appclosed event of verticalhome to be called.

> Have you tried reproducing this on master? Or is it only a 2.1 bug?
It seems that master is unstable. I can't even open settings app.
Flags: needinfo?(gchang)
(In reply to Gerry Chang [:cfchang] from comment #7)
> (In reply to David Flanagan [:djf] from comment #6)
> > More questions for Gerry:
> > 
> > When you say "locked SIM card" you mean a card with a SIM PIN, right? Or do
> > you mean something like http://en.wikipedia.org/wiki/SIM_lock
> > 
> Right.
> > In Step 5 of your STR you say "5. Enter PIN code in the lock screen".  But I
> > see the lock screen and then a separate PIN lock screen.  Do you have both
> > enabled?  Or do you use the PIN lock and have the lockscreen disabled?  (If
> > the latter, then it is more likely that you did actually run the homescreen
> > for 378000 seconds.)
> > 
> Sorry if I'm not clear in step 5. I just mean enter the PIN code of SIM.
> When you are back from rebooting device, the PIN code screen will be pop-up,
> so, you should enter a PIN code in this stage. (I don't disable lockscreen)
> 
> > In step 8, you write "8. The settings app will be opened."  Do you mean that
> > it opens automatically when you re-enable wifi?  I don't see that, but I'm
> > using master instead of 2.1. Perhaps that flow changed.
> > 
> The behavior should not change. Why can I open settings app, because I'm
> using a newly flash gaia build and there are no wifi network remembered in
> device. You can forget all wifi network first. In step 2, the wifi is
> enabled, but device doesn't join any network.
> The step here is to to force appclosed event of verticalhome to be called.
> 
I think it should be appopened event of other app (like settings app) not appclosed event of verticalhome.
> > Have you tried reproducing this on master? Or is it only a 2.1 bug?
> It seems that master is unstable. I can't even open settings app.
I'm unable to reproduce this bug.  I used the attached patch to modify the transmit interval and to add some extra logging.  Then, to attempt to reproduce, I did this:

 NOFTU=1 make reset-gaia; adb reboot; adb logcat > /tmp/log

By doing a full make reset-gaia, I discard any pending usage data, and forget any wifi networks.  I then follow steps 5 to 10 in the STR.

In the logcat I see data transmission of a data batch that includes only a single homescreen invocation as in this bug report. But the time is just 20s as it should be.

Gerry: can you reproduce this bug? Can you reproduce it with my patch above applied? If so, please send me a logcat. (Ideally using my patch because that outputs more logging information).
Flags: needinfo?(gchang)
Marshall. In https://bugzilla.mozilla.org/show_bug.cgi?id=1071443#c7 you wrote something about "before the system clock gets synchronized".  Can you tell me more about that? Is that really a thing? Can system app JS code start running before the time is properly set?  I suppose that could cause a bug like this...
(In reply to David Flanagan [:djf] from comment #11)
> Marshall. In https://bugzilla.mozilla.org/show_bug.cgi?id=1071443#c7 you
> wrote something about "before the system clock gets synchronized".  Can you
> tell me more about that? Is that really a thing? Can system app JS code
> start running before the time is properly set?  I suppose that could cause a
> bug like this...

I see this happening a lot on the Flame. These are the race conditions I've been able to repro:
Race 1a) Device boots up, and AUM starts, recording Date.now() in |this.metrics.data.start|
Race 1b) WiFi connects and the system clock gets updated -- |metrics.startTime()| can now actually be after Date.now()

Race 2a) Transmission of the first round of metrics happens, but fails
Race 2b) WiFi reconnects, and system clock gets updated -- |this.lastFailedTransmission| can now be after Date.now()

It's possible that this could happen between start and stop of an app as well, but in practice I haven't seen that happen yet. I've had some early success with code locally that resets both |this.metrics.data.start| and |this.lastFailedTransmission| on the 'moztimechange' event, but right now the best I can do is to lose reported time whenever time is updated.

I think the only reliable solution would be to add an interval timer, and store the previous timestamp, i.e. in pseudo-code:

var lastTimestamp = Date.now();
var currentAppRunningTime = 0;
const RESOLUTION = 5000;

setTimeout(function() {
  if (lastTimestamp > Date.now() || Date.now() - lastTimestamp > RESOLUTION * 2) {
    // We would also need to account for the lock screen since that could cause this timer to not be called
    // for a while..

    currentAppRunningTime += 5
  } else {
    currentAppRunningTime += (Date.now() - lastTimestamp) / 1000;
  }

  lastTimestamp = Date.now();
}, 5000);
Flags: needinfo?(marshall)
Marshall,

How much do we care about this time issue? Presumably when a phone connects to wifi the time will only be adjusted by a few seconds to fix clock drift, right?  Or are we worried about a case where a phone has been sitting discharged and has lost its internal time? Does that even happen? 

Is there actually any way that the moztimechange event you're talking about could cause a 5-day jump in the time as seen in this bug?
Flags: needinfo?(marshall)
(In reply to David Flanagan [:djf] from comment #13)
> Marshall,
> 
> How much do we care about this time issue? Presumably when a phone connects
> to wifi the time will only be adjusted by a few seconds to fix clock drift,
> right?  Or are we worried about a case where a phone has been sitting
> discharged and has lost its internal time? Does that even happen? 

It seems to happen very frequently on the Flame, though I'm not sure if there is an underlying hardware issue or not.

I have seen all kinds of funny behavior in the drift going from minutes, hours, days, and an initial state of epoch time up to the 'current' time..

> 
> Is there actually any way that the moztimechange event you're talking about
> could cause a 5-day jump in the time as seen in this bug?

Yeah, I have seen multiple day jumps in my testing (though not as many as 5 tbh)
Flags: needinfo?(marshall)
It looks like we have more cases where the system time is updated and it affects the metrics counter in Bug 1072075, let's move the discussion there.
(In reply to David Flanagan [:djf] from comment #10)
> I'm unable to reproduce this bug.  I used the attached patch to modify the
> transmit interval and to add some extra logging.  Then, to attempt to
> reproduce, I did this:
> 
>  NOFTU=1 make reset-gaia; adb reboot; adb logcat > /tmp/log
> 
> By doing a full make reset-gaia, I discard any pending usage data, and
> forget any wifi networks.  I then follow steps 5 to 10 in the STR.
> 
> In the logcat I see data transmission of a data batch that includes only a
> single homescreen invocation as in this bug report. But the time is just 20s
> as it should be.
> 
> Gerry: can you reproduce this bug? Can you reproduce it with my patch above
> applied? If so, please send me a logcat. (Ideally using my patch because
> that outputs more logging information).

Yes, I can recreate the bug. I put the logcat in attachment.
Flags: needinfo?(gchang)
Attached file logcat.log
From the logcat, I'm guessing that this must be due to a moztimechange event when wifi connects, so this is probably the same bug as 1072075.  I think we can fix that by switching from Date.now() to performance.now() for app usage time.

We can possibly also use the delta between Date.now() and performance.now() to figure out how much the time has changed for each moztimechange event, and use that to adjust the batch start time, if that seems necessary.

I still wonder why this reproduces so easily for Gerry and not for me. I wonder if his data connection and his wifi connection have time values that are significantly out of sync...  Or whether we have some other bug on the Flame and it just doesn't keep good time when switched off.
The Flame is almost certainly the reason, I also recently started testing this on my Flame and I can reproduce it basically at will..
Gerry,

I still have not been able to reproduce, but I think I may have fixed this bug with my patch in bug 1072075. Would you apply that patch and see if it fixes the bug for you?
Flags: needinfo?(gchang)
(In reply to Marshall Culpepper [:marshall_law] from comment #19)
> The Flame is almost certainly the reason, I also recently started testing
> this on my Flame and I can reproduce it basically at will..

I wonder if this line when booting has anything to do with it:

I/cat     (  299): <6>[   13.083622] alarm_set_rtc: Failed to set RTC, time will be lost on reboot
Hi David, 
I apply your patch and it works for me.
Flags: needinfo?(gchang)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: