Closed Bug 1204846 Opened 6 years ago Closed 6 years ago

Amount of mobile data consumed regresses

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.5+, firefox44 fixed)

RESOLVED FIXED
FxOS-S9 (16Oct)
blocking-b2g 2.5+
Tracking Status
firefox44 --- fixed

People

(Reporter: gerard-majax, Assigned: timhuang)

Details

(Keywords: dataloss, regression)

Attachments

(1 file, 2 obsolete files)

[Blocking Requested - why for this release]: seems critical

Spotted on a build from september 12th, running on my dogfooding Z3c: amount of consumed mobile data (in utility tray and in cost control app) regressed. Yesterday, it was saying over 480MB, today it is 479MB.

I am pretty confident the value was above 500MB a few days ago. I have data consumption tracking enabled since day 1, resetted on february 15th (migration to a new SIM card forced the reset, which is a pretty bad UX ...) and never resetted since.

Automatic reset is being set to "never". Enabling data and consuming some show that the value is increasing properly. Is there any rationale that could explain why the amount of consumed data reduces ?
Major regression, blocking.
blocking-b2g: 2.5? → 2.5+
It was 479.46MB when I filed the bug. Today morning arounnd 9:30 it was still that amount. and now, I see 467.71MB. And I did use some data in between for sure ...
And 438.39MB now :(
Francisco, Can you please set a priority for this defect?

Thanks
Flags: needinfo?(francisco)
So, on Gecko, I can read that maximum storage is controlled by two parameters: maxStorageSamples and sampleRate: https://dxr.mozilla.org/mozilla-central/source/dom/network/NetworkStatsService.jsm#183

Those values are defined in https://dxr.mozilla.org/mozilla-central/source/dom/network/NetworkStatsDB.jsm#36:
> // Constant defining the maximum values allowed per interface. If more, older
> // will be erased.
> const VALUES_MAX_LENGTH = 6 * 30;
>
> // Constant defining the rate of the samples. Daily.
> const SAMPLE_RATE = 1000 * 60 * 60 * 24;

From this, I would understand that we will indeed never be keeping more data than over six months, with a daily sampling, which seems to be what is used by CostControl.

Ethan, am I right?
Flags: needinfo?(ettseng)
Flags: needinfo?(francisco)
Priority: -- → P1
(In reply to Alexandre LISSY :gerard-majax from comment #5)
> So, on Gecko, I can read that maximum storage is controlled by two
> parameters: maxStorageSamples and sampleRate:
> https://dxr.mozilla.org/mozilla-central/source/dom/network/
> NetworkStatsService.jsm#183
> From this, I would understand that we will indeed never be keeping more data
> than over six months, with a daily sampling, which seems to be what is used
> by CostControl.
> Ethan, am I right?

Sorry for my late response.

Yes. The design of NetworkStats keeps data no more than six months.
The intention is to avoid flooding Database with too much data.
However, if the number of records reach the limitation, NetworkStatsDB shall remove the oldest records and should not decrease the amount of data usage.

The situation you described is absolutely a certain bug in NetworkStats component.

I am in a work week in Mountain View this week, and could not dig into this bug pretty soon.
Flags: needinfo?(ettseng)
Today the widget is not working anymore and the app stays blocked on the loading icon. ... I have not changed anything
Assignee: nobody → tihuang
Tim is already working on this bug and has some progress.

Tim, thanks for taking this bug!
Another point from Tim.. Looking to have this bug resolved for 2.5 release.  Thanks Tim!!
Right now, I can reproduce this bug by adjusting the sample rate to 5 min and making data expire after two hours. It is obvious that this problem is caused by the deleting of expired data. But, the NetworkStats service will make sure that the overall data usage still can be traced. There are two fields of stats accountable for this job, txTotalBytes and rxTotalBytes.

And why this regression happens? I think it is because of that these two fields would not be involved ,according to NetworkStatsDB.jsm, when someone gets samples. Thus, samples lose tracks for expired data, which makes this regression happens. 

The possible solution of this bug could be that we could involve txTotalBytes and rxTotalBytes when acquiring samples. For example, we could directly add them into rxBytes and txBytes in the first sample, which allows tracking of expired data.
Attached patch bug_1204846.patch (obsolete) — Splinter Review
Attachment #8671280 - Flags: review?(ettseng)
Tim, does that means that we are losing the data until your patch lands?
Flags: needinfo?(tihuang)
No, the data dose not lose before my patch lands. Actually, It still remains in the database. After my patch lands, those expired data will show in the Usage.
Flags: needinfo?(tihuang)
Comment on attachment 8671280 [details] [diff] [review]
bug_1204846.patch

Review of attachment 8671280 [details] [diff] [review]:
-----------------------------------------------------------------

Good fix! It is simple enough and effective.
Just need to add some comments. r=me.

::: dom/network/NetworkStatsDB.jsm
@@ +952,5 @@
> +            data.push({ rxBytes: cursor.value.rxBytes +
> +                        cursor.value.rxTotalBytes,
> +                        txBytes: cursor.value.txBytes +
> +                        cursor.value.txTotalBytes,
> +                        date: new Date(cursor.value.timestamp + offset) });

This block looks like a special case handle.
We have at least add some comments to explain its purpose.

For example: Add rx/txTotalBytes to the first sample of the result records so that Cost Control app could calculate the correct usage once previous samples were expired and removed from the Database.
(You could write more precise comments since you understand it better than me).

@@ +993,5 @@
> +                data.push({ rxBytes: cursor.value.rxBytes +
> +                            cursor.value.rxTotalBytes,
> +                            txBytes: cursor.value.txBytes +
> +                            cursor.value.txTotalBytes,
> +                            date: new Date(cursor.value.timestamp + offset) });

Ditto.
Attachment #8671280 - Flags: review?(ettseng) → review+
Attached patch bug_1204846_v2.patch (obsolete) — Splinter Review
Attachment #8671280 - Attachment is obsolete: true
Attachment #8672901 - Flags: review?(ettseng)
I did some modifications and created a new patch. Within this patch, the first sample of getSamples() will only replies rx/txTotalBytes instead of (rx/txBytes + rx/txTotalBytes), because the rx/txTotalbytes contains the rx/txBytes, so that we don't have to reply both values, but only the rx/txTotalBytes.

And I modified the test_findBrowsingTrafficStats.js to allow xpcshell-test working without problems. Because there are two test case do not contain rx/txTotalBytes in samples they create, which will lead to a fail of xpcshell-test.
Comment on attachment 8672901 [details] [diff] [review]
bug_1204846_v2.patch

Review of attachment 8672901 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Tim!
I only recommend to adjust description in code comment. r=me.

::: dom/network/NetworkStatsDB.jsm
@@ +949,5 @@
>          var cursor = event.target.result;
>          if (cursor){
> +          // We use rxTotalBytes/txTotalBytes instead of rxBytes/txBytes of the
> +          // first sample. The rxTotalBytes/txTotalBytes counts overall
> +          // network usage, so it contains data of expired samples. Hence, The

typo: "the"

@@ +951,5 @@
> +          // We use rxTotalBytes/txTotalBytes instead of rxBytes/txBytes of the
> +          // first sample. The rxTotalBytes/txTotalBytes counts overall
> +          // network usage, so it contains data of expired samples. Hence, The
> +          // Cost Control app could calculate the correct network usage
> +          // once privious samples were expired and removed from the Database.

typo: "previous"

I suggest to modify the comment as following:
"We use rxTotalBytes/txTotalBytes instead of rxBytes/txBytes from the first (oldest) sample. The rx/txTotalBytes fields record accumulative usage amount, which means even if old samples were expired and removed from the Database, we can still obtain the correct network usage for the recent duration by calculating the difference of rx/txTotalBytes between samples."

@@ +994,5 @@
>              } else {
> +              // We use rxTotalBytes/txTotalBytes instead of rxBytes/txBytes of
> +              // the first sample. The rxTotalBytes/txTotalBytes counts overall
> +              // network usage, so it contains data of expired samples. Hence,
> +              // The Cost Control app could calculate the correct network usage

typo: "the"

@@ +995,5 @@
> +              // We use rxTotalBytes/txTotalBytes instead of rxBytes/txBytes of
> +              // the first sample. The rxTotalBytes/txTotalBytes counts overall
> +              // network usage, so it contains data of expired samples. Hence,
> +              // The Cost Control app could calculate the correct network usage
> +              // once privious samples were expired and removed from the

typo: previous
Attachment #8672901 - Flags: review?(ettseng) → review+
Only modify comments.
Attachment #8672901 - Attachment is obsolete: true
Attachment #8673430 - Flags: review+
(In reply to Tim Huang[:timhuang] from comment #19)
> Created attachment 8673430 [details] [diff] [review]
> bug_1204846_v3_only_adjust_comments.patch
> Only modify comments.

Looks good. Let's land it. :)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9132d165572e
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S9 (16Oct)
Thanks Tim!
We resolved a thorny 2.5 blocker! \^o^/
Thank you very much :ethan and :timhuang for taking care of this.
You need to log in before you can comment on or make changes to this bug.