Closed Bug 1209654 Opened 6 years ago Closed 6 years ago

[Cost Control] Data alert doesn't match reported usage

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.5+, firefox45 fixed, b2g-v2.2 affected, b2g-v2.5 affected, b2g-master affected)

RESOLVED FIXED
2.6 S1 - 11/20
blocking-b2g 2.5+
Tracking Status
firefox45 --- fixed
b2g-v2.2 --- affected
b2g-v2.5 --- affected
b2g-master --- affected

People

(Reporter: rillian, Assigned: timhuang)

References

Details

(Keywords: foxfood)

Attachments

(4 files, 3 obsolete files)

Attached image notification screenshot
Aries device running master branch from last week showed a data usage alert, saying I'd used the 4.50 GB I set the alarm for. Data limit on the status screen says, "Data Limit (4.50 GB) 4.29 GB available". So...which is it?

The usage app says:

Mobile usage 212.75 MB
Wi-Fi usage 1.06 GB

Which matches the data limit, but not the alert. My provider reports 201.6 MB in the current billing period, so it looks like it's the alert that's incorrect.
Can QA reproduce?
blocking-b2g: --- → 2.5?
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage+]
Keywords: smoketest
Whiteboard: [dogfood-blocker]
Hi, Could that be related with this bug in comment #19?:
https://bugzilla.mozilla.org/show_bug.cgi?id=1179512#c19

If the limit was reached and the notification is not removed, it will persit in the utility tray and show a previous old state not the current one.
Attached image screenshots of issue
Yes, QA was able to reproduce this issue on Aries 2.5, Flame 2.6 central, and Flame 2.2.

I set Data limit to 4.5GB, made it track both Mobile usage and Wi-Fi usage. I then started browsing the web and streaming youtube via Data, and at around 210MB Mobile usage, I got a notification saying I had reached the limit.

Repro rate is 3/3 on affected branches. It takes quite a while (~2 hours) to reach ~210MB data usage so I wasn't able to try more attempts.

Attaching screenshots and logcats.

Bug reproduces on:

Device: Aries 2.5
BuildID: 20151030120435
Gaia: 91cac94948094cfdcd00cba5c6483e27e80cb3b0
Gecko: c2534acb485963331d67bbc5c07f0d862ed56bf5
Gonk: a19052e4389c3ae2d8fc3e7a74a475401baacc56
Version: 45.0a1 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:45.0) Gecko/45.0 Firefox/45.0

Device: Flame 2.6 master
BuildID: 20151102030250
Gaia: bfcf8e9bfa7ba264c5f8bc865ce8a435f9b134cb
Gecko: 451a185791433bce1a6a894c27f3da60a3119431
Gonk: 205ac4204bbbb2098a8046444acba551ba5dc75a
Version: 45.0a1
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:45.0) Gecko/45.0 Firefox/45.0

Device: Flame 2.2
BuildID: 20151102032501
Gaia: 885647d92208fb67574ced44004ab2f29d23cb45
Gecko: b8b7f4efaa6e
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Note:
Comment 2 scenario does NOT apply to this bug.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(mshuman)
Flags: needinfo?(ktucker)
Flags: needinfo?(jmercado)
Keywords: qawanted
Flags: needinfo?(mshuman)
Flags: needinfo?(jmercado)
I'm a bit confused with the data limit, I thought we were triggering the notification when we consume that amount of data, no when we have that limit left. Anyway looks like a blocker to me
I think the problem is because we exceeded the maximum value of an integer: 4.5 GB > 2^32 bytes. I'm guessing that value is coded on a 32-byte-unsigned-integer.

Moreover, we don't enter GiB, but GB. So we use multiple of 10 instead of 2[1].

So if you set 4.5GB - 4GiB, you get 205032704 bytes. That is to say 205MB, which is close to the values found in comment 0 and comment 3.

[1] https://github.com/mozilla-b2g/gaia/blob/95d500dc39974b5fd2dc3c17f90a25d86c37ce92/apps/costcontrol/js/common.js#L314
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #6)
> 32-byte-unsigned-integer.
32-bit*
ni Salva, to see if there is a quick fix for this.
Flags: needinfo?(salva)
:jlorenzo, clever assumption! Thanks for pointing in that direction.

If this is the case, then there is no a trivial solution in the client side as it is Gonk who is in charge of manage the quota (actually, it is netd [1] through iptables [2]).

I'm still checking it.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkUtils.cpp#737
[2] https://android.googlesource.com/platform/system/netd/+/8a93272255f1b7e3083a97e1e28ddf675c0c7fb0%5E!/
Flags: needinfo?(salva)
By the way, maybe :ethan could help here.
Flags: needinfo?(ettseng)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Comms triage: Even though this issue has been here for long, setting a data limit to 4.5GB is a real user's scenario.
blocking-b2g: 2.5? → 2.5+
After further discussion, we decided this is not a smoke test blocker.
Keywords: smoketest
Well, here is the problem:
https://dxr.mozilla.org/mozilla-central/source/dom/webidl/NetworkOptions.webidl#41

As in ARM long is 32 bits length [1]. This should be long long and it's possible that Gonk need a modification to convert this:
https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkUtils.cpp#737

Into:
`PR_snprintf(command, MAX_COMMAND_SIZE - 1, "bandwidth setinterfacealert %s %lld", GET_CHAR(mIfname), GET_FIELD(mThreshold));`

Notice `%lld` formatter.

I would make the patch myself but then you have to wait for me to setup my laptop.

[1] http://en.cppreference.com/w/cpp/language/types
Assignee: nobody → tihuang
Kinda of busy recently. Delegate Tim Huang to help on this bug.
Tim, thanks for taking this. :)
Flags: needinfo?(ettseng)
Salva's right. The 4.5G will overflow for the 32 bits ARM based CPU. And this bug could be easily resolved by changing 'long to 'long long'. 

And there is another issue here. Any data limit within the range [2.15G, 4.29G] is going to fail on addAlarm, because of 'long' is a signed value and its range is [−2147483647, +2147483647]. Any value in the range [2.15G, 4.29G] will be treated as negative value, which causes the addAlarm fails.

Fortunately, this issue can be fixed after we use 'long long' to represent data limit.
Whiteboard: [dogfood-blocker]
I found another issue after I changed the 'long' to 'long long' of the addAlarm(). The getAllAlarms() still returns overflow value if I wrote a overflow value, say 5GB. Now, I am trying to fix this.
Attached patch bug_1209654.patch (obsolete) — Splinter Review
Attachment #8685316 - Flags: review?(ettseng)
Comment on attachment 8685316 [details] [diff] [review]
bug_1209654.patch

Review of attachment 8685316 [details] [diff] [review]:
-----------------------------------------------------------------

The consumer (Cost Control app) and implementations of these interfaces are all written in JavaScript.
So simply changing the data type of interfaces resolves the problem.

My only suggestion is, why don't we just declare threshold as "unsigned long long". We don't need negative values for thresholds.
Except for this, the patch looks pretty simple and fine to me.

However, since it involves changes in IDL and WebIDL. I would like to ask review from a super reviewer.
Boris, could you help to review it as well?
Attachment #8685316 - Flags: review?(ettseng) → review?(bzbarsky)
Comment on attachment 8685316 [details] [diff] [review]
bug_1209654.patch

"unsigned long long" would make sense if the values in fact can't be negative.  If you don't trust your callers to get that right, then "long long" and checks in the implementation (or for the WebIDL "unsigned long long" and [EnforceRange]) are the way to go.

I don't think you need to change the CID, fwiw.
Attachment #8685316 - Flags: review?(bzbarsky) → review+
Attached patch bug_1209654_v2.patch (obsolete) — Splinter Review
Attachment #8685316 - Attachment is obsolete: true
Attachment #8685882 - Flags: review?(ettseng)
Comment on attachment 8685882 [details] [diff] [review]
bug_1209654_v2.patch

Review of attachment 8685882 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Tim!
Please set both Boris and me as reviewers in the commit message. :)
Attachment #8685882 - Flags: review?(ettseng) → review+
Attached patch bug_1209654_v3.patch (obsolete) — Splinter Review
The unsigned long long will cause an error of mochitest. The negative value will be converted into a huge value that causes this erroneous phenomenon. So we still use long long and leave the handling of negative value to the gecko implementation.
Attachment #8685882 - Attachment is obsolete: true
Attachment #8686540 - Flags: review?(ettseng)
Could we add some regression test just to simply test that negative and above 32bits values are always properly handled with independence of changing the implementation, please?
(In reply to Salvador de la Puente González [:salva] from comment #25)
> Could we add some regression test just to simply test that negative and
> above 32bits values are always properly handled with independence of
> changing the implementation, please?

Negative value test is already covered in current test cases.
Tim's patch adds a test for values above 2^32.
So both cases will be covered. :)
Comment on attachment 8686540 [details] [diff] [review]
bug_1209654_v3.patch

Review of attachment 8686540 [details] [diff] [review]:
-----------------------------------------------------------------

Ideally, it would be nicer to declare "unsigned long long" for threshold.
But considering the complexity to fix the mochitest failure on TreeHerder, let's use singed long long.
NetworkStatsService::setAlarm() would throw InvalidThresholdValue error if the threshold is a negative value.
Attachment #8686540 - Flags: review?(ettseng) → review+
Using [EnforceRange] didn't work?
(In reply to Boris Zbarsky [:bz] from comment #28)
> Using [EnforceRange] didn't work?

Oh, I forgot this advice. I am not sure Tim tried it or not.

Tim, if you haven't tried [EnforceRange], let's use it in WebIDL and try those tests again.
I have not tried [EnforceRange], hoping it could fix this issue.
Modify the commit only for reviewers' nickname.
Attachment #8686540 - Attachment is obsolete: true
Attachment #8686963 - Flags: review+
(In reply to Tim Huang[:timhuang] from comment #30)
> I have not tried [EnforceRange], hoping it could fix this issue.

I had tried [EnforceRange], but it's not working. Because of the MozNetworkStatsManager still uses XPIDL[1], and it does not support [EnforceRange]. So we still use signed long long here.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=986837#c0
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d473bd156547
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.6 S1 - 11/20
Depends on: 1225549
The fix to this bug should also land in 2.5. Tim Huang, could you please ask for approval-gaia-v2.5?
Thanks
Flags: needinfo?(tihuang)
Sure, I will finish this job as soon as possible.
Flags: needinfo?(tihuang)
Comment on attachment 8686963 [details] [diff] [review]
bug_1209654_v3_only_modify_commit.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 1209654
User impact if declined: The data alert will not match reported usage. 
Testing completed: Yes.
Risk to taking this patch (and alternatives if risky): Nope.
String or UUID changes made by this patch:
nsIDOMMozNetworkStatsManager
nsINetworkService
Attachment #8686963 - Flags: approval‑mozilla‑b2g44?
Attachment #8686963 - Flags: approval‑mozilla‑b2g44?
You need to log in before you can comment on or make changes to this bug.