Open Bug 1102812 Opened 10 years ago Updated 2 years ago

[NetworkStats] Improve the accuracy of network statistics

Categories

(Core :: Networking, defect, P5)

ARM
Gonk (Firefox OS)
defect

People

(Reporter: ethan, Unassigned)

References

Details

(Whiteboard: [necko-would-take])

Attachments

(2 files)

This issue was reported, and is being worked around, in bug 1080473 (Noticeable difference
between by application breakdown totals and the total displayed in chart and widget).

AFAIK, the sum of the traffic of all apps is much lower than the traffic
accounted on the network interfaces.
The workaround in bug 1080473 is to calculate and store the difference as
"residualTraffic" and count it in the system app.

This bug aims to fix the underlying problem in the platform.

Per-app and per-interface traffic are accounted by NetworkStatsService using
different data sources.
We believe some traffic is missed by the way we currently account for applications.
Assignee: nobody → ettseng
IIRC we don't account for 1) DNS, 2) TLS overhead (we only count uncompressed bytes), and 3) TCP handshakes. IIUC TLS ought to be the biggest factor of the three (and maybe there are others I'm missing).

Getting precise data from the TLS layer is hard, IIRC. Our plan (which maybe we didn't do, or didn't do correctly) was to use some simple multiplier for TLS connections (ex: count them 1.3 times, or whatever) to account for TLS overhead.
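
The multiplier idea could be sketched roughly as follows. This is only an illustration in Python, not Gecko code: the function name `estimate_wire_bytes` is hypothetical, and the default factor of 1.3 is simply the "count them 1.3 times" example from the comment above, not a measured value.

```python
def estimate_wire_bytes(payload_bytes: int, is_tls: bool,
                        tls_factor: float = 1.3) -> int:
    """Approximate on-the-wire bytes from application-level bytes.

    tls_factor is a placeholder guess (assumption); a real value would
    have to be derived from measurements.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if not is_tls:
        # Plain connections are counted as-is.
        return payload_bytes
    # TLS connections are scaled by a fixed factor to cover overhead.
    return round(payload_bytes * tls_factor)
```

A fixed factor is cheap to apply but cannot be exact, since the real overhead depends on record sizes and on how many handshakes occur.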

It looks like we need to look more closely at what's going on, so we're not just guessing here.
Also note that we can't map apps to sockets most of the time, because apps can share a single connection to the same host (since we reuse HTTP connections). This is especially likely for sites like Google web metrics, ad servers, etc.
Can we do it on the layer under TLS?
NSS normally owns the socket when using TLS. But NSS allows custom replacements of its I/O functions (via PR_CreateIOLayerStub()), so it would indeed be technically possible to have a set that counts the exact number of bytes sent and received, even with TLS. Unless I'm missing something, of course.
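
As a language-neutral illustration of the layering idea (this is not the NSPR API, just an analogy in Python), a counting layer wraps the lower-level socket and tallies the bytes that actually pass through it. The class name `CountingLayer` is hypothetical. Placed below TLS, such a layer would see the encrypted bytes, including protocol overhead.

```python
import socket


class CountingLayer:
    """Wraps a socket-like object and counts bytes passing through it."""

    def __init__(self, lower):
        self._lower = lower          # the layer below, e.g. the real socket
        self.bytes_sent = 0
        self.bytes_received = 0

    def send(self, data: bytes) -> int:
        sent = self._lower.send(data)
        self.bytes_sent += sent      # count only what the lower layer accepted
        return sent

    def recv(self, bufsize: int) -> bytes:
        data = self._lower.recv(bufsize)
        self.bytes_received += len(data)
        return data
```

For a quick local check, `socket.socketpair()` gives two connected sockets; wrapping one of them in `CountingLayer` and exchanging a few bytes should update both counters.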
From the Portland meeting with the Taipei folks, it sounds like we need to try to reproduce the "off by 200%" statistics they saw and figure out what's going on there.

Vincent/Ethan: I forget who saw these numbers on their phone.  Can we have any steps to reproduce?
Flags: needinfo?(vchang)
Flags: needinfo?(ettseng)
(In reply to Jason Duell [:jduell] (needinfo? me for lower latency) from comment #5)
> From the Portland meeting with the Taipei folks, it sounds like we need to
> try to reproduce the "off by 200%" statistics they saw and figure out what's
> going on there.
> Vincent/Ethan: I forget who saw these numbers on their phone.  Can we have
> any steps to reproduce?

This bug was first reported in bug 1080473 and there were some STRs in that bug, such as:
https://bugzilla.mozilla.org/show_bug.cgi?id=1080473#c0

What I did to reproduce it was simply to browse the Internet via the mobile network (without Wi-Fi)
for a period of time, then open the Usage app to observe the network statistics it reports.

I'll attach two screenshots as examples.
Flags: needinfo?(ettseng)
Attached image usageapp_summary.png
For example, these two screenshots were taken today on Flame v2.2.

Mobile usage: 9.98 MB (total traffic accounted on the mobile interface)
System app usage in summary page: 7.66 MB ("artificial" traffic Gaia made for system app)
System app usage in detail page:  4.43 MB (traffic accounted for system app)

In this case, the difference between the interface total and the per-app sum is (7.66 - 4.43) = 3.23 MB.
The missing traffic is about 73% of the traffic actually accounted for the system app.

I think the proportion is not fixed. The point is the difference is significant.
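
The arithmetic behind these figures, spelled out as a small Python sketch. The three input values are the ones from the screenshots; the variable names are only illustrative.

```python
mobile_total_mb = 9.98     # total accounted on the mobile interface
system_summary_mb = 7.66   # system app, summary page (includes the residual)
system_detail_mb = 4.43    # system app, detail page (actually accounted)

# Traffic that was seen on the interface but not attributed to any app.
residual_mb = system_summary_mb - system_detail_mb

# The residual relative to the accounted system traffic and to the total.
residual_vs_system = residual_mb / system_detail_mb
residual_vs_total = residual_mb / mobile_total_mb

print(f"residual: {residual_mb:.2f} MB")                   # 3.23 MB
print(f"vs. accounted system traffic: {residual_vs_system:.0%}")  # 73%
print(f"vs. interface total: {residual_vs_total:.0%}")      # 32%
```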
Would it make sense to count all traffic that flows through the sockets (the actual sockets, not SocketTransport) created by HttpConnections, and compare that value with the HTTP traffic we measure from HttpChannel? That would show whether the difference between the two is large, or whether the extra traffic comes from another source that we are not counting yet.
Do you mean that there might be some network traffic coming from different channels we are not counting yet or certain android native processes not controllable by gecko?
Flags: needinfo?(vchang)
I think it would be easier if we knew: a) how large the difference is between the value we count on the channel side and the value we count on the socket side, and b) whether there is any traffic we don't count at all (say, from another channel), so that even counting on the socket side cannot make the number match what we get from the interface.
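
Splitting the gap into parts (a) and (b) might look like the sketch below. It assumes three byte counts are available for the same time window; the function name `split_gap` and the numbers in the usage line are hypothetical, not measurements.

```python
def split_gap(channel_bytes: int, socket_bytes: int,
              interface_bytes: int) -> dict:
    """Break the interface-vs-channel gap into two components.

    (a) protocol_overhead: bytes seen on the sockets but not by the channels.
    (b) uncounted_sources: bytes seen on the interface but on no counted socket.
    """
    return {
        "protocol_overhead": socket_bytes - channel_bytes,
        "uncounted_sources": interface_bytes - socket_bytes,
    }


# Hypothetical numbers, for illustration only.
gap = split_gap(channel_bytes=4_000_000, socket_bytes=5_000_000,
                interface_bytes=7_000_000)
```

If (a) dominates, counting at the socket level would close most of the gap; if (b) dominates, some traffic source is not being counted at all.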
Flags: needinfo?(jduell.mcbugs)
> Mobile usage: 9.98 MB (total traffic accounted on the mobile interface)
> System app usage in summary page: 7.66 MB ("artificial" traffic Gaia made for system app)
> System app usage in detail page:  4.43 MB (traffic accounted for system app)

What about non-System app usage?  Does it add up to 9.98 - 7.66 = 2.3 MB?  Does the detail vs summary numbers show the same difference that System usage does?

> might be some network traffic coming from different channels we are not counting yet
> or certain android native processes not controllable by gecko?

It could be either. A good next step might be to put the phone on Wi-Fi and capture traffic on the same Wi-Fi network from a laptop/desktop running Wireshark. That would show where the phone's traffic is going, and from that we may be able to figure out which sockets are not getting counted.
Flags: needinfo?(jduell.mcbugs)
(In reply to Jason Duell [:jduell] (needinfo? me for lower latency) from comment #13)
> > Mobile usage: 9.98 MB (total traffic accounted on the mobile interface)
> > System app usage in summary page: 7.66 MB ("artificial" traffic Gaia made for system app)
> > System app usage in detail page:  4.43 MB (traffic accounted for system app)
> What about non-System app usage?  Does it add up to 9.98 - 7.66 = 2.3 MB? 
Yes.
Scrolling down the page, we can see the usage of each app, which is not shown in the screenshot.
Their sum is 2.3 MB.

> Does the detail vs summary numbers show the same difference that System
> usage does?
No.
Only the system app differs in detail vs summary.
For the other apps, the detail figures match the summary.
I think I remember somebody saying that all extra traffic (that we do not know where it is coming from) is added to system app statistic, therefore, only system app differs. The extra traffic can come from any app.
(In reply to Dragana Damjanovic from comment #15)
> I think I remember somebody saying that all extra traffic (that we do not
> know where it is coming from) is added to system app statistic, therefore,
> only system app differs. The extra traffic can come from any app.

Yes, that's what I said.

BTW, the code that retrieves the network interface statistics is in NetworkService.js:
http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkService.js#125
The data source is the /proc/net/dev file.
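
For reference, here is a minimal Python sketch of reading per-interface byte counters from text in the /proc/net/dev format. It is not the NetworkService.js code; the function name is hypothetical, and the sample text (including the `rmnet0` interface name and all values) is made up for illustration.

```python
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
rmnet0: 8000000    6000    0    0    0     0          0         0  1980000    4000    0    0    0     0       0          0
"""


def parse_proc_net_dev(text: str) -> dict:
    """Return {interface: {"rx_bytes": ..., "tx_bytes": ...}}."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:                # 8 receive + 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_bytes": int(values[0]),     # first receive column
            "tx_bytes": int(values[8]),     # first transmit column
        }
    return stats


stats = parse_proc_net_dev(SAMPLE)
```

On a Linux-based device, the same function could be fed the contents of the real file, e.g. `parse_proc_net_dev(open("/proc/net/dev").read())`.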
Indeed, there is a difference between the traffic counted for the whole device and the sum of the traffic for each application. The difference ends up in the System App, but so far this has been implemented only for the summary, not for the detailed view (see bug 1089580).
See Also: → 1083680
Bug 1083680 was opened from Gaia's perspective for the same purpose as this bug.
Status: NEW → ASSIGNED
Whiteboard: [necko-would-take]
Set assignee to default since I am not actively working on this bug.
Assignee: ettseng → nobody
Status: ASSIGNED → NEW
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Severity: normal → S3