Closed
Bug 1091240
Opened 10 years ago
Closed 10 years ago
network-activity statusbar widget uses 20-33% CPU in WebRTC calls, adds huge networking delay
Categories
(Firefox OS Graveyard :: Gaia::System, defect, P1)
Tracking
(blocking-b2g:2.1+)
People
(Reporter: jesup, Assigned: daleharvey)
References
(Depends on 1 open bug)
Details
(Whiteboard: [systemsfe])
Attachments
(5 files)
[Blocking Requested - why for this release]:
Makes webrtc calls almost unusable, especially for more than 10 minutes
We may well want this on 2.0 as well if that's possible.
+++ This bug was initially created as a clone of Bug #1087605 +++
Using a 2.1 build with H.264 enabled:
user_pref("media.peerconnection.video.h264_enabled", true);
Go to http://mozilla.github.io/werbrtc-landing/pc_test.html
Select "one-way call" and "Require h.264"
Select "Start", select front (or rear) camera
Use "adb shell top -m 10 -d 2 -t" to watch thread CPU use
Watch how b2g/b2g uses 22-33% of the CPU (for what?)
Logs with NSPR_LOG_MODULES=signaling:5,timestamp will show that the rtt will be 300-600ms to start, and goes up from there. (3 seconds + after 20 minutes)
With the attached patch, b2g uses <3% cpu, rtt drops to <20ms typically (often <10), delay doesn't build up.
Previous profiling has shown that huge amounts of CPU are being caused by this, and updateIconVisibility() is causing reflows/relayout N times per second, which is where a lot of the CPU is going (or so it appears). The patch verifies this is the cause.
Updated•10 years ago
|
blocking-b2g: 2.1? → 2.1+
Comment 1•10 years ago
|
||
I thought we fixed this issue already?
Flags: needinfo?(mhenretty)
Flags: needinfo?(bgirard)
Reporter | ||
Comment 2•10 years ago
|
||
(In reply to Gregor Wagner [:gwagner] from comment #1)
> I thought we fixed this issue already?
I hope so (and if so great!) I did see significant CPU use for b2g/b2g in reports from sotaro as well. I'm updating all my trees and rebuilding (due to some annoying ISP arguments with netflix, I'm limited to a few K/s to certain sites including most of what we use for repo sync).
Sotaro's report from B2G 2.1 in bug 1087605:
https://bug1087605.bugzilla.mozilla.org/attachment.cgi?id=8512231
He shows ~12% CPU in b2g/b2g, which is still way too high, though better than the 25% I was seeing.
Reporter | ||
Comment 3•10 years ago
|
||
Also, the reason I knew it was the network-activity status was that a previous profile from Tef or from QC had pointed at it.
Comment 4•10 years ago
|
||
I made a slightly different change on system (just commented out the
// window.addEventListener('moznetworkupload', this);
// window.addEventListener('moznetworkdownload', this);
lines in statusbar.js. Uploaded the new system to a couple of phones (Flame and FireE) and then placed a Loop call between them using the Hello app.
.
There's a *huge* improvement both in the observable video quality, and on the initial setup time of the calls (time between the call is 'connected' and the video appears.
Comment 5•10 years ago
|
||
I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device, by disabling "network-activity" indicator, total cpu usage was decreased from 98% to 81%.
- total cpu usage 98% -> 81%
- b2g main thread 12% -> 4%
- b2g Compositor thread 12%->6%
Comment 6•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device,
Firefox Hello app is the following.
https://github.com/borjasalguero/firefox-loop-reference
Comment 7•10 years ago
|
||
Got top log on Firefox Hello's SimpleWebRTC use case with "adb shell top -m 20 -t -d 3" on latest v2.1 flame with Bug 1087605 fix.
Comment 8•10 years ago
|
||
Got top log on Firefox Hello's SimpleWebRTC use case by disabling "network activitiy" indicator and with "adb shell top -m 20 -t -d 3" on latest v2.1 flame with Bug 1087605 fix.
attachment 8513827 [details] [diff] [review] is used to disable "network activitiy" indicator.
Comment 9•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> (In reply to Sotaro Ikeda [:sotaro] from comment #5)
> > I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device,
>
> Firefox Hello app is the following.
> https://github.com/borjasalguero/firefox-loop-reference
You might want to try with the Firefox Hello app at https://github.com/mozilla-b2g/firefoxos-loop-client. Thanks!
Comment 10•10 years ago
|
||
Thanks for the correction :-)
Comment 11•10 years ago
|
||
This is the top log taken from a FireE running 2.0, and the latest version of the loop app (using a Flame as the other device). There are 4 snapshots:
* One disabling completely the widget (removed the listeners).
* One setting network.activity.blipIntervalMilliseconds" to 2000 ms;
* One setting network.activity.blipIntervalMilliseconds" to 1000 ms;
* One setting network.activity.blipIntervalMilliseconds" to 250 ms (default);
If the widget is disabled, the compositor is at 11%, and b2g on the parent is at only 2%. All the excess CPU is taken by MediaStreamGrph and AudioGUM
If the widget is enabled, at the default rate, the compositor is at 18%, and b2g is at 11%. Changing the time makes a small difference in both b2g and the compositor (goes down to 16%/8%). Probably using even bigger intervals would make b2g go down further, but I don't know about the compositor (it doesn't seem to like rendering the little squares from the widget).
Reporter | ||
Comment 12•10 years ago
|
||
Also: When looking at different solutions for the network activity indicator
enable signaling:5 logs and note the rtt
Keeping the mainthread of the parent busy causes problems for IPC networking traffic
This would be very bad: (before nuking it after 20 min)
I/PRLog ( 1364): 9381056[b3a03600]: [Socket Thread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:1042: WebrtcOMXH264VideoEncoder:0xb2b6d5c0 set channel packet loss:0, rtt:3620
This (with the fix) is very good: (after nuking it after 2 hours)
I/PRLog ( 1408): 2014-10-29 21:04:51.656182 UTC - 32555200[b3391700]: [Socket Thread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:1042: WebrtcOMXH264VideoEncoder:0xb2bacef0 set channel packet loss:0, rtt:8
Reporter | ||
Updated•10 years ago
|
Severity: normal → critical
Reporter | ||
Comment 13•10 years ago
|
||
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #11)
> This is the top log taken from a FireE running 2.0, and the latest version
> of the loop app (using a Flame as the other device). There are 4 snapshots:
Is this with the status visible or invisible? Previous comments were that the problem was be there even with the statusbar hidden.
Flags: needinfo?(amac.bug)
Comment 15•10 years ago
|
||
The following is b2g main thread's profile during WebRTC.
http://people.mozilla.org/~bgirard/cleopatra/#report=709147a1e2f4774c05ebcfcd22808dc29105b8c7
Comment 16•10 years ago
|
||
We improved the situation in 2.1 by uplifting bug 1074683. But there are more things we can do if network activity icon changes are still causing poor webRTC quality.
2.0, the statusbar code is significantly different. I don't think the network activity icon is causing as many problems there, but can someone confirm?
Flags: needinfo?(mhenretty)
Comment 17•10 years ago
|
||
Let's have Alberto investigate some ways for improving webRTC performance with relation to the network activity statusbar icon.
Assignee: anygregor → apastor
Comment 18•10 years ago
|
||
(In reply to Michael Henretty [:mhenretty] from comment #16)
> We improved the situation in 2.1 by uplifting bug 1074683. But there are
> more things we can do if network activity icon changes are still causing
> poor webRTC quality.
>
> 2.0, the statusbar code is significantly different. I don't think the
> network activity icon is causing as many problems there, but can someone
> confirm?
The measurements :amac provided in comment 11 are based on 2.0, so it seems it's also causing a lot of problems in 2.0.
Comment 19•10 years ago
|
||
performance of animated PNG seems to have a problem. BenWa and jrmuizel are looking into it. I am looking CairoImage, that is allocating gralloc buffer for every rendering.
Comment 20•10 years ago
|
||
I confirmed that during Animated PNG rendering, new CairoImage is allocated, copy image data to it and destroyed for every rendering.
Comment 21•10 years ago
|
||
(In reply to Michael Henretty [:mhenretty] from comment #17)
> Let's have Alberto investigate some ways for improving webRTC performance
> with relation to the network activity statusbar icon.
The problem doesn't seem to be webRTC related. I tried just downloading some data on the background (using here maps). I launched the download and then pressed the home button so the only thing that was being animated on the homescreen was the network indicator. And I got a 12% (varies from 10% to 15%) of CPU for the b2g thread on the main process, and 8-10% on the compositor thread. When there's no network indicator, that goes to around 0% for each of them (I'm guessing since they don't even appear on the top).
Comment 22•10 years ago
|
||
Oh, I forgot. As :dcoloma said, all the tests I've done are on 2.0 (some on a FireE, some on a Flame, as indicated).
Reporter | ||
Comment 23•10 years ago
|
||
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #21)
> The problem doesn't seem to be webRTC related. I tried just downloading
> some data on the background (using here maps).
This doesn't surprise me at all, and on larger pages, pdf's, etc I can easily believe the CPU use (and possible latency increases) might negatively impact loadtimes/performance and IPC traffic to content.
NI bent for his knowledge after discussing this on IRC.
Flags: needinfo?(bent.mozilla)
Updated•10 years ago
|
Whiteboard: [systemsfe]
Comment 24•10 years ago
|
||
(In reply to Daniel Coloma:dcoloma from comment #18)
> (In reply to Michael Henretty [:mhenretty] from comment #16)
> > We improved the situation in 2.1 by uplifting bug 1074683. But there are
> > more things we can do if network activity icon changes are still causing
> > poor webRTC quality.
> >
> > 2.0, the statusbar code is significantly different. I don't think the
> > network activity icon is causing as many problems there, but can someone
> > confirm?
>
> The measurements :amac provided in comment 11 are based on 2.0, so it seems
> it's also causing a lot of problems in 2.0.
I see. I was responding to Sotaro's findings in comment 5. In 2.0, we weren't doing nearly as much work in JS on network activity updates. Basically we would just display an animated png for half a second [1]. The problems you are seeing in 2.0 is probably related to Sotaro's comment 20, bug 1091777.
1.) https://github.com/mozilla-b2g/gaia/blob/5a3bc7c4d873147ea660c5e986ae7df08bb3521d/apps/system/js/statusbar.js#L729
Not sure what is being requested of me here...
Flags: needinfo?(bent.mozilla)
Comment 26•10 years ago
|
||
Comment 11 mentions profiling this on 2.0, are we considering blocking on 2.0 here?
Flags: needinfo?(bbajaj)
Comment 27•10 years ago
|
||
Assigning over to :daleharvey to profile with css transitions rather than animated pngs, since Alberto has a couple of blockers already.
Assignee: apastor → dale
Updated•10 years ago
|
Target Milestone: --- → 2.1 S8 (7Nov)
Assignee | ||
Comment 28•10 years ago
|
||
Making the CSS sprite isnt particularly easy, but I should get it done soon, there is an r+'d patch @ https://bugzilla.mozilla.org/show_bug.cgi?id=1091775 which drops the CPU usage by a lot, given the comments I expect that will be good enough (using css wont be free either) but I will see exactly what the numbers are before I close this
Assignee | ||
Comment 29•10 years ago
|
||
So as far I can tell using a css transition actually takes up more cpu than the patched lower framerate apng, the patch is here, https://github.com/daleharvey/gaia/commit/4edc14abfa7ecb2d36b33f7cf5aced89a9935914.patch, will test again in the morning to make sure, but I think this will be closed dupe
Comment 30•10 years ago
|
||
The animation you have is running at 60 FPS which would consume more resources.
You need to make the animation 5 FPS just like the gif. You need to force discrete steps that happen every 200ms using keyframes. The FPS counter (from the developer menu) should be displaying 5 FPS if the right affect is achieve.
In this case we should be able to implement the exact same effect as the gif but bypass the main thread thus saving CPU cycles.
Assignee | ||
Comment 31•10 years ago
|
||
Ah, I hoped steps would do that, will try again
Comment 32•10 years ago
|
||
Ohh, I've never seen that before. Maybe it does. Check with the FPS counters. If nothing else is changing the counter on the left should be your OMTA target rate where the middle counter should be much lower (about 1 FPS).
Assignee | ||
Comment 33•10 years ago
|
||
ok, confirmed using steps() does run the animation at 5fps (updated the patch with the missing </div> sorry)
CSS transitions - https://gist.github.com/daleharvey/09181f47af368bae05c4
Current master - https://gist.github.com/daleharvey/c9eed5f59c98b0309ee4
The CSS transitions are settling on a lower bound of 6% CPU for the b2g process with current master settling on around 4%, the Compositor stays between 0-1% almost entirely for both but on average lower for master. So closing this as a fixed duplicate.
I did test against master with the previous apng and while its awesome that we got such an improvement, its also hugely worrying that an indicator that is very often visible could be using 25 - 30% of CPU (during page loads which arent cheap themselves) and by the looks of it this bug has been around for a long time without being caught until someone happened to be testing webrtc
While every functional team needs to make the best effort to writing code that efficiently uses resources, it seems like this is the type of issue that is going to slip though and needs a systematic process to ensure it doesnt happen again, I believe there is no longer a "performance team" anymore, how are these problems going to be addressed?
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(bbajaj) → needinfo?(anygregor)
Resolution: --- → DUPLICATE
Comment 34•10 years ago
|
||
(In reply to Dale Harvey (:daleharvey) from comment #33)
> While every functional team needs to make the best effort to writing code
> that efficiently uses resources, it seems like this is the type of issue
> that is going to slip though and needs a systematic process to ensure it
> doesnt happen again, I believe there is no longer a "performance team"
> anymore, how are these problems going to be addressed?
>
Good question but I don't have a good answer here. We usually find the biggest regressions when the phone starts heating up in your pocket or the battery drains faster than you can charge it. But we don't have a good way to test for it now.
I hope with our battery initiative we get this test coverage basically for free.
Flags: needinfo?(anygregor)
You need to log in
before you can comment on or make changes to this bug.
Description
•