Closed Bug 1091240 Opened 10 years ago Closed 10 years ago

network-activity statusbar widget uses 20-33% CPU in WebRTC calls, adds huge networking delay

Tracking

(blocking-b2g:2.1+)

Status:

RESOLVED DUPLICATE of bug 1091775

Milestone:

2.1 S8 (7Nov)

Project Flags:

blocking-b2g

2.1+

People

(Reporter: jesup, Assigned: daleharvey)

References

(Depends on 1 open bug)

Details

(Whiteboard: [systemsfe])

Attachments

(5 files)

nuke_network_activity (NOT FOR CHECKIN) 10 years ago Randell Jesup [:jesup] (needinfo me) 1.87 KB, patch		Details \| Diff \| Splinter Review
top log with Bug 1087605 fix 10 years ago Sotaro Ikeda [:sotaro] 327.94 KB, text/plain		Details
top log by disabling network_activity and with Bug 1087605 fix 10 years ago Sotaro Ikeda [:sotaro] 805.14 KB, text/plain		Details
Top log with 2.0 10 years ago Antonio Manuel Amaya Calvo (:amac) 8.22 KB, text/plain		Details
Call stack of CairoImage constructor and destructor during animated png rendering 10 years ago Sotaro Ikeda [:sotaro] 7.62 KB, text/plain		Details

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

10 years ago

Attached patch nuke_network_activity (NOT FOR CHECKIN) — Details — Splinter Review

[Blocking Requested - why for this release]: Makes webrtc calls almost unusable, especially for more than 10 minutes We may well want this on 2.0 as well if that's possible. +++ This bug was initially created as a clone of Bug #1087605 +++ Using a 2.1 build with H.264 enabled: user_pref("media.peerconnection.video.h264_enabled", true); Go to http://mozilla.github.io/werbrtc-landing/pc_test.html Select "one-way call" and "Require h.264" Select "Start", select front (or rear) camera Use "adb shell top -m 10 -d 2 -t" to watch thread CPU use Watch how b2g/b2g uses 22-33% of the CPU (for what?) Logs with NSPR_LOG_MODULES=signaling:5,timestamp will show that the rtt will be 300-600ms to start, and goes up from there. (3 seconds + after 20 minutes) With the attached patch, b2g uses <3% cpu, rtt drops to <20ms typically (often <10), delay doesn't build up. Previous profiling has shown that huge amounts of CPU are being caused by this, and updateIconVisibility() is causing reflows/relayout N times per second, which is where a lot of the CPU is going (or so it appears). The patch verifies this is the cause.

bhavana bajaj [:bajaj]

Updated

•

10 years ago

blocking-b2g: 2.1? → 2.1+

Gregor Wagner [:gwagner]

Comment 1

•

10 years ago

I thought we fixed this issue already?

Flags: needinfo?(mhenretty)

Flags: needinfo?(bgirard)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 2

•

10 years ago

(In reply to Gregor Wagner [:gwagner] from comment #1) > I thought we fixed this issue already? I hope so (and if so great!) I did see significant CPU use for b2g/b2g in reports from sotaro as well. I'm updating all my trees and rebuilding (due to some annoying ISP arguments with netflix, I'm limited to a few K/s to certain sites including most of what we use for repo sync). Sotaro's report from B2G 2.1 in bug 1087605: https://bug1087605.bugzilla.mozilla.org/attachment.cgi?id=8512231 He shows ~12% CPU in b2g/b2g, which is still way too high, though better than the 25% I was seeing.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 3

•

10 years ago

Also, the reason I knew it was the network-activity status was that a previous profile from Tef or from QC had pointed at it.

Antonio Manuel Amaya Calvo (:amac)

Comment 4

•

10 years ago

I made a slightly different change on system (just commented out the // window.addEventListener('moznetworkupload', this); // window.addEventListener('moznetworkdownload', this); lines in statusbar.js. Uploaded the new system to a couple of phones (Flame and FireE) and then placed a Loop call between them using the Hello app. . There's a *huge* improvement both in the observable video quality, and on the initial setup time of the calls (time between the call is 'connected' and the video appears.

Sotaro Ikeda [:sotaro]

Comment 5

•

10 years ago

I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device, by disabling "network-activity" indicator, total cpu usage was decreased from 98% to 81%. - total cpu usage 98% -> 81% - b2g main thread 12% -> 4% - b2g Compositor thread 12%->6%

Sotaro Ikeda [:sotaro]

Comment 6

•

10 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #5) > I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device, Firefox Hello app is the following. https://github.com/borjasalguero/firefox-loop-reference

Sotaro Ikeda [:sotaro]

Comment 7

•

10 years ago

Attached file top log with Bug 1087605 fix — Details

Got top log on Firefox Hello's SimpleWebRTC use case with "adb shell top -m 20 -t -d 3" on latest v2.1 flame with Bug 1087605 fix.

Sotaro Ikeda [:sotaro]

Comment 8

•

10 years ago

Attached file top log by disabling network_activity and with Bug 1087605 fix — Details

Got top log on Firefox Hello's SimpleWebRTC use case by disabling "network activitiy" indicator and with "adb shell top -m 20 -t -d 3" on latest v2.1 flame with Bug 1087605 fix. attachment 8513827 [details] [diff] [review] is used to disable "network activitiy" indicator.

José Antonio Olivera Ortega [:jaoo]

Comment 9

•

10 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #6) > (In reply to Sotaro Ikeda [:sotaro] from comment #5) > > I tried Firefox Hello's SimpleWebRTC use case on latest v2.1 flame device, > > Firefox Hello app is the following. > https://github.com/borjasalguero/firefox-loop-reference You might want to try with the Firefox Hello app at https://github.com/mozilla-b2g/firefoxos-loop-client. Thanks!

Sotaro Ikeda [:sotaro]

Comment 10

•

10 years ago

Thanks for the correction :-)

Antonio Manuel Amaya Calvo (:amac)

Comment 11

•

10 years ago

Attached file Top log with 2.0 — Details

This is the top log taken from a FireE running 2.0, and the latest version of the loop app (using a Flame as the other device). There are 4 snapshots: * One disabling completely the widget (removed the listeners). * One setting network.activity.blipIntervalMilliseconds" to 2000 ms; * One setting network.activity.blipIntervalMilliseconds" to 1000 ms; * One setting network.activity.blipIntervalMilliseconds" to 250 ms (default); If the widget is disabled, the compositor is at 11%, and b2g on the parent is at only 2%. All the excess CPU is taken by MediaStreamGrph and AudioGUM If the widget is enabled, at the default rate, the compositor is at 18%, and b2g is at 11%. Changing the time makes a small difference in both b2g and the compositor (goes down to 16%/8%). Probably using even bigger intervals would make b2g go down further, but I don't know about the compositor (it doesn't seem to like rendering the little squares from the widget).

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 12

•

10 years ago

Also: When looking at different solutions for the network activity indicator enable signaling:5 logs and note the rtt Keeping the mainthread of the parent busy causes problems for IPC networking traffic This would be very bad: (before nuking it after 20 min) I/PRLog ( 1364): 9381056[b3a03600]: [Socket Thread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:1042: WebrtcOMXH264VideoEncoder:0xb2b6d5c0 set channel packet loss:0, rtt:3620 This (with the fix) is very good: (after nuking it after 2 hours) I/PRLog ( 1408): 2014-10-29 21:04:51.656182 UTC - 32555200[b3391700]: [Socket Thread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:1042: WebrtcOMXH264VideoEncoder:0xb2bacef0 set channel packet loss:0, rtt:8

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

10 years ago

Severity: normal → critical

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 13

•

10 years ago

(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #11) > This is the top log taken from a FireE running 2.0, and the latest version > of the loop app (using a Flame as the other device). There are 4 snapshots: Is this with the status visible or invisible? Previous comments were that the problem was be there even with the statusbar hidden.

Flags: needinfo?(amac.bug)

Antonio Manuel Amaya Calvo (:amac)

Comment 14

•

10 years ago

That was with status bar visible (all of them).

Flags: needinfo?(amac.bug)

Sotaro Ikeda [:sotaro]

Comment 15

•

10 years ago

The following is b2g main thread's profile during WebRTC. http://people.mozilla.org/~bgirard/cleopatra/#report=709147a1e2f4774c05ebcfcd22808dc29105b8c7

Michael Henretty [:mikehenrty][:mhenretty]

Comment 16

•

10 years ago

We improved the situation in 2.1 by uplifting bug 1074683. But there are more things we can do if network activity icon changes are still causing poor webRTC quality. 2.0, the statusbar code is significantly different. I don't think the network activity icon is causing as many problems there, but can someone confirm?

Flags: needinfo?(mhenretty)

Michael Henretty [:mikehenrty][:mhenretty]

Comment 17

•

10 years ago

Let's have Alberto investigate some ways for improving webRTC performance with relation to the network activity statusbar icon.

Assignee: anygregor → apastor

Daniel Coloma:dcoloma

Comment 18

•

10 years ago

(In reply to Michael Henretty [:mhenretty] from comment #16) > We improved the situation in 2.1 by uplifting bug 1074683. But there are > more things we can do if network activity icon changes are still causing > poor webRTC quality. > > 2.0, the statusbar code is significantly different. I don't think the > network activity icon is causing as many problems there, but can someone > confirm? The measurements :amac provided in comment 11 are based on 2.0, so it seems it's also causing a lot of problems in 2.0.

Sotaro Ikeda [:sotaro]

Comment 19

•

10 years ago

performance of animated PNG seems to have a problem. BenWa and jrmuizel are looking into it. I am looking CairoImage, that is allocating gralloc buffer for every rendering.

Sotaro Ikeda [:sotaro]

Comment 20

•

10 years ago

Attached file Call stack of CairoImage constructor and destructor during animated png rendering — Details

I confirmed that during Animated PNG rendering, new CairoImage is allocated, copy image data to it and destroyed for every rendering.

Antonio Manuel Amaya Calvo (:amac)

Comment 21

•

10 years ago

(In reply to Michael Henretty [:mhenretty] from comment #17) > Let's have Alberto investigate some ways for improving webRTC performance > with relation to the network activity statusbar icon. The problem doesn't seem to be webRTC related. I tried just downloading some data on the background (using here maps). I launched the download and then pressed the home button so the only thing that was being animated on the homescreen was the network indicator. And I got a 12% (varies from 10% to 15%) of CPU for the b2g thread on the main process, and 8-10% on the compositor thread. When there's no network indicator, that goes to around 0% for each of them (I'm guessing since they don't even appear on the top).

Antonio Manuel Amaya Calvo (:amac)

Comment 22

•

10 years ago

Oh, I forgot. As :dcoloma said, all the tests I've done are on 2.0 (some on a FireE, some on a Flame, as indicated).

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 23

•

10 years ago

(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #21) > The problem doesn't seem to be webRTC related. I tried just downloading > some data on the background (using here maps). This doesn't surprise me at all, and on larger pages, pdf's, etc I can easily believe the CPU use (and possible latency increases) might negatively impact loadtimes/performance and IPC traffic to content. NI bent for his knowledge after discussing this on IRC.

Flags: needinfo?(bent.mozilla)

Benoit Girard (:BenWa)

Updated

•

10 years ago

Depends on: 1091763

Flags: needinfo?(bgirard)

Benoit Girard (:BenWa)

Updated

•

10 years ago

Depends on: 1091775

Sotaro Ikeda [:sotaro]

Updated

•

10 years ago

Depends on: 1091777

Benoit Girard (:BenWa)

Updated

•

10 years ago

Depends on: 1091782

Gregor Wagner [:gwagner]

Updated

•

10 years ago

Whiteboard: [systemsfe]

Michael Henretty [:mikehenrty][:mhenretty]

Comment 24

•

10 years ago

(In reply to Daniel Coloma:dcoloma from comment #18) > (In reply to Michael Henretty [:mhenretty] from comment #16) > > We improved the situation in 2.1 by uplifting bug 1074683. But there are > > more things we can do if network activity icon changes are still causing > > poor webRTC quality. > > > > 2.0, the statusbar code is significantly different. I don't think the > > network activity icon is causing as many problems there, but can someone > > confirm? > > The measurements :amac provided in comment 11 are based on 2.0, so it seems > it's also causing a lot of problems in 2.0. I see. I was responding to Sotaro's findings in comment 5. In 2.0, we weren't doing nearly as much work in JS on network activity updates. Basically we would just display an animated png for half a second [1]. The problems you are seeing in 2.0 is probably related to Sotaro's comment 20, bug 1091777. 1.) https://github.com/mozilla-b2g/gaia/blob/5a3bc7c4d873147ea660c5e986ae7df08bb3521d/apps/system/js/statusbar.js#L729

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 25

•

10 years ago

Not sure what is being requested of me here...

Flags: needinfo?(bent.mozilla)

Michael Henretty [:mikehenrty][:mhenretty]

Comment 26

•

10 years ago

Comment 11 mentions profiling this on 2.0, are we considering blocking on 2.0 here?

Flags: needinfo?(bbajaj)

Michael Henretty [:mikehenrty][:mhenretty]

Comment 27

•

10 years ago

Assigning over to :daleharvey to profile with css transitions rather than animated pngs, since Alberto has a couple of blockers already.

Assignee: apastor → dale

Candice Serran (:cserran)

Updated

•

10 years ago

Target Milestone: --- → 2.1 S8 (7Nov)

Dale Harvey (:daleharvey)

Assignee

Comment 28

•

10 years ago

Making the CSS sprite isnt particularly easy, but I should get it done soon, there is an r+'d patch @ https://bugzilla.mozilla.org/show_bug.cgi?id=1091775 which drops the CPU usage by a lot, given the comments I expect that will be good enough (using css wont be free either) but I will see exactly what the numbers are before I close this

Dale Harvey (:daleharvey)

Assignee

Comment 29

•

10 years ago

So as far I can tell using a css transition actually takes up more cpu than the patched lower framerate apng, the patch is here, https://github.com/daleharvey/gaia/commit/4edc14abfa7ecb2d36b33f7cf5aced89a9935914.patch, will test again in the morning to make sure, but I think this will be closed dupe

Benoit Girard (:BenWa)

Comment 30

•

10 years ago

The animation you have is running at 60 FPS which would consume more resources. You need to make the animation 5 FPS just like the gif. You need to force discrete steps that happen every 200ms using keyframes. The FPS counter (from the developer menu) should be displaying 5 FPS if the right affect is achieve. In this case we should be able to implement the exact same effect as the gif but bypass the main thread thus saving CPU cycles.

Dale Harvey (:daleharvey)

Assignee

Comment 31

•

10 years ago

Ah, I hoped steps would do that, will try again

Benoit Girard (:BenWa)

Comment 32

•

10 years ago

Ohh, I've never seen that before. Maybe it does. Check with the FPS counters. If nothing else is changing the counter on the left should be your OMTA target rate where the middle counter should be much lower (about 1 FPS).

Dale Harvey (:daleharvey)

Assignee

Comment 33

•

10 years ago

ok, confirmed using steps() does run the animation at 5fps (updated the patch with the missing </div> sorry) CSS transitions - https://gist.github.com/daleharvey/09181f47af368bae05c4 Current master - https://gist.github.com/daleharvey/c9eed5f59c98b0309ee4 The CSS transitions are settling on a lower bound of 6% CPU for the b2g process with current master settling on around 4%, the Compositor stays between 0-1% almost entirely for both but on average lower for master. So closing this as a fixed duplicate. I did test against master with the previous apng and while its awesome that we got such an improvement, its also hugely worrying that an indicator that is very often visible could be using 25 - 30% of CPU (during page loads which arent cheap themselves) and by the looks of it this bug has been around for a long time without being caught until someone happened to be testing webrtc While every functional team needs to make the best effort to writing code that efficiently uses resources, it seems like this is the type of issue that is going to slip though and needs a systematic process to ensure it doesnt happen again, I believe there is no longer a "performance team" anymore, how are these problems going to be addressed?

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(bbajaj) → needinfo?(anygregor)

Resolution: --- → DUPLICATE

Gregor Wagner [:gwagner]

Comment 34

•

10 years ago

(In reply to Dale Harvey (:daleharvey) from comment #33) > While every functional team needs to make the best effort to writing code > that efficiently uses resources, it seems like this is the type of issue > that is going to slip though and needs a systematic process to ensure it > doesnt happen again, I believe there is no longer a "performance team" > anymore, how are these problems going to be addressed? > Good question but I don't have a good answer here. We usually find the biggest regressions when the phone starts heating up in your pocket or the battery drains faster than you can charge it. But we don't have a good way to test for it now. I hope with our battery initiative we get this test coverage basically for free.

Flags: needinfo?(anygregor)

You need to log in before you can comment on or make changes to this bug.