Open Bug 1652516 Opened 4 years ago Updated 11 months ago

tvnow.de causes CPU churn likely due to many many setTimeouts

Categories

(Core :: DOM: Core & HTML, defect, P3)

77 Branch
defect

Tracking

()

People

(Reporter: jonas, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-footprint, perf, Whiteboard: [sitewait])

Attachments

(5 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0

Steps to reproduce:

Watch any TV show on tvnow.de

Actual results:

Can only watch in medium quality, this already brings CPUs to max; HD quality not possible.

Profile: https://share.firefox.dev/2WfL8kY

Expected results:

Should be able to watch in HD quality (works in Google Chrome).

Since I don't have an account on tvnow.de I'm unable to test this myself.
However, I'm going to set a component in order to involve the development team in reviewing this issue.

Andrew, could you take a look over the profile linked in the report?

Thanks!

Component: Untriaged → Audio/Video
Flags: needinfo?(continuation)
Product: Firefox → Core

I'm not familiar with profiler profiles. Maybe mconley can take a look.

Flags: needinfo?(continuation) → needinfo?(mconley)

I'm afraid the bug summary and comment 0 / STR are at a mismatch. The bug summary reports unexpectedly high CPU, whereas the comment 0 reports unexpected inability to watch HD video. These might be related, but for now, they're very much two separate things.

I'm going to try to answer the "high CPU" question.

This site is using frequent setTimeout and setInterval calls to enter JavaScript.

The page JavaScript is obfuscated, so it's difficult for me to understand what it's doing. Interestingly though, there's a frame called "debug" on the stack: https://share.firefox.dev/30kxSgi, which makes me wonder if some of the page is in some kind of debug/development mode, which might be more expensive.

It looks to me as well that this page JavaScript is causing us to restyle and repaint. So some JavaScript powered animation is occurring, which will also incur a higher CPU cost (since the animation calculations are being done on the CPU and not the GPU).

There's also a MutationObserver on the page that's running very frequently: https://share.firefox.dev/30kxSgi

The site is also slowly accumulating more and more memory, reaching almost 200MB by the end of the profile.

Ultimately, this looks like an expensive site to run. Unless the site is serving different JavaScript to Chrome, I suspect Chrome would see similar CPU usage (unless they're somehow throttling things differently).

Flags: needinfo?(mconley)

Is there anything I can do to help debug this further? Do you need any more information?

Btw you can profile this without an account; some videos seem to be free to watch. I created the profile without an account.

I've done some more comparison with Chrome:

  1. Chrome does not seem to have the memory leak (incognito mode, no extensions active, profile attached). Note that my Firefox tests were performed in incognito mode as well, but with uBlock origin active.

  2. Chrome CPU load seems to be a bit lower, but not I'm not sure if this explains the better video playback performance. Using top -a and playing a video for 60s, this are the results for Chrome:

Processes: 419 total, 4 running, 415 sleeping, 1982 threads 0:01:18
Load Avg: 2.52, 3.01, 3.33 CPU usage: 19.8% user, 8.33% sys, 72.58% idle SharedLibs: 197M resident, 42M data, 18M linkedit.
MemRegions: 90272 total, 1974M resident, 93M private, 1125M shared. PhysMem: 7693M used (1784M wired), 498M unused.
VM: 2166G vsize, 1991M framework vsize, 13771738(0) swapins, 14217551(0) swapouts. Networks: packets: 10914/15M in, 1894/212K out. Disks: 2257/47M read, 5613/63M written.

PID COMMAND %CPU TIME #TH #WQ #POR MEM PURG CMPR PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW
13032 top 4.6 00:03.62 1/1 0 33 3672K 0B 0B 13032 12457 running *0[0] 0.00071 0.00000 0 32417+ 106 1012094 505929+ 125679+ 748282+ 4942+
12997 Google Chrom 17.9 00:14.08 8/1 1 75 30M 16K 0B 12967 12967 running *0[0] 0.00000 0.00000 501 707757+ 2163 97607+ 63114+ 108975+ 172126+ 123956+
12984 Google Chrom 36.3 00:28.44 17 1 315 102M+ 16K 0B 12967 12967 sleeping *0[0] 0.00000 0.00000 501 825451+ 4909 278546+ 141666+ 363209+ 1174148+ 576733+
12972 Google Chrom 6.7 00:05.26 11 1 200 89M 15M 0B 12967 12967 sleeping *0[0] 0.00148 0.00000 501 60902 2550 171313 66043 160054 363879+ 219779

Firefox results:

Processes: 400 total, 3 running, 397 sleeping, 1816 threads 0:01:06
Load Avg: 3.48, 4.15, 3.82 CPU usage: 23.18% user, 13.17% sys, 63.63% idle SharedLibs: 203M resident, 43M data, 20M linkedit.
MemRegions: 71718 total, 1919M resident, 95M private, 1193M shared. PhysMem: 7526M used (1869M wired), 665M unused.
VM: 2027G vsize, 1991M framework vsize, 13831125(0) swapins, 14217551(0) swapouts. Networks: packets: 10341/14M in, 2461/234K out. Disks: 3766/123M read, 1731/32M written.

PID COMMAND %CPU TIME #TH #WQ #POR MEM PURG CMPR PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW
13083 plugin-conta 18.5 00:12.28 8 1 47 28M+ 0B 0B 13074 13074 sleeping *0[0] 0.00000 0.00000 501 41020+ 1673 47520+ 11907+ 189609+ 64625+ 149561+
13081 plugin-conta 41.6 00:27.55 48 2 348 284M+ 0B 0B 13074 13074 sleeping *0[2] 0.00000 0.00036 501 640190+ 2317 141589+ 52574+ 1466837+ 247053+ 895573+
13074 firefox 29.5 00:19.57 70 4 718 446M- 6784K 0B 13074 1 sleeping *0[2] 0.00329 0.03260 501 327484+ 8207 462414+ 212317+ 927395+ 979620+ 672846+

So, for Chrome, 47.7 CPU seconds in 60s and a load average of 2.52; Firefox 59.4 CPU seconds in 60s and a load average of 3.48. I did not use the computer for anything else during that time but of course these results are far from a "lab environment", so take it with a grain of salt.

Attached file Chrome memory profile

As noted above, the website is causing a large amount of churn and that this may instead be a site issue. Moving to the JS component to see if they have any further insight on if we can improve how we deal with this in Fx.

Component: Audio/Video → JavaScript Engine
Summary: Very high CPU load during TV streaming → tvnow.de causes CPU churn

(In reply to Jonas H. from comment #5)

  1. Chrome does not seem to have the memory leak (incognito mode, no extensions active, profile attached). Note that my Firefox tests were performed in incognito mode as well, but with uBlock origin active.

Can you test without uBlockOrigin, we have another bug reports which causes some slow down on Reddit (Bug 1645651).

From the profile I do not see any obvious issue suggesting that this could be a JavaScript issue (nor the opposite).

Flags: needinfo?(jonas)

I see some dom::clientWidth and dom::offsetWidth calls in the profile (comment 0), which are triggering some layout taking 1ms at a time.
While this does not seems to be the biggest offender, I have no better clues at the moment.

Component: JavaScript Engine → Layout

Vanilla nightly, no incognito mode, 90s video playback

https://share.firefox.dev/3hyYlOd
https://share.firefox.dev/3eTBOK4

Flags: needinfo?(jonas)

Daniel: Anything stand out to you that might be layout-related in the profiles? (See comment 9 and comment 10.)

Flags: needinfo?(dholbert)

Yeah, I'm pretty skeptical that layout is especially involved in the perf problem here (or that layout improvements would help substantially here).

In particular:

  • The ~1ms clientWidth/offsetWidth flushes that nbp reported (from comment 9) sound pretty reasonable/ideal, assuming there was a restyle or a content change of some sort.
  • Even if we consider those as being expensive: the profiler shows that layout only represents around 4.5% of the total samples that are recorded for the content process. So even if all Layout work was suddenly free, that wouldn't substantially help with the reported CPU usage here. Even in terms of a "frequent wakeup" perspective, the reason we're waking up is in response to the setTimeout firings, not in response to layout needing to happen.

At the risk of kicking this around too many times, I'm going to reclassify this as Core|Performance, because I'm unconvinced that there's substantial layout-engine investigation/optimization that would help here.

Hopefully someone on the performance team can do or arrange some sort of a deep dive (and maybe a performance comparison to Chrome, particularly with UA string spoofing to test whether we're getting the same content as they are). It seems like this site is pretty perf-heavy in both browsers, though moreso in Firefox (47.7/60 CPU-seconds vs. 59/60 CPU-seconds per comment 5), so it's clear that the site is taxing the CPU and we'd need to do some analysis to see how we're spending our extra CPU-seconds. (It might even be architectural, i.e. how many threads/processes we've got our workload split across...? I don't know if Chrome has a similar sort of process count to us for this sort of workload.)

Component: Layout → Performance
Flags: needinfo?(dholbert)

Jonas, could you perhaps capture a memory report from about:memory. A verbose report might be nicer. (See top-left of about:memory)

Flags: needinfo?(jonas)
Attached file memory-reports.zip
Flags: needinfo?(jonas)

Here's a video that can be watched without an account, and that is available outside Germany (I had to click around for a bit to find one):
https://www.tvnow.de/shows/frauentausch-1668/2020-07/episode-454-vanessa-tauscht-mit-jana-257236

The video player does not allow me to switch to higher quality settings, neither in Chrome nor in Firefox.

Yes, you need a premium account for that. It has a free one month trial though, but you have to enter some payment details.

I'm not really willing to do that.

Can only watch in medium quality, this already brings CPUs to max; HD quality not possible.

When you say "not possible", what exactly do you mean by that? What happens if you select HD quality? Can you share a profile of what happens when you're using HD quality?

In HD mode, video playback has unbearably slow FPS performance.

I've recorded 5 profiles in HD mode:

Firefox profiler, platform https://share.firefox.dev/31dy01o
Firefox profiler, media https://share.firefox.dev/2XetwGI

Instruments, firefox process, attached
Instruments, one of the plugin-container children, attached
Instruments, entire system. It includes potentially private information that I do not know how to strip, so I'll not be publishing it here. I'm happy to send it to anyone interested, and will send it to you Markus to your moz email.

For anyone who does not have the entire system profile, I've attached a few screenshots with the processes that take up most of the time. I'm aware that this isn't the best way to share the profiles but I don't know how else to do it without publicly posting the entire profile.

Just realised I've included screenshots – sorry for using a trash TV show for profiling 😁

(In reply to Jonas H. from comment #18)

Firefox profiler, platform https://share.firefox.dev/31dy01o
Firefox profiler, media https://share.firefox.dev/2XetwGI

Thanks, can you get another one with the platform preset but without screenshots? Select the platform preset, then select Custom, then go to the settings and find and uncheck the Screenshots checkbox.
Also, I can see that WebRender is not enabled on your machine. Did you disable it? Can you try force-enabling it (by setting gfx.webrender.all to true)?

(I'm asking because capturing screenshots during profiling creates some extra overhead that skews the profile.)

Also, I can see that WebRender is not enabled on your machine. Did you disable it?

Not that I can remember and definitely not for the nightly version (assuming that it does not copy over some parts of the main Firefox installation config). But the machine is a MacBookPro12,1 that has an Intel Iris Graphics 6100 integrated graphics chip, so no dedicated GPU, maybe that's why it was disabled automatically by some Firefox hardware detection?

New profile w/o screenshots and w/ WebRender enabled as suggested https://share.firefox.dev/3k34eoY

I filed a couple of other issues with tvnow.de over the last couple weeks/months. All of them are being tracked via bug 1578161.

I had a look at your last shared profile and as it looks like we still spend a lot of time in graphics even with webrender enabled. I assume we should make this a gfx related bug and move it to the appropriate component? Markus, what do you think?

Blocks: 1578161
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mstange.moz)

A few things jump out at me from the profile in comment 26:

  1. Escalating memory usage and frequent GC's blocking the main thread of the content process. There's like a 1s GC pause every 5-10 seconds.
  2. A whole lot of setTimeout's. Holy smokes. Look at the Marker Chart for the content process to get a sense of this. It's just jam packed with setTimeout's, doing who knows what, but completely saturating the main thread. It wouldn't surprise me if this is related to the increased memory usage and the GC pauses - who knows what kind of garbage these setTimeout's are creating each time.

sefeng tested Chromium and sees similar setTimeout tasks running at high frequency.

When we tested in Nightly and Chrome, we didn't see any frame dropping in the video from comment 15 while playing, despite similar setTimeout's running.

I feel like this bug lacks focus. The problems being reported here as far as I can tell are:

  1. High CPU usage
  2. Bad performance with HD videos (comment 0 suggests HD quality is not available, but comment 18 seems to contradict that, so I'll presume it's possible but that the actual playback performance is poor)

I think we've more or less narrowed in on the CPU usage problem: the frequent entering of JS via setTimeout is causing the main thread to wake up and execute JS very very often. This appears to be the same on Chrome, but it's also possible that their setTimeout's use a slightly different scheduling algorithm.

Looking at the HD video problem, if I had to guess based on the profiles, it looks like the bad quality is likely being caused by jank in the compositor thread. I suspect that starts to veer in to Graphics territory for investigation.

Part of the problem, I think, of this bug, is how it seems to straddle both CPU usage and frame dropping with HD video playback. This is causing confusion.

I suggest we keep this bug focused on the CPU usage issue. Jones H, I'm going to file a new bug for you on the HD video playback problem where that can be investigated separately.

See Also: → 1658961

If we presume that the CPU usage is related to setTimeout scheduling (which might be something we can improve upon relative to Chrome), then I suggest that this bug be placed under the DOM component.

Component: Performance → DOM: Core & HTML
Summary: tvnow.de causes CPU churn → tvnow.de causes CPU churn likely due to many many setTimeouts

Well, for the setTimeouts issue I already had bug 1641673 filed. It's marked as WFM now given that a fix for NoScript made it much better. Looking at the profiles from the reporter I cannot see that NoScript is in use.

Flags: needinfo?(mstange.moz)

Actually I can also still see the huge amount of setTimeout calls. So it's not fixed even with the fix for NoScript. I wonder which way to dupe the bugs.

Maybe lets keep both open. On bug 1641673 we can cover the webcompat and maybe get the website to less aggressively use the timers, while here we could try to improve Gecko support.

As the original reporter, sorry for being unclear with the original bug description. The trigger for me to report the bug was the fact that HD playback has incredibly bad performance. I observed unusually high CPU load during HD and SD playback, so I just assumed that the both were connected.

Personally I care more for acceptable HD playback performance than for low CPU load.

I just reached out to the site and asked them if they are willing to collaborate on resolving this issue. Maybe we can get a bit more insight into what they're trying to achieve, or maybe we can identify solutions. I'll update this thread as soon as I have something to update.

Whiteboard: [sitewait]

They replied to me Aug 20 and said they're investigating, but I've heard no news since then. As discussed with :annevk, I just pinged them for an update, and also asked for a testing account, so we have some way to investigate.

Edit: Just to leave a paper-trail here: :mstange has testing credentials as of Sep 11.

Severity: -- → S3
Priority: -- → P3

I took a brief look at this last week, with the help of denschub's login.
On my machine I was able to play videos in HD quality, but it was not a great experience.
Even when selecting HD quality, the video isn't necessarily played in HD. The page seems to adapt the video quality automatically, based on observed performance. Unfortunately, it observes performance on the main thread, which is chock full of setTimeout and setInterval callbacks. So, even if video playback quality is smooth (videos run on the compositor thread, independent of the main thread), the page thinks that the experience is bad and automatically downgrades the video resolution.

I don't remember all the details but I don't think I made much headway when I was looking at the reasons for all the setTimeouts.

Nothing has been changed here. That's a profile with recent beta: https://share.firefox.dev/3q1YyhM

Note that this happened when I paused a video, and tried to continue the next day after the machine was in hibernation mode. Maybe that is related here. Also as it can be seen there is a lot of memory allocation, and within some seconds the content process claims about 1GB.

Dennis, did you get any further updates?

Flags: needinfo?(dschubert)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #38)

Dennis, did you get any further updates?

I did not. I just re-pinged, but it appears like they're a bit secretive about what they're doing. I'll update this thread if I learn something new.

Flags: needinfo?(dschubert)

Olli, I wonder if this could be somewhat related to bug 1683560 and caused by bug 1363829. If there is a possibility I could have a look at Firefox 54, and run it for a while on this site.

Flags: needinfo?(bugs)

If comment 38 happened only after resume, that is possibly a different issue.
I could see that possibly causing issues if there are lots of scheduled timers when suspending - do they all get run immediately when resuming.

Flags: needinfo?(bugs)

(In reply to Olli Pettay [:smaug] from comment #41)

If comment 38 happened only after resume, that is possibly a different issue.

Hard to say but I'm fairly sure it was not directly after resuming but minutes/hours later.

I could see that possibly causing issues if there are lots of scheduled timers when suspending - do they all get run immediately when resuming.

So maybe it's somewhat related to bug 1415923 comment 85? But again this flood of timer events should settle after a couple of minutes.

I tried to reproduce again, and got some new insights. First, the extensive usage of setTimeout() is not related to sleep/hibernation. It starts right away when switching to the tab. When the tab is in background it's all fine, so I assume throttling of timers works as expected.

A good example to reproduce is:
https://www.tvnow.de/serien/mission-weltall-16782/staffel-1/episode-1-suche-nach-fremdem-leben-1454606

This page causes a 25% CPU load for the MainThread of the web content process. Here a profile: https://share.firefox.dev/3cwueHI

Olli, while we see that many setTimeout calls they don't seem to make up the 25% CPU usage for this thread. Can you reproduce with the above URL, and maybe you see something more?

Interesting is also the amount of calls to __psynch_cvwait which might also account for a good amount of CPU usage - see bug 1684139 comment 12.

Flags: needinfo?(bugs)

That link isn't available outside Germany.

In that profile there is plenty of idle time too, and timers don't run constantly.
Mainthread seems to be over 70% idle.
The web page is requesting animation frames all the time, so some cpu usage is expected.

Flags: needinfo?(bugs)

What I recently noticed was that when the setTimeout calls cause lots of CPU usage, all the affected scripts actually came from an episode that I watched before, but not from the current one. Navigating through episodes clearly loads fresh pages. So I wonder why the profiler shows running JS from scripts that are no longer related to the current window global. Might this be related to bfcache, or that it hasn't been GC'ed yet?

Flags: needinfo?(bugs)

Recently tvnow.de pushed a complete overhaul of their website including the media player. I'll have to keep an eye out if something has been changed / improved.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #45)

What I recently noticed was that when the setTimeout calls cause lots of CPU usage, all the affected scripts actually came from an episode that I watched before, but not from the current one. Navigating through episodes clearly loads fresh pages. So I wonder why the profiler shows running JS from scripts that are no longer related to the current window global. Might this be related to bfcache, or that it hasn't been GC'ed yet?

Btw. this is something that I still see. Maybe Julien can help with that question.

Flags: needinfo?(bugs) → needinfo?(felash)

Hey Henrik!

Your question actually brings me a lot of more questions!

  1. I'd like to undestand the problem: can you point some specific part of a profile, maybe zooming on a range that exhibits the problem and giving the resulting URL? Especially I'd like to see what data we're looking at :-)

  2. On Matrix you mentioned issues about the memory leaking, which (I believe) indicates that we're not really navigating... so that could explain why you're still seeing the old URL. Maybe the page is faking navigation? You could tell possibly by looking at the devtools' network monitor, or maybe the profiler's network panel too.

My understanding is that on a real navigation, everything would be GC, and memory wouldn't leak. If memory leaks between episodes, this means the page isn't reloading.
Although yes, there's the BFCache in this case, which means possibly that the GC wouldn't happen right away? I'm not so sure about all that TBH :-)

Thanks!

Flags: needinfo?(felash) → needinfo?(hskupin)

Sorry for the delay. I'll try to get to the needinfo request soon.

In the meantime I wanted to mention that the side had another UI refresh, and maybe also some other fixes / improvements. Over the last days it feels more responsive and performs better. I'll keep monitoring it.

I'm not using this website anymore. So I wont be able to get the requested details.

Flags: needinfo?(hskupin)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: