Closed Bug 1420608 Opened 7 years ago Closed 7 years ago

very high CPU / GPU / power use on page with mp4 videos (all playing at the same time, autoplay)

Categories

(Core :: Audio/Video: Playback, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox57 --- wontfix
firefox58 + fixed
firefox59 --- fixed

People

(Reporter: jkleverson77, Assigned: jwwang)

References

Details

(Keywords: power)

Attachments

(13 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20171125100325

Steps to reproduce:

- create a new profile, exit
- start Firefox in the previous profile and visit https://www.jwz.org/blog/2017/10/anime-floppy-disks/
- scroll down to verify that all the videos are (automatically) playing
- scroll to the end of the page, verify that the last video is playing
- open task manager, observe high CPU
- scroll up and down on the page to begin and end, wait in between
- the CPU doubles again and stays at that use



Actual results:

- 45% total CPU use at 3.15 GHz on i7-3630QM CPU (4 cores / 8 threads) Windows 8.1


Expected results:

- 40 times less CPU is needed! Proof: the same page on IE 11: 3% of total CPU use on 1.15 GHz speed on the same machine: that is 40 times less CPU constantly used!
Attached picture shows both result: foreground: Firefox Nightly, background: IE 11, same page loaded, videos automatically playing in both cases
Attached file jwz-high-cpu.7z
The Local copy of the https://www.jwz.org/blog/2017/10/anime-floppy-disks/ page, in case the page content changes on the site.

The page was saved using Save as / Web page complete.

I can reproduce the behavior described opening this local html.
Summary: 40 times very high CPU use on page with mp4 videos → 40 times very high CPU use on page with mp4 videos (all playing at the same time, autoplay)
(The picture attached to the main report is a Photoshop composite of two independent runs, only one browser was running for each of the measurements, no other pages were loaded during the measurement).
Attached file procmon-log.csv
The difference between the period when the CPU is cca 25% and when the CPU is cca 50% is that in the former no new threads are constantly created, but once the CPU jumps the constant thread creation and destruction inside of a Firefox.exe process can be observed. Attached the log file from PROCMON where the thread creation is recorded (saved as a csv file).

More than 8 threads are created and destroyed per second in that cca 50% phase.
Comment on attachment 8931864 [details]
composite-screenshot-FF-high-CPU-vs-IE-low.jpg

The composite file: foreground: the screenshot of Firefox nightly using 45% CPU in Turbo mode (3.15 GHz). Background: the screenshot of IE 11 using 3% in minimal mode (1.15 GHz). The measurements were made independently and then were combined in a single image for easy comparison.
Attachment #8931864 - Attachment description: 40 times more constant CPU use than IE 11 on the same 4 core / 8 threads computer → composite-screenshot-FF-high-CPU-vs-IE-low.jpg
I've tried to figure out some stack traces. I don't know how it comes there, but maybe the thread creation happens because of this? Something creates:

class SupportsConfigEvent : public Runnable
{
public:
  SupportsConfigEvent(DXVA2Manager* aDXVA2Manager,
                      IMFMediaType* aMediaType,
                      float aFramerate)
    : Runnable("SupportsConfigEvent")
    , mDXVA2Manager(aDXVA2Manager)
    , mMediaType(aMediaType)
    , mFramerate(aFramerate)
    , mSupportsConfig(false)
  {
  }

  NS_IMETHOD Run()
  {
    MOZ_ASSERT(NS_IsMainThread(), "Must be on main thread.");
    mSupportsConfig = mDXVA2Manager->SupportsConfig(mMediaType, mFramerate);
    return NS_OK;
  }

which in turn runs: 

CanCreateDecoder(desc, aFramerate);

which does:

  RefPtr<IDirectXVideoDecoder> decoder = CreateDecoder(aDesc);
  return decoder.get() != nullptr;


It seems that it's that, as I see a lot of threads starting somewhere in msmpeg2vdec.dll which is probably the result of the CreateDecoder call.

Note it's not the whole story about the slowdowns, only the second phase. Even if this specific scenario is fixed, the CPU is much, much higher than in IE in the first phase before that process creation in the loop starts (before the CPU jumps to 45% on my machine).
The number of threads Firefox created in every second, made by processing the high-cpu-procmon-log.CSV and roughly compared with the observed CPU use from other sampling. The graphs are not made during the same sampling, there were two runs, so 1 to 1 correspondence of the seconds doesn't exist, but the two distinct periods could be recognized (roughly 25% and roughly 50% CPU usage).
CPU uses as drawn by Process Explorer (clicking on the CPU graph) in two phases when Nightly shows https://www.jwz.org/blog/2017/10/anime-floppy-disks/
First phase: no new threads are created, Second: continuous creation of new threads
Attached image ie11-ctx-switches.jpg
Finally, the same CPU graph (including context switches) as in nightly-ctx-switches.jpg, but for the reference case of IE 11. Note that the CPU speed there remains very low compared to both Nightly cases, and compare also the number of contexts switches and interrupts to the nightly-ctx-switches.jpg.
Attached image cpu-power-and-gpu.jpg
Even more statistics, quite telling, IMHO:

CPU Power draw, measured using "Intel® Power Gadget 3.0": 
IE 11 reference:  6.3 W

Nightly Phase 1: 17.6 W (almost 3 times than IE 11) 
        Phase 2: 26.5 W (more than 4 times than IE 11)


GPU Load, measured using TechPowerUp GPU-Z 2.5.0: 

IE 11 reference:    5-6% when on videos, 
                      0% (!) on the bottom of the page

Nightly: Phase 1: 15-18% when on videos, 
                      7% on the bottom of the page
	 Phase 2:     (not measured)
                  10-11% (!) on the bottom of the page
Wall plug power draw of the notebook, while battery removed:

Writing this with Nightly in https://bugzilla.mozilla.org/show_bug.cgi?id=1420608 : 20 W


Showing https://www.jwz.org/blog/2017/10/anime-floppy-disks/ :

IE 11 reference: 27 W       35%  more than idle
Nightly Phase 1: 40 W   cca 50%  more than IE 11 for the same page, or 2 X   more than idle
Nightly Phase 2: 53 W   cca  2 X more than IE 11 for the same page, or 2.6 X more than idle
Component: Untriaged → Audio/Video: Playback
Keywords: power
Product: Firefox → Core
I've also tried to see a kind of maximum wall plug draw: I've opened 6 tabs in Nightly, all with the jwz page: the draw reached a maximum of 68 W but didn't keep it long, probably the power or temperature management scaled it down to 64 W.
Then, to compare,  I've opened 6 tabs of the same page in IE 11, it drew on average 33 W. So with more tabs open with the pages with more videos, Nightly still sucks up the battery twice as fast!
Moreover, the power draw above the idle is even worse than what the Windows Task Manager displays as values. Specifically, if the maximum draw is around 68 W and idle draw is 20 W, then is Nightly Phase 1 (40W) is around 40 percent of the span between 20W and 68W. The phase 2 (53W) is then around 60% of the span between 20 and 68W. I've seen some bug report where somebody claimed that the task manager isn't correct, but in this example task manager is even optimistic, compared to the actual power draw: the corresponding values were around 23% and 45%, so definitely not too high, when comparing with the total power draw. But this also means that I shouldn't have used CPU speed from it to additionally scale the estimates, that is, now I know that it shows reasonable values without the need for the additional scaling.
Summary: 40 times very high CPU use on page with mp4 videos (all playing at the same time, autoplay) → very high CPU / power use on page with mp4 videos (all playing at the same time, autoplay)
Summary: very high CPU / power use on page with mp4 videos (all playing at the same time, autoplay) → very high CPU / GPU / power use on page with mp4 videos (all playing at the same time, autoplay)
The thread creation is probably due to creating a new decoder for each loop of each video. I tried a couple of experiments:

1. By setting media.wmf.dxva.max-videos=999 I got all the videos to be hardware decoded. The page still used a lot of CPU.
2. Pausing all of the videos by right-clicking. This reduces the CPU usage significantly.
Priority: -- → P3
Flags: needinfo?(ajones)
I've extracted the corresponding lengths of all videos in that page:

02.07 03.07 01.63 02.97 01.97 01.17 01.17 01.90 02.40 00.27 01.17 02.57 01.83 01.43 02.90 04.57

And all number of frames:

62 92 49 89 59 35 35 57 72 8 35 77 55 43 87 137

There are 16 of videos inside, the longest is less than 4.57 seconds. All are 30 fps effectively.

There is no sound in any of them physically. Moreover I believe all are marked in HTML 

Here is a number of the threads created in every second in the example for which I've made a graph:


atSecond, numbers of threads created:

009	58 threads created

010-068 -- no threads created

069	6
070	14
071	16
072	15
073	14
074	41

087	6
088	46

0 - 2 per sec

129-154 -- no threads created

154	4
155	38
156	36
157	38
158	38
159	41
160	41
161	42
162	41
163	42
164	43
165	43
166	39
167	33
168	42
169	36
170	34
171	36
172	41
173	39
174	43
175	34
176	40
177	45
178	45 
and then continues around 45 per second.

Why is creating 45 of threads *per second* needed to loop 16 same videos on the page on which nothing else changes?

This computer on which I've measured is reasonably fast. I've observed a user scrolling his Facebook feed (with, of course, videos in the page, starting one after another) using Firefox on a less powerful computer and Firefox becoming not responsive after not too much scrolling. I assume the same effect happens as this one I've isolated here.

The "constant thread creation" phase is obviously triggered at some point and it's obviously too much. But it's not the only problem, the amount of work (and the power used!) needed to play the videos even without that thread creation is still  double compared with the old IE 11. That it at least twice as less time on the battery --- especially for people who open more tabs. A "normal" user doesn't have ad blockers or video blockers. He opens more pages in more tabs. If the videos are playing (ads and whatever), is not going to search them on every page and every tab to pause them. Just having open Firefox with the open tabs will drain his battery. I've observed the problems with videos "from the distance" for at least many months, and now after a lot of progress in development, and still seeing it...

Anthony, did I understand correctly: you can reproduce what I described? Shouldn't the bug then be marked "New"?
I see "needinfo"? Can I help more?
I've set media.wmf.dxva.max-videos=999 too. The results:

Nightly, after opening the page (before the "constant thread creation phase"):
14% CPU use (2.4 GHz), 12% GPU load

IE 11 is still:
 6% CPU (1.15 Ghz), 5% CPU load
As I've posted before (with the code quoted, see above) it appears to me that there's some call which results in the creation of the decoder which is immediately discarded afterwards, because the call only asks "can it be created" and for that purpose the whole decoder is created, the bool value true is remembered to be returned, and then the decoder is discarded. Strangely, it seems something has to happen to trigger that phase, as there is a period without the creation of the new decoders. Interestingly around 45 threads per second created is close to three times the number of videos on the page (3 * 16 = 48).
The two big issues here are, as I see it:

1) there is some *trigger* which initiates the loop where 45 threads are created and destroyed *per second* (observable using Sysinternals Process Monitor), the CPU/GPU/electricity use of course jumps again and the number of CPU interrupts per second raises 7 times (as seen in nightly-ctx-switches.jpg).

2) even before the "constant creation phase" occurs, a lot of CPU/GPU/electricity is needed compared to IE.

As I've said, both look similar to what I've observed "over the shoulder" by a normal user using Facebook in which autostart videos occur. It seems that when the 1) happens the Firefox gets even unresponsive, helped by a lot of processing needed for 2) anyway in the background tabs (or maybe are even other tabs already in that phase, I don't know).

In short, I believe these two issues affect very common experience ("i.e. a normal user simply browsing the Facebook and opening a few more pages in tabs, everything with videos") and aren't some specially constructed case. The only thing that's special is is that I've managed to locate a single page where I can repeat it, without having to give somebody's personal Facebook login to demonstrate the issues.
(In reply to Johnny Kl from comment #13)
> The "constant thread creation" phase is obviously triggered at some point
> and it's obviously too much.

Each video is decoded by Windows Media Foundation using 7 threads. Each loop of the video will restart all of those threads. That seems avoidable, especially given that we have seamless looping on audio.

> But it's not the only problem, the amount of
> work (and the power used!) needed to play the videos even without that
> thread creation is still  double compared with the old IE 11. That it at
> least twice as less time on the battery --- especially for people who open
> more tabs.

The decoders shut down when you open new tabs. However with setting media.use-blank-decoder=true it still uses a lot of CPU.

> A "normal" user doesn't have ad blockers or video blockers. He
> opens more pages in more tabs. If the videos are playing (ads and whatever),
> is not going to search them on every page and every tab to pause them. Just
> having open Firefox with the open tabs will drain his battery. I've observed
> the problems with videos "from the distance" for at least many months, and
> now after a lot of progress in development, and still seeing it...

In general you can't fully understand the impact of an issue until you know the cause. We don't know the cause yet.

It is intersting to note that Chrome uses a similar amount of CPU.
Flags: needinfo?(ajones)
Flags: needinfo?(ajones)
(In reply to Anthony Jones from comment #17)

> It is intersting to note that Chrome uses a similar amount of CPU.

I don't agree! I've tried Chromium which can't play the videos at all, but I've installed the latest Opera which probably has the same video code as Chrome, and I can easily show you that where Nightly uses 53% CPU, Opera uses 12% and IE 11 only 6%. See the new attachment and its comment which will immediately follow this reply. Thank you.
Attached image two-pages-backgr.jpg
The case, reduced to the most obvious procedure to demonstrate, only then check the other images for more details. It's how two *background* tabs with videos playing (the default behaviour) affect the CPU much worse than Opera (based on Chromium) or IE 11:

- on Windows OS
- copy the URL https://www.jwz.org/blog/2017/10/anime-floppy-disks/ to the clipboard
- open Task Manager / Performance
- open Nightly, paste the URL once, wait for all videos loaded and playing
- keeping this tab open, open a New tab, paste the URL to in that new tan, wait for all videos loaded and playing
- open a New empty tab, keeping both tabs with videos playing in the background
- wait 15 seconds
- check Task Manager / Performance: note CPU is 53%
- In the Process Monitor, making sure that only "Process and Thread activity" is on, more than 40 threads are created and closed per second

- the same but opening Opera (close Nightly before!):  CPU is 12%

- the same but opening IE 11: CPU is 6%

- in the attached two-pages-backgr.jpg all three cases compared, each time starting from no application running.

That is, once again: Firefox Nightly 53%, Opera 12%, IE11 6%, or if we normalize to IE 11, the factors are Firefox 8.8x Opera 2x, IE11 1. It's a huge difference.

Anthony I understood you can also see the same behavior? Shouldn't the bug then be considered confirmed, if you can repeat it, especially in the form I describe now? Thank you.
(In reply to Anthony Jones from comment #17)

> Each video is decoded by Windows Media Foundation using 7 threads. Each loop of the video will restart all of those threads.

It doesn't appear to be what I observe. The longest video is less than 5 seconds long, the shortest just 0.27 seconds. If your claim would hold, at least every 5 seconds the new threads should be created. But it I just open a single tab with all the videos looping, and if I don't scroll down, I can observe for minutes no new threads being created. Only some additional action by me will move Firefox to the "thread creation phase" as I've described in all the comments before Comment 19.

> The decoders shut down when you open new tabs.

Can you please explain which action you believe would shut them down? See Comment 19: what I observe is: I open one tab with videos, leave it opened, then open a new tab with the videos, leave it opened, then open a new, third tab, keep it empty, wait just some seconds and then I can see the threads being created and closed, many of them per second. The tabs in the background (that is, not visible) produce 53% of CPU use, whereas Opera (Chromium based) in the same scenario produces 12% of CPU use and IE11 only 6%. From what I observe, Firefox at this moment handles videos in the background tabs many times worse than any competition. But note that once the "phase with the thread creation" starts (which doesn't happen immediately after the only tab with videos is opened) it's always visibly worse than the competition, even if it's only tab in the browser. Please use the Process Monitor https://docs.microsoft.com/en-us/sysinternals/downloads/procmon to confirm that these two different phases do exist (at best while keeping the Task Manager / Performance opened, to see the difference in the CPU use in the corresponding phases -- the one where the threads are created uses obviously more CPU, and in the phase where no threads are created (single tab) the videos are still playing and looping) and let me know if you need help observing what I've described.
(In reply to Anthony Jones from comment #17)

> Each video is decoded by Windows Media Foundation using 7 threads. Each loop of the video will restart all of those threads

Here is the video I've recorded. For minutes no new threads are created, I've uploaded one minute total: the last 30 "quiet" seconds before I do "something" (scroll the page down, move the mouse over something) when the "thread creation phase" starts, using more CPU (the next 30 seconds):

https://youtu.be/07ZBfXliY6Y
Using windbg, I can see that in the "thread creation phase" (see how the phase starts from 30th second of the video from the Comment 21 ) the H264Converter::CreateDecoder is called all the time, whereas in the "no rapid thread creation" phase (as seen in the first 30 seconds of the video from the Comment 21 ) there are effectively no calls to that. I also see that these calls result in the new threads created. I hope that's a good enough hint for the developers to search for the cause.
In case it's not so easy for others to achieve this, this is the full call stack that creates new threads for new decoders in the "thread creation phase." I'm otherwise not familiar with Firefox design, so I have no idea why it comes in that "unstable" situation where all the creations repeat too much too often:

Call Site
xul!mozilla::MFTDecoder::SendMFTMessage+0x22 [z:\build\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 179]
xul!mozilla::MFTDecoder::SetMediaTypes+0x98 [z:\build\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 105]
xul!mozilla::WMFVideoMFTManager::SetDecoderMediaTypes+0x22d [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 836]
xul!mozilla::WMFVideoMFTManager::InitInternal+0x342 [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 732]
xul!mozilla::WMFVideoMFTManager::Init+0x43 [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 625]
xul!mozilla::WMFDecoderModule::CreateVideoDecoder+0x8d [z:\build\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 109]
xul!mozilla::H264Converter::CreateDecoder+0x181 [z:\build\build\src\dom\media\platforms\wrappers\h264converter.cpp @ 273]
xul!mozilla::H264Converter::H264Converter+0x1b0 [z:\build\build\src\dom\media\platforms\wrappers\h264converter.cpp @ 37]
xul!mozilla::PDMFactory::CreateDecoderWithPDM+0x19c [z:\build\build\src\dom\media\platforms\pdmfactory.cpp @ 292]
xul!mozilla::PDMFactory::CreateDecoder+0xbe [z:\build\build\src\dom\media\platforms\pdmfactory.cpp @ 235]
xul!mozilla::MediaFormatReader::DecoderFactory::DoCreateDecoder+0x1c3 [z:\build\build\src\dom\media\mediaformatreader.cpp @ 774]
xul!mozilla::MediaFormatReader::DecoderFactory::RunStage+0x42 [z:\build\build\src\dom\media\mediaformatreader.cpp @ 713]
xul!<lambda_15decc4741528b208ebb37c365c9455f>::operator()+0x4e [z:\build\build\src\dom\media\mediaformatreader.cpp @ 692]
xul!mozilla::MozPromise<RefPtr<mozilla::GlobalAllocPolicy::Token>,bool,1>::InvokeMethod+0x1a [z:\build\build\src\obj-firefox\dist\include\mozilla\mozpromise.h @ 520]
xul!mozilla::MozPromise<RefPtr<mozilla::GlobalAllocPolicy::Token>,bool,1>::InvokeCallbackMethod+0x41 [z:\build\build\src\obj-firefox\dist\include\mozilla\mozpromise.h @ 565]
xul!mozilla::MozPromise<RefPtr<mozilla::GlobalAllocPolicy::Token>,bool,1>::ThenValue<<lambda_15decc4741528b208ebb37c365c9455f>,<lambda_71d8e683077d8a4f5eb73812268d8b57> >::DoResolveOrRejectInternal+0x58 [z:\build\build\src\obj-firefox\dist\include\mozilla\mozpromise.h @ 770]
xul!mozilla::MozPromise<mozilla::dom::U2FSignResult,enum nsresult,1>::ThenValueBase::DoResolveOrReject+0x80 [z:\build\build\src\obj-firefox\dist\include\mozilla\mozpromise.h @ 497]
xul!mozilla::MozPromise<mozilla::dom::U2FRegisterResult,enum nsresult,1>::ThenValueBase::ResolveOrRejectRunnable::Run+0x51 [z:\build\build\src\obj-firefox\dist\include\mozilla\mozpromise.h @ 403]
xul!mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run+0x54 [z:\build\build\src\obj-firefox\dist\include\mozilla\taskdispatcher.h @ 217]
xul!mozilla::TaskQueue::Runner::Run+0xea [z:\build\build\src\xpcom\threads\taskqueue.cpp @ 244]
xul!nsThreadPool::Run+0x328 [z:\build\build\src\xpcom\threads\nsthreadpool.cpp @ 230]
xul!nsThread::ProcessNextEvent+0xe16 [z:\build\build\src\xpcom\threads\nsthread.cpp @ 1034]
xul!NS_ProcessNextEvent+0x26 [z:\build\build\src\xpcom\threads\nsthreadutils.cpp @ 508]
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0xdd [z:\build\build\src\ipc\glue\messagepump.cpp @ 365]
xul!MessageLoop::RunHandler+0x1b [z:\build\build\src\ipc\chromium\src\base\message_loop.cc @ 320]
xul!MessageLoop::Run+0x3e [z:\build\build\src\ipc\chromium\src\base\message_loop.cc @ 300]
xul!nsThread::ThreadFunc+0xe4 [z:\build\build\src\xpcom\threads\nsthread.cpp @ 425]
nss3!_PR_NativeRunThread+0xee [z:\build\build\src\nsprpub\pr\src\threads\combined\pruthr.c @ 419]
nss3!pr_root+0xa [z:\build\build\src\nsprpub\pr\src\md\windows\w95thred.c @ 96]
ucrtbase!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x35
KERNEL32!BaseThreadInitThunk+0x22
(In reply to Johnny Kl from comment #19)
> Anthony I understood you can also see the same behavior? Shouldn't the bug
> then be considered confirmed, if you can repeat it, especially in the form I
> describe now?

The unconfirmed/new workflow isn't used in this component but if it makes you happy I'll do that.


(In reply to Johnny Kl from comment #23)
> xul!mozilla::WMFVideoMFTManager::Init+0x43
> [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 625]

This is where the decoder is initialised, which supports my previous statement about WMF. The thread creation itself seems like a red herring given that the performance of the blank decoder is still poor.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Lets try that again.
Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(ajones)
Resolution: FIXED → ---
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #24)

> The thread creation itself seems like a red herring
> given that the performance of the blank decoder is still poor.

No. Firefox can obviously use the existing decoders for loops, and then something can trigger the creation of many of the decoders per second.

If you please see my video, it's easy to see: there is a phase where no decoders are created all the time, before I scroll down the page and hoover over something to make that "rapid thread / decoder creation" start can last as much as you want.

I know I've shown thread creations and not decoders in the video I've posted. But here is the log of the times at which the H264Converter::H264Converter is called, taken automatically with windbg. Since that's the only constructor of that class, I guarantee there were no creations of that object before I scroll the page, and the vidoes loop at that time just as in the video I've recorded.

The format is a counter of H264Converter created, space, minute since the start of a program : second . fractionOfSecond:
01 2:53.365
02 2:53.372
03 2:53.381
04 2:53.395
05 2:53.486
06 2:53.494
07 2:53.506
08 2:53.518
09 2:53.564
10 2:53.575
11 2:53.589
12 2:53.603
13 2:53.635
14 2:53.646
15 2:53.656
16 2:53.669 <-- the 16th H264Converter is created, here I don't do nothing and wait almost 3 minutes
17 5:46.824 <-- after I trigger the "rapid" state H264Converter  is created many times per second
18 5:46.834
19 5:46.843
20 5:46.957
21 5:47.034
22 5:47.089
23 5:47.124
24 5:47.151
25 5:47.194
26 5:47.227
27 5:47.258
28 5:47.290
29 5:47.323
30 5:47.350
31 5:47.361
32 5:47.399
33 5:47.421
34 5:47.440
35 5:47.473
36 5:47.506
37 5:47.536
38 5:47.569
39 5:47.604
40 5:47.638
41 5:47.669
42 5:47.702
43 5:47.735
44 5:47.769
45 5:47.799
46 5:47.812
47 5:47.854
48 5:47.860
49 5:47.880
50 5:47.910
51 5:47.938
52 5:47.966
53 5:47.985
54 5:48.014
55 5:48.050
56 5:48.084
57 5:48.105
58 5:48.124
59 5:48.161
60 5:48.194
61 5:48.232
62 5:48.272
63 5:48.307
64 5:48.342
65 5:48.379
66 5:48.419
67 5:48.455
68 5:48.493
69 5:48.523
70 5:48.538
71 5:48.575
72 5:48.608
73 5:48.641
74 5:48.673
75 5:48.708
76 5:48.741
77 5:48.776
78 5:48.811
79 5:48.848
80 5:48.889
81 5:48.926
82 5:48.963

etc...

The pattern of Firefox being able to not create new H264Converters as long as it's not touched and entering another state via some trigger action is obvious.

Here why I'm certain that it's exactly the construction of H264Converter::H264Converter that creates the most of the new threads in the "rapid thread creation phase" -- I observed the stack at these moments:

The stack at the moment of the thread creation that happens at the construction of H264Converter is:

xul!mozilla::MFTDecoder::SendMFTMessage()+0x22 [z:\build\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 179]
xul!mozilla::MFTDecoder::SetMediaTypes()+0x98 [z:\build\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 105]
xul!mozilla::WMFVideoMFTManager::SetDecoderMediaTypes(void)+0x22d [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 836]
xul!mozilla::WMFVideoMFTManager::InitInternal(void)+0x342 [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 732]
xul!mozilla::WMFVideoMFTManager::Init(void)+0x43 [z:\build\build\src\dom\media\platforms\wmf\wmfvideomftmanager.cpp @ 625]
xul!mozilla::WMFDecoderModule::CreateVideoDecoder()+0x8d [z:\build\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 109]
xul!mozilla::H264Converter::CreateDecoder()+0x181 [z:\build\build\src\dom\media\platforms\wrappers\h264converter.cpp @ 273]


So as the result of all the calls after the construction of H264Converter, eventually:

  hr = SendMFTMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0);
  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);

is called which results in 8 levels of calls inside of msmpeg2vdec.dll for which I can't get the symbols from Microsoft's server, so that part of the symbol resolution is not accurate and I don't quote it but from there the thread creation in KERNEL32 is initiated.

To repeat once again, I see typically only 16 H264Converter objects constructed after the page loads, then the videos can loop for minutes, and only once I scroll down another, more CPU consuming phase starts with the rapid creation of H264Converter objects, and for each one exactly one thread is being created from msmpeg2vdec.
It looks like we create a new decoder every loop of the video when in decoder shutdown mode.

Steps to reproduce:

* Navigate to https://www.jwz.org/blog/2017/10/anime-floppy-disks/
* Open new tab
* Wait 15 seconds
* Use gecko profiler

Expected results:

mozilla::CreateDXVAManagerEvent::Run should not show up on the profile

Actual results:

mozilla::CreateDXVAManagerEvent::Run is using 9% CPU on GeckoMain
(In reply to Johnny Kl from comment #26)
> To repeat once again, I see typically only 16 H264Converter objects
> constructed after the page loads, then the videos can loop for minutes, and
> only once I scroll down another, more CPU consuming phase starts with the
> rapid creation of H264Converter objects, and for each one exactly one thread
> is being created from msmpeg2vdec.

It looks like we're fine until we go into decoder shutdown. Once we're in decoder shutdown mode, we flip-flop between WMF and blank decoder, I'm guessing once per video loop.
[Tracking Requested - why for this release]: This issue causes excessive CPU usage on short looping videos.
Having given this some thought, I'm guessing that we're starting the decoder up at the end of each loop in order to show the last frame.
As I've noted in previous comments, there are two issues behind this entry: one is this 

- "rapid creation phase" and another is 

- generally worse playing of media compared to other browsers (as far as I observe it).

I think both issues should be addressed, and now that at last somebody (if I understood correctly? thanks Anthony!) finally confirmed what I've reported I hope both issues could somehow be handled, even if I reported them only in one bug. I've however tried to see if anybody else created bugs about generally inferior media playing, and I can at least somehow confirm this case here:

https://bugzilla.mozilla.org/show_bug.cgi?id=1419047#c4

Maybe people working for Mozilla can now create appropriate bug reports that will lead both to the serious investigation high CPU/GPU/power resource load for media playing, as well as addressing this "rapid creation"? Thanks and good luck.
(In reply to Johnny Kl from comment #31)
> - generally worse playing of media compared to other browsers (as far as I
> observe it).

I can see out of order frames clearly on the 2nd video on the 2nd row.
(In reply to Johnny Kl from comment #31)
> I've however tried to see if anybody else created bugs about
> generally inferior media playing, and I can at least somehow confirm this
> case here:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1419047#c4

Except that issue has nothing to with media.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #32)
> (In reply to Johnny Kl from comment #31)
> > - generally worse playing of media compared to other browsers (as far as I
> > observe it).
> 
> I can see out of order frames clearly on the 2nd video on the 2nd row.

I created bug 1422660 because it is likely unrelated to the decoder shutdown issue.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #32)
> (In reply to Johnny Kl from comment #31)
> > - generally worse playing of media compared to other browsers (as far as I
> > observe it).
> 
> I can see out of order frames clearly on the 2nd video on the 2nd row.

I've just meant "worse regarding the CPU/GPU/power usage" as I described in the Comment 19 : 

Firefox Nightly 53% CPU
Opera 12% CPU
IE11 6% CPU

Regarding "frame skipping" I haven't believed the example is relevant as these MP4 were created from the GIFs!

https://www.jwz.org/blog/2017/10/gifs-as-mp4s/
Likely wontfix for 57 at this point.
Attached image e10s-overhead.jpg
The difference in CPU use when opening the https://www.jwz.org/blog/2017/10/anime-floppy-disks/ before the "rapid creation phase" that is, when no new decoders are constantly created.

Above: defaults, when the page is opened in the new, just created profile
CPU use: cca 25%

Below: when the page is opened in the profile with
    browser.tabs.remote.autostart = false
    media.wmf.dxva.max-videos = 999
CPU use: cca 9%

I observe 16% of CPU use overhead just for these two defaults, even when the bug with the "rapid creation" is not triggered.

I hope somebody can investigate this too?

(btw the setup.exe from the Comment 35 is not installing any firefox binaries at my computer, I've already contacted jwwang directly about that. Trying to check what it did, I see that setup.exe creates something like system.dll and uac.dll. I didn't know there is uac highjacking built-in in the setup. Anyway that setup doesn't report failure when nothing is installed, instead it creates the shortcuts to non-existing firefox.exe)
Somehow I can no longer open this link: https://www.jwz.org/blog/2017/10/anime-floppy-disks/. Is the site still up?
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #40)
> Somehow I can no longer open this link:
> https://www.jwz.org/blog/2017/10/anime-floppy-disks/. Is the site still up?

You can download the jwz-high-cpu.7z attached in Comment 1 containing the whole page content, which I archived days ago, and test locally, even with no internet.

The page loads for me.
https://searchfox.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#820-827

I guess this is the culprit.

1. MDSM turns off blank decoder when seeking starts.
2. MDSM turns on blank decoder when decoding starts for the tab is still in the background.
3. It looks like flipping between WMF and blank decoder could result in hight CPU usage.
4. 3 is even more significant when looping a small file because you will seek (to the beginning) very often.

Btw, I can't repro the issue on my Windows 10 machine. I guess step 3 is more expensive on some Windows machines than others.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #42)

> I can't repro the issue on my Windows 10 machine. I guess step 3 is
> more expensive on some Windows machines than others.

Thanks. If you can add the trace statement on the point of the creation of the decoders and see them being rapidly created every second you are definitely reproducing the "rapid creation phase." On the significantly faster computer than mine there would be surely less visible CPU problems, but on all of the slower systems the problems they surely are.

Once that bug is fixed, I hope somebody can create a relevant bug based on the observations from the Comment 36 and  Comment 38 ?
That build fixes the issue for me. The CPU usage drops once I open a new tab.
Attachment #8935599 - Flags: review?(gsquelart)
Assignee: nobody → jwwang
Comment on attachment 8935599 [details]
Bug 1420608. P1 - don't switch off blank decoder when seeking begins.

https://reviewboard.mozilla.org/r/206478/#review212194
Attachment #8935599 - Flags: review?(gsquelart) → review+
Thanks!
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5a1a53d1fd02
don't switch off blank decoder when seeking begins. r=gerald
https://searchfox.org/mozilla-central/rev/f5f1c3f294f89cfd242c3af9eb2c40d19d5e04e7/toolkit/content/tests/browser/browser_resume_bkg_video_on_tab_hover.js#12

The test is racy! Sometimes the 'mozentervideosuspend' event is fired before any event handler is registered and the test times out for waiting an event that will never come.
Flags: needinfo?(jwwang)
Attachment #8936192 - Flags: review?(alwu)
Comment on attachment 8936192 [details]
Bug 1420608. P2 - fix the test timeout.

https://reviewboard.mozilla.org/r/206956/#review212992

::: commit-message-65c69:3
(Diff revision 1)
> +Bug 1420608. P2 - fix the test timeout.
> +
> +See comment 50 for the cause. We call load() and play() to start playback

nit : it's better to mention that the "mozentervideosuspend" would happen before we start register the lisnter in comment msg.
Attachment #8936192 - Flags: review?(alwu) → review+
Thanks!
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/69a0e0c9d40e
P1 - don't switch off blank decoder when seeking begins. r=gerald
https://hg.mozilla.org/integration/autoland/rev/772fef647fd7
P2 - fix the test timeout. r=alwu
https://hg.mozilla.org/mozilla-central/rev/69a0e0c9d40e
https://hg.mozilla.org/mozilla-central/rev/772fef647fd7
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
If I understand one from two different issues I've reported in the comments here is fixed. Regarding the second, maybe a good manifestation is the one in Comment 38 ? 

What are your opinions on the issue from Comment 38 and should a new bug be made for it? Thank you.
Hey Johnny! This is more of an automated comment that is made after we do the merges, in this case from autoland to central. We usually try to do the merges right before the nightly tests are triggered. 
Yes, the first part now is considered fix and for what you refer to in comment 38 I suggest you open another bug and put this bug in the see also section.
Attached image open-in-4-tabs-cmp.jpg
With the latest Nightly arrived I can confirm that the "rapid creation phase" as I saw it before is gone. Thanks everybody for that one!

Unsurprisingly I still observe the other performance problems: specifically, in this comment I can show that Firefox has some strange periods where the CPU use is still much, much higher than in IE11 or Opera, which is especially visible if I try to open the jwz anime floppy page in four tabs, one after another.

In these cases my worry is less that the battery is consumed more long term (although the used CPU is twice of IE11 even after these periods, which is significant) but that these relatively long high "flames" point to something that I can imagine on the slower computers potentially blocks the responsiveness to the user, and therefore it would be worthy to analyze what's going on there.

To reproduce:
1 copy the link https://www.jwz.org/blog/2017/10/anime-floppy-disks/
2 open the browser and open the Task Manager/Performance from the OS
3 open the link in the first tab, wait to see it's loaded, then
4 open the second tab, open the link, wait to see it's loaded, then
5 open the third tab, open the link, wait to see it's loaded, then
6 open the fourth tab, open the link, wait to see it's loaded, then

Observe the "high and wide flames." If I don't wait to the end of the load, I can even observe even wider periods of 100% CPU use (the flames are even higher and even wider). I can also induce that the end speed doesn't end to be the same as in the snapshot. All in all it seems there is some visibly unnecessary work which potentially overuses the CPU resources.

The variation: open the same link in the same tab with the short delays, Firefox ends in some other higher CPU constant use, compared to one when it was opened only once!

I know that this bug under this number is considered "fixed" but as I am not an "insider" I still don't know how to present my case optimally, to avoid that the report is declared "non existing" in some way. That was why I took the effort to do a lot of various observations here and illustrate them or additionally prove them, but I would not like to start from the point zero (as I do this in my free time) and I also don't have the access to the various OSes, which is maybe of benefit, just the browsers I've used.

In this bug entry ( 1420608 ) there are I believe different scenarios which point to more than one issue that still can be solved simply by using as a test this one page I've used, also opening it in various timings more than once or in more than one tab, or with the various settings, and I really believe the best would be if the bugs are filled by those who are more versed and know more about what is going on "under the hub" in the Firefox (and can recognize the using the built-in profiler and their knowledge). Please see the previous comments for inspiration, maybe at least the comment 38 (with the attached 8934568: e10s-overhead.jpg ) and this comment now, including the "variation" which is also somehow telling could be used as as the starting point for opening new bugs. For the starting identification of the potential problems, maybe it's worthy doing the visual comparison of the CPU use graphs doing the same actions on other OSes too.

Note: the configuration for the picture attached is default after install, i.e. media.wmf.dxva.max-videos is the default 8, not 999, and tabs.remote.autostart is also the default.
JW,
Ihis bug looks serious and worth uplifting. 
What do you think?
Flags: needinfo?(jwwang)
Approval Request Comment
[Feature/Bug causing the regression]:none
[User impact if declined]:high CPU usage when viewing a page with many videos on Windows.
[Is this code covered by automated tests?]:yes
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:none
[Is the change risky?]:low
[Why is the change risky/not risky?]:the change is simple. We just move the action of suspending video decoding from place A to B to reduce the chance of recreating decoders. Even if we get it wrong, it will at worst remain high CPU on those pages as if no such fix is applied.
[String changes made/needed]:none
Flags: needinfo?(jwwang)
Attachment #8937891 - Flags: review+
Attachment #8937891 - Flags: approval-mozilla-beta?
Comment on attachment 8937891 [details] [diff] [review]
1420608_fix_beta_58.patch

Fix a high cpu usage issue. Beta58+.
Attachment #8937891 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Is this bug related to this issue I have https://bugzilla.mozilla.org/show_bug.cgi?id=1411575 ?
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #63)
> [Is this code covered by automated tests?]:yes
> [Needs manual test from QE? If yes, steps to reproduce]: no

Based on this assessment on manual testing needs and it's automated coverage, marking this as qe-verify-.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: