Closed Bug 1349991 Opened 7 years ago Closed 7 years ago

High CPU usage and growing heap-unclassified allocations in gfx process during video playback

Categories

(Core :: Graphics: Layers, defect, P1)

x86_64
Windows 10
defect

Tracking

VERIFIED FIXED
mozilla56
Tracking Status
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 + wontfix
firefox56 --- verified

People

(Reporter: bugzilla.mozilla.org, Assigned: jerry)

References

Details

(Keywords: regression, regressionwindow-wanted, Whiteboard: [MemShrink:P1])

Crash Data

Attachments

(6 files)

I'm seeing *all* child processes (content and gfx) saturating 1 core each and the gfx process consuming more and more memory until it crashes or I stop playback. Upon stopping playback the memory usage ramps down again over the course of several seconds.
A video playing in a background tab does not exhibit this behavior; the tab has to be visible.


GPU (pid 12156)
Explicit Allocations

6,885.54 MB (100.0%) -- explicit
├──6,824.60 MB (99.11%) ── heap-unclassified
└─────60.94 MB (00.89%) -- (5 tiny)
      ├──60.40 MB (00.88%) ++ heap-overhead
      ├───0.17 MB (00.00%) ── profiler/profiler-state
      ├───0.14 MB (00.00%) ++ xpcom
      ├───0.13 MB (00.00%) ++ atom-tables
      └───0.11 MB (00.00%) ── telemetry

Other Measurements

134,217,727.94 MB (100.0%) -- address-space
├──134,210,283.62 MB (99.99%) ── free(segments=132)
└────────7,444.32 MB (00.01%) -- (2 tiny)
         ├──7,349.94 MB (00.01%) -- commit
         │  ├──7,087.55 MB (00.01%) -- private
         │  │  ├──7,079.70 MB (00.01%) ── readwrite(segments=7232)
         │  │  ├──────7.59 MB (00.00%) ── readwrite+writecombine(segments=9)
         │  │  ├──────0.15 MB (00.00%) ── readwrite+stack(segments=9)
         │  │  ├──────0.11 MB (00.00%) ── readwrite+guard(segments=9)
         │  │  ├──────0.00 MB (00.00%) ── execute-readwrite(segments=1)
         │  │  └──────0.00 MB (00.00%) ── readonly(segments=1)
         │  ├────153.52 MB (00.00%) ++ image
         │  └────108.87 MB (00.00%) ++ mapped
         └─────94.38 MB (00.00%) ++ reserved

6,885.54 MB (100.0%) -- heap-committed
├──6,825.14 MB (99.12%) ── allocated
└─────60.40 MB (00.88%) ── overhead

10 (100.0%) -- observer-service
└──10 (100.0%) -- referent
   ├──10 (100.0%) ── strong
   └───0 (00.00%) ++ weak

      6,825.14 MB ── heap-allocated
          1.00 MB ── heap-chunksize
      6,890.00 MB ── heap-mapped
      7,191.41 MB ── private
      7,002.82 MB ── resident
      6,940.80 MB ── resident-unique
          2.30 MB ── system-heap-allocated
      7,524.25 MB ── vsize
132,587,666.44 MB ── vsize-max-contiguous
Attached image cpu-memory.png
The last process in the screenshot is for GFX.

And before you suggest it: I already tried reproducing it on an empty profile and it's not happening there. I only see it - very reliably - on my heavyweight profile. Maybe it's crossed some threshold where something cleaning up memory can't keep up with allocations anymore?
And since the GFX process is affected I assume that addons can play no role in this since they don't have access to it.
Attached image rampdown.png
Build ID 	20170323030203
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
The memory exhaustion also resulted in several processes crashing. Maybe the stacks are useful:

bp-07a33504-b835-4d93-b4a0-d0f342170323
bp-ac40f826-88e1-48d0-a3b4-c3fb82170323
Product: Firefox → Core
Component: General → Graphics
Severity: normal → major
Has Regression Range: --- → no
Has STR: --- → yes
Flags: needinfo?(bugzilla.mozilla.org)
See Also: → 1344357
I think I'm experiencing all of this. PLUS, content processes are started up to the limit specified by dom.ipc.processCount. And when I'm playing the video, all of them seem to use the CPU to some degree (0.7-1.2%, it seems); two do this heavily. I'm guessing the GPU process and the tab's process. All of this results in all my CPU cores being maxed out and memory leaking heavily.
There was severe lag, so I didn't have the patience to wait around until something crashed; I closed the tab or stopped playback instead.

This started happening abruptly with 2017-03-22's central build.
> PLUS, content processes are started up to the limit specified by dom.ipc.processCount.

FF is currently not designed to use as many content processes as shown in your screenshot. You probably should use a 64bit build if you have issues with memory exhaustion, and 4-8 processes at most.

So due to that discrepancy you may or may not be seeing the same issue as I am.
(In reply to The 8472 from comment #6)
> FF is currently not designed to use as many content processes as shown in your
> screenshot. You probably should use a 64bit build if you have issues with
> memory exhaustion and 4-8 processes at most.
> 

I don't know what you're talking about. I never claimed I have memory exhaustion issues, outside leaks like this. Firefox is designed to use at most as many content processes as specified in dom.ipc.processCount, so I don't know what you mean by "not designed".
What's definitely abnormal is that processes waste CPU time when they're doing nothing (my screenshot was with one youtube page loaded), and that they start in the first place, when not needed.


> So due to that discrepancy you may or may not be seeing the same issue as I
> am.

That's easy to determine. Just increase the max number of content processes to a significantly higher number and see what happens. The only difference per your description is that I use more content processes.
Although more than one bug can be present at the same time.
> Firefox is designed to use at most as many content processes as specified in dom.ipc.processCount, so I don't know what you mean by "not designed".

The current aim of e10s-multi is, to my knowledge, 4-8 content processes; nothing more is seriously tested. So any potential scalability problems beyond that have not been addressed, and you might end up running into them when using more than that number. That's why I suggested you should try running within the "approved" range.

> Although more than one bug can be present at the same time.

Yes, that's what I'm trying to rule out. It does look quite similar after all, there's just that one difference.

> (my screenshot was with one youtube page loaded), and that they start in the first place, when not needed.

I think there was a recent change to prestart some content processes to avoid delays when opening new tabs, setting a high processCount might affect that.
(In reply to The 8472 from comment #8)

> The current aim of e10s-multi is, to my knowledge, 4-8 content processes,
> nothing more is seriously tested. So any potential scalability problems
> beyond that have not been addressed and you might end up running into them
> when using more than that number. That's why I suggested you should try
> running within the "approved" range.

I've been using a lot of content processes for a long time. This issue appeared abruptly. Doesn't seem to be related.

> 
> > Although more than one bug can be present at the same time.
> 
> Yes, that's what I'm trying to rule out. It does look quite similar after
> all, there's just that one difference.

What I meant is that there might be a second bug in parallel here. The stuff you described happens here too, so they're very likely the same bug. But there might be more at play.


> I think there was a recent change to prestart some content processes to
> avoid delays when opening new tabs, setting a high processCount might affect
> that.

Hmmm... Someone mentioned that also. But I didn't find the mentioned setting. Did they add it as a hidden preference? It'd be a **** thing to do... 

It still seems really odd that processes consume significant cpu time while one video plays in a single loaded tab.
Attached image profiler.png
It looks like the main threads of content processes are spending their CPU time in IPC code. Note that it's only 27% in WaitForMultipleObjects but 93% in MessagePump.

The compositor main thread is 99% in the wait method, so its activity probably in some other threads not captured by the profiler.

Side note: the profiler shouldn't show sleep-like methods as using CPU time. They often eat up the lion's share of the percentages and make everything else more difficult to spot.
Assignee: nobody → bugzilla.mozilla.org
Assignee: bugzilla.mozilla.org → nobody
I have now also observed some cases where all child processes are pegged at 100% CPU but the memory use is not growing significantly.

STR:

1. create empty profile on nightly
2. open browser console and execute: for(let i=0;i<2000;i++) window.openNewTabWith("data:text/plain,foo");
3. restart profile
4. restore previous session (that should open 2000 lazy tabs)
5. open https://imgur.com/zvATqgs and http://www.faz.net/aktuell/politik/wahl-im-saarland/die-sozialdemokraten-im-wahljahr-2017-14945357.html in new windows
6. (optional) play one or more youtube videos in new windows

Result: very high CPU load and possibly the GPU process consuming increasing amounts of memory
@The 8472

I tested with one tab open and the problem didn't present itself. When I loaded the normal session, with lots of unloaded tabs, it started happening again.

It looks like processes are erroneously started now if the session has unloaded tabs. The memory leak and the all-processes-consuming-CPU issues come on top of this. No clue why it works for me with only 1-2 tabs open, and not for you.


I feel like we'll be stuck with this for a long while, since so far we're the only ones noticing this.
(In reply to avada from comment #9)
> (In reply to The 8472 from comment #8)
> 
> > The current aim of e10s-multi is, to my knowledge, 4-8 content processes,
> > nothing more is seriously tested. So any potential scalability problems
> > beyond that have not been addressed and you might end up running into them
> > when using more than that number. That's why I suggested you should try
> > running within the "approved" range.
> 
> I've been using a lot of content processes for a long time. This issue
> appeared abruptly. Doesn't seem to be related.
> 
> > 
> > > Although more than one bug can be present at the same time.
> > 
> > Yes, that's what I'm trying to rule out. It does look quite similar after
> > all, there's just that one difference.
> 
> What I meant that there might be a second bug in parallel here. The stuff
> you described happen here too, so they're very likely the same bug. But
> there might be more at play.
> 
> 
> > I think there was a recent change to prestart some content processes to
> > avoid delays when opening new tabs, setting a high processCount might affect
> > that.
> 
> Hmmm... Someone mentioned that also. But I didn't find the mentioned
> setting. Did they add it as a hidden preference? It'd be a **** thing to
> do... 
> 
> It still seems really odd that processes consume significant cpu time while
> one video plays in a single loaded tab.

Session restore was changed BACK to the pre-Firefox 44 behavior, where restored (but unloaded) tabs are placed in content processes instead of choking the main process.
Whiteboard: [memshrink]
Maybe David has some ideas here. It sounds like there are steps to reproduce which is nice.
Flags: needinfo?(dvander)
Crash Signature: [@ OOM | unknown | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::BufferList<T>::AllocateSegment ] [@ moz_abort | arena_run_split | arena_malloc_large | moz_xmalloc | mozilla::BufferList<T>::AllocateSegment ]
Whiteboard: [memshrink] → [MemShrink:P1]
This issue does not occur with WebRender enabled. And even though that has its own problems, those may be more tolerable than RAM exhaustion, so depending on the tradeoffs you want to make, it can serve as a workaround for the moment.
(In reply to The 8472 from comment #16)
> This issue does not occur with webrender enabled. And even though that has
> its own problems, those may be more tolerable than ram exhaustion, so
> depending on the tradeoffs you want to make it can serve as a workaround for
> the moment.

gfx.webrender.enabled is true by default. And I never had it disabled, so I'm confused by your statement.
Well, you're on a 32bit build and you're seeing CPU load from a large number of content processes, not necessarily the gfx process (insufficient information from you to be sure). I mostly see the gfx process bloating up, the content processes are affected to a lesser extent. Due to those differences it might not be applicable to your case.
Oh, and you would actually have to check about:support to see whether it's really enabled. On 64bit builds it seems disabled by default.
(In reply to The 8472 from comment #18)
> Well, you're on a 32bit build and you're seeing CPU load from a large number
> of content processes, not necessarily the gfx process (insufficient
> information from you to be sure). I mostly see the gfx process bloating up,
> the content processes are affected to a lesser extent. Due to those
> differences it might not be applicable to your case.

Three processes consume CPU to a large degree (the main process, what I assume is the GFX process, and the tab's content process). The others only consume a small percentage of the cores, apparently without good cause.

(In reply to The 8472 from comment #19)
> Oh, and you would actually have to check about:support to see whether it's
> really enabled. On 64bit builds seems disabled by default.

Hmmm. I have "unavailable by runtime: Build doesn't include WebRender", so I guess it's not available on x86 builds. (Or just the localized ones.)
Can anyone who can reproduce this, link to a video that causes the problem?
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #21)
> Can anyone who can reproduce this, link to a video that causes the problem?

Pretty much *any* HTML5 video triggers that behavior for me.
Concrete example: https://upload.wikimedia.org/wikipedia/commons/transcoded/c/c0/Big_Buck_Bunny_4K.webm/Big_Buck_Bunny_4K.webm.1080p.webm

Note that it is not the video playback itself that causes the load. The memory only keeps piling up in the gfx process if the tab with the video is actually visible. Speculation from bug 1350042 comment 5 is that this actually is a pileup of IPC messages.
(In reply to The 8472 from comment #11)
> I have now also have observed some cases where all child processes are
> pegged at 100% CPU but the memory use is not growing significantly.
> 
> STR:
> 
> 1. create empty profile on nightly
> 2. open browser console and execute: for(let i=0;i<2000;i++)
> window.openNewTabWith("data:text/plain,foo");
> 3. restart profile
> 4. restore previous session (that should open 2000 lazy tabs)
> 5. open https://imgur.com/zvATqgs and
> http://www.faz.net/aktuell/politik/wahl-im-saarland/die-sozialdemokraten-im-
> wahljahr-2017-14945357.html in new windows
> 6. (optional) play one or more youtube videos in new windows
> 
> Result: very high CPU load and possibly the GPU process consuming increasing
> amounts of memory

STR needs adjustment due to virtual lazy tabs (bug 906076). Skipping steps 3 and 4 should do the job.
It's still just as f*cked up as it was before bug 906076.

I noticed one thing though. I also get unreasonably high CPU usage and hangs when loading SolidOpinion comments, e.g. when scrolling repeatedly to the bottom of this page: http://www.nextbigfuture.com/2017/04/132587.html
Anthony, seems like somebody from A/V should take a look at this.

David, didn't we have a bug about too much IPC?  Or was it somebody else that was looking at it?
Flags: needinfo?(dvander)
Flags: needinfo?(ajones)
(In reply to avada from comment #24)
> It's still just as f*cked up as it was before bug 906076.

Addons may cause virtual tabs to become devirtualized. I have observed the TabGroups addon doing that, for example. You can check in about:memory whether there are many about:blank window objects in the child processes.

Some overhead seems to be scaling with the number of window objects in content processes. That's why it was not a problem when they were in the parent process (but that had other drawbacks) or when they're not yet allocated.
Any tips on what to do to get |for(let i=0;i<2000;i++) window.openNewTabWith("data:text/plain,foo");| to work?
Note that in nightly we stop decoding videos in background tabs after 10 seconds.

Can someone who can repro it provide a link to a Cleopatra profile? You should be able to just click on Share to get the link.
(In reply to Milan Sreckovic [:milan] from comment #25)
> Anthony, seems like somebody from A/V should take a look at this.
> 
> David, didn't we have a bug about too much IPC?  Or was it somebody else
> that was looking at it?

Video sends a lot of IPC, but if the event queue was backed up I would expect the video to slow down/skip a lot. Is that happening? And if the textures were leaking I'd expect the rate to be insanely fast, like 50MB/sec or something depending on the window size.

The STR involves having 2,000 background tabs - are those loading while you're measuring memory? Anyway - perfht.ml link would probably help.
Flags: needinfo?(dvander)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #27)
> Any tips on what to do to get |for(let i=0;i<2000;i++)
> window.openNewTabWith("data:text/plain,foo");| to work?

1. open devtools (F12)
2. in devtools settings (cogwheel) enable "browser chrome and addon debugging toolboxes" 
3. open browser console (Ctrl+Shift+J)
4. paste code into the input line at the bottom of the browser console and press return

(In reply to David Anderson [:dvander] from comment #29)
> (In reply to Milan Sreckovic [:milan] from comment #25)
> > Anthony, seems like somebody from A/V should take a look at this.
> > 
> > David, didn't we have a bug about too much IPC?  Or was it somebody else
> > that was looking at it?
> 
> Video sends a lot of IPC, but if the event queue was backed up I would
> expect the video to slow down/skip a lot. Is that happening? And if the
> textures were leaking I'd expect the rate to be insanely fast, like 50MB/sec
> or something depending on the window size.

No, the video does not stutter until physical RAM is exhausted.
Also, video is not the only trigger for this issue, just the one that reproduces it most easily. Some other sites also cause this without playing video.

> The STR involves having 2,000 background tabs - are those loading while
> you're measuring memory?

The issue happened (before virtual tabs) with the about:blank tabs from load-on-demand background tabs. So it was not due to tab activity.

> Anyway - perfht.ml link would probably help.

I think I have tried using the gecko profiler and didn't spot anything interesting, so maybe it's happening on a thread that's not captured by the profiler? But I can try again.
(In reply to The 8472 from comment #30)
> I think I have tried using the gecko profiler and didn't spot anything
> interesting, so maybe it's happening on a thread that's not captured by the
> profiler? But I can try again.

This is probably true, you'll have to tell the profiler ahead of time what threads you're interested in monitoring.
I might be running into this, with the content process and VTDecoderXPCService hogging CPU on my Mac nightly. I wanted my CPU back, so I killed the content process, and restoring a non-offending tab then crashed Firefox. Maybe the crash stats are helpful?

https://crash-stats.mozilla.com/report/index/bp-04ce6288-9de9-4ceb-abc3-50dfd0170501
https://crash-stats.mozilla.com/report/index/bp-0acd8589-3bb3-407a-94d2-2d1020170501
I think what I experience is definitely related to bug 1256472. With a few content processes there's no problem at all, but if something causes the addition of all the lazy background tabs, then all the issues present themselves. (Including bug 1356192 )
Jean-Yves - can you take a look at this one?
Flags: needinfo?(ajones) → needinfo?(jyavenard)
(In reply to Nathan Froyd [:froydnj] from comment #31)
> (In reply to The 8472 from comment #30)
> > I think I have tried using the gecko profiler and didn't spot anything
> > interesting, so maybe it's happening on a thread that's not captured by the
> > profiler? But I can try again.
> 
> This is probably true, you'll have to tell the profiler ahead of time what
> threads you're interested in monitoring.

Is this useful? https://perfht.ml/2qLRF7N

The profile was created with an empty thread filter, 2000 non-lazy data URI tabs and a video playing. The profile covers the memory use going up and down again after switching to a tab that does not cause the behavior.
Flags: needinfo?(nfroyd)
(In reply to The 8472 from comment #35)
> (In reply to Nathan Froyd [:froydnj] from comment #31)
> > (In reply to The 8472 from comment #30)
> > > I think I have tried using the gecko profiler and didn't spot anything
> > > interesting, so maybe it's happening on a thread that's not captured by the
> > > profiler? But I can try again.
> > 
> > This is probably true, you'll have to tell the profiler ahead of time what
> > threads you're interested in monitoring.
> 
> Is this useful? https://perfht.ml/2qLRF7N

I don't see any thread doing an outsized amount of computation.  I do see a decent amount of IPC stacks, but nothing that would enable figuring out what's going wrong.  Maybe somebody who knows more about the media stack would see something; a lot of those threads are doing nothing but waiting. :(
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #36)
> (In reply to The 8472 from comment #35)
> > (In reply to Nathan Froyd [:froydnj] from comment #31)
> > > (In reply to The 8472 from comment #30)
> > > > I think I have tried using the gecko profiler and didn't spot anything
> > > > interesting, so maybe it's happening on a thread that's not captured by the
> > > > profiler? But I can try again.
> > > 
> > > This is probably true, you'll have to tell the profiler ahead of time what
> > > threads you're interested in monitoring.
> > 
> > Is this useful? https://perfht.ml/2qLRF7N
> 
> I don't see any thread doing an outsized amount of computation.  I do see a
> decent amount of IPC stacks, but nothing that would enable figuring out
> what's going wrong.  Maybe somebody who knows more about the media stack
> would see something; a lot of those threads are doing nothing but waiting. :(

Maybe it's spinning on some kernel interface used for IPC? I'm seeing a significant amount of time spent in kernel mode in Process Explorer. And (system-global) context switches go from 20k/s to 1M/s, interrupts from 15k/s to 470k/s. This does not occur when playing a video in an empty profile, so it's likely related to the issue.

Here's another capture https://perfht.ml/2qkdoQx with higher resolution and buffer size and only the threads that seemed to do IPC.
Also, 1M context switches per second over 5 processes means 200k/s per process. The profiler was sampling at 1k/s, so the results might be skewed by some sampling bias.
Hi Anthony or Blake, any update here? If Jean-Yves is busy with other stuff, would you be able to find someone else to take a look at this? Thanks!
Flags: needinfo?(bwu)
Flags: needinfo?(ajones)
Kaku,
Could you check it?
Flags: needinfo?(kaku)
Flags: needinfo?(jyavenard)
Flags: needinfo?(bwu)
Flags: needinfo?(ajones)
I also noticed all Firefox processes using around 0.7-1.4% CPU when playing HTML5 videos, and also on some sites where no videos are being played.

This site http://www.tomshardware.com/reviews/amd-ryzen-ama,5018.html triggers it for me. I'm running uBlock Origin, so there could be video playback on the site without it.
(In reply to Thomas Andre from comment #41)
> I also noticed all firefox processes using around 0.70-1.4%  when playing
> html5 videos and also on some sites where there are no videos being played.
> 
> This site http://www.tomshardware.com/reviews/amd-ryzen-ama,5018.html
> triggers it for me, im running ublock origin so there could be video
> playback on the site without it.

Indeed. It's reproducible for me on v55 beta.
According to Socorro, the crash signature appears since 52; revising the status.
Gerry, we think it's good to keep tracking this for 55. Could you help nominate the 55-tracking flag?
Per comment #43, track 55+.
Flags: needinfo?(gchang)
I cannot reproduce the phenomenon described by :The 8472, following Comment 23. 

I cannot reproduce comment 41 either, and I think comment 41 is not the same as the original issue, because it mentions nothing about memory exhaustion.

I also think that this has very little connection to media code.
According to the profiles in comment 35 and comment 37, media-related threads are not (heavily) working. And I agree with comment 29 that if we were leaking textures, it should lead to a quick crash. Comment 30 also mentioned that web pages without video playback cause this issue too.

:The 8472, is it still reproducible on your side now?
Flags: needinfo?(kaku) → needinfo?(bugzilla.mozilla.org)
(In reply to Tzuhao Kuo [:kaku] from comment #45)
> :The 8472, is it still reproducible on your side now?

Yes, it still occurs as of build 20170618030207

Note that the excessive CPU use can set in without the ever-increasing memory consumption. I think the issue is that one process broadcasts a huge number of IPC messages and the others work to process them. The more messages it sends, the higher the overall CPU load. The exact rate is variable. Only when the send rate exceeds the ability of the other processes to consume the messages do they begin to pile up and memory starts ramping up.
That's why there can be two different symptom profiles: either increased CPU load only, or increased CPU load plus memory exhaustion.
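As a toy illustration of that queueing argument (all rates below are made-up assumptions, not measurements; the per-message byte count is also assumed here):

  // Toy backlog model: while consumers keep up, CPU rises but memory stays
  // flat; once the send rate exceeds the drain rate, the backlog grows
  // linearly and shows up as heap in the queueing process.
  #include <cstdio>

  int main() {
    const double sendRate  = 120000;  // msgs/s broadcast (assumed)
    const double drainRate = 90000;   // msgs/s the receivers can process (assumed)
    const double msgBytes  = 4096;    // bytes allocated per queued message (assumed)
    double backlog = 0;
    for (int sec = 1; sec <= 5; ++sec) {
      if (sendRate > drainRate) {
        backlog += (sendRate - drainRate) * msgBytes;
      }
      std::printf("t=%ds backlog=%.0f MB\n", sec, backlog / (1024 * 1024));
    }
    return 0;
  }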

I assume that that threshold varies from system to system, so maybe you need something more stressful to reproduce it. You could try opening several windows and playing two or more youtube videos side by side.

>I also think that this has very little connection to media code.

Yes, svg or css animations trigger it too, sometimes to a lesser extent. Videos are just the easiest way to trigger it.
Flags: needinfo?(bugzilla.mozilla.org)
Kan-Ru, maybe this could be an IPC issue rather than a media issue? 
Can you take a look? Thanks very much. 

Marking this fix-optional for 55 as it doesn't seem currently actionable. If we come up with a fix we could still take a patch though. 

Andrei, we have several reports of this issue. Can your team try to reproduce it?
I'll try to reproduce it. :The 8472, since you can easily reproduce it, could you grab a debug build and run with MOZ_IPC_MESSAGE_LOG=1 to find out which IPC messages are flooding?
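For reference, this is roughly how that could be captured (a sketch assuming a local debug build launched via |mach run|; the exact log format and destination may vary by build, so the grep is an assumption):

  MOZ_IPC_MESSAGE_LOG=1 ./mach run 2> ipc.log
  grep -c DidComposite ipc.log   # count how often one message name shows up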
Flags: needinfo?(kanru) → needinfo?(bugzilla.mozilla.org)
I can reproduce it locally. I noticed that

1. GPU process's memory grew once I started to play video from https://imgur.com/zvATqgs
2. GPU process's memory slowly decreased once I paused the video
3. GPU process was constantly receiving WM_USER+1 message, presumably from the IPC system
  3.1 GPU process was constantly sending DidComposite IPC message
4. All other content processes were receiving "nsAppShell::EventID" messages; it looks like a lot of time was spent waking up and processing this message

(In reply to David Anderson [:dvander] from comment #29)
> (In reply to Milan Sreckovic [:milan] from comment #25)
> > Anthony, seems like somebody from A/V should take a look at this.
> > 
> > David, didn't we have a bug about too much IPC?  Or was it somebody else
> > that was looking at it?
> 
> Video sends a lot of IPC, but if the event queue was backed up I would
> expect the video to slow down/skip a lot. Is that happening? And if the
> textures were leaking I'd expect the rate to be insanely fast, like 50MB/sec
> or something depending on the window size.

The memory did grow very fast, but it's not leaking. The video started to drop frames eventually. I think it's queued IPC messages taking the memory, but the memory report shows 98% heap-unclassified.

I also noticed that SendDidComposite has to allocate every time. Maybe we should increase its default size in message-metadata.ini
Flags: needinfo?(dvander)
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #49)
> I can reproduce it locally. I noticed that
> 
> 1. GPU process's memory grew once I started to play video from
> https://imgur.com/zvATqgs
> 2. GPU process's memory slowly decreased once I paused the video
> 3. GPU process was constantly receiving WM_USER+1 message, presumably from
> the IPC system
>   3.1 GPU process was constantly sending DidComposite IPC message
> 4. All other content processes was receiving "nsAppShell::EventID" message,
> it looks like a lot of time was spent in waking up and processing this
> message
> 

Re: #2, that sounds right. The WM_USER+1 message is just to wake up the event loop, so probably it's getting posted from the I/O thread to wake up the compositor.

> I also noticed that SendDidComposite has to allocate every time. Maybe we
> should increase its default size in message-metadata.ini

That's odd. DidComposite on PCompositorBridge or PImageBridge? The former has a very small message footprint. The latter takes an array, but its length will be the number of videos that were composited.
Flags: needinfo?(dvander)
For GPU OOM, a large portion of heap allocations come from the following stack:
  ntdll.dll!<PDB not found>
  kernel32.dll!BaseThreadInitThunk
  |- xul.dll!`anonymous namespace'::ThreadFunc
  |    xul.dll!base::Thread::ThreadMain
  |    xul.dll!MessageLoop::Run
  |    xul.dll!MessageLoop::RunHandler
  |    xul.dll!base::MessagePumpWin::Run
  |    |- xul.dll!base::MessagePumpForUI::DoRunLoop
  |    |    xul.dll!MessageLoop::DoWork
  |    |    xul.dll!MessageLoop::DeferOrRunPendingTask
  |    |    xul.dll!MessageLoop::RunTask
  |    |    xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::layers::CompositorVsyncScheduler * __ptr64 const,void (__cdecl mozilla::layers::CompositorVsyncScheduler::*)(mozilla::TimeStamp) __ptr64,1,1,mozilla::TimeStamp>::Run
  |    |    xul.dll!mozilla::layers::CompositorVsyncScheduler::Composite
  |    |    xul.dll!mozilla::layers::CompositorBridgeParent::CompositeToTarget
  |    |    xul.dll!mozilla::layers::CompositorBridgeParent::DidComposite
  |    |    xul.dll!mozilla::layers::CompositorBridgeParent::NotifyDidComposite
  |    |    xul.dll!mozilla::layers::CrossProcessCompositorBridgeParent::DidComposite
  |    |    xul.dll!mozilla::layers::PCompositorBridgeParent::SendDidComposite
  |    |    |- xul.dll!mozilla::layers::PCompositorBridgeParent::Write<mozilla::TimeStamp>
  |    |    |    xul.dll!Pickle::WriteBytes
  |    |    |    xul.dll!mozilla::BufferList<InfallibleAllocPolicy>::AllocateSegment
  |    |    |    |- mozglue.dll!moz_xmalloc
  |    |    |    |    mozglue.dll!arena_malloc_large
  |    |    |    |    |- mozglue.dll!arena_run_split
That's amazing. Is content not reading messages off the pipe fast enough, forcing the GPU process to buffer writes?
(And why wouldn't that happen with the GPU process off?)
(In reply to David Anderson [:dvander] from comment #50)
> > I also noticed that SendDidComposite has to allocate every time. Maybe we
> > should increase its default size in message-metadata.ini
> 
> That's odd. DidComposite on PCompositorBridge or PImageBridge? The former
> has a very small message footprint. The latter takes an array, but its
> length will be the number of videos that were composited.

The initial capacity is 64 bytes and the header takes 28 bytes:

http://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/PCompositorBridgeParent.cpp#201

When we write the compositeStart sentinel we allocate 4k but only use 24 bytes.
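A minimal, self-contained sketch of the allocation pattern described here (the 64/28/24-byte figures are from this comment; the 4 KiB segment size and the exact write sequence are illustrative assumptions, not the actual mozilla::BufferList code):

  #include <cstddef>
  #include <cstdio>

  // Toy message buffer that grows in fixed segments, like the generated IPC
  // writer: once the 64-byte initial capacity is exceeded, a whole 4096-byte
  // segment is allocated even if only a handful of bytes land in it.
  struct ToyMessage {
    static const size_t kInitialCapacity = 64;  // from this comment
    static const size_t kSegmentSize = 4096;    // assumed growth step
    size_t used = 0;
    size_t capacity = kInitialCapacity;

    void WriteBytes(size_t n) {
      while (used + n > capacity) {
        capacity += kSegmentSize;  // allocate a new segment on overflow
      }
      used += n;
    }
  };

  int main() {
    ToyMessage msg;
    msg.WriteBytes(28);  // message header (from this comment)
    msg.WriteBytes(24);  // payload, e.g. timestamps (illustrative size)
    msg.WriteBytes(24);  // sentinel write overflows the 64-byte buffer, so a
                         // 4096-byte segment is allocated for 24 bytes of data
    std::printf("used=%zu capacity=%zu wasted=%zu\n",
                msg.used, msg.capacity, msg.capacity - msg.used);
    return 0;
  }

At a high message rate, that per-message slack is what lets the queued messages dwarf their useful payload.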
(In reply to David Anderson [:dvander] from comment #52)
> That's amazing. Is content not reading messages off the pipe fast enough,
> forcing the GPU process to buffer writes?

It could be. I noticed that the content processes have very few sampled stacks containing PCompositorBridgeChild or PCompositorManagerChild in the CPU usage profile.
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #54)
> (In reply to David Anderson [:dvander] from comment #50)
> > > I also noticed that SendDidComposite has to allocate every time. Maybe we
> > > should increase its default size in message-metadata.ini
> > 
> > That's odd. DidComposite on PCompositorBridge or PImageBridge? The former
> > has a very small message footprint. The latter takes an array, but its
> > length will be the number of videos that were composited.
> 
> The initial capacity is 64 bytes and the header took 28 bytes
> 
> http://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/
> PCompositorBridgeParent.cpp#201
> 
> When we write the compositeStart sentinel we allocate 4k but only use 24
> bytes.

Confirmed with Jerry that we need to send DidComposite for each top-level browser (TabChild), so with 2000 tabs open we have to send 120K messages per second, and that's 457 MB (!) of messages. No wonder the GPU process was totally bogged down.

There are two separate issues to address here:

 1. We should not send the DidComposite message to hidden tabs, or at least send it only once per content process
 2. We should reduce the cost of small IPC messages.

I'll open a new bug for 2. and this bug can handle 1.
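As a rough cross-check of those numbers (assuming a 60 Hz composite rate and the ~4 KiB allocation per message discussed in the previous comments):

  2000 tabs × 60 composites/s      = 120,000 DidComposite messages/s
  120,000 msgs/s × ~4 KiB/message  ≈ 460 MB/s of allocations

which is in the same ballpark as the 457 MB figure above.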
Component: Audio/Video: Playback → Graphics: Layers
Flags: needinfo?(bugzilla.mozilla.org)
Flags: needinfo?(andrei.vaida)
Depends on: 1380974
(In reply to Danial Horton from comment #56)
> Measured with GPU-Z Video Engine Load / Aida64 Video Engine reading on a
> Pascal GP104 and GP106
> 
> This has only started to occur recently after video playback 'fixes' were
> added.

This sounds like something worth looking into, but please file a new bug for it instead of piling on to this one.
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #58)
> (In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #54)
> > (In reply to David Anderson [:dvander] from comment #50)
> > > > I also noticed that SendDidComposite has to allocate every time. Maybe we
> > > > should increase its default size in message-metadata.ini
> > > 
> > > That's odd. DidComposite on PCompositorBridge or PImageBridge? The former
> > > has a very small message footprint. The latter takes an array, but its
> > > length will be the number of videos that were composited.
> > 
> > The initial capacity is 64 bytes and the header took 28 bytes
> > 
> > http://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/
> > PCompositorBridgeParent.cpp#201
> > 
> > When we write the compositeStart sentinel we allocate 4k but only use 24
> > bytes.
> 
> Confirmed with Jerry that we need to send DidComposite for each top level
> browser (tabchild) so with 2000 tabs open we have to send 120K message per
> second and that's 457 MB (!) messages in size. No wonder the GPU process was
> totally bogged down.
> 
> There are two separate issues to address here:
> 
>  1. We should not send DidComposite message to hidden tabs or at least send
> it once per content process
>  2. We should reduce the cost of small IPC messages.
> 
> I'll open a new bug for 2. and this bug can handle 1.

Please skip comment 59.
Yes, Gecko will send 2000 messages from parent to child if we have 2000 tabs.

https://dxr.mozilla.org/mozilla-central/rev/8486950bd91878bf077a9ac33cb3c018288fe518/gfx/layers/ipc/CompositorBridgeParent.cpp#2005-2008

The IndirectLayerTree map has 2000 entries in this case.
There's also the issue that the memory shows up as heap-unclassified.
(In reply to The 8472 from comment #62)
> There's also the issue that the memory shows up as heap-unclassified.

That normally shouldn't be an issue, because IPC messages are allocated/released quickly. But we should probably report the number of queued messages in the content process or GPU process like we do in the chrome process.
What I will do for the 2000 DidComposite() messages problem:

a) Try to skip the message for non-updated layer trees.
Please check:
https://dxr.mozilla.org/mozilla-central/rev/8486950bd91878bf077a9ac33cb3c018288fe518/gfx/layers/ipc/CrossProcessCompositorBridgeParent.cpp#342,344

If there is no new transaction for that layer tree, the transaction id will be reset to 0. So, we might skip the message when the transaction id is 0.

Furthermore, we should not send the DidComposite() message to background tabs. There is no information in the GPU process to show whether a layer tree belongs to a foreground or background tab. If the tab is a background tab, I think there will be no layer transaction for that tab. So I think (a) could resolve the foreground/background problem.
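A self-contained sketch of what (a) could look like (hypothetical names and stubbed types; the real code is the CompositorBridgeParent loop linked above):

  #include <cstdint>
  #include <map>

  // Stand-ins for the real IPC actor and per-layer-tree state.
  struct StubBridgeParent {
    void SendDidComposite(uint64_t aTransactionId, double aStart, double aEnd) {}
  };
  struct LayerTreeState {
    StubBridgeParent* mParent = nullptr;
    uint64_t mPendingTransactionId = 0;  // reset to 0 when there was no new transaction
  };

  // Idea (a): walk the (possibly huge) layer-tree map, but only notify trees
  // that actually composited something new; idle/background trees are skipped.
  void NotifyDidComposite(std::map<uint64_t, LayerTreeState>& aIndirectLayerTrees,
                          double aCompositeStart, double aCompositeEnd) {
    for (auto& entry : aIndirectLayerTrees) {
      LayerTreeState& state = entry.second;
      if (!state.mParent || state.mPendingTransactionId == 0) {
        continue;  // no new transaction => skip the DidComposite message
      }
      state.mParent->SendDidComposite(state.mPendingTransactionId,
                                      aCompositeStart, aCompositeEnd);
      state.mPendingTransactionId = 0;  // wait for the next layer transaction
    }
  }

With 2000 mostly idle tabs, this would send a handful of messages per composite instead of 2000.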

b) Group the messages which have the same target and send them as one message.

The IPDL message will look like:
  async DidComposite(DidCompositeData[] aMessages);

https://dxr.mozilla.org/mozilla-central/rev/8486950bd91878bf077a9ac33cb3c018288fe518/gfx/layers/ipc/CompositorBridgeParent.cpp#2008

The timestamp data sent to the children are all the same; grouping them all together could reduce the memory usage.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #64)
> b) Group the messages which have the same target and send by one message.

This will make it impossible to label the DidComposite messages for Quantum DOM.

If we don't send DidComposite messages to background tabs (which I thought was the purpose of bug 1337548), then it seems like we shouldn't be sending that many of them. Are these 2000 tabs actually visible?
(In reply to Bill McCloskey (:billm) from comment #65)
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #64)
> > b) Group the messages which have the same target and send by one message.
> 
> This will make it impossible to label the DidComposite messages for Quantum
> DOM.
> 
> If we don't send DidComposite messages to background tabs (which I thought
> was the purpose of bug 1337548), then it seems like we shouldn't be sending
> that many of them. Are these 2000 tabs actually visible?

I think the patch in bug 1337548 doesn't check the background tab condition.

sIndirectLayerTrees is a global map; all layer trees are put in it together.
https://dxr.mozilla.org/mozilla-central/rev/7d92f47379da55adf4da6d0b16398f5c629cf949/gfx/layers/ipc/CompositorBridgeParent.cpp#238

If we have two CompositorBridgeParents (if we have two widget windows) and we call DidComposite() for both of them, all layer tree nodes will be iterated twice.

The patch:
+    if (lts->mCrossProcessParent && lts->mParent == this) {

A layer tree only has one parent. With this check, a layer tree node will be iterated only once.
Flags: needinfo?(wmccloskey)
OK, that sucks. Whatever happens here, it would be much better if we could avoid doing (b).
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #67)
> it would be much better if we could avoid doing (b).
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do you have any concerns if we do item (b), |merge the messages|?
Flags: needinfo?(wmccloskey)
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #68)
> (In reply to Bill McCloskey (:billm) from comment #67)
> > it would be much better if we could avoid doing (b).
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Do you have any concern if we do the item (b)|merge the message|?

Yes, as I said, merging the messages would make it impossible to label the DidComposite message. We label this message here:
http://searchfox.org/mozilla-central/rev/cbd628b085ac809bf5a536109e6288aa91cbdff0/gfx/layers/ipc/CompositorManagerChild.cpp#235
If the messages get merged, then there will be no way to get a single layer ID to use for labeling.

This is the most common runnable in Firefox, so it would be very bad for Quantum DOM if we couldn't label the message.
Flags: needinfo?(wmccloskey)
See Also: → 1376763
I created 4 tabs and refreshed one tab, then I got the following messages.
Gecko still sends the DidComposite message with tid=0 (transaction id) to the background tabs.

The client side just skips the tid:0 notification[1], but we always send the DidComposite to plugins[2].

[1]
https://dxr.mozilla.org/mozilla-central/rev/7d92f47379da55adf4da6d0b16398f5c629cf949/gfx/layers/client/ClientLayerManager.cpp#510
[2]
https://dxr.mozilla.org/mozilla-central/rev/7d92f47379da55adf4da6d0b16398f5c629cf949/gfx/layers/client/ClientLayerManager.cpp#524

bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967302, tid:97
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967303, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967304, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967305, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967302, tid:98
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967303, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967304, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967305, tid:0
bignose CrossProcessCompositorBridgeParent::DidComposite, gecko layer id:4294967302, tid:99
Please check comment 70 and comment 58. If we have a lot of tabs, we will have the high CPU and memory usage problem.
It's fine to skip the notification for the regular layer-tree case[1], but what about the plugin case[2]?
Could we skip the [2] case?

[1]
https://dxr.mozilla.org/mozilla-central/rev/7d92f47379da55adf4da6d0b16398f5c629cf949/gfx/layers/client/ClientLayerManager.cpp#510
[2]
https://dxr.mozilla.org/mozilla-central/rev/7d92f47379da55adf4da6d0b16398f5c629cf949/gfx/layers/client/ClientLayerManager.cpp#524
Attachment #8886770 - Flags: review?(nical.bugzilla)
Attachment #8886770 - Flags: review?(matt.woodrow)
Attachment #8886770 - Flags: review?(dvander)
Comment on attachment 8886770 [details] [diff] [review]
Skip the DidComposite() message for non-updated layer tree.

Review of attachment 8886770 [details] [diff] [review]:
-----------------------------------------------------------------

I believe it's fine to not deliver DidComposite messages to background plugins (since we didn't actually composite them!), but dvander should make the final call.
Attachment #8886770 - Flags: review?(matt.woodrow)
Comment on attachment 8886770 [details] [diff] [review]
Skip the DidComposite() message for non-updated layer tree.

Review of attachment 8886770 [details] [diff] [review]:
-----------------------------------------------------------------

The DidComposite notification is a recent addition to NPAPI that only Flash uses, and I'm pretty sure Flash composites at 60fps and ignores the callback. Either way, the callback's documentation says it only has to fire when the plugin is visible. So I think this is okay.
Attachment #8886770 - Flags: review?(dvander) → review+
You will have to be very, very careful about skipping DidComposite notifications, because we use this as a mechanism to throttle the refresh driver and prevent it from getting too far ahead of the compositor. See https://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/layout/base/nsRefreshDriver.cpp#2242

The refresh driver won't tick again until it has received the DidComposite notification for one of the last two frames. If you drop DidComposite notifications without adding some mechanism to un-throttle the refresh driver when the tab comes back to the foreground, web content will get stuck in a state where it stops refreshing and nothing is painted (which is tough to debug).
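To make the risk concrete, here is a toy model of that throttling contract (hypothetical shape; the real logic is at the nsRefreshDriver link above):

  #include <cstdint>

  // The refresh driver allows at most two un-acknowledged transactions; it
  // only ticks again once DidComposite has acknowledged a recent frame.
  struct ThrottleModel {
    uint64_t mPendingTransaction = 0;    // newest transaction sent to the compositor
    uint64_t mCompletedTransaction = 0;  // newest transaction acked via DidComposite

    bool CanTick() const {
      return mPendingTransaction - mCompletedTransaction < 2;
    }
    void SentTransaction() { ++mPendingTransaction; }
    void DidComposite(uint64_t aTransactionId) {
      if (aTransactionId > mCompletedTransaction) {
        mCompletedTransaction = aTransactionId;
      }
    }
  };

  // If DidComposite stops arriving, CanTick() goes false after two frames and
  // stays false, so a hidden tab must be explicitly un-throttled when shown.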
Attachment #8886770 - Flags: review?(nical.bugzilla) → review+
Please land attachment 8886770 [details] [diff] [review] on m-c.

The try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0b737123e2e13b3eb4deeee6eee2a0850978048&selectedJob=116818220
I saw similar test failures in another try push, so I think most of the dt* and bc* test failures are not related to this patch.
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1b154630587
Skip the DidComposite() message for non-updated layer tree. r=dvander,nical,mattwoodrow
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c1b154630587
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Can you confirm that things look better with the 2017-07-25 nightly?
Flags: needinfo?(bugzilla.mozilla.org)
Can confirm.
Flags: needinfo?(bugzilla.mozilla.org)
I just skip some useless messages from parent to child. I don't group the messages, so the labeling of the task still works.
Flags: needinfo?(wmccloskey)
Can we move any remaining follow-up work to a new bug? Especially since we're running low on time in this cycle, it will make tracking easier. Also, is this something we'd consider safe to uplift? If so, please nominate :)
Status: RESOLVED → VERIFIED
Flags: needinfo?(hshih)
From comment 73, I don't know if there is a big impact for plugins. This patch might cause the screen to stop updating, so I would like to test more; there is no uplift request here for now. But this patch could greatly reduce the number of messages passed per frame if you have a lot of tabs open. Maybe it could be uplifted sometime later.
Flags: needinfo?(hshih)
We have less than a week until we're building the Fx55 release candidate, so that tells me that this is probably wontfix for 55 then.
Flags: needinfo?(wmccloskey)
Depends on: 1646629
No longer depends on: 1646629