Closed Bug 1241656 Opened 4 years ago Closed 4 years ago

Avoid heap churn for YouTube videos caused by AutoTaskDispatcher::mDirectTasks

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: njn, Assigned: njn)

Details

Attachments

(1 file, 1 obsolete file)

I did cumulative heap profiling with DMD while playing YouTube videos. On both
Mac and Linux I saw huge numbers of malloc/free calls occurring for the
creation/destruction of AutoTaskDispatcher::mDirectTasks, which is a
std::queue, like so:

> Cumulative {
>   201,814 blocks in heap block record 3 of 50,684
>   103,328,768 bytes (103,328,768 requested / 0 slop)
>   Individual block sizes: 512 x 201,814
>   2.34% of the heap (2.74% cumulative)
>   Allocated at {
>     #01: replace_malloc (/home/njn/moz/mi4/d64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1257)
>     #02: malloc (/home/njn/moz/mi4/d64dmd/memory/build/../../../memory/build/replace_malloc.c:152)
>     #03: moz_xmalloc (/home/njn/moz/mi4/memory/mozalloc/mozalloc.cpp:83)
>     #04: __gnu_cxx::new_allocator<nsCOMPtr<nsIRunnable> >::allocate(unsigned long, void const*) (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../.
> ./../../include/c++/5.2.1/ext/new_allocator.h:104)
>     #05: std::allocator_traits<std::allocator<nsCOMPtr<nsIRunnable> > >::allocate(std::allocator<nsCOMPtr<nsIRunnable> >&, unsigned long) (/usr/
> bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/alloc_traits.h:360)
>     #06: std::_Deque_base<nsCOMPtr<nsIRunnable>, std::allocator<nsCOMPtr<nsIRunnable> > >::_M_allocate_node() (/usr/bin/../lib/gcc/x86_64-linux-
> gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:601)
>     #07: std::_Deque_base<nsCOMPtr<nsIRunnable>, std::allocator<nsCOMPtr<nsIRunnable> > >::_M_create_nodes(nsCOMPtr<nsIRunnable>**, nsCOMPtr<nsI
> Runnable>**) (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:726)
>     #08: std::_Deque_base<nsCOMPtr<nsIRunnable>, std::allocator<nsCOMPtr<nsIRunnable> > >::_M_initialize_map(unsigned long) (/usr/bin/../lib/gcc
> /x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:709)
>     #09: _Deque_base (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:513)
>     #10: _Deque_base (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:520)
>     #11: deque (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:957)
>     #12: queue (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_queue.h:146)
>     #13: AutoTaskDispatcher (/home/njn/moz/mi4/d64dmd/xpcom/threads/../../dist/include/mozilla/TaskDispatcher.h:72)
>     #14: AutoTaskGuard (/home/njn/moz/mi4/d64dmd/xpcom/threads/../../dist/include/mozilla/TaskQueue.h:121)
>     #15: mozilla::TaskQueue::Runner::Run() (/home/njn/moz/mi4/xpcom/threads/TaskQueue.cpp:169)
>   }
> }
> 
> Cumulative {
>   201,814 blocks in heap block record 5 of 50,684
>   12,916,096 bytes (12,916,096 requested / 0 slop)
>   Individual block sizes: 64 x 201,814
>   0.29% of the heap (5.38% cumulative)
>   Allocated at {
>     #01: replace_malloc (/home/njn/moz/mi4/d64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1257)
>     #02: malloc (/home/njn/moz/mi4/d64dmd/memory/build/../../../memory/build/replace_malloc.c:152)
>     #03: moz_xmalloc (/home/njn/moz/mi4/memory/mozalloc/mozalloc.cpp:83)
>     #04: __gnu_cxx::new_allocator<nsCOMPtr<nsIRunnable>*>::allocate(unsigned long, void const*) (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../.
> ./../../include/c++/5.2.1/ext/new_allocator.h:104)
>     #05: std::allocator_traits<std::allocator<nsCOMPtr<nsIRunnable>*> >::allocate(std::allocator<nsCOMPtr<nsIRunnable>*>&, unsigned long) (/usr/
> bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/alloc_traits.h:360)
>     #06: std::_Deque_base<nsCOMPtr<nsIRunnable>, std::allocator<nsCOMPtr<nsIRunnable> > >::_M_allocate_map(unsigned long) (/usr/bin/../lib/gcc/x
> 86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:615)
>     #07: std::_Deque_base<nsCOMPtr<nsIRunnable>, std::allocator<nsCOMPtr<nsIRunnable> > >::_M_initialize_map(unsigned long) (/usr/bin/../lib/gcc
> /x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:688)
>     #08: _Deque_base (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:490)
>     #09: deque (/usr/bin/../lib/gcc/x86_64-linux-gnu/5.2.1/../../../../include/c++/5.2.1/bits/stl_deque.h:883)
>     #10: AutoTaskDispatcher (/home/njn/moz/mi4/d64dmd/xpcom/threads/../../dist/include/mozilla/TaskDispatcher.h:72)
>     #11: AutoTaskGuard (/home/njn/moz/mi4/d64dmd/xpcom/threads/../../dist/include/mozilla/TaskQueue.h:121)
>     #12: mozilla::TaskQueue::Runner::Run() (/home/njn/moz/mi4/xpcom/threads/TaskQueue.cpp:169)
>     #13: nsThreadPool::Run() (/home/njn/moz/mi4/xpcom/threads/nsThreadPool.cpp:221)
>     #14: non-virtual thunk to nsThreadPool::Run() (/home/njn/moz/mi4/xpcom/threads/nsThreadPool.cpp:148)
>     #15: nsThread::ProcessNextEvent(bool, bool*) (/home/njn/moz/mi4/xpcom/threads/nsThread.cpp:990)
>   }
> }

I've seen these allocations account for 11-14% of all heap allocations
(cumulative, not live) in the content process while watching YouTube. That's a
*lot* of heap churn.

And it's because libstdc++ does two heap allocations when constructing a
std::queue, before it's even used.

(The code for libstdc++'s std::deque is at https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a02018.html. I think the 512 comes from __deque_buf_size() on lines 71--83, and the 64 comes from _S_initial_map_size (which is 8) multiplied by the element size, which is also 8 on 64-bit platforms.)

So I did some ad hoc profiling to see how big mDirectTasks gets:

> 33413 counts:
> ( 1)  31604 (94.6%, 94.6%): mMaxDirectTasks: 0
> ( 2)   1804 ( 5.4%,100.0%): mMaxDirectTasks: 1
> ( 3)      2 ( 0.0%,100.0%): mMaxDirectTasks: 2
> ( 4)      1 ( 0.0%,100.0%): mMaxDirectTasks: 3
> ( 5)      1 ( 0.0%,100.0%): mMaxDirectTasks: 4
> ( 6)      1 ( 0.0%,100.0%): mMaxDirectTasks: 5

95% of the time we don't put anything in it! It should be easy to avoid
constructing the queue when we don't need it.

I see three possible fixes here.

1. Lazify mDirectTasks by wrapping it in a Maybe<>. Very easy, and gets us 95%
of the benefit.

2. Reimplement mDirectTasks with nsDeque. We'd want a type-safe wrapper around
nsDeque similar to MediaQueue. A bit more work, but gets 100% of the benefit.

3. Implement a new deque class in MFBT, preferably replacing nsDeque and its
uses in the process. A lot more work, but gets 100% of the benefit and also
improves other code.

I haven't looked at other workloads, so I don't know if this situation will
come up in other times. I also haven't yet looked at Windows to see if the std::deque there is better.

Option 3 is The Right Thing To Do, but I'm not sure I have the time to do that, so I'm inclined to settle on option 1. bholley, what do you think?
Summary: Avoid heap churn for YouTube videos causes by AutoTaskDispatcher::mDirectTasks → Avoid heap churn for YouTube videos caused by AutoTaskDispatcher::mDirectTasks
This patch implements option 1 from comment 0.
Attachment #8710708 - Flags: feedback?(bobbyholley)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
> I also haven't yet looked at Windows to see if the std::deque there is better.

I just did a Windows run and there's no sign of the problem there. The MS implementation must be smarter and avoid heap allocations during std::queue construction.
Comment on attachment 8710708 [details] [diff] [review]
Lazify AutoTaskDispatcher::mDirectTasks

Review of attachment 8710708 [details] [diff] [review]:
-----------------------------------------------------------------

Nice find! I'd also be fine with using a different data structure here if the implementation happens to be more efficient - I just did this because the algorithm is Queue-y. A vector with inline storage would also work fine.

::: xpcom/threads/TaskDispatcher.h
@@ +83,5 @@
>      // accessed by the direct tasks it dispatches (true for TailDispatchers, but
>      // potentially not true for other hypothetical AutoTaskDispatchers). Feel
>      // free to loosen this restriction to apply only to mIsTailDispatcher if a
>      // use-case requires it.
> +    MOZ_ASSERT(mDirectTasks.isNothing() || mDirectTasks->empty());

This condition appears often enough that I think we should make a predicate method called HaveDirectTasks() and use it here...

@@ +93,5 @@
>  
>    void DrainDirectTasks() override
>    {
> +    if (mDirectTasks.isSome()) {
> +      while (!mDirectTasks->empty()) {

and here...

@@ +131,5 @@
>    }
>  
>    bool HasTasksFor(AbstractThread* aThread) override
>    {
> +    return !!GetTaskGroup(aThread) || (aThread == AbstractThread::GetCurrent() && !mDirectTasks->empty());

And here. Also, what is this change here doing? Don't you mean to check isSome here?
Attachment #8710708 - Flags: feedback?(bobbyholley) → feedback+
In general though, if having an std::deque is a footgun, we should probably find some way to forbid it.
With previous comments addressed.
Attachment #8710850 - Flags: review?(bobbyholley)
Attachment #8710708 - Attachment is obsolete: true
Attachment #8710850 - Flags: review?(bobbyholley) → review+
https://hg.mozilla.org/mozilla-central/rev/08b0f38ce699
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.