Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.

RESOLVED FIXED in Firefox 25, Firefox OS v1.1hd

Status

()

RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

unspecified
mozilla25
Points:
---

Firefox Tracking Flags

(blocking-b2g:leo+, firefox23 wontfix, firefox24 wontfix, firefox25 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

Details

(Whiteboard: [MemShrink][LeoVB+])

Attachments

(5 attachments, 2 obsolete attachments)

(Assignee)

Description

5 years ago
Patch in a moment.
(Assignee)

Comment 1

5 years ago
Created attachment 774971 [details] [diff] [review]
Part 1: Add Unsound_IsClosed() and Unsound_NumQueuedMessages() to AsyncChannel.
Attachment #774971 - Flags: review?(benjamin)
(Assignee)

Comment 2

5 years ago
Created attachment 774973 [details] [diff] [review]
Part 2: Add ContentParent::GetAllEvenIfDead().

To do this, we got rid of the explicit removal of a ContentParent from
sContentParents in MarkAsDead().  Instead, we rely on
LinkedListElement's destructor to remove the ContentParent from
sContentParents.

Thus, a ContentParent stays in sContentParents until it's destructed.
ContentParent::GetAll() checks whether the ContentParent is still alive,
so its behavior is unchanged.
Attachment #774973 - Flags: review?(benjamin)
(Assignee)

Comment 3

5 years ago
Created attachment 774974 [details] [diff] [review]
Part 3: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.
Attachment #774974 - Flags: review?(n.nethercote)
(Assignee)

Comment 4

5 years ago
I haven't tried compiling this on on Windows, so sorry in advance for typos there.
(Assignee)

Comment 5

5 years ago
Created attachment 774978 [details] [diff] [review]
Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.

Tweaked memory reporter path slightly.
Attachment #774971 - Attachment is obsolete: true
Attachment #774971 - Flags: review?(benjamin)
Attachment #774978 - Flags: review?(n.nethercote)
(Assignee)

Updated

5 years ago
Attachment #774974 - Attachment is obsolete: true
Attachment #774974 - Flags: review?(n.nethercote)
(Assignee)

Updated

5 years ago
Attachment #774971 - Attachment is obsolete: false
(Assignee)

Comment 6

5 years ago
The memory reporter looks like:

> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent((Preallocated), pid=2156, open channel, 0x44874800, refcnt=14)
> ├──0 (100.0%) ── content-parent(Homescreen, pid=2086, open channel, 0x46112800, refcnt=14)
> └──0 (100.0%) ── content-parent(Usage, pid=2074, open channel, 0x40416400, refcnt=14)
(Assignee)

Updated

5 years ago
Attachment #774971 - Flags: review?(benjamin)
(Assignee)

Comment 7

5 years ago
I wanted to include the message size here, instead of just the number, but it was somewhat complicated to achieve in a thread-safe manner.

The issue is that I don't want to call malloc_usable_size on every Message when it gets enqueued (sorry, premature optimization).  But I also don't want to stick a lock around the Channel object, which means that we can't iterate over the Channel's queue from the main thread.

This patch is conservative in terms of its perf impact, which is good if we want to take it on branches (we probably do).
Comment on attachment 774971 [details] [diff] [review]
Part 1: Add Unsound_IsClosed() and Unsound_NumQueuedMessages() to AsyncChannel.

Review of attachment 774971 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, a few things though:

(The comments in the posix files apply to win as well)

::: ipc/chromium/src/chrome/common/ipc_channel.h
@@ +103,5 @@
> +  // Unsound_IsClosed() and Unsound_NumQueuedMessages() are safe to call from
> +  // any thread, but the value returned may be out of date, because we don't
> +  // use any synchronization when reading or writing it.
> +  bool Unsound_IsClosed();
> +  uint32_t Unsound_NumQueuedMessages();

Nit: Can you mark these as 'const'?

::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +292,5 @@
>    closed_ = false;
>  #if defined(OS_MACOSX)
>    last_pending_fd_id_ = 0;
>  #endif
> +  output_queue_length_ = 0;

This will certainly trigger valgrind/Asan warnings, so we should file a followup to annotate them as known-race.

I'm a bit concerned that this will occasionally do really bogus things (like if an optimizer tries to use this location for temporary calculations or something), but at worst we'll just get a flukey memory report...

::: ipc/chromium/src/chrome/common/ipc_channel_posix.h
@@ +151,5 @@
>    // A generation ID for RECEIVED_FD messages.
>    uint32_t last_pending_fd_id_;
>  #endif
>  
> +  size_t output_queue_length_;

Nit: This should have some kind of comment too, and maybe be named "unsound" also.

::: ipc/glue/AsyncChannel.cpp
@@ +279,5 @@
> +
> +uint32_t
> +AsyncChannel::ThreadLink::Unsound_NumQueuedMessages()
> +{
> +    return 0;

Maybe just comment here that we don't currently care about thread channels but that we may implement this someday?

@@ +694,5 @@
> +bool
> +AsyncChannel::Unsound_IsClosed()
> +{
> +    if (!mLink) {
> +        return false;

Is this right? If we don't have a link shouldn't that count as closed?
Attachment #774971 - Flags: review+
(In reply to Justin Lebar [:jlebar] from comment #7)
> The issue is that I don't want to call malloc_usable_size on every Message
> when it gets enqueued (sorry, premature optimization).

We can always use the capacity_ member, right?
Attachment #774973 - Flags: review?(benjamin) → review+
(Assignee)

Comment 10

5 years ago
> We can always use the capacity_ member, right?

It's true, we could.

The trick is ensuring that the capacity doesn't change.  We've had a lot of problems with counter-based memory reporters where the value you add in isn't the same as the value you subtract out, so I didn't want to risk it.

(Note that we have the same problem with a counter-based malloc_usable_size approach.)
(Assignee)

Updated

5 years ago
Attachment #774971 - Flags: review?(benjamin)
Comment on attachment 774978 [details] [diff] [review]
Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.

Review of attachment 774978 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +254,5 @@
> +            "The number of unset IPC messages held in this ContentParent's "
> +            "channel.  Usually we'd expect this to be zero, or close to it; a "
> +            "large value here might indicate that we're leaking messages. "
> +            "Similarly, a ContentParent object for a process that's no longer "
> +            "running could indicate that we're leaking ContentParents.");

Omit the "Usually we'd expect this to be zero, or close to it;" ?

@@ +335,5 @@
>          return;
>      }
>  
> +    nsRefPtr<ContentParentMemoryReporter> mr = new ContentParentMemoryReporter();
> +    NS_RegisterMemoryMultiReporter(mr);

I was really confused here, because I thought ContentParent::StartUp() was run for each new ContentParent.  But now I see it's a static method run once -- "Start up the content-process machinery.  This might include scheduling pre-launch tasks."
Attachment #774978 - Flags: review?(n.nethercote) → review+
(Assignee)

Comment 12

5 years ago
> Omit the "Usually we'd expect this to be zero, or close to it;" ?

Yeah, that's redundant with the next part.
(Assignee)

Comment 13

5 years ago
Created attachment 775127 [details] [diff] [review]
Part 4: Don't crash if we call ContentParent::Pid() after the process is dead.

I wasn't seeing this null-crash before; I think it was just an issue of killing the child processes at the right time.
Attachment #775127 - Flags: review?(khuey)
Comment on attachment 775127 [details] [diff] [review]
Part 4: Don't crash if we call ContentParent::Pid() after the process is dead.

This looks good to me!
Attachment #775127 - Flags: review?(khuey) → review+
(Assignee)

Comment 15

5 years ago
Created attachment 775145 [details] [diff] [review]
Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it.

This doesn't fix the leak, but I think it's a prerequisite.
(Assignee)

Comment 16

5 years ago
Comment on attachment 775145 [details] [diff] [review]
Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it.

Sorry, wrong bug.
Attachment #775145 - Attachment is obsolete: true
(Assignee)

Comment 18

5 years ago
Sorry for the try: -a pushes here, but I figure nobody is using the cycles on the weekend.  https://tbpl.mozilla.org/?tree=Try&rev=5986d4307e54
(Assignee)

Comment 19

5 years ago
So this burns xpcshell because (apparently) sContentParents is not empty at shutdown any longer (because we now remove ContentParents from this list only when they're destructed), and we have an assertion in LinkedList's destructor that checks that the list is empty.

It also burns Windows for reasons I'm not sure of yet.
(Assignee)

Updated

5 years ago
Blocks: 893012
Assignee: nobody → justin.lebar+bug
Whiteboard: [MemShrink]
(Assignee)

Comment 20

5 years ago
khuey thinks that the windows orange might have been infra weirdness.  Let's find out...

https://tbpl.mozilla.org/?tree=Try&rev=9b72365f663e
(Assignee)

Comment 21

5 years ago
I need this in order to diagnose bug 893012 and verify that the changes we make there actually fix things.
blocking-b2g: --- → leo+
(Assignee)

Comment 23

5 years ago
Created attachment 777369 [details] [diff] [review]
Part 1.5: Make sContentParents a StaticAutoPtr.

This eliminates a static constructor, which is good.  It also avoids a
problem at shutdown: If we leak a ContentParent and we destroy
sContentParents atexit, then sContentParents will assert that it's not
empty and cause test failures.
Attachment #777369 - Flags: review?(khuey)
(Assignee)

Updated

5 years ago
Whiteboard: [MemShrink] → [MemShrink][jlebar will uplift]
(Assignee)

Updated

5 years ago
Whiteboard: [MemShrink][jlebar will uplift] → [MemShrink]
(In reply to Justin Lebar [:jlebar] from comment #25)
> I guess this still needs to be uplifted to b2g18-hd.

b2g18 is merged to v1.1hd regularly, so no worries on that front :-)

https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/386b0b75ead4
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/ddf7916363d2
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/c72b0005e61b
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/5eb37b391eb2
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/1e2ec80f5953
status-b2g18-v1.0.0: --- → wontfix
status-b2g18-v1.0.1: --- → wontfix
status-b2g-v1.1hd: --- → fixed
status-firefox23: --- → wontfix
status-firefox24: --- → wontfix
status-firefox25: --- → fixed

Updated

5 years ago
Whiteboard: [MemShrink] → [MemShrink][LeoVB+]
You need to log in before you can comment on or make changes to this bug.