Closed Bug 893242 Opened 11 years ago Closed 11 years ago

Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.


(Toolkit :: about:memory, defect)

Not set



blocking-b2g leo+
Tracking Status
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed


(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)



(Whiteboard: [MemShrink][LeoVB+])


(5 files, 2 obsolete files)

Patch in a moment.
To do this, we got rid of the explicit removal of a ContentParent from
sContentParents in MarkAsDead().  Instead, we rely on
LinkedListElement's destructor to remove the ContentParent from

Thus, a ContentParent stays in sContentParents until it's destructed.
ContentParent::GetAll() checks whether the ContentParent is still alive,
so its behavior is unchanged.
Attachment #774973 - Flags: review?(benjamin)
I haven't tried compiling this on on Windows, so sorry in advance for typos there.
Tweaked memory reporter path slightly.
Attachment #774971 - Attachment is obsolete: true
Attachment #774971 - Flags: review?(benjamin)
Attachment #774978 - Flags: review?(n.nethercote)
Attachment #774974 - Attachment is obsolete: true
Attachment #774974 - Flags: review?(n.nethercote)
Attachment #774971 - Attachment is obsolete: false
The memory reporter looks like:

> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent((Preallocated), pid=2156, open channel, 0x44874800, refcnt=14)
> ├──0 (100.0%) ── content-parent(Homescreen, pid=2086, open channel, 0x46112800, refcnt=14)
> └──0 (100.0%) ── content-parent(Usage, pid=2074, open channel, 0x40416400, refcnt=14)
Attachment #774971 - Flags: review?(benjamin)
I wanted to include the message size here, instead of just the number, but it was somewhat complicated to achieve in a thread-safe manner.

The issue is that I don't want to call malloc_usable_size on every Message when it gets enqueued (sorry, premature optimization).  But I also don't want to stick a lock around the Channel object, which means that we can't iterate over the Channel's queue from the main thread.

This patch is conservative in terms of its perf impact, which is good if we want to take it on branches (we probably do).
Comment on attachment 774971 [details] [diff] [review]
Part 1: Add Unsound_IsClosed() and Unsound_NumQueuedMessages() to AsyncChannel.

Review of attachment 774971 [details] [diff] [review]:

Looks good, a few things though:

(The comments in the posix files apply to win as well)

::: ipc/chromium/src/chrome/common/ipc_channel.h
@@ +103,5 @@
> +  // Unsound_IsClosed() and Unsound_NumQueuedMessages() are safe to call from
> +  // any thread, but the value returned may be out of date, because we don't
> +  // use any synchronization when reading or writing it.
> +  bool Unsound_IsClosed();
> +  uint32_t Unsound_NumQueuedMessages();

Nit: Can you mark these as 'const'?

::: ipc/chromium/src/chrome/common/
@@ +292,5 @@
>    closed_ = false;
>  #if defined(OS_MACOSX)
>    last_pending_fd_id_ = 0;
>  #endif
> +  output_queue_length_ = 0;

This will certainly trigger valgrind/Asan warnings, so we should file a followup to annotate them as known-race.

I'm a bit concerned that this will occasionally do really bogus things (like if an optimizer tries to use this location for temporary calculations or something), but at worst we'll just get a flukey memory report...

::: ipc/chromium/src/chrome/common/ipc_channel_posix.h
@@ +151,5 @@
>    // A generation ID for RECEIVED_FD messages.
>    uint32_t last_pending_fd_id_;
>  #endif
> +  size_t output_queue_length_;

Nit: This should have some kind of comment too, and maybe be named "unsound" also.

::: ipc/glue/AsyncChannel.cpp
@@ +279,5 @@
> +
> +uint32_t
> +AsyncChannel::ThreadLink::Unsound_NumQueuedMessages()
> +{
> +    return 0;

Maybe just comment here that we don't currently care about thread channels but that we may implement this someday?

@@ +694,5 @@
> +bool
> +AsyncChannel::Unsound_IsClosed()
> +{
> +    if (!mLink) {
> +        return false;

Is this right? If we don't have a link shouldn't that count as closed?
Attachment #774971 - Flags: review+
(In reply to Justin Lebar [:jlebar] from comment #7)
> The issue is that I don't want to call malloc_usable_size on every Message
> when it gets enqueued (sorry, premature optimization).

We can always use the capacity_ member, right?
Attachment #774973 - Flags: review?(benjamin) → review+
> We can always use the capacity_ member, right?

It's true, we could.

The trick is ensuring that the capacity doesn't change.  We've had a lot of problems with counter-based memory reporters where the value you add in isn't the same as the value you subtract out, so I didn't want to risk it.

(Note that we have the same problem with a counter-based malloc_usable_size approach.)
Attachment #774971 - Flags: review?(benjamin)
Comment on attachment 774978 [details] [diff] [review]
Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.

Review of attachment 774978 [details] [diff] [review]:

::: dom/ipc/ContentParent.cpp
@@ +254,5 @@
> +            "The number of unset IPC messages held in this ContentParent's "
> +            "channel.  Usually we'd expect this to be zero, or close to it; a "
> +            "large value here might indicate that we're leaking messages. "
> +            "Similarly, a ContentParent object for a process that's no longer "
> +            "running could indicate that we're leaking ContentParents.");

Omit the "Usually we'd expect this to be zero, or close to it;" ?

@@ +335,5 @@
>          return;
>      }
> +    nsRefPtr<ContentParentMemoryReporter> mr = new ContentParentMemoryReporter();
> +    NS_RegisterMemoryMultiReporter(mr);

I was really confused here, because I thought ContentParent::StartUp() was run for each new ContentParent.  But now I see it's a static method run once -- "Start up the content-process machinery.  This might include scheduling pre-launch tasks."
Attachment #774978 - Flags: review?(n.nethercote) → review+
> Omit the "Usually we'd expect this to be zero, or close to it;" ?

Yeah, that's redundant with the next part.
I wasn't seeing this null-crash before; I think it was just an issue of killing the child processes at the right time.
Attachment #775127 - Flags: review?(khuey)
Comment on attachment 775127 [details] [diff] [review]
Part 4: Don't crash if we call ContentParent::Pid() after the process is dead.

This looks good to me!
Attachment #775127 - Flags: review?(khuey) → review+
This doesn't fix the leak, but I think it's a prerequisite.
Comment on attachment 775145 [details] [diff] [review]
Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it.

Sorry, wrong bug.
Attachment #775145 - Attachment is obsolete: true
Sorry for the try: -a pushes here, but I figure nobody is using the cycles on the weekend.
So this burns xpcshell because (apparently) sContentParents is not empty at shutdown any longer (because we now remove ContentParents from this list only when they're destructed), and we have an assertion in LinkedList's destructor that checks that the list is empty.

It also burns Windows for reasons I'm not sure of yet.
Blocks: 893012
Assignee: nobody → justin.lebar+bug
Whiteboard: [MemShrink]
khuey thinks that the windows orange might have been infra weirdness.  Let's find out...
I need this in order to diagnose bug 893012 and verify that the changes we make there actually fix things.
blocking-b2g: --- → leo+
This eliminates a static constructor, which is good.  It also avoids a
problem at shutdown: If we leak a ContentParent and we destroy
sContentParents atexit, then sContentParents will assert that it's not
empty and cause test failures.
Attachment #777369 - Flags: review?(khuey)
Whiteboard: [MemShrink] → [MemShrink][jlebar will uplift]
Whiteboard: [MemShrink][jlebar will uplift] → [MemShrink]
Whiteboard: [MemShrink] → [MemShrink][LeoVB+]
You need to log in before you can comment on or make changes to this bug.