Closed
Bug 893242
Opened 11 years ago
Closed 11 years ago
Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.
Categories
(Toolkit :: about:memory, defect)
Toolkit
about:memory
Tracking
()
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
(Whiteboard: [MemShrink][LeoVB+])
Attachments
(5 files, 2 obsolete files)
21.01 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
2.61 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
3.94 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
1.53 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
4.22 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
Patch in a moment.
Assignee | ||
Comment 1•11 years ago
|
||
Attachment #774971 -
Flags: review?(benjamin)
Assignee | ||
Comment 2•11 years ago
|
||
To do this, we got rid of the explicit removal of a ContentParent from
sContentParents in MarkAsDead(). Instead, we rely on
LinkedListElement's destructor to remove the ContentParent from
sContentParents.
Thus, a ContentParent stays in sContentParents until it's destructed.
ContentParent::GetAll() checks whether the ContentParent is still alive,
so its behavior is unchanged.
Attachment #774973 -
Flags: review?(benjamin)
Assignee | ||
Comment 3•11 years ago
|
||
Attachment #774974 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 4•11 years ago
|
||
I haven't tried compiling this on on Windows, so sorry in advance for typos there.
Assignee | ||
Comment 5•11 years ago
|
||
Tweaked memory reporter path slightly.
Attachment #774971 -
Attachment is obsolete: true
Attachment #774971 -
Flags: review?(benjamin)
Attachment #774978 -
Flags: review?(n.nethercote)
Assignee | ||
Updated•11 years ago
|
Attachment #774974 -
Attachment is obsolete: true
Attachment #774974 -
Flags: review?(n.nethercote)
Assignee | ||
Updated•11 years ago
|
Attachment #774971 -
Attachment is obsolete: false
Assignee | ||
Comment 6•11 years ago
|
||
The memory reporter looks like:
> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent((Preallocated), pid=2156, open channel, 0x44874800, refcnt=14)
> ├──0 (100.0%) ── content-parent(Homescreen, pid=2086, open channel, 0x46112800, refcnt=14)
> └──0 (100.0%) ── content-parent(Usage, pid=2074, open channel, 0x40416400, refcnt=14)
Assignee | ||
Updated•11 years ago
|
Attachment #774971 -
Flags: review?(benjamin)
Assignee | ||
Comment 7•11 years ago
|
||
I wanted to include the message size here, instead of just the number, but it was somewhat complicated to achieve in a thread-safe manner.
The issue is that I don't want to call malloc_usable_size on every Message when it gets enqueued (sorry, premature optimization). But I also don't want to stick a lock around the Channel object, which means that we can't iterate over the Channel's queue from the main thread.
This patch is conservative in terms of its perf impact, which is good if we want to take it on branches (we probably do).
Comment on attachment 774971 [details] [diff] [review]
Part 1: Add Unsound_IsClosed() and Unsound_NumQueuedMessages() to AsyncChannel.
Review of attachment 774971 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, a few things though:
(The comments in the posix files apply to win as well)
::: ipc/chromium/src/chrome/common/ipc_channel.h
@@ +103,5 @@
> + // Unsound_IsClosed() and Unsound_NumQueuedMessages() are safe to call from
> + // any thread, but the value returned may be out of date, because we don't
> + // use any synchronization when reading or writing it.
> + bool Unsound_IsClosed();
> + uint32_t Unsound_NumQueuedMessages();
Nit: Can you mark these as 'const'?
::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +292,5 @@
> closed_ = false;
> #if defined(OS_MACOSX)
> last_pending_fd_id_ = 0;
> #endif
> + output_queue_length_ = 0;
This will certainly trigger valgrind/Asan warnings, so we should file a followup to annotate them as known-race.
I'm a bit concerned that this will occasionally do really bogus things (like if an optimizer tries to use this location for temporary calculations or something), but at worst we'll just get a flukey memory report...
::: ipc/chromium/src/chrome/common/ipc_channel_posix.h
@@ +151,5 @@
> // A generation ID for RECEIVED_FD messages.
> uint32_t last_pending_fd_id_;
> #endif
>
> + size_t output_queue_length_;
Nit: This should have some kind of comment too, and maybe be named "unsound" also.
::: ipc/glue/AsyncChannel.cpp
@@ +279,5 @@
> +
> +uint32_t
> +AsyncChannel::ThreadLink::Unsound_NumQueuedMessages()
> +{
> + return 0;
Maybe just comment here that we don't currently care about thread channels but that we may implement this someday?
@@ +694,5 @@
> +bool
> +AsyncChannel::Unsound_IsClosed()
> +{
> + if (!mLink) {
> + return false;
Is this right? If we don't have a link shouldn't that count as closed?
Attachment #774971 -
Flags: review+
(In reply to Justin Lebar [:jlebar] from comment #7)
> The issue is that I don't want to call malloc_usable_size on every Message
> when it gets enqueued (sorry, premature optimization).
We can always use the capacity_ member, right?
Updated•11 years ago
|
Attachment #774973 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 10•11 years ago
|
||
> We can always use the capacity_ member, right?
It's true, we could.
The trick is ensuring that the capacity doesn't change. We've had a lot of problems with counter-based memory reporters where the value you add in isn't the same as the value you subtract out, so I didn't want to risk it.
(Note that we have the same problem with a counter-based malloc_usable_size approach.)
Assignee | ||
Updated•11 years ago
|
Attachment #774971 -
Flags: review?(benjamin)
Comment 11•11 years ago
|
||
Comment on attachment 774978 [details] [diff] [review]
Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.
Review of attachment 774978 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/ipc/ContentParent.cpp
@@ +254,5 @@
> + "The number of unset IPC messages held in this ContentParent's "
> + "channel. Usually we'd expect this to be zero, or close to it; a "
> + "large value here might indicate that we're leaking messages. "
> + "Similarly, a ContentParent object for a process that's no longer "
> + "running could indicate that we're leaking ContentParents.");
Omit the "Usually we'd expect this to be zero, or close to it;" ?
@@ +335,5 @@
> return;
> }
>
> + nsRefPtr<ContentParentMemoryReporter> mr = new ContentParentMemoryReporter();
> + NS_RegisterMemoryMultiReporter(mr);
I was really confused here, because I thought ContentParent::StartUp() was run for each new ContentParent. But now I see it's a static method run once -- "Start up the content-process machinery. This might include scheduling pre-launch tasks."
Attachment #774978 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 12•11 years ago
|
||
> Omit the "Usually we'd expect this to be zero, or close to it;" ?
Yeah, that's redundant with the next part.
Assignee | ||
Comment 13•11 years ago
|
||
I wasn't seeing this null-crash before; I think it was just an issue of killing the child processes at the right time.
Attachment #775127 -
Flags: review?(khuey)
Comment on attachment 775127 [details] [diff] [review]
Part 4: Don't crash if we call ContentParent::Pid() after the process is dead.
This looks good to me!
Attachment #775127 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 15•11 years ago
|
||
This doesn't fix the leak, but I think it's a prerequisite.
Assignee | ||
Comment 16•11 years ago
|
||
Comment on attachment 775145 [details] [diff] [review]
Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it.
Sorry, wrong bug.
Attachment #775145 -
Attachment is obsolete: true
Assignee | ||
Comment 17•11 years ago
|
||
Assignee | ||
Comment 18•11 years ago
|
||
Sorry for the try: -a pushes here, but I figure nobody is using the cycles on the weekend. https://tbpl.mozilla.org/?tree=Try&rev=5986d4307e54
Assignee | ||
Comment 19•11 years ago
|
||
So this burns xpcshell because (apparently) sContentParents is not empty at shutdown any longer (because we now remove ContentParents from this list only when they're destructed), and we have an assertion in LinkedList's destructor that checks that the list is empty.
It also burns Windows for reasons I'm not sure of yet.
Updated•11 years ago
|
Assignee: nobody → justin.lebar+bug
Whiteboard: [MemShrink]
Assignee | ||
Comment 20•11 years ago
|
||
khuey thinks that the windows orange might have been infra weirdness. Let's find out...
https://tbpl.mozilla.org/?tree=Try&rev=9b72365f663e
Assignee | ||
Comment 21•11 years ago
|
||
I need this in order to diagnose bug 893012 and verify that the changes we make there actually fix things.
blocking-b2g: --- → leo+
Assignee | ||
Comment 22•11 years ago
|
||
Assignee | ||
Comment 23•11 years ago
|
||
This eliminates a static constructor, which is good. It also avoids a
problem at shutdown: If we leak a ContentParent and we destroy
sContentParents atexit, then sContentParents will assert that it's not
empty and cause test failures.
Attachment #777369 -
Flags: review?(khuey)
Attachment #777369 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 24•11 years ago
|
||
https://hg.mozilla.org/projects/birch/rev/f7566da9633a
https://hg.mozilla.org/projects/birch/rev/57f124e4df9a
https://hg.mozilla.org/projects/birch/rev/fb3477de11bc
https://hg.mozilla.org/projects/birch/rev/87f80556b71a
https://hg.mozilla.org/projects/birch/rev/9dbf02392f9c
This needs a separate set of patches for b2g18. I can uplift tomorrow.
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink][jlebar will uplift]
Assignee | ||
Comment 25•11 years ago
|
||
remote: https://hg.mozilla.org/releases/mozilla-b2g18/rev/386b0b75ead4
remote: https://hg.mozilla.org/releases/mozilla-b2g18/rev/ddf7916363d2
remote: https://hg.mozilla.org/releases/mozilla-b2g18/rev/c72b0005e61b
remote: https://hg.mozilla.org/releases/mozilla-b2g18/rev/5eb37b391eb2
remote: https://hg.mozilla.org/releases/mozilla-b2g18/rev/1e2ec80f5953
I guess this still needs to be uplifted to b2g18-hd.
status-b2g18:
--- → fixed
Comment 26•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f7566da9633a
https://hg.mozilla.org/mozilla-central/rev/57f124e4df9a
https://hg.mozilla.org/mozilla-central/rev/fb3477de11bc
https://hg.mozilla.org/mozilla-central/rev/87f80556b71a
https://hg.mozilla.org/mozilla-central/rev/9dbf02392f9c
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink][jlebar will uplift] → [MemShrink]
Comment 27•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #25)
> I guess this still needs to be uplifted to b2g18-hd.
b2g18 is merged to v1.1hd regularly, so no worries on that front :-)
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/386b0b75ead4
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/ddf7916363d2
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/c72b0005e61b
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/5eb37b391eb2
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/1e2ec80f5953
status-b2g18-v1.0.0:
--- → wontfix
status-b2g18-v1.0.1:
--- → wontfix
status-b2g-v1.1hd:
--- → fixed
status-firefox23:
--- → wontfix
status-firefox24:
--- → wontfix
status-firefox25:
--- → fixed
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink][LeoVB+]
You need to log in
before you can comment on or make changes to this bug.
Description
•