Closed
Bug 1330185
Opened 7 years ago
Closed 7 years ago
Use a PProfiler protocol to control the profiler on a background thread in subprocesses
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: mstange, Assigned: mstange)
References
Details
Attachments
(1 file, 3 obsolete files)
At the moment, content process profiles are captured through the PContent protocol. That means that we can only control the profiler and capture profiles while the main thread of the content process is responding. If we used a protocol whose content process endpoint lives on a different thread, we could always get content process profiles right away and wouldn't lose profiling data during hangs.
Assignee | ||
Comment 1•7 years ago
|
||
I should clarify that this is not just about getting the profile, but also about the other ways to control the profiler. It should be possible to start and stop the profiler in the content process while the content process main thread is hung as long as the parent process main thread is responsive.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 3•7 years ago
|
||
This seems to be mostly working, except for the fact that ProfilerParent::ActorDestroy is called with aActorDestroyReason == AbnormalShutdown. I need to track that down. It's also missing lots of comments to document the life cycle of the actors and the role of the threads that are involved.
Assignee | ||
Comment 4•7 years ago
|
||
And the GPU process piece is untested.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Summary: Use a PBackground protocol to control the profiler in subprocesses → Use a PProfiler protocol to control the profiler on a background thread in subprocesses
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8857316 -
Attachment is obsolete: true
Attachment #8857316 -
Flags: review?(n.nethercote)
Assignee | ||
Updated•7 years ago
|
Attachment #8867936 -
Attachment is obsolete: true
Attachment #8867936 -
Flags: review?(n.nethercote)
Assignee | ||
Updated•7 years ago
|
Attachment #8867937 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 15•7 years ago
|
||
mozreview-review |
Comment on attachment 8867938 [details] Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. https://reviewboard.mozilla.org/r/139474/#review143712 It seems reasonable to me, but I am very shaky when it comes to IPDL and cross-process communication, so my blessing should be considered only a weak endorsement, and you can see that my comments are all about very shallow things... ::: dom/ipc/ContentChild.cpp:2826 (Diff revision 4) > > #ifdef MOZ_GECKO_PROFILER > - if (profiler_is_active()) { > - // We're shutting down while we were profiling. Send the > - // profile up to the parent so that we don't lose this > - // information. > + if (mProfilerController) { > + nsCString shutdownProfile = mProfilerController->GrabShutdownProfileAndShutdown(); > + mProfilerController = nullptr; > + // Send up the shutdown profile to the parent process through our own s/Send up/Send/ ::: dom/plugins/ipc/PPluginModule.ipdl (Diff revision 4) > - /** > - * Control the Gecko Profiler in the plugin process. > - */ > - async StartProfiler(ProfilerInitParams params); > - async StopProfiler(); > - async PauseProfiler(bool aPause); Thumbs up for replacing the Pause-with-a-bool-argument with separate pause and resume operations :) ::: tools/profiler/gecko/ChildProfilerController.cpp:27 (Diff revision 4) > +{ > + MOZ_RELEASE_ASSERT(NS_IsMainThread()); > + Endpoint<PProfilerParent> parent; > + Endpoint<PProfilerChild> child; > + nsresult rv; > + rv = PProfiler::CreateEndpoints(aOtherPid, Merge assignment with declaration on the previous line. ::: tools/profiler/gecko/PProfiler.ipdl:17 (Diff revision 4) > + * control the Gecko Profiler in other processes, and request profiles from > + * those processes. > + * It is a top-level protocol so that its child endpoint can be on a > + * background thread, so that profiles can be gathered even if the main thread > + * is unresponsive. > + */ C++ comments? ::: tools/profiler/gecko/ProfilerChild.cpp:36 (Diff revision 4) > + profiler_start(params.entries(), params.interval(), > + params.features(), > + filterArray.Elements(), > + filterArray.Length()); > + > + return IPC_OK(); Mis-indentation. ::: tools/profiler/public/ChildProfilerController.h:24 (Diff revision 4) > +class PProfilerParent; > + > +/** > + * ChildProfilerController manages the setup and teardown of ProfilerChild. > + * It's used on the main thread. > + */ C++ comments, please :) ::: tools/profiler/public/ProfilerChild.h:28 (Diff revision 4) > + > + ProfilerChild(); > + > + // Collects and returns a profile. > + // The collected profile expected to be sent to the main process through a > + // different message channel. I'm having trouble parsing this sentence, particularly the "expected". Do you mean "expects"? ::: tools/profiler/public/ProfilerParent.h:1 (Diff revision 4) > +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ As per style guide, proper modelines are these: > /* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ > /* vim: set ts=8 sts=2 et sw=2 tw=80: */ Please use them in all new files. ::: tools/profiler/public/ProfilerParent.h:27 (Diff revision 4) > + > +public: > + static bool Alloc(mozilla::ipc::Endpoint<PProfilerParent>&& aEndpoint); > + > + // The following static methods can be called on any thread, but they are > + // no-ops on anything other than the main thread. Is it expected that these methods will be called off the main thread?
Attachment #8867938 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Updated•7 years ago
|
Attachment #8867938 -
Flags: review?(dvander)
Assignee | ||
Comment 16•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8867938 [details] Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. https://reviewboard.mozilla.org/r/139474/#review143712 > I'm having trouble parsing this sentence, particularly the "expected". Do you mean "expects"? Oops, this was supposed to say "The collected profile *is* expected to be sent to the main process through a different message channel.". I've changed it to: // This method can be used to grab a profile just before PProfiler is torn // down. The collected profile should then be sent through a different // message channel that is guaranteed to stay open long enough. > Is it expected that these methods will be called off the main thread? Yes, in the child processes: e.g. ProfilerChild::RecvStop (which happens on a background thread) will call profiler_stop which will call ProfilerParent::ProfilerStopped(). But we don't care about that call in child processes because we only nest one level deep - a child process never needs to notify other processes.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 18•7 years ago
|
||
The latest try push has one remaining failure, a compilation failure on the static analysis build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4950eb8ab937702e34b844a640ca57211cb01a39&selectedJob=100521747 I can't make much sense of the error message: > ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:1032:22: error: reference to 'PProfilerParent' is ambiguous > typedef mozilla::PProfilerParent PProfilerParent; > ^ > ipc/ipdl/_ipdlheaders/mozilla/dom/PContentParent.h:477:7: note: candidate found by name lookup is 'mozilla::PProfilerParent' > class PProfilerParent; > ^ > ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:478:7: note: candidate found by name lookup is 'mozilla::PProfilerParent' > class PProfilerParent; > ^ I'll try to reproduce it locally.
Comment hidden (mozreview-request) |
Comment 20•7 years ago
|
||
mozreview-review |
Comment on attachment 8867938 [details] Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. https://reviewboard.mozilla.org/r/139474/#review144760 I only looked at this IPC aspects. That part looks good to me. ::: dom/ipc/ContentParent.cpp:2788 (Diff revision 5) > > +mozilla::ipc::IPCResult > +ContentParent::RecvInitProfiler(Endpoint<PProfilerParent>&& aEndpoint) > +{ > + if (!ProfilerParent::Alloc(Move(aEndpoint))) { > + return IPC_FAIL(this, "ProfilerParent::Alloc failed"); Is this really something we should kill the child process for? ::: dom/plugins/ipc/PluginModuleChild.cpp:207 (Diff revision 5) > return false; > } > > +#ifdef MOZ_GECKO_PROFILER > + mProfilerController = new ChildProfilerController(); > + Unused << SendInitProfiler(mProfilerController->SetUpEndpoints(OtherPid())); Are you sure we want to do this? This code will send a ProfilerParent to the content process. There's probably not a lot that it will do with that. I think the InitForChrome case should be all you need. ::: dom/plugins/ipc/PluginModuleParent.cpp:3127 (Diff revision 5) > > +mozilla::ipc::IPCResult > +PluginModuleParent::RecvInitProfiler(Endpoint<PProfilerParent>&& aEndpoint) > +{ > + if (!ProfilerParent::Alloc(Move(aEndpoint))) { > + return IPC_FAIL(this, "ProfilerParent::Alloc failed"); Again, do we really want to kill the plugin over this? ::: ipc/glue/BackgroundChildImpl.cpp:44 (Diff revision 5) > #include "mozilla/net/PUDPSocketChild.h" > #include "mozilla/dom/network/UDPSocketChild.h" > #include "mozilla/dom/WebAuthnTransactionChild.h" > #include "nsID.h" > #include "nsTraceRefcnt.h" > +#include "ProfilerChild.h" What's this for? ::: tools/profiler/gecko/ChildProfilerController.cpp:36 (Diff revision 5) > + if (NS_FAILED(rv)) { > + MOZ_CRASH("Failed to create top level actor for PProfiler!"); > + } > + > + if (NS_SUCCEEDED(NS_NewNamedThread("ProfilerChild", getter_AddRefs(mThread)))) { > + mThread->Dispatch(NewRunnableMethod<Endpoint<PProfilerChild>&&>( You can pass an initial event to the thread. No need to dispatch it separately. ::: tools/profiler/gecko/ProfilerChild.cpp:91 (Diff revision 5) > +{ > + mDestroyed = true; > +} > + > +void > +ProfilerChild::Destroy() { Brace should go on its own line. ::: tools/profiler/gecko/ProfilerParent.cpp:244 (Diff revision 5) > + mPendingRequestedProfilesCount--; > + } > + mDestroyed = true; > + > + // Asynchronously delete this object. > + NS_DispatchToMainThread(NS_NewRunnableFunction([=](){ delete this; })); There's a DeallocPProfilerParent method you should be able to implement on ProfilerParent. That would be a little more direct.
Attachment #8867938 -
Flags: review?(wmccloskey) → review+
Comment 21•7 years ago
|
||
> > Is it expected that these methods will be called off the main thread?
>
> Yes, in the child processes: e.g. ProfilerChild::RecvStop (which happens on
> a background thread) will call profiler_stop which will call
> ProfilerParent::ProfilerStopped(). But we don't care about that call in
> child processes because we only nest one level deep - a child process never
> needs to notify other processes.
That would be useful extra info to include in the comment :)
Assignee | ||
Comment 22•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8867938 [details] Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. https://reviewboard.mozilla.org/r/139474/#review144760 > Is this really something we should kill the child process for? I've changed it to an NS_WARNING. > Are you sure we want to do this? This code will send a ProfilerParent to the content process. There's probably not a lot that it will do with that. I think the InitForChrome case should be all you need. Thanks for catching this. I've removed the InitForContent change. > Again, do we really want to kill the plugin over this? Changed to NS_WARNING here as well. > What's this for? Left-over from a previous version of the patch. Removed. > You can pass an initial event to the thread. No need to dispatch it separately. Done. NS_NewNamedThread doesn't accept an already_AddRefed<nsIRunnable> though, only an nsIRunnable*, so I needed to add a local variable nsCOMPtr<nsIRunnable> setupRunnable to briefly store the runnable. > There's a DeallocPProfilerParent method you should be able to implement on ProfilerParent. That would be a little more direct. Thanks for the pointer, that's better.
Comment hidden (mozreview-request) |
Comment 24•7 years ago
|
||
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/51cdc2541486 Use a top-level PProfiler protocol to control the profiler in other processes. r=njn, r=billm
Assignee | ||
Comment 25•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8867938 [details] Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. https://reviewboard.mozilla.org/r/139474/#review144760 > Done. NS_NewNamedThread doesn't accept an already_AddRefed<nsIRunnable> though, only an nsIRunnable*, so I needed to add a local variable nsCOMPtr<nsIRunnable> setupRunnable to briefly store the runnable. I'm going to revert this change because I'd like to assert in SetupProfilerChild that I'm on the right thread, so I need the newly-created thread to be stored in mThread before I dispatch the runnable.
Comment hidden (mozreview-request) |
Comment 27•7 years ago
|
||
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e6f368f2d874 Use a top-level PProfiler protocol to control the profiler in other processes. r=njn, r=billm
Comment 28•7 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/df065d529e85237953033deaf3bd17e4281baa9f for Win8 leaks like https://treeherder.mozilla.org/logviewer.html#?job_id=101099619&repo=mozilla-inbound
Comment 29•7 years ago
|
||
Also seems to break non-GeckoProfiler case (e.g. Linux on non-x86, iOS, Darwin/X11, BSDs, Solaris). Looking under MOZ_OBJDIR nothing references ProfilerParent.cpp but I don't understand how ipdl works. ../../gfx/ipc/Unified_cpp_gfx_ipc0.o: In function `_ZN7mozilla3gfx8GPUChild16RecvInitProfilerEONS_3ipc8EndpointINS_15PProfilerParentEEE': objdir/gfx/ipc/Unified_cpp_gfx_ipc0.cpp:(.text._ZN7mozilla3gfx8GPUChild16RecvInitProfilerEONS_3ipc8EndpointINS_15PProfilerParentEEE+0xd): undefined reference to `_ZN7mozilla14ProfilerParent5AllocEONS_3ipc8EndpointINS_15PProfilerParentEEE' /usr/bin/ld: ../../gfx/ipc/Unified_cpp_gfx_ipc0.o: relocation R_X86_64_PC32 against `_ZN7mozilla14ProfilerParent5AllocEONS_3ipc8EndpointINS_15PProfilerParentEEE' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 33•7 years ago
|
||
I've changed the patch to initiate the PProfiler setup in the parent process. Instead of creating the endpoints in the content process and then sending the parent endpoint over to the parent process, we now create the endpoints in the parent process and send the child endpoint over to the child process. This makes the new code more similar to the old code, and it avoids the problem that caused this patch to be backed out: On Windows 8 64 bit debug tests, we were hitting the failure case in DuplicateHandle at http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/ipc/glue/ProtocolUtils.cpp#211 when we were trying to initialize the profiler in a plugin process. I'm not sure why we hit this failure, but Bill's hypothesis was that it has to do with restrictions from the sandbox that we use for plugin processes. It turns out that the failure to duplicate the handle usually won't cause test failures, because we handle that case gracefully, at http://searchfox.org/mozilla-central/rev/31eec6f59eb303adf3318058cb30bd6a883115a5/ipc/glue/Transport_win.h#50 . However, the call to CrashReporter::AnnotateCrashReport in DuplicateHandle *does* cause a test failure, because we're making that call before the crash reporter has been set up in the plugin process, so we hit bug 1367591 and leak an array. The patch has changed slightly since the last review, but not significantly so, and I don't think it needs to be re-reviewed. (In reply to Jan Beich (away from May 25 to June 11) from comment #29) > Also seems to break non-GeckoProfiler case (e.g. Linux on non-x86, iOS, > Darwin/X11, BSDs, Solaris). I was missing an #ifdef in GPUChild.cpp. I think it's fixed now.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 35•7 years ago
|
||
I made some more changes. Most importantly, if ProfilerParent receives a profile, it no longer needs an instance of nsProfiler so that it can pass its profile to the ProfileGatherer. Instead, it now returns a promise from GatherProfiles, and resolves that promise. Using nsProfiler was a problem when we wanted to pass profiles to it late during shutdown. By the time ProfilerParent::ActorDestroyed was called, the nsProfiler instance was already destroyed, so nsProfiler::GetOrCreate created a new instance, which during initialization used the observer service, which asserts if it's used late during shutdown. The other change was the use of the MozPromise IPDL infrastructure. See https://groups.google.com/d/msg/mozilla.dev.platform/7on2slDlfIw/0d3JyvCHAwAJ - ProfilerParent::SendGatherProfile now returns a promise, and ProfilerChild::RecvGatherProfile just resolves a promise in the child process instead of calling a different IPDL method (SendProfile) to reply.
Assignee | ||
Comment 36•7 years ago
|
||
Try push is at https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8e71307641e0236a8d5b8705b29e955b4fafebd . I really hope this is the last one.
Comment hidden (mozreview-request) |
Comment 38•7 years ago
|
||
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/bb9f906e70e3 Use a top-level PProfiler protocol to control the profiler in other processes. r=billm,njn
Comment 39•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bb9f906e70e3
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 40•7 years ago
|
||
Wontfix for 54, since we are shipping 54 next week.
status-firefox54:
--- → wontfix
Updated•6 years ago
|
Assignee: nobody → mstange
You need to log in
before you can comment on or make changes to this bug.
Description
•