Closed Bug 1330185 Opened 7 years ago Closed 7 years ago

Use a PProfiler protocol to control the profiler on a background thread in subprocesses

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(1 file, 3 obsolete files)

At the moment, content process profiles are captured through the PContent protocol. That means that we can only control the profiler and capture profiles while the main thread of the content process is responding.
If we used a protocol whose content process endpoint lives on a different thread, we could always get content process profiles right away and wouldn't lose profiling data during hangs.
I should clarify that this is not just about getting the profile, but also about the other ways to control the profiler. It should be possible to start and stop the profiler in the content process while the content process main thread is hung as long as the parent process main thread is responsive.
Depends on: 1354961
This seems to be mostly working, except for the fact that ProfilerParent::ActorDestroy is called with aActorDestroyReason == AbnormalShutdown. I need to track that down.
It's also missing lots of comments to document the life cycle of the actors and the role of the threads that are involved.
And the GPU process piece is untested.
Summary: Use a PBackground protocol to control the profiler in subprocesses → Use a PProfiler protocol to control the profiler on a background thread in subprocesses
Attachment #8857316 - Attachment is obsolete: true
Attachment #8857316 - Flags: review?(n.nethercote)
Attachment #8867936 - Attachment is obsolete: true
Attachment #8867936 - Flags: review?(n.nethercote)
Attachment #8867937 - Attachment is obsolete: true
Comment on attachment 8867938 [details]
Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes.

https://reviewboard.mozilla.org/r/139474/#review143712

It seems reasonable to me, but I am very shaky when it comes to IPDL and cross-process communication, so my blessing should be considered only a weak endorsement, and you can see that my comments are all about very shallow things...

::: dom/ipc/ContentChild.cpp:2826
(Diff revision 4)
>  
>  #ifdef MOZ_GECKO_PROFILER
> -  if (profiler_is_active()) {
> -    // We're shutting down while we were profiling. Send the
> -    // profile up to the parent so that we don't lose this
> -    // information.
> +  if (mProfilerController) {
> +    nsCString shutdownProfile = mProfilerController->GrabShutdownProfileAndShutdown();
> +    mProfilerController = nullptr;
> +    // Send up the shutdown profile to the parent process through our own

s/Send up/Send/

::: dom/plugins/ipc/PPluginModule.ipdl
(Diff revision 4)
> -  /**
> -   * Control the Gecko Profiler in the plugin process.
> -   */
> -  async StartProfiler(ProfilerInitParams params);
> -  async StopProfiler();
> -  async PauseProfiler(bool aPause);

Thumbs up for replacing the Pause-with-a-bool-argument with separate pause and resume operations :)

::: tools/profiler/gecko/ChildProfilerController.cpp:27
(Diff revision 4)
> +{
> +  MOZ_RELEASE_ASSERT(NS_IsMainThread());
> +  Endpoint<PProfilerParent> parent;
> +  Endpoint<PProfilerChild> child;
> +  nsresult rv;
> +  rv = PProfiler::CreateEndpoints(aOtherPid,

Merge assignment with declaration on the previous line.

::: tools/profiler/gecko/PProfiler.ipdl:17
(Diff revision 4)
> + * control the Gecko Profiler in other processes, and request profiles from
> + * those processes.
> + * It is a top-level protocol so that its child endpoint can be on a
> + * background thread, so that profiles can be gathered even if the main thread
> + * is unresponsive.
> + */

C++ comments?

::: tools/profiler/gecko/ProfilerChild.cpp:36
(Diff revision 4)
> +  profiler_start(params.entries(), params.interval(),
> +                 params.features(),
> +                 filterArray.Elements(),
> +                 filterArray.Length());
> +
> + return IPC_OK();

Mis-indentation.

::: tools/profiler/public/ChildProfilerController.h:24
(Diff revision 4)
> +class PProfilerParent;
> +
> +/**
> + * ChildProfilerController manages the setup and teardown of ProfilerChild.
> + * It's used on the main thread.
> + */

C++ comments, please :)

::: tools/profiler/public/ProfilerChild.h:28
(Diff revision 4)
> +
> +  ProfilerChild();
> +
> +  // Collects and returns a profile.
> +  // The collected profile expected to be sent to the main process through a
> +  // different message channel.

I'm having trouble parsing this sentence, particularly the "expected". Do you mean "expects"?

::: tools/profiler/public/ProfilerParent.h:1
(Diff revision 4)
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */

As per style guide, proper modelines are these:

> /* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> /* vim: set ts=8 sts=2 et sw=2 tw=80: */

Please use them in all new files.

::: tools/profiler/public/ProfilerParent.h:27
(Diff revision 4)
> +
> +public:
> +  static bool Alloc(mozilla::ipc::Endpoint<PProfilerParent>&& aEndpoint);
> +
> +  // The following static methods can be called on any thread, but they are
> +  // no-ops on anything other than the main thread.

Is it expected that these methods will be called off the main thread?
Attachment #8867938 - Flags: review?(n.nethercote) → review+
Attachment #8867938 - Flags: review?(dvander)
Comment on attachment 8867938 [details]
Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes.

https://reviewboard.mozilla.org/r/139474/#review143712

> I'm having trouble parsing this sentence, particularly the "expected". Do you mean "expects"?

Oops, this was supposed to say "The collected profile *is* expected to be sent to the main process through a different message channel.".

I've changed it to:
  // This method can be used to grab a profile just before PProfiler is torn
  // down. The collected profile should then be sent through a different
  // message channel that is guaranteed to stay open long enough.

> Is it expected that these methods will be called off the main thread?

Yes, in the child processes: e.g. ProfilerChild::RecvStop (which happens on a background thread) will call profiler_stop which will call ProfilerParent::ProfilerStopped(). But we don't care about that call in child processes because we only nest one level deep - a child process never needs to notify other processes.
The latest try push has one remaining failure, a compilation failure on the static analysis build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4950eb8ab937702e34b844a640ca57211cb01a39&selectedJob=100521747

I can't make much sense of the error message:

> ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:1032:22: error: reference to 'PProfilerParent' is ambiguous
>     typedef mozilla::PProfilerParent PProfilerParent;
>                      ^
> ipc/ipdl/_ipdlheaders/mozilla/dom/PContentParent.h:477:7: note: candidate found by name lookup is 'mozilla::PProfilerParent'
> class PProfilerParent;
>       ^
> ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:478:7: note: candidate found by name lookup is 'mozilla::PProfilerParent'
> class PProfilerParent;
>       ^

I'll try to reproduce it locally.
Comment on attachment 8867938 [details]
Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes.

https://reviewboard.mozilla.org/r/139474/#review144760

I only looked at this IPC aspects. That part looks good to me.

::: dom/ipc/ContentParent.cpp:2788
(Diff revision 5)
>  
> +mozilla::ipc::IPCResult
> +ContentParent::RecvInitProfiler(Endpoint<PProfilerParent>&& aEndpoint)
> +{
> +  if (!ProfilerParent::Alloc(Move(aEndpoint))) {
> +    return IPC_FAIL(this, "ProfilerParent::Alloc failed");

Is this really something we should kill the child process for?

::: dom/plugins/ipc/PluginModuleChild.cpp:207
(Diff revision 5)
>          return false;
>      }
>  
> +#ifdef MOZ_GECKO_PROFILER
> +    mProfilerController = new ChildProfilerController();
> +    Unused << SendInitProfiler(mProfilerController->SetUpEndpoints(OtherPid()));

Are you sure we want to do this? This code will send a ProfilerParent to the content process. There's probably not a lot that it will do with that. I think the InitForChrome case should be all you need.

::: dom/plugins/ipc/PluginModuleParent.cpp:3127
(Diff revision 5)
>  
> +mozilla::ipc::IPCResult
> +PluginModuleParent::RecvInitProfiler(Endpoint<PProfilerParent>&& aEndpoint)
> +{
> +  if (!ProfilerParent::Alloc(Move(aEndpoint))) {
> +    return IPC_FAIL(this, "ProfilerParent::Alloc failed");

Again, do we really want to kill the plugin over this?

::: ipc/glue/BackgroundChildImpl.cpp:44
(Diff revision 5)
>  #include "mozilla/net/PUDPSocketChild.h"
>  #include "mozilla/dom/network/UDPSocketChild.h"
>  #include "mozilla/dom/WebAuthnTransactionChild.h"
>  #include "nsID.h"
>  #include "nsTraceRefcnt.h"
> +#include "ProfilerChild.h"

What's this for?

::: tools/profiler/gecko/ChildProfilerController.cpp:36
(Diff revision 5)
> +  if (NS_FAILED(rv)) {
> +    MOZ_CRASH("Failed to create top level actor for PProfiler!");
> +  }
> +
> +  if (NS_SUCCEEDED(NS_NewNamedThread("ProfilerChild", getter_AddRefs(mThread)))) {
> +    mThread->Dispatch(NewRunnableMethod<Endpoint<PProfilerChild>&&>(

You can pass an initial event to the thread. No need to dispatch it separately.

::: tools/profiler/gecko/ProfilerChild.cpp:91
(Diff revision 5)
> +{
> +  mDestroyed = true;
> +}
> +
> +void
> +ProfilerChild::Destroy() {

Brace should go on its own line.

::: tools/profiler/gecko/ProfilerParent.cpp:244
(Diff revision 5)
> +    mPendingRequestedProfilesCount--;
> +  }
> +  mDestroyed = true;
> +
> +  // Asynchronously delete this object.
> +  NS_DispatchToMainThread(NS_NewRunnableFunction([=](){ delete this; }));

There's a DeallocPProfilerParent method you should be able to implement on ProfilerParent. That would be a little more direct.
Attachment #8867938 - Flags: review?(wmccloskey) → review+
> > Is it expected that these methods will be called off the main thread?
> 
> Yes, in the child processes: e.g. ProfilerChild::RecvStop (which happens on
> a background thread) will call profiler_stop which will call
> ProfilerParent::ProfilerStopped(). But we don't care about that call in
> child processes because we only nest one level deep - a child process never
> needs to notify other processes.

That would be useful extra info to include in the comment :)
Comment on attachment 8867938 [details]
Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes.

https://reviewboard.mozilla.org/r/139474/#review144760

> Is this really something we should kill the child process for?

I've changed it to an NS_WARNING.

> Are you sure we want to do this? This code will send a ProfilerParent to the content process. There's probably not a lot that it will do with that. I think the InitForChrome case should be all you need.

Thanks for catching this. I've removed the InitForContent change.

> Again, do we really want to kill the plugin over this?

Changed to NS_WARNING here as well.

> What's this for?

Left-over from a previous version of the patch. Removed.

> You can pass an initial event to the thread. No need to dispatch it separately.

Done. NS_NewNamedThread doesn't accept an already_AddRefed<nsIRunnable> though, only an nsIRunnable*, so I needed to add a local variable nsCOMPtr<nsIRunnable> setupRunnable to briefly store the runnable.

> There's a DeallocPProfilerParent method you should be able to implement on ProfilerParent. That would be a little more direct.

Thanks for the pointer, that's better.
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/51cdc2541486
Use a top-level PProfiler protocol to control the profiler in other processes. r=njn, r=billm
Comment on attachment 8867938 [details]
Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes.

https://reviewboard.mozilla.org/r/139474/#review144760

> Done. NS_NewNamedThread doesn't accept an already_AddRefed<nsIRunnable> though, only an nsIRunnable*, so I needed to add a local variable nsCOMPtr<nsIRunnable> setupRunnable to briefly store the runnable.

I'm going to revert this change because I'd like to assert in SetupProfilerChild that I'm on the right thread, so I need the newly-created thread to be stored in mThread before I dispatch the runnable.
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6f368f2d874
Use a top-level PProfiler protocol to control the profiler in other processes. r=njn, r=billm
Also seems to break non-GeckoProfiler case (e.g. Linux on non-x86, iOS, Darwin/X11, BSDs, Solaris). Looking under MOZ_OBJDIR nothing references ProfilerParent.cpp but I don't understand how ipdl works.

../../gfx/ipc/Unified_cpp_gfx_ipc0.o: In function `_ZN7mozilla3gfx8GPUChild16RecvInitProfilerEONS_3ipc8EndpointINS_15PProfilerParentEEE':
objdir/gfx/ipc/Unified_cpp_gfx_ipc0.cpp:(.text._ZN7mozilla3gfx8GPUChild16RecvInitProfilerEONS_3ipc8EndpointINS_15PProfilerParentEEE+0xd): undefined reference to `_ZN7mozilla14ProfilerParent5AllocEONS_3ipc8EndpointINS_15PProfilerParentEEE'
/usr/bin/ld: ../../gfx/ipc/Unified_cpp_gfx_ipc0.o: relocation R_X86_64_PC32 against `_ZN7mozilla14ProfilerParent5AllocEONS_3ipc8EndpointINS_15PProfilerParentEEE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
I've changed the patch to initiate the PProfiler setup in the parent process. Instead of creating the endpoints in the content process and then sending the parent endpoint over to the parent process, we now create the endpoints in the parent process and send the child endpoint over to the child process. This makes the new code more similar to the old code, and it avoids the problem that caused this patch to be backed out:
On Windows 8 64 bit debug tests, we were hitting the failure case in DuplicateHandle at http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/ipc/glue/ProtocolUtils.cpp#211 when we were trying to initialize the profiler in a plugin process. I'm not sure why we hit this failure, but Bill's hypothesis was that it has to do with restrictions from the sandbox that we use for plugin processes.
It turns out that the failure to duplicate the handle usually won't cause test failures, because we handle that case gracefully, at http://searchfox.org/mozilla-central/rev/31eec6f59eb303adf3318058cb30bd6a883115a5/ipc/glue/Transport_win.h#50 . However, the call to CrashReporter::AnnotateCrashReport in DuplicateHandle *does* cause a test failure, because we're making that call before the crash reporter has been set up in the plugin process, so we hit bug 1367591 and leak an array.

The patch has changed slightly since the last review, but not significantly so, and I don't think it needs to be re-reviewed.

(In reply to Jan Beich (away from May 25 to June 11) from comment #29)
> Also seems to break non-GeckoProfiler case (e.g. Linux on non-x86, iOS,
> Darwin/X11, BSDs, Solaris).

I was missing an #ifdef in GPUChild.cpp. I think it's fixed now.
I made some more changes.

Most importantly, if ProfilerParent receives a profile, it no longer needs an instance of nsProfiler so that it can pass its profile to the ProfileGatherer. Instead, it now returns a promise from GatherProfiles, and resolves that promise.

Using nsProfiler was a problem when we wanted to pass profiles to it late during shutdown. By the time ProfilerParent::ActorDestroyed was called, the nsProfiler instance was already destroyed, so nsProfiler::GetOrCreate created a new instance, which during initialization used the observer service, which asserts if it's used late during shutdown.

The other change was the use of the MozPromise IPDL infrastructure. See https://groups.google.com/d/msg/mozilla.dev.platform/7on2slDlfIw/0d3JyvCHAwAJ - ProfilerParent::SendGatherProfile now returns a promise, and ProfilerChild::RecvGatherProfile just resolves a promise in the child process instead of calling a different IPDL method (SendProfile) to reply.
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/bb9f906e70e3
Use a top-level PProfiler protocol to control the profiler in other processes. r=billm,njn
https://hg.mozilla.org/mozilla-central/rev/bb9f906e70e3
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1369859
Wontfix for 54, since we are shipping 54 next week.
Depends on: 1374284
Depends on: 1374038
No longer depends on: 1374284
Assignee: nobody → mstange
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: