Closed Bug 1312540 Opened 8 years ago Closed 8 years ago

PContent::Msg_GetGMPPluginVersionForAPI sends a sync IPC message to the parent and causes main thread IO

Categories

(Core :: Audio/Video: GMP, defect, P1)

50 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: mconley, Assigned: cpearce)

References

Details

Attachments

(4 files)

In bug 1310255, we've been examining stacks from BHR for hangs that bog down tab switch.

This (pseudo)stack:

Startup::XRE_InitChildProcess
nsInputStreamPump::OnInputStreamReady
nsInputStreamPump::OnStateStart
PContent::Msg_GetGMPPluginVersionForAPI

Showed up a bunch. As of this writing (and it's only a day's worth of data, but still), this is #3 in things stalling tab switch.

We should avoid sending a sync message here if we can avoid it - and if this is causing main thread IO in the parent process (I haven't looked but I trust billm's analysis), we should definitely avoid that as well.
Blocks: 1300411
See Also: → 1310255
Hey peterv,

I believe you added the sync GetGMPPluginVersionForAPI message in bug 1057908. Any ideas on how we could avoid these sync messages entirely?
Flags: needinfo?(peterv)
Summary: PContent::Msg_GetGMPPluginVersionForAPI sends a sync message to the parent and causes main thread IO → PContent::Msg_GetGMPPluginVersionForAPI sends a sync IPC message to the parent and causes main thread IO
Blocks: 1174239
(In reply to Mike Conley (:mconley) - (high latency on reviews and needinfos) from comment #1)
> I believe you added the sync GetGMPPluginVersionForAPI message in bug
> 1057908. Any ideas on how we could avoid these sync messages entirely?

I don't know, IIRC a lot of the callers needed the answer synchronously. Chris, any idea?

We might be able to cache some of this data in both processes (in child process for avoiding IPC and in the main process for avoiding main-thread IO).
Flags: needinfo?(peterv) → needinfo?(cpearce)
The callers do need an answer synchronously. We could certainly build caches in each process of the GMP plugin version info. There is a few infrastructure changes we'd need to make in the media stack to make that work.

The question is, how urgent is this? I understand tab spinners are a bad thing, but I'd like to understand how harmful a #3 tab spinner is? That will help us decide how urgently to fix this.
Flags: needinfo?(cpearce)
(In reply to Chris Pearce (:cpearce) from comment #3)
> The question is, how urgent is this? I understand tab spinners are a bad
> thing, but I'd like to understand how harmful a #3 tab spinner is? That will
> help us decide how urgently to fix this.

Sure, I can try to answer that.

For the samples we've collected over the past few days, the #1 cause for tab spinners is not very illuminating: "Startup::XRE_InitChildProcesses" is the lone pseudostack entry, which essentially tells us that the content process is indeed running, but that we're running inside something that has no pseudostack labels. Not very useful. This accounts for 60% of the stacks, but once we get the right labels landed, this could either resolve to be one thing, or (more likely) several things that accumulate to 60% of the top 3 tab switch hangs (and ~28% of all hangs we've gathered).

The #2 cause for tab switch spinners appear to be things kicked off by unknown chrome scripts. Because we have JS interruption to force painting, I have to assume these chrome scripts are calling into something that is not necessarily interruptible (Maybe js-ctypes for example). Hard to tell at this point, but I'm hoping that once bug 1312597 is in Nightly, it'll become a bit clearer. These unknown chrome scripts account for 20% of the top 3 hangs (and ~9% of all hangs we've gathered)

#3 is PContent::Msg_GetGMPPluginVersionForAPI, and this accounts for the last 20% of the top 3 tab switch hangs (and ~9% of all hangs we've gathered).

So we're still refining BHR and our instrumentation to help us figure out the top 2 causes. PContent::Msg_GetGMPPluginVersionForAPI is currently the only one that we feel is actionable at this point until we get more data.

Does that help illuminate how urgent this is? It's definitely not the sole cause, but it's also currently one of the top causes, and the only one that is currently actionable.
Flags: needinfo?(cpearce)
(In reply to Mike Conley (:mconley) - (high latency on reviews and needinfos) from comment #4)
> Does that help illuminate how urgent this is? It's definitely not the sole
> cause, but it's also currently one of the top causes, and the only one that
> is currently actionable.

So 20% of tab spinners are caused by PContent::Msg_GetGMPPluginVersionForAPI. 

Do you have a feel for how common tab spinners are? That is, do you have a graph that says (say) 5% of all tab switches hit a spinner, and of that 20% are caused by PContent::Msg_GetGMPPluginVersionForAPI?
Flags: needinfo?(cpearce) → needinfo?(mconley)
(In reply to Chris Pearce (:cpearce) from comment #5)
> (In reply to Mike Conley (:mconley) - (high latency on reviews and
> needinfos) from comment #4)
> > Does that help illuminate how urgent this is? It's definitely not the sole
> > cause, but it's also currently one of the top causes, and the only one that
> > is currently actionable.
> 
> So 20% of tab spinners are caused by
> PContent::Msg_GetGMPPluginVersionForAPI. 
> 
> Do you have a feel for how common tab spinners are? That is, do you have a
> graph that says (say) 5% of all tab switches hit a spinner

We have a dashboard here: https://chutten.github.io/bug1310250/

Currently, the data suggests that 4% of all tab switches across all telemetry pings we've received, result in a spinner.

I don't know enough about your current slate of work to tell you how urgent it is in comparison to all of the other stuff you're working on (that might be above my paygrade). Roping in blassey.
Flags: needinfo?(mconley) → needinfo?(blassey.bugs)
So telemetry indicates that 0.8% of tab switches hit a spinner due to this. That seems pretty significant. Are you looking to uplift these fixes?
Flags: needinfo?(mconley)
(In reply to Chris Pearce (:cpearce) from comment #7)
> So telemetry indicates that 0.8% of tab switches hit a spinner due to this.
> That seems pretty significant. Are you looking to uplift these fixes?

I guess it depends on how extensive the changes are. Tab switch spinners are bad. Under-tested GMP support may be worse - but I might let Product make that call if it comes to it.
Flags: needinfo?(mconley)
Rank: 25
Priority: -- → P2
I reckon this must be caused by the HasGMPFor() calls in GMPDecoderModule::UpdateUsableCodecs(). All other places that call into GetGMPPluginVersionForAPI come from API calls from EME and WebRTC, and I struggle to believe they're common enough to show up this frequently.

GMPDecoderModule::UpdateUsableCodecs() is called every time a GMP is added or removed to/from the GMPService. Which happens on startup a few times, or when a new GMP update downloads.

GMPDecoderModule::UpdateUsableCodecs() supports audio/video decoding via GMPs, which is a feature we have preffed off, and I was planning on removing after we drop XP support. A simple way to test my theory would be to move the body of GMPDecoderModule::UpdateUsableCodecs() behind a check of media.gmp.decoder.enabled==true. Or even just not send the PContent.NotifyGMPsChanged message. That would be a simple patch that would be safe to uplift to beta.
Priority: P2 → P1
We need gmp-changed for MediaKeySystemAccessManager::Observe() so that CDM requests awaiting a download to complete resolve. So preffing out the body of GMPDecoderModule::UpdateUsableCodecs() seems like it.
Gah, blanking out GMPDecoderModule::UpdateUsableCodecs() if media.gmp.decoder.enabled is off won't work as EMEDecoderModule::SupportsMimeType() relies on the structure that GMPDecoderModule::UpdateUsableCodecs() populates.
Flags: needinfo?(blassey.bugs)
Chris, would bug 1313200 help you here? I haven't followed the recent comments, but I thought I'd ask. If it would help, I can bump the priority of it.
Flags: needinfo?(cpearce)
With bug 1312597 landed, the #2 top frame set I mentioned in comment 4 has been split apart into smaller, more actionable bits.

Doing so has also moved PContent::Msg_GetGMPPluginVersionForAPI into the #2 spot.
(In reply to Bill McCloskey (:billm) from comment #12)
> Chris, would bug 1313200 help you here? I haven't followed the recent
> comments, but I thought I'd ask. If it would help, I can bump the priority
> of it.

I don't think rushing that would necessarily help (though doing it still sounds like a good idea!), as the API here needs to be synchronous and some call sites run from JS.
Flags: needinfo?(cpearce)
Assignee: nobody → cpearce
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

https://reviewboard.mozilla.org/r/89762/#review89214

::: dom/ipc/ContentChild.h:1
(Diff revision 1)
>  /* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */

In the commit description: 'subseuent' -> 'subsequent'.

::: dom/ipc/PContent.ipdl:340
(Diff revision 1)
>      nsCString url;
>      PBlob blob;
>      Principal principal;
>  };
>  
> +struct GMPAPITags {

Style: '{' on new line.

::: dom/ipc/PContent.ipdl:345
(Diff revision 1)
> +struct GMPAPITags {
> +    nsCString api;
> +    nsCString[] tags;
> +};
> +
> +struct GMPCapabilityData {

Style: '{' on new line.

::: dom/media/gmp/GMPServiceChild.cpp:136
(Diff revision 1)
>  }
>  
> +typedef mozilla::dom::GMPCapabilityData GMPCapabilityData;
> +typedef mozilla::dom::GMPAPITags GMPAPITags;
> +
> +struct GMPCapabilityAndVersion {

Style: '{' on new line.

::: dom/media/gmp/GMPServiceChild.cpp:137
(Diff revision 1)
>  
> +typedef mozilla::dom::GMPCapabilityData GMPCapabilityData;
> +typedef mozilla::dom::GMPAPITags GMPAPITags;
> +
> +struct GMPCapabilityAndVersion {
> +

Unnecessary empty line.

::: dom/media/gmp/GMPServiceChild.cpp:152
(Diff revision 1)
> +      }
> +      mCapabilities.AppendElement(Move(cap));
> +    }
> +  }
> +
> +  nsCString ToString() const {

Style: '{' on new line.

::: dom/media/gmp/GMPServiceParent.cpp:877
(Diff revision 1)
> +      bool found = false;
> +      for (const GMPCapabilityData& cap : caps) {
> +        if (cap.name().Equals(name)) {
> +          found = true;
> +        }
> +      }
> +      if (found) {
> +        continue;
> +      }

No need for 'found' here, you could simply do:
  for (...) {
    if (...) {
      continue;
    }
  }

::: dom/media/gmp/GMPServiceParent.cpp:889
(Diff revision 1)
> +        continue;
> +      }
> +      GMPCapabilityData x;
> +      x.name() = name;
> +      x.version() = gmp->GetVersion();
> +      for (const GMPCapability& tag: gmp->GetCapabilities()) {

Style: Add space before ':'.

::: dom/media/gmp/GMPServiceParent.cpp:895
(Diff revision 1)
> +        x.capabilities().AppendElement(GMPAPITags(tag.mAPIName, tag.mAPITags));
> +      }
> +      caps.AppendElement(Move(x));
> +    }
> +  }
> +  nsTArray<ContentParent*> children;

Children are parents? :-P
Attachment #8806257 - Flags: review?(gsquelart) → review+
Comment on attachment 8806258 [details]
Bug 1312540 - Simplify behaviour of GMP::GetPluginVersionForAPI.

https://reviewboard.mozilla.org/r/89764/#review89228
Attachment #8806258 - Flags: review?(gsquelart) → review+
Comment on attachment 8806259 [details]
Bug 1312540 - Use GMP caps cached in content process for GetPluginVersion.

https://reviewboard.mozilla.org/r/89766/#review89232

::: dom/media/gmp/GMPParent.h:121
(Diff revision 1)
>    void Shutdown();
>  
>    // This must not be called while we're in the middle of abnormal ActorDestroy
>    void DeleteProcess();
>  
>    bool SupportsAPI(const nsCString& aAPI, const nsCString& aTag);

You should remove this orphaned declaration.

::: dom/media/gmp/GMPParent.cpp:587
(Diff revision 1)
> +bool
> +GMPCapability::Supports(const nsTArray<GMPCapability>& aCapabilities,
> +                        const nsCString& aAPI,
> +                        const nsCString& aTag)
> +{
> +  for (uint32_t i = 0; i < aCapabilities.Length(); i++) {

I think you could use a range-for here and for the next look, but it may be too much of a tangent for this patch, so up to you if you'd prefer not to make this change now.
Attachment #8806259 - Flags: review?(gsquelart) → review+
Comment on attachment 8806260 [details]
Bug 1312540 - Remove GetGMPPluginVersionForAPI IPC.

https://reviewboard.mozilla.org/r/89768/#review89350

LGTM - though a heads up that your try build has gone red.
Attachment #8806260 - Flags: review?(mconley) → review+
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

https://reviewboard.mozilla.org/r/89762/#review89344

::: dom/ipc/ContentParent.cpp:1418
(Diff revision 1)
>      mGatherer = static_cast<ProfileGatherer*>(gatherer.get());
>  
>      StartProfiler(currentProfilerParams);
>    }
>  #endif
> +  RefPtr<GeckoMediaPluginServiceParent> gmps(GeckoMediaPluginServiceParent::GetSingleton());

Please put a newline before this.

::: dom/media/gmp/GMPServiceChild.cpp:185
(Diff revision 1)
> +
> +static nsCString
> +ToString(const nsTArray<GMPCapabilityData>& aCapabilities)
> +{
> +  nsCString s;
> +  for (const GMPCapabilityAndVersion& gmp : *sGMPCapabilities) {

You're iterating over sGMPCapabilities and never using aCapabilities. Pretty confusing when reading the code.

::: dom/media/gmp/GMPServiceParent.cpp:859
(Diff revision 1)
> +void
> +GeckoMediaPluginServiceParent::UpdateContentProcessGMPCapabilities()
> +{
> +  if (!NS_IsMainThread()) {
> +    RefPtr<GeckoMediaPluginServiceParent> self(this);
> +    NS_DispatchToMainThread(NS_NewRunnableFunction([self]() -> void {

Any reason not to use NewRunnableMethod here?

::: dom/media/gmp/GMPServiceParent.cpp:872
(Diff revision 1)
> +  typedef mozilla::dom::ContentParent ContentParent;
> +
> +  nsTArray<GMPCapabilityData> caps;
> +  {
> +    MutexAutoLock lock(mMutex);
> +    for (const RefPtr<GMPParent>& gmp : mPlugins) {

How many plugins are we likely to see here? I'm a little worried about the n^2 algorithm.

::: dom/media/gmp/GMPServiceParent.cpp:897
(Diff revision 1)
> +      caps.AppendElement(Move(x));
> +    }
> +  }
> +  nsTArray<ContentParent*> children;
> +  ContentParent::GetAll(children);
> +  for (ContentParent* child : children) {

Slightly nicer to do:
for (auto* cp : ContentParent::AllProcesses(ContentParent::eLive))
Attachment #8806257 - Flags: review?(wmccloskey) → review+
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

https://reviewboard.mozilla.org/r/89762/#review89214

> No need for 'found' here, you could simply do:
>   for (...) {
>     if (...) {
>       continue;
>     }
>   }

That's a double for loop. So a continue in the inner loop won't cause the outer loop to continue, and its the outer loop we want to continue. So I need to 'break' out of the inner loop and 'continue' in the outer loop. So I'll add a break here.

> Children are parents? :-P

Confusingly, yes. These are the parent side actors of the parent<->child relationship of the chrome process being the parent of the child content processes. So our connection to the child process in the chrome process happens over the "parent" actors.
(In reply to Chris Pearce (:cpearce) from comment #26)
> > Children are parents? :-P
> 
> Confusingly, yes. These are the parent side actors of the parent<->child
> relationship of the chrome process being the parent of the child content
> processes. So our connection to the child process in the chrome process
> happens over the "parent" actors.

By this reasoning we would be justified in calling ContentChild instances "parents" because they are the child side reflections of ContentParents!

Standard practice is to call ContentParent instances "parents" and ContentChild instances "children".
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

https://reviewboard.mozilla.org/r/89762/#review89214

> That's a double for loop. So a continue in the inner loop won't cause the outer loop to continue, and its the outer loop we want to continue. So I need to 'break' out of the inner loop and 'continue' in the outer loop. So I'll add a break here.

Oops, sorry for giving a wrong advice, this is embarrassing.
Yes, a 'break' in the inner loop would help.

Or you could 'goto' the bottom of the outer loop block! :-P
Or use std::none_of. (I've been meaning to use <algorithm>, but never got the courage to start.)
But I'm digging myself deeper, ignore me.
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/97147b709fd8
Maintain a cache of GMPs capabilities in content processes. r=billm,gerald
https://hg.mozilla.org/integration/autoland/rev/8d38f007c911
Simplify behaviour of GMP::GetPluginVersionForAPI. r=gerald
https://hg.mozilla.org/integration/autoland/rev/255aac711339
Use GMP caps cached in content process for GetPluginVersion. r=gerald
https://hg.mozilla.org/integration/autoland/rev/c6e7c38bd386
Remove GetGMPPluginVersionForAPI IPC. r=mconley
I found a minor regression in the EME UI introduced by these patches. Will fix in bug 1314797.
Mike: When will we get, or do we now have, data confirming that tab switches have improved due to the work in this bug?
Flags: needinfo?(mconley)
The dashboard I've been monitoring is https://chutten.github.io/bug1310250/.

It's still a bit early to draw any conclusions yet - not enough data has come in just yet. We'll need a few more days for there to be enough "saturation", I think.
Flags: needinfo?(mconley)
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

Approval Request Comment
[Feature/regressing bug #]: EME/GMP decoding
[User impact if declined]: We think that 0.4% of all tab switches get jank because of a problem fixed in this patch. So if we don't uplift this, we'll continue to get jank.
[Describe test coverage new/current, TreeHerder]: We have lots of tests that depend on the code I change in this patch.
[Risks and why]: Low; it would be quite obvious if this code didn't work!
[String/UUID change made/needed]: None.
Attachment #8806257 - Flags: approval-mozilla-beta?
Comment on attachment 8806258 [details]
Bug 1312540 - Simplify behaviour of GMP::GetPluginVersionForAPI.

Approval Request Comment
[Feature/regressing bug #]: EME/GMP decoding
[User impact if declined]: We think that 0.4% of all tab switches get jank because of a problem fixed in this patch. So if we don't uplift this, we'll continue to get jank.
[Describe test coverage new/current, TreeHerder]: We have lots of tests that depend on the code I change in this patch.
[Risks and why]: Low; it would be quite obvious if this code didn't work!
[String/UUID change made/needed]: None.
Attachment #8806258 - Flags: approval-mozilla-beta?
Comment on attachment 8806259 [details]
Bug 1312540 - Use GMP caps cached in content process for GetPluginVersion.

Approval Request Comment
[Feature/regressing bug #]: EME/GMP decoding
[User impact if declined]: We think that 0.4% of all tab switches get jank because of a problem fixed in this patch. So if we don't uplift this, we'll continue to get jank.
[Describe test coverage new/current, TreeHerder]: We have lots of tests that depend on the code I change in this patch.
[Risks and why]: Low; it would be quite obvious if this code didn't work!
[String/UUID change made/needed]: None.
Attachment #8806259 - Flags: approval-mozilla-beta?
Comment on attachment 8806260 [details]
Bug 1312540 - Remove GetGMPPluginVersionForAPI IPC.

Approval Request Comment
[Feature/regressing bug #]: EME/GMP decoding
[User impact if declined]: We think that 0.4% of all tab switches get jank because of a problem fixed in this patch. So if we don't uplift this, we'll continue to get jank.
[Describe test coverage new/current, TreeHerder]: We have lots of tests that depend on the code I change in this patch.
[Risks and why]: Low; it would be quite obvious if this code didn't work!
[String/UUID change made/needed]: None.
Attachment #8806260 - Flags: approval-mozilla-beta?
Comment on attachment 8806257 [details]
Bug 1312540 - Maintain a cache of GMPs capabilities in content processes.

After discussing with cpearce, we decide to let this ride the train on 52.
Attachment #8806257 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8806258 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8806259 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8806260 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: