Closed Bug 1265815 Opened 8 years ago Closed 8 years ago

Spike in Windows GMP crashes in Firefox 46

Categories

(Core :: Audio/Video: GMP, defect, P1)

Unspecified
Windows
defect

Tracking

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 blocking fixed
firefox47 + fixed
firefox48 + fixed
firefox49 --- fixed

People

(Reporter: RyanVM, Assigned: cpearce)

Attachments

(3 files, 1 obsolete file)

[Tracking Requested - why for this release]: Significant spike in GMP crashes.

It was brought to our attention that the Telemetry crash data shows a significant spike in GMP crashes starting on Beta with the last uplift in March. Anthony Zhang and I looked at the data on Nightly, and the spike appears to have started with the Feb. 25 nightly. Looking at the pushlog range, bug 1250766 stands out as a possible culprit, since it landed on Nightly47 and was uplifted to Aurora46 shortly afterward. This theory is bolstered by a similar drop-off in crash rate in early January, when bug 1237145 landed.

Chris, can you please look into this soon?
Flags: needinfo?(cpearce)
Can you include the crash data in a way that we can see in the bug? The telemetry server isn't accepting my login. But we also want to put data into the bug, so that it will persist while a query may have different results a month from now. 

Too late for a 46 fix, but we should be aware of this issue in 47. If the crash volume is such that we should reopen this in 46, please let me know or change the flag back to affected. Thanks!
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #2)
> Can you include the crash data in a way that we can see in the bug? The
> telemetry server isn't accepting my login.

Try your @mozilla.com Google account for login, that should work. I hope we'll be able to make publicly visible dashboards out of this in the future but the system there seems to not support that at this time.
KaiRo, my understanding is that crash-stats data for GMP crashes is pretty low, but is there any discernible pattern in GMP crash reports starting around 25-Feb on Nightly/Aurora or on Beta after 46b1 was released?
Flags: needinfo?(kairo)
I've attached the exported CSV for reference (though we don't really expect that data to change).

Note that @mozilla.com users can log into re:dash using their Google Apps accounts - LDAP credentials will not work.
Liz, I don't think we should say "too late for 46" without understanding this better. This is a pretty massive regression and could easily be a sign that we broke video playback for some users. I suggest that we consider this a blocker until we understand it better.
Flags: needinfo?(lhenry)
"Possible release blocker" sounds a lot different than "significant spike".  Who is investigating this? I notice it's assigned to nobody.   

I did not realize this telemetry data was finalized and well understood enough to affect our decisions for release. Is that the case?  My impression was that you all are still testing this system. 

KaiRo: Naturally, when I first looked at this bug this morning I tried my LDAP login, since that is the most incredibly obvious thing to do. However, it did not work.
Flags: needinfo?(lhenry) → needinfo?(benjamin)
RyanVM: I can't access the data you're referring to. I don't see a crash spike in crash-stats.mozilla.com.

Is this crash still happening? We've fixed a crash that was affecting Windows XP users caused by Adobe's GMP not working on machines without SSE2.
Flags: needinfo?(cpearce) → needinfo?(ryanvm)
My understanding is that very few GMP crashes get reported to crash-stats because it requires user opt-in and that submission rate is incredibly low.
(In reply to Chris Pearce (:cpearce) from comment #9)
> Is this crash still happening? We've fixed a crash that was affecting
> Windows XP users caused by Adobe's GMP not working on machines without SSE2.

The fix I'm referring to is bug 1258220, which landed on Beta on 2016-03-28 and on Nightly on 2016-03-25.
I think it's important to mention here that on the graph, the purple part is Windows XP - the crash spike *only seems to affect XP users*.

:cpearce, are you using the "Log In With Google" button mentioned earlier in the bug? That is currently the correct way to log into re:dash.

Although the GMPlugin crash rate lowered quite a bit on March 28, the crash rate was still higher than when the plugin wasn't enabled.
(In reply to Anthony Zhang [:azhang] from comment #13)
> I think it's important to mention here that on the graph, the purple part is
> Windows XP - the crash spike *only seems to affect XP users*.
> 
> :cpearce, are you using the "Log In With Google" button mentioned earlier in
> the bug? That is currently the correct way to log into re:dash.

Ah, that works. Thanks for pointing that out.

 
> Although the GMPlugin crash rate lowered quite a bit on March 28, the crash
> rate was still higher than when the plugin wasn't enabled.

So based on my reading of your CSV file, this shows there were crashes on all Windows versions in build IDs up until about April, and in about April the crashes switched to being pretty much only on Windows XP.

How do we map build ids to Firefox versions? That is, are we seeing crashes in Beta46, or in Nightly 48?
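
For anyone repeating this reading of the CSV, here is a minimal sketch of the tally, not the actual query. The column names `build_id`, `os_version` and `crashes` are assumptions; the exported file may use different headers. It groups crash counts by build month and Windows version, so a shift toward Windows XP (OS version 5.1) would show up as one row dominating the later months.

```python
import csv
import io
from collections import defaultdict


def crashes_by_month_and_os(csv_text):
    """Sum crash counts per (build month, OS version).

    Assumes hypothetical columns: build_id (YYYYMMDDHHMMSS),
    os_version (e.g. "5.1" for Windows XP) and crashes (integer).
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["build_id"][:6]  # YYYYMM prefix of the build ID
        totals[(month, row["os_version"])] += int(row["crashes"])
    return dict(totals)
```

Sorting the resulting keys gives a month-by-month view per OS version.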
Flags: needinfo?(azhang)
Based on reading the source of your query, isn't this data for the nightly channel only?
If I fork the query and modify it to run for beta, the number of crashes goes up significantly:

https://sql.telemetry.mozilla.org/queries/204/source#table

Which isn't surprising, since beta has more users.
(In reply to Chris Pearce (:cpearce) from comment #15)
> Based on reading the source of your query, isn't this data for the nightly
> channel only?

That's correct, but this also occurs on Beta [1].

(In reply to Chris Pearce (:cpearce) from comment #14)
> So based on my reading of your CSV file, this shows there were crashes on
> all windows version in build ids up until about April, and in about April
> the crashes switched to being pretty much only on Windows XP.
> 
> How do we map build ids to Firefox versions? That is, are we seeing crashes
> in Beta46, or in Nightly 48?

The buildIDs map to versions as listed in the release calendar [2]. It does seem to be the case that the crashes used to be for all Windows versions, basically stopped after disabling the Adobe thing, and then started up again for Windows XP after we re-enabled it.

So on [1] you're looking at Nightly 48 for the February 25 spike, and Beta 46 for the March 7 spike (Beta 46 was released on March 7).

Feel free to needinfo me again if you have any questions.

[1]: https://sql.telemetry.mozilla.org/queries/197/source#334
[2]: https://wiki.mozilla.org/RapidRelease/Calendar
Flags: needinfo?(azhang)
That is a pretty big spike. We should probably undo whatever it was we did for Adobe then. Unless that makes things worse somehow, like crashing the entire browser.
Assignee: nobody → cpearce
Rank: 5
Priority: -- → P1
We've decided to disable unencrypted decoding via the Adobe GMP in 46. That should drop the crashes, but also means users on Windows XP and Windows without platform codecs won't get MP4 video playback for another release. :(

bsmedberg: how can we get crash reports for these crashes?
Chris, I'm happy to have my Softvision team do some exploratory GMP testing on WinXP if you can confirm which prefs we'll need to make sure are set.
Flags: needinfo?(ryanvm) → needinfo?(rares.bologa)
We detected a spike in crashes in GMP plugin processes that coincides with
turning on unencrypted decoding via GMP.

This happens on all Windows versions, but particularly on Windows XP.

We also need to revert bug 1234099 so that we don't show the plugin in
the add-on manager.
Attachment #8743044 - Flags: review?(ajones)
Comment on attachment 8743044 [details] [diff] [review]
[Beta46] Disable unencrypted Adobe GMP decoding due to crashes

r? spohl for firefox bits.
Attachment #8743044 - Flags: review?(spohl.mozilla.bugs)
Attachment #8743044 - Flags: review?(ajones) → review+
Attachment #8743044 - Flags: review?(spohl.mozilla.bugs) → review+
I will split out hiding the Adobe GMP on WinXP into bug 1265815 per liz's request so that it's easier to track uplifts.
We detected a spike in crashes in GMP plugin processes that coincides with
turning on unencrypted decoding via GMP.

This happens on all Windows versions, but particularly on Windows XP.
Attachment #8743044 - Attachment is obsolete: true
(In reply to Chris Pearce (:cpearce) from comment #23)
> I will split out hiding the Adobe GMP on WinXP into bug 1265815 per liz's
> request so that it's easier to track uplifts.

Copy-paste error; I meant bug 1265928.
Comment on attachment 8743058 [details] [diff] [review]
[Beta46] Disable unencrypted Adobe GMP decoding due to crashes

Approval Request Comment
[Feature/regressing bug #]: Unencrypted MP4 playback using Adobe Gecko Media Plugin
[User impact if declined]: GMP child process crashes while trying to play MP4 video. Note: Firefox itself won't crash; the user-visible effect is that video will just fail to play.
[Describe test coverage new/current, TreeHerder]: We have a test covering unencrypted decoding.
[Risks and why]: Fairly low; this is a revert to previous behaviour
[String/UUID change made/needed]: None
Attachment #8743058 - Flags: review+
Attachment #8743058 - Flags: approval-mozilla-beta?
Attachment #8743058 - Flags: approval-mozilla-aurora?
Comment on attachment 8743058 [details] [diff] [review]
[Beta46] Disable unencrypted Adobe GMP decoding due to crashes

No longer allowing this as it has caused many crashes on 46 beta and we need to look into why. We are leaving this enabled in nightly though.
Attachment #8743058 - Flags: approval-mozilla-beta?
Attachment #8743058 - Flags: approval-mozilla-beta+
Attachment #8743058 - Flags: approval-mozilla-aurora?
Attachment #8743058 - Flags: approval-mozilla-aurora+
Comment on attachment 8743058 [details] [diff] [review]
[Beta46] Disable unencrypted Adobe GMP decoding due to crashes

Also land this on m-r since we are building RC2 from there.
Attachment #8743058 - Flags: approval-mozilla-release+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #4)
> KaiRo, my understanding is that crash-stats data for GMP crashes is pretty
> low, but is there any discernible pattern in GMP crash reports starting
> around 25-Feb on Nightly/Aurora or on Beta after 46b1 was released?

As comment #10 shows, we have almost no reports at all in crash-stats, and certainly no discernible pattern. I'm pretty sure that the crash prompt for GMP doesn't work usefully, at least in this case; the few reports we do get may very well come only from people who go to about:crashes and click on reports there.
Flags: needinfo?(kairo)
(In reply to Anthony Zhang [:azhang] from comment #17)
> (In reply to Chris Pearce (:cpearce) from comment #14)
> > How do we map build ids to Firefox versions? That is, are we seeing crashes
> > in Beta46, or in Nightly 48?
> 
> The buildIDs map to versions as listed in the release calendar [2].

We may want to work on a database that we can join data with that actually maps build IDs to released versions (there are some thoughts about building a DB like that in general, no project for it yet though), but right now we have to manually find that out via release calendar or other tools.
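
Until such a database exists, the manual lookup can be sketched as follows. This is only an illustration under stated assumptions: the function names and the `version_starts` table are hypothetical, and the table has to be filled in by hand from the release calendar for the channel in question. The only fixed fact used is that a build ID is a 14-digit timestamp of the form YYYYMMDDHHMMSS.

```python
from bisect import bisect_right
from datetime import datetime


def build_date(build_id):
    """Parse a 14-digit build ID (YYYYMMDDHHMMSS) into a datetime."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def version_for_build(build_id, version_starts):
    """Return the version whose start date most recently precedes the build.

    version_starts is a hypothetical, hand-maintained list of
    (start_datetime, version) pairs for ONE channel, sorted by date.
    Returns None if the build predates every entry.
    """
    starts = [start for start, _ in version_starts]
    index = bisect_right(starts, build_date(build_id))
    if index == 0:
        return None
    return version_starts[index - 1][1]
```

A separate table is needed per channel, since the same calendar date maps to different versions on Nightly, Aurora and Beta.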
Flags: needinfo?(rares.bologa) → needinfo?(cosmin.muntean)
Depends on: 1266195
Flags: needinfo?(benjamin)
I reproduced the crash on Nightly 48.0a1 (2016-04-24) on a Windows XP machine with an Intel Pentium 4. When I tried to play an MP4 video (e.g. http://techslides.com/demos/sample-videos/small.mp4), the video failed to play, and I could see crashes in about:crashes, even though Firefox itself did not crash.
This is reproducible with the preferences "media.wmf.enabled" and "media.webm.enabled" set to either true or false.

All crashes have the same signature: [@ mozilla::gmp::GMPChild::ProcessingError]. The same signature was reported in bug 1262377, where Vimeo videos fail to play on Windows XP Intel Pentium 4.
Flags: needinfo?(cosmin.muntean)
Let's disable unencrypted GMP decoding everywhere until we've got crash reporting hooked up; we can't debug this without it. That's happening in bug 1267918.
Depends on: 1267918
(In reply to Chris Pearce (:cpearce) from comment #35)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/40994e384c13

I also took the liberty of disabling this on Firefox 48 (Aurora) since this had slipped through in the last uplift.

Unencrypted decoding via the Adobe Primetime GMP is now disabled everywhere. EME is still working.
https://hg.mozilla.org/mozilla-central/rev/d0dae4dc43e2
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49