Open Bug 764157 Opened 8 years ago Updated 7 years ago

Plugins pegged at 100% CPU if initialized while viewing OS X lock screen

Categories

(Core :: Plug-ins, defect, P3)

All
macOS
defect

Tracking Status
firefox14 - ---
firefox15 - ---

People

(Reporter: MattN, Unassigned)

Attachments

(7 files, 3 obsolete files)

I updated to today's Nightly [16.0a1 (2012-06-12)], and while it was restarting I locked my screen (switched to the login window, not sleep) and then walked away to get lunch.

When I returned, the fans on the computer were running and Firefox had been restarted. Activity Monitor shows both Flash and Google Talk Plugin using 100% CPU each. I've been waiting for many minutes and neither of them has died down. I do have pinned tabs that use these two plugins, so their presence is fine, but being pegged at 100% CPU is unusual.
Google Talk NPAPI Plugin

    File: googletalkbrowserplugin.plugin
    Version: 2.9.10.7526

Shockwave Flash

    File: Flash Player.plugin
    Version: 11.3.300.257
    Shockwave Flash 11.3 r300
The first and most obvious question: can you reproduce this consistently?

If so, please give detailed STR.

Over the last few months I've seen several instances of what I think is one or more plugins eating far too much CPU -- enough to make the browser UI unresponsive.  But this was only in-process plugins (on OS X 10.5), and only in FF releases.  And I've never been able to reproduce it.

This may just be plugin bugs (in all or almost all of my cases it was the Flash plugin that ate too much CPU).  Or it may be something we're doing wrong.  Or some combination of the two.

But we need to know a lot more before we try to do anything about it.
Reproducible 100% of the time with the following STR (safe mode and regular mode):

Prereq: Enable fast user switching via System Prefs > Users & Groups > Login Options > "Show fast user switching menu as..." and then you will see the menu by the clock.

1) Pin the tab https://www.adobe.com/products/flashplayer.html
2) Restart Firefox
3) As soon as Firefox begins to start up, switch to the login window using OS X's fast user switching menu and choosing "Login Window..."
4) Wait a bit on the login screen for Firefox to restore the tabs
5) Log in to the same account and notice Flash Player using 100% CPU.

The same STR will cause both the Google Talk Plugin and Flash to use 100% CPU at the same time if you replace the URL in step 1 with mail.google.com, are logged in, and have the Google Talk NPAPI Plugin installed (see the "Call Phone" link in the chat widget).
> 1) Pin the tab https://www.adobe.com/products/flashplayer.html

I can't reproduce this with just one tab pinned.  But I can with five identical tabs pinned.  I imagine that's because it takes a bit longer for Firefox to finish loading five tabs, which gives me enough time to switch to the login window.

Thanks very much, Matthew, for your report.

Now I need to figure out what to do from here :-)

(I wish fast user switching had a keyboard shortcut.  But it doesn't by default, and I can't figure out how to give it one.)
Summary: Flash and Google Talk plugins pegged at 100% CPU each after Nightly update → Plugins pegged at 100% CPU if initialized while viewing OS X lock screen
Note that this bug is more important than its STR make it seem. Yes, very few people will follow exactly those steps. But (see comment #5) I myself have seen this several times over the last few months, and so, I imagine, have many others.
(In reply to Steven Michaud from comment #8)
> Note that this bug is more important that its STR make it seem.  Yes, very
> few people will follow exactly those steps.  But (see comment #5) I myself
> have seen this several times over the last few months, and so I imagine have
> many others.

Yes, I agree that it's probably worthwhile to figure out the cause to see other ways it can be triggered. I also see 100% CPU usage of plugins when not following these STR and since it seems to affect both GTalk and Flash, it likely affects all plugins.  This leads me to believe it's a problem on our side rather than with both plugins.
> it likely affects all plugins

Yes, it does.

I've now reproduced it with an altered copy of my testing plugin from bug 441880.  (I altered it by calling "sleep()" in its NPP_New() function.)

But, as important as this bug is, I've got others that are still more important.  So I'll need to put this aside for a while (hopefully not for long).
Matthew, it'd be really valuable (and save me some time) if you could find a regression range for this bug.

I don't think it's always been with us.
(In reply to Steven Michaud from comment #11)
> Matthew, it'd be really valuable (and save me some time) if you could find a
> regression range for this bug.

Won't have time for this in the next few days. If someone else wants to find the regression range, go ahead.
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=30161b298513&tochange=447556784745

Last good: Firefox 11 from 17 November
First bad: Firefox 11 from 18 November

Used Matthew's STR from comment 6, with four pinned tabs of the same flash site: https://www.adobe.com/products/flashplayer.html

CPU usage stayed around 0.x to 3% in builds before the 18th. Starting with the build from the 18th, CPU usage was consistently over 90% after following the steps (usually around 95-96%).

Cleared the flags for now; if there is still something more to do here, please re-add them.
(In reply to comment #13)

Nothing obvious in this range.

Just guessing, but I'd double-check whether the following has anything to do with this bug:

http://hg.mozilla.org/mozilla-central/rev/c94985276834
Matt Woodrow — Bug 703430 - Cache Azure mac fonts in gfxFontMac. r=jrmuizel
(In reply to Steven Michaud from comment #14)
> http://hg.mozilla.org/mozilla-central/rev/c94985276834
> Matt Woodrow — Bug 703430 - Cache Azure mac fonts in gfxFontMac. r=jrmuizel

Given that suspicion, sending over to Matt.

Matt - can you prepare a try build with bug 703430 backed out for Virgil to test? Thanks!
Assignee: nobody → matt.woodrow
I think bug 671639 is a more likely cause as it's related to plugins on Mac.
(In reply to comment #16)

I missed that one.  It'd also be worth checking.

But since we're all just guessing, best of all if someone does hg bisect over the entire range.
Agreed, my change only affects fonts on Azure canvas, it won't affect plugins at all.
Ran hg bisect, building Firefox on the same Mac. After testing the latest build, revision b201f2434265 turned out to be bad, so the new regression range is:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=30161b298513&tochange=b201f2434265

This rules out bug 703430.

I can delve into this a bit more tomorrow (I had a few warnings and compilation errors that made it harder at first). Once a bug is identified as the culprit, though, I would double-check with a try server build to rule out any local build issues.
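For readers unfamiliar with it, `hg bisect` is essentially a binary search over the changesets in the range: each tested build halves what remains until only the first bad changeset is left. The C++ sketch below illustrates that idea under two assumptions: the regression is monotone (every changeset after the culprit is also bad), and both endpoints have already been confirmed. `firstBadRevision` and the `isBad` predicate (standing in for "build this revision and run the STR from comment 6") are hypothetical names, not part of Mercurial.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical illustration of bisection, not Mercurial's implementation.
// Preconditions: revs.size() >= 2, revs.front() is known good,
// revs.back() is known bad, and badness is monotone along revs.
std::string firstBadRevision(const std::vector<std::string>& revs,
                             const std::function<bool(const std::string&)>& isBad) {
    std::size_t good = 0;               // invariant: revs[good] is good
    std::size_t bad = revs.size() - 1;  // invariant: revs[bad] is bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        // One "build and test" per iteration; the range halves each time.
        if (isBad(revs[mid])) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    return revs[bad];  // first bad revision: its predecessor is good
}
```

With n changesets in the range, this needs roughly log2(n) test builds, which is why bisecting is usually far cheaper than testing every push.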
It must be a regression of Bug 671639.
Assignee: matt.woodrow → bgirard
Blocks: 671639
CGLChoosePixelFormat fails and that causes the cpu to be pegged:
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/nsCoreAnimationSupport.mm#466

I'm tempted to make it abort drawing instead of spinning. There might be a better solution that restarts drawing, but I don't think it's worth investigating at this point.
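To make the trade-off concrete, here is a hypothetical C++ model (not Firefox code) of a refresh loop in which context creation keeps failing. `tryCreateContext` stands in for the CGLChoosePixelFormat()/CGLCreateContext() calls; with the retry policy every tick repeats the failing call, while the abort policy stops after the first failure. Counting attempts is only a rough proxy for wasted CPU.

```cpp
#include <cassert>
#include <functional>

// Simulates `frames` refresh ticks. Each tick tries to create a GL context via
// the hypothetical `tryCreateContext` callback. Returns how many creation
// attempts were made.
int runRefreshLoop(int frames, bool abortOnFailure,
                   const std::function<bool()>& tryCreateContext) {
    int attempts = 0;
    bool gaveUp = false;
    for (int i = 0; i < frames; ++i) {
        if (gaveUp) {
            continue;  // drawing aborted: skip the expensive call entirely
        }
        ++attempts;
        if (!tryCreateContext() && abortOnFailure) {
            gaveUp = true;  // "abort drawing" policy: stop after one failure
        }
        // Without abortOnFailure, the next tick simply retries ("spinning").
    }
    return attempts;
}
```

The cost of aborting is that nothing gets drawn until something resets the state, which is why restarting drawing might be the better long-term fix.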
It seems like the fast login switch has serious bugs. I was able to see my terminal and browser window.
> It seems like the fast login switch has serious bugs.

That wouldn't surprise me.  But that isn't the only way to trigger this bug (I've also seen it with fast user switching off).

If possible, please try to find a workaround that stops this bug's STR from pegging the plugin-container process.

Yes, this is an Apple bug. But they're extremely unlikely to fix it unless it affects "their" products (e.g. Safari), or we can find STR that "work" for multiple browsers. And we're probably very lucky to have the STR we *do* have.
Attached patch Tentative fix (obsolete)
I have this tentative fix, but every time I try it my system hangs at the login screen.

I'm just trying to stop redrawing at the same resolution if drawing at that resolution previously failed. This is useful if the draw allocation is too large.
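The idea behind the tentative fix can be sketched as a small guard that remembers the last size that failed and skips further attempts at that exact size, while still allowing a fresh attempt once the size changes. This is a hypothetical C++ illustration of the approach described, not the code in the attached patch; `FailedSizeGuard`, `drawIfWorthTrying`, and the `draw` callback are illustrative names.

```cpp
#include <cassert>

// Remembers the most recent size at which drawing failed.
class FailedSizeGuard {
public:
    // False if this exact size already failed and should not be retried.
    bool shouldAttempt(int width, int height) const {
        return !(hasFailure_ && width == failedWidth_ && height == failedHeight_);
    }
    void recordFailure(int width, int height) {
        hasFailure_ = true;
        failedWidth_ = width;
        failedHeight_ = height;
    }
    void recordSuccess() { hasFailure_ = false; }

private:
    bool hasFailure_ = false;
    int failedWidth_ = 0;
    int failedHeight_ = 0;
};

// Per-frame use: only call the (hypothetical) draw callback when it might succeed.
template <typename DrawFn>
bool drawIfWorthTrying(FailedSizeGuard& guard, int width, int height, DrawFn draw) {
    if (!guard.shouldAttempt(width, height)) {
        return false;  // same size failed before: skip instead of spinning
    }
    if (draw(width, height)) {
        guard.recordSuccess();
        return true;
    }
    guard.recordFailure(width, height);
    return false;
}
```

Keying on the size means a resize (for example, to a smaller, cheaper allocation) still gets a fresh attempt, while an identical request that already failed costs nothing.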
Attached patch patch (obsolete)
Attachment #635942 - Attachment is obsolete: true
Attached patch patch (obsolete)
Add a better comment
Attachment #635963 - Attachment is obsolete: true
Attached patch patch v2
Fixes the problem. To try it, also apply the other patch, which simulates the bug.
Attachment #635964 - Attachment is obsolete: true
Attachment #635968 - Flags: review?(smichaud)
Thanks, Benoit!

If you don't mind, I'll put off doing the review until Monday :-)
Not at all, it's a bit of an oddball case so I'm not in a rush to land it. Still nice to fix.

I didn't manage to test the problem with this patch (every time I tried, 10.7 froze while switching users), but I'm confident my simulation patch does the right thing.
Since bug 671639 has been in Firefox since 11, untracking for release. We would consider uplifting to aurora/beta when completed, however.
(In reply to Benoit Girard (:BenWa) from comment #28)
> Created attachment 635968
> patch v2

This patch fixes the 100% CPU usage for me but not the fact that the plugins don't get drawn at all after unlocking the computer, so it's still a bad experience.  I didn't include this in the original STR since I first experienced this with invisible plugins (Flash and Google Talk in mail.google.com) so I didn't notice.  

If this is a different problem then we can move it to a different bug if you would like. (It has the same STR as comment 6 but there should be a step 6 to verify that the plugin content is visible after unlocking.)

I hit the following assertion while testing this patch with Flash and Java app tabs: 
###!!! ASSERTION: nsCARenderer::Render failure: 'Not Reached', file /Users/matthew/mozilla-central/dom/plugins/base/nsPluginInstanceOwner.cpp, line 1557

BenWa, I never see those glitches when locking the computer with multiple monitors on 10.7.4, and I lock it every time I leave my computer.
Comment on attachment 635968
patch v2

> CGLChoosePixelFormat fails and that causes the cpu to be pegged:
> http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/nsCoreAnimationSupport.mm#466

What I see is failures at CGLCreateContext() (a few lines down).  Once they start happening they keep happening, over and over.  This is (apparently) what pegs the CPU.

This is surely Apple's bug.  But maybe we can work around it by never making the call to CGLChoosePixelFormat() or CGLCreateContext() under the conditions that trigger it.  I don't much like leaving the plugin undrawn.

Later this week I'll try to find out what those conditions might be -- whether something is invisible that shouldn't be, whether we're on the wrong screen, and so forth.  If I do find them, then maybe we can do something like my patch for bug 752294.
Another option would be to disable this only when a debugger is attached:
http://developer.apple.com/library/mac/#qa/qa1361/_index.html

This and safe mode really hurt debugging turnaround time.
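Apple's Technical Q&A QA1361 detects an attached debugger by asking sysctl() for the current process's kinfo_proc and testing the P_TRACED flag. A sketch of that check follows. The Linux branch (reading TracerPid from /proc/self/status) is an addition so the sketch also compiles and runs off macOS, and the function name `isDebuggerAttached` is hypothetical. How such a check would be wired into the plugin drawing code is not specified in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns true if a ptrace-style tracer (e.g. a debugger) is attached.
bool isDebuggerAttached() {
#if defined(__APPLE__)
    // QA1361 approach: query this process's kinfo_proc and test P_TRACED.
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, getpid()};
    struct kinfo_proc info;
    info.kp_proc.p_flag = 0;
    size_t size = sizeof(info);
    if (sysctl(mib, sizeof(mib) / sizeof(*mib), &info, &size, nullptr, 0) != 0) {
        return false;  // query failed: assume no debugger
    }
    return (info.kp_proc.p_flag & P_TRACED) != 0;
#elif defined(__linux__)
    // Linux fallback (not from QA1361): a nonzero TracerPid means traced.
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("TracerPid:", 0) == 0) {
            return std::stoi(line.substr(10)) != 0;  // stoi skips the tab
        }
    }
    return false;
#else
    return false;  // unsupported platform: assume no debugger
#endif
}
```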
Attachment #635968 - Flags: review?(smichaud)
Sorry for the long wait reviewing your patch, Benoit.

I've gone back to this bug several times, looking for a better workaround, but I still haven't found one.
No problem, just clearing the review queue. Feel free to look at the patch if you decide it's a good approach.
Assignee: bgirard → nobody
nsCoreAnimationSupport.mm is gone. Steven, is this still relevant?
Flags: needinfo?(smichaud)
> http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/nsCoreAnimationSupport.mm#466

This code has moved, and is now here:

http://mxr.mozilla.org/mozilla-central/source/gfx/2d/QuartzSupport.mm#559

So I assume nothing has changed, and this bug still happens.

I'm now less concerned about it than I was when I wrote comment #8 -- I only ever saw those problems on OS X 10.5, running in 32-bit mode and with plugins running in-process.

I still hope to get back to this, eventually.  But it may be quite a while before I have the time.
Flags: needinfo?(smichaud)
Priority: -- → P3