Created attachment 294167 [details] [diff] [review] fix I noticed that my dogfood trunk build was running slowly and using a lot of CPU even when I wasn't actively using it. I did some digging and discovered that my GMail window was repainting itself constantly. More digging showed that the window was being invalidated via the following stack: #0 nsChildView::Invalidate (this=0x41d59150, aRect=@0xbfffce08, aIsSynchronous=0) at /Users/roc/mozilla-checkin/mozilla/widget/src/cocoa/nsChildView.mm:1135 #1 0x195532c9 in nsViewManager::UpdateWidgetArea (this=0x41d51830, aWidgetView=0x41d56fe0, aDamagedRegion=@0xbfffcec8, aIgnoreWidgetView=0x0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:855 #2 0x1955312a in nsViewManager::UpdateWidgetArea (this=0x41d51830, aWidgetView=0x4391ebe0, aDamagedRegion=@0xbfffd048, aIgnoreWidgetView=0x0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:838 #3 0x1955312a in nsViewManager::UpdateWidgetArea (this=0x41d51830, aWidgetView=0x4391ded0, aDamagedRegion=@0xbfffd1c8, aIgnoreWidgetView=0x0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:838 #4 0x1955312a in nsViewManager::UpdateWidgetArea (this=0x41d51830, aWidgetView=0x41d41bf0, aDamagedRegion=@0xbfffd348, aIgnoreWidgetView=0x0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:838 #5 0x1955312a in nsViewManager::UpdateWidgetArea (this=0x41d51830, aWidgetView=0x40093e00, aDamagedRegion=@0xbfffd518, aIgnoreWidgetView=0x0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:838 #6 0x195534e9 in nsViewManager::UpdateView (this=0x41d51830, aView=0x44d4fe30, aRect=@0xbfffd594, aUpdateFlags=0) at /Users/roc/mozilla-checkin/mozilla/view/src/nsViewManager.cpp:899 #7 0x191f0b3d in nsPluginInstanceOwner::InvalidateRect (this=0x43972240, invalidRect=0xbfffd688) at /Users/roc/mozilla-checkin/mozilla/layout/generic/nsObjectFrame.cpp:2040 #8 0x432e2e40 in nsPluginInstancePeerImpl::InvalidateRect (this=0x449691d0, invalidRect=0xbfffd688) at /Users/roc/mozilla-checkin/mozilla/modules/plugin/base/src/nsPluginInstancePeer.cpp:879 #9 0x432c4d27 in _invalidaterect (npp=0x44bfb3dc, invalidRect=0xbfffd688) at /Users/roc/mozilla-checkin/mozilla/modules/plugin/base/src/ns4xPlugin.cpp:1294 #10 0x453d3c31 in dyld_stub__Z14CallGetServiceI17nsICaseConversionEjPKcPPT_ () #11 0x453eaf0a in Flash_EnforceLocalSecurity () #12 0x452394a0 in dyld_stub__Z14CallGetServiceI17nsICaseConversionEjPKcPPT_ () #13 0x453e4710 in Flash_EnforceLocalSecurity () #14 0x4527d961 in dyld_stub__Z14CallGetServiceI17nsICaseConversionEjPKcPPT_ () #15 0x453ee5d9 in Flash_EnforceLocalSecurity () #16 0x453da78a in Flash_EnforceLocalSecurity () #17 0x432c7fbd in ns4xPluginInstance::HandleEvent (this=0x44bfb3c0, event=0xbfffd9cc, handled=0xbfffd9c8) at /Users/roc/mozilla-checkin/mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp:1267 #18 0x191f6b0d in nsPluginInstanceOwner::Notify (this=0x43972240) at /Users/roc/mozilla-checkin/mozilla/layout/generic/nsObjectFrame.cpp:3797 #19 0x01372c07 in nsTimerImpl::Fire (this=0x43d4ece0) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsTimerImpl.cpp:403 #20 0x01372e0b in nsTimerEvent::Run (this=0x3fe22150) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsTimerImpl.cpp:487 #21 0x0136eb33 in nsThread::ProcessNextEvent (this=0x29129b0, mayWait=0, result=0xbfffdb84) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsThread.cpp:510 #22 0x013141b3 in NS_ProcessPendingEvents_P (thread=0x29129b0, timeout=20) at nsThreadUtils.cpp:180 So this is plugin "idle events" being delivered to a Flash instance. Flash is then invalidating the entire window on every event (actually a 2000x2000 pixel area). I'm not sure where it gets this area from, especially since the plugin is 100x100, or why this instance (which AFAIK Gmail only uses to play sounds) is refreshing everything. I suspect something's gone badly wrong somewhere. Anyway, it turns out that this plugin instance is actually hidden (as recorded by nsPluginInstanceOwner::mWidgetVisible). In that case we should be free to ignore its invalidation requests, and I'm attaching a trivial patch to do just that. I can't reliably reproduce the bug so I can't test my patch but I rigged up a conditional breakpoint in gdb to have the same effect and it suppressed this bug.
Comment on attachment 294167 [details] [diff] [review] fix works around a nasty, hard to reproduce but potentially high profile bug in GMail/Flash and possibly other sites, although it's a simple and sound patch
Comment on attachment 294167 [details] [diff] [review] fix a=beltzner for 1.9; this is awesome, awesome work, roc. I think you should actually blog about it, as we need more people to know about how you dug into it and figured out the issue.
Just wondering out loud here... Is the invalidated area a function of the plugin's area? If so, I wonder if it's (20*width)x(20*height), where 20 is the old pixels to twips multiplier, which could be another bug (on our end? on their end?). Is it just this one flash app, or do all flash apps request much-larger-than-necessary refresh areas?
I don't know yet, but I'll point out that the old pixels to twips multiplier was a variable, it depended on the user's DPI (for example it was often 15 for me) so I sincerely hope Flash didn't have it hardcoded anywhere.
Could the incorrect invalidate area mentioned in this bug be the reason I generally see very high CPU usage on Flash-using pages on Mac? For example, opening a few New York Times opinion pages will consistently send one of my cores to 100% usage... I'll try a debug build on that site and see what I see, I guess... Michelle, do you happen to know what's going on on the Flash side here?
Hmm. I'm not seeing the 2000x2000 rects being painted with roc's patch, except when I switch tabs (and then I get one per flash instance). So not sure why it's managing to use up so much CPU over here. :(
Fixed on trunk, but are you sure the problem is limited to trunk? Sounds just like complaints about the current release that we hear fairly commonly on MozillaZine.
thanks for blogging about this, roc! I just hit this bug with Firefox 3b2 on http://www.youtube.com/watch?v=G0LtUX_6IXY, which does not have roc's fix. I noticed the spike in cpu. using quartz debug, I could see what roc described: invalidating the entire window. then, running a recent trunk build (with your fix) I don't not see the bug and only the flash plugin is repainting. I'll attach a sample from Firefox 3 b 2 for roc to confirm that what I saw was indeed this bug. Perhaps this is enough to mark this bug as verified?
That sample doesn't tell us quite enough, it doesn't tell us where the repainting is coming from. But it does sound like this patch fixed your bug. You could try backing it out to verify that if you like.
Thanks for eyeballing the sample, Roc. On close inspection, I see that it doesn't show what your "drive by debugging" showed (http://weblogs.mozillazine.org/roc/archives/2007/12/driveby_debuggi.html) I'll leave it to the seasoned QA folks to decide if my comment #10 is enough to verify this bug as fixed. For litmus, a test case could be written involving quartz debug (enabling "flash screen updates") and http://www.youtube.com/watch?v=G0LtUX_6IXY. Any thoughts on how we could create an automated test for this?
We might be able to add an assertion that fires when a plugin tries to invalidate part of the window outside its own bounds. Then we could have a test that would fire that assertion for this bug, and we would catch that when we treat assertions as fatal.
See bug 411635 for this problem occurring with visible CoreGraphics Flash instances.
The Flash Player team has also fixed this problem on the Flash Player side. For SWF files with no drawing elements, we were inaccurately using a default size of 2000x2000, even though the surface is never going to be drawn. Now, we will check if we have a valid surface, and if not, we will ignore the region (not invalidate it). The Firefox side of the fix should be kept in the codebase because I cannot say when a Flash Player with this fix will be released. The Flash Player fix does not affect bug 411635. That is still a problem.
This bug should now be fixed in the Flash Player 10 beta: http://labs.adobe.com/downloads/flashplayer10.html