Closed Bug 671639 Opened 13 years ago Closed 13 years ago

Crazy flashing with Google Plus hangout

Categories

(Core Graveyard :: Plug-ins, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox10-)

RESOLVED FIXED
mozilla11
Tracking Status
firefox10 - ---

People

(Reporter: joe, Unassigned)

References

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

The Google Plus "hangout" feature lets you use the Google Talk NPAPI plugin to video chat with your friends. Unfortunately, on OS X at least, video flashes (black to content and back) very rapidly.

This is not the case when doing a video chat from within gmail itself.
Is this with today's (July 14th) nightly? Does it reproduce with yesterday's nightly?
I just tested with both and only see flashes if I force redraw by resizing the window, otherwise I get a frozen frame. What's probably going on here is user agent sniffing failing like quicktime.
Yes, this only happens with today's nightly, not yesterday's -> almost certainly a regression with async plugin drawing!
I find it unsettling that I'm seeing no frames produced rather then flashing like Joe.

I did confirm that the plugin is using InvalidatingCoreAnimation drawing model as expected, however I see the plugin asking to redraw an empty region over and over again:
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
1887726752[100116330]: NPN_InvalidateRect: npp=1423a1e48, top=0, left=0, bottom=0, right=0
Do these match up with what you've got installed?

    File: googletalkbrowserplugin.plugin
    Version: 2.1.9.3062
    Version 2.1.9.3062

    File: npgtpo3dautoplugin.plugin
    Version: 0.1.44.5
    Google Talk Plugin Video Accelerator version:0.1.44.5
D'oh! I used to always ask for plugin versions.

Google Talk NPAPI Plugin

    File: /Library/Internet Plug-Ins/googletalkbrowserplugin.plugin
    Version: 2.0.4.2108
    Version 2.0.4.2108

MIME Type 	Description 	Suffixes
application/googletalk 	Google voice and video chat 	googletalk
Google Talk Plugin Video Accelerator

    File: /Library/Internet Plug-Ins/npgtpo3dautoplugin.plugin
    Version: 0.1.43.7
    Google Talk Plugin Video Accelerator version:0.1.43.7

MIME Type 	Description 	Suffixes
application/vnd.gtpo3d.auto 	Google Talk Plugin Video Accelerator Type
I updated the plugins and can now reproduce the issue. It must be related to the way I implemented double buffering. I wonder why we only see it in this instance. I will investigate this.

In the meantime I wonder if we should get google talk plugin added to:
http://www.mozilla.com/en-US/plugincheck/
I get the following errors in the console:
objc[24683]: Class CrApplication is implemented in both /Users/benoitgirard/mozilla/mozilla-central/builds/obj-ff-optuni/x86_64//dist/universal/firefox/Nightly.app/Contents/MacOS/XUL and /Library/Internet Plug-Ins/npgtpo3dautoplugin.plugin/Contents/MacOS/npgtpo3dautoplugin. One of the two will be used. Which one is undefined.
objc[24683]: Class NoOp is implemented in both /Users/benoitgirard/mozilla/mozilla-central/builds/obj-ff-optuni/x86_64//dist/universal/firefox/Nightly.app/Contents/MacOS/XUL and /Library/Internet Plug-Ins/npgtpo3dautoplugin.plugin/Contents/MacOS/npgtpo3dautoplugin. One of the two will be used. Which one is undefined.
objc[24683]: Class TaskOperation is implemented in both /Users/benoitgirard/mozilla/mozilla-central/builds/obj-ff-optuni/x86_64//dist/universal/firefox/Nightly.app/Contents/MacOS/XUL and /Library/Internet Plug-Ins/npgtpo3dautoplugin.plugin/Contents/MacOS/npgtpo3dautoplugin. One of the two will be used. Which one is undefined.
[0719/135441:INFO:/b/build/slave/opt-mac-gtv/build/o3d/core/cross/features.cc(136)] Smooth Texture Updates = 1 render_mode_ = 1
[0719/135441:INFO:/b/build/slave/opt-mac-gtv/build/o3d/core/cross/features.cc(136)] Smooth Texture Updates = 1 render_mode_ = 0
plugin-container(24683,0xb030b000) malloc: *** error for object 0x32252ac: pointer being freed was not allocated
Attached file Plugin frames
The problem we are seeing seems to be coming from the frames we are receiving from the plugin. I checked to see if the bad frames were coming from a particular buffer, since we are double buffering, but that is not the case.

I dumped the frame we are receiving from as soon as they are rendered and we get dark frames and tearing. It appears that the frames are not properly constructed and the plugin returns to the event loop with partially updated frames where our paint message can see the partial frame. In any case it appears that the plugin does not sync its drawing correctly. (See attachments for partial frames and frames with tearing).

Async plugin rendering might just have just changed the timing causing this issue to show up. A notable difference with async rendering is that we push a draw event to the message loop of the plugin instead of making an ipc call and putting the message on the browser's message loop.

For the record the plugin is using a CAOpenGLLayer with the asynchronous option.
(In reply to comment #9)
> with the asynchronous option.
I'm not sure if this is true looking at the traces again.

Do we have a contact with the Google Talk plugin to discuss what expectation they have to sync rendering? Perhaps they relied on unspecified internal behavior that was changed with Async rendering and not guaranteed by NPAPI. This would explain why it broke with async rendering and why no other plugins are affected by the change.
Hi guys, I'm one of the engineers who works on this plugin. This plugin is actually open-source--it is known by the name "O3D" and you can browse the code at http://src.chromium.org/svn/trunk/o3d/. The OS X NPAPI glue code mostly lives in http://src.chromium.org/svn/trunk/o3d/plugin/mac/.

In our normal hardware-accelerated OpenGL mode (there is a software fallback mode too), all drawing is done within the drawInCGLContext method of our CAOpenGLLayer (http://src.chromium.org/svn/trunk/o3d/plugin/mac/o3d_layer.mm), so it's not clear to me how FF could be seeing an incomplete frame. Though if the new FF code does not wait for all pending GL operations in the CGLContext to complete then that would probably explain the observations. You may need to add a call to glFinish().

We do see a regression starting in FF4 where resizing the plugin sometimes puts it into a state where no video is rendered and FF instead displays a black area with a white rectangle in it. Another Google engineer asked about this issue on one of the mailing lists but I don't think it got any traction. Hopefully now that we have been connected with one another we can resolve both issues. :)

Regarding the InvalidateRect with an empty region, that's very odd ... All of our NPN_InvalidateRect calls come from http://src.chromium.org/svn/trunk/o3d/plugin/mac/plugin_mac.mm which always invalidates the entire plugin region in Firefox, so for an empty invalidate to happen we must have gotten a previous NPP_SetWindow() call with a width and height of 0.
I should also mention that if you want to build the plugin yourself the instructions are at http://code.google.com/p/o3d/wiki/HowToBuild. The public version is branded as "O3D" instead of "Google Talk Plugin Video Accelerator" but you can force Google+ to use the "O3D" one by just disabling the other plugin in FF.
Thanks Tristan. Having access to the source will make this problem much easier to find.

(In reply to comment #11)
> Though if the new FF
> code does not wait for all pending GL operations in the CGLContext to
> complete then that would probably explain the observations. You may need to
> add a call to glFinish().

The browser does not have access to the CGLContext of the plugin. The plugin only passes an abstract CALayer. With the changes for async plugin painting the delay between invalidation and the draw call is much smaller since we no longer wait on interprocess messages to paint the layer. Perhaps this timing change doesn't give enough time for the GL pipeline to be emptied without an explicit glFlush before FF tries to paint the layer.

Just a guest ATM, have to look at the source first.

> 
> We do see a regression starting in FF4 where resizing the plugin sometimes
> puts it into a state where no video is rendered and FF instead displays a
> black area with a white rectangle in it. Another Google engineer asked about
> this issue on one of the mailing lists but I don't think it got any
> traction. Hopefully now that we have been connected with one another we can
> resolve both issues. :)

I've seen similar issues with Flash, this was caused by CALayer having a default animation property. Since neither the plugin or the browser expected the animation InvalidateRect were never made to account for it while resizing causing visual glitches. The latest trunk builds now overwrite the layer animation property so that may have solved the issue. 
We should file a follow up for that issue.
> 
> Regarding the InvalidateRect with an empty region, that's very odd ... All
> of our NPN_InvalidateRect calls come from
> http://src.chromium.org/svn/trunk/o3d/plugin/mac/plugin_mac.mm which always
> invalidates the entire plugin region in Firefox, so for an empty invalidate
> to happen we must have gotten a previous NPP_SetWindow() call with a width
> and height of 0.

Perhaps that was faulty logging, I'll double check that.
(In reply to comment #13)
> longer wait on interprocess messages to paint the layer. Perhaps this timing
> change doesn't give enough time for the GL pipeline to be emptied without an
> explicit glFlush before FF tries to paint the layer.

Well we *do* call glFlush(), just not glFinish(). (AFAIK drawInCGLContext is not supposed to.)

> I've seen similar issues with Flash, this was caused by CALayer having a
> default animation property. Since neither the plugin or the browser expected
> the animation InvalidateRect were never made to account for it while
> resizing causing visual glitches. The latest trunk builds now overwrite the
> layer animation property so that may have solved the issue. 
> We should file a follow up for that issue.

Hmm, we do some stuff with the "actions" property to prevent some undesired default animations. Perhaps that is leading to this ...
Benoit, if there is any other information we could provide that would help you resolve this, don't hesitate to ask.
(In reply to comment #15)
> Benoit, if there is any other information we could provide that would help
> you resolve this, don't hesitate to ask.

Thanks, I haven't had time to build the plugin. I did investigate a bit more by debugging firefox and looking at the source provided. It seems that when we do SetWindow gecko stops receiving InvalidateRect message. This is what causes the rendering to stop. It would be great if you could investigate this issue. Let me know if you think the API implementation is doing something unexpected.

I looked through the code to see if there was obvious concurrency race that could cause flickering but did not see any. I have investigate past the process boundary because the indirection was non-trivial to follow without a build. I was however able to make the problem disappear by delaying the draw message by 100ms in the message loop. This seems to indicate that there is a race condition that's specific to the interaction between Gecko and the Google talk plugin.
(In reply to Tristan Schmelcher from comment #15)
> Benoit, if there is any other information we could provide that would help
> you resolve this, don't hesitate to ask.

Hi Tristan,

Have you had a chance to investigate the issue from the perspective of the plugin? The attachment in the bug shows the frames we are receiving from the plugin that are incomplete. Can you confirm that the frames the plugin is painting to the core animation layer are complete when using the nightly (about to merge to our aurora channel):
http://nightly.mozilla.org/
Sorry, your previous comment got lost in my inbox. I haven't looked into this issue and right now all the engineers that know this code are tied up. :/ I'll see if I can find someone to investigate this.

I would be extremely surprised if the frames painted into the layer are not complete by the end of drawInCGLContext. Certainly all the GL calls to draw it have been issued by then.
I'm seeing this as well, both in 8.0 (today) and with a more-or-less up-to-date Aurora on 9/26.  At that point, I had plugin: Version: 2.3.2.3851, accelerator: Version: 0.1.44.11.  I'm on OS X 10.7.1 on a new mini.

For me, the flashing occurs at the "check your hair" stage, and then, perhaps unrelated, my system hard-locks within a few seconds of going to the actual "hangout" stage, long before anyone else can join the hangout.  Both effects are reproducible (although obviously it's pretty annoying to do so!), so if you'd like data from the hard-lock, let me know.

The plugin works fine in Chrome on the same system, and also works fine for me on Firefox on a 10.6.8 laptop.
Dustin, can you try with the latest 8beta? We backed out async plugin painting which we think caused this issue.
Success!  No flicker, no hard-lock.

Thanks!
The following series of patches does double buffering using 2 IOSurface that are swapped between the same CARenderer+CALayer. Previously we used a pair of IOSurface+CARenderer and swapped the CALayer between them.
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
(In reply to Tristan Schmelcher from comment #18)
Thanks for the information you provided Tristan, it was very helpful in resolving this problem!
Attachment #571700 - Flags: review?(matt.woodrow)
Attachment #571703 - Flags: review?(matt.woodrow)
Moving tracking from 8 to 10 as per async feature postponement in bug 663259.
(In reply to Benoit Girard (:BenWa) from comment #24)

No problem, glad you were able to resolve it!

Based on your investigation, is there anything our plugin was doing wrong that we should change?
(In reply to Tristan Schmelcher from comment #26)
> (In reply to Benoit Girard (:BenWa) from comment #24)
> 
> No problem, glad you were able to resolve it!
> 
> Based on your investigation, is there anything our plugin was doing wrong
> that we should change?

Not then I can see. It's still a mystery to me why GoogleTalk was the only affected plug-in (Flash, Quicktime worked fine).
Comment on attachment 571700 [details] [diff] [review]
Part 1: AttachIOSurface now updates the Framebuffer instead of recreating it

Review of attachment 571700 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/nsCoreAnimationSupport.mm
@@ +671,5 @@
> +      ::glLoadIdentity();
> +      ::glOrtho (0.0, width, 0.0, height, -1, 1);
> +
> +      ::glTranslatef(0.0f, height, 0.0);
> +      ::glScalef(1.0, -1.0, 1.0);

Most of this code looks to be more or less identical to SetupRenderer, can we share more of it inside SetBounds?
Attachment #571700 - Flags: review?(matt.woodrow) → review+
Comment on attachment 571703 [details] [diff] [review]
Part 2: Double buffer using a single CARenderer

Review of attachment 571703 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginUtilsOSX.mm
@@ +347,5 @@
>    if (!mFrontSurface) {
>      return false;
>    }
>  
> +  nsRefPtr<nsIOSurface> ioSurface = nsIOSurface::LookupSurface(mFrontSurface->GetIOSurfaceID());

Why are we creating a new nsIOSurface here instead of just using mFrontSurface?
No longer copying nsIOSurface
Attachment #571703 - Attachment is obsolete: true
Attachment #574698 - Flags: review?(matt.woodrow)
Attachment #571703 - Flags: review?(matt.woodrow)
Attachment #574698 - Flags: review?(matt.woodrow) → review+
https://hg.mozilla.org/mozilla-central/rev/6451c16af7b0
https://hg.mozilla.org/mozilla-central/rev/c4dbd7dcc810

Should this be moved to core/Plugins so we can set the TM correctly?
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee: bgirard → nobody
Component: Google Talk → Plug-ins
Product: Plugins → Core
QA Contact: google-talk → plugins
Target Milestone: --- → mozilla11
Version: unspecified → Trunk
From https://bugzilla.mozilla.org/show_bug.cgi?id=663259#c62 it sounds like this shouldn't be an issue for FF10.
Depends on: 764157
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: