Closed Bug 579262 Opened 14 years ago Closed 14 years ago

Some bad drawing cases with plugin still sneak through

Categories

(Core :: Layout, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta3+

People

(Reporter: matjk7, Assigned: roc)

References

()

Details

(Whiteboard: [Input])

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; Windows NT 6.1; en-US; rv:2.0b2pre) Gecko/20100715 Minefield/4.0b2pre
Build Identifier: Mozilla/5.0 (Windows; Windows NT 6.1; en-US; rv:2.0b2pre) Gecko/20100715 Minefield/4.0b2pre  Built from http://hg.mozilla.org/mozilla-central/rev/2bfc4dd54872

Sorry for not noticing this on latest try-server builds, but it looks like bad rendering of plugin content still happens sometimes. This should be fixed asap since retained layers will be in beta2. STR is not really reliable, sometimes it happens for no apparent reason and other times I can reproduce it perfectly.

Reproducible: Sometimes

Steps to Reproduce:
1.Open a flash video with old youtube video controls (i.e http://www.youtube.com/watch?v=QV8eiSA4vqc&feature=related)
2.Click on the expand/reduce button twice
Actual Results:  
Web content is drawn on top of plugin area.

Expected Results:  
Plugin content is drawn on top of web content.
Blocks: 564991
blocking2.0: --- → ?
Version: unspecified → Trunk
Yeah, this is ugly.  Basically here is we're seeing. the plugin is changing both size and position but the plugin image stays the same.  When going to a larger size, it doesn't repaint correctly until a tab switch.  When going back to the original size, it repaints quickly without a tab switch event.
Ok it looks like whenever the plugin area is resized but the content isn't active the browser doesnt repaint the new plugin area properly, causing web content to overlap it. This happens with any plugin and not just flash ( https://bugzilla.mozilla.org/show_bug.cgi?id=564991#c229 ) . For some reason, disabling OOPP fixes this. Disabling OOPP also fixes https://bugzilla.mozilla.org/show_bug.cgi?id=545892 and possibly other bugs ( https://bugzilla.mozilla.org/show_bug.cgi?id=579676 ). Could OOPP be turned off by default for beta2 (with a release note saying that it will be turned on again when those bugs are fixed) since code freeze is on Monday and chances are this bugs won't be fixed by then?
I bashed away at this bug most of the day but couldn't figure it out. Backing out Part 36 of the retained layers patches (changeset 626c70cb1db5), and calling nsRootPresContext::UpdatePluginGeometry() explicitly after we've instantiated the plugin, thus giving us almost exactly the pre-retained-layers behaviour, didn't fix the problem. The strange thing is that these are windowed plugins so mostly we're unable to affect their rendering no matter what we do.

I thought this would be an invalidation problem of some kind, but when I load and reload http://www.youtube.com/watch?v=QV8eiSA4vqc&feature=related, I see a spinner in the middle of the window that is only partially painted over several frames, so it's not just an invalidation problem. If I pause the browser and plugin container while that spinner is animating, the sizes of the HWNDs in our process and in the plugin container are both correct. I can't figure out why only part of the plugin would be painted.

I wonder if it's a Flash bug that we're triggering somehow.
Hmm, but I can reproduce it --- or at least a related bug --- with a very simple testcase!

<embed type="application/x-test" style="position:fixed; width:100%; height:100%; left:0; top:0"></embed>
Hmm, but that's windowless and the youtube testcases are windowed.
Aha! If I open a Youtube video in a pop-out using the third button from the right in the new controls (e.g. http://www.youtube.com/watch?v=lx4fc9oRerw), then resizing the popup sometimes fails to change the size of the plugin properly. This is easiest to see as the window gets larger. It's clear that the problem is we do a reflow, call NPP_SetWindow OK, but for some reason we're not getting around to invalidating the plugin window in nsWindow::Resize.
Assignee: nobody → roc
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
And that's because sometimes GetPluginGeometryUpdates is not returning any configurations.
Attached patch FixSplinter Review
This doesn't fix all the problems I see, but it's certainly an improvement.
Attachment #458325 - Flags: review?(tnikkel)
Comment on attachment 458325 [details] [diff] [review]
Fix

Some tabs and/or weird indentation snuck in. Please fix that before landing.
Attachment #458325 - Flags: review?(tnikkel) → review+
Keywords: checkin-needed
Whiteboard: [needs landing]
blocking2.0: ? → beta2+
http://hg.mozilla.org/mozilla-central/rev/10f6355b8475

I fixed one big problem here. I think there are still others. Please file new bugs on them, blocking bug 564991. Thanks!!!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing]
This isn't fixed at all (in fact it happens more often now). Built from http://hg.mozilla.org/mozilla-central/rev/d1845b52cfd0 . Using the same STR from description now it happens as soon as I open the video, even before resizing, and when resizing it happens again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The same problem happen on youtube videos that have video playing, and scrolling seems to cause it go away, ie as everything outside of the actual the video where its black doesn't get set till scrolling occurs.
That's not just after the fix; I'm seeing that using the nightly from yesterday, too.

Trying to determine if I think we should hold b2 for this. Roc says that it will require some investigation :/
Can't you ship b2 with OOPP turned off? It would fix this issue and all other OOPP related bugs (duh) and give time to find a proper fix to this. Releasing b2 with more regressions than b1 doesn't make much sense to me, plus the biggest motivation for OOPP was to prevent crashes, but the user base for betas is really small comparing to release channel, not to mention that crashes are likely to happen with betas anyway.
I just think releasing b2 with a release note saying that OOPP will be turned off until major issues are fixed is less bad than releasing it with plugins malfunctioning.
OK, did some quick testing with a build with the latest change and this is what I'm seeing:

 - before anything's in the cache, on first load, the video UI looks fine
 - once something's in the cache, the video UI doesn't draw until you hover over it
 - the resize bug is still there, visible on first resize, goes away when you scroll

Clearing the cache works as a way when testing to get it back to the "looks fine" state.

I don't think we can ship with YouTube broken like this, sadly :(

Screenshot of what happens after you have the site in cache and load another YouTube video: http://grab.by/5vpl
(this has nothing to do with OOPP)
(oh, maybe it does - still, we're not going to disable OOPP in betas, no)
(In reply to comment #7)
> Aha! If I open a Youtube video in a pop-out using the third button from the
> right in the new controls (e.g. http://www.youtube.com/watch?v=lx4fc9oRerw),

This video works good, but this video covers the entire plugin edges.  Is this what was fixed?  

I guess we're seeing issues with the URL posted at the top of bug where the video playing size is smaller than the entire plugin area.
Moving this to beta3 - adding relnote. It's ugly, but not so ugly that it prevents people from being able to use the browser. Switching applications/tabs redraws the entire region correctly. Need to fix right quick, though.
blocking2.0: beta2+ → beta3+
Keywords: relnote
I thought there might be problems with the ordering of NPP_SetWindow, resizing the plugin widget, and painting, but that doesn't seem to be it. Even if I force NPP_SetWindow to happen after resizing the plugin widget and also make our plugin-container process invalidate and repaint the plugin window after each NPP_SetWindow call, the problems still happen.

Unfortunately I don't seem to be able to reproduce the problem within the VMWare Windows XP VM that I use for record-and-replay at the office. Strangely, I can easily reproduce the problem in my VMWare Fusion Windows XP VM on my laptop.

I bisected within the retained-layers landing and found that the problem starts happening at Part 15:
http://hg.mozilla.org/mozilla-central/rev/a78221e8bde4
That is very weird; retaining BasicLayers contents should only mean that we do less repainting in the browser. But just maybe it's possible we end up overpainting the plugin window or something like that. I'll dig into that more tomorrow.
(In reply to comment #21)
> That is very weird; retaining BasicLayers contents should only mean that we do
> less repainting in the browser. But just maybe it's possible we end up
> overpainting the plugin window or something like that. I'll dig into that more
> tomorrow.
Could it be exactly the opposite? I noticed that this bug also happens on youtube videos with the new player if you basically try to make it happen (resize repeatedly, scroll up and down quickly, etc.) but only on the inactive parts of the plugin area. And since this only happens with OOPP turned on, my guess is that:
1-When firefox runs flash in the same process, everytime the flash content is active, firefox repaints the whole plugin area.
2-When flash runs in another process, the plugin-container.exe is responsible for drawing on the entire plugin area, but it only paints the active content (the video itself, the progress bar/time on the video). Notice that videos with black bars show web content there when resized, but videos where the content fills the whole plugin area don't.
If my guess is somewhat accurate then you should be able to fix this by forcing the plugin-container to repaint the whole plugin area and not just the active content (pre-retained layers behavior), or by making the firefox process aware of plugin activity and triggering a repaint of the whole plugin area when it start/stops, to prevent issues like this from happening again.
I got record+replay working and did a lot more investigation. It seems Flash paints in two ways: full repaints using occasional BeginPaint/EndPaint, and partial updates using GetDC/ReleaseDC. In both cases the actual work is done using a BitBlt. The partial updates work, in that the BitBlt causes the blitted content to successfully appear on the screen. The BeginPaint/EndPaint updates do not work. Flash tries to blit to the DC returned by BeginPaint but nothing changes on the screen.

So the question is, how does enabling retaining layer contents make Flash's BeginPaint/EndPaint return a broken DC???
I thought maybe we might be doing something bad to the DC pool, but the bug still happens with D2D drawing, which doesn't use DCs, so that's probably not it.

(In reply to comment #22)
> 2-When flash runs in another process, the plugin-container.exe is responsible
> for drawing on the entire plugin area, but it only paints the active content
> (the video itself, the progress bar/time on the video). Notice that videos with
> black bars show web content there when resized, but videos where the content
> fills the whole plugin area don't.
> If my guess is somewhat accurate then you should be able to fix this by forcing
> the plugin-container to repaint the whole plugin area and not just the active
> content (pre-retained layers behavior), or by making the firefox process aware
> of plugin activity and triggering a repaint of the whole plugin area when it
> start/stops, to prevent issues like this from happening again.

I actually tried that earlier but it didn't help. You're quite right that the "inactive" parts of the plugin aren't painted properly; those are the parts that are supposed to be painted by BeginPaint/EndPaint, which isn't working.
Note, I tried replacing the "if (!IsRetained())" with "if (1)" in BasicThebesLayer::Paint, and that made the problem go away. So it's definitely something to do with ThebesLayerBuffers.

A note to myself of things I should try tomorrow:
-- Try using CreateOffscreen instead of CreateSimilarSurface to create the ThebesLayerBuffer's buffer
-- Add code so that we can dump DC state (clip region etc) and call that code from the debugger every time we do a BeginPaint in the plugin container.
-- Try making only certain layers retained, to see which layer(s) being retained triggers the bug.
While poking around in the debugger I noticed that we got an assertion about mutation listeners firing during paint. The stack showed that we were in nsObjectFrame::PaintPlugin doing our PrintWindow fallback path to try to render the plugin into a retained ThebesLayer! Now that shouldn't be happening, it's a waste of time since the plugin renders in its own window already. Indeed, nsObjectFrame has code to suppress such rendering if the rendering is going to end up on the screen. That code checks the gfxContext::FLAG_DESTINED_FOR_SCREEN flag on the target context. But of course with retained layers we're rendering to a buffer and the context we created doesn't have that flag set on it! And if I set that flag on the context returned from ThebesLayerBuffer, presto this bug goes away!

It appears that the PrintWindow call is triggering some weird/buggy Windows behviour. I know that PrintWindow issues a WM_PAINT event to windows in the window subtree, which will trigger BeginPaint in the OOP plugin window, so it's not surprising that it could interfere with BeginPaint somehow.

Anyway, clearly the BasicLayers code needs to be propagating that flag around. We need to propagate it through our backbuffer logic as well as when we draw into ThebesLayerBuffers. That's a no-brainer fix for this bug.

However, there is still a problem because we could still be doing PrintWindows in response to drawWindow requests or actual printing. That might trigger similar problems to this bug. Maybe we should move away from using PrintWindow, or modify our OOPP code so the PrintWindow is triggered on the plugin-container side. I'll file a new bug.
Chris, Bas: you should check your GL/D3D layers code to make sure FLAG_DESTINED_FOR_SCREEN is set appropriately on contexts you pass to the DrawThebesLayer callback, so you don't get burned by this bug.
Whiteboard: [needs review]
Note that this patch will speed up rendering too, since it will cut out a lot of synchronous PrintWindow calls.
Filed bug 581247 on the remaining issues described in comment #26.
Keywords: checkin-needed
Whiteboard: [needs review] → [needs landing]
http://hg.mozilla.org/mozilla-central/rev/7e78b814d00b
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing]
Confirming fixed on the youtube example at least ( Built from http://hg.mozilla.org/mozilla-central/rev/30239e4cebd8 ). 
There's one bug that is somewhat related to this and I was hoping it would be fixed by this patch since you mentioned it would speed up rendering. 
STR: open a youtube video you can see the fps counter (right click->show video info), scroll to the bottom of the page and then back to the top, notice the frame rate dropped and the video stutters. 
For some reason if you hover over the video the frame rate quickly goes up (above the video's average fps). I think this is a regression from retained layers. Should I file a new bug or would bug 581247 fix this?
File a new bug please.
(In reply to comment #35)
> File a new bug please.

Bug 581725
I don't see how my filed bug 583058 is duplicate of this bug. My bug is about content outside the flash area, in the only event when scrolling up is done. I guess that the screenshot https://bug583058.bugzilla.mozilla.org/attachment.cgi?id=461329 misled because a bit of the flash player itself is not repainted.

Interestingly, while I started to write this comment and scrolled up this page https://bugzilla.mozilla.org/show_bug.cgi?id=579262, multiple lines like the following appear at the top of the window, which is exactly what my bug is about. And in this page there's no flash. I couldn't reproduce it.

   > File a new bug please.

   Bug 581725
Lost of user reports about this - http://input.mozilla.com/en-US/search/?q=youtube+black&product=firefox

So, I'm glad it got fixed for b3.
Whiteboard: [Input]
(In reply to comment #29)
> Chris, Bas: you should check your GL/D3D layers code to make sure
> FLAG_DESTINED_FOR_SCREEN is set appropriately on contexts you pass to the
> DrawThebesLayer callback, so you don't get burned by this bug.

Is there a bug for tracking the D2D fix (since it's still broken in b3)?
(asking as couldn't find a dependency)
Just tried with B5 and problem is still there. Most of the times it happens if you scroll.
File a new bug please (if it's not 587508)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: