Closed Bug 1240800 Opened 7 years ago Closed 7 years ago

Graphical corruption with transform transitions with HWA on, D2D off

Categories

(Core :: Graphics, defect)

31 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 --- wontfix
firefox47 --- fixed

People

(Reporter: Oriol, Assigned: bas.schouten)

References

Details

(Keywords: regression, testcase, Whiteboard: gfx-noted)

Attachments

(5 files, 1 obsolete file)

Attached file Demo
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160118030338

Steps to reproduce:

Open the attached demo and hover the box.
There will be a transform transition.


Actual results:

There is some graphical corruption.


Expected results:

It should work like when D2D is enabled or HWA is disabled.
It happens both with D3D9 and D3D11.
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1417d180a1d8&tochange=4941a2ac0786
Before that it wasn't completely perfect neither, but much better.
Attached image Screenshots
Whiteboard: regression, testcase
(CCing relevant people on real bugs is always fine, you don't need to ask for permission.)

There's a lot of mattwoodrow in that pushlog range. I'm also CCing BenWa because he has been working on the D3D9 compositor lately.
Whiteboard: regression, testcase → regression, testcase, gfx-noted
I have bisected the OpenGL regression. The cause was

https://hg.mozilla.org/mozilla-central/rev/5cd5233505fc
Blocks: 1097699
Attached patch OpenGL fix (obsolete) — Splinter Review
I recovered some code from 34e2d2bd7ec4 which fixes the OpenGL case.

I guess the real problem is somewhere else, but hopefully this can help fixing it.
I have bisected the D3D11 regression. The cause was

https://hg.mozilla.org/mozilla-central/rev/99690880e8f5

also by :bas.schouten
Blocks: 1035227
Flags: needinfo?(bas)
(In reply to Oriol from comment #4)
> I have bisected the OpenGL regression. The cause was
> 
> https://hg.mozilla.org/mozilla-central/rev/5cd5233505fc

I suspect that wasn't the actual cause but somehow changed something in your config making it be fixed.

I have an idea of what might be causing this though. I'll fix it.
Flags: needinfo?(bas)
Hrm, I can reproduce this, but my initial guess about what's causing it is wrong. Interestingly enough it seems if we always do a fullupload this is fine. If we do a full upload on creating a new texture, this is still broken. This shouldn't be possible, but it obviously is, investigating further.
(In reply to Bas Schouten (:bas.schouten) from comment #7)
> I suspect that wasn't the actual cause but somehow changed something in your
> config making it be fixed.

But there is something in that patch which introduces the issue, because recovering some of the code deleted there fixes the OpenGL problem (see comment 5).

Not sure why I didn't search it in MXR, but now I see in http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/LayerManagerComposite.cpp#1539 there is a check of MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS. If it isn't defined, it does
> bool LayerManagerComposite::SupportsDirectTexturing() { return false; }

My patch solves the problem by defining that MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS and
> bool LayerManagerComposite::SupportsDirectTexturing() { return true; }

So I guess using return true directly in LayerManagerComposite.cpp would fix the problem too.
(In reply to Oriol from comment #9)
> So I guess using return true directly in LayerManagerComposite.cpp would fix
> the problem too.

Yes, that works. Or using double buffering in http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#87 works too.

So I guess it's a problem of single buffering.
(In reply to Oriol from comment #10)
> (In reply to Oriol from comment #9)
> > So I guess using return true directly in LayerManagerComposite.cpp would fix
> > the problem too.
> 
> Yes, that works. Or using double buffering in
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/
> ContentClient.cpp#87 works too.
> 
> So I guess it's a problem of single buffering.

Right.. but that just selects a flag that's wrong, causes a whole bunch of other bugs, (by lying about supporting direct texturing), and causes us to always do a complete upload :). So like I said, it just fiddle with your configuration in an invalid way, it doesn't actually 'fix' the bug, it just happens to make this artifact go away.
Attached image PostScale.png
So, I've rootcaused the issue here, I took images of both the software surface used for uploading when things are looking correct, and those used for uploading when things are messed up. The main issue is in the scale.

When the surface is being scaled back down when the image resettles in the rotated position, the surface looks like the attached image, which is fine, except that the region that's specified to be uploaded includes -only- the gradient area, and not the text area, causing part of the 'old' surface's area to leak though.
Flags: needinfo?(matt.woodrow)
Attached image PreScale.png
This was the pre-scale surface leaking through for me. I should be a little more specific, it updates the little 'Foo' bounding box and the gradient. The reason for this is that with transparent surfaces we just draw the bounds of the visible region, assuming the rest will be transparent. In this case, that assumption is false. The bit that's incorrect is not included in the visible region, but is still being drawn. I'll try and figure out what the best solution is.
Flags: needinfo?(matt.woodrow)
I'm going to suggest this as a fix, in some unfortunate cases it will cause us to upload a little more than strictly necessary, but in general it's more correct and it won't make that much of a difference.
Assignee: nobody → bas
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8716574 [details] [diff] [review]
OpenGL fix

Your patch works great!
Below the text I still see a 1px tall bar with the gradient, but probably that should be another bug.
Attachment #8716574 - Attachment is obsolete: true
FYI the patch here also fixes bug 1246449.
Comment on attachment 8716926 [details]
MozReview Request: Bug 1240800: When we've reallocated our buffer client side and fail to track the proper invalid region always upload the bounds of the visible region. r=mattwoodrow

https://reviewboard.mozilla.org/r/33995/#review30701
Attachment #8716926 - Flags: review?(matt.woodrow) → review+
https://hg.mozilla.org/mozilla-central/rev/a1cb1143aaa8
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Any plans to uplift this to 46?
Flags: needinfo?(bas)
Keywords: regression, testcase
Whiteboard: regression, testcase, gfx-noted → gfx-noted
I would like to but I want to give it a day or two on Nightly.
Flags: needinfo?(bas)
This bug fixed a problem. Bug 1247466 was supposed to fix another problem.
I guess both problems can be fixed here, but then this shouldn't be marked as fixed.
If bug 1247466 isn't fixed by this, it should be reopened.
In dup bug 1246449 Firefox 46 is marked as affected. Does this need to be uplifted before we ship 46?
Flags: needinfo?(bas)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #27)
> In dup bug 1246449 Firefox 46 is marked as affected. Does this need to be
> uplifted before we ship 46?

Really depends on how serious we think the problem is. It's been reasonably well tested through 47 though obviously.
Flags: needinfo?(bas)
The dup bug 1246449 is not an issue if APZ is disabled, so there's no need to uplift on account of that bug at least.
Thanks kats. Seems like the question is just about this bug then. How prevalent do you expect the issue reported here to be on the web? 

Is this an edge case based on a non default config? 

Does this affect common OpenGL usage or a seldom used feature?
I don't know the answers to those questions - Bas should be able to answer. My involvement in this bug is limited to noticing that the patch fixed the other bug; I don't really know the code.
Flags: needinfo?(bas)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> I don't know the answers to those questions - Bas should be able to answer.
> My involvement in this bug is limited to noticing that the patch fixed the
> other bug; I don't really know the code.

No, this -can- cause a bug in the default config, it's simply a little less likely.

Rather than affecting a feature this affects something happening all the time, but usually being asymptomatic. In order for it to cause visible artifacts a couple of factors have to be true to trigger the heuristics to cause this, however saying how likely that is, is extremely hard, considering how long this bug has been around. It doesn't seem very likely in practice.
Flags: needinfo?(bas)
This doesn't sound like a widespread issue in 46 and we are very late in the 46 beta cycle. Seems reasonable to wontfix.
You need to log in before you can comment on or make changes to this bug.