Closed Bug 1072501 Opened 5 years ago Closed 4 years ago

We probably don't use DIBTextureClient with e10s

Categories

(Core :: Graphics: Layers, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
e10s m8+ ---
firefox45 --- wontfix
firefox46 --- fixed

People

(Reporter: jrmuizel, Assigned: bas.schouten)

References

Details

Attachments

(3 files, 1 obsolete file)

I expect this will break subpixel AA
subpixel when not using direct2d
tracking-e10s: --- → ?
OS: Mac OS X → Windows XP
Jeff, is this a serious e10s related bug? Nobody is sure what this is about.
Flags: needinfo?(jmuizelaar)
We create a DIBTexture client here only when in the same process:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#296

This is needed for proper subpixel-aa text and maybe performance.

This can probably be fixed by creating a DIBSection around shared memory like we do for plugins.
Flags: needinfo?(jmuizelaar)
It would help us if E10S team can figure out which of your milestones needs this so that we can schedule it correctly.
(In reply to Milan Sreckovic [:milan] from comment #4)
> It would help us if E10S team can figure out which of your milestones needs
> this so that we can schedule it correctly.

It's up to the graphics team, can we live without whatever this is? In triage yesterday we just gave it a plus, implying this is considered post-enable on nightly cleanup fodder.
Thanks, that helps, wasn't sure what that plus meant.
Hey Jeff,

Can you help me understand if I'm reading this bit of code correctly?  It looks like we might have E10S issues on Windows XP sytems where we are using either total software rendering with no direct x support at all, and rendering something with BGR color formats, which is the line of code you reference, and that we might have issues with D3D9 on XP on E10S (the previous if statement right above the one you reference).  I see other places where aAllocator->IsSameProcess is used in this method, but the usage in the XP ifdef concerns me most because trying to get real-world preliminary testing in that area is going to be tough.

Also, what is a real world case where we would use BGR color space? I can't exactly wrap my head around when we tend to encounter surfaces of that type.
Flags: needinfo?(jmuizelaar)
(In reply to Clint Talbert ( :ctalbert ) from comment #7)
> Hey Jeff,
> 
> Can you help me understand if I'm reading this bit of code correctly?  It
> looks like we might have E10S issues on Windows XP sytems where we are using
> either total software rendering with no direct x support at all, and
> rendering something with BGR color formats, which is the line of code you
> reference, and that we might have issues with D3D9 on XP on E10S (the
> previous if statement right above the one you reference).  I see other
> places where aAllocator->IsSameProcess is used in this method, but the usage
> in the XP ifdef concerns me most because trying to get real-world
> preliminary testing in that area is going to be tough.
> 
> Also, what is a real world case where we would use BGR color space? I can't
> exactly wrap my head around when we tend to encounter surfaces of that type.

This is unrelated to Windows XP. The problems should show up anytime we're not using Direct2D. I'm not sure the other cases matter.
Flags: needinfo?(jmuizelaar)
Has anything changed on this front, as in what we think we need to do and what we have done?
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #9)
> Has anything changed on this front, as in what we think we need to do and
> what we have done?

No, the basic fix for this is 'in the tree', but when we enable it we see mysterious crashes occur in the wild inside cairo. I have not heard of any way to reproduce them, we haven't attempted to re-enable it and figure out more data from crash reports - we could.
Flags: needinfo?(bas)
Is it expected that end up in the "UNSUPPORTED" section of cairo_win32_surface_composite?
(In reply to Milan Sreckovic [:milan] from comment #11)
> Is it expected that end up in the "UNSUPPORTED" section of
> cairo_win32_surface_composite?

Hmm, probably is, although I still don't like the "unsupported" part.
Read access violation on the surface in the pattern argument to DrawTargetCairo::DrawPattern.  Not null, but "bad".  Don't know if it's a UAF or some uninitialized value.
Flags: needinfo?(jmuizelaar) → needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #13)
> Read access violation on the surface in the pattern argument to
> DrawTargetCairo::DrawPattern.  Not null, but "bad".  Don't know if it's a
> UAF or some uninitialized value.

Neither - we use the memory after we've unmapped it; it'd be surprising if that worked :) TextureHostFileMapping::UpdatedInternal unmaps it, after giving it to texture source to hold, and eventually passing it outside through BindTextureSource.  I don't know what the design had in mind here - for us to keep the data mapped inside of TextureHostFileMapping or to make a copy in the bind method, or something else?

Anyway, something like this works, but I don't know if the data that gets outside of this class lives after the class goes away.
Attachment #8691064 - Flags: feedback?(bas)
Comment on attachment 8691064 [details] [diff] [review]
Manage the mapped data

Review of attachment 8691064 [details] [diff] [review]:
-----------------------------------------------------------------

Hrm, so this is interesting. I wrote this code, the intention was to minimize address space usage by unmapping the file when we no longer need it (i.e. when we're done with the surface), if the texture source apparently can grab a reference internally, I believe we should store userdata on the surface and have that unmap the file when the surface is destroyed. This would maintain the minimizing address space usage in the parent process but at the same time keep the memory around for as long as the DataSourceSurface is. This patch would obviously do the trick, but f- since I'd like to know if that approach gets us the best of both worlds.

Nice catch.
Attachment #8691064 - Flags: feedback?(bas) → feedback-
Flags: needinfo?(bas)
Assignee: nobody → bas
Attachment #8691533 - Flags: review?(milan)
Comment on attachment 8691533 [details] [diff] [review]
Unmap file mapping on source surface destruction.

Review of attachment 8691533 [details] [diff] [review]:
-----------------------------------------------------------------

This fixes the crash I was seeing (with suitable rebasing to central.)  Offline conversation established that we won't have the same map given to multiple surfaces.

I am assuming that the TextureHostFileMapping will not outlive the surface that uses the mapped data, including during the shutdown?

::: gfx/layers/TextureDIB.cpp
@@ +236,2 @@
>  
>    ::UnmapViewOfFile(fileMapping);

I know this isn't part of the change, but shouldn't this be ::UnmapViewOfFile(data) instead?
Attachment #8691533 - Flags: review?(milan) → review+
(In reply to Milan Sreckovic [:milan] from comment #19)
> Comment on attachment 8691533 [details] [diff] [review]
> Unmap file mapping on source surface destruction.
> 
> Review of attachment 8691533 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This fixes the crash I was seeing (with suitable rebasing to central.) 
> Offline conversation established that we won't have the same map given to
> multiple surfaces.
> 
> I am assuming that the TextureHostFileMapping will not outlive the surface
> that uses the mapped data, including during the shutdown?

If it would we'd see issues in other texture clients I believe. I'm also not sure if you close the handle but not unmap the memory actually goes away but it shouldn't be relevant.

> ::: gfx/layers/TextureDIB.cpp
> @@ +236,2 @@
> >  
> >    ::UnmapViewOfFile(fileMapping);
> 
> I know this isn't part of the change, but shouldn't this be
> ::UnmapViewOfFile(data) instead?

Very good catch. Absolutely true. Fixed locally, I will land this patch after Nical lands his queue, I don't want to bitrot him.
as a note, this caused a 10% win for tps, the backout saw a regression- looking forward to this landing again!
(In reply to Joel Maher (:jmaher) from comment #23)
> as a note, this caused a 10% win for tps, the backout saw a regression-
> looking forward to this landing again!

I honestly don't see any way whatsoever that this test failure would be related to my patch so it might be a little tricky :).
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #24)
> (In reply to Joel Maher (:jmaher) from comment #23)
> > as a note, this caused a 10% win for tps, the backout saw a regression-
> > looking forward to this landing again!
> 
> I honestly don't see any way whatsoever that this test failure would be
> related to my patch so it might be a little tricky :).

I was reading the log poorly, I think I understand the problem better now.
Backed out in https://treeherder.mozilla.org/logviewer.html#?job_id=18156523&repo=mozilla-inbound - it was a little hard to spot since you landed on top of other gfx talos e10s bustage, but https://treeherder.mozilla.org/logviewer.html#?job_id=18160878&repo=mozilla-inbound#L6491 is a pretty good example.
Yeah, that was what I meant to say.
This fixes the issue that was causing the reftest failure.
Attachment #8691533 - Attachment is obsolete: true
Attachment #8694992 - Flags: review?(milan)
Attachment #8694992 - Flags: review?(milan) → review?(jmuizelaar)
Attachment #8694992 - Flags: review?(jmuizelaar) → review+
That's slightly better, in that it fooled me into thinking it was just normal talos-e10s bustage for a little while, but it's still busted. Tree's closed.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/d11cceb2f732 - Win32-only, t-e10s-only, g1 and s and tp, perhaps https://treeherder.mozilla.org/logviewer.html#?job_id=18418348&repo=mozilla-inbound#L2445 is a clue as to what's failing before the unhelpful "TalosError: timeout".
Bas, next time we land this, what do you think about having a preference that lets us "turn it off" - we'd still have it be on by default, but I'd like the ability to turn it off in case we need to investigate possible fallback.  (gfx.layers.dib.disable or some such, false by default.)
(In reply to Milan Sreckovic [:milan] from comment #35)
> Bas, next time we land this, what do you think about having a preference
> that lets us "turn it off" - we'd still have it be on by default, but I'd
> like the ability to turn it off in case we need to investigate possible
> fallback.  (gfx.layers.dib.disable or some such, false by default.)

Yeah, once I get it to pass the tests I'll add that.
https://hg.mozilla.org/mozilla-central/rev/ccc320a19060
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Depends on: 1234926
Milan, should we uplift this to 45 for our e10s beta experiment?
Flags: needinfo?(milan)
Whiteboard: [e10s-45-uplift]
I would prefer to let it ride the trains; I'm not convinced all that we need is contained in this bug, and this is a minor visual glitch that I can't see really screwing up the e10s beta experiment.
Flags: needinfo?(milan)
Whiteboard: [e10s-45-uplift]
You need to log in before you can comment on or make changes to this bug.