Closed Bug 1261166 Opened 4 years ago Closed 3 years ago

Consider using IOSurface for sharing content-painted textures to the compositor on OS X

Categories

(Core :: Graphics: Layers, defect)

x86_64
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox48 --- fix-optional
firefox49 --- fix-optional
firefox50 --- fix-optional

People

(Reporter: mstange, Assigned: kats)

References

(Depends on 1 open bug, Blocks 5 open bugs)

Details

(Whiteboard: gfx-noted)

Attachments

(2 files)

Attached patch wip patchSplinter Review
At the moment we use shmems and glTexImage2D to upload PaintedLayer contents from the content process into textures on the Compositor. These texture uploads can be very slow and impact APZ smoothness.
There are two parts to the slowness: The first is incurred when we first access a newly-allocated texture, and the second part is the actual memcpy.

We can get pixels into Compositor textures using IOSurfaces instead. With those, at least half of the cost of "texture upload" would be incurred on the content side, and the actual layer transaction would be really fast because the Compositor will then only need to associate IOSurface IDs with textures.

Matt and I wrote a patch that attempts to do this. It mostly works and improves layer transaction performance on the Compositor a lot. There are three problems with it:
 1. On time I saw textures turning black.
 2. Content spends a lot of time copying pixels between front and back buffers of a tile.
 3. Content also spends a lot of time locking the IOSurface.

There may be more improvements to be had here.
^ has reftest failures. Matt, do you know what might be going on there? I can investigate but if you have any thoughts off the top of your head...
Flags: needinfo?(matt.woodrow)
I don't really, sorry.

I can't see any reason why the failing tests would be getting layers, so it seems like actual content drawing is being weird.

Are we using the same moz2d backend for IOSurface as we would normally?

We should probably pass moz2DBackend into MacIOSurfaceTextureData::Create to be sure.
Flags: needinfo?(matt.woodrow)
Here's the talos compare from the first run, btw: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=b6683e141c47&newProject=try&newRevision=03251e6a987a&framework=1&showOnlyImportant=0

As expected, it shows memory usage improving but paint times getting worse (which affects a bunch of tests including CART, tscrollx, etc.). The second try run still has the same reftest failures so it will need more investigation, which I'll do once I have some spare cycles.
I looked through a random subset of the reftest failures, and I think it has something to do with font smoothing. For one thing, all the tests that I looked at use the Ahem font to write something or the other, and the blocks that get drawn have antialiasing (or not) around the edges.

The simplest test I found that was failing was layout/reftests/text/osx-font-smoothing.html which is a != test to check that the "-moz-osx-font-smoothing: grayscale" CSS property results in a different rendering, but the test fails because both renderings are the same. So either everything is getting smooth'd or nothing is.
MacIOSurfaceTextureData::BorrowDrawTarget always creates the draw target with SurfaceFormat::B8G8R8A8, and Skia can't do subpixel-AA in DrawTargets that have alpha. We need to replace SurfaceFormat::B8G8R8A8 with GetFormat().
Summary: Consider using IOSurface for sharing textures to the compositor on OS X → Consider using IOSurface for sharing content-painted textures to the compositor on OS X
Whiteboard: gfx-noted
That try push crashed and burned because of infra problems and getting into a bad state. Here's a new one, same patches, rebased on m-c:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=edd186caa57f
Ok, so that fixed the vast majority of the reftest failures. There's only two left, both with "capturestream" in the name so likely something to do with that.
See Also: → 949573
See Also: 9495731265824
After a bunch of printf-debugging, it looks like the failure is caused by the MacIOSurfaceTextureData not implementing UpdateFromSurface. When we hit the code at [1], the call returns false, and NS_IsMainThread() is also false, so we don't go into the fallback path.

I implemented this function in MacIOSurfaceTextureData (basically copied the code from the X11 one, which is the same as the main-thread version) and that seemed to fix the failing reftests locally. I'm not sure if that implementation is fine, or if we should manually memcpy the data into the MacIOSurface buffer.

I'll do a try push with that tacked on. The next problem here I think is the talos paint regressions - Markus, you said there was some way to improve that, can you elaborate?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> The next problem here I think is the
> talos paint regressions - Markus, you said there was some way to improve
> that, can you elaborate?

This is a question for Matt. I've forgotten the answer but I was just repeating what Matt had told me.
Flags: needinfo?(matt.woodrow)
We should probably have a good at bug 1265824 to see which option gives better results.

The performance problem I was worried about was that anytime we lock the IOSurface we're going to be reading back the entire tile (unless IOSurface keeps a cached CPU memory copy), and when we unlock we will be uploading the entire buffer.

The idea I had was to have a GLContext in the content process, and then manage our own CPU memory buffer, and do uploads manually using glTexSubImage2D.

I'm not actually sure if we can manually upload to a texture backed by an IOSurface, but it seems like it would help a fair bit here.
Flags: needinfo?(matt.woodrow)
I see. IOSurface does have a CPU memory cache, as far as I can tell - I only saw us blocking in the Lock call when I was burning through many tiles by doing lots of painting.
If there are no other cheap wins using IOSurface, then I think we should just do bug 1265824 and disregard this bug.
We could even do both I think. Bug 1265824 would improve our upload speed, and IOSurface could let us do it in the content process (and make sure we never block APZ).
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> The promised try push:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=19cd851bb6b0&group_state=expanded

reftests are green, talos is still showing regressions. Sounds like we need to get bug 1265824 done first. I'll try to take a look at that once I have some more free cycles, probably not till next week.
Actually, can we land these patches, but put the toggle behind a pref? I'd like to make it easier to flip this on and off, without having to apply a big stack of patches. I'll post cleaned up patches, not sure who should be the author/reviewer on the main patch (attachment 8736897 [details] [diff] [review]).
Argh, the patches are bitrotted. I'm getting crashes in skia after rebasing :(
I rebased the patches and put the stuff behind a pref, but am getting some leaked objects in debug builds.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f615a14b2c7b&group_state=expanded

I'll try to figure out what's going on, but the above try push has the latest rebased code if anybody else needs it.
With the patch in bug 1280297 I discovered that there's a cyclical reference which is probably causing the leak.

LayerTransactionParent holds a RefPtr to LayerManagerComposite
LayerManagerComposite holds a RefPtr to CompositorOGL
Compositor holds a bunch of ShmemTextureReadLock instances in mUnlockAfterComposition
The ShmemTextureReadLock instances hold RefPtrs to LayerTransactionParent

I'm not really sure about why cycle is created or what the correct way to break it is, but it seems like the Compositor should somehow be dropping the ShmemTextureReadLock pointers, as they are supposed to be transient, I think.
I broke the LayerTransactionParent -> LayerManagerComposite link because that was easy to do, but it didn't fix the leak. There's some other path(s) from LayerTransactionParent -> Compositor, I suspect via LayerComposite subclasses.
See Also: → 1176011
I talked this over with Matt and Sotaro, and Sotaro said we should move the code that does the ReadUnlock() calls from ~Compositor to Compositor::Destroy, so that it runs earlier. I tried that and got crashes, because of yet another bug that Sotaro tracked down, to do with shmem lifetimes being tied to PLayerTransaction. This bug is likely the same root cause behind the backout of bug 1176011, and :gw280 is working on fixing it. I'll file another bug for the memory leak and set up the dependencies.
The leak was injected by bug 1272600. bug 1176011 is going to extend shmem lifetime to PCompositorBridge ipc life time.
New try push based on latest m-c, which includes bug 1176011:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2483322da656

(With bug 1176011 I wasn't able to reproduce the leak locally, so bug 1280436 may not be needed.)
Ignore comment 26, I can still reproduce the leak. The patch on bug 1280436 causes a crash still.
Comment on attachment 8763604 [details] [diff] [review]
Allow using IOSurface instead of texture upload on OS X

I think this is probably ok to land now. The pref is off by default so it doesn't have any impact. We'll need to reduce the talos impact (by using ClientStorage?) and fix bug 1280762 to turn on the pref.
Attachment #8763604 - Attachment description: Updated WIP → Allow using IOSurface instead of texture upload on OS X
Attachment #8763604 - Flags: review?(nical.bugzilla)
Attachment #8763604 - Flags: review?(matt.woodrow)
Assignee: nobody → bugmail.mozilla
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
Comment on attachment 8763604 [details] [diff] [review]
Allow using IOSurface instead of texture upload on OS X

Review of attachment 8763604 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPrefs.h
@@ +281,5 @@
>    // Disable surface sharing due to issues with compatible FBConfigs on
>    // NVIDIA drivers as described in bug 1193015.
>    DECL_GFX_PREF(Live, "gfx.use-glx-texture-from-pixmap",       UseGLXTextureFromPixmap, bool, false);
>  
> +  DECL_GFX_PREF(Live, "gfx.use-iosurface-textures",            UseIOSurfaceTextures, bool, false);

How robust is it to switch between iosurfaces and shmem at runtime? I don't know how well we'd handle having a mix of different texture types in the same tiled layer. If we don't handle that well or in doubt, I'd rather make it a "Once" pref.
Attachment #8763604 - Flags: review?(nical.bugzilla) → review+
(In reply to Nicolas Silva [:nical] from comment #30)
> How robust is it to switch between iosurfaces and shmem at runtime? I don't
> know how well we'd handle having a mix of different texture types in the
> same tiled layer. If we don't handle that well or in doubt, I'd rather make
> it a "Once" pref.

Oh whoops, you're right, I'll make it Once. I don't think I even tested changing it live, and don't think we need to support that.
Also for the record here's a try push with it turned on:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cee7fe779096&group_state=expanded

(includes the patch from bug 1280762, which is required to avoid leaks/crashes). The try push is green, but the talos tests show that there is still a perf regression:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=51377a641589&newProject=try&newRevision=cee7fe779096&framework=1

Hopefully landing ClientStorage will negate that perf regression.
Comment on attachment 8763604 [details] [diff] [review]
Allow using IOSurface instead of texture upload on OS X

Review of attachment 8763604 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/MacIOSurfaceImage.cpp
@@ +18,5 @@
>  TextureClient*
>  MacIOSurfaceImage::GetTextureClient(CompositableClient* aClient)
>  {
>    if (!mTextureClient) {
> +    BackendType backend = gfxPlatform::GetPlatform()->GetDefaultContentBackend();

Can we instead pass BACKEND_NONE to indicate this TextureClient can't be drawn to?

::: gfx/layers/opengl/MacIOSurfaceTextureClientOGL.cpp
@@ +96,5 @@
>  }
>  
> +already_AddRefed<DrawTarget>
> +MacIOSurfaceTextureData::BorrowDrawTarget()
> +{

Lets return nullptr (or assert even) here if mBackend == BACKEND_UNKNOWN so that my suggestion for MacIOSurfaceImage is handled correctly.
Attachment #8763604 - Flags: review?(matt.woodrow) → review+
Updated patch based on comment 33. Try push with IOSurface disabled:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6284cca51b4&group_state=expanded
and enabled:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=24190457efbc&group_state=expanded

The disabled one looks good so I'll land this. The enabled one is still mostly in progress, but if errors turn up we can file follow-ups for them.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a6d37ee074f
Add the ability to use IOSurface instead of texture upload on OS X. r=nical,mattwoodrow
https://hg.mozilla.org/mozilla-central/rev/7a6d37ee074f
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
You need to log in before you can comment on or make changes to this bug.