Closed Bug 870211 Opened 6 years ago Closed 6 years ago

Investigate performance regressions with OMTC on Mac

Categories

(Core :: Graphics: Layers, defect)

23 Branch
x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox22 --- unaffected
firefox23 + fixed
firefox24 --- fixed

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

Details

(Keywords: perf, regression, Whiteboard: [leave open])

Attachments

(8 files, 3 obsolete files)

17.04 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.24 KB, patch
roc
: review+
mattwoodrow
: checkin+
Details | Diff | Splinter Review
2.25 KB, patch
jgilbert
: review+
jgilbert
: review+
Details | Diff | Splinter Review
4.03 KB, patch
nical
: review+
Details | Diff | Splinter Review
19.25 KB, patch
nical
: review+
Details | Diff | Splinter Review
17.46 KB, patch
nical
: review+
Details | Diff | Splinter Review
16.81 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
20.01 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
Latest talos run: https://tbpl.mozilla.org/?tree=Try&rev=f4a8cd65a66e
Compare talos: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=b842d26dd5f0,c50f597b1e6a,afb7995ef276,dae38fc0aeb4,b109e2dbf03b&newRev=f4a8cd65a66e&submit=true

Tp5 memory usage, TResize and tscrollr seem to be significant regressions.

I have profiles (with Instruments) for TResize and tscroll.

BenWa: Are there instructions anywhere on running talos locally with the gecko profiler?
TScroll:

This tests runs 3 different pages. The first two are fairly similar, and the third is significantly slower with OMTC.

We're spending huge amounts of time self-copying data around the buffer because we're trying to draw into what would be multiple quadrants if we used buffer rotation.

This is cheaper with GL, since we're moving GPU memory around instead (20% of the profile vs 33%). I'm not sure if this accounts for the entire regression, but there's nothing else that stands out as being slow.

I think the main problem here is that with OMTC we're keeping a rotated system memory buffer around, as well as a texture of the same size. Copying data around the buffer, and into new buffers is slower. This probably also explains the memory usage regression.

Possible solutions:

* Add new Compositable/Texture types that work like OGL layers did. We want to only use a system memory surface big enough for the update we're drawing, and only keep a full size buffer as a texture. All buffer rotation and copying operations happen on the GPU.
  - This is quite a lot of work, everything expect the texture being passed across IPC to be the entire surface.

* Debug why we have to self-copy every frame with the 3rd test on tscroll and fix it.
  - Somewhat tangential to the actual issue here, but would probably fix this specific performance problem

* Ignore it
  - I suspect this won't be popular :)

* Others?


I will try get gecko profiler traces for these runs and make them available.
Elaborating on the second option 'Debug....'

The reason we have to self copy instead of using buffer rotation is because the repaint region contains two rects.

One is the 5 pixel high strip at the bottom that has been scrolled into view, and the other is a 1 pixel high strip at the top.

This happens because we're clipping the scrolled content (the clip doesn't scroll with the content), and the top line of the clip is at a sub-pixel offset.

When we scroll we compute the region that has changed in regard to clipping, which finds two app unit regions (the area scrolled off the top, and the area scrolled in at the bottom).

We convert these regions into pixels (rounding outwards) and invalidate them. The majority of the top region invalidates an area that isn't part of the layer and is ignored, but the 1 pixel strip (that was initially a partial pixel before rounding) remains.

It seems like this is expected behaviour, and not a bug. It's possible that we would snap the clipping and drawing to the nearest pixel though, and DLBI could factor that in before converting app units to pixels.
Adding needinfo?BenWa for the question about profiling talos.
Flags: needinfo?(bgirard)
I think Instruments isn't showing time spent blocking on synchronous IPC calls in an obvious way.

Since we're self copying pixels around on the CPU surface, we mark the whole things as changed and re-upload the whole thing to the GPU (on the compositor thread, but while blocking the main thread).
We could probably get significant performance improvements by

a) Annotating the self-copy operation, and sending this to the compositor, rather than just marking the entire buffer as 'dirty'. This would massively reduce the amount of upload we need to do.

b) Upload to the texture outside of the transaction. Blocking the main thread on this sucks.

Not sure if those will be enough to fix the regression, or whether I should just look at the first option.
Where is the subpixel clipping offset coming from? We generally do try to snap cliprects, because subpixel clip rects are not that useful.

I kinda like the idea of "only use a system memory surface big enough for the update we're drawing, and only keep a full size buffer as a texture." :-)
Filed bug 870287 for the clip comparison code not taking pixel snapping into account.

I'll get started on the latter part tomorrow.
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> Adding needinfo?BenWa for the question about profiling talos.

There's no way to do so but this was one of the primary use cases for the profiler I wanted to support. We just still need to do that work. Eideticker uses it so we know it can be done.

If you just want to do it locally only I can take a look into that. Doing it on talos would be more involved I think. Ideally we can get releng to upload an additional file (the profile) to the FTP.
Flags: needinfo?(bgirard)
Attached patch WIP: Incremental updates (obsolete) — Splinter Review
Seems to work for simple cases, including scrolling with buffer rotation.

ToDo:

Move texture uploads out of the critical path
Tidy up code
Check performance
Recycle Shmem buffers possibly
Run tests and fix any bugs
Nominating for tracking-firefox23 because (In reply to Matt Woodrow (:mattwoodrow) from comment #0)
> Tp5 memory usage, TResize and tscrollr seem to be significant regressions.

Ts also regressed when bug 756601 and friends landed on inbound (but not in the Try run from comment 0).

Nominating for tracking-firefox23.  Note that Firefox 23 moves to Aurora on Monday (three days from now), so any fixes/backouts/etc. after that will need to land on the Aurora branch if we don't want to release with these regressions.
Keywords: perf, regression
Version: unspecified → 23 Branch
Matt, assigning to you - can you say what the status quo is in terms of a metric for success here in not shipping a regression of performance to FF23 users?
Assignee: nobody → matt.woodrow
(In reply to Matt Brubeck (:mbrubeck) from comment #11)
> Nominating for tracking-firefox23.  Note that Firefox 23 moves to Aurora on
> Monday (three days from now), so any fixes/backouts/etc. after that will
> need to land on the Aurora branch if we don't want to release with these
> regressions.

We can just back bug 756601 on Aurora. We won't be using it there anyway.
This fixes the Ts regression from landing 756601, and makes me sad.

The layers refactoring accidentally stopped us from using TextureImageCGL (which contains OSX-specific OpenGL extensions to improve texture upload performance).

One of my patches fixed this, and apparently caused a performance regression. This doesn't really make sense, but I've confirmed that switching back to BasicTextureImage fixes the regression.
Attachment #748612 - Flags: review?(roc)
Can you put the reverse patch in another bug? Texture upload performance sounds desirable...
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #15)
> Can you put the reverse patch in another bug? Texture upload performance
> sounds desirable...

I'm not sure if I see the point if it doesn't actually work. It definitely gave us an improvement when we wrote it though.

It also won't affect OMTC, so we can probably ignore it.
I'm not sure why we initially decided that this only needed to happen on certain drivers, but I'm adding a caller that has a bound framebuffer so we need to preserve it.
Attachment #747916 - Attachment is obsolete: true
Attachment #748630 - Flags: review?(jgilbert)
We need to know these details on the client side with the following patches
Attachment #748631 - Flags: review?(nical.bugzilla)
The following set of patches basically take the ThebesLayer painting/ texture uploading code from ThebesLayerOGL and split it into client/host parts.

Taking guesses as to who might want to review each part, please feel free to steal or swap as you wish.
Attachment #748632 - Flags: review?(bas)
This is a ContentHost implementation that understands updating its texture from partial SurfaceDescriptors.
Attachment #748633 - Flags: review?(nical.bugzilla)
Attachment #748634 - Flags: review?(bas)
The other half of incremental updates.

A ContentClient implementation that doesn't retain a TextureClient, but instead allocates Shmem big enough for the changed region and sends it to the ContentHost.
Attachment #748635 - Flags: review?(nical.bugzilla)
These patches fix the Tp5 memory usage regression, and the TResize regression.

TScroll is significantly improved, but still slightly behind trunk.

A bunch of painting tests are now showing perf wins.

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=b842d26dd5f0,c50f597b1e6a,afb7995ef276,dae38fc0aeb4,b109e2dbf03b&newRev=a8d2ad9e603a&submit=true
A lot of the complexity of this code is because of buffer rotation. Now that texture copying and uploads are done without blocking the main thread we could probably stop bothering with this.

The D3D9/10 backends never bothered, and didn't suffer from terrible performance.

Would remove a lot of code and complexity, might be a nice follow-up (first/mentored bug even).
Landed the BasicTextureImage change to avoid shipping a regression with the next uplift.

https://hg.mozilla.org/integration/mozilla-inbound/rev/ab40e8c3c169
Whiteboard: [leave open]
Attachment #748612 - Flags: checkin+
(In reply to Matt Woodrow (:mattwoodrow) from comment #16)
> It also won't affect OMTC, so we can probably ignore it.

Oh, never mind then.
(In reply to Matt Woodrow (:mattwoodrow) from comment #24)
> A lot of the complexity of this code is because of buffer rotation. Now that
> texture copying and uploads are done without blocking the main thread we
> could probably stop bothering with this.
> 
> The D3D9/10 backends never bothered, and didn't suffer from terrible
> performance.
> 
> Would remove a lot of code and complexity, might be a nice follow-up
> (first/mentored bug even).

I would want to see what performance is like on GPU-less systems before we do this.
I'm only suggesting removing it from the incremental-updates path. BasicCompositor will likely still want to use ThebesLayerBuffer.
Comment on attachment 748630 [details] [diff] [review]
Save/Restore framebuffer when blitting textures

Review of attachment 748630 [details] [diff] [review]:
-----------------------------------------------------------------

2*r+.
Attachment #748630 - Flags: review?(jgilbert)
Attachment #748630 - Flags: review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #24)
> A lot of the complexity of this code is because of buffer rotation. Now that
> texture copying and uploads are done without blocking the main thread we
> could probably stop bothering with this.
> 
> The D3D9/10 backends never bothered, and didn't suffer from terrible
> performance.
> 
> Would remove a lot of code and complexity, might be a nice follow-up
> (first/mentored bug even).

The D3D9/10 backends used GPU memcpy. We'd need to implement something like that if we wanted to go with it. (Probably expose something on TextureClient/Host) I'm open to doing this if indeed we can prove it performs well on the CPU. (I'm still hoping we can get at least D3D9 everywhere)
Comment on attachment 748632 [details] [diff] [review]
Make TextureHosts support updating from partial surfaces

Review of attachment 748632 [details] [diff] [review]:
-----------------------------------------------------------------

I'm reluctantly okay with this, it isn't exactly pretty. Also note it won't compile on Windows due to it not changing the UpdateImpl signature there.

::: gfx/layers/composite/TextureHost.h
@@ +176,5 @@
>     * Update the texture host using the data from aSurfaceDescriptor.
>     */
>    void Update(const SurfaceDescriptor& aImage,
> +              nsIntRegion *aRegion = nullptr,
> +              nsIntPoint* aOffset = nullptr);

Document the offset parameter here! Its meaning isn't obvious!

@@ +257,4 @@
>  
>    SurfaceDescriptor* GetBuffer() const { return mBuffer; }
> +
> +  virtual gfxASurface::gfxContentType ContentType() { return gfxASurface::CONTENT_COLOR_ALPHA; }

This doesn't make me happy. We should be able to remove it and use the Format which is already exposed.
Attachment #748632 - Flags: review?(bas) → review+
Comment on attachment 748634 [details] [diff] [review]
New IPC messages for doing incremental updates

Review of attachment 748634 [details] [diff] [review]:
-----------------------------------------------------------------

This really needs documentation.
Attachment #748634 - Flags: review?(bas) → review-
Comment on attachment 748632 [details] [diff] [review]
Make TextureHosts support updating from partial surfaces

Review of attachment 748632 [details] [diff] [review]:
-----------------------------------------------------------------

Quickly sneaking in a review that doesn't belong to me

::: gfx/layers/composite/TextureHost.h
@@ +176,5 @@
>     * Update the texture host using the data from aSurfaceDescriptor.
>     */
>    void Update(const SurfaceDescriptor& aImage,
> +              nsIntRegion *aRegion = nullptr,
> +              nsIntPoint* aOffset = nullptr);

Please document what aOffset is vs aRegion (reading the other patch it looks like it's the source offset whereas aRegion refers to the destination but it'd be good to document it here).

@@ +247,5 @@
>    virtual const char *Name() = 0;
>    virtual void PrintInfo(nsACString& aTo, const char* aPrefix);
>  #endif
>  
> +  virtual void EnsureBuffer(const nsIntSize& aSize, gfxASurface::gfxContentType aType) {}

Please document.
Also, this will most likely have to go away with bug 858914, with which management of TextureClient/Host's shared buffers will be more strictly defined. So if we go for this now let's make it clear that it's temporary.

@@ +251,5 @@
> +  virtual void EnsureBuffer(const nsIntSize& aSize, gfxASurface::gfxContentType aType) {}
> +
> +  virtual void CopyTexImage(const nsIntRect& aSourceRect,
> +                            TextureHost *aDest,
> +                            const nsIntRect& aDestRect) { }

I am not sure I like exposing CopyTexImage this way (especially with most texture hosts not implementing it and the unsafe casting in TextureImageTextureHostOGL).

I am also worried of this silently doing nothing if we are using the wrong texture host. If we do this we should either make it pure virtual or assert here.

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +180,5 @@
> +                                         const nsIntRect& aDestRect)
> +{
> +  // TODO: Make sure this cast is safe
> +  TextureImageTextureHostOGL *dest =
> +    static_cast<TextureImageTextureHostOGL*>(aDest);

This looks a bit scary.

@@ +185,5 @@
> +  mGL->BlitTextureImage(mTexture, aSourceRect,
> +                        dest->mTexture, aDestRect);
> +  // TODO: Should we expose this as TextureHost API and call
> +  // it after all CopyTexImage calls?
> +  dest->mTexture->MarkValid();

I'd rather avoid adding too much API to TextureHost for now, are there other case that would benefit from exposing MarkValid ?

::: gfx/layers/opengl/TextureHostOGL.h
@@ +109,4 @@
>  
>    virtual void SetCompositor(Compositor* aCompositor) MOZ_OVERRIDE;
>  
> +  virtual void EnsureBuffer(const nsIntSize& aSize, gfxContentType aType);

Please use MOZ_OVERRIDE whenever it applies.
Attachment #748632 - Flags: review+ → review?(bas)
Comment on attachment 748631 [details] [diff] [review]
Expose more details in TextureIdentifier

Review of attachment 748631 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the my comment below fixed.

::: gfx/layers/ipc/CompositableForwarder.h
@@ +147,5 @@
>  protected:
>    uint32_t mMaxTextureSize;
>    LayersBackend mCompositorBackend;
> +  bool mSupportsTextureBlitting;
> +  bool mSupportsPartialUploads;

These two members should be initialized in the constructor

::: gfx/layers/ipc/ShadowLayers.cpp
@@ +155,5 @@
>    Transaction* mTxn;
>  };
>  
>  void
>  CompositableForwarder::IdentifyTextureHost(const TextureFactoryIdentifier& aIdentifier)

(Not related to your patch, but I wonder what this is doing in this file.)
Attachment #748631 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 748632 [details] [diff] [review]
Make TextureHosts support updating from partial surfaces

Review of attachment 748632 [details] [diff] [review]:
-----------------------------------------------------------------

Nical's comments are good too! Please do reply to them as well :).

::: gfx/layers/composite/TextureHost.h
@@ +176,5 @@
>     * Update the texture host using the data from aSurfaceDescriptor.
>     */
>    void Update(const SurfaceDescriptor& aImage,
> +              nsIntRegion *aRegion = nullptr,
> +              nsIntPoint* aOffset = nullptr);

Document the offset parameter here! Its meaning isn't obvious!

@@ +257,4 @@
>  
>    SurfaceDescriptor* GetBuffer() const { return mBuffer; }
> +
> +  virtual gfxASurface::gfxContentType ContentType() { return gfxASurface::CONTENT_COLOR_ALPHA; }

This doesn't make me happy. We should be able to remove it and use the Format which is already exposed.
It looks like the Ts, tresize, and tscroll regressions are fixed by ab40e8c3c169 which landed on inbound.  Should that one patch be uplifted to Aurora (or should we wait and uplift this whole patch series)?
Comment on attachment 748612 [details] [diff] [review]
Fix Ts regression

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 756601
User impact if declined: Performance regression.
Testing completed (on m-c, etc.): Just landed on m-c.
Risk to taking this patch (and alternatives if risky): Should be no risk, just reverts the change.
String or IDL/UUID changes made by this patch: None
Attachment #748612 - Flags: approval-mozilla-aurora?
Attachment #748612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Added documentation to CompositableForwarder.h
Attachment #748634 - Attachment is obsolete: true
Attachment #749075 - Flags: review?(bas)
Fixed review comments.

Carrying forward r=Bas
Attachment #748632 - Attachment is obsolete: true
Attachment #748632 - Flags: review?(bas)
Attachment #749076 - Flags: review+
Comment on attachment 748635 [details] [diff] [review]
Add ContentClientIncremental

Review of attachment 748635 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ContentClient.h
@@ +331,5 @@
>  protected:
>    virtual void CreateFrontBufferAndNotify(const nsIntRect& aBufferRect) MOZ_OVERRIDE;
>  };
>  
> +class ContentClientIncremental : public ContentClientRemote

Please add some doc for this class

@@ +401,5 @@
> +
> +  SurfaceDescriptor mUpdateDescriptor;
> +  SurfaceDescriptor mUpdateDescriptorOnWhite;
> +
> +  ContentType mContentType;

This member should be initialized in the ctor
Attachment #748635 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 748633 [details] [diff] [review]
Add ContentHostIncremental

Review of attachment 748633 [details] [diff] [review]:
-----------------------------------------------------------------

Some of the logic here will have to be changed when we land the new TextureClient/Host model, but I guess we can't avoid that.

As a side note I am not too found of the way we tend to add virtual methods to interfaces and only implement them for one of the implementations. If we keep doing this we may want, at some point, to split top level interfaces (like CompositableHost) into smaller sub-interfaces like we did for TextureSource. Maybe something like IncrementalCompositableHost* CompositableHost::AsIncrementalCompositable() and add the specific API there.

::: gfx/layers/composite/CompositableHost.h
@@ +102,5 @@
> +  virtual void UpdateIncremental(TextureIdentifier aTextureId,
> +                                 SurfaceDescriptor& aSurface,
> +                                 const nsIntRegion& aUpdated,
> +                                 const nsIntRect& aBufferRect,
> +                                 const nsIntPoint& aBufferRotation)

Please add some doc

@@ +129,5 @@
>                                   const TextureInfo& aTextureInfo) = 0;
>  
> +  virtual void EnsureTextureHost(ISurfaceAllocator* aAllocator,
> +                                 const TextureInfo& aTextureInfo,
> +                                 const nsIntRect& aBufferRect)

nit: I think it's worth adding a comment here to say that this is expected to be only used with incremental compositables

::: gfx/layers/composite/ContentHost.h
@@ +198,5 @@
>    virtual void PrintInfo(nsACString& aTo, const char* aPrefix);
>  #endif
>  };
>  
> +class ContentHostIncremental : public ContentHostBase

Please add some doxygen-style doc at the top of the class and add MOZ_OVERRIDE in the method declarations whenever applicable.

@@ +329,5 @@
> +  };
> +
> +  nsTArray<nsAutoPtr<Request> > mUpdateList;
> +
> +  ISurfaceAllocator* mDeAllocator;

This member should be initialized in the constructor
Attachment #748633 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 749075 [details] [diff] [review]
New IPC messages for doing incremental updates v2

Review of attachment 749075 [details] [diff] [review]:
-----------------------------------------------------------------

This -really- makes me want to cry. We -really- have to work this buffer rotation issue out to be more sane or we'll hack the refactored layers into as bad a situation as the old layers were within months. Please add a bug for this work.
Attachment #749075 - Flags: review?(bas) → review+
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Depends on: 874736
Depends on: 876661
You need to log in before you can comment on or make changes to this bug.