Add a ClientStorageTextureSource

RESOLVED FIXED in Firefox 63

Status

()

P2
normal
RESOLVED FIXED
3 years ago
a month ago

People

(Reporter: jrmuizel, Assigned: dthayer)

Tracking

(Depends on: 1 bug, Blocks: 14 bugs, {feature, perf})

unspecified
mozilla63
Unspecified
macOS
feature, perf
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(relnote-firefox 63+, firefox63 fixed)

Details

(Whiteboard: [gfx-noted][fxperf:p1][qf:p3:f64])

Attachments

(28 attachments, 8 obsolete attachments)

2.98 KB, text/plain
Details
20.06 KB, patch
Details | Diff | Splinter Review
2.32 MB, image/jpeg
Details
5.03 MB, video/webm
Details
37.68 KB, patch
Details | Diff | Splinter Review
3.84 KB, patch
Details | Diff | Splinter Review
8.87 KB, patch
Details | Diff | Splinter Review
8.36 KB, patch
Details | Diff | Splinter Review
36.72 KB, patch
Details | Diff | Splinter Review
5.57 KB, patch
Details | Diff | Splinter Review
3.88 KB, patch
Details | Diff | Splinter Review
1.25 KB, patch
Details | Diff | Splinter Review
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
1.62 KB, text/html
Details
1.96 MB, text/html
Details
6.21 MB, text/html
Details
4.20 MB, text/html
Details
8.29 MB, text/html
Details
4.61 MB, text/html
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
froydnj
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
59 bytes, text/x-review-board-request
mattwoodrow
: review+
Details
(Reporter)

Description

3 years ago
Our current upload path on OS X is quite poor. CoreAnimation does much better. We should too.
(Reporter)

Updated

3 years ago
Duplicate of this bug: 949573
(Reporter)

Updated

3 years ago
See Also: → bug 1261166
Whiteboard: [gfx-noted]
Because I don't trust etherpad to not delete stuff randomly
See Also: → bug 1279131
BufferTextureHost already has most of what you need to get client storage working, you just need to add a ClientStorageTextureSource that wraps the BufferTextureHost the same way we do with the basic compositor (see BufferTextureHost::EnsureWrappingTextureSource).
So we don't need for an extra TextureClient/Host pair.
Based on m-c 300620:1828937da949

This code holds the read lock to the textures for quite a few frames after they have been used (probably more than necessary) to ensure that OpenGL has finished reading from them before we attempt to write to the system memory again.

This unfortunately interacts really poorly with our tile recycling, and we end up doing a lot of allocations (and spending time in the VM mapping and zeroing memory).

We probably need to figure out the minimum number of frames we need to hold them for, and then fix the tile pool to recycle nicely for this. Alternatively we should try blocking the compositor until previous frames have fully completed (using the apple fence extension?) so that our existing recycling assumptions work.

The patch also has code to temporarily disable alignment to partial tiles. Aligning to partial tiles means that we get lots of invalidations for partial tiles, and spend a lot of time reading from the old front buffer into our new back which was interacting badly with the above recycling issues. This is just experimental, probably not something we want to do for real (since we can get partial tiles in other ways, like the document boundary).
(Reporter)

Comment 7

3 years ago
It looks like CoreAnimations is actually using TexSubImage in addition to TexImage (this makes some sense). I'll investigate more into what they're doing.
(Reporter)

Comment 8

3 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> It looks like CoreAnimations is actually using TexSubImage in addition to
> TexImage (this makes some sense). I'll investigate more into what they're
> doing.

Yeah, they seem to be doing full TexSubImage2Ds. This explains why we were seeing so much more allocation traffic then Safari. They are also not TexSubImage2ding the same textures every frame. They seem to be alternating between sets of textures per frame. (i.e. it appears to be double buffering, but could be something more sophisticated).
(Reporter)

Comment 9

3 years ago
The basic picture here seems to be to:
1. texture->TexImage2D(size, buffer_addr);
2. texture->Draw()
3. Swap();
4. texture2->TexImage2D(size, buffer_addr2);
5. texture2->Draw();
6. Swap();
8. texture1->TexSubImage2D(size, buffer_addr);
9. texture1->Draw();
10. Swap();
11. texture1->TexSubImage2D(size, buffer_addr2);
12. texture1->Draw();
etc.

The modifications of the buffer_addr*[] seem to be protected by fences but the exact mechanism is not clear to me.

Typically the same buffer is always used with TexImage2D and TexSubImage2D as otherwise a copy will occur.

Coreanimation does not always flip between sets of textures. Sometimes it uses the same ones in a row. I don't know why.
(Reporter)

Comment 10

3 years ago
It turns out all of the weirdness I was seeing was because of the out-of-order dtrace printing. When sorting the logs entries by the order they happened everything seems to make sense. Here's my current understanding.

Frame1
0. fence1->FinishFence()
1. texture->TexImage2D(size, buffer_addr);
2. texture->Draw()
3. Swap();
4. fence1->SetFence();

Frame2
5. fence2->FinishFence();
6. texture2->TexImage2D(size, buffer_addr2);
7. texture2->Draw();
8. read_unlock(buffer_addr);
9. Swap();
10. fence2->SetFence();

Frame3
11. fence1->FinishFence();
12. texture1->TexSubImage2D(size, buffer_addr);
13. texture1->Draw();
14. read_unlock(buffer_addr2);
15. Swap();
16. fenec1->SetFence();

Frame4
17. fence2->FinishFence();
18. texture1->TexSubImage2D(size, buffer_addr2);
19. texture1->Draw();
20. read_unlock(buffer_addr);
21. Swap();
22. fence2->SetFence();
The other interesting thing to know would be when do they write to buffer_addr(2)?

I assume it's between the fence and the TexImag2D call.
(Reporter)

Comment 12

3 years ago
My understanding is that the write to buffer_addr will happen sometime after the read_unlock(buffer_addr2) and before the next TexSubImage2D(size, buffer_addr2). Unfortunately, confirming this is difficult as the writes happen in another process.
In particular, assuming comment 12 is true, the timing of the write violates the client storage spec. The spec says that, before you write to the memory, you need to bind a different buffer to your texture with another call to glTexImage2D or delete the texture. It does not look like CoreAnimation is doing that.

Also, Jeff's investigations are being done on 10.9. CoreAnimation's behavior may well have changed in the meantime.
(Reporter)

Comment 14

3 years ago
(In reply to Nicolas Silva [:nical] from comment #5)
> BufferTextureHost already has most of what you need to get client storage
> working, you just need to add a ClientStorageTextureSource that wraps the
> BufferTextureHost the same way we do with the basic compositor (see
> BufferTextureHost::EnsureWrappingTextureSource).
> So we don't need for an extra TextureClient/Host pair.

Since we now need the texture's lifetime to persist as long as the buffer it seems like we need a different approach. What would you recommend nical?
Flags: needinfo?(nical.bugzilla)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #14)
> Since we now need the texture's lifetime to persist as long as the buffer it
> seems like we need a different approach. What would you recommend nical?

I am not sure I understand what you mean by that (or in what way it is different from what we had before, which I don't remember in detail).

As long as all you need to share between the content and compositor processes is a shmem, there is no need to add a new TextureClient/Host pair. The ClientStorage TextureSource can do all of the API-specific things and there are hooks in BufferTextureHost for all of the interesting steps the TextureSource would need to be notified.
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 16

3 years ago
I have a test app at https://github.com/jrmuizel/client-storage that is happily uploading about 4k*4k of textures per frame with no noticeable cpu usage being spent in texture upload.
(Reporter)

Comment 17

3 years ago
I found the missing piece for synchronization. CoreAnimation calls FinishObjectAPPLE on the textures before read_unlocking them.
(Reporter)

Comment 18

3 years ago
I've updated the repo above to use FinishObjectAPPLE for synchronization. It works although is slow because we call FinishObjectAPPLE immediately after Swap() which causes us to actually block. CoreAnimation doesn't call FinishObjectAPPLE until the next frame. Here's the updated timeline:

Frame1
1. texture1->TexImage2D(size, buffer_addr1);
2. texture1->Draw()
3. Swap();

Frame2
4. texture2->TexImage2D(size, buffer_addr2);
5. texture2->Draw();
6. texture1->FinishObject();
7. read_unlock(buffer_addr1);
8. Swap();

Frame3
9. texture1->TexSubImage2D(size, buffer_addr1);
10. texture1->Draw();
11. texture2->FinishObject();
12. read_unlock(buffer_addr2);
13. Swap();

Frame4
14. texture2->TexSubImage2D(size, buffer_addr2);
15. texture2->Draw();
16. texture1->FinishObject();
17. read_unlock(buffer_addr1);
18. Swap();
(Reporter)

Comment 19

3 years ago
Nical, CoreAnimation doesn't try to unlock the buffer until the end/beginning of the next frame (i.e. 15ms after the Swap) what's the best way for us to do this?
Flags: needinfo?(nical.bugzilla)
IIUC the current Compositor::ReadUnlockAfterComposition helper does exactly that. It is called for all textures that are not already ReadUnlock'ed (uploads etc.) when they are not current on any compositable (as soon as they have been replaced by another texture), and as a result the compositor ReadUnlocks the textures at the end of the frame where they have been replaced.

We should try that with IOSurfaces and see. It sounds like what you want and it would work the same way other shared gpu textures, which is nice.
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 21

3 years ago
(In reply to Nicolas Silva [:nical] from comment #20)
> IIUC the current Compositor::ReadUnlockAfterComposition helper does exactly
> that. It is called for all textures that are not already ReadUnlock'ed
> (uploads etc.) when they are not current on any compositable (as soon as
> they have been replaced by another texture), and as a result the compositor
> ReadUnlocks the textures at the end of the frame where they have been
> replaced.
> 
> We should try that with IOSurfaces and see. It sounds like what you want and
> it would work the same way other shared gpu textures, which is nice.

Is there a way for the TextureSource to know before this is happening? We need a place to call FinishObject on the texture. From my current reading it looks like the unlocking happens without the involvement of the TextureSource.
Flags: needinfo?(nical.bugzilla)
Won't we need to have triple buffering for this to work correctly?

If we have content drawing and compositing running simultaneously, then we have one buffer being drawn to, one buffer getting TexSubImage2D called on it, and a 3rd waiting in the pipeline for us to read_unlock it.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> Is there a way for the TextureSource to know before this is happening? We
> need a place to call FinishObject on the texture. From my current reading it
> looks like the unlocking happens without the involvement of the
> TextureSource.

You can just add methods to TextureSource and have the TextureHost call them.

(In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> Won't we need to have triple buffering for this to work correctly?

Sure, but that's outside the scope of TextureClient/Host/Source. The buffering logic is in whichever CompositableClient uses the texture, and we already have made most of them play nice with gralloc which is somewhat similar, although now that I think about it we have fences with gralloc so we can afford to block production instead of adding more buffers I guess.
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 24

3 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #22)
> Won't we need to have triple buffering for this to work correctly?
> 
> If we have content drawing and compositing running simultaneously, then we
> have one buffer being drawn to, one buffer getting TexSubImage2D called on
> it, and a 3rd waiting in the pipeline for us to read_unlock it.

CoreAnimation does not triple buffer which suggests we shouldn't have to either. The time between a buffer getting TexSubImage2Ded and previous buffer being unlocked should be short enough that we can block production for this period of time.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #24)
> 
> CoreAnimation does not triple buffer which suggests we shouldn't have to
> either. The time between a buffer getting TexSubImage2Ded and previous
> buffer being unlocked should be short enough that we can block production
> for this period of time.

Ok, that seems reasonable. We don't currently have infrastructure (within the tiling code at least) to wait for a lock to become available, the code just checks and creates a new tile if the existing one is locked.
(Reporter)

Comment 26

3 years ago
In talking about this with mstange we thought it might be able to use the same mach connection that we use for shared memory allocations to do this blocking.
(Reporter)

Comment 27

3 years ago
(Reporter)

Updated

3 years ago
Summary: Add a ClientStorage TextureClient/TextureHost pair → Add a ClientStorageTextureSource
(Reporter)

Comment 28

3 years ago
Posted patch WIP (obsolete) — Splinter Review
This version actually seems to work.

As long as we're not creating or destroying textures, we don't seem to spend a noticeable amount of time doing upload. It currently has a hack where we spin on the readlock to wait on the compositor. Currently, we spend a substantial amount of time spinning but the the cpu time in the compositor stays low.
Attachment #8798180 - Attachment is obsolete: true
I had a friend ping me about janky scrolling performance in Firefox, and when talking with kats yesterday, he pointed me to this bug.  It would be great to get this fixed.

Updated

2 years ago
Blocks: 1342209
Jeff and I refreshed our minds today again about how CoreAnimation does this. Here's a picture and a piece of text.

CoreAnimation has two buffers per "tile". When content wants to paint to a buffer, it sends a blocking message to the mach ipc thread on the Compositor. Once that message is replied to, it starts to paint.

Every composite does these things, in this order:
    - Iterate over all textures used in the previous composite and call
      glFinishObjectAPPLE on them.
    - Signal a condition variable to record that these textures are now
      available, and possibly to unblock the libdispatch thread that might be
      waiting on it.
    - Composite the current frame, uploading the new textures via
      glTexSubimage() in the process.
    - Swap buffers.

The mach IPC thread has a handler for the message that content sends when it wants to paint. This handler:
    - Checks the condition variable for that set of textures
    - If it has already been signaled, which means that those textures have
      already been glFinishObjectAPPLE'd, then it directly replies to the
      message from content, and content can paint.
    - If the condition variable has not been signaled, then it kicks off a
      task using GCD. This task will run on a libdispatch thread. This task
      does one thing: it blocks on the condition variable.
    - Once the condition variable has been signaled, and the waiting task is
      unblocked, that task sends the reply to the blocking message from content. 
      This will unblock content so that it can paint.

This achieves the following:
    - Don't write to textures that the compositor is using.
    - Don't allocate a third buffer if both buffers are in use; instead, make
      content wait.
    - Don't block the mach IPC thread if content needs to wait - do the waiting
      on a different thread.

The current plan is to implement something extremely similar to this, because it's a known-to-work state. There might be things we can simplify later on (e.g. don't use mach ipc and take advantage of our own IPDL / IPC mechanisms), but we can do that once we have something that works.

The patch that is currently attached in this bug already has a lot of this, but it doesn't implement the part that would make content wait until it can paint. The same is true for https://github.com/jrmuizel/client-storage/ - it's missing the waiting.

Jeff would like to finish this bug, once he has time, but if somebody else is available to work on it, please do, and don't wait for Jeff.

Updated

2 years ago
Whiteboard: [gfx-noted] → [gf:p1]

Updated

2 years ago
Blocks: 1343558
Is this still relevant with Web Render? If so I will make it a qf:p1
Flags: needinfo?(ehsan)
Whiteboard: [gf:p1] → [gf:p1][qf]
(Reporter)

Comment 32

2 years ago
It will probably not be relevant with webrender.

Comment 33

2 years ago
Even if it's not relevant with Web Render, do we want to keep janking until Web Render ships?
Flags: needinfo?(ehsan) → needinfo?(jmuizelaar)
(Reporter)

Comment 34

2 years ago
No. I'd rather stop janking before WebRender ships.
Flags: needinfo?(jmuizelaar)

Comment 35

2 years ago
Bas, can you please help find an owner for this?  Thanks!
Flags: needinfo?(bas)
Whiteboard: [gf:p1][qf] → [gfx-noted][qf:p1]
Milan, neither me, nical or jamie have OS X machines, do we have someone that could deal with this?
Flags: needinfo?(bas) → needinfo?(milan)

Comment 37

2 years ago
Here is a video demonstrating how bad this bug can be in practice.  Look at the tab throbber to get a sense of the UI jank caused by this.  I was trying to scroll constantly while capturing the video.  The places in the video when Firefox wasn't scrolling was where we were janking.

Updated

2 years ago
Blocks: 1347559
(In reply to Naveed Ihsanullah [:naveed] from comment #31)
> Is this still relevant with Web Render? If so I will make it a qf:p1

Even if it's Mac only?
Flags: needinfo?(milan)
OS: Unspecified → Mac OS X
Flags: needinfo?(nihsanullah)
Yes this should be [qf:p1]. This is bad enough as a user experience that it isn't really shippable in the current form. The compositor jank janks the entire window and the impact is across many sites.
Flags: needinfo?(nihsanullah)
(Reporter)

Comment 40

2 years ago
Here's a somewhat elaborated sketch that I worked on over the weekend. One thing missing is that the frame/transaction id probably needs to live in shared memory so that the client side can check for completion without needing to do any ipc.
Assignee: nobody → jmuizelaar
Attachment #8799032 - Attachment is obsolete: true
Re-tagging for QF triage as this is OS X only.
Whiteboard: [gfx-noted][qf:p1] → [gfx-noted][qf]
Blocks: 1355419
(Reporter)

Updated

2 years ago
No longer blocks: 1355419
Even though this is OSX only the jank is severe enough that users may leave over it. That hits our P1 criteria.
Whiteboard: [gfx-noted][qf] → [gfx-noted][qf:p1]
Assignee: jmuizelaar → nobody
Assignee: nobody → milan
Blocks: 1353137
Given this is OS X-only, it seems like it's time to not have this as qf:p1.
Whiteboard: [gfx-noted][qf:p1] → [gfx-noted][qf]
According to Milan this will require resources that are already stretched with the current qf:p1 load for FF57. Moving to qf:p2 and will re-evaluate for FF58.
Whiteboard: [gfx-noted][qf] → [gfx-noted][qf:p2]
I will do the remaining work.
Assignee: milan → hshih
Status: NEW → ASSIGNED
Whiteboard: [gfx-noted][qf:p2] → [gfx-noted][qf:p3]
Jerry, I don't know if the patches in bug 1005441 are relevant, take a look.
Flags: needinfo?(hshih)
That patch was written before the tiling landed.
I think we could do the whole things in CreateWrappingDataSourceSurface()(just like the wip attachment 8768908 [details] [diff] [review]). I'm getting a weird screen result with my local patch. The tiled buffers are updated by content side unexpectedly. Maybe the problem comes from the tile-buffer recycling.
Flags: needinfo?(hshih)
The "mExternallyOwned" is used for gralloc buffer. We don't use the gralloc buffer now.

MozReview-Commit-ID: 7Gurpa3kdp0
The DirectMapTextureSource could let the compositor to read the buffer directly.
That could get rid of some memory copy operations during texture uploading.

MozReview-Commit-ID: CHhoR96P7VG
The client side can't get the GL context in CompositorOGL. So, it can't know
the texture direct mapping capability directly. This patch adds the texture
direct mapping info in TextureFactoryIdentifier. Then, the client side could
get the info form the TextureFactoryIdentifier.

MozReview-Commit-ID: KEazDVg0p9Y
This patch will use DirectMapTextureSource to wrap the DataSourceSurface data for gpu access.
That could improve the texture uploading performance.

MozReview-Commit-ID: CGPFcCsR1RY
There is no benefit to use recycle the old one. That will have an additional
memory copy from the new texture host to the old texture source. We should
just wrap the new texture host to a direct mapping texture source to get
better texture uploading performance.

MozReview-Commit-ID: 9ONQAhvKmuh
Wait for the reftest and talos's result.
There are a lot of tests failed. Wait for the updated try.
The performance regression comes from the TileClient buffer allocation. It's addressed in comment 24 and comment 25. In my patch, TileClient will allocate a new buffer when the frontbuffer is still read-locked. The buffer allocation costs a lot. But I can see the client_storage improves a lot for texImage2D() calls in profiler. I will try to add a waiting code as comment 24 and 25 or try to have triple buffers for TileClient.
Here are the comparison between the original texture uploading path and the client-storage.

Test with the full-screen video from https://www.csfd.cz/film/402192-zmensovani/videa/

a) The original path:
https://perfht.ml/2mL3nwO

b) client-storage
https://perfht.ml/2mLWlrO

This version use client-storage for texture uploading. And it also creates a glSync object after the teximage2d/texSubImage2d call. Then, we will wait the glSync object at the  end of next composition. I don't have a good way to make the client side waiting for the readlock. So, I just use a simple spinlook at client side. We can see the waiting time in profiler if the compositor doesn't handle the glSync object. We could adjust the glSync waiting position to another position.
The original idea for client-storage is that:

1) before a tiled layer painting, send a sync message to parent
2) parent side starts to wait glsync object. Then, go back to child side.
3) start to paint.

This model will have problem if there are a lot of painting layers. If we try to reduce the message number, we should have a preprocessing step to find all tiled painting layer. And send a message to wait all sync object for these layers. And If the parent side is busy, it can't handle the message soon. Then, the child side could spend more time for waiting.

That's why I don't create a message to trigger the glsync object waiting.
Blocks: 1424325
(Assignee)

Updated

11 months ago
Assignee: bignose1007+bugzilla → dothayer
Whiteboard: [gfx-noted][qf:p3] → [gfx-noted][qf:p3][fxperf:p1]
(Assignee)

Comment 61

10 months ago
Quick update on where this is right now, to any interested parties:

I've set up the IPC on the mach connection we use for shared memory (per comment 26). I went through a bunch of iterations on this, because I was trying to sort out a way to simply sync on a transaction ID. Transaction IDs didn't work out - as far as I can tell this is because we can reuse tiles across many transactions, meaning we might free up some tiles with a transaction but leave the rest alive. I ended up resorting to tracking a set of texture serial IDs per process, and sending an array of required texture IDs up to the parent process when are going to need them. This waits on a condition variable until items are removed from the set, at which point it checks again.

In any case, this does seem to perform rather well, and makes scrolling on sites like gsmarena.com (from Bug 1457413) very smooth. However, it does regress some of the motionmark tests in talos. We seem to be trading memcpy time in the compositor for allocation time in the content process due to the double buffering (?).

Additionally, and more importantly, I'm running into intermittent crashes on talos from timeouts waiting on texture serials to free up. My assumption is that if we're waiting on a texture serial to get the back buffer, then we've already sent up the front buffer, meaning the back buffer should stop being used within a matter of time, but that doesn't seem to be the case. Are there cases where both the front buffer and the back buffer should be locked and won't unlock until we allocate a new back buffer and send it up? It could be a bug in my code, but I just wanted to make sure that assumption was correct. In the mean time I'm just sending up talos runs with printf's scattered everywhere trying to understand better where this is coming from.

Feel free to call out anything in the above that seems off.
(Assignee)

Comment 62

10 months ago
Resolved the crashes, but still seeing the motionmark regressions. Not sure what to do about them. Going to sleep on it.
(In reply to Doug Thayer [:dthayer] (PTO on May 17) from comment #62)
> Resolved the crashes, but still seeing the motionmark regressions. Not sure
> what to do about them. Going to sleep on it.

We have existing bugs open where we're really slow due to allocating (and memset(0)ing) tiles.

The solution to both problems might be to investigate the tile pool size/allocation strategy and see if we can do better.
(Assignee)

Comment 64

10 months ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #63)
> We have existing bugs open where we're really slow due to allocating (and
> memset(0)ing) tiles.
> 
> The solution to both problems might be to investigate the tile pool
> size/allocation strategy and see if we can do better.

Yeah, it seemed like a problem that I was just exacerbating with my changes. From my profiling, it seems the tile pool isn't in the hot path here, since we end up using ClientSingleTiledLayerBuffer for most of the heavy lifting in these tests. I'm not sure what it would take to keep a pool for the ClientSingleTiledLayerBuffer, since its tiles are all different sizes. Maybe we could benefit from using a set of predefined tile sizes and finding the smallest predefined size that fits our requested size, assuming our requested size isn't something like 5x1000.

In either case I don't know what I can do for this within the scope of this bug, so I'm not sure how to go forward here.
Whiteboard: [gfx-noted][qf:p3][fxperf:p1] → [gfx-noted][qf:p3][fxperf:p1][qf:f64]
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 73

10 months ago
The patches still need some cleanup, but I'm throwing them up here in the mean time in case anyone wants to pull them down and take a look.
(Assignee)

Comment 74

10 months ago
Animated gifs on the new reddit (not on the old reddit) are busted with this. If I switch the textures from GL_TEXTURE_RECTANGLE to GL_TEXTURE_2D it avoids the problem, but that doesn't allow us to avoid the driver copy that we avoid with glTextureRangeAPPLE. Trying to figure out why we're running into this, and what's special about the new reddit's animated gifs.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 78

10 months ago
Hey Jeff - do you have any thoughts on either of these two things?

- What can we do to mitigate the cost of the extra allocations we're now doing? I'm specifically interested in the talos regressions of the motionmark tests, which are the only regression I'm seeing on try. It seems like we don't need to be allocating so much - we don't use a texture pool because we hit the ClientSingleTiledLayerBuffer, since the tests tend to just be large numbers of tiny layers. Could we explore using a pool or set of pools for ClientSingleTiledLayerBuffer? I don't know enough context around this and what exploration has already been done here.

- Can you think of any reasons why using GL_TEXTURE_RECTANGLE_ARB in the DirectMapTextureSource (this looks roughly like the sketch you posted a while ago) would interfere with animated gifs sometimes? It seems to me to be interfering with MacIOSurfaceTextureHost planes, but I'm not sure. Those also use GL_TEXTURE_RECTANGLE_ARB, but I don't see why that would result in any conflict. See comment 74.
Flags: needinfo?(jmuizelaar)
(Reporter)

Comment 79

10 months ago
So the animated gif thing is actually related to videos which perhaps makes some sense. We should try to figure out more about what scenario's cause it.

I'm don't have a concrete suggestion for the motionmark problem yet. I'm going to think about it more.
(Reporter)

Comment 80

10 months ago
Can you post before and after profiles of this patch on the motionmark bouncing clipped rectangle test?
(Reporter)

Updated

10 months ago
Flags: needinfo?(dothayer)
Comment hidden (mozreview-request)
(Assignee)

Comment 82

10 months ago
My assertion that it was due to allocations appears to be off? Maybe I accidentally fixed a bug since I last profiled that test :/

Anyway, here they are:

before: https://perfht.ml/2IMPdZu
after: https://perfht.ml/2xdet5G

Nothing jumps out at me at the outset - the profiles look very similar to me. But maybe you can pick out something I can't. Do you happen to know how the scores are calculated for this test?
Flags: needinfo?(dothayer)
Comment hidden (obsolete)
(Assignee)

Comment 84

10 months ago
wait - wrong profiles (ignore the above comment)

before: https://perfht.ml/2IJqp07
after: https://perfht.ml/2xbarKT

Comment 85

10 months ago
I had a play with the latest Try 

https://treeherder.mozilla.org/#/jobs?repo=try&author=dothayer@mozilla.com&selectedJob=180086432

and it looks great on gsmarena.com. It also seems to fix some of my high GPU power consumption complaints on other pages.

But I sometimes see large black (unrendered?) areas. 

I just got it now, I switched to the (already loaded) gsmarena.com tab and it appeared completely black with just the toolbar. I cmd-R 'd to reload and it refused to render, it just stayed black. Sometimes it shows a small broken thumbnail of the whole page at the center of the blackness. Scrolling or mouseovering shows the status bar hints bottom left but nothing else.

This profile is me scrolling and reloading the blank gsmarena.com tab

https://perfht.ml/2GNUjyx


This is on a MacBookPro11,1 @ 1440x900 resolution. Mostly default settings but gfx.compositor.glcontext.opaque TRUE.

hope that's useful to you, keep up the good work.
(Assignee)

Comment 86

10 months ago
I moved the glFinishObjectAPPLE calls to before composition rather than after, and my paint times for the motionmark tests went down dramatically, so I did another talos run:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7867306bde5158bee26b76f10b3a771d69d60932&newProject=try&newRevision=5ad1e5e4b865f79e3b1c147b0d71d928f7aba159&framework=1

This is interesting because we see more green on it than before, which is good. But the red has also increased. Notably damp seems to have regressed, as well as displaylist_mutate.html, and motionmark is still having some trouble, though it seems it may be less pronounced.

I couldn't reproduce the regression on displaylist_mutate.html (I did set the ASAP mode prefs), but here's the profiles anyway:

displaylist_mutate.html (before): https://perfht.ml/2ILghbm
displaylist_mutate.html (after): https://perfht.ml/2LuE7Wx

I could reproduce the motionmark regressions, and I'm seeing longer than expected composites in the after profile, but nothing jumps out as me as a smoking gun:

CSS_bouncing_blend_circles (before) https://perfht.ml/2xdf6vT
CSS_bouncing_blend_circles (after) https://perfht.ml/2xkx53s

I don't quite know the best way of getting detailed easy to reproduce profiles for the damp tests. Thoughts / recommendations? Just talos-test with --geckoProfile? That didn't yield anything useful for the motionmark tests, but maybe I'm getting something wrong?

In all cases I would have expected moving glFinishObjectAPPLE before composition to make glFinishObjectAPPLE show up in the profiles, which it doesn't. I'm also surprised that the GPU has finished with these textures at the beginning of the composition in which they are no longer referenced. I would have expected a bit more latency than that? But maybe my mental model of all of this is off.
(Assignee)

Comment 87

10 months ago
Eh - looking at the graphs I'm thinking damp and displaylist_mutate might be due to chance on the baseline. motionmark is still not looking great though. Something like a 3-5% regression? Still not sure why.
Flags: needinfo?(mconley)
(Assignee)

Comment 88

10 months ago
Here's some profiles of the motionmark_animometer Design test, before and after:

motionmark_animometer Design (before) https://perfht.ml/2J3c3YA
motionmark_animometer Design (after) https://perfht.ml/2xs2SQ6

tldr; the before shows a ton of time spent in the compositor in memmoves copying the textures all the way down through the driver. The second profile gets rid of this, but trades it for time spent in glFinishObject and time spent in the content process waiting on the IPC response which happens once the glFinishObject completes.

Markus, do you have any ideas for how to tackle this problem? Locally my throughput for the test is higher in my version than in the baseline, but we spend more time that I would like to blocking in the content process. I can't think of a way to solve this within the scope of this bug, but maybe I'm missing something. It seems like the GPU just has a lot of work we've given it, and so if we block on that work then we're going to spend a lot of time blocking.
Flags: needinfo?(mstange)
(Reporter)

Comment 89

10 months ago
Just responding quickly, I'd recommend trying to produce a more descriptive timeline of what things are being waited on when.

When I was initially figuring out how to best work with client storage I used the logviz.html that I'll attach. It's super simple and you'll probably need to modify to work best, but it basically just takes the first int of a line and interprets that as a timestamp and draws the text of the line to a canvas.

Try it out with the following content in the text box to get a feeling:

4 First
90000 Second
300000 Third
(Reporter)

Comment 90

10 months ago
Posted file logviz.html
Flags: needinfo?(jmuizelaar)
Looking over these profiles, I don't think I'm going to be much help, I'm afraid. :(

I will say, however, that these patches seem to address bug 1453080 for me, which I'm very excited about.
Flags: needinfo?(mconley)
(Assignee)

Comment 92

10 months ago
Posted file logviz.html
Created a visualization of a log of textures locking/unlocking/etc. per Jeff's recommendation. This log is captured from a particular motionmark animometer test[1].

To explain what's going on with all the colors and layout: the two columns on the right are the logs of the parent and content process. They have colored bars to the left indicating when certain events begin and end, roughly mirroring perf.html's view. Grey is a RefreshDriverTick, red is a LayerTransaction, green is a Rasterize, blue is SyncTextures (waiting on texture IDs to be notified as unlocked by the parent process), and yellow is a glFinishObject call. Regarding the text on the left, some of the log entries have numbers after them - these numbers represent the affected texture IDs.

The view on the right is a set of per-texture timelines. Each column corresponds to a particular TextureHost/Client (with the ID of that texture at the top). The grey bar is the time that it's locked out by the parent - beginning and ending in TiledLayerBufferComposite::UseTiles. The blue is us waiting in SyncTextures in the content process for those textures, and the yellow is the glFinishObject call for that texture.

Everything looked more or less as I expected it to in the 1x timescale view. However, if you look at the 100x view you'll see large periods of time with certain textures being locked, in a way that it actually creates a clear visual pattern. I suspect that this is just due to these textures finding themselves stuck behind the textures we're glFinishObject'ing, and the ordering just makes that pattern look the way it does. It's still interesting to think about.

If anyone has any thoughts about what this shows or what else I should include in the log, let me know.

[1]: https://browserbench.org/MotionMark/developer.html?test-interval=15&display=minimal&tiles=big&controller=fixed&frame-rate=30&kalman-process-error=1&kalman-measurement-error=4&time-measurement=performance&suite-name=Animometer&test-name=Design&complexity=17
(Reporter)

Comment 93

10 months ago
The log looks really nice. Can you include uses of the textures (both glBindTexture and glTexSubImage) and calls to glSwapBuffers in the log?
Flags: needinfo?(dothayer)
(Assignee)

Comment 94

10 months ago
Posted file logviz.html
Added info for glSwapBuffers / glBindTexture / glTex(Sub)Image2D. The orange lines all the way across are glSwapBuffers calls, the blue per-texture lines are glBindTexture calls, and the green lines are glTexSubImage2D calls.
Flags: needinfo?(dothayer)
(Reporter)

Comment 95

10 months ago
My guess is that the finishing might be getting blocked by texture upload because we're interleaving finishObject and texSubImage2d. Can we make sure we do all of the finishing before starting any new uploads?
Flags: needinfo?(dothayer)
(Assignee)

Comment 96

10 months ago
Posted file logviz.html
It's a similar story with the glFinishObject calls all preceding the glTexSubImage2D calls. Attaching a log if you want to take a look for yourself - in short, the bars all look the same, but the glTexSubImage2D lines are just moved to the bottom of the cycle.
Flags: needinfo?(dothayer)
(Assignee)

Comment 97

10 months ago
At the risk of sounding ignorant, it feels like the big problem here is that we're blitting so many overlapping buffers on the GPU? I don't know too much about the background behind this, but it doesn't look like that's what Chrome or Safari are doing - it looks like they're squashing the layers in this test before compositing?
(Reporter)

Comment 98

10 months ago
(In reply to Doug Thayer [:dthayer] (PTO on June 4) from comment #97)
> At the risk of sounding ignorant, it feels like the big problem here is that
> we're blitting so many overlapping buffers on the GPU? I don't know too much
> about the background behind this, but it doesn't look like that's what
> Chrome or Safari are doing - it looks like they're squashing the layers in
> this test before compositing?

It is possible that we're just running into the limits of the GPU. It would be nice if we could confirm this somehow though.

Yes, I Safari at least is not using separate layers for all the items.
(Assignee)

Comment 99

10 months ago
(In reply to Jeff Muizelaar [:jrmuizel] on parental leave until at least June 25 from comment #98)
> It is possible that we're just running into the limits of the GPU. It would
> be nice if we could confirm this somehow though.

I'm trying to sort out a way of confirming it. The best I can come up with is that if we remove the glFinishObject calls, the framerate score remains the same. I'm attaching two more visualizations of logs with and without glFinishObject to demonstrate the similarities.

The time spent in glSwapBuffers seems to be what's really gating our score here (let me know if that seems right). The key problem with the approach though is that when the GPU is our bottleneck, we're not free with this approach to, say, run scripts, since we're blocking on this in the main thread of the content process, no?
(Assignee)

Comment 100

10 months ago
(Assignee)

Comment 101

10 months ago

Updated

10 months ago
Whiteboard: [gfx-noted][qf:p3][fxperf:p1][qf:f64] → [gfx-noted][fxperf:p1][qf:p3:f64]
(Assignee)

Comment 102

10 months ago
If I force it all into one layer by always returning false for ActiveLayerTracker::IsStyleAnimated, our performance on the test goes up dramatically. Matt, do you have an idea of how hard it would be / what difficulties we might run into trying to implement a heuristic that would avoid the layer explosion for cases with many overlapping active layers?
Flags: needinfo?(matt.woodrow)
(In reply to Doug Thayer [:dthayer] (PTO on June 4) from comment #102)
> If I force it all into one layer by always returning false for
> ActiveLayerTracker::IsStyleAnimated, our performance on the test goes up
> dramatically. Matt, do you have an idea of how hard it would be / what
> difficulties we might run into trying to implement a heuristic that would
> avoid the layer explosion for cases with many overlapping active layers?

So it sounds like this testcase has a bunch of stacked layers that it manipulates the transform on to get the 3d effect. It also modifies the colour of each layer, so we have to repaint everything, and layerization isn't helping much.

The transform changes come from JS, so we don't get to have async animations either.

It sounds like IsStyleAnimated is returning true in the hope that we can just push the layers around without repainting, but the colour change cancels that out. 

I think we should be able to detect that, either by counting style changes to other properties, or checking if the frame is invalidated?

Jamie is the go-to guy for layerization optimizations at the moment, so ni? him too.
Flags: needinfo?(matt.woodrow) → needinfo?(jnicol)
Comment hidden (off-topic)
(Assignee)

Updated

10 months ago
Depends on: 1467619
(Assignee)

Comment 105

9 months ago
The remaining talos regressions after Bug 1467619 and Bug 1468401 seem to all be things like rasterflood_gradient which run in ASAP mode, causing us to check if textures are locked a tiny bit too early (around .15 to .3 ms locally). This incurs a small hit to the overall test score (I'm seeing about 3%). I'm not sure what to do about this. We're not actually waiting on the GPU - we're just waiting on the compositor to wake up for that frame and check the textures to unlock them. Short of scheduling the compositor to check this early, I don't know what options are open here.

Still investigating, but any thoughts/opinions are welcome.
(Assignee)

Comment 106

9 months ago
Hmm, I think I just need to send an async message via our SharedMemory backchannel up to the compositor to recheck its textures before flushing async paints.
We were wondering if this issue was related.
https://webcompat.com/issues/9930

It's about this site https://code.visualstudio.com/docs/?dv=osx

And the reduced test case to reproduce is 
http://la-grange.net/2018/05/30/test/svg-animation.html

Comment 108

9 months ago
I filed bug 1469594 based on the reduced test case in comment 107. But I think my new bug is not related to the question in comment 107.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 months ago
Attachment #8979708 - Attachment is obsolete: true
(Assignee)

Updated

9 months ago
Attachment #8980356 - Attachment is obsolete: true
(Assignee)

Updated

9 months ago
Attachment #8980357 - Attachment is obsolete: true
(Assignee)

Comment 118

9 months ago
I wanted to make sure I get these up before the weekend. Markus, feel free to kick any/all of these to someone else if it's a better fit. In any case, there's still two talos regressions here:

tresize (~2%) - Due to the CGLLockContext call in the resize handler. It seems we lock things up, which doesn't go over well when we're trying to glFinishObject our textures and wait on that in the compositor
tsvgx (~8%) - This seems to be limited to ASAP mode, as far as I can tell. The compositor is stuck in SwapBuffers, and the content process is waiting on the compositor from SyncTextures. I can't really come up with a good way around this - I'm open to suggestions, but given the real world benefits of this change I think it's worth it to eat the regression

Anyway, the plan was to enable this on Nightly with the pref, and see how it works out with peoples' real workflows.
Hey mattwoodrow - would you feel comfortable reviewing any of this stuff, or should we wait until mstange is back?
Flags: needinfo?(matt.woodrow)

Comment 120

9 months ago
mozreview-review
Comment on attachment 8979705 [details]
Bug 1265824 - Remove the unsed member in GLTextureSource

https://reviewboard.mozilla.org/r/245854/#review259476
Attachment #8979705 - Flags: review+

Comment 121

9 months ago
mozreview-review
Comment on attachment 8979706 [details]
Bug 1265824 - Add a new texture source type "DirectMapTextureSource"

https://reviewboard.mozilla.org/r/245856/#review259504

::: gfx/layers/opengl/TextureHostOGL.h:316
(Diff revision 2)
> +private:
> +  bool UpdateInternal(gfx::DataSourceSurface* aSurface,
> +                      nsIntRegion* aDestRegion,
> +                      gfx::IntPoint* aSrcOffset,
> +                      bool aInit);
> +  GLsync mSync;

What is this for? I can't see it used in any of the patches. Can we get rid of it?

Comment 122

9 months ago
mozreview-review
Comment on attachment 8979707 [details]
Bug 1265824 - Pass the texture direct mapping info to all texture creating functions

https://reviewboard.mozilla.org/r/245858/#review259508

Can we add this to the TextureAllocationFlags instead of passing an extra bool to all the ::Create functions? All the TextureFactoryIdentifier changes look great.

Comment 123

9 months ago
mozreview-review
Comment on attachment 8979709 [details]
Bug 1265824 - implement CreateDataTextureSourceAroundYCbCr() and CreateDataTextureSourceAround() for CompositorOGL

https://reviewboard.mozilla.org/r/245862/#review259510
Attachment #8979709 - Flags: review+

Comment 124

9 months ago
mozreview-review
Comment on attachment 8979710 [details]
Bug 1265824 - handle the texture uploading and readLock related things for direct mapping texture source

https://reviewboard.mozilla.org/r/245864/#review259512

::: gfx/layers/composite/TextureHost.cpp:1013
(Diff revision 2)
>    if (!mHasIntermediateBuffer && EnsureWrappingTextureSource()) {
> +    if (mFirstSource && mFirstSource->IsDirectMap()) {
> +      // The direct mapping texture source still needs to call Update() to
> +      // upload the new data. Only skip the uploading if we have the same
> +      // serial id.
> +      if (mFirstSource->GetUpdateSerial() == mUpdateSerial) {

It looks like MaybeUpload already checks the serial, so you shouldn't need to do so here.
Attachment #8979710 - Flags: review+

Comment 125

9 months ago
mozreview-review
Comment on attachment 8979711 [details]
Bug 1265824 - Don't recycle the texture source if it's direct mapping

https://reviewboard.mozilla.org/r/245866/#review259514

I might have misunderstood how some of this work, but I thought we still had actual GPU memory associated with the texture? The client storage code just lets OpenGL manage when they want to upload into it etc.

If that's the case, then recycling those should save an allocation shouldn't it?

Comment 126

9 months ago
mozreview-review
Comment on attachment 8979712 [details]
Bug 1265824 - Wait on texture handles with IPC

https://reviewboard.mozilla.org/r/245868/#review259516

I haven't gone through this one properly yet, will try do so tomorrow.

It's probably worth putting StaticMonitor into a new file (and patch, with the OffTheBooksCondVar changes) so that other people can use it. It's going to need a review from an xpcom peer regardless, so separate patch is probably easiest.

I also see that this switches us from TEXTURE_RECTANGLE_ARB to TEXTURE_2D. Why the change here? I thought generally all apple extensions required TEXTURE_RECTANGLE.

Comment 127

9 months ago
mozreview-review
Comment on attachment 8987174 [details]
Bug 1265824 - Guard client storage stuff behind a pref

https://reviewboard.mozilla.org/r/252418/#review259518
Attachment #8987174 - Flags: review+

Comment 128

9 months ago
mozreview-review
Comment on attachment 8987175 [details]
Bug 1265824 - Don't use client storage for overly large textures

https://reviewboard.mozilla.org/r/252420/#review259520
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos / reviews) from comment #119)
> Hey mattwoodrow - would you feel comfortable reviewing any of this stuff, or
> should we wait until mstange is back?

I'll do what I can! Still would really like mstange to have a look over them, even if it's just a rubber-stamp.
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 130

9 months ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #126)
> Comment on attachment 8979712 [details]
> Bug 1265824 - Wait on texture handles with IPC
> 
> https://reviewboard.mozilla.org/r/245868/#review259516
> 
> I haven't gone through this one properly yet, will try do so tomorrow.
> 
> It's probably worth putting StaticMonitor into a new file (and patch, with
> the OffTheBooksCondVar changes) so that other people can use it. It's going
> to need a review from an xpcom peer regardless, so separate patch is
> probably easiest.
> 
> I also see that this switches us from TEXTURE_RECTANGLE_ARB to TEXTURE_2D.
> Why the change here? I thought generally all apple extensions required
> TEXTURE_RECTANGLE.

I couldn't measure or perceive any benefits from using TEXTURE_RECTANGLE_ARB - simply using TEXTURE_2D was enough to get rid of all of the memmoves in my profiles. Switching from TEXTURE_2D to TEXTURE_RECTANGLE_ARB however did cause visual regressions with some videos on reddit that I couldn't get to the bottom of. I would be more interested in discovering the cause of those regressions and potentially fixing them if I knew there were some performance gain, but as far as I can tell there isn't. This was surprising to me since the documentation indicates I should see a win, but I don't.

The other surprise was that I couldn't perceive any improvement from the glTextureRangeAPPLE call, though I did see gains from using GL_STORAGE_CACHED_APPLE, which as far as I could tell is supposed to be used in conjunction with glTextureRangeAPPLE. AFAIK glTextureRangeAPPLE is supposed to be used to map *multiple* textures within contiguous buffers, which is a bit tricky to manage, though I did play around with it. I think it might be interesting playing around with this, but for all the dependent bugs the texture upload cost was reduced to near nothing, so I'm not sure it's worth the effort.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 months ago
Attachment #8979711 - Attachment is obsolete: true
Attachment #8979711 - Flags: review?(mstange)
(Assignee)

Updated

9 months ago
Attachment #8987175 - Attachment is obsolete: true
Attachment #8987175 - Flags: review?(mstange)

Comment 139

9 months ago
mozreview-review
Comment on attachment 8988938 [details]
Bug 1265824 - Add StaticMonitor

https://reviewboard.mozilla.org/r/254060/#review261006

::: xpcom/base/StaticMonitor.h:80
(Diff revision 1)
> +  Atomic<OffTheBooksMutex*> mMutex;
> +  Atomic<OffTheBooksCondVar*> mCondVar;

This is a little weird, but I assume that having separate member variables rather than a single `Atomic<OffTheBooksMonitor*>` is slightly less work, or something?  I think to have `Atomic<OffTheBooksMonitor*>`, we'd need all the stuff you implemented here, plus an extra `OffTheBooksMonitor` class, so it would be more code.  WDYT?

::: xpcom/base/StaticMonitor.h:89
(Diff revision 1)
> +  // Disallow copy constructor, but only in debug mode.  We only define
> +  // a default constructor in debug mode (see above); if we declared
> +  // this constructor always, the compiler wouldn't generate a trivial
> +  // default constructor for us in non-debug mode.
> +#ifdef DEBUG
> +  StaticMonitor(StaticMonitor& aOther);

Please make the argument here `const`.  Bonus points if you file a follow-up for fixing `StaticMutex` in the same way, since I imagine some of the code was copy-pasta'd into this bug.

::: xpcom/base/StaticMonitor.h:93
(Diff revision 1)
> +#ifdef DEBUG
> +  StaticMonitor(StaticMonitor& aOther);
> +#endif
> +
> +  // Disallow these operators.
> +  StaticMonitor& operator=(StaticMonitor* aRhs);

I think this should be `operator=(const StaticMonitor&)`?  Having a declaration for `operator=(StaticMonitor*)` is a bit strange, as the compiler won't auto-generate a method with such a signature.
Attachment #8988938 - Flags: review?(nfroyd) → review+
(In reply to Doug Thayer [:dthayer] from comment #118)
> In any case, there's still two talos regressions here:
> 
> tresize (~2%) - Due to the CGLLockContext call in the resize handler. It
> seems we lock things up, which doesn't go over well when we're trying to
> glFinishObject our textures and wait on that in the compositor
> tsvgx (~8%) - This seems to be limited to ASAP mode, as far as I can tell.
> The compositor is stuck in SwapBuffers, and the content process is waiting
> on the compositor from SyncTextures. I can't really come up with a good way
> around this - I'm open to suggestions, but given the real world benefits of
> this change I think it's worth it to eat the regression

I agree, these regressions absolutely seem worth eating. We have tons of evidence of the positive effects of this change, and the tsvgx regression, being ASAP-mode-only, is not user facing.

I'll try to review these patches over the next several days.
Flags: needinfo?(mstange)

Comment 141

9 months ago
mozreview-review
Comment on attachment 8979706 [details]
Bug 1265824 - Add a new texture source type "DirectMapTextureSource"

https://reviewboard.mozilla.org/r/245856/#review261264

::: gfx/layers/composite/TextureHost.h:189
(Diff revision 3)
> +  // The direct-map cpu buffer should be alive when gpu uses it. And it
> +  // should not be updated while gpu reads it. This Sync() function
> +  // implements this synchronized behavior.

This comment should probably mention that this method blocks until the GPU is done with this texture. The current comment did not make this obvious to me. There's a comment in the DirectMapTextureSource implementation that mentions this, but I think it should be mentioned here as well.
I realized that, while I have a good idea of what we want to achieve here, I don't actually know the code that's being changed here all that well. Matt / nical / sotaro are actually a much better choice of reviewer for this code.
I've looked through all patches except the one that adds the waiting and they all look good to me. I'll try to look at the waiting patch soon.

Comment 143

9 months ago
mozreview-review
Comment on attachment 8979706 [details]
Bug 1265824 - Add a new texture source type "DirectMapTextureSource"

https://reviewboard.mozilla.org/r/245856/#review262396
Attachment #8979706 - Flags: review?(matt.woodrow) → review+

Comment 144

9 months ago
mozreview-review
Comment on attachment 8979707 [details]
Bug 1265824 - Pass the texture direct mapping info to all texture creating functions

https://reviewboard.mozilla.org/r/245858/#review262398
Attachment #8979707 - Flags: review?(matt.woodrow) → review+

Comment 145

9 months ago
mozreview-review
Comment on attachment 8979712 [details]
Bug 1265824 - Wait on texture handles with IPC

https://reviewboard.mozilla.org/r/245868/#review262400

Looking really good! I'm excited to have this landed.

Sorry for the slow reviews, was travelling last week.

We might need someone else to review the ipc/glue bits, though I'm not sure anyone still knows that code.

::: gfx/layers/client/MultiTiledContentClient.cpp:264
(Diff revision 5)
>    oldRetainedTiles.Clear();
>  
>    nsIntRegion paintRegion = aPaintRegion;
>    nsIntRegion dirtyRegion = aDirtyRegion;
> +
> +  AutoTArray<uint64_t, 10> syncTextureSerials;

Can you put this block into a helper? Update is a bit long already :)

::: gfx/layers/opengl/TextureHostOGL.cpp:364
(Diff revision 5)
>  {
>    gl()->MakeCurrent();
>  
>    if (aInit) {
>      gl()->fGenTextures(1, &mTextureHandle);
> -    gl()->fBindTexture(LOCAL_GL_TEXTURE_RECTANGLE_ARB, mTextureHandle);
> +    gl()->fBindTexture(LOCAL_GL_TEXTURE_2D, mTextureHandle);

Would be great if you could fold the GL_TEXTURE_2D changes into the initial patch that adds DirectMapTextureSource.

::: ipc/glue/SharedMemoryBasic_mach.h:54
(Diff revision 5)
>  
> +  static void RegisterTextureSourceProvider(layers::TextureSourceProvider* textureSourceProvider);
> +  static void UpdateTextureLocks(base::ProcessId aProcessId);
> +  static bool WaitForTextures(base::ProcessId aProcessId, const nsTArray<uint64_t>& aTextureIds);
> +  static void SetTexturesLocked(pid_t pid, const nsTArray<uint64_t>& textureIds);
> +  static void SetTexturesUnlocked(pid_t pid, const nsTArray<uint64_t>& textureIds);

Hmm, I'm not a huge fan of adding texture logic into the SharedMemoryBasic class.

My understanding of this code is that it's currently code that sets up a mach ipc connection between processes, and then some code on top of that which shares memory handles across the connection. Adding texture locking on the same connection makes the class feel a bit awkward.

Do you think you could try split the existing statics out into a separate class/namespace, and then have the texture code use that directly? Would be nice to put most (all?) of the texture-specific logic into gfx/layers somewhere.

I think we're going to want to extend this again in the future (to support sharing IOSurface objects between processes), so it'd be nice to have a full abstraction around the mach ipc bits, but I don't want to block this landing on that. Just a basic separation here would be a good step in the right direction.

::: ipc/glue/SharedMemoryBasic_mach.mm:125
(Diff revision 5)
>    kCleanupMsg,
>  };
>  
> +const int kTextureLockTimeout = 32; // We really don't want to wait more than
> +                                    // two frames for a texture to unlock. This
> +                                    // in any case be very uncommon.

Missing a word from this comment.

::: ipc/glue/SharedMemoryBasic_mach.mm:281
(Diff revision 5)
>    if (gMemoryCommPorts.find(pid) == gMemoryCommPorts.end()) {
>      // We don't have the ports open to communicate with that pid, so we're going to
>      // ask our parent process over IPC to set them up for us.
>      if (gParentPid == 0) {
>        // If we're the top level parent process, we have no parent to ask.
> -      LOG_ERROR("request for ports for pid %d, but we're the chrome process\n", pid);
> +      // LOG_ERROR("request for ports for pid %d, but we're the chrome process\n", pid);

Why is this error disabled?

::: ipc/glue/SharedMemoryBasic_mach.mm:547
(Diff revision 5)
> +}
> +
> +void
> +SharedMemoryBasic::RegisterTextureSourceProvider(layers::TextureSourceProvider* textureSourceProvider)
> +{
> +  gTextureSourceProvider = textureSourceProvider;

I don't think we can use a global here.

Compositors are one-per-window, so if you open a second window I'd expect this value to get overwritten.
Attachment #8979712 - Flags: review?(matt.woodrow)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 154

9 months ago
mozreview-review
Comment on attachment 8979712 [details]
Bug 1265824 - Wait on texture handles with IPC

https://reviewboard.mozilla.org/r/245868/#review262992

::: gfx/layers/TextureSourceProvider.cpp:31
(Diff revision 6)
> +    if (bufferTexture && bufferTexture->IsDirectMap()) {
> -    texture->ReadUnlock();
> +      texture->ReadUnlock();
> +      auto actor = texture->GetIPDLActor();
> +      if (actor) {
> +        base::ProcessId pid = actor->OtherPid();
> +        nsTArray<uint64_t>* textureIds = texturesIdsToUnlockByPid.LookupOrAdd(pid);

Might be worth making this code OSX only too, since it's needless overhead (if we're not going to process the textureIdsToUnlockByPid hashtable) and will affect Windows.

::: gfx/layers/TextureSync.cpp:64
(Diff revision 6)
> +
> +  return &gProcessTextureIds.at(pid);
> +}
> +
> +bool
> +WaitForTextureIdsToUnlock(pid_t pid, uint64_t* textureIds, uint32_t textureIdsLength)

You could use mozilla::Span to wrap the pointer/length array and get nicer for each syntax on the loop.

::: gfx/layers/TextureSync.cpp:82
(Diff revision 6)
> +
> +      if (allCleared) {
> +        return true;
> +      }
> +
> +      if (lock.Wait(TimeDuration::FromMilliseconds(kTextureLockTimeout)) == CVStatus::Timeout) {

Is it possible for us to get woken up (not due to timeout), but still not have all the textures we're waiting for unlocked?

And if so, would it be possible for that to happen repeatedly, such that the total time we spend blocking here is considerably longer than 32ms?

::: gfx/layers/TextureSync.cpp:132
(Diff revision 6)
> +}
> +
> +void
> +TextureSync::RegisterTextureSourceProvider(TextureSourceProvider* textureSourceProvider)
> +{
> +  RefPtr<TextureSourceProvider> ref(textureSourceProvider);

You shouldn't need to construct an explicit RefPtr here I don't think, just calling gTextureSourceProviders.AppendElement(textureSourceProvider) should do the right thing.

::: gfx/layers/client/ClientLayerManager.cpp:321
(Diff revision 6)
>  ClientLayerManager::EndTransactionInternal(DrawPaintedLayerCallback aCallback,
>                                             void* aCallbackData,
>                                             EndTransactionFlags)
>  {
> +  if (mForwarder) {
> +    mForwarder->UpdateTextureLocks();

Copying the commit message comment about this into an inline comment would be worth it imo.

::: gfx/layers/client/MultiTiledContentClient.cpp:300
(Diff revision 6)
>    oldRetainedTiles.Clear();
>  
>    nsIntRegion paintRegion = aPaintRegion;
>    nsIntRegion dirtyRegion = aDirtyRegion;
> +
> +  MaybeSyncTextures(paintRegion, newTiles, scaledTileSize);

Not worth blocking landing this, but I think we should consider a non-blocking approach as a follow-up.

We have existing code (in GetBackBuffer) that checks if a tile is locked, and reallocates from the pool if it's not available (rather than blocking and waiting).

Blocking is a bit sad, since it's dead time that we can't do anything else (like run JS), and I suspect that it's frequently faster to just copy valid pixels from the locked texture into an unlocked one in the pool.

I think it'd be worth trying unifying this with the existing lock code, and to see if the non-blocking approach helps.
Attachment #8979712 - Flags: review?(matt.woodrow) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 months ago
Attachment #8991332 - Attachment is obsolete: true
(Assignee)

Comment 158

9 months ago
mozreview-review-reply
Comment on attachment 8988938 [details]
Bug 1265824 - Add StaticMonitor

https://reviewboard.mozilla.org/r/254060/#review261006

> This is a little weird, but I assume that having separate member variables rather than a single `Atomic<OffTheBooksMonitor*>` is slightly less work, or something?  I think to have `Atomic<OffTheBooksMonitor*>`, we'd need all the stuff you implemented here, plus an extra `OffTheBooksMonitor` class, so it would be more code.  WDYT?

Yeah - it felt a little awkward but it felt like just going with less code was the right thing. I think it works out to be a little bit less work on average if two threads were to call it around the same time, since there should in theory be a smaller window for racing on creating both variables.
(Assignee)

Comment 159

9 months ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #145)
> We might need someone else to review the ipc/glue bits, though I'm not sure
> anyone still knows that code.

Jed, is there any chance you could take a quick look at the SharedMemoryBasic_mach.* and TextureSync.* (WRT the IPC bits) changes here? I realize this is a stretch, but I'm struggling to find a proper owner of this code who's still active, and you did the most recent review. If you can think of someone else who'd be better equipped to review it, let me know.
Flags: needinfo?(jld)
I'd actually been hoping to delete SharedMemoryBasic_mach entirely and use shm_open instead, in order to make bug 1473156 (blocking the main thread for 50-100ms on every process launch while the Mach IPC gets bootstrapped) more tractable.

On the other hand, if we need to keep it or something like it around to support other uses of Mach IPC, it could be refactored a little to do the setup asynchronously, because it's already creating a dedicated thread per child process and managing per-other-process state.  This also goes nicely with the idea in comment #145 to break the core port-wrangling out of SharedMemoryBasic (which I agree with; wedging in graphics hooks like that is… not ideal).  So that isn't as much of a problem as I first thought.

I'll flag myself for review to take a closer look at the patch; I don't know this code well (yet) but I have been actively working on shared memory in general.
Flags: needinfo?(jld)
(Assignee)

Comment 162

8 months ago
Hey Jed, I know you're likely busy, so don't feel pressured, but I am hoping to get this landed and any kinks worked out before I go on paternity leave in the next few weeks (unsure when), so if there's any chance you could take a look soonish I would really appreciate it. Or if any of it looks like bits that are not time-sensitive which I could get out of the way as followups, let me know.
Flags: needinfo?(jld)
Comment on attachment 8979712 [details]
Bug 1265824 - Wait on texture handles with IPC

https://reviewboard.mozilla.org/r/245868/#review264558

I'm not a huge fan of how this is set up with `MemoryPorts` escaping into layout, but I'm planning to refactor this anyway so it won't make much difference in the end.

::: ipc/glue/SharedMemoryBasic_mach.h:15
(Diff revision 7)
>  #include "base/file_descriptor_posix.h"
>  #include "base/process.h"
>  
>  #include "SharedMemory.h"
>  #include <mach/port.h>
> +#include "chrome/common/mach_ipc_mac.h"

Style nit: This could use forward declarations instead of depending on the header, I think?
Attachment #8979712 - Flags: review?(jld) → review+
Flags: needinfo?(jld)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 172

8 months ago
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 58fbf85eef5898898cde3f3e7934d84111ee22f2 -d bdf8618dc2c3: rebasing 473397:58fbf85eef58 "Bug 1265824 - Remove the unsed member in GLTextureSource r=mattwoodrow"
rebasing 473398:30a9dbb81dd5 "Bug 1265824 - Add a new texture source type "DirectMapTextureSource" r=mattwoodrow"
rebasing 473399:d7889c23e19a "Bug 1265824 - Pass the texture direct mapping info to all texture creating functions r=mattwoodrow"
rebasing 473400:ee7c181cd1db "Bug 1265824 - implement CreateDataTextureSourceAroundYCbCr() and CreateDataTextureSourceAround() for CompositorOGL r=mattwoodrow"
rebasing 473401:68d8bb87264e "Bug 1265824 - handle the texture uploading and readLock related things for direct mapping texture source r=mattwoodrow"
rebasing 473402:746e0cb37cb2 "Bug 1265824 - Add StaticMonitor r=froydnj"
rebasing 473403:fddd4696126c "Bug 1265824 - Wait on texture handles with IPC r=jld,mattwoodrow"
merging gfx/layers/client/MultiTiledContentClient.cpp
merging gfx/layers/moz.build
warning: conflicts while merging gfx/layers/client/MultiTiledContentClient.cpp! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 181

8 months ago
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/367abce30668
Remove the unsed member in GLTextureSource r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/efce316a4425
Add a new texture source type "DirectMapTextureSource" r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/984133e9614b
Pass the texture direct mapping info to all texture creating functions r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/4731dc56702d
implement CreateDataTextureSourceAroundYCbCr() and CreateDataTextureSourceAround() for CompositorOGL r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/be68741ff4ce
handle the texture uploading and readLock related things for direct mapping texture source r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/51795de4adaf
Add StaticMonitor r=froydnj
https://hg.mozilla.org/integration/autoland/rev/b5ba15b1a70f
Wait on texture handles with IPC r=jld,mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/1099d6f15f9f
Guard client storage stuff behind a pref r=mattwoodrow
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 185

8 months ago
Quick update - just trying to diagnose this crashtest failure: https://treeherder.mozilla.org/logviewer.html#?job_id=188974850&repo=try&lineNumber=49882
Flags: needinfo?(dothayer)
(Assignee)

Comment 186

8 months ago
I think I've got it boiled down to a texture size issue. I was missing logic in a couple of places to ensure we don't get to a DirectMapTextureSource for textures that exceed the max texture size. Unfortunately verifying this is excruciatingly slow because I can't reproduce the issue locally and we're strapped for OSX resources on try. Hopefully this next run is green by the end of the day though.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 197

8 months ago
mozreview-review
Comment on attachment 8994090 [details]
Bug 1265824 - Remove CreateForYCbCrWithBufferSize

https://reviewboard.mozilla.org/r/258694/#review265600
Attachment #8994090 - Flags: review?(matt.woodrow) → review+

Comment 198

8 months ago
mozreview-review
Comment on attachment 8994091 [details]
Bug 1265824 - Plug holes in texture size restrictions

https://reviewboard.mozilla.org/r/258696/#review265602
Attachment #8994091 - Flags: review?(matt.woodrow) → review+

Comment 199

8 months ago
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bf9d73b214fc
Remove the unsed member in GLTextureSource r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/6aecd088e02c
Add a new texture source type "DirectMapTextureSource" r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/295dd1a6a09f
Pass the texture direct mapping info to all texture creating functions r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/71b23fbe1fec
implement CreateDataTextureSourceAroundYCbCr() and CreateDataTextureSourceAround() for CompositorOGL r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/39ae151b3d8c
handle the texture uploading and readLock related things for direct mapping texture source r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/239ab9f9ef52
Add StaticMonitor r=froydnj
https://hg.mozilla.org/integration/autoland/rev/c141fb67cf9a
Wait on texture handles with IPC r=jld,mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/7c90215a2eca
Guard client storage stuff behind a pref r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/27c7daabd1a3
Remove CreateForYCbCrWithBufferSize r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/391c8e7897df
Plug holes in texture size restrictions r=mattwoodrow
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 209

8 months ago
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3b0192c401cf
Remove the unsed member in GLTextureSource r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/cf95b1f62eae
Add a new texture source type "DirectMapTextureSource" r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/1fe8be31198a
Pass the texture direct mapping info to all texture creating functions r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/f428582ce4dc
implement CreateDataTextureSourceAroundYCbCr() and CreateDataTextureSourceAround() for CompositorOGL r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/ee6b49ee41c9
handle the texture uploading and readLock related things for direct mapping texture source r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/ce33d2c68119
Add StaticMonitor r=froydnj
https://hg.mozilla.org/integration/autoland/rev/c7ea73331fc8
Wait on texture handles with IPC r=jld,mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/71422954f047
Guard client storage stuff behind a pref r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/1b574f2c0ec0
Remove CreateForYCbCrWithBufferSize r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/96ce98b72056
Plug holes in texture size restrictions r=mattwoodrow
(Assignee)

Updated

8 months ago
Flags: needinfo?(dothayer)
(Reporter)

Updated

8 months ago
Blocks: 1400787

Comment 213

8 months ago
How do I enable this? I'm on Nightly Mac 2018-07-24 and I created a new pref called gfx.allow-texture-direct-mapping and set it true and restarted Firefox. But I didn't see any dramatic differences between true and false on gsmarena.com. IIRC when I was playing with the Try version some months ago I did see an improvement. Perhaps I have done something wrong.

Is there any definitive test I can try e.g. a benchmark to see whether ClientStorageTextureSource is working or not?

Thanks
(Assignee)

Comment 214

8 months ago
(In reply to Mark from comment #213)
> How do I enable this? I'm on Nightly Mac 2018-07-24 and I created a new pref
> called gfx.allow-texture-direct-mapping and set it true and restarted
> Firefox. But I didn't see any dramatic differences between true and false on
> gsmarena.com. IIRC when I was playing with the Try version some months ago I
> did see an improvement. Perhaps I have done something wrong.
> 
> Is there any definitive test I can try e.g. a benchmark to see whether
> ClientStorageTextureSource is working or not?
> 
> Thanks

Make sure to try with today's build. There are two builds per day; it came out on build id 20180724223402, and wasn't in the build yet for 20180724100052, so you could be running a 2018-07-24 build and still not see it.

Anyway, to be certain, grab a profile with the https://perf-html.io profiler and search for "DirectMapTextureSource" in the Compositor thread. If you're seeing jank in the Compositor and no DirectMapTextureSource, you're likely not using client storage.

Comment 215

8 months ago
So it turns out that patience IS a virtue...

It works on today's Nightly. I see DirectMapTextureSource and significant improvement in gsmarena.com jank at lower resolutions, still struggling a bit at 1680x1050. I'll have a fiddle and see what else I see.

Doug, Good luck with your parental leave!
(Assignee)

Comment 216

8 months ago
(In reply to Mark from comment #215)
> So it turns out that patience IS a virtue...
> 
> It works on today's Nightly. I see DirectMapTextureSource and significant
> improvement in gsmarena.com jank at lower resolutions, still struggling a
> bit at 1680x1050. I'll have a fiddle and see what else I see.

Yeah - regarding gsmarena.com, one thing you might still see is a lot of checkerboarding, and large Rasterize times in the content process spend in ShmemTextureData::Create. From my testing this was an existing problem that was just being hidden by the previous jank. You'll see this for a while when scrolling while the texture pool fills up. I think there's more we can do here with our texture pooling, but I think it's a separate bug. (Looking into filing one about what I'm seeing right now.) Does this match what you're seeing, or are you seeing something else? Does your issue go away after 10-15 seconds of scrolling up and down on gsmarena?

> Doug, Good luck with your parental leave!

Thanks! :)
(Assignee)

Comment 217

8 months ago
(Actually - I think Matt's comment in the gsmarena bug[1] is a better account of what's still going on. This bug should help the upload performance but there's still a lot left unchanged. I still think some of this can be shaved off with pooling adjustments though.)

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1457413#c18

Comment 218

8 months ago
(In reply to Doug Thayer [:dthayer] (On parental leave soon) from comment #216)

> Does this match what
> you're seeing, or are you seeing something else? Does your issue go away
> after 10-15 seconds of scrolling up and down on gsmarena?

When gfx.allow-texture-direct-mapping is false there is a huge jank as the sidebar scrolls up, often get 0 frames of sidebar animation, instead the whole scroll freezes for ~100s of ms and the sidebar appears.

When gfx.allow-texture-direct-mapping is true the jank gets substantially better, it's still far from butter smooth even at 1280x800 default resolution.

But in both cases there's a lot of checkerboarding particularly if I fling back up to the top of the page.

Not sure about the 10-15 seconds, maybe, maybe not.
Big AWSY perf improvement:

== Change summary for alert #14548 (as of Tue, 24 Jul 2018 01:22:55 GMT) ==

Improvements:

  8%  Resident Memory osx-10-10 opt stylo     707,995,740.35 -> 648,727,504.31

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=14548
Thanks so much to Doug as well as to everyone else who helped out here!

My Mac with lots of tabs feels like it runs the fans much less, and unlike before this push, Firefox processes almost never show up as big CPU-eaters in Activity Monitor.  I wonder if this effects enough people to deserve a release note, in the interest of re-attracting Mac users who tried after Quantum but couldn't take all the CPU usage.

Comment 221

8 months ago
Many many thanks for this implementation. This solution improves my Firefox's behaviour perceivably, especially if I have iMac retina with 5K resolution.

I'm not sure how significant improvement can this implementation bring, but it noticeably improves  my tab switching (almost instant), page scrolling (less stuttering) and video playback (better smoothness, less cpu usage) on my machine.

Once again, great job! I would say it really deserves to be mentioned in release notes.

Comment 222

8 months ago
I'm sorry for additional spamming :-( but I'm curious. Are there any plans with bugs, which this feature blocks?
(Assignee)

Updated

8 months ago
Depends on: 1477914
Duplicate of this bug: 629519
Release Note Request (optional, but appreciated)
[Why is this notable]: Important improvement on Mac
[Affects Firefox for Android]: No
[Suggested wording]: Improved the reactivity of Firefox on Mac OS X
(I am sure we can find a better wording)
[Links (documentation, blog post, etc)]: Are we going to write something about that?
relnote-firefox: --- → ?
I am adding that to Nightly 63 release notes with this wording:

Improved the reactivity of Firefox on MacOS

Looking at the patch (https://hg.mozilla.org/mozilla-central/rev/71422954f047#l2.12) this is not riding the trains and I am not seeing a bug about it in the dependencies. Is this something planned for 63 or will it be activated in a later release? 

If the former, please note that the Nightly Soft Freeze is on August 23 and preferences should not be activated after the soft freeze until the merge to beta (https://wiki.mozilla.org/Release_Management/Release_Process#Nightly_soft_code_freeze). 

Mike, do you know what the release status of this feature is? Thanks
Flags: needinfo?(jnicol) → needinfo?(mconley)
Keywords: feature
Redirecting to mstange.

Hey Markus, this has been enabled on Nightly for a few weeks now, and I don't think we've observed any ill effects. Do you think this is stable enough to ride the 63 train out to release?
Flags: needinfo?(mconley) → needinfo?(mstange)
I haven't seen any reports of problems, so yes, let's turn it on!
Flags: needinfo?(mstange)

Updated

7 months ago
Depends on: 1483533
Added to Beta 63 release notes.
relnote-firefox: ? → 63+
You need to log in before you can comment on or make changes to this bug.