Updating a matrix3d for a transform style needs improvement

RESOLVED FIXED in mozilla34

Status

()

Core
CSS Parsing and Computation
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: Harald, Unassigned)

Tracking

Trunk
mozilla34
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [dependency: marketplace-partners][dependency: openwebperf])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
[I spoke about this with roc and bz at the Firefox OS days in MV]

Setting style.transform to a 3dmatrix is really slow due to bottlenecks in CSS parsing. This impacts the apps, especially on mobile, as physics based animations and gestures are gaining traction to reach the interface responsiveness and fidelity of iOS.

Test case from https://github.com/digitarald/openwebperf
Example: http://digitarald.github.io/openwebperf/3d-matrix-transform/direct.html
Profile: http://people.mozilla.org/~bgirard/cleopatra/#report=234f70ef898c6bb7e98e3fd7f6dd52a0a5dbb0af
Most of the time is spend setting the transform style `3dmatrix`.

We spoke about 2 options:
1) JIT the `element.style.transform = '3dmatrix…` operation to skip the CSS parsing.
2) Add a new API (e.g. element.mozSetTransform(Float32Array matrix))

The JIT path had more fans, as it is a low hanging fruit; but both should be tested for viability.

I am working with various web interface framework partners to give feedback here as well.
(Reporter)

Comment 1

4 years ago
You were fired up when we spoke about this in MV :roc and :bz. I hope you are still interested in investigating this.
Flags: needinfo?(roc)
Flags: needinfo?(bzbarsky)
Whiteboard: [dependency: marketplace-partners][dependency: openwebperf]
Version: unspecified → Trunk
That profile has 8560 samples of which 5234 are in mach_msg_trap (which I'm guessing is what the profile shows when idle), leaving 3326.

set_transform is only 535 samples in that profile (15.8%).  All the parsing, and some other things, are inside that.  That's what the proposal here would help.

ProcessPendingRestyles is 934 (28.1%) (of which only 223 in ResolveStyleFor, so bug 931668 is likely to help more than bug 977991).

PresShell::Paint is 1316 samples (39.6%).

(I'm not much good at splitting profiles up in cleopatra, so I probably missed something else important.)

So I'm not completely convinced about the value of JIT'ing or otherwise speeding up the parsing phase here, though it at least seems slightly useful.
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) (away/busy Aug 27-Sep 11) from comment #2)
> That profile has 8560 samples of which 5234 are in mach_msg_trap (which I'm
> guessing is what the profile shows when idle), leaving 3326.

That's correct, it is idle time.

But the sample counts of something is not actually a really useful way to reason about profiles. We care about keeping 60 FPS and a low frame latency so we need to keep the rendering pipeline running in under 16ms. The strength of the profiler is being able to focus in and look at the timeline to analysis latency and details cost of each stage.

Having as said, the style flush is 3ms and the reflow is 3ms per transaction which is totally within budget.

Looks like the problem with this demo is the compositor code can't keep up. This is forcing the main thread to sleep and wait for the compositor to catch up. This explains why the main thread is spending so much time in mach_msg_thread. We're spending a lot of time updating the APZC which is something that isn't used on Mac.

Clearing info in bz and roc. I think kats is a better person to answer this.
Flags: needinfo?(roc)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugmail.mozilla)
So yes, we should disable walking the APZC tree on platforms where it's not used.

That being said, according to the profile 54% of the time in the compositor thread is spent in the call at [1]. If we disable walking the APZC tree this will go away, but it will come back when we enable APZC on desktop so it's something to keep in mind. Switching to the event regions for hit testing (bug 918288) might reduce this.. but it might also make it worse depending on how the event regions are constructed.

Do you want me to file a separate bug to disable APZC tree walking on platforms without APZC? I don't know if it will address the overall problem this bug was filed for.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp?rev=413e0473a1d8#394
Flags: needinfo?(bugmail.mozilla)
You're probably right that long term disabling APZC wont buy us much since we will hopefully ship it everywhere soon-ish.

It feels to me like we're just spending too much time transforming regions. I suspect we might just be hitting degenerate algorithmic complexity. Perhaps the problem is here [1]. Can we concatenate the transforms then perform a single region transformation?

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp#409
Flags: needinfo?(bugmail.mozilla)
Hm good point, I forgot about that transformation. But unfortunately I don't think we can concatenate the transforms there because we actually do need the separately transformed versions as we descend in the tree.

Based on the profile I think that in the layer tree for this particular case there is just one layer which has a really degenerate visible region and is eating up most of the time. (a) Do we expect this to happen often, and (b) is this the only place in the code where having a degenerate visible region hurts us? If it isn't and we somehow fix this then we'll just move the bottleneck somewhere else. In that case a better fix would be to try and avoid creating layers with such degenerate visible regions.
Flags: needinfo?(bugmail.mozilla)
Typically the visible region is limited to 8 pieces (was 3 or 4).

Can we get a layers.dump attach to the bug? Flip layers.dump, reproduce the problem and attach a single dump sent to stderr/logcat. It will spew the layer tree each frame.
Flags: needinfo?(hkirschner)
Created attachment 8476143 [details]
Layer dump on OS X (slightly old nightly)

I got a layers dump on OS X using the example page linked in comment 0. Turns out the visible region is pretty simple, but there's a lot of layers and each one has nontrivial transform.
Flags: needinfo?(hkirschner)
Created attachment 8476208 [details] [diff] [review]
Don't walk the APZ tree if APZ is disabled
Attachment #8476208 - Flags: review?(botond)
Comment on attachment 8476208 [details] [diff] [review]
Don't walk the APZ tree if APZ is disabled

Review of attachment 8476208 [details] [diff] [review]:
-----------------------------------------------------------------

I wonder if it might make more sense to not even create an APZCTreeManager object [1] if AsyncPanZoomEnabled() is false. As far as I can tell, all the places where we use the tree manager we already check that it's not null, except here [2].

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp?rev=49d6a8691dce#245
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp?rev=49d6a8691dce#280
Created attachment 8476391 [details] [diff] [review]
Don't build an APZCTreeManager if APZ is disabled

Yeah, good point. I was a little concerned that there might be other places where we assume a non-null APZCTM but if there are we should just fix them.
Attachment #8476208 - Attachment is obsolete: true
Attachment #8476208 - Flags: review?(botond)
Attachment #8476391 - Flags: review?(botond)

Updated

4 years ago
Attachment #8476391 - Flags: review?(botond) → review+
(Reporter)

Comment 12

4 years ago
This fixes one aspect of the performance, but only on platforms with APZC disabled (aka desktop, but only for a while) if I understand correctly. Are there any bugs I should create to cover the other aspects?
Maybe i should have done that patch in a separate bug but given that i didn't it would be best to clone this bug into a new one with an updated profile after my fix lands. The updated profile should point to the next bottleneck that we need to optimize.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> That being said, according to the profile 54% of the time in the compositor
> thread is spent in the call at [1].
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/
> APZCTreeManager.cpp?rev=413e0473a1d8#394

In the nsIntRegion::Transform call, or the matrix Inverse? I can't get Cleopatra to show me...

Seems to me that, at least when the matrix is complex, we should simplify the region to a single rect before transforming it. The transformation has to fluff out to the bounding-box of each rect anyway.
The nsIntRegion::Transform was the thing consuming most of the time.
This landed on m-c yesterday but the bug was never closed. Not sure why but ni? to :kwierso in case other bugs were missed too.

https://hg.mozilla.org/mozilla-central/rev/05436dd932bc
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(kwierso)
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
No clue why this didn't get closed. I ran the mc-merge tool against that merge, and I've verified that several of the other commits got marked correctly...
Flags: needinfo?(kwierso)

Comment 19

3 years ago
I'm experiencing poor transform:matrix3d performance in Firefox compared to Chrome. Try this site in both browsers:

https://trusktr.io/
(hover with the mouse near the left edge of the page to open the menu, or swipe in from the left edge on mobile (with care in iOS since swiping from the left edge in Safari takes you back to the previous site))

Pleeease fine-tune CSS transform:matrix3d as this is (in concept) an incredible way of making dynamic things in the web!

I'm in Mac OS X 10.10.2 (14C109), Firefox Developer Edition 38.0a2 (2015-03-15)
I've filed bug 1143492 about comment 19.
You need to log in before you can comment on or make changes to this bug.