Make APZ tell chrome main thread the transforms for chrome to content coordinate spaces

RESOLVED FIXED in Firefox 67

Status

()

enhancement
P3
normal
RESOLVED FIXED
4 months ago
2 months ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

(Blocks 1 bug)

unspecified
mozilla67
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Fission Milestone:M2, firefox67 fixed)

Details

(Whiteboard: [fission-event-m1] )

Attachments

(1 attachment)

Assignee

Description

4 months ago

Introduce a function that takes a LayersIds and returns the transform matrix from the chrome coordinate space to the content coordinate space associated with that LayersId. This is LayoutDeviceToLayoutDeviceMatrix4x4, since, for better or worse, content processes have "LayoutDevice"-typed top-level coordinates even though that may not strictly be true in the case of out-of-process iframes.

Priority: -- → P3
Assignee

Updated

4 months ago
Whiteboard: [fission-event-m1]

Updated

4 months ago
Fission Milestone: --- → M2
Assignee

Comment 1

4 months ago

https://gist.github.com/staktrace/6b87bd7ba12df80f53552b1e2220b795 shows the added HitTestingTreeNode::GetCSSTransformToContent() method multiplying a CSSTransformMatrix by another CSSTransformMatrix, which is a unit mismatch. Even after reading the docs, I'm quite confused about what matrices I need to insert in between those matrices in the multiplication chain in order to have the units match up. I see that some places in the code multiply by AsyncTransformMatrix(), which to me looks like just unit multiply with different units.

kats, could you, please, explain some more how to get the units to match up when recursively multiplying the matrices in HitTestingTreeNode::GetCSSTransformToContent()?

Also, it doesn't help conceptualize what's going on that the matrix type that I need is LayoutDeviceToLayoutDeviceMatrix4x4. Am I correct in assuming that for all the scale between CSS and LayoutDevice units can't differ between iframe parent and child so that if the going from a CSS to CSS to a LayoutDeviceToLayoutDeviceMatrix4x4 at the end is always a matter of multiplying by CSSToLayoutDeviceScale in TabParent?

Flags: needinfo?(kats)

I can provide some background on the conceptual model here.

Every level in the hit testing tree represents a layer, and every layer has its own coordinate space.

The number of layers is dynamic, so their coordinate systems can't each get a different static type. Instead, functions that work with layer coordinate systems typically have an (implicit) notion of a "current layer", and quantities in the coordinate system of the current layer use LayerPixel as their units.

It's also common to work with quantities in the coordinate system of the current layer's parent layer, so we have ParentLayerPixel for this purpose.

When accumulating a transform over several layers, you often need to change your view of what the "current layer" is in a loop (or through recursion). To do this, we cast from ParentLayer units to Layer units, or vice versa. Casting requires a justification, and we have a justification named MovingDownToChildren for this purpose.

Going from a layer to its parent involves applying potentially two kinds of transforms: a "CSS transform" (called that because it typically comes from a CSS transform property, though it could come from some other things as well), and an "async transform" (representing OMTA or APZ scrolling/zooming). We apply these two in that order, and give them the types CSSTransformMatrix and AsyncTransformMatrix, respectively. (To fit into our unit model, we invented the intermediate coordinate space CSSTransformedLayerPixel, so that CSSTransformMatrix goes from LayerPixel to CSSTransformedLayerPixel, and AsyncTransformMatrix from CSSTransformedLayerPixel to ParentLayerPixel, but that's mostly an implementation detail.)

Sometimes, you want to accumulate transforms ignoring the async transforms (because they're going to change asynchronously anyways). In such cases, to get a Layer to ParentLayer transform representing the CSS transform only, you take the CSS transform and multiply it by an empty AsyncTransformMatrix.

The rootmost coordinate space is Screen space. It is equivalent to the ParentLayer space of the topmost layer, and we have a justification named ScreenIsParentLayerForRoot for casting between those as well.

Regarding the mentioned patch:

Given what the GetCSSTransformToContent() function is doing numerically -- multiplying together all the CSS transforms from a layer up to the root -- it should return a matrix of type LayerToScreenMatrix4x4. (That doesn't really seem to agree with the name of the function, which should be more like GetCSSTransformToRoot() based on what it's doing.)

Given that signature, an implementation that unit-checks would look something like:

LayerToScreenMatrix4x4 HitTestingTreeNode::GetCSSTransformToRoot() const {
  if (mParent) {
    LayerToParentLayerMatrix4x4 thisToParent =
        mTransform * AsyncTransformMatrix();
    ParentLayerToScreenMatrix4x4 parentToRoot =
        ViewAs<ParentLayerToScreenMatrix4x4>(
            mParent->GetCSSTransformToRoot(),
            PixelCastJustification::MovingDownToChildren);
    return thisToParent * parentToRoot;
  }

  // We are the (g)root.
  return ViewAs<LayerToScreenMatrix4x4>(
      mTransform * AsyncTransformMatrix(),
      PixelCastJustification::ScreenIsParentLayerForRoot);
}

I appreciate that's a bit verbose, but on the other hand these strongly typed units have caught real bugs, so I think they're worth it.

As far as going from this LayerToScreenMatrix4x4 to a matrix that uses LayoutDevice coordinates for consumption by main-thread code, that tends to involve some hand-waving and various LayoutDeviceIsScreenFor* cast justifications (feel free to add a new justification if the existing ones aren't a good fit).

(In reply to Henri Sivonen (:hsivonen) from comment #1)

Am I correct in assuming that for all the scale between CSS and LayoutDevice units can't differ between iframe parent and child

I believe so, yes.

so that if the going from a CSS to CSS to a LayoutDeviceToLayoutDeviceMatrix4x4 at the end is always a matter of multiplying by CSSToLayoutDeviceScale in TabParent?

I can see why the naming CSSTransformMatrix can be confusing in this context :/ As explained above, it's something rather different from a "CSS to CSS" matrix, which would be CSSToCSSMatrix4x4.

In any case, I don't think you need to multiply by CSSToLayoutDeviceScale at all, because the various Layer coordinate spaces all include that scale already, so getting to the LayoutDeviceToLayoutDeviceMatrix4x4 should just be a cast.

(For completeness, even if you were hypothetically converting from a CSSToCSSMatrix4x4 to a LayoutDeviceToLayoutDeviceMatrix4x4, you would not multiply the matrix by CSSToLayoutDeviceScale: converting the source units of the matrix involves dividing by that scale, and converting the destination units involves multiplying by it, and those two operations cancel out.)

(I think I've answered the questions from comment 1, but I'll leave the needinfo on kats just in case I missed something.)

Thanks Botond! I was having a hard time putting it into words :)

(In reply to Botond Ballo [:botond] from comment #4)

As far as going from this LayerToScreenMatrix4x4 to a matrix that uses LayoutDevice coordinates for consumption by main-thread code, that tends to involve some hand-waving and various LayoutDeviceIsScreenFor* cast justifications (feel free to add a new justification if the existing ones aren't a good fit).

With respect to this bit, I think the most technically correct would be to keep the TabParent matrix as a LayerToScreenMatrix4x4. When converting a LayoutDevice coordinate from the child to the parent, you'd want to do something like:

LayerPoint childcoords = ViewAs<LayerPixel, PixelCastJustification::ContentProcessIsLayerInUiProcess>(myChildLayoutDeviceCoords);
ScreenPoint converted = matrix.TransformPoint(childcoords);
LayoutDevicePoint parentcoords = ViewAs<LayoutDevicePixel, PixelCastJustification::LayoutDeviceIsScreenForUntransformedEvent>(converted);

But to reduce verbosity I'm fine with using a LayoutDeviceToLayoutDevice matrix and then use a PixelCastJustification::ContentProcessIsLayerInUiProcess to convert the LayerToScreenMatrix4x4 to the LayoutDeviceToLayoutDevice matrix when populating the value in TabParent. (ContentProcessIsLayerInUiProcess would be the new justification to add to the list).

In the end, any casting of matrices/points from one type to another just needs to be adequately documented, whether it's via PixelCastJustification or code comments. The values flowing through should be the same either way and so correctness shouldn't be affected.

Flags: needinfo?(kats)
Assignee

Comment 9

4 months ago

Thanks. At least locally the attached patch didn't break things.

Let's see how it does on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8df46e0012ff0fa65e6912462dc5d5c2495ce2aa

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #9048469 - Attachment description: Bug 1530661 - Make APZ report the per LayersId layer-to-sceen transform matrices to the chrome process. → Bug 1530661 - Make APZ report the per LayersId layer-to-screen transform matrices to the chrome process.
Assignee

Comment 10

4 months ago

With the latest patch, mouse clicks into out-of-process iframes (that aren't animating) work! However, the DOM API reporting of screen-relative coordinates from within the out-of-process iframe is broken. (I'll file a bug.)

Let's see if anything breaks on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c43558511211fa96ced217f881791dc18151e68e

Assignee

Comment 11

4 months ago

What the bug description said didn't make much sense as a standalone function, so this bug grew into what bug 1530656 was supposed to be. Since the discussion is here, morphing this bug and closing the other one.

Blocks: oop-frames, 1528937
No longer blocks: 1530656
Summary: Introduce a method for computing the chrome to content transfer matrix given a LayersId → Make APZ tell chrome main thread the transforms for chrome to content coordinate spaces
Assignee

Updated

4 months ago
Duplicate of this bug: 1530656
Assignee

Updated

4 months ago
Blocks: 1532901
Assignee

Comment 13

4 months ago

(In reply to Henri Sivonen (:hsivonen) from comment #10)

Let's see if anything breaks on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c43558511211fa96ced217f881791dc18151e68e

Failures with the IPC error "message sent to unknown actor ID" when trying to send the transforms from the compositor.

Assignee

Comment 14

4 months ago

(In reply to Henri Sivonen (:hsivonen) from comment #13)

(In reply to Henri Sivonen (:hsivonen) from comment #10)

Let's see if anything breaks on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c43558511211fa96ced217f881791dc18151e68e

Failures with the IPC error "message sent to unknown actor ID" when trying to send the transforms from the compositor.

I'm guessing that

  RefPtr<GeckoContentController> controller =
      GetContentController(aRootLayerTreeId);

returns a somehow disconnected controller. How can I check for such disconnectedness?

Flags: needinfo?(botond)
Assignee

Comment 15

4 months ago

(In reply to Henri Sivonen (:hsivonen) from comment #13)

(In reply to Henri Sivonen (:hsivonen) from comment #10)

Let's see if anything breaks on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c43558511211fa96ced217f881791dc18151e68e

Failures with the IPC error "message sent to unknown actor ID" when trying to send the transforms from the compositor.

Linux and Mac Quantum Render tests look OK. I take it that this only fails on Windows, because the compositor is in a different process only on Windows?

(In reply to Henri Sivonen (:hsivonen) from comment #15)

How can I check for such disconnectedness?

The patch you have should already be doing this check (the mCanSend check in RemoteContentController). You should also null-check the result of GetContentController, but that's not the problem here.

(In reply to Henri Sivonen (:hsivonen) from comment #15)

Linux and Mac Quantum Render tests look OK. I take it that this only fails on Windows, because the compositor is in a different process only on Windows?

Yeah that seems plausible. On Windows GetContentController() for the root layers id will return a RemoteContentController instead of a ChromeProcessController, and so sending the transform goes over PAPZ.

I don't really know why this is failing. Are you able to reproduce the problem locally and debug?

Assignee

Comment 17

4 months ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)

You should also null-check the result of GetContentController, but that's not the problem here.

Added the null check.

Are you able to reproduce the problem locally and debug?

I don't currently have a Windows debugging environment set up. Maybe I need to set one up for this.

Is there a way to make the compositor to use a separate process on Linux in the hope of reproducing the problem where rr exists?

(In reply to Henri Sivonen (:hsivonen) from comment #17)

Is there a way to make the compositor to use a separate process on Linux in the hope of reproducing the problem where rr exists?

Based on a quick look at the code, setting the pref layers.gpu-process.force-enabled should cause us to use the GPU process on Linux.

cc Ryan to make sure I'm not overlooking something here that would prevent the GPU process from working on Linux.

(For the original question, I'm afraid I don't have a better answer than Kats, either.)

Flags: needinfo?(botond)

Yeah, the GPU process can be enabled on Linux via the pref.

Assignee

Comment 20

4 months ago

If I enable the GPU process, WebRender gets turned off during mach reftest even if I use --setpref to turn it on. If I change the source to hard-code WebRender enablement, mach reftest fails with

###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to connect WebRenderBridgeChild. (t=11.4751) [GFX1-]: Failed to connect WebRenderBridgeChild.

--setpref security.sandbox.gpu.level=0 --setpref security.sandbox.content.level=0 isn't sufficient to remedy this.

Additionally, --debugger=rr appears to have no effect in combination of mach reftest. :-(

Assignee

Comment 21

4 months ago
Assignee

Comment 22

4 months ago

(In reply to Henri Sivonen (:hsivonen) from comment #21)

Let's see if overriding PAPZParent::Recv__delete__() helps:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=343e18fe7b05bf5a39fffd8dde2c91f2cbd9ffcd

Hopefully in a manner that actually compiles this time:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e288750cd24ee1160a75420deb83de4b27224a04

(In reply to Henri Sivonen (:hsivonen) from comment #20)

Additionally, --debugger=rr appears to have no effect in combination of mach reftest. :-(

Hmm, this does work for me.

The initial output looks very similar as without rr, but eventually I see:

REFTEST SUITE-START | Running 1 tests
REFTEST INFO | Running tests in file:///home/botond/dev/mozilla/central/gfx/layers/apz/test/reftest/reftest.list
REFTEST INFO | Running with e10s: True
REFTEST INFO | Application command: /home/botond/bin/rr /home/botond/dev/mozilla/central/objdir-desktop-clang/dist/bin/firefox -marionette -profile /tmp/tmpThen4v.mozrunner

and at the end a recording is available for replay with rr replay.

Assignee

Updated

4 months ago
See Also: → 1533673
Assignee

Comment 24

4 months ago

In the interest of getting something that's demoable on some platform into Nightly, let's put this behind an opt-in pref when the GPU process is in use:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2bc86da9f1022e2c47ae3a74ac15cbcdb0a0a51c

FWIW I applied your latest patch (on Linux), flipped on the pref in gfxPrefs.h, and built it. Then I ran:

./mach reftest --setpref gfx.webrender.all=true --setpref layers.gpu-process.force-enabled=true layout/reftests/reftest-sanity/

And it seemed to run fine. The sandbox dump by the reftest harness indicated webrender was enabled. I also ran the same thing (minus the gpu-process pref) on Windows and there too it seemed to work ok. The only thing I can think of that you should make sure you're using gfx.webrender.all instead of gfx.webrender.enabled, because the former will also enable acceleration (which is off by default on Linux) while the latter will not. So using the latter by itself will not actually enable WR.

Assignee

Comment 26

4 months ago

I used gfx.webrender.all. No idea why it fell back to basic layers under mach reftest. Let's leave the GPU process issue to bug 1533673 and move to review here, since the orange on the try run was intermittent (green on re-run).

Comment 27

4 months ago
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/91d0c9066fd1
Make APZ report the per LayersId layer-to-screen transform matrices to the chrome process. r=kats

Comment 28

4 months ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Did you have a specific test case that you used to test this behaviour? If so, can you point me to it? I'd like to land it in-tree as a mochitest or similar, to ensure it doesn't regress as we make other changes to this code.

Flags: needinfo?(hsivonen)
Assignee

Comment 30

2 months ago

I used https://hsivonen.fi/fission-host.html and observed that context menu invoked from the keyboard (on a keyboard that has a context menu key) and IME-managed menus show up in the right place.

For the next hop of pushing the matrices from parent to child, I used click event screenX and screenY in the child.

Flags: needinfo?(hsivonen)
You need to log in before you can comment on or make changes to this bug.