Closed Bug 935219 Opened 11 years ago Closed 10 years ago

FrameMetrics::mCompositionBounds is still incorrectly calculated

Categories

(Core :: Layout, defect)

Tracking

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: kats, Assigned: botond)

Attachments

(2 files, 23 obsolete files)

278.15 KB, application/pdf
84.13 KB, patch (tnikkel: review+)
botond and I were discussing this today after Cwiiis brought up that the composition bounds were not what he was expecting. Basically, the mCompositionBounds for a layer L currently includes the cumulativeResolution of layer L, and so changes when L is rendered at a different resolution. But conceptually, the composition bounds for layer L are fixed in L's parent layer, and so should not change if L's contents are changing.

I believe that mCompositionBounds should be calculated as

RoundedToInt(LayoutDeviceRect::FromAppUnits(compositionBounds, auPerDevPixel) * metrics.GetParentResolution())

rather than what it is now [1]. We can leave it as a ScreenIntRect, or we could change it to a ParentLayerRect (which I would prefer but is much more work). I suspect there are a few places that it is being used incorrectly still and so converting it to a ParentLayerRect and seeing what shakes out might be a good idea.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp?rev=de45f494f6b2#733
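To make the proposal concrete, here is a small standalone model of that calculation. It uses plain doubles and a toy Rect instead of Gecko's typed units, and the helper name is made up; it only illustrates the "parent resolution, not the layer's own resolution" idea and is not the actual RecordFrameMetrics change.

#include <cmath>
#include <cstdio>

struct Rect { double x, y, w, h; };  // toy stand-in for the typed rects

// App units -> LayoutDevice pixels, then scale by the *parent* resolution
// only; the layer's own resolution is deliberately left out.
Rect ProposedCompositionBounds(const Rect& boundsAppUnits,
                               double appUnitsPerDevPixel,
                               double parentResolution) {
  const double s = parentResolution / appUnitsPerDevPixel;
  return { std::round(boundsAppUnits.x * s), std::round(boundsAppUnits.y * s),
           std::round(boundsAppUnits.w * s), std::round(boundsAppUnits.h * s) };
}

int main() {
  // 1000x600 dev pixels of composition area at 60 app units per dev pixel.
  Rect r = ProposedCompositionBounds({0, 0, 60000, 36000}, 60.0, 1.0);
  std::printf("%g x %g\n", r.w, r.h);  // 1000 x 600
}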
Actually I just realized another problem that had been nagging at the back of my mind for a while. When we assign mCompositionBounds we use a layerToScreenScale of 1.0, when in fact that's not the right transform at all. The composition bounds is part of a layer that actually has a non-empty transform, and specifically one that cancels out the resolution.

So consider the case when loading grid.html on my Surface Pro. The screen resolution is 1920x1080, and the CSS to LDPixel scale is 1.395. This makes my screen 1376x774 CSS pixels, and that is the size of the root layer.

The child of the root layer is the one that corresponds to the content for the subdocument, and it has no frame metrics. And then the child of that one is the root scrollable layer for the content. The composition bounds of this layer is going to be the full screen, which is 1376x774 CSS pixels, or 1920x1080 LDPixels. Let's say this layer also has a resolution of 2.0, so it comes out to 3840x2160 LayerPixels. When this layer is actually composited to the screen, the layer's GetTransform() will return a 0.5 scaling matrix, which makes the layer render at 1920x1080 ScreenPixels. So the composition bounds are actually 1920x1080 ScreenPixels. But because we use a layerToScreenScale of 1.0, we compute a composition bounds of 3840x2160 ScreenPixels (which then subsequently gets cropped by the widget bounds, which really shouldn't be needed at all).
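The numbers in that scenario, worked through as a standalone check (plain doubles; the variable names are just local labels, not Gecko types):

#include <cstdio>

int main() {
  const double screenWidth = 1920;    // ScreenPixels
  const double cssToLDScale = 1.395;  // CSS pixel -> LayoutDevice pixel
  const double resolution = 2.0;      // root scrollable layer's resolution

  const double cssWidth = screenWidth / cssToLDScale;              // ~1376 CSS px
  const double layerWidth = cssWidth * cssToLDScale * resolution;  // 3840 LayerPixels

  // At composite time the layer carries a 0.5 scale (its GetTransform()),
  // which cancels the resolution, so the composited width is 1920 again.
  const double compositedWidth = layerWidth * (1.0 / resolution);
  std::printf("css=%.0f layer=%.0f composited=%.0f\n",
              cssWidth, layerWidth, compositedWidth);
  // Using layerToScreenScale = 1.0 instead reports 3840 "ScreenPixels",
  // which is the miscalculation described above.
}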
I think putting the composition bounds into parent layer pixels makes the above scenario work out fine:

If the resolution (and thus the CSS transform) is on the root scrollable layer, then that resolution doesn't get multiplied into the composition bounds (because that's a _parent_ layer pixel), so the composition bounds remains 1920x1080. Likewise, inputs which are in the 1920x1080 range in global screen space remain in the 1920x1080 range when hit testing on the root scrollable layer, since for hit testing we don't unapply that layer's transform, only its parent layers'.

Now consider the case where the resolution (and thus the CSS transform) is on the root scrollable layer's parent layer instead (the one that corresponds to the content for the subdocument and has no frame metrics). Now, the composition bounds will be 3840x2160, because the parent layer's resolution does get factored into parent layer pixels, but when hit testing we also untransform the input from the 1920x1080 global screen space by the parent layer's 0.5 CSS transform, which puts the transformed point into the 3840x2160 range as well.
So after starting to implement this yesterday I came to a realization that the code as it is now is actually (almost) correct mathematically, although some things could be labelled better. So here's what happened:

I changed RecordFrameMetrics so that mCompositionBounds was calculated like I said in comment 0, and turned mCompositionBounds into a ParentLayerIntRect. Then I went looking for places it was used in APZC.cpp to update those as well. Let's consider the use at [1] as an example.

In this usage, the code computes the minimum zoom that can be applied to the content without going into underzoom. It does this by calculating the ratio of the scrollable rect (in CSS pixels) with the mCompositionBounds. When I thought about this, I realized that with my new definition of mCompositionBounds, I would have to change this code to be like so:

LayerRect availableSpace = mCompositionBounds * mLastContentPaintMetrics.mResolution;
realMinZoom = std::max(..., availableSpace.width / mFrameMetrics.mScrollableRect.width);

And as it turns out, this "availableSpace" is exactly equal to the current computation of mCompositionBounds (modulo the widget-bounds-clipping and scrollbar-trimming). I didn't check all the use sites, but in the ones I did check we seem to always want mCompositionBounds to be in a space that is comparable to the other things in FrameMetrics (i.e. not in a "parent" space) but that doesn't include the transient async transform (which Botond and I discussed yesterday, and which can be defined as the difference between mFrameMetrics.mResolution and mLastContentPaintMetrics.mResolution). And that's exactly what the current mCompositionBounds is.
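A standalone illustration of why this "availableSpace" matches the current mCompositionBounds (made-up numbers, device scale taken as 1, and ignoring the widget clipping and scrollbar trimming mentioned above):

#include <cstdio>

int main() {
  const double cssCompositionWidth = 1000;  // CSS px
  const double parentResolution    = 1.0;
  const double lastPaintResolution = 2.0;   // mLastContentPaintMetrics.mResolution

  // Proposed definition: composition bounds without the layer's own resolution.
  const double newCompBounds = cssCompositionWidth * parentResolution;        // 1000

  // "availableSpace" as computed above.
  const double availableSpace = newCompBounds * lastPaintResolution;          // 2000

  // Current definition: composition bounds including the layer's own
  // (last-painted) resolution.
  const double oldCompBounds =
      cssCompositionWidth * parentResolution * lastPaintResolution;           // 2000

  std::printf("availableSpace=%.0f oldCompBounds=%.0f\n",
              availableSpace, oldCompBounds);
}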

This also made it clear to me what the problem with hit testing in Metro is - previously we unapplied all of L's CSS and async transforms, and now we unapply neither of them. Really what we want to be doing is unapplying L's CSS and non-transient async transform, so that the input point is in the same space as mCompositionBounds (which doesn't include the transient part of the async transform). Apologies if this is exactly what Botond was telling me yesterday and I didn't quite understand. But it makes sense to me now. I also remember thinking last night that the same transform made sense for input events being delivered to the APZC although I don't quite remember why now.

I'll work in bug 902505 to update the hit-test and input transform code assuming mCompositionBounds doesn't change, and get that landed so that I'm unblocked on Metro. We can come back to this bug after that and convert mCompositionBounds and its use sites properly to something more obviously correct.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/AsyncPanZoomController.cpp?rev=240145f87b58#639
The other thing I realized is that because the APZC for layer L knows nothing about L's CSS transforms (or in fact any of the CSS transforms on layers L..R) the only reasonable definition for L's "screen" space is the space where L has its async transform applied but nothing else. If we define L's screen space to include L's CSS transform then there is no way that anything in APZC or FrameMetrics could ever know anything about it.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> This also made it clear to me what the problem with hit testing in Metro is
> - previously we unapplied all of L's CSS and async transforms, and now we
> unapply neither of them. Really what we want to be doing is unapplying L's
> CSS and non-transient async transform, so that the input point is in the
> same space as mCompositionBounds (which doesn't include the transient part
> of the async transform).

If the composition bounds includes L's own resolution, then unapplying L's non-transient transforms does not get the input point into the same coordinate space as the composition bounds. Consider the following example, with a single scrollable layer L:

Initial scenario:
  Screen width       = 1000 screen pixels
  Resolution         = 1
  Composition bounds = 1000 "screen" pixels
  CSS transform      = 1
  Async transform    = 1

Now zoom in 2x, and allow Gecko to repaint at the higher resolution:
  Screen width       = 1000 screen pixels
  Resolution         = 2
  Composition bounds = 2000 "screen" pixels
  CSS transform      = 0.5
  Async transform    = 2 (non-transient since we've repainted)

Here, unapplying the css and non-transient async transforms will yield input points with x coordinates in the range [0, 1000], but the composition bounds will be [0, 2000] along the x axis.

When zooming in on the root scroll frame, this doesn't happen, since we clip the composition bounds to the widget bounds, so we'll get composition bounds of [0, 1000]. When zooming out, though, we'll get composition bounds of e.g. [0, 500], and as a result touches in the [500, 1000] range will be ignored. You can see this by loading a simple page on B2G, zooming out, and trying to pan on the right side of the screen.

I realize that this contradicts what I said in comment #2 - that comment was wrong (I didn't consider the async transform when writing that).

So this is still very much a problem. I think the solution is to put the composition bounds into parent layer pixels by not having it include the layer's own resolution, as Kats suggests in comment #0. I will give that a try.
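The mismatch in the 2x example above, restated as a standalone check (plain doubles; this is only a recap of the arithmetic, not APZC code):

#include <cstdio>

int main() {
  const double screenWidth = 1000;     // screen pixels
  const double resolution = 2.0;       // after repaint at the new zoom
  const double cssTransform = 0.5;     // cancels the resolution
  const double asyncTransform = 2.0;   // non-transient, since Gecko repainted

  // Composition bounds, if they include the layer's own resolution:
  const double compBoundsMax = screenWidth * resolution;                // 2000

  // An input point at the right screen edge, after unapplying the CSS and
  // non-transient async transforms:
  const double hitMax = screenWidth / (cssTransform * asyncTransform);  // 1000

  std::printf("composition bounds [0, %.0f], hit points up to %.0f\n",
              compBoundsMax, hitMax);
  // Zooming in gets masked by clipping to the widget bounds; zooming out
  // gives bounds like [0, 500] while hits still span [0, 1000], so touches
  // past 500 are dropped.
}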
Assignee: nobody → botond
(In reply to Botond Ballo [:botond] from comment #5)
> So this is still very much a problem. I think the solution is to put the
> composition bounds into parent layer pixels by not having it include the
> layer's own resolution, as Kats suggests in comment #0. I will give that a
> try.

One nice side effect of this is that we should no longer need to clip the composition bounds to the widget bounds.
This bug causes, among other things, panning after zooming out to be broken. It's broken in 1.2 as well, so we'll probably want to uplift the fix, which I'm working on.
blocking-b2g: --- → koi?
QA,

Do we know if this is a regression issue?
Keywords: qawanted
(In reply to Botond Ballo [:botond] from comment #7)
> This bug causes, among other things, panning after zooming out to be broken.
> It's broken to 1.2 as well, so we'll probably want to uplift the fix, which
> I'm working on.

Can you give clearer STR here?
(In reply to Jason Smith [:jsmith] from comment #9)
> (In reply to Botond Ballo [:botond] from comment #7)
> > This bug causes, among other things, panning after zooming out to be broken.
> > It's broken to 1.2 as well, so we'll probably want to uplift the fix, which
> > I'm working on.
> 
> Can you give clearer STR here?

1. Load http://people.mozilla.org/~bballo/grid.html in the B2G browser.
2. Zoom out.
3. Try panning on the right side of the screen (on area that was brought into view by zooming out).

Expected results: the page scrolls.

Actual results: no scroll.
My attempt to fix this is running into bug 914864, a bug which was previously masked by the fact that we were clamping the composition bounds to the widget bounds (something we can't do any more). I will take a look at that bug.
Depends on: 914864
QA Contact: sarsenyev
As per comment 10, this doesn't look like a regression; after checking 1.3, 1.2, and 1.1 builds, all versions have the same issue: after zooming out, the page doesn't scroll.
Keywords: qawanted
(In reply to sarsenyev from comment #12)
> As per comment 10, doesn't look like a regression, after checking 1.3, 1.2,
> 1.1 builds, all versions are having the same issue, after zooming out the
> screen the page doesn't scroll.

Okay, then not a blocker since this isn't a regression from a past release.
blocking-b2g: koi? → -
Blocks: 941745
(In reply to Jason Smith [:jsmith] from comment #13)
> (In reply to sarsenyev from comment #12)
> > As per comment 10, doesn't look like a regression, after checking 1.3, 1.2,
> > 1.1 builds, all versions are having the same issue, after zooming out the
> > screen the page doesn't scroll.
> 
> Okay, then not a blocker since this isn't a regression from a past release.

Switching to koi+ given the bug this now blocks.
blocking-b2g: - → koi+
No longer blocks: 941745
Blocks: 941745
This bug is not a regression, right?  How is it blocking a regression then?  Bug 941745 comment 11 is not describing the same problem as originally reported in that bug.  Fixing this may or may not fix the original problem.  Botond, can you clarify?
Flags: needinfo?(botond)
(In reply to Milan Sreckovic [:milan] from comment #15)
> This bug is not a regression, right?  How is it blocking a regression then? 
> Bug 941745 comment 11 is not describing the same problem as originally
> reported in that bug.  Fixing this may or may not fix the original problem. 
> Botond, can you clarify?

I've been treating this bug as a "calculate the composition bounds correctly" bug. We've made a number of changes to how the composition bounds are calculated, and I think none of them have gotten it quite right, and I think I have a plan to get it right in this bug.

The symptoms of bug 941745 make it sound like a bug in the composition bounds calculation (I'm not sure though - I'll need to reproduce and debug it to be sure), in which case fixing this bug will probably fix it too. However, there may also be a more specific fix that will fix just bug 941745.
Flags: needinfo?(botond)
No longer blocks: 941745
Removing koi+ as this is no longer blocking bug 941745.
blocking-b2g: koi+ → ---
Stealing this bug as it blocks Important Bugs (TM) and needs to get looked at soonish.
Here's my WIP patch. It's basically at the stage where the code is written and it compiles, but my attempt to start testing it ran into bug 914864 and I didn't test it further. This was a few weeks ago, so it probably bitrotted a bit by now.
Attached patch WIP (obsolete) — Splinter Review
Rebased patch and updated to deal with some newly added code in ClientTiledThebesLayer. Patch now applies cleanly to trunk and compiles.

Interestingly, I'm no longer running into bug 914864. Not sure why...
Attachment #8342429 - Attachment is obsolete: true
Tested the patch a bit more, it seems to be working fairly well. I didn't run into bug 914864 at all, so I'm removing the dependency. Maybe something changed recently in layout that resolved the issue? I occasionally see some strange behaviour with subframes when zoomed in, but it's hard to reproduce.

I'll clean up the patch a bit and then post it for review.
No longer depends on: 914864
Attached patch bug935219.patch (obsolete) — Splinter Review
Cleaned up patch. Posting it for review.

I realize the PixelCastJustification stuff is a bit unconventional. I'm happy to remove it if you prefer.
Attachment #8342555 - Attachment is obsolete: true
Attachment #8342687 - Flags: review?(bugmail.mozilla)
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

Also flagging Chris for review of the ClientTiledThebesLayer bits as he worked on that code recently (Chris, feel free to look over the rest of patch as well if you'd like, one can never have too many eyes on code that deals with these coordinate systems).
Attachment #8342687 - Flags: review?(chrislord.net)
(In reply to Botond Ballo [:botond] from comment #21)
> I occasionally see some strange behaviour with subframes when zoomed in

One of these behaviours can be reproduced fairly reliably:
  1. Load http://people.mozilla.org/~kgupta/tmp/iframe.html
  2. Zoom in
  3. Turn the screen off and back on.
The contents of the iframe become black. Zooming back out and panning the iframe contents makes them appear again.

I'll look into this.
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

This is a pretty huge patch and it sounds like you're still investigating issues, so I'm unflagging myself. Please re-request when appropriate (or request feedback if you need it).
Attachment #8342687 - Flags: review?(chrislord.net)
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

Review of attachment 8342687 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +1652,5 @@
> +
> +void AsyncPanZoomController::UpdateTransformScale()
> +{
> +  // The x and y scales should be the same.
> +  mFrameMetrics.mTransformScale.scale = (GetCSSTransform() * GetNontransientAsyncTransform()).GetXScale();

Under what conditions is this not 1.0? Based on my various notes from the last time I banged my head against the coordinate systems, GetNontransientAsyncTransform will be the resolution for this layer, and GetCSSTransform() will basically be the inverse of that because of the post-scale that is applied in calculating it. Therefore, assuming no actual CSS transform set by the content, this will always be 1.0.

If there is an actual CSS transform set by the content, then there is no guarantee that the x-scale and the y-scale are the same, and so doing GetXScale() will give you something that doesn't apply on the y-axis. I'm not sure if your intent here is to include content-driven CSS transforms or not but either way something seems amiss.
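To spell out the reviewer's point with numbers (a toy calculation, assuming no content-authored CSS transform; the real values are 3D matrices, not scalars):

#include <cstdio>

int main() {
  // With no content CSS transform, the layer's CSS transform is just the
  // post-scale that cancels the resolution, and the non-transient async
  // transform is the resolution itself.
  const double resolution = 2.0;
  const double cssTransformScale = 1.0 / resolution;  // 0.5
  const double nontransientAsyncScale = resolution;   // 2.0

  std::printf("transform scale = %.2f\n",
              cssTransformScale * nontransientAsyncScale);  // 1.00
  // A content transform with different x and y scales would break the
  // single-scalar GetXScale() assumption being questioned here.
}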
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26)
> Comment on attachment 8342687 [details] [diff] [review]
> bug935219.patch
> 
> Review of attachment 8342687 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/ipc/AsyncPanZoomController.cpp
> @@ +1652,5 @@
> > +
> > +void AsyncPanZoomController::UpdateTransformScale()
> > +{
> > +  // The x and y scales should be the same.
> > +  mFrameMetrics.mTransformScale.scale = (GetCSSTransform() * GetNontransientAsyncTransform()).GetXScale();
> 
> Under what conditions is this not 1.0? Based on my various notes from the
> last time I banged my head against the coordinate systems,
> GetNontransientAsyncTransform will be the resolution for this layer, and
> GetCSSTransform() will basically be the inverse of that because of the
> post-scale that is applied in calculating it. Therefore, assuming no actual
> CSS transform set by the content, this will always be 1.0.

Most notably on Metro, where the async transform and the CSS transform are set on different layers. Recall that this is why we found it necessary to unapply the local non-transient transforms when delivering input to an APZC.

The other case I can think of is when Gecko decides to render content at a resolution that's different from the zoom - my understanding is Gecko has the freedom to do that, even if it typically doesn't.
By the way, I think the fact that the transform scale is usually 1 is one reason we've had so much trouble getting our coordinates right - it's so easy to ignore it :)

Up until recently, we never unapplied it and did everything in ParentLayer pixels (calling them Screen pixels). That worked most of the time. Then we discovered that was wrong for Metro, and we started always unapplying it. That also works most of the time. For things to work all of the time, though, I don't think we can ignore it.
(In reply to Botond Ballo [:botond] from comment #27)
> Most notably on Metro, where the async transform and the CSS transform are
> set on different layers. Recall that this is why we found it neccesary to
> unapply the local non-transient transforms during when delivering input to
> an APZC.

APZC::GetLocalTransform returns the cumulative transform up to the last layer which had an APZC [1]. That means it doesn't matter if the async transform and CSS transform are on different layers, as long as there is no APZC in between. Or am I missing something here?

> The other case I can think of is when Gecko decides to render content at a
> resolution that's different from the zoom - my understanding is Gecko has
> the freedom to do that, even if it typically doesn't.

Right, this usually happens when content sets CSS transforms. So the x-scale and y-scale may be different, and we'll probably need to account for that.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/APZCTreeManager.cpp#213
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #29)
> (In reply to Botond Ballo [:botond] from comment #27)
> > Most notably on Metro, where the async transform and the CSS transform are
> > set on different layers. Recall that this is why we found it neccesary to
> > unapply the local non-transient transforms during when delivering input to
> > an APZC.
> 
> APZC::GetLocalTransform returns the cumulative transform up to the last
> layer which had an APZC [1]. That means it doesn't matter if the async
> transform and CSS transform are on different layers, as along as there is no
> APZC in between. Or am I missing something here?

APZC does not have a method named GetLocalTransform() :)

It has GetCSSTransform() and GetAncestorTransform(). My understanding, confirmed by the code you linked, is that GetCSSTransform() is the CSS transform of the current layer only, and GetAncestorTransform() is the cumulative CSS transform up to the last layer which had an APZC (and not including GetCSSTransform()).

I use GetCSSTransform() in computing the transform scale, so it does not include a CSS transform corresponding to a resolution set on a parent (non-apzc) layer.

> > The other case I can think of is when Gecko decides to render content at a
> > resolution that's different from the zoom - my understanding is Gecko has
> > the freedom to do that, even if it typically doesn't.
> 
> Right, this usually happens when content sets CSS transforms. So the x-scale
> and y-scale may be different, and we'll probably need to account for that.

For some reason I thought Gecko could do this even in the absence of CSS transforms. Not sure where I got that from.

Anyway, given that we currently do some things as if we had a transform scale of 1, we already don't handle CSS transforms with different x and y scales correctly, so this patch is still an improvement over the current situation.
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

(In reply to Botond Ballo [:botond] from comment #24)
> (In reply to Botond Ballo [:botond] from comment #21)
> > I occasionally see some strange behaviour with subframes when zoomed in
> 
> One of these behaviours can be reproduced fairly reliably:
>   1. Load http://people.mozilla.org/~kgupta/tmp/iframe.html
>   2. Zoom in
>   3. Turn the screen off and back on.
> The contents of the iframe become black. Zooming back out and panning the
> iframe contents makes them appear again.
> 
> I'll look into this.

This actually happens without my patch, too. I'll file a separate bug for it.

I think this patch is in a good state and ready to be reviewed. Re-flagging Chris for review. (Chris, again this is really an r? for the ClientTiledThebesLayer bits and an f? for the rest if you feel like looking at it.)
Attachment #8342687 - Flags: review?(chrislord.net)
(In reply to Botond Ballo [:botond] from comment #31)
> Comment on attachment 8342687 [details] [diff] [review]
> bug935219.patch
> 
> (In reply to Botond Ballo [:botond] from comment #24)
> > (In reply to Botond Ballo [:botond] from comment #21)
> > > I occasionally see some strange behaviour with subframes when zoomed in
> > 
> > One of these behaviours can be reproduced fairly reliably:
> >   1. Load http://people.mozilla.org/~kgupta/tmp/iframe.html
> >   2. Zoom in
> >   3. Turn the screen off and back on.
> > The contents of the iframe become black. Zooming back out and panning the
> > iframe contents makes them appear again.
> > 
> > I'll look into this.
> 
> This actually happens without my patch, too. I'll file a separate bug for it.

I filed bug 946833.
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

Review of attachment 8342687 [details] [diff] [review]:
-----------------------------------------------------------------

Partial review since I have a couple of interviews to do soon and I won't get back to this until later, but my main concern is this:

There are two different coordinate spaces that are both called "ParentLayerPixel". One is mDevPixelsPerCSSPixel * GetParentResolution(), and the other is GetZoomToParent(). I would prefer renaming one of these. When I suggested using "ParentLayerRect" for mCompositionBounds I thought these were the same. Since they're not, it would be better to rename the second one to avoid confusion. I don't know what a good name is, but I'll think about it. The best suggestion I have right now is "LocallyTransformedScreenPixel", which refers to the fact that it is a "ScreenPixel" after some layer-local transform has been applied.

::: gfx/layers/composite/APZCTreeManager.cpp
@@ +808,3 @@
>    // It is OC.Inverse() * NC.Inverse() * MC.Inverse() * LC.Inverse() * LN.Inverse() at L,
>    //   and RC.Inverse() * QC.Inverse() * PC.Inverse() * PN.Inverse()                at P.
> +  gfxPoint hitTestPointForThisLayer = ancestorUntransform.ProjectPoint(aHitTestPoint);

The code here no longer corresponds to the calculations in the comments. Something needs to be updated.

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +689,5 @@
>      // either axis such that we don't overscroll the boundaries when zooming.
>      CSSPoint neededDisplacement;
>  
> +    CSSToParentLayerScale realMinZoom = mMinZoom * mFrameMetrics.mTransformScale;
> +    CSSToParentLayerScale realMaxZoom = mMaxZoom * mFrameMetrics.mTransformScale;

Not sure this is right, can you explain?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #33)
> There are two different coordinate spaces that are both called
> "ParentLayerPixel". one is mDevPixelsPerCSSPixel * GetParentResolution(),
> and the other is GetZoomToParent(). I would prefer renaming one of these.
> When I suggested using "ParentLayerRect" for mCompositionBounds I thought
> these were the same. Since they're not, it would be better to rename the
> second one to avoid confusion. I don't know what a good name is, but I'll
> think about it. The best suggestion I have right now is
> "LocallyTransformedScreenPixel" which refers to the fact that it
> "ScreenPixel" after some layer-local transform has been applied.

I believe these two coordinate systems are the same: the layer pixels of the parent layer.

mDevPixelsPerCSSPixel * GetParentResolution() is the correct scale factor to use when translating between the CSS pixels of the parent layer and the ParentLayer pixels of the parent layer. For example, we use this when calculating composition bounds, since the CSS dimensions of the composition bounds are in the CSS pixels of the parent layer.

GetZoomToParent() is the correct scale factor to use when translating between the CSS pixels of the current layer and the ParentLayer pixels of the parent layer. For example, we use this in CalculateCompositedRectInCSSPixels(), since there we are asking how many CSS pixels of the current layer can fit into the composition bounds.

To illustrate this with some numbers, consider the following very simple example:
  - You have a page with an iframe with width=300 CSS pixels.
  - The iframe contents can be zoomed.
  - The iframe contents have been zoomed by a factor of 2x.
  - All other zooms/resolutions are 1.
Under these circumstances, the distance from the left edge of the iframe to the right edge of the iframe has the following values in the following coordinate systems:
  - (A) 300 in the CSS pixels of the page (parent layer). (I'm sure you agree that zooming the iframe 
    contents does not change the width of the iframe).
  - (B) 300 in the ParentLayer pixels of the iframe contents (the parent layer being the page)
  - (C) 150 in the CSS pixels of the iframe contents
Finally consider the scale factors:
  - mDevPixelsPerCSSPixel * GetParentResolution() = 1, which correctly converts from (A) to (B)
  - GetZoomToParent() = 2, which correctly converts from (C) to (B)
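The same example as a standalone check (plain doubles standing in for the typed scales):

#include <cstdio>

int main() {
  const double iframeWidthPageCss = 300;  // (A) CSS px of the page
  const double iframeZoom = 2.0;          // zoom applied to the iframe contents
  const double devPixelsPerCssPixel = 1.0;
  const double parentResolution = 1.0;    // all other zooms/resolutions are 1

  // (A) -> (B): CSS px of the page -> ParentLayer px of the iframe contents.
  const double widthParentLayer =
      iframeWidthPageCss * devPixelsPerCssPixel * parentResolution;       // 300

  // (C): CSS px of the iframe contents.
  const double widthIframeCss = widthParentLayer / iframeZoom;            // 150

  std::printf("(A)=%.0f (B)=%.0f (C)=%.0f zoomToParent=%.0f\n",
              iframeWidthPageCss, widthParentLayer, widthIframeCss,
              widthParentLayer / widthIframeCss);                         // ... 2
}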

> ::: gfx/layers/composite/APZCTreeManager.cpp
> @@ +808,3 @@
> >    // It is OC.Inverse() * NC.Inverse() * MC.Inverse() * LC.Inverse() * LN.Inverse() at L,
> >    //   and RC.Inverse() * QC.Inverse() * PC.Inverse() * PN.Inverse()                at P.
> > +  gfxPoint hitTestPointForThisLayer = ancestorUntransform.ProjectPoint(aHitTestPoint);
> 
> The code here no longer corresponds to the calculations in the comments.
> Something needs to be updated.

Yeah, that comment is obsolete. I'll remove it.

> ::: gfx/layers/ipc/AsyncPanZoomController.cpp
> @@ +689,5 @@
> >      // either axis such that we don't overscroll the boundaries when zooming.
> >      CSSPoint neededDisplacement;
> >  
> > +    CSSToParentLayerScale realMinZoom = mMinZoom * mFrameMetrics.mTransformScale;
> > +    CSSToParentLayerScale realMaxZoom = mMaxZoom * mFrameMetrics.mTransformScale;
> 
> Not sure this is right, can you explain?

Immediately below, realMinZoom is compared to the ratio of the composition bounds to the scrollable rect. Since the scrollable rect is in CSS pixels (of the current layer) and the composition bounds is in ParentLayer pixels, the (conceptual) type of this ratio is CSSToParentLayerScale, so the thing we compare to it should also be a CSSToParentLayerScale.

We store mMinZoom and mMaxZoom as CSSToScreenScales. To convert them to CSSToParentLayerScales, we multiply them by the transform scale which is a ScreenToParentLayerScale.
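A minimal sketch of the unit bookkeeping being described, with tiny stand-in structs rather than the real Units.h types (numbers are made up):

#include <algorithm>
#include <cstdio>

// Stand-ins for the strongly-typed scale classes.
struct CSSToScreenScale        { double scale; };
struct ScreenToParentLayerScale { double scale; };
struct CSSToParentLayerScale   { double scale; };

CSSToParentLayerScale operator*(CSSToScreenScale a, ScreenToParentLayerScale b) {
  return { a.scale * b.scale };
}

int main() {
  CSSToScreenScale minZoom{0.25};
  ScreenToParentLayerScale transformScale{1.0};  // usually 1, but not always

  const double compositionBoundsWidth = 1000;  // ParentLayer pixels
  const double scrollableRectWidth    = 8000;  // CSS pixels of this layer

  // Both sides of the comparison are now CSS -> ParentLayer scales.
  CSSToParentLayerScale realMinZoom = minZoom * transformScale;
  const double fitScale = compositionBoundsWidth / scrollableRectWidth;  // 0.125
  realMinZoom.scale = std::max(realMinZoom.scale, fitScale);

  std::printf("realMinZoom = %.3f\n", realMinZoom.scale);
}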
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

Review of attachment 8342687 [details] [diff] [review]:
-----------------------------------------------------------------

The changes to ClientTiledThebesLayer are consistent, I feel ok r+'ing that. I'd very much like you to check the numbers though and make sure this doesn't break the tile coherency code. It would be a shame to fix that and break it again in the same cycle :)

The bits in FrameMetrics and TabChild seem dubious to me though, but I would defer to you and kats in these situations... I didn't give much time to the APZC stuff, as both you and kats know the code a lot better than I do at the moment (I don't think my review is useful for that part).

::: dom/ipc/TabChild.cpp
@@ +384,5 @@
>          // Calculate a really simple resolution that we probably won't
>          // be keeping, as well as putting the scroll offset back to
>          // the top-left of the page.
>          mLastMetrics.mViewport = CSSRect(CSSPoint(), kDefaultViewportSize);
> +        mLastMetrics.mCompositionBounds = ParentLayerIntRect(

Just to help me understand, why is this ParentLayer and not just Layer if the metrics is for the root layer? So that it doesn't have any layer transforms applied?

::: gfx/layers/FrameMetrics.h
@@ +16,4 @@
>  
>  // The layer coordinates of the parent layer. Like the layer coordinates
>  // of the current layer (LayerPixel) but doesn't include the current layer's
>  // resolution.

This kind of answers my last question, I guess, but should this read transform? I thought the resolution only applies at the very end when drawing, and relates to LayoutDevice space?

@@ +138,5 @@
> +  // to layer pixels of our parent layer. Much as mZoom is used to interface
> +  // between inputs we get in screen pixels and quantities in CSS pixels,
> +  // this is used to interface between mCompositionBounds and quantities
> +  // in CSS pixels.
> +  CSSToParentLayerScale GetZoomToParent() const

Can you do this? What if there's a transform on this layer?

@@ +270,5 @@
> +  // system in which APZCs receive input events) and our parent layer's
> +  // layer pixels (the coordinate system of mCompositionBounds).
> +  // This consists of the scale of the local CSS transform and the
> +  // nontransient async transform.
> +  ScreenToParentLayerScale mTransformScale;

This, and this comment, just don't seem right to me with respect to transforms... A scale is a single factor, but the CSS transform is an affine transform matrix - surely this can't work in a lot of cases? And if that's the case, shouldn't that be documented?

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +43,5 @@
>    aAttrs = ThebesLayerAttributes(GetValidRegion());
>  }
>  
>  static LayoutDeviceRect
> +ApplyParentLayerToLayoutTransform(const gfx3DMatrix& aTransform, const ParentLayerRect& aParentLayerRect)

Might as well get rid of this function at this point, TransformTo reads nicer anyway.

::: gfx/layers/client/TiledContentClient.cpp
@@ +489,5 @@
>  
>    // Transform the screen coordinates into transformed layout device coordinates.
>    LayoutDeviceRect transformedCompositionBounds =
>      TransformCompositionBounds(compositionBounds, zoom, aPaintData->mScrollOffset,
> +                            aPaintData->mResolution, aPaintData->mTransformParentLayerToLayout);

Might as well correct this indentation if you're changing the line.

::: layout/base/Units.h
@@ +325,5 @@
> +
> +// Convenience functions for transforming an entity from one strongly-typed
> +// coordinate system to another using the provided transformation matrix.
> +template <typename TargetUnits, typename SourceUnits>
> +static gfx::PointTyped<TargetUnits> TransformTo(const gfx3DMatrix& aTransform,

I like this, do we want it for gfx::SizeTyped too?
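For illustration, a sketch of the kind of helper being discussed. The matrix and point types here are minimal stand-ins (the real code uses gfx3DMatrix and gfx::PointTyped); only the shape of the template is meant to match.

#include <cstdio>

struct Matrix2D {                       // toy scale-only "transform"
  double xx, yy;
  struct Pt { double x, y; };
  Pt ProjectPoint(Pt p) const { return { p.x * xx, p.y * yy }; }
};

struct ScreenPixel {};
struct ParentLayerPixel {};

template <typename Units>
struct PointTyped { double x, y; };

// Convert a strongly-typed point between coordinate systems using the
// provided transform; the unit tags exist only at compile time.
template <typename TargetUnits, typename SourceUnits>
PointTyped<TargetUnits> TransformTo(const Matrix2D& aTransform,
                                    const PointTyped<SourceUnits>& aPoint) {
  Matrix2D::Pt p = aTransform.ProjectPoint({aPoint.x, aPoint.y});
  return PointTyped<TargetUnits>{p.x, p.y};
}

int main() {
  Matrix2D untransform{0.5, 0.5};
  PointTyped<ScreenPixel> hit{800, 600};
  PointTyped<ParentLayerPixel> local = TransformTo<ParentLayerPixel>(untransform, hit);
  std::printf("%.0f, %.0f\n", local.x, local.y);  // 400, 300
}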
Attachment #8342687 - Flags: review?(chrislord.net) → review+
(In reply to Botond Ballo [:botond] from comment #30)
> APZC does not have a method named GetLocalTransform() :)
> 
> It has GetCSSTransform() and GetAncestorTransform(). My understanding,
> confirmed by the code you linked, is GetCSSTransform() is the CSS transform
> of the current layer only, and GetAncestorTransform() is the cumulative CSS
> transform up to the last layer which had an APZC (and not including
> GetCSSTransform()).
> 
> I use GetCSSTransform() in computing the transform scale, so it does not
> include a CSS transform corresponding to a resolution set on a parent
> (non-apzc) layer.

My bad. That seems reasonable then.

> > Right, this usually happens when content sets CSS transforms. So the x-scale
> > and y-scale may be different, and we'll probably need to account for that.
> 
> For some reason I thought Gecko could do this even in the absence of CSS
> transforms. Not sure where I got that from.

It can, but I don't know if it keeps the x-scale and y-scale the same in that situation. I was more concerned about the CSS transforms because in that case I _know_ that the x-scale and y-scale can be different.

> Anyways, given that we currently do some things as if we had a transform
> scale of 1, we are currently not handling CSS transforms with different x
> and y scales differently, so this patch is an improvement over the current
> situation.

Fair enough.

(In reply to Botond Ballo [:botond] from comment #34)
> I believe these two coordinate systems are the same: the layer pixels of the
> parent layer.
> 
> mDevPixelsPerCSSPixel * GetParentResolution() is the correct scale factor to
> use when translating between the CSS pixels of the parent layer and the
> ParentLayer pixels of the parent layer. For example, we use this when
> calculating composition bounds, since the CSS dimensions of the composition
> bounds are in the CSS pixels of the parent layer.
> 
> GetZoomToParent() is the correct scale factor to use when translating
> between the CSS pixels of the current layer and the ParentLayer pixels of
> the parent layer. For example, we use this in
> CalculateCompositedRectInCSSPixels(), since there we are asking how many CSS
> pixels of the current layer can fit into the composition bounds.

Ok. It took me some time to wrap my head around this but I think it makes sense now. As this implicitly introduces the notion of separate layers having their own CSS pixels, we need to be careful that code that assumes otherwise is corrected.

> > ::: gfx/layers/ipc/AsyncPanZoomController.cpp
> Immediately below, realMinZoom is compared to the ratio of the composition
> bounds to the scrollable rect. Since the scrollable rect is in CSS pixels
> (of the current layer) and the composition bounds is in ParentLayer pixels,
> the (conceptual) type of this ratio is CSSToParentLayerScale, so the thing
> we compare to it should also be a CSSToParentLayerScale.
> 
> We store mMinZoom and mMaxZoom as CSSToScreenScales. To convert them to
> CSSToParentLayerScales, we multiply them by the transform scale which is a
> ScreenToParentLayerScale.

Hmm ok.
I pushed a try build with this patch [1] mostly to check the android talos numbers. I'm also building it locally on Android, B2G, and Metro to do some manual testing before r+'ing. Otherwise it tentatively looks fine.

Changes to mCompositionBounds will affect android as well and we may need to update uses of mCompositionBounds in AsyncCompositionManager::TransformScrollableLayer and LayerManagerComposite::ComputeRenderIntegrity. Those two codepaths are only exercised on Fennec IIRC.

[1] https://tbpl.mozilla.org/?tree=Try&rev=666f68c51dfb
The build failed for me locally on B2G because of MOZ_END_ENUM_CLASS(Enum) being used instead of MOZ_END_ENUM_CLASS(PixelCastJustification). Fixed and re-pushed to try at https://tbpl.mozilla.org/?tree=Try&rev=e015dd823acc
This appears to regress Fennec. STR:
1. Load build with this patch applied
2. Load nightly.mozilla.org
3. Wait for the page to render

The top-left corner of the page is rendered but the rest is blank. Scrolling around makes the rest get painted (at low resolution) and zooming fixes it, but basically the initially rendered area is wrong. The latest nightly build doesn't exhibit this behaviour. Other pages like http://chrislord.net/files/mozilla/fixed2.html exhibit even worse behaviour in that they remain in a bad/low-res state.
Comment on attachment 8342687 [details] [diff] [review]
bug935219.patch

Review of attachment 8342687 [details] [diff] [review]:
-----------------------------------------------------------------

Minusing for now based on the Fennec regressions.

::: layout/base/Units.h
@@ +287,5 @@
> +
> +MOZ_BEGIN_ENUM_CLASS(PixelCastJustification, uint8_t)
> +  // For the root layer, Screen Pixel = Parent Layer Pixel.
> +  ScreenToParentLayerForRoot
> +MOZ_END_ENUM_CLASS(Enum)

s/Enum/PixelCastJustification/
Attachment #8342687 - Flags: review?(bugmail.mozilla) → review-
No longer blocks: 814435
(In reply to Chris Lord [:cwiiis] from comment #35)
> The changes to ClientTiledThebesLayer are consistent, I feel ok r+'ing that.
> I'd very much like you to check the numbers though and make sure this
> doesn't break the tile coherency code. It would be a shame to fix that and
> break it again in the same cycle :)

How do you typically check the numbers for this sort of thing? Do you have a test page that you look at and check in the debugger that certain metrics are what you expect?

> The bits in FrameMetrics and TabChild seem dubious to me though, but I would
> defer to you and kats in these situations... I didn't give much time to the
> APZC stuff, as both you and kats know the code a lot better than I do at the
> moment (I don't think my review is useful for that part).

If it seems dubious in any specific ways beyond what you've commented on, please do say. Being able to justify why things aren't dubious is a good test of my patch being sensible. :)

> ::: dom/ipc/TabChild.cpp
> @@ +384,5 @@
> >          // Calculate a really simple resolution that we probably won't
> >          // be keeping, as well as putting the scroll offset back to
> >          // the top-left of the page.
> >          mLastMetrics.mViewport = CSSRect(CSSPoint(), kDefaultViewportSize);
> > +        mLastMetrics.mCompositionBounds = ParentLayerIntRect(
> 
> Just to help me understand, why is this ParentLayer and not just Layer if
> the metrics is for the root layer? So that it doesn't have any layer
> transforms applied?

There are two ways to get to layer coordinates for a layer L.

1) Start with CSS coordinates for L, multiply by the device scale, and multiply
   by the resolutions of all the layers from the root up to and including L.
2) Start with (global) Screen coordinates, and unapply all the CSS and async
   transforms of all layers from the root up to and including L.

Both of these interpretations follow from the equations at the bottom of [1].

Parent layer coordinates just mean layer coordinates for a layer's parent layer.
So, if M is the parent of L, then L's parent layer coordinates are M's layer 
coordinates.

So we can adapt (2) to come up with a way of getting L's parent layer pixels:
start with global Screen coordinates, and unapply all the CSS and async
transforms of all layers from the root up to and _not_ including L.

If L is the root layer, then parent layer coordinates are the same as global
Screen coordinates, as there are no layers higher up whose transforms need
to be unapplied.

[1] https://staktrace.com/spout/entry.php?id=801
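Written out for a two-layer chain (root R with child L), as a standalone check of the two derivations above. Plain doubles, no async transforms, and each layer's CSS transform is assumed to be exactly the inverse of its resolution; this is illustrative, not Gecko code.

#include <cstdio>

int main() {
  const double devScale    = 1.0;  // CSS -> LayoutDevice
  const double resolutionR = 1.0;  // root layer R
  const double resolutionL = 2.0;  // child layer L
  const double cssTransformR = 1.0 / resolutionR;
  const double cssTransformL = 1.0 / resolutionL;

  const double widthCss = 1000;                    // a length in L's CSS pixels
  const double widthScreen = widthCss * devScale;  // what ends up on screen

  // (1) CSS -> Layer: device scale times the resolutions from the root
  //     up to and including L.
  const double layer1 = widthCss * devScale * resolutionR * resolutionL;  // 2000

  // (2) Screen -> Layer: unapply the CSS (and async) transforms from the
  //     root up to and including L.
  const double layer2 = widthScreen / (cssTransformR * cssTransformL);    // 2000

  std::printf("layer1=%.0f layer2=%.0f\n", layer1, layer2);
  // ParentLayer pixels for L are just R's layer pixels: stop before
  // including L's own resolution / transform.
}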

> ::: gfx/layers/FrameMetrics.h
> @@ +16,4 @@
> >  
> >  // The layer coordinates of the parent layer. Like the layer coordinates
> >  // of the current layer (LayerPixel) but doesn't include the current layer's
> >  // resolution.
> 
> This kind of answers my last question, I guess, but should this read
> transform? I thought the resolution only applies at the very end when
> drawing, and relates to LayoutDevice space?

Yeah, this comment was based on an earlier (and poorer) understanding of
coordinate systems. I will update it.

> @@ +138,5 @@
> > +  // to layer pixels of our parent layer. Much as mZoom is used to interface
> > +  // between inputs we get in screen pixels and quantities in CSS pixels,
> > +  // this is used to interface between mCompositionBounds and quantities
> > +  // in CSS pixels.
> > +  CSSToParentLayerScale GetZoomToParent() const
> 
> Can you do this? What if there's a transform on this layer?
>
> @@ +270,5 @@
> > +  // system in which APZCs receive input events) and our parent layer's
> > +  // layer pixels (the coordinate system of mCompositionBounds).
> > +  // This consists of the scale of the local CSS transform and the
> > +  // nontransient async transform.
> > +  ScreenToParentLayerScale mTransformScale;
> 
> This, and this comment, just don't seem right to me with respect to
> transforms... A scale is a single factor, but the CSS transform is an affine
> transform matrix - surely this can't work in a lot of cases? And if that's
> the case, shouldn't that be documented?

I think many things in APZ break if there is a CSS transform on the layer 
being scrolled other than the one used to compensate for the resolution
(which is just a scale and the same in the x- and y-scales). I will add a
comment that clarifies this.

I agree that we should rectify this situation at some point, and when we do,
mTransformScale will need to turn into a matrix, but let's fix one problem
at a time.

> ::: gfx/layers/client/ClientTiledThebesLayer.cpp
> @@ +43,5 @@
> >    aAttrs = ThebesLayerAttributes(GetValidRegion());
> >  }
> >  
> >  static LayoutDeviceRect
> > +ApplyParentLayerToLayoutTransform(const gfx3DMatrix& aTransform, const ParentLayerRect& aParentLayerRect)
> 
> Might as well get rid of this function at this point, TransformTo reads
> nicer anyway.

I was thinking of keeping this thin wrapper around because it enforces that the thing being converted is a ParentLayerRect. If we called TransformTo directly, the function would silently accept any sort of typed rect.

> ::: gfx/layers/client/TiledContentClient.cpp
> @@ +489,5 @@
> >  
> >    // Transform the screen coordinates into transformed layout device coordinates.
> >    LayoutDeviceRect transformedCompositionBounds =
> >      TransformCompositionBounds(compositionBounds, zoom, aPaintData->mScrollOffset,
> > +                            aPaintData->mResolution, aPaintData->mTransformParentLayerToLayout);
> 
> Might as well correct this indentation if you're changing the line.

Will do.

> ::: layout/base/Units.h
> @@ +325,5 @@
> > +
> > +// Convenience functions for transforming an entity from one strongly-typed
> > +// coordinate system to another using the provided transformation matrix.
> > +template <typename TargetUnits, typename SourceUnits>
> > +static gfx::PointTyped<TargetUnits> TransformTo(const gfx3DMatrix& aTransform,
> 
> I like this, do we want it for gfx::SizeTyped too?

I was thinking of adding these things on demand (i.e. only adding one when we actually have a call site for it), but if you prefer I can add it now.
As Kats discovered and pointed out to me on IRC, this patch breaks hit testing on Metro. 

STR:
  1. Load http://people.mozilla.org/~bballo/grid.html
  2. Zoom in
  3. Pan in the bottom right area of the page. The pan is ignored.

I've debugged this, and here's what is happening:

Let L be the root scrollable layer. When loading a page like http://people.mozilla.org/~bballo/grid.html, this is the only layer with an APZC.

L has a parent layer M.

Say the screen size is 1000 pixels (let's just consider one dimension). Now consider what happens when you zoom in, say by a factor of 2x. Let's say the page is repainted at the new resolution, so there is no transient async transform.

Here's what happens on B2G:
  - L gets a CSS transform of 0.5
  - L gets a (non-transient) async transform of 2
  - L's FrameMetrics gets a resolution of 2 and a parent resolution of 1

My patch handles this case fine: hit points start in the range [0, 1000] (screen coordinates) and are transformed into the range [0, 1000] (parent layer coordinates), and the composition bounds includes the parent resolution but not the resolution, and so is [0, 1000]. All is well.

Here's what I _thought_ happens on Metro:
  - M gets a CSS transform of 0.5
  - L gets a (non-transient) async transform of 2
  - L's FrameMetrics gets a resolution of 1 and a parent resolution of 2
      - since it's M, the _parent_ layer, that got the CSS transform

My patch would handle this case just fine too: hit points start in the range [0, 1000] (screen coordinates) and are transformed into the range [0, 2000] (parent layer coordinates), and the composition bounds is [0, 2000]. Again all is well.

Unfortunately, that's not what actually happens on Metro. What happens is:
  - M gets a CSS transform of 0.5
  - L gets a (non-transient) async transform of 2
  - L's FrameMetrics gets a resolution of 2 (!!!) and a parent resolution of 1 (!!!)

My patch does not handle this case fine: hit points are in the range [0, 2000] but the composition bounds is in the range [0, 1000].

Basically, my patch makes the assumption that a resolution and its corresponding CSS transform are "on the same level", and Metro breaks this assumption.

One problem might be that being "on the same level" is not a well-defined concept, given that resolutions are on pres shells but transforms are on layers.

Kats, if you have any insights about how this can be resolved, I would be glad to hear them.

For starters, it would help my understanding of the situation to know the mechanism by which the CSS transform goes to M on Metro, but L on B2G. I would think that the same mechanism ought to ensure that the resolution goes into the "parent resolution" on Metro, but into the current layer's resolution on B2G.
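Restating the Metro mismatch numerically (a standalone recap of the numbers above, with devPixelsPerCSSPixel taken as 1):

#include <cstdio>

int main() {
  const double screenWidth = 1000;         // screen pixels, one dimension
  const double cssCompositionWidth = 1000; // CSS width of the composition area

  // What actually happens on Metro after a repainted 2x zoom: M carries the
  // 0.5 CSS transform, but L's FrameMetrics reports parent resolution = 1.
  const double cssTransformM = 0.5;
  const double parentResolution = 1.0;

  // Hit points get untransformed by M's CSS transform before reaching L...
  const double hitMax = screenWidth / cssTransformM;                    // 2000
  // ...but composition bounds built with parent resolution = 1 stay at 1000.
  const double compBoundsMax = cssCompositionWidth * parentResolution;  // 1000

  std::printf("hits up to %.0f, composition bounds up to %.0f\n",
              hitMax, compBoundsMax);
  // The resolution (2) and its CSS transform (0.5) ended up on different
  // "levels", breaking the assumption the patch relies on.
}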
Flags: needinfo?(bugmail.mozilla)
Attached patch bug935219.patch (obsolete) — Splinter Review
Updated patch to fix the compiler error Kats pointed out in comment #40 and Chris' review comments. The Metro and Fennec regressions remain to be fixed.
Attachment #8342687 - Attachment is obsolete: true
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #39)
> This appears to regress Fennec. STR:
> 1. Load build with this patch applied
> 2. Load nightly.mozilla.org
> 3. Wait for the page to render
> 
> The top-left corner of the page is rendered but the rest is blank. Scrolling
> around makes the rest come paint (at low resolution) and zooming fixes it,
> but basically the initially rendered area is wrong. The latest nightly build
> doesn't exhibit this behaviour. Loading other pages like
> http://chrislord.net/files/mozilla/fixed2.html exhibit even worse behaviour
> in that they remain in a bad/low-res state.

I can reproduce this, and I agree with you that the uses of mCompositionBounds in AsyncCompositionManager::TransformScrollableLayer and LayerManagerComposite::ComputeRenderIntegrity are potential culprits. 

Unfortunately, my initial attempt to fix this by multiplying the composition bounds by the resolution (to arrive back at the pre-patch value of the composition bounds) in those places does not fix the problem. Perhaps the problem is not the removal of the resolution, but the removal of clamping to the widget bounds? I will need to investigate further.
(In reply to Botond Ballo [:botond] from comment #44)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #39)
> > This appears to regress Fennec. STR:
> > 1. Load build with this patch applied
> > 2. Load nightly.mozilla.org
> > 3. Wait for the page to render
> > 
> > The top-left corner of the page is rendered but the rest is blank. Scrolling
> > around makes the rest come paint (at low resolution) and zooming fixes it,
> > but basically the initially rendered area is wrong. The latest nightly build
> > doesn't exhibit this behaviour. Loading other pages like
> > http://chrislord.net/files/mozilla/fixed2.html exhibit even worse behaviour
> > in that they remain in a bad/low-res state.
> 
> I can reproduce this, and I agree with you that the uses of
> mCompositionBounds in AsyncCompositionManager::TransformScrollableLayer and
> LayerManagerComposite::ComputeRenderIntegrity are potential culprits. 
> 
> Unfortunately, my initial attempt to fix this by multiplying the composition
> bounds by the resolution (to arrive back at the pre-patch value of the
> composition bounds) in those places does not fix the problem. Perhaps the
> problem is not the removal the resolution, but the removal of clamping to
> the widget bounds? I will need to investigate further.

So it seems like the changes to AsyncCompositionManager and LayerManagerComposite are necessary (that is, they restore the value of the composition bounds in that code to what it used to be before the patch), but they are not enough. The page nightly.mozilla.org is zoomed out by a factor of about 3.2 when I load it on my phone, so there is a resolution and the difference between the old and new values of the composition bounds is significant; I just need to find what other code relies on it.
(In reply to Botond Ballo [:botond] from comment #42)
> Kats, if you have any insights about how this can be resolved, I would be
> glad to hear them.
> 
> For starters, it would help my understanding of the situation to know what
> is the mechanism by which the CSS transform goes to M on Metro, but L on
> B2G. I would think that the same mechanism ought to ensure that the
> resolution goes into the "parent resolution" on Metro, but on the current
> layer's resolution on B2G.

So I recall running into similar weirdness at some point in the past. I came to the conclusion that this problem arises because the container layer for a presShell doesn't always have a populated FrameMetrics, and so when we compute the cumulative resolution and parent resolution in RecordFrameMetrics it doesn't quite work out. I even wrote a patch to fix this [1] but IIRC there were some problems with it and I was able to fix whatever bug I was working on some other way, and that patch got abandoned. If it helps solve this problem feel free to steal it and unbitrot it.

[1] https://github.com/staktrace/mozilla-central/commit/4aae77913a7bbeced7699c512e612de0ba04b33b
Flags: needinfo?(bugmail.mozilla)
Attached patch bug935219.patch (obsolete) — Splinter Review
(In reply to Botond Ballo [:botond] from comment #45)
> (In reply to Botond Ballo [:botond] from comment #44)
> > (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #39)
> > > This appears to regress Fennec. STR:
> > > 1. Load build with this patch applied
> > > 2. Load nightly.mozilla.org
> > > 3. Wait for the page to render
> > > 
> > > The top-left corner of the page is rendered but the rest is blank. Scrolling
> > > around makes the rest come paint (at low resolution) and zooming fixes it,
> > > but basically the initially rendered area is wrong. The latest nightly build
> > > doesn't exhibit this behaviour. Loading other pages like
> > > http://chrislord.net/files/mozilla/fixed2.html exhibit even worse behaviour
> > > in that they remain in a bad/low-res state.
> > 
> > I can reproduce this, and I agree with you that the uses of
> > mCompositionBounds in AsyncCompositionManager::TransformScrollableLayer and
> > LayerManagerComposite::ComputeRenderIntegrity are potential culprits. 
> > 
> > Unfortunately, my initial attempt to fix this by multiplying the composition
> > bounds by the resolution (to arrive back at the pre-patch value of the
> > composition bounds) in those places does not fix the problem. Perhaps the
> > problem is not the removal the resolution, but the removal of clamping to
> > the widget bounds? I will need to investigate further.
> 
> So it seems like the changes to AsyncCompositionManager and
> LayerManagerComposite are necessary (that is, they restore the value of the
> composition bounds in that code to what it used to be before the patch), but
> they are not enough. The page nightly.mozilla.org is zoomed out by a factor
> of about 3.2 when I load it on my phone, so there is a resolution and the
> difference between the old and new values of the composition bounds are
> significant, I just need to find what other code relies on it.

The problem was that the coordinate conversions performed on the critical display port in ClientTiledThebesLayer::BeginPaint() were not updated properly, which was causing the layer's valid region to be clipped too small.

The attached patch fixes this and adjusts the uses of mCompositionBounds in AsyncCompositionManager and LayerManagerComposite (and there was one other place, in ContainerLayerComposite, that needed to be adjusted as well). With this patch, nightly.mozilla.org seems to behave fine on Fennec.

The Metro regression remains to be fixed.
Attachment #8345592 - Attachment is obsolete: true
Try push with updated patch: https://tbpl.mozilla.org/?tree=Try&rev=fb4574cc77df
Comment on attachment 8346236 [details] [diff] [review]
bug935219.patch

Review of attachment 8346236 [details] [diff] [review]:
-----------------------------------------------------------------

Do we have a gtest in there which will have failed before the fix, but passes now?
(In reply to Milan Sreckovic [:milan] from comment #49)
> Do we have a gtest in there which will have failed before the fix, but
> passes now?

I don't know if gtest is equipped to test this bug. The current APZC gtests just test APZC code in isolation, without Gecko itself running. The composition bounds calculation, on the other hand, is done in layout code (nsDisplayList.cpp) and depends on other Gecko components like the frame tree. I'm not sure how we can test that with gtest.

I would love to be able to write a test for this bug. If anyone has any suggestions about how, I would be interested to hear them.
(In reply to Botond Ballo [:botond] from comment #47)
> The Metro regressions remains to be fixed.

Update: I debugged the Metro regression and it seems to be caused by the fact that the transform and the resolution get set on different layers on Metro, something that my code doesn't expect. I'm not sure whether this is intentional and/or necessary. I filed bug 951936 for this.
Attached patch bug935219.patch (obsolete) — Splinter Review
Unbitrotted patch so people can play around with it if they'd like.
Attachment #8346236 - Attachment is obsolete: true
Depends on: 951936
Updated patch to include some of the earlier fixes which had slipped out. Also posted a second patch to work around a strange overloading issue that was causing a compiler error on older compilers (I will investigate this further and file a separate bug about it, but I don't want to hold up the progress of this bug).

Try push: https://tbpl.mozilla.org/?tree=Try&rev=f6f9e9b6cde4

I will wait with flagging the patches for final review until I have a clean try push, and a patch for bug 951936 that passes review.
Whoops, that workaround broke platforms to which it did not apply. Updated patch to fix this.

New try push: https://tbpl.mozilla.org/?tree=Try&rev=ca20bbf163c4
Attachment #8355298 - Attachment is obsolete: true
Rebased Part 1 patch to apply to latest trunk (notably, Randall's progressive tiling patches have landed and both those and this patch touch code in TiledContentClient).

New try push (also including patches for the prerequisite bug 951936): https://tbpl.mozilla.org/?tree=Try&rev=27c4af511cbc
Attachment #8355296 - Attachment is obsolete: true
(In reply to Botond Ballo [:botond] from comment #55)
> Updated patch to include some of the earlier fixes which had slipped out.
> Also posted a second patch to work around a strange overloading issue that
> was causing a compiler error on older compilers (I will investigate this
> further and file a separate bug about it, but I don't want to hold up the
> progress of this bug).

I investigated this issue further and filed bug 957835 for it, with a patch.

Try push with that patch instead of the workaround patch from this bug: https://tbpl.mozilla.org/?tree=Try&rev=96c4933c194c.
Comment on attachment 8355313 [details] [diff] [review]
Part 2 - Work around a strange C++ overloading issue

This workaround is obsolete now that bug 957835 has landed.
Attachment #8355313 - Attachment is obsolete: true
Rebased patch on top of patches from bug 959847, which is the new approach Timothy and I are taking to resolve the issue mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=935219#c51 (the old approach was bug 951936).

Tested patch on B2G, looks OK.
Attachment #8355702 - Attachment is obsolete: true
Depends on: 959847
No longer depends on: 951936
(In reply to Botond Ballo [:botond] from comment #60)
> Tested patch on B2G, looks OK.

Tested this with Fennec as well. I see some misbehaviour when zoomed out (only part of the screen is painted), but this might be a result of a known issue with bug 959847 where the valid region of the root scrollable layer is smaller than expected. Will retest once that issue is fixed.
(In reply to Botond Ballo [:botond] from comment #61)
> Tested this with Fennec as well. I see some misbehaviour when zoomed out
> (only part of the screen is painted), but this might be a result of a known
> issue with bug 959847 where the valid region of the root scrollable layer is
> smaller than expected. Will retest once that issue is fixed.

I see the same behaviour on Metro. This leads me to believe it is caused by the same valid-region issue, because that affects Fennec and Metro similarly.
I rebased this to see if it fixes bug 964935, but there were a couple of spots where I wasn't sure what to do, and it's not working correctly. I'm not going to obsolete the old one because Botond might want to just rebase from his old patch himself.
Nominating as 1.3? because it blocks bug 964835, which is 1.3+.
blocking-b2g: --- → 1.3?
(In reply to Botond Ballo [:botond] from comment #64)
> Nominating as 1.3? because it blocks bug 964835, which is 1.3+.

Sorry, bug 964935.
This looks potentially risky. Are there no other low risk solutions possible here?
(In reply to Jason Smith [:jsmith] from comment #66)
> This looks potentially risky. Are there no other low risk solutions possible
> here?

Botond and I both looked at the code and can't think of any low risk solutions to bug 964935. I agree that this one is pretty risky too but it's the best we have right now. Any chance we can drop the 1.3+ from bug 964935?
blocking-b2g: 1.3? → 1.4+
No longer blocks: 964935
Rebased patch to apply to latest trunk.
Attachment #8362676 - Attachment is obsolete: true
Attachment #8374358 - Attachment is obsolete: true
Flipping nom back - per a drivers' decision, QC needs to nominate individual actionable bugs to be FC & QC blockers, not block on meta bugs.
blocking-b2g: 1.4+ → 1.4?
Rebased patch again and tested on B2G, Fennec, and Metro. Seems to be working fine.

Flagging for review. Kats, if you feel someone else should review it as well, feel free to flag accordingly.
Attachment #8380108 - Attachment is obsolete: true
Attachment #8380774 - Flags: review?(bugmail.mozilla)
(In reply to Botond Ballo [:botond] from comment #70)
> Rebased patch again and tested on B2G, Fennec, and Metro. Seems to be
> working fine.

Note that this patch requires the patches in bug 959847 as well, including the ones I've just posted.

Try push for bug 959847 + bug 935219: https://tbpl.mozilla.org/?tree=Try&rev=88647476fa58.
Without this we don't hit the APZ performance target for 1.4.
blocking-b2g: 1.4? → 1.4+
Attached file coordinate-systems.pdf (obsolete) —
Attached is a diagram of the various coordinate systems in play here. I think it will be useful to look at while reviewing the bug.
(In reply to Botond Ballo [:botond] from comment #73)
> Created attachment 8380946 [details]
> coordinate-systems.pdf
> 
> Attached is a diagram of the various coordinate systems in play here. I
> think it will be useful to look at while reviewing the bug.

There is one thing the diagram glosses over: it claims that prior to this patch, FrameMetrics::mCompositionBounds is in a particular coordinate space which I label "silly pixels". That's not quite true, since prior to this patch we additionally clamp the composition bounds to the widget bounds in RecordFrameMetrics(), so they are in an even weirder space ("silly pixels clamped to the widget bounds").
Attached file coordinate-systems.pdf
(In reply to Botond Ballo [:botond] from comment #74)
> (In reply to Botond Ballo [:botond] from comment #73)
> > Created attachment 8380946 [details]
> > coordinate-systems.pdf
> > 
> > Attached is a diagram of the various coordinate systems in play here. I
> > think it will be useful to look at while reviewing the bug.
> 
> There is one thing the diagram glosses over: it claims that prior to this
> patch, FrameMetrics::mCompositionBounds is in a particular coordinate space
> which I label "silly pixels". That's not quite true, since prior to this
> patch we additionally clamp the composition bounds to the widget bounds in
> RecordFrameMetrics(), so they are in an even weirder space ("silly pixels
> clamped to the widget bounds").

Updated diagram to reflect this clamping.
Attachment #8380946 - Attachment is obsolete: true
Comment on attachment 8380774 [details] [diff] [review]
Fix composition bounds calculation and APZ hit testing (again)

Review of attachment 8380774 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/FrameMetrics.h
@@ +25,5 @@
> +//     answer), multiply by the device scale and the resolutions of all
> +//     layers from the root down to and including the parent.
> +//   - Start with global screen coordinates and unapply all CSS and async
> +//     transforms from the root down to and including the parent.
> +// It's helpful to look at https://staktrace.com/spout/entry.php?id=801

I would replace this link with a link to https://wiki.mozilla.org/Platform/GFX/APZ#Coordinate_systems

@@ +318,5 @@
> +  // This consists of the scale of the local CSS transform and the
> +  // nontransient async transform.
> +  // TODO: APZ does not currently work well if there is a CSS transform
> +  //       on the layer being scrolled that's not just a scale that's
> +  //       the same in both directions. When we fix this, mTranformScale

typo: s/mTranform/mTransform/

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ +100,5 @@
>      // Convert the display port to screen space first so that we can transform
>      // it into layout device space.
> +    const ParentLayerRect& criticalDisplayPort = metrics.mCriticalDisplayPort
> +                                               * metrics.mDevPixelsPerCSSPixel
> +                                               * metrics.GetParentResolution();

See my comment in APZCTreeManager.cpp - doesn't this also end up in the "unnamed" coordinate space rather than ParentLayer pixels?

::: gfx/layers/client/TiledContentClient.cpp
@@ +147,5 @@
>  
> +  aCompositionBounds = ParentLayerRect(compositorMetrics.mCompositionBounds);
> +  // TODO(botond): if something doesn't work, double-check this line
> +  //               (in particular, try using mDevPixelsPerCSSPixel * GetParentResolution()
> +  //               as the scale factor instead)

Do we have a way to check if this is working as intended?

::: gfx/layers/composite/APZCTreeManager.cpp
@@ +212,5 @@
>          CSSRect touchSensitiveRegion;
>          if (state->mController->GetTouchSensitiveRegion(&touchSensitiveRegion)) {
>            visible = visible.Intersect(touchSensitiveRegion
> +                                      * metrics.mDevPixelsPerCSSPixel
> +                                      * metrics.GetParentResolution());

According to your PDF, the argument to Intersect here is going to be in the "unnamed (_not_ M's layer pixels)" coordinate space, but you're intersecting it with a ParentLayerRect. That seems wrong?

@@ +864,3 @@
>    // It is OC.Inverse() * NC.Inverse() * MC.Inverse() * LC.Inverse() * LN.Inverse() at L,
>    //   and RC.Inverse() * QC.Inverse() * PC.Inverse() * PN.Inverse()                at P.
> +  gfxPoint hitTestPointForThisLayer = ancestorUntransform.ProjectPoint(aHitTestPoint);

Remove the last two comment lines above this ("It is OC.Inverse() * ... at P.").

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +2109,5 @@
>  }
> +
> +void AsyncPanZoomController::UpdateTransformScale()
> +{
> +  // The x and y scales should be the same.

Add a warning if the scales are not the same

::: layout/base/Units.h
@@ +9,5 @@
>  
>  #include "mozilla/gfx/Point.h"
>  #include "mozilla/gfx/Rect.h"
>  #include "mozilla/gfx/ScaleFactor.h"
> +#include "gfx3DMatrix.h"

While the TransformTo functions are nice, I'm not sure they're worth the cost of including gfx3DMatrix.h here, because this file is included from a lot of places. It would be better to split the ViewAs and TransformTo functions into a separate header which is only included in the .cpp files that actually use it.
Updated patch to address review comments. Also rebased on top of bug 958596 since that is about to land.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #76)
> Comment on attachment 8380774 [details] [diff] [review]
> Fix composition bounds calculation and APZ hit testing (again)
> 
> Review of attachment 8380774 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/FrameMetrics.h
> @@ +25,5 @@
> > +//     answer), multiply by the device scale and the resolutions of all
> > +//     layers from the root down to and including the parent.
> > +//   - Start with global screen coordinates and unapply all CSS and async
> > +//     transforms from the root down to and including the parent.
> > +// It's helpful to look at https://staktrace.com/spout/entry.php?id=801
> 
> I would replace this link with a link to
> https://wiki.mozilla.org/Platform/GFX/APZ#Coordinate_systems

Done.

> @@ +318,5 @@
> > +  // This consists of the scale of the local CSS transform and the
> > +  // nontransient async transform.
> > +  // TODO: APZ does not currently work well if there is a CSS transform
> > +  //       on the layer being scrolled that's not just a scale that's
> > +  //       the same in both directions. When we fix this, mTranformScale
> 
> typo: s/mTranform/mTransform/

Done.

> ::: gfx/layers/client/ClientTiledThebesLayer.cpp
> @@ +100,5 @@
> >      // Convert the display port to screen space first so that we can transform
> >      // it into layout device space.
> > +    const ParentLayerRect& criticalDisplayPort = metrics.mCriticalDisplayPort
> > +                                               * metrics.mDevPixelsPerCSSPixel
> > +                                               * metrics.GetParentResolution();
> 
> See my comment in APZCTreeManager.cpp - doesn't this also end up in the
> "unnamed" coordinate space rather than ParentLayer pixels?

I think several things are wrong with the coordinate transformations in this function, including some quantities being misleadingly typed. I plan to discuss this with Chris and fix it in a follow-up. However, I'd rather not have that hold up the progress of this bug. The current behaviour seems to be working well enough, and the intent of my changes in this patch is to preserve the current behaviour by adjusting the 'layoutToParentLayer' matrix and the scale factor used here by the same amount (metrics.mResolution).
 
> ::: gfx/layers/client/TiledContentClient.cpp
> @@ +147,5 @@
> >  
> > +  aCompositionBounds = ParentLayerRect(compositorMetrics.mCompositionBounds);
> > +  // TODO(botond): if something doesn't work, double-check this line
> > +  //               (in particular, try using mDevPixelsPerCSSPixel * GetParentResolution()
> > +  //               as the scale factor instead)
> 
> Do we have a way to check if this is working as intended?

That comment was left over from a while ago. I double-checked this code and verified that GetZoomToParent() is the correct scale factor, as we will be dividing the composition bounds by this factor and expecting a quantity in the CSS pixels of the current layer (if the scale factor were mDevPixelsPerCSSPixel * GetParentResolution(), the result would instead be in the CSS pixels of the parent layer).
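
To spell out the unit arithmetic (a sketch of the reasoning, not code from the patch):

  // mCompositionBounds is in ParentLayer pixels.
  //
  //   mCompositionBounds / GetZoomToParent()
  //     -> CSS pixels of the current layer L   (what this code wants)
  //
  //   mCompositionBounds / (mDevPixelsPerCSSPixel * GetParentResolution())
  //     -> CSS pixels of the parent layer M    (not what we want here)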

In terms of actually checking whether this particular code is working as intended, I don't have anything in mind. I'm open to ideas.

> ::: gfx/layers/composite/APZCTreeManager.cpp
> @@ +212,5 @@
> >          CSSRect touchSensitiveRegion;
> >          if (state->mController->GetTouchSensitiveRegion(&touchSensitiveRegion)) {
> >            visible = visible.Intersect(touchSensitiveRegion
> > +                                      * metrics.mDevPixelsPerCSSPixel
> > +                                      * metrics.GetParentResolution());
> 
> According to your PDF, the argument to Intersect here is going to be in the
> "unnamed (_not_ M's layer pixels)" coordinate space, but you're intersecting
> it with a ParentLayerRect. That seems wrong?

The code is correct as long as touchSensitiveRegion is in the CSS pixels of the parent layer ("M's CSS pixels" from the PDF).
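
For the record, the unit bookkeeping for that Intersect() call, assuming touchSensitiveRegion really is in M's CSS pixels:

  // touchSensitiveRegion                : CSS pixels of the parent layer M (assumed)
  //   * metrics.mDevPixelsPerCSSPixel   -> LayoutDevice pixels
  //   * metrics.GetParentResolution()   -> ParentLayer pixels
  // which is consistent with the ParentLayerRect 'visible' it gets intersected with.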

touchSensitiveRegion ultimately comes from layout code, where I find it hard to reason about coordinate systems, but since the only place I know of that currently uses a touch-sensitive region is the keyboard app, which cannot be zoomed, M's CSS pixels are the same as L's CSS pixels and it does not matter.

Since we are about to get rid of GetTouchSensitiveRegion() when we implement bug 918288, I think this is good enough for now.

> @@ +864,3 @@
> >    // It is OC.Inverse() * NC.Inverse() * MC.Inverse() * LC.Inverse() * LN.Inverse() at L,
> >    //   and RC.Inverse() * QC.Inverse() * PC.Inverse() * PN.Inverse()                at P.
> > +  gfxPoint hitTestPointForThisLayer = ancestorUntransform.ProjectPoint(aHitTestPoint);
> 
> Remove the last two comment lines above this ("It is OC.Inverse() * ... at
> P.").

Done.

> ::: gfx/layers/ipc/AsyncPanZoomController.cpp
> @@ +2109,5 @@
> >  }
> > +
> > +void AsyncPanZoomController::UpdateTransformScale()
> > +{
> > +  // The x and y scales should be the same.
> 
> Add a warning if the scales are not the same

Done.
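
For reference, the check is roughly of this shape (a sketch rather than the exact patch code; the warning text and the member being assigned are approximations, and note that a later revision of the patch changes the order of the two transforms in the multiplication):

  void AsyncPanZoomController::UpdateTransformScale()
  {
    gfx3DMatrix nontransientTransforms = GetCSSTransform() * GetNontransientAsyncTransform();
    if (nontransientTransforms.GetXScale() != nontransientTransforms.GetYScale()) {
      NS_WARNING("The x and y scales of the nontransient transforms should be equal");
    }
    // The x and y scales should be the same; arbitrarily use the x scale.
    mFrameMetrics.mTransformScale.scale = nontransientTransforms.GetXScale();
  }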

> ::: layout/base/Units.h
> @@ +9,5 @@
> >  
> >  #include "mozilla/gfx/Point.h"
> >  #include "mozilla/gfx/Rect.h"
> >  #include "mozilla/gfx/ScaleFactor.h"
> > +#include "gfx3DMatrix.h"
> 
> While the TransformTo functions are nice, I'm not sure they're worth the
> cost of including gfx3DMatrix.h here, because this file is included from a
> lot of places. It would be better to split the ViewAs and TransformTo
> functions into a separate header which is only included in the .cpp files
> that actually use it.

Good point. I introduced layout/base/UnitTransforms.h.
Attachment #8380774 - Attachment is obsolete: true
Attachment #8380774 - Flags: review?(bugmail.mozilla)
Attachment #8381668 - Flags: review?(bugmail.mozilla)
Sorry, that was the wrong patch (I had some local changes that weren't in there). Here is the correct patch.
Attachment #8381668 - Attachment is obsolete: true
Attachment #8381668 - Flags: review?(bugmail.mozilla)
Attachment #8381722 - Flags: review?(bugmail.mozilla)
Comment on attachment 8381722 [details] [diff] [review]
Fix composition bounds calculation and APZ hit testing (again)

Review of attachment 8381722 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments addressed

::: dom/ipc/TabChild.cpp
@@ +1580,5 @@
>  
>      FrameMetrics newMetrics = aFrameMetrics;
>      APZCCallbackHelper::UpdateRootFrame(utils, newMetrics);
>  
> +    CSSRect cssCompositedRect = CSSRect(aFrameMetrics.CalculateCompositedRectInCssPixels());

Any particular reason you switched from newMetrics to aFrameMetrics here? UpdateRootFrame can modify newMetrics so this technically could have functional consequences (although in practice I think it only modifies the scroll offset and/or displayport and should have no consequences)

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +2111,5 @@
>  }
> +
> +void AsyncPanZoomController::UpdateTransformScale()
> +{
> +  auto nontransientTransforms = GetCSSTransform() * GetNontransientAsyncTransform();

s/auto/gfx3DMatrix/

::: layout/base/Units.h
@@ +9,5 @@
>  
>  #include "mozilla/gfx/Point.h"
>  #include "mozilla/gfx/Rect.h"
>  #include "mozilla/gfx/ScaleFactor.h"
> +#include "gfx3DMatrix.h"

Move this include to UnitTransforms.h
Attachment #8381722 - Flags: review?(bugmail.mozilla) → review+
Updated patch to address review comments. Carrying r+.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #79)

> ::: dom/ipc/TabChild.cpp
> @@ +1580,5 @@
> >  
> >      FrameMetrics newMetrics = aFrameMetrics;
> >      APZCCallbackHelper::UpdateRootFrame(utils, newMetrics);
> >  
> > +    CSSRect cssCompositedRect = CSSRect(aFrameMetrics.CalculateCompositedRectInCssPixels());
> 
> Any particular reason you switched from newMetrics to aFrameMetrics here?
> UpdateRootFrame can modify newMetrics so this technically could have
> functional consequences (although in practice I think it only modifies the
> scroll offset and/or displayport and should have no consequences)

Nope. It got mixed up over the course of three months of rebases. Thanks for catching it!

> ::: gfx/layers/ipc/AsyncPanZoomController.cpp
> @@ +2111,5 @@
> >  }
> > +
> > +void AsyncPanZoomController::UpdateTransformScale()
> > +{
> > +  auto nontransientTransforms = GetCSSTransform() * GetNontransientAsyncTransform();
> 
> s/auto/gfx3DMatrix/

I changed this, but I can't resist commenting on this topic in general.

Here we had a function of the form

  something = some_expression.some_function();

and changed it to

  auto x = some_expression;
  ...
  something = x.some_function();

What do we gain in a case like this by writing out the type of 'x' explicitly?

If the answer is "the reader can explicitly see what the type of 'x' is", I don't buy that because that wasn't the case before the change, either. For this argument to be consistent, we should also require that the original form be written:

  Type x = some_expression;
  something = x.some_function();

but I don't see this being required in general (and that's just as well, for that would mean not having subexpressions in our programs at all).

On the other hand, I can tell you what I lose by writing out the type of 'x' explicitly: I have to manually compute the type of 'some_expression', a task that the compiler is able to do perfectly well for me. In the above example, I have to look up the return types of GetCSSTransform() and GetNontransientAsyncTransform() (and in general I might have to do more complicated things, like manually perform overload resolution).

It might also be worth noting that C++ experts recommend using auto "almost always": see point 4 of [1].

> ::: layout/base/Units.h
> @@ +9,5 @@
> >  
> >  #include "mozilla/gfx/Point.h"
> >  #include "mozilla/gfx/Rect.h"
> >  #include "mozilla/gfx/ScaleFactor.h"
> > +#include "gfx3DMatrix.h"
> 
> Move this include to UnitTransforms.h

Whoops, that was kind of the point of introducing UnitTransforms.h, wasn't it. Done.


[1] http://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/
Attachment #8381722 - Attachment is obsolete: true
Attachment #8382331 - Flags: review+
(In reply to Botond Ballo [:botond] from comment #80)
> What do we gain in a case like this by writing out the type of 'x'
> explicitly?
> 
> If the answer is "the reader can explicitly see what the type of 'x' is", I
> don't buy that because that wasn't the case before the change, either. For
> this argument to be consistent, we should also require that the original
> form be written:

I agree that in the old code the reader couldn't explicitly see the type of 'x' either, and that using 'auto' is roughly equivalent in readability to the old code. However, IMO using the type instead of auto makes the readability _better_ than the old code, and that's something that is desirable.

> On the other hand, I can tell you what I lose by writing out the type of 'x'
> explicitly: I have to manually compute the type of 'some_expression', a task
> that the compiler is able to do perfectly well for me.
> 
> In the above example, I
> have to look up the return types of GetCSSTransform() and
> GetNontransientAsyncTransform() (and in general I might have to do more
> complicated things like manually perform overload resolution).

In the above example, if you didn't know it was a gfx3DMatrix, yet managed to call GetXScale() and GetYScale() on it, you were probably relying on the IDE to find those functions. It's also possible to have an error in the code, accidentally multiply two things and get back some other random type which also has a GetXScale() and GetYScale() function, and continue merrily on without realizing it, because the IDE would autocomplete it just fine and everything would compile just fine. You've already dealt with problems caused by badly-scoped operator overloading in our codebase, and I wouldn't be at all surprised if in some circumstances (other platforms or because of future changes) that multiplication returned a Matrix4x4 or something else. (Granted Matrix4x4 doesn't have a GetXScale or GetYScale but it could in the future, and the code would silently start breaking).

> It might also be worth noting that C++ experts recommend using auto "almost
> always": see point 4 of [1].
>
> [1]
> http://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-
> auto/

We can discuss this in person if you like but from my reading of point 4 it doesn't seem like most of those points are actually advantages that apply to a statement like "auto nontransientTransforms = GetCSSTransform() * GetNontransientAsyncTransform();" and don't _also_ apply to the statement "gfx3DMatrix nontransientTransforms = GetCSSTransform() * GetNontransientAsyncTransform();". I can see only maybe one or two from the list favouring "auto", but I would have to weigh those against the readability advantage which I still claim exists here :)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #81)
> I wouldn't
> be at all surprised if in some circumstances (other platforms or because of
> future changes) that multiplication returned a Matrix4x4 or something else.
> (Granted Matrix4x4 doesn't have a GetXScale or GetYScale but it could in the
> future, and the code would silently start breaking).

If the expression's type changed from gfx3DMatrix to Matrix4x4 and as a result GetXScale/GetYScale started returning different values than before, I'd say the authors of gfx3DMatrix/Matrix4x4 were doing something untoward.

What's more likely, if the expression's type changes, is that GetXScale/GetYScale continue working just fine, and then I'd say "good thing we used 'auto' and didn't have to update this code manually".

But anyways, I don't want to belabour this point. We can chat more in person like you suggested :)
(In reply to Botond Ballo [:botond] from comment #71)
> Try push for bug 959847 + bug 935219:
> https://tbpl.mozilla.org/?tree=Try&rev=88647476fa58.

This try run shows some failures on b2g-emulator mochitests. I ran the test_formaction_attribute.html test on a local non-debug b2g-emulator setup (code from master, with no additional patches) and was able to reproduce the test failure (which manifests in the form of a timeout). At first I thought this would be a timing issue that got tickled by these patches but that doesn't appear to be the case as far as I can tell.

In my case, the reason the test hangs is that the focus event never gets dispatched, and so after the main body of the test completes, the event-driven continuation bits don't run. I tracked down why the focus event doesn't get dispatched after putting focus in the input field, and traced it to the line of code at [1]. At this point, mActiveWindow is null, which makes isElementInActiveWindow false, which makes sendFocusEvent false a few lines down, and that seems to go down a code path where we never send the focus event. I'm not sure why mActiveWindow is null here; we'd probably need to ask :ehsan about that.

However, Botond also pushed a second try run at [2] where the test is passing just fine, so maybe the test failed for some other reason in the first try run? Either way we'll probably want to make sure it is reliably affected by these patches first.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/base/nsFocusManager.cpp?rev=49993e01b351#1154
[2] https://tbpl.mozilla.org/?tree=Try&rev=e6b0a4d07706
New try push off a clean m-c build + tn's patches from bug 959847 + the patch from this bug:

https://tbpl.mozilla.org/?tree=Try&rev=713afa569807
It is definitely this patch that is causing the M2 test on B2G ICS Emulator opt to fail. Looking at the logcat from one of the failed runs for test_formaction_attribute.html, the symptoms look different than when I was reproducing the timeout locally. Specifically, in the try run log I see this:

17:58:05     INFO -  02-27 01:44:04.052   704   704 I GeckoDump: 969 INFO TEST-START | /tests/content/html/content/test/forms/test_formaction_attribute.html
17:58:05  WARNING -  02-27 01:44:07.182   704   704 E GeckoConsole: [JavaScript Error: "The Components object is deprecated. It will soon be removed." {file: "http://mochi.test:8888/tests/SimpleTest/EventUtils.js" line: 28}]
17:58:05     INFO -  02-27 01:44:07.493   704   704 E GeckoConsole: [JavaScript Warning: "A form was submitted in the windows-1252 encoding which cannot encode all Unicode characters, so user input may get corrupted. To avoid this problem, the page should be changed so that the form is submitted in the UTF-8 encoding either by changing the encoding of the page itself to UTF-8 or by specifying accept-charset=utf-8 on the form element." {file: "http://mochi.test:8888/tests/content/html/content/test/forms/test_formaction_attribute.html" line: 135}]
17:58:05     INFO -  02-27 01:44:07.583   704   704 E GeckoConsole: [JavaScript Warning: "A form was submitted in the windows-1252 encoding which cannot encode all Unicode characters, so user input may get corrupted. To avoid this problem, the page should be changed so that the form is submitted in the UTF-8 encoding either by changing the encoding of the page itself to UTF-8 or by specifying accept-charset=utf-8 on the form element." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 95}]
17:58:05     INFO -  02-27 01:44:07.653   704   704 E GeckoConsole: [JavaScript Warning: "A form was submitted in the windows-1252 encoding which cannot encode all Unicode characters, so user input may get corrupted. To avoid this problem, the page should be changed so that the form is submitted in the UTF-8 encoding either by changing the encoding of the page itself to UTF-8 or by specifying accept-charset=utf-8 on the form element." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 95}]
17:58:05     INFO -  02-27 01:44:07.733   704   704 E GeckoConsole: [JavaScript Warning: "A form was submitted in the windows-1252 encoding which cannot encode all Unicode characters, so user input may get corrupted. To avoid this problem, the page should be changed so that the form is submitted in the UTF-8 encoding either by changing the encoding of the page itself to UTF-8 or by specifying accept-charset=utf-8 on the form element." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 95}]
17:58:05     INFO -  02-27 01:44:07.863   704   704 E GeckoConsole: [JavaScript Warning: "A form was submitted in the windows-1252 encoding which cannot encode all Unicode characters, so user input may get corrupted. To avoid this problem, the page should be changed so that the form is submitted in the UTF-8 encoding either by changing the encoding of the page itself to UTF-8 or by specifying accept-charset=utf-8 on the form element." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 95}]
17:58:05     INFO -  02-27 01:49:26.403   704   704 I GeckoDump: 970 ERROR TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/forms/test_formaction_attribute.html | Test timed out.

This output seems to indicate at least some of the form submissions are working, whereas in my case none of them were.
I filed bug 977625 for the issue I described in comment 83. I am unable to reproduce any of the tbpl failures for either test_formaction_attribute.html or test_input_event.html locally. Looks like we'll have to debug it via tbpl. It would help to split this patch up into smaller standalone pieces so we can bisect it.
Updated patch to correct a mistake Kats pointed out over IRC, where AsyncPanZoomController::UpdateTransformScale() and AsyncPanZoomController::ToParentLayerCoords() were multiplying the non-transient async transform and the CSS transform in the wrong order. Carrying r+.

Regarding the mochitest failures, I am unable to reproduce them locally, so I have requested access to a Try machine to debug them there (bug 977700).
Attachment #8382331 - Attachment is obsolete: true
Attachment #8383215 - Flags: review+
Moving to 1.3+ per discussion in bug 964935 - we're taking this to 1.3 to resolve bug 964935, but without the dependency included here.
blocking-b2g: 1.4+ → 1.3+
Attached patch bug935219.patch (obsolete) — Splinter Review
Finally fixed the mochitest failures.

The problem was that the clamping of the composition bounds of the RCD-RSF (root content document's root scroll frame) to the widget bounds, which we did prior to this patch, did not just paper over the coordinate mismatch. It also corrected for the case where the size of the viewport frame is larger than the size of the widget: layout was giving us the former, but we really wanted the latter.

After some discussion with Kats and Timothy, we realized that what we want to do for the RCD-RSF is just use the widget bounds.

I updated the patch to do that, and it now has a clean try push: https://tbpl.mozilla.org/?tree=Try&rev=b2abaf9f1995.

Re-requesting review for this change. The only change compared to the previous version of the patch is an added hunk in RecordFrameMetrics that sets the composition bounds to the widget bounds for the RCD-RSF.
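
The added hunk is conceptually along these lines (a simplified sketch; the variable names and the condition are illustrative, and as discussed below the nearest-widget lookup was later revised for Metro):

  // In RecordFrameMetrics(), for the root scroll frame of the root content document:
  if (isRootContentDocRootScrollFrame) {   // illustrative flag, not the real condition
    if (nsIWidget* widget = aScrollFrame->GetNearestWidget()) {
      nsIntRect widgetBounds;
      widget->GetBounds(widgetBounds);
      metrics.mCompositionBounds = ParentLayerIntRect::FromUnknownRect(mozilla::gfx::IntRect(
          widgetBounds.x, widgetBounds.y, widgetBounds.width, widgetBounds.height));
    }
  }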
Attachment #8383215 - Attachment is obsolete: true
Attachment #8388240 - Flags: review?(bugmail.mozilla)
Attachment #8388240 - Flags: review?(bugmail.mozilla) → review+
Rebased and retested on B2G, Fennec, and Metro. 

I can't attach the rebased patch right now (Bugzilla attachments are down), nor can I land it (the trees are closed), but I will do both as soon as I can.
Rebased patch. Carrying r+.
Attachment #8388240 - Attachment is obsolete: true
Attachment #8388663 - Flags: review+
Timothy pointed out that using the bounds of the nearest widget is wrong on Metro, because the nearest widget on Metro includes the chrome. Instead, we should check whether the document has its own widget; if so, use that widget's bounds, and if not, use the bounds of the document's view.

Updated patch to do this. Flagging for Timothy to review the changed bits.
Attachment #8388663 - Attachment is obsolete: true
Attachment #8388754 - Flags: review?(tnikkel)
Comment on attachment 8388754 [details] [diff] [review]
Fix composition bounds calculation and APZ hit testing (again)

>+          bounds = view->GetBounds().ToNearestPixels(auPerDevPixel);
>+        }
>+        metrics.mCompositionBounds = ParentLayerIntRect::FromUnknownRect(mozilla::gfx::IntRect(
>+            bounds.x, bounds.y, bounds.width, bounds.height));
>+      }

If we get our bounds from the view shouldn't we be converting them like we do above? Instead of from unknownrect, including the parent resolution.
(In reply to Timothy Nikkel (:tn) from comment #93)
> Comment on attachment 8388754 [details] [diff] [review]
> Fix composition bounds calculation and APZ hit testing (again)
> 
> >+          bounds = view->GetBounds().ToNearestPixels(auPerDevPixel);
> >+        }
> >+        metrics.mCompositionBounds = ParentLayerIntRect::FromUnknownRect(mozilla::gfx::IntRect(
> >+            bounds.x, bounds.y, bounds.width, bounds.height));
> >+      }
> 
> If we get our bounds from the view shouldn't we be converting them like we
> do above? Instead of from unknownrect, including the parent resolution.

The RCD-RSF should have a parent resolution of 1. The only other conversion we do above is multiplying by auPerDevPixel, which we also do if we use the view bounds.
(In reply to Botond Ballo [:botond] from comment #94)
> The RCD-RSF should have a parent resolution of 1. The only other conversion
> we do above is multiplying by auPerDevPixel, which we also do if we use the
> view bounds.

Sure, but why not just do the conversion correctly instead of incorrectly marking it as an unknown rect when we know the rect and can properly convert it?
Good point. Updated.
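
Conceptually, the updated conversion for the view-bounds case does the same unit bookkeeping as the main path (a sketch, not the exact patch code):

  // view->GetBounds()          : app units
  //   / auPerDevPixel          : LayoutDevice pixels
  //   * GetParentResolution()  : ParentLayer pixels (the parent resolution is 1 for the
  //                              RCD-RSF, so this is numerically a no-op, but it keeps
  //                              the unit types honest instead of going through FromUnknownRect)
  //   -> metrics.mCompositionBounds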
Attachment #8388754 - Attachment is obsolete: true
Attachment #8388754 - Flags: review?(tnikkel)
Attachment #8388792 - Flags: review?(tnikkel)
Comment on attachment 8388792 [details] [diff] [review]
Fix composition bounds calculation and APZ hit testing (again)

+  // For the root scroll frame of the root content document, the above
+  // calculation will yield the size of the viewport frame as the composition
+  // bounds, which can be larger than the size of the widget containing the
+  // content being rendered. In such a case we want the widget size instead.
+  // On some platforms, like metro, there is no widget corresponding to the
+  // root content document, so we use the view size instead.

Can we make this comment be something like:

For the root scroll frame of the root content document, the above calculation will yield the size of the viewport frame as the composition bounds, which doesn't actually correspond to what is visible when nsIDOMWindowUtils::setCSSViewport has been called to modify the visible area of the prescontext that the viewport frame is reflowed into. In that case, if our document has a widget, then the widget's bounds will correspond to what is visible. If we don't have a widget, the root view's bounds correspond to what would be visible because they don't get modified by setCSSViewport.
Attachment #8388792 - Flags: review?(tnikkel) → review+
Attached patch Patch for uplifting to 1.3 (obsolete) — Splinter Review
Rebased patch to apply to 1.3. Compiled and did some brief testing on a Nexus 4; it seems to be working fine.
https://hg.mozilla.org/mozilla-central/rev/fb3193bbf199
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Please nominate the 1.3 for approval-mozilla-b2g28 when you get a chance. Also, is this something we might want on 29 as well for other APZC consumers? If so, please request aurora approval too :)
Flags: needinfo?(botond)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #101)
> Please nominate the 1.3 for approval-mozilla-b2g28 when you get a chance.

QA is doing exploratory testing of the 1.3 patch as per https://bugzilla.mozilla.org/show_bug.cgi?id=964935#c66. I'll nominate once I hear back from them.

> Also, is this something we might want on 29 as well for other APZC
> consumers? If so, please request aurora approval too :)

I don't think this issue is pressing enough for other APZC consumers. If others disagree I can be swayed.
Flags: needinfo?(botond)
Comment on attachment 8389343 [details] [diff] [review]
Patch for uplifting to 1.3

Rebased this against 1.3, posted it to bug 964935, and requested 1.3 uplift there.
Attachment #8389343 - Attachment is obsolete: true
We should also go through the bugs that depend on this one and close/dupe any that are now fixed.
Depends on: 984279
Fixed on v1.3 by way of bug 964935. Calling this wontfix for 29 since nobody else spoke up in response to comment 102 :)
Depends on: 1000423