Closed Bug 947337 Opened 11 years ago Closed 10 years ago

Touch Events coordinates returns strange values

Categories

(Core :: Panning and Zooming, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: vingtetun, Assigned: kats)

References

Details

Attachments

(8 files, 8 obsolete files)

14.97 KB, patch
Details | Diff | Splinter Review
138.44 KB, video/quicktime
Details
400.00 KB, text/plain
Details
4.98 KB, patch
botond
: review+
Details | Diff | Splinter Review
5.40 KB, patch
botond
: review+
Details | Diff | Splinter Review
9.15 KB, patch
botond
: review+
Details | Diff | Splinter Review
4.75 KB, patch
Details | Diff | Splinter Review
5.37 KB, patch
botond
: review+
Details | Diff | Splinter Review
I was hoping that bug 940706 fix that but it does not. In the contacts app there is a list of letters on the right side that are used as shortcuts. With APZ it does not work all the time. 

Looking at the touchmove events it seems event.touches[0].[pageY/clientY] returns strange coordinate sometimes like the copy-paste above that prevents the shortcut bar from working correctly.

I/GeckoDump( 7897): Shortcuts (touchmove): 226
I/GeckoDump( 7897): Shortcuts (touchmove): -12176
I/GeckoDump( 7897): Shortcuts (touchmove): 253
I/GeckoDump( 7897): Shortcuts (touchmove): -18478
I/GeckoDump( 7897): Shortcuts (touchmove): 271
I/GeckoDump( 7897): Shortcuts (touchmove): -24989
I/GeckoDump( 7897): Shortcuts (touchmove): -24989
I/GeckoDump( 7897): Shortcuts (touchmove): 288
I/GeckoDump( 7897): Shortcuts (touchmove): -26160
I/GeckoDump( 7897): Shortcuts (touchmove): 319
I/GeckoDump( 7897): Shortcuts (touchmove): -27722
I/GeckoDump( 7897): Shortcuts (touchmove): 331
I/GeckoDump( 7897): Shortcuts (touchmove): -32913
I/GeckoDump( 7897): Shortcuts (touchmove): 345
I/GeckoDump( 7897): Shortcuts (touchend): -36844
Steps to reproduce:
 - populate the device with some data if needed. |cd $GAIA; make reference-workload-medium| with the device plugged in.
 - Open the contacts app on the device
 - Wait a bit for contacts to be fully populate.
 - Try to use the shortcut bar on the right side

Expected result:
 - The view navigate to the appropriate list of contacts

Actual result:
 - Sometimes it work, most of the time it does not.
I wonder if this is the cause of the problem captured in the movie attachment to bug 942854
(In reply to Milan Sreckovic [:milan] from comment #2)
> I wonder if this is the cause of the problem captured in the movie
> attachment to bug 942854

I dumped a few events there and there is sometimes strange values with negative pageY. Or even sometimes the pageY values are bigger than the size of the screen. Not sure if this is the root cause of the bug there, but in all cases this stuff probably does not help.
Wonder if we're messing up the coordinate  systems somehow?
Assignee: nobody → bugmail.mozilla
blocking-b2g: --- → 1.3+
(In reply to Milan Sreckovic [:milan] from comment #4)
> Wonder if we're messing up the coordinate  systems somehow?

It seems like the coordinates are messed up by the call to http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#776 

Before calling that they are good, and after they are not anymore.
(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #6)
> Apparently the transformation matrix passed to
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> APZCTreeManager.cpp#433 contains messed up values for |y|.

Obviously without the transformation matrix a few things are broken but I also can't reproduce bug 942752 and bug 943882. Without it, calendar is also not jumping all over the place as seen in the video from bug 942854, but the rendering issue is still here.
Do you still see this problem?  Because the jumpy behavior in 942854 seems to be gone for me.
(In reply to Milan Sreckovic [:milan] from comment #10)
> Do you still see this problem?  Because the jumpy behavior in 942854 seems
> to be gone for me.

I rebuilt today and I still see the strange coordinates I mentioned.
(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #12)
> Seems like
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> APZCTreeManager.cpp#986 is the matrix that contains the weird values.

I wonder if the 'weird' values are not because of APZCTreeManager trying to get the scroll offset coordinate of the contacts list while panning the contacts shortcuts.
(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #13)
> (In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until
> 09/12/13) from comment #12)
> > Seems like
> > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> > APZCTreeManager.cpp#986 is the matrix that contains the weird values.
> 
> I wonder if the 'weird' values are not because of APZCTreeManager trying to
> get the scroll offset coordinate of the contacts list while panning the
> contacts shortcuts.

Seems like it happens when the the current scroll offset is out of sync with the last paint scroll offset.
Attached patch bug947337 (obsolete) — Splinter Review
This is not clear to me why we need to care about paint offsets while dispatching events that are normally tied to the window.
Attachment #8345231 - Flags: review?(bugmail.mozilla)
Attached patch bug909877.patch (obsolete) — Splinter Review
Is it better ?
Attachment #8345231 - Attachment is obsolete: true
Attachment #8345231 - Flags: review?(bugmail.mozilla)
Attachment #8345234 - Flags: review?(arthur.chen)
Attached patch bug947337 (obsolete) — Splinter Review
Oups. I attached a patch for an other bug, sorry for the noise.
Attachment #8345234 - Attachment is obsolete: true
Attachment #8345234 - Flags: review?(arthur.chen)
Attachment #8345236 - Flags: review?(bugmail.mozilla)
(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #1)
> Steps to reproduce:
>  - populate the device with some data if needed. |cd $GAIA; make
> reference-workload-medium| with the device plugged in.
>  - Open the contacts app on the device
>  - Wait a bit for contacts to be fully populate.
>  - Try to use the shortcut bar on the right side
> 
> Expected result:
>  - The view navigate to the appropriate list of contacts
> 
> Actual result:
>  - Sometimes it work, most of the time it does not.

Using these STR most of the time I get the expected result. Occasionally after scrolling around it does get into a state where clicking on the letters on the right doesn't work, but it doesn't happen that often for me.

(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #3)
> I dumped a few events there and there is sometimes strange values with
> negative pageY. Or even sometimes the pageY values are bigger than the size
> of the screen. Not sure if this is the root cause of the bug there, but in
> all cases this stuff probably does not help.

Where are you dumping the pageY that you see this value?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #18)
> (In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until
> 09/12/13) from comment #1)
> > Steps to reproduce:
> >  - populate the device with some data if needed. |cd $GAIA; make
> > reference-workload-medium| with the device plugged in.
> >  - Open the contacts app on the device
> >  - Wait a bit for contacts to be fully populate.
> >  - Try to use the shortcut bar on the right side
> > 
> > Expected result:
> >  - The view navigate to the appropriate list of contacts
> > 
> > Actual result:
> >  - Sometimes it work, most of the time it does not.
> 
> Using these STR most of the time I get the expected result. Occasionally
> after scrolling around it does get into a state where clicking on the
> letters on the right doesn't work, but it doesn't happen that often for me.
> 
> (In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until
> 09/12/13) from comment #3)
> > I dumped a few events there and there is sometimes strange values with
> > negative pageY. Or even sometimes the pageY values are bigger than the size
> > of the screen. Not sure if this is the root cause of the bug there, but in
> > all cases this stuff probably does not help.
> 
> Where are you dumping the pageY that you see this value?

https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/contacts_shortcuts.js#L128
Ok, so I can force the bad pageY values to happen more reliably by applying the attached patch and dragging around in the contacts alphabet shortcuts. The reason they happen is that we receive input events in the parent process between the time that the child calls UpdateScrollOffset and the time that the child finishes painting and sends us a layer transaction. During this time we have a giant async scroll offset ("giant" depending on how far the scroll offset jumped) which is technically correct to get the right visual behaviour. For input transformation though it is not correct. I haven't tested Vivien's patch but it looks like it would fix this, but also break other scenarios.

I will think about what the right fix here is.

Also just to note, I don't think the bad pageY values here would affect _tapping_ on the shortcut bar, since in that case there are no touchmove events to speak of. It will only affect _dragging_ because that is when we have a stream of input events getting interleaved with the scroll offset jumping.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20)
> Created attachment 8345373 [details] [diff] [review]
> Patch to make bad pageY values happen more reliably
> 
> Ok, so I can force the bad pageY values to happen more reliably by applying
> the attached patch and dragging around in the contacts alphabet shortcuts.
> The reason they happen is that we receive input events in the parent process
> between the time that the child calls UpdateScrollOffset and the time that
> the child finishes painting and sends us a layer transaction. During this
> time we have a giant async scroll offset ("giant" depending on how far the
> scroll offset jumped) which is technically correct to get the right visual
> behaviour. For input transformation though it is not correct. I haven't
> tested Vivien's patch but it looks like it would fix this, but also break
> other scenarios.
> 

Just curious, what could it break?

> I will think about what the right fix here is.
> 
> Also just to note, I don't think the bad pageY values here would affect
> _tapping_ on the shortcut bar, since in that case there are no touchmove
> events to speak of. It will only affect _dragging_ because that is when we
> have a stream of input events getting interleaved with the scroll offset
> jumping.

I agree that tapping will likely not be impacted.
Comment on attachment 8345236 [details] [diff] [review]
bug947337

Review of attachment 8345236 [details] [diff] [review]:
-----------------------------------------------------------------

Minusing because I believe this will regress other scenarios. I discussed this with Botond and Markus and have another solution, coming up.
Attachment #8345236 - Flags: review?(bugmail.mozilla) → review-
Vivien, can you check if this fixes the pageY problem for you? I'm not yet convinced the pageY problem is the root cause of all the contacts shortcut scrolling problems so those may still exist.

Eventually we can move this new flag into the FrameMetrics and then we don't need to keep it in APZC at all.
Attachment #8345429 - Flags: review?(botond)
Attachment #8345429 - Flags: review?(21)
Comment on attachment 8345429 [details] [diff] [review]
Only accept new scroll offsets in layers updates

Review of attachment 8345429 [details] [diff] [review]:
-----------------------------------------------------------------

Nice and simple, I like it!

If this will be the permanent solution, we should consider renaming UpdateScrollOffset to something else, as it no longer updates the scroll offset. Maybe ScrollOffsetChanged?

::: gfx/layers/ipc/AsyncPanZoomController.h
@@ +586,5 @@
>    // function can be used, or the monitor can be held and then |mState| updated.
>    ReentrantMonitor mMonitor;
>  
>  private:
> +  // Flag to capture whether or not UpdateScrollOffset was called; if it was

Let's say "Flag to capture whether or not UpdateScrollOffset was called since the last layers update".
Attachment #8345429 - Flags: review?(botond) → review+
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #21)
> Just curious, what could it break?

I believe it would affect input events while a legitimate async scroll transform is in place. For example, say you do a pan, and then before gecko has a chance to repaint, you click on something. The touch events from the click will have coordinates as if the pan never happened, and so will get targetted to the wrong element. Note that this will only happen *sometimes* because in general the RequestContentRepaint events from the APZC panning will be processed before the touch events from the click, but there are cases where there is still an async transform that hasn't been processed yet.
Comment on attachment 8345429 [details] [diff] [review]
Only accept new scroll offsets in layers updates

Review of attachment 8345429 [details] [diff] [review]:
-----------------------------------------------------------------

Seems to work for me \o/
Attachment #8345429 - Flags: review?(21) → review+
Comment on attachment 8345429 [details] [diff] [review]
Only accept new scroll offsets in layers updates

Sorry I tried with my patch on top of it. I still see the wrong pageY with it.
Attachment #8345429 - Flags: review+ → feedback-
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #27)
> Comment on attachment 8345429 [details] [diff] [review]
> Only accept new scroll offsets in layers updates
> 
> Sorry I tried with my patch on top of it. I still see the wrong pageY with
> it.

I also see bug 942752 with it.
I saw a wrong pageY value come out once with my patch after hammering on it for a while. Based on the log I suspect it is possible for the layers update with the new scroll offset to arrive *before* the call to UpdateScrollOffset, and that is when this occurs. I'm not entirely sure what code path triggers that but I will continue looking.
(In reply to Botond Ballo [:botond] from comment #24)
> If this will be the permanent solution, we should consider renaming
> UpdateScrollOffset to something else, as it no longer updates the scroll
> offset. Maybe ScrollOffsetChanged?

Not to mention removing the now-unused parameter of UpdateScrollOffset :).
Markus, do you know how hard it would be to add a flag on the FrameMetrics like we were discussing on IRC? I don't know if layout stores the kind of information that we need ("this layout was triggered by a content scrollTo") or if it's even possible. If it turns out to be simple to do though I think that would solve the problem I'm seeing now regardless of the cause.
Flags: needinfo?(mstange)
Layout doesn't record the information yet, but something like this should do it. I haven't tested it yet, though.
This lets us specify an origin when calling ScrollToCSSPixelsApproximate in APZCCallbackHelper and stores it in an mOriginOfLastScroll field on the scroll frames. APZCCallbackHelper calls this with aOrigin == nsGkAtoms::apzc, and RecordFrameMetrics sets metrics.mWasScrolledByContent = (origin != nsGkAtoms::apzc).
I got rid of nsIDOMWindowUtils::ScrollToCSSPixelsApproximate so that we don't need to pipe aOrigin through it, but I don't know whether nsLayoutUtils::FindScrollableFrameFor finds the right frame when scrolling the root.

Kats, does this work for you?
Flags: needinfo?(mstange)
Thanks, Markus, that works beautifully! The attached WIP patch applies on top of it and uses the flag in APZC.

Vivien, can you try applying attachment 8345933 [details] [diff] [review] and this patch and see if that fixes the problem? It fixes it for me but I can only intermittently reproduce the problem to begin with so I can't confirm that it fixes it. If you find it fixes the problem I'll clean up the patches and get them ready for landing.
Attachment #8345236 - Attachment is obsolete: true
Attachment #8345429 - Attachment is obsolete: true
Flags: needinfo?(21)
I'm rebuilding. I will keep you up as soon my build is finished.
Flags: needinfo?(21)
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #34)
> I'm rebuilding. I will keep you up as soon my build is finished.

Hammering a bit on the contacts app I still see that (but this sounds harder than usual).

I/GeckoDump( 3152): Event: -43030
I/GeckoDump( 3152): Event: 296
I/GeckoDump( 3152): Event: 296
I/GeckoDump( 3152): Event: 43511
I/GeckoDump( 3152): Event: 287
I/GeckoDump( 3152): Event: -13821
I/GeckoDump( 3152): Event: 248
I/GeckoDump( 3152): Event: 14
I/GeckoDump( 3152): Event: 14
I/GeckoDump( 3152): Event: 124
I/GeckoDump( 3152): Event: 141
I/GeckoDump( 3152): Event: -32916
I/GeckoDump( 3152): Event: 176
I/GeckoDump( 3152): Event: 194
I/GeckoDump( 3152): Event: 4228
I/GeckoDump( 3152): Event: 183
I/GeckoDump( 3152): Event: 183
I/GeckoDump( 3152): Event: 172
I/GeckoDump( 3152): Event: 2341
I/GeckoDump( 3152): Event: 2341
I/GeckoDump( 3152): Event: 239
I/GeckoDump( 3152): Event: 234
I/GeckoDump( 3152): Event: 11022
I/GeckoDump( 3152): Event: 11022
I/GeckoDump( 3152): Event: 11022

I also can reproduce fairly easily the issue from bug 942752.

Sorry :/
*sigh*

I'll spin up a build for my hamachi to see if i can repro there more easily. if that doesn't work I'll try helix.

In the meantime I would still like to get this patch to kill UpdateScrollOffset in, so I'll spin off another bug for that.
Huzzah! I can repro on hamachi. Will debug.
I put cleaned up patches on bug 949132. With those patches I can't reproduce the bad pageY values while scrolling around in the contacts shortcuts. I do see something that looks like bug 942752 in that I see tap highlight show up and persist on random contacts as I'm panning around. I'm not yet convinced that is caused by bad pageY values.
Depends on: 949132
No longer depends on: 949132
See Also: → 949132
Comment on attachment 8345933 [details] [diff] [review]
Add mWasScrolledByContent flag to frame metrics and set it to false when scrolling via APZCCallbackHelper

Obsoleted by bug 949132
Attachment #8345933 - Attachment is obsolete: true
Comment on attachment 8346027 [details] [diff] [review]
WIP - use mWasScrolledByContent flag

Ditto
Attachment #8346027 - Attachment is obsolete: true
Vivien, can you see this problem on hamachi or some other phone?
Flags: needinfo?(21)
(In reply to Milan Sreckovic [:milan] from comment #41)
> Vivien, can you see this problem on hamachi or some other phone?

I will try in a few hours.
Flags: needinfo?(21)
Vivien, when you get a chance, please reproduce the problem with this patch applied and attach the full logcat. Hopefully it will provide a clue as to why we're still getting bad transforms.
Attachment #8348795 - Flags: feedback?(21)
Attachment #8345373 - Attachment is obsolete: true
Flags: needinfo?(nhirata.bugzilla)
Vivien, I'm still unable to reproduce this problem on recent code. Can you apply the patch from comment 43 and repro?
Flags: needinfo?(21)
I see the negative coordinate values without the patches from bug 949132, but not with them. This is consistent with our belief that those patches should fix the problem. Duping; Vivien, feel free to undupe if you still see the negative values with those patches.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
We can't just dupe, because this bug is 1.3+, and bug 949132 has been -'d. Either we keep them separate and have this bug land the solution to 1.3, or we 1.3+ bug 949132, until then I'll reopen.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
After IRC conversation, will make bug 949132 1.3+ and dupe this one back to it.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → DUPLICATE
Attached video weird-highlight.MOV
Here's a video of a button outside the scrolling div being highlighted while scrolling. On a build from yesterday (so after bug 949132's landing).
Status: RESOLVED → REOPENED
Flags: needinfo?(21)
Resolution: DUPLICATE → ---
Your thumb was pretty close to the button - to me it looks like the event fluffing code sent the touchdown to the button and so it got highlighted. If you think there's still an issue here please file a new bug with details. I'd rather not reopen this one because (a) I'm not convinced that this is caused by bad touch event coordinates from the APZC and (b) I'd rather start investigating this with a clean slate and not have people worry about the previous 50 comments on this bug.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → DUPLICATE
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #49)
> Your thumb was pretty close to the button - to me it looks like the event
> fluffing code sent the touchdown to the button and so it got highlighted.

I reproduce this issue so many times in the past and so it is clear to me that it is not fluffling related. Also fluffing will likely not highlight this button while the finger is still on the list and the list is panning since it would have mean that the list was not the start of the gesture and then I assumme that panning will not have happened.

> If
> you think there's still an issue here please file a new bug with details.
> I'd rather not reopen this one because (a) I'm not convinced that this is
> caused by bad touch event coordinates from the APZC and (b) I'd rather start
> investigating this with a clean slate and not have people worry about the
> previous 50 comments on this bug.
> 

As I stated in this bug, the proposed patch makes the issue goes away for me. So I'm pretty convinced that it is the same root cause. So it unclear to me why we should start from a new bug.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #50)
> I reproduce this issue so many times in the past and so it is clear to me
> that it is not fluffling related. Also fluffing will likely not highlight
> this button while the finger is still on the list and the list is panning
> since it would have mean that the list was not the start of the gesture and
> then I assumme that panning will not have happened.

If the finger starts on the list, panning will happen on the list. But if the touchdown is dispatched to the button because of the fluffing code then I would expect the button to be highlighted. But you could be right, I don't know exactly what happens there.

> As I stated in this bug, the proposed patch makes the issue goes away for
> me. So I'm pretty convinced that it is the same root cause. So it unclear to
> me why we should start from a new bug.

I don't believe your proposed patch is correct - there are many ways to fix any given bug and just because it fixes the issue you're seeing doesn't mean it won't introduce other regressions. We've spent a lot of time figuring out what the right transforms are supposed to be in that code and I'm pretty convinced that they're correct.

Seeing as I can't reproduce this issue and you haven't provided the information I asked for in in comment 43 I can't really do anything here. Unassigning from myself.
Assignee: bugmail.mozilla → nobody
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #51)
> 
> I don't believe your proposed patch is correct - there are many ways to fix
> any given bug and just because it fixes the issue you're seeing doesn't mean
> it won't introduce other regressions. We've spent a lot of time figuring out
> what the right transforms are supposed to be in that code and I'm pretty
> convinced that they're correct.

I'm not arguing at all that my patch is correct. I'm just saying that the issue is still present and by having run the proposed patch during the break I have not encounter any issues - which does not prove that there is no issues though.

> 
> Seeing as I can't reproduce this issue and you haven't provided the
> information I asked for in in comment 43 I can't really do anything here.
> Unassigning from myself.

I was trying to rebuild yesterday to give you the infos but I encounter a build issue in the JS runtime :/
Attached file log.apzc.txt
Here is some logging.

I have also added a currentY: XXX value in the contacts app so you can see when strange values are returned.
Attachment #8361153 - Flags: feedback?(bugmail.mozilla)
No longer blocks: gaia-apzc
Comment on attachment 8361153 [details]
log.apzc.txt

Thanks for the log. I think I understand what is going on now. The main problem is that when we calculate the transformToGecko in GetInputTransforms, we use the GetCurrentAsyncTransform which computes the async transform relative to mLastContentPaintMetrics. In fact what we want to use there is the async transform relative to mLastPaintRequestMetrics (or more precisely, the most recent metrics we requested a repaint before the transformed touch event gets back to TabChild - this may not be mLastPaintRequestMetrics because of the paint throttler). I'll see if I can figure out a good fix for this.
Attachment #8361153 - Flags: feedback?(bugmail.mozilla)
PM retriaged this and we believe it is a blocker.
The only functional change here is that ZoomToRect's call to repaint endZoomToMetrics is now subject to the deduplication code (which should be fine, I think).
This part is needed because we might have called RequestContentRepaint, but the corresponding paint might not yet have been inserted into the message queue to gecko because of the paint throttler. If this were not the case then we would just be able to use mLastPaintRequestMetrics instead.
Attached patch Part 3/3 - The missing transform (obsolete) — Splinter Review
Yay complicated transforms!

I haven't tested this code yet at all. I want to first figure out how to reliably reproduce the problem Vivien was seeing, which should be possibly by inserting a few sleep calls in the right places. My first attempt at doing that failed and I still couldn't reproduce it so I will try some more.
I had that transform backwards. Fixed now. I was able to force the condition to repro and verify that with these patches it seems to behave better.

Vivien, can you try out these patches and see if it fixes the problem for you? I wrote these patches against 1.3 code but there are versions rebased to master at https://github.com/staktrace/gecko-dev/commits/geckotransform (top 3 commits).
Flags: needinfo?(21)
Attachment #8362549 - Attachment is obsolete: true
Assignee: nobody → bugmail.mozilla
Status: REOPENED → ASSIGNED
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #59)
> Created attachment 8362693 [details] [diff] [review]
> Part 3/3 - The missing transform
> 
> I had that transform backwards. Fixed now. I was able to force the condition
> to repro and verify that with these patches it seems to behave better.
> 
> Vivien, can you try out these patches and see if it fixes the problem for
> you? I wrote these patches against 1.3 code but there are versions rebased
> to master at https://github.com/staktrace/gecko-dev/commits/geckotransform
> (top 3 commits).

I applied the patches on top of m-c, applied the usual dump for the contacts app but filtered so it only returns values < 0 and > window.innerHeight.
 
Both good and mitigate results:
 - It seems that the negative values or the values way too big are gone!
 - I still see sometimes some values between window.innerHeight < n < the screen height. I think that pageY is supposed to be relative to the document based on https://developer.mozilla.org/en-US/docs/Web/API/event.pageY
 - The highlight issue from https://bugzilla.mozilla.org/show_bug.cgi?id=947337#c48 is still here.

So it seems like the completely insane values are gone, which is a good stuff. I won't mind if you want to close this bug once the patches has landed and continue to investigate the 2 others issues into different bugs.
Flags: needinfo?(21)
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #60)
>  - It seems that the negative values or the values way too big are gone!

Yay!

>  - I still see sometimes some values between window.innerHeight < n < the
> screen height. I think that pageY is supposed to be relative to the document
> based on https://developer.mozilla.org/en-US/docs/Web/API/event.pageY

Hm. pageY is relative to the document, yes. Which means any value 0 <= n < documentheight should be valid. This is what I see on desktop if I dump the pageY of mousedown events, for example. So I think pageY values greater than window.innerHeight are fine. I'm not entirely sure if the shortcut bar is in a separate document or is position:fixed or something so I'm not sure if the values coming out are right or not. But as long as they are in the same range as the non-APZ values I'm happy.

>  - The highlight issue from
> https://bugzilla.mozilla.org/show_bug.cgi?id=947337#c48 is still here.

Yeah I strongly suspect this is because the hardware is generating some wacky events (in bug 957188 comment 11 I reported seeing large variations in the events that the hardware sends us) and combining with the event fluffing code. But yeah, a follow-up bug for this would be good.

> So it seems like the completely insane values are gone, which is a good
> stuff. I won't mind if you want to close this bug once the patches has
> landed and continue to investigate the 2 others issues into different bugs.

Sounds good.
Attachment #8362545 - Flags: review?(botond)
Attachment #8362547 - Flags: review?(botond)
Attachment #8362693 - Flags: review?(botond)
This is just a patch I had locally that makes it more likely to hit the bad scenario, and to detect when we do hit it. Putting it here in case it comes in handy in the future.
Attachment #8362545 - Flags: review?(botond) → review+
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #61)
> >  - I still see sometimes some values between window.innerHeight < n < the
> > screen height. I think that pageY is supposed to be relative to the document
> > based on https://developer.mozilla.org/en-US/docs/Web/API/event.pageY
> 
> Hm. pageY is relative to the document, yes. Which means any value 0 <= n <
> documentheight should be valid. This is what I see on desktop if I dump the
> pageY of mousedown events, for example. So I think pageY values greater than
> window.innerHeight are fine. I'm not entirely sure if the shortcut bar is in
> a separate document or is position:fixed or something so I'm not sure if the
> values coming out are right or not. But as long as they are in the same
> range as the non-APZ values I'm happy.
>

The shortcut bar is a fixed/absolute positioning fwiw.
Attachment #8362547 - Flags: review?(botond) → review+
Comment on attachment 8362693 [details] [diff] [review]
Part 3/3 - The missing transform

Review of attachment 8362693 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/APZCTreeManager.cpp
@@ +866,4 @@
>          MT.Inverse()
>          LC.Inverse()
>          LN.Inverse()
>     This combined transformation is returned in the aTransformToApzcOut out-parameter.

nit: A bit further up in this comment (the bugzilla review system doesn't let me see it here), the statement "Layer L is the layer that corresponds to the returned APZC instance" dates back from the time that this comment was above GetTargetAPZC, which did in fact return an APZC instance. Let's update this to "Layer L is the layer that corresponds to |aApzc|".

@@ +871,5 @@
>     Next, if we want user inputs sent to gecko for event-dispatching, we will need to strip
>     out all of the async transforms that are involved in this chain. This is because async
>     transforms are stored only in the compositor and gecko does not account for them when
> +   doing display-list-based hit-testing for event dispatching.
> +   Furthermore, because this these input events are processed by Gecko in a FIFO queue that

s/this these/these

@@ +918,5 @@
>     Additionally, for space-saving purposes, each APZC instance stores its layer's individual
>     CSS transform and the accumulation of CSS transforms to its parent APZC. So the APZC for
>     layer L would store LC and (MC * NC * OC), and the layer P would store PC and (QC * RC).
>     The APZCs also obviously have LT, LN, PT, and PN, so all of the above transformation combinations
>     required can be generated.

It might be worthwhile to add a sentence saying how an APZC for layer X knows or calculates XD.

@@ +943,5 @@
>    gfx3DMatrix transientAsyncUntransform = nontransientAsyncTransform * asyncUntransform;
>  
>    // aTransformToApzcOut is initialized to OC.Inverse() * NC.Inverse() * MC.Inverse() * LC.Inverse() * LN.Inverse()
>    aTransformToApzcOut = ancestorUntransform * aApzc->GetCSSTransform().Inverse() * nontransientAsyncTransform.Inverse();
> +  // aTransformToGeckoOut is initialized to LT.Inverse() * LD * LC * MC * NC * OC

Let's add a comment pointing out that we haven't forgotten about MD, ND, and OD, they're just zero.
Attachment #8362693 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #64)
> Let's add a comment pointing out that we haven't forgotten about MD, ND, and
> OD, they're just zero.

There was already a comment to that effect at the end of the method but it was slightly out-of-date so I updated it to be more accurate. Addressed all the other comments as well and landed on inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/5becc07d9dac
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3095f92e1c3
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ade48b8b9d7
Kats,

Please provide next steps.
Flags: needinfo?(bugmail.mozilla)
Fix tests.
Reland.
Flags: needinfo?(bugmail.mozilla)
The seemingly-random destroy calls in other tests were needed as well, or the mock object got leaked. I believe this is a result of my rearranging how RequestContentRepaint worked and more specifically the change in when PostDelayedTask gets called.

The HitTesting2 test was failing because the way we set it up, it would have a nonempty LD transform which would cancel out the LA transform. So I modified the existing assertions to deal with those, and then extended the test to simulate having nonempty (but different) LD and LA transforms.
Attachment #8363426 - Flags: review?(botond)
Comment on attachment 8363426 [details] [diff] [review]
Part 4/3 - Update tests

Review of attachment 8363426 [details] [diff] [review]:
-----------------------------------------------------------------

It's fortunate that we could arrange for one paint request to be "dispatched" and another not without bringing additional machinery into the test :)

::: gfx/tests/gtest/TestAsyncPanZoomController.cpp
@@ +921,5 @@
> +  hit = GetTargetAPZC(manager, ScreenPoint(75, 75), transformToApzc, transformToGecko);
> +  EXPECT_EQ(apzcroot, hit.get());
> +  // transformToApzc doesn't unapply the root's own async transform
> +  EXPECT_EQ(gfxPoint(75, 75), transformToApzc.Transform(gfxPoint(75, 75)));
> +  // but transformToGecko does

Let's be a little more specific here: transformToGecko unapplies the full async transform of -100, and then reapplies the "D" transform of -50, leading to an overall adjustment of +50.

@@ +929,5 @@
> +  hit = GetTargetAPZC(manager, ScreenPoint(25, 25), transformToApzc, transformToGecko);
> +  EXPECT_EQ(apzcroot, hit.get());
> +  // transformToApzc doesn't unapply the root's own async transform
> +  EXPECT_EQ(gfxPoint(25, 25), transformToApzc.Transform(gfxPoint(25, 25)));
> +  // but transformToGecko does

Likewise here.
Attachment #8363426 - Flags: review?(botond) → review+
Needs a branch-specific patch for uplift.
Flags: needinfo?(bugmail.mozilla)
Bad rebase on my part; I added an extra apzc->Destroy() at the end of ApzcPan which was causing the problem. Verified locally that taking that out fixes it, but the trees are closed now so I'll reland in the morning.
Flags: needinfo?(nhirata.bugzilla)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: