IPDL::PBrowser::RecvRealTouchEvent causes 50ms sync reflow

RESOLVED FIXED in Firefox 28, Firefox OS v1.3

Status

Firefox OS
General
P1
normal
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: BenWa, Assigned: nrc)

Tracking

({perf})

unspecified
1.3 C3/1.4 S3(31jan)
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:1.3+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Details

(Whiteboard: [c=handeye p= s= u=1.3])

Attachments

(8 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
Profile:
http://people.mozilla.org/~bgirard/cleopatra/#report=70f79696ad4b58afe9d1ed57db3ac1b93f1deb08&search=layout%253A%253ADoReflow

Receiving a touch event can cause a sync reflow taking 50ms on the contact app. this means. We need to avoid reflowing to handle a touch event if possible.

Comment 1

5 years ago
Nice find. ++BenWa

Updated

5 years ago
Keywords: perf
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
Whiteboard: [c= p= s= u=]
This is blocking a blocker (bug 912134) - Mike, can you help find an owner?
blocking-b2g: --- → koi+
Flags: needinfo?(mlee)

Comment 3

5 years ago
Jed,

This is a 1.2 blocker I need you to make your #1 priority after landing your patches for bug 810526.
Assignee: nobody → jld
Status: NEW → ASSIGNED
Flags: needinfo?(mlee)
Priority: -- → P1
Whiteboard: [c= p= s= u=] → [c= p= s= u=1.2]

Comment 4

5 years ago
The suspicious code is http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp?rev=bc427f5ec61b&mark=6063-6063#6061 which was added in Bug 780692.

But if flush takes lots of time, there is obviously something to flush. Should figure out what, since 
there will be a flush at some point anyway.

And perhaps try to remove the explicit flush for touch events.
I notice a call to mozilla::dom::HTMLImageElementBinding::set_src immediately before the ~50ms reflow.

Comment 7

5 years ago
(In reply to Olli Pettay [:smaug] from comment #4)
> And perhaps try to remove the explicit flush for touch events.
Or do it conditionally only if we have animations to flush.
So the user scrolls, the app adds more contacts to the DOM, and at some point we have to figure out where things are.  What part of that do we think we can improve on?  Or do we think that's not what's happening?

Keep in mind that (1) I currently know very little about layout and how we do it, and (2) reflows caused by reading scrollTop/scrollLeft are covered by bug 914854, which is not this.
(In reply to Olli Pettay [:smaug] from comment #7)
> (In reply to Olli Pettay [:smaug] from comment #4)
> > And perhaps try to remove the explicit flush for touch events.
> Or do it conditionally only if we have animations to flush.

That would solve the particular case for that bug, but I'm worried that there are other ways for content to not be at the correct location if we don't flush.
This seems to require deep knowledge of the layout code (which I do not have) to even determine if this is something that can meaningfully be "fixed", let alone do so.  Help?
Flags: needinfo?(roc)
Flags: needinfo?(dbaron)
Flags: needinfo?(bzbarsky)
Just sort of thinking out loud here:  why do we want to flush as part of processing a touch event?  In theory, if the user has touched something, we'd rather send the event to the thing they actually saw when they touched than what would be there if everything were up-to-date.  The exception to that is animations on the compositor (which I think is why this flush got put in in the first place), where the layout data we have is behind what the user is actually seeing.  So I wonder if, instead of flushing, we could use current data from transitions running on the compositor (maybe we even using a timestamp on the event) when determining where to dispatch the event, but otherwise *not* update layout data?
Sounds good to me.
Flags: needinfo?(roc)
Likewise.
Flags: needinfo?(bzbarsky)
OK, so I guess that leaves a few questions.

One, it occurs to me that there are two different ways to do what I said:

 (1) [which I was originally assuming] change the display list construction that we do fr event handling to look at "current" compositor animation state for things that have transform being animated on the compositor.  (Maybe opacity too, but that's only barely relevant.)

 (2) re-use the mini-flush code and actually reflect the data in style, and do the necessary frame tree updates, without flushing anything else.

It occurs to me that (2) *might* actually be simpler.  Though mini-flush code is pretty dangerous to begin with, so perhaps not.


Two:  is this something Jed is comfortable taking a shot at?  Or should we find somebody with more layout experience?

Comment 15

5 years ago
(In reply to David Baron [:dbaron] (needinfo? me) from comment #14)
> ...
> ...
> Two:  is this something Jed is comfortable taking a shot at?  Or should we
> find somebody with more layout experience?

Jed has already expressed some discomfort tackling this so a developer with more Layout experience is preferred. Who would you recommend?
Whiteboard: [c= p= s= u=1.2] → [c=handeye p= s= u=1.2]
(In reply to Mike Lee [:mlee] from comment #15)
> Jed has already expressed some discomfort tackling this so a developer with
> more Layout experience is preferred. Who would you recommend?

I think that actually depends on the answer to "One", so I'll ask bz what he thinks about "One" from comment 14.
Flags: needinfo?(bzbarsky)
I don't have a good feel for how easy it is to ask the compositor for info like that.  If it's not too hard, it certainly sounds safer than messing with miniflushes.
Flags: needinfo?(bzbarsky)
I think option 1 is more complex in the end since it adds a new way to get style/geometry data. The miniflush is an existing concept.
(Reporter)

Comment 19

5 years ago
If we're trying to provide scrolling physics then we should not do anything fancy but wait on sub-APZC which runs all the scrolling code in the compositor. Ideally in the mean time we can have a work around. Of course this doesn't work if we want to do something more then APZC does.

Asking the compositor for information is bad because the compositor is designed to block on vsync so the compositor can reasonably be expected to take ~15ms to handle an IPC message.
(Assignee)

Comment 20

5 years ago
roc has asked me to look at this.

I am not sure how option one would work - would we send an async message to the compositor thread querying the current values of the layer for a display list item? That sounds slow and complex.

In the current code that Smaug pointed to (http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp?rev=bc427f5ec61b&mark=6063-6063#6061) why do we only flush for touch events and not (say) for clicks? Are assuming that we only need to flush on mobile? Or is there something else at play?

Do we need to take account of the APZC transform as well as any OMTA here? Or has AZPC been factored in before we get to this stage?
Assignee: jld → ncameron
(In reply to Nick Cameron [:nrc] from comment #20)
> I am not sure how option one would work - would we send an async message to
> the compositor thread querying the current values of the layer for a display
> list item? That sounds slow and complex.

What I was thinking, anyway, was that we wouldn't ask the compositor anything -- we'd just, for anything being animated on the compositor, do the computation on the main thread, given the time (from the event) and the animation manager's animation data.

> In the current code that Smaug pointed to
> (http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.
> cpp?rev=bc427f5ec61b&mark=6063-6063#6061) why do we only flush for touch
> events and not (say) for clicks? Are assuming that we only need to flush on
> mobile? Or is there something else at play?

Maybe because OMTA is only enabled on mobile, and we'd have needed it for clicks if we enabled OMTA on desktop?  Anyway, the idea here is to remove that and replace it with a better solution.
Flags: needinfo?(dbaron)
(Assignee)

Comment 22

5 years ago
Does anyone have a test case for this?

I have a test case for desktop and (changing |if (aEvent->eventStructType == NS_TOUCH_EVENT)| to |nsLayoutUtils::AreAsyncAnimationsEnabled()| so that I flush for clicks as well as touches) the behaviour is buggy - I often get catch the click event with the wrong element. Removing the flush makes absolutely no difference to the behaviour (so one fix here is just to delete it, we should get a perf improvement and no worse correctness). Disabling OMTA gives the correct behaviour, so I'm pretty sure OMTA is what is causing the issues.

I had a bit of an experiment with the full flush and using a mini-flush, but I could not cause correct behaviour, nor determine why we get incorrect behaviour. I didn't measure the time difference between a full flush and a mini-flush either. I will continue to investigate on Monday.
(In reply to Benoit Girard (:BenWa) from comment #19)
> If we're trying to provide scrolling physics then we should not do anything
> fancy but wait on sub-APZC which runs all the scrolling code in the
> compositor. Ideally in the mean time we can have a work around. Of course
> this doesn't work if we want to do something more then APZC does.
> 
> Asking the compositor for information is bad because the compositor is
> designed to block on vsync so the compositor can reasonably be expected to
> take ~15ms to handle an IPC message.

Perhaps a distraction from this bug, but -- how is doing scrolling physics on the compositor thread going to interact with event handling?
(Reporter)

Comment 24

5 years ago
(In reply to David Baron [:dbaron] (needinfo? me) from comment #23)
> Perhaps a distraction from this bug, but -- how is doing scrolling physics
> on the compositor thread going to interact with event handling?

If we have an event listeners then we dispatch the event to the main thread for handling, otherwise it's safe to handle the scrolling from the compositor. But yes its been tricky to get right and if there's a mismatch typically the screen position jumps around if the compositor and the main thread are mediating the numbers correctly between the two threads.
(Assignee)

Comment 25

5 years ago
I have more questions than answers, I'm afraid. I get the same results with my testcase whether I do a full flush, a mini flush, or no flush at all. I have a div transitioning I click on the div to start the transition and then click in the same place without moving the mouse once the div is no longer there. I listen for the event on the moving div and a div behind. Expected: the background div receives the click event. This happens with OMTA disabled. With OMTA enabled, neither div receives the event. I have been trying to figure out why that happens.

The first thing I checked was the current event frame and content in the pres shell, surprisingly these were always correct. I also checked using the layout debugger (visual event debugging/event dumping) and this highlighted the expected elements. I then spent a while debugging through the event dispatch system but at this point I stopped getting anything useful - I'm just not familiar with the code, in particular I couldn't work out how to get the element associated with the event listener. Its possible also there is something wrong with my test case (I'll upload that in a sec), but it works fine with OMTA disabled and if I move the mouse before the second click to cause a mouse move events.

SO, can someone give me some tips on debugging event dispatch please? Anything/anywhere I should check to find out why my listener is not getting the event?
(Assignee)

Comment 26

5 years ago
Created attachment 808415 [details]
testcase for desktop (requires enabling OMTC and OMTA)

Comment 27

5 years ago
Paging :smaug to the rescue
Do you know if click event is dispatched at all?
It is dispatched in http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsEventStateManager.cpp?rev=5dbc32740161#4402
Note, aEvent->clickCount needs to be > 0. So if mousedown and mouseup happen on different elements, 
click isn't dispatched.
(Assignee)

Comment 29

5 years ago
(In reply to Olli Pettay [:smaug] from comment #28)
> Do you know if click event is dispatched at all?
> It is dispatched in
> http://mxr.mozilla.org/mozilla-central/source/content/events/src/
> nsEventStateManager.cpp?rev=5dbc32740161#4402
> Note, aEvent->clickCount needs to be > 0. So if mousedown and mouseup happen
> on different elements, 
> click isn't dispatched.

Yes, I see click events dispatched here and, I think, to the correct elements. I see them bouncing around a bunch of HandleEvent type methods in nsPresShell, nsEventListenerManager and nsEventDispatcher, but that's where I couldn't match events to elements any more, so I'm not really sure of what is going on.
nsEventListenerManager::HandleEventSubType should get called with
aDOMEvent->GetInternalNSEvent()->message==NS_MOUSE_CLICK and aCurrentTarget pointing to the
EventTarget which has at least one listener for click.
(Elements, Documents and Windows for example are EventTargets)
(Assignee)

Comment 31

5 years ago
(In reply to Olli Pettay [:smaug] from comment #30)
> nsEventListenerManager::HandleEventSubType should get called with
> aDOMEvent->GetInternalNSEvent()->message==NS_MOUSE_CLICK and aCurrentTarget
> pointing to the
> EventTarget which has at least one listener for click.
> (Elements, Documents and Windows for example are EventTargets)

Ah! The crucial bit of info I was missing is that elements _are_ EventTargets. I was looking for a field. That should make it easier, thanks!
(Assignee)

Comment 32

5 years ago
I have kind of figured out what is going on here. The key is that for a successful click we go through the event handling machinery twice. As Smaug guessed, the problem is that click count gets reset so we don't dispatch the click at all. I didn't notice this because we dispatch in the first pass and not the second. That is because we are only doing a successful flush before the second pass and so when we check the click we have a different element. The flush doesn't work because the first time around because we have different pres contexts for the two passes. The two passes have the same root pres context and the first time around, the root pres context is the pres context associated with the pres shell. Only the transition manager on the non-root pres context knows about the transitioning element.

Things I don't understand:

- why we have two passes and what they are for

- why the mouse down and mouse up events are handled by different pres shells

- how we have a pres shell handling an event on an element which is transitioning without it knowing about the transition

- how we fix this! I either need to change which pres shell gets handles the events, change which transition manager knows about the transition (I think this is not the right approach), or get access to the correct pres context from the pres shell in the first pass. I don't know how to do any of things.
Flags: needinfo?(dbaron)
Flags: needinfo?(bugs)
(In reply to Nick Cameron [:nrc] from comment #32) 
> - why we have two passes and what they are for
if with two passes you mean click dispatch, it is just how click is defined.
mousedown and mouseup happening on the same element causes a click to be dispatched. 

> - why the mouse down and mouse up events are handled by different pres shells
They shouldn't be. Does hittesting give different shell? What is the testcase you're using?
The one in this bug? (How does one enable OMTC and OMTA on Linux? )

> - how we have a pres shell handling an event on an element which is
> transitioning without it knowing about the transition
what you mean? On desktop at least we don't do transitions omt, so hittesting should know about the state, no?

> - how we fix this! I either need to change which pres shell gets handles the
> events, change which transition manager knows about the transition (I think
> this is not the right approach), or get access to the correct pres context
> from the pres shell in the first pass. I don't know how to do any of things.
I'm missing something here. mousedown/up should happen on the same shell/prescontext
Flags: needinfo?(bugs)
(Assignee)

Comment 34

5 years ago
(In reply to Olli Pettay [:smaug] from comment #33)
> (In reply to Nick Cameron [:nrc] from comment #32) 
> > - why we have two passes and what they are for
> if with two passes you mean click dispatch, it is just how click is defined.
> mousedown and mouseup happening on the same element causes a click to be
> dispatched. 
> 

So I assumed that because the pres shells were different they were deliberately more different than just mouse down/up.

> > - why the mouse down and mouse up events are handled by different pres shells
> They shouldn't be. Does hittesting give different shell? What is the
> testcase you're using?
Yes, different shells. Yes, the test case on the bug - click on the yellow box to start the transition, then (without moving the mouse) click once the box has passed - expected: background goes red; actual: nothing happens).

> The one in this bug? (How does one enable OMTC and OMTA on Linux? )

You will need a Linux system that is OK with HWA. Then set the following prefs to true:
layers.acceleration.force-enabled
layers.offmainthreadcomposition.enabled
layers.offmainthreadcomposition.async-animations

> 
> > - how we have a pres shell handling an event on an element which is
> > transitioning without it knowing about the transition
> what you mean? On desktop at least we don't do transitions omt, so
> hittesting should know about the state, no?
> 

Main thread vs OMTA doesn't make much difference here. The transitions are managed by a transition manager and there is one transition manager per pres context. Transitions known to one transition manager are not know by transition managers on other pres contexts.

> > - how we fix this! I either need to change which pres shell gets handles the
> > events, change which transition manager knows about the transition (I think
> > this is not the right approach), or get access to the correct pres context
> > from the pres shell in the first pass. I don't know how to do any of things.
> I'm missing something here. mousedown/up should happen on the same
> shell/prescontext

OK, so this is the big mystery to me two. I will do some digging tomorrow.
(Assignee)

Comment 35

5 years ago
So, I solved my mystery and have patches to hopefully solve the problem - I haven't profiled, it would be great if someone could do that to see if we actually save time.

First, the mystery is not so mysterious - the sequence of events is - mouse down is triggered, that is dealt with by the root presshell, we hit test find the frame, dispatch the event, then set mouse capture for a scrollframe in content. There is an animation in progress but mini-flushing on the root prescontext does nothing so the now moved div gets the mouse down. Then mouse up happens, this time the event is handled by the content presshell due to the capture, again we mini-flush, but this time on the content prescontext which is managing the animation and so the flush has an effect and so the mouse up is registered on the background div. That means the down and up happen on different elements so the click event is discarded and nothing happens.

My solution is to mini-flush all the prescontexts when we need to, then everything goes right and we get away without doing a full flush.

(BTW, touch is different from clicks, but this work was necessary because we want to have OMTA for desktop in the near future where we need to be able to handle clicks in this situation. I'm building not to double check it works with touches.)

Patches coming up once I've tested with touch as well as click.
Flags: needinfo?(dbaron)
(Assignee)

Comment 36

5 years ago
My enthusiasm was premature, it does not look like my solution works for touches. However, I am relatively optimistic it will not be too hard to fix. I'll be on it tomorrow morning.
(Assignee)

Comment 37

5 years ago
Created attachment 810245 [details] [diff] [review]
Use a mini-flush in PresShell::HandleEvent

There are a couple of questions in comments tagged with 'REVIEW'
Attachment #810245 - Flags: review?(dbaron)
(Assignee)

Comment 38

5 years ago
Created attachment 810247 [details] [diff] [review]
Add mini-flush for animations

We need to mini-flush animations as well as transitions.

I'm not very happy about the big macro, but it was that or lots of copypasta or a virtual method call in the middle of an inner loop, and these things are meant to be fast so I wasn't sure if that is a cost we could tolerate.
Attachment #810247 - Flags: review?(dbaron)
Mike Lee - can we get somebody to profile and measure the performance improvement with this patch?  It is not a trivial change, and if we are going to push for landing it on Aurora, it would be good to know how much we're gaining.
Flags: needinfo?(mlee)

Comment 40

5 years ago
Mason, need you to profile these changes and report your findings  per comment #39 here.
Flags: needinfo?(mlee) → needinfo?(mchang)
On it! I'm still new here though, so it might take me a while to get the profile up and running. I'll post the profiles on this bug once they are up.
Flags: needinfo?(mchang)
I think I got a few profiles. Here's the initial profile showing the 50ms layout::DoReflow delay on Hamachi Device.

https://people.mozilla.org/~bgirard/cleopatra/#report=43d980b6ebadca18674a335fd5417c9be71b34b2&search=RecvRealTouchEvent

This was based on mozilla-central 149475:df452c8738ef. After applying the patch, here are the results:

https://people.mozilla.org/~bgirard/cleopatra/#report=4d13ce9d5dcb3e496f2de9e5768ea88a03547184&search=RecvRealTouchEvent

Essentially we no longer see a layout::Flush associated with RecvRealTouchEvent, assuming I'm reading Cleopatra right. This was on the contacts app with the gaia-heavy reference workload, running b2g V1.2. Profiling was done with 1ms sampling rate and profiling the Gecko thread of b2g.

Is this the profile you needed Milan and Nick?
(Reporter)

Comment 43

5 years ago
(In reply to Mason Chang from comment #42)
> Essentially we no longer see a layout::Flush associated with
> RecvRealTouchEvent, assuming I'm reading Cleopatra right.

You are, it looks much better.
Hey Nick, I realize you're probably coming back from the summit. I'm new to bugzilla and should've need infoed you with comment 42. Let me know if there's anything else I can do to help!
Flags: needinfo?(ncameron)
(Assignee)

Comment 45

5 years ago
(In reply to Mason Chang from comment #44)
> Hey Nick, I realize you're probably coming back from the summit. I'm new to
> bugzilla and should've need infoed you with comment 42. Let me know if
> there's anything else I can do to help!

Hi, thanks for the profile. I'm not great at interpreting these things but BenWa is, so if he says it looks good, then I am happy. I don't think there is anything else we need right now, thanks. Just waiting for review and (after that) a decision on whether the gain is worth the risk of landing the patches.
Flags: needinfo?(ncameron)
Target Milestone: --- → 1.2 C3(Oct25)
Comment on attachment 810245 [details] [diff] [review]
Use a mini-flush in PresShell::HandleEvent

I like the refactoring to go inside dispatchUsingCoordinates and do this for all events for which dispatchUsingCoordinates istrue.

I'm not sure why the UnlockPointer call was there either, but it goes back to bug 737100, and it might be that that bit (unlike the rest of it) should be specific to NS_TOUCH_EVENT.  smaug would know.

Per hg blame it looks like the reason for using script blockers goes back to bug 436965, and given that it matches the things that restyle in FlushPendingNotifications it's probably a good idea -- but it seems better to use an nsAutoScriptBlocker rather than using AddScriptBlocker and RemoveScriptBlocker directly.  (Make sure its destructor runs *before* the frame = GetNearestFrameContainingPresShell(this) call, though.)

The bigger question I'd like to understand is which documents you need to flush styles for.  You're currently doing mDocument and all its subdocuments (for subframes, I presume), but you're *not* doing ancestor documents.  That seems a little bit peculiar to me.  The reason to want to flush is presumably that the async animations have moved something in a way that's going to change the position -> event target mapping.  The reason to want to flush styles in multiple documents is that the touch target might be moving due to movement i multiple documents -- its own document, and ancestor documents.  But if there's movement in ancestor documents, it better have already been flushed before we get to this point, or else we might be the wrong pres shell entirely.  The way I'd like to think the event dispatch works is that if the event lands within a subframe, we'll go through this code that you're modifying for the outer frame, and then go through it again for the inner frame if the coordinates in the outer frame mean that the event is inside an inner frame.  If that's the case, then you don't need to enumerate sub documents at all; you just need to flush mDocument (which is what the code that you're removing did, too).  If it's not the case, then I'd think you need to flush ancestor documents -- and probably somewhere earlier even before you get to this point (but if that were needed, it seems like the old code would have needed to do it as well, so it seems unlikely).  I'm having trouble seeing the justification for flushing all subdocuments.  So I'm inclined to think the right thing is dropping the EnumerateSubDocuments call, and doing the flushing only on mDocument.

Marking review- to at least prompt answers to the last question, but if you think this is right, feel free to ask again.
Attachment #810245 - Flags: review?(dbaron)
Attachment #810245 - Flags: review-
Attachment #810245 - Flags: feedback?(bugs)
Comment on attachment 810247 [details] [diff] [review]
Add mini-flush for animations

AnimationCommon.h:

>+  // Reparent aContent's style.
>+  already_AddRefed<nsStyleContext> ReparentContent(nsIContent* aContent,
>+                                                   nsStyleContext* aParentStyle);

Mention that it does :before and :after too, in the comment.

>+#define UPDATE_THROTTLED_INTERNAL(aClass,aAnimationsGetter) void               \


space after comma

And perhaps also "IMPL" at the beginning or end of the macro name,
and the full name of the function, e.g.,
IMPL_UPDATE_ALL_THROTTLED_STYLES_INTERNAL.  I think that would be a bit
clearer.

also, make the macro arguments all-lowercase (with underscores if
needed), and with a trailing _ to signal that they're macro arguments.



There are multiple 80th column violations in the patch.  Please fix them.

r=dbaron with that
Attachment #810247 - Flags: review?(dbaron) → review+
(In reply to David Baron [:dbaron] (needinfo? me) from comment #46)
> I'm not sure why the UnlockPointer call was there either, but it goes back
> to bug 737100, and it might be that that bit (unlike the rest of it) should
> be specific to NS_TOUCH_EVENT.  smaug would know.
Pointer lock is for mouse events, for now at least, and since it is not defined for
touch events (not talking about pointer events here), we explicitly unlock if someone
uses touch. for now.
(In reply to David Baron [:dbaron] (needinfo? me) (vacation until October 21) from comment #46)
> The way I'd like to think the event dispatch
> works is that if the event lands within a subframe, we'll go through this
> code that you're modifying for the outer frame, and then go through it again
> for the inner frame if the coordinates in the outer frame mean that the
> event is inside an inner frame.

That's not how it works :-). Basically events are always dispatched by calling PresShell::HandleEvent containing the widget which received the event. If the event needs to go to the focused document or the document currently holding mouse capture, we recursively call PresShell::HandleEvent for that document, but there's only ever one level of recursion. Targeting of events at subdocuments is achieved by building a displaylist for the presshell which crosses subdocument boundaries.

> If that's the case, then you don't need to
> enumerate sub documents at all; you just need to flush mDocument (which is
> what the code that you're removing did, too).  If it's not the case, then
> I'd think you need to flush ancestor documents -- and probably somewhere
> earlier even before you get to this point (but if that were needed, it seems
> like the old code would have needed to do it as well, so it seems unlikely).

I think Nick's approach is right. We have to flush subdocuments to ensure that the display list we build is accurate. (Maybe we don't have to flush all of them, but that could get complicated.) We don't have to flush ancestors since we guarantee that when PresShell::HandleEvent is called, the event must go to a document in the subdocument tree rooted at this presshell.
(Assignee)

Comment 50

5 years ago
dbaron: does roc's comments persuade you that the patch is correct? Or does it still need changes?
Target Milestone: 1.2 C3(Oct25) → ---
(Assignee)

Updated

5 years ago
Target Milestone: --- → 1.2 C3(Oct25)
Comment on attachment 810245 [details] [diff] [review]
Use a mini-flush in PresShell::HandleEvent

sounds good, so r=dbaron with the third paragraph of comment 46 addressed (ignore the long fourth paragraph), plus the change implied by comment 48 (keeping the UnlockPointer call specific to touch events)
Attachment #810245 - Flags: review- → review+
(Assignee)

Comment 52

5 years ago
(In reply to David Baron [:dbaron] (needinfo? me) from comment #46)
> Comment on attachment 810245 [details] [diff] [review]
> Use a mini-flush in PresShell::HandleEvent
> 
> I like the refactoring to go inside dispatchUsingCoordinates and do this for
> all events for which dispatchUsingCoordinates istrue.
> 

Do you mean inside WidgetEvent::IsUsingCoordinates? And by 'refactoring' do you mean the code for doing flushes? That seems odd to me - IsUsingCoordinates is just a tester method at the moment and the code uses a member function of nsPresShell.
Flags: needinfo?(dbaron)
(In reply to Nick Cameron [:nrc] from comment #52)
> (In reply to David Baron [:dbaron] (needinfo? me) from comment #46)
> > Comment on attachment 810245 [details] [diff] [review]
> > Use a mini-flush in PresShell::HandleEvent
> > 
> > I like the refactoring to go inside dispatchUsingCoordinates and do this for
> > all events for which dispatchUsingCoordinates istrue.
> > 
> 
> Do you mean inside WidgetEvent::IsUsingCoordinates? And by 'refactoring' do
> you mean the code for doing flushes? That seems odd to me -
> IsUsingCoordinates is just a tester method at the moment and the code uses a
> member function of nsPresShell.

I mean I like the change that you made that's already in the patch; no need to do anything else.
Flags: needinfo?(dbaron)
(Assignee)

Comment 54

5 years ago
(In reply to David Baron [:dbaron] (needinfo? me) from comment #53)
> (In reply to Nick Cameron [:nrc] from comment #52)
> > (In reply to David Baron [:dbaron] (needinfo? me) from comment #46)
> > > Comment on attachment 810245 [details] [diff] [review]
> > > Use a mini-flush in PresShell::HandleEvent
> > > 
> > > I like the refactoring to go inside dispatchUsingCoordinates and do this for
> > > all events for which dispatchUsingCoordinates istrue.
> > > 
> > 
> > Do you mean inside WidgetEvent::IsUsingCoordinates? And by 'refactoring' do
> > you mean the code for doing flushes? That seems odd to me -
> > IsUsingCoordinates is just a tester method at the moment and the code uses a
> > member function of nsPresShell.
> 
> I mean I like the change that you made that's already in the patch; no need
> to do anything else.

Ah, great, nice and easy :-)
The mochitest-metro test failure is probably caused by Metro test code that depends on animation timing in incorrect or fragile ways.  Let us know if you need any help debugging/fixing the test(s).
Comment on attachment 810245 [details] [diff] [review]
Use a mini-flush in PresShell::HandleEvent

Hmm, I gave feedback already.
Attachment #810245 - Flags: feedback?(bugs)

Comment 59

5 years ago
Updating Target Milestone for FxOS Perf koi+'s.
Target Milestone: 1.2 C3(Oct25) → 1.2 C4(Nov8)
(Assignee)

Comment 60

5 years ago
(In reply to Matt Brubeck (:mbrubeck) from comment #57)
> The mochitest-metro test failure is probably caused by Metro test code that
> depends on animation timing in incorrect or fragile ways.  Let us know if
> you need any help debugging/fixing the test(s).

Sorry, was distracted by other bugs. So, I pushed a few things to Try, and if we do a full flush rather than a mini-flush then we pass the Metro tests. Obviously, that defeats the purpose of this bug. Nothing else I experimented with solved the problem.

I don't see anything obviously timing related in the tests which this should affect, but I don't really understand what is going on.

I'd really appreciate some help debugging this, especially as I don't have a Metro device. Any suggestions for things I can throw at Try?
Flags: needinfo?(mbrubeck)
Adding jimm who wrote the Metro test in question.

The first failure happens here:
http://hg.mozilla.org/mozilla-central/file/48dbd532a004/browser/metro/base/tests/mochitest/browser_selection_basic.js#l176

This test uses the TouchDragAndHold helper defined in browser/metro/base/tests/mochitest/head.js, which repeatedly calls EventUtils.synthesizeTouchAtPoint to simulate a "drag" (touchstart, sequence of touchmove, touchend).  The drag *should* be targeted at a floating selection "marker" that is part of our browser chrome.  The test then does some stuff to wait until our browser chrome has processed the resulting events and messages, and then checks to see that the selection state has been updated correctly.

Dragging the selection marker works by setting .left and .top on the marker element, which is an element in a XUL stack.  However, the code that controls the marker also caches this position, and the test reads from that cached position (.xPos) rather than from element.left.  Is it possible that avoiding reflows means the element hasn't actually been moved to its new position yet, and so it doesn't receive the touch events sent by the test?  Should we force a reflow by reading from element.left directly, or maybe through a hack like offsetLeft?
Flags: needinfo?(mbrubeck)
(Assignee)

Comment 62

5 years ago
(In reply to Matt Brubeck (:mbrubeck) from comment #61)
> Adding jimm who wrote the Metro test in question.
> 
> The first failure happens here:
> http://hg.mozilla.org/mozilla-central/file/48dbd532a004/browser/metro/base/
> tests/mochitest/browser_selection_basic.js#l176
> 
> This test uses the TouchDragAndHold helper defined in
> browser/metro/base/tests/mochitest/head.js, which repeatedly calls
> EventUtils.synthesizeTouchAtPoint to simulate a "drag" (touchstart, sequence
> of touchmove, touchend).  The drag *should* be targeted at a floating
> selection "marker" that is part of our browser chrome.  The test then does
> some stuff to wait until our browser chrome has processed the resulting
> events and messages, and then checks to see that the selection state has
> been updated correctly.
> 
> Dragging the selection marker works by setting .left and .top on the marker
> element, which is an element in a XUL stack.  However, the code that
> controls the marker also caches this position, and the test reads from that
> cached position (.xPos) rather than from element.left.  Is it possible that
> avoiding reflows means the element hasn't actually been moved to its new
> position yet, and so it doesn't receive the touch events sent by the test? 
> Should we force a reflow by reading from element.left directly, or maybe
> through a hack like offsetLeft?

I tried replacing xPos with _element.left and likewise for yPos, but that did not change the outcome. Which is surprising, I would have thought that would trigger a full flush in its own right.
(Assignee)

Comment 63

5 years ago
Try push with that failure: https://tbpl.mozilla.org/?tree=Try&rev=c623465bd870

Comment 64

5 years ago
(In reply to Nick Cameron [:nrc] from comment #63)
> Try push with that failure:
> https://tbpl.mozilla.org/?tree=Try&rev=c623465bd870

The first failure is a selection exists test (looking for active selection in the page) which fails. Is there any way your changes could have triggered a reset of selected content? The test checks to be sure there is initial selection before the drag, so something goes wrong after we try to manipulate the selection bounds using the selection monocles.

The other thing that seems clear is that the drag of the monocle takes place and triggers code your patch - 

https://tbpl.mozilla.org/php/getParsedLog.php?id=30164774&tree=Try&full=1#error0

Not sure what

14:13:21     INFO -  Would have flushed
14:13:21     INFO -  full flush

means but we get a slew of these while the drag takes place.

I can try running this locally to see. We can also dump a bunch of screen shots if the test failure is slave specific to see what's going on.
(Assignee)

Comment 65

5 years ago
(In reply to Jim Mathies [:jimm] from comment #64)
> (In reply to Nick Cameron [:nrc] from comment #63)
> > Try push with that failure:
> > https://tbpl.mozilla.org/?tree=Try&rev=c623465bd870
> 
> The first failure is a selection exists test (looking for active selection
> in the page) which fails. Is there any way your changes could have triggered
> a reset of selected content? The test checks to be sure there is initial
> selection before the drag, so something goes wrong after we try to
> manipulate the selection bounds using the selection monocles.
> 
> The other thing that seems clear is that the drag of the monocle takes place
> and triggers code your patch - 
> 
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=30164774&tree=Try&full=1#error0
> 
> Not sure what
> 
> 14:13:21     INFO -  Would have flushed
> 14:13:21     INFO -  full flush
> 

Hmm, this is interesting - it means that I posted the wrong patch, well an extra patch actually which reverts part of my change from full to mini flush. So, this patch queue is the one that succeeds, but changing xPos to element.left causes us to fail (seemingly without the changes which caused the same failure before).

I'm not exactly sure what this means. Possibly we are (in some way) doing more flushing than before and that upsets the assumptions in the test infrastructure. I'm not sure if accessing the element rather than the cached xPos will trigger more flushing than the touch event would.

> means but we get a slew of these while the drag takes place.
> 
> I can try running this locally to see. We can also dump a bunch of screen
> shots if the test failure is slave specific to see what's going on.

It might be useful to see what is happening on the screen during the test (or just in normal use too, to see if my patches cause real problems or just test problems). I'm pretty sure it is not just one slave, I've seen these errors on multiple try runs now.

Comment 66

5 years ago
Can you post with a rollup merged to mc tip? The two patches posted here don't apply cleanly.

Comment 67

5 years ago
This might interfere even more but you can snap some screen shots during the particular test that fails:

let snap = SpecialPowers.snapshotWindow(window, false);
info(snap.toDataURL());

which will show up as data uris in the logs.

Probably best to start by running them locally and see if they fail.
(Assignee)

Comment 68

5 years ago
Created attachment 827685 [details] [diff] [review]
rebased rollup patch

Comment 69

5 years ago
I'm seeing out of sync touch event delivery here. For example, in the 'expand / collapse selection' test, I see:

TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | selection active
TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | short selection test
SelectionHelperUI: touchstart 
SelectionHandler: Browser:SelectionMoveStart 
SelectionHelperUI: Content:SelectionRange 
.. (lots of ranges, expected)
SelectionHelperUI: Content:SelectionRange 
SelectionHelperUI: touchend 
SelectionHelperUI: Content:SelectionRange 
SelectionHandler: Browser:SelectionMoveEnd 
(the ping below post the drag to clear the message manager queue, expected)
SelectionHandler: Browser:SelectionHandlerPing 
SelectionHelperUI: Content:SelectionHandlerPong 
TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | long selection test
TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | selection active
SelectionHelperUI: touchstart 
^^ unexpected, odd touchstart delivered 
SelectionHandler: Browser:SelectionClose 
SelectionHelperUI: Content:HandlerShutdown 
^^ selection handler shutting down due touchstart
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | selection active - Got false, expected true

Comment 70

5 years ago
The coords are interesting as well. In this particular test, we have a sentence in the page. We select the first work of the sentence, and then using the monocles, drag the selection over to the end of the sentence. At this point we check the selected text of the page to be sure everything worked. Then we drag the monocle back to the first word and check again. We releat that process five times.

On the first drag I see a touch start that fits what the test is doing:
SelectionHelperUI: touchstart 
touchstart: 74,37  

and at the end I see:

SelectionHelperUI: touchend 
SelectionHelperUI: touchstart 
touchstart: 449,37
(Assignee)

Comment 71

5 years ago
(In reply to Jim Mathies [:jimm] from comment #69)
> I'm seeing out of sync touch event delivery here. For example, in the
> 'expand / collapse selection' test, I see:
...
> SelectionHelperUI: touchstart 
> ^^ unexpected, odd touchstart delivered 

So if we were setting off a new touchstart where we shouldn't be then we would see this behaviour and then fail?
(Assignee)

Comment 72

5 years ago
(In reply to Jim Mathies [:jimm] from comment #70)
> The coords are interesting as well. In this particular test, we have a
> sentence in the page. We select the first work of the sentence, and then
> using the monocles, drag the selection over to the end of the sentence. At
> this point we check the selected text of the page to be sure everything
> worked. Then we drag the monocle back to the first word and check again. We
> releat that process five times.
> 
> On the first drag I see a touch start that fits what the test is doing:
> SelectionHelperUI: touchstart 
> touchstart: 74,37  
> 
> and at the end I see:
> 
> SelectionHelperUI: touchend 
> SelectionHelperUI: touchstart 
> touchstart: 449,37

Presumably the second touchstart is either at the end of the text or somewhere in between? So the movement five times, that happens all in one touch event? Or each selection is a separate touch? What would you expect to see for this value - the same as at the start? And this time around you didn't see any more touchstarts?

Comment 73

5 years ago
(In reply to Nick Cameron [:nrc] from comment #72)
> (In reply to Jim Mathies [:jimm] from comment #70)
> > The coords are interesting as well. In this particular test, we have a
> > sentence in the page. We select the first work of the sentence, and then
> > using the monocles, drag the selection over to the end of the sentence. At
> > this point we check the selected text of the page to be sure everything
> > worked. Then we drag the monocle back to the first word and check again. We
> > releat that process five times.
> > 
> > On the first drag I see a touch start that fits what the test is doing:
> > SelectionHelperUI: touchstart 
> > touchstart: 74,37  
> > 
> > and at the end I see:
> > 
> > SelectionHelperUI: touchend 
> > SelectionHelperUI: touchstart 
> > touchstart: 449,37
> 
> Presumably the second touchstart is either at the end of the text or
> somewhere in between? So the movement five times, that happens all in one
> touch event? Or each selection is a separate touch? What would you expect to
> see for this value - the same as at the start? And this time around you
> didn't see any more touchstarts?

I wouldn't expect to see the touchstart after the touchend, it doesn't correlate to what the test harness is doing. Looks like some sort of erroneously generated event.

The test harness here sends a touchstart for a single touch point, a bunch of touchmoves with coords that move across the screen, then a touch end. It's simulating a simple touch down, drag, touch up on a monocle. As the monocle is dragged the selection library manipulates the selection underneath using various selection apis we have for setting and manipulating selection in the page.

harness:
http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/tests/mochitest/head.js#760
test:
http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/tests/mochitest/browser_selection_basic.js#135

Comment 74

5 years ago
Created attachment 829210 [details]
test debug output

So I went in and added a ton of logging to this test and slowed things down a bit. My original assessment that there was a errant touchstart was wrong. What we're seeing here is a failure to drag a monocle. I need to add some more output here to figure out what's going on. Will post back in a bit.

Comment 75

5 years ago
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | ** end drag
 0:16.77 SelectionHelperUI: Content:SelectionRange
 0:16.77 SelectionHelperUI: Content:SelectionRange
 0:16.77 SelectionHandler: Browser:SelectionMoveEnd
 0:16.77 SelectionHelperUI: Content:SelectionRange
 0:16.77 SelectionHelperUI: Content:SelectionRange
 0:16.77 SelectionHandler: Browser:SelectionHandlerPing
 0:16.77 SelectionHelperUI: Content:SelectionHandlerPong
 0:16.77 TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | selection active
 0:16.77 TEST-PASS | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | drag back selection check
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | ** starting new drag
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | [0] touchstart 74.69999694824219 x 37.55000305175781
 0:16.77 SelectionHelperUI: MozAppbarDismissing
 0:16.77 SelectionHelperUI: touchstart
 0:16.77 touchstart: 74 37
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | [1] touchmove 84.20599700927734 x 37.55000305175781
 0:16.77 SelectionHelperUI: touchmove received, checking for content scroll.
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | [2] touchmove 93.71199707031249 x 37.55000305175781
 0:16.77 SelectionHelperUI: touchmove received, checking for content scroll.
 0:16.77 TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_selection_basic.js | [3] touchmove 103.21799713134764 x 37.55000305175781
 0:16.77 SelectionHelperUI: touchmove received, checking for content scroll.
 0:16.77 SelectionHelperUI: content is being scrolled, shutdown selection handling.
 0:16.77 closeEditSession
 0:16.77 SelectionHandler: Browser:SelectionClose
 0:16.77 SelectionHelperUI: Content:HandlerShutdown

Looks like when we go to start a new drag, the drag coords don't match the position of the monocle.

Comment 76

5 years ago
Unfortunately snapping screen shots after every drag fixes the bug.

Comment 77

5 years ago
Not sure how to proceed. Basically, a round 18x18 image event target (the monocle) is not where it's supposed to be, or it's where it's supposed to be and our tests have been compensating for bad position.

Updated

5 years ago
Target Milestone: 1.2 C4(Nov8) → 1.2 C5(Nov22)

Updated

5 years ago
Blocks: 936535

Updated

5 years ago
Blocks: 937425

Comment 78

5 years ago
Is there anything else we can try here?  This patch is very helpful on the b2g FPS scrolling bugs I am working.  Is a win8 machine adequate to build and run these tests?
(Assignee)

Comment 79

5 years ago
(In reply to Ben Kelly [:bkelly] from comment #78)
> Is there anything else we can try here?  This patch is very helpful on the
> b2g FPS scrolling bugs I am working.  Is a win8 machine adequate to build
> and run these tests?

There is certainly more to do here, just been stuck with some other high priority stuff lately. Will get back on this soon.

Comment 80

5 years ago
Looking to fix this but will need to push this to 1.3 if not fixed by December 1st.

1.2 Triage
(Assignee)

Comment 81

5 years ago
We could land this with a b2g ifdef if we want it to make 1.2 for sure. We really shouldn't land this last minute because I think it carries some risk. So (although it makes me pretty sad) I think we should land now with an ifdef, rather than try to fix this and then land. I will try and find a proper solution asap too, but that might take a while.

mlee - does this sound sensible?
dbaron - does this fly from a technical point of view?
Flags: needinfo?(mlee)
Flags: needinfo?(dbaron)
Target Milestone: 1.2 C5(Nov22) → 1.2 C4(Nov8)

Comment 82

5 years ago
Thanks Nick.
I'll wait on dbaron's response. Please provide specifics about the risk involved in with your ifdef proposal.

Ben,
How dependent are we on this patch for resolving our outstanding FPS regressions?
Flags: needinfo?(ncameron)
Flags: needinfo?(mlee)
Flags: needinfo?(bkelly)
You're proposing making it a B2G ifdef because if you land it everywhere you end up with some metro tests failing?

Comment 84

5 years ago
(In reply to Mike Lee [:mlee] from comment #82)
> Ben,
> How dependent are we on this patch for resolving our outstanding FPS
> regressions?

Mike, I'm not sure.  We are still trying to identify the cause of the regression.  In theory this bug is not new, so the regression is unlikely directly related to this bug.  It just helps us compensate by minimizing reflows.

Nick, I appreciate you trying to accommodate the b2g deadlines and my desire to have this patch.  After thinking about it, though, it seems like it might be better to track down the test issue before landing and, if necessary, let it ride the trains in v1.3.

Just my opinion.  Thanks again!
Flags: needinfo?(bkelly)

Comment 85

5 years ago
What's the deadline on the 1.2 release/merge/whatever? I'd prefer we not add to our b2g technical debt. I'll find some time to poke at this more today. Maybe nrc can give me a hand via irc.

Comment 86

5 years ago
(In reply to Jim Mathies [:jimm] from comment #85)
> What's the deadline on the 1.2 release/merge/whatever? I'd prefer we not add
> to our b2g technical debt. I'll find some time to poke at this more today.
> Maybe nrc can give me a hand via irc.

Per comment 80, we need to land before Dec 1 in order to get it into v1.2.  Thanks!

Comment 87

5 years ago
Nick, could your work impact how quickly a xul element's visibility is reflected in the dom? Our selection code hides the xul:toolbarbuttons [1] associated with the markers when they are being dragged, then sets them back once the drag completes. We do this via the hidden property on the element [2]. When we test we repeat each drag for a set of iterations. So for example:

1) select a word
2) drag right hand marker to the right expanding selection across a sentence
3) check what text is selected
4) drag right hand marker to the left contracting selection back to a word
5) check what text is selected
6) goto 2 for 4 more iterations

In steps 2 & 4, the drag marker gets hidden by our selection controller during the drag. Both steps also assume that when they start the marker is visible and will receive touch events.

If I add a delay after 3 & 5, the test succeeds. If I don't sometimes the follow up drag doesn't pick up a marker implying that maybe the marker isn't visible in the dom yet. These are automated tests, so they execute really quickly.

[1] http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/content/bindings/selectionoverlay.xml#13
[2] http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/content/helperui/SelectionHelperUI.js#154
Flags: needinfo?(dbaron)
(Assignee)

Comment 88

5 years ago
(In reply to David Baron [:dbaron] (needinfo? me) from comment #83)
> You're proposing making it a B2G ifdef because if you land it everywhere you
> end up with some metro tests failing?

Yes (I know, that is kind of evil, but possibly better than a last minute landing of a high-ish risk patch).(In reply to Ben Kelly [:bkelly] from comment #84)
> (In reply to Mike Lee [:mlee] from comment #82)
> > Ben,
> > How dependent are we on this patch for resolving our outstanding FPS
> > regressions?
> 
> Nick, I appreciate you trying to accommodate the b2g deadlines and my desire
> to have this patch.  After thinking about it, though, it seems like it might
> be better to track down the test issue before landing and, if necessary, let
> it ride the trains in v1.3.
> 
> Just my opinion.  Thanks again!

OK, fine by me. In fact, better :-)

(In reply to Jim Mathies [:jimm] from comment #85)
> What's the deadline on the 1.2 release/merge/whatever? I'd prefer we not add
> to our b2g technical debt. I'll find some time to poke at this more today.
> Maybe nrc can give me a hand via irc.

I would much prefer to fix the tests. I'm technically on holiday today, but keen to get stuck in next week.
Flags: needinfo?(ncameron)

Updated

5 years ago
blocking-b2g: koi+ → 1.3+
Whiteboard: [c=handeye p= s= u=1.2] → [c=handeye p= s= u=1.3]
Target Milestone: 1.2 C4(Nov8) → 1.3 Sprint 6 - 12/6
(Assignee)

Comment 89

5 years ago
(In reply to Jim Mathies [:jimm] from comment #87)
> Nick, could your work impact how quickly a xul element's visibility is
> reflected in the dom? Our selection code hides the xul:toolbarbuttons [1]
> associated with the markers when they are being dragged, then sets them back
> once the drag completes. We do this via the hidden property on the element
> [2]. When we test we repeat each drag for a set of iterations. So for
> example:
> 
> 1) select a word
> 2) drag right hand marker to the right expanding selection across a sentence
> 3) check what text is selected
> 4) drag right hand marker to the left contracting selection back to a word
> 5) check what text is selected
> 6) goto 2 for 4 more iterations
> 
> In steps 2 & 4, the drag marker gets hidden by our selection controller
> during the drag. Both steps also assume that when they start the marker is
> visible and will receive touch events.
> 
> If I add a delay after 3 & 5, the test succeeds. If I don't sometimes the
> follow up drag doesn't pick up a marker implying that maybe the marker isn't
> visible in the dom yet. These are automated tests, so they execute really
> quickly.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/content/
> bindings/selectionoverlay.xml#13
> [2]
> http://mxr.mozilla.org/mozilla-central/source/browser/metro/base/content/
> helperui/SelectionHelperUI.js#154

This seems like a reasonable hypothesis. I replace |FlushPendingNotifications| by a flush which only flushes animations and transitions. If the change to the hidden property was previously flushed by |FlushPendingNotifications| then it no longer would be, which would have the effects we observe/you describe.

We can confirm this by logging what gets flushed in that FlushPendingNotifications call. I'm not sure how easy that is, but I'll have a go.

If it is the case, I'm not sure how to fix it. This doesn't seem like something that should be touch specific like it is currently. I would expect a similar test, but using mouse actions rather than touch, would currently fail. I'm not sure under what circumstances we need to do a full flush and when we can get away with just flushing animations.
(Assignee)

Comment 90

5 years ago
Created attachment 8336631 [details] [diff] [review]
some pretty basic logging

jimm or anyone else with a metro setup: could you run the failing tests using a build with this patch please and post the logs? (It needs to build with DEBUG, but on Try the mc testsuite only runs for opt builds. I don't have a local Metro build). Thanks!
Flags: needinfo?(jmathies)
(In reply to Nick Cameron [:nrc] from comment #90)
> jimm or anyone else with a metro setup: could you run the failing tests
> using a build with this patch please and post the logs? (It needs to build
> with DEBUG, but on Try the mc testsuite only runs for opt builds. I don't
> have a local Metro build). Thanks!

You might be able to use Try despite this, by having the patch stack that you push to try include a change to the build configuration.  (You can modify the appropriate {browser,mobile/android,b2g,xulrunner}/config/mozconfigs/... file if you can figure out which is the right one, or just hack configure.in if you can't.)

Updated

5 years ago
No longer blocks: 936535

Comment 92

5 years ago
(In reply to Nick Cameron [:nrc] from comment #90)
> Created attachment 8336631 [details] [diff] [review]
> some pretty basic logging
> 
> jimm or anyone else with a metro setup: could you run the failing tests
> using a build with this patch please and post the logs? (It needs to build
> with DEBUG, but on Try the mc testsuite only runs for opt builds. I don't
> have a local Metro build). Thanks!

I'll take it for a spin. FWIW, you can include "WinUtil.h" and use the logging functions there to get output on try runs in release builds.
Flags: needinfo?(jmathies)

Comment 93

5 years ago
Is this logging patch to be applied with the rollup? It doesn't look the same around line 6168 in nsPresShell. Also it doesn't apply cleanly.
(Assignee)

Comment 94

5 years ago
(In reply to Jim Mathies [:jimm] from comment #93)
> Is this logging patch to be applied with the rollup? It doesn't look the
> same around line 6168 in nsPresShell. Also it doesn't apply cleanly.

No, just on top of trunk.

(Its not the logging functions I need from debug builds, its the |List| functions for elements - and they call off into a whole bunch of other functions which are also ifdef'ed).

Thanks for giving it a go!
Flags: needinfo?(jmathies)

Comment 95

4 years ago
Jim, have you been able to take a look at this since Nick's reply?

Comment 96

4 years ago
(In reply to Mike Lee [:mlee] from comment #95)
> Jim, have you been able to take a look at this since Nick's reply?

No I haven't, sorry. Been busy on fx28 merge related work. I'll get back to this next week.
Flags: needinfo?(jmathies)

Comment 97

4 years ago
Created attachment 8346658 [details]
log 1

Here you go.
Attachment #8346658 - Flags: feedback?(ncameron)
(Assignee)

Comment 98

4 years ago
Comment on attachment 8346658 [details]
log 1

Great, thanks!
Attachment #8346658 - Flags: feedback?(ncameron)
(Assignee)

Comment 99

4 years ago
This is the interesting bit from the log, we get it seven times over the duration of the test:

Before restyling:

XUL* xul:toolbarbutton@066035B8 anonid="selectionhandle-mark3" class="selectionhandle" label="^" left="8px" top="25.550003051757812px" state=[10000] flags=[00000a08] primaryframe=00000000 refcount=8<>
anonymous-children<
  XUL* xul:image@0C416B58 class="toolbarbutton-icon" xbl:inherits="validate,src=image,label" label="^" state=[10000] flags=[00000a20] primaryframe=00000000 refcount=5<>
  XUL* xul:label@0C416BD8 class="toolbarbutton-text" crop="right" flex="1" xbl:inherits="value=label,accesskey,crop" value="^" state=[10000] flags=[00000a20] primaryframe=00000000 refcount=5<>
>

After restyling:

XUL* xul:toolbarbutton@066035B8 anonid="selectionhandle-mark3" class="selectionhandle" label="^" left="8px" top="25.550003051757812px" state=[10000] flags=[00000a08] primaryframe=0689ECB8 refcount=9<>
anonymous-children<
  XUL* xul:image@0C416B58 class="toolbarbutton-icon" xbl:inherits="validate,src=image,label" label="^" state=[10000] flags=[00000a20] primaryframe=0BD9E628 refcount=6<>
  XUL* xul:label@0C416BD8 class="toolbarbutton-text" crop="right" flex="1" xbl:inherits="value=label,accesskey,crop" value="^" state=[10000] flags=[00000a20] primaryframe=00000000 refcount=6<>
>

I don't think that it helps confirm or disprove the hypothesis from comment 89 though :-(
(In reply to Jim Mathies [:jimm] from comment #87)
> If I add a delay after 3 & 5, the test succeeds. If I don't sometimes the
> follow up drag doesn't pick up a marker implying that maybe the marker isn't
> visible in the dom yet. These are automated tests, so they execute really
> quickly.

It seems clear to me that Nick's patch means we don't flush layout or style while handling your synthesized DOM events, so the marker element is still treated as invisible.

Can you try inserting a forced layout+style flush instead of a delay? E.g. by calling getBoundingClientRect on the marker element? That should fix it.

If that fixes it, I think we have two choices here: actually add those flushes to the test, or modify the event synthesis code (called via nsDOMWindowUtils I guess?) to do the flushes automatically. I slightly prefer the former.
This was punted from 1.2. Why is this necessary for 1.3?
blocking-b2g: 1.3+ → 1.3?
Personally I think it can probably ride the trains, but how do we keep it high priority enough to finally get it landed?  Its a nice to have, but my tests with scrolling showed it was very good nice to have in terms of user experience.  (Maybe this has changed in impact with APZC, not sure.)
Nick, can you update 'rebased rollup patch' when you get a chance? I'll take rocs suggestion for a spin.
Flags: needinfo?(ncameron)
(Assignee)

Comment 104

4 years ago
Created attachment 8358991 [details] [diff] [review]
rebased rollup patch
Attachment #827685 - Attachment is obsolete: true
Flags: needinfo?(ncameron)
Created attachment 8359201 [details] [diff] [review]
mochitest-metro fix

This fixed mochitest-metro tests locally. I'd suggest some try runs to confirm. Also, maybe we should add an explicit flushLayout() call to dom window utils for this?
1.3+ for performance and APZC work
blocking-b2g: 1.3? → 1.3+
(Assignee)

Comment 107

4 years ago
Try push: https://tbpl.mozilla.org/?tree=Try&rev=3f58a1fc6ca1

Looks good :-)

jimm, could you ask someone appropriate for review on your patch please?
Flags: needinfo?(jmathies)
Attachment #8359201 - Flags: review+
Flags: needinfo?(jmathies)
This is going to need a branch-specific patch for Aurora (v1.3) uplift.
status-b2g-v1.3: --- → affected
status-b2g-v1.4: --- → fixed
status-firefox27: --- → wontfix
status-firefox28: --- → affected
status-firefox29: --- → fixed
Flags: needinfo?(ncameron)
Keywords: branch-patch-needed
Target Milestone: 1.3 Sprint 6 - 12/6 → 1.3 C3/1.4 S3(31jan)
Keywords: branch-patch-needed

Updated

4 years ago
Depends on: 969222
status-b2g-v1.3T: --- → fixed
You need to log in before you can comment on or make changes to this bug.