Closed Bug 1362404 Opened 4 years ago Closed 4 years ago

Handle PresShell shutdown better in INPUT_EVENT_RESPONSE probes

Categories

(Core :: Widget, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

(Blocks 1 open bug)

Details

(Whiteboard: tpi:+)

Attachments

(1 file)

:bsmedberg's analysis comparing INPUT_EVENT_RESPONSE_MS and INPUT_EVENT_RESPONSE_COALESCED_MS suggests that something slightly weird is afoot.

There are about 12% more clients that report bad (>2.5) hangs using the non-coalesced metric than there are using the coalesced metric.

This is weird because one would expect a slightly higher %ge reporting such things from the _coalesced_ metric as though it reports fewer events, it reports longer ones.

A hypothesis was hatched that amounts to: "Maybe on shutdown when the browser is busy, we have some stray laggy input events. The old measure records them as usual, but there is no moment's peace for the coalesced measure to record its information"

So, to fix this (or at least rule this out):
1) Don't record INPUT_EVENT_RESPONSE-family probes if mIsDestroying
2) Flush the currently-pending coalesced event on PressShell::Destroy()
Actually, tying it to the PresShell's destruction isn't right. Need to tie it to process shutdown.
Further analysis suggests that this isn't predominantly coming from shutdown or aborted-session pings. It's coming from roughly the right proportion of daily and environment-change pings as well.
gist analysis for reference: https://gist.github.com/bsmedberg/9b5fd467e0410bbd30b8f1a3ce977a85

masayuki, I could also use your advice/review on this problem: I'm having trouble trusting the data enough to proceed with using this for quantum release criteria for responsiveness because of the large disparity in affected users.
Flags: needinfo?(masayuki)
Another possibility that came to mind: maybe we're discarding these input events for good reasons because the coalesced event happened across a sleep boundary, even though an individual event didn't?

How quickly can we test/rule-out/prove hypotheses here? Can we cheaply measure how often various scenarios happen?
For sleep/wake we disregard events that were created before the computer wakes up. I can't think of a situation that would result in it causing a difference between events being reported. For instance, for the following timeline:

event1_start
event2_start
event1_end
sleep
wake
event2_end
event3_start
event3_end

INPUT_EVENT_RESPONSE_MS gives us (event1_end - event1_start) and (event3_end - event3_start)
INPUT_EVENT_RESPONSE_COALESCED_MS gives us (event1_end - event1_start) and is pending (event3_end - event3_start)

event2 is disregarded because by the time we think about it (event2_end) its mTimeStamp (event2_start) is before mLastOSWake (wake).
I have some thoughts:

1. INPUT_EVENT_RESPONSE_MS is recorded of duration between current event handling time and its creation time. However, INPUT_EVENT_RESPONSE_COALESCED_MS is recorded of duration between previous event handling time and previous (or older) event creation time. So, last event's INPUT_EVENT_RESPONSE_COALESCED_MS won't be recorded. However, this difference is caused only when the last PresShell is destroyed because INPUT_EVENT_RESPONSE_COALESCED_MS is recorded with static variables. So, I guess that this is NOT a problem.

2. If widget creates some events with actual creation time of the native event but the others with current time, following scenario is possible:

event1 created (with actual creation time)
event1 handled (sLastInputProcessed and sLastInputCreated are updated)
event2 created (with current time, newer than event1)
event2 handled (event1 recorded, sLastInputProcessed and sLastInputCreated are updated)
event3 created (with actual time, OLDER than event2)
event3 handled (event2 NOT recorded, ONLY sLastInputProcessed is updated)
event4 created (with actual time, which is newer than event3)
event4 handled (event2-3 are recorded, the duration MAY be shorter than expected)

So, sLastInputCreated should be updated if coming event is older than it.

3. If an event creates nested event with same timestamp (e.g., click for mouseup)

event1 created (with actual creation time)
  event2 created (with event1's timestamp)
  event2 handled (sLastInputProcessed is updated, sLastInputCreated is event1's timestamp)
event1 handled (event2 NOT recorded, sLastInputProcessed is updated, sLastInputCreated is not updated)

So, event2's short handling time is not recorded, this is different point from INPUT_EVENT_RESPONSE_MS.

4. If an event creates nested event with current time

event1 created (with actual creation time)
  event2 created (with current time)
  event2 handled (sLastInputProcessed and sLastInputCreated are updated)
event1 handled (event2 NOT recorded, ONLY sLastInputProcessed is updated)
event3 created (with actual creation time)
event3 handled (event1 recorded as the duration between event2 creation and event1 handled time)

So, this is similar to #2.
Flags: needinfo?(masayuki)
So comprehensive, thank you! 

(In reply to Masayuki Nakano [:masayuki] (JST, +0900) (PTO: 5/3-5/5) from comment #6)
> 2. If widget creates some events with actual creation time of the native
> event but the others with current time, following scenario is possible:
> ...
> So, sLastInputCreated should be updated if coming event is older than it.

Sounds like a plan.
 
> 3. If an event creates nested event with same timestamp (e.g., click for
> mouseup)
> 
> event1 created (with actual creation time)
>   event2 created (with event1's timestamp)
>   event2 handled (sLastInputProcessed is updated, sLastInputCreated is
> event1's timestamp)
> event1 handled (event2 NOT recorded, sLastInputProcessed is updated,
> sLastInputCreated is not updated)
> 
> So, event2's short handling time is not recorded, this is different point
> from INPUT_EVENT_RESPONSE_MS.

This should be taken care of if we wrap the whole thing in `if (aIsHandlingNativeEvent)`, correct? (bug 1359502)
 
> 4. If an event creates nested event with current time
> 
> event1 created (with actual creation time)
>   event2 created (with current time)
>   event2 handled (sLastInputProcessed and sLastInputCreated are updated)
> event1 handled (event2 NOT recorded, ONLY sLastInputProcessed is updated)
> event3 created (with actual creation time)
> event3 handled (event1 recorded as the duration between event2 creation and
> event1 handled time)
> 
> So, this is similar to #2.

I guess the gist of this was that I was too trusting about aEvent->mTimeStamp being monotonically increasing.

How much of this would we avoid by recording only native events?
Flags: needinfo?(masayuki)
Assignee: nobody → chutten
(In reply to Chris H-C :chutten from comment #7)
> (In reply to Masayuki Nakano [:masayuki] (JST, +0900) (PTO: 5/3-5/5) from
> comment #6)
> > 3. If an event creates nested event with same timestamp (e.g., click for
> > mouseup)
> > 
> > event1 created (with actual creation time)
> >   event2 created (with event1's timestamp)
> >   event2 handled (sLastInputProcessed is updated, sLastInputCreated is
> > event1's timestamp)
> > event1 handled (event2 NOT recorded, sLastInputProcessed is updated,
> > sLastInputCreated is not updated)
> > 
> > So, event2's short handling time is not recorded, this is different point
> > from INPUT_EVENT_RESPONSE_MS.
> 
> This should be taken care of if we wrap the whole thing in `if
> (aIsHandlingNativeEvent)`, correct? (bug 1359502)

Yes.

> > 4. If an event creates nested event with current time
> > 
> > event1 created (with actual creation time)
> >   event2 created (with current time)
> >   event2 handled (sLastInputProcessed and sLastInputCreated are updated)
> > event1 handled (event2 NOT recorded, ONLY sLastInputProcessed is updated)
> > event3 created (with actual creation time)
> > event3 handled (event1 recorded as the duration between event2 creation and
> > event1 handled time)
> > 
> > So, this is similar to #2.
> 
> I guess the gist of this was that I was too trusting about
> aEvent->mTimeStamp being monotonically increasing.

Yeah, unfortunately not. For example, native event might be swapped at retrieving the queue since input event should have higher priority, some native event may not have timestamp (check "time" in this page, https://developer.gnome.org/gdk3/stable/gdk3-Event-Structures.html, e.g., GdkEventFocus doesn't have it).

> How much of this would we avoid by recording only native events?

Honestly, I'm not sure. But I think that using older timestamp for sLastInputCreated makes this bug go away.
Flags: needinfo?(masayuki)
Comment on attachment 8865893 [details]
bug 1362404 - Handle out-of-order overlapping event coalescing.

https://reviewboard.mozilla.org/r/137482/#review140946
Attachment #8865893 - Flags: review?(masayuki) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/07819b6e08bf
Handle out-of-order overlapping event coalescing. r=masayuki
https://hg.mozilla.org/mozilla-central/rev/07819b6e08bf
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.