Closed Bug 1357457 Opened 3 years ago Closed 3 years ago

Create a variant of INPUT_EVENT_RESPONSE_MS which coalesces overlapping "hangs"

Categories

(Core :: Widget, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox54 --- verified
firefox55 --- verified

People

(Reporter: benjamin, Assigned: chutten)

References

(Blocks 1 open bug)

Details

(Whiteboard: tpi:+)

User Story

Given a stream of input events E: (E1->EN).

let lastStartTime be 0;
let lastEndTime be 0;

for event in E:
  processEvent(E)
  let thisStartTime = E.startTime
  let thisEndTime = now()

  if lastEndTime < thisStartTime:
    # events don't overlap. Record prior event telemetry if appropriate
    if lastEndTime != lastStartTime:
      recordInputTelemetry(lastEndTime - lastStartTime)
    lastStartTime = thisStartTime
    lastEndTime = thisEndTime
  else:
    # events do overlap. Alter the recorded end time but not start time
    # Don't record: wait for the next event
    lastEndTime = thisEndTime

Attachments

(1 file)

Given this situation:
* The user is typing.
* The chrome process starts a GC or other operation that ends up taking 5 seconds.
* The user keeps typing during that 5 seconds.

The current INPUT_EVENT_RESPONSE_MS will count each keypress during the GC as a separate input event. Which might be good in some cases, but for measuring user perception we'd prefer to count this as one giant hang.

I'm proposing another algorithm in the user story. I wrote it based on the belief that we can't know the contents of the current input queue. I'm open to alternate algorithm suggestions if this algorithm is impossible to implement.

This should continue to be recorded separately for chrome and content-directed events, just as INPUT_EVENT_RESPONSE_MS is currently.

Please consider this a quantum P1, since this may be used directly for quantum release criteria as well as an ongoing metric of release health.
Priority: -- → P2
Whiteboard: tpi:+
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

https://reviewboard.mozilla.org/r/132054/#review134866

data-r=me with the optout/expiration changes

::: layout/base/PresShell.cpp:8267
(Diff revision 1)
>      Telemetry::Accumulate(Telemetry::INPUT_EVENT_RESPONSE_MS, millis);
>      if (mDocument && mDocument->GetReadyStateEnum() != nsIDocument::READYSTATE_COMPLETE) {
>        Telemetry::Accumulate(Telemetry::LOAD_INPUT_EVENT_RESPONSE_MS, millis);
>      }
> +
> +    if (!mLastInputProcessed || mLastInputProcessed < aEvent->mTimeStamp) {

Is it worth documenting the algorithm in pseudocode, either here, or in the histogram description, or in the commit message?

::: toolkit/components/telemetry/Histograms.json:5675
(Diff revision 1)
>      "description": "Time (ms) from the Input event being created to the end of it being handled"
>    },
> +  "INPUT_EVENT_RESPONSE_COALESCED_MS": {
> +    "alert_emails": ["perf-telemetry-alerts@mozilla.com"],
> +    "bug_numbers": [1357457],
> +    "expires_in_version": "never",

To start, please make this opt-out but expires in 61. We'll likely make it 'never' but I want to do validation/testing first.
Attachment #8860033 - Flags: review+
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

https://reviewboard.mozilla.org/r/132054/#review134866

> Is it worth documenting the algorithm in pseudocode, either here, or in the histogram description, or in the commit message?

I think it's fine in the bug. It's a second-order source, true, but one that's very accessible and visible.
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

https://reviewboard.mozilla.org/r/132054/#review135164

If I misunderstand something, please explain about that and request review again.

::: layout/base/PresShell.h:914
(Diff revision 2)
>  
>    mozilla::TimeStamp        mLastOSWake;
> +
> +  mozilla::TimeStamp        mLastInputCreated;
> +  mozilla::TimeStamp        mLastInputProcessed;
> +

Remove this unnecessary empty line.

::: layout/base/PresShell.cpp:8256
(Diff revision 2)
>    if (Telemetry::CanRecordBase() &&
>        !aEvent->mTimeStamp.IsNull() &&
>        aEvent->mTimeStamp > mLastOSWake &&
>        aEvent->AsInputEvent()) {

Although, this may not be the scope of this bug, this if block should run only when aIsHandlingNativeEvent is true.

::: layout/base/PresShell.cpp:8267
(Diff revision 2)
>      Telemetry::Accumulate(Telemetry::INPUT_EVENT_RESPONSE_MS, millis);
>      if (mDocument && mDocument->GetReadyStateEnum() != nsIDocument::READYSTATE_COMPLETE) {
>        Telemetry::Accumulate(Telemetry::LOAD_INPUT_EVENT_RESPONSE_MS, millis);
>      }
> +
> +    if (!mLastInputProcessed || mLastInputProcessed < aEvent->mTimeStamp) {

FYI: WidgetEvent::mTimeStamp may become too old if it took to long to receive the event from OS even if OS-side's delay. Is it okay? (E.g., on Windows, mTimeStamp is computed with native event's timestamp, not the time we received it. http://searchfox.org/mozilla-central/rev/7aa21f3b531ddee90a353215bd86e97d6974e25b/widget/windows/nsWindow.cpp#4077,4098,4102,4104 )

::: layout/base/PresShell.cpp:8267
(Diff revision 2)
> +    if (!mLastInputProcessed || mLastInputProcessed < aEvent->mTimeStamp) {
> +      if (mLastInputProcessed) {

So, this means that the event is 2nd or later input event on the PresShell and native event is created after the last input event is handled by the PresShell.

I wonder, isn't this records big delay when user moves mouse cursor on a document and then, switch the tab to different one, finally, goes back to the first tab and moves mouse cursor again?

Is this intentional scenario?

I feel that you write this code with assuming that there is only one PresShell in the process. Don't you need to make mLastInputProcessed and mLastInputCreated static members?

::: layout/base/PresShell.cpp:8269
(Diff revision 2)
> +        // This input event was created after we handled the last one.
> +        // Accumulate the previous events' coalesced duration.
> +        double lastMillis = (mLastInputProcessed - mLastInputCreated).ToMilliseconds();
> +        Telemetry::Accumulate(Telemetry::INPUT_EVENT_RESPONSE_COALESCED_MS,
> +                              lastMillis);

Hmm, this adds the previous event's handling time, isn't it? I don't understand the meaning. Why isn't this recorded at previous event handling?
Attachment #8860033 - Flags: review?(masayuki) → review-
Blocks: 1359502
(In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #5)
> ::: layout/base/PresShell.cpp:8256
> (Diff revision 2)
> >    if (Telemetry::CanRecordBase() &&
> >        !aEvent->mTimeStamp.IsNull() &&
> >        aEvent->mTimeStamp > mLastOSWake &&
> >        aEvent->AsInputEvent()) {
> 
> Although, this may not be the scope of this bug, this if block should run
> only when aIsHandlingNativeEvent is true.

Interesting. I've filed bug 1359502.
 
> ::: layout/base/PresShell.cpp:8267
> (Diff revision 2)
> >      Telemetry::Accumulate(Telemetry::INPUT_EVENT_RESPONSE_MS, millis);
> >      if (mDocument && mDocument->GetReadyStateEnum() != nsIDocument::READYSTATE_COMPLETE) {
> >        Telemetry::Accumulate(Telemetry::LOAD_INPUT_EVENT_RESPONSE_MS, millis);
> >      }
> > +
> > +    if (!mLastInputProcessed || mLastInputProcessed < aEvent->mTimeStamp) {
> 
> FYI: WidgetEvent::mTimeStamp may become too old if it took to long to
> receive the event from OS even if OS-side's delay. Is it okay? (E.g., on
> Windows, mTimeStamp is computed with native event's timestamp, not the time
> we received it.
> http://searchfox.org/mozilla-central/rev/
> 7aa21f3b531ddee90a353215bd86e97d6974e25b/widget/windows/nsWindow.cpp#4077,
> 4098,4102,4104 )

Generally these inputs are handled within milliseconds, so if this "too old" is on the same order of magnitude of GTC's 49.7-day rollover, I think we're okay.

> ::: layout/base/PresShell.cpp:8267
> (Diff revision 2)
> > +    if (!mLastInputProcessed || mLastInputProcessed < aEvent->mTimeStamp) {
> > +      if (mLastInputProcessed) {
> 
> I feel that you write this code with assuming that there is only one
> PresShell in the process. Don't you need to make mLastInputProcessed and
> mLastInputCreated static members?

I did write under that assumption. This should've been obviously wrong, I apologize for not catching it.

> ::: layout/base/PresShell.cpp:8269
> (Diff revision 2)
> > +        // This input event was created after we handled the last one.
> > +        // Accumulate the previous events' coalesced duration.
> > +        double lastMillis = (mLastInputProcessed - mLastInputCreated).ToMilliseconds();
> > +        Telemetry::Accumulate(Telemetry::INPUT_EVENT_RESPONSE_COALESCED_MS,
> > +                              lastMillis);
> 
> Hmm, this adds the previous event's handling time, isn't it? I don't
> understand the meaning. Why isn't this recorded at previous event handling?

The idea is to record the time users spend waiting for input events to be processed. In the case that a user is waiting for, say, a 5-second GC and types many characters, we want a single 5-second accumulation on this probe.

Thus the algorithm above: we need to wait for a non-overlapping event to know when the previous batch of overlapping events can be coalesced. We still want the processing time in case there is a long processing time in addition to the delay.

---
I will prepare a multi-presshell-aware patch for review.
You patch works as:

OS                          PresShell
Event1
                            Event1 Processed
Event2
                            Event2 Processed, records (Event1 Processed - Event1)
Event3
Event4
                            Event3 Processed, records (Event2 Processed - Event2).
                            Event4 Processed, records nothing since Event4 is older than Event3 Processed.
Event5
Event6
                            Event5 Processed, records (Event4 Processed - Event3).
Event7
                            Event6 Processed, records nothing since Event6 is older than Event5 Processed.
Event8
                            Event7 Processed, records nothing since Event7 is older than Event6 Processed.
                            Event8 Processed, records nothing since Event8 is older than Event7 Processed.
Event9
                            Event9 Processed, records (Event8 Processed - Event5).
Event10
                            Event10 Processed, records (Event9 Processed - Event9).

Are all of them intentional result of you?

So, when user types too fast or system is busy, your patch records lot of events handling as a big delay.

One concern is, we cannot distinguish if we're just slow, user input it too fast, system is busy, etc.

The other concern is, not hanging up case is also recorded as long hanging up (Event6 - Event8).
Flags: needinfo?(chutten)
I believe this is intentional, as this measure is recording the durations the user is waiting for their Firefox to "catch up" to their input. Regardless of whether this is a user typing faster than we can manage or a Firefox instance under heavy load, there is something the user wanted us to do that we haven't handled yet.

From your excellent example, the user's been waiting since Event 6 for Firefox to respond in a timely fashion to their input. It isn't until Event 8 is handled that we return to responsive processing of user input.

This is my interpretation, at any rate. ni?bsmedberg to correct me if I have this wrong.
Flags: needinfo?(chutten) → needinfo?(benjamin)
Comment 8 describes the desired behavior here, yes. We want to record a single extended pause event all the while we haven't caught up to the user input.
Flags: needinfo?(benjamin)
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

https://reviewboard.mozilla.org/r/132054/#review137184

Okay, then, let's land this.
Attachment #8860033 - Flags: review?(masayuki) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f175826d9189
Report non-overlapping Input Responses to Telemetry. r=bsmedberg,masayuki data-r=bsmedberg
https://hg.mozilla.org/mozilla-central/rev/f175826d9189
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Blocks: 1360644
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: A key measurement for quantum release criteria will not be baselineable.
[Is this code covered by automated tests?]: Telemetry alerts and hindsight will notify us if something goes awry with data collection.
[Has the fix been verified in Nightly?]: I have verified that data is coming through as expected, and that behaviour is correct.
[Needs manual test from QE? If yes, steps to reproduce]:
1. load about:telemetry
2. Look at the INPUT_EVENT_RESPONSE_COALESCED_MS probe
3. Open the console.
4. Run `let start = Date.now(); while (Date.now() - start < 3000) { }; console.log('complete');`
5. While that busy loop is keeping the main thread busy, interact with the browser: type some things, click some things.. whatever you'd like.
6. After the busy loop completes, reload about:telemetry
.. Examine INPUT_EVENT_RESPONSE_COALESCED_MS - it should have only a single more 3k bucket count. Test fails if it has as many more count in that bucket as input events you performed while the busy loop was running.
[List of other uplifts needed for the feature/fix]: bug 1357742
[Is the change risky?]: Not really.
[Why is the change risky/not risky?]: It is very similar to already-running code and uses well-understood atoms of gecko code (timestamps) in straighforward ways.
[String changes made/needed]: None
Attachment #8860033 - Flags: approval-mozilla-beta?
Attachment #8860033 - Flags: approval-mozilla-aurora?
Hi Brindusa, could you help find someone to verify if this issue was fixed as expected on a latest Nightly build? Thanks!
Flags: qe-verify+
Flags: needinfo?(brindusa.tot)
Verified as fixed, with the steps from comment 14, in the latest Nightly build, 55.0a1 build ID 20170501030204 on Window 10x64, Ubuntu 16.04 x64 and Mac 10.10.
Status: RESOLVED → VERIFIED
Flags: needinfo?(brindusa.tot)
Comment on attachment 8860033 [details]
bug 1357457 - Report non-overlapping Input Responses to Telemetry.  data-r=bsmedberg

A key measurement for quantum release criteria. Beta54+. Should be in 54 beta 5.
Attachment #8860033 - Flags: approval-mozilla-beta?
Attachment #8860033 - Flags: approval-mozilla-beta+
Attachment #8860033 - Flags: approval-mozilla-aurora?
Attachment #8860033 - Flags: approval-mozilla-aurora-
Blocks: 1362404
I verified using steps from comment 14 that the behavior for telemetry data is as expected on Firefox 54 beta 10 across platforms (Windows 10 64bit, macOS 10.12.4 and Ubuntu 16.04 32bit).
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.