Open Bug 1303296 Opened 4 years ago Updated 1 year ago

Implement Expected Queueing Time (EQT) for Telemetry/Profiler

Categories

(Core :: DOM: Events, defect, P3)

defect

Tracking

()

Tracking Status
firefox53 --- affected

People

(Reporter: Dominik, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [qf:p3])

Attachments

(1 file)

In the context of Progressive Web Metrics, Estimated Input Latency (EIL) quantifies the risk to input responsiveness for any time window after the page first becomes interactive.

EIL is based on the Expected Queueing Time (EQT), which estimates how long the main thread would take to respond to user input, and the probability that new input will require a responsive main thread.

Chromium specs can be found here: https://docs.google.com/document/d/1b9slyaB9yho91YTOkAQfpCdULFkZM9LqsipcX3t7He8/edit#
Can you start a discussion on dev-platform, Dominik?
Flags: needinfo?(dstrohmeier)
Priority: -- → P3
Andrew, yes, we are late with this.

Email will go out to dev-platform at latest on Monday hopefully starting a discussion on the topic of Progressive Web Metrics, in general, and the individual components what we see critical to be added to Gecko.
Flags: needinfo?(dstrohmeier)
:billm mentioned that the DOM team started to land probes. This would fit right into this task set and we need to start baselining.
Flags: needinfo?(overholt)
Assignee: nobody → afarre
Flags: needinfo?(overholt)
I was thinking we could use some other measurements for this but now I'm not so sure. Let me work on this some more.
Assignee: afarre → nobody
Clarified the minimal scope for this bug, which is only telemetry and profiler. Developer facing numbers depend on validation and spec stabilizing.
Keywords: dev-doc-needed
Summary: Implement Estimated Input Latency (EIL) → Implement Estimated Input Latency (EIL) for Telemetry/Profiler
William's going to take a look here.
Assignee: nobody → wchen
Latest status: wchen and I had a short kick off meeting last week. He didn't see any glaring issues with the spec apart from minor chrome specific parts, so we might have a test implementation soon that we can then verify in real life scenarios.
We can take a first shot at this by actually doing the analysis on JSON profiles. This way we can statically infer the numbers for any profile and then see if the in-engine analysis came to the same conclusion.
This will probably come in handy for Bug 1305153, especially measuring latency occurring due to content JS.
Harald says the new Estimated Queuing Latency (EQL) metric has replaced Estimated Input Latency (EIL).
Summary: Implement Estimated Input Latency (EIL) for Telemetry/Profiler → Implement Estimated Queuing Latency (EQL) for Telemetry/Profiler
William, are you still working on this EQL probe for Hasal performance testing?
Flags: needinfo?(wchen)
Summary: Implement Estimated Queuing Latency (EQL) for Telemetry/Profiler → Implement Expected Queueing Time (EQT) for Telemetry/Profiler
:wchen, is this on your backlog to be on the trains for Q1.  Chrome is also planning this for Q1, so we can compare notes with them.
This is a core metric for the Quantum work.  What's the current status and is there someone owning this?
I'm working on this and I'll have patches for this up soon.
Flags: needinfo?(wchen)
Can I help by arranging a meeting to talk about any details that are unclear and might be blocking you?
Flags: needinfo?(wchen)
Can we have a strawman implementation by end of Q1 that allows us to assess how useful this metric is for tracking performance?
Flags: needinfo?(overholt)
Whiteboard: [qf:p3]
billm: I see that you recently reviewed some code related to queuing events. Can you review this patch? 

This patch implements expected queuing as described in the EQT document [1] and roughly implemented as described in second EQT document[2]. The way we calculate the EQT in Gecko isn't quite the same as described in the document because we queue input events in a different way and can't make the same assumptions as Chrome. Here are the things that happen in this patch:

- Create a circular buffer for the main thread that records event-related actions in the event queue that handles input events. Actions are pushing and popping of nested event queues, an event being queue, an event that has started running and when an event has stopped running.
- When we want to compute the EQT, we use the actions in the buffer to create time slices that are used to determine when a hypothetical input event would run if it were to arrive during the time slice.
- For now I have only enabled this for non-release/beta since it touches the event queue and I don't want to risk performance issues unless we actually need more data collected from release/beta.

[1] https://docs.google.com/document/d/1b9slyaB9yho91YTOkAQfpCdULFkZM9LqsipcX3t7He8
[2] https://docs.google.com/document/d/13ck0H1It8YChVAsgsAsZnZRtc3JQkZ-QRBSl944X6nA
Flags: needinfo?(wchen)
Attachment #8848367 - Flags: review?(wmccloskey)
EQT is an instantaneous measurement, at what time do you want to sample it for telemetry?
Flags: needinfo?(hkirschner)
We need it for bug 1299118, TTI would be the main consumer for now.

With the profiler running we could mark times when EQT is bad, but those times could be also deferred from the information in the profile.
Flags: needinfo?(hkirschner)
Comment on attachment 8848367 [details] [diff] [review]
Implement expected queuing time metric.

Review of attachment 8848367 [details] [diff] [review]:
-----------------------------------------------------------------

I can look this over, but changes to nsThread that are this significant should be reviewed by an XPCOM peer.
Attachment #8848367 - Flags: review?(wmccloskey) → review?(nfroyd)
Comment on attachment 8848367 [details] [diff] [review]
Implement expected queuing time metric.

Review of attachment 8848367 [details] [diff] [review]:
-----------------------------------------------------------------

Um.  Lots of comments here.  The actual nsThread integration seems reasonable enough, but I'd like some context on when exactly we're calling into all this machinery.

::: xpcom/threads/nsThread.cpp
@@ +665,5 @@
>    }
>  #endif
>  }
>  
> +#define EVENT_ACTION_BUFFER_LENGTH 1000

Any rationale for why this is 1000 rather than 60 or 500 or 3000?  Please document why this number and not some other one.

Probably better to make this:

static const kEventActionBufferLength = ...;

because variables are better than macros where possible.

Since we're using this for a circular buffer that gets touched on every event, it should be a power-of-two so `%` is as cheap as we can make it.

@@ +749,5 @@
>      }
> +#ifndef RELEASE_OR_BETA
> +    void* eventPtr = event.get();
> +#endif
> +    bool putInNormalQueue = false;

This variable reads better as `placedInNormalQueue`.  See comments on the header file part of the patch.

@@ +1627,5 @@
> +  TimeStamp mStart;
> +  TimeStamp mEnd;
> +  // Pointer is only used to identify a event and should never
> +  // be dereferenced.
> +  void* mRunsAfter;

Maybe we should use mozilla::Opaque<uintptr_t> here, just to make sure nobody does bad things with the pointer?  We can initialize it to the obvious value in Init().

I'm not sure if doing so makes things clearer or not; asking for your feedback here.

@@ +1637,5 @@
> +};
> +} // anonymous namespace
> +
> +double
> +nsThread::GetExpectedQueueTime()

How often is this function getting called?

@@ +1645,5 @@
> +  TimeStamp now = TimeStamp::Now();
> +
> +  // Create a local copy of the event action buffer so
> +  // that we don't have to keep holding the lock.
> +  nsTArray<TimedEventAction> actions(EVENT_ACTION_BUFFER_LENGTH);

Presumably this should be an nsAutoTArray, so we don't have to heap allocate in the common case?

@@ +1659,5 @@
> +  uint32_t startIndex = actionIndex;
> +  uint32_t endIndex = actionIndex;
> +  while (actions[startIndex].mTimeStamp.IsNull() ||
> +         (now - actions[startIndex].mTimeStamp).ToMilliseconds() > 1000) {
> +    startIndex = (startIndex + 1) % EVENT_ACTION_BUFFER_LENGTH;

Is this really correct in the case that `actions` is shorter than EVENT_ACTION_BUFFER_LENGTH?

Also, actions[] accesses are not going to be valid if `actions` is shorter than EVENT_ACTION_BUFFER_LENGTH on the first iteration of this loop and the ones below.

@@ +1666,5 @@
> +      break;
> +    }
> +  }
> +
> +  nsTArray<QueuingDuration> queuingDurations;

nsAutoTArray, same rationale as previously?

@@ +1675,5 @@
> +  // A nullptr means that there is currently no queued task,
> +  // thus any arriving input events do not have to wait in
> +  // the queue.
> +  nsTArray<void*> lastQueuedTask;
> +  lastQueuedTask.AppendElement(nullptr);

nsAutoTArray, same rationale as previously?

@@ +1680,5 @@
> +
> +  // Create queuing durations for the recorded actions.
> +  for (uint32_t currentIndex = startIndex;
> +       currentIndex != endIndex;
> +       currentIndex = (currentIndex + 1) % EVENT_ACTION_BUFFER_LENGTH) {

Same question here as when we were iterating to determine `startIndex`.

@@ +1681,5 @@
> +  // Create queuing durations for the recorded actions.
> +  for (uint32_t currentIndex = startIndex;
> +       currentIndex != endIndex;
> +       currentIndex = (currentIndex + 1) % EVENT_ACTION_BUFFER_LENGTH) {
> +    TimedEventAction currentAction = actions[currentIndex];

const TimedEventAction&, so we're not needlessly copying here?

@@ +1692,5 @@
> +
> +        // End the previous duration if it has not been marked as ended.
> +        if (!queuingDurations.IsEmpty() &&
> +            queuingDurations.LastElement().mEnd.IsNull()) {
> +          queuingDurations.LastElement().mEnd = currentAction.mTimeStamp;

We do this check-and-set three or four times in this function; I think a small helper is warranted.

Additionally, consider helper methods for calculating this loop and the subsequent one.  The comments do help make this a lot clearer than it would be otherwise (thank you!), but splitting things up a bit would also be helpful.

@@ +1782,5 @@
> +  uint32_t currentIndex = startIndex;
> +  // For each queuing duration, compute the queue time of a hypothetical
> +  // input event arriving during the duration.
> +  for (uint32_t i = 0; i < queuingDurations.Length(); i++) {
> +    QueuingDuration& duration = queuingDurations[i];

Use const?

@@ +1788,5 @@
> +    TimeStamp runTime = duration.mEnd;
> +    if (duration.mRunsAfter) {
> +      // Find the time at which |duration.mRunsAfter| is finished running.
> +      while (currentIndex != endIndex) {
> +        TimedEventAction& action = actions[currentIndex];

Use const?

@@ +1789,5 @@
> +    if (duration.mRunsAfter) {
> +      // Find the time at which |duration.mRunsAfter| is finished running.
> +      while (currentIndex != endIndex) {
> +        TimedEventAction& action = actions[currentIndex];
> +        currentIndex = (currentIndex + 1) % EVENT_ACTION_BUFFER_LENGTH;

Same concern as above when we're iterating `actions`.

::: xpcom/threads/nsThread.h
@@ +169,4 @@
>                    mozilla::MutexAutoLock& aProofOfLock);
>  
> +    void PutEvent(nsIRunnable* aEvent,
> +                  bool* aPutInNormalQueue,

I think we should call this `aPlacedInNormalQueue` to make its only-outparamness a little clearer.  With the current name, it sounds like it could be an in-out param.

Why are we not trying to account for EQT of the high priority queue, even as a separate metric?  Surely that is important, too?

@@ +177,4 @@
>      }
>  
>      void PutEvent(already_AddRefed<nsIRunnable> aEvent,
> +                  bool* aPutInNormalQueue,

Same comment on this function parameter as previously.

@@ +306,5 @@
> +    // Pointer is only used to identify a event and should
> +    // never be dereferenced.
> +    void* mEvent;
> +    EventAction mAction;
> +    mozilla::TimeStamp mTimeStamp;

Please put mAction after mTimeStamp so this might pack a little better on some platforms.
Attachment #8848367 - Flags: review?(nfroyd) → feedback+
I have a couple higher-level comments:

1. I think that, as much as possible, this code should be split into a separate data structure that just records the events and computes the EQT. That should go in a separate file from nsThread. And we'll need unit tests for that data structure. This code is complex enough that it should be tested in isolation (with GTests or something).

2. I think you may be misinterpreting what PushEventQueue/PopEventQueue are for. They're only called on worker threads for a specific purpose (that I've never really understood). There are nested event loops on the main thread and these Push/Pop functions are never called for them. If you want to detect nested event loops, you'll need to look at the recursion counter in nsThread.

3. I had a hard time following what the code was doing. There will need to be a high-level comment describing what's actually being computed as well as more comments in the code that does the EQT computation. I can kind of imagine how you would adapt Chromium's EQT measurement for our event loop (where input events are not prioritized). But you need to actually write out what you're measuring in detail.
Flags: needinfo?(overholt)
Do we care about this anymore?
Assignee: wchen → nobody
Flags: needinfo?(dstrohmeier)
(In reply to Andrew Overholt [:overholt] from comment #23)
> Do we care about this anymore?

Harald? ^

(Dominik is out on paternity leave.)
Flags: needinfo?(dstrohmeier) → needinfo?(hkirschner)
Not for 57. We might revisit Time to Interactive later.

It came up in the past as possible metric for event queue and idle times by the DOM team; is that still relevant?
Flags: needinfo?(hkirschner)
Keywords: perf
You need to log in before you can comment on or make changes to this bug.