Closed Bug 1305153 Opened 9 years ago Closed 7 years ago

Data collection for responsiveness improvements in DOM

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: nika, Assigned: farre)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

We want to do a bunch of data collection as part of our efforts to improve the responsiveness of DOM. These will help us focus on fixing problems which will make a big difference. These are my thoughts for how to handle the data collection for how much preemption of JS will help us: 1. When entering content JS, record start time and whether or not the document is in the background. 2. When posting a vsync or input task to the main thread, record the time when that task was posted. 3. When exiting content JS, record end time. If there are vsync or input tasks waiting, post a telemetry probe with how long they have been waiting. Record how long the execution of content JS took. These should be separated into two bins, one for foreground tabs, and one for background tabs. My thoughts for reflow are similar: 1. When entering reflow, record start time, whether the reflow was triggered by JS, and whether or not the document is in the background. 2. When posting a vsync or input task to the main thread, record the time when that task was posted. 3. When exiting reflow, record end time. If there are vsync or input tasks waiting, post a telemetry probe with how long they have been waiting. Record how long the reflow execution took. These should be separated into bins, based on whether the tab is in the background, and whether the reflow was triggered by JS. Ehsan, do you have thoughts as to how well these metric would work for our purposes?
Flags: needinfo?(ehsan)
Depends on: 1305154
Sorry for the late response, this somehow fell through the cracks in my queue. :( (In reply to Michael Layzell [:mystor] from comment #0) > These are my thoughts for how to handle the data collection for how much > preemption of JS will help us: > > 1. When entering content JS, record start time and whether or not the > document is in the background. > 2. When posting a vsync or input task to the main thread, record the time > when that task was posted. > 3. When exiting content JS, record end time. If there are vsync or input > tasks waiting, post a telemetry probe with how long they have been waiting. > Record how long the execution of content JS took. These should be separated > into two bins, one for foreground tabs, and one for background tabs. I'm assuming here you're thinking about questions such as "how much time we've spent running JS in the foreground/background when we would have liked to paint or process input events." For the paint case, you need to pay attention to whether the refresh driver is "active". That essentially covers the case where we want to paint (which effectively means we either have a dynamic update which can potentially affect painting or we're running an animation.) Please ping mstange on the correct definition of the refresh driver activeness, but there will be a central place where a refresh driver is marked as active. That will be the interesting place in the code base. For the user input case, as you noted, we need to look at the place where we receive an input event from the parent through IPC. In each one of the cases above, we need to tell whether we're running JS from the background or foreground tab. For the user input event case, since the input event is received on a background thread, you can check to see whether there's currently some JS that's running. For the refresh driver case, IIRC we determine whether the refresh driver must become active on the main thread. It's possible that we can actually ignore the refresh driver itself and just care about the incoming vsync event similar to the user input event case above, assuming that a vsync event is only dispatched when the corresponding refresh driver becomes active, but please double check that with mstange. If that's not the case then we need to find a suitable place. It's probably possible to implement all of this simply by checking whether there is a pending user input/vsync event when we're done running JS code, and if there is, record how much time we spent running the JS, whether it came from a foreground or background tab, and whatnot. The basic setup you've described here should work I think, but please double check the refresh driver case. > My thoughts for reflow are similar: > > 1. When entering reflow, record start time, whether the reflow was triggered > by JS, and whether or not the document is in the background. > 2. When posting a vsync or input task to the main thread, record the time > when that task was posted. > 3. When exiting reflow, record end time. If there are vsync or input tasks > waiting, post a telemetry probe with how long they have been waiting. Record > how long the reflow execution took. These should be separated into bins, > based on whether the tab is in the background, and whether the reflow was > triggered by JS. This is very similar (at least on the high level) with the above, so yes, a similar solution should work here as well.
Flags: needinfo?(ehsan)
I can't work on this right now. Once my plate clears I'll come back to it if someone else hasn't taken it over.
Assignee: michael → nobody
Priority: -- → P2
Assignee: nobody → afarre
Ehsan, thought that I'd throw up the telemetry version I made of Bill's content JS measurements. If you could just take a look to make sure that I'm not way off. And if it looks good, is there perhaps a point in limiting where we actually gather data? Currently I've just carried everything over from Bug 1296486, but maybe the top ten of https://bugzilla.mozilla.org/show_bug.cgi?id=1296486#c2 would be enough?
Attachment #8815245 - Flags: feedback?(ehsan)
Comment on attachment 8815245 [details] [diff] [review] 0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch Review of attachment 8815245 [details] [diff] [review]: ----------------------------------------------------------------- This looks pretty good! I have some minor comments below. One question about the telemetry notes. Do you know how they show up in the telemetry UI? ::: dom/telemetry/DataCollector.cpp @@ +26,5 @@ > + nsString aTaskInfo) > + : mTaskType(aTaskType) > + , mTaskInfo(aTaskInfo) > + , mStart(TimeStamp::Now()) > + , mDocumentActive(false) FWIW, instead of copying the ctor implementations like this, use the C++11 delegated constructors to implement the ones taking fewer args in terms of the ones taking more. @@ +106,5 @@ > +DataCollector::CommonInit(nsIDocument* aDocument) > +{ > + if (!NS_IsMainThread() || !XRE_IsContentProcess()) { > + return; > + } Not sure why this is useful. (The main thread check should probably be converted into an assertion. @@ +109,5 @@ > + return; > + } > + > + if (aDocument) { > + mDocumentActive = !aDocument->IsInBackgroundWindow(); If you make this class MOZ_RAII then you won't need to store this, you can compute it in the dtor. Hopefully the background-ness of the window doesn't change while we're running JS inside it. ::: dom/telemetry/DataCollector.h @@ +22,5 @@ > +namespace dom { > + > +class EventTarget; > + > +namespace telemetry { Please don't invent a mozilla::dom::telemetry namespace! mozilla::telemetry should be good here. @@ +24,5 @@ > +class EventTarget; > + > +namespace telemetry { > + > +class DataCollector I think we want to model this after the AutoTimer class. You probably don't want to take the id as a template argument, but it would be nice to allow consumers to decide whether they want to record the time in milli or microseconds. It may also be nice to accept an optional aStart argument. Also this could use a better name. How about AutoJSTaskTimer? @@ +29,5 @@ > +{ > +public: > + DataCollector(nsIDocument* aDocument, const char* aTaskType); > + DataCollector(nsIDocument* aDocument, const char* aTaskType, > + nsString aTaskInfo); Is there a good reason to use wide strings for the task info? I'd use narrow strings all the way through. ::: dom/telemetry/moz.build @@ +1,1 @@ > +# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*- I'd move this code to toolkit/components/telemetry. You should probably add the timer class directly to Telemetry.h.
Attachment #8815245 - Flags: feedback?(ehsan) → feedback+
(In reply to Andreas Farre [:farre] from comment #3) > And if it looks good, is there perhaps a point in limiting where we actually > gather data? Currently I've just carried everything over from Bug 1296486, > but maybe the top ten of > https://bugzilla.mozilla.org/show_bug.cgi?id=1296486#c2 would be enough? Hmm, it's hard to predict which ones are the big ones before we get this landed, I think. I wouldn't worry about adding too many instances here, we can always remove stuff if it becomes clear some of the call sites are negligible.
What does LastTickSkippedAnyPaints() measure? I had a hard time understanding the code.
Also, when bug 1318506 is done, most of the runnables from bug 1296486 will be labeled using the new infrastructure. You should be able to wrap the runnable passed to TabGroup::Dispatch in something that sets up a DataCollector. That way you won't have to manually instrument so many runnables.
(In reply to [PTO to Dec5] Bill McCloskey (:billm) from comment #6) > What does LastTickSkippedAnyPaints() measure? I had a hard time > understanding the code. LastTickSkippedAnyPaints() returns true when the currently active refresh driver timer skipped the last paint, and I use this as an indication of jank. I could get the current and previous jank levels instead, if that is more clear. That is also more correct, since LastTickSkippedAnyPaints() only notices if the last tick is skipped. Comparing changes between jank levels will notice all skipped ticks in between.
(In reply to [PTO to Dec5] Bill McCloskey (:billm) from comment #7) > Also, when bug 1318506 is done, most of the runnables from bug 1296486 will > be labeled using the new infrastructure. You should be able to wrap the > runnable passed to TabGroup::Dispatch in something that sets up a > DataCollector. That way you won't have to manually instrument so many > runnables. Right. Will adjust to use bug1318506 .
(In reply to :Ehsan Akhgari from comment #4) > Comment on attachment 8815245 [details] [diff] [review] > 0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch > > Review of attachment 8815245 [details] [diff] [review]: > ----------------------------------------------------------------- > > This looks pretty good! I have some minor comments below. > > One question about the telemetry notes. Do you know how they show up in the > telemetry UI? If you mean about:telemetry they show up under 'Keyed Histograms', and they're listed > ::: dom/telemetry/DataCollector.cpp > @@ +26,5 @@ > > + nsString aTaskInfo) > > + : mTaskType(aTaskType) > > + , mTaskInfo(aTaskInfo) > > + , mStart(TimeStamp::Now()) > > + , mDocumentActive(false) > > FWIW, instead of copying the ctor implementations like this, use the C++11 > delegated constructors to implement the ones taking fewer args in terms of > the ones taking more. Right. > @@ +106,5 @@ > > +DataCollector::CommonInit(nsIDocument* aDocument) > > +{ > > + if (!NS_IsMainThread() || !XRE_IsContentProcess()) { > > + return; > > + } > > Not sure why this is useful. > > (The main thread check should probably be converted into an assertion. That's true. > @@ +109,5 @@ > > + return; > > + } > > + > > + if (aDocument) { > > + mDocumentActive = !aDocument->IsInBackgroundWindow(); > > If you make this class MOZ_RAII then you won't need to store this, you can > compute it in the dtor. Hopefully the background-ness of the window doesn't > change while we're running JS inside it. Will do. > ::: dom/telemetry/DataCollector.h > @@ +22,5 @@ > > +namespace dom { > > + > > +class EventTarget; > > + > > +namespace telemetry { > > Please don't invent a mozilla::dom::telemetry namespace! mozilla::telemetry > should be good here. Right. > @@ +24,5 @@ > > +class EventTarget; > > + > > +namespace telemetry { > > + > > +class DataCollector > > I think we want to model this after the AutoTimer class. You probably don't > want to take the id as a template argument, but it would be nice to allow > consumers to decide whether they want to record the time in milli or > microseconds. It may also be nice to accept an optional aStart argument. > > Also this could use a better name. How about AutoJSTaskTimer? Sure, will make it so. > @@ +29,5 @@ > > +{ > > +public: > > + DataCollector(nsIDocument* aDocument, const char* aTaskType); > > + DataCollector(nsIDocument* aDocument, const char* aTaskType, > > + nsString aTaskInfo); > > Is there a good reason to use wide strings for the task info? I'd use > narrow strings all the way through. Not really, it is because the async event type is wide, so some conversion needs to happen, but then it might as well be when constructing the AutoJSTaskTimer. > ::: dom/telemetry/moz.build > @@ +1,1 @@ > > +# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*- > > I'd move this code to toolkit/components/telemetry. You should probably add > the timer class directly to Telemetry.h. Will do as well. Many thanks.
(In reply to Andreas Farre [:farre] from comment #10) > (In reply to :Ehsan Akhgari from comment #4) > > Comment on attachment 8815245 [details] [diff] [review] > > 0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch > > > > Review of attachment 8815245 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > This looks pretty good! I have some minor comments below. > > > > One question about the telemetry notes. Do you know how they show up in the > > telemetry UI? > > If you mean about:telemetry they show up under 'Keyed Histograms', and > they're listed > Brain apparently gave up mid-sentence. They're listed in a histogram with either id CONTENT_JS_FOREGROUND_TICK_DELAY_MS or CONTENT_JS_BACKGROUND_TICK_DELAY_MS, and a key based on 'aTaskType' and 'aTaskInfo'. So basically one historgram per ('aTaskType' x 'aTaskInfo'), This means potentially a lot of histograms, but I think that that is what we want.
Yes, that sounds good.
Depends on: 1322184
Blocks: 1325245
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Component: DOM → DOM: Core & HTML

Not needed anymore.

Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: