Closed Bug 1322184 Opened 3 years ago Closed 2 years ago

Measure time running content JS that ends up delaying a refresh driver tick

Categories

(Core :: DOM: Core & HTML, defect)

48 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: farre, Assigned: farre)

References

Details

Attachments

(1 file, 7 obsolete files)

No description provided.
Assignee: nobody → afarre
Blocks: 1305153
Comment on attachment 8825833 [details]
Bug 1322184 - Measure time spent in content JS that causes delay in paint.

https://reviewboard.mozilla.org/r/103888/#review105670

::: dom/base/TabGroup.cpp:270
(Diff revision 1)
> +      mozilla::DebugOnly<bool> result =
> +        nsRefreshDriver::GetJankLevels(preJankLevels);
> +      MOZ_ASSERT(result);
> +    }
> +
> +    Telemetry::AutoCancellableTimer<

Add a keyless measurement to get the total time.

::: dom/base/TabGroup.cpp:327
(Diff revision 1)
> +    if (nsCOMPtr<nsINamed> named = do_QueryInterface(runnable)) {
> +      named->SetName(aName);
> +    }
> +  }
> +
> +  if (Telemetry::CanRecordBase()) {

This should only be on for nightly and beta.
Comment on attachment 8825833 [details]
Bug 1322184 - Measure time spent in content JS that causes delay in paint.

https://reviewboard.mozilla.org/r/103888/#review105914

::: dom/base/TabGroup.cpp:291
(Diff revision 1)
> +    bool cancel = false;
> +    for (int32_t i = preJankLevels.length() - 1; i >= 2; --i) {
> +      if (preJankLevels[i] < postJankLevels[i]) {
> +        cancel = true;
> +        break;
> +      }
> +    }

This should of course be the other way around. If the jank levels before is less than the current jank levels we've obviously janked, and therefore we shouldn't cancel timers.
(In reply to Andreas Farre [:farre] from comment #3)
> Comment on attachment 8825833 [details]
> Bug 1322184 - Measure time spent in content JS that causes delay in paint.
> 
> https://reviewboard.mozilla.org/r/103888/#review105914
> 
> ::: dom/base/TabGroup.cpp:291
> (Diff revision 1)
> > +    bool cancel = false;
> > +    for (int32_t i = preJankLevels.length() - 1; i >= 2; --i) {
> > +      if (preJankLevels[i] < postJankLevels[i]) {
> > +        cancel = true;
> > +        break;
> > +      }
> > +    }
> 
> This should of course be the other way around. If the jank levels before is
> less than the current jank levels we've obviously janked, and therefore we
> shouldn't cancel timers.

And after fixing this I realize that this really doesn't work. Jank levels are updated when a tick starts, which won't happen when running runnables on the main thread in any case. Trying to work around this.
Comment on attachment 8825833 [details]
Bug 1322184 - Measure time spent in content JS that causes delay in paint.

https://reviewboard.mozilla.org/r/103888/#review105670

> Add a keyless measurement to get the total time.

Added a TOTAL key that tracks total time, which becomes basically the same thing, but without adding a new probe.
(In reply to Andreas Farre [:farre] from comment #4)
> And after fixing this I realize that this really doesn't work. Jank levels
> are updated when a tick starts, which won't happen when running runnables on
> the main thread in any case. Trying to work around this.

Quickly threw together a solution for this, but not really sure if it is a sane approach.

Basically just added a check if the current refresh driver timer has _any_ refresh driver that should've painted, but has been delayed. Matt, is this a sane way to check if there really should've been a paint?
Flags: needinfo?(matt.woodrow)
Comment on attachment 8825833 [details]
Bug 1322184 - Measure time spent in content JS that causes delay in paint.

Clearing the request for now to get it out of my queue.  Please reset the flag once it's ready!
Attachment #8825833 - Flags: review?(ehsan)
mWaitingForTransaction gets set when the refresh driver gets sufficiently ahead of the compositor (2 frames) to avoid over-production.

It gets set when we send a paint off to the compositor (via allocating a new id for the paint), and gets cleared again when we get a notification from the compositor that it's caught up sufficiently.

If we get another refresh driver tick during this waiting state, then we skip the paint and set mSkippedPaints = true. When the paint notification does finally come in, we paint immediately instead of waiting for the next tick (in an attempt to catch up again). This latter part probably should go away sometime.

That doesn't sound like what we want to measure for this bug though. I think we want to know how late the refresh driver tick happens, possibly by measuring the difference in the timer/vsync timestamp (when it was meant to fire) and the current time when we actually run Tick()?
Flags: needinfo?(matt.woodrow)
So, I threw out the refresh driver stuff and instead made VsyncParent and VsyncChild record that sending and receiving vsync ipc messages is happening and then use that in TabGroup to detect that we're running an event when there is a pending vsync. 

Actually checking if the event has delayed the vsync is a bit hand-wavey, since the vsync may have arrived at the very end the handled event, and I should probably add a mutex protected timestamp, but I wanted to ask if this seems like a good idea to begin with.

(See attached file for context)
Attachment #8825833 - Attachment is obsolete: true
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(ehsan)
I think one thing that this patch is missing is checking whether a vsync event is already pending when the runnable starts to run.  Perhaps we should record that in a separate probe since in theory if we are doing a good job at prioritizing vsync events, this number should be very low.

Also, do we want to have another probe which is basically the sum across foreground and background windows?

Other than this, I think this is mostly what we want!
Flags: needinfo?(ehsan)
Attachment #8831230 - Attachment is obsolete: true
Attachment #8832043 - Flags: review?(matt.woodrow)
Attachment #8832043 - Flags: review?(ehsan)
Attachment #8832043 - Flags: feedback?(benjamin)
Comment on attachment 8832043 [details] [diff] [review]
0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch

Review of attachment 8832043 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/ipc/VsyncChild.cpp
@@ +61,5 @@
>  {
>    MOZ_ASSERT(NS_IsMainThread());
>    MOZ_ASSERT(!mIsShutdown);
> +
> +  nsLayoutUtils::RecordReceiveVsyncMessage();

Don't VsyncChild and VsyncParent run in different processes?

It's a sub-protocol of PBackground, which is for communicating between the content process and the parent process.
Yep. I'm really starting to think that this isn't consistently measurable. The only point in time that we can know that we've janked is, as Matt says in comment #9, when we've gotten the vsync event and can compare the expected time with 'now'. 

So to be able to do this I think that I'd need keep a list of enter/exit times and label of content runnables, and _when_ the refresh driver detects a jank it can check against that list to see which runnables affected that jank. This approach seems a lot less attractive, though.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8832043 [details] [diff] [review]
0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch

I don't know why I wasn't paying attention to the vsync actors running in different processes, but Matt is right, this won't work.
Attachment #8832043 - Flags: review?(ehsan) → review-
Comment on attachment 8832043 [details] [diff] [review]
0001-Bug-1305153-Measure-time-spent-in-content-JS-that-ca.patch

I'll wait until you have the thing you want, but I'll add a few notes now:

The histogram descriptions really need more detail about *when* they are recorded. e.g. is a measurement recorded for CONTENT_JS_TICK_DELAY_MS every time the refresh driver is delayed? Or every time it fires at all (record 0 because there's no delay).

For the keyed metrics, you need to describe the key in detail. Since keyed is false by default, you don't need to include keyed: false in the others.
Attachment #8832043 - Flags: feedback?(benjamin) → feedback-
(In reply to Andreas Farre [:farre] from comment #14)
> Yep. I'm really starting to think that this isn't consistently measurable.
> The only point in time that we can know that we've janked is, as Matt says
> in comment #9, when we've gotten the vsync event and can compare the
> expected time with 'now'. 
> 
> So to be able to do this I think that I'd need keep a list of enter/exit
> times and label of content runnables, and _when_ the refresh driver detects
> a jank it can check against that list to see which runnables affected that
> jank. This approach seems a lot less attractive, though.

Another alternative would be to change VSyncChild to run on a helper thread in the content process, and then deliver notifications back to the main thread.

This would give us a place to increment the count (that isn't blocked on the main thread), and then we can decrement the count when RefreshDriverVsyncObserver gets the main-thread notification.

That's a little more involved though, you'd want to talk to :mchang if you want to go that way.
Probably the easiest way to do this is to iterate over the IPC queue to figure out if there are any pending vsync messages.
http://searchfox.org/mozilla-central/rev/d20e4431d0cf40311afa797868bc5c58c54790a2/ipc/glue/MessageChannel.cpp#1009
You can do this from where you were calling HasPendingVsyncMessage().

If there are a lot of messages, it could be slow. But you could abort after a few and assume there probably was a vsync. If we're blocking lots of IPC messages, it's probably just as bad as blocking a vsync anyway.
Attachment #8832043 - Flags: review?(matt.woodrow)
Started peeking at messages to determine if a vsync ipc message was pending or became pending.

Re-wrote to adapt to ValidatingDispatcher.
Attachment #8832043 - Attachment is obsolete: true
Attachment #8839567 - Flags: review?(wmccloskey)
Attachment #8839567 - Flags: review?(ehsan)
Comment on attachment 8839567 [details] [diff] [review]
0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch

Review of attachment 8839567 [details] [diff] [review]:
-----------------------------------------------------------------

You'll need data collection review from bsmedberg. Also, I don't really understand the purpose of all the histograms, but I'll defer to Ehsan on that since he wants to use them sooner.

Generally this looks nice. It feels a little strange to put this code in XPCOM. But given how much effort has gone into the bug, I don't want to ask for major changes. Especially since I can't think of a better place to put it.

::: dom/base/TabGroup.h
@@ -114,5 @@
>                     nsIDocShellTreeItem* aOriginalRequestor,
>                     nsIDocShellTreeItem** aFoundItem);
>  
> -  nsTArray<nsPIDOMWindowOuter*> GetTopLevelWindows();
> -

Did you mean to remove the blank line?

::: toolkit/components/telemetry/Histograms.json
@@ +254,5 @@
> +    "expires_in_version": "60",
> +    "keyed": true,
> +    "kind": "exponential",
> +    "high": 10000,
> +    "n_buckets": 50,

This is a lot of buckets for a keyed histogram where we expect to have ~500 keys. Can you use the histogram simulator to try to find a good bucketing scheme?
https://telemetry.mozilla.org/histogram-simulator/

::: xpcom/threads/Dispatcher.cpp
@@ +87,5 @@
> +{
> +public:
> +  explicit AutoCollectTelemetry(ValidatingDispatcher::Runnable* aRunnable
> +                                MOZ_GUARD_OBJECT_NOTIFIER_PARAM)
> +    : mMaxProccessedMessageCount(nsContentUtils::TelemetryMessageSearchLimit())

Making this a pref seems like overkill to me. I think a static const int declared at the top of the file would be fine.

@@ +108,5 @@
> +
> +  uint32_t mMaxProccessedMessageCount;
> +  bool mIsBackground;
> +  nsCString mKey;
> +  TimeStamp mThen;

mStart makes more sense to me.

@@ +110,5 @@
> +  bool mIsBackground;
> +  nsCString mKey;
> +  TimeStamp mThen;
> +  TimeStamp mNow;
> +  TimeStamp mPendingVsync;

Perhaps this could be a return value from CollectData rather than a field?

@@ +115,5 @@
> +  MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER
> +};
> +
> +void
> +AutoCollectTelemetry::CollectData()

Maybe this should be called GetVsyncTime or something?

@@ +117,5 @@
> +
> +void
> +AutoCollectTelemetry::CollectData()
> +{
> +  if (ipc::BackgroundChild::GetForCurrentThread()) {

Please return early if !GetForCurrentThread. That way you can remove one level of indentation.

@@ +151,5 @@
> +                  : Telemetry::CONTENT_JS_FOREGROUND_TICK_DELAY_MS;
> +
> +  CollectData();
> +
> +  if (mPendingVsync.IsNull() || mPendingVsync > mNow) {

If mMaxProcessedMessages == 0, we should record that in telemetry since there's a good chance that a vsync was in there somewhere.

@@ +166,5 @@
> +  if (mPendingVsync > mThen) {
> +    return;
> +  }
> +
> +  Telemetry::Accumulate(Telemetry::CONTENT_JS_KNOWN_TICK_DELAY_MS, duration);

I'm not really sure what this is for. How would we use it?

@@ +172,5 @@
> +  return;
> +}
> +
> +void
> +AutoCollectTelemetry::CollectTelemetryTelemetry()

Could this be folded into CollectTelemetry? Then you can change mNow to a local variable.

@@ +176,5 @@
> +AutoCollectTelemetry::CollectTelemetryTelemetry()
> +{
> +  Telemetry::Accumulate(
> +    Telemetry::CONTENT_JS_DISPATCHER_RUNNABLE_TELEMETRY_COLLECTION_MS,
> +    (TimeStamp::Now() - mNow).ToMilliseconds());

I'm a little unsure whether we should even collect this. If we do, I think the units should be microseconds. Taking even 1ms would be very surprising.

::: xpcom/threads/Dispatcher.h
@@ +54,5 @@
>    virtual dom::TabGroup* AsTabGroup() { return nullptr; }
>  
> +  // This method returns true if all members of the "group" are in a
> +  // "background" state.
> +  virtual bool IsBackground() const { return false; }

If you put IsBackground inside ValidatingDispatcher, then you don't need to define it for DocGroup.

Also, keep in mind that SystemGroup runnables will count as foreground because of this declaration.
Attachment #8839567 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #20)
> 
> If mMaxProcessedMessages == 0, we should record that in telemetry since
> there's a good chance that a vsync was in there somewhere.
> 

This is really a problem. We really want to know the extra amount of time a vsync stays pending because a TabGroup related runnable ran instead of the vsync event.

I've been trying to figure out if it would be possible to track this from when the vsync event arrives on the MessageChannel, but I'm not getting very far.

Not sure of how to proceed here.
(In reply to Andreas Farre [:farre] from comment #21)
> (In reply to Bill McCloskey (:billm) from comment #20)
> > 
> > If mMaxProcessedMessages == 0, we should record that in telemetry since
> > there's a good chance that a vsync was in there somewhere.
> > 
> 
> This is really a problem. We really want to know the extra amount of time a
> vsync stays pending because a TabGroup related runnable ran instead of the
> vsync event.
> 
> I've been trying to figure out if it would be possible to track this from
> when the vsync event arrives on the MessageChannel, but I'm not getting very
> far.
> 
> Not sure of how to proceed here.

My hope is that the case where |mMaxProcessedMessages == 0| is rare enough that it's okay that we don't know when the vsync arrived. Let's just record it in a separate counter and move on. If it turns out that it happens often, then we can do something different.
Comment on attachment 8839567 [details] [diff] [review]
0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch

Review of attachment 8839567 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the long delay on this.  :(

::: dom/base/TabGroup.cpp
@@ +258,5 @@
> +{
> +  MOZ_RELEASE_ASSERT(NS_IsMainThread());
> +
> +  for (nsPIDOMWindowOuter* outerWindow : GetTopLevelWindows()) {
> +    if (!outerWindow->IsBackground()) {

We should unify the logic of what the window getting background treatment is.  Right now TimeoutManager adds a check to see if the window is playing audio which makes this probe report the wrong result in such tabs.

I think it would be a good idea to expose a TreatAsBackground() method as a helper for returning true when the window is background and is not playing audio, and then you can use it both here and in TimeoutManager.

::: xpcom/threads/Dispatcher.cpp
@@ +120,5 @@
> +{
> +  if (ipc::BackgroundChild::GetForCurrentThread()) {
> +    ipc::BackgroundChild::GetForCurrentThread()
> +      ->GetIPCChannel()
> +      ->PeekMessages([this](const IPC::Message& aMsg) -> bool {

PeekMessages holds a lock: <http://searchfox.org/mozilla-central/rev/31b6089ce26fa76459642765115605d50a6c67b4/ipc/glue/MessageChannel.cpp#1012>   We should probably fix that before adding this call.

But if we go with my suggestion below, this code automatically goes away!

@@ +151,5 @@
> +                  : Telemetry::CONTENT_JS_FOREGROUND_TICK_DELAY_MS;
> +
> +  CollectData();
> +
> +  if (mPendingVsync.IsNull() || mPendingVsync > mNow) {

I thought about this quite a bit, and I think I disagree.  I spent a lot of time trying to think of a good argument...  :-)  But then I looked at the MessageChannel code, and I found this: <http://searchfox.org/mozilla-central/rev/31b6089ce26fa76459642765115605d50a6c67b4/ipc/glue/MessageChannel.cpp#894>

So we already do something custom for a message type upon message receipt.  Can we add another call here like MaybeRecordPendingVSync() and set a bool indicating we have a pending vsync, and clear it when the message is processed?  I really think it's worth it to play it a bit wild and loose with our abstractions in this case.
Attachment #8839567 - Flags: review?(ehsan) → review-
(In reply to Bill McCloskey (:billm) from comment #22)
> (In reply to Andreas Farre [:farre] from comment #21)
> > (In reply to Bill McCloskey (:billm) from comment #20)
> > > 
> > > If mMaxProcessedMessages == 0, we should record that in telemetry since
> > > there's a good chance that a vsync was in there somewhere.
> > > 
> > 
> > This is really a problem. We really want to know the extra amount of time a
> > vsync stays pending because a TabGroup related runnable ran instead of the
> > vsync event.
> > 
> > I've been trying to figure out if it would be possible to track this from
> > when the vsync event arrives on the MessageChannel, but I'm not getting very
> > far.
> > 
> > Not sure of how to proceed here.
> 
> My hope is that the case where |mMaxProcessedMessages == 0| is rare enough
> that it's okay that we don't know when the vsync arrived. Let's just record
> it in a separate counter and move on. If it turns out that it happens often,
> then we can do something different.

The basic problem is that we will be blind to whether this case is rare or not.  No matter if we decide to assume 0 means there is vsync or not, we may be estimating incorrectly, and we'll lack data to know whether that's the case or not.  We could gather separate telemetry about how frequently 0 is observed but that seems pretty backwards to me, it's a very indirect way of something we can just teach MessageChannel about.

Or alternatively we could have an API on MessageChannel to query whether a specific message type is in the queue or not to keep the code a bit more generic.  It's a bit more work but it may also waste some space on MessageChannel to hold a hashmap or something like that...  I'm open to any ideas on how to do this.
I guess we could add a virtual method to ITopLevelProtocol that would be called from the I/O thread when a message is received. We could implement that method on PBackground and save when the vsync was received. Then we would clear it when processing a vsync.

The method is a bit dangerous since it's called from such a weird context. But I think it's better than adding code into MessageChannel directly.

(For the record, I still think the PeekMessages approach is probably fine.)
How's this going, Andreas? Can I help with anything?
Flags: needinfo?(afarre)
Sorry, it got a bit stalled while I was doing other stuff. Thanks for offering to help. If you could have a look at how it is now it would be a big help.

I've fixed the review issues that were pending, and it now implements recording vsyncs using your suggestion via a virtual method on IToplevelProtocol.

I've adjusted the buckets and span of the histograms so that the focus is around a ~16ms.

There were some struggle to be able to atomically set when the vsync arrived, but if I offset everything from process creation then it works out.
Attachment #8839567 - Attachment is obsolete: true
Flags: needinfo?(afarre)
Attachment #8857500 - Flags: feedback?(wmccloskey)
Comment on attachment 8857500 [details] [diff] [review]
0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch

Review of attachment 8857500 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Please ask for data review from bsmedberg as well on the next patch.

::: dom/base/TabGroup.cpp
@@ +271,5 @@
> +  return true;
> +}
> +
> +
> +

Extra newlines here.

::: dom/base/TabGroup.h
@@ -114,5 @@
>                     nsIDocShellTreeItem* aOriginalRequestor,
>                     nsIDocShellTreeItem** aFoundItem);
>  
> -  nsTArray<nsPIDOMWindowOuter*> GetTopLevelWindows();
> -

Please don't remove the newline.

@@ +118,5 @@
>    // This method is always safe to call off the main thread. The nsIEventTarget
>    // can always be used off the main thread.
>    nsIEventTarget* EventTargetFor(TaskCategory aCategory) const override;
>  
> +  bool IsBackground() const override;

Please comment that a TabGroup returns true here if all of its top-level windows are in the background.

::: ipc/glue/BackgroundChildImpl.cpp
@@ +527,5 @@
>    return true;
>  }
>  
> +void
> +BackgroundChildImpl::OnChannelReceivedMessage(const Message& aMsg)

Please wrap the whole method in EARLY_BETA_OR_EARLIER, as well as the declaration in the header file.

::: toolkit/components/telemetry/Histograms.json
@@ +281,5 @@
> +    "kind": "exponential",
> +    "high": 100,
> +    "n_buckets": 10,
> +    "bug_numbers": [1305153, 1296486],
> +    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a unit of related browsing contexts. The time is measured from the vsync event's time stamp to when the runnable finishes execution."

I would propose using TabGroup instead of "unit of related browsing contexts". I think more Mozilla devs will understand that term at this point.

::: xpcom/threads/SchedulerGroup.cpp
@@ +17,5 @@
>  
> +#include "nsContentUtils.h"
> +#include "mozilla/layout/PVsync.h"
> +#include "mozilla/ipc/BackgroundChild.h"
> +#include "mozilla/ipc/PBackgroundChild.h"

Can you remove some of these #includes?

@@ +92,5 @@
> +    : mIsBackground(aRunnable->IsBackground())
> +  {
> +    MOZ_GUARD_OBJECT_NOTIFIER_INIT;
> +    aRunnable->GetName(mKey);
> +    mStart = TimeStamp::Now();

Can you put this stuff in an #ifdef like the destructor?

@@ +194,4 @@
>    }
>  }
>  
> +Atomic<uint32_t> sLastSeenVsync(0);

This is a global, so the prefix is "g" ("s" is for static members). Also, I would call it gLastReceivedVsync.

Also, it should be declared static so that it has file scope.

@@ +195,5 @@
>  }
>  
> +Atomic<uint32_t> sLastSeenVsync(0);
> +
> +/* static */  void

Extra space in the middle.

@@ +198,5 @@
> +
> +/* static */  void
> +SchedulerGroup::RecordVsyncTime(TimeStamp aTimeStamp)
> +{
> +  bool inconsistent = false;

Please add MOZ_ASSERT(!NS_IsMainThread()) so people know that it runs off the main thread.

@@ +214,5 @@
> +  sLastSeenVsync = 0;
> +}
> +
> +/* static */ TimeStamp
> +SchedulerGroup::GetVsyncTime()

Instead of this being a method, it could probably just be folded into the CollectTelemetry method. It's just accessing a global variable.

@@ +223,5 @@
> +  if (inconsistent || !lastSeenVsync) {
> +    return TimeStamp();
> +  }
> +
> +  return creation + TimeDuration::FromMilliseconds(lastSeenVsync);

I don't understand why you're adding and subtracting the process creation time. It seems like it cancels out. Are you just trying to make it small enough to fit in one word? I'm pretty sure that Atomic<uint64_t> works fine on all our tier-1 platforms.

@@ +359,5 @@
>  
> +  nsresult result;
> +
> +  {
> +    AutoCollectTelemetry telemetry(this);

Maybe rename this to AutoCollectVsyncTelemetry.

::: xpcom/threads/SchedulerGroup.h
@@ +38,4 @@
>    SchedulerGroup();
>  
>    NS_INLINE_DECL_PURE_VIRTUAL_REFCOUNTING
> +  // This method returns true if all members of the "group" are in a

Please put a newline before the comment.

@@ +83,5 @@
> +  static void RecordVsyncTime(TimeStamp aTimeStamp);
> +
> +  static void ClearVsyncTime();
> +
> +  static TimeStamp GetVsyncTime();

Can you comment what these methods are for? Also, it might be good to change the names:

RecordVsyncTime -> MarkVsyncReceived
ClearVsyncTime -> MarkVsyncRan
Attachment #8857500 - Flags: feedback?(wmccloskey) → feedback+
Flags: needinfo?(wmccloskey)
Fixed all issues except adding/subtracting creation time. At first I thought that this was required due to windows, but it turns out it is only XP that has sizeof(TimeStampValue) > sizeof(uint64_t). On the other hand, the alternative to adding and subtracting creation time is exposing TimeStamp::mValue, which is kind of icky.

One thing though, realized that there was a bug in the method now called MarkVsyncReceived. If we called MarkVsyncReceived several times without a call to MarkVsyncRan we would write the new value, which is wrong and bad. We should keep the oldest unhandled vsync when measuring how long we delay ticking.
Attachment #8857500 - Attachment is obsolete: true
Attachment #8861348 - Flags: review?(wmccloskey)
Attachment #8861348 - Flags: feedback?(benjamin)
Comment on attachment 8861348 [details] [diff] [review]
0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch

Review of attachment 8861348 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/SchedulerGroup.cpp
@@ +80,4 @@
>  
>  NS_DEFINE_STATIC_IID_ACCESSOR(SchedulerEventTarget, NS_DISPATCHEREVENTTARGET_IID)
>  
> +static Atomic<uint64_t> gLastSeenVsync(0);

Maybe call this gEarliestUnprocessedVsync?

@@ +221,5 @@
> +  if (inconsistent) {
> +    return;
> +  }
> +
> +  gLastSeenVsync = (TimeStamp::Now() - creation).ToMilliseconds();

Can we use ToMicroseconds instead (here and in the computation above)? I think the additional precision could be meaningful. We can still record in milliseconds, but I'd like to avoid weird rounding errors in the addition and subtraction with the process creation time.
Attachment #8861348 - Flags: review?(wmccloskey) → review+
gLastSeenVsync -> gEarliestUnprocessedVsync
ToMilliseconds -> ToMicroseconds but not when recording.
Attachment #8861348 - Attachment is obsolete: true
Attachment #8861348 - Flags: feedback?(benjamin)
Attachment #8862379 - Flags: review+
Attachment #8862379 - Flags: feedback?(benjamin)
Flags: needinfo?(wmccloskey)
Comment on attachment 8862379 [details] [diff] [review]
0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch

>diff --git a/toolkit/components/telemetry/Histograms.json b/toolkit/components/telemetry/Histograms.json

>+  "CONTENT_JS_TICK_DELAY_MS": {
>+    "alert_emails": ["farre@mozilla.com"],
>+    "expires_in_version": "65",

Please start out having all these expire in Firefox 61 (6 months/five releases from now). You will be able to renew as long as the data is still useful/being used.

>+    "kind": "exponential",
>+    "high": 100,
>+    "n_buckets": 10,
>+    "bug_numbers": [1305153, 1296486],
>+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup. The time is measured from the vsync event's time stamp to when the runnable finishes execution."
>+  },

Is this a union/superset of CONTENT_JS_FOREGROUND_TICK_DELAY_MS/TOTAL and CONTENT_JS_BACKGROUND_TICK_DELAY_MS? If so it's probably worth calling that out in the descriptions because it's a bit confusing to read through and see a bunch of similar histograms. If this is a strict union, you really don't need this since you could just aggregate the two others.

If there are multiple runnables that delay a single vsync, does this record one value or multiple? The name of the histogram implies that we'd record a single delay value per tick, but the description seems to say that we could record an unbounded number of values per tick depending on the number of pending runnables. It would be valuable to be more clear about how this behaves in the presence of multiple runnables.

Does this include both UI events (which often don't have nsRunnable, at least in the chrome process) as well as internally-dispatched runnables?

>+  "CONTENT_JS_FOREGROUND_TICK_DELAY_MS": {
...
>+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup that is (partially) in the foreground. The time is measured from the vsync event's time stamp to when the runnable finishes execution. The histogram is keyed by the label of the runnable, indicating which type of task the runnable is performing. The key 'TOTAL' is used to accumulate all foreground tick delays."
>+  },

I suggest that for ease of analysis this be broken up into two histograms. CONTENT_JS_FOREGROUND_TICK_DELAY_TOTAL_MS (non-keyed) for the total, and CONTENT_JS_FOREGROUND_TICK_DELAY_EVENTS_MS to break it down per-event. If you do both in one keyed histogram it takes a little more smarts to split it out during an analysis phase.

>+  "CONTENT_JS_KNOWN_TICK_DELAY_MS": {
>+    "alert_emails": ["farre@mozilla.com"],
>+    "expires_in_version": "65",
>+    "kind": "exponential",
>+    "high": 100,
>+    "n_buckets": 10,
>+    "bug_numbers": [1305153, 1296486],
>+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup when the vsync event's timestamp is before the starting time of the runnable. The time is measured from the vsync event's time stamp to when the runnable finishes execution."
>+  },

Is this a subset of CONTENT_JS_TICK_DELAY_MS based on the time an event is dispatched relative to the vsync? I don't understand what the word "known" means in this context (also read through the bug and don't see how this is going to be used).
Flags: needinfo?(afarre)
Attachment #8862379 - Flags: feedback?(benjamin) → feedback-
(In reply to Benjamin Smedberg [:bsmedberg] from comment #32)
> Comment on attachment 8862379 [details] [diff] [review]
> 0001-Bug-1322184-Measure-time-spent-in-content-JS-that-ca.patch
> 
> >diff --git a/toolkit/components/telemetry/Histograms.json b/toolkit/components/telemetry/Histograms.json
> 
> >+  "CONTENT_JS_TICK_DELAY_MS": {
> >+    "alert_emails": ["farre@mozilla.com"],
> >+    "expires_in_version": "65",
> 
> Please start out having all these expire in Firefox 61 (6 months/five
> releases from now). You will be able to renew as long as the data is still
> useful/being used.

Right, will make it so.

> >+    "kind": "exponential",
> >+    "high": 100,
> >+    "n_buckets": 10,
> >+    "bug_numbers": [1305153, 1296486],
> >+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup. The time is measured from the vsync event's time stamp to when the runnable finishes execution."
> >+  },
> 
> Is this a union/superset of CONTENT_JS_FOREGROUND_TICK_DELAY_MS/TOTAL and
> CONTENT_JS_BACKGROUND_TICK_DELAY_MS? If so it's probably worth calling that
> out in the descriptions because it's a bit confusing to read through and see
> a bunch of similar histograms. If this is a strict union, you really don't
> need this since you could just aggregate the two others.

Yes, it is the union of the two. I'll remove it (and add two separate non-keyed for fg and bg as discussed below). 

> If there are multiple runnables that delay a single vsync, does this record
> one value or multiple? The name of the histogram implies that we'd record a
> single delay value per tick, but the description seems to say that we could
> record an unbounded number of values per tick depending on the number of
> pending runnables. It would be valuable to be more clear about how this
> behaves in the presence of multiple runnables.

Right, we measure the amount of time a single runnable delays a tick. The same tick could be "delayed" by several runnables, and this would be recorded as several data points.

> Does this include both UI events (which often don't have nsRunnable, at
> least in the chrome process) as well as internally-dispatched runnables?

Only content events.

> >+  "CONTENT_JS_FOREGROUND_TICK_DELAY_MS": {
> ...
> >+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup that is (partially) in the foreground. The time is measured from the vsync event's time stamp to when the runnable finishes execution. The histogram is keyed by the label of the runnable, indicating which type of task the runnable is performing. The key 'TOTAL' is used to accumulate all foreground tick delays."
> >+  },
> 
> I suggest that for ease of analysis this be broken up into two histograms.
> CONTENT_JS_FOREGROUND_TICK_DELAY_TOTAL_MS (non-keyed) for the total, and
> CONTENT_JS_FOREGROUND_TICK_DELAY_EVENTS_MS to break it down per-event. If
> you do both in one keyed histogram it takes a little more smarts to split it
> out during an analysis phase.

Right, will do.

> >+  "CONTENT_JS_KNOWN_TICK_DELAY_MS": {
> >+    "alert_emails": ["farre@mozilla.com"],
> >+    "expires_in_version": "65",
> >+    "kind": "exponential",
> >+    "high": 100,
> >+    "n_buckets": 10,
> >+    "bug_numbers": [1305153, 1296486],
> >+    "description": "Time (in ms) that a pending vsync gets delayed by a runnable associated with a TabGroup when the vsync event's timestamp is before the starting time of the runnable. The time is measured from the vsync event's time stamp to when the runnable finishes execution."
> >+  },
> 
> Is this a subset of CONTENT_JS_TICK_DELAY_MS based on the time an event is
> dispatched relative to the vsync? I don't understand what the word "known"
> means in this context (also read through the bug and don't see how this is
> going to be used).

This is for the case when we know beforehand that we have a pending vsync, but we start a runnable anyway. And yes, this then is a subset of CONTENT_JS_TICK_DELAY_MS but with that extra piece of information.
Flags: needinfo?(afarre)
Updated with :bsmedberg's suggestions. Do you need to re-review these changes Bill?
Attachment #8862379 - Attachment is obsolete: true
Flags: needinfo?(wmccloskey)
Attachment #8863772 - Flags: feedback?
> > If there are multiple runnables that delay a single vsync, does this record
> > one value or multiple? The name of the histogram implies that we'd record a
> > single delay value per tick, but the description seems to say that we could
> > record an unbounded number of values per tick depending on the number of
> > pending runnables. It would be valuable to be more clear about how this
> > behaves in the presence of multiple runnables.
> 
> Right, we measure the amount of time a single runnable delays a tick. The
> same tick could be "delayed" by several runnables, and this would be
> recorded as several data points.

I'm worried about this from an analysis perspective. Imagine this sequence:

vsync 0 @ 0ms
event A start @ 8ms
event A finish @ 10ms <-- this event doesn't delay a vsync so it's not measured at all, right?
event B start @ 15ms
vsync 1 @ 17ms
event B finish @ 21ms <-- this records a delay of 4ms, right?
event C "known" start @ 21ms
event C end @ 28ms <-- this records a delay of which: 7ms or 11ms? It also records a "known" event delay of 7ms or 11ms?

If event C records a delay of 11ms, we're overcounting the time spent in event C. If event C records 7ms, that accumulates correctly, but the histogram description needs to describe this more clearly.
Flags: needinfo?(afarre)
Ok, I see your point. C would indeed count as 11. And it would overcount for not just the *_KNOWN_* histogram, but the others as well whenever we have a pending unhandled vsync but starts running an event instead. 

I wonder if this is a problem though. What we want is something that we can track over time, where the desired result is something that is becoming smaller as we progress. That is, regardless if the measurement of a "delay" of a vsync is entirely correct, the tracked delta in the histogram would be correct?
Flags: needinfo?(afarre)
Yeah, if you're ok with that oddity that's not a blocker for data review. It might be worth calling out explicitly somewhere, though.
Please go ahead and land. It doesn't need re-review from me.
Flags: needinfo?(wmccloskey)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #37)
> Yeah, if you're ok with that oddity that's not a blocker for data review. It
> might be worth calling out explicitly somewhere, though.

The probes' description says that they measure from the vsync timestamp, so I think that's already done.

I haven't classiefied the try run yet:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d371e449edbaf762d1a8781ace1c7a716cfe48cd

but it looking good superficially.
Yep, try looks good. 

Setting checkin-needed.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/08d888f39c15
Measure time spent in content JS that causes delay in paint. r=billm, data-r=bsmedberg
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/08d888f39c15
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1366156
Depends on: 1367458
Depends on: 1392868
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.