Add counters in DocGroup and WorkerPrivate to track their activities

RESOLVED FIXED in mozilla61

Status

()

enhancement
P2
normal
RESOLVED FIXED
2 years ago
3 months ago

People

(Reporter: tarek, Assigned: tarek)

Tracking

57 Branch
mozilla61
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [perf-tools])

Attachments

(1 attachment)

Add 2 counters in the DocGroup class :

- a static array that counts how many times a runnable is dispatched
- a long that counts how much time is spent in runnable labeled for that docgroup

Both of these values can be fetched and reset.

The data will be transmited to the parent via IPDL (next patch) so it can be used to display tab activity, or dump it in a log file for further analysis.
Assignee: nobody → tarek
Blocks: 1419679
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225140

This is roughly how I would imagine execution time would be collected, with the following comments.

I would also feel better if :froydnj or :erahm could sign off on this as well :)

::: dom/base/DocGroup.cpp:48
(Diff revision 2)
> +  mExecutionDuration = 0;
> +  mMetrics.Clear();
> +  for (size_t i=0; i<(size_t)TaskCategory::Count; i++) {
> +    mMetrics.AppendElement(0);
> +  }

Isn't this the same as ResetMetrics?

::: xpcom/threads/nsThread.cpp:991
(Diff revision 2)
>      *aResult = (event.get() != nullptr);
>  
>      if (event) {
>        LOG(("THRD(%p) running [%p]\n", this, event.get()));
> -
> +      if (mNestedEventLoopDepth > 1) {
> +        RecordExecutionTimeForGroup(event, TimeStamp::Now() - mStart);

What we're trying to achieve at this point is that we want to avoid including the time spent in a nested event loop when recording the execution time for an event.

And it is true that this always happens when mNestedEventLoopDepth is greater than 1, but strictly speaking need to do this when mNestedEventLoopDepth is also larger than mNestedEventLoopDepth was in the previous call to ProcessNextEvent, that is for example when we see mNestedEventLoopDepth go from 1 to 2, but not when mNestedEventLoopDepth stays at 2.

Also, the event supplied to RecordExecutionTimeForGroup at this point is not the event we wish to record time for. Here 'event' is the next event to be processed, and we want to record time on the event we just paused. I don't think it's safe to assume that the nested event loop will always run with the current scheduler group.

::: xpcom/threads/nsThread.cpp:1055
(Diff revision 2)
> +      mStart = mozilla::TimeStamp::Now();
>        event->Run();
> +      RecordExecutionTimeForGroup(event, TimeStamp::Now() - mStart);

I would really really like it if we didn't unconditionally collect this data, since it adds an overhead on every processed event.
Attachment #8950144 - Flags: review?(afarre) → review-
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225140

> What we're trying to achieve at this point is that we want to avoid including the time spent in a nested event loop when recording the execution time for an event.
> 
> And it is true that this always happens when mNestedEventLoopDepth is greater than 1, but strictly speaking need to do this when mNestedEventLoopDepth is also larger than mNestedEventLoopDepth was in the previous call to ProcessNextEvent, that is for example when we see mNestedEventLoopDepth go from 1 to 2, but not when mNestedEventLoopDepth stays at 2.
> 
> Also, the event supplied to RecordExecutionTimeForGroup at this point is not the event we wish to record time for. Here 'event' is the next event to be processed, and we want to record time on the event we just paused. I don't think it's safe to assume that the nested event loop will always run with the current scheduler group.

> but not when mNestedEventLoopDepth stays at 2.

How can that be possible since mNestedEventLoopDepth is always incremented when the function is called ?

The other issue is that the recursive calls are not linear, there can be a full tree as far as I understand.

It sounds like the simplest implementation is to keep track of the timer and parent 
from within the runnable itself.

e.g. in pseudo-code :


    nsThread::ProcessNextEvent() 
    {
      if (mCurrentEvent) {
        event->SetParent(mCurrentEvent);
        mCurrentEvent->PauseRecording();
      }
      
      mCurrentEvent = event; 
      event->StartRecording();
      event->Run();   
      event->FinishRecording();
      
      if (event.HasParent()) {
        event.GetParentEvent()->ResumeRecording();
      }  
    }



Or if we can't change/wrap the runnable class like that, maybe keep track of 
all timers in a flat static array, assuming we don't have a lot of imbricated 
calls. 

What do you think?

> I would really really like it if we didn't unconditionally collect this data, since it adds an overhead on every processed event.

Sounds like a good idea. A pref would work?
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225242

I don't think I have anything meaningful to add to the review, since I don't really know the code very well.
Attachment #8950144 - Flags: review?(josh)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225238

I have lots of questions.  What do you want to do with this data once it's being collected?

::: commit-message-a8e15:1
(Diff revision 4)
> +Bug 1437438 - Add counters in DocGroup to track its activity - r?jdm r?farre r?froydnj
> +
> +MozReview-Commit-ID: AvCLYScwkoi

Including a commit message describing what you are trying to accomplish would be most appreciated by reviewers and/or future readers. :)

::: dom/base/DocGroup.h:39
(Diff revision 4)
>  // more detail, a DocGroup is actually a collection of documents, and a
>  // TabGroup is a collection of DocGroups. A TabGroup typically will contain
>  // (through its DocGroups) the documents from one or more tabs related by
>  // window.opener. A DocGroup is a member of exactly one TabGroup.
>  
> +typedef AutoTArray<uint32_t, (size_t)TaskCategory::Count> DocGroupMetrics;

Since events can be dispatched into `DocGroup` from any thread, these counters need to be `Atomic<uint32_t>`.

::: dom/base/DocGroup.h:39
(Diff revision 4)
>  // more detail, a DocGroup is actually a collection of documents, and a
>  // TabGroup is a collection of DocGroups. A TabGroup typically will contain
>  // (through its DocGroups) the documents from one or more tabs related by
>  // window.opener. A DocGroup is a member of exactly one TabGroup.
>  
> +typedef AutoTArray<uint32_t, (size_t)TaskCategory::Count> DocGroupMetrics;

We don't have much use for the resizability of `AutoTArray` here, so this should just be `mozilla::Array` instead.

::: dom/base/DocGroup.h:59
(Diff revision 4)
>  
>    bool MatchesKey(const nsACString& aKey)
>    {
>      return aKey == mKey;
>    }
> +  DocGroupMetrics* GetMetrics()

WDYT about making this return `const DocGroupMetrics*`?  Or even `const DocGroupMetrics&`?

::: dom/base/DocGroup.h:63
(Diff revision 4)
>    }
> +  DocGroupMetrics* GetMetrics()
> +  {
> +    return &mMetrics;
> +  }
> +  void ResetMetrics()

What ensures this is only called on a single thread?

I see that it's only used to initialize `mMetrics` from the `DocGroup` constructor.  In that case, you might be better off ditching this method and just moving the initialization inline to `DocGroup::DocGroup` so somebody else doesn't wonder what ensures thread safety.

::: dom/base/DocGroup.h:72
(Diff revision 4)
> +  uint64_t GetExecutionDuration()
> +  {
> +    return mExecutionDuration;
> +  }

Where does this function get used?  I don't see it being used in the current patch.

::: dom/base/DocGroup.cpp:71
(Diff revision 4)
> +DocGroup::IncrementExecutionDuration(mozilla::TimeDuration aDuration)
> +{
> +  mExecutionDuration = mExecutionDuration + (uint64_t)aDuration.ToMicroseconds();
> +}

If this can only be called on the main thread--which is what it looks like is the intention here, even if it's not what the code actually does--we should have an assertion here about that.

If it can be called on multiple threads, I'm curious why that would be useful.

::: xpcom/threads/nsThread.h:183
(Diff revision 4)
>    bool mCanInvokeJS;
>  
>  #ifndef RELEASE_OR_BETA
>    mozilla::TimeStamp mNextIdleDeadline;
>  #endif
> +  mozilla::TimeStamp mStart;

This member needs some documentation: what's being measured here?  The point when the thread is started?  The point when the shutdown of the thread is started?  Clearly it's neither of these things, based on reading the rest of the patch, but it's hard to see that without reading the entirety of `nsThread.cpp`.

::: xpcom/threads/nsThread.cpp:990
(Diff revision 4)
> -
> +      if (mNestedEventLoopDepth > 1) {
> +        RecordExecutionTimeForGroup(event, TimeStamp::Now() - mStart);
> +      }

Why are we calling `RecordExecutionTimeForGroup` on the same event twice?

::: xpcom/threads/nsThread.cpp:998
(Diff revision 4)
>        Maybe<Telemetry::AutoTimer<Telemetry::MAIN_THREAD_RUNNABLE_MS>> timer;
>        Maybe<Telemetry::AutoTimer<Telemetry::IDLE_RUNNABLE_BUDGET_OVERUSE_MS>> idleTimer;

Why are neither of these timers suitable for the data you want to collect?

::: xpcom/threads/nsThread.cpp:1055
(Diff revision 4)
> +      mStart = mozilla::TimeStamp::Now();
>        event->Run();
> +      RecordExecutionTimeForGroup(event, TimeStamp::Now() - mStart);

Like Andreas said, this should be guarded by some kind of pref.  I think you also want to make sure that `RecordExecutionTimeForGroup` is also called on the main thread only?

On what branches do you want to collect this data?
Attachment #8950144 - Flags: review?(nfroyd)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225238

This patch was extracted from a bigger patch that uses these counters to track what tabs are doing over time. The data is sent over ipdl and then used on an about: page that displays tab activity. See a screenshot here https://ziade.org/energy.png 
we also want to have them dumped in logs to study them after the fact. 

The goal is to try to track down tabs that are active in the background and see if we can detect battery-draining pages.

I've built a prototype and I am now building cleaner patches out of the ugly monster I have here https://reviewboard.mozilla.org/r/215238/diff/4#index_header

This is the first logical patch that adds the counters in DocGroup, the second patch adds the ipdl protocol to collect the values in the parent process, the third patch is the code that uses the data.

Happy to meet if you want more info about the project.
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225238

> Why are neither of these timers suitable for the data you want to collect?

They collect info about the idling times, so not exactly what we want
Priority: -- → P2
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review226806


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/tests/gtest/TestThreadManager.cpp:197
(Diff revision 6)
> +
> +
> +class MockSchedulerGroup: public SchedulerGroup
> +{
> +public:
> +  MockSchedulerGroup(mozilla::dom::DocGroup* aDocGroup)

Error: Bad implicit conversion constructor for 'mockschedulergroup' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227158


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/tests/gtest/TestThreadManager.cpp:179
(Diff revision 7)
> +
> +
> +class MockSchedulerGroup: public SchedulerGroup
> +{
> +public:
> +  MockSchedulerGroup(mozilla::dom::DocGroup* aDocGroup)

Error: Bad implicit conversion constructor for 'mockschedulergroup' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227166


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/tests/gtest/TestThreadManager.cpp:179
(Diff revision 8)
> +
> +
> +class MockSchedulerGroup: public SchedulerGroup
> +{
> +public:
> +  MockSchedulerGroup(mozilla::dom::DocGroup* aDocGroup)

Error: Bad implicit conversion constructor for 'mockschedulergroup' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227240

I'm unclear on what this code is intended to do with nested event loops, and I'm unsure whether it handles those successfully.

::: dom/base/DocGroup.h:39
(Diff revision 6)
>  // more detail, a DocGroup is actually a collection of documents, and a
>  // TabGroup is a collection of DocGroups. A TabGroup typically will contain
>  // (through its DocGroups) the documents from one or more tabs related by
>  // window.opener. A DocGroup is a member of exactly one TabGroup.
>  
> +typedef mozilla::Array<Atomic<uint32_t>, (size_t)TaskCategory::Count> DocGroupMetrics;

Documentation on this type and/or the `mMetrics` member would be nice.

::: dom/base/DocGroup.h:63
(Diff revision 6)
> +  void ResetMetrics()
> +  {
> +    for (auto& cnt : mMetrics) {
> +      cnt = 0;
> +    }
> +    mExecutionDuration = 0;
> +  }

Like last time, why do we have a separate, publically-callable function for doing this when it's only called from the constructor?

::: dom/base/DocGroup.h:147
(Diff revision 6)
>    nsCString mKey;
>    RefPtr<TabGroup> mTabGroup;
>    nsTArray<nsIDocument*> mDocuments;
>    RefPtr<mozilla::dom::CustomElementReactionsStack> mReactionsStack;
>    nsTArray<RefPtr<HTMLSlotElement>> mSignalSlotList;
> +  uint64_t mExecutionDuration;

Execution duration of what?  In what units?

::: dom/base/DocGroup.cpp:73
(Diff revision 6)
>  }
>  
> +void
> +DocGroup::IncrementExecutionDuration(mozilla::TimeDuration aDuration)
> +{
> +  mExecutionDuration = mExecutionDuration + (uint64_t)aDuration.ToMicroseconds();

The previous review suggested either adding thread-safety assertions to this method or explaining why it could be usefully called from multiple threads (which would require `mExecutionDuration` to be `Atomic`).  Neither of those have been done, AFAICT.  Why not?

::: xpcom/tests/gtest/TestThreadManager.cpp:211
(Diff revision 6)
> +private:
> +  mozilla::dom::DocGroup* mDocGroup;
> +};
> +
> +
> +TEST(ThreadManager, CollectMetrics)

This is a good start (if a somewhat peculiar place to put the test), but you definitely want tests for nested event loops too.

::: xpcom/tests/gtest/TestThreadManager.cpp:217
(Diff revision 6)
> +{
> +  // This test makes sure that if the enable_scheduler_timing pref
> +  // is set, we collect in DocGroup the time spent in runnables
> +  // that are linked to that DocGroup.
> +  nsresult rv;
> +  Preferences::SetBool("dom.performance.enable_scheduler_timing", true);

Probably want to set this to `false` after the test?  Or perhaps the previous value?

::: xpcom/threads/nsThread.cpp:1003
(Diff revision 6)
> +      // The event starts to run, storing the timestamp
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {
> +        if (mNestedEventLoopDepth > mCurrentEventLoopDepth ) {
> +          // this is a recursive call, we want to send the time of
> +          // the parent even and then reset the timer
> +          mozilla::TimeDuration duration = TimeStamp::Now() - mCurrentEventStart;
> +          LOG(("[%d] Recording parent execution time for %p: %f", mNestedEventLoopDepth, mCurrentEvent.get(), duration.ToMilliseconds()));
> +          RecordExecutionTimeForGroup(mCurrentEvent, duration);
> +        }
> +     }

The comments here do not match what this block of code is actually doing.  We're not storing any kind of timestamp here, nor are we resetting any timers here.

What is the code really trying to do?

::: xpcom/threads/nsThread.cpp:1004
(Diff revision 6)
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {

Please reverse this check, so we don't try to check whether scheduler logging is enabled off the main thread.

::: xpcom/threads/nsThread.cpp:1008
(Diff revision 6)
> +      // The event starts to run, storing the timestamp
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {
> +        if (mNestedEventLoopDepth > mCurrentEventLoopDepth ) {
> +          // this is a recursive call, we want to send the time of
> +          // the parent even and then reset the timer

Nit: "the parent event"

::: xpcom/threads/nsThread.cpp:1011
(Diff revision 6)
> +        if (mNestedEventLoopDepth > mCurrentEventLoopDepth ) {
> +          // this is a recursive call, we want to send the time of
> +          // the parent even and then reset the timer
> +          mozilla::TimeDuration duration = TimeStamp::Now() - mCurrentEventStart;
> +          LOG(("[%d] Recording parent execution time for %p: %f", mNestedEventLoopDepth, mCurrentEvent.get(), duration.ToMilliseconds()));
> +          RecordExecutionTimeForGroup(mCurrentEvent, duration);

`mCurrentEvent` is supposed to be the event that started the nested event loop, correct?  How does that square with `mCurrentEvent` getting set on every `ProcessNextEvent` call (see below)?

Also, I'd suggest trying to pull the `DocGroup` out and storing that instead, so you don't have to look up the `DocGroup` every trip through the event loop.

::: xpcom/threads/nsThread.cpp:1015
(Diff revision 6)
> +          LOG(("[%d] Recording parent execution time for %p: %f", mNestedEventLoopDepth, mCurrentEvent.get(), duration.ToMilliseconds()));
> +          RecordExecutionTimeForGroup(mCurrentEvent, duration);
> +        }
> +     }
> +
>        Maybe<Telemetry::AutoTimer<Telemetry::MAIN_THREAD_RUNNABLE_MS>> timer;

You said that this timer collects information about idle times...that's not what the name or the code indicates...

::: xpcom/threads/nsThread.cpp:1075
(Diff revision 6)
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {

Likewise here for the order of the checks.

::: xpcom/threads/nsThread.cpp:1087
(Diff revision 6)
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {

Likewise here for the order of the checks.

::: xpcom/threads/nsThread.cpp:1089
(Diff revision 6)
> +
> +#ifndef RELEASE_OR_BETA
> +      // End of execution , we can send the duration for the group
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {
> +        mozilla::TimeDuration duration = TimeStamp::Now() - mCurrentEventStart;

If we've run a nested event loop, `mCurrentEventStart` will have been modified by `event->Run()`, won't it?

::: xpcom/threads/nsThread.cpp:1091
(Diff revision 6)
> +      // End of execution , we can send the duration for the group
> +      if (mozilla::dom::DOMPrefs::SchedulerLoggingEnabled()
> +          && MAIN_THREAD == mIsMainThread) {
> +        mozilla::TimeDuration duration = TimeStamp::Now() - mCurrentEventStart;
> +        LOG(("[%d] Recording execution time for %p: %f", mNestedEventLoopDepth, event.get(), duration.ToMilliseconds()));
> +        RecordExecutionTimeForGroup(event, duration);

`mCurrentEvent` should probably be nulled out here so we're not holding on to the runnable any longer than we should.

`mCurrentEventLoopDepth` should possibly be updated here as well?
Attachment #8950144 - Flags: review?(nfroyd)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227240

> Like last time, why do we have a separate, publically-callable function for doing this when it's only called from the constructor?

This function is going to be called via IPDL in ContentChild. I have not included the IPDL layer in the current patch because I assumed it was big enough.

Notice that the functions are called in the tests.

> You said that this timer collects information about idle times...that's not what the name or the code indicates...

From Telemetry doc:

1/ IDLE_RUNNABLE_BUDGET_OVERUSE_MS: The time a given runnable exceeds its budget as set in nsIRunnable::SetDeadline (in milliseconds). The key comes from the runnables nsINamed::name value.

2/ MAIN_THREAD_RUNNABLE_MS : The time a given main thread runnable took to run (in milliseconds). The key comes from the runnables nsINamed::name value.

From my understanding:

1/ is a specific case where we check that a runnable is not spending more time than a set deadline
2/ is counting how long each runnable takes but include the time spent in recursive calls

both are sent to telemetry.

The counter we're adding is counting how long each runnable takes for each docgroup without including recursive calls.
The values are stored inside each Docgroup so they can be used via IPDL for adding live information in about:energy 

I don't think we should change the existing MAIN_THREAD_RUNNABLE_MS metrics since we would provide a different value
and potentially create havoc in the existing histograms.
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review225238

> Where does this function get used?  I don't see it being used in the current patch.

see next comment on this
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227380


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/threads/nsThread.cpp:925
(Diff revision 10)
> +nsThread::GetRunnableDocGroup(nsIRunnable* aEvent)
> +{
> +  RefPtr<SchedulerGroup::Runnable> docRunnable = do_QueryObject(aEvent);
> +  if (docRunnable) {
> +    return docRunnable->DocGroup();
> +  } else {

Warning: Do not use 'else' after 'return' [clang-tidy: readability-else-after-return]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227406


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/threads/nsThread.cpp:925
(Diff revision 11)
> +nsThread::GetRunnableDocGroup(nsIRunnable* aEvent)
> +{
> +  RefPtr<SchedulerGroup::Runnable> docRunnable = do_QueryObject(aEvent);
> +  if (docRunnable) {
> +    return docRunnable->DocGroup();
> +  } else {

Warning: Do not use 'else' after 'return' [clang-tidy: readability-else-after-return]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review227532


Code analysis found 1 defect in this patch:
 - 1 defect found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: xpcom/threads/nsThread.cpp:925
(Diff revision 12)
> +nsThread::GetRunnableDocGroup(nsIRunnable* aEvent)
> +{
> +  RefPtr<SchedulerGroup::Runnable> docRunnable = do_QueryObject(aEvent);
> +  if (docRunnable) {
> +    return docRunnable->DocGroup();
> +  } else {

Warning: Do not use 'else' after 'return' [clang-tidy: readability-else-after-return]
Whiteboard: [perf-tools]
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review229808

::: xpcom/threads/nsThread.cpp:1007
(Diff revision 16)
>          HangMonitor::NotifyActivity();
>        }
>  
>  #ifndef RELEASE_OR_BETA
> +      if (PR_GetCurrentThread() == mThread
> +          && mSchedulerLoggingEnabled

Micro-optimization, but _if_ we manage to toggle logging using a member, then we should check that it's enabled before we check that PR_GetCurrentThread() == mThread to avoid a method call.

::: xpcom/threads/nsThread.cpp:1009
(Diff revision 16)
>  
>  #ifndef RELEASE_OR_BETA
> +      if (PR_GetCurrentThread() == mThread
> +          && mSchedulerLoggingEnabled
> +          && mNestedEventLoopDepth > mCurrentEventLoopDepth
> +          && mCurrentPerformanceCounter) {

Shouldn't mCurrentPerformanceCounter always be non-null here?

::: xpcom/threads/nsThread.cpp:1099
(Diff revision 16)
> +          mCurrentPerformanceCounter->IncrementExecutionDuration(duration.ToMicroseconds());
> +        }
> +        if (recursiveEvent) {
> +          // If we're in a recursive call, reset the timer,
> +          // so the parent get its remaining execution time right.
> +          mCurrentEventStart = mozilla::TimeStamp::Now();

Don't you need to reset the current performance counter here as well? Otherwise the remainder of an event that triggered a recursive set of calls won't be recorded.

If you add:
RefPtr<mozilla::PerformanceCounter> currentPerformanceCounter;

below line 1076 and modify line 1083 to

currentPerformanceCounter = mCurrentPerformanceCounter = GetPerformanceCounter(event);

and at this line do:

mCurrentPerformanceCounter = currentPerformanceCounter;
Attachment #8950144 - Flags: review?(afarre) → review-
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review230672

::: xpcom/tests/gtest/TestThreadMetrics.cpp:65
(Diff revision 26)
> +    PR_Sleep(PR_MillisecondsToInterval(mExecutionTime1));
> +    if (mRecursive) {
> +      // Dispatch another runnable so nsThread::ProcessNextEvent is called recursively
> +      nsCOMPtr<nsIThread> thread = do_GetMainThread();
> +      nsCOMPtr<nsIRunnable> runnable = new TimedRunnable(150, 0, false);
> +      thread->Dispatch(runnable, NS_DISPATCH_NORMAL);

Just dispatching an event will not cause ProcessNextEvent to be called recursively. What you want to do is dispatch an event and then call ProcessNextEvent!
Attachment #8950144 - Flags: review?(afarre) → review-
Blocks: 1439929
Summary: Add counters in DocGroup to track its activity → Add counters in DocGroup and WorkerPrivate to track their activities
Blocks: 1443443
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review231310

I haven't looked closely at the tests, but I have lots of questions about the rest of the code.

::: dom/workers/WorkerPrivate.h:578
(Diff revision 29)
> +  PerformanceCounter*
> +  GetPerformanceCounter();

`DocGroup` returns a strong reference for this function.  This code returns a weak reference.  And then the actual `WorkerThread` code returns a strong reference.  Why the disparity between the implementations?

::: xpcom/tests/gtest/TestThreadMetrics.cpp:87
(Diff revision 29)
> + */
> +class ThreadMetrics: public ::testing::Test
> +{
> +public:
> +  explicit ThreadMetrics() {
> +    mPrefKey = NS_LITERAL_CSTRING("dom.performance.enable_scheduler_timing");

Why does this need to be an `nsCString`, if all you use it for is `.get()`'ing a `const char*`?  Could this just be a `static const char[]` instead?

::: xpcom/threads/PerformanceCounter.h:13
(Diff revision 29)
> +typedef mozilla::Array<Atomic<uint32_t>, 100> DispatchCounter;
> +typedef mozilla::Array<Atomic<uint64_t>, 100> ExecutionDurationCounter;

Where is this `100` constant coming from?  Why are we not simply using the number of `TaskCategory` enums?

::: xpcom/threads/PerformanceCounter.h:17
(Diff revision 29)
> +#define DEFAULT_KEY (size_t)mozilla::TaskCategory::Other
> +typedef mozilla::Array<Atomic<uint32_t>, 100> DispatchCounter;
> +typedef mozilla::Array<Atomic<uint64_t>, 100> ExecutionDurationCounter;
> +
> +// PerformanceCounter is a class that can be used to keep track of
> +// runnable execution times and dispatches counts.

Nit: "dispatch counts", I think.  (What are those?  Why are they relevant?)

::: xpcom/threads/PerformanceCounter.h:19
(Diff revision 29)
> +typedef mozilla::Array<Atomic<uint64_t>, 100> ExecutionDurationCounter;
> +
> +// PerformanceCounter is a class that can be used to keep track of
> +// runnable execution times and dispatches counts.
> +//
> +// It's currently instanciated in DocGroup and WorkerPrivate in order

Nit: "instantiated"

::: xpcom/threads/PerformanceCounter.h:31
(Diff revision 29)
> +// PerformanceCounter attached to the runnnable via its DocGroup
> +// or WorkerPrivate and call IncrementExecutionDuration()
> +//
> +// Notice that the execution duration counting takes into account
> +// recursivity. If an event triggers a recursive call to
> +// nsThread::ProcessNextEVent, the counter will discard teh time

Nit: "the time"

::: xpcom/threads/PerformanceCounter.h:38
(Diff revision 29)
> +class PerformanceCounter
> +{
> +public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(PerformanceCounter)
> +
> +  explicit PerformanceCounter(const nsACString& aName=NS_LITERAL_CSTRING("<Unnamed>"));

Is there value in having `<Unnamed>` counters?  Since this is a new API, it seems preferable to make everybody go ahead and name their counters up front, rather than trying to track down what `<Unnamed>` counters matter when we discover that everybody is being lazy and not naming things.

::: xpcom/threads/PerformanceCounter.h:50
(Diff revision 29)
> +
> +  /**
> +   * This is called via nsThread::ProcessNextEvent to measure runnable
> +   * execution duration.
> +   *
> +   * Keys can be used to distinguish counts per categories

This doesn't tell me much.  What are the categories?  Why are we permitting so many of them?  How do I figure out what is a legal value to pass here?

Couldn't we pass some sort of enum instead?

::: xpcom/threads/PerformanceCounter.h:60
(Diff revision 29)
> +   * Returns a key/counter array of all dispatches.
> +   */
> +  const DispatchCounter& GetDispatchCounter();
> +
> +  /**
> +   * Returns the execution duration per key.

Same problem here with the key.

::: xpcom/threads/PerformanceCounter.h:77
(Diff revision 29)
> +  uint64_t GetTotalDispatchCount();
> +
> +  /**
> +   * Reset all counters and execution duration.
> +   */
> +  virtual void ResetPerformanceCounters();

`virtual`?  What is the benefit of subclassing `PerfomanceCounter`?  Why did the documentation not mention this?

If there's no benefit in subclassing, we might as just well not have any virtual functions at all and declare the class `final`.  It's also unusual that resetting the counters would be considered an overridable behavior, but not any of the counter accumulations, which leads me to believe that this is not what you meant to do. :)

::: xpcom/threads/PerformanceCounter.h:80
(Diff revision 29)
> +   * Reset all counters and execution duration.
> +   */
> +  virtual void ResetPerformanceCounters();
> +
> +private:
> +  virtual ~PerformanceCounter() {}

Virtual destructors for refcounted things are unnecessary.

::: xpcom/threads/PerformanceCounter.cpp:31
(Diff revision 29)
> +  #ifdef XP_MACOSX
> +  LOG(("Total dispatch %llu", uint64_t(mTotalDispatchCount)));
> +  #else
> +  LOG(("Total dispatch %lu", uint64_t(mTotalDispatchCount)));
> +  #endif

I think you can use the `PRIu64` format specifier like so:

`LOG(("Total dispatch %" PRIu64, uint64_t(mTotalDispatchCount)));`

and avoid the need for system-specific ifdefs.

::: xpcom/threads/PerformanceCounter.cpp:43
(Diff revision 29)
> +  #ifdef XP_MACOSX
> +  LOG(("Total duration %llu", uint64_t(mTotalExecutionDuration)));
> +  #else
> +  LOG(("Total duration %lu", uint64_t(mTotalExecutionDuration)));
> +  #endif

Same comment here about the logging.

::: xpcom/threads/nsThread.h:134
(Diff revision 29)
>    {
>      return mShutdownContext != nullptr;
>    }
>  
> +#ifndef RELEASE_OR_BETA
> +  virtual RefPtr<mozilla::PerformanceCounter> GetPerformanceCounter(nsIRunnable* aEvent);

Why does this need to return a strong reference?  Isn't somebody holding on to that for us?

::: xpcom/threads/nsThread.cpp:934
(Diff revision 29)
> +RefPtr<mozilla::PerformanceCounter>
> +nsThread::GetPerformanceCounter(nsIRunnable* aEvent)
> +{
> +  RefPtr<SchedulerGroup::Runnable> docRunnable = do_QueryObject(aEvent);
> +  if (docRunnable) {
> +    RefPtr<mozilla::dom::DocGroup> docGroup = docRunnable->DocGroup();

Do we need to hold a strong reference here?  As long as the `docRunnable` is alive, its `DocGroup` should be alive, right?

::: xpcom/threads/nsThread.cpp:1015
(Diff revision 29)
> +      if (GetSchedulerLoggingEnabled()
> +          && PR_GetCurrentThread() == mThread

This specific condition is repeated multiple times in this function, maybe it should be pulled out as a separate variable so we're not doing a bunch of unnecessary calls.

Actually, wait.  Under what conditions would `PR_GetCurrentThread()` *not* be equal to `mThread`?  We're running this code on the thread itself, so isn't this condition trivially true?

::: xpcom/threads/nsThread.cpp:1102
(Diff revision 29)
> +#ifndef RELEASE_OR_BETA
> +      // End of execution, we can send the duration for the group
> +      if (GetSchedulerLoggingEnabled() && PR_GetCurrentThread() == mThread) {
> +       if (recursiveEvent) {
> +          // If we're in a recursive call, reset the timer,
> +          // so the parent get its remaining execution time right.

Nit: "gets its".
Attachment #8950144 - Flags: review?(nfroyd)
No longer blocks: 1443443
No longer blocks: 1419679
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review231742

Some more questions.

::: xpcom/threads/PerformanceCounter.h:48
(Diff revision 31)
> +   * aCategory can be used to distinguish counts per TaskCategory
> +   * When not provided, defaults to TaskCategory::Other
> +   */
> +  void IncrementDispatchCounter(TaskCategory aCategory=TaskCategory::Other);

Is it really correct here to default to `TaskCategory::Other`?  `TaskCategory` is a concept for `DocGroup`s and doesn't apply for tasks dispatched on worker threads, for instance.

::: xpcom/threads/PerformanceCounter.h:60
(Diff revision 31)
> +  void IncrementExecutionDuration(uint32_t aMicroseconds,
> +                                  TaskCategory aCategory=TaskCategory::Other);

Same question here, only moreso: when we're dispatched, at least the `DocGroup` knows the `TaskCategory` and can count the number of dispatches appropriately.  But for this function, we don't even know the `TaskCategory`.  In fact we're not passing the second argument in any calls to this function in this patch, so everything is getting accumulated into `TaskCategory::Other`.

In effect, the patch is saying that we're dispatching to a variety of `TaskCategory`ies, but only accumulating time in the `TaskCategory::Other` category, which is surely going to skew some results...

I don't think it's correct to merely start passing a `TaskCategory` argument to `IncrementExecutionDuration`, though; I think we should consider a broader solution.

::: xpcom/threads/PerformanceCounter.cpp:76
(Diff revision 31)
> +  for (auto& cnt : mDispatchCounter) {
> +    cnt = 0;
> +  }
> +  for (auto& cnt : mExecutionDurationCounter) {
> +    cnt = 0;
> +  }

IIUC, when this function is called (even if it can only be called from a single thread--if it *can* only be called from a single thread, we should be asserting that here), we may wind up in a situation that the following happens:

1. A task is dispatched for a particular performance counter in a category C.
2. The dispatch counter for category C is zero'd.
3. The execution counter for category C is zero'd.
4. We then go and run the task dispatched in step 1.

So we can wind up with a non-zero execution time for a category (and indeed, the total duration) while having incorrect (even zero) dispatch counts.  Is this situation desirable?
Attachment #8950144 - Flags: review?(nfroyd)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review231742

> Is it really correct here to default to `TaskCategory::Other`?  `TaskCategory` is a concept for `DocGroup`s and doesn't apply for tasks dispatched on worker threads, for instance.

Since worker threads don't categorize the runnable they dispatch, I am making the assumption that we can default to TaskCategory::Other for now. When we get the counters in the main process we know when it's a worker and we can discard that value.

In the future, I am assuming that if we want to add a category for runnable dispatched by workers, we could extend the array to be a concatenation of TaskCategory+AnotherEmum.

> Same question here, only moreso: when we're dispatched, at least the `DocGroup` knows the `TaskCategory` and can count the number of dispatches appropriately.  But for this function, we don't even know the `TaskCategory`.  In fact we're not passing the second argument in any calls to this function in this patch, so everything is getting accumulated into `TaskCategory::Other`.
> 
> In effect, the patch is saying that we're dispatching to a variety of `TaskCategory`ies, but only accumulating time in the `TaskCategory::Other` category, which is surely going to skew some results...
> 
> I don't think it's correct to merely start passing a `TaskCategory` argument to `IncrementExecutionDuration`, though; I think we should consider a broader solution.

Do you have a solution in mind ? Should we have a new enum that is a super-set of TaskCategory + 1 entry for workers ?

> IIUC, when this function is called (even if it can only be called from a single thread--if it *can* only be called from a single thread, we should be asserting that here), we may wind up in a situation that the following happens:
> 
> 1. A task is dispatched for a particular performance counter in a category C.
> 2. The dispatch counter for category C is zero'd.
> 3. The execution counter for category C is zero'd.
> 4. We then go and run the task dispatched in step 1.
> 
> So we can wind up with a non-zero execution time for a category (and indeed, the total duration) while having incorrect (even zero) dispatch counts.  Is this situation desirable?

It's suboptimal but I don't think it's a big deal if we miss a couple of hits. The performance counters are studied over time to see how a tab of a worker is using the scheduler. The cost of making sure we are fully accurate would mean that we'd need a lock when we reset stuff, and as far as I know how to do it, it would slow down the event loop ? or maybe it's easy to do ?
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review231742

> Since worker threads don't categorize the runnable they dispatch, I am making the assumption that we can default to TaskCategory::Other for now. When we get the counters in the main process we know when it's a worker and we can discard that value.
> 
> In the future, I am assuming that if we want to add a category for runnable dispatched by workers, we could extend the array to be a concatenation of TaskCategory+AnotherEmum.

ok so after our IRC chat, I have implemented a DispatchCategory "fake enum" class. I am not sure if it's the best way to do but its works. Let me know what you think :)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review232724
Attachment #8950144 - Flags: review?(afarre) → review+
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review232866

A few more things!

::: xpcom/tests/gtest/TestThreadMetrics.cpp:109
(Diff revision 32)
> +  }
> +
> +  virtual void TearDown() {
> +    // let's reset the counters
> +    mCounter->ResetPerformanceCounters();
> +    ASSERT_EQ(mCounter->GetDispatchCounter()[mOther], 0u);

You should probably test all the counters here, just to make sure `ResetPerformanceCounters` is doing what you expect.

::: xpcom/tests/gtest/TestThreadMetrics.cpp:127
(Diff revision 32)
> +    runnable = new SchedulerGroup::Runnable(runnable.forget(),
> +                                            mSchedulerGroup, mDocGroup);
> +    return mDocGroup->Dispatch(TaskCategory::Other, runnable.forget());
> +  }
> +
> +  void ProcessAllEvent() {

Nit: let's call this `ProcessAllEvents`.

::: xpcom/tests/gtest/TestThreadMetrics.cpp:151
(Diff revision 32)
> +  // Let's look at the task category "other" counter
> +  ASSERT_EQ(mCounter->GetDispatchCounter()[mOther], 1u);

Maybe make sure that none of the other categories received dispatches?

::: xpcom/tests/gtest/TestThreadMetrics.cpp:173
(Diff revision 32)
> +  // let's look at the counters
> +  ASSERT_EQ(mCounter->GetDispatchCounter()[mOther], 1u);

Maybe make sure that none of the other categories received dispatches?

::: xpcom/threads/PerformanceCounter.h:23
(Diff revision 32)
> +  DispatchCategory(uint32_t aValue)
> +    : mValue(aValue)
> +  {}
> +
> +  DispatchCategory(TaskCategory aValue)

Please make these two constructors `constexpr`.  They also probably both need to be `explicit` if you plan on passing the static analysis job on try. ;)

::: xpcom/threads/PerformanceCounter.h:24
(Diff revision 32)
> + */
> +class DispatchCategory final
> +{
> +public:
> +  DispatchCategory(uint32_t aValue)
> +    : mValue(aValue)

We have some expectation about what this value is going to be, don't we?  (`TaskCategory::Count`, right?)  Shouldn't we assert that we're getting just that value and no other?

(You'll have to make the constructor not-`constexpr` in debug builds once this assert is added, sadly.)

::: xpcom/threads/PerformanceCounter.h:32
(Diff revision 32)
> +  DispatchCategory(TaskCategory aValue)
> +    : mValue((uint32_t)aValue)
> +  {}
> +
> +  uint32_t
> +  GetValue()

Let's make this method `const`.

::: xpcom/threads/PerformanceCounter.h:78
(Diff revision 32)
> +
> +  /**
> +   * This is called everytime a runnable is dispatched.
> +   *
> +   * aCategory can be used to distinguish counts per TaskCategory
> +   * When not provided, defaults to TaskCategory::Other

Nit: there's no "When not provided", because this argument is no longer optional.

::: xpcom/threads/PerformanceCounter.h:106
(Diff revision 32)
> +  uint32_t GetDispatchCount(DispatchCategory aCategory);
> +
> +  /**
> +   * Returns the total number of dispatches.
> +   */
> +  uint32_t GetTotalDispatchCount();

Why does this return a `uint32_t` when the corresponding member below is `uint64_t`?

::: xpcom/threads/nsThread.cpp:1015
(Diff revision 32)
>        if (MAIN_THREAD == mIsMainThread) {
>          HangMonitor::NotifyActivity();
>        }
>  
>  #ifndef RELEASE_OR_BETA
> +      if (GetSchedulerLoggingEnabled()

Can we pull the result of this out into a `bool` and make here and and the below calls use this `bool`, so we don't have to call the same function three times?  The checks in `GetSchedulerLoggingEnabled` can be rather expensive...
Attachment #8950144 - Flags: review?(nfroyd)
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review233214

Thanks for sticking through all the reviews!

Do you have any sense of how this patch does with our performance tests?
Attachment #8950144 - Flags: review?(nfroyd) → review+
Comment on attachment 8950144 [details]
Bug 1437438 - Add a performance counter to track scheduler activity -

https://reviewboard.mozilla.org/r/219410/#review233214

Thanks for your reviews! I've just finished a full try and it looks ok. I will run a full talos test next, setting the flag so the counter gets always activated. Let me know if you may think of any other test for testing performances
Pushed by tziade@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/552d61cb273a
Add a performance counter to track scheduler activity - r=farre,froydnj
I've ran all talos tests with the pref set and I have not seen any alert through perfherder on try. I am therefore landing the patch now - thanks farre and froydnj for helping me and coping with my gecko learning curve :)
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Component: DOM → DOM: Core & HTML
Regressions: 1548795
You need to log in before you can comment on or make changes to this bug.