Closed Bug 1419679 Opened 7 years ago Closed 6 years ago

Keep track of tabs & workers awakenings in the Scheduler

Categories

(Toolkit :: Performance Monitoring, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: tarek, Assigned: tarek)

References

Details

(Whiteboard: [perf-tools])

Attachments

(2 files)

Similar to bug 1415536 - I would like to extend this class to send events every n seconds, containing the probed values.

The generated event would be a JSON mapping aggregating all probed values in the last n seconds, that can be easily consumed on the JS side, containing everything needed to monitor what each tab is doing.
Blocks: 1419681
Just to clarify, this would be:

- every n seconds *if there was any activity*;
- with only the contents of the current process.

Right?
Flags: needinfo?(tarek)
> every n seconds *if there was any activity* 

yes.

> with only the contents of the current process

I am not sure what that means, but basically the goal would be to keep track of the activity
in the same way you've built the tabs table in about:performance. So the events woud be distributed
to observers in JS that would track everything from every process.
Flags: needinfo?(tarek) → needinfo?(dteller)
Well, you need activity to be tracked in each process. This will account for a number of tabs, but not all. Then, you need each process to send its data to the parent process, to collate the activity of all processes.

One of the difficulties here is that not all processes will send the data at the same time, some processes may not even send data at all (if there is no activity or if the process is frozen or crashed), so the parent process will have to deal with holes in its knowledge.
Flags: needinfo?(dteller)
I am a bit confused with your answer because your panel about:performance is displaying all this information per tab as far as I can see, and I thought that nsPerformanceStats was the part that was doing the collecting.

Does it mean that about:performances has holes in its knowledge as well?  The data displayed is an approximation ?
Flags: needinfo?(dteller)
about:performance is using polling, so it's requesting all the data from each process every ~10 seconds. That's pretty bad for battery, because it causes swapping, etc. That's why I have later added nsIPerformanceObserver to nsIPerformanceStatsService, to turn this into battery-friendlier events. These events are what was used by the AddonWatcher.

And yes, about:performance may have holes if a process doesn't respond within a few seconds. In this case, I don't remember what about:performance does, either it assumes that the corresponding tabs are dead or it keeps the old values.


That's something unescapable: multi-process means that several processes can fail or freeze independently. Oh, the fun :)
Flags: needinfo?(dteller)
From a discussion with David, one possible implementation would be:

- track the activity per windowID in internal counters (one per process) in the scheduler
- send through IPC every n seconds a summary of those activities
- collect in https://searchfox.org/mozilla-central/source/toolkit/components/perfmonitoring/PerformanceWatcher.jsm#291

Notice that a very simple first implementation would simply consist of counting how frequently / how long a WIndowID is active in the scheduler and not necessarely keep track of the CPU/GPU usage. If the number stays high for a long time (e..g load average) it's a sign the tab is super busy
This is a first version of a patch I am starting:

- create a new nsITabGroupActivity interface that keeps track of the number of events per TabGroup -- along with the average duration of each event. 
- add a mTabGroupActivity member in the Scheduler class that implements nsITabGroupActivity
- add a call in nsThread::ProcessNextEvent() that increments the nsITabGroupActivity counter
- add a timer that get fired every n seconds to report the nsITabGroupActivity as the subject (deactivated by default)

what do you think Andreas and Nathan ?
Flags: needinfo?(nfroyd)
Flags: needinfo?(afarre)
Summary: Extend nsPerformanceStats to send event → Keep track of tabs awakenings in the Scheduler
Hi Nika, 
I pushed a work in progress to show you what I am trying to do in the scheduler. e.g. grab the windowid
that corresponds to the tab that gets awakened by the scheduler. This is the prliminary work for doing what I am describing in comment #7

One thing that's a bit unclear to me is the cases where one TabGroup deals with several windows. when nsThread.ProcessNextEvent() is called on a TabGroup that has several windows in it, should we make the assumption that the event is going to impact all the windows in the list or is there something else to take into account?

Thanks!
Flags: needinfo?(nika)
Flags: needinfo?(nfroyd)
Flags: needinfo?(bugs)
Flags: needinfo?(afarre)
I had a discussion with :smaug and will try to work on the patch with my new understanding :)
Flags: needinfo?(nika)
Flags: needinfo?(bugs)
Comment on attachment 8935290 [details]
Bug 1419679 - Keep track of tabs awakenings in the Scheduler -

Clearing review 'cause you're reworking the patch
Attachment #8935290 - Flags: review?(nika)
Hey Nika,

My patch is still a work in progress (miss tests, some stuff to finish..) but I figured I would put it on reviewboard today for your early feedback to validate the design.


Here's a high-level view: a service keep tracks of counters in memory. A counter is just an integer with a unique label. Counters are sorted by process ids and "host" when available. A host is the document URI from which the increment is called.

I am adding a first series of counters in DocGroup::Dispatch every time a runnable is called. I am using the TaskCategory as the key for the counter. I am planning to add more in other places in Firefox once this patch is in.

The service dumps counters in a flat array that's used as a subject of the event sent to observers every n seconds. (in the current patch it's every 100ms for testing purposes, but the final value will probably be one second)

There are two observers we can add on this event:
- one that will update a screen like about:performances to display info about what's going on in teh scheduler and elswhere
- one that will send that data in Telemetry

Thanks!
Flags: needinfo?(nika)
Comment on attachment 8945028 [details]
Bug 1419679 - Keep track of tabs awakenings in the Scheduler

https://reviewboard.mozilla.org/r/215238/#review220954

I think olli or farre would be better picks for reviewing changes to this component now. I haven't worked much on the DocGroup/TabGroup since I did the initial implementation >1yr ago. That being said, I have a few comments & nits.

::: dom/base/DocGroup.cpp:16
(Diff revision 2)
>  #include "mozilla/Telemetry.h"
>  #include "nsIDocShell.h"
> +#include "mozilla/Logging.h"
> +#include "nsPerformanceService.h"
> +
> +static LazyLogModule sDocLog("nsPerformanceService");

I don't understand why you define the sDocLog as "nsPerformanceService" and I don't see any callers of this LOG macro?

::: dom/base/DocGroup.cpp:74
(Diff revision 2)
>  nsresult
>  DocGroup::Dispatch(TaskCategory aCategory,
>                     already_AddRefed<nsIRunnable>&& aRunnable)
>  {
> +  nsCOMPtr<nsIPerformanceService> perf = GetOrCreatePerfService();
> +  // XXX we want perf to be a service do_GetService("@mozilla.org/perf-metrics;1");

We'd prefer to move away from xpcom when possible, so it would be nice to avoid doing this.

::: dom/base/DocGroup.cpp:76
(Diff revision 2)
>                     already_AddRefed<nsIRunnable>&& aRunnable)
>  {
> +  nsCOMPtr<nsIPerformanceService> perf = GetOrCreatePerfService();
> +  // XXX we want perf to be a service do_GetService("@mozilla.org/perf-metrics;1");
> +  MOZ_ASSERT(perf);
> +  if (perf) {

I am not convinced that doing all of this work of building up host strings, appending key suffixes, and doing a bunch of hashtable lookups is acceptable to perform after every DocGroup dispatch.

::: dom/base/DocGroup.cpp:78
(Diff revision 2)
> +  nsCOMPtr<nsIPerformanceService> perf = GetOrCreatePerfService();
> +  // XXX we want perf to be a service do_GetService("@mozilla.org/perf-metrics;1");
> +  MOZ_ASSERT(perf);
> +  if (perf) {
> +    // grabbing the host name of the first document
> +    nsString host;

Looks like the code might be nicer if this was initialized as nsString host = NS_LITERAL_STRING("None");

Also, why are you using nsString instead of nsCString here? GetHost produces an nsCString, and you're doing an unnecessary transcode for some reason?

::: dom/base/DocGroup.cpp:86
(Diff revision 2)
> +    } else {
> +      nsCOMPtr<nsIDocument> doc = do_QueryInterface(mDocuments[0]);
> +      nsAutoCString uri;
> +      nsCOMPtr<nsIURI> docURI = doc->GetDocumentURI();
> +      if (docURI) {
> +        nsresult rv = docURI->GetHost(uri);

Can we maybe cache this on the DocGroup so we don't have to fetch it every time we dispatch a event to one?

::: dom/base/DocGroup.cpp:97
(Diff revision 2)
> +      } else {
> +        host.AssignLiteral("None");
> +      }
> +    }
> +    // converting TaskCategory in a user-friendly key
> +    nsString key;

Same here - I don't see why this can't be a nsCString, as all values which this is assigned to are ASCII.

I presume these are due to the decision to use wide strings in IncrementMetrics, but it seems to me like utf-8 strings would be cleaner.

::: dom/base/DocGroup.cpp:100
(Diff revision 2)
> +    }
> +    // converting TaskCategory in a user-friendly key
> +    nsString key;
> +    switch (aCategory) {
> +      case TaskCategory::UI:
> +        key.AppendPrintf("Scheduler/TaskCategory/UI");

Please use AssignLiteral here and below instead of AppendPrintf so that we can be confident we're not performing unnecessary string allocations

::: toolkit/components/perfmonitoring/nsIPerformanceData.idl:9
(Diff revision 2)
> +/**
> + * Keep tracks of the bytes that are sent (tx) and received (rx)
> + * into a socket identified by its file descriptor (fd)
> + * for a given host & port.
> + */

I don't understand this comment.

::: toolkit/components/perfmonitoring/nsIPerformanceService.idl:14
(Diff revision 2)
> +[scriptable, uuid(09e87097-aabb-481b-81ce-aa4fbd2bd389)]
> +interface nsIPerformanceService : nsISupports
> +{
> +  void init();
> +  void shutdown();
> +  void IncrementMetrics(in AString aKey, in AString aHost, in unsigned long aAmount);

nit: Usually this would be written in ipdl as incrementMetrics

::: toolkit/components/perfmonitoring/nsPerformanceService.h:36
(Diff revision 2)
> +  uint32_t mPid;
> +  nsString mHost;
> +  nsString mKey;
> +  uint32_t mCount;

nit: If you put mPid and mCount next to each other, this struct will be 1 word shorter on 64-bit.

::: toolkit/components/perfmonitoring/nsPerformanceService.h:53
(Diff revision 2)
> +  PerformanceMetrics(Counters* aCounters);
> +  uint32_t Length();
> +private:
> +  ~PerformanceMetrics() = default;
> +
> +  Counters*  mCounters;

I don't see the code which ensures that this pointer never dangles. nsPerformanceService doesn't seem to null this pointer our for you.

I think that if you hold onto a PerformanceMetrics after nsPerformanceService is shut down, this pointer will almost definitely be dangling.

There should at the least be a comment explaining why this pointer is safe.

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:43
(Diff revision 2)
> +
> +
> +PerformanceMetrics::PerformanceMetrics(Counters* aCounters)
> +  : mCounters(aCounters)
> +{
> +  mMetrics = do_CreateInstance(NS_ARRAY_CONTRACTID);

I'd prefer to avoid using nsIArrays if possible.

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:99
(Diff revision 2)
> +  uint32_t length = 0;
> +  mMetrics->GetLength(&length);
> +
> +  nsCOMPtr<nsIMutableArray> metrics = do_CreateInstance(NS_ARRAY_CONTRACTID);
> +  for (uint32_t i = 0; i < length; ++i) {
> +    nsCOMPtr<nsIPerformanceMetricsData> stats;
> +    stats = do_QueryElementAt(mMetrics, i);
> +    mozilla::DebugOnly<nsresult> rv = metrics->AppendElement(stats);
> +    MOZ_ASSERT(NS_SUCCEEDED(rv));
> +  }
> +  metrics.forget(aMetrics);
> +  return NS_OK;
> +

Even if you stick with XPCOM it will probably be cleaner to have a GetLength() and GetMetricAt() method on PerforamnceMetrics (and store an nsTArray internally), rather than using nsIMutableArrays.

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:128
(Diff revision 2)
> +   uint32_t length = 0;
> +   mMetrics->GetLength(&length);
> +
> +   if (length == 0) {

nit: indentation

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:155
(Diff revision 2)
> +
> +
> +NS_IMPL_ISUPPORTS(PerformanceMetricsData, nsIPerformanceMetricsData);
> +
> +NS_IMETHODIMP
> +PerformanceMetricsData::GetHost(nsAString& aHost) {

nit: Make sure your braces are on the right line. We add a newline before the {

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:185
(Diff revision 2)
> +
> +NS_IMETHODIMP
> +nsPerformanceService::Init()
> +{
> +  gPerformanceService = this;
> +  NS_ADDREF(gPerformanceService);

Please avoid using NS_ADDREF and NS_RELEASE.

Consider using a `mozilla::StaticRefPtr<nsPerformanceService>` and calling `ClearOnShutdown(&gPerformanceService)`. This also helps avoid the ::Shutdown method grossness.

It also makes sure that ::Shutdown is actually called. AFAICT this object is leaked right now.

::: toolkit/components/perfmonitoring/nsPerformanceService.cpp:283
(Diff revision 2)
> +
> +nsIPerformanceService* GetOrCreatePerfService()
> +{
> +
> +  if (!nsPerformanceService::gPerformanceService) {
> +    RefPtr<nsPerformanceService> service = new nsPerformanceService(100);

nit: Put this magic number in a define somewhere.
Flags: needinfo?(nika)
Depends on: 1437438
Depends on: 1439929
Whiteboard: [perf-tools]
Depends on: 1443443
Summary: Keep track of tabs awakenings in the Scheduler → Keep track of tabs & workers awakenings in the Scheduler
No longer depends on: 1439929
No longer depends on: 1437438
All the work happened in Bug 1443443 - let's close this one and its aborted patch :)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: