Closed Bug 1441972 Opened 4 years ago Closed 4 years ago

Add telemetry to measure how many DocGroups a user has


(Core :: DOM: Core & HTML, enhancement, P2)




Tracking Status
firefox61 --- fixed
firefox62 --- fixed


(Reporter: overholt, Assigned: farre)




(2 files, 4 obsolete files)

No description provided.
Assignee: nobody → afarre
Leading question: How do we want to count DocGroups? It's reasonable to assume that we at least want to measure in a manner that is similar to how we measure user engagement for tabs[1], that is the maximum concurrent number of DocGroups. What isn't as clear is at what level we want to count DocGroups. Do we want maximum concurrent in/per:

1) Total
2) Content process
3) TabGroup
4) Domain

The answer lies in what we want to know, which is: how many process would we get if we moved every DocGroup from the same domain to a unique process. This means that it might not as easy as just counting DocGroups straight up (as in 1-3), since a DocGroup for a base domain could be present and duplicated (i.e. not referentially equal) in several TabGroups with TabGroups with that base domain in several content processes. It is also not as easy as in counting unique domains as in [2], since that doesn't take concurrent DocGroups into account. 

So in the end, what do we want to measure? We want to measure the concurrent max of unique domains, where concurrent means that there is a DocGroup alive from that base domain[3].

This patch does the simplest possible thing to count doc groups with a scalar telemetry probe. It records the maximum number of docgroups any tab has had (I guess during a session, but the telemetry docs are a bit hazy here).
I realized that 'unload' is the perfect time to count docgroups! This is still not perfectly accurate, since the lifetime of docgroups isn't distinct, but it is definitely good enough. I think that we can drop the idea of a scalar probe to get the max, since this histogram version contains an estimate of this value. I say estimate since the probe is exponential which gives us more precision for lower values at the cost of lower precision for higher values. I guess that it could as easily be a linear histogram, and it could have a high value higher than 50, but if we get as high as 50 we're in trouble anyway, right? Just say so, and I'll tweak it.

I'll hold off on requesting the data review until we've decided that we're happy with the probe.
Attachment #8979924 - Attachment is obsolete: true
Attachment #8980291 - Flags: review?(nika)
Answers to questions from

1. This will indicate how many processes that would be needed to have a process per doc group.

2. To be able to determine the possibility of having one doc group per process.

3. It would be possible to perform a study with one patched version of Firefox collecting data manually for a bunch of pre-determined sites, say Alexa top 50. The reason for this not being engough is mostly that it doesn't reflect actual usage.

4. Not really. This probe lies somewhere in the area between TAB_COUNT and SCALARS_BROWSER.ENGAGEMENT.UNIQUE_DOMAINS_COUNT.

5. When the top level inner window unloads, this probe records the current number of docgroups associated with the window's tabgroup. The probe will be called DOCGROUPS_PER_TABGROUP. This falls into Category 2 "Interaction data". It is not Category three, since no actual information of the site is saved, only a fairly accurate estimate of how many unique domains that are "active" at the same time for a particular tab.

6. I want this data to be collected 7 months initially.

7. This will be for non-release channels (which will happen automatically since it is an opt-in probe).

8. The collection is not default on.

9. There are two perspectives of how to analyze the data. First and foremost we'll be able to use the Measuement Dashboard from t.m.o to visualize the distribution ('the shape' basically) of how many docgroups per tabgroups there are. The second is the distribution of the maximum number of docgroups per tab per "user", which we should be able to estimate by using s.t.m.o to gather the max values paired with every client_id. We could of course use yet another probe for this to get an exact value for this, but again, the rough shape of the distribution is enough information I believe.

10. The results will probably only be of interest internally.
Comment on attachment 8980291 [details] [diff] [review]

Review of attachment 8980291 [details] [diff] [review]:

::: dom/base/TabGroup.cpp
@@ +308,5 @@
> +uint32_t
> +TabGroup::Count() const
> +{
> +  return mDocGroups.Count();

I think this probe is useful, but we also will want to collect information about how many _active_ docgroups we have. The difference between the two will be how many processes we could be spending per tab on the bfcache. 

Perhaps we could also add a .IsActive() method to DocGroup which checks if any of its documents are active, and have another probe which counts just the active ones?

It might also be useful to do some manual testing of browsing around the web after this is implemented to check if that number is significantly different from the real mDocGroups number.
Attachment #8980291 - Flags: review?(nika) → review+
Moved collection to nsIDocument::OnPageHide, it is indeed better suited to happen there.

Added/changed the collected probes to ACTIVE_DOCGROUPS_PER_TABGROUP and TOTAL_DOCGROUPS_PER_TABGROUP. They do indeed differ enough to warrant having two probes. Also, added a guard to protect from actually performing the docgroup counting if user has opted out of telemtetry collection.
Attachment #8980291 - Attachment is obsolete: true
Attachment #8980560 - Flags: review?(nika)
Comment on attachment 8980560 [details] [diff] [review]

Review of attachment 8980560 [details] [diff] [review]:

::: dom/base/DocGroup.cpp
@@ +186,5 @@
> +bool
> +DocGroup::IsActive() const
> +{
> +  for (auto doc : mDocuments) {

nit: Please specify nsIDocument* here. I worry when I see a non-ref-qualified iterator that we'll be invoking many unnecessary copy constructors.

::: dom/base/DocGroup.h
@@ +122,5 @@
>    // List of DocGroups that has non-empty signal slot list.
>    static AutoTArray<RefPtr<DocGroup>, 2>* sPendingDocGroups;
> +  // Returns true if any of its documents are active but not in the bfcache.
> +  bool IsActive() const;

nit: newline after this line.

::: dom/base/TabGroup.cpp
@@ +320,5 @@
> +    }
> +  }
> +
> +  return count;
> +}

nit: newline after this line.

::: dom/base/TabGroup.h
@@ +99,5 @@
>    {
>      return mDocGroups.Iter();
>    }
> +  uint32_t Count(bool aActive = false) const;

Please add a comment explaining the aActive parameter. Also perhaps rename to `aActiveOnly` to make it more clear?

::: dom/base/nsDocument.cpp
@@ +8491,5 @@
>  void
>  nsIDocument::OnPageHide(bool aPersisted, EventTarget* aDispatchStartTarget)
>  {
> +  if (Telemetry::CanRecordExtended() && IsTopLevelContentDocument()) {
> +    RefPtr<TabGroup> tabGroup = GetDocGroup()->GetTabGroup();

nit: Can you null-check these here? I wouldn't be surprised if there was an edge case where OnPageHide could fire and one or both of these is null.

::: toolkit/components/telemetry/Histograms.json
@@ +13774,5 @@
> +    "expires_in_version": "75",
> +    "kind": "exponential",
> +    "high": 50,
> +    "n_buckets": 20,
> +    "description": "Number of active doc groups per tab group."

Please mention when this is collected in these descriptions.
Attachment #8980560 - Flags: review?(nika) → review+
Carrying over Nika's r+. Nits fixed.
Attachment #8980560 - Attachment is obsolete: true
Attachment #8981118 - Flags: review+
Attachment #8981118 - Flags: review?(chutten)
Chris, see comment 4 for the answers to data review questions.
Comment on attachment 8981118 [details] [diff] [review]

Review of attachment 8981118 [details] [diff] [review]:

Your data collection review request in Comment #4 mentions only one of the two probes you are adding. It's no biggie, I see you split the collection into active/not active and so the review can proceed. For the record, you are asking to collect both ACTIVE_DOCGROUPS_PER_TABGROUP and TOTAL_DOCGROUPS_PER_TABGROUP, both needed to answer the questions in your request and both Category 2 (well, given their nature as almost an implementation detail we could almost make the argument that it's Category 1, but 2 is the easier sell.)


    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate? 

Standard Telemetry mechanisms apply.

    Is there a control mechanism that allows the user to turn the data collection on and off?

Standard Telemetry mechanisms apply.

    If the request is for permanent data collection, is there someone who will monitor the data over time?


    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under? **

We'll round them up to Category 2: Interaction

    Is the data collection request for default-on or default-off?

Default-off (prerelease-only)

    Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?


    Is the data collection covered by the existing Firefox privacy notice? 


    Does there need to be a check-in in the future to determine whether to renew the data? 

Yes. :farre, I will count on you to file the necessary bug to remove or apply to renew this collection before it expires.

Result: datareview+ (once the expires_in_version comment is addressed)

::: toolkit/components/telemetry/Histograms.json
@@ +13771,5 @@
> +    "record_in_processes": ["content"],
> +    "alert_emails": [""],
> +    "bug_numbers": [1441972],
> +    "expires_in_version": "75",

Comment #4 indicates you want seven months' collection initially. We currently plan to have Nightly on 66 in December (, so an expires_in_version of "67" ought to cover it.
Attachment #8981118 - Flags: review?(chutten) → review+
r+ from nika and chutten carrying over after fixing issue from data review. Now probe expires in 67.
Attachment #8981118 - Attachment is obsolete: true
Attachment #8981839 - Flags: review+
Keywords: checkin-needed
Pushed by
Add probe for counting docgroups per tabgroup. r=nika, data-review=chutten
Keywords: checkin-needed
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Any chance we can get this uplifted to 61 so we get release population data sooner?
Flags: needinfo?(afarre)
Approval Request Comment
[Feature/Bug causing the regression]:
[User impact if declined]:
We get telemetry later.
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]:
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]:
[Why is the change risky/not risky?]:
Only adds telemtry.
[String changes made/needed]:
Flags: needinfo?(afarre)
Attachment #8984122 - Flags: approval-mozilla-beta?
I should add that the attached patch is identical to the reviewed one, modulo rebasing.
Comment on attachment 8984122 [details] [diff] [review]

Adds some new Telemetry probes and has baked on Nightly for a week. Approved for 61.0b12.
Attachment #8984122 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Component: DOM → DOM: Core & HTML
See Also: → 1589700
You need to log in before you can comment on or make changes to this bug.