Closed Bug 1589700 Opened 5 years ago Closed 4 years ago

Collect telemetry on the number of site-origins across all tabs for all windows

Categories

(Core :: Performance, enhancement, P2)

enhancement

Tracking

RESOLVED FIXED
mozilla73
Tracking Status
firefox73 --- fixed

People

(Reporter: sefeng, Assigned: sefeng)

Attachments

(2 files, 3 obsolete files)

The proposal is that we collect the number of site origins at the end of each page load, which gives us a sense of the average/distribution of the number of site origins per load (per tab). We can then correlate it with the number of tabs to get a sense of how Fission would impact our users in terms of the number of processes, etc.

Attached file data_collection_review_1589700.txt (obsolete)
Attachment #9102564 - Flags: data-review?(tdsmith)

Hm, I'm not sure I understand what you want to collect. It's sometimes helpful to have the patch up for review before the data collection request because it reduces ambiguity. Is it one of these things:

  • How many different origins a single pageload uses
  • The count of distinct origins used by all active tabs, accumulated after each pageload

I'm not sure what the significance of a tab or of the number of tabs is. I'm also not sure if Fission processes are bound to tabs or what their lifespan will be. We collect the scalar browser.engagement.max_concurrent_tab_count, but that only describes the high-water mark for number of tabs within each subsession; is that enough context to understand the measurement you want to collect?

Flags: needinfo?(sefeng)

Hi Tim,

The probe we want to implement is the first bullet point you mentioned, "How many different origins a single pageload uses". By correlating it with the number of tabs, we can learn how many different site origins there are across the entire browsing session, which basically means how many processes are going to be created across the session.

The tab-count telemetry I am looking at is TAB_COUNT; I think correlating with it should give us enough information.

Hope this clears things up; please let me know your opinion.

Flags: needinfo?(sefeng)

Okay, great. If that's the quantity you collect, I think the documentation for this probe should not mention tabs. The data-review request looks good but the title of the ticket confused me.

I forgot about TAB_COUNT; thanks! That will let you find the modal number of simultaneous tabs for each browser session for each user, which is nicer than the maximum. The product of "average origins per pageload" and "modal simultaneous tabs" sounds like a reasonable guess at the number of simultaneous processes. Is that what you were thinking?
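The estimate described above (average origins per page load multiplied by the modal number of simultaneous tabs) can be sketched as follows. The sample values and the `estimate_processes` name are hypothetical; this only illustrates the arithmetic, and it treats every tab as needing its own set of processes, ignoring any sharing of sites between tabs.

```python
from statistics import mean, mode

# Hypothetical per-session samples: unique site origins observed at the end of
# each page load, and the TAB_COUNT values recorded during the same session.
origins_per_load = [3, 5, 2, 6, 4]
tab_counts = [4, 6, 6, 6, 8]


def estimate_processes(origins_per_load, tab_counts):
    """Rough estimate: average origins per load times the modal tab count."""
    return mean(origins_per_load) * mode(tab_counts)


print(estimate_processes(origins_per_load, tab_counts))
```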

Comment on attachment 9102564
data_collection_review_1589700.txt

A couple of notes:
- I think this is a reasonable collection, but I'm not sure I agree there's no other way to learn this. We could get a good estimate from crawl data, if we wanted an answer without having to ship code. Estimates from crawl data are less interesting if we want to track evolution over time, but:
- Are you sure that continuing to track this post-Fission is useful? We could directly instrument the number of simultaneous processes instead, which will probably be more useful. A time-limited collection could make sense.
- Consider whether it makes sense to add some rounding to make it harder to pick specific popular websites out of the data.
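One possible form of rounding, offered only as an assumption and not as what the patch does, is to report a count rounded down to a power of two, so that nearby values collapse into the same coarse bucket. Firefox's exponential histogram kind buckets values in a broadly similar spirit. The `bucket_lower_bound` helper is hypothetical.

```python
def bucket_lower_bound(count, base=2):
    """Round a non-negative count down to the nearest power of `base`.

    Counts of 0 (or less) map to 0; e.g. 5 -> 4 and 17 -> 16 with base 2.
    """
    if count <= 0:
        return 0
    bound = 1
    # Grow the bound while the next power still fits under the count.
    while bound * base <= count:
        bound *= base
    return bound


print([bucket_lower_bound(n) for n in (0, 1, 3, 5, 17)])
```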

Thanks for the discussion.

--

1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, in Histograms.json and the probe dictionary.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, the Firefox telemetry opt-out.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

:sefeng.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, technical data.

5) Is the data collection request for default-on or default-off?

Default-on.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No)

No, permanent collection.

9) Does the data collection use a third-party collection tool?

No.
Attachment #9102564 - Flags: data-review?(tdsmith) → data-review+
Summary: Collect telemetry on the number of site-origins per tab → Collect telemetry on the number of site-origins per load

It looks like this is similar to bug 1441972, which added a (now expired) probe that uses Tab/DocGroups. Was this known? Is the idea to add the new probe because the old one uses things that we're likely going to tear out soon?

See Also: → 1441972

I wasn't aware of bug 1441972, and I don't think we had considered reusing it.

However, I don't think they are the same thing. For example, account.google.com and service.google.com are in the same docGroup, but they don't have the same siteOrigin.

Priority: -- → P2

(In reply to Sean Feng [:sefeng] from comment #3)

> Hi Tim,
>
> The probe we want to implement is the first bullet point you mentioned, "How many different origins a single pageload uses". By correlating it with the number of tabs, we can learn how many different site origins there are across the entire browsing session, which basically means how many processes are going to be created across the session.

But tabs, or the iframes in them, can reuse processes, and that sharing can happen quite often.
Bug 1441972 should be very close to what we want (unless I have misunderstood what we want).

What I said in comment 4 was wrong: account.google.com and service.google.com have the same siteOrigin, so now I'm starting to think the docGroup telemetry probably makes sense.

Unless the number of siteOrigins can reveal something unique, we should probably just reuse bug 1441972.
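A site origin is the scheme plus the registrable domain (eTLD+1), which is why two subdomains of the same domain collapse to one site. A minimal sketch follows; the tiny hard-coded `PUBLIC_SUFFIXES` set is an illustrative assumption standing in for the real Public Suffix List, and ports, IP addresses, and origin attributes are ignored.

```python
from urllib.parse import urlsplit

# Illustrative stand-in for the Public Suffix List; a real implementation must
# consult the full list (e.g. "co.uk" is a public suffix, not a site).
PUBLIC_SUFFIXES = {"com", "org", "net", "co.uk"}


def site_origin(url):
    """Return scheme://registrable-domain for url (simplified sketch)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Scan from the left so the longest matching public suffix is found first.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                # The host is itself a public suffix; keep it as-is.
                return f"{parts.scheme}://{parts.hostname}"
            # Registrable domain = one label plus the public suffix.
            return f"{parts.scheme}://{'.'.join(labels[i - 1:])}"
    # Unknown suffix: fall back to the full host.
    return f"{parts.scheme}://{parts.hostname}"


print(site_origin("https://account.google.com"))
print(site_origin("https://service.google.com"))
```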

Randell, what do you think?

(Nika's needinfo queue is blocked, so I can't NI her.)

Flags: needinfo?(rjesup)

Bug 1441972 only looks per tab, but it sounds close to what we want (though we'd prefer to know the number of unique docgroups per session, which isn't exactly docgroups-per-tab times tabs, although that's a first approximation).

It'd be more work, but if we could count the number of unique docgroups, that should be close to the number of processes we'll need.

Flags: needinfo?(rjesup)

I see. Yes, the plan was to collect the number of unique site origins per load and correlate it with the TAB_COUNT telemetry to get a sense of the number of processes across the browsing session.

I think we'd want another probe to count the number of tabGroups; then we could reuse the docGroup-per-tabGroup telemetry.

What do you think, Olli?

Flags: needinfo?(bugs)

TabGroups are going away; see bug 1561715.

But what would docgroup-per-tabgroup (== docgroup-per-browsingcontextgroup) tell us?

We want the number of different-site docgroups, no? That should map to the number of processes pretty closely
(data: URLs might be a special case).

Flags: needinfo?(bugs)
Attachment #9104329 - Attachment is obsolete: true

After some discussion, we agreed that the old docGroup-per-tabGroup telemetry wasn't sufficient for our use case, because our high-level goal was to get a sense of the number of processes across the browser.

Then we fell back to the original plan, which was collecting the number of site origins per load and correlating it with TAB_COUNT. However, there was one caveat: Olli pointed out that we can't do the correlation because tabs may share processes. For instance, if tab A has sites a, b, c and tab B has sites b, c, there will be only 3 processes, not 5.
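The tab A / tab B example above can be checked directly: summing per-tab counts double-counts shared sites, while the size of the union of all tabs' site sets gives the number of distinct sites. The `tabs` mapping and both helper names are hypothetical illustrations.

```python
# Hypothetical snapshot: the set of site origins loaded in each open tab.
tabs = {
    "A": {"a", "b", "c"},
    "B": {"b", "c"},
}


def sum_of_per_tab_counts(tabs):
    """What per-tab counting (correlated with tab count) would add up to."""
    return sum(len(sites) for sites in tabs.values())


def unique_sites_across_tabs(tabs):
    """Distinct sites across all tabs; shared sites are counted once."""
    return len(set().union(*tabs.values()))


print(sum_of_per_tab_counts(tabs))     # 5
print(unique_sites_across_tabs(tabs))  # 3
```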

So we agreed on a new approach: collecting the number of unique site origins across the browser at a minimum interval of 5 minutes, so we no longer need to do correlations. This is what the new patch is for.

I will update the data collection review form and the title of the bug after we update the performance team OKR (this bug was one of the KRs).
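A rough sketch of the 5-minute throttling follows. It assumes, purely for illustration, that a sample is taken opportunistically when a page finishes loading and is skipped if the previous sample is less than 5 minutes old; `UniqueSiteRecorder`, `on_page_load`, and the `samples` list (standing in for a telemetry histogram) are hypothetical names, not the patch's actual code.

```python
import time

MIN_INTERVAL_S = 5 * 60  # minimum interval between samples, from the discussion


class UniqueSiteRecorder:
    """Hypothetical sketch: record unique site origins across all tabs,
    at most once per MIN_INTERVAL_S."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, handy for testing
        self._last_recorded = None   # time of the previous sample, if any
        self.samples = []            # stands in for a telemetry histogram

    def on_page_load(self, tabs):
        """Called on page load with {tab_id: set_of_site_origins}.

        Returns True if a sample was recorded, False if throttled.
        """
        now = self._clock()
        if (self._last_recorded is not None
                and now - self._last_recorded < MIN_INTERVAL_S):
            return False  # too soon since the last sample; skip
        unique = set().union(*tabs.values())
        self.samples.append(len(unique))
        self._last_recorded = now
        return True


# Usage with a fake clock so the throttling is easy to see.
fake_now = [0.0]
recorder = UniqueSiteRecorder(clock=lambda: fake_now[0])
open_tabs = {"A": {"a", "b", "c"}, "B": {"b", "c"}}
recorder.on_page_load(open_tabs)   # recorded: 3 unique sites
fake_now[0] = 60.0
recorder.on_page_load(open_tabs)   # throttled: only 1 minute has passed
fake_now[0] = 301.0
open_tabs["C"] = {"d"}
recorder.on_page_load(open_tabs)   # recorded: 4 unique sites
print(recorder.samples)
```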

Summary: Collect telemetry on the number of site-origins per load → Collect telemetry on the number of site-origins across all tabs for all windows

Asking Tim to review the updated data collection form.

Attachment #9102564 - Attachment is obsolete: true
Attachment #9112641 - Flags: data-review?(tdsmith)
Comment on attachment 9112641
data_collection_review_1589700.txt

Thanks for the update.

1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, in Histograms.json, Glean's metrics.yaml, and the probe dictionary.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, the Firefox telemetry opt-out.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

:sefeng.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, interaction data.

5) Is the data collection request for default-on or default-off?

Default-on.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No)

No, permanent collection.

9) Does the data collection use a third-party collection tool?

No.
Attachment #9112641 - Flags: data-review?(tdsmith) → data-review+
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/52af8765cb21
Record the number of unique site origins across all tabs r=smaug,agi,Dexter
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c83e2ecafdd
Record the number of unique site origins across all tabs r=smaug,agi,Dexter
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73
Flags: needinfo?(sefeng)

Comment on attachment 9141586
Bug 1589700 - Collect per tab uniqute site origin telemetry r?nika!,snorp!,dexter!

Revision D71493 was moved to bug 1603185. Setting attachment 9141586 to obsolete.

Attachment #9141586 - Attachment is obsolete: true