Closed Bug 1242013 Opened 4 years ago Closed 4 years ago

Find out how many different tabs our users view per some unit of time (maybe per hour)

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: mconley, Assigned: billm)

Details

Attachments

(1 file, 2 obsolete files)

I'm trying to improve tab switching time.

One way to do this is to keep layers (or a screenshot) or background tabs around for some amount of time, and show them synchronously when the user brings that tab to the foreground (and then cross-fade or snap transition once the actual layers from the content process become available).

This is a classic space / time trade-off. Layers and screenshots would consume running memory. Users with many many many tabs would end up using a bunch of memory to keep those screenshots around, so there'd need to be an upper bound on how many of these screenshots or layers we hold onto.

billm suggested a first step which I think is a good one; let's get a sense of how many tabs our users tend to look at over some period of time. Say, per hour.

So this bug is about adding a telemetry probe to get this data.
Hey Yoric, do you know if we have Telemetry infrastructure that would automatically do some kind of segmenting "by the hour" for a count like this?
Flags: needinfo?(dteller)
Nothing that I know of.
Flags: needinfo?(dteller)
Alright, well, I guess what I need then is some mechanism that will increment a count every time a tab is switched to.

If the session is shut down before the reset time (let's say we're measuring per hour), make sure to record the count before the session goes away.

If we cross the reset time, record the count, and the reset the count.

Hey :vladan, as a data collection peer, does this sound kosher? If so, I might get azhang to work on this.
Flags: needinfo?(vladan.bugzilla)
> Alright, well, I guess what I need then is some mechanism that will increment a count 
> every time a tab is switched to.

I think you would want the number of *unique* tabs activated per hour

This probe would effectively report roughly how active the user was each hour, but without ordering the hourly activity reports in any way. That doesn't seem too bad. So I think that's ok.
Flags: needinfo?(vladan.bugzilla)
unique tabs -> simultaneously-active unique URLs
Mmmh... I'm pretty sure that there is a last-used date attached by Session Restore to each tab. This might be sufficient for comment 4.
Flags: needinfo?(mconley)
(In reply to Vladan Djeric (:vladan) -- last day at Moz Friday Jan 29 from comment #4)
> > Alright, well, I guess what I need then is some mechanism that will increment a count 
> > every time a tab is switched to.
> 
> I think you would want the number of *unique* tabs activated per hour

Yes, a good point. We very much want that.

(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #6)
> Mmmh... I'm pretty sure that there is a last-used date attached by Session
> Restore to each tab. This might be sufficient for comment 4.

Ah, yes - I think you're referring to the lastAccessed property on a tab object. Looking at that thing, that looks like a Date object representing the last time that tab was selected. It's Date.now() if the tab is currently selected.

Hey azhang, does this sound like a probe you could put together?
Flags: needinfo?(mconley) → needinfo?(azhang)
Attached patch tab-switches.patch (obsolete) — Splinter Review
This patch implements a probe and histogram for tab switching to unique tabs.
Flags: needinfo?(azhang)
Attachment #8714467 - Flags: review?(mconley)
Comment on attachment 8714467 [details] [diff] [review]
tab-switches.patch

Review of attachment 8714467 [details] [diff] [review]:
-----------------------------------------------------------------

Are we sure this will allow us to calculate unique tabs per hour?

Consider the following scenario: I have 5 tabs, A, B, C, D and E.

Here's the timeline:

1. Session starts with A, B, C, D and E
2. I, by default, am viewing A
3. I switch to B
4. I switch back to A
5. Hour ends. I have looked at two unique tabs (A and B), and I record Telemetry for that.
6. New hour starts. I am still looking at A
7. I switch to B
8. I switch to C
9. I switch to D
10. I switch to E
11. Hour ends. Because A and B already had "viewed" set on them, I've only recorded 3 viewings, despite having seen a total of 5 unique tabs.

Perhaps we should have a mechanism like this:

1. Session starts with A, B, C, D and E
2. I, by default, am viewing A, so the lastAccessed value is Date.now().
3. I switch to B, so its lastAccessed value is Date.now(). A's is locked at the time I switched away.
4. I switch back to A, so now B is locked at the time I switched away, and A's time is Date.now().
5. Hour ends. We scan tabs that have lastAccessed within the last hour, and record that measurement in Telemetry.
6. New hour starts. I am still looking at A
7. I switch to B
8. I switch to C
9. I switch to D
10. I switch to E
11. Hour ends. We scan tabs that have lastAccessed within the last hour, and record that measurement in Telemetry.
12. I switch to A
13. Session ends before the hour is up. We scan tabs that have been looked at since the last recording (1).

Does that sound reasonable as a probe, Yoric?
Attachment #8714467 - Flags: review?(mconley) → review-
(Feel free to weigh in too, azhang. I'm asking Yoric because I've never designed a Telemetry probe before, and I want to make sure what I'm suggesting will actually return useful data).
Flags: needinfo?(dteller)
What if we keep a list of tabs ordered by how recently they've been accessed. So index 0 would be the tab currently in the foreground. Lazily loaded tabs would not appear in the list. Upon switching to a tab, we would record its index in a histogram and then move it up to index 0 in the list (inserting it if it's not already in the list).
Do we get anything from recording its index? The index sounds like unneeded data.

To directly match the question of tabs used per hour, we could just record the "no. of unique tabs used this hour" into an "exponential" histogram as per comment 4.
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #9)
> Perhaps we should have a mechanism like this:
> 
> 1. Session starts with A, B, C, D and E
> 2. I, by default, am viewing A, so the lastAccessed value is Date.now().
> 3. I switch to B, so its lastAccessed value is Date.now(). A's is locked at
> the time I switched away.
> 4. I switch back to A, so now B is locked at the time I switched away, and
> A's time is Date.now().
> 5. Hour ends. We scan tabs that have lastAccessed within the last hour, and
> record that measurement in Telemetry.
> 6. New hour starts. I am still looking at A
> 7. I switch to B
> 8. I switch to C
> 9. I switch to D
> 10. I switch to E
> 11. Hour ends. We scan tabs that have lastAccessed within the last hour, and
> record that measurement in Telemetry.
> 12. I switch to A
> 13. Session ends before the hour is up. We scan tabs that have been looked
> at since the last recording (1).
> 
> Does that sound reasonable as a probe, Yoric?

That sounds reasonable.
Flags: needinfo?(dteller)
Attached patch tab-switches.patch (obsolete) — Splinter Review
Hmm, the suggestions from comment 9 would definitely be better, but it ignores closed tabs right? If we open and close a tab before the hourly scanning, it'll never get scanned at all.

Here's a new version of the patch, which should handle things better:

* Every time we switch to a unique tab, we increment a count of unique tabs switched to.
* Every hour, we reset this count, update the histogram, and mark all the tabs as unviewed again.

As I understand it, this would correctly handle the situation mentioned in comment 9. One thing to note though, sessions that last less than an hour don't get recorded.

We could potentially add a shutdown handler, and then divide the tab switches by the fraction of an hour, but I'm not sure this would be necessary.
Attachment #8714467 - Attachment is obsolete: true
Attachment #8715384 - Flags: review?(mconley)
Also, the histogram is now exponential with a high value of 3600, which means we can capture between 0 switches per hour to 1 every second.
(In reply to Bill McCloskey (:billm) from comment #11)
> What if we keep a list of tabs ordered by how recently they've been
> accessed. So index 0 would be the tab currently in the foreground. Lazily
> loaded tabs would not appear in the list. Upon switching to a tab, we would
> record its index in a histogram and then move it up to index 0 in the list
> (inserting it if it's not already in the list).

I guess I didn't express myself very well. By index I meant the index in the list of tabs ordered by recency. Here's an example. The list would start out empty. Now consider the following:

1. Switch to tab A. List is now [A].
2. Switch to tab B. List is now [B, A].
3. Switch back to A. List is now [A, B]. Record 1 (A's former index) in histogram.
4. Switch to C. List is now [C, A, B].
5. Switch to B. List is now [B, C, A]. Record 2 (B's former index) in histogram.
6. Switch to C. List is now [C, B, A]. Record 1 (C's former index) in histogram.

The index tells us how far back in the cache we would need to go. So if we had a cache of size 2, then we would know from this example that the hit rate would be 2/3 (the switch to B in step 5 would be a miss, while steps 3 and 6 would be hits).

This seems a lot simpler to me than worrying about hourly measurements and stuff like that.
This implements comment 16.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8718155 - Flags: review?(mconley)
Attachment #8715384 - Attachment is obsolete: true
Attachment #8715384 - Flags: review?(mconley)
Comment on attachment 8718155 [details] [diff] [review]
record cache position

Review of attachment 8718155 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, assuming we don't see any TPS regressions, and we move the canRecordExtended to the top of the _recordTabAccess method for early bailout.

Thanks billm!

::: browser/base/content/tabbrowser.xml
@@ +1328,5 @@
> +            }
> +          }
> +          aTab.cachePosition = 0;
> +
> +          if (Services.telemetry.canRecordExtended && isFinite(pos)) {

We might as well avoid all of the above if canRecordExtended is false, and return early.
Attachment #8718155 - Flags: review?(mconley) → review+
https://hg.mozilla.org/mozilla-central/rev/bf94144ba355
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
In the future please note data-collection reviews in commit messages. I had to hunt for this one after reviewing the histograms.json commit log.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> In the future please note data-collection reviews in commit messages. I had
> to hunt for this one after reviewing the histograms.json commit log.

What is the format for a data-collection review?

dcr=vladan ?
I've seen people do data-review= but I don't think there's common practice yet
You need to log in before you can comment on or make changes to this bug.