Closed Bug 1276200 Opened 4 years ago Closed 3 years ago

Validate engagement measurements

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
3

Tracking

()

RESOLVED FIXED
Tracking Status
firefox49 --- affected

People

(Reporter: gfritzsche, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

After landing the scalar measurements, we need to validate incoming data to confirm that they work as expected.
This is probably best driven with the data from bug 1252625.
Priority: -- → P3
Whiteboard: [measurement:client]
Blocks: 1252625
Summary: Validate scalar measurements → Validate engagement measurements
Priority: P3 → P1
Points: --- → 3
Depends on: 1293222
Depends on: 1292337
Depends on: 1292682
Priority: P1 → P2
An errata spreadsheet, so we don't get crazy when analysing data: https://docs.google.com/spreadsheets/d/14EtOTQUkeMb3-qZjo9NzBucamgQ1hYx_8X0hJNyGERI/edit#gid=0
Priority: P2 → P1
We should also confirm that bug 1293222 actually worked and that it was "an important" fix.
(In reply to Alessio Placitelli [:Dexter] from comment #2)
> We should also confirm that bug 1293222 actually worked and that it was "an
> important" fix.

To expand, I mean that it would be good to look at what impact this had (before vs. after).
Assignee: nobody → alessio.placitelli
Georg, Brendan, would you mind taking a sanity look at https://gist.github.com/Dexterp37/9bea37f536d5b25651aecc8b22d2dfb6 ?

Interesting bits:
- We do have some Nightly users with MANY tabs and windows per subsession.
- After the bug that fixed uri counts landed (bug 1293222), the distribution of the URI counts changed a bit.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(bcolloran)
Overall this doesn't look alarming and could be realistic. We should monitor this more going to Beta & Release and should try to cross-check with other data if possible.

Some remarks:
* For this specific analysis it would help to see 75th & 90th percentiles
* For unique_domains_count, what percentage exactly recorded >= 100?
* The high tab counts seem limited to 95th & 99th percentiles and might be realistic.
  - This is a channel population biased to technically more literate users.
  - We know that there are "tab hoarders" that never close any tabs.
  - Many of those might come from session restore.
  - We might have previous data from UITelemetry or so that could be used for cross-checking.
  - Either way, going to Beta & Release I'd expect the distribution to shift away a bit from these users.
* Similarly for the window counts.

For the tab & window counts we should take a closer look:
* we can compare open event counts to maximum counts per session, to see how much comes from session restore etc.
* we should look at daily per-user open counts, to see if those are realistic.
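A minimal sketch of the first check, assuming each session provides a peak tab count and a tab open event count (the names `max_concurrent_tabs` and `tab_open_events` are hypothetical stand-ins, not the actual ping schema). It treats any tabs at the peak beyond the open events plus one initial tab as likely coming from session restore; that attribution is itself an assumption.

```python
from dataclasses import dataclass


@dataclass
class SessionCounts:
    # Hypothetical per-session fields; not the actual ping schema.
    max_concurrent_tabs: int
    tab_open_events: int


def tabs_not_explained_by_opens(s: SessionCounts) -> int:
    """Peak tabs that open events alone cannot account for.

    Assumes the first tab of the session is not counted as an open event,
    so at most 1 + tab_open_events tabs come from counted user actions.
    Any excess is tentatively attributed to session restore.
    """
    return max(0, s.max_concurrent_tabs - (1 + s.tab_open_events))


# Illustrative sessions, not real data.
sessions = [
    SessionCounts(max_concurrent_tabs=5, tab_open_events=10),   # fully explained by opens
    SessionCounts(max_concurrent_tabs=120, tab_open_events=3),  # most tabs probably restored
]
excess = [tabs_not_explained_by_opens(s) for s in sessions]
share_with_restored_tabs = sum(1 for e in excess if e > 0) / len(sessions)
print(excess, share_with_restored_tabs)  # [0, 116] 0.5
```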
Flags: needinfo?(gfritzsche)
Hey John, we've started looking at the data coming in. In comment 4, I've added the link to a gist that checks that the data we receive is in the proper format and plots a histogram for each probe.

It would be very helpful if you could take a look at that as well.
(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> For the tab & window counts we should take a closer look:
> * we can compare open event counts to maximum counts per session, to see how
> much comes from session restore etc.
> * we should look at daily per-user open counts, to see those are realistic.

We should look at daily per-user counts for all the measurements, so also the URI/domain ones.
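One way to build those daily per-user numbers, sketched with made-up subsession rows (the tuple layout is an assumption, not the ping format). Additive counters such as tab opens and URI loads are summed per (client, day). Unique domain counts are only reduced with max, because the same domain can be counted again in each subsession.

```python
from collections import defaultdict

# Hypothetical rows: (client_id, day, tab_opens, uri_count, unique_domains).
subsessions = [
    ("a", "d1", 4, 30, 5),
    ("a", "d1", 2, 12, 3),
    ("a", "d2", 1, 7, 2),
    ("b", "d1", 9, 50, 8),
]


def daily_per_user(rows):
    """Aggregate subsession values per (client, day).

    Event counters are additive, so they are summed; unique-domain counts
    are not additive across subsessions, so only the daily max is kept
    (a lower bound on the true daily unique count).
    """
    out = defaultdict(lambda: {"tab_opens": 0, "uris": 0, "max_unique_domains": 0})
    for client_id, day, tabs, uris, domains in rows:
        agg = out[(client_id, day)]
        agg["tab_opens"] += tabs
        agg["uris"] += uris
        agg["max_unique_domains"] = max(agg["max_unique_domains"], domains)
    return dict(out)


daily = daily_per_user(subsessions)
print(daily[("a", "d1")])  # {'tab_opens': 6, 'uris': 42, 'max_unique_domains': 5}
```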
This is great, thanks Alessio -- I'm so excited that this data is starting to come in! I think Georg's remarks and follow-up questions are spot on.

I'm also wondering about the handful of total_uri_count values that are very very high, like >10,000 in a session fragment. Do we have any theories about what could be happening? Supposing that the session fragment happened to be 24hrs long (the longest possible duration for a session fragment), a session fragment with 10k URIs would have to be loading 10000/(24*60*60) = 0.116 URIs/sec for the whole 24 hrs. It makes me wonder whether for some clients we're counting some URI loading pathway that we should be ignoring -- like js loads, or URIs for resources (images etc) being fetched by a page, or some other weird thing. 

Maybe we should pull the data for a couple of clients with those super-high URI/sec values and see whether those session fragments look weird in other ways. Perhaps they are clients driven by scripts or something, and they really are loading a page every 9 seconds all day long. We have seen stranger things. Anyway, it'd be good to have a theory.
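The arithmetic above, plus a sketch of how fragments with suspicious URI rates could be pulled out. The 0.1 URIs/sec threshold and the (client_id, total_uri_count, length) row layout are hypothetical choices for illustration, not values from the analysis.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def uri_rate(total_uri_count: int, length_s: float) -> float:
    """Average URI loads per second over a session fragment."""
    if length_s <= 0:
        return 0.0
    return total_uri_count / length_s


# Worst case from the comment above: 10k URIs over a full 24h fragment.
rate = uri_rate(10_000, SECONDS_PER_DAY)
print(f"{rate:.3f} URIs/sec, one URI every {1 / rate:.1f} s")  # 0.116 URIs/sec, one URI every 8.6 s


def flag_fast_loaders(rows, max_rate=0.1):
    """Return (client_id, rate) for fragments above a hypothetical rate threshold."""
    flagged = []
    for client_id, uri_count, length_s in rows:
        r = uri_rate(uri_count, length_s)
        if r > max_rate:
            flagged.append((client_id, r))
    return flagged


# Illustrative fragments: (client_id, total_uri_count, length in seconds).
fragments = [("x", 12_000, 9 * 60), ("y", 700, SECONDS_PER_DAY)]
flagged = flag_fast_loaders(fragments)
```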
Flags: needinfo?(bcolloran)
I've updated the gist from comment 4.

(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> Some remarks:
> * For unique_domains_count, how many % exactly recorded >= 100?

Only 465 out of 571,389 (~0.1%).
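A sketch of how that share can be computed from a list of per-subsession unique_domains_count values, assuming values at the 100 limit are the ones of interest. The last lines just re-check the figures quoted above.

```python
UNIQUE_DOMAINS_CAP = 100  # limit mentioned in this bug


def share_at_cap(values, cap=UNIQUE_DOMAINS_CAP):
    """Fraction of subsessions whose unique_domains_count reached the cap."""
    if not values:
        return 0.0
    return sum(1 for v in values if v >= cap) / len(values)


# The figures quoted above: 465 capped subsessions out of 571,389.
share = 465 / 571_389
print(f"{share:.4%}")  # about 0.08%, i.e. the "~0.1%" above
```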

> * The high tab counts seem limited to 95th & 99th percentiles and might be
> realistic.
>   - This is a channel population biased to technically more literate users.
>   - We know that there are "tab hoarders" that never close any tabs.

Ah, nice, I didn't know that! :D

> For the tab & window counts we should take a closer look:
> * we can compare open event counts to maximum counts per session, to see how
> much comes from session restore etc.

For most cases, nothing should really come from session restore, except
after crashes or manual session restores.

> * we should look at daily per-user open counts, to see those are realistic.

Done that in the updated gist. They look reasonable, with the exception of the clients
which are opening more than 10k URIs in less than 24hrs. This seems to be about 2% of the
analysed clients.

(In reply to brendan c from comment #8)
> This is great, thanks Alessio -- I'm so excited that this data is starting
> to come in! I think Georg's remarks and follow-up questions are spot on.

Super! Glad to see data coming in as well :-)

> I'm also wondering about the handful of total_uri_count values that are very
> very high, like >10,000 in a session fragment. Do we have any theories about
> what could be happening? Supposing that the session fragment happened to be
> 24hrs long (the longest possible duration for a session fragment), a session
> fragment with 10k URIs would have to be loading 10000/(24*60*60) = 0.116
> URIs/sec for the whole 24 hrs. It makes me wonder whether for some clients
> we're counting some URI loading pathway that we should be ignoring -- like
> js loads, or URIs for resources (images etc) being fetched by a page, or
> some other weird thing. 

You're right, there could be some pathway that I've overlooked. By digging into the data,
there seems to be a class of clients with very short subsession lengths (9 minutes or so!)
opening more than 10k URIs.

As you mentioned, these could be scripted clients. Do you have any suggestions
on how to detect whether that's the case? Is there any particular measurement that could help us?

By analysing my last pings, it looks like I'm a very active URI loader as well: I loaded more than
700 URIs in 24hrs. I think that matches my browsing behaviour when looking for answers online:
I query a search engine, open 20+ new URIs in tabs, then close them all as soon as I have the answer.
(In reply to Alessio Placitelli [:Dexter] from comment #9)
> > * we should look at daily per-user open counts, to see those are realistic.
> 
> Done that in the updated gist. They look reasonable, with the exception of
> the clients
> which are opening more than 10k URIs in less than 24hrs. This seems to be
> the 2% of the
> analysed clients.

So, there was a mistake in the notebook. I was counting subsessions with more than 1k URIs, not 10k. With that change, there are only 11 clients (0.0% :) ) opening huge amounts of URIs. The gist has been updated to reflect this change.

The clients seem to behave consistently. I've dumped some stats per client (# URIs min, # URIs max, [p75, p95], # samples):

[(2, 44102, array([ 30251.5 ,  42022.65]), 8),
 (3, 16630, array([  421.5,  8779.5]), 11),
 (1, 11920, array([   44.75,  6614.5 ]), 10),
 (1, 11269, array([ 2836.75,  9582.55]), 4),
 (4280, 16728, array([ 15661.5,  16514.7]), 4),
 (1, 47162, array([   117.5,  16462.8]), 15),
 (2, 17137, array([   98. ,  4908.2]), 65),
 (1, 11416, array([  210. ,  2186.7]), 18),
 (20, 29215, array([   353. ,  20603.2]), 7),
 (18, 20951, array([  4624. ,  17685.6]), 5),
 (2, 12840, array([ 3039.5,  7322.8]), 75)]
Would you mind taking a new look at the gist, to rule out any issue?
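For future dumps like the one above, a sketch that computes the same per-client figures (min, max, p75, p95, number of samples) and prints them with a header. The percentile helper uses linear interpolation, which is also numpy.percentile's default method; the input dict is hypothetical.

```python
def percentile(values, p):
    """Linear-interpolation percentile (same default method as numpy.percentile)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def client_stats(uri_counts):
    """(min, max, p75, p95, n) for one client's per-subsession URI counts."""
    return (min(uri_counts), max(uri_counts),
            percentile(uri_counts, 75), percentile(uri_counts, 95), len(uri_counts))


def print_table(per_client):
    """Print one row per client under a header, instead of a raw tuple dump."""
    print(f"{'client':<8}{'min':>8}{'max':>8}{'p75':>12}{'p95':>12}{'n':>5}")
    for client_id, counts in per_client.items():
        mn, mx, p75, p95, n = client_stats(counts)
        print(f"{client_id:<8}{mn:>8}{mx:>8}{p75:>12.2f}{p95:>12.2f}{n:>5}")


# Illustrative input: client id -> per-subsession total_uri_count values.
print_table({"c1": [1, 2, 3, 4], "c2": [10, 20]})
```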
Flags: needinfo?(gfritzsche)
Brendan, flagging you again given the updates from comment 10 and comment 9. Do they make sense?
Flags: needinfo?(bcolloran)
Looks good Alessio, thanks.

One question: In cell "In [31]", are you summing unique_domains_count across session fragments? Just to make sure I'm understanding the implementation: I would have thought that summing across session fragments was not valid because each session fragment would reset the unique URI Bloom filter (or whatever structure you used), such that if you visit google.com during two subsequent subsessions during one day, it will be added to the Bloom filter during each subsession. Then, if you sum over subsessions from that day, google.com would be counted twice. Is that correct?
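A tiny illustration of the double counting described above, modelling each subsession's unique domains with a plain Python set (the comment mentions a Bloom filter; the actual client-side structure is not assumed here). Telemetry only reports each subsession's count, so the daily max is the best available lower bound, while the sum can overcount.

```python
# Illustrative domains seen in two subsessions of the same day.
day_subsessions = [
    {"google.com", "mozilla.org"},
    {"google.com", "example.com"},
]

# What telemetry would report: one count per subsession.
per_subsession_counts = [len(s) for s in day_subsessions]

summed = sum(per_subsession_counts)                      # google.com counted twice
true_daily_unique = len(set().union(*day_subsessions))   # only knowable with the raw sets
daily_max = max(per_subsession_counts)                   # lower bound on the true value

print(summed, true_daily_unique, daily_max)  # 4 3 2
```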
Flags: needinfo?(bcolloran)
* In [6] & [8], why use 95th percentile? Should this be max()?
* re [31], we should probably look at "max per day" or "values per subsession" instead of per-day sums?
* In [65], I think we should look at sessionLength, not subsessionLength (e.g. grabbing them from the shutdown pings).
* [61], nitpicking, but a table with a header would be nice for future dumps like this
* [61], if this is automation, I would look for:
 * either a new or constantly resetting profile
 * low session counts (1 or "few", profileSubsessionCounter as a proxy?)
 * shorter session lengths
 * submit high counts with each session.
   Maybe a proxy is "for pathological clients, the URI count p25 is pretty close to p90"?
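A sketch of how the criteria above could be combined into a single check. All thresholds (minimum p25, maximum p90/p25 spread, maximum subsession counter) are hypothetical values chosen for illustration, and `looks_automated` is not a function from the notebook.

```python
def percentile(values, p):
    """Linear-interpolation percentile (numpy.percentile's default method)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def looks_automated(uri_counts, profile_subsession_counter,
                    min_p25=1000, max_spread=1.5, max_subsessions=5):
    """Hypothetical heuristic: consistently high counts on a young profile.

    - consistently high: even the 25th percentile of per-subsession URI
      counts is above min_p25
    - tight distribution: p90 is within max_spread times p25
    - young/resetting profile: profileSubsessionCounter is small
    """
    p25 = percentile(uri_counts, 25)
    p90 = percentile(uri_counts, 90)
    consistently_high = p25 >= min_p25
    tight = p25 > 0 and p90 / p25 <= max_spread
    young_profile = profile_subsession_counter <= max_subsessions
    return consistently_high and tight and young_profile


# Illustrative inputs, not real clients.
print(looks_automated([15000, 15500, 16000, 16200], 3))  # True
print(looks_automated([3, 40, 120, 700], 200))           # False
```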
Flags: needinfo?(gfritzsche)
Worth calling out that:
(1) per [10], 50% of users open at least 15 tabs per day
(2) per [16], 50% of users open at least 2 windows per day
(3) per [22], 50% of users open at least 13 URIs per day

Can we explain the relationship between (1) & (3)?
Can we show that relationship in the data?

(2) seems a bit high?
Maybe this actually shows that many people have more than one session per day?
Can we look at this per-clientId & per-sessionId (not subsession)?
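A sketch of the per-clientId & per-sessionId view, assuming each subsession row carries a client id, a session id and its window open event count (row layout and values are hypothetical). Subsession counts are summed per (client, session) before taking the median, so a session split into several subsessions is counted once.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (client_id, session_id, window_open_events) per subsession.
rows = [
    ("a", "s1", 0), ("a", "s1", 1),  # two subsessions of the same session
    ("a", "s2", 0),
    ("b", "s3", 0),
    ("c", "s4", 3),
]


def per_session_totals(rows):
    """Sum window open events over all subsessions of each (client, session)."""
    totals = defaultdict(int)
    for client_id, session_id, opens in rows:
        totals[(client_id, session_id)] += opens
    return dict(totals)


totals = per_session_totals(rows)
print(median(totals.values()))  # 0.5
```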
I've updated the gist with the latest analysis.

(In reply to brendan c from comment #13)
> Then, if you sum over subsessions from that day,
> google.com would be counted twice. Is that correct?

Good point, that's correct. My mistake. I'm plotting both |max| and |p95| right now.

(In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> * In [6] & [8], why use 95th percentile? Should this be max()?

Since we're doing this per day, per user, this should be |max|. Good point.

> * re [31], we should probably look at "max per day" or "values per
> subsession" (instead of per day) instead?

Yeah, my mistake. Fixed that!

> * In [65], i think we should look at sessionLength, not subsessionLength
> (e.g. grabbing them from the shutdown pings).

Added that as well, it doesn't look too surprising though :(

> * [61], if this is automation, i would look for:
>  * either a new or constantly resetting profile
>  * low session counts (1 or "few", profileSubsessionCounter as a proxy?)
>  * lower session lengths
>  * submit high counts with each session.
>    Maybe a proxy is "for pathological clients, the uri counts p25 is pretty
> close to p90"?

Unfortunately, these profiles do not seem to match any of these criteria :(

(In reply to Georg Fritzsche [:gfritzsche] from comment #15)
> Worth calling out that:
> (1) per [10], 50% of users open at least 15 tabs per day
> (2) per [16], 50% of users open at least 2 windows per day
> (3) per [22], 50% of users open at least 13 URIs per day
> 
> Can we explain the relationship between (1) & (3)?

Brendan, do you have any idea about this?
Should we invest more time in digging through the data now or do you think this looks good enough?
If not, any suggestion about what to expand?

> (2) seems a bit high?
> Maybe this actually shows that many people have more than one session per
> day?
> Can we look at this per-clientId & per-sessionId (not subsession)?

I think this is covered by [134]. Nothing too surprising as far as I can see.
Flags: needinfo?(bcolloran)
Moreover, the shape of the "unique_domains_count" distribution seems to suggest that we might need to increase the limit (currently 100). Any thoughts on this?
Flags: needinfo?(gfritzsche)
(In reply to Alessio Placitelli [:Dexter] from comment #16)
> > (2) seems a bit high?
> > Maybe this actually shows that many people have more than one session per
> > day?
> > Can we look at this per-clientId & per-sessionId (not subsession)?
> 
> I think this should be [134]. Nothing too surprising as far as I can see.

I expected "most" users to open 1 window per session, but the data in [134] is per day, so we can't really say anything about the per-session behavior.

(In reply to Alessio Placitelli [:Dexter] from comment #17)
> Moreover, the shape of the "unique_domain_count" distribution seems to
> suggest that we might need to increase the limit (which is currently 100).
> Any thought on this?

From comment 9, I think we are fine:
> > * For unique_domains_count, how many % exactly recorded >= 100?
> 
> Only 465 over 571389 (~0.1%).

(In reply to Alessio Placitelli [:Dexter] from comment #16)
> > * [61], if this is automation, i would look for:
> >  * either a new or constantly resetting profile
> >  * low session counts (1 or "few", profileSubsessionCounter as a proxy?)
> >  * lower session lengths
> >  * submit high counts with each session.
> >    Maybe a proxy is "for pathological clients, the uri counts p25 is pretty
> > close to p90"?
> 
> Unfortunately, these profiles do not seem to match any of these criteria :(

As this is a limited set of 11 clients, I guess we could:
* dig into their history in a separate bug (are there suspicious addons, could this be mozilla infrastructure, ...?)
* move on and keep monitoring this on Aurora & Beta to check that the noise ratio stays low
Flags: needinfo?(gfritzsche)
Blocks: 1298342
(In reply to Georg Fritzsche [:gfritzsche] from comment #18)
> (In reply to Alessio Placitelli [:Dexter] from comment #16)
> > > (2) seems a bit high?
> > > Maybe this actually shows that many people have more than one session per
> > > day?
> > > Can we look at this per-clientId & per-sessionId (not subsession)?
> > 
> > I think this should be [134]. Nothing too surprising as far as I can see.
> 
> I expected "most" users to open 1 window per session, but the data in [134]
> is per day, so we can't really say anything about the behavior.

Whoops, I meant [137]. This seems to match your expected behaviour, as p75 is 0
while p95 is 4.

> (In reply to Alessio Placitelli [:Dexter] from comment #16)
> > > * [61], if this is automation, i would look for:
> > >  * either a new or constantly resetting profile
> > >  * low session counts (1 or "few", profileSubsessionCounter as a proxy?)
> > >  * lower session lengths
> > >  * submit high counts with each session.
> > >    Maybe a proxy is "for pathological clients, the uri counts p25 is pretty
> > > close to p90"?
> > 
> > Unfortunately, these profiles do not seem to match any of these criteria :(
> 
> As this is a limited set of 11 clients, i guess we could:
> * dig into their history in a separate bug (are there suspicious addons,
> could this be mozilla infrastructure, ...?)

Filed bug 1298344.

> * move on and keep monitoring this on Aurora & Beta for low noise ratio

Filed bug 1298342.
Rebecca, since Brendan is away, would you mind checking out the engagement measurements validation notebook [1]?

Interesting bits:

- We have a handful of clients opening loads of URIs per subsession (> 10k). We think these come from automation or scripted Firefox instances, and have filed bug 1298344.
- 50% of users open at least 15 tabs per day
- 50% of users open at least 2 windows per day
- 50% of users open at least 13 URIs per day

In general, the probes seem to be well-behaved.

[1] -  https://gist.github.com/Dexterp37/9bea37f536d5b25651aecc8b22d2dfb6
Status: NEW → ASSIGNED
Flags: needinfo?(rweiss)
Rebecca, here are some additional bits.

- Subsessions hitting the 100 unique domains limit: 0.001 or 0.008 of clients (see [155] and [156]).
- We do have some window and tab hoarders (See [123] and [125]).
- In [137] you can see the number of window open events per client session: most of the clients don't open new windows at all. The median is 0 and our p95 is 4.
- In [148] and [150] you can see some quick facts about the heavy URI loaders that I mentioned in comment 20.

There's a mistake in comment 20: we have "50% of users open at least 15 tabs per day" [128] and "50% of users open at least 87 URIs per day" [141]. I think that the discrepancy in #URIs/#Tabs can be explained by about:* pages in new tabs (for # tabs > # URIs) or by reusing tabs (for # URIs > # tabs).

Rebecca, what do you think of the URI hoarders (loading > 10k URIs per subsession)? There don't seem to be many of them, so I'm not sure if it's a real issue. Moreover, do you think that the window open events per client session look reasonable?
We have no prior expectations about what "normal" tab and window open usage looks like, so in many ways we're simply looking to pass a face validity check on whether we think it's possible for a human being to behave in the average way.  

So to that point, everything seems reasonable enough, in that I could imagine a human being performing these actions.  I'm not worried about the number of URI hoarders at this point, but this is the Nightly population, so it will be good to keep track of this number as we move towards wilder channels with broader coverage (e.g. let's track this as we see the data coming in from Aurora).

With windows open, the median seems reasonable, but I'd like to clarify the third point above (the median being 0).  Is the correct interpretation that most *subsessions* don't include a window open event?
Flags: needinfo?(rweiss)
(In reply to Rebecca Weiss from comment #22)
> With windows open, the median seems reasonable, but I'd like to clarify the
> third point above (the median being 0).  Is the correct interpretation that
> most *subsessions* don't include a window open event?

Yes, that's correct. I've updated the notebook at [1] with a deeper analysis of the window open events per subsession (see cell [9]): the 75th percentile is 0 and the 95th percentile is 2.

[1] - https://gist.github.com/Dexterp37/9bea37f536d5b25651aecc8b22d2dfb6
qq: do we explicitly count the first window open event that occurs at the start of an FF session (during the first subsession of that session), or would that be left as a matter for analysis?
Flags: needinfo?(bcolloran)
(In reply to brendan c from comment #24)
> qq: do we explicitly count the first window open event that occurs at the
> start of an FF session (during the first subsession of that session), or
> would that be left as a matter for analysis?

Do you mean for the window that gets opened when launching FF? That's not being counted. However, this is counted in the FX_SESSION_RESTORE_NUMBER_OF_WINDOWS_RESTORED histogram.

If you meant something else, I'm sorry :)
Flags: needinfo?(bcolloran)
Thanks Alessio -- yes exactly, I was thinking of the window(s) that open when FF is first launched. Is FX_SESSION_RESTORE_NUMBER_OF_WINDOWS_RESTORED an opt-out metric, or will it be promoted to opt-out? It seems reasonable to differentiate the session restoration window open events from the count of window open events initiated by user action within a session, but we may want to be able to look at both; and actually, window hoarding across sessions is quite an informative and interesting behavior.

I'm still unclear on the process for whitelisting probes for the general population. rweiss, thoughts?
Flags: needinfo?(bcolloran) → needinfo?(rweiss)
(In reply to brendan c from comment #26)
> Thanks Alessio -- yes exactly, I was thinking of the window(s) that open
> when FF is first launched. Is FX_SESSION_RESTORE_NUMBER_OF_WINDOWS_RESTORED
> an opt-out metric, or will it be promoted to opt-out? 

I see your point now. No, it's an opt-in probe. I've no idea if there's an intention to promote that to an opt-out probe.
If you're interested in that probe, then FX_SESSION_RESTORE_NUMBER_OF_TABS_RESTORED is probably relevant too.
If these are fine separately, it's probably easiest to look into making them opt-out in a separate bug.
Can we close this bug and move the discussion for making FX_SESSION_* opt-out in separate bugs?
Flags: needinfo?(bcolloran)
Blocks: 1303044
(In reply to Alessio Placitelli [:Dexter] from comment #29)
> Can we close this bug and move the discussion for making FX_SESSION_*
> opt-out in separate bugs?

Yes that works for me, thanks.
Flags: needinfo?(bcolloran)
Filed bug 1303278, moved the ni? for :rweiss to that bug so we have all the information in one place.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(rweiss)
Resolution: --- → FIXED
Depends on: 1309169