Determine how representative telemetry pings are of general population based on startup data in amo ping

RESOLVED FIXED in Backlogged - BZ

Status

Mozilla Metrics
Data/Backend Reports
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: (dormant account), Unassigned)

Tracking

unspecified
Backlogged - BZ
x86
Windows 7

Details

(Whiteboard: Telemetry -- needs PM project priority )

Attachments

(1 attachment)

Since this seems to be based on my idea, I'll relate the comment I made to Gilbert:

There is an AMO service ping which consists of a comma separated list of installed add-ons, and three startup time metrics.  This ping is not opt-in.  I believe there is a config parameter that could turn it off, but we can expect nearly 100% representation of users of later versions of Firefox in it.

Those same three startup metrics are also collected in the opt-in Telemetry program.

My thought was that we could estimate the bias of the Telemetry opt-in by comparing those measures across the two data sources.

Anurag can provide details and access to the AMO service ping data.
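
A minimal sketch of the comparison proposed above, for anyone picking this up: compute the deciles of one startup metric in each data source and look at the gap at each decile. The function name, the sample values, and the use of milliseconds are all assumptions made for illustration; none of them come from the actual AMO ping or Telemetry data.

```python
from statistics import quantiles

def decile_gap(telemetry_ms, population_ms):
    """Return (percentile, telemetry_value, population_value, gap) tuples."""
    # n=10 yields the 9 cut points at the 10th, 20th, ..., 90th percentiles.
    tel = quantiles(telemetry_ms, n=10, method="inclusive")
    pop = quantiles(population_ms, n=10, method="inclusive")
    return [(10 * (i + 1), t, p, t - p)
            for i, (t, p) in enumerate(zip(tel, pop))]

# Made-up startup times (assumed to be milliseconds), for illustration only.
telemetry = [900, 1100, 1300, 1500, 1800, 2200, 2600, 3100, 4000, 6000, 9000]
population = [1000, 1200, 1500, 1700, 2100, 2500, 3000, 3600, 4700, 7000, 11000]

for pct, t, p, gap in decile_gap(telemetry, population):
    print(f"{pct:3d}th percentile: telemetry={t:8.1f} all={p:8.1f} gap={gap:+9.1f}")
```

A gap that stays near zero at every decile would suggest little opt-in bias for that metric; a gap that is consistently signed or grows toward the upper deciles would suggest the opposite.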
hostname: hm-admin03.scl2.mozilla.com

table: addons_pings
time	string	
guid	string	
src	string	
appos	string	
appversion	string	
tmain	string	
tfirstpaint	string	
tsessionrestored	string	
user_agent	string	
ds	string	
domain	string	

Please file a bug for SSH access to hm-admin03.scl2.mozilla.com and you should be able to access the Hive table to run queries.
(Reporter)

Comment 3

6 years ago
Is anyone working on this yet?
Group: metrics-private
N
Grr.. CC fail. :)

Not yet.  This is something that I hope Chris Jung or Saptarshi might be able to analyze before next Wednesday.  CCing Gilbert to figure out scheduling.
Yeah, working on it. huge data, but should get results by eod
It may be of interest that we are having a discussion about enabling Telemetry by default on Nightly and Aurora in bug 699806.

Comment 8

6 years ago
taras, we are comparing startup times btw telemetry and the general population now. does telemetry not collect num. of extensions? this would be helpful in doing this comparison.
(Reporter)

Comment 9

6 years ago
(In reply to Christopher Jung from comment #8)
> taras, we are comparing startup times btw telemetry and the general
> population now. does telemetry not collect num. of extensions? this would be
> helpful in doing this comparison.

Yes, we report extensions/persona in Firefox 9+. It would be good to be able to rerun this test with Firefox 7, 8, 9, and 10 (once they hit the stable channel). In Firefox 11, we'll take startup info out of the AMO ping.
We have the comparative distributions of startup time (tmain) for

1. WINNT, FFox 7.0 for telemetry vs. All (as obtained from s.a.m.o ping) for all data points in Sep'11
2. WINNT, FFox 8.0{a1,a2} for telemetry vs. All (as obtained from s.a.m.o ping) for all data points in Sep'11.

There are some significant differences and the pattern is different in (1) and (2).
I'll upload PDFs soon.


Joy
Created attachment 574451 [details]
Comparative displays of versions 7,8 and 9 startup times for Telemetry samples vs. All
Hello,

I have attached a PDF with 3 pages describing some differences between
Telemetry samples and the population (firefox users) in regards to
SessionRestored startup time.

The Telemetry data packets contain a sessionrestored time, and the AMO
services ping also contains a sessionrestored time. We make the
assumption that the AMO services ping is representative of the general
Firefox installation.

We computed the quantiles for all of the Telemetry sessionrestored values for
Firefox 7, 8 and 9 for the month 2011-09.

The sessionrestored times were pulled from the s.a.m.o logs for Sep, 2011.

Page 1: Proportion of observations vs. Log_e of SessionRestored
conditioned on version. Small vertical differences are quite large
(since the x-axis is logged). The pattern for version 7 is different
from 8 and 9.

Page 2: Relative Difference := (Startup time for Telemetry - Startup
time for All)/All * 100. Versions 8 and 9 are similar, but different
from 7. Version 7 is definitely centered around 30% less for Telemetry,
though 8 and 9 seem to have a center of 0.

Page 3: Log of Abs Difference. Backs up Page 2. Essentially, for larger startup times the difference increases. Here large means ~13 seconds (which is roughly the 60th percentile for 7 and 8) and the difference is 1.3 seconds.
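
For reference, the two quantities plotted on Pages 2 and 3 can be sketched as below. This is only an illustration under assumptions: the function names are made up, the log is taken to be the natural log (as on Page 1), and the quantile values are hypothetical, not the ones in the attachment or the pastebins.

```python
import math

def relative_difference(tel, pop):
    """Percent difference of a Telemetry quantile relative to the 'All' quantile."""
    return (tel - pop) / pop * 100

def log_abs_difference(tel, pop):
    """Natural log of the absolute difference; -inf when the two values match."""
    diff = abs(tel - pop)
    return math.log(diff) if diff > 0 else float("-inf")

# Hypothetical matched quantiles in seconds (not the values from the attachment).
telemetry_q = [1.4, 3.5, 7.0, 11.7]
all_q = [2.0, 5.0, 10.0, 13.0]

for t, a in zip(telemetry_q, all_q):
    print(f"all={a:5.1f}s rel={relative_difference(t, a):6.1f}% "
          f"log|diff|={log_abs_difference(t, a):6.2f}")
```

A negative relative difference means the Telemetry quantile is faster than the corresponding quantile for All.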

see http://sguha.pastebin.mozilla.org/1382401 for the quantiles for Telemetry vs All | version
see http://sguha.pastebin.mozilla.org/1382404 for the quantiles of absolute differences between Tel and All startup times | version

In essence, 8 is similar to 9 in that both Telemetry and All are similar except at extremes.
For 7, Telemetry and All are different.
Moreover, Page 4 indicates longer startup times are getting shorter with increasing version.

Saptarshi
This looks great Saptarshi.  Could you write up some layman summaries related to this analysis so anyone who sees this bug can learn a bit about what we found?

Something that explains in what ways we have discovered the population of Telemetry users differs from the general population by version and the change trend we are seeing.
Saptarshi - Did you write up a layman summary as Daniel suggested?
Lawrence - thanks so much for following up on this. I'll update this bug with an easy-to-read summary of the blog post.
Hello,

The blog entry: http://blog.mozilla.com/metrics/2011/12/13/comparing-the-bias-in-telemetry-data-vs-the-typical-firefox-user/

has a description of what we did.

In summary:

We collected start up times for Firefox 7,8 and 9 for November, 2011 from the log files of services.addons.mozilla.org (SAMO). We also took the same information for the same period from the Telemetry data contained in HBase.

1. Visual inspection (barplots, quantiles of the difference in deciles, and confidence intervals for the median startup times) indicated *no difference*.

2. We ran anova tests (anova attempts to attribute the variance in startup times to the version and source (telemetry or all)). These tests indicated that neither version nor source made a significant difference in startup times.

Hence, insofar as startup times are concerned, there is no difference between Telemetry and the general population.

-- Saptarshi
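
To make the ANOVA idea in point 2 concrete, here is a rough sketch of the F statistic for a single factor (source: Telemetry vs. All). This is a simplification and an assumption on my part: the analysis described above used both version and source as factors, and the function name and sample values below are made up. It also stops at the F statistic and does not compute a p-value.

```python
from statistics import fmean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up startup times in seconds, for illustration only.
telemetry = [2.0, 3.0, 4.0]
everyone = [2.0, 3.0, 4.0]
print(one_way_f([telemetry, everyone]))   # identical groups give F = 0
```

An F statistic near zero means the group means are close relative to the spread within each group, which is the pattern behind a "no difference" conclusion. For real data, a library routine such as `scipy.stats.f_oneway` would also report the p-value.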

Comment 17

5 years ago
grouping for triage
Status: NEW → ASSIGNED
Whiteboard: Telemetry -- needs PM project priority

Comment 18

5 years ago
Triaged.
Target Milestone: Unreviewed → Backlogged - BZ
This analysis was completed several months ago. Resolving.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED