Compare comScore data and site impression data collected via telemetry experience.
To give some more context, we want to see how similar or different our ranking of top sites/eTLD+2s is compared to the comScore top sites (or Alexa top sites), with the caveat that our data is from beta users.
The results of the study are here: https://docs.google.com/a/mozilla.com/document/d/1ZGJjIhQQr57msIDxA0Gi60MfsRJJd-ZJ0ceeBP5NKHU/edit#

It appears that comScore data and newtab data are linearly correlated for a limited number of highly ranked sites.

google.com,1,171692125,13881430292,49652118190,1,628325,1
yahoo.com,2,125237635,5732688362,22355545954,3,292863,5
facebook.com,3,120146016,18525487778,88929943137,2,372984,2
youtube.com,4,118471644,3296905907,26591404465,4,231808,3
amazon.com,5,82526038,658666828,4415708476,5,162044,4
wikipedia.org,6,62115338,388430018,978910136,23,23429,6
bing.com,7,59762401,1051968905,2684591826,29,18533,19
ebay.com,8,52081255,855487945,4900159299,7,102241,8
twitter.com,9,46629165,394548733,1325507133,9,60323,7
live.com,10,46256922,959666316,1742366673,10,51997,17

The number of users who have these sites in their newtab is expected to be proportional to the number of unique visitors to these sites, which gives a direct relationship between newtab impressions and unique visitors. As site rank drops (falling below 50), this relationship disappears. Utility sites like apple.com, craigslist, answers.com, about.com, etc. gradually disappear from newtab: users do go there routinely, but not frequently enough to push them into the newtab visibility area. It’s entirely possible that when the 100 most frecent URLs are collected, the correlation with comScore data will become more pronounced.
Thanks for checking the newtab impressions vs comScore and Alexa ranks. Do you have the scripts you used to analyze this somewhere? We'll probably want to rerun them when we have single impressions from unique users and see if the site rank relations change.
No scripts this time. The data is joined in one file and loaded into Octave, which has the functionality to do stats, graphs, and the like. Perhaps a README to document what was done?
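For a later rerun, the rank comparison could be captured in a small script instead of an interactive session. The following is only a sketch in Python, not the Octave steps that were actually run. It uses the ten rows quoted earlier in this thread and assumes that column 2 is the comScore rank and column 6 is the newtab-based rank. The thread does not state the column meanings, so the column choice, along with helper names such as `load_ranks` and `spearman`, is hypothetical. It computes a Spearman rank correlation, a swapped-in measure that may differ from whatever statistic was used in the study.

```python
import csv
import io
import math

# Ten rows copied from the results quoted in this thread.
# ASSUMPTION: column 2 is the comScore rank and column 6 is the
# newtab-based rank; the thread does not label the columns.
ROWS = """\
google.com,1,171692125,13881430292,49652118190,1,628325,1
yahoo.com,2,125237635,5732688362,22355545954,3,292863,5
facebook.com,3,120146016,18525487778,88929943137,2,372984,2
youtube.com,4,118471644,3296905907,26591404465,4,231808,3
amazon.com,5,82526038,658666828,4415708476,5,162044,4
wikipedia.org,6,62115338,388430018,978910136,23,23429,6
bing.com,7,59762401,1051968905,2684591826,29,18533,19
ebay.com,8,52081255,855487945,4900159299,7,102241,8
twitter.com,9,46629165,394548733,1325507133,9,60323,7
live.com,10,46256922,959666316,1742366673,10,51997,17
"""


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def load_ranks(text, col_a=1, col_b=5):
    """Read two integer columns (0-based indices) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    return [int(r[col_a]) for r in rows], [int(r[col_b]) for r in rows]


comscore_rank, newtab_rank = load_ranks(ROWS)
rho_top5 = spearman(comscore_rank[:5], newtab_rank[:5])
rho_all = spearman(comscore_rank, newtab_rank)
print(f"Spearman rho, top 5 rows:  {rho_top5:.3f}")
print(f"Spearman rho, all 10 rows: {rho_all:.3f}")
```

Pointing `load_ranks` at the full joined file and at other columns would let the same comparison be repeated once single impressions from unique users are available.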