Closed
Bug 1400900
Opened 8 years ago
Closed 7 years ago
[Shield] Opt-out Study: TAAR Experiment
Categories
(Shield :: Shield Study, enhancement)
Shield
Shield Study
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bmiroglio, Assigned: bmiroglio, NeedInfo)
Details
Attachments
(12 obsolete files)
(waiting for opt-out functionality within shield)
This experiment will test the efficacy of the Telemetry-Aware Add-on Recommender (TAAR), which uses a combination of machine learning techniques to recommend add-ons to a given client. The inputs to TAAR include a client's currently installed add-ons, locale, country, and other metadata available via telemetry. TAAR then outputs a list of N WebExtensions to recommend to the user, which are then rendered on the discovery pane. We want to know if TAAR's recommendations influence users to install more add-ons.
More details here: https://docs.google.com/document/d/1db0h4F-qqP0T5RPXWDxLrD-xxpHxRak64ZJrhVo9kD0/edit?ts=59bb0b56
We intend to ship to the release channel only, so we'll need official QA on our add-on. Should I file a separate bug for QA?
Updated•8 years ago (Assignee)
Flags: needinfo?(glind)
Updated•8 years ago (Assignee)
Assignee: nobody → bmiroglio
Comment 1•8 years ago (Assignee)
Flags: needinfo?(mkelly)
Comment 2•8 years ago (Assignee)
Remove testing flag in config.
Attachment #8909393 -
Attachment is obsolete: true
Comment 3•8 years ago (Assignee)
Comment 4•8 years ago
(In reply to Ben Miroglio [:bmiroglio] from comment #0)
> We intend to ship to the release channel only, so we'll need official QA on
> our add-on. Should I file a separate bug for QA?
You'll need to submit a request to Product Integrity: https://mana.mozilla.org/wiki/display/PI/PI+Request
Flags: needinfo?(mkelly)
Comment 5•8 years ago
Whoops, also, ideally they'd sign off on the testing in this bug, but an email to a list that you can link to from here is fine.
Comment 6•8 years ago (Assignee)
I submitted a PI request shortly after filing this bug and it's being tracked. :)
Comment 7•8 years ago (Assignee)
sguha: can you give your OK on the data payload? Nothing is category 3 :). I've linked you in the Google Doc.
Flags: needinfo?(sguha)
Comment 8•8 years ago
As long as the data TAAR sends back is okay, the contents of the ping described in the linked Google Doc are definitely < category 3.
Flags: needinfo?(sguha)
Comment 9•8 years ago (Assignee)
Fixing issue that came up during testing--now forces about:addons to display the discover tab when navigating through the popup.
Attachment #8909424 -
Attachment is obsolete: true
Attachment #8910003 -
Attachment is obsolete: true
Comment 10•8 years ago (Assignee)
[Fix from QA]
Change eligibility criteria to exclude profiles that are younger than 3 days. This allows time for HBase to populate and serve recommendations. The new profile age criterion is 3 <= pcd <= 12.
Attachment #8912822 -
Attachment is obsolete: true
Comment 11•8 years ago (Assignee)
Minor change in the way that data is reported to make post-analysis cleaner. In the payloads from our testers, we see fields that are "null" where "false" is implied. Setting null values to false for clarity.
Does not affect any UI/functionality.
Attachment #8913741 -
Attachment is obsolete: true
Comment 12•8 years ago
Sounds good, please go ahead since you have QA signoff and 56 has been released.
We plan to re-enable updates at 100% on the release channel tomorrow (so your data may show a bump starting on Tuesday if you launch this today)
Comment 13•8 years ago
I have signed the add-on from comment 11, and uploaded it to Shield as "TAAR Experiment v2".
Comment 14•8 years ago (Assignee)
Making another small change to more accurately report interactions with the popup. We are currently under-reporting popup events.
*This does not affect any TAAR evaluation or any functionality/UI*
mythmon: can you sign and redeploy this new version?
Attachment #8913962 -
Attachment is obsolete: true
Attachment #8914482 -
Attachment is obsolete: true
Flags: needinfo?(mcooper)
Comment 15•8 years ago
Ben, that XPI file has the same version number in install.rdf as the previous XPI. We need the version number or the extension id to change between versions. Can you update that?
It would also be helpful to name the file something like "taarexp-2.1.xpi". That is, something that includes the name and version of the extension.
Flags: needinfo?(mcooper) → needinfo?(bmiroglio)
Comment 16•8 years ago (Assignee)
Resubmit XPI with version incremented.
Attachment #8914697 -
Attachment is obsolete: true
Flags: needinfo?(bmiroglio)
Comment 18•8 years ago
I've signed taarexp-2.1.0.xpi and uploaded it to Shield as "TAAR Experiment v2.1.0".
Flags: needinfo?(mcooper)
Comment 19•8 years ago (Assignee)
Fix to hopefully mitigate the bug [1] reported by a small group of users.
The popup is triggered after 3 successful URI loads, the count of which is stored in browser.storage.local. If users are seeing the popup, it makes me think that the add-on is able to correctly count URI loads; otherwise the popup would never be shown.
In past versions, once the triggerPopup() method is called, the add-on stores a boolean, `sawPopup`, indicating the client saw the popup, so before showing the popup the add-on ensures `sawPopup`=false in local storage. My guess is that this logic isn't working for these users, so I added a check that total URI loads must be exactly 3 for triggerPopup() to be called. After the 4th URI load, the pageAction is removed and so is the webNavigation listener. Functionally this replicates the intended behavior; it just adds an additional condition for completeness.
[1] https://www.reddit.com/r/firefox/comments/742gg7/is_there_any_way_to_turn_off_the_customize/
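For readers following the logic, the gating described in this comment can be sketched roughly as below. This is not the add-on's code: `createPopupGate`, the in-memory `store` object, and the two callbacks are hypothetical stand-ins so the sketch runs outside Firefox, where the real add-on would use local storage, the pageAction, and the webNavigation listener.

```javascript
// Hypothetical sketch of the popup gating; names here are illustrative only.
// An in-memory object stands in for the add-on's local storage.
function createPopupGate(showPopup, removeListeners) {
  const store = { uriLoads: 0, sawPopup: false, listenersRemoved: false };

  return function onUriLoad() {
    if (store.listenersRemoved) {
      return; // In the real add-on the listener would no longer fire.
    }
    store.uriLoads += 1;

    // Existing guard (sawPopup) plus the added exact-count condition.
    if (store.uriLoads === 3 && !store.sawPopup) {
      store.sawPopup = true;
      showPopup();
    } else if (store.uriLoads > 3) {
      // After the 4th load, tear down the pageAction and the listener.
      store.listenersRemoved = true;
      removeListeners();
    }
  };
}

// Simulate six URI loads and count how often each callback runs.
let popups = 0;
let removals = 0;
const onUriLoad = createPopupGate(
  () => { popups += 1; },
  () => { removals += 1; }
);
for (let i = 0; i < 6; i++) {
  onUriLoad();
}
console.log(popups, removals); // 1 1
```

With both conditions in place, the popup can fire at most once even if the `sawPopup` flag fails to persist, because the load count only equals 3 once.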
Attachment #8914840 -
Attachment is obsolete: true
Attachment #8914958 -
Attachment is obsolete: true
Flags: needinfo?(mcooper)
Comment 20•8 years ago
Here is a signed version of "taarexp-2.2.0.xpi". I've uploaded this to Shield as "TAAR Experiment v2.2.0".
Flags: needinfo?(mcooper)
Comment 21•8 years ago
Reporting on some strange data I've detected coming from this study.
Overall (out of all shield studies running currently) > 98% of data conform to valid states.
Of the < 2% that are not valid, over 1% come from TAAR. Specifically, these data indicate the invalid state of a client being BOTH ineligible and installed, which should be mutually exclusive.
I posit two possibilities:
1. There's an underlying bug in the TAAR code, specifically
2. TAAR is the only study running to enforce eligibility criteria that render a large number of clients ineligible to install the study and this fact is overlapping with a shield-general, study-agnostic issue where otherwise mutually exclusive pings are sent
Not sure how to approach this from here, but ideally we'll determine if it's systematic or predictably random error.
All data available via Spark: df = sqlContext.read.parquet("s3://telemetry-private-analysis-2/jgaunt/shield-clients-parquet")
selecting where ineligible > 0 and installed > 0 reveals the strange records
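The selection itself is just a conjunction of two conditions. A minimal sketch in plain JavaScript, using made-up per-client rows rather than the Spark table (the row shape and the `findInvalidClients` helper are assumptions for illustration):

```javascript
// Hypothetical per-client rows; the real data lives in the parquet table above.
const clients = [
  { clientId: "a", enter: 1, ineligible: 0, installed: 1 },
  { clientId: "b", enter: 1, ineligible: 1, installed: 0 },
  { clientId: "c", enter: 2, ineligible: 1, installed: 1 },
];

// A client is in the invalid state when it reported BOTH states,
// which should be mutually exclusive.
function findInvalidClients(rows) {
  return rows.filter((row) => row.ineligible > 0 && row.installed > 0);
}

const invalidIds = findInvalidClients(clients).map((row) => row.clientId);
console.log(invalidIds); // [ 'c' ]
```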
Comment 22•8 years ago (Assignee)
> 2. TAAR is the only study running to enforce eligibility criteria that
> render a large number of clients ineligible to install the study and this
> fact is overlapping with a shield-general, study-agnostic issue where
> otherwise mutually exclusive pings are sent
I suspect this is the reason per the eligibility criteria embedded into the shield add-on [1].
[1] https://github.com/benmiroglio/taar-experiment/blob/prod/shield-integrated-addon/addons/taar-study/addon/Config.jsm#L90
Comment 23•8 years ago
Can that explain why all of the problematic clients have >1 entry ping as well?
Could these criteria be false in one case and true in another within the same client?
Comment 24•8 years ago
> Can that explain why all of the problematic clients have >1 entry ping as well?
Since the "client" would really be two or more clients, in divergent profiles (possibly different computers), then >1 entry ping would be expected. I would be very curious if there are clients with exactly one entry ping that still ended up in the invalid state. I don't think that situation is possible under my explanation.
> Could these criteria be false in one case and true in another within the same client?
Since the eligibility criteria rely on profile creation date, this could be explained by one of the divergent profiles being used (qualifying because it is young enough), and then the second being used some time later, after the profile is too old. This situation is less likely, but because profile age and ping time are both available in telemetry, it could be verified or falsified: it is possible to reverse engineer *why* the profile was marked as ineligible.
For reference, here is the eligibility function from the most recent version of the add-on:
const locale = TelemetryEnvironment.currentEnvironment.settings.locale.toLowerCase();
const proflileCreationDate = TelemetryEnvironment.currentEnvironment.profile.creationDate;
const currentDay = Math.round(Date.now() / 60 / 60 / 24 / 1000)
const profileAgeInDays = currentDay - proflileCreationDate
const validProfileAge = profileAgeInDays >= 3 && profileAgeInDays <= 12
const validLocale = locales.has(locale)
return validProfileAge && validLocale
Locales contains: ar, bg, cs, da, de, el, en-gb, en-us, es-ar, es-es, es-la, fi, fr, fr-ca, he, hu, id, it, ja, ko, ms, nl, no, pl, pt, pt-br, ro, ru, sk, sr, sv, th, tl, tr, uk, vi, zh-tw.
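To make the "eligible at one time, ineligible later" case concrete, here is a stand-alone sketch of the same check. It is not the add-on's code: `isEligible`, `MS_PER_DAY`, and the shortened `LOCALES` set are illustrative, and the values are passed in as arguments instead of being read from TelemetryEnvironment. It assumes, as the snippet above does, that the profile creation date is expressed in days since the Unix epoch.

```javascript
// Hypothetical stand-alone version of the eligibility check above.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Subset of the locale list above, for illustration only.
const LOCALES = new Set(["de", "en-gb", "en-us", "fr", "pt-br"]);

// profileCreationDay: days since the Unix epoch (assumed unit).
function isEligible(locale, profileCreationDay, nowMs = Date.now()) {
  const currentDay = Math.round(nowMs / MS_PER_DAY);
  const profileAgeInDays = currentDay - profileCreationDay;
  const validProfileAge = profileAgeInDays >= 3 && profileAgeInDays <= 12;
  const validLocale = LOCALES.has(locale.toLowerCase());
  return validProfileAge && validLocale;
}

// The same profile evaluated 5 days and 13 days after creation.
const createdDay = 17400;
console.log(isEligible("en-US", createdDay, (createdDay + 5) * MS_PER_DAY));  // true
console.log(isEligible("en-US", createdDay, (createdDay + 13) * MS_PER_DAY)); // false
```

The second call shows how one profile can pass the check on an early enrollment attempt and fail it on a later one, purely because it has aged past day 12.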
Comment 25•8 years ago
Clients in the table are distinct and unless a profile has been duplicated across machines no more than 1 entry ping is expected per distinct client - correct? I verify there's no one with enter=1, ineligible=1, and installed=1.
The issue of multiple entry pings (or multiple pings more generally) isn't limited to TAAR but it could be more noticeable there because it's the only study running that has rigorous eligibility criteria - that could be leading to the abundance of these particular invalid client states.
Is profile duplication across machines the only possible cause? If so can that be verified in any way?
Comment 26•8 years ago
I don't want to say it is the only *possible* cause, but it is the most likely explanation I've heard of. It is also a long-running issue with Shield data that we've never been able to adequately verify or reject. Most of our telemetry relies on having a unique ID, so when that constraint no longer holds, we get weird situations like this one.
Looking at the other parts of the telemetry ping may be useful in confirming this idea. For example, if the contents of environment.system change between pings, that would seem to be a pretty solid indication that two machines are involved.
If it isn't possible to untangle the two sets of pings, it may be valuable to treat clients with multiple entry events as invalid, and discard all of them. They are, at least by some definition, weird and outlying clients.
Comment 27•8 years ago
Looked deeper into some of these clients' pings, narrowing it down to study_state, creationDate, and gfx/hdd information. It doesn't look like the hardware is different between separate enter pings. Rather than separate machines with the same clientId, it appears these clients are entering the study a second time after making their first exit. Not sure if this would be shield-general or study-specific...
One, for example, has 'hdd': u'VID:45DF4032' throughout with the following state transitions:
'study_state': u'enter',
'timestamp': u'2017-10-03T11:10:17.300Z'
'study_state': u'installed',
'timestamp': u'2017-10-03T11:10:17.318Z'
'study_state': u'user-disable',
'timestamp': u'2017-10-05T11:12:13.337Z'
'study_state': u'exit',
'timestamp': u'2017-10-05T11:12:13.347Z'
'study_state': u'enter',
'timestamp': u'2017-10-06T11:12:14.782Z'
'study_state': u'ineligible',
'timestamp': u'2017-10-06T11:12:14.795Z'
'study_state': u'exit',
'timestamp': u'2017-10-06T11:12:14.804Z'
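The two competing explanations (duplicated profile on different hardware vs. re-entry after exit) can be told apart per client with a small summary like the one below. This is only a sketch over the example pings above: `summarizeClient` and the flattened ping shape are hypothetical, and `hdd` is copied onto every ping because the value was the same throughout.

```javascript
// Example pings from above, reduced to the fields of interest.
const HDD = "VID:45DF4032";
const pings = [
  { study_state: "enter", timestamp: "2017-10-03T11:10:17.300Z", hdd: HDD },
  { study_state: "installed", timestamp: "2017-10-03T11:10:17.318Z", hdd: HDD },
  { study_state: "user-disable", timestamp: "2017-10-05T11:12:13.337Z", hdd: HDD },
  { study_state: "exit", timestamp: "2017-10-05T11:12:13.347Z", hdd: HDD },
  { study_state: "enter", timestamp: "2017-10-06T11:12:14.782Z", hdd: HDD },
  { study_state: "ineligible", timestamp: "2017-10-06T11:12:14.795Z", hdd: HDD },
  { study_state: "exit", timestamp: "2017-10-06T11:12:14.804Z", hdd: HDD },
];

// Hypothetical helper: summarize one client's ping history.
function summarizeClient(clientPings) {
  const ordered = [...clientPings].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
  );
  const states = ordered.map((ping) => ping.study_state);
  const firstExit = states.indexOf("exit");

  return {
    enterCount: states.filter((state) => state === "enter").length,
    // More than one distinct value would hint at separate machines.
    distinctHardware: new Set(ordered.map((ping) => ping.hdd)).size,
    // True when an "enter" follows an earlier "exit".
    reenteredAfterExit:
      firstExit !== -1 && states.indexOf("enter", firstExit + 1) !== -1,
    // The invalid state discussed above: both installed and ineligible.
    installedAndIneligible:
      states.includes("installed") && states.includes("ineligible"),
  };
}

const summary = summarizeClient(pings);
console.log(summary);
```

For this client the summary reports two enter pings, a single hardware value, and a re-entry after exit, which matches the reading that the same machine enrolled twice.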
Comment 28•8 years ago
(In reply to Josh Gaunt [:jgaunt] from comment #25)
> Clients in the table are distinct and unless a profile has been duplicated
> across machines no more than 1 entry ping is expected per distinct client -
> correct? I verify there's no one with enter=1, ineligible=1, and installed=1.
Please note that profiles can also be duplicated locally: some users might have multiple copies of the same profile running on different/same version of Firefox.
Comment 29•7 years ago
Hey Ben. Now that this study has ended can you please recap the outcome here and close the bug? A couple sentences is fine.
Flags: needinfo?(bmiroglio)
Comment 30•7 years ago
TAAR study ran successfully between 10/10/2017 and 26/10/2017 including clients from 34 distinct locales.
1264551 clients were enrolled successfully and met all (analysis) inclusion criteria after filtering.
43543 (~3.5%) interacted with the TAAR service during the study period.
2654 unique add-ons were installed by the participants on study day 1.
Clients who were prompted to go to about:addons (pop-up group)
were more likely to install add-on(s); effect size = +17.2%
and were more likely to install more add-ons; effect size = +19.7%*
Clients receiving personalized add-on recommendations were more likely to install a larger number of add-ons throughout the study duration; effect size = +1.4%
Anecdotal effects were observed suggesting a polarization in users' interaction between about:addons and AMO, as well as group differences between en-US localized clients and non-en-US clients.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(bmiroglio)
Resolution: --- → FIXED
Attachment #8915058 -
Attachment is obsolete: true
Attachment #8915202 -
Attachment is obsolete: true