Closed Bug 1400900 Opened 8 years ago Closed 7 years ago

[Shield] Opt-out Study: TAAR Experiment

Categories

(Shield :: Shield Study, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bmiroglio, Assigned: bmiroglio, NeedInfo)

Details

Attachments

(12 obsolete files)

(waiting for opt-out functionality within shield)

This experiment will test the efficacy of the Telemetry-Aware Add-on Recommender (TAAR), which uses a combination of machine learning techniques to recommend add-ons to a given client. The inputs to TAAR include a client's currently installed add-ons, locale, country, and other metadata available via telemetry. TAAR then outputs a list of N WebExtensions to recommend to the user, which are then rendered on the discovery pane. We want to know if TAAR's recommendations influence users to install more add-ons.

More details here: https://docs.google.com/document/d/1db0h4F-qqP0T5RPXWDxLrD-xxpHxRak64ZJrhVo9kD0/edit?ts=59bb0b56

We intend to ship to the release channel only, so we'll need official QA on our add-on. Should I file a separate bug for QA?
Flags: needinfo?(glind)
Assignee: nobody → bmiroglio
Attached file TAAR Experiment XPI (obsolete)
Flags: needinfo?(mkelly)
Attached file TAAR Experiment XPI (obsolete)
remove testing flag in config
Attachment #8909393 - Attachment is obsolete: true
Attached file TAAR Experiment XPI (obsolete)
(In reply to Ben Miroglio [:bmiroglio] from comment #0)
> We intend to ship to the release channel only, so we'll need official QA on
> our add-on. Should I file a separate bug for QA?

You'll need to submit a request to Product Integrity: https://mana.mozilla.org/wiki/display/PI/PI+Request
Flags: needinfo?(mkelly)
Whoops, also, ideally they'd sign off on the testing in this bug, but an email to a list that you can link to from here is fine.
I submitted a PI request shortly after filing this bug and it's being tracked. :)
sguha: can you give your OK on the data payload? Nothing is category 3 :). I've linked you in the Google doc.
Flags: needinfo?(sguha)
As long as the data TAAR sends back is okay, then the contents of the ping described in the linked Google doc are definitely < category 3.
Flags: needinfo?(sguha)
Attached file TAAR Experiment XPI--Fix (obsolete)
Fixing issue that came up during testing--now forces about:addons to display the discover tab when navigating through the popup.
Attachment #8909424 - Attachment is obsolete: true
Attachment #8910003 - Attachment is obsolete: true
Attached file TAAR Experiment XPI--Fix2 (obsolete)
[Fix from QA] Change eligibility criteria to exclude profiles that are younger than 3 days. This allows time for HBase to populate and serve recommendations. The new profile age criterion is 3 <= pcd <= 12 (profile age in days).
Attachment #8912822 - Attachment is obsolete: true
Attached file addon.xpi (obsolete)
Minor change in the way that data is reported to make post-analysis cleaner. In the payloads from our testers, we see fields that are "null" where "false" is implied. Setting null values to false for clarity. Does not affect any UI/functionality.
Attachment #8913741 - Attachment is obsolete: true
Sounds good, please go ahead since you have QA signoff and 56 has been released. We plan to re-enable updates at 100% on the release channel tomorrow (so your data may show a bump starting on Tuesday if you launch this today)
Attached file taarexp-2-signed.xpi (obsolete)
I have signed the add-on from comment 11, and uploaded it to Shield as "TAAR Experiment v2".
Attached file TAAR Experiment XPI--Fix4 (obsolete)
Making another small change to more accurately report interactions with the popup. We are currently under-reporting popup events. *This does not affect any TAAR evaluation or any functionality/UI*

mythmon: can you sign and redeploy this new version?
Attachment #8913962 - Attachment is obsolete: true
Attachment #8914482 - Attachment is obsolete: true
Flags: needinfo?(mcooper)
Ben, that XPI file has the same version number in install.rdf as the previous XPI. We need the version number or the extension id to change between versions. Can you update that? It would also be helpful to name the file something like "taarexp-2.1.xpi". That is, something that includes the name and version of the extension.
Flags: needinfo?(mcooper) → needinfo?(bmiroglio)
Attached file taarexp-2.1.xpi (obsolete)
Resubmit XPI with version incremented.
Attachment #8914697 - Attachment is obsolete: true
Flags: needinfo?(bmiroglio)
Please see Comment 16
Flags: needinfo?(mcooper)
Attached file taarexp-2.1.0-signed.xpi (obsolete)
I've signed taarexp-2.1.0.xpi and uploaded it to Shield as "TAAR Experiment v2.1.0".
Flags: needinfo?(mcooper)
Attached file taarexp-2.2.0.xpi (obsolete)
Fix to hopefully mitigate the bug [1] reported by a small group of users.

The popup is triggered after 3 successful URI loads, the count of which is stored in browser.storage.local. If users are seeing the popup, it makes me think that the add-on is able to correctly count URI loads; otherwise the popup would never be shown. In past versions, once the triggerPopup() method is called, the add-on stores a boolean, `sawPopup`, indicating the client saw the popup, and so before showing the popup the add-on ensures `sawPopup`=false in local storage. My guess is that this logic isn't working for these users, so I added a check that total URI loads must be exactly 3 for triggerPopup() to be called. After the 4th URI load, the pageAction is removed and so is the webNavigation listener. Functionally this replicates the intended behavior; it just adds an additional condition for completeness.

[1] https://www.reddit.com/r/firefox/comments/742gg7/is_there_any_way_to_turn_off_the_customize/
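
For readers following the logic, here is a rough Python simulation of the guard described in this comment. It is not the add-on's actual JavaScript: the class, the `uriLoads` key, and the `listening` flag are hypothetical names chosen for illustration; only `sawPopup`, the "exactly 3 loads" check, and the removal after the 4th load come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class PopupTrigger:
    """Toy stand-in for the add-on's popup logic (hypothetical names)."""

    # Stands in for the add-on's local storage; "uriLoads" is an assumed key.
    storage: dict = field(
        default_factory=lambda: {"uriLoads": 0, "sawPopup": False}
    )
    listening: bool = True  # stands in for the webNavigation listener
    popups_shown: int = 0

    def on_uri_load(self) -> None:
        if not self.listening:
            return
        self.storage["uriLoads"] += 1
        loads = self.storage["uriLoads"]
        # Older guard: sawPopup must be false.
        # Guard added in 2.2.0: the load count must be exactly 3.
        if loads == 3 and not self.storage["sawPopup"]:
            self.trigger_popup()
        elif loads >= 4:
            # After the 4th load the listener (and pageAction) go away.
            self.listening = False

    def trigger_popup(self) -> None:
        self.storage["sawPopup"] = True
        self.popups_shown += 1


# Simulate the suspected failure: the sawPopup flag never persists.
trigger = PopupTrigger()
for _ in range(10):
    trigger.on_uri_load()
    trigger.storage["sawPopup"] = False  # flag is lost after every load
```

Even with the flag lost on every load, the exact-count check and the listener removal keep the popup from firing more than once in this simulation.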
Attachment #8914840 - Attachment is obsolete: true
Attachment #8914958 - Attachment is obsolete: true
Flags: needinfo?(mcooper)
Attached file taarexp-2.2.0-signed.xpi (obsolete)
Here is a signed version of "taarexp-2.2.0.xpi". I've uploaded this to Shield as "TAAR Experiment v2.2.0".
Flags: needinfo?(mcooper)
Reporting on some strange data I've detected coming from this study. Overall (out of all shield studies running currently) > 98% of data conform to valid states. Of the < 2% that are not valid, over 1% come from TAAR. Specifically, these data indicate the invalid state of a client being BOTH ineligible and installed, which should be mutually exclusive.

I posit two possibilities:
1. There's an underlying bug in the TAAR code specifically.
2. TAAR is the only study running to enforce eligibility criteria that render a large number of clients ineligible to install the study and this fact is overlapping with a shield-general, study-agnostic issue where otherwise mutually exclusive pings are sent.

Not sure how to approach this from here, but ideally we'll determine if it's systematic or predictably random error.

All data available via Spark:

df = sqlContext.read.parquet("s3://telemetry-private-analysis-2/jgaunt/shield-clients-parquet")

Selecting where ineligible > 0 and installed > 0 reveals the strange records.
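
As a small illustration of the selection described above, the sketch below applies the same condition to plain Python dictionaries instead of the Spark table. The rows are made up, and the `client_id` and `enter` column names are assumptions; only the `ineligible` and `installed` counts are named in this comment.

```python
def invalid_clients(rows):
    """Return ids of clients counted as both ineligible and installed."""
    return [
        row["client_id"]
        for row in rows
        if row["ineligible"] > 0 and row["installed"] > 0
    ]


# Made-up per-client ping counts, one row per distinct client.
rows = [
    {"client_id": "a", "enter": 1, "ineligible": 0, "installed": 1},
    {"client_id": "b", "enter": 2, "ineligible": 1, "installed": 1},
    {"client_id": "c", "enter": 1, "ineligible": 1, "installed": 0},
]

flagged = invalid_clients(rows)
```

On the Spark DataFrame itself, the same condition could presumably be written as `df.filter("ineligible > 0 AND installed > 0")`, assuming those are the column names in the parquet table.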
> 2. TAAR is the only study running to enforce eligibility criteria that
> render a large number of clients ineligible to install the study and this
> fact is overlapping with a shield-general, study-agnostic issue where
> otherwise mutually exclusive pings are sent

I suspect this is the reason, per the eligibility criteria embedded into the shield add-on [1].

[1] https://github.com/benmiroglio/taar-experiment/blob/prod/shield-integrated-addon/addons/taar-study/addon/Config.jsm#L90
Can that explain why all of the problematic clients have >1 entry ping as well? Could these criteria be false in one case and true in another within the same client?
> Can that explain why all of the problematic clients have >1 entry ping as well?

Since the "client" would really be two or more clients, in divergent profiles (possibly different computers), >1 entry ping would be expected. I would be very curious if there are clients with exactly one entry ping that still ended up in the invalid state. I don't think that situation is possible under my explanation.

> Could these criteria be false in one case and true in another within the same client?

Since the eligibility criteria rely on profile creation date, this could be explained by one of the divergent profiles being used (qualifying because it is young enough), and then the second being used some time later, after the profile is too old. This situation is less likely, but because profile age and ping time are both available in telemetry, it could be verified or falsified: it is possible to reverse engineer *why* the profile was marked as ineligible.

For reference, here is the eligibility function from the most recent version of the add-on:

    const locale = TelemetryEnvironment.currentEnvironment.settings.locale.toLowerCase();
    const proflileCreationDate = TelemetryEnvironment.currentEnvironment.profile.creationDate;
    const currentDay = Math.round(Date.now() / 60 / 60 / 24 / 1000)
    const profileAgeInDays = currentDay - proflileCreationDate
    const validProfileAge = profileAgeInDays >= 3 && profileAgeInDays <= 12
    const validLocale = locales.has(locale)
    return validProfileAge && validLocale

Locales contains: ar, bg, cs, da, de, el, en-gb, en-us, es-ar, es-es, es-la, fi, fr, fr-ca, he, hu, id, it, ja, ko, ms, nl, no, pl, pt, pt-br, ro, ru, sk, sr, sv, th, tl, tr, uk, vi, zh-tw.
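
To make the "eligible once, ineligible later" scenario concrete, here is a hedged Python port of the function quoted above. It is a sketch for offline checking, not the add-on's code: the function and parameter names are invented, and it assumes the profile creation date is expressed in days since the Unix epoch, as the subtraction in the quoted code implies.

```python
import math
import time

# Locale list copied from the comment above.
LOCALES = frozenset(
    "ar bg cs da de el en-gb en-us es-ar es-es es-la fi fr fr-ca he hu id it "
    "ja ko ms nl no pl pt pt-br ro ru sk sr sv th tl tr uk vi zh-tw".split()
)

MS_PER_DAY = 24 * 60 * 60 * 1000


def profile_age_in_days(profile_creation_day, now_ms):
    """Profile age in days, given creation day (days since epoch) and time in ms."""
    # floor(x + 0.5) mirrors JavaScript's Math.round for positive values.
    current_day = math.floor(now_ms / MS_PER_DAY + 0.5)
    return current_day - profile_creation_day


def is_eligible(locale, profile_creation_day, now_ms=None):
    """Hypothetical port: valid profile age (3 to 12 days) and valid locale."""
    if now_ms is None:
        now_ms = time.time() * 1000
    age = profile_age_in_days(profile_creation_day, now_ms)
    return 3 <= age <= 12 and locale.lower() in LOCALES


# A profile that is 12 days old today and 13 days old tomorrow.
today_ms = 17450 * MS_PER_DAY
eligible_today = is_eligible("en-US", 17450 - 12, today_ms)
eligible_tomorrow = is_eligible("en-US", 17450 - 12, today_ms + MS_PER_DAY)
```

Feeding a client's recorded profile creation day and each ping's timestamp through a check like this is one way to test whether an `ineligible` ping is explained by profile age alone.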
Clients in the table are distinct and unless a profile has been duplicated across machines no more than 1 entry ping is expected per distinct client - correct? I verify there's no one with enter=1, ineligible=1, and installed=1. The issue of multiple entry pings (or multiple pings more generally) isn't limited to TAAR but it could be more noticeable there because it's the only study running that has rigorous eligibility criteria - that could be leading to the abundance of these particular invalid client states. Is profile duplication across machines the only possible cause? If so can that be verified in any way?
I don't want to say it is the only *possible* cause, but it is the most likely explanation I've heard of. It is also a long-running issue with Shield data that we've never been able to adequately verify or reject. Most of our telemetry relies on having a unique ID, so when that constraint no longer holds, we get weird situations like this one.

Looking at the other parts of the telemetry ping may be useful in confirming this idea. For example, if the contents of environment.system change between pings, that would seem like a pretty solid indication that two machines are involved.

If it isn't possible to untangle the two sets of pings, it may be valuable to treat clients with multiple entry events as invalid, and discard all of them. They are, at least by some definition, weird and outlying clients.
Looked deeper into some of these clients' pings, narrowing it down to study_state, creationDate, and gfx/hdd information. It doesn't look like the hardware is different between separate enter pings. Rather than separate machines with the same clientId, it appears these clients are entering the study a second time after making their first exit. Not sure if this would be shield-general or study-specific...

One, for example, has 'hdd': u'VID:45DF4032' throughout with the following state transitions:

'study_state': u'enter', 'timestamp': u'2017-10-03T11:10:17.300Z'
'study_state': u'installed', 'timestamp': u'2017-10-03T11:10:17.318Z'
'study_state': u'user-disable', 'timestamp': u'2017-10-05T11:12:13.337Z'
'study_state': u'exit', 'timestamp': u'2017-10-05T11:12:13.347Z'
'study_state': u'enter', 'timestamp': u'2017-10-06T11:12:14.782Z'
'study_state': u'ineligible', 'timestamp': u'2017-10-06T11:12:14.795Z'
'study_state': u'exit', 'timestamp': u'2017-10-06T11:12:14.804Z'
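
A pattern like the one above could be flagged automatically. The following is a minimal sketch, assuming each ping is available as a dictionary with the `study_state` and `timestamp` fields shown in the example; the helper name is made up.

```python
def reentered_after_exit(pings):
    """True if an 'enter' ping occurs after an earlier 'exit' ping."""
    # ISO 8601 UTC timestamps in one format sort correctly as strings.
    ordered = sorted(pings, key=lambda ping: ping["timestamp"])
    seen_exit = False
    for ping in ordered:
        if ping["study_state"] == "exit":
            seen_exit = True
        elif ping["study_state"] == "enter" and seen_exit:
            return True
    return False


# The state transitions quoted in the comment above.
example = [
    {"study_state": "enter", "timestamp": "2017-10-03T11:10:17.300Z"},
    {"study_state": "installed", "timestamp": "2017-10-03T11:10:17.318Z"},
    {"study_state": "user-disable", "timestamp": "2017-10-05T11:12:13.337Z"},
    {"study_state": "exit", "timestamp": "2017-10-05T11:12:13.347Z"},
    {"study_state": "enter", "timestamp": "2017-10-06T11:12:14.782Z"},
    {"study_state": "ineligible", "timestamp": "2017-10-06T11:12:14.795Z"},
    {"study_state": "exit", "timestamp": "2017-10-06T11:12:14.804Z"},
]

example_reentered = reentered_after_exit(example)
```

Counting how many of the invalid clients satisfy this check would indicate how much of the anomaly is re-enrollment rather than duplicated profiles.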
(In reply to Josh Gaunt [:jgaunt] from comment #25)
> Clients in the table are distinct and unless a profile has been duplicated
> across machines no more than 1 entry ping is expected per distinct client -
> correct? I verify there's no one with enter=1, ineligible=1, and installed=1.

Please note that profiles can also be duplicated locally: some users might have multiple copies of the same profile running on different/same version of Firefox.
Hey Ben. Now that this study has ended can you please recap the outcome here and close the bug? A couple sentences is fine.
Flags: needinfo?(bmiroglio)
The TAAR study ran successfully between 10/10/2017 and 26/10/2017, including clients from 34 distinct locales.

1264551 clients were enrolled successfully and met all (analysis) inclusion criteria after filtering. 43543 (~3.5%) interacted with the TAAR service during the study period. 2654 unique add-ons were installed by the participants on study day 1.

Clients who were prompted to go to about:addons (pop-up group) were more likely to install add-on(s) (effect size = +17.2%) and were more likely to install more add-ons (effect size = +19.7%*).

Clients receiving personalized add-on recommendations were more likely to install a larger number of add-ons throughout the study duration (effect size = +1.4%).

Anecdotal effects were observed suggesting a polarization in users' interaction between about:addons and AMO, as well as group differences between en-US localized clients and non-en-US clients.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(bmiroglio)
Resolution: --- → FIXED
Attachment #8915058 - Attachment is obsolete: true
Attachment #8915202 - Attachment is obsolete: true