Closed Bug 1294055 Opened 9 years ago Closed 7 years ago

[SHIELD] Study Validation Review for Context Graph Recommendation Engine Experiment

Categories

(Shield :: Shield Study, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: telliott, Assigned: glind)

References

Details

User Story

We want to run a set of experiments to see what sort of data insights we can get from aggregate user browsing habits. This is intended as a feasibility study for a larger-scale recommendation engine, and data collected here will help with the future design process.

This addon will be invisible to the user for the duration of the SHIELD study.
FAQ is currently at https://docs.google.com/document/d/1K-W4y-1eu1WR4CgK7fYXY1vEhrkvkgZhR7X0uCici9o/edit

Data Handling explanation is at https://docs.google.com/document/d/1RSAN6SvrQm0wIOpLTYRtW5KuJPSABS3akEY0s5tFL1c/edit

There are no branches in this study; it is simply a data collection effort that the user can opt into. We do not intend to test this addon internally.
User Story: (updated)
Assignee: nobody → telliott
Blocks: 1294053
No longer depends on: 1294053
Blocks: 1294064
No longer blocks: 1294064
Blocking issues:

1. I want to see some hypotheses or claims that collecting full tab/url history would help resolve. This could take many forms:
   - analysis or prediction code (to run)
   - a possible product direction you would take if you had full urls at a central location (I want to know they will be used)
   I *think* the implicit claim is that "page histories will have useful patterns for prediction". I would like to hear some candidate hunches, if that is true. Else, state whatever your claim is.

2. I am less keen on tossing the data so quickly after collection. Full url collection is a very expensive, invasive ask, and I don't want us to come back in 6 months for more because we ran out of analysis time.

3. Other solutions to improve the "value for users" piece might include:
   a. Pay users for their history (we have done this in the past) with actual money.
   b. Have some sort of 'cupcake' or proof of concept in the addon (such as previous demos), or link to articles, or some other way of showing user benefit.
   c. Links to existing blogs, projects, and vision for Forward Button or collective filter, etc.
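[Editor's note: a minimal sketch of one concrete form the "prediction code" requested in item 1 could take. It is hypothetical, not a study deliverable: it tests the implicit claim that page histories have useful patterns for prediction by fitting a first-order Markov model over domains and measuring top-3 next-domain accuracy. All names and the toy visit log are illustrative assumptions.]

```python
# Hypothetical sketch: does a user's URL history predict the next site visited?
from collections import Counter, defaultdict
from urllib.parse import urlparse

def domains(visit_urls):
    """Reduce full URLs to their domains."""
    return [urlparse(u).netloc for u in visit_urls]

def fit_bigram(history):
    """Count domain -> next-domain transitions from one visit sequence."""
    model = defaultdict(Counter)
    for cur, nxt in zip(history, history[1:]):
        model[cur][nxt] += 1
    return model

def top3_accuracy(model, history):
    """Fraction of visits whose true next domain is among the 3 most likely."""
    hits = trials = 0
    for cur, nxt in zip(history, history[1:]):
        guesses = [d for d, _ in model[cur].most_common(3)]
        hits += nxt in guesses
        trials += 1
    return hits / trials if trials else 0.0

# Toy usage: train on the first 80% of a visit log, score on the rest.
visits = domains([
    "https://news.example/a", "https://mail.example/inbox",
    "https://news.example/b", "https://mail.example/inbox",
    "https://news.example/c", "https://mail.example/inbox",
    "https://news.example/d", "https://mail.example/inbox",
    "https://news.example/e", "https://mail.example/inbox",
])
split = int(len(visits) * 0.8)
model = fit_bigram(visits[:split])
print(top3_accuracy(model, visits[split:]))
```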
Assignee: telliott → glind
Flags: needinfo?(telliott)
1) So there are two parts to this. On the data side, I defer somewhat to Rebecca. We have some researchers lined up to do analysis on the data. For my part, I'm interested in what sites turn out to be sticky (multiple visits in a week; consistent visits) and in building a basic co-occurrence graph.

On the engineering side, I have several questions that are somewhat meta to this study:
* Is there any appetite from the public to contribute this data?
* What's the minimum amount of information we can collect and still generate reasonable signal? Can we produce meaningful data from a small cohort of users?
* Are username-containing URLs valuable?
* What can we do to anonymize user data even further when we build the real thing?

A possible product direction is in Nick's Medium post: https://medium.com/@osunick/context-graph-its-time-to-bring-context-back-to-the-web-a7542fe45cf3 - this experiment is the precursor to much of what he's talking about there.

2) I could be persuaded on this. My assumption is that because this is a precursor experiment, we'll have concluded whether or not proceeding is viable by the end of that period and will replace it with a full campaign. I expect us to come back for more in 6 months (though more officially)! What I'm concerned about, on behalf of the users, is the idea that I have this somewhat-sensitive data sitting around indefinitely. Sell me on that being OK.

3) a) Paying users would defeat at least one aspect of the study ("will users participate?"). If the response rate is a disaster but we decide we want to proceed anyway, then we can revisit this.

b/c) This is the first demo. The only article we have is the one linked to from the various materials (Nick's Medium post). Ultimately, we're at the very beginning of the process, which is why I've been emphasizing the short-term, experimental nature of this. It's the prototype we build so we can figure out how to build the real thing.
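[Editor's note: a minimal sketch of the two signals named in point 1, not the study's actual pipeline. "Sticky" is taken here to mean a domain seen on at least 3 distinct days in a week, and the co-occurrence graph links domains visited on the same day; both definitions, and the (day, domain) input format, are assumptions for illustration.]

```python
# Hypothetical sketch: stickiness and domain co-occurrence from a visit log.
from collections import defaultdict
from itertools import combinations

def sticky_domains(visits, min_days=3):
    """visits: iterable of (day_index, domain). Returns domains seen on
    at least min_days distinct days."""
    days_seen = defaultdict(set)
    for day, domain in visits:
        days_seen[domain].add(day)
    return {d for d, days in days_seen.items() if len(days) >= min_days}

def cooccurrence_graph(visits):
    """Edge weight = number of days two domains were visited together."""
    by_day = defaultdict(set)
    for day, domain in visits:
        by_day[day].add(domain)
    edges = defaultdict(int)
    for day_domains in by_day.values():
        for a, b in combinations(sorted(day_domains), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Toy usage over a made-up five-visit log spanning three days.
visits = [(0, "news.example"), (0, "mail.example"),
          (1, "news.example"), (2, "news.example"), (2, "mail.example")]
print(sticky_domains(visits))      # {'news.example'}
print(cooccurrence_graph(visits))  # {('mail.example', 'news.example'): 2}
```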
Flags: needinfo?(telliott)
On first blush I feel satisfied with these answers, and I appreciate the effort to make them explicit.

1. Typically, I want to see analysis code before launch for these reasons:
   - to prove that the probes actually answer the questions
   - to prove that analysts have skin in the game
   - to catch probe bugs
   I understand this project is more provisional and exploratory.

2. Not indefinite, just not 6 months. I think 1 year is a better horizon for these reasons:
   - it's going to take several sprints / iterations to even get the analysis code right.
   - I want to make sure these lemons are fully squeezed before we get more.
   - getting people to participate in this may mean we need to ask 200 users for 1 participant. That's a lot of annoyance.
   - claim: the risk of having 1 user's data sitting around for a year is the same as having 2 users' data for 6 months each, with less total annoyance. Reasoning: if a breach happened on Day X, there would still be 1 user's data in the system either way.

3. What is a participation rate that would cause anxiety? I predict the actual rate will be between 0.1% and 0.5% (1 in 1000 to 1 in 200).

## Other extra linting and notes:

1. Ideally, your consent and addon should link to some docs that explain these same research questions, and link to the articles, etc.
2. I am not your engineering manager, BUT... writing analyzable probes is hard, and the later the analysis code comes, the less likely it is to actually work.
3. Paying people (with money or features) is also about trying to get more representative samples. I would have no convincing argument that this sample's context graph is good or not. Bear that in mind for model training ideas before overfitting on this data!

Summary: Science Review: R+, conditional on
- more links from study materials to something publicly readable (github wiki? moz-wiki?) explaining the WHY of this.
- rweiss responding to "On the data side, I defer somewhat to Rebecca."
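[Editor's note: a throwaway sanity check of two numbers in the review above; the participation rates are the reviewer's predictions, not measurements, and the exposure equivalence is the reviewer's claim restated as arithmetic.]

```python
# A predicted opt-in rate of 0.1%-0.5% means 200-1000 invitations per participant.
for rate in (0.001, 0.005):
    print(f"rate {rate:.1%}: ~{1 / rate:.0f} users asked per participant")

# Retention-risk claim: one participant's data held for 12 months is the
# same total user-months of data at rest as two participants held 6 months
# each, while requiring half the recruitment annoyance.
print(1 * 12 == 2 * 6)  # True: equal user-months at rest
```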
Flags: needinfo?(rweiss)
On 2, I'll see if the lawyers are comfortable with a year. 6 months was chosen as it's a common standard around here, but if people are ok with a year, we can tweak that.

On 3, that's in line with my expectations. I'd worry at 1 in 5000.

I share some of your other concerns. Ultimately, our mandate is to get data so that experiments can be performed, and that always comes with higher levels of uncertainty and risk. Such is the nature of moonshots!

As well as Nick's post, we'll also have our blog post and FAQ to link to. I can set up a wiki with links to experiment documentation, but there won't be anything in there initially. Will that be sufficient?
I have set up https://wiki.mozilla.org/Context_Graph/Experiment_Summary as a central clearinghouse for experiments and results. Obviously it's empty for now, but it'll be a nice central place to collect links to data and projects as they come in.
I believe that any data change policy merely needs to be stated clearly and explicitly in the consent form. The standard of 6 months is a convention, not a legal requirement, but we should definitely run this by the lawyers to be certain.

Analysis code will be hard to generate up front, but there are a number of designs of the traditional social network analysis variety that we're interested in exploring, such as (but not limited to):
- Prediction of future high-engagement content given historical URL visit data, for the purpose of identifying content quality on the web.
- Similarity among our users for types of content, in order to provide better recommendations for personal preferences.
- Increased visibility into lesser-traveled (and yet highly engaging) sources of content on the web, for the purpose of maintaining a wider cast of potentially engaging, rich content that would otherwise be missed (as it would not be in a social media feed or on the first page of search engine results).

And others. We've mostly approached academic research partners for this initial period of study. There is some prior art to these approaches; I can provide references if needed.
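[Editor's note: a hedged illustration, not rweiss's actual design, of the second item in the list above: user similarity over visited content, here as cosine similarity between per-user domain visit-count vectors. The user names, domains, and counts are made up for the example.]

```python
# Hypothetical sketch: how alike are two users' browsing profiles?
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse visit-count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values()))
    norm *= math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

alice = Counter({"news.example": 12, "recipes.example": 3})
bob   = Counter({"news.example": 8, "games.example": 5})
print(f"similarity: {cosine(alice, bob):.2f}")  # ~0.82 on this toy data
```

In a design like this, users with high similarity would become candidate neighbors for recommending content one has seen and the other has not; anonymization and aggregation questions (raised earlier in this bug) would apply before anything like it ran on real data.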
Flags: needinfo?(rweiss)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED