Closed Bug 804231 Opened 13 years ago Closed 13 years ago

[stub] Privacy review for stub installer pings

Categories

(Privacy Graveyard :: General, task)

x86
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aphadke, Assigned: me)

References

Details

(Whiteboard: [stub+] [qa-])

As part of measuring the effectiveness of the stub installer, the following metrics need to be collected. These metrics are similar to the regular Apache logs that are collected when a user visits www.mozilla.org or about:home snippets.

<domain> == mozilla hosted web-server
<domain>/stub/status.php?start_download=1
<domain>/stub/status.php?finish_download=1
<domain>/stub/status.php?start_install=1
<domain>/stub/status.php?end_install=1
<domain>/stub/status.php?start_exe=1
Blocks: 802734
No longer blocks: 802734
Whiteboard: [stub+]
Blocks: 802734
The data points to be collected that have been discussed so far are:
* Result code for the installation - success or failure code
* Server url that the stub downloaded the install from
* Download amount - bytes
* Download duration - seconds
* Firefox launched code (success or failure code)
* Pre-existing Firefox profile - 1 or 0
* Install over an existing install - 1 or 0
* Install into default location - 1 or 0
* Firefox locale installed
* Windows version
* 32 or 64 bit OS

These will all be sent at the same time as the very last step before the installer exits.
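As an illustration only, the aggregated ping described in this comment could be assembled into a single request URL along the following lines. The `StubPing` class, its field names, and the example hosts are hypothetical; the bug does not specify a wire format, and only the `/stub/status.php` path comes from the description above.

```python
from dataclasses import dataclass, asdict
from urllib.parse import urlencode, urlsplit, parse_qs


@dataclass
class StubPing:
    """Hypothetical container for the data points listed in this comment."""
    result_code: int        # installation success or failure code
    download_server: str    # server URL the stub downloaded the install from
    download_bytes: int     # download amount in bytes
    download_seconds: int   # download duration in seconds
    launch_code: int        # Firefox launched success or failure code
    existing_profile: int   # 1 or 0
    existing_install: int   # 1 or 0
    default_location: int   # 1 or 0
    locale: str             # Firefox locale installed
    windows_version: str    # Windows version
    os_bits: int            # 32 or 64


def build_ping_url(base_url: str, ping: StubPing) -> str:
    """Return one URL carrying every field, to be requested once before exit."""
    return base_url + "?" + urlencode(asdict(ping))


# Example values are made up; the hosts use the reserved .invalid TLD.
example = StubPing(0, "https://download.example.invalid/", 20971520, 42,
                   0, 1, 0, 1, "en-US", "6.1", 64)
url = build_ping_url("https://stats.example.invalid/stub/status.php", example)

# Round-trip the query string to check that every field survived encoding.
fields = parse_qs(urlsplit(url).query)
```

Sending everything in one request matches the "one ping just before the installer exits" plan, at the cost of losing per-step timing.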
:rstrong - bit confused here, will these pings be sent as and when each event happens OR are we aggregating the pings together and sending it once just before the installer exits?
(In reply to aphadke from comment #2)
> :rstrong - bit confused here, will these pings be sent as and when each
> event happens OR are we aggregating the pings together and sending it once
> just before the installer exits?

Sending one ping with the data aggregated before the installer exits. Also, for version 1 we are not going to send it unless the download has started, since users with firewalls will be notified that the stub is trying to make an internet connection, which is just bad form. This last part can be revisited at a later date.
OS: Mac OS X → Windows 7
:rstrong - sending one aggregated ping will result in loss of data for users who aborted mid-way. This data won't help us understand at what point users drop off. For example: a user downloaded the Firefox exe via the stub but didn't click on install.
(In reply to aphadke from comment #4)
> :rstrong - sending one ping on aggregation will result in loss of data for
> users who aborted mid-way. This data won't help us understand at what point
> users drop off.
>
> for eg: user downloaded the Firefox exe via stub but didn't click on install.

aphadke, as I stated, we can revisit this later. Users that start the download will report back with a ping even if they abort during the download. I know you would like to also get pings for users that just launch and exit, but users that have firewalls will be asked to allow the stub to ping, which is just bad form as I said previously. Please don't let perfect be the enemy of good here, and we can revisit whether we report back launching without initiating an install at a later date.

Please also be aware that some users will deny access to the internet when the stub tries to connect through their firewall, so you will not get a 100% ping rate.
Please also note that even if we aggregate we don't need to send a ping for each step.
Tom, what is the status of this bug? Specifically, I need to know whether we need to provide any notification to the user that we are doing this ping in the stub with the current data set as stated in comment #1. If we do need to provide notification, can a subset of that data be sent without notification? I've talked with Sid, and his reasoning for not requiring notification is that the stub is already downloading software from us and this will be pinging us, so it is basically web analytics.
I disagree with Sid on this one. If we just want to do web analytics, we can do web analytics. However, if we want to collect the info described, we need to notify users.

I propose putting a checkbox on screen two of the installer (the screen with all the other checkboxes and whatnot), with the following text:

[ ] Help Mozilla by sending a summary of how this install went. [Learn more.]

[Learn more.] should look linky and produce a tooltip or similar with the following text:

It's really useful for Mozilla to know how your install went, so that we can keep improving the Firefox installer. Here's the info that we're asking for:
* Whether the install is successful, and we start Firefox afterwards (yay!).
* Where the download comes from, how big it is, and how long it takes.
* Whether you installed in the default location, over an existing copy of Firefox.
* Which operating system version you're using.

If we later change the data we're asking for, we'll need to change this text too.
Is there ever data we can collect without notifying the user, and if so, what data can we collect?
BTW: thanks for the UI suggestions and I will pass them along to UX (though it can't go on "the screen with all the other checkboxes and whatnot" since that screen is typically not seen by users) if we decide that gathering this data from stub installs is worth an additional UI interaction by the users.
Hi all, I'm going to take a look at the UX for this. I have a few questions since I'm not fully up to speed with the design request:

1. Tom: Why is it necessary to ask the user's permission to collect the data that Robert's listed? What are the potential privacy risks? (This just helps me understand how to convey the message fairly, and where to place it)

2. With Robert's suggestion of paring down the data we collect to:
* Whether the install is successful, and we start Firefox afterwards (yay!)
* How big the download is and how long it takes
Do we still need to ask for the user's permission to collect the data?

3. When do we need to ask permission to collect the data? At the end, right before we send the ping, or when the stub installer process starts?

4. What are the risks of sending this data by default while giving the user the ability to opt out? (how big is the privacy risk vs. how big is the usability annoyance of having to show yet another dialog that the user needs to pay attention to)
I erred too much on the side of brevity with my last comment, and I'm going to err too far on the side of length with this one, sorry.

* * *

I want to spell out the privacy/policy background here. Larissa, I think I've answered each of the questions you raise, but not in the order you ask them, so please follow up if I've missed a crucial factor.

I'm starting from the privacy principles, handily referenceable at <mozilla.org/privacy>, repeated here:

1. No surprises
2. Real choices
3. Sensible settings
4. Limited data
5. User control
6. Trusted third parties

I'll start with the two that I think we have totally under control: #6 & #4. There aren't any third parties involved, and the data points we're interested in are narrow, well-defined, and neatly-tailored to our purpose for collection.

On #3, I think we're in a good place here. This is a neat and narrow ping with little risk of fingerprinting, accidental re-identification, or suspicious secondary use possibilities. I think it's quite sensible (and in keeping with our thoughts on the FHR) to have it on by default. In fact, I've been thinking of this ping as being the FHR-esque component of the installer.

Issues
======

## Principle #1: No surprises

I do not think that this collection is obvious to users. I think that most users would be surprised if they were told that we were collecting this information in the installer. I think that they'd be peeved: not because this info is so intrusive (it's not), but because it would feel that we've been less-than up-front.

## Principle #5: User control

While this ping is not fundamentally problematic, there may be users who want it to not happen for them. Those users need a way to opt-out of collection before it occurs. That means that there needs to be an opt-out in advance, and that opt-out needs to be readily discoverable.

## Principle #2: Real choices

To allow users to make real choices, we have to give them the info they need to understand that choice.
In this case (I think), one sentence of purpose and four shortish bullets give a really comprehensive description of what's going on. In fact, I think it quite possible that many of the users whose ears normally perk up at the thought of data collection would read a description and make the informed decision that this is not a problem for them.

Minimal collection
==================

Any instrumentation for data collection should involve user consent, notice, or choice, depending on the details of the collection. It doesn't matter what the collection is; if we're collecting data for the purpose of collecting data, we need to be up-front, and provide an appropriate level of notice and choice.

However, this principle is tailored for deliberate data collection. Incidental data collection is a somewhat different matter. There are things that Firefox *needs* to do. It needs to check for updates and it needs to check the add-ons blocklist. The installer similarly needs to actually download Firefox.

If an application actually has to connect to an external server in order to provide a user-facing feature, there is no reason why we shouldn't count those events, and analyse them in a manner which cautiously respects privacy needs, by -- for instance -- not attempting to re-identify users. If we are just logging the receipt of connections which the client software needs to make, we do not need to provide advance notice and choice.

It should be possible to disable optional features like safebrowsing, or CRL checking. However, it would simply not be possible for the installer to operate without connecting to a server to download Firefox. It is reasonable for us to count those events, and to analyse our records of them.
By my analysis, we could use basic analytics on the server side to assess:

* Server url that the stub downloaded the install from
* Download amount - bytes
* Download duration - seconds
* Firefox locale installed
* Windows version
* 32 or 64 bit OS

But we would not be able to assess:

* Result code for the installation - success or failure code
* Firefox launched code (success or failure code)
* Pre-existing Firefox profile - 1 or 0
* Install over an existing install - 1 or 0
* Install into default location - 1 or 0

This does not mean that it's okay to have a no-notice, no-choice ping that sends back the first list. It means that I think we can use server-side analysis to count these things. If that approach meets our needs and we want to pursue it, that's great.

If the second list is sufficiently important, or the technical implementation of server-side analysis sufficiently infeasible that we need to specifically instrument the installer to send data which does not provide a direct user-facing feature, then we need some amount of notice and choice.

Implementation
==============

[Reference: I have been looking at <http://cl.ly/image/2N0X1J3z3l1U/o>]

It turns out I totally missed the "options" button, rats! I had thought that there was a page all users would see where they could make this choice. I don't think it's okay just to offer this notice and opt-out opportunity to savvy users who want to customise their install.

Because of my analysis of principle #4 [limited data], I think that it is fine to have this be an opt-out choice. Here are the things which I think are hard limits:

1. Every user should see a notice before the ping is sent. There's no privacy requirement about where it needs to be in the flow as long as it's before the ping.

2. When a user sees the notice, they should have an obvious path to say "no" right there.
There's no privacy requirement about whether that's a checkbox, a modal dialog, a switch, or some other UX paradigm, as long as there's a clear route to continue the installation without sending the ping.

3. When a user sees the notice, there should be an obvious and low-friction path to get more info. I don't have any strong privacy reason why it should be a tooltip, but I don't think that a link to a web-page works in the context where you're in the process of installing a new browser.

I thoroughly defer to Larissa about exactly how to present this, and where it belongs in the flow, but I encourage the use of the text I suggested in comment #8. I'd love to see a proposed screenshot or wireframe flow before closing this review.
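To make the server-side alternative from the "Minimal collection" part of this comment concrete, here is a rough sketch of counting successful download requests per locale from ordinary access logs. The log lines, the `/download` path, and the `lang` query parameter are invented for illustration and are not taken from any Mozilla server configuration; the client address is deliberately never read, in keeping with the point about not re-identifying users.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the request, status, and size fields of an Apache common log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def count_downloads_by_locale(lines):
    """Count successful GET requests per `lang` query parameter (hypothetical name)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["method"] != "GET" or match["status"] != "200":
            continue  # skip unparseable lines and failed or non-GET requests
        query = parse_qs(urlsplit(match["target"]).query)
        for locale in query.get("lang", []):
            counts[locale] += 1
    return counts


# Made-up sample lines using documentation-reserved IP addresses.
sample_log = [
    '192.0.2.1 - - [01/Nov/2012:10:00:00 +0000] "GET /download?os=win&lang=en-US HTTP/1.1" 200 20971520',
    '192.0.2.2 - - [01/Nov/2012:10:00:05 +0000] "GET /download?os=win&lang=de HTTP/1.1" 200 20971520',
    '192.0.2.3 - - [01/Nov/2012:10:00:09 +0000] "GET /download?os=win&lang=en-US HTTP/1.1" 404 0',
]
locale_counts = count_downloads_by_locale(sample_log)
```

A count like this covers only the first list (what the server can observe); the install result and launch codes in the second list never reach the server without an explicit ping.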
Thanks for the explanation, Tom, and for clearly stating what the privacy requirements are from the design (and where we can be flexible). We're now trying to figure out where in the stub installer process this can go. In the meantime (for everyone), my naive question is: "Do we need this data badly enough that we're willing to add a bit of intrusive UI?" I don't have great ideas where to place this message yet, and I want to make sure we're not adding UI for something that doesn't give us a good return in terms of how we can improve the product.
Since we need to provide notification I think it would be best to tie this into our funnelcake runs. This way the ping would only happen and the UI would only be shown when we are specifically gathering data.
In case it isn't clear, this way the UX that is decided upon would only be displayed to a small number of users.
More questions (sorry that I cause trouble and thinking...): What if we only ask to send information if the stub installer fails? That way we can incorporate it as part of the process for trying a different way to install Firefox, and not interfere with the happy new user with the successful installation? (Maybe we can get any metrics we lose from not asking the successful installs purely via server side analytics)
(In reply to Robert Strong [:rstrong] (vac 10/26-11/4) (do not email) from comment #15)
> In case it isn't clear, this way the UX that is decided upon would only be
> displayed to a small number of users.

ok, that's not too bad, even if we have to go with a slightly less-great UX
Let's take that over to bug 805575.
Ok. I'm going to ask a very high level question after reading the very thoughtful and detailed comments on this bug. I simply need a yes/no clarification, as I don't see my question answered.

First, I understand the entire privacy principles and I think they are great and great guidelines. Underlying these principles is a "user centric" and "user identifiable" approach to the problem.

However, as I view this stub installer question, I'm still unclear: are we dealing with *any* specific user identifiable information with the stub installer? Yes/No? And if no, then are we conflating this issue with our long detailed privacy principles when we don't apply these same user protection privacy principles in our other mozilla properties (blocklist, other pings)?

In other words, are we all aligned on the underlying assumption of this conversation that we have the ability to uniquely identify our users with this code/process? 'Cause I'm not clear that we have that ability from what I've heard elsewhere. Or to put it another way, "who" are we protecting?
@jcook

Re: Are we dealing with *any* specific user identifiable information with stub installer?
A: No, the collected information is raw apache logs, which is no different than what we get today when a user visits www.mozilla.org or any moz web property. Refer to http://htmlpad.org/metrics-mango-brownbag-2011-08-09/#5 for how we handle log data.

Re: conflating the issue
A: Redirecting to @StrangeCharm and @Sid on this question.
(In reply to Jim Cook from comment #19)
> First, I understand the entire privacy principles and I think they are great
> and great guidelines. Underlying these principles is a "user centric" and
> "user identifiable" approach to the problem

The principles are a good starting point and help us identify privacy-related risks to our users; they get us to have these discussions and think through what could happen if we don't follow them to the T. In this case, you probably see that following the principles as law might not be the best experience for our users. I'm thinking you want answers to these two questions:

1. If we ship the ping with no UI, what's the risk; what bad stuff will happen to our users, our brand, or the web if we do it (and how likely is the risk realized).
2. How does this weigh against what our users want; what value do they get if we take the risk (versus not)

Given these data, we can weigh whether or not the risk is worth it (which direction is better for our users and their trust of us).

> Or to put it another way, "who" are we protecting?

This is a tangent, but important to note: it's not necessary to identify individuals to protect them. What I think Tom is asking for is a protection of trust (not a protection of confidentiality); don't surprise people, and they'll trust us more.
Status: NEW → ASSIGNED
Final conclusion as noted in #805575 (check there for strings):

The decision checkbox is on the options page, and pre-checked.
There is a notice (and learn more link) on the download page, right under the download button.

Resolving this complete unless other issues crop up.
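Read together with the version 1 rule from comment #3, the resolution amounts to a two-condition gate on the ping. The sketch below is only an interpretation: the function and parameter names are hypothetical, and it assumes that the "no ping unless the download has started" rule still applies alongside the pre-checked checkbox, which this bug does not state explicitly.

```python
def should_send_ping(download_started: bool, checkbox_checked: bool = True) -> bool:
    """Decide whether the stub sends its single ping before exiting.

    checkbox_checked defaults to True to mirror the pre-checked (opt-out)
    checkbox on the options page. download_started reflects the version 1
    rule of staying silent until a download has actually begun.
    """
    return download_started and checkbox_checked


# A user who starts a download and never opens the options page.
default_user = should_send_ping(download_started=True)

# A user who unchecks the box on the options page.
opted_out_user = should_send_ping(download_started=True, checkbox_checked=False)

# A user who launches the stub and exits before any download starts.
launch_only_user = should_send_ping(download_started=False)
```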
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [stub+] → [stub+] [qa-]
(In reply to Tom Lowenthal [:StrangeCharm] from comment #22)
> Final conclusion as noted in #805575 (check there for strings):
>
> The decision checkbox is on the options page, and pre-checked.
> There is a notice (and learn more link) on the download page, right under
> the download button.

Just out of curiosity, which download page is this on? We have at least two dedicated desktop download pages (/fx and /new) and other download buttons scattered across our sites (including on mozilla.org). Does the notice need to accompany all of them?
Yes, the notice should be visible next to any button which links to a Firefox download. The purpose is to provide notice before someone downloads Firefox. If it's only in some places, it doesn't meet that need.