Closed Bug 1515172 Opened 11 months ago Closed 4 months ago

accept non-utm params in attributions

Categories

(Firefox :: General, enhancement, P1)

66 Branch
enhancement

Tracking

()

RESOLVED FIXED
Firefox 70
Tracking Status
firefox70 --- fixed

People

(Reporter: hoosteeno, Assigned: mixedpuppy)

References

(Depends on 2 open bugs)

Details

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #1515153 +++

In Bug #1515153 comment 4, :mixedpuppy offered to add some fields to the attribution data that accompanies a download of Firefox. 

We should add two new fields to this data: One can hold an experiment identifier, and one can hold a variation identifier.

Imagine a simple, contrived experiment: We have reason to believe that people who click on a blue download button are more likely to be Firefox users after 3 weeks than people who click on a green download button.

We can easily segment our downloaders between two variations of download button. But we don't want to directly modify (aka overload) the existing attribution parameters, because they contain valuable data we don't want to clobber, and also because doing so can confound various metrics[0].

If we can add experiment and variation parameters to a download link that will appear in attribution, we can easily monitor the segments over time to learn if it is true that the blue button is better for retention.

Example:
www.mozilla.org/firefox/download/thanks/?eid=button-test-201812&v=blue

[0] see bug 1511104 comment 10
No longer blocks: 1468680, 1511104, 1344771
No longer depends on: 1515153
See Also: → 1515153

:mixedpuppy, are you still up for helping to make this happen?

A lot of teams hope to run experiments that affect install rates. It's impossible to measure install rates without stub attribution, but stub attribution depends on overloading utm parameters, which are there to attribute some marketing further up the funnel (such as an ad). In other words, we can't run websites experiments on install rates without clobbering attribution for someone else.

If we had the option to add an additional parameter to stub attribution, this problem would go away.

Flags: needinfo?(mixedpuppy)

+1 We really need an experiment identifier and a variation identifier passed through stub attribution, for us to optimize the funnel in a meaningful way. Let me know how I can help.

I'm actually just looking at the code for other reasons, but really anyone can add the additional fields[1] easily enough. Also add to the tests (also under browser/components/attribution/test).

[1] https://searchfox.org/mozilla-central/rev/6e3cc153566f5f288ae768a2172385b8436d61dd/browser/components/attribution/AttributionCode.jsm#17-22

Flags: needinfo?(mixedpuppy)

Attaching a data steward review form.

This is a small extension to a feature that has had extensive review: https://bugzilla.mozilla.org/show_bug.cgi?id=1259607

Attachment #9045483 - Flags: review?(chutten)
Comment on attachment 9045483 [details]
experiment_attribution_data_request.md

I have a few questions before the Data Review.

1) By what means will this be reported? It's unclear whether this will be reported using the stub installer, using Firefox Telemetry, or using some other means. That affects opt-out somewhat and informs what other information it is being collected with.

2) The addition of identifiers requires a little extra scrutiny. Is there a minimum cohort size? Which datasets will have these identifiers/Will this introduce any new linkages between data sources?

3) The first question I need to answer in the Data Review Response is where this will be documented. (this also ties into #1) Can you point me at documentation that users could use to find out about this collection?
Flags: needinfo?(hoosteeno)
Attachment #9045483 - Flags: review?(chutten)
  1. By what means will this be reported?

This will be reported using the stub installer, using Firefox Telemetry.

  1. The addition of identifiers requires a little extra scrutiny. Is there a
    minimum cohort size? Which datasets will have these identifiers/Will this
    introduce any new linkages between data sources?

To be clear, there is nothing preventing anyone on the internet from putting any valid string[1] into a link to our download page, and thereby injecting a valid string into our attribution data in telemetry. We do not seek to change the validation criteria currently applied to non-source parameters; we'd want the same validation applied to these strings, too.

We have not designed a minimum cohort size, but generally achieving any kind of statistical confidence in an experiment requires thousands or tens of thousands of participants per variation.

An example of both of the above is found in a currently running experiment, "How To Install"[2], where we are adding utm parameters (only to sessions that lack them) to track experimental branches. Each branch must get more than 16,000 visits to achieve the confidence level we seek.

The data go into 'environment.settings', I believe. They expire after 13 weeks, I believe. More info in https://bugzilla.mozilla.org/show_bug.cgi?id=1292360#c5.

These parameters can be cross-referenced in Google Analytics, just like the current attribution data. That is by design; the data are intended to pass through a limited amount of information about the pre-download journey (on www.mozilla.org) to the post-download instance of Firefox.

  1. The first question I need to answer in the Data Review Response is where
    this will be documented. (this also ties into #1) Can you point me at
    documentation that users could use to find out about this collection?

Attribution is documented in environment docs: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/data/environment.html?highlight=attribution

[1] https://github.com/mozilla-services/stubattribution/blob/master/attributioncode/validator.go#L73
[2] https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=90734715

Flags: needinfo?(hoosteeno)

Great. Thank you for your answers.

(( Near as I can tell from docs and code, environment.settings.attribution does not expire. Which is fine, it's serving a useful purpose across the profile lifetime. ))

So this will be adding fields to the Telemetry Environment to collect the data. This will require some schema changes in addition to updating the Environment documentation for the new fields. We also recommend automated tests for permanent data collections to ensure nothing breaks it while we're not looking.

I'll proceed with the review now.

Comment on attachment 9045483 [details]
experiment_attribution_data_request.md

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is part of the Telemetry Environment so the [Environment docs](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/data/environment.html) will be updated to include documentation for this new collection.

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences. Also, the presence of a Do Not Track header on the download page will scrub attribution.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes. :hoosteeno is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

Yes. Two new identifiers will be added to identify the experiment name and branch that we attribute the download to. These will be chosen from a limited range of possible responses and will be used in aggregate.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

    Does the data collection use a third-party collection tool?

No. This collection is Telemetry.

---
Result: datareview+
Attachment #9045483 - Flags: data-review+

:RT, is this something you can help prioritize?

Flags: needinfo?(rtestard)

Moving NI to Jim since we agreed he will own prioritization for growth tactics

Flags: needinfo?(rtestard) → needinfo?(jimthomas)

Anything we can do to help with this or additional information we can provide on why this is important? Would love to understand when/if we might see this land. Thanks!

Landing the change in Firefox is the easy part, we just need agreement (probably from those on this bug) and spec of what params should be added. When that is ready if you need someone to do it ping me.

There have been plenty of issues around this over time, so making sure the end-to-end is solid takes a bit of hand holding. Note this will only work on windows stub installers.

As RT mentioned, I'm looking at at this and several other related items as part of a larger attribution / top-of-funnel growth strategy. This seems like it may be an easy win, but the biggest question in my mind is how much we need to future-proof this as an important piece of our growth/experimentation infrastructure. I'm currently working to clarify that narrative so everyone can be on the same page. That would include a spec of these params and any necessary signoffs (although from what I understand most of that has already been settled).

Shane, when you mention making sure it's "end-to-end" solid, is there someone I need to connect with to ensure that's the case? Data Engineering maybe?

Flags: needinfo?(jimthomas)

I'm linking to this GitHub issue because it offers context on a remarkably similar effort underway for FxA. The website needs to handle/preserve upstream referrers as it communicates with downstream channels about both upstream referrers and website conversion points. FxA will support this by adding 2 parameters to its inputs, so we don't have to overload/override utm parameters.

https://github.com/mozilla/fxa/issues/693#issue-428872658

(In reply to Jim T from comment #13)

Shane, when you mention making sure it's "end-to-end" solid, is there someone I need to connect with to ensure that's the case? Data Engineering maybe?

Make sure you know the entire flow of the attributions, each system and who's responsible in that area, so that there is some assurance that everyone knows what's going on.

Our experience with RTAMO has had a number of hickups, but the most recent being a server change that essentially prevented utm params getting into the stub installer. We scrambled to figure out why, and once we found out, realized that particular team had no knowledge of our project, as well as no tests for the attributions getting into the stub installer. The server change in this case was for another project altering some use of the utm params.

(In reply to Shane Caraveo (:mixedpuppy) from comment #15)

Make sure you know the entire flow of the attributions, each system and who's responsible in that area, so that there is some assurance that everyone knows what's going on.

Thanks Shane! I'm coordinating these type of funnel telemetry efforts in an internal doc:
https://docs.google.com/document/d/1xScPda69dRBTSGbQhiVcPwLPDbQo2ZhvatGkSfoO_f0/edit#

Feedback is appreciated!

Blocks: 1543284

Hi Shane,

Wanted to update you on this. We're reaching agreement on the spec in the document I mentioned above and I've confirmed this will actually help address the issues seen in the RTAMO project. I have a meeting with Data Science tomorrow to review, but that's mostly to understand the implications for future analysis.

What would you need to see in terms of a go/no-go decision on this work?

Jim

Flags: needinfo?(mixedpuppy)

Hey all, this looks good as far as I can tell from a Data Science perspective. One question I'd have is regarding tracking installs through the experiment parameters.

Am I correct that we can only track installs made through stub installers? Would the stub installer data include these new experiment parameters? Do we expect that most download experiments that we would run would be for download of stub installers only?

I realize that having the parameters in the telemetry attribution data is probably adequate, but having it in the installation data as well would be helpful.

Hi Jim,
All I need are the final params that need to be accepted into firefox, and that there is a go on this project overall. This looks like funnel_experiment and funnel_variation. I vaguely recall a length limit on the url parameters somewhere, can we shorten the parameter names? Maybe f_exp and f_var?
Shane

Flags: needinfo?(mixedpuppy) → needinfo?(jimthomas)

(In reply to Jesse McCrosky from comment #18)

Hey all, this looks good as far as I can tell from a Data Science perspective. One question I'd have is regarding tracking installs through the experiment parameters.

Am I correct that we can only track installs made through stub installers? Would the stub installer data include these new experiment parameters? Do we expect that most download experiments that we would run would be for download of stub installers only?

Matt, can you help answer Jesse's question about the stub installer data?

(In reply to Shane Caraveo (:mixedpuppy) from comment #19)

Hi Jim,
All I need are the final params that need to be accepted into firefox, and that there is a go on this project overall. This looks like funnel_experiment and funnel_variation. I vaguely recall a length limit on the url parameters somewhere, can we shorten the parameter names? Maybe f_exp and f_var?
Shane

We're a go on the overall project. In fact, it turns out it may end up being helpful for measuring some upcoming initiatives.

In terms of naming, I'd prefer something a little more descriptive to avoid confusion. Any idea what the length limit is? What about funnel_exp and funnel_var?

Flags: needinfo?(jimthomas) → needinfo?(mhowell)

(In reply to Jim T from comment #20)
Matt, can you help answer Jesse's question about the stub installer data?

(In reply to Jesse McCrosky from comment #18)
Am I correct that we can only track installs made through stub installers?

Right, we don't currently have attribution for full installers. We do have it for Mac now though, so it's not just the Windows stub installer.

Would the stub installer data include these new experiment parameters?

Yes. They would be right next to the current utm parameters everywhere those appear, including in the install ping.

(In reply to Jim T from comment #20)
In terms of naming, I'd prefer something a little more descriptive to avoid confusion. Any idea what the length limit is? What about funnel_exp and funnel_var?

There aren't any specific limits on the names. We have a total of roughly 1000 bytes that the whole string with all the parameter names and values has to fit into, but we aren't currently anywhere near that limit and it could be extended anyway if needed, so using a few bytes to make these names more descriptive is a good idea.
On the telemetry side, we do also have a limit of 200 characters per value, but again no limit on names.

Flags: needinfo?(mhowell)

(In reply to Matt Howell (he/him) [:mhowell] from comment #21)

(In reply to Jim T from comment #20)
Matt, can you help answer Jesse's question about the stub installer data?

(In reply to Jesse McCrosky from comment #18)
Am I correct that we can only track installs made through stub installers?

Right, we don't currently have attribution for full installers. We do have it for Mac now though, so it's not just the Windows stub installer.

Actually we do not have it for mac. While the code is there, and the data is in the mac quarantine database, there is a quirk that doesn't allow us to get the data under normal install circumstances.

Specifically the issue is Bug 1525076 comment 2

Thanks for that info Matt!

Shane, given that there's no limit on names, let's stick with the original parameters:

funnel_experiment: name/id of the enrolled experiment
funnel_variation: name/id of the variation cohort used in the enrolled experiment

:merwin, would you mind considering the scope of this bug and sharing any feedback or concerns you have with the proposal (comment 0) and the general specs that have emerged here (comment 16)?

Thanks!

Flags: needinfo?(merwin)

HI Justin,

Clearing NI on Marshall's behalf. Based on the experiment nature of this week and the cohort sizes needed to get the metrics needed, Trust is ok with the addition of the experiment and variation identifier. No additional risk to users identified.

Please let us know if you need anything else and thanks,
Alicia

Flags: needinfo?(merwin)

Hi Shane, do you need anything else to work on this? From my end we're good to go.

(In reply to Jim T from comment #23)

Thanks for that info Matt!

Shane, given that there's no limit on names, let's stick with the original parameters:

funnel_experiment: name/id of the enrolled experiment
funnel_variation: name/id of the variation cohort used in the enrolled experiment

(In reply to Shane Caraveo (:mixedpuppy) from comment #19)

Hi Jim,
All I need are the final params that need to be accepted into firefox, and that there is a go on this project overall. This looks like funnel_experiment and funnel_variation. I vaguely recall a length limit on the url parameters somewhere, can we shorten the parameter names? Maybe f_exp and f_var?
Shane

Flags: needinfo?(mixedpuppy)

You'll need to modify the attribution code[1] to allow other attrs. Then add tests for it[2]. If someone isn't available to do this I'll try to get to it, just let me know. And what is the target/deadline?

[1] https://searchfox.org/mozilla-central/rev/8a990595ce6d5ed07ace2d4d5d86cc69aec90bde/browser/components/attribution/AttributionCode.jsm#43

[2] https://searchfox.org/mozilla-central/source/browser/components/attribution/test/xpcshell/test_AttributionCode.js

Flags: needinfo?(mixedpuppy)
Flags: needinfo?(jimthomas)

Hey Shane,

I'm hoping to get this into Fx70 at this point - in fact we just had another discussion today where it came up as a blocker for some desired testing. I don't have anyone else identified to do this work, so if you can help out that would be great! If not, let me know and I'll have to try to see who else I can find.

Thanks again for your help!

Flags: needinfo?(jimthomas)

I'll try to get to it. IIUC you need funnel_experiment and funnel_variation to be allowed through attributions. Let me know if I missed anything.

Assignee: nobody → mixedpuppy
Flags: needinfo?(jimthomas)

I'd also like to know for sure that funnel_experiment and funnel_variation are actually what will end up in the stub installer. Currently, "utm_" is stripped and the stub installer gets "source" rather than "utm_source".

:jcrawford are you aware of any other parameter manipulation? Do we need to coordinate with Romain or Matt on that? I'll cc them just in case.

Flags: needinfo?(jimthomas) → needinfo?(hoosteeno)

Assuming that the attributions will have the full name (e.g. funnel_experiment) the patch should be sufficient. If "funnel_" is stripped, it will just be a minor change to the patch.

I need an answer on the param handling in order to land this patch.

Flags: needinfo?(jimthomas)
Depends on: 1567320
Depends on: 1567331
Depends on: 1567339

This bug involves 4 major components:

  1. Bedrock sends parameters to
  2. the stub attribution service, which puts data into the installer for
  3. Firefox, which pings data to
  4. Telemetry

This bug relates to #3, Firefox. The blocking bugs just filed cover the other three items on the list. All of the pieces need to be in place before this work is done.

I'd also like to know for sure that funnel_experiment and funnel_variation are actually what will end up in the stub installer. Currently, "utm_" is stripped and the stub installer gets "source" rather than "utm_source".

I believe the Bedrock work and the StubAttribution work can be specified to send the entire parameter name.

Flags: needinfo?(hoosteeno)

Given the comment at https://github.com/mozilla/bedrock/issues/7474#issuecomment-513291225 I suggest we shorten these to experiment and variation.

The URL parameters may be funnel_experiment, but "funnel_" would be stripped in the same manor that "utm_" is stripped for the utm parameters.

Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cee7b065455b
support funnel attributes in attribution code r=mconley

Backed out changeset cee7b065455b (bug 1515172) for xpcshell failures at browser/components/attribution/test/xpcshell/test_AttributionCode.js

Backout: https://hg.mozilla.org/integration/autoland/rev/29458adca9045857ded140b356d5701d8f2bc559

Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=cee7b065455bb93c4c395fb9468c5e20d5197b38

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=257430043&repo=autoland&lineNumber=5104

18:24:11 INFO - TEST-START | browser/components/attribution/test/xpcshell/test_AttributionCode.js
18:24:12 WARNING - TEST-UNEXPECTED-FAIL | browser/components/attribution/test/xpcshell/test_AttributionCode.js | xpcshell return code: 0
18:24:12 INFO - TEST-INFO took 383ms
18:24:12 INFO - >>>>>>>
18:24:12 INFO - PID 11244 | Unable to load \untrusted-startup-test-dll.dll; LoadLibraryW failed: 126[11244, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 82
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 82
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 82
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 82
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 82
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file z:/build/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2634
18:24:12 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
18:24:12 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
18:24:12 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
18:24:12 INFO - running event loop
18:24:12 INFO - browser/components/attribution/test/xpcshell/test_AttributionCode.js | Starting testValidAttrCodes
18:24:12 INFO - (xpcshell/head.js) | test testValidAttrCodes pending (2)
18:24:12 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
18:24:12 INFO - PID 11244 | [11244, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file z:/build/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 662

Flags: needinfo?(mixedpuppy)
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/200c15f8b158
support funnel attributes in attribution code r=mconley
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 70
Flags: needinfo?(jimthomas)
Flags: needinfo?(mixedpuppy)

(In reply to Justin Crawford [:hoosteeno] [:jcrawford] from comment #36)

  1. Bedrock sends parameters to
  2. the stub attribution service, which puts data into the installer for
  3. Firefox, which pings data to
  4. Telemetry

This bug relates to #3, Firefox. The blocking bugs just filed cover the other three items on the list. All of the pieces need to be in place before this work is done.

Looks like this bug also covered #4. Here's a screenshot of about:telemetry after faking attribution (on mac with a referrer url with various url params).

(Also tested setting other params on the url, and they did not appear in the telemetry environment data.)

How soon is this attribution data needed? This is currently supported on nightly 70 -> release october 22nd. Does it want to go to beta 69 -> release september 3rd (or even release 68)? The server pieces haven't landed yet but don't ride the trains.

Flags: needinfo?(hoosteeno)

:mardak, thanks for asking. We can wait for 70.

Flags: needinfo?(hoosteeno)
See Also: → 1571773
You need to log in before you can comment on or make changes to this bug.