Closed Bug 1406005 Opened 3 years ago Closed 6 months ago

Run onboarding experiment passing UA string variation to attribution service

Categories

(www.mozilla.org :: General, enhancement)

Development/Staging
enhancement
Not set
normal

Tracking

(firefox57 wontfix)

RESOLVED FIXED
Tracking Status
firefox57 --- wontfix

People

(Reporter: RT, Unassigned)

References

(Blocks 1 open bug)

Details

User Story

As a product manager, I want to understand how onboarding activities impact users based on the browser type they downloaded the Firefox stub installer from, so I can optimize onboarding based on the browser users come from.

As a marketing manager, I want to understand the relative value of new users (retention/engagement) based on the type of browsers they downloaded Firefox from, so I can optimize marketing activities.

Requirements:
- Collect browser type and version from the user agent string exposed when initiating the stub installer download
- Add browser type and version to the attribution data passed on to the stub installer
- The browser details are added to the stub installer success ping attribution data and then to the unified telemetry activation ping
- The browser data can be processed through the data pipeline and be made available through the Presto churn table (sample query of what is available today https://sql.telemetry.mozilla.org/queries/3841/source?p_start_date=20170118#table)

Proposed implementation:
- Having the attribution service or bouncer capture the UA string would avoid website changes since the requesting browser will have sent that header to the service.

Attachments

(2 files)

No description provided.
User Story: (updated)
Here are some examples of attribution data and a few options:

Existing payload:

attribution.source=www.google.com
attribution.medium=cpc
attribution.campaign=fast-browser
attribution.content=fastest

New proposed attribution fields:

Option 1: parse the UA string on mozilla.org and just pass in the browser name and version.

Two new additional fields:

attribution.browsername=Trident
attribution.browserversion=11.0

Option 2: pass in the entire UA string so it can be parsed on the reporting side.

attribution.ua=Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko 

The benefit to option 2 is that it doesn't destroy data and it doesn't require mozilla.org to have any additional logic.
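To make the two options concrete, here is a minimal sketch of how each payload might be assembled. The helper name `serializeAttribution` and the object layout are assumptions for illustration, not the attribution service's actual code; the field names and values come from the examples above.

```javascript
// Hypothetical helper: flatten a field map into "attribution.key=value" lines.
function serializeAttribution(fields) {
  return Object.entries(fields)
    .map(([key, value]) => `attribution.${key}=${value}`)
    .join("\n");
}

// The existing payload from the example above.
const existing = {
  source: "www.google.com",
  medium: "cpc",
  campaign: "fast-browser",
  content: "fastest",
};

// Option 1: the website parses the UA and sends only name and version.
const option1 = { ...existing, browsername: "Trident", browserversion: "11.0" };

// Option 2: the raw UA string is passed through and parsed at reporting time.
const option2 = {
  ...existing,
  ua: "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
};

console.log(serializeAttribution(option1));
console.log(serializeAttribution(option2));
```

Option 1 fixes the parsing rules at collection time, while Option 2 defers them, which is the trade-off discussed in the comments that follow.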
Severity: normal → enhancement
Summary: Add browser details to stub attribution data → Attribution service should handle the UA string data
Blocks: 1414258
Product: Firefox → www.mozilla.org
Version: unspecified → Development/Staging
User Story: (updated)
Julien, do you have an opinion on this?
Flags: needinfo?(jvehent)
This removes input validation entirely. It is generally a bad idea to access and reuse input without any validation, but I'm not sure what the actual impact is in this case. How urgent is this request, and can we perform a code review beforehand?
Flags: needinfo?(jvehent)
We're hoping to have this in place for 59. It seems to make sense to do this along with bug 1414265, which also adds data to the attribution field and is trying to ship in 59 along with bug 1414258 (install an add-on with a new Firefox installation).
Flags: needinfo?(jvehent)

Recent UR suggests that new and resurrected users who download Firefox using Chrome are especially likely to become long term (retained) Firefox users. I would like to nudge this bug and see if we can get started on experimenting with custom Firefox Onboarding sequences in Firefox 72.

:hoosteeno, is this the right bug to track this work and make this request?

Flags: needinfo?(hoosteeno)
Flags: needinfo?(tspurway)

:hoosteeno, is this the right bug to track this work and make this request?

Yes.

Proposed implementation:

  • Having the attribution service or bouncer capture the UA string would avoid website changes since the requesting browser will have sent that header to the service.

This approach would be cleaner than having the website collect and pass this information from the download button. It's worth investigating the feasibility of this approach, first. That investigation should consider the fact that some changes to one or both of these systems will be necessary, no matter what.

Option 2: pass in the entire UA string so it can be parsed on the reporting side.
attribution.ua=Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko

This makes the most sense to me. Pre-processing the UA might prohibit some useful future analysis.

It is possible that we could experiment with using UA data to power onboarding flows by passing that data into the installer through experiment parameters (see bug 1515172); but those parameters are not a long-term solution for this.

Flags: needinfo?(hoosteeno)
Flags: needinfo?(tspurway)
Blocks: 1592312

Looks like this still needs sign off from :ulfr.

Is it acceptable to capture the full user agent if we limit the string length and set of characters?

Flags: needinfo?(jvehent)

:freddyb is going to look at it. What's your timeline?

Flags: needinfo?(jvehent)
Flags: needinfo?(hoosteeno)

(In reply to Jeremy Orem [:oremj] from comment #7)

Is it acceptable to capture the full user agent if we limit the string length and set of characters?

I think we can make both options (from comment 1) safe, but they have their individual pros and cons.

Option 1:
If we parse and capture the relevant bits from the user agent directly on mozilla.org, it might be initially less than what we need further down the line. This would result in us re-iterating over the user-agent parsing on mozilla.org.

Option 2:
Alternatively, we take everything that is a meaningful user agent string. We'd have to drop those UAs that are too long or don't match a pre-defined set of characters. Otherwise it might break out of the "key=value" syntax and our data would be wrong.
We'd still store the full user agent for those that are compliant with our restriction.
This could result in us parsing the user-agent multiple times with different parsers, which could lead to different readings of our data.
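The restriction described for Option 2 could look roughly like the sketch below. The length limit, the allowed character set, and the function name are all assumptions chosen for illustration; the thread does not specify actual values.

```javascript
// Assumed limit for illustration; the thread does not specify an actual value.
const MAX_UA_LENGTH = 200;

// Characters commonly seen in UA strings. "=", "&" and "%" are deliberately
// absent so an accepted value cannot break out of a key=value pair.
const SAFE_UA_PATTERN = /^[A-Za-z0-9 .,;:_\/()+\-]+$/;

// Returns the UA unchanged if it passes both checks, otherwise null (dropped).
function sanitizeUserAgent(ua) {
  if (typeof ua !== "string" || ua.length === 0) return null;
  if (ua.length > MAX_UA_LENGTH) return null;
  return SAFE_UA_PATTERN.test(ua) ? ua : null;
}

// A well-formed UA is stored as-is; a UA carrying extra key=value data is dropped.
console.log(sanitizeUserAgent("Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"));
console.log(sanitizeUserAgent("Mozilla/5.0&campaign=spoofed"));
```

Dropping non-compliant strings outright, rather than trying to repair them, keeps the stored values identical to what the browser sent.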

Does that answer your question, oremj?

Flags: needinfo?(oremj)

I think the needinfo with my name on it relates to

(In reply to Julien Vehent [:ulfr] from comment #8)

:freddyb is going to look at it. What's your timeline?

I'll cite :tspurway in comment 5 here:

I would like to nudge this bug and see if we can get started on experimenting with custom Firefox Onboarding sequences in Firefox 72.

Flags: needinfo?(hoosteeno)

Can we expose the UA/browser name and version from mozilla.org?

Flags: needinfo?(pmac)

We can but it's kind of pointless since the client will then connect directly to bouncer which could also grab that info. It really depends on how many of these we want to collect. Currently www.mozilla.org (WMO) only adds attribution data to bouncer requests when a user came to the site with UTM parameters in their link. But every browser has a UA string. If we start building and sending an attributed stub installer to all requests for one it would increase the load on the attribution service quite a lot. I think the attribution service could more simply just grab the UA string and add it to stubs it was already going to attribute (if there's room), or bouncer could send a percentage of ones it normally wouldn't have to the attribution service if we want more.

tl;dr WMO could add the UA string to the signed attribution data sent to bouncer/stub-attribution service, but the browser will send it to them as well so we don't need to

Flags: needinfo?(pmac)

Part of this bug is to break down the UA into browser/version. Not sure if moz.org already does this, but if it does already have code for this, it would be nice not to have to replicate it. If we capture the UA in moz.org it might make a better snapshot of the situation, so we can say this attribution code was generated from X campaign, X browser, etc. If a stubattribution URL gets posted somewhere, we could be getting some false data if different UAs are hitting the same stubattribution URL. I suppose either way there would be some false information in that case.

Flags: needinfo?(oremj)

We do have some browser detection code that could potentially be repurposed for this, but then we'll have to decide ahead of time what we'd like to collect. I don't think it matters for false data if the UA is collected by the site or by bouncer/stubattribution, but you're right that it matters that we do consistent parsing if we're going to parse.

I'm adding a NI for Alex Gibson as he knows more about the UA parsing that we currently do on the site.

Flags: needinfo?(agibson)

(In reply to Paul [:pmac] McLanahan from comment #14)

We do have some browser detection code that could potentially be repurposed for this, but then we'll have to decide ahead of time what we'd like to collect. I don't think it matters for false data if the UA is collected by the site or by bouncer/stubattribution, but you're right that it matters that we do consistent parsing if we're going to parse.

I'm adding a NI for Alex Gibson as he knows more about the UA parsing that we currently do on the site.

We have built-in UA detection to break down Firefox versions, as well as some OS versions, but we don't have a generic UA parser that works across all browsers, which sounds like what's being described here. This is something we could potentially invest in building, but as :pmac says the browser does already send this info over, so I don't know if this makes more sense than allowing Bouncer to parse the info.

Flags: needinfo?(agibson)

(In reply to Paul [:pmac] McLanahan from comment #12)

If we start building and sending an attributed stub installer to all requests for one it would increase the load on the attribution service quite a lot.

Maybe to start, would it be more feasible to have a single chrome-attributed installer saved on CDN that all Chrome users get? We would lose out on the exact UA for later usage within Firefox, but perhaps knowing "came from Chrome" "or not" is good enough for initial telemetry and onboarding experiments?

Yes, it would be much easier to detect just a few user agents rather than create a generic user agent parser. Can we pare this request down?

Can we be assured that the installer :mardak proposes in comment 16 will play well with stub attribution in general? We should be able to simultaneously understand UA and any attribution. If we can do that, I think it's a very nice reduction of complexity, assuming it covers the analysis need that inspired :tspurway to revive this bug in comment 5.

Just making sure, the current behavior:

  • most firefox downloads -> plain installer from cdn
  • special referral attribution -> custom referral installer per request

A potential scoped-down optimization for UA:

  • chrome UA -> custom google-attributed installer from cdn
  • firefox UA -> custom mozilla-attributed installer from cdn
  • edge/IE UA -> custom microsoft-attributed installer from cdn
  • safari UA…
  • opera UA…
  • other UA -> plain installer from cdn
  • special referral attribution -> custom referral installer including google/mozilla/microsoft/etc-attribution per request

Notably hoosteeno's comment is making sure that last line also includes something based on UA if we're going to specially make a custom installer anyway. ??

(Not sure which and how many UAs and actual attribution we want but practically that covers close to 99%. Whether from-UA's version is important is a separate question that could quickly increase the number of custom installers.)
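A rough sketch of the scoped-down routing above, assuming simple token matching on the UA string. The function names and installer labels are hypothetical, and the Safari and Opera mappings are left open in the list, so this sketch sends them to the plain installer.

```javascript
// Hypothetical labels for the pre-built installers described in the list above.
const INSTALLER_BY_FAMILY = {
  chrome: "google-attributed",
  firefox: "mozilla-attributed",
  edge: "microsoft-attributed",
  ie: "microsoft-attributed",
};

// Order matters: Edge and Opera UAs also contain "Chrome", and Chrome UAs
// also contain "Safari", so the more specific tokens are checked first.
function detectFamily(ua) {
  if (/\b(Edge|Edg)\//.test(ua)) return "edge";
  if (/\b(OPR|Opera)\b/.test(ua)) return "opera";
  if (/\bFirefox\//.test(ua)) return "firefox";
  if (/\bChrome\//.test(ua)) return "chrome";
  if (/\bTrident\/|\bMSIE /.test(ua)) return "ie";
  if (/\bSafari\//.test(ua)) return "safari";
  return "other";
}

// Families without an entry (safari, opera, other) fall back to the plain
// installer here; the thread leaves the Safari and Opera mappings open.
function pickInstaller(ua) {
  return INSTALLER_BY_FAMILY[detectFamily(ua)] || "plain";
}
```

Because every UA maps to one of a handful of pre-built files, the number of installers grows with the number of families rather than with the number of distinct UA strings.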

:mardak, :hoosteeno,

This trade-off seems good for the usecases we have planned for the immediate (Chrome and Firefox UAs) and mid-term (other browser UAs).

Flags: needinfo?(tspurway)

We had a meeting about this today. Just wanted to capture a couple points.

  • The download website (www.mozilla.org) could parse UA and send it in URL parameters, but we don't think this is the best approach. Even if we went this way, the bouncer service and the attribution service would need to change to handle those new parameters. All download requests go to bouncer; bouncer decides what to do about them; so we can reduce complexity and bug surface by putting UA parsing at bouncer.
  • Bouncer could decide that a particular download request needs a UA-specific Firefox, and could accomplish that in a couple ways:
    • This could be done by serving the user a custom Firefox build for that UA. This solution implies many more builds because it adds to the installer taxonomy: A user could get a locale-specific, OS-specific, architecture-specific, chrome-switcher-specific installer.
    • This could instead be done by sending all such requests through the attribution service, which would inject the UA into a more generic installer as it currently injects UTM parameters. This implies much more load at the attribution service; only a fraction of Chrome switchers currently pass through that service now.

A footnote: It seems likely that some users would be downloading a browser intended for mass install in e.g. a corporate IT scenario. We might one day need to figure out how to give these users a UA-generic version of the browser that does not assume switching.

nthomas, do you have ideas on how we can get "firefox was downloaded with chrome" data from the installed firefox?

One idea is to have a beetmover task to use the existing attribution service, which adds some metadata to existing builds, to generate some custom builds, e.g., from-chrome installer and from-firefox installer, and the website redirects to the cdn for those custom builds or the normal/generic build.

aki (and hoosteeno's comment 21) brought up some good questions of how long we'll need this for or maybe a one-off experiment for now because we currently already have at least (# platforms) * (# locales) builds for every release, so multiplying that by (# from-browsers) grows quickly.

tspurway, maybe we should limit this to en-US win64 from-chrome + from-firefox for say… firefox 72.0 for now? So soon after 72 is released, we do a one-time attribution service request and store that on cdn and have bouncer detect when it should redirect specially to that file?

Flags: needinfo?(nthomas)

nthomas had a suggestion for whether we can access the information about the downloader, and looks like they are indeed stored on the file but only for mac. E.g.,

$ xattr -l Firefox\ 69.0.3.dmg 

com.apple.quarantine: 0081;5dcc6cdf;Firefox Nightly;75A532AA-7340-4C65-9783-095B4B1A16BC

com.apple.metadata:kMDItemWhereFroms:
00000000  62 70 6C 69 73 74 30 30 A2 01 02 5F 10 52 68 74  |bplist00..._.Rht|
00000010  74 70 73 3A 2F 2F 66 74 70 2E 6D 6F 7A 69 6C 6C  |tps://ftp.mozill|
00000020  61 2E 6F 72 67 2F 70 75 62 2F 66 69 72 65 66 6F  |a.org/pub/firefo|
00000030  78 2F 72 65 6C 65 61 73 65 73 2F 36 39 2E 30 2E  |x/releases/69.0.|
00000040  33 2F 6D 61 63 2F 65 6E 2D 55 53 2F 46 69 72 65  |3/mac/en-US/Fire|
00000050  66 6F 78 25 32 30 36 39 2E 30 2E 33 2E 64 6D 67  |fox%2069.0.3.dmg|
00000060  5F 10 3E 68 74 74 70 73 3A 2F 2F 66 74 70 2E 6D  |_.>https://ftp.m|
00000070  6F 7A 69 6C 6C 61 2E 6F 72 67 2F 70 75 62 2F 66  |ozilla.org/pub/f|
00000080  69 72 65 66 6F 78 2F 72 65 6C 65 61 73 65 73 2F  |irefox/releases/|
00000090  36 39 2E 30 2E 33 2F 6D 61 63 2F 65 6E 2D 55 53  |69.0.3/mac/en-US|
000000A0  2F 08 0B 60 00 00 00 00 00 00 01 01 00 00 00 00  |/..`............|
000000B0  00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00  |................|
000000C0  00 00 00 A1                                      |....|
000000c4

See attachment. The dialog shows the downloader as well as the url, which Firefox adds since bug 337051.

But this doesn't help us for windows. mhowell: No, there's no information about the application that downloaded a file saved at all on Windows.

Hmm, that's a shame. Would the stub looking up the default browser be a reliable signal ? A UA of a browser doesn't seem completely solid itself. Just wondering because it might mean only touching the stub, instead of lots of the stub, bedrock, attribution server, and bouncer stack.

(In reply to Ed Lee :Mardak from comment #22)

nthomas, do you have ideas on how we can get "firefox was downloaded with chrome" data from the installed firefox?

One idea is to have a beetmover task to use the existing attribution service, which adds some metadata to existing builds, to generate some custom builds, e.g., from-chrome installer and from-firefox installer, and the website redirects to the cdn for those custom builds or the normal/generic build.

  • beetmover specifically wouldn't help here, it only moves files around
  • the current partner repack process could be used but is fairly heavy handed. It could generate customized stub and full installers, which would show up in telemetry as a distribution rather than attribution. Bedrock or bouncer would need to point to these files (as discussed already); personally I think doing it in bouncer is fairly opaque from a testing/rediscovery point of view. This approach would fragment the download stack, and slow down the release automation a bit. On the flip side, if we wanted a different onboarding extension for each scenario that would be easy
  • we could write new release automation to talk to the attribution service to create 1 to 97 stub installers. It would have to craft the url carefully to be accepted (hmac etc; bedrock already has logic for this). We'd have to figure out how to point the attribution service to pre-release builds, move bits around, set up bouncer, etc, plus the changes in bedrock/bouncer to serve them to people
  • we could enhance the partner repack machinery to use a script (see bug 1261140) to create 1-97 attributed stub installers, which then use the regular full installers. There's a related request at bug 1585811 so some synergy, and some automation already exists for moving bits, bouncer setup. Still the changes in bedrock/bouncer to serve them to people

In the last two cases, what happens if we have a marketing campaign and use attribution on a stub which already has browser attribution ? Do we need to support this case ?

aki (and hoosteeno's comment 21) brought up some good questions of how long we'll need this for or maybe a one-off experiment for now because we currently already have at least (# platforms) * (# locales) builds for every release, so multiplying that by (# from-browsers) grows quickly.

Did the original scope of stub installers get widened out to all platforms at some point in reviving this bug ? Even if it's 'just' windows a time frame would be useful to know.

tspurway, maybe we should limit this to en-US win64 from-chrome + from-firefox for say… firefox 72.0 for now? So soon after 72 is released, we do a one-time attribution service request and store that on cdn and have bouncer detect when it should redirect specially to that file?

I don't think we can make the one-time request manually (oremj would know for sure) but this certainly de-scopes the problem a lot.

Flags: needinfo?(nthomas)

I suggested we might use the funnelcake system. It's important to point out that funnelcake is much better suited to a short-term test to prove/disprove other data, rather than a long-term implementation with iteration. If the natural timescale for measurement is quite long then it's probably better to go straight to attribution, rather than test in one system and then re-implement in the preferred one.

Anyway, here's a sketch of details. Funnelcakes are a type of partner repack, where we also have bedrock support for selecting web sessions based on platform, a proportion of requests, etc. Since the last time we created a funnelcake we gained support for partner stub installers, so it's easier to create funnelcakes for windows, but we'd need to do some work to glue things together. The system could look like

  • nthomas+mkaply - set up a configuration to create a funnelcake for each UA of interest; stub and full installer
  • release automation - creates those each time we create a release
    • TODO - nthomas - adjust the bouncer products for funnelcake so they match bedrock's expectations
  • pmac+others
    • TODO - add support for UA detection ?
    • TODO - add support to enable funnelcake when user hasn't landed on path with ?f=NNN ?
    • create a recipe for bedrock funnelcake
  • TODO metrics based on distribution rather than attribution

With that we'd have a system to automatically build, publish, and serve customized builds based on UA. We'd have to scope out the time for work to implement the changes, taking into account the holidays etc. SWAG a few days work for my two items.

Summarizing last week's discussion around using attribution service:

  • there's a cache that should prevent the attribution service from handling every download request although this will still be much higher volume than current usage
  • need to verify if bouncer doing the UA detection is sufficient or will cause problems, e.g., chrome-attributed stub installer gets downloaded by a firefox user as the requested url is the same
    • if insufficient, website changes will be needed to provide a different url based on UA
  • use an explicit mapping of allowed UAs -> attribution value instead of storing the whole UA
  • want to keep using attribution for existing attribution but also include the new UA-related attribution
  • focusing on windows-only stub installer is good enough for now
    • mac has "downloading application" data on the dmg although unclear if it'll be accessible from firefox application
  • focusing on en-US vs all locales should have minimal benefit / difference
  • reusing attribution "automatically" gets firefox telemetry and targeting for experiments

Something I don't think was discussed is the hmac/signature generation/verification, which should get triggered before the actual download request for the stub installer. Are there concerns around this additional XHR load?

Something I don't think was discussed is the hmac/signature generation/verification, which should get triggered before the actual download request for the stub installer. Are there concerns around this additional XHR load?

I don't think so. That operation is pretty low-cost and fast.

agibson, could the existing CustomStubAttribution code from https://github.com/mozilla/bedrock/issues/6871 be reused conditionally from the main download page instead of from a variant, e.g.,

const ua = uaForAttribution(navigator.userAgent);
if (isWindows && ua) {
  Mozilla.CustomStubAttribution.init({
    utm_campaign: "user_agent",
    utm_content: ua,
    …
  });
}

Then bouncer should theoretically treat this as any other attribution_code potentially without needing to change bouncer or attribution service (??). What this misses out on is existing attribution, e.g., return to amo, as this naively wouldn't append additional ua attribution, but maybe we can skip those for now as that should be relatively low traffic.

Flags: needinfo?(agibson)

The custom stub attribution logic was added to work around existing issues in tracking retention data for small, short term experiments only. It can clobber existing stub attribution data, so I’m not sure that we can safely use it in this way. The custom code should really be retired, in favour of https://bugzilla.mozilla.org/show_bug.cgi?id=1567331

Flags: needinfo?(agibson)

Maybe the work specifically to support bug 1592312 should be split out at this point if we don't want a completely separate attribution.ua type field as originally suggested in comment 1.

agibson, so without using CustomStubAttribution, could the regular Mozilla.StubAttribution behavior on the download page be modified to have some default data after the usual meetsRequirements() checks, etc? E.g.,

diff --git a/media/js/base/stub-attribution.js b/media/js/base/stub-attribution.js
     StubAttribution.getAttributionData = function(ref) {
…
         return {
             utm_source: params.utm_source,
             utm_medium: params.utm_medium,
             utm_campaign: params.utm_campaign,
-            utm_content: params.utm_content,
+            utm_content: params.utm_content || StubAttribution.getDefaultAttribution(),
             referrer: referrer
         };

Where the default attribution value has some basic UA detection, and hopefully this will get us good enough data in the common case without much engineering effort as most users are from windows and don't have an existing cookie attribution (??).
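The fallback in the diff above can be sketched as a standalone function to show its behavior. `getDefaultAttribution` here is a hypothetical stand-in (the real detection rules were still undecided at this point in the thread), and the `"chrome"` value and the extra `ua` argument are assumptions made so the sketch can run outside a browser.

```javascript
// Hypothetical stand-in for the proposed StubAttribution.getDefaultAttribution();
// the value "chrome" is an assumption for illustration.
function getDefaultAttribution(ua) {
  return /\bChrome\//.test(ua) ? "chrome" : undefined;
}

// Mirrors the shape of the object returned in the diff above.
function getAttributionData(params, referrer, ua) {
  return {
    utm_source: params.utm_source,
    utm_medium: params.utm_medium,
    utm_campaign: params.utm_campaign,
    // Existing campaign content wins; the UA value is only a fallback.
    utm_content: params.utm_content || getDefaultAttribution(ua),
    referrer: referrer,
  };
}
```

Note the consequence of using `||`: a visitor who arrives with a `utm_content` value keeps it and carries no UA information at all, which matches the limitation already acknowledged for existing attribution.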

Flags: needinfo?(agibson)

If the question is could mozorg parse UA information and send it over as part of stub attribution, then yes as previously discussed in this bug it’s totally possible for us to do this in a number of ways. Your suggestion is certainly a plausible option.

The question I would like to understand is why would it make more sense for mozorg to write and maintain a UA parsing library (which does not currently exist), when the UA string is already available to bouncer with existing stub URL requests. Apologies if this has already been discussed but I’m not in the loop with some of the outcomes here. I’m not saying mozorg shouldn’t do this, if it’s the appropriate solution that is a better fit than all others. Reading previous comments, I’m not sure that is the case though.

I’d also like to see a specification for what we consider “basic UA detection”.

Flags: needinfo?(agibson)

There was a suggestion (from catlee?) to not write the whole user agent string, which could be arbitrary data and instead have a more structured / known format, e.g., "Chrome" "Firefox" "Edge" "Safari". I understand UA parsing is complicated, but I believe at least for our use case, we hopefully capture the common case desktop downloads even if there are false positives, but we could name the tokens as "Chrome-ish" "Firefox-ish" "Edge-ish" "Safari-ish" instead to try to be clearer that it's not "definitely Chrome." (I don't think the planned Firefox messaging targeting these would say things like "you just switched from chrome!" but more of giving more importance to highlighting certain features, e.g., tracking protection.)

The current desire to have mozorg handle this through existing stub attribution is to minimize changes. E.g., a new attribution.ua seems like it would require changes to at least mozorg (avoid cache issues), bouncer, attribution service, firefox.

The UA parsing would need to exist somewhere, and I do think ideally it would live in the application code actually using it instead of leaking those details up the stack to attribution service or bouncer or mozorg. There could be other uses for UA parsing in mozorg, but there hasn't been one yet and maybe it's not the time to overengineer one now for general reusability.

I agree the additional code maintenance is undesirable especially if it's a one-off thing that has unknown future requirements / changes, but assuming this UA related code is not to be directly reused, would it be any better if someone from User Journey engineering writes the pull request and updates (if any) for the desired business logic of transforming a user agent string?

The UA parsing would need to exist somewhere, and I do think ideally it would live in the application code actually using it instead of leaking those details up the stack to attribution service or bouncer or mozorg. There could be other uses for UA parsing in mozorg, but there hasn't been one yet and maybe it's not the time to overengineer one now for general reusability.

From mozorg's perspective, other than "is Firefox" and "is not Firefox", we generally try and avoid using UA detection for general purpose website logic, and instead rely on feature detection as much as possible. That's not to say we'd never use a UA detection library if we wrote one, but it's not something I think we should adopt as a general purpose toolkit, given the complexities and inherent brittleness.

I agree the additional code maintenance is undesirable especially if it's a one-off thing that has unknown future requirements / changes, but assuming this UA related code is not to be directly reused, would it be any better if someone from User Journey engineering writes the pull request and updates (if any) for the desired business logic of transforming a user agent string?

If this code was to live on mozorg, I think it would be less back and forth if the mozorg team handled the work, as they are more familiar with testing and QA.

The main decision on whether to do this parsing on bouncer/stub service or the website (if we're doing this with stub attribution) is caching of responses from bouncer/stub service. If they cache responses then we'll have to do it on the client side (JS on the website) to avoid disabling the cache. If responses from bouncer/stub service are not cached then bouncer should be able to parse the User-Agent header string and inject the browser name into the attribution data at that time. I still say the main advantage of doing it on bouncer is the involvement of fewer teams and it should be faster to get into production.

However, as Nick Thomas said in comment #25, we could do this with the funnelcake system, which does mean parsing on the website, but not using stub attribution. This would also mean that the website implementation would potentially be a bit more complex as we've never done this exact thing to decide to send a funnelcake before, but it wouldn't need to involve the bouncer/stub attribution teams and so might be faster to implement than the other parsing on the website path.

However, as Nick Thomas said in comment #25, we could do this with the funnelcake system, which does mean parsing on the website, but not using stub attribution. This would also mean that the website implementation would potentially be a bit more complex as we've never done this exact thing to decide to send a funnelcake before, but it wouldn't need to involve the bouncer/stub attribution teams and so might be faster to implement than the other parsing on the website path.

My main concern around using Funnelcakes for this type of thing is that they are (at least from experience), intended to be for short-lived experiments. They also take a fair amount of effort and coordination to set up and implement on the website. I can't speak to how this may simplify other areas of work, but as far as mozorg is concerned I would probably opt for implementing a longer-term solution (whatever that ends up being).

Alex and I talked and here is our proposal:

We'll do this via stub attribution, and we'll do it on the client-side on the website. We'll inject the browser name (only for specific browsers, sounds like only Chrome for now) into the attribution data. There are some outstanding questions:

  1. Where should this new data go? Into a new parameter, or as an addendum to an existing parameter (e.g. utm_content)?
  2. Should all Windows Chrome downloaders get this attribution or some sub-set?
    a. Only those that would normally have gotten attribution (i.e. had some UTM params or referral coming in)?
    b. If we want to include anyone downloading with Chrome should they all get attribution just for this or only a percentage of them?
  3. What specifically is the requested UA parsing ruleset? We'd like to keep the additional JS load to a minimum, so we'd like to be as specific as possible and implement these rules ourselves.
  1. Where should this new data go? Into a new parameter, or as an addendum to an existing parameter (e.g. utm_content)?

This should be a separate, independent attribution parameter to avoid bias introduced by interaction with other params

  2. Should all Windows Chrome downloaders get this attribution or some sub-set?

Ideally all Chrome downloaders should receive attribution. If this isn't feasible now, for the experiment we'll need to make sure any % sampling is done in a way that avoids introducing bias into the population. Assuming the experiment is a success, the requirement in the future will be to increase to 100% attribution.
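
A minimal sketch of what unbiased % sampling could look like on the website, assuming each eligible download is decided by an independent uniform random draw. The helper name `shouldAttribute` and the 10% rate are hypothetical and not taken from bedrock:

```javascript
// Hypothetical helper: decide whether a given download gets UA attribution.
// `rate` is the fraction of Chrome downloaders to sample (0..1).
// `rng` is injectable so the decision can be tested deterministically.
function shouldAttribute(rate, rng = Math.random) {
  if (!(rate >= 0 && rate <= 1)) {
    throw new RangeError("rate must be between 0 and 1");
  }
  // rng() is uniform in [0, 1), so every visitor is chosen with probability
  // `rate`, independent of UTM params, referrer, or time of day.
  return rng() < rate;
}

// Example: sample 10% for the experiment; going to 100% later means rate = 1.
const sampled = shouldAttribute(0.1);
```

Because the draw ignores everything else about the visit, the sampled group should look like the full population of Chrome downloaders apart from random noise.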

Increased load on the website and stub attribution service would be my primary concerns for using attribution on all downloads using Chrome. I'm confident we could handle it on the website side, but I can't really speak to how this might affect bouncer or stub attribution.

Any feelings on this oremj?

Flags: needinfo?(oremj)

(In reply to Paul [:pmac] McLanahan from comment #36)

  1. What specifically is the requested UA parsing ruleset? We'd like to keep the additional JS load to a minimum, so we'd like to be as specific as possible and implement these rules ourselves.

With the focus on Chrome downloaders of desktop Windows Firefox, the behavior should be

Chrome UA -> Chrome
Firefox UA -> not Chrome
Safari UA -> not Chrome
Edge UA -> not Chrome
IE UA -> not Chrome
any other UA -> doesn't matter if false positive Chrome or identified as not Chrome

This is roughly making sure the top 95% of UAs are identified as Chrome or not and the remaining 5% is acceptable noise, and those identified (correctly or incorrectly) as Chrome get the attribution.
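
A rough sketch of the binary rule set above, assuming plain substring checks are acceptable. `isChromeUA` is a hypothetical name, and the sample strings are illustrative desktop UAs:

```javascript
// Hypothetical binary classifier: "Chrome" vs "not Chrome".
function isChromeUA(ua) {
  // Edge advertises "Chrome/" too (tokens "Edge/" or "Edg/"), so rule it out first.
  if (ua.includes("Edg")) return false;
  // Desktop Firefox, Safari, and IE UAs carry no "Chrome/" token.
  return ua.includes("Chrome/");
}

const chromeUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36";
const edgeUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763";

console.log(isChromeUA(chromeUA)); // true
console.log(isChromeUA(edgeUA));   // false
```

Other Chromium-based browsers would still be reported as Chrome by this check, which falls under the "acceptable noise" described above.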

Sounds good. Diving in further though, what UA string will we accept as being "Chrome"? A lot of browsers try to be as Chrome-like as they can to get around sites trying to only work with Chrome, with some even going so far as to have a dynamic UA based on domain being visited. Should Chromium count? Should other Blink-based browsers? Which versions do we care to support for a positive "Chrome" identification? We'll never get this perfect, but we'll need to figure out what about a UA string makes us confident enough that it is Chrome to add attribution.

(In reply to Ed Lee :Mardak from comment #39)

With the focus on Chrome downloaders of desktop Windows Firefox, the behavior should be

Actually, I'd like to see if differentiating for Firefox would add undue complexity. We definitely have a number of use cases around Firefox downloading Firefox (understanding pave-overs, etc.), and would be requesting this soon anyway.

So something like:
Chrome UA -> Chrome
Firefox UA -> Firefox
Safari UA -> Other
Edge UA -> Other
IE UA -> Other

Could we not just capture the information inside of the UA string, so that we get flexibility to post process the data the way we need it afterwards (as part of the data pipeline)?
There is value in understanding the downloading browser even if it's not Chrome so we can customize onboarding as well as possible.
Would that add too much load on the bouncer or attribution service?
Chrome UA -> Chrome
Firefox UA -> Firefox
Safari UA -> Safari
Edge UA -> Edge
IE UA -> IE
Other UA -> Other

(In reply to Romain Testard [:RT] from comment #42)

Could we not just capture the information inside of the UA string, so that we get flexibility to post process the data the way we need it afterwards (as part of the data pipeline)?

We potentially could as discussed in comment #6. The issue I believe is that this would be a significant amount of extra data to inject and the space we have for attribution data in the installer is quite small.

Based on this code from the attribution service we only have 200 characters to work with:

https://github.com/mozilla-services/stubattribution/blob/9cc6e29bf4b6c034cd1b8b9c1886317d3b8a8025/attributioncode/validator.go#L53

(In reply to Jim T from comment #37)

  1. Where should this new data go? Into a new parameter, or as an addendum to an existing parameter (e.g. utm_content)?

This should be a separate, independent attribution parameter to avoid bias introduced by interaction with other params

We are inches away from having parameters to put experiment attribution into. If we could push that over the finish line, then the work remaining would be on the website. We could put the UA into the variation param, and identify a sample with the experiment param. Then, if the experiment justifies it, we can add more attribution params in the same manner.

To get there, we have to complete these bugs:

https://bugzilla.mozilla.org/show_bug.cgi?id=1567331
https://bugzilla.mozilla.org/show_bug.cgi?id=1567320

Based on this code from the attribution service we only have 200 characters to work with...

This appears to be a number we can change, and if we finish bug 1567320, it will be 400. Ref:

https://github.com/mozilla-services/stubattribution/pull/95/files#diff-253081adbe418770bdbe47b1f11c8320R62

We could ask :oremj if bumping it to 600 is possible, and then we could trim UA at 200.
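
As a sketch of the "trim UA at 200" idea, assuming the UA would be URL-encoded like the rest of the attribution code and carried in the variation param. The function names are hypothetical, and 600 is only the limit floated above, not a confirmed value:

```javascript
// Limits taken from the numbers discussed above (600 is only a proposal).
const MAX_UA_LENGTH = 200;
const MAX_ATTRIBUTION_LENGTH = 600;

// Hypothetical helper: cut the raw UA string to at most `max` characters.
function trimUA(ua, max = MAX_UA_LENGTH) {
  return ua.length > max ? ua.slice(0, max) : ua;
}

// Hypothetical helper: check that the already-encoded attribution code plus
// the encoded, trimmed UA still fits the overall limit.
function fitsBudget(encodedExisting, ua) {
  // Spaces, slashes, parentheses, and semicolons each become three characters
  // once encoded, so a 200-character UA can take well over 200 encoded characters.
  const extra = encodeURIComponent("&variation=" + trimUA(ua));
  return encodedExisting.length + extra.length <= MAX_ATTRIBUTION_LENGTH;
}
```

The encoding overhead is the main reason to check the budget on the encoded string rather than on the raw UA.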

(In reply to Justin Crawford [:hoosteeno] [:jcrawford] from comment #44)

We could put the UA into the variation param, and identify a sample with the experiment param. Then, if the experiment justifies it, we can add more attribution params in the same manner.

Are you suggesting the variation param can be repurposed to contain a UA-based value as nothing is currently using the variation anyway? Or that this will be handled on a per-experiment basis anyway, e.g., experiment1 has variationA and variationB and if we wanted to track UA too, it would get attributed something like variation=A-Chrome or variation=B-Edge? (And the default download with no experiment will just put variation=Chrome?)

If we can reuse / repurpose variation, then existing Firefox 70+ releases with bug 1515172 will automatically start reporting the UA values via telemetry. Otherwise if we need a new param, we will need to wait until a new change makes it to release (perhaps Firefox 73).

Flags: needinfo?(hoosteeno)
See Also: → 1515172

Since this is an experiment, we can put 'ua-onboarding' in the experiment param and 'chrome' in the variation param. If the experiment justifies investment we can build new parameters to add to stub attribution, e.g. 'ua=chrome'.

Flags: needinfo?(hoosteeno)

Morphing this bug per comment 44 and comment 46 to set experiment and variation params using the updated stub-attribution.js API from bug 1567331 which would get written to the stub with https://github.com/mozilla-services/stubattribution/pull/95

In this case, the additional attribution "%26experiment%3Dua-onboarding%26variation%3Dchrome" is 50 characters, so we probably don't need to increase the length from the 200 currently enforced by the stub attribution service as well as in Firefox, see https://searchfox.org/mozilla-central/search?q=ATTR_CODE_MAX_LENGTH for the latter. We should definitely change it, though, if adding a dedicated ua param later.
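
The character count can be double-checked with a few lines of JavaScript. This is only a sketch: the query-string form of the example payload is an assumption based on the attribution fields listed earlier in this bug, and the 200 limit is the value mentioned above:

```javascript
// Limit currently enforced, per the discussion above.
const ATTR_CODE_MAX_LENGTH = 200;

// Example payload from earlier in this bug, assumed to be joined query-string style.
const existing = "source=www.google.com&medium=cpc&campaign=fast-browser&content=fastest";
const addition = "&experiment=ua-onboarding&variation=chrome";

const encodedAddition = encodeURIComponent(addition);
const encodedTotal = encodeURIComponent(existing + addition);

console.log(encodedAddition);        // %26experiment%3Dua-onboarding%26variation%3Dchrome
console.log(encodedAddition.length); // 50
console.log(encodedTotal.length <= ATTR_CODE_MAX_LENGTH); // true
```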


If it's not too much additional load and/or complexity, it sounds like there are multiple product desires to attribute more than "just chrome" and include the other top default browser UAs, where the default browser UAs should be correctly identified and any others can be false positives.

Chrome:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36

Firefox:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0

Safari:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Safari/605.1.15

Edge:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36 Edg/44.18362.449.0

Internet Explorer:
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko

E.g., I believe something like this could be acceptable:

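function deriveVariation(ua) {
  // Order matters: Edge UAs also contain "Chrome" and "Safari", and Chrome UAs
  // also contain "Safari", so the more specific tokens are tested first.
  if (ua.includes("Trident")) return "ie";      // IE 8+ (Trident engine token)
  if (ua.includes("Edg"))     return "edge";    // matches both "Edge/" and "Edg/"
  if (ua.includes("Version")) return "safari";  // Safari reports "Version/x.y"
  if (ua.includes("Firefox")) return "firefox";
  if (ua.includes("Chrome"))  return "chrome";  // also Chromium and other Blink browsers
  return "other";
}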

(Where chromium shows up as "chrome," opera might be false positive "safari" or "chrome," older IE "other," and practically the various mobile UAs probably aren't downloading desktop firefox in significant numbers anyway.)
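
As a quick sanity check, the rules can be run against one UA per browser plus two of the accepted edge cases. This is a sketch: the function is repeated so the snippet runs on its own, and the old Opera and IE 7 strings are illustrative examples that were not part of the list above:

```javascript
// Copy of the proposed rules, repeated so this snippet is self-contained.
function deriveVariation(ua) {
  if (ua.includes("Trident")) return "ie";
  if (ua.includes("Edg"))     return "edge";
  if (ua.includes("Version")) return "safari";
  if (ua.includes("Firefox")) return "firefox";
  if (ua.includes("Chrome"))  return "chrome";
  return "other";
}

// [expected variation, UA string] pairs.
const samples = [
  ["chrome",  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"],
  ["firefox", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"],
  ["safari",  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Safari/605.1.15"],
  ["edge",    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763"],
  ["ie",      "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"],
  // Accepted misclassifications (illustrative UAs, not from the list above):
  ["safari",  "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.12.388 Version/12.16"], // old Presto-based Opera
  ["other",   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"],               // IE 7 has no Trident token
];

// Collect any sample whose derived variation differs from the expected one.
const mismatches = samples.filter(([expected, ua]) => deriveVariation(ua) !== expected);
console.log(mismatches.length === 0 ? "all samples classified as expected" : mismatches);
```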

Depends on: 1567320, 1567331
Summary: Attribution service should handle the UA string data → Run onboarding experiment passing UA string variation to attribution service
See Also: → 1595063

(In reply to Justin Crawford [:hoosteeno] [:jcrawford] from comment #44)

We could ask :oremj if bumping it to 600 is possible, and then we could trim UA at 200.

This should be possible. It looks like the absolute maximum is 1010 bytes, but I would have to confirm with Matt Howell.

Flags: needinfo?(oremj)

Yeah, 1010 bytes is the amount of space in the stub binary, so that's currently the absolute limit on the entire attribution string. That could be increased if it's really necessary, but it's a bit more of a hassle than increasing the limits in the telemetry module (the one mentioned in comment 47).

With bug 1595063 in nightly 73, we now have the option of putting UA data in a "ua" field instead of using the "experiment" + "variation" fields, which have been available since Firefox 70 with bug 1515172.

If it's more desirable for bedrock+stubattribution to put the data in a dedicated field with the caveat that we'll need to wait until 73 is in release (2020-02-11), this would give some additional time instead of immediately getting UA data with the current 71 release.

Awesome that we'll have this in the future! Let's stick with the plan to use experiment/variation for now and get that working end to end. We should plan to switch to the ua field when it's available in release.

(In reply to Ed Lee :Mardak from comment #50)

If it's more desirable for bedrock+stubattribution to put the data in a dedicated field with the caveat that we'll need to wait until 73 is in release (2020-02-11), this would give some additional time instead of immediately getting UA data with the current 71 release.

With the stubattribution, bedrock, and pipeline-schemas issues resolved and verified end-to-end in bug 1595063 comment 17, seems like we can resolve this bug too. Thanks everyone for getting all the pieces working!

Status: NEW → RESOLVED
Closed: 6 months ago
Flags: needinfo?(tspurway)
Resolution: --- → FIXED