Closed Bug 1511104 Opened 1 year ago Closed 1 year ago
Firefox download page must pass through params, but NOT as utm
44 bytes, text/x-github-pull-request
|Details | Review|
See linked doc. On the 2nd page the problem is clearly explained. Without this addressed, we will not be able to fully test attribution for osx downloads.
Component: General → Activity Streams: Newtab
Product: WebExtensions → Firefox
NOTE: the utm_* params must be passed through without the "utm_" part of the param name.
(In reply to Shane Caraveo (:mixedpuppy) from comment #1) > NOTE: the utm_* params must be passed through without the "utm_" part of the > param name. ignore that. The issue is that the *referrer* url must contain the utm_* codes.
Hi Eric, Mixedpuppy is the engineer who understands the request - and wrote up explination https://docs.google.com/document/d/1w2YnGSemYViJ2Cn5_i67BHSpNoRy1TwUMOW62GS0Fik/edit. Timing wise - we found this when we were looking at testing a feature end-to-end (that has been in progress for months). In order to have this feature ready in Firefox 65 - which ships late Jan - we need to land code and test it out ASAP. I hate to ask for a last minute rush - but if there is any way to look at what this would take to see how quickly it could be done, especially if it's a small change - we would really appreciate it. I just filled in "version and target" information - because it was required to move to this area. I have no idea what those values should be. We are working out how to test this feature end-to-end. So it's possible "development/ stage" is the right initial request for this change. There is a bug 1511071 to work on how to QA something in this download area.
Component: Activity Streams: Newtab → Bedrock
Product: Firefox → www.mozilla.org
Target Milestone: --- → 1.0
Version: unspecified → Production
Assignee: nobody → pmac
Status: NEW → ASSIGNED
I know this bug explicitly states that this is just something the firefox download page needs (i.e. /firefox/new/ -> /firefox/download/thanks/), but the windows attribution is much more versitile. If a windows user comes to pretty much any page of the site with some utm_* params those are recorded, encoded, and stored in a cookie. Then when the user does go to download that data is added to the request for recording in the stub installer. Should the Mac version be as flexible, or do we only care about this working from /new/ to /thanks/, at least for now?
Well, our primary concern (and only use case right now) is installations coming from AMO product pages, which link to /new/. So I'm inclined to say it's the latter ("we only care about this working from /new/ to /thanks/, at least for now"). If someone was interested in OSX attribution in general, then we could opt for being more flexible.
(In reply to Paul [:pmac] McLanahan ⏰ET needinfo? me from comment #4) > Then when the user does go to download that > data is added to the request for recording in the stub installer. Should the > Mac version be as flexible, or do we only care about this working from /new/ > to /thanks/, at least for now? OSX doesn't have an installer, but all osx browsers add the quarantine data to downloaded binaries, which includes the download url and the referrer url. It might be possible to pass those utm params through in a more generic fashion, such as taking the cookie data and adding params to the download url itself. We'd have to do some adjustment to the attribution code in firefox as it only looks at the referrer url.
:mixedpuppy - Would it be acceptable to simply pass *all* query params on to /download/thanks/ at all times? Will it hurt anything to potentially have non-utm params in the referrer URL, or to have query params in the referrer for all platforms? I'm asking because we have JS that runs on *some* /firefox/new/ variation pages that does this - appends all current page querystring values on to the download links. I'd rather avoid nitty-gritty querystring parsing unless absolutely necessary.
It is fine to pass through all params. The attribution system will parse and pull out only specific utm_* params. The params wont have any affect on any other platform.
We should not add utm parameters to a same-site link. When Google Analytics sees a utm_campaign parameter in the URL, it considers that request the start of a new session. If we implement the solution described, then... 1. A user session clicking the download button on /new will cause our download count to increment. 2. That same user will arrive on /thanks with a new utm_campaign parameter, which will force a new session to begin. A new session visiting /thanks will also cause our download count to increment. Also, a (single) download will begin. 3. In GA reports, download volume among this segment will double. Each downloading user will have 2 downloads in 2 sessions, one right after the other. The number of installs and aDAU will not double. This confounds one of our most important metrics. We should avoid it using something like the design described in comment 4. If I understand it correctly, that solution will have the added benefit of working on download links sitewide, not just the green button on /new.  https://support.google.com/analytics/answer/2731565#campaign-based-expiration
Changed summary to reflect that we could pass params through in a way that still allows this to work, provided we uplift a small change to Firefox. Shane will file a bug for the other change.
Summary: Firefox download page must pass through utm params → Firefox download page must pass through params, but NOT as utm_
Justin, I've a patch on bug 1515153 that makes the attribution code in Firefox work with the "utm_" part of the param name stripped. So if you strip that, and add the params to the referrer for the download url, is that going to be fine on your end?
Flags: needinfo?(mixedpuppy) → needinfo?(hoosteeno)
I think this will solve the narrow case, but only if these parameters are handled in the same way by the service that validates them. I can't answer that question -- do you know? I will add one comment to bug 1515153 regarding the scope of the patch.  https://github.com/mozilla-services/stubattribution/blob/master/attributioncode/validator.go
Quick question: Is it possible that a downloading session will have e.g. a utm_campaign parameter _and also_ a campaign parameter at the moment of download? Have we anticipated that?
(In reply to Justin Crawford [:hoosteeno] [:jcrawford] from comment #14) > Quick question: Is it possible that a downloading session will have e.g. a > utm_campaign parameter _and also_ a campaign parameter at the moment of > download? Have we anticipated that? It will only be possible if a) the download starts from AMO, b) this bug passes both utm_content and content on the referrer url for the download. Since passing utm_ is problematic, I expect that wont happen. I wanted to keep utm_ support in case we do want to support campaign tracking. The attribution code will prefer the plain content over the utm_content (and other keys) so attribution data will only end up with one of them.
oh, and this is only an osx thing, it has no affect on windows.
Have we reached a decision on what exactly we want the site to do? If so can someone summarize here so that we can get it knocked out quickly?
It's basically #c1. NI mixedpuppy for sanity check.
So the summary then is: When we see any "utm_*" parameter on /firefox/new/, strip the "utm_" prefix, and add those parameters to the link to /firefox/download/thanks/. That sound right?
Adding to Pual's comment 19 above, we should *only* add these parameters for macOS users, correct? In other words, we should explicitly *not* add these parameters for any other OS.
I don't imagine they would make any difference to any other OS, since neither campaign tracking nor attribution is looking for them. But if we want to be exact and minimize any potential future parameter bleed, then yes.
(In reply to Paul [:pmac] McLanahan ⏰ET needinfo? me from comment #19) > So the summary then is: > > When we see any "utm_*" parameter on /firefox/new/, strip the "utm_" prefix, > and add those parameters to the link to /firefox/download/thanks/. > > That sound right? yes
:mixedpuppy - A couple questions to help make this implementation as tight as possible: 1) Are there specific utm_* params you are looking for? In other words, can we filter any unnecessary utm_* params out in an effort to modify download links as little as possible? 2) If the answer to #1 is "yes", are there specific values we can allow? If not, are there specific patterns we can validate against to ensure a value is valid? We are a little wary of taking *all* utm_* keys with *any* arbitrary value and putting that information back into the DOM/href link.
(In reply to Jon Petto [:jpetto] from comment #23) > We are a little wary of taking *all* utm_* keys with *any* arbitrary value > and putting that information back into the DOM/href link. I don't understand. I thought the issue was replicating utm_ params. If those prefixes are removed and the results are replicated to the download link, what is the concern?
The concern is if there could be some value/key+value data that could cause issues (by content and/or length) at any point along the line. We're thinking of querystring data essentially as user input that should be sanitized in some fashion. We will make sure the keys and values are URL encoded, but beyond that don't have any checks. That being the case, the referrer URL could be 2000+ characters long with key/value pairs of (theoretically) great length. Is that acceptable on your end?
Flags: needinfo?(jon) → needinfo?(ddurst)
Ah. Well, since this is all about getting attribution working on OSX, and our previous assertion is that the querystring would be longer, I have to assume that long-but-shorter-than-previously-expected is fine. The only parameters we should be caring about in this context (if I understand correctly) are the ones that attribution system cares about -- and that's the utm_ prefixed ones (source, medium, campaign, content) -- which, I believe, is all of them. (One caution is that the parameters are probably already URL encoded, and we should be sure to not double up on that like happened for Windows.) If we want to narrow the scope for safety, then we should go ahead and limit that to OSX, as you suggested in #c20. But if there's concern about this we can wait until mixedpuppy is back next week.
> 2) If the answer to #1 is "yes", are there specific values we can allow? If > not, are there specific patterns we can validate against to ensure a value > is valid? This bug seems relevant here: https://bugzilla.mozilla.org/show_bug.cgi?id=1515172 It implies a future state where we allow a couple more params into the attribution content. As you can see, it's not been heavily vetted and certainly not committed, but worth anticipating if possible.
It's probably worth making sure the utm_ params included are in the pre-defined list provided in comment 26: utm_source utm_medium utm_campaign utm_content Without this rule, someone could manually type in utm_haxx=sqlinjectioncode and it would get passed along. On this note, imagine someone added a utm_somereservedkeyword=somenastyvalue to the URL. This would pass somereservedkeyword=somenastyvalue in the referrer URL. I don't know if/how this data is sanitized/checked at the endpoint - which is why I'm bringing it up. This potential exploit would still exist if we limit to the keys listed above and still allow any value. There's nothing stopping a user from changing say utm_campaign=cooladdon to utm_campaign=morehaxxhere. (Again, I don't know if a URL encoded value can cause harm at the endpoint, but I think it's worth double-checking.) FWIW, on the Firefox Accounts side, they have both a list of allowed keys and allowed values to mitigate this risk. (Not sure if that's feasible in this case, however.) Sounds like we might want to wait on :mixedpuppy to verify we'll be safe with just the list of available keys?
You can see the codes that are consumed here: https://searchfox.org/mozilla-central/rev/fef7f858efb695a76010b4c624da5277c16e95b3/browser/components/attribution/AttributionCode.jsm#18-22 As far as utm_random=badstuff. it wouldn't really do anything since we don't make them available. Windows doesn't use the referrer url, so whether the params are passed for non-osx platforms or not doesn't matter. (linux doesn't support attributions at all) Allowed values is unrealistic for utm params.
Our concern is mostly due to us injecting whatever values are in these utm_ params into the DOM and the potential for an XSS attack vector on the Firefox download page. This is the most security sensitive page on the site, and this is already a very sensitive site as JS executing on the domain has privileged API access in Firefox. So we're just trying to be as overly cautious as possible with untrusted data. To that end, is there any format that these values will take? I assume they'll contain some addon identifier. Do said identifiers conform to some pattern? If we could even limit them to a set of allowed characters and a length we'd feel better about it. For the Windows attribution we go so far as to sanitize and include an HMAC signature of the data so that the stub attribution service can verify that the request came from www.m.o. And the attribution service in turn has a whitelist for utm_source to aid in privacy. It'd be nice if we could only accept signed params from AMO to add some trust to the system, but failing that (it'd be significantly more work for both of us) we need some way to validate the data before passing it into the DOM on the download page.
Flags: needinfo?(ddurst) → needinfo?(pmac)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.