Closed Bug 1259612 Opened 4 years ago Closed 4 years ago

Finalize structure for stub attribution `attribution_code`

Categories

(www.mozilla.org :: General, defect)

Production
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ckprice, Unassigned)

References

(Blocks 1 open bug)

Details

Per the project plan detailed in the program Wiki[0], the initial tasks on the mozilla.org side here are:

- When users visit the Firefox download page, the website will construct an attribution code.
- This code will be passed to bouncer (tbd) via a URL param.

cmore is the primary contact between marketing and product. cmore to create dependency bugs, and attach to this tracker.

[0] https://wiki.mozilla.org/Firefox/Stub_Attribution
I created two docs to help move this forward.

I wrote some pseudocode, with the help of Gareth, on how the cohortID hash would need to be calculated:

https://docs.google.com/a/mozilla.com/document/d/1DzIg19kAdtYEzS_waQNCQBfi8CSGj4cl9N8T24WSiyc/edit?usp=sharing

Real www.mozilla.org Firefox download source, medium, campaign, and content data, with totals and the hashes that could be constructed:

https://docs.google.com/a/mozilla.com/spreadsheets/d/1U-0JHpc3INJnBwTFkdrPqpk7yqvpEn16DJtp6d6TRiY/edit?usp=sharing
A simple string tag is good. Since whatever we use sits outside the signed parts of the binary, we can't trust it, so we need to make sure it can't become "interesting" if viewed or embedded in the wrong context. Define a format, or at least a character set, that doesn't contain any HTML or URL special characters. For example: "Hex digits", or "UUID format", or "alphanum+space+hyphen".

Whatever you agree to must be enforced by the client when it reads the tag out of the signed binary. If it's invalid it MUST NOT be returned to the server. Might want to invent a signal for an invalid or damaged tag rather than send nothing.

I assume the tag will be stored on disk somewhere. Every time the client is going to send this to the server it should re-validate the format first, and again send the "invalid tag" if necessary.

It'd be best if the server receiving the tags _also_ validated the format. Anyone can poke data at us, and if that data ends up on a web page or report somewhere unfiltered it might end up being used in a blind attack (or not-so-blind if our source is public on GitHub). Might want a second "invalid-tag" for this case because it's a different kind of bad data and you might want to ignore or account for it differently.
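A minimal sketch of the validation described above, assuming the "alphanum+space+hyphen" character set is the one chosen. The length cap, the two sentinel strings, and all function names are hypothetical and only illustrate the idea of distinct client-side and server-side "invalid tag" signals.

```python
import re

# Assumption: the agreed character set is "alphanum+space+hyphen".
TAG_PATTERN = re.compile(r"[A-Za-z0-9 -]+")
MAX_TAG_LENGTH = 1024  # hypothetical cap, not specified in this bug

# Hypothetical sentinels; both stay inside the allowed character set.
CLIENT_INVALID = "invalid-client"
SERVER_INVALID = "invalid-server"


def is_valid_tag(tag: str) -> bool:
    """True only for non-empty tags made entirely of allowed characters."""
    return len(tag) <= MAX_TAG_LENGTH and TAG_PATTERN.fullmatch(tag) is not None


def tag_for_upload(stored_tag: str) -> str:
    """Client side: re-validate the stored tag every time before sending it."""
    return stored_tag if is_valid_tag(stored_tag) else CLIENT_INVALID


def tag_for_report(received_tag: str) -> str:
    """Server side: never let an unvalidated tag reach a page or report."""
    return received_tag if is_valid_tag(received_tag) else SERVER_INVALID
```

With these definitions, `tag_for_upload("<script>")` yields the client sentinel, while a malformed tag that reaches the server anyway is recorded under the separate server sentinel.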
(In reply to Chris More [:cmore] from comment #1)
> Real www.mozilla.org Firefox download source, medium, campaign content data,
> totals and hashes that could be constructed:
> 
> https://docs.google.com/a/mozilla.com/spreadsheets/d/1U-0JHpc3INJnBwTFkdrPqpk7yqvpEn16DJtp6d6TRiY/edit?usp=sharing

NI bsmedberg (and anyone else) - are there any issues with this approach?
Flags: needinfo?(benjamin)
I don't understand why we're using hashes. If we have a hash we'd need to have a lookup table, and that's going to greatly complicate the processing pipeline. Can we include the actual data in the code so that when it comes out the other end (in telemetry pings and such) we can separate it out immediately and it's easy to search?

Does the character ">" ever appear in these strings, or is it safe to use that as a delimiter?
Flags: needinfo?(benjamin)
The hashes are just a carry-over from when we initially thought we might have to transmit them via a string that was part of the installer's filename, i.e. super-rad-browser-123SMED456.exe.

Since we have 1k to work with inside the cert, we can probably just use plain text with a delimiter. Whatever delimiter we use, we can sanitize the strings before sending them over from www.mozilla.org. We could even stick with URL delimiters like "&".

I would prefer not to hash the string, so that it is easy to split and search later. The hashing was just an artifact of initially thinking we needed to keep something super short. It doesn't look like that is a constraint anymore.
Here's an example of source, medium, campaign, content and how it is normally handled on our websites:

http://cl.ly/1j1A1p3e2C2a

The attribution code could simply be the string column, which is delimited with "&", and that would align well with our websites.

The only difference is that our websites use utm_source, utm_campaign, etc., where utm is a Google Analytics-specific attribute. We're just making it more generic with "source", "medium", etc.
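As a rough illustration of the mapping described above, the sketch below reads the utm_* parameters from a landing URL and emits the generic "&"-delimited code. The function name is hypothetical, the "(not set)" placeholder is borrowed from the worked example later in this bug, and the values are not sanitized here, so this is not a complete implementation.

```python
from urllib.parse import parse_qs, urlsplit

FIELDS = ("source", "medium", "campaign", "content")
NOT_SET = "(not set)"  # placeholder used in the example later in this bug


def attribution_from_url(landing_url: str) -> str:
    """Build a generic attribution code from utm_* query parameters.

    Hypothetical helper: values are copied as-is, so a real version would
    still need to sanitize any "&" or "=" inside them.
    """
    query = parse_qs(urlsplit(landing_url).query)
    parts = []
    for field in FIELDS:
        values = query.get("utm_" + field)
        parts.append(f"{field}={values[0] if values else NOT_SET}")
    return "&".join(parts)
```

For a landing URL carrying `utm_source=newsletter&utm_medium=email`, this would produce a code starting with `source=newsletter&medium=email`, with the missing fields filled by the placeholder.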
Since this code will itself be part of a query param, perhaps we shouldn't use reserved characters[0] as delimiters, to avoid having to play around with escaping stuff later.

    source>foo-source|medium>foo-medium|campaign>foo-campaign|content>foo-content

Thoughts?

[0] https://tools.ietf.org/html/rfc3986#section-2.2
Flags: needinfo?(chrismore.bugzilla)
Flags: needinfo?(benjamin)
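To make the proposed `source>foo-source|medium>foo-medium|...` format concrete, here is a small sketch of an encoder and decoder for it. The function names are hypothetical, and rejecting values that contain a delimiter is an assumption about how sanitizing might work, not something decided in this bug.

```python
PAIR_SEP = "|"  # separates key/value pairs in the proposed format
KV_SEP = ">"    # separates a key from its value


def encode_code(fields: dict[str, str]) -> str:
    """Join fields as key>value|key>value, refusing embedded delimiters."""
    for key, value in fields.items():
        if any(sep in key or sep in value for sep in (PAIR_SEP, KV_SEP)):
            raise ValueError(f"delimiter found in field {key!r}")
    return PAIR_SEP.join(f"{key}{KV_SEP}{value}" for key, value in fields.items())


def decode_code(code: str) -> dict[str, str]:
    """Split a key>value|key>value string back into a dict."""
    fields = {}
    for pair in code.split(PAIR_SEP):
        key, sep, value = pair.partition(KV_SEP)
        if not sep:
            raise ValueError(f"missing {KV_SEP!r} in {pair!r}")
        fields[key] = value
    return fields
```

One caveat: ">" and "|" are not in the RFC 3986 reserved set, but they are not in the unreserved set either, so a strict URL encoder would still percent-encode them when the code is placed in a query string.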
(In reply to Cory Price [:ckprice] from comment #7)
> Since this code will itself be part of a query param, perhaps we shouldn't
> use reserved characters[0] code as delimiters to avoid having to play around
> with escaping stuff later.
> 
>     source>foo-source|medium>foo-medium|campaign>foo-campaign|content>foo-content
> 
> Thoughts?
> 
> [0] https://tools.ietf.org/html/rfc3986#section-2.2

For this v1.0 of attribution, we won't be passing this as a query parameter on first run. That's not needed to understand retention, and I don't want to slow down to figure that piece out. It was only an interesting idea to get an earlier read on download > first run conversion rates, but the goal of attribution is retention of the acquisition cohort, and having a parameter on /firstrun/ doesn't directly answer that question.

If and when we decide to add a parameter to /firstrun/, we could just mimic what urlencode() does and automatically escape reserved characters.
Flags: needinfo?(chrismore.bugzilla)
(In reply to Cory Price [:ckprice] from comment #7)
> Since this code will itself be part of a query param, perhaps we shouldn't
> use reserved characters[0] code as delimiters to avoid having to play around
> with escaping stuff later.
> 
>     source>foo-source|medium>foo-medium|campaign>foo-campaign|content>foo-content
> 
> Thoughts?
> 
> [0] https://tools.ietf.org/html/rfc3986#section-2.2

Actually, I see what you mean now. I think you are talking about transmitting the code from www.mozilla.org to this to-be-named service that will consume the code and put it in the certificate. Are you suggesting that, on the relay between moz.org and the service, we delimit the code with something other than &?

It would be possible to urlencode on mozilla.org and urldecode on the to-be-named service before it is ingested into the certificate.
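A sketch of that round trip, assuming the "&"-delimited key=value form of the code. The function names are hypothetical; `quote` and `unquote` stand in for whatever urlencode/urldecode routines the two sides actually use.

```python
from urllib.parse import quote, unquote


def encode_for_relay(code: str) -> str:
    """mozilla.org side: escape everything, including '&' and '=',
    so the whole code travels as a single query parameter value."""
    return quote(code, safe="")


def decode_on_service(encoded: str) -> dict[str, str]:
    """Service side: undo the escaping once, then split into fields."""
    code = unquote(encoded)
    fields = {}
    for pair in code.split("&"):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields
```

The fields are split by hand after a single `unquote` so that the values are not decoded a second time on the service side.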
Example code passing:

1) Anonymous user from google.com search lands on www.mozilla.org/firefox/new/ to download firefox.

For this user, these are the parameters that could be derived from where the user came from:

source=google.com
medium=organic
campaign=(not set)
content=(not set)

2) On click of the download button, the user would make a request to:

https://stubservice/?os=win&lang=en-US&product=Firefox-stub-46.0&code=[stub attribution code]

For this user, this would be the value of the attribution code:

source=google.com&medium=organic&campaign=(not set)&content=(not set)

But, we need to url encode the string so that it can be passed as a single url parameter such as:

source%3Dgoogle.com%26medium%3Dorganic%26campaign%3D(not%20set)%26content%3D(not%20set)

Thus, the stub service url would look like:

https://stubservice/?os=win&lang=en-US&product=Firefox-stub-46.0&code=source%3Dgoogle.com%26medium%3Dorganic%26campaign%3D(not%20set)%26content%3D(not%20set)

:ckprice and I talked about whether &code= should be more specific, since the value being passed is the attribution code. We suggest this parameter change:

https://stubservice/?os=win&lang=en-US&product=Firefox-stub-46.0&attribution_code=source%3Dgoogle.com%26medium%3Dorganic%26campaign%3D(not%20set)%26content%3D(not%20set)

That is, change &code= to &attribution_code= so it is more specific.

Does this help?
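The steps above can be sketched as follows. This assumes the placeholder host `stubservice` from the example and a hypothetical helper name; parentheses are left unescaped only so the output matches the URL shown above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

STUB_SERVICE = "https://stubservice/"  # placeholder host from the example above


def build_stub_url(attribution: dict[str, str]) -> str:
    """Assemble the download URL with the code as one escaped parameter."""
    code = "&".join(f"{key}={value}" for key, value in attribution.items())
    params = {
        "os": "win",
        "lang": "en-US",
        "product": "Firefox-stub-46.0",
        "attribution_code": code,
    }
    # quote_via=quote encodes spaces as %20; safe="()" keeps parentheses as-is.
    return STUB_SERVICE + "?" + urlencode(params, safe="()", quote_via=quote)


url = build_stub_url({
    "source": "google.com",
    "medium": "organic",
    "campaign": "(not set)",
    "content": "(not set)",
})

# The receiving service can recover the original code with a standard parser.
recovered = parse_qs(urlsplit(url).query)["attribution_code"][0]
```

Here `recovered` is the unescaped `source=google.com&medium=organic&campaign=(not set)&content=(not set)` string, since the query parser splits on the outer "&" before decoding each value.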
Clearing bsmedberg's NI. In a meeting today, we discussed keeping the structure in a single string. The delimiters we use would be up to the moz.org team, which cmore has noted above and I've attached.
Flags: needinfo?(benjamin)
Changing this bug to only include finalizing this code and resolving.
Status: NEW → RESOLVED
Closed: 4 years ago
Component: Project Tracking → General
Resolution: --- → FIXED
Summary: [tracking] Construct and transmit attribution code on mozilla.org → Finalize structure for stub attribution `attribution_code`