Closed Bug 1867294 Opened 1 year ago Closed 1 year ago

Glean.js automatic click event instrumentation

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brosa, Assigned: aaggarwal)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

Automatic instrumentation for basic click events in Glean.js.

Assignee: nobody → aaggarwal
Type: defect → enhancement
Priority: -- → P1

Hi all.

I did some groundwork on the types of click events that can be captured automatically via Glean.js for "web". I propose to collect the following clicks:

  1. Clicks on HTMLAnchorElement (corresponds to <a> html tag)

    1. It will capture clicks on any hyperlink in a document, i.e. hyperlinks to web pages, files, email addresses, locations in the same page, or anything else a URL can address
    2. Additional context to be collected with this event (via extra_keys):
      1. url: A string indicating the target url (=> HTMLAnchorElement.href)
      2. id: A string indicating id of the element (=>HTMLAnchorElement.id)
      3. class: A string indicating class of the element (=>HTMLAnchorElement.className)
  2. Clicks on HTMLButtonElement (corresponds to <button> html tag)

    1. It will capture clicks on any button in a document e.g. button to submit a form etc.
    2. Additional context to be collected with this event (via extra_keys):
      1. id: A string indicating the id of the element (=>HTMLButtonElement.id)
      2. name: A string indicating the name of the element (=> HTMLButtonElement.name)
      3. type: A string indicating the behavior of the button element (=>HTMLButtonElement.type). Possible values are: "submit", "reset", "button", "menu"

I believe capturing these should cover our most important use cases for the first version.
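For concreteness, here is a minimal sketch of what automatic capture of these two element types with the extras above could look like, assuming a single delegated listener; `recordClickEvent` is a placeholder, not a Glean.js API:

```ts
// Sketch only: one delegated listener collects the proposed extras for
// anchor and button clicks. recordClickEvent stands in for whatever
// Glean.js event API would back this; it is not a real API.
function recordClickEvent(extras: Record<string, string>): void {
  console.debug("click event", extras);
}

document.addEventListener("click", (event) => {
  const target = (event.target as Element | null)?.closest("a, button");
  if (target instanceof HTMLAnchorElement) {
    recordClickEvent({ url: target.href, id: target.id, class: target.className });
  } else if (target instanceof HTMLButtonElement) {
    recordClickEvent({ id: target.id, name: target.name, type: target.type });
  }
});
```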

There are 2 ways of capturing these click events (similar to automatic page load events):

  1. Clients can specify, via a Configuration option during Glean.js initialization, that these events should be collected automatically
  2. Clients can manually call the APIs (with parameters corresponding to extra_keys mentioned above) to capture these types of click events

I would love to have feedback on this before submitting a data review request.

Status: NEW → ASSIGNED
Flags: needinfo?(tlong)
Flags: needinfo?(jrediger)
Flags: needinfo?(brosa)
Flags: needinfo?(alessio.placitelli)

(In reply to Abhishek from comment #2)

  1. Clicks on HTMLAnchorElement (corresponds to <a> html tag)
    1. It will capture clicks on any hyperlink in a document, i.e. hyperlinks to web pages, files, email addresses, locations in the same page, or anything else a URL can address

Depending on the context of the url, we should be careful not to collect sensitive information along with recording this. I would also prefer this to be more specifically named, so maybe this has a metric category of anchor_element and click as the metric name? I'm curious what others' thoughts are on this, and perhaps it's a good idea to talk to our web customers to know what they would expect. Data Science would be good to involve as a stakeholder on the naming/taxonomy of this.

2. Additional context to be collected with this event (via [extra_keys](https://mozilla.github.io/glean/book/reference/metrics/event.html#extra-metric-parameters)):
    1. `url`: A string indicating the target url (=> `HTMLAnchorElement.href`)

Putting my data-steward hat on: including url data would elevate this collection to be Category 3 or 4 and has the potential to collect private/sensitive information. We wouldn't want to collect the email address associated with a mailto: href, for instance, and query parameters embedded in the url could also be problematic. Is there a way to scope this so that it would be unlikely to collect sensitive information and keep it limited to user-interaction information only?

    2. `id`: A string indicating id of the element (=>`HTMLAnchorElement.id`)
    3. `class`: A string indicating class of the element (=>`HTMLAnchorElement.className`)

Referrer, or the current base url/path might also be useful to collect, but could also be problematic around inadvertently collecting sensitive information. I'm guessing that multiple pages could have an anchor element with the same id/class and this would help to identify the source of the element that was clicked.

  2. Clicks on HTMLButtonElement (corresponds to <button> html tag)
    1. It will capture clicks on any button in a document e.g. button to submit a form etc.
    2. Additional context to be collected with this event (via extra_keys):
      1. id: A string indicating the id of the element (=>HTMLButtonElement.id)
      2. name: A string indicating the name of the element (=> HTMLButtonElement.name)
      3. type: A string indicating the behavior of the button element (=>HTMLButtonElement.type). Possible values are: "submit", "reset", "button", "menu"

Again, collecting the source or page/base-url is probably a good idea to identify the button as I can imagine situations where id/name/type might not be enough to uniquely identify a button on a website.

There are 2 ways of capturing these click events (similar to automatic page load events):

  1. Clients can specify, via a Configuration option during Glean.js initialization, that these events should be collected automatically
  2. Clients can manually call the APIs (with parameters corresponding to extra_keys mentioned above) to capture these types of click events

Which APIs are you referring to here? Are you planning on exposing these generated metrics to Glean.js consumers or does this mean there will be some general API additions to support this?

Flags: needinfo?(tlong)

I talked with Travis about this a little bit today during our 1:1. I think trying to collect specific elements could be tricky. What if we configured the click listener to listen not for specific tags but for all clicks, and then looked for certain IDs/tags to decide what we want to record?

I was thinking of using the custom data attributes provided in HTML5. We could allow the user to give the element a specific CSS class, data tag, or something else to specify that we want to handle an event from that click. Then in our docs we could tell the user that we are looking for things like data-glean-id, data-glean-name, data-glean-type, etc.

This allows us to listen for more specific, user-configured data rather than trying to make a catch-all that could end up causing maintainability issues later. Different web frameworks may use different tags for certain built-in things. I would rather put the customization on the client than on us, I think. Everything you've already done would be the same; it would just change the element lookup to search for the custom tags and pull the data from there rather than trying to grab general info from basic tags.
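To make the dataset idea concrete, here is a minimal sketch (the data-glean-* names are the ones suggested above, not a settled API):

```ts
// Sketch: an element opts in with attributes such as
//   <button data-glean-id="download-cta" data-glean-name="download" data-glean-type="submit">
// and the listener walks up from the click target to the nearest opted-in element.
document.addEventListener("click", (event) => {
  const el = (event.target as Element | null)?.closest<HTMLElement>(
    "[data-glean-id], [data-glean-name], [data-glean-type]"
  );
  if (!el) return;

  // data-glean-id is exposed as dataset.gleanId, and so on.
  const { gleanId, gleanName, gleanType } = el.dataset;
  console.debug("would record click", { gleanId, gleanName, gleanType });
});
```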

I am open to other thoughts on this as well.

Flags: needinfo?(brosa)

I agree that the use of HTML dataset is a good place to start, as that's something a vanillaJS webdev like me would find handy. And according to what I've read online, it's not any harder for popular frameworks either.

Hooking into the DOM Event model means we should think a little deeply about how we want to behave. Is Glean's reporting a "default action"? (Should we respect preventDefault()?) Should we act on capture or bubble? (i.e., what interaction, if any, should we have with webapps that choose to stop propagation of the click?) Do we only want to respect user-initiated clicks, or synthetics as well? (See isTrusted.)
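To make those questions concrete, here is a sketch of one possible stance (capture phase plus isTrusted), purely as an illustration of the trade-offs rather than a decision:

```ts
// Sketch: listen in the capture phase so a bubble-phase stopPropagation()
// lower in the tree cannot hide the click from telemetry, and ignore
// synthetic clicks dispatched from script.
document.addEventListener(
  "click",
  (event) => {
    if (!event.isTrusted) return; // only user-initiated clicks
    // ...record here; this runs regardless of whether the page's own
    // handlers later call preventDefault().
  },
  { capture: true }
);
```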

Clearing the ni? as this is getting well discussed already :)

Flags: needinfo?(alessio.placitelli)

Same, I don't have much input beyond what's already been said.

Flags: needinfo?(jrediger)

Thank you all for your valuable inputs. This is super helpful.

I see great value in Bruno's comment. This would give clients more flexibility in deciding which elements they want to record clicks for, with low maintenance overhead on our side.

IIUC, this approach will mean the following:

  1. We will not differentiate clicks between different elements (i.e. <button>, <a> etc.) in the implementation. All that we will care about is recording only those elements that have data-glean-* data attributes (details pertaining to this are discussed below) set on them. This implies we can design just one click event (e.g. element_click) to record all types of clicks.

  2. If the above is correct then we will have to design the extra_keys for this event in such a way that they can capture the context around all types of clicks that clients would ever want to record.

To avoid designing an overly generalized system, I did a bit of investigation (document) into the click events that some (not all) of the existing Mozilla products using Glean for telemetry collection are currently recording. This investigation can guide our design decisions based on actual use cases and needs.
TL;DR of the doc:

  1. Anchor elements (<a> tag) and button elements (<button> tag) are the most popular elements for which clicks are being recorded
  2. Some products are recording the entire value of "href" attribute of <a> tags without stripping any sensitive information from them

Based on my learnings from this doc, I propose the following extra_keys for the click event:

  1. id: A unique identifier of the element (across the website) that was clicked
  2. type: Type of the element being clicked (could be useful to find out what was clicked e.g. "button", "anchor" etc. but could be used by clients as per their needs)
  3. label: Description of the element being clicked (could be useful for recording button's label but could be used by clients as per their needs)

Clients will have to set the data-glean-id data attribute on an element for Glean.js to automatically record click events on it and can set other data-attributes corresponding to the rest of the extra_keys (i.e. data-glean-type, data-glean-label) to capture the additional context around click events.

I didn't include url as an extra_key in this design because the id extra_key could be used by clients to map back to the url that was clicked, without us having to sanitize and capture urls. However, if I am missing some perspective on what additional value capturing a url can provide, then I would love to hear from you all.

Looking forward to inputs from you all on this.

I didn't include url as an extra_key in this design because the id extra_key could be used by clients to map back to the url that was clicked, without us having to sanitize and capture urls. However, if I am missing some perspective on what additional value capturing a url can provide, then I would love to hear from you all.

If I can add some perspective here, I think it might be difficult, or require at least a higher level of effort than it first seems, for clients to instrument unique IDs on elements that can then be used to identify each individual URL of a website. Take bedrock for example, which has tens of thousands of URLs (taking into account all of the languages / locales). Having the page URL captured in a click event would greatly simplify the effort it would take for reporting.

Hey :agibson. Thanks a lot for providing perspective here.

The current state of click events in bedrock (as per the document I shared) gave me the impression that there is no need for capturing urls as part of the event context. Maybe I missed something there? Or is it something intended to be done for bedrock in the future?

(In reply to Abhishek from comment #10)

Hey :agibson. Thanks a lot for providing perspective here.

The current state of click events in bedrock (as per the document I shared) gave me the impression that there is no need for capturing urls as part of the event context. Maybe I missed something there? Or is it something intended to be done for bedrock in the future?

Bedrock also sends additional, page level metrics such as path and referrer in its click events (see other metrics that are sent in events pings). This is useful information for answering questions like "How many downloads did landing page X generate?" or "How many downloads were a result of organic search referrals?".

I'm very interested in the work proposed here, as I'd be keen to switch to using these more standardized click events instead of our own implementation. But I'd say being able to know both the url and also the referrer of the session where the click occurred, is a pretty standard ask for most marketing/campaign reporting needs.

I think Bruno's idea of making this an explicit data-attribute that we can look for to automatically record events is outstanding. Like agibson mentioned, it will likely be necessary to record some url information such as the referrer and path. I just want to clarify that I want us to exercise caution around url collection, not to prohibit it entirely. I think we can address this in the same ways that bedrock and mdn do now and ensure we aren't collecting identifiable information and still include pertinent information in the proposed events.

I think it's probably OK to have an expectation for us (as clients) to exercise caution and responsibility when collecting URL information. Query parameters in page URLs (such as UTM parameters) are often useful bits of information for reporting, so if Glean stripped these out at the platform SDK level, I think clients would likely just end up implementing their own click events instead.

In bedrock, everywhere we're likely to capture a click event we would already be firing a page_load event. So we're already very careful not to include any sensitive information in our page load event URLs.

Bruno, Alex, and I had a meeting (meeting minutes) today to understand the requirements from the Bedrock side pertaining to this feature. Here is the TL;DR:

(In reply to Abhishek from comment #8)

  1. We will not differentiate clicks b/w different elements (i.e. <button>, <a> etc.) in the implementation. All that we will care about is recording only those elements that have data-glean-* data attributes (details pertaining to this are discussed below) set on them.

This approach resonated with Alex. In fact, Bedrock already uses a data-attribute pattern for GA4 and is therefore used to it. They do not intend to collect clicks on, say, all <a> or <button> elements anyway. Therefore, setting data attributes on the elements for which click events should be recorded aligns well with Bedrock.

  1. type: Type of the element being clicked (could be useful to find out what was clicked e.g. "button", "anchor" etc. but could be used by clients as per their needs)
  2. label: Description of the element being clicked (could be useful for recording button's label but could be used by clients as per their needs)

These 2 keys will be important to capture for click events, and it will be useful to allow clients to use them as per their needs (e.g. not restricting the type value to just Element.tagName). For example, the "Download Firefox" button's label text will be different in every language; setting the same value for the data-glean-label attribute on the corresponding element in each language and locale will make the analysis easy at their end.

(In reply to Alex Gibson [:agibson] from comment #13)

I think it's probably OK to have an expectation for us (as clients) to exercise caution and responsibility when collecting URL information. Query parameters in page URLs (such as UTM parameters) are often useful bits of information for reporting, so if Glean stripped these out at the platform SDK level, I think clients would likely just end up implementing their own click events instead.

The referrer of the page on which the element is clicked and, if the clicked element is a link, the link's UTM parameters and path are of special importance for their marketing team. What's most important is the URL (at least the UTM parameters and path) and the referrer of the page where the link was clicked; the link's href is useful, but of less importance when type and label are available as event fields.
Context on why this is important: the marketing team wants to understand not only the total number of clicks on, e.g., a "download" button on a page, but also how users landed on the page containing that button (i.e. to count organic vs. non-organic searches).

:agibson :brosa Please feel free to add if I missed something here.

The referrer of the page on which the element is clicked and, if the clicked element is a link then the link's UTM parameters and path are of special importance for their marketing team.

Small correction here - what's most important is the URL of the page (as well as referrer) where the link was clicked. The link's href is useful, but of less importance when we have things like type and label as event fields.

Edit: When it comes to filtering URL information, my preference here would be to let clients do that themselves (in a responsible way), rather than making query parameters they do want to record difficult to get at. If Glean did filter or strip some URL information from these events, then I think providing sites with a way to include additional parameters that they do want to capture will be really important.

Hey Alex. Thanks a lot for correcting it. I updated my previous comment to reflect that.

In light of recent comments, here is the updated proposal:

  1. We will record clicks on all those elements that have at least one of the data-glean-* data attributes (described below) set on them. We will not differentiate clicks between different elements (i.e. <button>, <a> etc.) in the implementation. This means we will design just one click event to record element clicks. It will offer clients more flexibility in choosing the elements they want to record clicks for.

  2. The extra_keys for this event will be:

    1. id: Id of the element that was clicked (e.g. an element's id but could be used by clients as per their needs)
    2. type: Type of the element that was clicked (e.g. to capture what was clicked e.g. "button", "anchor" etc. but could be used by clients as per their needs)
    3. label: Description of the element that was clicked (e.g. to capture the clicked element's label but could be used by clients as per their needs)
    4. referrer: Referrer of the page where element was clicked
    5. url: Full URL of the page where element was clicked

Clients will have to set at least one of the data attributes data-glean-id, data-glean-type, data-glean-label on an element for Glean.js to automatically record clicks on it. The value of these data attributes will be sent in the corresponding extra_keys of the click event.
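To make the proposal concrete, here is a sketch under assumed names (the elementClick metric below is a stand-in, not the generated API):

```ts
// Sketch: element-specific extras come from the data-glean-* attributes,
// page-level extras from the document itself. elementClick.record stands in
// for whatever generated event metric would back this.
const elementClick = {
  record(extras: Record<string, string>): void {
    console.debug("element_click", extras);
  },
};

document.addEventListener("click", (event) => {
  const el = (event.target as Element | null)?.closest<HTMLElement>(
    "[data-glean-id], [data-glean-type], [data-glean-label]"
  );
  if (!el) return;

  elementClick.record({
    id: el.dataset.gleanId ?? "",
    type: el.dataset.gleanType ?? "",
    label: el.dataset.gleanLabel ?? "",
    referrer: document.referrer,
    url: window.location.href,
  });
});
```

A client would then tag an element like `<button data-glean-id="download-firefox" data-glean-label="Download Firefox">` and get the click recorded without any further instrumentation code.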

I am assuming that we are fine with collecting the referrer and url of the page from the data-review perspective, as we are collecting these two for page load events as well.

I would love to have a final feedback on this.

Flags: needinfo?(tlong)
Flags: needinfo?(chutten)
Flags: needinfo?(brosa)

That makes sense to me. I would recommend more specific names for the referrer and url keys to reflect that they are specific to the page (i.e. page_referrer and page_url).

I second Wil's comment about specificity. And if label is actually a description, maybe it should be called description. (Naming is hard. Also, we should take some time and be especially careful with these names, since we're deciding these on behalf of others who will use them and they will be used across projects for perhaps quite a long time.)

As for the data review perspective, I can think of a couple of models:

  1. Explicit: we document that adding data-glean-* requires data review. I don't think we want to go this way because the volume of reviews will be high, the friction for adding basic collection will be high, the tooling doesn't and likely can't help out with this (no glean_parser data-review for reading HTML or JSX or whatever)... but it does ensure that the collections are considered before they land.
  2. Implicit: the Glean Team attains data review on behalf of our consumers for the concept of a click event. We probably prefer this, but run the risk that something sensitive slips into the query params or rides along in the referrer.

Which brings me to stewardship thoughts on URL collections in general:

  1. Authored URLs: URLs in our website that are authored by us (the document's location, the href of an anchor) are usually safe to collect and can be thought of as similar to knowing the name of the page of Preferences we're on in Firefox Desktop. A difference is that the page name in the Preferences doesn't contain state, and even our authored URLs contain state: (in order from least likely to most) in the path, the fragment, and the query params. We can try to ensure that owners forbid sensitive state from riding along through forbidding it in policy or requiring spot data reviews for their inclusion or something else. Another difference is that an authored URL may point outside of the web property: the argument may need to be made that these are safe to collect as well (since we authored them) despite these being treated as Cat3 in e.g. Firefox Desktop (i.e. sponsored topsites tiles).
  2. External URLs: URLs given to us from the outside (looking at you, referrer) are outside of our control, are Cat3 Stored Communication at the least, and we can't assume they're constructed carefully enough to keep sensitive data out of any part of them. We may not be able to include the entire referer by default, and may need to either sanitize it (etld+1? +path?) or require that consumers seek data collection review (and then signal to the SDK that review's been granted). This is something that definitely doesn't exist within Stewardship's jurisdiction to decide.

The correct mechanism for having these discussions and getting decisions made about risk and mitigation is Sensitive Data Collection Review which can begin before data collections are implemented.

Flags: needinfo?(chutten)

It makes sense to me too, and I like the more descriptive names suggested by Wil.

I also agree with chutten that we should go ahead and start an escalated data-review for this in order to get Trust & Legal, Security, and Privacy's perspectives, to ensure we move forward in a way that is both compatible with Mozilla Policy and easier on Glean.js consumers.

Flags: needinfo?(tlong)

We may not be able to include the entire referer by default, and may need to either sanitize it (etld+1? +path?)

It would probably be helpful to understand how referer is used by our current stakeholders. This might be sufficient, but I can also think of reasons we would need to include additional query parameters. For example, query parameters from search-result-page referers are a big component in marketing optimization. Making sure we have the necessary escape hatches while providing a safe out-of-the-box experience should be considered as we move this forward.

It would probably be helpful to understand how referer is used by our current stakeholders. This might be sufficient, but I can also think of reasons we would need to include additional query parameters. For example, query parameters from search-result-page referers are a big component in marketing optimization. Making sure we have the necessary escape hatches while providing a safe out-of-the-box experience should be considered as we move this forward.

As far as I'm aware, the referrer policy adopted by most modern browsers today means that strict-origin-when-cross-origin is the default (e.g. for both Chrome and Firefox). So this means when someone clicks from one origin to another (e.g. www.google.com -> www.mozilla.org), only the origin will be reported (minus things like path and query parameters), unless a website maintainer specifies a different policy. Only a same origin navigation will report the full referrer URL by default.
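For reference, a small illustration of that default policy (standard browser behavior, nothing Glean-specific):

```ts
// What document.referrer contains on the destination page under the default
// strict-origin-when-cross-origin policy:
//
//   https://www.google.com/search?q=firefox -> https://www.mozilla.org/en-US/
//     document.referrer === "https://www.google.com/"          (origin only)
//   https://www.mozilla.org/en-US/ -> https://www.mozilla.org/en-US/firefox/
//     document.referrer === "https://www.mozilla.org/en-US/"   (full URL, same origin)
//   https://secure.example/ -> http://insecure.example/
//     document.referrer === ""                                 (dropped on HTTPS->HTTP downgrade)
```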

As far as I'm aware, the referrer policy adopted by most modern browsers today means that strict-origin-when-cross-origin

Oh right, I had forgotten about that. I would assume that makes the data review of the page referrer easier as well.

Thank you everyone for your valuable inputs on this.

Considering all the discussion above, I propose to break down the implementation for automatic element clicks into two phases to move forward:

(In reply to Abhishek from comment #17)

In light of recent comments, here is the updated proposal:

  1. We will record clicks on all those elements that have at least one of the data-glean-* data attributes (described below) set on them. We will not differentiate clicks between different elements (i.e. <button>, <a> etc.) in the implementation. This means we will design just one click event to record element clicks. It will offer clients more flexibility in choosing the elements they want to record clicks for.

  2. The extra_keys for this event will be:

    1. id: Id of the element that was clicked (e.g. an element's id but could be used by clients as per their needs)
    2. type: Type of the element that was clicked (e.g. to capture what was clicked e.g. "button", "anchor" etc. but could be used by clients as per their needs)
    3. label: Description of the element that was clicked (e.g. to capture the clicked element's label but could be used by clients as per their needs)

Clients will have to set at least one of the data attributes data-glean-id, data-glean-type, data-glean-label on an element for Glean.js to automatically record clicks on it. The value of these data attributes will be sent in the corresponding extra_keys mentioned above.

The first version will implement this, i.e. capture only those extra_keys that are specific to the element being clicked. This will be faster to land since the corresponding data review will be faster, as this data is either cat1 or cat2. Considering Wil's suggestion, I added an element_ prefix to be explicit in the naming (=> element_id, element_type, element_label).

4. `page_referrer`: Referrer of the page where element was clicked
5. `page_url`: Full URL of the page where element was clicked

Once the first version lands, we will work on finding out what parts of a page's referrer and url (or something else) will be important to collect for individual products and why. We already have some idea for Bedrock (thanks to Alex) and would like to understand the same for other products as well. This will require some discussions. Once we figure that out, we will be in a better position to file the sensitive data collection request (if needed) and update the first version of automatic click events accordingly.
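For the first phase, a client-side opt-in could look roughly like the sketch below (the configuration flag name follows the enableAutoElementClickEvents option referenced later in this bug; the exact shape is illustrative, not final):

```ts
import Glean from "@mozilla/glean/web";

// Sketch: opt in to automatic element click events at initialization and tag
// the elements of interest in the markup.
Glean.initialize("my.app.id", /* uploadEnabled */ true, {
  enableAutoElementClickEvents: true,
});

// Example markup for a tracked element:
//   <a href="/download" data-glean-id="download-cta"
//      data-glean-type="anchor" data-glean-label="Download Firefox">...</a>
```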

Implements first phase as per the comment

Attachment #9367276 - Attachment is obsolete: true
Comment on attachment 9367276 [details] [diff] [review] [mozilla/glean.js] Bug 1867294: Support for automatically collecting click events (#1848) Updated the patch to implement the first phase as per https://bugzilla.mozilla.org/show_bug.cgi?id=1867294#c24
Attachment #9367276 - Attachment is obsolete: false
Attachment #9367276 - Attachment is patch: true
Attachment #9367276 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9369639 - Attachment is obsolete: true
Attached file request.md

Data review request for the first phase implementation https://github.com/mozilla/glean.js/pull/1848

Attachment #9369649 - Flags: data-review?(tlong)

Thanks for flagging me for data-review, Abhishek! I see that this review only covers the non-sensitive context items in the event extras and is listed as Category 2. This is fine; I just wanted to confirm that your plan is still to add the referrer and url to the extras in the next iteration on this, and get the sensitive data review for that data at that time.

Just another small suggestion: since the extras are the part here that could contain sensitive information, I would really like to see this information in the data-review request, most appropriately in the description of the event in your response to question 5 of the request form.

So as to not have to obsolete your current attachment and add a new one, I'll just amend that here to call out the contextual information that is being recorded in the event extras to make it clear that this data review covers just the actual technical context/extras and the addition of any additional extras would require a fresh data-review.

Here is the contextual information being recorded in the click event extras (coming from the PR metrics.yaml changes associated with this bug; please correct this if it changes):

element_id:

Description: A string identifying the element clicked, which comes from the specific page element's data-glean-id data attribute value.

element_type:

Description: A string indicating the type of the element clicked. For automatic collection, its value is the element's data-glean-type data attribute value.

element_label:

Description: A string label associated with the element clicked. For automatic collection, its value is the element's data-glean-label data attribute value.

Comment on attachment 9369649 [details]
request.md

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the metrics.yaml file and the Glean Dictionary.

  2. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, through the data preferences associated with the application which is integrating Glean.

  3. If the request is for permanent data collection, is there someone who will monitor the data over time?

Permanent collection, to be monitored by Abhishek and Bruno, backed up by glean-team@mozilla.com.

  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction data

  5. Is the data collection request for default-on or default-off?

Default-on

  6. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No new personal identifiers, only technical ones associated with specific page components

  7. Is the data collection covered by the existing Firefox privacy notice?

Yes

  8. Does the data collection use a third-party collection tool?

No

Result

data-review+

Attachment #9369649 - Flags: data-review?(tlong) → data-review+

Hi team! As someone who's worked with Snowplow automatic link click tracking at Pocket, I wanted to drop a few thoughts in this conversation:
- I want to express my interest in the future version of this event that includes the page URL, page referrer, and target URL of the link click.
- I also want to express interest in automatically capturing link clicks for all relevant HTML elements, and not just those tagged with Glean-specific attributes.

I realize this has some implications in terms of data sensitivity and cleanliness (respectively).

I also realize that for the current/v1 event (without the page/target URLs), we sort of need a developer-defined identifier to add specificity to the events (and the Glean-specific HTML attributes help address this need).

...but in order for the events to truly be automatic and useful in examining user behavior on web properties, we should really be capturing granular event data without developer intervention (at least imo, and I acknowledge that I skew towards the "collect it" side of the privacy spectrum).

Hey Travis. Thanks a lot for your review.

(In reply to Travis Long [:travis_] from comment #28)

Thanks for flagging me for data-review Abhishek! I see that this review only covers the non-sensitive context items in the event extras and is listed as Category 2. This is fine, I just wanted to confirm that your plan is still to add the referrer and url to the extras in the next iteration on this, and get the sensitive data review for that data at that time.

You are right. Once the first version lands, we will work on finding out what parts of a page's referrer and url (or something else) will be important to collect for individual products and why. Once we figure that out, we will be in a better position to file the sensitive data collection request (if needed) and update the first version of automatic click events accordingly.

Just another small suggestion, since the extras are the part here that could contain sensitive information, I would really like to see this information in the data-review request, most appropriately in the description of the event in your response to question 5 of the request form.

So as to not have to obsolete your current attachment and add a new one, I'll just amend that here to call out the contextual information that is being recorded in the event extras to make it clear that this data review covers just the actual technical context/extras and the addition of any additional extras would require a fresh data-review.

Thanks for doing it. Would it be better if I add a new attachment (making the current one obsolete)? It is very minimal effort on my side, and this way no one will have to go through all the comments to find the important piece of information missing from the data review request.

(In reply to Travis Long [:travis_] from comment #29)

Comment on attachment 9369649 [details]
request.md

Data Review

  5. Is the data collection request for default-on or default-off?

Default-on

I believe it should be default-off, as the automatic element click event capturing feature has to be turned on by clients by setting the Configuration.enableAutoElementClickEvents option to true. If they don't set it to true explicitly, it is off.

Attached file request_updated.md

:travis_ This is the new data review attachment. It is the same as the previous one, plus the contextual information being collected around element click events (also mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1867294#c28), which was missing in the previous data review attachment.

The PR for the first version is in review.

Pertaining to the second version of element click events, I am documenting the requirements from the Bedrock side (as per my conversation with :agibson on Slack) here:

For page_referrer: Need only the origin for cross-domain requests. For same-site requests, need at least the path as well (to understand how users navigate through pages). All of this is what browsers' default referrer policies already limit Bedrock to.

For page_url: Need the origin, path, and likely also the ability to capture specific query parameters. Capturing these query parameters is useful because they often run multi-variant experiments on their pages. So, a visitor might land on, say, https://www.mozilla.org/en-US/ and then get redirected to an experiment cohort, which might be something like https://www.mozilla.org/en-US/?v=2. When looking at the experiment results, they might want to see how many click events variations 1, 2, or 3 generated over a given time period. They often use this when rolling out, say, a new feature or a redesign of a product landing page, to make sure it performs well before everyone sees it.

:agibson Please feel free to correct me if I am wrong anywhere :)
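One way (purely illustrative, not a committed design) a client or the SDK could keep experiment and campaign parameters while dropping everything else is a simple allowlist over the query string:

```ts
// Sketch: keep only allowlisted query parameters (e.g. experiment cohort and
// UTM campaign params) when building a page_url value. The allowlist below is
// a made-up example, not something Bedrock or Glean has agreed on.
function sanitizedPageUrl(
  raw: string,
  allowed: string[] = ["v", "utm_source", "utm_medium", "utm_campaign"]
): string {
  const url = new URL(raw);
  const kept = new URLSearchParams();
  for (const key of allowed) {
    const value = url.searchParams.get(key);
    if (value !== null) kept.set(key, value);
  }
  const query = kept.toString();
  return url.origin + url.pathname + (query ? `?${query}` : "");
}

// sanitizedPageUrl("https://www.mozilla.org/en-US/?v=2&session=abc123")
//   -> "https://www.mozilla.org/en-US/?v=2"
```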

I believe it should be default-off, as the automatic element click event capturing feature has to be turned on by clients by setting the Configuration.enableAutoElementClickEvents option to true. If they don't set it to true explicitly, it is off.

The "default-on" is from the user's perspective, not the application consuming Glean. The users only have a choice at the application level and not at the configuration level of Glean. I approached this review from the understanding that consumers of Glean will make use of this, and when they do, the collection will be "default on" for the end-users and can be disabled by the app's data collection preferences.

That's nothing to worry about: since this is just interaction and technical data, being "default on" is acceptable according to our data collection policies, as the risk of recording PII is low.

Flags: needinfo?(brosa)

v1 of this was merged in Glean.js.

Bruno, would you kindly file a bug for v2 of this (and make it depend on this one)?

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(brosa)
Resolution: --- → FIXED
Blocks: 1876433
Flags: needinfo?(brosa)