Closed Bug 1479589 Opened 6 years ago Closed 6 years ago

Add custom telemetry ping which would be sent from blocking autoplay shield-study extension

Categories

(Core :: Audio/Video: Playback, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: alwu, Assigned: alwu)

References

()

Details

Attachments

(2 files, 1 obsolete file)

Attached file Data review request (obsolete) —
Hi, François,
Could you help me to do data review for my telemetry ping?
We would plan to use a shield-study app to collect those data [1].
Thank you!

[1] https://github.com/alastor0325/autoplay-shield-study/blob/develop/docs/TELEMETRY.md#shield-study-pings-common-to-all-shield-studies
Attachment #8996161 - Flags: review?(francois)
Rank: 15
Priority: -- → P2
Comment on attachment 8996161 [details]
Data review request

> 8) If this data collection is default on, what is the opt-out mechanism for users?
> 
> Since we use shield-study extension as the study way, user might uninstall it if
> they don't want to in the testing group.

Is this an opt-out Shield study or an opt-in Shield study?

> We would use salted-hash to encrypt all the top domain user visited, and won't
> ollect any real unencrypt data for user, it still highly keep user privacy.

Could you please include more details about the hashing scheme for the domains?

I assume the hash is something like:

    sha256(toplevel_domain + salt)

but then how is the salt value generated? Does Mozilla have a copy of the salt? Is it the same for every user?
Attachment #8996161 - Flags: review?(francois)
(In reply to François Marier [:francois] from comment #3)
> Comment on attachment 8996161 [details]
> Data review request
> Is this an opt-out Shield study or an opt-in Shield study?

It's a opt-out shield study.

> Could you please include more details about the hashing scheme for the
> domains?
> 
> I assume the hash is something like:
> 
>     sha256(toplevel_domain + salt)
> 
> but then how is the salt value generated? Does Mozilla have a copy of the
> salt? Is it the same for every user?

I generated the random UUID salt for each user [1] and use [2] to convert the top-level url to the hash code.

The salt is a random string which we don't keep in our data, and it's unique for each user.

[1] https://github.com/alastor0325/autoplay-shield-study/blob/develop/src/feature.js#L3
[2] https://github.com/alastor0325/autoplay-shield-study/blob/develop/src/feature.js#L18

Thank you!
Flags: needinfo?(francois)
(In reply to Alastor Wu [:alwu] from comment #4)
> The salt is a random string which we don't keep in our data, and it's unique
> for each user.
> 
> [1]
> https://github.com/alastor0325/autoplay-shield-study/blob/develop/src/
> feature.js#L3

So we have no way of reversing the hash and knowing what sites people have visited, we just want to be able to tell how many unique sites a user is visiting, right?

> [2]
> https://github.com/alastor0325/autoplay-shield-study/blob/develop/src/
> feature.js#L18

This is the "hashing" code:

  let hash = 0, i, chr;
  if (baseDomain.length === 0) {
    return hash;
  }

  // use salted-hash
  baseDomain += gId;
  for (i = 0; i < baseDomain.length; i++) {
    chr   = baseDomain.charCodeAt(i);
    hash  = ((hash << 5) - hash) + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;

Any reason why you're not using a standard cryptographically-secure hash function (e.g. sha256) here? It's hard to tell whether or not this hand-rolled hashing scheme is secure since I've never seen it before.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #5)
> So we have no way of reversing the hash and knowing what sites people have
> visited, we just want to be able to tell how many unique sites a user is
> visiting, right?

Yes.

> Any reason why you're not using a standard cryptographically-secure hash
> function (e.g. sha256) here? It's hard to tell whether or not this
> hand-rolled hashing scheme is secure since I've never seen it before.

I've changed my hashing function to sha256 [1] which is implemented from MDN [2].

[1] https://github.com/alastor0325/autoplay-shield-study/blob/develop/src/feature.js#L11
[2] https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest

Thank you!
Flags: needinfo?(francois)
Thanks for the updated code.

Can you please update the data review request text file to (question 5) to include these extra details.

I would suggest replacing this sentence:

> We would use salted-hash to encrypt all the top domain user visited, and won't ollect any real unencrypt data for user, it still highly keep user privacy.

with something like:

> We collect a hash of the base domain by the user for the purpose of telling different websites apart. The hash contains a salt value that is randomly-generated and different for each user. We do not collect the salt values and so it is not possible for Mozilla to reverse the hash and determine the website that the user visited.

Once that's done, please r? on it.
Flags: needinfo?(francois)
Attached file Data review request
Attachment #8996161 - Attachment is obsolete: true
Attachment #8996883 - Flags: review?(francois)
Comment on attachment 8996883 [details]
Data review request

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, at https://github.com/alastor0325/autoplay-shield-study/blob/develop/docs/TELEMETRY.md#shield-study-pings-common-to-all-shield-studies

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, telemetry setting.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Not permanent.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 2.

5) Is the data collection request for default-on or default-off?

Default ON.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

Maybe.
Attachment #8996883 - Flags: review?(francois) → review+
Send PR [1] to mozilla-pipeline-schemas, waiting for merging.

[1] https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/187
Hi, Mike,

I've modified my patches, could you help me review them again?

Thank you!
Attachment #9001440 - Flags: review?(mtrinkala)
Attachment #9001440 - Attachment is patch: true
Attachment #9001440 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9001440 - Flags: review?(mtrinkala) → review-
Comment on attachment 9001440 [details] [diff] [review]
GitHub Pull Request

Hi, Mike,

I've modified my patches, could you help me review them again?

Thank you!
Attachment #9001440 - Flags: review- → review?(mtrinkala)
Attachment #9001440 - Flags: review?(mtrinkala) → review+
Waiting for another review approval.
PR has been merged, closed bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: