Closed Bug 1334113 Opened 3 years ago Closed 3 years ago

Data review for search counts measurements to core ping

Categories

(Firefox for Android :: Metrics, defect)

51 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 55
Tracking Status
firefox55 --- fixed

People

(Reporter: barbara, Assigned: cnevinchen)

Details

Attachments

(1 file)

Data stewards, please review data added in https://bugzilla.mozilla.org/show_bug.cgi?id=1253319

Legal bug here: https://bugzilla.mozilla.org/show_bug.cgi?id=1279308

Chenxia, as a data steward for Android, do you know if we've added this to  https://gecko.readthedocs.io/en/latest/mobile/android/fennec/index.html?
No longer depends on: 1334110
No longer blocks: 1334115
No expiration date.
Flags: needinfo?(liuche)
Flags: needinfo?(benjamin)
Barbara, I think I'm reviewing this change: https://hg.mozilla.org/mozilla-central/diff/13b5c2dfde28/toolkit/components/telemetry/docs/core-ping.rst Can you or your team confirm that this is the actual client behavior on Android and iOS?

On desktop to avoid identifying users too precisely, we only record the search engine name in two cases:

* builtin or suggested search engines with an ID (includes partner search engines in various distribution scenarios)
* if extended telemetry is enabled (prerelease builds)

Here is the code which implements this policy: http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/browser/modules/BrowserUsageTelemetry.jsm#84

It is not clear from this documentation whether the core ping follows this same convention or not. So we need to make that documentation at least clearer, and perhaps change the client behavior for non "standard" search engines.
Flags: needinfo?(liuche)
Flags: needinfo?(benjamin)
Flags: needinfo?(bbermes)
Ben, this core ping addition in this bug is only applicable to Android.

Chenxia, would you mind commenting on if/how we've followed a similar convention than Desktop.
Flags: needinfo?(bbermes) → needinfo?(liuche)
My understanding is that the documentation of the core ping is shared between android and iOS. If my understanding is correct, then this doc should be updated to say that the search counts are recorded on Android only.
Sorry for the delay.

Breaking this into two seperate discussions

1) Where do mobile core ping docs live?

Here is my understanding:

- Gecko/Desktop/Android core ping docs: http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/core-ping.html
- iOS core ping docs: https://github.com/mozilla-mobile/firefox-ios/wiki/Telemetry

There are differences between iOS and Android as it relates to what is being collected in the core ping, so let's focus on Android for this specific bug.

2) Get the Taipei eng team to confirm behaviour as questioned in comment #2

Joe/Wesley, would you mind getting somebody from the Taipe engineers to confirm search count measurement behaviour, and if it's the same/similar to Desktop. Once that's confirmed, depending on the outcome, the docs might have to be updated.
Flags: needinfo?(whuang)
Flags: needinfo?(jcheng)
> 2) Get the Taipei eng team to confirm behaviour as questioned in comment #2
> 
> Joe/Wesley, would you mind getting somebody from the Taipe engineers to
> confirm search count measurement behaviour, and if it's the same/similar to
> Desktop. Once that's confirmed, depending on the outcome, the docs might
> have to be updated.

Nevin recently had worked on several tasks related to telemetry. 
Maybe he's able to answer.
Flags: needinfo?(cnevinchen)
We do have search counts. It's "searches"
And we do keep the search engine's identifier. It's "defaultSearch"
http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/core-ping.html

But looks like it might be null if Gecko doesn't know it's identifier.
http://searchfox.org/mozilla-central/source/mobile/android/base/java/org/mozilla/gecko/search/SearchEngine.java#57


Desktop use keyed histogram. It counts the usage with identifier.
http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/browser/modules/BrowserUsageTelemetry.jsm#376

I think we can follow Desktop's approach to record search engines' name more specifically.

Hi Michael.
Please correct me if I'm wrong. Thank you!
Flags: needinfo?(cnevinchen) → needinfo?(michael.l.comella)
Flags: needinfo?(jcheng)
> On desktop to avoid identifying users too precisely, we only record the search engine name in two cases:
>
> * builtin or suggested search engines with an ID (includes partner search engines in various distribution scenarios)

iirc, we do this on mobile (but I don't know what happens with distributions - that code was really complicated and I didn't fully understand it).

To verify, according to the docs [1], we only record custom search engines when "In the case a search engine is added by a user, the engine identifier “other” is used, e.g. “other.<source>”." (where <source> will be the UI component that started this, e.g. actionbar).

Looking at the code, all searches go through:
 SearchCountMeasurements.incrementSearch
  BrowserApp.recordSearch
   BrowserApp.onSearch [2]
   BrowserApp.handleMessage [3]

In onSearch [2], I comment:

>         // We don't use SearchEngine.getEngineIdentifier because it can
>         // return a custom search engine name, which is a privacy concern.

Instead, we use `engine.identifier`, which I assume is null if it's not a built-in engine. I'll leave this to Nevin to verify.

In handleMessage [3], we get the value from the Gecko message's "identifier" and I don't remember what "Search:Keyword" is anymore so I can't verify the value there. I will also leave this to Nevin to verify.

Some approaches on verifying the behavior:
- Check the telemetry to ensure we haven't stored any private data
- Try all the possible kinds of searches and log/debugger all the values that come out
- Look through the code. This was always really challenging because it snakes all over the place and each case is handled differently. I think there are diminishing returns that come out of this and, if it's acceptable, I'd just use the other methods. Note that the IDE Call Hierarchy feature is invaluable here (until it gets to Gecko code :d).

[1]: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/core-ping.html#searches

[2]: https://dxr.mozilla.org/mozilla-central/rev/10ea10d9993c9701e5525928257a589dea2c05d8/mobile/android/base/java/org/mozilla/gecko/BrowserApp.java#2642
[3]: https://dxr.mozilla.org/mozilla-central/rev/10ea10d9993c9701e5525928257a589dea2c05d8/mobile/android/base/java/org/mozilla/gecko/BrowserApp.java#1866

---

The other time Desktop records search engine identifiers:

> * if extended telemetry is enabled (prerelease builds)

We don't do anything special if extended telemetry is enabled.

---

In the interest of improving the code, I'd also add comments to explain when each of the methods that call `recordSearch` get called (I'm always confused when each of them are called - in fact, I probably just throw in some debugger statements).

---

> I think we can follow Desktop's approach to record search engines' name more specifically.

Nevin, to be clear, I think the Barbara's purpose here is to verify we don't record private user data (like desktop), not that we should get more specific names.
Flags: needinfo?(michael.l.comella)
Assignee: nobody → cnevinchen
removing my NI since Nevin is on top of this already.
Flags: needinfo?(whuang)
Comment on attachment 8859926 [details]
Bug 1334113 - Update the document to reflect 'searches' field in core ping.

https://reviewboard.mozilla.org/r/131890/#review136230

::: toolkit/components/telemetry/docs/data/core-ping.rst:140
(Diff revision 1)
>  ~~~~~~~~
> -In the case a search engine is added by a user, the engine identifier "other" is used, e.g. "other.<source>".
> -
> -Sources in Android are based on the existing UI telemetry values and are as
> -follows:
> +This describes the search engine usage(count). The format is { "<engine identifier>.<source>"" : count }
> +This is optional because the users may have never used the search feature.
> +There's no difference if extended telemetry is enabled (prerelease builds) or not.
> +
> +Possible value :

values

::: toolkit/components/telemetry/docs/data/core-ping.rst:141
(Diff revision 1)
> +* {"google-nocodes.suggestion":1}
> +{"yahoo.listitem":2,"duckduckgo.listitem":1,"google-nocodes.suggestion":1}
> +{"yandex-ru.actionbar":2,"google-nocodes.listitem":1,"yandex-ru.suggestion":1}
> +{"google.suggestion":1,"twitter.listitem":4,"google.actionbar":5,"bing.listitem":1}
> +{"null.actionbar":7}

nit: Maybe format this as code?

::: toolkit/components/telemetry/docs/data/core-ping.rst:147
(Diff revision 1)
> +<engine identifier>:
> +The search identifier initialization process:
> +* First, look in the distribution.
> +* Second, look in the jar for plugins shipped with the locale.
> +* Finally, look in the profile for third-party plugins.
> +If after above process, the identifier still can not be found:
> +* It will be "other" when the user click the result from our UI.
> +* It will be "null" when the user press the software keyboard "enter" key directly.

This is a good description of how the engine identifier is determined. However the way it's written it sounds like this is something that happens in the core ping (the lookup).
Attachment #8859926 - Flags: review?(s.kaspari) → review+
Hi Benjamin, 
Please help review. Thank you!
Flags: needinfo?(benjamin)
Comment on attachment 8859926 [details]
Bug 1334113 - Update the document to reflect 'searches' field in core ping.

https://reviewboard.mozilla.org/r/131890/#review138084

This doesn't answer the critical question, which is whether we send search IDs for user-installed search engines.

::: toolkit/components/telemetry/docs/data/core-ping.rst:144
(Diff revision 2)
> -Sources in Android are based on the existing UI telemetry values and are as
> +Possible values :
> -follows:
>  
> +.. code-block:: js
> +
> +    {

This is a set of examples, not a list of all possible values. I don't think you need these examples and they might be misinterpreted as comprehensive.

::: toolkit/components/telemetry/docs/data/core-ping.rst:172
(Diff revision 2)
> +    {
> +       "null.actionbar":7
> +    }
> +
> +<engine identifier>:
> +The search identifier is the engine id from Gecko. Core Ping will receive the id from Fennec.

This description is inadequate. In particular:

* it doesn't matter that the data comes from gecko. What matters is a good description of the data.
* It's not clear, even after reading this, whether you are sending search IDs for search plugins installed by the user. That is the critical point for me in terms of data review.

::: toolkit/components/telemetry/docs/data/core-ping.rst:174
(Diff revision 2)
> +    }
> +
> +<engine identifier>:
> +The search identifier is the engine id from Gecko. Core Ping will receive the id from Fennec.
> +It is initialized outside of the core ping. If you want to know it's generation logic, the process is :
> +* First, look in the distribution.

You can remove all these bullets about how we find search plugins; what's important is what search IDs are and in what cases they are sent.

::: toolkit/components/telemetry/docs/data/core-ping.rst:178
(Diff revision 2)
> +It is initialized outside of the core ping. If you want to know it's generation logic, the process is :
> +* First, look in the distribution.
> +* Second, look in the jar for plugins shipped with the locale.
> +* Finally, look in the profile for third-party plugins.
> +If after above process, the identifier still can not be found:
> +* It will be "other" when the user click the result from our UI.

I don't understand what this means.

::: toolkit/components/telemetry/docs/data/core-ping.rst:179
(Diff revision 2)
> +* First, look in the distribution.
> +* Second, look in the jar for plugins shipped with the locale.
> +* Finally, look in the profile for third-party plugins.
> +If after above process, the identifier still can not be found:
> +* It will be "other" when the user click the result from our UI.
> +* It will be "null" when the user press the software keyboard "enter" key directly.

You mean, if the user types a non-URL in the location bar and presses enter? Why is this a function of the search engine ID and not the "sources"?
Attachment #8859926 - Flags: review?(benjamin) → review-
Flags: needinfo?(benjamin)
Hi Frank
Could you please teach me how to get all the values (since 2017/3/1) of searches in telemetry_core_parquet ?
Since it's an array I don't know how to get the data using STMO.
This one seems to use the main ping and shared with Desktop[1]. I want to check mobile specifically.

I want to make sure we didn't save any user private search engine name. Thank you!


[1] https://sql.telemetry.mozilla.org/queries/93/source#table
Flags: needinfo?(fbertsch)
Hi (In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> 
> On desktop to avoid identifying users too precisely, we only record the
> search engine name in two cases:
> 
> * builtin or suggested search engines with an ID (includes partner search engines in various distribution scenarios)

for builtin or suggested search engine, Android is using desktop code, so it should be the same as desktop. Code is here [1][2]
for partner distribution scenarios, code is here[3]

> * if extended telemetry is enabled (prerelease builds)
Since it's core ping, it's out-out on Android....[4]

> 
> Here is the code which implements this policy:
> http://searchfox.org/mozilla-central/rev/
> d3307f19d5dac31d7d36fc206b00b686de82eee4/browser/modules/
> BrowserUsageTelemetry.jsm#84
> 
> It is not clear from this documentation whether the core ping follows this
> same convention or not. So we need to make that documentation at least
> clearer, and perhaps change the client behavior for non "standard" search
> engines.

Hi Jim
Could you please help me review if my finding in [1]~[2] is correct?

Thank you!


[1] https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/mobile/android/chrome/content/browser.js#6147
[2] https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/mobile/android/chrome/content/browser.js#6173
[3] https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/mobile/android/base/java/org/mozilla/gecko/search/SearchEngineManager.java#455
[4] https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/mobile/android/base/java/org/mozilla/gecko/telemetry/TelemetryCorePingDelegate.java#173
Flags: needinfo?(nchen)
Flags: needinfo?(benjamin)
Comment on attachment 8859926 [details]
Bug 1334113 - Update the document to reflect 'searches' field in core ping.

https://reviewboard.mozilla.org/r/131890/#review150110
Attachment #8859926 - Flags: review?(nchen) → review+
I don't really know this code unfortunately. You should find a frontend person.
Flags: needinfo?(nchen)
(In reply to Nevin Chen [:nechen] from comment #15)
> Hi Frank
> Could you please teach me how to get all the values (since 2017/3/1) of
> searches in telemetry_core_parquet ?

Nevin - see https://sql.telemetry.mozilla.org/queries/4778/source. That should be what you need.
Flags: needinfo?(fbertsch)
Comment on attachment 8859926 [details]
Bug 1334113 - Update the document to reflect 'searches' field in core ping.

https://reviewboard.mozilla.org/r/131890/#review150348

data-r=me
Attachment #8859926 - Flags: review?(benjamin) → review+
Flags: needinfo?(benjamin)
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bc6756d41d80
Update the document to reflect 'searches' field in core ping. r=bsmedberg,jchen,sebastian
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/bc6756d41d80
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Flags: needinfo?(liuche)
You need to log in before you can comment on or make changes to this bug.