Closed Bug 1591564 Opened 4 months ago Closed 1 month ago

Integrate the Glean SDK into mozilla-central

Categories

(Toolkit :: Telemetry, task, P1)

task
Points:
3

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

Details

Attachments

(12 files, 6 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
2.10 KB, text/plain
tdsmith
: data-review+
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

With the Glean SDK crates available on crates.io, dependencies resolved, integration layer technologies sorted, and build system adapted we can finally vendor the crates in and init them. We can prove that it builds when we want it, and only when we want it. We can write some more tests.

Don't go so far as to send the "prototype" ping yet. That's for later.

Blocks: 1591566
Points: --- → 3
Priority: -- → P2
Depends on: 1596132
Depends on: 1600385

Poking through mozilla-central, I found that CertStorage registers itself as an nsIObserver for its own prefs. This might be helpful in ensuring we have and keep the current value of the data upload pref, though it would mean a certain amount of unsafe code.

Assignee: nobody → chutten

Depends on D56468

Doesn't compile because I'm trying to send an nsIObserver across threads.

Need to spend some time thinking about the design of this state...

Depends on D56469

Attached file Bug 1591564 - mach vendor rust for fog (obsolete) —

Depends on D56470

Since we're the only one sending data, and we're doing so infrequently, let's
get the pref value before each ping send instead of building a pref observer
right this second.

Depends on D57106

Attachment #9114729 - Attachment is obsolete: true
Attachment #9114730 - Attachment is obsolete: true
Attachment #9114731 - Attachment is obsolete: true
Attachment #9114732 - Attachment is obsolete: true

Depends on D57108

This is my first time writing Rust in m-c. Jan-Erik and I recognize that maybe we should have a practised eye take a look at these changes before we push them in, even if it is gated to just Nightly.

Lina, would you be or know of someone who could provide this sort of feedback?

Flags: needinfo?(lina)
Attached file data collection review
Attachment #9116199 - Flags: data-review?(tdsmith)
Status: NEW → ASSIGNED
Priority: P2 → P1

How will you document the prototype ping?

Flags: needinfo?(chutten)

(In reply to Tim Smith 👨‍🔬 [:tdsmith] from comment #13)

How will you document the prototype ping?

Good question! I will put a pings.yaml in toolkit/components/telemetry/fog/ that will contain the necessary information.

Flags: needinfo?(chutten)
Attachment #9116393 - Attachment is obsolete: true
Comment on attachment 9116199 [details]
data collection review

Alicia, this is a data-review request for a new ping type ("prototype ping") containing a new identifier. The identifier is named "client_id" but it does not have the same value as the normal telemetry client_id. The new value ("`prototype` client-id") and the telemetry client_id will not be sent in the same document. The prototype ping will only contain Category 1 technical data. The prototype ping is a time-limited experimental collection in Nightly only in order to shake out the Glean telemetry library on desktop. Please confirm that you don't have any concerns.

Sounds great otherwise; thanks chutten for answering my questions about the content of the ping and documentation!

--


1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way?

toolkit/components/telemetry/fog/pings.yaml will describe the ping.

https://mozilla.github.io/glean/book/user/pings/index.html#ping-sections describes the ping_info and client_info content they will contain.

In the future, glean pings will be described at https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/index.html and collections will be described by an enhanced probe dictionary.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, the Firefox telemetry opt-out.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

n/a.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, technical data.

5) Is the data collection request for default-on or default-off?

Default-on.

6) Does the instrumentation include the addition of **any *new* identifiers**?

Yes, discussed above.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes, but ni agray to confirm.

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, but chutten's responsible for removing it eventually.

9) Does the data collection use a third-party collection tool?

No.
Flags: needinfo?(agray)
Attachment #9116199 - Flags: data-review?(tdsmith) → data-review+

Jan-Erik tells me Emilio might have some input on how I'm doing Rust in mozilla-central. I would love to have input on where I've gone wrong : )

Flags: needinfo?(emilio)

I took a look. It looks mostly fine with some nits, but the prefs usage definitely needs to get fixed.

Flags: needinfo?(emilio)

(In reply to Tim Smith 👨‍🔬 [:tdsmith] from comment #18)

Comment on attachment 9116199 [details]
data collection review

Alicia, this is a data-review request for a new ping type ("prototype ping")
containing a new identifier. The identifier is named "client_id" but it does
not have the same value as the normal telemetry client_id. The new value
("prototype client-id") and the telemetry client_id will not be sent in
the same document. The prototype ping will only contain Category 1 technical
data. The prototype ping is a time-limited experimental collection in
Nightly only in order to shake out the Glean telemetry library on desktop.
Please confirm that you don't have any concerns.

Sounds great otherwise; thanks chutten for answering my questions about the
content of the ping and documentation!

--

  1. Is there or will there be documentation that describes the schema for
    the ultimate data set in a public, complete, and accurate way?

toolkit/components/telemetry/fog/pings.yaml will describe the ping.

https://mozilla.github.io/glean/book/user/pings/index.html#ping-sections
describes the ping_info and client_info content they will contain.

In the future, glean pings will be described at
https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/index.
html and collections will be described by an enhanced probe dictionary.

  1. Is there a control mechanism that allows the user to turn the data
    collection on and off?

Yes, the Firefox telemetry opt-out.

  1. If the request is for permanent data collection, is there someone who
    will monitor the data over time?

n/a.

  1. Using the category system of data
    types
    on the Mozilla
    wiki, what collection type of data do the requested measurements fall under?

Category 1, technical data.

  1. Is the data collection request for default-on or default-off?

Default-on.

  1. Does the instrumentation include the addition of any new
    identifiers
    ?

Yes, discussed above.

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes, but ni agray to confirm.

  1. Does there need to be a check-in in the future to determine whether to
    renew the data?

No, but chutten's responsible for removing it eventually.

  1. Does the data collection use a third-party collection tool?

No.

Trust approved. Nightly privacy notice will cover this as part of its "experimental nature" language.

Flags: needinfo?(agray)

Depends on D57359

Attachment #9117237 - Attachment description: Bug 1591564 - Update glean-preview to 0.0.3 r?janerik → Bug 1591564 - Update glean-preview to 0.0.4 r?janerik

nsIProcess is the tried-and-true method for launching utility subprocesses on
Firefox Desktop's supported platforms. Use that.

Depends on D57977

I read through your patches, and they look great to me too, :chutten!

Flags: needinfo?(lina)
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a3fdaaf53c34
Initialize the FOGotype when Telemetry inits r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/aacb5ca2f486
Assemble a 'prototype' ping every hour. r=janerik
https://hg.mozilla.org/integration/autoland/rev/5f6f201ee47e
Determine from the pref whether Glean upload is enabled r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/dfeba21643d4
Convert FOGotype Glean pings to pingsender format and send them r=janerik
https://hg.mozilla.org/integration/autoland/rev/d7de3809a8b5
Debug log when failing to send pings r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/01060f5ff14b
Update glean-preview to 0.0.4 r=janerik
https://hg.mozilla.org/integration/autoland/rev/9c740bf48245
Use nsIProcess instead of std::process::Command r=janerik,gsvelto
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5623ec4258ab
Add a pings.yaml for the 'prototype' ping r=janerik

The Tier2 build problem appears to be libbacktrace being included in mingw windows builds where it maybe should not be. (( The linked discussion points to libbacktrace skipping msvc, but maybe it should also be skipping clang on mingw? (I have no idea). ))

The leak is almost certainly due to me mishandling an nsString in the integration layer somewhere.

Flags: needinfo?(chutten)
Attached file investigation (obsolete) —
Attachment #9119437 - Attachment description: I spent a lot of tim → investigation
Attachment #9119437 - Attachment is obsolete: true

I spent a lot of time today tracking down the libbacktrace issue and why it didn't occur before.
Except for glean-preview we didn't include or update any other Rust crates in this patch set.
glean-preview uses glean-core, which uses failure, which includes backtrace by defualt.

failure was already included before and compiled with default features, and so was backtrace.
glean doesn't really need failure's backtrace feature, so I tried disabling its default features.

Due to how cargo currently works this needs to be done across all transitive dependencies (the final feature set a crate is compiled with it the union across it all).

I then changed m-c to use the crates from git directly.

Turns out that still fails with the same issue on the x86_64-pc-windows-gnu target.
So something is re-enabling that failure "std" feature in tree.

With a combination of cargo-feature-set and cargo tree I tracked it down further.

This is the full feature set for the gkrust-shared crate as compiled:

cargo feature-set --features bitsdownload --features cranelift_x86 --features cubeb-remoting --features fogotype --features gecko_debug --features gecko_profiler --features gecko_refcount_logging --features moz_memory --features moz_places --features new_cert_storage --features new_xulstore --features quantum_render

Turns out the quantum_render feature enables the webrender_bindings crate,
which then somehow pulls in failure through transitive dependencies again.
More trial-and-error revealed its a dependency of the dirs crate,
only used on Windows.
Except dirs doesn't need failure for the target we're building for (x86_64-pc-windows-gnu or Mac or Linux).
It's again a transitive dependency for a crate called redox_users, which is only pulled in when compiled for redox

Except that's not how cargo works.
cargo always pulls in all dependencies, merges all features and only later ignores the crates it doesn't actually need.
That's a long standing issue:


Why did it all work before just fine?
Nothing of the above dependency graph changed with the patch set from :chutten.
So the only thing that changed is us actually using glean_preview.
Which means we're suddenly calling functions that return types that wrap other types that come from the backtrace crate, so the linker is now looking for them and can't find them.
I guess we've just been lucky so far that this didn't blow up.

So what's the solution?

  • Short term: fork failure, remove backtrace there and vendor that fork
  • Mid term: get rid of failure across the Glean code base (ffi-support might want that anyway and we don't benefit much from it either)
    • That only helps us, but it won't be long before someone else runs into this problem again.
  • Long term: fix cargo once and for all.

:froydnj see above writeup

Flags: needinfo?(nfroyd)
Depends on: 1608157

Forking crates to fix problems is fine, though obviously long-term undesirable. We could try disallowing failure from ever appearing in our build graph, though that's probably hard, especially when it appears in crates we'll never compile.

I guess it doesn't work to force the failure crate's features in https://searchfox.org/mozilla-central/source/build/workspace-hack/Cargo.toml, because cargo would still union in the std (and therefore the backtrace) feature from other crates, correct?

Flags: needinfo?(nfroyd)

Correct, cargo always takes the union.
I'm going to land the patch as a short-term fix and I will work on removing failure from crates that don't really need it.

The end of the std::thread at process end didn't seem to release the owned
nsStringBuffer in a way that refcounting liked. So let's copy the nsAString
into an owned String, move it into the thread's closure, and convert it as
necessary to an nsAString when we invoke pingsender.

Not the most efficient, but it doesn't have to be. This is prototype code
that will be removed.

Depends on D58809

Depends on D59531

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/782d35126f2e
Initialize the FOGotype when Telemetry inits r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/8da8b0a81ed4
Assemble a 'prototype' ping every hour. r=janerik
https://hg.mozilla.org/integration/autoland/rev/5c34c01cd870
Determine from the pref whether Glean upload is enabled r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/779980947286
Convert FOGotype Glean pings to pingsender format and send them r=janerik
https://hg.mozilla.org/integration/autoland/rev/bb14fde6d3e2
Debug log when failing to send pings r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/045fbbec7011
Update glean-preview to 0.0.4 r=janerik
https://hg.mozilla.org/integration/autoland/rev/59a0057b6582
Use nsIProcess instead of std::process::Command r=janerik,gsvelto
https://hg.mozilla.org/integration/autoland/rev/0416c108040e
Use Rust string types with std::thread r=froydnj,janerik
https://hg.mozilla.org/integration/autoland/rev/03300b89f364
Run rustfmt on FOGotype r=janerik

Backed out 9 changesets (Bug 1591564) for talos and browser chrome failures at appdata\local\temp\db\data.mdb

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=03300b89f364aa1c7741d37ca741c6a7dca23bc3

Failure log:

  1. browser chrome: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=284452233&repo=autoland&lineNumber=1505
  2. talos-xperf: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=284460626&repo=autoland&lineNumber=1730

Backout link: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=54bacc557b8622a9ab2e7c5c083d5ffdcf25139f


[task 2020-01-10T20:31:35.603Z] 20:31:35     INFO - TEST-START | browser/base/content/test/performance/browser_startup_mainthreadio.js
[task 2020-01-10T20:31:35.662Z] 20:31:35     INFO - TEST-INFO | started process screenshot
[task 2020-01-10T20:31:35.883Z] 20:31:35     INFO - TEST-INFO | screenshot: exit 0
[task 2020-01-10T20:31:35.884Z] 20:31:35     INFO - Buffered messages logged at 20:31:35
[task 2020-01-10T20:31:35.884Z] 20:31:35     INFO - Entering test bound 
[task 2020-01-10T20:31:35.885Z] 20:31:35     INFO - TEST-PASS | browser/base/content/test/performance/browser_startup_mainthreadio.js | The IO interposer should be enabled in builds that are not RELEASE_OR_BETA - 
[task 2020-01-10T20:31:35.885Z] 20:31:35     INFO - whitelisted paths before profile selection:
[task 2020-01-10T20:31:35.885Z] 20:31:35     INFO -   Z:\task_1578687885\AppData\Roaming\Mozilla\Firefox\profiles.ini - condition: true, ignoreIfUnused: true, read: 1, stat: 1, listedPath: UAppData:profiles.ini
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner - condition: true, stat: 1, listedPath: ProfD:
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\parent.lock - condition: true, stat: 1, listedPath: ProfD:parent.lock
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\minidumps - condition: true, stat: 1, listedPath: ProfD:minidumps
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   Z:\task_1578687885\build\application\firefox\browser\defaults\preferences - condition: true, stat: 1, listedPath: XCurProcD:defaults/preferences
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-child-current.bin - condition: true, stat: 1, listedPath: ProfLDS:startupCache/scriptCache-child-current.bin
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-child.bin - condition: true, stat: 1, listedPath: ProfLDS:startupCache/scriptCache-child.bin
[task 2020-01-10T20:31:35.886Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-current.bin - condition: true, stat: 1, listedPath: ProfLDS:startupCache/scriptCache-current.bin
[task 2020-01-10T20:31:35.887Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache.bin - condition: true, stat: 1, listedPath: ProfLDS:startupCache/scriptCache.bin
[task 2020-01-10T20:31:35.887Z] 20:31:35     INFO -   Z:\task_1578687885\build\application\firefox\defaults\pref\channel-prefs.js - stat: 1, read: 1, close: 1, listedPath: PrfDef:channel-prefs.js
[task 2020-01-10T20:31:35.887Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\prefs.js - stat: 1, read: 1, close: 1, listedPath: PrefD:prefs.js
[task 2020-01-10T20:31:35.887Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\user.js - stat: 1, read: 1, close: 1, listedPath: PrefD:user.js
[task 2020-01-10T20:31:35.887Z] 20:31:35     INFO -   C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\xulstore\data.mdb - condition: true, write: 1, fsync: 1, listedPath: ProfD:xulstore/data.mdb
[task 2020-01-10T20:31:35.888Z] 20:31:35     INFO - (PoisonIOInterposer) stat - Z:\task_1578687885\AppData\Roaming\Mozilla\Firefox\profiles.ini
[task 2020-01-10T20:31:35.888Z] 20:31:35     INFO - (PoisonIOInterposer) stat - c:\users\task_1578687885\appdata\local\temp\tmporpc9c.mozrunner
[task 2020-01-10T20:31:35.889Z] 20:31:35     INFO - (PoisonIOInterposer) stat - c:\users\task_1578687885\appdata\local\temp\tmporpc9c.mozrunner\parent.lock
[task 2020-01-10T20:31:35.889Z] 20:31:35     INFO - (PoisonIOInterposer) stat - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\minidumps
[task 2020-01-10T20:31:35.889Z] 20:31:35     INFO - (PoisonIOInterposer) read - Z:\task_1578687885\build\application\firefox\defaults\pref\channel-prefs.js
[task 2020-01-10T20:31:35.890Z] 20:31:35     INFO - (PoisonIOInterposer) stat - Z:\task_1578687885\build\application\firefox\browser\defaults\preferences
[task 2020-01-10T20:31:35.890Z] 20:31:35     INFO - Buffered messages finished
[task 2020-01-10T20:31:35.890Z] 20:31:35     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/performance/browser_startup_mainthreadio.js | unexpected read on C:\Users\task_1578687885\AppData\Local\Temp\db\data.mdb before profile selection - 
[task 2020-01-10T20:31:35.891Z] 20:31:35     INFO - Stack trace:
[task 2020-01-10T20:31:35.891Z] 20:31:35     INFO -   XREMain::XRE_main
[task 2020-01-10T20:31:35.891Z] 20:31:35     INFO - (PoisonIOInterposer) read - C:\Users\task_1578687885\AppData\Local\Temp\db\data.mdb
[task 2020-01-10T20:31:35.892Z] 20:31:35     INFO - (PoisonIOInterposer) read - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\prefs.js
[task 2020-01-10T20:31:35.892Z] 20:31:35     INFO - (PoisonIOInterposer) read - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\user.js
[task 2020-01-10T20:31:35.893Z] 20:31:35     INFO - (PoisonIOInterposer) stat - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-child.bin
[task 2020-01-10T20:31:35.893Z] 20:31:35     INFO - (PoisonIOInterposer) stat - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-child-current.bin
[task 2020-01-10T20:31:35.894Z] 20:31:35     INFO - (PoisonIOInterposer) stat - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache.bin
[task 2020-01-10T20:31:35.894Z] 20:31:35     INFO - (PoisonIOInterposer) stat - C:\Users\task_1578687885\AppData\Local\Temp\tmporpc9c.mozrunner\startupCache\scriptCache-current.bin
Flags: needinfo?(chutten)

It appears as though Glean's init opens the db on the thread it's called on. Lemme wrap the entire init block in a spawned std::thread and see if that solves it.

Trypush here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0267643b182b6f781fe42ee1a67fc6ed97154501

(looking to see if the Bc tests come back green)

Flags: needinfo?(chutten)

Glean init does some I/O which is forbidden on the main thread (especially
during startup). Though not recommended for mobile platforms, let's init it
off the main thread.

Later versions of the Glean SDK might take care of off-thread I/O during init.
But for now, let's do it ourselves.

Depends on D59532

Pushed by jrediger@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b18f5ee12494
Initialize the FOGotype when Telemetry inits r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/6dbc2325d1ed
Assemble a 'prototype' ping every hour. r=janerik
https://hg.mozilla.org/integration/autoland/rev/43d749cff07e
Determine from the pref whether Glean upload is enabled r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/5edf28e3c13b
Convert FOGotype Glean pings to pingsender format and send them r=janerik
https://hg.mozilla.org/integration/autoland/rev/fc040effba1c
Debug log when failing to send pings r=janerik,emilio
https://hg.mozilla.org/integration/autoland/rev/427faa0aef87
Update glean-preview to 0.0.4 r=janerik
https://hg.mozilla.org/integration/autoland/rev/fb49d44ffbb2
Use nsIProcess instead of std::process::Command r=janerik,gsvelto
https://hg.mozilla.org/integration/autoland/rev/1cf9c1e5f7bb
Use Rust string types with std::thread r=froydnj,janerik
https://hg.mozilla.org/integration/autoland/rev/52dbbdef7811
Run rustfmt on FOGotype r=janerik
https://hg.mozilla.org/integration/autoland/rev/7de9590b9e81
Initialize the Glean SDK off the main thread r=janerik
Regressions: 1610300
Duplicate of this bug: 1591566
You need to log in before you can comment on or make changes to this bug.