Closed Bug 1636251 Opened 5 years ago Closed 5 years ago

Report mach errors to Sentry

Categories

(Firefox Build System :: Mach Core, enhancement)

enhancement

Tracking

(firefox78 wontfix, firefox79 fixed)

RESOLVED FIXED
mozilla79
Tracking Status
firefox78 --- wontfix
firefox79 --- fixed

People

(Reporter: mhentges, Assigned: mhentges)

Details

Attachments

(5 files, 3 obsolete files)

As discovered in 1634574, it's viable to integrate Sentry with ./mach in-tree.

Goals:

  • Identical exceptions from the same machine are de-duped
  • Build failures/"expected failures" aren't reported to Sentry
  • Mach (or subcommand) issues are reported

It feels like this should be subject to the same policy as build system telemetry (currently opt-in, maybe opt-out for employees in the future)

That's true, the current implementation I have honours the telemetry setting in machrc.

These errors are reported to the "mach" project here:
https://sentry.prod.mozaws.net/operations/mach/

Should only reports exceptions if mach._run() raises, which should always be a bug in
mach or a loaded command module.

urllib3 is needed by Sentry for its HTTP communication.

These errors are reported to the "mach" project here:
https://sentry.prod.mozaws.net/operations/mach/

Should only report exceptions if mach._run() raises, which should always be a bug in
mach or a loaded command module.

Depends on D74737

Attachment #9147374 - Attachment is obsolete: true
Pushed by mhentges@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5694a728839e report |./mach| errors with Sentry r=rstewart

:chutten, this is an expansion of data collection under the existing enable-flag for build system telemetry (data review in bug 1291053). Does it need its own data review?

Flags: needinfo?(chutten)

In the implementation, this uses the telemetry setting in .mozbuild's machrc to control whether exceptions are reported.
:glob has confirmed that this doesn't need its own data review - it sounds like Sentry reporting (exception reporting) is separate from telemetry.

Hmm, weird. certifi is listed in virtualenv_packages.txt - maybe it's a Windows-specific issue? I'll dig in.

Flags: needinfo?(mhentges)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

(In reply to :dmajor from comment #8)

:chutten, this is an expansion of data collection under the existing enable-flag for build system telemetry (data review in bug 1291053). Does it need its own data review?

All new or expanded data collections require Data Collection Review. Similar to how new Histograms in Firefox need review despite that they're all covered by the same Telemetry setting, new data collections in mach need review too.

:mhentges and :glob you might instead be thinking about Trust/Legal review not being needed because mach already has a well-known and privacy-notice-compliant user preference system for data collection. That's the heavier-weight process involving emails and needinfos to make sure the data collection mechanisms we're adding are compliant.

Please request Data Collection Review at your soonest convenience. No need to backout if we get this filed away soonish.

Flags: needinfo?(chutten) → needinfo?(mhentges)

Ah, interesting, ok.
I'm going to NI :glob here to confirm^.

(also re-opening this bug - there were two patches, but one still needs to land)

Status: RESOLVED → REOPENED
Flags: needinfo?(mhentges) → needinfo?(glob)
Resolution: FIXED → ---
Attached file data-collection-review-request.md (obsolete) —
Flags: needinfo?(glob)
Attachment #9149758 - Flags: data-review?(chutten)
Comment on attachment 9149758 [details] data-collection-review-request.md Please expand upon the proposed measurements. "Crash data" could mean quite a few things of a variety of levels of privacy and detail.
Attachment #9149758 - Flags: data-review?(chutten)
Attached file data-collection-review-request.md (obsolete) —
Attachment #9149758 - Attachment is obsolete: true
Attachment #9152300 - Flags: data-review?(chutten)
Comment on attachment 9152300 [details] data-collection-review-request.md Sorry for the delay. Eurgh, sorry, this is going to have to be data-review- for the moment as we need public documentation, some clarity on whether the filenames are relative/sanitized or full, and confirmation that we have some individual responsible for the collection. Specifically: Q5 - Please clarify if the File names captured by Sentry will include full paths (may contain user names) Q6 - This question is for how long will mach continue collecting this data. I will assume this is an ongoing collection, so it will need someone responsible for it. :glob, are you willing to be the person responsible for the collection? A1 for the review response - please document the collection in a place where users subject to this collection might find it. Firefox source docs is a popular location. And since it's mach, maybe some help function in the tool itself would be appropriate instead.
Attachment #9152300 - Flags: data-review?(chutten) → data-review-

I can answer one of these: for Q5, the file names will include full paths - see this Sentry issue that includes a full path. (Note that normally you have to sign in to see Sentry issues, I explicitly made that example public for simplicity).

I'll defer to :glob for the other questions :)

Attached file data-collection.md

please document the collection in a place where users subject to this collection might find it

Mitch - please add this as part of your next revision. It should probably be a sibling of the existing build-telemetry docs (build/docs/telemetry.md -> https://firefox-source-docs.mozilla.org/build/buildsystem/telemetry.html)

Attachment #9152300 - Attachment is obsolete: true
Attachment #9153455 - Flags: data-review?(chutten)
Comment on attachment 9153455 [details] data-collection.md DATA COLLECTION REVIEW RESPONSE: Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate? Yes. This collection will be documented in the [Firefox Source Docs](https://firefox-source-docs.mozilla.org/build/buildsystem/telemetry.html). (if not in this file then in one nearby. Check the comments in bug 1636251 for specifics). Is there a control mechanism that allows the user to turn the data collection on and off? Yes. This collection can be controlled through `mach`'s configuration file. If the request is for permanent data collection, is there someone who will monitor the data over time? Yes, :glob is responsible. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under? Category 1, Technical. Is the data collection request for default-on or default-off? Default off. (Might be default on for mozilla employees, I can't quite recall). Does the instrumentation include the addition of any new identifiers? No. Is the data collection covered by the existing Firefox privacy notice? Yes. Does there need to be a check-in in the future to determine whether to renew the data? No. This collection is permanent. --- Result: datareview+, pending Clarification on where the documentation will live and a Trust Review (ni?agray) that full file paths (perhaps containing OS user names) are acceptable collection.
Flags: needinfo?(agray)
Attachment #9153455 - Flags: data-review?(chutten) → data-review+

Hi,

I have two questions for either :glob or/Mitch:

  1. Is there a reason we'd need the full file path and can't work with a sanitized version?
  2. I'd like to take a look at the current disclosure for error reporting for mach. Can someone point me to the right place?
Flags: needinfo?(agray)

(In reply to Alicia Gray from comment #22)

  1. Is there a reason we'd need the full file path and can't work with a sanitized version?

Sentry has the ability to filter data prior to sending to our server with a before_send filter.

Unfortunately this won't be 100% as it's impossible to determine with certainty which components of a path are a username or just another directory name. For paths under /home, /Users, or C:\Users we can assume the next path component is a username, but it's possible to change the location of home directories in most operating systems.

For error messages this is might be trickier as we'd have to find things that are "path-like" in addition to picking out the username.

As error reporting is gated behind the same preference which sends full blown telemetry I'm not sure if sanitising potential usernames provides us any privacy gains.

Let us know if we should do a "best effort" to remove usernames from filenames and error messages – this most likely means masking just the default locations.

  1. I'd like to take a look at the current disclosure for error reporting for mach. Can someone point me to the right place?

mach currently doesn't have an error reporting facility.
Review for the mach telemetry submission appears to have happened in bug 1291053.

(In reply to :glob (Byron Jones) 🎈 from comment #23)

Unfortunately this won't be 100% as it's impossible to determine with certainty which components of a path are a username or just another directory name. For paths under /home, /Users, or C:\Users we can assume the next path component is a username, but it's possible to change the location of home directories in most operating systems.

For error messages this is might be trickier as we'd have to find things that are "path-like" in addition to picking out the username.

Rather than (or in addition to) look for specific absolute paths, I wonder if it would be possible to filter out whatever the prefix of the topsrcdir, the statedir, and the objdir if there is one. My guess is that those are the paths that are most likely to occur, that have user names in them.[1]


[1] The only other one that seems at all likely is python library paths, if someone is using pyenv.

It's true that we could have a deterministic way of identifying absolute paths including the topsrcdir, but we have two situations in which paths with usernames could leak through:

  • The case where an absolute path is to something outside the topsrcdir, like /home/user/Downloads/omni.ja
  • The case where an absolute path appears in the Sentry event in a location we aren't expecting
    • There's a bunch of places where absolute values could pop up, like the event.extra['sys.argv'] property or within event.exception.<snip>.frames[].(abs_path or vars) (abs_path being the path to the source file, vars being the variables in that frame)
    • Arguably we could find them all by brute-force searching through the entire event object, but that's :/

NI to make sure you see my response in comment 23.

Flags: needinfo?(agray)

(In reply to :glob (Byron Jones) 🎈 from comment #27)

NI to make sure you see my response in comment 23.

Hi there,
Back from PTO.

Re: the file path - I think a best effort for sanitizing to remove usernames from filenames and error messages including absolute paths like the topsrcdir is fine.

Re: the disclosure I'm referring to in comment 22 is actually from Bug 1291053 in Comment 28. There should be a mach first run disclosure language in place that includes a link to the public documentation on the data collection that says: Mozilla collects data about local builds in order to make builds faster and improve developer tooling. To learn more about the data we intend to collect read here (url tbd). If you have questions, please ask in #build in irc.mozilla.org. If you would like to opt out of data collection, select (N) at the prompt.

We have a couple of things that need to be done. At minimum, the reference to IRC should get updated for Matrix, assuming there is a Matrix channel for #build. I'd like to see the URL that is referenced in this disclosure to read what we currently tell people and if any update will be needed for this.

Flags: needinfo?(agray) → needinfo?(glob)

Great - thanks Alicia :)

Here's the current first-run message (from python/mozboot/mozboot/bootstrap.py:

Build system telemetry

Mozilla collects data about local builds in order to make builds faster and
improve developer tooling. To learn more about the data we intend to collect
read here:
https://firefox-source-docs.mozilla.org/build/buildsystem/telemetry.html.

If you have questions, please ask in #build in irc.mozilla.org. If you would
like to opt out of data collection, select (N) at the prompt.

Would you like to enable build system telemetry?

Good catch on the IRC reference, we can update that to point to Matrix as part of this work.
Updating the referenced documentation to include Sentry has already been done (https://phabricator.services.mozilla.com/D78409).

Flags: needinfo?(glob)

Looks like I'm in luck, the IRC message was updated recently.

I'll purge paths on a best-effort basis, thanks!

To avoid sending identifying information, common absolute paths are patched with placeholder values. For example, devs
may place their Firefox repository within their home dir, so absolute paths are doctored to be prefixed with
"<topsrcdir"> instead.

However, this alone doesn't resolve the problem, since Sentry exceptions capture variables on the stack, and sometimes
mach will put the contents of ENV into a local variable. Such ENV variables may have paths unrelated to Firefox that
include user names (e.g.: XDG_DATA_DIRS).
So, this revision also does a blanket "find-and-replace" for usernames within the Sentry event.

Depends on D74738

I've added a patch to add some anonymization, but it's not perfect.
As recommended by Tom, I have it patching the following directories:

  • /home/mitch/firefox => <topsrcdir>
  • /home/mitch/firefox/obj-x86_64-pc-linux-gnu => <topobjdir>
  • /home/mitch/.mozbuild => <statedir>

However, this didn't catch all the cases where usernames can pop up.
Sentry sends information about the variables in scope when an exception is triggered, which are very useful for debugging issues. However, these variables may carry identifying information: for example, if an exception is thrown in a part of mach where os.environ is in scope, then environment variables may have usernames (or paths including usernames) that aren't one of the srcdir/objdir/statedir that are being specifically patched. Say, on Linux, the XDG_DATA_DIRS variable contains paths that include my username.

I'm not sure if we can purge this information. If we do a blanket replacement of the username to something like <user>, then we might incorrectly replace a false positive. For example, if a username is mark, and they run into an exception in remark.py, then the error will show up in Sentry as originating from re<user>.py. This is not only is trivial to de-anonymize, but also can obscure exception information.

Glob, is the directory-replacement sufficient? Perhaps this is a question for Alicia?

Flags: needinfo?(glob)

Rather than looking for just the username, you could instead replace the full homedir string.

ie. replace /home/mark/ with ~/.

Flags: needinfo?(glob)

Sounds good, setting that up at the moment.

Note that there's cases where a bare username is in a variable. For example, macos exposes an environment variable called $USER which contains the user's name. Depending on where an exception is triggered, Sentry may pick this up.

I can write a filter to look for the specific variable in which its kept (called orig_env) and scrub it, but this variable's name may change in the future.
Alternatively, I can do a wider purge of all variables or dictionary values with the name USER, but there may be aliasing happening elsewhere in the code, which wouldn't be caught by this solution.

Another possible solution is that we can specifically scrub different parts of the Sentry payload differently. For example, we can perform path scrubbing everywhere, then perform username scrubbing (replacing the name with <user>) only within the vars data structure of the event. This allows us to ensure that the username is entirely removed without breaking the source code and references.
The downside here is that it's not clear that the Sentry payload isn't going to quietly change. If the structure changes, our intelligent scrubbing logic could look in the wrong place for local variables, allowing usernames to sneak through again.

:glob, what do you think?

Flags: needinfo?(glob)

(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #34)

Sounds good, setting that up at the moment.

Note that there's cases where a bare username is in a variable. For example, macos exposes an environment variable called $USER which contains the user's name. Depending on where an exception is triggered, Sentry may pick this up.

Scrubbing all occurrences of a username in any strings will certainly lead to removing of things that aren't usernames (especially if your username happens to be a programming term).

Just scrubbing homedirs is sufficient to satisfy "best effort".

I'd feel differently if this wasn't opt-in along with other telemetry data that is likely to contain equally, if not more, sensitive data.

Flags: needinfo?(glob)

Sounds good, I'll land tomorrow morning PST so I can supervise the initial bugs being reported in

Pushed by mhentges@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ae78c0a50575 report |./mach| errors with Sentry r=rstewart https://hg.mozilla.org/integration/autoland/rev/ce8cb6373f88 Patch Sentry events to ensure a raw username isn't sent to Sentry r=rstewart https://hg.mozilla.org/integration/autoland/rev/a033282e742d document mach error reporting r=rstewart
Pushed by mhentges@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/92a8f97c9487 report |./mach| errors with Sentry r=rstewart https://hg.mozilla.org/integration/autoland/rev/7d6e0d798df2 Patch Sentry events to ensure a raw username isn't sent to Sentry r=rstewart https://hg.mozilla.org/integration/autoland/rev/201179dac044 document mach error reporting r=rstewart
Flags: needinfo?(mhentges)
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla78 → mozilla79
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: