Report mach errors to Sentry
Categories
(Firefox Build System :: Mach Core, enhancement)
Tracking
(firefox78 wontfix, firefox79 fixed)
People
(Reporter: mhentges, Assigned: mhentges)
Details
Attachments
(5 files, 3 obsolete files)
As discovered in 1634574, it's viable to integrate Sentry with ./mach
in-tree.
Goals:
- Identical exceptions from the same machine are de-duped
- Build failures/"expected failures" aren't reported to Sentry
- Mach (or subcommand) issues are reported
Comment 1•5 years ago
|
||
It feels like this should be subject to the same policy as build system telemetry (currently opt-in, maybe opt-out for employees in the future)
Assignee | ||
Comment 2•5 years ago
|
||
That's true, the current implementation I have honours the telemetry setting in machrc
.
Assignee | ||
Comment 3•5 years ago
|
||
These errors are reported to the "mach" project here:
https://sentry.prod.mozaws.net/operations/mach/
Should only reports exceptions if mach._run()
raises, which should always be a bug in
mach or a loaded command module.
Assignee | ||
Comment 4•5 years ago
|
||
urllib3 is needed by Sentry for its HTTP communication.
Assignee | ||
Comment 5•5 years ago
|
||
These errors are reported to the "mach" project here:
https://sentry.prod.mozaws.net/operations/mach/
Should only report exceptions if mach._run()
raises, which should always be a bug in
mach or a loaded command module.
Depends on D74737
Updated•5 years ago
|
:chutten, this is an expansion of data collection under the existing enable-flag for build system telemetry (data review in bug 1291053). Does it need its own data review?
Assignee | ||
Comment 9•5 years ago
|
||
In the implementation, this uses the telemetry
setting in .mozbuild
's machrc
to control whether exceptions are reported.
:glob has confirmed that this doesn't need its own data review - it sounds like Sentry reporting (exception reporting) is separate from telemetry.
Comment 10•5 years ago
|
||
Backed out for build bustages.
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302007120&repo=autoland&lineNumber=315
Backout: https://hg.mozilla.org/integration/autoland/rev/15194a4fd69286782329ffca6739e636a991fe8f
Assignee | ||
Comment 11•5 years ago
|
||
Hmm, weird. certifi
is listed in virtualenv_packages.txt
- maybe it's a Windows-specific issue? I'll dig in.
Comment 12•5 years ago
|
||
bugherder |
Comment 13•5 years ago
|
||
(In reply to :dmajor from comment #8)
:chutten, this is an expansion of data collection under the existing enable-flag for build system telemetry (data review in bug 1291053). Does it need its own data review?
All new or expanded data collections require Data Collection Review. Similar to how new Histograms in Firefox need review despite that they're all covered by the same Telemetry setting, new data collections in mach
need review too.
:mhentges and :glob you might instead be thinking about Trust/Legal review not being needed because mach
already has a well-known and privacy-notice-compliant user preference system for data collection. That's the heavier-weight process involving emails and needinfos to make sure the data collection mechanisms we're adding are compliant.
Please request Data Collection Review at your soonest convenience. No need to backout if we get this filed away soonish.
Assignee | ||
Comment 14•5 years ago
|
||
Ah, interesting, ok.
I'm going to NI :glob here to confirm^.
(also re-opening this bug - there were two patches, but one still needs to land)
Updated•5 years ago
|
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
Comment 17•5 years ago
|
||
Comment 18•5 years ago
|
||
Assignee | ||
Comment 19•5 years ago
•
|
||
I can answer one of these: for Q5, the file names will include full paths - see this Sentry issue that includes a full path. (Note that normally you have to sign in to see Sentry issues, I explicitly made that example public for simplicity).
I'll defer to :glob for the other questions :)
Comment 20•5 years ago
|
||
please document the collection in a place where users subject to this collection might find it
Mitch - please add this as part of your next revision. It should probably be a sibling of the existing build-telemetry docs (build/docs/telemetry.md
-> https://firefox-source-docs.mozilla.org/build/buildsystem/telemetry.html)
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
Hi,
I have two questions for either :glob or/Mitch:
- Is there a reason we'd need the full file path and can't work with a sanitized version?
- I'd like to take a look at the current disclosure for error reporting for mach. Can someone point me to the right place?
Comment 23•5 years ago
|
||
(In reply to Alicia Gray from comment #22)
- Is there a reason we'd need the full file path and can't work with a sanitized version?
Sentry has the ability to filter data prior to sending to our server with a before_send
filter.
Unfortunately this won't be 100% as it's impossible to determine with certainty which components of a path are a username or just another directory name. For paths under /home
, /Users
, or C:\Users
we can assume the next path component is a username, but it's possible to change the location of home directories in most operating systems.
For error messages this is might be trickier as we'd have to find things that are "path-like" in addition to picking out the username.
As error reporting is gated behind the same preference which sends full blown telemetry I'm not sure if sanitising potential usernames provides us any privacy gains.
Let us know if we should do a "best effort" to remove usernames from filenames and error messages – this most likely means masking just the default locations.
- I'd like to take a look at the current disclosure for error reporting for mach. Can someone point me to the right place?
mach currently doesn't have an error reporting facility.
Review for the mach telemetry submission appears to have happened in bug 1291053.
Comment 24•5 years ago
|
||
(In reply to :glob (Byron Jones) 🎈 from comment #23)
Unfortunately this won't be 100% as it's impossible to determine with certainty which components of a path are a username or just another directory name. For paths under
/home
,/Users
, orC:\Users
we can assume the next path component is a username, but it's possible to change the location of home directories in most operating systems.For error messages this is might be trickier as we'd have to find things that are "path-like" in addition to picking out the username.
Rather than (or in addition to) look for specific absolute paths, I wonder if it would be possible to filter out whatever the prefix of the topsrcdir, the statedir, and the objdir if there is one. My guess is that those are the paths that are most likely to occur, that have user names in them.[1]
[1] The only other one that seems at all likely is python library paths, if someone is using pyenv.
Assignee | ||
Comment 25•5 years ago
|
||
It's true that we could have a deterministic way of identifying absolute paths including the topsrcdir, but we have two situations in which paths with usernames could leak through:
- The case where an absolute path is to something outside the topsrcdir, like
/home/user/Downloads/omni.ja
- The case where an absolute path appears in the Sentry event in a location we aren't expecting
- There's a bunch of places where absolute values could pop up, like the
event.extra['sys.argv']
property or withinevent.exception.<snip>.frames[].(abs_path or vars)
(abs_path
being the path to the source file,vars
being the variables in that frame) - Arguably we could find them all by brute-force searching through the entire event object, but that's :/
- There's a bunch of places where absolute values could pop up, like the
Assignee | ||
Comment 26•5 years ago
|
||
Depends on D74738
Comment 28•5 years ago
|
||
(In reply to :glob (Byron Jones) 🎈 from comment #27)
NI to make sure you see my response in comment 23.
Hi there,
Back from PTO.
Re: the file path - I think a best effort for sanitizing to remove usernames from filenames and error messages including absolute paths like the topsrcdir is fine.
Re: the disclosure I'm referring to in comment 22 is actually from Bug 1291053 in Comment 28. There should be a mach first run disclosure language in place that includes a link to the public documentation on the data collection that says: Mozilla collects data about local builds in order to make builds faster and improve developer tooling. To learn more about the data we intend to collect read here (url tbd). If you have questions, please ask in #build in irc.mozilla.org. If you would like to opt out of data collection, select (N) at the prompt.
We have a couple of things that need to be done. At minimum, the reference to IRC should get updated for Matrix, assuming there is a Matrix channel for #build. I'd like to see the URL that is referenced in this disclosure to read what we currently tell people and if any update will be needed for this.
Comment 29•5 years ago
|
||
Great - thanks Alicia :)
Here's the current first-run message (from python/mozboot/mozboot/bootstrap.py
:
Build system telemetry
Mozilla collects data about local builds in order to make builds faster and
improve developer tooling. To learn more about the data we intend to collect
read here:
https://firefox-source-docs.mozilla.org/build/buildsystem/telemetry.html.
If you have questions, please ask in #build in irc.mozilla.org. If you would
like to opt out of data collection, select (N) at the prompt.
Would you like to enable build system telemetry?
Good catch on the IRC reference, we can update that to point to Matrix as part of this work.
Updating the referenced documentation to include Sentry has already been done (https://phabricator.services.mozilla.com/D78409).
Assignee | ||
Comment 30•5 years ago
|
||
Looks like I'm in luck, the IRC message was updated recently.
I'll purge paths on a best-effort basis, thanks!
Assignee | ||
Comment 31•5 years ago
|
||
To avoid sending identifying information, common absolute paths are patched with placeholder values. For example, devs
may place their Firefox repository within their home dir, so absolute paths are doctored to be prefixed with
"<topsrcdir"> instead.
However, this alone doesn't resolve the problem, since Sentry exceptions capture variables on the stack, and sometimes
mach will put the contents of ENV into a local variable. Such ENV variables may have paths unrelated to Firefox that
include user names (e.g.: XDG_DATA_DIRS).
So, this revision also does a blanket "find-and-replace" for usernames within the Sentry event.
Depends on D74738
Assignee | ||
Comment 32•5 years ago
|
||
I've added a patch to add some anonymization, but it's not perfect.
As recommended by Tom, I have it patching the following directories:
/home/mitch/firefox
=><topsrcdir>
/home/mitch/firefox/obj-x86_64-pc-linux-gnu
=><topobjdir>
/home/mitch/.mozbuild
=><statedir>
However, this didn't catch all the cases where usernames can pop up.
Sentry sends information about the variables in scope when an exception is triggered, which are very useful for debugging issues. However, these variables may carry identifying information: for example, if an exception is thrown in a part of mach where os.environ
is in scope, then environment variables may have usernames (or paths including usernames) that aren't one of the srcdir/objdir/statedir that are being specifically patched. Say, on Linux, the XDG_DATA_DIRS
variable contains paths that include my username.
I'm not sure if we can purge this information. If we do a blanket replacement of the username to something like <user>
, then we might incorrectly replace a false positive. For example, if a username is mark
, and they run into an exception in remark.py
, then the error will show up in Sentry as originating from re<user>.py
. This is not only is trivial to de-anonymize, but also can obscure exception information.
Glob, is the directory-replacement sufficient? Perhaps this is a question for Alicia?
Comment 33•5 years ago
|
||
Rather than looking for just the username, you could instead replace the full homedir string.
ie. replace /home/mark/
with ~/
.
Assignee | ||
Comment 34•5 years ago
|
||
Sounds good, setting that up at the moment.
Note that there's cases where a bare username is in a variable. For example, macos exposes an environment variable called $USER
which contains the user's name. Depending on where an exception is triggered, Sentry may pick this up.
I can write a filter to look for the specific variable in which its kept (called orig_env
) and scrub it, but this variable's name may change in the future.
Alternatively, I can do a wider purge of all variables or dictionary values with the name USER
, but there may be aliasing happening elsewhere in the code, which wouldn't be caught by this solution.
Another possible solution is that we can specifically scrub different parts of the Sentry payload differently. For example, we can perform path scrubbing everywhere, then perform username scrubbing (replacing the name with <user>
) only within the vars
data structure of the event. This allows us to ensure that the username is entirely removed without breaking the source code and references.
The downside here is that it's not clear that the Sentry payload isn't going to quietly change. If the structure changes, our intelligent scrubbing logic could look in the wrong place for local variables, allowing usernames to sneak through again.
:glob, what do you think?
Comment 35•5 years ago
|
||
(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #34)
Sounds good, setting that up at the moment.
Note that there's cases where a bare username is in a variable. For example, macos exposes an environment variable called
$USER
which contains the user's name. Depending on where an exception is triggered, Sentry may pick this up.
Scrubbing all occurrences of a username in any strings will certainly lead to removing of things that aren't usernames (especially if your username happens to be a programming term).
Just scrubbing homedirs is sufficient to satisfy "best effort".
I'd feel differently if this wasn't opt-in along with other telemetry data that is likely to contain equally, if not more, sensitive data.
Assignee | ||
Comment 36•5 years ago
|
||
Sounds good, I'll land tomorrow morning PST so I can supervise the initial bugs being reported in
Comment 37•5 years ago
|
||
Comment 38•5 years ago
|
||
Backed out 3 changesets (bug 1636251) for phyton failures in python/mach/mach/test/test_conditions.py.
Backout: https://hg.mozilla.org/integration/autoland/rev/568afe2cba6c957f8d7d9ceb2b0ef674648209ea
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305955090&repo=autoland&lineNumber=356
Comment 39•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 40•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Description
•