Closed Bug 1098428 Opened 5 years ago Closed 3 years ago

Add Linux sandboxing information to Telemetry

Categories

(Core :: Security: Process Sandboxing, defect)

All
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: jld, Assigned: gcp)

References

Details

(Whiteboard: sblc1 [measurement:client])

Attachments

(2 files)

Bug 1077057 added information about Linux sandbox status to nsSystemInfo — whether the system supports the security feature we're using, which child process types the build could sandbox and which of those can be sandboxed on that system.  (In the future this will expand to security feature*s*, plural.)  It's simple to add this to telemetry and FHR alongside the other system info.

The information can be determined from the kernel version string we already collect, if it's a reasonably well-known Linux distribution and not a custom build; this is what I've been doing so far for statistical purposes, but it would be better to measure it directly.
Move process sandboxing bugs to the new Bugzilla component.

(Sorry for the bugspam; filter on 3c21328c-8cfb-4819-9d88-f6e965067350.)
Component: Security → Security: Process Sandboxing
Whiteboard: sblc1
Note from aaklotz: there's an environment section in Telemetry probes, consider using that for the capability flags we currently have in about:support.
Assignee: jld → gpascutto
Summary: Add Linux sandboxing status to Telemetry and FHR → Add Linux sandboxing information to Telemetry
See Also: → CVE-2017-5426
Attachment #8736433 - Flags: review?(gfritzsche)
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

https://reviewboard.mozilla.org/r/43285/#review40093

This is adding quite a few data points to the environment.
Is there a specific question that you want to answer from this data?
Can we reduce the data to specifically answer that question?
Also, is this something we intend to collect indefinitely?
Or just short-term to answer a specific question?
We're trying to see how common some of the more problematic Linux configurations are (i.e. where certain kernel/system features are not compiled in or disabled) that require either a) running with sandboxing disabled b) jumping through high hoops in order to be able to enable it as well as seeing how common features are that allow a stricter sandbox to be used.

We currently don't have any data on this at all, which makes it hard to prioritize sandboxing work. I would imagine that if some features turn out to be extremely common (or eventually become so as people upgrade), we can drop them from Telemetry.

We'd also want this info when analyzing crashes.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
> We're trying to see how common some of the more problematic Linux
> configurations are (i.e. where certain kernel/system features are not
> compiled in or disabled) that require either a) running with sandboxing
> disabled b) jumping through high hoops in order to be able to enable it as
> well as seeing how common features are that allow a stricter sandbox to be
> used.

Is it possible to reduce this to, say, boolean properties like "canEnableSandbox" and "allowsStricterSandbox"?
That would be possible *after* we collect this data and make sensible decisions what we baseline need to support so most of our users "canEnableSandbox" and what we can reasonably expect to use for "StricterSandbox".

Without *any* data, as is currently the case, I have no way of telling what should go in both options!

The alternative is perhaps to dig this data out of the kernel version string and making some assumptions about distributions, as the original comment suggests, but that has all the pitfalls of not measuring the thing you need to know directly.
Ok. With this being only a temporary and exploratory use for one use-case, it seems like this doesn't need to go into the environment (which is submitted with other pings as well).
I'd rather see this as a separate property on the pings payload (i.e. add the data around [0]).

Benjamin, do you have an opinion on this from the data steward perspective?
Flags: needinfo?(benjamin)
Not a strong opinion, no. You're right that this doesn't belong in the environment because it's not a correlate.
Flags: needinfo?(benjamin)
Let's proceed with adding it at [0] as proposed in comment 9 then.
If it is sufficient to get that data from beta/aurora/nightly, then adding it only for Telemetry.canRecordExtended is better.
We should add documentation for this to [1].

0: https://dxr.mozilla.org/mozilla-central/rev/a235bfcc8c411169b82420c503775c1a3e7edad5/toolkit/components/telemetry/TelemetrySession.jsm#1253
1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/docs/main-ping.rst
Whiteboard: sblc1 → sblc1 [measurement:client]
Also, let's stay with the existing naming convention here:
`osCapabilities`, `userNamespaces`, `isEnabled`, etc.
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43285/diff/1-2/
Attachment #8736433 - Flags: review?(gfritzsche)
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

https://reviewboard.mozilla.org/r/43285/#review42551

Sorry for the delay, i was busy with a work-week.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1340
(Diff revision 1)
> +    let sandboxmedia = getSysinfoProperty("canSandboxMedia", null);
> +    if (sandboxmedia === null) {
> +      sandboxmedia = false;
> +    }
> +    return {
> +      os_capabilities: {

Let's stick with the naming convention here:
`osCapabilities`, `userNamespaces`, `isEnabled`, etc.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:9
(Diff revision 2)
>  
>  "use strict";
>  
>  this.EXPORTED_SYMBOLS = [
>    "TelemetryEnvironment",
> +  "getSysinfoProperty"

Left-over, please remove.

::: toolkit/components/telemetry/TelemetrySession.jsm:25
(Diff revision 2)
>  Cu.import("resource://gre/modules/Task.jsm");
>  Cu.import("resource://gre/modules/Timer.jsm");
>  Cu.import("resource://gre/modules/TelemetrySend.jsm", this);
>  Cu.import("resource://gre/modules/TelemetryUtils.jsm", this);
>  Cu.import("resource://gre/modules/AppConstants.jsm");
> +Cu.import("resource://gre/modules/TelemetryEnvironment.jsm", this);

Was there a reason for making the import non-lazy?

::: toolkit/components/telemetry/TelemetrySession.jsm:1245
(Diff revision 2)
> +   * Get the information about sandboxing capabilities
> +   */
> +  getSandboxCapabilities: function () {
> +    // The following two are null if not present, as opposed to
> +    // the others that are true/false.
> +    let sandboxcontent = getSysinfoProperty("canSandboxContent", null);

`getSysinfoProperty()` is not defined in this file.
Did you verify this works?
Could you add basic test coverage at:
https://dxr.mozilla.org/mozilla-central/rev/f3f2fa1d7eed5a8262f6401ef18ff8117a3ce43e/toolkit/components/telemetry/tests/unit/test_TelemetrySession.js#245

::: toolkit/components/telemetry/TelemetrySession.jsm:1246
(Diff revision 2)
> +    if (sandboxcontent === null) {
> +      sandboxcontent = false;

This pattern seems redundant.
Let's just use `sandboxMedia: getSysinfoProperty("canSandboxContent", false)` below?

::: toolkit/components/telemetry/TelemetrySession.jsm:1249
(Diff revision 2)
> +    let sandboxmedia = getSysinfoProperty("canSandboxMedia", null);
> +    if (sandboxmedia === null) {

Ditto.

::: toolkit/components/telemetry/docs/main-ping.rst:450
(Diff revision 2)
>        ...
>      ],
> +
> +sandboxCaps
> +-----------
> +This sections contains the capabilities of the host operating system (Linux only) that can be used to enable a security sandbox, and whether the sandbox was consequently enabled for content and/or media plugins.

Can you please add detail for the contents here?
E.g. at least a commented example of the JSON format?
Attachment #8736433 - Flags: review?(gfritzsche)
Also, do you need this data from the full release population (opt-out)?
Or is it enough to receive it from the testing channels and the release users that opted into Telemetry?

If the latter, we should only add this if `Telemetry.canRecordExtended` is true.
Actually, i see those measurements are all booleans now.
Could we just add them as "flag" histograms [0] (with a Linux "cpp_guard") and you can record them from e.g. the sandboxing code?
That makes them visible in a common location, allows specifying opt-out/-in & expiry and makes them automatically visible on the telemetry.mozilla.org dashboard (for the opt-in population).

0: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43285/diff/2-3/
Attachment #8736433 - Attachment description: MozReview Request: 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche → MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche
Attachment #8736433 - Flags: review?(gfritzsche)
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

https://reviewboard.mozilla.org/r/43285/#review50334

::: toolkit/components/telemetry/Histograms.json:10687
(Diff revision 3)
>    },
> +  "SANDBOX_CAPABILITIES_SECCOMP_BPF": {
> +    "alert_emails": ["gcp@mozilla.com"],
> +    "bug_numbers": [1098428],
> +    "expires_in_version": "55",
> +    "kind": "boolean",

You'll want to use "flag" for those.
"boolean" allows to record multiple true/false values, "flag" is for exactly one (and defaults to false).
Attachment #8736433 - Flags: review?(gfritzsche)
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43285/diff/3-4/
Attachment #8736433 - Flags: review?(gfritzsche)
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

https://reviewboard.mozilla.org/r/43285/#review50664

This looks good to me.
Note that you also need data collection review before landing this:
https://wiki.mozilla.org/Firefox/Data_Collection
Attachment #8736433 - Flags: review?(gfritzsche) → review+
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

Most of the reason for this was covered in comment 6. The current patch will not make this information available in crash reports (because it's no longer in the environment). We will circle back to that (for all platforms, but with less flags) in a separate bug.

The goal of this is to understand what sandboxing feature configurations are available "in the wild", in what percentage, and what the configurations (e.g. distros) which are missing the essential ones actually are. This will be used by the sandboxing team to prioritize work on the Linux sandbox. We foresee that this telemetry can be removed eventually as we make decision on what and what not to support.
Attachment #8736433 - Flags: feedback?(benjamin)
Blocks: 1274540
Comment on attachment 8736433 [details]
MozReview Request: Bug 1098428 - Add Linux sandboxing information to Telemetry. r?gfritzsche

data-review=me
Attachment #8736433 - Flags: feedback?(benjamin) → feedback+
sorry had to back this out for crashes on linux like https://treeherder.mozilla.org/logviewer.html#?job_id=28543353&repo=mozilla-inbound#L2216
Flags: needinfo?(gpascutto)
It looks like you are not allowed to do an Accumulate(foo, 0) on a flag histogram:
https://dxr.mozilla.org/mozilla-central/rev/46fe2115d46a5bb40523b8466341d8f9a26e1bdf/ipc/chromium/src/base/histogram.cc#1004

The documentation is rather confusing on this.
Flags: needinfo?(gpascutto)
Right, flags historically have strange semantics.
Flag histograms default to false/0 if you don't accumulate into them, so it's fine to just ever record 1.

Which documentation did you look at? Maybe we can improve that.
(In reply to Georg Fritzsche [:gfritzsche] from comment #28)
> Right, flags historically have strange semantics.
> Flag histograms default to false/0 if you don't accumulate into them, so
> it's fine to just ever record 1.

It's not that it's fine to just ever record 1, recording 0 is simply *not allowed* at all. That surprised me. That is, I understand that it's only *necessary* to record 1, but it's not clear from the docs (nor from your statement) that it's only exclusively allowed to record 1.

So you can't do (as this code was doing):

Accumulate(FOO, result_of_operation());

It has to be:

if (result_of_operation())
  Accumulate(FOO, 1);
 
> Which documentation did you look at? Maybe we can improve that.

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe
(In reply to Gian-Carlo Pascutto [:gcp] from comment #29)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #28)
> > Right, flags historically have strange semantics.
> > Flag histograms default to false/0 if you don't accumulate into them, so
> > it's fine to just ever record 1.
> 
> It's not that it's fine to just ever record 1, recording 0 is simply *not
> allowed* at all. That surprised me. That is, I understand that it's only
> *necessary* to record 1, but it's not clear from the docs (nor from your
> statement) that it's only exclusively allowed to record 1.

Its surprising enough that i didn't remember it.
We could just allow and ignore it, on the other hand it is useful now to uncover unneeded Accumulate(x,0) calls.
Unfortunately MozReview seems to break on backed out patches (bug 1275251).
Attachment #8755841 - Flags: review?(gfritzsche)
https://hg.mozilla.org/mozilla-central/rev/d0d5bd7a4077
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
(In reply to Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] from comment #19)
> You'll want to use "flag" for those.
> "boolean" allows to record multiple true/false values, "flag" is for exactly
> one (and defaults to false).

In retrospect this is causing me some issues. It looks like there may be callpaths where we submit Telemetry before the code added here has had a chance to run, causing the "false" proportion to look much larger than it is.

Either that or we have a real issue. But I'm annoyed it's hard to tell.
Can we take this to a new bug or so and discuss the exact problem you are having?
Filed bug 1284240.
Depends on: 1284240
You need to log in before you can comment on or make changes to this bug.