Closed
Bug 1284240
Opened 8 years ago
Closed 8 years ago
Telemetry for seccomp-bpf support looks faulty
Categories
(Core :: Security: Process Sandboxing, defect)
Tracking
()
RESOLVED
FIXED
mozilla50
People
(Reporter: gcp, Unassigned)
References
Details
(Whiteboard: sblc2)
Attachments
(1 file)
58 bytes,
text/x-review-board-request
|
gfritzsche
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
https://sql.telemetry.mozilla.org/queries/622/source#table For some kernel versions that outright should have seccomp, there still a lot more false entries than true entries. Either the detection is not working correctly, or Telemetry is sent before the setup code has had a chance to run. Due to the use of a "flag" variable, this may cause false "false's" to be sent.
Comment 1•8 years ago
|
||
You grab values from each submission of each client, so you can run into pseudoreplication (clients that submit often will bias the distribution). You could just grab the latest value for each client (`sandbox_capabilities_seccomp_bpf[1]` etc.). I would limit this to the "release" & "beta" channels (`WHERE channel IS ...`); especially Nightly can have noisy developer data.
Comment 2•8 years ago
|
||
Telemetry is not sent very early - we don't start sending the pings that collect this data until 60 seconds after startup. If you want to compare this explicitly to make sure, we could only look at those sent on shutdown (`WHERE reason = 'shutdown'`) or filter out data with short session lengths.
Reporter | ||
Comment 3•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] from comment #1) > You grab values from each submission of each client, so you can run into > pseudoreplication (clients that submit often will bias the distribution). > You could just grab the latest value for each client > (`sandbox_capabilities_seccomp_bpf[1]` etc.). I doubt this is the case. This would require those strange systems with a new kernel yet no seccomp-bpf to systematically be the ones over-reporting. That makes little sense? > I would limit this to the "release" & "beta" channels (`WHERE channel IS > ...`); especially Nightly can have noisy developer data. I don't think the Telemetry has propagated there yet, as it was introduced in 49.
Comment 4•8 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #3) > (In reply to Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] from > comment #1) > > You grab values from each submission of each client, so you can run into > > pseudoreplication (clients that submit often will bias the distribution). > > You could just grab the latest value for each client > > (`sandbox_capabilities_seccomp_bpf[1]` etc.). > > I doubt this is the case. This would require those strange systems with a > new kernel yet no seccomp-bpf to systematically be the ones over-reporting. > That makes little sense? I wouldn't know, but its easy to fix and check. Sampling once per client is a good practice in general.
Reporter | ||
Comment 5•8 years ago
|
||
I did: SELECT os, system_os[1].version AS osversion, sandbox_capabilities_seccomp_bpf[1] AS seccomp, COUNT(1) AS hits FROM longitudinal WHERE sandbox_capabilities_seccomp_bpf[1] IS NOT NULL AND reason[1] = 'shutdown' GROUP BY os, system_os[1].version, sandbox_capabilities_seccomp_bpf[1] ORDER BY os, system_os[1].version DESC, hits DESC and SELECT os, system_os[1].version AS osversion, sandbox_capabilities_seccomp_bpf[1] AS seccomp FROM longitudinal WHERE sandbox_capabilities_seccomp_bpf[1] IS NOT NULL AND reason[1] = 'shutdown' ORDER BY os, system_os[1].version DESC which mostly reveals that only a few people run Linux ;-) Seccomp enabled or not is still wildly inconsistent with kernel versions, but appears to shift for the same users, which makes no sense.
Comment 6•8 years ago
|
||
I changed this a little to always use the latest available shutdown ping etc., but the numbers are still very low: https://sql.telemetry.mozilla.org/queries/634/source#table Cross-checking with the Aurora population numbers on the Firefox dashboard here, the user numbers we see here are definitely too low: https://metrics.services.mozilla.com/firefox-dashboard/
Comment 7•8 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #5) > Seccomp enabled or not is still wildly inconsistent with kernel versions, > but appears to shift for the same users, which makes no sense. I think there are currently very few users, so its hard to tell whether its noise or an actual issue. For now, how about: * looking at the histories per user in more detail to see if a pattern stands out (e.g. false when session_length is <Nsec, but true otherwise) * testing the behavior locally to confirm the probe works as intended
Reporter | ||
Comment 8•8 years ago
|
||
https://sql.telemetry.mozilla.org/queries/641/source The data is quite consistent per user, that is if they have "true" it generally stays so, and if it's "false" it also generally stays so. I see no relation to session length. Looks like the detection might be faulty, then?
Comment 9•8 years ago
|
||
I've downgraded my Ubuntu development VM from kernel 4.4.0-28-generic to 4.4.0-22-generic (a version that shows up in the linked data with a lot of "false" but also some "true") to test this, with current mozilla-central, and I'm seeing "true" everywhere in about:support#sandbox and "1" everywhere in about:telemetry, as expected. For whatever that's worth. The test in SandboxInfo.cpp is fairly simple: calls prctl(PR_SET_SECCOMP, 2, nullptr), if it fails with EFAULT then it's supported, if it fails with EINVAL then it's not supported, and if anything else happens (other error code or success) then a there's a MOZ_DIAGNOSTIC_ASSERT. (Chromium uses the same test, but without the assertions, so if we're actually reading a false negative at this layer then so are they.) The telemetry side of this has a lot more “moving parts”, so I'd tend to wonder if something has gone wrong there. Another thing I'm wondering: are these actually our builds, or could some of them be modified downstream builds somehow? Using --disable-sandbox will disable building everything in security/sandbox and won't poke values into telemetry, but the probes as specified in the JSON are conditional only on XP_LINUX. We might be able to get information from the build IDs, if nothing else.
Reporter | ||
Comment 10•8 years ago
|
||
(In reply to Jed Davis [:jld] [⏰PDT; UTC-7] from comment #9) > Another thing I'm wondering: are these actually our builds, or could some of > them be modified downstream builds somehow? That's a possibility, although given that this Telemetry mostly comes from Nightly, distro repacks shouldn't be a thing, and 3rd party builds would need to have explicitly opted out of the sandbox.
Reporter | ||
Comment 11•8 years ago
|
||
Don't see any obvious correlations between build_id and getting seccomp on or off.
Updated•8 years ago
|
Whiteboard: sblc2
Reporter | ||
Comment 12•8 years ago
|
||
I got a local repro for this problem. No exact steps, but: 1) On startup, SandboxInfo::SubmitTelemetry() was called. 2) After running for a while, about:telemetry showed SANDBOX_CAPABILITIES_SECCOMP_BPF 1 samples, average = 0, sum = 0 All Sandbox histograms were 0. About:support showed all true.
Reporter | ||
Comment 13•8 years ago
|
||
Values are OK right after startup: SANDBOX_CAPABILITIES_SECCOMP_BPF 1 samples, average = 1, sum = 1
Comment 14•8 years ago
|
||
Hm, so the problem seems to be that: * in the first subsession, the right values are submitted * later we start a new subsession (e.g. on local midnight or addons being installed or disabled) * the new subsession resets flags to their default (false) value This should be easy to reproduce with enabling/disabling an addon and watching the value change in about:telemetry. Switching the "show subsession data" box on and off should show the two different values. I never liked the "default to false" behavior of flags, i think it is confusing - but we can't break that right now. In bug 1281214 though we are about to implement a "boolean scalar value" measurement, which matches your intent better. So you should be able to: * in re:dash, look at the first subsessions/pings only instead of the shutdown one (subsession_counter = 1) * if the non-subsession value looks correct: * look at the old saved-session pings, which contain values for the whole session * either on t.m.o [1] * or raw in a Spark analysis [1] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-07-07&keys=__none__!__none__!__none__&max_channel_version=nightly%252F50&measure=SANDBOX_CAPABILITIES_SECCOMP_BPF&min_channel_version=null&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2016-06-06&table=0&trim=1&use_submission_date=0
Reporter | ||
Comment 15•8 years ago
|
||
You are right, I could reproduce it with those steps. Given that we actually discussed boolean vs flag in the original bug, and still shot ourselves in the foot, I think it might be a good idea to mark flag as deprecated and not recommend it any more. I wonder if there's any other flag telemetry that is broken in the same way.
Comment 16•8 years ago
|
||
I think flag histograms are working, just counter-intuitively. At this stage i want to wait a short time and then push people on using scalar measurements (i.e. we will soon disallow adding new flag & count histograms). Does this work for you for now, checking the value on the t.m.o dashboard or on sql.t.m.o grabbing by subsession_counter=1?
Reporter | ||
Comment 17•8 years ago
|
||
It appears to be more consistent per user, but I'm still seeing a lot of false on kernels where we know it's supposed to be supported. So I don't know what to think.
Reporter | ||
Comment 18•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/65918/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/65918/
Attachment #8773191 -
Flags: review?(gfritzsche)
Reporter | ||
Comment 19•8 years ago
|
||
I still see strange behavior in the Telemetry. Because of this bug, even after filtering on subsession_counter, I'm still not confident I can trust the data I'm seeing. But we do need this data, so I want to change the Telemetry so I know I can trust the result. This renames the telemetry and changes everything from flag to boolean. I verified that this is set in the first subsession, and nothing gets set for subsequent ones.
Comment 20•8 years ago
|
||
Comment on attachment 8773191 [details] Bug 1284240 - Use boolean rather than flag for Sandboxing Telemetry. https://reviewboard.mozilla.org/r/65918/#review62854 If that solves your problem, lets do it. I'm looking forward to when we (soon) can point people to scalar booleans for this kind of usage.
Attachment #8773191 -
Flags: review?(gfritzsche) → review+
Comment 21•8 years ago
|
||
Pushed by gpascutto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0ce00a7b3b58 Use boolean rather than flag for Sandboxing Telemetry. r=gfritzsche
Comment 22•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0ce00a7b3b58
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Reporter | ||
Comment 23•8 years ago
|
||
Comment on attachment 8773191 [details] Bug 1284240 - Use boolean rather than flag for Sandboxing Telemetry. Approval Request Comment [Feature/regressing bug #]: Bug 1098428 [User impact if declined]: No visible user impact. Useless data is collected. [Describe test coverage new/current, TreeHerder]: Landed on Nightly a few days ago. [Risks and why]: Very low. No user exposed functionality. [String/UUID change made/needed]: N/A.
Attachment #8773191 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
|
status-firefox49:
--- → affected
Comment 24•8 years ago
|
||
Comment on attachment 8773191 [details] Bug 1284240 - Use boolean rather than flag for Sandboxing Telemetry. This patch fixes collecting useless telemetry data. Take it in aurora.
Attachment #8773191 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 25•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/8d3761e5445c
You need to log in
before you can comment on or make changes to this bug.
Description
•