Open Bug 1467826 Opened 6 years ago Updated 2 years ago

Enforce a strict schema for keyed histograms' keys

Categories

(Toolkit :: Telemetry, enhancement, P3)

Tracking

Tracking Status
firefox62 --- affected

People

(Reporter: Dexter, Unassigned)

References

Details

One possible remedy to prevent something like bug 1467809 from happening again would be to have some sort of validation on the key names for keyed histograms and scalars.

While this would make it less flexible for users, it would probably be safer.
Is this something we can do effectively on the client?
Or is there maybe something to do in CI or in monitoring incoming data?
Flags: needinfo?(alessio.placitelli)
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> Is this something we can do effectively on the client?
> Or is there maybe something to do in CI or in monitoring incoming data?

That pretty much depends on what we want to check: if we go for some regexp for detecting full paths, we might be able to do that on the client (a rough sketch of such a check follows below). Maybe we can talk through what we want/need here: is detecting full paths enough?
Flags: needinfo?(alessio.placitelli)
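For illustration, a minimal Python sketch of the "regexp for detecting full paths" idea; the pattern, helper name, and sample keys are assumptions only, and a real check would live in the Telemetry client code rather than in Python:

import re

# Hypothetical heuristic: flag keyed-histogram keys that look like filesystem paths.
# The pattern below is an illustrative guess, not a proposed final rule.
PATH_LIKE = re.compile(r"""
    ^[A-Za-z]:\\      |   # Windows drive prefix, e.g. C:\
    ^/                |   # absolute Unix path
    [\\/].*[\\/]          # more than one path separator anywhere in the key
""", re.VERBOSE)

def looks_like_path(key):
    """Return True if a keyed-histogram key looks like a full path."""
    return bool(PATH_LIKE.search(key))

for key in ["dns", "/home/user/.mozilla/firefox/abc.default", r"C:\Users\me\Desktop"]:
    print(key, looks_like_path(key))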
A related discussion would be whether we want to be responsible for this. If we start validating input to try to catch certain leakages we may provide a false sense of security. Might be a stewardship thing, with engineering support, to try and leak data via Telemetry and see where client engineering, pipeline engineering, and policy need to step in to prevent it.

I recall my days working on BlackBerry OS4.6 and we had the concept of a "zebra test" to test the data-at-rest encryption. I mostly don't remember it, but it went something like: put a known, uncommon phrase like "zebra" into all the text fields and storage available to users. Then lock the phone (to trigger the encryption), take a ramdump, and see if the phrase appeared unencrypted.

Maybe we (or stewardship, or PI, or...) could have a similar test where we use, say, marionette to fill every known input with zebras and then ensure the telemetry payload has none of it? (even going so far as to call the profile "zebra" and have it be run by the "zebra" user on all platforms). I think this would be worthy work, the largest stumbling block being finding a team with a broad enough product knowledge to be able to find all of these inputs.
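For the payload-scanning half of such a test, a rough Python sketch, assuming a previously captured main ping saved as JSON; the sentinel value, file name, and traversal are illustrative assumptions, not an existing test:

import json

MARKER = "zebra"  # hypothetical sentinel typed into every input field beforehand

def find_marker(node, path=""):
    """Recursively yield JSON paths in a ping payload that contain the sentinel."""
    if isinstance(node, dict):
        for k, v in node.items():
            if MARKER in str(k).lower():
                yield "%s.%s (key)" % (path, k)
            for hit in find_marker(v, "%s.%s" % (path, k)):
                yield hit
    elif isinstance(node, list):
        for i, v in enumerate(node):
            for hit in find_marker(v, "%s[%d]" % (path, i)):
                yield hit
    elif MARKER in str(node).lower():
        yield "%s = %r" % (path, node)

with open("main-ping.json") as f:  # assumed path to a captured ping
    leaks = list(find_marker(json.load(f)))
assert not leaks, "sentinel leaked into the payload: %r" % leaks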
(In reply to Chris H-C :chutten from comment #3)
> A related discussion would be whether we want to be responsible for this. If
> we start validating input to try to catch certain leakages we may provide a
> false sense of security. Might be a stewardship thing, with engineering
> support, to try and leak data via Telemetry and see where client
> engineering, pipeline engineering, and policy need to step in to prevent it.

I think you're right about this.

> I recall my days working on BlackBerry OS4.6 and we had the concept of a
> "zebra test" to test the data-at-rest encryption. I mostly don't remember
> it, but it went something like: put a known, uncommon phrase like "zebra"
> into all the text fields and storage available to users. Then lock the phone
> (to trigger the encryption), take a ramdump, and see if the phrase appeared
> unencrypted.
> 
> Maybe we (or stewardship, or PI, or...) could have a similar test where we
> use, say, marionette to fill every known input with zebras and then ensure
> the telemetry payload has none of it? (even going so far as to call the
> profile "zebra" and have it be run by the "zebra" user on all platforms). I
> think this would be worthy work, the largest stumbling block being finding a
> team with a broad enough product knowledge to be able to find all of these
> inputs.

Mh, that rings a "fuzzing" bell to me. This might be low-hanging fruit: I think security people might already have some "fuzzers" to use on all the text fields in Firefox. However, this looks like a bigger-scoped project than simply providing some sort of restriction on the keys. I'm inclined to say let's close this bug as WONTFIX for the reason you provided and file a new one for investigating the zebra thingy.
Priority: -- → P3
How about this for a plan:

1) Come up with a couple of options we want to try out for client-side validation. Say, a permissive regex, a path separator detector, and a profile dir matcher (sketched below).
2) Don't have the validators fail the accumulation. Instead, ship them with keyed uint Scalars counting how many of which probes fail the validators.
3) Perform some real-world analysis on the Scalars to see which one works best in practice.
4) Remove the less effective validators, and have the best one change failing keys to "failed-validation" and continue accumulating.

This way we don't lose things, we can iterate on our proposals, and we get real-world tests.
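To make step 1 a bit more concrete, here is a hypothetical Python sketch of the three candidate validators; the names, patterns, and length cap are guesses, and the Counter merely stands in for the keyed uint Scalars from step 2:

import re
from collections import Counter

# Candidate validators from the plan above (all patterns are illustrative guesses).
PERMISSIVE = re.compile(r"^[A-Za-z0-9_.#-]{1,70}$")  # allow-list of "safe" characters
PATH_SEPARATOR = re.compile(r"[\\/]")                # any path separator in the key
PROFILE_DIR = re.compile(r"\.mozilla|profiles?[\\/]", re.IGNORECASE)  # profile-ish fragments

VALIDATORS = {
    "permissive_regex": lambda key: bool(PERMISSIVE.match(key)),
    "path_separator":   lambda key: not PATH_SEPARATOR.search(key),
    "profile_dir":      lambda key: not PROFILE_DIR.search(key),
}

# Step 2: don't reject anything yet, just count which keys each validator would fail,
# mirroring what the keyed uint Scalars would record in the wild.
failure_counts = Counter()

def check_key(key):
    for name, is_valid in VALIDATORS.items():
        if not is_valid(key):
            failure_counts[name] += 1

for key in ["dns", "a/b/c", "/home/user/.mozilla/firefox/x.default", "weird key!"]:
    check_key(key)
print(failure_counts)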
How hard would it be to pull out some of the keys from the existing data to fine-tune any regex against?
I imagine we could do something like:

from moztelemetry.dataset import Dataset

records = Dataset.from_source('telemetry').where(
    docType='main',
    submissionDate=lambda x: x >= '20180522',
).select(
    parent='payload.keyedHistograms',
    content='payload.processes.content.keyedHistograms',
    gpu='payload.processes.gpu.keyedHistograms',
).records(sc, sample=0.01)  # sc: the SparkContext provided by the analysis environment

records.flatMap(lambda p: [p.parent[k].keys() for k in p.parent.keys()] +....
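Building on the records RDD above, one way the extracted keys could be used to fine-tune a candidate regex; this is an assumption-heavy sketch that treats the selected records as dicts, and the pattern is just an example:

import re

CANDIDATE = re.compile(r"[\\/]")  # hypothetical rule: reject keys containing a path separator

def all_keys(ping):
    """Collect every keyed-histogram key from one selected record."""
    keys = []
    for field in ("parent", "content", "gpu"):
        histograms = ping.get(field) or {}
        for per_histogram in histograms.values():
            keys.extend(per_histogram.keys())
    return keys

sampled_keys = records.flatMap(all_keys)
rejected = sampled_keys.filter(lambda k: CANDIDATE.search(k)).countByValue()
print(sorted(rejected.items(), key=lambda kv: -kv[1])[:20])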
Something I stumbled upon while looking at old bugs was bug 1334469, which shows some of the problems with forbidding even "obviously bad" key values: logspam, backouts...
See Also: → 1334469
Severity: normal → S3