Open Bug 1372827 Opened 7 years ago Updated 2 years ago

Add telemetry for when BaseThreadInitThunk gatekeeping blocks a thread

Categories

(Core :: General, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: ccorcoran, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: inj+)

Attachments

(3 files)

In bug 1322554, we silence a crash by blocking threads whose start addresses are not authorized. While this may prevent crashes, it's probably a good idea to annotate when this happens in a ping.
Putting at P3 due to lack of resources, per aklotz.
Priority: -- → P3
Depends on: 1322554
Whiteboard: inj?
Blocks: 1435773
Assigning to Carl. Please prioritize bug 1435827 above this one whenever the former is unblocked.
Assignee: nobody → ccorcoran
Priority: P3 → P1
Summary: Add telemetry about BaseThreadInitThunk gatekeeping blocks a thread → Add telemetry for when BaseThreadInitThunk gatekeeping blocks a thread
Whiteboard: inj? → inj+
I'm looking for advice on the best way to submit this data via telemetry. We're (tentatively) looking to record the following data every time we block a potentially-crashing / malicious thread:

> - Reason why blocked
> - Is it a known blocked entry point (bug 1435816)
> - Process uptime
> - Thread entry point address
> - other data about that address, for example whether it is near a module, virtual memory flags, etc.

sunahsuh, can you help advise which telemetry delivery method would be most appropriate for this? It's unclear to me whether we should ride the main ping, make a custom ping, use telemetry Events, or something I haven't considered yet.
Flags: needinfo?(ssuh)
From first glance telemetry events seems like the easiest path forward but there are a few limitations that might keep us from using it -- the format allows up to 10 key-value extra metadata fields, with keys limited to 15 bytes and values limited to 80 (full limit documentation here: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/events.html#limits). If the data you'd like to collect fits within those parameters, events are sent within an hour of collection and become queryable within a day of receipt (and hopefully soon, within minutes of receipt). The event ping also has the full environment attached, which would be very useful for this data. I assume this is a relatively rare event, given my reading of the original bug? (By which I mean, happens once per 1000s or 10,000s of browser sessions, if even that.)

If the limitations won't allow use of Events, I suspect a custom ping would be the next best option since we can make a direct-to-parquet dataset that will be queryable in STMO within an hour or so of receipt merely by adding a schema to the pipeline (and a corresponding parquet schema.) The documentation for doing this on the pipeline side is here: https://docs.telemetry.mozilla.org/cookbooks/new_ping.html
Flags: needinfo?(ssuh)
(In reply to Sunah Suh (she/her) [:sunahsuh] from comment #4)
> I assume this is a relatively rare
> event, given my reading of the original bug? (By which I mean, happens once
> per 1000s or 10,000s of browser sessions, if even that.)

Was this cleared up somewhere? What's the expected volume?
(In reply to Georg Fritzsche [:gfritzsche] from comment #5)

> What's the expected volume?

The expected volume is extremely low. I don't have any numbers, but we currently only block threads when:

- A thread is started with an entry point located in bad memory
- Or when a thread is started with an entry point in a LoadLibrary variant. This is an indication of a 3rd party app trying to inject a DLL.
I think preliminary telemetry might be useful here to measure the size of the problem. A categorical histogram [1] (example: [2]) of counts by reason would do the trick, and be rather quick to implement and easy to uplift.

[1]: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#categorical
[2]: https://mzl.la/2ApXRHx
Blocks: 1501717
Depends on: 1313327
Attaching a WIP which performs the event data gathering and dispatching. What remains is to send the event data via telemetry events, once bug 1313327 is landed.
Blocks: 1516532

Just checking in on the status of this data review. Let me know if you need anything more from me.

Flags: needinfo?(chutten)

No, no. I just bounced from this because I was thinking about the bug 1313327 part of it. I'll get to this forthwith.

Flags: needinfo?(chutten)

Comment on attachment 9033920 [details]
Request for data collection review form - Bug 1372827.txt

Preliminary notes:

The "untrustedModules" ping which shares some analysis purpose with these collections, is on Nightly only. These collections are asking for all channels collection. Is this deliberate and necessary? (To be clear, it is allowed as the datareview+ shows, but I would like to ensure it is on purpose)

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. These collections are Telemetry so are documented in their definitions file (Events.yaml), and in the Probe Dictionary.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. These collections are Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes. :ccorcoran is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

Nothing new, though it does include module names like the "untrustedModules" ping does and one of a list of four known entry points.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.


Result: datareview+

Attachment #9033920 - Flags: review?(chutten) → review+

Posting a WIP until bug 1313327 is finalized. This now collects all the thread blocking data, serializes via telemetry events, and includes an xpcshell unit test to check the payload.

Assignee: ccorcoran → nobody
Priority: P1 → P3
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: