Closed Bug 1552536 Opened 7 months ago Closed 7 months ago

Add a dummy pageload origin to origin Telemetry on every page unload

Categories

(Core :: Privacy: Anti-Tracking, task, P2)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: englehardt, Assigned: xeonchen)

Attachments

(1 file)

Origin Telemetry doesn't currently have a notion of pageloads. recordOrigin will place origin events from different pageloads within the same buffer, so the total number of buffers returned is not a reliable measure of the total number of pageloads.

We can work around this by adding a special "pageload" dummy origin that is sent on every sampled pageload, irrespective of the actual set of origins blocked/exempt (if any).

We can do this in two parts:

  1. Add a dummy origin to https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/core/TelemetryOriginData.inc. Perhaps ORIGIN("PAGELOAD", "PAGELOAD")?

  2. Call recordOrigin with a metric id of OriginMetricID::ContentBlocking_Blocked_TestOnly or OriginMetricID::ContentBlocking_Blocked (depending on the mode) and origin/hash "PAGELOAD" once for every sampled pageload (i.e., pageloads where IsReportingEnabled() is true). We'll want to call this even when no cookies are blocked. I suspect we can hard code this call here https://searchfox.org/mozilla-central/rev/94c6b5f06d2464f6780a52f32e917d25ddc30d6b/dom/base/ContentBlockingLog.cpp#114, before we loop through the log.

I think it's sufficient to only call the dummy origin with the "blocked" metric ID. Origin Telemetry only prepares buffers for metric IDs that actually have at least one true value and the blocked metric will be the most common.
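The proposal above can be sketched with a toy model. This is only an illustration of the counting idea, not the Firefox implementation: `record_pageload`, its parameters, and the example origins are hypothetical names, and the real code would go through recordOrigin in C++.

```python
from collections import Counter

PAGELOAD = "PAGELOAD"  # dummy origin name suggested in this bug


def record_pageload(counts, blocked_origins, reporting_enabled=True):
    """Toy stand-in for the per-pageload reporting step (hypothetical helper)."""
    if not reporting_enabled:
        # Unsampled pageloads record nothing, not even the dummy origin.
        return
    # Record the dummy origin first, even when nothing was blocked.
    counts[PAGELOAD] += 1
    # Then record each blocked origin at most once for this pageload.
    for origin in set(blocked_origins):
        counts[origin] += 1


counts = Counter()
record_pageload(counts, ["tracker.example"])
record_pageload(counts, [])  # nothing blocked, still counted as a pageload
record_pageload(counts, ["tracker.example", "ads.example"])
record_pageload(counts, ["ads.example"], reporting_enabled=False)  # not sampled

print(counts[PAGELOAD], counts["tracker.example"], counts["ads.example"])
```

In this model the PAGELOAD count is the number of sampled pageloads, regardless of how many origins were blocked on each one.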

Assignee: nobody → xeonchen
Status: NEW → ASSIGNED
Type: defect → task
Priority: -- → P2

Depending on the design of the eventual dataset, I'm not sure it'll be trivial to take the pageload count from the blocked metric and use it in analyses of exempted rules. I'd double-check that with... oh geez, who's working that part of this... Anthony, is it you?

Flags: needinfo?(amiyaguchi)

From my perspective, this is fine since I'm also dealing with raw bit-vectors at the end. I have a script to help map from the bit-vector into something human-readable using the TelemetryOriginData.inc file. I can't speak strongly about the analysis, but I imagine it would look something like this:

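aggregates = {
    "PAGELOAD": 1000,
    "some.origin.com": 2,
    # ... remaining origins
}

# The dummy origin's count is the number of sampled pageloads.
total = aggregates.pop("PAGELOAD")

# Fraction of sampled pageloads on which each origin was seen.
normalized = {origin: count / total for origin, count in aggregates.items()}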
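
The bit-vector mapping mentioned above might look roughly like the following. This is a guess at the shape of such a script, not the actual one: it assumes that bit i corresponds to the i-th ORIGIN(...) entry in TelemetryOriginData.inc, and `parse_origins`, `decode_bits`, and the sample entries and hashes are made up for illustration.

```python
import re

# Matches entries of the form ORIGIN("name", "hash").
ORIGIN_RE = re.compile(r'ORIGIN\("([^"]*)",\s*"([^"]*)"\)')


def parse_origins(inc_text):
    """Return origin names in file order (hypothetical helper)."""
    return [match.group(1) for match in ORIGIN_RE.finditer(inc_text)]


def decode_bits(bits, origins):
    """Map set bits to origin names, assuming bit i is the i-th entry."""
    return [name for name, bit in zip(origins, bits) if bit]


# Made-up sample standing in for the contents of the .inc file.
sample_inc = '''
ORIGIN("PAGELOAD", "PAGELOAD")
ORIGIN("some.origin.com", "hypothetical-hash-1")
ORIGIN("other.origin.com", "hypothetical-hash-2")
'''

origins = parse_origins(sample_inc)
seen = decode_bits([1, 0, 1], origins)
print(seen)
```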
Flags: needinfo?(amiyaguchi)
Pushed by xeonchen@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c77c46ac90a5
add dummy page load origin; r=chutten

Backed out changeset c77c46ac90a5 (Bug 1552536) by xeonchen's request

Backout link: https://hg.mozilla.org/integration/autoland/rev/82b9eaa4679754eb3ff38e6472a3d9f77600affa

Flags: needinfo?(xeonchen)
Flags: needinfo?(xeonchen)
Pushed by xeonchen@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c63967f172ee
add dummy page load origin; r=Ehsan,chutten
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69