Allow custom pings to not include things like client_id
Categories
(Toolkit :: Telemetry, enhancement, P3)
Tracking
()
People
(Reporter: janerik, Assigned: mdroettboom)
References
Details
(Whiteboard: [telemetry:mobilesdk:m8])
Attachments
(3 files)
Custom pings might have the requirement to not include information such as the client_id in their pings.
glean needs to provide a way to do this.
This probably means not including the glean_ping_info
in non-default pings.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
An interesting thing has come out while trying to implement this --
The easy way seems to be to add a parameter to sendPings
to indicate whether to include client_info
.
However, that fails in one little corner case: If there are events queued on disk from a previous run of the application, we always send them immediately on app startup since the timestamps may not match the current timer. We know which pings those events should be sent on, but we don't know which pings should have client_info
or not, so we could accidentally send events in a custom ping with a client_info
block where there shouldn't be a client_info
block.
There are a few possible ways to resolve this:
-
Add a
pings.yaml
metadata file which would define whetherclient_info
should be included. There are probably documentation and analysis backend advantages to requiring apings.yaml
file in the long run anyway. -
Don't send events on custom pings on startup ever. Technically known data loss, but also perhaps a really narrow corner case.
What do you all think? Put in the extra work of having a pings.yaml
file now?
Assignee | ||
Comment 2•6 years ago
|
||
Another case: The debug activity would need to require "includeClientInfo" as an adb parameter if we didn't have pings.yaml
, and there's a real risk of doing that wrong there.
Comment 3•6 years ago
|
||
(In reply to Michael Droettboom [:mdroettboom] from comment #1)
An interesting thing has come out while trying to implement this --
The easy way seems to be to add a parameter to
sendPings
to indicate whether to includeclient_info
.
Yes, I think this is our low hanging fruit for now.
- Don't send events on custom pings on startup ever. Technically known data loss, but also perhaps a really narrow corner case.
What do you all think? Put in the extra work of having a
pings.yaml
file now?
[...]
Another case: The debug activity would need to require "includeClientInfo" as an adb parameter if we didn't havepings.yaml
, and there's a real risk of doing that wrong there.
Having the pings.yaml
is a bit more painful, as there are some additional constraints:
sync
ping may carry (as far as I know) some events, so they will not be ok in loosing data at restarts; this is, as you highlighted, fixed by having thepings.yaml
activation
ping will conditionally include theclient_id
, depending on whether or not some other data is available; unfortunately, we'll only know at runtime. So we might need both mechanisms.
@Georg, what's your take on this?
Comment 4•6 years ago
|
||
The activation ping requirement of conditionally is a bit odd to me, i'd propose to not block on this here and see if we can solve that differently. (like e.g. always sending a separate identifier, not the main client id)
With that out of the way, using pings.yaml
to specify these ping rules sounds like it should cover everything?
A pings.yaml
brings some integration complexities with it, but also documentation & automation benefits, so it seems like a good idea to have this specified at build-time instead of run-time.
Comment 5•6 years ago
|
||
Jan-Erik, this will also affect glean.rs work. See comment 1. What's your take on this?
Reporter | ||
Comment 6•6 years ago
|
||
I very much like the idea of having statically defined what data lands in the payload of a ping.
If I remember correctly we talked about this early when writing the specification for glean.
Just to understand the current situation and correct me if I'm wrong:
- It's possible to define metrics to be sent in a ping called "custom"
- The application can trigger sending the ping "custom"
- glean will collect a snapshot from the corresponding store (and clear it), then construct the full payload (including
client_info
, a document id, ...) - Actual data that's in
client_info
is defined to be in a pseudo-ping namedglean_client_info
. - There's also a
ping_info
section, mostly harmless, but it does contain experiments info- that last bit might or might not be problematic for e.g. the ecosystem ping
Do we provide any automatic scheduling of custom pings?
(and by custom pings I mean everything that's not defined in glean itself, but by the application)
What condition determines if the activation
ping has to include the client_id
?
Comment 7•6 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #6)
I very much like the idea of having statically defined what data lands in the payload of a ping.
If I remember correctly we talked about this early when writing the specification for glean.Just to understand the current situation and correct me if I'm wrong:
- It's possible to define metrics to be sent in a ping called "custom"
- The application can trigger sending the ping "custom"
- glean will collect a snapshot from the corresponding store (and clear it), then construct the full payload (including
client_info
, a document id, ...)- Actual data that's in
client_info
is defined to be in a pseudo-ping namedglean_client_info
.- There's also a
ping_info
section, mostly harmless, but it does contain experiments info
- that last bit might or might not be problematic for e.g. the ecosystem ping
Yes, that's correct.
Do we provide any automatic scheduling of custom pings?
(and by custom pings I mean everything that's not defined in glean itself, but by the application)
No, scheduling is up to the application. We only provide a Glean.sendPings
function that consumers will need to call whenever they want.
What condition determines if the
activation
ping has to include theclient_id
?
This is what this bug is about: we either describe this in a pings.yaml
file (along with some other meta) or as a parameter in Glean.sendPings
Comment 8•6 years ago
|
||
I lean more towards the pings.yaml solution. It's more painful, but I think it is a better solution in the long run. It's really giving us better documentation of the ping payloads and probably making it easier for data-stewards to review custom pings.
I think it should even be possible to have scheduling information in the pings.yaml file that could allow Glean to handle the scheduling as well.
Assignee | ||
Comment 9•6 years ago
|
||
Assignee | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment hidden (collapsed) |
Description
•