Closed Bug 1529226 Opened 6 years ago Closed 6 years ago

Allow custom pings to not include things like client_id

Categories

(Toolkit :: Telemetry, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: janerik, Assigned: mdroettboom)

References

Details

(Whiteboard: [telemetry:mobilesdk:m8])

Attachments

(3 files)

Custom pings might have the requirement to not include information such as the client_id in their pings.

glean needs to provide a way to do this.

This probably means not including the glean_ping_info in non-default pings.

Blocks: 1491345
Priority: -- → P3
Whiteboard: [telemetry:mobilesdk:m?]
Depends on: 1526343
Whiteboard: [telemetry:mobilesdk:m?] → [telemetry:mobilesdk:m7]
Whiteboard: [telemetry:mobilesdk:m7] → [telemetry:mobilesdk:m11]
Whiteboard: [telemetry:mobilesdk:m11] → [telemetry:mobilesdk:m8]

An interesting thing has come out while trying to implement this --

The easy way seems to be to add a parameter to sendPings to indicate whether to include client_info.

However, that fails in one little corner case: If there are events queued on disk from a previous run of the application, we always send them immediately on app startup since the timestamps may not match the current timer. We know which pings those events should be sent on, but we don't know which pings should have client_info or not, so we could accidentally send events in a custom ping with a client_info block where there shouldn't be a client_info block.

There are a few possible ways to resolve this:

  • Add a pings.yaml metadata file which would define whether client_info should be included. There are probably documentation and analysis backend advantages to requiring a pings.yaml file in the long run anyway.

  • Don't send events on custom pings on startup ever. Technically known data loss, but also perhaps a really narrow corner case.

What do you all think? Put in the extra work of having a pings.yaml file now?

Assignee: nobody → mdroettboom
Flags: needinfo?(tlong)
Flags: needinfo?(alessio.placitelli)

Another case: The debug activity would need to require "includeClientInfo" as an adb parameter if we didn't have pings.yaml, and there's a real risk of doing that wrong there.

(In reply to Michael Droettboom [:mdroettboom] from comment #1)

An interesting thing has come out while trying to implement this --

The easy way seems to be to add a parameter to sendPings to indicate whether to include client_info.

Yes, I think this is our low hanging fruit for now.

  • Don't send events on custom pings on startup ever. Technically known data loss, but also perhaps a really narrow corner case.

What do you all think? Put in the extra work of having a pings.yaml file now?
[...]
Another case: The debug activity would need to require "includeClientInfo" as an adb parameter if we didn't have pings.yaml, and there's a real risk of doing that wrong there.

Having the pings.yaml is a bit more painful, as there are some additional constraints:

  • sync ping may carry (as far as I know) some events, so they will not be ok in loosing data at restarts; this is, as you highlighted, fixed by having the pings.yaml
  • activation ping will conditionally include the client_id, depending on whether or not some other data is available; unfortunately, we'll only know at runtime. So we might need both mechanisms.

@Georg, what's your take on this?

Flags: needinfo?(alessio.placitelli) → needinfo?(gfritzsche)

The activation ping requirement of conditionally is a bit odd to me, i'd propose to not block on this here and see if we can solve that differently. (like e.g. always sending a separate identifier, not the main client id)

With that out of the way, using pings.yaml to specify these ping rules sounds like it should cover everything?

A pings.yaml brings some integration complexities with it, but also documentation & automation benefits, so it seems like a good idea to have this specified at build-time instead of run-time.

Flags: needinfo?(gfritzsche)

Jan-Erik, this will also affect glean.rs work. See comment 1. What's your take on this?

Flags: needinfo?(jrediger)

I very much like the idea of having statically defined what data lands in the payload of a ping.
If I remember correctly we talked about this early when writing the specification for glean.

Just to understand the current situation and correct me if I'm wrong:

  • It's possible to define metrics to be sent in a ping called "custom"
  • The application can trigger sending the ping "custom"
  • glean will collect a snapshot from the corresponding store (and clear it), then construct the full payload (including client_info, a document id, ...)
  • Actual data that's in client_info is defined to be in a pseudo-ping named glean_client_info.
  • There's also a ping_info section, mostly harmless, but it does contain experiments info
    • that last bit might or might not be problematic for e.g. the ecosystem ping

Do we provide any automatic scheduling of custom pings?
(and by custom pings I mean everything that's not defined in glean itself, but by the application)

What condition determines if the activation ping has to include the client_id?

Flags: needinfo?(jrediger)

(In reply to Jan-Erik Rediger [:janerik] from comment #6)

I very much like the idea of having statically defined what data lands in the payload of a ping.
If I remember correctly we talked about this early when writing the specification for glean.

Just to understand the current situation and correct me if I'm wrong:

  • It's possible to define metrics to be sent in a ping called "custom"
  • The application can trigger sending the ping "custom"
  • glean will collect a snapshot from the corresponding store (and clear it), then construct the full payload (including client_info, a document id, ...)
  • Actual data that's in client_info is defined to be in a pseudo-ping named glean_client_info.
  • There's also a ping_info section, mostly harmless, but it does contain experiments info
    • that last bit might or might not be problematic for e.g. the ecosystem ping

Yes, that's correct.

Do we provide any automatic scheduling of custom pings?
(and by custom pings I mean everything that's not defined in glean itself, but by the application)

No, scheduling is up to the application. We only provide a Glean.sendPings function that consumers will need to call whenever they want.

What condition determines if the activation ping has to include the client_id?

This is what this bug is about: we either describe this in a pings.yaml file (along with some other meta) or as a parameter in Glean.sendPings

I lean more towards the pings.yaml solution. It's more painful, but I think it is a better solution in the long run. It's really giving us better documentation of the ping payloads and probably making it easier for data-stewards to review custom pings.

I think it should even be possible to have scheduling information in the pings.yaml file that could allow Glean to handle the scheduling as well.

Flags: needinfo?(tlong)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: