Closed Bug 1442653 Opened 6 years ago Closed 6 years ago

"event" ping design

Categories

(Toolkit :: Telemetry, enhancement, P1)

RESOLVED FIXED

People

(Reporter: chutten, Assigned: chutten)

References

(Depends on 1 open bug)

We need to design the schema and method of delivery for this "events" ping.
Some design considerations follow. ni?gfritzsche to loop in anyone else with input, and to provide his own comments.

SIZE:

The ping will be some amount of meta information, plus a number of event records. An event[1] is [timestamp, category, method, object, value, extra]. timestamp is in milliseconds. Assuming a sensible session length (ten digits of milliseconds cover roughly 115 days), we're talking maybe ten characters in base ten. category has a max length of 30. method and object are both capped at 20. value and extra can each be up to 80. Thus the max size of an individual record is about

10 + 30 + 20 + 20 + 80 + 80 = 240

Add in some commas and quotation marks and we're looking at approximately four records per kilobyte. The maximum size on disk of a ping is about a megabyte, so we want to steer well under four thousand records. I'm thinking a good max count of events to trigger a ping submit will be 1000 event records.

That'll put us at a max size of about 250KB, plus environment, common ping info, and other hangers-on.
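The arithmetic above can be checked with a short sketch. The field caps come from the figures in this comment; the constant and function names are hypothetical, and the 16 characters of JSON punctuation per record is an assumption chosen to match the "four records per kilobyte" estimate, not a measured value.

```javascript
// Per-field character caps as stated in the comment above.
const FIELD_CAPS = {
  timestamp: 10,
  category: 30,
  method: 20,
  object: 20,
  value: 80,
  extra: 80,
};

// Largest bare record, before any JSON punctuation is added.
const maxRecordChars = Object.values(FIELD_CAPS).reduce((sum, n) => sum + n, 0);

// ASSUMPTION: quotes, commas, and brackets add about 16 characters per record.
const ASSUMED_PUNCTUATION_CHARS = 16;
const charsPerRecord = maxRecordChars + ASSUMED_PUNCTUATION_CHARS;

// How many worst-case records fit in one kilobyte.
const recordsPerKilobyte = Math.floor(1024 / charsPerRecord);

// Worst-case size of the events array for a given event limit, in kilobytes.
function worstCaseKilobytes(eventLimit) {
  return Math.ceil((eventLimit * charsPerRecord) / 1024);
}
```

Under these assumptions a 1000-event limit stays at roughly a quarter of the one-megabyte ceiling, which leaves room for the environment and common ping data.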

FREQUENCY:

The entire point is lowering the latency of delivery as well as enabling the sending of more than 500 events. So I think a ping should be sent at least once in every hour in which an event was recorded, and immediately once our limit of 1000 events has been reached. One final ping should be sent within "profile-before-change".

I think, for prudence's sake, we should also impose a maximum frequency of one ping every 10 minutes to limit our exposure to misbehaving modules. We will record the number of records beyond our max that we were unwilling to record and send. I suggest we also record this as a scalar so we can tie into the usual tooling (though maybe "event summary" bug 1440673 will serve this purpose better).
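A minimal sketch of the send decision described above, to make the interaction of the three triggers and the rate limit concrete. All names are hypothetical. Two points are assumptions rather than something this comment states: the shutdown ping bypasses the 10-minute limit, and no ping is sent when nothing was recorded.

```javascript
const ONE_MINUTE_MS = 60 * 1000;

// Limits taken from the proposal above; the constant names are hypothetical.
const MAX_EVENTS_PER_PING = 1000;
const PERIODIC_INTERVAL_MS = 60 * ONE_MINUTE_MS;
const MIN_INTERVAL_MS = 10 * ONE_MINUTE_MS;

/**
 * Decide whether a ping should be submitted now.
 * Returns the reason ("shutdown", "max", or "periodic"), or null to keep waiting.
 */
function pingReason({ pendingEvents, msSinceLastPing, shuttingDown = false }) {
  if (pendingEvents === 0) {
    return null; // ASSUMPTION: nothing recorded means nothing to send.
  }
  if (shuttingDown) {
    return "shutdown"; // ASSUMPTION: the final ping ignores the rate limit.
  }
  if (msSinceLastPing < MIN_INTERVAL_MS) {
    return null; // Rate limit: at most one ping every 10 minutes.
  }
  if (pendingEvents >= MAX_EVENTS_PER_PING) {
    return "max"; // Event limit reached: send immediately.
  }
  if (msSinceLastPing >= PERIODIC_INTERVAL_MS) {
    return "periodic"; // At least hourly while events are pending.
  }
  return null;
}
```

With this ordering, a module that floods the buffer right after a ping still has to wait out the 10 minutes; events recorded beyond the limit in that window are the ones that would be counted as left behind.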

STRUCTURE:

I presume the clientId and environment will be important. We will have multiple reasons for triggering the send and a count of any records we left behind. And we will have an array of events. So:

submitExternalPing("events", {reason: {periodic|max|shutdown}, eventsTruncated: <number>, events: arrayOfEvents}, {addClientId: true, addEnvironment: true});

It is possible we may wish to include additional information (subsessionId, for instance), but I'm happy to leave that for a later iteration.
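As an illustration of how the payload fields might be filled in, here is a sketch of a bounded buffer that counts the records it refuses and produces the {reason, eventsTruncated, events} object on flush. The `EventBuffer` class and its methods are hypothetical and are not part of the Telemetry API. The sketch stops short of calling submitExternalPing so that it runs outside Firefox.

```javascript
// Hypothetical bounded buffer; the default limit is the 1000 proposed above.
class EventBuffer {
  constructor(maxEvents = 1000) {
    this.maxEvents = maxEvents;
    this.events = [];
    this.truncated = 0;
  }

  // Store an event record, or count it as truncated if the buffer is full.
  // Returns true when the event was stored.
  record(event) {
    if (this.events.length >= this.maxEvents) {
      this.truncated += 1;
      return false;
    }
    this.events.push(event);
    return true;
  }

  // Build the ping payload for the given reason and reset the buffer.
  flush(reason) {
    const payload = {
      reason,
      eventsTruncated: this.truncated,
      events: this.events,
    };
    this.events = [];
    this.truncated = 0;
    return payload;
  }
}

// Options mirroring the call above.
const PING_OPTIONS = { addClientId: true, addEnvironment: true };
```

The object returned by `flush("max")` would then be the second argument of the submitExternalPing call shown above, with `PING_OPTIONS` as the third.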

META:

We will also want a little bit of non-custom ping record-keeping:

A scalar to count the sum of eventsTruncated over the course of a subsession (to tie in with alerting, reporting, and existing tooling. Are we truncating events? How many? Is the number rising? Even after we fixed a bug?)

A categorical histogram to count the number of each reason of "events" ping sent (to allow us to examine trends in the changing behaviour of this ping being used in the wild. Do we hit the max more often? Do shutdown pings suddenly disappear? Is a client sending more periodic pings than there are periods in their session?)
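To make the two bookkeeping measures concrete, the sketch below tallies them from a list of already-sent ping payloads. It is plain JavaScript rather than the real scalar and histogram APIs, and `summarisePings` is a hypothetical helper.

```javascript
// The three send reasons proposed in the STRUCTURE section.
const PING_REASONS = ["periodic", "max", "shutdown"];

// Summarise the pings sent during one subsession:
// - eventsTruncatedSum plays the role of the proposed scalar,
// - reasonCounts plays the role of the proposed categorical histogram.
function summarisePings(pings) {
  const reasonCounts = Object.fromEntries(PING_REASONS.map(r => [r, 0]));
  let eventsTruncatedSum = 0;
  for (const ping of pings) {
    if (!(ping.reason in reasonCounts)) {
      throw new Error(`Unknown ping reason: ${ping.reason}`);
    }
    reasonCounts[ping.reason] += 1;
    eventsTruncatedSum += ping.eventsTruncated;
  }
  return { eventsTruncatedSum, reasonCounts };
}
```

A rising `eventsTruncatedSum`, or a shift from "periodic" toward "max" in `reasonCounts`, is the kind of trend the questions above are asking about.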

[1]: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/events.html
Flags: needinfo?(gfritzsche)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P2 → P1
Depends on: 1450744
From my understanding, this design will first go through an iteration between Sunah & Chris and move to a Google Doc.
Clearing needinfo for now; I'm looking forward to taking a design review pass after that step.
Flags: needinfo?(gfritzsche)
The design has gone through a couple of iterations now in the aforementioned Google Doc: https://docs.google.com/document/d/1rkfCOTdm8zsVTRQqUH-SkG4JjVVLbnWoCt2MGiwIz7A/edit

The biggest changes include:

* It will be called "event" not "events"
* The number of events we fail to include due to reaching maximum limits will be recorded in the ping itself
* Limits will be pref-configurable.

Aside from that it follows Comment#1 pretty closely.
Summary: "events" ping design → "event" ping design
See Also: → 1460595
With the design nailed down in the doc (that link again is https://docs.google.com/document/d/1rkfCOTdm8zsVTRQqUH-SkG4JjVVLbnWoCt2MGiwIz7A ), we can count this done. The last round of design resulted in a minute-resolution timestamp for event ordering, and adding a sessionId as well as the subsessionId to help with linkage and failure diagnosis.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED