Implement the "rate" metric type
Categories
(Data Platform and Tools :: Glean: SDK, task, P1)
Tracking
(Not tracked)
People
(Reporter: mdroettboom, Assigned: chutten)
References
Details
Attachments
(1 file)
As described in bug 1631856.
Updated•4 years ago
|
Hi Alessio! I am Yvon. I am a newbie attempting to contribute to Glean SDK. I have been reading through Documentation for a couple days now, and was wondering if I could take on this. Do you think this is a good first issue for someone contributing for the first time? If it helps you understand more about where I currently am, I have been using Rust for 3 months now.
Hi @Alessio! I am Yvon. I am a newbie attempting to contribute to Glean SDK. I have been reading through Documentation for a couple days now, and was wondering if I could take on this. Do you think this is a good first issue for someone contributing for the first time? If it helps you understand more about where I currently am, I have been using Rust for 3 months now.
Comment 3•4 years ago
|
||
Hi Yvon and welcome to the Glean SDK. I'm afraid this bug is not appropriate to start development with the SDK itself. Jan-Erik, do you know about any reasonably good first bug for Yvon, in Rust?
Comment 4•4 years ago
|
||
If I may make some suggestions:
- Glean must behave consistently for unknown URL scheme endpoints across language bindings
- Prevent the use of "HTTP" endpoints in Glean outside of tests
- Update the way we get the version code in Android to remove the
@Deprecated
Definitely wait for [:janerik]s input on this suggestions, just commenting because I always have a list of such bugs :)
Comment 5•4 years ago
|
||
Hi Yvon!
As the others have said this bug here is not the best to get started, as it is a bit more involved to fit in what we decided on in the proposal.
The bugs Bea linked are easy to get started, but are mostly not Rust specific.
However bug 1670634 doesn't require much Glean knowledge and with a bit of Rust knowledge it's solvable. I leave some more detailed instructions over there. Feel free to ping me in that bug to pick it up.
Thank you all! I appreciate all the help. Jan-Erik, I will go ahead and give it a try.
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Step 1) Revisit the proposal to ensure we did due diligence to ensure it will work with Use Counters. I am certain that the storage and transmission format (numerators and denominators) are correct, but maybe the API won't work out with generated names? Generated code? Hard to say. If unsure, I'm to ping emilio.
Step 2) Implement this.
Assignee | ||
Comment 8•3 years ago
•
|
||
Notes with a Use Counter flavour:
- Use Counters have two denominators each target being counted: One for _DOCUMENT when the target is used in any document, and another for _PAGE when the target is used in a document which is the top-level document (the one whose url is in the location bar). Encoding this using
rate
as written will involve two rates with identical (modulo concurrency) values in their numerators. We have an opportunity to optimize for this case by allowing external numerators as Marina mentions or by pluralizingdenominator_metric
to take a list. We can choose to explore this as an optional optimization in the future (it would save 50% of use counter'srate
payload size). - The duplication of an external denominator's value to each referent is gonna cost us in bandwidth at scale. 2k use counters from each of hundreds of millions of users is a lot, especially if it's daily (I'd push for weekly, personally, but you wouldn't fault them for wanting the ease of scheduling of the "metrics" ping instead of managing their own. Maybe we need scheduling options for custom pings...). Luckily the payload format is ours so we can iterate on this later if we want the savings (it would save 50% of use counter's
rate
payload size). Anything is better than "count"-type histograms: (This is just the numerator)
"USE_COUNTER2_JS_WASM_DOCUMENT": {
"bucket_count": 3,
"histogram_type": 2,
"sum": 1,
"range": [
1,
2
],
"values": {
"0": 0,
"1": 1,
"2": 0
}
},
- Duplicating external denominator values in its referents' the subpayloads is one thing, but trying to keep the in-memory representation seems unnecessary. To do so we'll need to keep the lock while updating the 2k
rate
s. We'll either need some way to iterate over all known rates or aglean_parser
-generated mapping from "denominator"-usedcounter
reference to all therate
s that use it. Not insurmountable and not terrible... but given the alternative is to only (outside of test methods which can be as expensive as we dare) look all this up only once during ping submission? I'm leaning towards the latter, personally, and I hope it's not too much trouble if I just go ahead like that. - Being forced to increment the denominator before the numerator (and only that one time it's going up from zero?!) seems odd to me. This will 100% trip up our users, especially in a multiprocess situation, so I'm inclined to drop it from client concern, if that's okay.
To sum up: it looks like we're clear to implement here with nothing blocking. Just two optimizations which each'd save us half the payload/bandwidth requirements for use counters (combined: 75% savings) and two quibbles that I think I have the wriggle room in the design to take as impl choices : )
Comment 9•3 years ago
|
||
Assignee | ||
Comment 10•3 years ago
|
||
Landed in RLB.
Description
•