Closed Bug 1772163 Opened 2 years ago Closed 2 years ago

Consider lifting (or at least expanding) the limit on the number of static labels for labeled_* metric types (or at least labeled_counter)

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chutten, Assigned: chutten)

Attachments

(1 file)

In the course of instrumenting threads and proposing to instrument IPC, Florian has found at least two or three use cases that would have really benefited from labeled_counter having a less restrictive limit on the number of labels.

Ideally there's some sort of solution that doesn't involve re-enumerating all of the label strings in metrics.yaml (( looking at you, yet-to-be-designed string_map )). But the most straightforward path to supporting these sorts of use cases of {<known string of finite but largish domain>: <one or more primitive data>} (e.g. thread_or_runnable_name: wakeup_count, ipc_message_name: sent_or_received_count) appears to be removing (or at least relaxing) the "at most 100 labels" limit (implemented in glean_parser) on labeled_counter, labeled_boolean, and labeled_string.

Relaxing it by how much? Well, there are 2725 registered IPC message names at present. For the present use cases, 2726 would do.
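
For concreteness, the shape of data these use cases want to record looks roughly like this (a sketch in Python; the metric shapes are from the examples above, but the label strings and counts are made up):

```python
# Purely illustrative: each use case is a map from a known-but-large label
# domain to a primitive value (here, counts). The label domain is finite but
# can run into the thousands.

# thread_or_runnable_name -> wakeup_count (domain: every thread/runnable name)
thread_wakeups = {"Timer": 412, "Socket Thread": 96, "DOM Worker": 7}

# ipc_message_name -> sent_or_received_count (domain: ~2725 registered message names)
ipc_messages = {"PContent::LoadURI": 12, "PBackground::CreateStream": 3}
```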

But do we even need this limit?

  • Because sensible limits help guide instrumentors towards sensible instrumentation?
    • I tried thinking of a way to exploit a larger-than-100 number of labels that wouldn't be more easily done by just sending event metrics or text or something, and came up with nothing.
  • Because of the cost?
    • Application size: Each new label will have to appear in the application. You still have to list them all out in the Glean metrics.yaml format, meaning there's an incentive to keep numbers down. And the amount of extra space taken up is minimized through clever implementation choices we made because 100 was already a big enough number for us to worry about.
    • Bandwidth: Only a problem if the ping actually contains information from more than 100 labels. Presuming that information is meaningful, the only other way to record it would be via event metrics, which is even more expensive than what is proposed.
    • Storage: The same "it's cheaper than the alternative when the alternative is event metrics" applies.
    • Query speed: Like Bandwidth and Storage this is proportional to the number of labels used, not the number of labels defined. Assuming that the number of labels used is meaningful no matter the encoding, this is the cheapest way we currently have of encoding this information.
      What does BigQuery think about it? Well, near as I can tell, the current representation of labeled_* metrics is in RECORD REPEATED columns. The limits on those types appear to relate only to the nesting level, which removing the limit on labels wouldn't touch.
  • Because of other reasons that I can't see from where I sit?
    • :relud, :klukas, this is where you come in. What am I missing? GLAM? Glean Dictionary? Probe Scraper? BigQuery considerations?
    • Jan-Erik, you're lead on this SDK: what other considerations should we, uh, consider?
    • Feel free to add and remove ni? as necessary.

Right now I'm happy with proposals anywhere from "Raise the limit on labeled_counter statically-defined labels to 4k" to "Remove the limit on the number of statically-defined labels on all three labeled_* metric types". The latter is easier to implement (just remove this line, change some docs, and write some tests. Mayyybe rev the schema version.) so it's winning out a little in my mind at the moment.
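
To illustrate what "just remove this line" amounts to, here's a rough sketch of the kind of parse-time check under discussion (illustrative only; the names and structure here are not the actual glean_parser source):

```python
# Illustrative sketch of the parse-time limit being discussed; not actual
# glean_parser code. Today the effective limit is 100 statically-defined labels.
MAX_STATIC_LABELS = 100  # this bug proposes raising (e.g. to ~4k) or removing this


def check_label_count(metric_name, labels):
    """Return an error string if a labeled_* metric defines too many labels, else None."""
    if labels is not None and len(labels) > MAX_STATIC_LABELS:
        return (
            f"{metric_name}: {len(labels)} labels defined, but at most "
            f"{MAX_STATIC_LABELS} statically-defined labels are allowed"
        )
    return None
```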

Flags: needinfo?(jrediger)
Flags: needinfo?(jklukas)
Flags: needinfo?(dthorn)

I'd favor raising it to 4k. BigQuery can run into Out Of Memory failures when processing large arrays; in the past I've seen that triggered daily by (legacy) telemetry clients. It can be worked around, but I think it's better to enforce a reasonable limit before the data hits stable tables so that analysis doesn't need workarounds.

Flags: needinfo?(dthorn)

> What does BigQuery think about it? Well, near as I can tell, the current representation of labeled_* metrics is in RECORD REPEATED columns. The limits on those types appear to relate only to the nesting level, which removing the limit on labels wouldn't touch.

Yes, we currently represent these as repeated key/value records, and I don't have BQ-level concerns with increasing the allowed number of keys aside from the potential OOM issue that :relud raised. I would expect, though, that even with 4k allowed values, the array is sparse and would likely have only a fraction of defined labels present in the array.

Note, though, that this would be a concern if we had chosen to implement labeled metrics as a nested STRUCT with named fields rather than key/value. If we had 4k labels defined, that would map to 4k separate fields and would bump us into the world of ultrawide schemas and some of the performance and limit problems that we experience with legacy main pings. Allowing longer label lists could limit future options for representing the structure, but I don't see it blocking anything that's actually planned.
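
To make that contrast concrete, here's a rough sketch of the two representations (illustrative Python, not actual pipeline schemas or code):

```python
# Illustrative only -- not actual pipeline code or schemas.

# Repeated key/value records (the current representation): a ping whose metric
# defines 4k labels but recorded only two of them stores exactly two elements,
# so cost tracks labels *used*, not labels *defined*.
key_value_records = [
    {"key": "msg_a", "value": 7},
    {"key": "msg_b", "value": 1},
]

# One named field per label (the rejected alternative): every defined label
# becomes its own column whether or not it was recorded, so 4k labels means
# 4k mostly-empty fields per labeled metric -- the ultrawide-schema problem
# seen with legacy main pings.
named_fields = {
    "msg_a": 7,
    "msg_b": 1,
    "msg_c": None,  # ...plus thousands more empty fields
}
```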

I can't think of any further complications we'd introduce by raising this.

My gut says to be conservative and raise the limit to 4k as :relud suggests rather than completely removing it.

Flags: needinfo?(jklukas)

With no pipeline concerns given a reasonable limit, I'd say go for it.
If this goes totally overboard with everyone suddenly using 2k labels, we can always add another lint (lints can be disabled per metric) nudging people to rethink whether it's really worth it.

Flags: needinfo?(jrediger)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: -- → P2
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED