Closed Bug 1792025 Opened 2 years ago Closed 1 year ago

Proposal to change `labeled_*` metrics' label format to be more lenient

Tracking

(Not tracked)

Status:

RESOLVED WORKSFORME

People

(Reporter: chutten, Unassigned)

References

(Blocks 1 open bug)

Details

Chris H-C :chutten

Reporter

Description

2 years ago

Proposal for changing an existing or adding a new Glean metric type

Who is the individual/team requesting this change?

Chris H-C (:chutten), Glean SDK team. On behalf of :florian and :nika.

Is this about changing an existing metric type or creating a new one?

Changing labeled metric types in the present and into the future.

Can you describe the data that needs to be recorded?

Thread names, IPC Message names, Search engine names, <other capitalized or punctuated strings>

Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)

PSocketProcess__Msg_OnHttpActivityDistributorObserveConnection

What is the business question/use-case that requires the data to be recorded?

Performance and power use in this case. In the broader case, being able to send unconjugated search engine names may be tied to business purposes.

How would the data be consumed?

Same as now (no change): Looker, GLAM, SQL, etc.

Why are existing metric types not enough?

`labeled_*` metrics apply a variety of slightly different regexes to determine which labels are permitted. Most attempt to conform to the label format, which mandates words of up to 30 characters delimited by `.`, with a max total length of around 71 (to ensure Glean metrics can be encoded).

According to the docs, this is "To ensure maximum support in database columns." But this is incorrect: if we're trying to support these as valid identifiers, field names, or column names in BQ, then we can't use `.`, and we have a much wider variety of characters available to us.

Also, because labeled_* metric types support dynamic labels, we can never make them into column names anyway. They could contain any (valid) string.
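To make the mismatch concrete, here is a minimal Python sketch of a check in the spirit of the label format described above. The pattern and the names `LEGACY_LABEL_RE`, `MAX_LABEL_LENGTH`, and `is_legacy_label` are illustrative assumptions built only from the description in this bug (lowercase words of up to 30 characters, delimited by `.`, total length capped at 71). They are not copied from the Glean SDK, whose actual regexes may differ.

```python
import re

# Assumption: each dot-delimited word starts with a lowercase letter or
# underscore and is at most 30 characters long in total.
LEGACY_LABEL_RE = re.compile(r"[a-z_][a-z0-9_]{0,29}(\.[a-z_][a-z0-9_]{0,29})*")
MAX_LABEL_LENGTH = 71  # "around 71" per the description above


def is_legacy_label(label: str) -> bool:
    """Return True if the label fits the assumed strict dotted format."""
    if len(label) > MAX_LABEL_LENGTH:
        return False
    return LEGACY_LABEL_RE.fullmatch(label) is not None


sample = "PSocketProcess__Msg_OnHttpActivityDistributorObserveConnection"
print(is_legacy_label(sample))             # False: capital letters, one 62-character word
print(is_legacy_label("ipc.msg.on_http"))  # True
```

Under these assumptions the sample IPC message name is rejected twice over: it contains capital letters, and it is a single word far longer than 30 characters.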

What is the timeline by which the data needs to be collected?

It's already being collected. Folks are conjugating into label format themselves.

Chris H-C :chutten

Reporter

Comment 1

2 years ago

Relevant BQ docs for column names say:

"A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 300 characters. A column name cannot use any of the following prefixes: `_TABLE_`, `_FILE_`, `_PARTITION`, `_ROW_TIMESTAMP`, `__ROOT__`, `_COLIDENTIFIER`. Duplicate column names are not allowed even if the case differs. For example, a column named Column1 is considered identical to a column named column1."

Or, up to 300 characters of case-insensitive alphanum plus underscore (_) with initial letter or underscore.
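As a rough illustration, the quoted column-name rules can be sketched in Python as below. The function names `is_valid_column_name` and `has_case_insensitive_duplicates` are hypothetical, and comparing the reserved prefixes case-insensitively is an assumption on my part (the quoted text only says that column names themselves are case-insensitive).

```python
import re

# Letters, digits, and underscores; first character a letter or underscore;
# at most 300 characters in total.
COLUMN_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,299}")

# Prefixes listed as disallowed in the quoted docs.
RESERVED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION",
    "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER",
)


def is_valid_column_name(name: str) -> bool:
    """Check a name against the quoted character, length, and prefix rules."""
    if COLUMN_NAME_RE.fullmatch(name) is None:
        return False
    # Assumption: prefixes are matched case-insensitively.
    return not name.upper().startswith(RESERVED_PREFIXES)


def has_case_insensitive_duplicates(names: list[str]) -> bool:
    """Return True if two names differ only by case."""
    lowered = [n.lower() for n in names]
    return len(set(lowered)) != len(lowered)


print(is_valid_column_name("PSocketProcess__Msg_OnHttpActivityDistributorObserveConnection"))  # True
print(is_valid_column_name("search.engine"))                   # False: "." is not allowed
print(is_valid_column_name("_TABLE_foo"))                      # False: reserved prefix
print(has_case_insensitive_duplicates(["Column1", "column1"])) # True
```

Note the asymmetry this shows: the capitalized sample from the description passes these rules, while a dotted label does not.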

Relevant BQ docs for STRUCT (and thus RECORD) field names only specify a max nesting depth of 15.

Relevant BQ docs for unquoted SQL identifiers permit dashes (-), but otherwise mimic the column name restrictions.

Jan-Erik Rediger [:janerik]

Updated

2 years ago

Blocks: 1800490

Jan-Erik Rediger [:janerik]

Updated

2 years ago

Duplicate of this bug: 1800491

Chris H-C :chutten

Reporter

Comment 3

1 year ago

Earlier this year we expanded the limit to 71 characters of printable ASCII in bug 1672273. That is more than enough for `PSocketProcess__Msg_OnHttpActivityDistributorObserveConnection`.
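A quick sanity check of that claim, as a sketch only: `is_lenient_label` is a hypothetical helper, not the Glean SDK's implementation, and it assumes "printable ASCII" means code points 0x20 (space) through 0x7E (`~`).

```python
MAX_LABEL_LENGTH = 71  # limit stated in this comment


def is_lenient_label(label: str) -> bool:
    """Return True if the label is at most 71 printable-ASCII characters."""
    # Assumption: printable ASCII is the range 0x20..0x7E inclusive.
    return len(label) <= MAX_LABEL_LENGTH and all(
        0x20 <= ord(ch) <= 0x7E for ch in label
    )


sample = "PSocketProcess__Msg_OnHttpActivityDistributorObserveConnection"
print(len(sample))               # 62
print(is_lenient_label(sample))  # True
```

The sample is 62 characters long, so it fits within the limit with 9 characters to spare.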

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → WORKSFORME


Categories

(Data Platform and Tools :: Glean Metric Types, task)

Security

(public)
