Closed Bug 1547563 Opened 5 years ago Closed 4 years ago

Pre-account parquet schema should handle non-integer scalars

Categories

(Toolkit :: Telemetry, enhancement, P3)

RESOLVED WONTFIX

People

(Reporter: janerik, Unassigned)

We recently landed two new string scalars that are included in the pre-account ping.
However, the parquet schema only serializes integer scalars, so the data for the new scalars is currently not accessible.

We should fix that and discuss with the data team whether they can coerce the data from multiple types (the JSON schema supports integers, strings and booleans).

In the meantime, we should look at the raw data for a quick analysis.
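
For that quick look at the raw data, a minimal Python sketch of grouping scalars by JSON type might look like the following. The payload shape (a top-level "scalars" object) and the scalar names are hypothetical and only there for illustration; the real ping layout may differ.

```python
import json


def split_scalars_by_type(scalars):
    """Group scalar values by the JSON types the schema allows."""
    groups = {"boolean": {}, "integer": {}, "string": {}, "other": {}}
    for name, value in scalars.items():
        # bool must be checked before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            groups["boolean"][name] = value
        elif isinstance(value, int):
            groups["integer"][name] = value
        elif isinstance(value, str):
            groups["string"][name] = value
        else:
            groups["other"][name] = value
    return groups


# Hypothetical raw ping fragment; field and scalar names are assumptions.
raw = '{"scalars": {"example.count": 3, "example.label": "abc", "example.flag": true}}'
groups = split_scalars_by_type(json.loads(raw)["scalars"])
```

Anything outside the "integer" group is what the current parquet schema would not serialize.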

Blocks: 1522664
Type: defect → enhancement
Priority: -- → P1
Assignee: nobody → jrediger

Is this still active or paused?

Flags: needinfo?(jrediger)

:trink, a question for you:

Regarding the schema here:

We now have non-integer scalars, which are of course rejected by the parquet writer.
A while back :frank suggested these options:

  1. list those explicitly in the schema <- as I understand that's not easily possible because it's all in the same field
  2. update the parquet writer to do that int -> string conversion
  3. don't handle it in parquet. use spark notebooks for access to those fields.

Option 3 definitely seems the easiest. Which solution would you prefer?

Flags: needinfo?(jrediger) → needinfo?(mtrinkala)

There are three ways item 2 can be accomplished (least complex to most complex):

  1. coerce any field into what the schema expects; easy but changes the existing behavior for everything
  2. add a coercion cfg option to the output plugin so we can limit the impact (API change)
  3. update the schema specification so the coercion can be explicitly called out on a field by field basis
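
To make the difference between these three variants concrete, here is a rough Python sketch. It is not the actual parquet output plugin: the function names, the `coerce_all` option and the per-field `"coerce"` key are all hypothetical stand-ins. Calling with `coerce_all=True` unconditionally corresponds to variant 1, exposing `coerce_all` as a configuration switch corresponds to variant 2, and the per-field `"coerce"` flag in the schema corresponds to variant 3.

```python
def coerce_value(value, target):
    """Coerce a JSON scalar to the target type ('int' or 'string')."""
    if target == "string":
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)
    if target == "int":
        if isinstance(value, (bool, int)):
            return int(value)
        if isinstance(value, str):
            return int(value)  # raises ValueError for non-numeric strings
    raise ValueError(f"cannot coerce {value!r} to {target}")


def matches(value, target):
    """Strict type check, as done when no coercion is requested."""
    if target == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if target == "string":
        return isinstance(value, str)
    return False


def write_record(record, schema, coerce_all=False):
    """Validate a record against the schema, coercing where allowed.

    coerce_all=True coerces every field (variants 1 and 2).
    A per-field {"coerce": True} entry limits coercion to that field (variant 3).
    """
    out = {}
    for field, spec in schema.items():
        value = record[field]
        if coerce_all or spec.get("coerce", False):
            value = coerce_value(value, spec["type"])
        elif not matches(value, spec["type"]):
            raise TypeError(f"{field}: expected {spec['type']}, got {value!r}")
        out[field] = value
    return out


# Hypothetical schema: only "label" opts in to coercion.
schema = {"count": {"type": "int"}, "label": {"type": "string", "coerce": True}}
per_field = write_record({"count": 3, "label": 7}, schema)
global_coercion = write_record({"count": "3", "label": True}, schema, coerce_all=True)
```

With the per-field flag, a string in "count" is still rejected, so existing behavior only changes for fields that explicitly ask for it.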

With the switch to the GCP ingestion, it is more important that the new system handles this to your satisfaction. However, the parquet output is general purpose and not limited to telemetry, so I would prefer a forward-looking fix rather than a band-aid if we go down this path.

Flags: needinfo?(mtrinkala)
Assignee: jrediger → nobody
Priority: P1 → P3

Bulk action: Closing down old Ecosystem Telemetry bug tree. Long live Accounts Ecosystem Telemetry!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX