Closed Bug 1603536 Opened 6 years ago Closed 4 years ago

Attach table and field descriptions as metadata in BigQuery

Categories

(Data Platform and Tools :: General, enhancement, P2)

Type: enhancement
Points: 2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: ascholtz)

Details

Attachments

(1 file)

As part of the BigQuery table generation pipeline, we have access to the JSON Schema and the associated probe info, so we should add field descriptions to the table where possible.

We may also want to add table-level descriptions, at least for derived tables in bigquery-etl.

Points: --- → 2
Priority: -- → P2

Propagation of descriptions and titles from JSON schemas to BigQuery schemas was implemented in: https://github.com/mozilla/jsonschema-transpiler/pull/93
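To illustrate what propagating descriptions means in practice, here is a minimal Python sketch: each JSON schema property becomes a BigQuery-style field dict, and its `description` (falling back to `title`) is copied onto the column. This is not the jsonschema-transpiler code, which is written in Rust; the simplified type mapping, the `title` fallback, the example schema, and the helper names (`convert_properties`, `convert_field`) are all illustrative assumptions.

```python
import json

# Simplified JSON Schema -> BigQuery type mapping (illustrative only).
_TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "RECORD",
}


def _json_type(prop):
    """Return a single JSON type name, collapsing e.g. ["string", "null"]."""
    t = prop.get("type")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        t = non_null[0] if non_null else None
    return t


def _describe(prop):
    # Assumption: prefer "description" and fall back to "title".
    return prop.get("description") or prop.get("title")


def convert_field(name, prop, required=False):
    """Convert one JSON schema property into a BigQuery-style field dict."""
    if _json_type(prop) == "array":
        # Arrays become REPEATED columns of the item type.
        field = convert_field(name, prop.get("items", {}))
        field["mode"] = "REPEATED"
    else:
        # Unknown or missing types fall back to STRING in this sketch.
        bq_type = _TYPE_MAP.get(_json_type(prop), "STRING")
        field = {"name": name, "type": bq_type,
                 "mode": "REQUIRED" if required else "NULLABLE"}
        if bq_type == "RECORD":
            field["fields"] = convert_properties(prop)
    description = _describe(prop)
    if description:
        # The outer property's text wins over anything set from array items.
        field["description"] = description
    return field


def convert_properties(schema):
    """Convert the properties of a JSON schema object into a list of fields."""
    required = set(schema.get("required", []))
    return [
        convert_field(name, prop, name in required)
        for name, prop in schema.get("properties", {}).items()
    ]


# Hypothetical example schema, for illustration only.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["client_id"],
    "properties": {
        "client_id": {"type": "string",
                      "description": "A UUID identifying the client."},
        "metrics": {
            "type": "object",
            "title": "Metrics",
            "properties": {"uptime": {"type": ["integer", "null"]}},
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(convert_properties(EXAMPLE_SCHEMA), indent=2))
```

Columns with neither a `description` nor a `title` simply get no description, which is the gap the probe dictionary idea further down in this bug tries to fill.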

Metadata of views/tables, such as descriptions, can now be defined in bigquery-etl by adding metadata.yaml files. See https://github.com/mozilla/bigquery-etl/pull/684

That's also a first step towards making data sets public in GCP.

As a next step, would it make sense to take descriptions from the probe dictionary for fields that have no description in the JSON schema, and add those to the schema?
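A rough Python sketch of the proposed fallback: walk the generated BigQuery schema and, for any column without a description, look one up by its dotted column path. The `probe_descriptions` mapping, its path-based keys, and the function name `fill_missing_descriptions` are hypothetical; the sketch assumes probe dictionary entries have already been mapped onto column paths, which is the part a real schema generator would actually have to solve.

```python
def fill_missing_descriptions(fields, probe_descriptions, prefix=()):
    """Fill in missing column descriptions from a path -> text lookup.

    `fields` is a list of BigQuery-style field dicts (name, type, mode,
    optional fields/description). `probe_descriptions` is a hypothetical
    mapping from dotted column paths, e.g. "payload.uptime", to text.
    Existing descriptions are never overwritten. Mutates and returns `fields`.
    """
    for field in fields:
        path = prefix + (field["name"],)
        if not field.get("description"):
            description = probe_descriptions.get(".".join(path))
            if description:
                field["description"] = description
        if field.get("type") == "RECORD":
            # Recurse into nested records, extending the dotted path.
            fill_missing_descriptions(field.get("fields", []),
                                      probe_descriptions, path)
    return fields


if __name__ == "__main__":
    # Hypothetical schema and lookup, for illustration only.
    schema = [{"name": "payload", "type": "RECORD", "mode": "NULLABLE",
               "fields": [{"name": "uptime", "type": "INT64",
                           "mode": "NULLABLE"}]}]
    lookup = {"payload.uptime": "Session uptime in minutes."}
    print(fill_missing_descriptions(schema, lookup))
```

Keeping existing descriptions untouched means the JSON schema stays the primary source, and the probe dictionary only fills gaps.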

Flags: needinfo?(mreid)

This sounds good to me!

Do you think adding significantly more information to the schema is likely to cause (or compound) performance issues like the ones we've seen with the main ping table recently?

Flags: needinfo?(mreid)

(In reply to Mark Reid [:mreid] from comment #4)

> This sounds good to me!
>
> Do you think adding significantly more information to the schema is likely to cause (or compound) performance issues like the ones we've seen with the main ping table recently?

We should definitely ask BigQuery support about this.

Anna, feel free to open a ticket for this, or let me know if you'd rather I handle it.

Assignee: nobody → ascholtz

According to Google Cloud Support, adding descriptions should not cause any memory or performance issues:

> Since column descriptions are not used in materialization stats, it wouldn't affect you like the issue you had in the ticket #20528460.
> Thus, column descriptions won't lead to memory issues.

Parsing descriptions of main probes and adding them to the schema was implemented in: https://github.com/mozilla/mozilla-schema-generator/pull/107

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED