Refactor out boilerplate from telemetry-based Dataset jobs

RESOLVED INCOMPLETE

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P3
Severity: normal
Status: RESOLVED INCOMPLETE
Reported: 2 years ago
Modified: 9 months ago

People

(Reporter: sunahsuh, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

(Reporter)

Description

2 years ago
We have a lot of shared boilerplate in our dataset jobs that could use a good refactoring, both for maintainability and so that we can create new datasets more quickly. In particular, the datasets generated from telemetry pings share much of their underlying structure. Some examples of shared code: common CLI options, filtering pings, converting from an RDD to a Spark DataFrame, and writing the dataset back out. *Maybe* the schema and field generation could also be defined in the same place, in a higher-level DSL?
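To make the shape of the proposal concrete, here is a rough sketch of how the shared steps might be pulled into one base class. It is only an illustration, not code from the existing jobs: `DatasetJob`, `MainPingSummary`, the CLI flag names, and the ping fields are all hypothetical, and plain Python lists and a JSON-lines file stand in for the RDD, the Spark DataFrame, and the real output format so that the example runs without Spark.

```python
import argparse
import json
from abc import ABC, abstractmethod


class DatasetJob(ABC):
    """Hypothetical base class holding the boilerplate shared by dataset jobs."""

    # Subclasses list their output columns once; the same tuple drives row
    # generation, so schema and field generation live in one place.
    schema = ()

    def build_parser(self):
        # Common CLI options (flag names are assumptions for this sketch).
        parser = argparse.ArgumentParser(description=type(self).__name__)
        parser.add_argument("--from-date", required=True)
        parser.add_argument("--to-date", required=True)
        parser.add_argument("--output", required=True)
        return parser

    @abstractmethod
    def keep(self, ping):
        """Return True if this ping belongs in the dataset."""

    @abstractmethod
    def to_row(self, ping):
        """Map one ping (a dict) to a dict of output fields."""

    def transform(self, pings):
        # Filter pings, then project each one onto the declared schema.
        # In a real job this step would be the RDD -> DataFrame conversion.
        rows = []
        for ping in pings:
            if self.keep(ping):
                row = self.to_row(ping)
                rows.append({name: row.get(name) for name in self.schema})
        return rows

    def write(self, rows, path):
        # Stand-in writer: one JSON object per line.
        with open(path, "w") as out:
            for row in rows:
                out.write(json.dumps(row) + "\n")

    def run(self, pings, argv=None):
        args = self.build_parser().parse_args(argv)
        rows = self.transform(pings)
        self.write(rows, args.output)
        return rows


class MainPingSummary(DatasetJob):
    """Example job: only the dataset-specific parts remain."""

    schema = ("client_id", "channel")

    def keep(self, ping):
        return ping.get("type") == "main"

    def to_row(self, ping):
        return {"client_id": ping.get("clientId"), "channel": ping.get("channel")}
```

With something like this, a new dataset would only need to declare its `schema` and implement `keep` and `to_row`; the CLI handling and output step would be inherited.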
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → INCOMPLETE