Closed Bug 1313701 Opened 8 years ago Closed 7 years ago

Refactor out boilerplate from telemetry-based Dataset jobs

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: bugzilla, Unassigned)

Details

We have a lot of shared boilerplate in our dataset jobs that would benefit from refactoring, both for maintainability and so that we can create new datasets more quickly. In particular, the datasets generated from telemetry pings share a lot of underlying structure. Some examples of shared code: common CLI options, filtering pings, converting an RDD to a Spark DataFrame, and writing the dataset back out. We could *maybe* also define the schema and the field generation in the same place, in a higher-level DSL.
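A rough sketch of what such a shared base could look like, not the actual job code. Everything named here is hypothetical: the `DatasetJob` class, the `fields` mapping, the `--from-date`/`--to-date`/`--output` options, and the ping keys `docType`, `clientId`, `submissionDate` and `crashes`. Plain Python lists stand in for the RDD and the Spark DataFrame, and the writer emits JSON lines, so the example runs without Spark.

```python
import argparse
import json
from abc import ABC, abstractmethod


class DatasetJob(ABC):
    """Hypothetical base class holding the steps shared by ping-derived jobs."""

    # Schema and field generation declared in one place:
    # column name -> (column type, function extracting the value from a ping).
    fields = {}

    def build_parser(self):
        # CLI options common to every job (option names are assumptions).
        parser = argparse.ArgumentParser(description=type(self).__name__)
        parser.add_argument("--from-date", required=True)
        parser.add_argument("--to-date", required=True)
        parser.add_argument("--output", required=True)
        return parser

    @abstractmethod
    def keep(self, ping):
        """Job-specific ping filter."""

    def to_row(self, ping):
        # Build one output row from the declared fields.
        return {name: typ(extract(ping))
                for name, (typ, extract) in self.fields.items()}

    def write(self, rows, output):
        # Stand-in for writing the dataset back out: one JSON object per line.
        with open(output, "w") as f:
            for row in rows:
                f.write(json.dumps(row, sort_keys=True) + "\n")

    def run(self, pings, argv=None):
        args = self.build_parser().parse_args(argv)
        # Dates are assumed to be YYYYMMDD strings, so string comparison works.
        in_range = (p for p in pings
                    if args.from_date <= p["submissionDate"] <= args.to_date)
        rows = [self.to_row(p) for p in in_range if self.keep(p)]
        self.write(rows, args.output)
        return rows


class MainPingCrashes(DatasetJob):
    """Example job: only the schema and the filter are job-specific."""

    fields = {
        "client_id": (str, lambda p: p["clientId"]),
        "crashes": (int, lambda p: p.get("crashes", 0)),
    }

    def keep(self, ping):
        return ping.get("docType") == "main"
```

With this shape, a new dataset would only need to declare its `fields` and its `keep` filter; option parsing, date filtering, row generation and output stay in the base class. In a real job the list comprehension in `run` would be replaced by the RDD and DataFrame operations.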
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Product: Cloud Services → Cloud Services Graveyard