Once we've uploaded crashes to S3 specifically for ingestion into Telemetry, we also need to run a piece of Scala code that builds up a struct. The work is to clone https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala and call it something like CrashView.scala. First we need to rewrite the buildSchema function (https://github.com/mozilla/telemetry-batch-view/blob/a401112b72e1cf92c47083ea76bd67afeef6c71a/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L378) so it generates a struct based on *our* JSON Schema (processed_crash.json). Then we also need to rewrite the messageToRow function (https://github.com/mozilla/telemetry-batch-view/blob/a401112b72e1cf92c47083ea76bd67afeef6c71a/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L141).
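To illustrate the buildSchema rewrite, here is a minimal dependency-free Scala sketch of the core idea: walk a parsed JSON Schema (modelled here as nested Maps, as if loaded from processed_crash.json) and emit the equivalent Spark SQL DDL string. The field names and the exact type mapping below are assumptions for illustration, not the actual processed_crash.json contents.

```scala
// Sketch of a buildSchema-style mapping from JSON Schema types to
// Spark SQL DDL. The real CrashView.scala would build a StructType
// directly; emitting DDL keeps this sketch free of Spark dependencies.
object CrashSchemaSketch {
  // Recursively translate one JSON Schema node into a Spark SQL type string.
  def sparkType(schema: Map[String, Any]): String =
    schema("type") match {
      case "string"  => "STRING"
      case "integer" => "BIGINT"
      case "number"  => "DOUBLE"
      case "boolean" => "BOOLEAN"
      case "object"  =>
        val props = schema("properties").asInstanceOf[Map[String, Map[String, Any]]]
        props.map { case (name, sub) => s"$name: ${sparkType(sub)}" }
             .mkString("STRUCT<", ", ", ">")
      case "array" =>
        val items = schema("items").asInstanceOf[Map[String, Any]]
        s"ARRAY<${sparkType(items)}>"
    }

  // Top level: the schema's "properties" become the table's columns.
  def buildSchemaDdl(root: Map[String, Any]): String = {
    val props = root("properties").asInstanceOf[Map[String, Map[String, Any]]]
    props.map { case (name, sub) => s"$name ${sparkType(sub)}" }.mkString(", ")
  }
}
```

For example, a toy schema with two hypothetical crash fields:

```scala
val crashSchema: Map[String, Any] = Map(
  "type" -> "object",
  "properties" -> Map(
    "crash_id" -> Map("type" -> "string"),
    "uptime"   -> Map("type" -> "integer")
  )
)
CrashSchemaSketch.buildSchemaDdl(crashSchema)  // "crash_id STRING, uptime BIGINT"
```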
Here are some notes about the latest development on this: https://public.etherpad-mozilla.org/p/socorro-to-telemetry-july2016
I don't think we'll need this after all. Since the data is already in the desired structure and we're not transforming anything, we just need the SparkSQL Struct definition and Spark can automatically convert it from JSON to Parquet. See proof-of-concept at https://gist.github.com/mreid-moz/31ac995e3180c156db61e5f1c0ee745b
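For context, the "schema plus automatic conversion" approach boils down to very little code. This is a hedged sketch (the S3 paths and column names are placeholders, not real bucket layouts), and it needs a Spark runtime to execute:

```scala
// Sketch: given a Spark SQL struct definition, Spark reads the JSON
// crashes and writes Parquet with no hand-written row conversion.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().appName("CrashView").getOrCreate()

// In practice this DDL would be derived from processed_crash.json;
// the two columns here are hypothetical.
val schema = StructType.fromDDL("crash_id STRING, uptime BIGINT")

spark.read
  .schema(schema)                         // enforce the struct definition
  .json("s3://example-bucket/crashes/")   // placeholder input path
  .write
  .parquet("s3://example-bucket/crashview/")  // placeholder output path
```

This matches the conclusion above: with the struct definition in hand, no messageToRow-style transformation code is needed.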
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX