Closed
Bug 1314120
Opened 9 years ago
Closed 9 years ago
Productionize Socorro import
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Assigned: amiyaguchi)
Attachments
(1 file)
The prototype import at [1] needs some work before it can be considered production.
Namely:
- Replace the hard-coded schema: Read the definitive JSON Schema for crashes from [2] and convert it to a Spark SQL struct
- Replace the hard-coded version: Pull the "version" string from the above schema and use it as a path component when saving to S3
- Store the output data in the official S3 bucket (telemetry-parquet)
- Schedule the generation code to run via Airflow
[1] https://gist.github.com/mreid-moz/092029949782249577aee92602879e2b
[2] https://github.com/mozilla/socorro/blob/master/socorro/schemas/crash_report.json
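As a rough illustration of the first item above, the sketch below converts a JSON Schema document into the JSON form of a Spark SQL struct using only the standard library. The type mapping, the `to_spark_type` helper, and the three-field sample schema are assumptions made for this example; they are not taken from the gist or from the real crash_report.json.

```python
import json

# Mapping from JSON Schema primitive types to Spark SQL type names.
# This mapping is an assumption for the sketch, not the production choice.
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def to_spark_type(node):
    """Convert one JSON Schema node into Spark's JSON schema representation."""
    json_type = node.get("type", "string")
    # JSON Schema allows a list such as ["integer", "null"];
    # take the first non-null entry and treat every field as nullable.
    if isinstance(json_type, list):
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else "string"

    if json_type == "object":
        fields = [
            {
                "name": name,
                "type": to_spark_type(child),
                "nullable": True,
                "metadata": {},
            }
            for name, child in sorted(node.get("properties", {}).items())
        ]
        return {"type": "struct", "fields": fields}

    if json_type == "array":
        items = node.get("items", {})
        if not isinstance(items, dict):
            items = {}  # tuple-style "items" lists fall back to string elements
        return {
            "type": "array",
            "elementType": to_spark_type(items),
            "containsNull": True,
        }

    # Unknown or missing types fall back to string.
    return PRIMITIVES.get(json_type, "string")


# Hypothetical sample schema, much smaller than the real crash schema.
sample_schema = json.loads("""
{
  "type": "object",
  "properties": {
    "uuid": {"type": "string"},
    "uptime": {"type": ["integer", "null"]},
    "addons": {"type": "array", "items": {"type": "string"}}
  }
}
""")

spark_schema = to_spark_type(sample_schema)
print(json.dumps(spark_schema, indent=2))
```

With PySpark installed, `pyspark.sql.types.StructType.fromJson(spark_schema)` should turn the resulting dict into a `StructType` that can be passed to a reader.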
Updated•9 years ago
Assignee: nobody → amiyaguchi
Priority: -- → P2
Reporter
Updated•9 years ago
Points: --- → 3
Assignee
Comment 1•9 years ago
I've created a fork of the gist [1] that adds support for generating the Spark structs and for pulling out the version number to version the S3 output path. The results of a trial run for 11/01/2016 can be found under `s3://net-mozaws-prod-us-west-2-pipeline-analysis/mreid/crash/v4/v0/`.
[1] https://gist.github.com/acmiyaguchi/bd08b62b025b80acc16efb63be29ea35
Comment 2•9 years ago
For what it's worth, the crash_report.json JSON Schema now has a version in it:
https://github.com/mozilla/socorro/commit/d429b403d7f5e44a7909656bc42cfaabae43520a
It's still only available on github.com, but by early next week it'll be in S3.
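A minimal sketch of how a version embedded in the schema could drive the output location. The key name `version`, the `versioned_output_path` helper, and the `s3://<bucket>/<prefix>/v<version>` layout are assumptions for illustration; check the actual schema for the real field name.

```python
def versioned_output_path(schema, bucket, prefix, version_key="version"):
    """Build an S3 path whose last component is derived from the schema version.

    `version_key` is an assumed field name; a KeyError is raised if the
    schema does not carry it, so a missing version fails loudly.
    """
    version = schema[version_key]
    return "s3://{}/{}/v{}".format(bucket, prefix.strip("/"), version)


# Hypothetical schema fragment carrying only the fields this sketch needs.
versioned_schema = {"version": 1, "type": "object", "properties": {}}

path = versioned_output_path(versioned_schema, "telemetry-parquet", "socorro_crash")
print(path)  # s3://telemetry-parquet/socorro_crash/v1
```

Deriving the path from the schema means a schema bump automatically writes to a new prefix instead of mixing incompatible Parquet files under one location.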
Reporter
Comment 3•9 years ago
Peter, what is the path you plan to use on S3 for the crash schema?
Flags: needinfo?(peterbe)
Comment 4•9 years ago
(In reply to Mark Reid [:mreid] from comment #3)
> Peter, what is the path you plan to use on S3 for the crash schema?
/crash_report.json
Flags: needinfo?(peterbe)
Reporter
Comment 5•9 years ago
Ok, and per bug 1311522, the bucket is org-mozilla-telemetry-crashes.
Comment 6•9 years ago
Assignee
Comment 7•9 years ago
The crash data should now be accessible. The Socorro import job runs on Airflow, and the resulting data is written to `s3://telemetry-parquet/socorro_crash/v1`.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard