Closed Bug 1289572 Opened 9 years ago Closed 9 years ago

Write processed JSON subset to S3

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: peterbe, Assigned: peterbe)

References

Details

Currently we have https://github.com/mozilla/socorro/blob/master/schemas/processed_crash.json which dictates which fields of a crash we can/should send to Telemetry. It conforms with our stored processed crashes, as shown with this script: https://github.com/mozilla/socorro/blob/master/schemas/validate_and_test.py

However, what we have in processed crashes is LARGER than the JSON Schema, and validation still passes. See https://gist.github.com/peterbe/8d1f4e85724a071605b0ebffd41c47ab

We need to take the processed crash and *reduce* it to exactly the fields mentioned in the JSON Schema and nothing more. Then we need to upload this to S3 with a date prefix in the form of /YYYYMMDD/. It can have additional prefixes, like this full example: `/telemetry-crashes/20160726/e29960c3-143c-45df-8997-31b602160719.json`

Things we need:

1) Code that can take a crash dict and generate a new crash dict that ONLY has the keys mentioned in the JSON Schema.
2) A new bucket to upload to.
3) Code that uploads this reduced/limited JSON to S3 (part of the PolyCrashStorage).
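As a rough sketch of items 1 and 3 above, the reduction and key-building steps could look something like the following. The function names (`reduce_to_schema`, `build_s3_key`) are made up for illustration and are not Socorro's actual API; the date source is also an assumption.

```python
import json
from datetime import datetime

def reduce_to_schema(crash, schema):
    """Return a new dict containing only the top-level keys the schema defines."""
    allowed = set(schema.get('properties', {}).keys())
    return {key: value for key, value in crash.items() if key in allowed}

def build_s3_key(crash_id, prefix='telemetry-crashes', when=None):
    """Build a key like /telemetry-crashes/20160726/<crash_id>.json."""
    when = when or datetime.utcnow()
    return '/{}/{}/{}.json'.format(prefix, when.strftime('%Y%m%d'), crash_id)

# Usage (paths and variable names are illustrative):
# with open('schemas/processed_crash.json') as f:
#     schema = json.load(f)
# reduced = reduce_to_schema(processed_crash, schema)
# key = build_s3_key(reduced['uuid'])
```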
Mark, what happens if the JSON we upload *lacks* certain keys? Our JSON Schema says that key 'foo' has to be a 'string'. But what if there is no key called 'foo'? If you suspect that'll be a problem, we have to figure out a default or, even better, include the key but set it to null.
Flags: needinfo?(mreid)
Missing values shouldn't be a problem. Your schema can say whether a field must be present, so if you want a document to fail validation when particular required fields are missing, you can handle that case in the schema. If a field is not required but is missing, we can set it to null in the batch view code.
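To illustrate the point, here is a minimal, hypothetical example using the `jsonschema` library (the field names are made up): a field that is typed in the schema but not listed in `required` can be absent without failing validation, and optional missing fields can be backfilled with null afterwards.

```python
import jsonschema

schema = {
    'type': 'object',
    'properties': {
        'uuid': {'type': 'string'},
        'foo': {'type': ['string', 'null']},
    },
    'required': ['uuid'],
}

doc = {'uuid': 'e29960c3-143c-45df-8997-31b602160719'}
jsonschema.validate(doc, schema)  # passes: 'foo' is optional and may be absent

# Backfill any optional fields that are missing with null.
for field in schema['properties']:
    doc.setdefault(field, None)
```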
Flags: needinfo?(mreid)
Assignee: nobody → peterbe
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/75a5b7b26f9c62f9c402aed99c8cc96dfe1365b6
Bug 1289572 write processed json subset to s3 (#3412)

* bug 1289572 - Write processed JSON subset to S3
* bug 1289572 - SimpleDatePrefixKeyBuilder class
* TelemetryBotoS3CrashStorage
* merge raw and processed crash, validate the new unified crash report
* optimization todo note
* crash_report not processed_crash
* remove pprint statement
* Forgot one field to rename in JSON Schema
* internally cache all_fields for 1 hour
* typo
* much better tests of supersearchfields caching
* comment out es logging

r=adngdb
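For readers following along, here is a loose, hypothetical sketch of the two pieces the commit names: a date-prefix key builder and an S3-writing crash storage. The real `SimpleDatePrefixKeyBuilder` and `TelemetryBotoS3CrashStorage` live in Socorro's configman-based crash storage framework and differ in detail; the bucket name is a placeholder, boto3 is used here only for brevity, and the writer class name is invented.

```python
import json
from datetime import datetime

import boto3

class SimpleDatePrefixKeyBuilder:
    """Build keys like telemetry-crashes/20160726/<crash_id>.json."""

    def __init__(self, prefix='telemetry-crashes'):
        self.prefix = prefix

    def build_key(self, crash_id, when=None):
        when = when or datetime.utcnow()
        return '{}/{}/{}.json'.format(self.prefix, when.strftime('%Y%m%d'), crash_id)

class TelemetryS3Writer:
    """Upload a schema-reduced crash dict to S3 under a date-prefixed key."""

    def __init__(self, bucket, key_builder):
        self.bucket = bucket
        self.key_builder = key_builder
        self.client = boto3.client('s3')

    def save_processed(self, crash_id, reduced_crash):
        key = self.key_builder.build_key(crash_id)
        self.client.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=json.dumps(reduced_crash).encode('utf-8'),
        )

# Usage (placeholder bucket name):
# writer = TelemetryS3Writer('my-telemetry-bucket', SimpleDatePrefixKeyBuilder())
# writer.save_processed(reduced_crash['uuid'], reduced_crash)
```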
(In reply to Peter Bengtsson [:peterbe] from comment #0)
> Currently we have
> https://github.com/mozilla/socorro/blob/master/schemas/processed_crash.json
> which dictates which fields of a crash we can/should send to Telemetry.

Just for my own sanity, this moved in the repo: https://github.com/mozilla/socorro/blob/master/socorro/schemas/crash_report.json
The next thing to do is to configure this in prod.
Not sure why this didn't close on the code commit. The code is currently in production but we haven't configured it yet; see https://bugzilla.mozilla.org/show_bug.cgi?id=1311522
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED