Closed
Bug 1289572
Opened 9 years ago
Closed 9 years ago
Write processed JSON subset to S3
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: peterbe, Assigned: peterbe)
Currently we have https://github.com/mozilla/socorro/blob/master/schemas/processed_crash.json which dictates which fields of a crash we can/should send to Telemetry.
It conforms with our stored processed crashes as shown with this script: https://github.com/mozilla/socorro/blob/master/schemas/validate_and_test.py
However, our processed crashes contain MORE fields than the JSON Schema lists. The validation still passes, since the schema does not reject keys it doesn't mention. See https://gist.github.com/peterbe/8d1f4e85724a071605b0ebffd41c47ab
We need to take the processed crash, *reduce* it exactly to the fields mentioned in the JSON Schema and nothing more.
Then we need to upload this to S3 with a date prefix in the form of /YYYYMMDD/.
It can have additional prefixes like this full example: `/telemetry-crashes/20160726/e29960c3-143c-45df-8997-31b602160719.json`
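A minimal sketch of how such a key could be built. The function name `build_key` and its parameters are hypothetical, not the names used in Socorro, and the sketch assumes the caller decides which date to use (it defaults to the current UTC date).

```python
import datetime


def build_key(crash_id, prefix="telemetry-crashes", date=None):
    """Return a key of the form /<prefix>/<YYYYMMDD>/<crash_id>.json.

    `build_key` is a hypothetical helper for illustration only.
    """
    if date is None:
        # Assumption: default to today's date in UTC.
        date = datetime.datetime.now(datetime.timezone.utc).date()
    return "/{}/{}/{}.json".format(prefix, date.strftime("%Y%m%d"), crash_id)


key = build_key(
    "e29960c3-143c-45df-8997-31b602160719",
    date=datetime.date(2016, 7, 26),
)
print(key)
```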
Things we need:
1) Code that can take a crash dict and generate a new crash dict that ONLY has the keys mentioned in the JSON Schema.
2) A new bucket to upload to
3) Code that uploads this reduced/limited JSON to S3 (part of the PolyCrashStorage)
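Item 1 could look roughly like the sketch below. It assumes the schema lists its fields under `properties` (and `items` for arrays of objects), as is usual for JSON Schema; the function name `reduce_to_schema` and the sample field names are made up for illustration and are not taken from the Socorro code base.

```python
def reduce_to_schema(crash, schema):
    """Return a new dict with ONLY the keys named in schema["properties"].

    Hypothetical helper: nested objects and arrays of objects are reduced
    recursively; keys missing from the crash are simply left out.
    """
    reduced = {}
    for key, subschema in schema.get("properties", {}).items():
        if key not in crash:
            continue
        value = crash[key]
        if isinstance(value, dict) and "properties" in subschema:
            value = reduce_to_schema(value, subschema)
        elif isinstance(value, list) and isinstance(subschema.get("items"), dict):
            item_schema = subschema["items"]
            if "properties" in item_schema:
                value = [
                    reduce_to_schema(item, item_schema)
                    if isinstance(item, dict) else item
                    for item in value
                ]
        reduced[key] = value
    return reduced


# Illustrative schema and crash; the field names are assumptions.
schema = {
    "type": "object",
    "properties": {
        "uuid": {"type": "string"},
        "info": {
            "type": "object",
            "properties": {"os": {"type": "string"}},
        },
    },
}
crash = {
    "uuid": "e29960c3-143c-45df-8997-31b602160719",
    "email": "not-in-schema@example.com",
    "info": {"os": "Linux", "hostname": "not-in-schema"},
}
reduced = reduce_to_schema(crash, schema)
```

Here `reduced` keeps `uuid` and `info.os` and drops `email` and `info.hostname`; the original `crash` dict is not modified.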
Comment 1•9 years ago
Mark,
What happens if the JSON we upload *lacks* certain keys? Our JSON Schema says that key 'foo' has to be a 'string'. But what if there is no key called 'foo'?
If you suspect that'll be a problem, we have to figure out a default or, even better, include it but set it to null.
Flags: needinfo?(mreid)
Comment 2•9 years ago
Missing values shouldn't be a problem. Your schema can say whether a field must be present, so if you want a document to fail validation if particular required fields are missing, you can handle that case with the schema. If a field is not required but is missing, we can set it to null in the batch view code.
Flags: needinfo?(mreid)
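To illustrate the distinction in comment 2, here is a rough sketch, not the actual batch view code. It assumes the schema uses the standard JSON Schema `required` list; the name `fill_missing` is hypothetical. A missing required field is treated as an error, while a missing optional field is set to `None` (null in JSON).

```python
def fill_missing(document, schema):
    """Reject documents lacking required keys; null out missing optional ones.

    Hypothetical helper, for illustration only.
    """
    required = set(schema.get("required", []))
    missing = sorted(required - set(document))
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    filled = dict(document)  # leave the input untouched
    for key in schema.get("properties", {}):
        filled.setdefault(key, None)
    return filled


schema = {
    "properties": {"uuid": {"type": "string"}, "foo": {"type": "string"}},
    "required": ["uuid"],
}
filled = fill_missing({"uuid": "abc"}, schema)
```

With this schema, `filled` contains `"foo": None`, whereas a document without `uuid` raises `ValueError`.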
Updated•9 years ago
Assignee: nobody → peterbe
Comment 3•9 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/75a5b7b26f9c62f9c402aed99c8cc96dfe1365b6
Bug 1289572 write processed json subset to s3 (#3412)
* bug 1289572 - Write processed JSON subset to S3
* bug 1289572 - SimpleDatePrefixKeyBuilder class
* TelemetryBotoS3CrashStorage
* merge raw and processed crash, validate the new unified crash report
* optimization todo note
* crash_report not processed_crash
* remove pprint statement
* Forgot one field to rename in JSON Schema
* internally cache all_fields for 1 hour
* typo
* much better tests of supersearchfields caching
* comment out es logging
r=adngdb
Comment 4•9 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #0)
> Currently we have
> https://github.com/mozilla/socorro/blob/master/schemas/processed_crash.json
> which dictates which fields of a crash we can/should send to Telemetry.
Just for my own sanity, this moved in the repo:
https://github.com/mozilla/socorro/blob/master/socorro/schemas/crash_report.json
Comment 5•9 years ago
The next thing to do is to configure this in prod.
Comment 6•9 years ago
Not sure why this didn't close on the code commit.
The code is currently in production, but we haven't configured it yet:
https://bugzilla.mozilla.org/show_bug.cgi?id=1311522
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED