replace json-schema-reducer
Categories
(Socorro :: Processor, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Attachments
(1 file)
A while back, we started using json-schema-reducer which Peter wrote for contribute.json and maybe AirMo. That takes a Python nested structure and a JSON schema and returns the reduced nested structure. It only looks at the structure--it doesn't do any type or other validation. The end result is that we have a reduced structure with who-knows-what.
Further, it looks like that hasn't been updated since 2016, so it's unmaintained.
I think we should either look for another reducer or write one. It should reduce a structure based on a schema and also do the type validation. We'll use this for TelemetryBotoS3CrashStorage and possibly other things.
Comment 1 • 3 years ago (Assignee)
Grabbing this to look into now since I need it for bug #1755095.
Comment 2 • 3 years ago (Assignee)
I looked for other reducers on PyPI and didn't see anything, so I think I'm going to roll my own. I've done something similar recently, so I think it's doable.
Comment 3 • 3 years ago (Assignee)
It needs to do the following:
- traverse a document and prune anything that isn't in the specified json schema
- handle type checking
- handle json references (https://cswr.github.io/JsonSchema/spec/definitions_references/)
- accept a predicate that looks at the schema and determines whether this item should be included or not -- we can use this for removing all the protected parts of crash data
I'm pretty sure I have most of that working. I need to write some tests covering edge cases. I should also go back and re-read JSON Schema structure. The telemetry socorro crash json schema and java exception schemas are draft-04, so I'm currently writing it to work with those.
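The four requirements above can be sketched roughly as follows. This is only an illustration of the idea, not the reducer that was written for this bug: the names reduce_document, resolve_ref, and check_type are made up, the "protected" schema key used by the predicate is a hypothetical annotation, only local "#/..." references are followed, and only the single-schema form of "items" is handled.

```python
def resolve_ref(schema, root):
    """Follow local "#/..." JSON references until a concrete schema is reached."""
    while "$ref" in schema:
        ref = schema["$ref"]
        if not ref.startswith("#/"):
            raise ValueError(f"only local references are supported: {ref}")
        node = root
        for part in ref[2:].split("/"):
            node = node[part]
        schema = node
    return schema


# bool is a subclass of int in Python, so it is excluded explicitly.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def check_type(schema, value, path):
    """Raise TypeError if value matches none of the types the schema allows."""
    types = schema.get("type")
    if types is None:
        return
    if isinstance(types, str):
        types = [types]
    if not any(TYPE_CHECKS[t](value) for t in types):
        found = type(value).__name__
        raise TypeError(f"{path or '.'}: expected {types}, got {found}")


def reduce_document(schema, document, include=lambda s: True, root=None, path=""):
    """Return a copy of document with everything not in the schema pruned."""
    root = schema if root is None else root
    schema = resolve_ref(schema, root)
    check_type(schema, document, path)
    if isinstance(document, dict):
        properties = schema.get("properties", {})
        reduced = {}
        for key, value in document.items():
            if key not in properties:
                continue  # not in the schema: prune
            sub = resolve_ref(properties[key], root)
            if not include(sub):
                continue  # rejected by the predicate: prune
            reduced[key] = reduce_document(sub, value, include, root, f"{path}.{key}")
        return reduced
    if isinstance(document, list):
        items = schema.get("items", {})
        return [
            reduce_document(items, item, include, root, f"{path}[{i}]")
            for i, item in enumerate(document)
        ]
    return document


# Hypothetical schema and document, for illustration only.
schema = {
    "type": "object",
    "definitions": {
        "frame": {
            "type": "object",
            "properties": {"function": {"type": ["string", "null"]}},
        }
    },
    "properties": {
        "uuid": {"type": "string"},
        "user_comments": {"type": "string", "protected": True},
        "frames": {"type": "array", "items": {"$ref": "#/definitions/frame"}},
    },
}
document = {
    "uuid": "abc",
    "user_comments": "hi",
    "extra": 1,
    "frames": [{"function": "main", "offset": 4}],
}
public = reduce_document(
    schema, document, include=lambda s: not s.get("protected", False)
)
# public == {"uuid": "abc", "frames": [{"function": "main"}]}
```

Passing the predicate in, rather than hard-coding a notion of "protected", keeps the traversal generic: the same function can produce a full or a public-only copy depending on the callable it is given.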
Comment 4 • 3 years ago (Assignee)
I got a working reducer with tests.
When I went to test it on crash data, I hit a slew of issues where the data in the processed crash is the wrong type. Bug #1754035 covers schema problems related to the new stackwalker. I'm hitting problems with a lot of other fields, too.
Examples:
- available_physical_memory is an int in the processed crash, but defined as a [string, null] in the schema
- available_virtual_memory is an int in the processed crash, but defined as a [string, null] in the schema
- safe_mode is a bool in the processed crash, but defined as a [string, null] in the schema and has values "0" and "1" in telemetry.socorro_crash
We need to not break what's in telemetry.socorro_crash, so I'm going to write a converter to fix fields that need fixing as specified in the schema.
I'll fix the other fields in this bug and fix the issues related to the new stackwalker in bug #1754035.
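A converter along those lines might look something like the sketch below. It is an assumption about the approach, not the code that landed: fix_field and fix_crash are hypothetical names, the schema fragment is hand-written from the three examples above, and the only conversions shown are int to string and bool to "0"/"1".

```python
def fix_field(schema, value):
    """Coerce value to a string when the schema allows strings but not its type."""
    types = schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    if "string" not in types:
        return value
    # bool must be checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        if "boolean" in types:
            return value
        return "1" if value else "0"
    if isinstance(value, int) and not {"integer", "number"} & set(types):
        return str(value)
    return value


def fix_crash(schema, crash):
    """Apply fix_field to every top-level field that the schema describes."""
    properties = schema.get("properties", {})
    return {
        key: fix_field(properties[key], value) if key in properties else value
        for key, value in crash.items()
    }


# Hypothetical schema fragment built from the examples in this comment.
crash_schema = {
    "type": "object",
    "properties": {
        "available_physical_memory": {"type": ["string", "null"]},
        "available_virtual_memory": {"type": ["string", "null"]},
        "safe_mode": {"type": ["string", "null"]},
    },
}
processed_crash = {
    "available_physical_memory": 8589934592,
    "available_virtual_memory": None,
    "safe_mode": True,
}
fixed = fix_crash(crash_schema, processed_crash)
# fixed == {
#     "available_physical_memory": "8589934592",
#     "available_virtual_memory": None,
#     "safe_mode": "1",
# }
```

Running the converter before the reducer means the type check sees values in the shape the schema expects, so existing consumers of telemetry.socorro_crash keep getting strings.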
Comment 5 • 3 years ago (Assignee)
Comment 6 • 3 years ago (Assignee)
Comment 7 • 3 years ago (Assignee)
This went to production in bug #1763234. If there are any issues, we can fix them in new bugs. Marking as FIXED.