Closed
Bug 1383113
Opened 7 years ago
Closed 4 years ago
rewrite processor rules to use getitem notation
Categories
(Socorro :: Processor, task, P2)
Socorro
Processor
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Assigned: willkg)
References
(Blocks 1 open bug)
Details
Attachments
(6 files)
53 bytes,
text/x-github-pull-request
|
Details | Review | |
53 bytes,
text/x-github-pull-request
|
Details | Review | |
53 bytes,
text/x-github-pull-request
|
Details | Review | |
53 bytes,
text/x-github-pull-request
|
Details | Review | |
53 bytes,
text/x-github-pull-request
|
Details | Review | |
53 bytes,
text/x-github-pull-request
|
Details | Review |
The processor rules currently use a mix of getitem and getattribute notation which makes them harder to reason about. What are these things? Does the mix of notation indicate something? Further, we regularly see KeyErrors in Sentry now that we changed it to send all rules errors. For example: processed_crash.json_dump.system_info['cpu_info'] This mixes the two notations. Further, if json_dump isn't a key in processed_crash, it kicks up an error. It behooves us should pick one notation and stick with that. The getattribute notation seems nice, but doesn't help us with KeyErrors unless we liberally add try/except blocks to catch KeyError. That's unenthusing. Related to this, we're in the process of rewriting the processor and we're not going to have a DotDict equivalent. Given all those things, I want to switch everything to getitem notation. It alleviates KeyErrors, it's uniform, and it'll ease keeping the new processor in sync with the old one. This bug covers rewriting the processor rules to use getitem notation.
Assignee | ||
Comment 1•7 years ago
|
||
The example above will probably become: processed_crash.get('json_dump', {}).get('system_info', {}).get('cpu_info') That's gross. It'd be nice to find a getitem-based notation that isn't gross like the above. Maybe write some traversal functions so we could do something like this?: get_value(processed_crash, ['json_dump', 'system_info', 'cpu_info']) Maybe get inspiration from pathlib?: (Tree(processed_crash) > 'json_dump' > 'system_info' > 'cpu_info').value Might be an interesting tree traversal library in here. Lonnen wants to ultimately switch to immutable data structures which will have their own notation, probably. I don't want to do that for this bug, but it's good to keep in mind. What other existing things can we pull from?
Comment 2•7 years ago
|
||
Tree traversal library could be rad. I like the get_value() demo. Another source of inspiration is the various dotdict implementations we have around already. Immutable structures like those provided by pyrsistent have schemas and performant deep copying, which are useful with the current architecture but may be short lived if we move away from the n-writes crash storage paradigm.
Assignee | ||
Comment 3•7 years ago
|
||
Maybe tree_get and tree_set?: tree_get(processed_crash, key='json_dump.system_info.cpu_info') tree_get(processed_crash, key='json_dump.system_info.cpu_info', default=None) Also, the other side: tree_set(processed_crash, key='json_dump.system_info.cpu_info', value='blah | blah') That's pretty concise, let's you compose keys, keeps everything (including intermediaries) as dicts, and is easy to reason about. How does that strike you?
Comment 4•7 years ago
|
||
I like tree_get() the best of the candidates so far. The dot syntax is very lightweight to type and read. It's less surprising than the pathlib-inspired candidate and it avoids the boilerplate of get_value or .get() chaining. It is, however, unlike the others. What is the effect for querying a key that doesn't exist with a he default? Are the intermediary dictionaries created and the key set to none? Does it short circuit and return the default? Something else? Is this useful if we were certain of the processed_crash structure before hand? Should we spend the time on schema validation instead and then fearlessly access keys? Should we spend the time instead flattening the document?
Assignee | ||
Comment 5•7 years ago
|
||
I don't think writing tree_get and tree_set are hard. I think that's like a half-day of work especially if we keep to "sane-by-default" semantics. I think we should also add a flatten and a schema validator, too. I started a "design-by-documentation" document. I'll push that somewhere and then we can yay/nay that and then move forward with minimal effort.
Assignee | ||
Comment 6•7 years ago
|
||
Comment 7•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla-services/socorro https://github.com/mozilla-services/socorro/commit/108b576ed51555f4d9e9469e2a0a476bf1f70b93 bug 1383113 - convert signature rules to getitem notation (#3871) * reduce some extra lines * convert the things that were using attribute notation to getitem notation * clean up data structure set up in the tests * convert tests to use {} instead of DotDict for all non-config things
Assignee | ||
Comment 8•6 years ago
|
||
I discovered glom (https://glom.readthedocs.io/en/latest/) today. That library does everything that tree_get does, but it's more flexible and we don't have to maintain it. It doesn't have something like tree_set, though. I added an issue to their tracker to see what they thought. It's likely we'll need to write something because it's a complicated problem domain and we have specific needs and can radically reduce the scope to meet our needs.
Assignee | ||
Comment 9•6 years ago
|
||
glom has an Assign now, so it does both get and set: https://glom.readthedocs.io/en/latest/api.html#target-mutation-with-assign
Assignee | ||
Comment 10•6 years ago
|
||
Making this a P2 to do soon. It'll ease rewriting the processor plus it makes the code clearer.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P2
Assignee | ||
Comment 11•5 years ago
|
||
I'll tweak these as I go along, but I'm not going to otherwise spend time on this anytime soon. Making it a P3.
Priority: P2 → P3
Assignee | ||
Comment 12•5 years ago
|
||
Unassigning myself since I'm not going to get to this any time soon.
Assignee: willkg → nobody
Status: ASSIGNED → NEW
Assignee | ||
Comment 13•4 years ago
|
||
Making this a P2 again and grabbing it. Moving this forward alleviates issues we're hitting with ModuleSignatureInfo
that are otherwise really hard to deal with.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: P3 → P2
Assignee | ||
Comment 14•4 years ago
|
||
Assignee | ||
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•4 years ago
|
||
Assignee | ||
Comment 17•4 years ago
|
||
Assignee | ||
Comment 18•4 years ago
|
||
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
|
||
Assignee | ||
Comment 21•4 years ago
|
||
Assignee | ||
Comment 22•4 years ago
|
||
Assignee | ||
Comment 23•4 years ago
|
||
Assignee | ||
Comment 24•4 years ago
|
||
I deployed this to prod in bug #1635485. Marking as FIXED.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•