Improve semi-automatic QM_TRY failure analysis
Categories
(Core :: Storage: Quota Manager, task, P2)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox89 | --- | fixed |
People
(Reporter: jstutte, Assigned: jstutte)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
Now that bug 1686191 has introduced a way to specifically report warnings around QM_TRY, we can improve the Python script currently attached to bug 1482662 in the following respects:
- Directly query the telemetry API from python
- Enrich the code location with hg revisions to have direct links into the code
- Inspect severity and filter for propagated errors
- Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug
- Find a better home than a bugzilla attachment for the script(s) that do all this.
Not all of this can necessarily be fully automated; we are mostly looking for low-hanging fruit here to ease the burden of error analysis.
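As a rough illustration of the "direct links into the code" goal, the following sketch turns a build id, a source path and a line number into a link to that revision on hg.mozilla.org. The function name `source_link`, the `BUILD_REVISIONS` table, the revision value and the example path are all hypothetical and only serve the illustration; how the real script maps build ids to revisions is not described in this bug. Returning `None` for an unknown build id mirrors the "No revision for build.id" messages in the script output quoted in comment 7.

```python
from typing import Dict, Optional

# Hypothetical lookup table from build id to hg revision. A real script
# would need to obtain this mapping from some build metadata source.
BUILD_REVISIONS: Dict[str, str] = {
    "20210401000000": "0123456789ab",
}


def source_link(
    build_id: str,
    path: str,
    line: int,
    revisions: Dict[str, str] = BUILD_REVISIONS,
    repo: str = "https://hg.mozilla.org/mozilla-central",
) -> Optional[str]:
    """Return a link to path:line at the revision of the given build.

    Returns None when no revision is known for the build id.
    """
    rev = revisions.get(build_id)
    if rev is None:
        return None
    # hgweb addresses a line of a file at a revision as file/<rev>/<path>#l<line>.
    return f"{repo}/file/{rev}/{path}#l{line}"


if __name__ == "__main__":
    print(source_link("20210401000000", "dom/quota/ActorsParent.cpp", 42))
    print(source_link("20210329100226", "dom/quota/ActorsParent.cpp", 42))
```

Keeping the lookup table as a parameter makes it easy to plug in whatever revision source turns out to be practical.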
Comment 1•4 years ago (Assignee)
The result could look like this (except for the signature part):
Comment 2•4 years ago
The example output looks amazing :)
Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug
I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).
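The grouping idea described here could be sketched roughly as follows: a signature is built from the surrounding function name plus the line offset within that function, and two reports are only merged when the source text at that location also matches. All names (`make_signature`, `same_location`) and the signature format are assumptions made for this sketch, not what the actual script does.

```python
def make_signature(function_name: str, function_start_line: int, failure_line: int) -> str:
    """Build a signature from the function name and the relative line offset."""
    offset = failure_line - function_start_line
    return f"{function_name}+{offset}"


def same_location(sig_a: str, source_a: str, sig_b: str, source_b: str) -> bool:
    """Treat two reports as the same location only if both the signature
    and the source text at that line match (ignoring surrounding whitespace).

    The source comparison guards against false merges when a function
    has changed between revisions.
    """
    return sig_a == sig_b and source_a.strip() == source_b.strip()


if __name__ == "__main__":
    old = make_signature("QuotaManager::Init", 100, 112)
    new = make_signature("QuotaManager::Init", 130, 142)
    print(old, new)
    print(same_location(old, "  QM_TRY(Foo());", new, "QM_TRY(Foo());"))
    print(same_location(old, "QM_TRY(Foo());", new, "QM_TRY(Bar());"))
```

The example function name and QM_TRY expressions are made up; the point is only that a moved but unchanged function keeps its signature, while a changed line at the same offset is not merged.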
Comment 3•4 years ago (Assignee)
(In reply to Simon Giesecke [:sg] [he/him] from comment #2)
The example output looks amazing :)
Thanks!
Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug
I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).
Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)
Comment 4•4 years ago
(In reply to Jens Stutte [:jstutte] from comment #3)
(In reply to Simon Giesecke [:sg] [he/him] from comment #2)
Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)
Splitting up long functions is surely a good idea, independent of this. But that's a major effort, and I'm not sure if it's feasible to address this issue that way. In particular, where such ambiguity is caused by nested QM_TRY calls, it might not be desirable to really split that up. But this is hard to judge beforehand :)
Comment 5•4 years ago (Assignee)
My local WIP script is now also able to find function names as anchors:
Error stacks:
Now we should start to think about the workflow we want to achieve with this.
Comment 6•4 years ago (Assignee)
Comment 7•4 years ago (Assignee)
Found 17377 rows of data.
No revision for build.id 20210329100226
No revision for build.id 20210329172617
Found 29 error stacks.
Found 26 warning stacks.
Found 0 info stacks.
Found 11 aborted stacks.
Error stacks:
Comment 8•4 years ago (Assignee)
We might want to add a rule that treats an upgrade of the severity within a stack as the start of a new stack. During propagation, it is unlikely that we will transform a WARNING into an ERROR.
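A minimal sketch of such a rule, assuming each reported frame is a dict with a `severity` key and that frames arrive in propagation order. The frame layout, the `SEVERITY_RANK` ordering and the function name `split_on_severity_upgrade` are assumptions for illustration, not the script's actual data model.

```python
from typing import Dict, List

# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK: Dict[str, int] = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def split_on_severity_upgrade(frames: List[dict]) -> List[List[dict]]:
    """Split a sequence of frames into stacks.

    A new stack starts whenever a frame has a higher severity than the
    frame before it, since a WARNING is unlikely to turn into an ERROR
    while propagating. Equal or lower severity continues the current stack.
    """
    stacks: List[List[dict]] = []
    current: List[dict] = []
    for frame in frames:
        if current:
            previous = SEVERITY_RANK[current[-1]["severity"]]
            if SEVERITY_RANK[frame["severity"]] > previous:
                stacks.append(current)
                current = []
        current.append(frame)
    if current:
        stacks.append(current)
    return stacks


if __name__ == "__main__":
    frames = [
        {"severity": "WARNING", "location": "a.cpp:10"},
        {"severity": "WARNING", "location": "b.cpp:20"},
        {"severity": "ERROR", "location": "c.cpp:30"},
        {"severity": "ERROR", "location": "d.cpp:40"},
    ]
    for stack in split_on_severity_upgrade(frames):
        print([f["location"] for f in stack])
```

Whether a downgrade (ERROR followed by WARNING) should also split a stack is left open here; this sketch keeps such frames together.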
Comment 10•4 years ago (Assignee)
This specific bug's goals can be considered achieved:
- Directly query the telemetry API from python
- Enrich the code location with hg revisions to have direct links into the code
- Inspect severity and filter for propagated errors
- Find a more stable signature for a code location (like the surrounding function's name) and group found failures
- Find a better home than a bugzilla attachment for the scripts that do all this.
Missing from the original goals is only:
- Group found failures in a specific bug automatically.
And there are additional caveats we want to examine.
We will file further bugs for those.
Comment 11•4 years ago
bugherder
Comment 12•4 years ago (Assignee)
Comment 13•3 years ago
Comment 14•3 years ago
bugherder