1700915 - Improve semi-automatic QM_TRY failure analysis

Assignee

Description

•

4 years ago

•

After that we introduced a way to specifically report warnings around QM_TRY in bug 1686191, we can improve the python script currently attached to bug 1482662 regarding the following aspects:

Directly query the telemetry API from python
Enrich the code location by hg revisions to have direct links into the code
Inspect severity and filter for propagated errors
Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug
Find a better home than a bugzilla attachment for the script(s) that do all this.

Not all of this might be automated entirely, we are mostly looking for low hanging fruits here to ease the burden of error analysis.

Jens Stutte [:jstutte]

Assignee

Comment 1

•

4 years ago

The result could look like this (except for the signature part):

Clients	Sessions	Stack
40	51	dom/quota/ActorsParent.cpp#4121: <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
36	36	dom/localstorage/ActorsParent.cpp#1028:
11	11	dom/localstorage/ActorsParent.cpp#1013:NS_ERROR_FILE_NOT_FOUND
11	11	dom/localstorage/ActorsParent.cpp#1028: <- dom/localstorage/ActorsParent.cpp#1028:
11	14	dom/cache/FileUtils.cpp#560:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Summary: Improve semi-automatic failure analysis → Improve semi-automatic QM_TRY failure analysis

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Depends on: 1686191

Simon Giesecke [:sg] [he/him]

Comment 2

•

4 years ago

The example output looks amazing :)

Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug

I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).

Jens Stutte [:jstutte]

Assignee

Comment 3

•

4 years ago

(In reply to Simon Giesecke [:sg] [he/him] from comment #2)

The example output looks amazing :)

Thanks!

Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug

I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).

Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)

Simon Giesecke [:sg] [he/him]

Comment 4

•

4 years ago

(In reply to Jens Stutte [:jstutte] from comment #3)

(In reply to Simon Giesecke [:sg] [he/him] from comment #2)

Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)

Splitting up long functions is surely a good idea, independent of this. But that's a major effort, and not sure if it's feasible to address this issue that way. In particular, where such ambiguity is caused by nested QM_TRY calls, it might not be desirable to really split that up. But this is hard to judge beforehand :)

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Assignee: nobody → jstutte

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Severity: -- → N/A

Priority: -- → P2

Jens Stutte [:jstutte]

Assignee

Comment 5

•

4 years ago

•

Edited

My local WIP script now is able also to find function names as anchors:

Error stacks:

Clients	Sessions	Hits	Anchor	Stack
1244	1418	1691	dom/localstorage/ActorsParent.cpp:LoadUsageFile	dom/localstorage/ActorsParent.cpp#1028:None
413	472	593	dom/localstorage/ActorsParent.cpp:LoadUsageFile	dom/localstorage/ActorsParent.cpp#1013:NS_ERROR_FILE_NOT_FOUND
266	345	5745	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#560:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
248	466	468	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4121:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
113	145	159	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4137:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
23	28	1225	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4555:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4628:NS_ERROR_FILE_NOT_FOUND
19	45	45	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
16	21	21	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
16	20	3421	dom/cache/FileUtilsImpl.h:BodyTraverseFiles	dom/cache/FileUtilsImpl.h#74:None
14	21	460	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
14	24	288	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
10	14	1631	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9625:NS_ERROR_FILE_NOT_FOUND

Now we should start to think about the workflow we want to achieve with this.

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Updated

•

4 years ago

Depends on: 1701966

Comment 6

•

4 years ago

Attached file Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. r?#dom-storage-reviewers — Details

Phabricator Automation

Updated

•

4 years ago

Attachment #9212557 - Attachment description: WIP: Bug 1700915: WIP Initial set of python scripts. → WIP: Bug 1700915: Initial set of python scripts.

Phabricator Automation

Updated

•

4 years ago

Attachment #9212557 - Attachment description: WIP: Bug 1700915: Initial set of python scripts. → WIP: Bug 1700915: Python scripts that fetch and analyze QM_TRY errors.

Jens Stutte [:jstutte]

Assignee

Comment 7

•

4 years ago

Found 17377 rows of data.
No revision for build.id 20210329100226
No revision for build.id 20210329172617
Found 29 error stacks.
Found 26 warning stacks.
Found 0 info stacks.
Found 11 aborted stacks.

Error stacks:

Clients	Sessions	Hits	Anchor	Stack
54	57	81	dom/localstorage/ActorsParent.cpp:LoadUsageFile	dom/localstorage/ActorsParent.cpp#1033:None
45	72	1587	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
45	59	60	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
40	96	96	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
38	84	84	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
33	74	1639	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
21	31	567	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
15	17	20	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
14	19	23	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
11	14	17	dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota	dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
6	6	9	dom/localstorage/ActorsParent.cpp:<Unknown 1013>	dom/localstorage/ActorsParent.cpp#1013:NS_ERROR_FILE_TARGET_DOES_NOT_EXIST
5	8	8	dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize	dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
3	4	4	dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize	dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
3	3	6	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
3	4	11	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
2	9	9	dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize	dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
2	2	2	dom/quota/ActorsParent.cpp:<Unknown 4121>	dom/quota/ActorsParent.cpp#4121:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
2	2	2	dom/quota/ActorsParent.cpp:<Unknown 4137>	dom/quota/ActorsParent.cpp#4137:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
2	2	15	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
1	1	1	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND
1	1	1	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4547:NS_ERROR_FILE_NOT_FOUND
1	1	1	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None
1	1	2	dom/quota/ActorsParent.cpp:<Unknown 439>	dom/quota/ActorsParent.cpp#439:NS_ERROR_STORAGE_IOERR
1	1	1	dom/quota/ActorsParent.cpp:<Unknown 2582>	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4555:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4628:NS_ERROR_FILE_NOT_FOUND
1	1	1	dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet	dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND <- dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_ACCESS_DENIED <- dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_ACCESS_DENIED
1	1	1	dom/quota/ActorsParent.cpp:GetBinaryInputStream	dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4547:NS_ERROR_FILE_NOT_FOUND
1	1	1	dom/cache/FileUtils.cpp:RemoveNsIFileRecursively	dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY
1	1	1	dom/cache/FileUtils.cpp:RemoveNsIFileRecursively	dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY
1	1	1	dom/cache/FileUtils.cpp:RemoveNsIFileRecursively	dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Updated

•

4 years ago

Attachment #9212557 - Attachment description: WIP: Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. → Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. r?#dom-storage-reviewers

Jens Stutte [:jstutte]

Assignee

Comment 8

•

4 years ago

We might want to add a rule, that treats upgrading of the severity within a stack as the start of a new stack. During propagation it is unlikely that we will transform a WARNING into an ERROR.

Pulsebot

Comment 9

•

4 years ago

Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9b0779f96ab6 Python scripts that fetch and analyze QM_TRY errors. r=dom-storage-reviewers,janv

Jens Stutte [:jstutte]

Assignee

Comment 10

•

4 years ago

•

Edited

This specific bug's goals can be considered achieved:

Directly query the telemetry API from python
Enrich the code location by hg revisions to have direct links into the code
Inspect severity and filter for propagated errors
Find a more stable signature for a code location (like the surrounding function's name) and group found failures
Find a better home than a bugzilla attachment for the scripts that do all this.

Missing from the original goals is only:

Group found failures in a specific bug automatically.

And there are additional caveats we want to examine.

We will file further bugs for those.

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Updated

•

4 years ago

No longer depends on: 1701966

Alexandru Michis [:malexandru]

Comment 11

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9b0779f96ab6

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox89: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 89 Branch

Jens Stutte [:jstutte]

Assignee

Updated

•

4 years ago

Comment 12

•

4 years ago

Attached file Bug 1700915: Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers — Details

Phabricator Automation

Updated

•

3 years ago

Attachment #9217967 - Attachment description: Bug 1700915: Sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers → Bug 1700915: Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers

Pulsebot

Comment 13

•

3 years ago

Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e0781058d15 Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r=dom-storage-reviewers,janv

Narcis Beleuzu [:NarcisB]

Comment 14

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8e0781058d15

Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. r?#dom-storage-reviewers 4 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1700915: Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers 4 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request		Details \| Review