Closed Bug 1700915 Opened 4 years ago Closed 4 years ago

Improve semi-automatic QM_TRY failure analysis

Categories

(Core :: Storage: Quota Manager, task, P2)

task

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox89 --- fixed

People

(Reporter: jstutte, Assigned: jstutte)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

After that we introduced a way to specifically report warnings around QM_TRY in bug 1686191, we can improve the python script currently attached to bug 1482662 regarding the following aspects:

  • Directly query the telemetry API from python
  • Enrich the code location by hg revisions to have direct links into the code
  • Inspect severity and filter for propagated errors
  • Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug
  • Find a better home than a bugzilla attachment for the script(s) that do all this.

Not all of this might be automated entirely, we are mostly looking for low hanging fruits here to ease the burden of error analysis.

Summary: Improve semi-automatic failure analysis → Improve semi-automatic QM_TRY failure analysis
Depends on: 1686191

The example output looks amazing :)

Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug

I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).

(In reply to Simon Giesecke [:sg] [he/him] from comment #2)

The example output looks amazing :)

Thanks!

Find a more stable signature for a code location (like the surrounding function's name) and group found failures in a specific bug

I think having only the surrounding function name would be too coarse. The surrounding function name + the relative line offset within that function might be used to group failures, though. But when a function changes, this might lead to false merges. To avoid that, one could inspect the actual source code to check if it matches (at some point, we considered submitting the expression to telemetry as well, which would still be possible, but might increase the event size significantly. Not sure if that's a problem, this should probably be checked with telemetry folks).

Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)

(In reply to Jens Stutte [:jstutte] from comment #3)

(In reply to Simon Giesecke [:sg] [he/him] from comment #2)

Well, we will see what we can get from rust-code-analysis. And if a function is too long to serve as grouping, the right action might be to shorten it...? ;-)

Splitting up long functions is surely a good idea, independent of this. But that's a major effort, and not sure if it's feasible to address this issue that way. In particular, where such ambiguity is caused by nested QM_TRY calls, it might not be desirable to really split that up. But this is hard to judge beforehand :)

Assignee: nobody → jstutte
Severity: -- → N/A
Priority: -- → P2

My local WIP script now is able also to find function names as anchors:

Error stacks:

Clients Sessions Hits Anchor Stack
1244 1418 1691 dom/localstorage/ActorsParent.cpp:LoadUsageFile dom/localstorage/ActorsParent.cpp#1028:None
413 472 593 dom/localstorage/ActorsParent.cpp:LoadUsageFile dom/localstorage/ActorsParent.cpp#1013:NS_ERROR_FILE_NOT_FOUND
266 345 5745 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#560:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
248 466 468 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4121:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
113 145 159 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4137:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
23 28 1225 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4555:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4628:NS_ERROR_FILE_NOT_FOUND
19 45 45 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
16 21 21 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
16 20 3421 dom/cache/FileUtilsImpl.h:BodyTraverseFiles dom/cache/FileUtilsImpl.h#74:None
14 21 460 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
14 24 288 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
10 14 1631 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9625:NS_ERROR_FILE_NOT_FOUND

Now we should start to think about the workflow we want to achieve with this.

See Also: → 1701966
Depends on: 1701966
See Also: 1701966
Attachment #9212557 - Attachment description: WIP: Bug 1700915: WIP Initial set of python scripts. → WIP: Bug 1700915: Initial set of python scripts.
Attachment #9212557 - Attachment description: WIP: Bug 1700915: Initial set of python scripts. → WIP: Bug 1700915: Python scripts that fetch and analyze QM_TRY errors.

Found 17377 rows of data.
No revision for build.id 20210329100226
No revision for build.id 20210329172617
Found 29 error stacks.
Found 26 warning stacks.
Found 0 info stacks.
Found 11 aborted stacks.

Error stacks:

Clients Sessions Hits Anchor Stack
54 57 81 dom/localstorage/ActorsParent.cpp:LoadUsageFile dom/localstorage/ActorsParent.cpp#1033:None
45 72 1587 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
45 59 60 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
40 96 96 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
38 84 84 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4113:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
33 74 1639 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
21 31 567 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND
15 17 20 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
14 19 23 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
11 14 17 dom/quota/ActorsParent.cpp:QuotaManager::LoadQuota dom/quota/ActorsParent.cpp#4129:None <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_FAILURE
6 6 9 dom/localstorage/ActorsParent.cpp:<Unknown 1013> dom/localstorage/ActorsParent.cpp#1013:NS_ERROR_FILE_TARGET_DOES_NOT_EXIST
5 8 8 dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
3 4 4 dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
3 3 6 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
3 4 11 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
2 9 9 dom/quota/ActorsParent.cpp:ClientUsageArray::Deserialize dom/quota/ActorsParent.cpp#7081:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4093:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4156:NS_ERROR_ILLEGAL_VALUE <- dom/quota/ActorsParent.cpp#4194:NS_ERROR_ILLEGAL_VALUE
2 2 2 dom/quota/ActorsParent.cpp:<Unknown 4121> dom/quota/ActorsParent.cpp#4121:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
2 2 2 dom/quota/ActorsParent.cpp:<Unknown 4137> dom/quota/ActorsParent.cpp#4137:None <- dom/quota/ActorsParent.cpp#4164:NS_ERROR_FAILURE <- dom/quota/ActorsParent.cpp#4202:NS_ERROR_FAILURE
2 2 15 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND
1 1 1 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND
1 1 1 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4547:NS_ERROR_FILE_NOT_FOUND
1 1 1 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#9610:NS_ERROR_FILE_NOT_FOUND <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None <- dom/indexedDB/ActorsParent.cpp#12620:None
1 1 2 dom/quota/ActorsParent.cpp:<Unknown 439> dom/quota/ActorsParent.cpp#439:NS_ERROR_STORAGE_IOERR
1 1 1 dom/quota/ActorsParent.cpp:<Unknown 2582> dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4555:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4628:NS_ERROR_FILE_NOT_FOUND
1 1 1 dom/cache/FileUtils.cpp:LockedDirectoryPaddingGet dom/cache/FileUtils.cpp#564:NS_ERROR_FILE_NOT_FOUND <- dom/cache/QuotaClient.cpp#394:NS_ERROR_FILE_NOT_FOUND <- dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_ACCESS_DENIED <- dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_ACCESS_DENIED
1 1 1 dom/quota/ActorsParent.cpp:GetBinaryInputStream dom/quota/ActorsParent.cpp#2582:NS_ERROR_FILE_NOT_FOUND <- dom/quota/ActorsParent.cpp#4547:NS_ERROR_FILE_NOT_FOUND
1 1 1 dom/cache/FileUtils.cpp:RemoveNsIFileRecursively dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY
1 1 1 dom/cache/FileUtils.cpp:RemoveNsIFileRecursively dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY
1 1 1 dom/cache/FileUtils.cpp:RemoveNsIFileRecursively dom/cache/FileUtils.cpp#490:NS_ERROR_FILE_DIR_NOT_EMPTY
See Also: → 1702411
Attachment #9212557 - Attachment description: WIP: Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. → Bug 1700915: Python scripts that fetch and analyze QM_TRY errors. r?#dom-storage-reviewers

We might want to add a rule, that treats upgrading of the severity within a stack as the start of a new stack. During propagation it is unlikely that we will transform a WARNING into an ERROR.

Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9b0779f96ab6 Python scripts that fetch and analyze QM_TRY errors. r=dom-storage-reviewers,janv

This specific bug's goals can be considered achieved:

  • Directly query the telemetry API from python
  • Enrich the code location by hg revisions to have direct links into the code
  • Inspect severity and filter for propagated errors
  • Find a more stable signature for a code location (like the surrounding function's name) and group found failures
  • Find a better home than a bugzilla attachment for the scripts that do all this.

Missing from the original goals is only:

  • Group found failures in a specific bug automatically.

And there are additional caveats we want to examine.

We will file further bugs for those.

See Also: → 1703840
No longer depends on: 1701966
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
See Also: → 1706928
Attachment #9217967 - Attachment description: Bug 1700915: Sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers → Bug 1700915: Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r?#dom-storage-reviewers
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e0781058d15 Change the max_delta heuristics to look at average time, sort stacks for hit count rather than clients and add aborted stacks to output. r=dom-storage-reviewers,janv
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: