Closed Bug 1715081 Opened 3 years ago Closed 7 months ago

Include the directory in leakcheck output

Categories

(Testing :: Mozbase, task, P3)

Default
task

Tracking

(firefox120 fixed)

RESOLVED FIXED
120 Branch
Tracking Status
firefox120 --- fixed

People

(Reporter: mccr8, Assigned: jmaher)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Leak checking runs when we shut down the browser, which makes blaming a specific test difficult. However, for many years now we've only run tests for a single directory at a time, so we should be able to include the name of the directory in the leakcheck output. This will improve leak classification.

See Also: → 1715084

in order to keep track of the directory name, we would need to output the manifest/directory as part of the error message.

given an error like:
10:40:01 INFO - TEST-UNEXPECTED-FAIL | leakcheck | tab 480 bytes leaked (LoadedScript, ScriptFetchOptions, nsAuthURLParser, nsStandardURL, nsStringBuffer)

we would need to consider something like:
10:40:01 INFO - TEST-UNEXPECTED-FAIL | dom/tests/mochitest.ini | tab 480 bytes leaked (LoadedScript, ScriptFetchOptions, nsAuthURLParser, nsStandardURL, nsStringBuffer)

or I would prefer to flip that and have:
10:40:01 INFO - TEST-UNEXPECTED-FAIL | tab 480 bytes leaked (LoadedScript, ScriptFetchOptions, nsAuthURLParser, nsStandardURL, nsStringBuffer) | dom/tests/mochitest/ini

The problem is the line can get very long and wouldn't be easy to visualize nor possible to fit in the bugzilla summary.

Ideally I would have this as a single tracking bug:
10:40:01 INFO - TEST-UNEXPECTED-FAIL | tab 480 bytes leaked (LoadedScript, ScriptFetchOptions, nsAuthURLParser, nsStandardURL, nsStringBuffer) | single tracking bug

  • then we can summarize via TH intermittent failure view or from ./mach test-info ... one could get a list of the manifests that were run when the leak occurred. Just like we do for crashes.

In order to make this bug actionable, I would need a way to shorten the bytes leaked portion of the string, maybe the top 2 items?

I was focusing on smaller leaks in ^. for larger leaks if we can identify on of APZEventState, nsGlobalWindowInner or nsGlobalWindowOuter, possibly we include just that?

the leak tooling would need to be adjusted for that.

For the smaller leaks, I feel like the class names are probably enough. Honestly, it would probably be better to not include the directory, though it would still be nice to have the directory information in the log right after the error. The reason is that small leaks are fairly distinct by their leak signature: if you leak LoadedScript, ScriptFetchOptions, nsAuthURLParser, nsStandardURL, nsStringBuffer that's probably a bug with some script runnable.

On the other hand, window leaks (those that include nsGlobalWindowInner or nsGlobalWindowOuter) all basically look the same so we really need the extra information of the directory. I think we've had it happen more than once that there's some specific high frequency leak that I work on fixing, then another new high frequency leak lands around the same time as my fix for the first leak, and it is hard to tell that the specific leak changed without digging into the logs.

So maybe we need some kind of hybrid approach where leaks that don't match a specific list of classes (nsGlobalWindowInner, nsGlobalWindowOuter and BackstagePass, plus a few more maybe) are handled the way they are now, but ones that do only pick a single "most important" class plus the directory. I think that for these, we don't need to bother with the bytes, as they don't mean a lot for big leaks, and hopefully the directory is a much better distinguishing factor. So something like:
TEST-UNEXPECTED-FAIL | leakcheck | tab nsGlobalWindowInner leaked | dom/tests/mochitest/

WPTs do need to know the number of bytes leaked for leak-threshold, but IIRC that's a separate system that reads the logs directly so that should be okay.

I'd like to keep the leakcheck in there if possible because we also have another mechanism in Mochitest BC for window leaks and it is confusing enough keeping them distinct already.

thanks for the feedback here; so it seems like the most realistic end state (or at least 1st step) is to build a list of large window leaks that we special case handle.

from looking at the last 7 weeks of annotated failures (hint: type leakcheck into the summary filter text box), I see at the front of the list and >10K:
AbstractWatcher
APZEventState
BackstagePass
AsyncFreeSnowWhite
CondVar

I don't see reference to:
nsGlobalWindowInner
nsGlobalWindowOuter

Would we need to treat the processes default, tab, gpu, etc. differently?

AbstractWatcher, APZEventState, BackstagePass, AsyncFreeSnowWhite, CondVar are all basically noise that show up because they are near the start of the alphabet. In the DOM representation of a web page, nsGlobalWindowInner is kind of like the nexus through which all ownership in a webpage passes: all things in the webpage keep it alive, and it keeps them all alive.

I went through a large number of individual logs, and they almost all had nsGlobalWindowInner, nsGlobalWindowOuter, Document, BrowsingContext. Bug 1773318 didn't have nsGlobalWindowInner, but it did have BrowsingContext and nsDocShell.

bugs 1842630, 1835825, and 1791653 didn't have nsGlobalWindowInner and didn't have any other other main thread DOM objects. They do all have BackstagePass, but I'm not sure how illuminating that is. For bug 1842630, it kind of feels like WorkerDebuggerManager might be the central object but that's just a guess. For bug 1835825, I'm not sure, as there's a lot of random junk in there. Bug 1791653 also has WorkerDebuggerManager, so maybe that's similar to bug 1842630. Maybe BackstagePass is fine as a fallback for these cases.

We could potentially have leaks for worker globals, but I'm not exactly sure how those would show up, so maybe we should not worry about those for now.

So maybe we could have a priority list like nsGlobalWindowInner, nsGlobalWindowOuter, Document, nsDocShell, BrowsingContext, BackstagePass. In a leak, it would only show the highest priority one that was present.

Would we need to treat the processes default, tab, gpu, etc. differently?

No, that shouldn't be necessary.

thanks! I guess in the end we need to file bugs and present information that will make it possible to debug and fix the leak. I can work on modifying the leak output to be more like:
firstFound in [nsGlobalWindowInner, nsGlobalWindowOuter, Document, nsDocShell, BrowsingContext, BackstagePass]

TEST-UNEXPECTED-FAIL | leakcheck | tab <firstFound> leaked | dom/tests/mochitest/

Given ^, would we want separate bugs for tab and default ?


If this is output, we can then choose at the treeherder layer to recognize this pattern and go for a single tracking bug which would allow all related messages except the last item (dir = dom/tests/mochitest/), instead that will not be the bug title but relatively easy to discover via mach and inside of treeherder.

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #6)

Given ^, would we want separate bugs for tab and default ?

I'm inclined to keep them separate, but I don't know how much it might matter in practice.

If this is output, we can then choose at the treeherder layer to recognize this pattern and go for a single tracking bug which would allow all related messages except the last item (dir = dom/tests/mochitest/), instead that will not be the bug title but relatively easy to discover via mach and inside of treeherder.

Well, to me the entire point of putting the test directory separate is to have different bugs to track different leaks. I guess we could end up with way too many bugs depending on how many directories there actually are.

as discussed on matrix, here is a job showing this working:
https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&tier=1%2C2%2C3&revision=539d2cedf8d60e3a71b2794886b1e75374bcb2ce&selectedTaskRun=UkPCuInhRT2EIlhrN5yMbw.0

we decided to make the failure line look like:
TEST-UNEXPECTED-FAIL | leakcheck large nsGlobalWindowInner | browser/components/contextualidentity/test/browser/browser.ini

Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/72bb9164219d
Reformat leakcheck for large leaks to include directory. r=gbrown
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 120 Branch
Regressions: 1855861
Component: General → Mozbase
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: