Closed Bug 1352355 Opened 3 years ago Closed 10 months ago

Enable leak checking for web-platform-tests

Categories

(Testing :: web-platform-tests, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox65 fixed)

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: jgraham, Assigned: jgraham)

References

(Depends on 3 open bugs)

Details

(Whiteboard: [MemShrink:P1][wptsync upstream])

Attachments

(7 files)

Because this should have been done some time ago.

One problem is how to deal with tests from upstream that leak on import. We can't delay the import in order to fix them, so we might need to add a metadata mechansim for skipping leak checking during those tests, or similar.
Depends on: 1333114
Depends on: 1352351
No longer depends on: 1333114
Depends on: 1352357
Depends on: 1333114
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [P1:MemShrink]
Whiteboard: [P1:MemShrink] → [MemShrink:P1]
James, I'd be happy to have the service worker test expected fail if we can turn this on.
James, can we get this enabled soon even if it means disabling tests or marking them expected fail/leak?  We are depending more and more on WPT for sole coverage of features and leak testing is essential.
Flags: needinfo?(james)
I will pick this up once we have the new import process running. The central difficulty here is that if we import tests that start to leak it's hard to figure out what we need to do in terms of disabling tests or disabling leak checks until things are fixed; because the leak checks only run on shutdown the level of granularity is very high (basically nearest directory) so the obvious thing to do is start disabling leak checks for entire directories on import. But doing smaller more regular imports should help since it will be easier to isolate the problematic tests and easier to inform developers that something bad is happening.

Agreed on the priority; I will ensure I pick this up before the end of the year.
Flags: needinfo?(james)
James, is there still work to be done here? Are there any plans to continue work on the blocking bugs?
Flags: needinfo?(james)
There is a patch, but I dropped the ball on getting it landed, mostly I think because in my last try run it looked quite (but not entirely) green even with a deliberate leak. I'll give it another go now.
Flags: needinfo?(james)
So there are a few leaks, one in /media-source/ and one in referrer-policy/ and a few more "negative leaks caught" which I don't really know what to do with; at the moment I don't think there's anything in the patch that allows marking that as expected for any given directory. Maybe I need to add it (it also looks like some of the wpt expectation integration in that patch is outdated, so that needs to be fixed too).
Flags: needinfo?(erahm)
What kind of information are you looking for? The negative leak is interesting. I can try investigating it.
Flags: needinfo?(erahm) → needinfo?(continuation)
I guess I should actually needinfo you. I'm happy to help out with investigating leaks that this has turned up so that we can get it landed. Is what I meant. I'll file a bug for the negative leaks thing.
Flags: needinfo?(continuation) → needinfo?(james)
I should have been more specific, sorry. The negative leak thing was what I was looking for info on, so "it's interesting, I can try investigating is" is the perfect outcome, thanks :)
Flags: needinfo?(james)
Getting there slowly:

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=211716520&revision=e716aba03ed6cb03155797232d51df2ceb693abd

The only thing left that I don't know how to deal with is "Missing output line for total leaks"; that looks like it's associated with something bad happening, but I'm worried about the case that we import some tests that cause that kind of problem; there isn't a way in the current scheme to represent "this is expected to leak so much that it breaks the leak checker".
"Missing output line for total leaks" simply means that the leak checker did not output anything to the leak log file before the process terminated. The most common reason for this is that the process crashed. Intentional crashes in Mochitests are detected, and missing logs for those processes are not reported. I would not imagine that web platform tests involve crashes, but who knows.
I looked at one of the logs, and there's a shutdown hang:

17:40:53     INFO - PID 3168 | Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at z:/build/build/src/toolkit/components/terminator/nsTerminator.cpp:219
17:40:53     INFO - PID 3168 | #01: PRP_TryLock[Z:\task_1542211939\build\application\firefox\nss3.dll +0x12f285]
17:40:53     INFO - PID 3168 | #02: PR_MD_UNLOCK[Z:\task_1542211939\build\application\firefox\nss3.dll +0x11e34d]
17:40:53     INFO - PID 3168 | #03: o____lc_collate_cp_func[Z:\task_1542211939\build\application\firefox\ucrtbase.DLL +0x3e16f]
17:40:53     INFO - PID 3168 | #04: BaseThreadInitThunk[C:\windows\system32\kernel32.dll +0x53c45]
17:40:53     INFO - PID 3168 | #05: DllBlocklist_Initialize[Z:\task_1542211939\build\application\firefox\mozglue.dll +0x6564]
17:40:53     INFO - PID 3168 | #06: RtlInitializeExceptionChain[C:\windows\SYSTEM32\ntdll.dll +0x637f5]
17:40:53     INFO - PID 3168 | #07: RtlInitializeExceptionChain[C:\windows\SYSTEM32\ntdll.dll +0x637c8]

I have no idea what that stack means.
That usually means that some other thread is stuck doing something, which in turn makes the main thread wait, and the main thread failed to notify nsTerminator that it's still alive.  Can you see what all the other threads are doing?
Unfortunately that seems to be the only crash information. I know the full crash dump thing does include information about other threads, but I don't see the output for it.
Err, minidump-stackwalk is what I mean by "full crash dump thing" in case that was no clear.
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=a507e115e7ac8b3f5166b5381588b7e6c1aeedb8 looks more promising; I think we are only getting the "missing the output line" error when a test unexpectedly fails; I will check the harness is doing the right thing in those cases.
Moves mozleak to use structured logging. The logger gets two new
actions, mozleak_object to indicate the name of an object that leaked
in a specific process and mozleak_total to indicate the total number
of bytes leaked in a process.

The output from the TBPL formatter is
expected to remain near-identical to the previous output from the
logger, so there shouldn't be any effect on the ability to fail jobs
if there are leaks.

Additional features required by web-platform-tests are also added
here; the leak thresholds are passed to the logger for mozleak_total
and a list of any objects allowed to leak are passed for
mozleak_object, so that a log consumer may decide whether a leak is
unexpected. In addition, the scope attribute is used to specify the
set of tests (or other tasks) running at the time of the leak, which
may be used to associate a leak with a specific set of files.

MozReview-Commit-ID: 19FsMxVQExH

Depends on D12408
This adds two new properties to wpt metadata files:

mozleak-allowed - This is a list of the form [process-name:object
name], which indicates objects that may be leaked in that specific
process. Automatic updates that find a leak may add that object to
this list.

mozleak-threshold - This is a list (but conceptually a map) of
[process-name: threshold bytes], indicating a threshold below which
leaks will not cause a test failure. This number is updated by setting
it to the observed value for a process.

MozReview-Commit-ID: KA1oPl837a8

Depends on D12410
Depends on D12411
MozReview-Commit-ID: 7JqVXVibiwy

Depends on D12412
Depends on D12413
The settngs() method is called to get a list of session-level settings
that should be applied when running a test. If those settings differ
from the ones applied to the previous test then we have to restart the
browser. But if we set the session on the class in the settings method
then at shutdown time we'll have incorrect settings. This is important
for things like leak checking where the values affect the shutdown
process. So instead ensure that we only set the settings on the class
during startup.

Depends on D12414
Attachment #9026338 - Attachment description: Bug 1352355 - Convert mozleak to structured logging, , mccr8 → Bug 1352355 - Convert mozleak to structured logging, ,
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/5441e0814fb9
Convert mozleak to structured logging, , r=ahal,mccr8
https://hg.mozilla.org/integration/autoland/rev/80a92104544f
Allow lists as default values in wpt expectation metadata files, r=ato
https://hg.mozilla.org/integration/autoland/rev/02e4dbfecbc4
Enable storing and updating mozleak metadata in wpt ini, r=ato
https://hg.mozilla.org/integration/autoland/rev/d40ebdfc91a0
Ensure leaks cause mozharness to fail, r=ahal
https://hg.mozilla.org/integration/autoland/rev/f3f4ea333cc4
Enable leak checks by default in wpt debug builds, r=ato
https://hg.mozilla.org/integration/autoland/rev/250022f4c01d
Add metadata to allow existing leaks, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/36c4c80ddc1a
Fix use of wpt settings in firefox Browser, r=ato
Whiteboard: [MemShrink:P1] → [MemShrink:P1][wptsync upstream error]
Assignee: nobody → james
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/14464 for changes under testing/web-platform/tests
Whiteboard: [MemShrink:P1][wptsync upstream error] → [MemShrink:P1][wptsync upstream]
Upstream PR was closed without merging
Can't merge web-platform-tests PR due to failing upstream checks:
Github PR https://github.com/web-platform-tests/wpt/pull/14464
* Taskcluster (pull_request) (https://tools.taskcluster.net/task-group-inspector/#/DfdlAqEwS2qEM_NPHr8CwQ)
Whiteboard: [MemShrink:P1][wptsync upstream] → [MemShrink:P1][wptsync upstream error]
Upstream PR was closed without merging
Whiteboard: [MemShrink:P1][wptsync upstream error] → [MemShrink:P1][wptsync upstream]
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d0b0692f99c
[wpt PR 14464] - [Gecko Bug 1352355] Enable storing and updating mozleak metadata in wpt ini, a=testonly
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca1061daf2ea
[wpt PR 14464] - [Gecko Bug 1352355] Enable storing and updating mozleak metadata in wpt ini, a=testonly
You need to log in before you can comment on or make changes to this bug.