Closed Bug 1535922 Opened 4 years ago Closed 2 years ago

Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default missing output line for total leaks!

Categories

(Testing :: web-platform-tests, defect, P3)

defect

Tracking

(firefox-esr68 unaffected, firefox-esr78 fixed, firefox77 wontfix, firefox78 wontfix, firefox79 fixed, firefox80 fixed)

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- fixed
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fixed
firefox80 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jgraham)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell unknown], [wptsync upstream])

Attachments

(1 file)

#[markdown(off)]
Filed by: aciure [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=234411867&repo=mozilla-central

https://queue.taskcluster.net/v1/task/eIzRdCLhTSigG5MYs6kyZA/runs/0/artifacts/public/logs/live_backing.log

11:53:22 INFO - PROCESS LEAKS c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828.log
11:53:22 INFO - leakcheck | Processing log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828.log for scope /css/CSS2/text
11:53:22 INFO - TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
11:53:22 INFO - TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
11:53:22 INFO - TEST-INFO | leakcheck | tab process: leak threshold set at 10000 bytes
11:53:22 INFO - TEST-INFO | leakcheck | gmplugin process: leak threshold set at 20000 bytes
11:53:22 INFO - TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes
11:53:22 INFO - TEST-INFO | leakcheck | rdd process: leak threshold set at 400 bytes
11:53:22 INFO - TEST-INFO | leakcheck | vr process: leak threshold set at 0 bytes
11:53:22 INFO - TEST-INFO | leakcheck | socket process: leak threshold set at 0 bytes
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828.log
11:53:22 INFO - TEST-UNEXPECTED-FAIL | leakcheck | default missing output line for total leaks!
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828_gpu_pid5000.log
11:53:22 INFO - TEST-UNEXPECTED-FAIL | leakcheck | gpu missing output line for total leaks!
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828_tab_pid1448.log
11:53:22 INFO -
11:53:22 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 1448
11:53:22 INFO -
11:53:22 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
11:53:22 INFO - | | Per-Inst Leaked| Total Rem|
11:53:22 INFO - 0 |TOTAL | 48 0| 43334 0|
11:53:22 INFO -
11:53:22 INFO - nsTraceRefcnt::DumpStatistics: 828 entries
11:53:22 INFO - TEST-PASS | leakcheck | tab no leaks detected!
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828_tab_pid6076.log
11:53:22 INFO - TEST-INFO | leakcheck | tab ignoring missing output line for total leaks
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828_tab_pid7260.log
11:53:22 INFO - TEST-INFO | leakcheck | tab ignoring missing output line for total leaks
11:53:22 INFO - leakcheck | Processing leak log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828_tab_pid9580.log
11:53:22 INFO - TEST-INFO | leakcheck | tab ignoring missing output line for total leaks
11:53:22 INFO - Closing logging queue
11:53:22 INFO - queue closed
11:53:22 INFO - PROCESS LEAKS c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828.log
11:53:22 INFO - leakcheck | Processing log file c:\users\task_1552822812\appdata\local\temp\tmprkpytw.mozrunner\runtests_leaks_5828.log for scope /css/CSS2/text

Most recent occurrences here are miss-classifications. Removing the stockwell tag.

There are 57 miss-classified failures starting within 19th of April: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-19&endday=2019-04-21&tree=trunk&bug=1535922

Whiteboard: [stockwell disable-recommended]

Removing the stockwell tag, per comment 7.

Whiteboard: [stockwell disable-recommended]

All of these ^^ are misclassifications.

Whiteboard: [stockwell disable-recommended]

Ignore the above, most of those are misclassifications.

Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

Most of them are misclassifications.

Whiteboard: [stockwell disable-recommended]

A majority of the above mentioned failures are misclassifications.

Hi Jens, for the failures that are classified correctly, can you assign someone to take a look at this?

Flags: needinfo?(jstutte)

Hi Andreea, probably I miss something, but how can I see which are "the failures that are classified correctly" ? It is difficult for me to understand, if the priority should really change (currently P3, thus we will not get to it soon).

Flags: needinfo?(jstutte) → needinfo?(apavel)

Well just by looking in each failure logs.

I can help with that, though. I'll check the recent failures and correct the ones that are not this bug and ni you after I'm done so you can check the failure rate.

Flags: needinfo?(apavel)

Probably I still miss the point, but the results in your last comment seem not at all related to ServiceWorkers, also to team members of DOM: Workers & Storage. We just see random:

TEST-UNEXPECTED-FAIL | leakcheck | default missing output line for total leaks!

on arbitrary tests. Is there a maintainer for the leakcheck tool that can take a look?

Flags: needinfo?(jstutte)

Andrew, can the leakcheck message be changed similar to the 'Last test finished' one which mentions the failure line?

Flags: needinfo?(continuation)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #50)

Andrew, can the leakcheck message be changed similar to the 'Last test finished' one which mentions the failure line?

I'm not familiar with what "last test finished" is, but leakcheck runs all at once at the very end of the test run, so I'm not sure how that might work.

I think the real issue with this missing log error message in leakcheck, in terms of creating a big headache for sherriffs, is that it will happen any time we have a crash or timeout, but we only really want it to be an error when there's no other explanation for the missing log. The "default missing output line for total leaks" is nice and consistent, compared to hangs and crashes with can manifest in all sorts of weird ways, so these issues get starred as a leakcheck failure more consistently.

For instance, in this job there are errors like:
PID 8720 | FATAL ERROR: AsyncShutdown timeout in xpcom-will-shutdown Conditions: [{"name":"MediaRecorder::Session 1f53d1fdc80 shutdown","state":"(none)","filename":"z:/build/build/src/dom/media/MediaRecorder.cpp","lineNumber":1036,"stack":"MediaRecorder::Session: shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive proces PID 8720 | [Child 5228, Main Thread] ###!!! ABORT: file z:/build/build/src/dom/media/MediaRecorder.cpp, line 1036

...which look like they don't have any suggestions for starring. Then we get this, which shows up with a suggestion for starring on the next line:
TEST-UNEXPECTED-FAIL | leakcheck | default missing output line for total leaks!

But clearly the shutdown timeout and crash is the reason there is no leak log. I don't know of any way to coordinate failures like this, so that leakcheck could understand that there's something else in the log that explains the unexpected crash and not produce an error.

Just to provide a contrary example, in this job there's tons of these leakcheck failures without any other crashes, so it is good that the error showed up. The basic failure mode this test is trying to detect is when processes silently exit without producing a leak log, which means we won't actually detect any leaks, if present.

I'll try to think of something I could do to improve the situation here. I'll leave the needinfo up for now.

See Also: → 1560096, 1591678, 1371544, 1239258
Depends on: 1595629

The failure rate for this seems to have dropped off.

Flags: needinfo?(continuation)

I just went through the logs and these are all missclassifications.

Flags: needinfo?(jstutte)
Flags: needinfo?(jstutte)

Recent spike here also seems to be from bug 1626089, James can you take a look?

Flags: needinfo?(james)
Regressed by: 1626089
See Also: → 1628960

Hi Andrew, can we associate this for now with some more general component? To us it just generates noise this way we need to "actively ignore" ;-) Thank you!

Flags: needinfo?(continuation)

Yeah, I'm not sure why this ended up in Service Workers.

Looking at the most recent failures, they are all happening in web platform tests, so I'll move it there. From looking through a few logs, I see things like TEST-KNOWN-INTERMITTENT-CRASH and a similar one for timeouts. I expect that WPT has some kind of automatic whitelisting for tests that crash.

The purpose of this check is to make sure that we don't fail to produce a leak log unexpectedly. For instance, this could happen if somebody runs exit(0) somewhere in shutdown before we write out the leak log. However, in this case we do know why there is no leak log: there was a crash or timeout.

I think what needs to happen here is WPT needs to do something like tell leakcheck to ignore missing logs if there was any kind of known crash or timeout while running the directory. (It would be nicer to have it do that on a per-test basis, but IIRC leakcheck has no knowledge of individual tests, and making that association would be tricky.)

Component: DOM: Service Workers → web-platform-tests
Flags: needinfo?(continuation)
Product: Core → Testing

In the case of unclean shutdown e.g. a crash we probably don't have leak logs to
process, so don't do that, or we end up with a useless extra error we can't
suppress

Assignee: nobody → james
Status: NEW → ASSIGNED

I pushed a patch that just doesn't try to run the leak check if we didn't shut down cleanly. idk if that's a good enough approach.

Flags: needinfo?(james)
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/08142a084f22
Don't try to process leaks if Firefox didn't shutdown cleanly, r=mccr8
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/24426 for changes under testing/web-platform/tests
Whiteboard: [stockwell unknown] → [stockwell unknown], [wptsync upstream]
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
Upstream PR merged by moz-wptsync-bot

The patch landed in nightly and beta is affected.
:jgraham, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(james)
Flags: needinfo?(james)
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.