Closed Bug 1630974 Opened 4 years ago Closed 3 years ago

Intermittent leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)

Categories

(Core :: Networking, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [necko-triaged][stockwell unknown])

Filed by: ccoroiu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=298077741&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/V3vEueWbT5GWgIIbHAvf-g/runs/0/artifacts/public/logs/live_backing.log


[task 2020-04-17T11:46:02.305Z] 11:46:02 INFO - TEST-ERROR | /infrastructure/testdriver/actions/multiTouchPointsReleaseSecondPoint.html | took 892ms
[task 2020-04-17T11:46:02.325Z] 11:46:02 INFO - PID 2332 | 1587123962318 Marionette INFO Stopped listening on port 51763
[task 2020-04-17T11:46:02.373Z] 11:46:02 INFO - .
[task 2020-04-17T11:46:02.373Z] 11:46:02 INFO - TEST-OK | /offscreen-canvas/the-canvas-state/2d.state.saverestore.underflow.worker.html | took 866ms
[task 2020-04-17T11:46:02.416Z] 11:46:02 INFO - PID 3784 | 1587123962396 Marionette INFO Stopped listening on port 51556
[task 2020-04-17T11:46:02.612Z] 11:46:02 INFO - PID 2332 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-17T11:46:02.613Z] 11:46:02 INFO - PID 2332 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-17T11:46:02.620Z] 11:46:02 INFO - PID 2332 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-17T11:46:02.620Z] 11:46:02 INFO - PID 2332 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-17T11:46:02.620Z] 11:46:02 INFO - PID 2332 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-17T11:46:02.744Z] 11:46:02 INFO - PID 2332 | [2020-04-17T11:46:02Z WARN xulstore::persist] tried to remove key that isn't in the store
[task 2020-04-17T11:46:02.744Z] 11:46:02 INFO - PID 2332 | [2020-04-17T11:46:02Z WARN xulstore::persist] tried to remove key that isn't in the store
[task 2020-04-17T11:46:02.826Z] 11:46:02 INFO - PID 3784 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost

[task 2020-04-17T11:46:04.732Z] 11:46:04 INFO - TEST-PASS | leakcheck | tab no leaks detected!
[task 2020-04-17T11:46:04.735Z] 11:46:04 INFO - leakcheck | Processing leak log file c:\users\task_1587122719\appdata\local\temp\tmpgibqhy\runtests_leaks_3664_tab_pid5996.log
[task 2020-04-17T11:46:04.735Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.735Z] 11:46:04 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5996
[task 2020-04-17T11:46:04.735Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.736Z] 11:46:04 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2020-04-17T11:46:04.737Z] 11:46:04 INFO - | | Per-Inst Leaked| Total Rem|
[task 2020-04-17T11:46:04.737Z] 11:46:04 INFO - 0 |TOTAL | 26 0| 90420 0|
[task 2020-04-17T11:46:04.744Z] 11:46:04 INFO - 1518 |nsAStreamCopier | 20 20| 414 1|
[task 2020-04-17T11:46:04.751Z] 11:46:04 INFO - 1819 |nsPipe | 144 144| 773 1|
[task 2020-04-17T11:46:04.751Z] 11:46:04 INFO - 1820 |nsPipeInputStream | 88 88| 773 1|
[task 2020-04-17T11:46:04.754Z] 11:46:04 INFO - 1871 |nsSocketTransport | 616 616| 81 1|
[task 2020-04-17T11:46:04.754Z] 11:46:04 INFO - 1872 |nsSocketTransportService | 248 248| 1 1|
[task 2020-04-17T11:46:04.755Z] 11:46:04 INFO - 1883 |nsStringBuffer | 8 8| 262311 1|
[task 2020-04-17T11:46:04.756Z] 11:46:04 INFO - 1921 |nsTArray_base | 4 8| 9647813 2|
[task 2020-04-17T11:46:04.757Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.758Z] 11:46:04 INFO - nsTraceRefcnt::DumpStatistics: 921 entries
[task 2020-04-17T11:46:04.759Z] 11:46:04 INFO - TEST-PASS | leakcheck | tab no leaks detected!
[task 2020-04-17T11:46:04.759Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.760Z] 11:46:04 INFO - nsTraceRefcnt::DumpStatistics: 2024 entries
[task 2020-04-17T11:46:04.761Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 3 Mutex
[task 2020-04-17T11:46:04.761Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 PollableEvent
[task 2020-04-17T11:46:04.762Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 ReentrantMonitor
[task 2020-04-17T11:46:04.763Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsAStreamCopier
[task 2020-04-17T11:46:04.763Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsPipe
[task 2020-04-17T11:46:04.764Z] 11:46:04 INFO - Closing logging queue
[task 2020-04-17T11:46:04.764Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsPipeInputStream
[task 2020-04-17T11:46:04.765Z] 11:46:04 INFO - queue closed
[task 2020-04-17T11:46:04.765Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsSocketTransport
[task 2020-04-17T11:46:04.765Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsSocketTransportService
[task 2020-04-17T11:46:04.766Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 1 nsStringBuffer
[task 2020-04-17T11:46:04.767Z] 11:46:04 INFO - TEST-INFO | leakcheck | default leaked 2 nsTArray_base
[task 2020-04-17T11:46:04.767Z] 11:46:04 INFO - TEST-UNEXPECTED-FAIL | leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)
[task 2020-04-17T11:46:04.767Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.770Z] 11:46:04 INFO - leakcheck | Processing leak log file c:\users\task_1587122719\appdata\local\temp\tmpxwwryh\runtests_leaks_3664_tab_pid136.log
[task 2020-04-17T11:46:04.770Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.770Z] 11:46:04 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 136
[task 2020-04-17T11:46:04.770Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.770Z] 11:46:04 INFO - |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2020-04-17T11:46:04.771Z] 11:46:04 INFO - | | Per-Inst Leaked| Total Rem|
[task 2020-04-17T11:46:04.771Z] 11:46:04 INFO - 0 |TOTAL | 29 0| 44511 0|
[task 2020-04-17T11:46:04.778Z] 11:46:04 INFO -
[task 2020-04-17T11:46:04.778Z] 11:46:04 INFO - nsTraceRefcnt::DumpStatistics: 978 entries
[task 2020-04-17T11:46:04.778Z] 11:46:04 INFO - TEST-PASS | leakcheck | tab no leaks detected!
[task 2020-04-17T11:46:04.778Z] 11:46:04 INFO - leakcheck | Processing leak log file c:\users\task_1587122719\appdata\local\temp\tmpxwwryh\runtests_leaks_3664_tab_pid1536.log
[task 2020-04-17T11:46:04.778Z] 11:46:04 INFO -

This looks like stream/pipe/transport service stuff leaking.

Component: Canvas: 2D → Networking
Whiteboard: [necko-triaged]

So this problem is now a perma-fail on mac with my changes in bug 1595994 that perform all video or audio decoding out of process.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=317858635&repo=try&lineNumber=3801

Who can look into this? Jim ,could you ping whomever is in charge there, all the people I knew are no longer at mozilla.

Flags: needinfo?(jmathies)

(In reply to Jean-Yves Avenard [:jya] from comment #20)

So this problem is now a perma-fail on mac with my changes in bug 1595994 that perform all video or audio decoding out of process.

Can you reproduce the perma-fail locally? Could you please file a new bug blocking this one with a single rollup patch and the hg changeset the patch is made against to this bug so that somebody who is investigating it can easily build it locally or push to try? A separate bug would be good because the issue you are seeing may or may not be the same as all of the failures in this bug, because the intermittent failures are in a number of directories. Thanks.

Flags: needinfo?(jyavenard)
Has Regression Range: --- → yes
Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged][stockwell needswork:owner]

The heap things that patch adds look to be either booleans or strings, so there's nothing obvious from that angle that could cause a leak.

I did a try push with logging enabled for string buffers, so hopefully that will give us some more information about what is going wrong: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f425b2bec2091d6e1e1423f43094cdef449c2a3a

The leaked string is "127.0.0.1", which probably doesn't help very much.

The allocation stack of the leaked string is probably also not very helpful:
#00: nsTSubstring<char>::StartBulkWriteImpl(unsigned int, unsigned int, bool, unsigned int, unsigned int, unsigned int) [xpcom/string/nsTSubstring.cpp:204]
#01: nsTSubstring<char>::Assign(char const*, unsigned int, std::nothrow_t const&) [xpcom/string/nsTSubstring.cpp:408]
#02: nsTSubstring<char>::Assign(char const*, unsigned int) [xpcom/string/nsTSubstring.cpp:380]
#03: mozilla::net::nsSocketTransport::InitWithConnectedSocket(PRFileDesc*, mozilla::net::NetAddr const*) [netwerk/base/nsSocketTransport2.cpp:899]
#04: mozilla::net::nsServerSocket::CreateClientTransport(PRFileDesc*, mozilla::net::NetAddr const&) [netwerk/base/nsServerSocket.cpp:148]
#05: mozilla::net::nsServerSocket::OnSocketReady(PRFileDesc*, short) [netwerk/base/nsServerSocket.cpp:190]
#06: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [netwerk/base/nsSocketTransportService2.cpp:1375]
#07: mozilla::net::nsSocketTransportService::Run() [netwerk/base/nsSocketTransportService2.cpp:1157]
#08: {virtual override thunk({offset(-32)}, mozilla::net::nsSocketTransportService::Run())}
#09: nsThread::ProcessNextEvent(bool, bool
) [xpcom/threads/nsThread.cpp:1202]

Jens, looks like changes in networking trigger this. Is there anyone on that team that can take a look?

Flags: needinfo?(jyavenard)
Flags: needinfo?(jstutte)
Flags: needinfo?(jmathies)

jya apparently has some STR, so it would still be good for him to attach the information I requested in comment 21.

Flags: needinfo?(jyavenard)

So to recap: We have a string leak that is happening mostly on Mac OSX. It would help Kershaw indeed to have clear STR for this, this would speed up things a lot.

Flags: needinfo?(jstutte)

So with this stack of patches, I have always one intermittent on macOS 10.14 debug with this leak, it is no longer permafailing,

It's always on mac now and with the WPT tests.
last 2 try pushes:
https://treeherder.mozilla.org/logviewer.html#?job_id=318702788&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=318701213&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=318701173&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=318701163&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=318701208&repo=try

I haven't attempted to reproduce it locally.

Flags: needinfo?(jyavenard)

this isn't a perma fail on trunk and mostly if not fully is a problem on osx. I would say it fails 15-25% of the time on osx/debug.

Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged][stockwell needswork:owner]
Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged][stockwell needswork:owner]

(In reply to Andreea Pavel [:apavel] from comment #24)

Retriggers: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=OS%2CX%2C10.14%2Cdebug%2CWeb%2Cplatform%2Ctests%2Ctest-macosx1014-64%2Fdebug-web-platform-tests-reftest-e10s%2CWr4&tochange=4006c1ef7d258c1c32e3546104aa508a44bcea25&group_state=expanded&fromchange=71aa02ea22b62481ff6cfb8755042a8c37562238&selectedTaskRun=U4WjOv6kRziYvIdCYkz6RA.0

Kershaw can this be from bug 1652677? Please take a look.

Sorry for the late reply.
The feature implemented in bug 1652677 is behind a pref and not enabled, so it should not be the culprit.

FWIW, I'll also take a look at this.

Flags: needinfo?(kershaw)

Removing [stockwell disable-recommended] tag based on Comment 47 and since the last failure is from 20th of October.

Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged]
Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged]

This has been failing with a high frequency since October 28th: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2020-10-23&endday=2020-10-30&tree=trunk&bug=1630974

Kershaw, could you take a look?

Summary: Intermittent leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...) → High frequency leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)
Flags: needinfo?(kershaw)

So, this is really weird to me.
I think I found out what cause this leak.
See the two try pushes below. The only difference is that mEchConfig is removed.

Ok: https://treeherder.mozilla.org/jobs?repo=try&revision=4ac35b4286e8c5f1472e27f3593b0fa3a6eccbff
Ng: https://treeherder.mozilla.org/jobs?repo=try&revision=2c3ebcfe9ee941c9177110338b3ae6b8dd0e4fe1

I can't explain why we have this intermittent leak on macOS and why this is gone by removing an unused data member mEchConfig.
Simon, maybe you have an idea about this?

Flags: needinfo?(kershaw) → needinfo?(sgiesecke)

Without any undefined behaviour involved, I don't see an explanation for this. Are these tests run with ASan as well? Unfortunately, since this is a OS X only issue, we don't have any TSan reports for that.

Flags: needinfo?(sgiesecke)
Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged]
Whiteboard: [necko-triaged][stockwell disable-recommended] → [necko-triaged][stockwell unknown]

The frequency for this dropped at the end of October for no apparent reason.

Summary: High frequency leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...) → Intermittent leakcheck | default 1328 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)
Severity: normal → S4
Priority: -- → P5
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.