remote/test/browser/ browser-chrome tests cause leakcheck to fail
Categories
(Remote Protocol :: Agent, defect, P3)
Tracking
(Not tracked)
People
(Reporter: ato, Unassigned)
Details
(Keywords: leave-open, memory-leak)
Attachments
(3 obsolete files)
After I enabled the remote agent in the default Nightly build, we
started seeing leaks in the browser-chrome tests as seen here:
I will submit a new patch to unhook the remote agent from the default
build, until we have had time to resolve this. Backing out the
original patchset would be detrimental to investigating this
effectively on try.
Reporter | Updated•6 years ago
Reporter | Comment 1•6 years ago
As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1546945,
enabling the remote agent in the default Firefox Nightly build
(https://bugzilla.mozilla.org/show_bug.cgi?id=1533831) caused a
leakcheck in M-bc to fail.
This patch cuts the remote agent out of the build, by no longer
implying --enable-cdp, until we have had time to fix the leak.
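For anyone who still wants the agent in a local build after this change,
opting back in through a mozconfig should keep working (a sketch; the only
assumption is the --enable-cdp flag mentioned above):

    # Local mozconfig: opt back in to the remote agent while it is no
    # longer implied by the default Nightly configuration.
    ac_add_options --enable-cdp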
Reporter | Updated•6 years ago
Reporter | Updated•6 years ago
Reporter | Comment 3•6 years ago
All browser-chrome jobs have been requested on debug for
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=004d050a8ec44987f44fd86c62097266b22b915e,
which should confirm that it was the remote agent that caused the leaks.
Because we don’t run all test jobs on every push, the working theory
is that the push I linked to in comment #0 was simply where the effect
of the remote agent M-bc jobs starting to leak first became visible.
Reporter | Comment 4•6 years ago
As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1546945,
there is a memory leak in the remote agent that causes it to
intermittently fail browser-chrome leakchecks.
It is possible the memory leak is related to holding on to an
XPConnect reference to the C++ object nsSocketTransportService, but
this has yet to be confirmed.
This patch disables the browser-chrome tests on debug builds,
where we run reference counting leakchecks, in order to get the
remote agent enabled in default Firefox Nightly builds.
Thanks-to: Alexandre Poirot <poirot.alex@gmail.com>
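The manifest change would look roughly like this (a sketch of a
remote/test/browser/browser.ini annotation; the exact wording in the landed
patch may differ):

    [DEFAULT]
    # Bug 1546945: the refcount leakcheck intermittently fails on debug.
    skip-if = debug

    [browser_cdp.js]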
Reporter | Comment 5•6 years ago
This patch negates the effect of 004d050a8ec4 (git 6dd9496ac0f9).
Reporter | Comment 6•6 years ago
OK, I talked to ochameau, and in order to make some meaningful
progress on enabling the remote agent in default Firefox Nightly
builds, we are going to disable the browser-chrome tests on debug
builds, where reference counting leakchecks are run. I’ve submitted
the necessary patches for doing that, as well as for turning the
remote agent back on.
This bug is still relevant and now tracks re-enabling the M-bc
tests in remote/test/browser/*.js by first plugging the leak.
ochameau and I did some wild-goose-chase debugging on IRC, and we
suspect that we’re holding on to an XPConnect reference to a C++
instance of nsSocketTransportService. There are only a handful of
places we use sockets, which include the WebSocket client
connections and the HTTPD.
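To illustrate the kind of reference we are worried about, here is a sketch
of how chrome JavaScript can end up keeping the XPConnect wrapper for
nsSocketTransportService alive; the variable name is made up and this is not
the actual remote agent code:

    // Module-level cache of the socket transport service (hypothetical).
    const transportService = Cc[
      "@mozilla.org/network/socket-transport-service;1"
    ].getService(Ci.nsISocketTransportService);

    // If nothing drops this reference when the WebSocket connection or the
    // HTTPD is torn down, the debug leakcheck reports the underlying C++
    // nsSocketTransportService object as still alive at shutdown.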
Reporter | Comment 7•6 years ago
I narrowed it down to remote/test/browser/browser_cdp.js. On my
workstation it is failing the leakcheck consistently, which is
actually good news for debugging it.
This means I will revisit the attached patches and change them to
only ignore browser_cdp.js on debug.
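Concretely, instead of the directory-wide annotation sketched after
comment 4, the manifest entry would narrow down to something like this
(again a sketch, not the landed patch):

    [browser_cdp.js]
    # Bug 1546945: consistently fails the refcount leakcheck on debug.
    skip-if = debug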
Updated•6 years ago
bugherder | Comment 8•6 years ago
bugherder | Comment 10•6 years ago
Reporter | Comment 11•6 years ago
The remote agent has now been re-enabled, so relinquishing this
bug. We still need to fix this memory leak, but I don’t consider
it a top priority.
Reporter | Updated•6 years ago
Reporter | Comment 14•6 years ago
Over the weekend two other intermittent leaks were detected in the
remaining browser-chrome tests:
I’ve filed bug 1547679 to temporarily disable all browser-chrome
tests on debug until we’ve had time to investigate (this bug).
Comment 15•6 years ago
I can help you at some point with investigating this leak, though I don't think I'll have a ton of time this week.
Comments hidden (Intermittent Failures Robot)
Comment 20•5 years ago
The failures in comment 19 are in accessible/tests/browser/events/. I don't know why this bug is being starred. There's another bug on file for it.
Comments hidden (Intermittent Failures Robot)
Reporter | Updated•5 years ago
Reporter | Updated•5 years ago
Reporter | Updated•5 years ago
Comments hidden (Intermittent Failures Robot)
Comment 25•5 years ago
Removing the disabled tag per comment 20 (misclassification)
Updated•5 years ago
Comment hidden (Intermittent Failures Robot)
Reporter | Comment 27•5 years ago
Misclassification, see above.
Comment hidden (Intermittent Failures Robot)
Updated•5 years ago
Comments hidden (Intermittent Failures Robot)
Comment 33•5 years ago
The leaks are being misclassified. In the two logs I was looking at, they were happening in toolkit/components/certviewer/tests/browser/.
Comments hidden (Intermittent Failures Robot)
Comment 41•5 years ago
See also bug 1587814 for leaks as reported when running the puppeteer unit tests locally.
Comment 42•5 years ago
Note that any kind of failure in the test, or an exception thrown by
a CDP endpoint, will cause the browser-chrome tests to not correctly
clean up the tab and client instances. As such we also leak a lot of
memory, and the related leak output is frustrating because it makes
it harder to find the real issue.
I will file a new bug for that particular case, which will improve
our situation a lot, and maybe is all that is needed here.
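The missing clean-up described above could be made unconditional with a
pattern along these lines (a sketch of a browser-chrome task; connect() is a
stand-in for however the test obtains its CDP client, not a real helper):

    add_task(async function() {
      const tab = await BrowserTestUtils.openNewForegroundTab(gBrowser, "about:blank");
      const client = await connect(tab);  // hypothetical CDP client setup
      try {
        // Exercise CDP endpoints; a throw here used to leak tab and client.
      } finally {
        // Tear down even when an assertion or endpoint call throws.
        await client.close();
        BrowserTestUtils.removeTab(tab);
      }
    });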
Comment hidden (Intermittent Failures Robot)
Comment 44•5 years ago
Bug 1589625 has been fixed, and as such I'm fairly sure the leaks should all be gone now. Andreas, can you remember what we have to do to finally close out this bug? With all those formerly landed patches on this and other bugs it's kinda hard to follow. Maybe it's only backing out your first patch?
Comment hidden (Intermittent Failures Robot)
Comment 46•5 years ago
(In reply to Intermittent Failures Robot from comment #45)
> 1 failures in 4182 pushes (0.0 failures/push) were associated with this bug in the last 7 days.
> For more details, see:
> https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1546945&startday=2019-10-21&endday=2019-10-27&tree=all
This is a wrong classification. Alexandru, can you please correct it?
Comment 47•5 years ago
Corrected, made a new bug for that issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1592568
Reporter | Comment 48•5 years ago
Thanks for bringing this to my attention!
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #44)
> Bug 1589625 has been fixed, and as such I'm fairly sure the leaks
> should all be gone now. Andreas, can you remember what we have to
> do to finally close out this bug? With all those formerly landed
> patches on this and other bugs it's kinda hard to follow. Maybe
> it's only backing out your first patch?
The leak that caused this bug to be filed was plugged by e2fbe8d063e0
in bug 1589625. After that there were a number of misclassifications
that made it hard to keep track of whether the leak problems had
been fixed.
With the patches for bug 1589625 also applied, I haven’t seen any
recent remote agent specific leaks.
As for what action needs to be taken here, I don’t think there is
any. The remote agent was re-enabled by default with 868b1bf043dc,
which we could do because f04e1e9c54aa disabled browser_cdp.js on
debug. That leak was eventually plugged in 60dca41daf70 by making
sure all the content process domains had their destructors called.
This means the status quo is currently:
- All the tests are passing on debug configurations
- We are not seeing any remote agent-specific intermittents
- The remote agent remains enabled by default (on the Nightly channel)
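For context, the "destructors called" fix boils down to a pattern like the
one below; the class and property names are invented for illustration and do
not mirror the actual change in 60dca41daf70:

    // Hypothetical cache of the protocol domains instantiated in the
    // content process for one CDP session.
    class DomainCache {
      constructor() {
        this.instances = new Map();  // domain name -> domain instance
      }

      destructor() {
        // Call every domain's destructor so that no lingering references
        // (sockets, observers, listeners) survive the end of the session.
        for (const domain of this.instances.values()) {
          domain.destructor();
        }
        this.instances.clear();
      }
    }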
Comment hidden (Intermittent Failures Robot)