1546945 - remote/test/browser/ browser-chrome tests cause leakcheck to fail

Reporter

Description

•

6 years ago

After I enabled the remote agent in the default Nightly build, we
started seeing leaks in the browser-chrome tests as seen here:

https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=debug%2Cbrowser-chrome&fromchange=b9a625eff9e301f700196de977f916e7f77a895f&tochange=afb947a236befa6eb6f23783405bd9c334aa4129

I will submit a new patch to unhook the remote agent from the default
build, until we have had time to resolve this. Backing out the
original patchset would be detrimental to investigating this
effectively on try.

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Assignee: nobody → ato

Status: NEW → ASSIGNED

Depends on: 1533831

Priority: -- → P1

Andreas Tolfsen ❲:ato❳

Reporter

Comment 1

•

6 years ago

Attached file bug 1546945: remote: temporarily stop implying --enable-cdp; (obsolete) — Details

As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1546945,
enabling the remote agent in the default Firefox Nightly build
(https://bugzilla.mozilla.org/show_bug.cgi?id=1533831) caused a
leakcheck in M-bc to fail.

This patch cuts the remote agent off from the build, by stopping
to imply --enable-cdp, until we have had time to fix the leak.

Pulsebot

Comment 2

•

6 years ago

Pushed by atolfsen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/004d050a8ec4 remote: temporarily stop implying --enable-cdp; r=aryx

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Keywords: leave-open

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Blocks: puppeteer-mvp

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Blocks: 1540655

Andreas Tolfsen ❲:ato❳

Reporter

Comment 3

•

6 years ago

All browser-chrome jobs have been requested on debug for
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=004d050a8ec44987f44fd86c62097266b22b915e,
so it should confirm it was the remote agent that caused the leaks.

Because we don’t run all test jobs on all pushes, the working theory
is that the push I linked to in comment #0 saw the effect of the
remote agent M-bc jobs starting to leak.

URL: https://treeherder.mozilla.org/#/jobs...

Andreas Tolfsen ❲:ato❳

Reporter

Comment 4

•

6 years ago

Attached file bug 1546945: remote: disable browser_cdp.js browser-chrome test on debug; (obsolete) — Details

As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1546945,
there is a memory leak in the remote agent that causes it to
intermittently fail brwoser-chrome leakchecks.

It is possible the memory leak is related to holding onto a reference
of the XPConnect C++ object nsSocketTransportService, but this has
yet to be confirmed.

This patch disables the browser-chrome tests on debug builds,
where we run reference counting leakchecks, in order to get the
remote agent enabled in default Firefox Nightly builds.

Thanks-to: Alexandre Poirot <poirot.alex@gmail.com>

Andreas Tolfsen ❲:ato❳

Reporter

Comment 5

•

6 years ago

Attached file bug 1546945: remote: re-enable remote agent in Firefox Nightly; (obsolete) — Details

This patch negates the effect of 004d050a8ec4 (git 6dd9496ac0f9).

Andreas Tolfsen ❲:ato❳

Reporter

Comment 6

•

6 years ago

OK, I talked to ochameau and in order to make some meaningful
progress on enabling the remote agent in default Firefox Nightly
builds, we are going to disable the browser-chrome tests on debug
builds where reference counting leakchecks are run. I’ve submitted
the necessary patches for doing that as well as turning the remote
agent back on.

This bug is still relevant and now tracks re-enabling the M-bc
tests in remote/test/browser/*.js by first plugging the leak.

ochameau and I did some wild-goose-chase debugging on IRC, and we
suspect that we’re holding on to an XPConnect reference to a C++
instance of nsSocketTransportService. There are only a handful
of places we use sockets, which inculde for WebSocket client
connections, and for the HTTPD.

Andreas Tolfsen ❲:ato❳

Reporter

Comment 7

•

6 years ago

I narrowed it down to remote/test/browser/browser_cdp.js. On my
workstation it is failing the leakcheck consistently, which is
actually good news for debugging it.

This means I will revisit the attached patches and change them to
only ignore browser_cpd.js on debug.

Keywords: memory-leak

Summary: Remote agent browser-chrome tests causes leakcheck to fail intermittently → remote/test/browser/browser_cdp.js causes leakcheck to fail

Phabricator Automation

Updated

•

6 years ago

Attachment #9060764 - Attachment description: bug 1546945: remote: disable browser-chrome tests on debug; → bug 1546945: remote: disable browser_cdp.js browser-chrome test on debug;

Oana Pop-Rus

Comment 8

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/004d050a8ec4

Pulsebot

Comment 9

•

6 years ago

Pushed by atolfsen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f04e1e9c54aa remote: disable browser_cdp.js browser-chrome test on debug; r=ochameau https://hg.mozilla.org/integration/autoland/rev/868b1bf043dc remote: re-enable remote agent in Firefox Nightly; r=ochameau

Razvan Maries

Comment 10

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f04e1e9c54aa
https://hg.mozilla.org/mozilla-central/rev/868b1bf043dc

Andreas Tolfsen ❲:ato❳

Reporter

Comment 11

•

6 years ago

The remote agent has now been re-enabled, so relinquishing this
bug. We still need to fix this memory leak, but I don’t consider
it a top priority.

Status: ASSIGNED → NEW

Priority: P1 → P3

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Assignee: ato → nobody

Andreas Tolfsen ❲:ato❳

Reporter

Comment 14

•

6 years ago

Over the weekend two other intermittent leaks were detected in the
remaining browser-chrome tests:

I’ve filed bug 1547679 to temporarily disable all browser-chrome
tests on debug until we’ve had time to investigate (this bug).

Depends on: 1547679

Summary: remote/test/browser/browser_cdp.js causes leakcheck to fail → remote/test/browser/ browser-chrome tests cause leakcheck to fail

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 15

•

6 years ago

I can help you at some point with investigating this leak, though I don't think I'll have a ton of time this week.

Comment hidden (Intermittent Failures Robot)

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

No longer blocks: 1540655

Comment hidden (Intermittent Failures Robot)

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 20

•

6 years ago

The failures in comment 19 are in accessible/tests/browser/events/. I don't know why this bug is being starred. There's another bug on file for it.

Comment hidden (Intermittent Failures Robot)

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Attachment #9060696 - Attachment is obsolete: true

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Attachment #9060764 - Attachment is obsolete: true

Andreas Tolfsen ❲:ato❳

Reporter

Updated

•

6 years ago

Attachment #9060765 - Attachment is obsolete: true

Comment hidden (Intermittent Failures Robot)

Daniel Varga [:dvarga]

Comment 25

•

6 years ago

Removing the disabled tag per comment 20 (misclassification)

Daniel Varga [:dvarga]

Updated

•

6 years ago

Whiteboard: [stockwell disable-recommended]

Comment hidden (Intermittent Failures Robot)

Andreas Tolfsen ❲:ato❳

Reporter

Comment 27

•

6 years ago

Misclassification, see above.

Whiteboard: [stockwell disable-recommended]

Comment hidden (Intermittent Failures Robot)

Cosmin Sabou [:CosminS]

Updated

•

6 years ago

Whiteboard: [stockwell disable-recommended]

Comment hidden (Intermittent Failures Robot)

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 33

•

6 years ago

The leaks are being misclassified. In the two logs I was looking at, they were happening in toolkit/components/certviewer/tests/browser/

Comment hidden (Intermittent Failures Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 41

•

6 years ago

See also bug 1587814 for leaks as reported when running the puppeteer unit tests locally.

Comment 42

•

6 years ago

Note that any kind of failure in the test, or exception thrown by a CDP endpoint will cause the browser chrome tests to not correctly clean-up the tab and client instances. As such we also leak a lot of memory. And the related leak output is frustrating because it makes it harder to find the real issue.

I will file a new bug for that particular case, which will improve a lot our situation, and maybe is all needed here.

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

6 years ago

Depends on: 1589625

Comment hidden (Intermittent Failures Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 44

•

6 years ago

Bug 1589625 has been fixed, and as such I'm fairly sure the leaks should all be gone now. Andreas, can you remember what we have to do to finally close out this bug? With all those formerly landed patches on this and other bugs it's kinda hard to follow. Maybe it's only backing out your first patch?

Flags: needinfo?(ato)

Comment hidden (Intermittent Failures Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 46

•

6 years ago

(In reply to Intermittent Failures Robot from comment #45)

1 failures in 4182 pushes (0.0 failures/push) were associated with this bug in the last 7 days.

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1546945&startday=2019-10-21&endday=2019-10-27&tree=all

This is a wrong classification. Alexandru, can you please correct it?

Flags: needinfo?(malexandru)

Alexandru Michis [:malexandru]

Comment 47

•

6 years ago

Corrected, made a new bug for that issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1592568

Flags: needinfo?(malexandru)

Andreas Tolfsen ❲:ato❳

Reporter

Comment 48

•

6 years ago

•

Edited

Thanks for bringing this to my attention!

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #44)

Bug 1589625 has been fixed, and as such I'm fairly sure the leaks
should all be gone now. Andreas, can you remember what we have to
do to finally close out this bug? With all those formerly landed
patches on this and other bugs it's kinda hard to follow. Maybe
it's only backing out your first patch?

The leak that caused this bug to be filed was plugged by e2fbe8d063e0
in bug 1589625. After that there were a number of misclassifications
that made it hard to keep track of whether the leak problems had
been fixed.

With the patches for bug 1589625 also applied, I haven’t seen any
recent remote agent specific leaks.

As for what action needs to be taken here, I don’t think there is
any. The remote agent was re-enabled by default with 868b1bf043dc
which we could do because f04e1e9c54aa disabled browser_cdp.js
on debug. That leak was eventually plugged in 60dca41daf70 by making
sure all the content process domains had their destructors called.

This means the status quo is currently:

All the tests are passing on debug configurations
We are not seeing any remote agent specific intermittents
The remote again remains enabled by default (on the Nightly channel)

Status: NEW → RESOLVED

Closed: 6 years ago

Flags: needinfo?(ato)

Resolution: --- → FIXED

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 49

•

6 years ago

Thanks Andreas!

Resolution: FIXED → WORKSFORME

Comment hidden (Intermittent Failures Robot)

bug 1546945: remote: temporarily stop implying --enable-cdp; 6 years ago Andreas Tolfsen ❲:ato❳ 47 bytes, text/x-phabricator-request		Details \| Review
bug 1546945: remote: disable browser_cdp.js browser-chrome test on debug; 6 years ago Andreas Tolfsen ❲:ato❳ 47 bytes, text/x-phabricator-request		Details \| Review
bug 1546945: remote: re-enable remote agent in Firefox Nightly; 6 years ago Andreas Tolfsen ❲:ato❳ 47 bytes, text/x-phabricator-request		Details \| Review