Intermittent leakcheck | tab process: 10136 bytes leaked (ChannelEventQueue, ChildDNSService, CondVar, ConsoleReportCollector, DataStorage, ...)

RESOLVED FIXED in Firefox 55

Status

()

Core
Networking
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: schien)

Tracking

({intermittent-failure, memory-leak})

unspecified
mozilla56
intermittent-failure, memory-leak
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 fixed, firefox56 fixed)

Details

(Whiteboard: [necko-active][stockwell fixed])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments)

on my radar, will check if it related to my recent PBg-Http patches.
Assignee: nobody → schien
Keywords: mlk
Whiteboard: [necko-active]

Comment 2

11 months ago
16 failures in 814 pushes (0.02 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 6
* mozilla-beta: 5
* autoland: 5

Platform breakdown:
* windows8-64: 8
* osx-10-10: 8

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-06-12&endday=2017-06-18&tree=all
looks like it is reproducible on Mac, might have better change to reproduce it locally.

Comment 4

11 months ago
14 failures in 892 pushes (0.016 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 5
* mozilla-inbound: 4
* mozilla-beta: 4
* mozilla-central: 1

Platform breakdown:
* windows8-64: 8
* osx-10-10: 5
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-06-19&endday=2017-06-25&tree=all

Comment 5

11 months ago
13 failures in 718 pushes (0.018 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 6
* autoland: 5
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* windows8-64: 11
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-06-26&endday=2017-07-02&tree=all

Comment 6

11 months ago
32 failures in 656 pushes (0.049 failures/push) were associated with this bug in the last 7 days.   

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 11
* autoland: 10
* mozilla-central: 6
* mozilla-beta: 5

Platform breakdown:
* windows8-64: 16
* osx-cross: 9
* osx-10-10: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-07-03&endday=2017-07-09&tree=all
this is failing quite frequently on osx/win8 debug.  Win8 is non-e10s, osx is e10s.

here is a more recent osx debug log from trunk:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=113535234&lineNumber=14171

this looks to be leaking in this directory:
chrome://mochitests/content/chrome/dom/xul/test

here is the bloat view:
21:25:24     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 664
21:25:24     INFO - 
21:25:24     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
21:25:24     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
21:25:24     INFO -    0 |TOTAL                                 |       49    10248|   48754      103|
21:25:24     INFO -   30 |ChannelEventQueue                     |      160      160|       1        1|
21:25:24     INFO -   31 |ChildDNSService                       |      168      168|       1        1|
21:25:24     INFO -   43 |CondVar                               |       64      128|      59        2|
21:25:24     INFO -   45 |ConsoleReportCollector                |      112      112|       7        1|
21:25:24     INFO -   57 |DataStorage                           |      456      912|       3        2|
21:25:24     INFO -   93 |HttpBackgroundChannelChild            |       72       72|       1        1|
21:25:24     INFO -   94 |HttpChannelChild                      |     2232     2232|       1        1|
21:25:24     INFO -  113 |LoadInfo                              |      200      200|     184        1|
21:25:24     INFO -  126 |Mutex                                 |       80      880|     160       11|
21:25:24     INFO -  142 |PHttpBackgroundChannelChild           |       40       40|       1        1|
21:25:24     INFO -  143 |PHttpChannelChild                     |       40       40|       1        1|
21:25:24     INFO -  161 |PollableEvent                         |       24       24|       1        1|
21:25:24     INFO -  185 |ReentrantMonitor                      |       40       80|      32        2|
21:25:24     INFO -  187 |RequestContext                        |       64       64|       2        1|
21:25:24     INFO -  188 |RequestContextService                 |       88       88|       1        1|
21:25:24     INFO -  194 |Runnable                              |       40       80|     366        2|
21:25:24     INFO -  197 |SchedulerEventTarget                  |       48       48|      16        1|
21:25:24     INFO -  236 |TabGroup                              |      240      240|       1        1|
21:25:24     INFO -  264 |WeakReference<nsDocShell>             |       24       24|      10        1|
21:25:24     INFO -  313 |nsAuthURLParser                       |       24       24|       2        1|
21:25:24     INFO -  359 |nsDSURIContentListener                |       88       88|       1        1|
21:25:24     INFO -  362 |nsDefaultURIFixup                     |       24       24|       1        1|
21:25:24     INFO -  369 |nsDocLoader                           |      424      424|       2        1|
21:25:24     INFO -  370 |nsDocShell                            |     1168     1168|       1        1|
21:25:24     INFO -  371 |nsDocShell::InterfaceRequestorProxy   |       32       32|       1        1|
21:25:24     INFO -  377 |nsDocumentOpenInfo                    |       88       88|       2        1|
21:25:24     INFO -  406 |nsHttpAuthCache::OriginClearObserver  |       32       64|       2        2|
21:25:24     INFO -  407 |nsHttpHandler                         |      720      720|       1        1|
21:25:24     INFO -  408 |nsHttpRequestHead                     |      136      136|       1        1|
21:25:24     INFO -  410 |nsIOService                           |      288      288|       1        1|
21:25:24     INFO -  427 |nsJSPrincipals                        |       24       24|      33        1|
21:25:24     INFO -  430 |nsLoadGroup                           |       16       16|       2        1|
21:25:24     INFO -  432 |nsMainThreadPtrHolder<T>              |       40       80|       2        2|
21:25:24     INFO -  477 |nsSiteSecurityService                 |       80       80|       1        1|
21:25:24     INFO -  478 |nsSocketTransportService              |      320      320|       1        1|
21:25:24     INFO -  479 |nsStandardURL                         |      272      272|     834        1|
21:25:24     INFO -  483 |nsStringBuffer                        |        8      144|   14195       18|
21:25:24     INFO -  494 |nsTArray_base                         |        8      168|    9096       21|
21:25:24     INFO -  502 |nsURILoader                           |       32       32|       1        1|
21:25:24     INFO -  505 |nsVariant                             |       72       72|       2        1|
21:25:24     INFO -  508 |nsWeakReference                       |       40      360|      44        9|
21:25:24     INFO -  511 |nsWebNavigationInf



I assume this is in the core::networking because of the ChildDNSService and related bits on the bloatview.

:mcmanus, I see you are the triage owner for core::networking, can you find someone on the networking team to look into this failure sometime in the next two weeks?
Flags: needinfo?(mcmanus)
Whiteboard: [necko-active] → [necko-active][stockwell needswork]
sc already took the bug
Flags: needinfo?(mcmanus)

Comment 9

10 months ago
32 failures in 720 pushes (0.044 failures/push) were associated with this bug in the last 7 days. 

This is the #46 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* autoland: 9
* mozilla-beta: 8
* mozilla-inbound: 7
* pine: 5
* try: 2
* cedar: 1

Platform breakdown:
* windows8-64: 17
* osx-10-10: 8
* osx-cross: 6
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-07-10&endday=2017-07-16&tree=all
Sorry for the late response, I'll provide analysis in these two days.
Reproduced on my local machine with |./mach mochitest --run-until-failure dom/xul/test/test_bug1271240.xul| for few times.

The leakage is caused by the reference cycle between HttpChannelChild and HttpBackgroundChannelChild.
The reference cycle is supposed to be removed when HttpBackgroundChannelChild::ActorDestroyed, however this operation is pending due until HttpChannelChild receives OnStartRequest IPC message. Unfortunately the OnStartRequest IPC message is dropped because of PContent shutdown procedure. So, the reference removal is never executed, which leaks memory until content process is terminated.
Created attachment 8887428 [details]
fail-http.log

relevant HTTP MOZ log.

HttpBackgroundChannelChild @0x7f3a48dafd30 and HttpChannelChild @0x7f3a48d40000 are the leaked object.
Comment hidden (mozreview-request)
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

cancel r? because try result looks bad.
Attachment #8887431 - Flags: review?(honzab.moz)
Comment hidden (mozreview-request)

Comment 18

10 months ago
14 failures in 822 pushes (0.017 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 4
* try: 3
* pine: 3
* mozilla-inbound: 2
* mozilla-beta: 2

Platform breakdown:
* windows8-64: 8
* osx-10-10: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-07-17&endday=2017-07-23&tree=all
@mayhemer, is there anything blocking the review? I'd like to land this patch before merge day.
Flags: needinfo?(honzab.moz)

Comment 20

10 months ago
mozreview-review
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167396

::: netwerk/protocol/http/HttpBackgroundChannelChild.cpp:124
(Diff revision 2)
>  
>    // HttpChannelChild is not going to handle any incoming message.
>    mChannelChild = nullptr;
> +
> +  // Remove pending IPC messages as well.
> +  mQueuedRunnables.Clear();

can any of the messages hold a reference to an object we expect to release on some particular thread?  in that case you would need to release them on the respective threads one by one.

::: netwerk/protocol/http/HttpChannelChild.cpp:3620
(Diff revision 2)
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  // OnStartRequest might be dropped if IPDL is destroyed abnormally
> +  // and BackgroundChild might have pending IPC messages.
> +  // Clean up BackgroundChild at this time to prevent memleak.
> +  if (aWhy != Deletion) {

any reason why we shouldn't do this all the time?  why we don't need to do this when the reason is Deletion?  (no docs for this state..)
c20
Flags: needinfo?(honzab.moz)
(Assignee)

Comment 23

10 months ago
mozreview-review
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167616

::: netwerk/protocol/http/HttpBackgroundChannelChild.cpp:124
(Diff revision 2)
>  
>    // HttpChannelChild is not going to handle any incoming message.
>    mChannelChild = nullptr;
> +
> +  // Remove pending IPC messages as well.
> +  mQueuedRunnables.Clear();

No, all the runnables are contains a copy of data received from IPC. Nothing is required to release on specific thread.

::: netwerk/protocol/http/HttpChannelChild.cpp:3620
(Diff revision 2)
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  // OnStartRequest might be dropped if IPDL is destroyed abnormally
> +  // and BackgroundChild might have pending IPC messages.
> +  // Clean up BackgroundChild at this time to prevent memleak.
> +  if (aWhy != Deletion) {

The only document I found is in the original bug: https://bugzilla.mozilla.org/show_bug.cgi?id=529005#c6
Deletion is paired with Send__delete__(), which is the normal case of closing an IPDL actor.

On parent side we might close the background channel immediately after all messages are sent. If we handle the ActorDestroy immediately we will either lose the chance to process those enqueued messages. Normal Deletion should be treated in order with other IPC messages.
@mayhemer, please see my response in comment #23.
Flags: needinfo?(honzab.moz)
Thanks for this reply!  This is r+ then.
Flags: needinfo?(honzab.moz)

Comment 26

10 months ago
mozreview-review
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167748
Attachment #8887431 - Flags: review?(honzab.moz) → review+
(maybe these comments in some simplified form should appear in the code as well.)

Comment 28

10 months ago
Pushed by schien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/039ee2964fb9
break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed. r=mayhemer
Please request Beta approval on this too.
status-firefox54: --- → unaffected
status-firefox55: --- → affected
status-firefox56: --- → affected
status-firefox-esr52: --- → unaffected
Flags: needinfo?(schien)

Comment 30

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/039ee2964fb9
Status: NEW → RESOLVED
Last Resolved: 10 months ago
status-firefox56: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1015466
[User impact if declined]: memleak until content process shutdown
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: low
[Why is the change risky/not risky?]: change abnormal shutdown handling
[String changes made/needed]: no
Flags: needinfo?(schien)
Attachment #8887431 - Flags: approval-mozilla-beta?

Comment 32

10 months ago
15 failures in 1008 pushes (0.015 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 5
* try: 2
* mozilla-inbound: 2
* mozilla-central: 2
* oak: 1
* mozilla-beta: 1
* date: 1
* cedar: 1

Platform breakdown:
* windows8-64: 8
* osx-10-10: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369571&startday=2017-07-24&endday=2017-07-30&tree=all
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

necko leak fix, for 55 rc1
Attachment #8887431 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 34

10 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/2cd8c7f13e6c
status-firefox55: affected → fixed
Whiteboard: [necko-active][stockwell needswork] → [necko-active][stockwell fixed]
Duplicate of this bug: 1379094
You need to log in before you can comment on or make changes to this bug.