Closed Bug 1369571 Opened 7 years ago Closed 7 years ago

Intermittent leakcheck | tab process: 10136 bytes leaked (ChannelEventQueue, ChildDNSService, CondVar, ConsoleReportCollector, DataStorage, ...)

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: schien)

References

Details

(Keywords: intermittent-failure, memory-leak, Whiteboard: [necko-active][stockwell fixed])

Attachments

(2 files)

on my radar, will check if it related to my recent PBg-Http patches.
Assignee: nobody → schien
Keywords: mlk
Whiteboard: [necko-active]
looks like it is reproducible on Mac, might have better change to reproduce it locally.
this is failing quite frequently on osx/win8 debug.  Win8 is non-e10s, osx is e10s.

here is a more recent osx debug log from trunk:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=113535234&lineNumber=14171

this looks to be leaking in this directory:
chrome://mochitests/content/chrome/dom/xul/test

here is the bloat view:
21:25:24     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 664
21:25:24     INFO - 
21:25:24     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
21:25:24     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
21:25:24     INFO -    0 |TOTAL                                 |       49    10248|   48754      103|
21:25:24     INFO -   30 |ChannelEventQueue                     |      160      160|       1        1|
21:25:24     INFO -   31 |ChildDNSService                       |      168      168|       1        1|
21:25:24     INFO -   43 |CondVar                               |       64      128|      59        2|
21:25:24     INFO -   45 |ConsoleReportCollector                |      112      112|       7        1|
21:25:24     INFO -   57 |DataStorage                           |      456      912|       3        2|
21:25:24     INFO -   93 |HttpBackgroundChannelChild            |       72       72|       1        1|
21:25:24     INFO -   94 |HttpChannelChild                      |     2232     2232|       1        1|
21:25:24     INFO -  113 |LoadInfo                              |      200      200|     184        1|
21:25:24     INFO -  126 |Mutex                                 |       80      880|     160       11|
21:25:24     INFO -  142 |PHttpBackgroundChannelChild           |       40       40|       1        1|
21:25:24     INFO -  143 |PHttpChannelChild                     |       40       40|       1        1|
21:25:24     INFO -  161 |PollableEvent                         |       24       24|       1        1|
21:25:24     INFO -  185 |ReentrantMonitor                      |       40       80|      32        2|
21:25:24     INFO -  187 |RequestContext                        |       64       64|       2        1|
21:25:24     INFO -  188 |RequestContextService                 |       88       88|       1        1|
21:25:24     INFO -  194 |Runnable                              |       40       80|     366        2|
21:25:24     INFO -  197 |SchedulerEventTarget                  |       48       48|      16        1|
21:25:24     INFO -  236 |TabGroup                              |      240      240|       1        1|
21:25:24     INFO -  264 |WeakReference<nsDocShell>             |       24       24|      10        1|
21:25:24     INFO -  313 |nsAuthURLParser                       |       24       24|       2        1|
21:25:24     INFO -  359 |nsDSURIContentListener                |       88       88|       1        1|
21:25:24     INFO -  362 |nsDefaultURIFixup                     |       24       24|       1        1|
21:25:24     INFO -  369 |nsDocLoader                           |      424      424|       2        1|
21:25:24     INFO -  370 |nsDocShell                            |     1168     1168|       1        1|
21:25:24     INFO -  371 |nsDocShell::InterfaceRequestorProxy   |       32       32|       1        1|
21:25:24     INFO -  377 |nsDocumentOpenInfo                    |       88       88|       2        1|
21:25:24     INFO -  406 |nsHttpAuthCache::OriginClearObserver  |       32       64|       2        2|
21:25:24     INFO -  407 |nsHttpHandler                         |      720      720|       1        1|
21:25:24     INFO -  408 |nsHttpRequestHead                     |      136      136|       1        1|
21:25:24     INFO -  410 |nsIOService                           |      288      288|       1        1|
21:25:24     INFO -  427 |nsJSPrincipals                        |       24       24|      33        1|
21:25:24     INFO -  430 |nsLoadGroup                           |       16       16|       2        1|
21:25:24     INFO -  432 |nsMainThreadPtrHolder<T>              |       40       80|       2        2|
21:25:24     INFO -  477 |nsSiteSecurityService                 |       80       80|       1        1|
21:25:24     INFO -  478 |nsSocketTransportService              |      320      320|       1        1|
21:25:24     INFO -  479 |nsStandardURL                         |      272      272|     834        1|
21:25:24     INFO -  483 |nsStringBuffer                        |        8      144|   14195       18|
21:25:24     INFO -  494 |nsTArray_base                         |        8      168|    9096       21|
21:25:24     INFO -  502 |nsURILoader                           |       32       32|       1        1|
21:25:24     INFO -  505 |nsVariant                             |       72       72|       2        1|
21:25:24     INFO -  508 |nsWeakReference                       |       40      360|      44        9|
21:25:24     INFO -  511 |nsWebNavigationInf



I assume this is in the core::networking because of the ChildDNSService and related bits on the bloatview.

:mcmanus, I see you are the triage owner for core::networking, can you find someone on the networking team to look into this failure sometime in the next two weeks?
Flags: needinfo?(mcmanus)
Whiteboard: [necko-active] → [necko-active][stockwell needswork]
sc already took the bug
Flags: needinfo?(mcmanus)
Sorry for the late response, I'll provide analysis in these two days.
Reproduced on my local machine with |./mach mochitest --run-until-failure dom/xul/test/test_bug1271240.xul| for few times.

The leakage is caused by the reference cycle between HttpChannelChild and HttpBackgroundChannelChild.
The reference cycle is supposed to be removed when HttpBackgroundChannelChild::ActorDestroyed, however this operation is pending due until HttpChannelChild receives OnStartRequest IPC message. Unfortunately the OnStartRequest IPC message is dropped because of PContent shutdown procedure. So, the reference removal is never executed, which leaks memory until content process is terminated.
Attached file fail-http.log
relevant HTTP MOZ log.

HttpBackgroundChannelChild @0x7f3a48dafd30 and HttpChannelChild @0x7f3a48d40000 are the leaked object.
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

cancel r? because try result looks bad.
Attachment #8887431 - Flags: review?(honzab.moz)
@mayhemer, is there anything blocking the review? I'd like to land this patch before merge day.
Flags: needinfo?(honzab.moz)
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167396

::: netwerk/protocol/http/HttpBackgroundChannelChild.cpp:124
(Diff revision 2)
>  
>    // HttpChannelChild is not going to handle any incoming message.
>    mChannelChild = nullptr;
> +
> +  // Remove pending IPC messages as well.
> +  mQueuedRunnables.Clear();

can any of the messages hold a reference to an object we expect to release on some particular thread?  in that case you would need to release them on the respective threads one by one.

::: netwerk/protocol/http/HttpChannelChild.cpp:3620
(Diff revision 2)
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  // OnStartRequest might be dropped if IPDL is destroyed abnormally
> +  // and BackgroundChild might have pending IPC messages.
> +  // Clean up BackgroundChild at this time to prevent memleak.
> +  if (aWhy != Deletion) {

any reason why we shouldn't do this all the time?  why we don't need to do this when the reason is Deletion?  (no docs for this state..)
c20
Flags: needinfo?(honzab.moz)
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167616

::: netwerk/protocol/http/HttpBackgroundChannelChild.cpp:124
(Diff revision 2)
>  
>    // HttpChannelChild is not going to handle any incoming message.
>    mChannelChild = nullptr;
> +
> +  // Remove pending IPC messages as well.
> +  mQueuedRunnables.Clear();

No, all the runnables are contains a copy of data received from IPC. Nothing is required to release on specific thread.

::: netwerk/protocol/http/HttpChannelChild.cpp:3620
(Diff revision 2)
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  // OnStartRequest might be dropped if IPDL is destroyed abnormally
> +  // and BackgroundChild might have pending IPC messages.
> +  // Clean up BackgroundChild at this time to prevent memleak.
> +  if (aWhy != Deletion) {

The only document I found is in the original bug: https://bugzilla.mozilla.org/show_bug.cgi?id=529005#c6
Deletion is paired with Send__delete__(), which is the normal case of closing an IPDL actor.

On parent side we might close the background channel immediately after all messages are sent. If we handle the ActorDestroy immediately we will either lose the chance to process those enqueued messages. Normal Deletion should be treated in order with other IPC messages.
@mayhemer, please see my response in comment #23.
Flags: needinfo?(honzab.moz)
Thanks for this reply!  This is r+ then.
Flags: needinfo?(honzab.moz)
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

https://reviewboard.mozilla.org/r/158278/#review167748
Attachment #8887431 - Flags: review?(honzab.moz) → review+
(maybe these comments in some simplified form should appear in the code as well.)
Pushed by schien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/039ee2964fb9
break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed. r=mayhemer
Please request Beta approval on this too.
https://hg.mozilla.org/mozilla-central/rev/039ee2964fb9
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1015466
[User impact if declined]: memleak until content process shutdown
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: low
[Why is the change risky/not risky?]: change abnormal shutdown handling
[String changes made/needed]: no
Flags: needinfo?(schien)
Attachment #8887431 - Flags: approval-mozilla-beta?
Comment on attachment 8887431 [details]
Bug 1369571 - break reference cycle between HttpChannelChild and HttpBackgroundChannelChild while PHttpChannel is destroyed.

necko leak fix, for 55 rc1
Attachment #8887431 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/2cd8c7f13e6c
Whiteboard: [necko-active][stockwell needswork] → [necko-active][stockwell fixed]
You need to log in before you can comment on or make changes to this bug.