js::TraceLoggerThread using 340MB of heap-unclassified in parent process

RESOLVED FIXED in Firefox 52

Status

()

Core
JavaScript Engine: JIT
P1
normal
RESOLVED FIXED
a year ago
5 months ago

People

(Reporter: bkelly, Assigned: h4writer)

Tracking

({mlk})

unspecified
mozilla54
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr45- wontfix, firefox51+ wontfix, firefox52+ fixed, firefox53+ fixed, firefox54+ fixed)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

a year ago
I've been trying to find the source of high heap-unclassified in bug 1330976.  I found that js::TraceLoggerThread::init() seems to use a lot of heap-unclassified in some cases.

The following entries were found in a nightly FF54 DMD build when I came back to my computer this morning.  They add up to ~341MB.

Unreported {
  111,394 blocks in heap block record 1 of 13,766
  114,067,456 bytes (85,550,592 requested / 28,516,864 slop)
  Individual block sizes: 1,024 x 111,394
  14.26% of the heap (14.26% cumulative)
  24.38% of unreported (24.38% cumulative)
  Allocated at {
    #01: calloc_impl (c:\devel\mozilla-central\memory\build\replace_malloc.c:182)
    #02: js::TraceLoggerThread::init (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:118)
    #03: js::TraceLoggerThreadState::create (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:10>
    #04: js::TraceLoggerForCurrentThread (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:970)
    #05: js::HelperThread::handleGCParallelWorkload (c:\devel\mozilla-central\js\src\vm\helperthrea>
    #06: js::HelperThread::threadLoop (c:\devel\mozilla-central\js\src\vm\helperthreads.cpp:1892)
    #07: js::detail::ThreadTrampoline<void (__cdecl&)(void * __ptr64),js::HelperThread * __ptr64>::>
    #08: o__realloc_base[C:\WINDOWS\System32\ucrtbase.dll +0x1cab0]
    #09: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x8364]
    #10: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x670d1]
  }
}

Unreported {
  111,302 blocks in heap block record 2 of 13,766
  113,973,248 bytes (85,479,936 requested / 28,493,312 slop)
  Individual block sizes: 1,024 x 111,302
  14.25% of the heap (28.50% cumulative)
  24.36% of unreported (48.74% cumulative)
  Allocated at {
    #01: calloc_impl (c:\devel\mozilla-central\memory\build\replace_malloc.c:182)
    #02: js::TraceLoggerThread::init (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:120)
    #03: js::TraceLoggerThreadState::create (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:10>
    #04: js::TraceLoggerForCurrentThread (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:970)
    #05: js::HelperThread::handleGCParallelWorkload (c:\devel\mozilla-central\js\src\vm\helperthrea>
    #06: js::HelperThread::threadLoop (c:\devel\mozilla-central\js\src\vm\helperthreads.cpp:1892)
    #07: js::detail::ThreadTrampoline<void (__cdecl&)(void * __ptr64),js::HelperThread * __ptr64>::>
    #08: o__realloc_base[C:\WINDOWS\System32\ucrtbase.dll +0x1cab0]
    #09: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x8364]
    #10: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x670d1]
  }
}

Unreported {
  111,299 blocks in heap block record 3 of 13,766
  113,970,176 bytes (113,970,176 requested / 0 slop)
  Individual block sizes: 1,024 x 111,299
  14.25% of the heap (42.75% cumulative)
  24.36% of unreported (73.09% cumulative)
  Allocated at {
    #01: malloc_impl (c:\devel\mozilla-central\memory\build\replace_malloc.c:152)
    #02: js::TraceLoggerThread::init (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:122)
    #03: js::TraceLoggerThreadState::create (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:10>
    #04: js::TraceLoggerForCurrentThread (c:\devel\mozilla-central\js\src\vm\tracelogging.cpp:970)
    #05: js::HelperThread::handleGCParallelWorkload (c:\devel\mozilla-central\js\src\vm\helperthrea>
    #06: js::HelperThread::threadLoop (c:\devel\mozilla-central\js\src\vm\helperthreads.cpp:1892)
    #07: js::detail::ThreadTrampoline<void (__cdecl&)(void * __ptr64),js::HelperThread * __ptr64>::>
    #08: o__realloc_base[C:\WINDOWS\System32\ucrtbase.dll +0x1cab0]
    #09: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x8364]
    #10: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x670d1]
  }
(Reporter)

Comment 1

a year ago
Jon, any ideas what is going on here?
Flags: needinfo?(jcoppeard)
(Reporter)

Comment 2

a year ago
There are two issues here:

1) Maybe this code should have a memory reporter
2) Why is it using so much memory?
Hannes knows more about the trace logger.
Flags: needinfo?(jcoppeard) → needinfo?(hv1989)
(Assignee)

Comment 4

a year ago
This sound bad! I know that tracelogger doesn't report its memory. But it shouldn't use that much memory when disabled! Investigating.
Component: JavaScript: GC → JavaScript Engine: JIT
Priority: -- → P1
So it looks like we call TraceLoggerThread::init at least in the following cases:

1) Main thread of the JSRuntime (e.g. via GCRuntime::minorGC calling js::TraceLoggerForMainThread).
2) Various jit helper threads: HelperThread::handleIonWorkload calls js::TraceLoggerForCurrentThread.
3) Compression helper threads: HelperThread::handleCompressionWorkload calls js::TraceLoggerForCurrentThread.
4) GC threads: HelperThread::handleGCParallelWorkload calls js::TraceLoggerForCurrentThread.
5) JitRuntime creation: JitRuntime::generateTLEventVM calls js::TraceLoggerForMainThread (and is called
   indirectly from JitRuntime::initialize).

Each hashtable init looks like it's 1024 bytes based on moz_malloc_usable_size, so each time we end up with 3KB.

I see a dozen or two hits during startup, which is not too fatal.  The question is how we ended up with 100K hits.

It's possible that we simply have 100K threads around, but seems unlikely.  Or maybe we're never cleaning up TraceLoggerThread instances?

Looks like we store such instances in three places: PerThreadData::traceLogger and TraceLoggerThreadState::mainThreadLoggers (via TraceLoggerThreadState::forMainThread), and TraceLoggerThreadState::threadLoggers (via TraceLoggerThreadState::forThread).  I think the PerThreadData ref is not owning, and ~TraceLoggerThreadState does delete all the stuff in mainThreadLoggers and threadLoggers.

But as far as I can tell this means that we don't clear the threadloggers until ~TraceLoggerThreadState, which happens only in JS_ShutDown.  Which happens only at process termination, no?  So we keep allocating more and more of these things and never freeing them.
And in particular, outside of ~TraceLoggerThreadState nothing ever removes things from the TraceLoggerThreadState hashmaps, afaict.
[Tracking Requested - why for this release]: memory leak over time with worker usage, particularly bad on Nightly.
tracking-firefox52: --- → ?
tracking-firefox53: --- → ?
tracking-firefox54: --- → ?
(Assignee)

Comment 8

a year ago
Created attachment 8830854 [details] [diff] [review]
Patch

Very embarrassing. We are keeping a list of mainThread loggers, but never remove it from the list. I transformed it in a linkedlist in order to make it possible to add/remove it upon construction/destruction. The array was annoying since we had to iterate the whole list.

(Couldn't use LinkedListNode, since the pointers are not needed on all loggers.)
Assignee: nobody → hv1989
Flags: needinfo?(hv1989)
Attachment #8830854 - Flags: review?(bbouvier)
(Assignee)

Comment 9

a year ago
Can you give steps to reproduce? I want to make sure this fixes it fully.
Flags: needinfo?(bkelly)
(Reporter)

Comment 10

a year ago
(In reply to Hannes Verschore [:h4writer] from comment #9)
> Can you give steps to reproduce? I want to make sure this fixes it fully.

Unfortunately my steps were "use nightly for a couple days".

If creating worker threads would trigger this problem you could write a test page that spawns and terminates workers in a loop.  Before your patch in theory it should increase heap-unclassified over time and not increase memory with your patch.
Flags: needinfo?(bkelly)
(Reporter)

Comment 11

a year ago
Does this test case work for you?  It should start leaking immediately when you open it:

https://people-mozilla.org/~bkelly/bug1334129/index.html
Flags: needinfo?(bkelly)
(Reporter)

Comment 12

a year ago
Wrong NI... Does this test case work for you?

https://people-mozilla.org/~bkelly/bug1334129/index.html
Flags: needinfo?(bkelly) → needinfo?(hv1989)
Tracking 52/53/54+ for this memory leak.
tracking-firefox52: ? → +
tracking-firefox53: ? → +
tracking-firefox54: ? → +
(Assignee)

Comment 14

a year ago
Comment on attachment 8830854 [details] [diff] [review]
Patch

Review of attachment 8830854 [details] [diff] [review]:
-----------------------------------------------------------------

Overnight I realized I can do this with less code.
Attachment #8830854 - Flags: review?(bbouvier)
(Assignee)

Comment 15

a year ago
Created attachment 8831068 [details] [diff] [review]
Patch

Reusing LinkedList now. Much cleaner.

I tested it on the testcase and we now stay at 5mb unclassified, instead of leaking on the given testcase.
Attachment #8830854 - Attachment is obsolete: true
Flags: needinfo?(hv1989)
Attachment #8831068 - Flags: review?(bbouvier)
(Assignee)

Comment 16

a year ago
Follow-up bug 1334409 for implementing a memory reporter.
Comment on attachment 8831068 [details] [diff] [review]
Patch

Review of attachment 8831068 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. We'd really need to use more UniquePtrs in this code...

::: js/src/vm/TraceLogging.cpp
@@ +972,5 @@
> +void
> +TraceLoggerThreadState::destroyMainThread(JSRuntime* runtime)
> +{
> +    MOZ_ASSERT(initialized);
> +    PerThreadData* mainThread = &runtime->mainThread;

As far as I can tell, there's an #ifdef JS_TRACE_LOGGING around the definition of runtime->mainThread, but this is not protected here. Can you ensure it doesn't break builds without TL, please?

@@ +1001,5 @@
>      ThreadLoggerHashMap::AddPtr p = threadLoggers.lookupForAdd(thread);
>      if (p)
>          return p->value();
>  
> +    TraceLoggerThread* logger = js_new<TraceLoggerThread>();

(bonus points if using a UniquePtr and js::MakeUnique here)

@@ -999,5 @@
>      return logger;
>  }
>  
> -TraceLoggerThread*
> -TraceLoggerThreadState::create()

nit: Don't forget to remove the declaration in the header file too.
Attachment #8831068 - Flags: review?(bbouvier) → review+
(Assignee)

Comment 18

a year ago
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> Comment on attachment 8831068 [details] [diff] [review]
> Patch
> 
> Review of attachment 8831068 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM. We'd really need to use more UniquePtrs in this code...
> 
> ::: js/src/vm/TraceLogging.cpp
> @@ +972,5 @@
> > +void
> > +TraceLoggerThreadState::destroyMainThread(JSRuntime* runtime)
> > +{
> > +    MOZ_ASSERT(initialized);
> > +    PerThreadData* mainThread = &runtime->mainThread;
> 
> As far as I can tell, there's an #ifdef JS_TRACE_LOGGING around the
> definition of runtime->mainThread, but this is not protected here. Can you
> ensure it doesn't break builds without TL, please?

TraceLogging.cpp only gets compiled when JS_TRACE_LOGGING is true. I.e. not an issue.
Though there are pre-existing issues. Opened follow-up bug.
 
> @@ +1001,5 @@
> >      ThreadLoggerHashMap::AddPtr p = threadLoggers.lookupForAdd(thread);
> >      if (p)
> >          return p->value();
> >  
> > +    TraceLoggerThread* logger = js_new<TraceLoggerThread>();
> 
> (bonus points if using a UniquePtr and js::MakeUnique here)

That will be follow-up bug.

> @@ -999,5 @@
> >      return logger;
> >  }
> >  
> > -TraceLoggerThread*
> > -TraceLoggerThreadState::create()
> 
> nit: Don't forget to remove the declaration in the header file too.

Thanks! I forgot that indeed.

Follow-up bug is bug 1334426

Comment 19

a year ago
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f780942dfde
TraceLogger - On destruction remove the logger from the mainlogger list, r=bbouvier
(Assignee)

Comment 20

a year ago
Created attachment 8831208 [details] [diff] [review]
esr45 patch
Attachment #8831208 - Flags: review+
(Assignee)

Comment 21

a year ago
Requesting tracking for esr45 and release. Removing this memory leak on those versions too. I think we are planning a dot release in that case we should include this.
status-firefox51: --- → affected
status-firefox52: --- → affected
status-firefox53: --- → affected
status-firefox54: --- → affected
status-firefox-esr45: --- → affected
tracking-firefox51: --- → ?
tracking-firefox-esr45: --- → ?
Keywords: mlk
Whiteboard: [MemShrink] → [MemShrink:P1]

Comment 22

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3f780942dfde
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox54: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
(Assignee)

Comment 23

a year ago
Comment on attachment 8831068 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/Bug causing the regression]:
Tracelogging when disabled.

[User impact if declined]:
When heavily using workers a little bit of memory is used, but never freed. I.e. memory leak on particular workload. Shouldn't impact normal usage that much. Though workers are starting to get used more often! I assume that is the reason  this only surfaced now.

[Is this code covered by automated tests?]:
Yes, jit-tests has tracelogger tests. And every other test is running it disabled. I.e. what we are concerned about here.

[Has the fix been verified in Nightly?]:
Yes. No memory leak on nightly anymore on the given testcase.

[Needs manual test from QE? If yes, steps to reproduce]: 
No

[List of other uplifts needed for the feature/fix]:
/

[Is the change risky?]:
[Why is the change risky/not risky?]:
Not really risky. It is a change of data-structure from Vector to LinkedList

[String changes made/needed]:
/
Attachment #8831068 - Flags: approval-mozilla-release?
Attachment #8831068 - Flags: approval-mozilla-beta?
Attachment #8831068 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 24

a year ago
Comment on attachment 8831208 [details] [diff] [review]
esr45 patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- memory leak when using workers.

User impact if declined: 
- memory that cannot be used by other resources in Firefox / system and eventually a OOM crash.

Fix Landed on Version:
FF54 (in progress of backporting)

Risk to taking this patch (and alternatives if risky): 
Not really risky. It is a change of data-structure from Vector to LinkedList

String or UUID changes made by this patch: 
/

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8831208 - Flags: approval-mozilla-esr45?
Comment on attachment 8831068 [details] [diff] [review]
Patch

fix tracelogger leak, aurora53+, beta52+
Attachment #8831068 - Flags: approval-mozilla-beta?
Attachment #8831068 - Flags: approval-mozilla-beta+
Attachment #8831068 - Flags: approval-mozilla-aurora?
Attachment #8831068 - Flags: approval-mozilla-aurora+

Comment 26

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/e5f851a66841
status-firefox53: affected → fixed

Comment 27

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/4b91754cb349
status-firefox52: affected → fixed
This sounds pretty bad. Do we have any way to tell how common it is on release? 
Does it seem worth doing a dot release for this alone?
Flags: needinfo?(hv1989)
(Assignee)

Comment 29

a year ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #28)
> This sounds pretty bad. Do we have any way to tell how common it is on
> release? 
> Does it seem worth doing a dot release for this alone?

I don't think it is that common. It has been in release for quite some time and this is the first complaint. Workers aren't used that often and if used in a very moderate number. If we are doing a dot release this should definitely get included. If we aren't I think we can just wait a month. We release often enough and after 2 years another month won't make the difference.
Flags: needinfo?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #29)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #28)
> > This sounds pretty bad. Do we have any way to tell how common it is on
> > release? 
> > Does it seem worth doing a dot release for this alone?
> 
> I don't think it is that common. It has been in release for quite some time
> and this is the first complaint. Workers aren't used that often and if used
> in a very moderate number. If we are doing a dot release this should
> definitely get included. If we aren't I think we can just wait a month. We
> release often enough and after 2 years another month won't make the
> difference.

It sounds like we might be doing a point release partially for moz://a support, so maybe this patch could go in with that?
Track 51+ as memory leak issue.
tracking-firefox51: ? → +
Flags: qe-verify-
(Reporter)

Updated

a year ago
See Also: → bug 1336448

Comment 32

11 months ago
Hi :h4writer,
As you may already know, we only take sec-crit and sec-high fixes on ESR45. We can take stability fixes but only for top crashers. I am not sure if this one meets the bar. According to comment #29, it's been there for a while, can we won't fix this in ESR45?
Flags: needinfo?(hv1989)
(Assignee)

Comment 33

11 months ago
(In reply to Gerry Chang [:gchang] from comment #32)
> Hi :h4writer,
> As you may already know, we only take sec-crit and sec-high fixes on ESR45.
> We can take stability fixes but only for top crashers. I am not sure if this
> one meets the bar. According to comment #29, it's been there for a while,
> can we won't fix this in ESR45?

I think we can. It is not leaking fast. Though it feels bad for me to keep this memory leak in esr45.
Flags: needinfo?(hv1989)

Comment 34

11 months ago
Comment on attachment 8831208 [details] [diff] [review]
esr45 patch

Per comment #33, wont fix in ESR45.
Attachment #8831208 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45-

Comment 35

11 months ago
Comment on attachment 8831068 [details] [diff] [review]
Patch

We don't have then plan to have dot release for 51. Rel51-.
Attachment #8831068 - Flags: approval-mozilla-release? → approval-mozilla-release-

Updated

11 months ago
status-firefox51: affected → wontfix
status-firefox-esr45: affected → wontfix
tracking-firefox-esr45: ? → -
You need to log in before you can comment on or make changes to this bug.