The profile module leaks on debug builds

RESOLVED FIXED in mozilla37

Status

()

Core
Gecko Profiler
--
blocker
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: vporof, Assigned: tromey)

Tracking

({regression})

unspecified
mozilla37
regression
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

3 years ago
Run this as a browser mochitest:

function test() {
  let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);

  let ENTRIES = 1000000;
  let INTERVAL = 1;
  let FEATURES = ["js"];
  profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length);

  console.profile();

  finish();
}

Leaks: 
Mutex
ReentrantMonitor
SyncProfile
ThreadProfile
ThreadResponsiveness
nsTArray_base
nsThread

Full log: https://pastebin.mozilla.org/5738770

This is badly blocking bug 879008!
(Reporter)

Updated

3 years ago
Blocks: 879008
Keywords: regressionwindow-wanted
when we find the source of this, we should get this included in our testsuites.
Flags: in-testsuite?
Keywords: regression
Does finish() here stop the profiler? If not it's a bug in the test.

... But in any case we shouldn't be leaking even if we forget to stop the profiler.
Spoke with Benwa to see if he knew of any recent changes that might have triggered this. He did mention that the test doesn't explicitly stop the profiler which is technically an error in the test, though I think this is still a new failure. Nothing preventing a user from running console.profile() and leaving that running.

I don't think we explicitly state anywhere that the profiler must have a matching StopProfile for every StartProfile call.
(In reply to Benoit Girard (:BenWa) from comment #2)
> Does finish() here stop the profiler? If not it's a bug in the test.
> 
> ... But in any case we shouldn't be leaking even if we forget to stop the
> profiler.

no, I don't think so. Individual tests *could* register a cleanup function for every invocation of startProfiler but like I said, the profiler itself should be resilient enough to not leak if a dumb user left it on.
(Reporter)

Comment 5

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #2)
> Does finish() here stop the profiler? If not it's a bug in the test.
> 

No. It's an artifact of using waitForExplicitFinish in my head.js You can safely ignore that.

> ... But in any case we shouldn't be leaking even if we forget to stop the
> profiler.

Stopping the profiler doesn't actually kill the leak either.
(Reporter)

Comment 6

3 years ago
(In reply to Rob Campbell [:rc] (:robcee) from comment #3)
> 
> I don't think we explicitly state anywhere that the profiler must have a
> matching StopProfile for every StartProfile call.

That would be extremely incorrect. If there are two clients (e.g. "toolboxes"), and one of them stops the profiler, the other one loses all the data. Anyway, that's a completely different discussion.
(Reporter)

Comment 7

3 years ago
So this still leaks:

function test() {
  let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);

  let ENTRIES = 1000000;
  let INTERVAL = 1;
  let FEATURES = ["js"];
  profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length);

  console.profile();

  profiler.StopProfiler();
}
(Reporter)

Comment 8

3 years ago
Try run, with memleaks and use-after-free: https://tbpl.mozilla.org/?tree=Try&rev=78bb6f3cc246
(Reporter)

Comment 9

3 years ago
There is also a crash in there [@ libsystem_kernel.dylib + 0x12386], but it's possibly a different bug.
(Reporter)

Comment 10

3 years ago
This is really weird. As opposed to comment 7, this *sometimes* doesn't leak:

function test() {
  let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);

  let ENTRIES = 1000000;
  let INTERVAL = 1;
  let FEATURES = ["js"];
  profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length);

  console.profile();
}

registerCleanupFunction(() => {
  let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);
  profiler.StopProfiler();
});

The only change is that the profiler is stopped later.
What about calling console.profileEnd() in addition to StopProfiler()?

Again, it still shouldn't be leaking, but it would be nice to narrow it down some more too.
(Reporter)

Comment 12

3 years ago
Upon further investigation, calling console.profile() isn't absolutely necessary in order to get those leaks. Doing so, however, guarantees a permaleak.
(Reporter)

Updated

3 years ago
Summary: The profile module leaks when invoking console.profile() on debug builds → The profile module leaks on debug builds
(Reporter)

Updated

3 years ago
Duplicate of this bug: 963193
(Reporter)

Updated

3 years ago
Blocks: 973974
(Reporter)

Updated

3 years ago
Blocks: 1049818
Severity: normal → blocker
(Assignee)

Comment 14

3 years ago
I tried this today.  I was able to see leaks using the original test case.
The only leaks I could see (after trying many times) for the tests in
comment #7 and comment #10 were intermittent leaks like:

TEST-INFO | leakcheck | default process: leaked 18446744073709551615 WeakReference<MessageListener> (0 bytes)
TEST-UNEXPECTED-FAIL | leakcheck | default process: 16 bytes leaked (WeakReference<MessageListener>)
(Assignee)

Comment 15

3 years ago
The leaks I see from the original test are:

TEST-INFO | leakcheck | default process: leaked 4 Mutex (128 bytes)
TEST-INFO | leakcheck | default process: leaked 3 ProfilerBacktrace (24 bytes)
TEST-INFO | leakcheck | default process: leaked 1 ReentrantMonitor (40 bytes)
TEST-INFO | leakcheck | default process: leaked 3 SyncProfile (504 bytes)
TEST-INFO | leakcheck | default process: leaked 3 ThreadProfile (456 bytes)
TEST-INFO | leakcheck | default process: leaked 3 ThreadResponsiveness (72 bytes)
TEST-INFO | leakcheck | default process: leaked 1 nsTArray_base (8 bytes)
TEST-INFO | leakcheck | default process: leaked 1 nsThread (240 bytes)
TEST-UNEXPECTED-FAIL | leakcheck | default process: 1472 bytes leaked (Mutex, ProfilerBacktrace, ReentrantMonitor, SyncProfile, ThreadProfile, ...)


The ProfilerBacktrace, SyncProfile, ThreadProfile, and ThreadResponsiveness leaks are due to
bug 980109.  I have a patch for this part, and I'll upload it -- but
note that bug 980109 was wontfix'd, so it's unclear whether this will fly.

I am not sure if there is a way to tell the test harness that some leaks are
expected.  If there is, that would be another way to handle the problem (similar
to how bug 980109 was dealt with).

I'm still investigating the other leaks.
(Assignee)

Comment 16

3 years ago
Created attachment 8508806 [details] [diff] [review]
delete the pseudostack in mozilla_sampler_shutdown

Here's the patch that fixes a subset of the leaks.  It arranges to
delete the pseudostack object when shutting down the profiler.

I have not run this through a try build yet.

Note that this introduces a cost to mozilla_sampler_call_exit.
Specifically, see the comment that was removed from
mozilla_sampler_call_enter.  I don't know how important this is.
(Reporter)

Comment 17

3 years ago
The profiler is rarely (or theoretically should never have to be) shut down. Usually it's kept active and consumers simply fetch the entire circular buffer, splicing out the interesting data. Therefore, I'd say that the additional cost to mozilla_sampler_call_exit is not important.
(Assignee)

Updated

3 years ago
Assignee: nobody → ttromey
(Assignee)

Comment 18

3 years ago
Instrumenting a few more constructors (PseudoStack, Sampler, ThreadInfo, etc)
shows a few more leaks as well.

I think the nsThread leak comes from TableTicker.cpp:NewSyncProfile.
This allocates a ThreadInfo, which holds a reference to nsThread; but
nothing ever frees this ThreadInfo.

I'm still debugging the fix.
(Assignee)

Comment 19

3 years ago
Created attachment 8510621 [details] [diff] [review]
clean up memory leaks, add ctor marking, and fix a valgrind error

Here's the patch that I'm testing.

This includes the earlier patch, plus an update to remove an abort
from ~PseudoStack.

I added a few MOZ_COUNT_CTOR markers here and there.  This revealed
that we were leaking more memory than previously believed, but didn't
reveal any new sources of leaks.

The nsThread was kept alive by the ThreadInfo object as mentioned in
comment #18.  This patches fixes this by deleting the object in the
SyncProfile destructor, SyncProfile objects are only created with a
fresh ThreadInfo.  This seems somewhat unclean to me; but a better fix
would probably be more invasive.

The mozilla_sampler_stop change is just to avoid initializing the
profiler during shutdown.  I don't think this is leaky, it just looked
weird.

The TickSample change is to make valgrind not complain about using
uninitialized values.  I think this is perhaps questionable (it may be
papering over unintended behavior) and should be pushed off into
another bug; but it's handy to make the output a bit quieter while
working on this patch.
Attachment #8508806 - Attachment is obsolete: true
(Assignee)

Comment 20

3 years ago
(In reply to Victor Porof [:vporof][:vp] from comment #17)
> The profiler is rarely (or theoretically should never have to be) shut down.
> Usually it's kept active and consumers simply fetch the entire circular
> buffer, splicing out the interesting data. Therefore, I'd say that the
> additional cost to mozilla_sampler_call_exit is not important.

mozilla_sampler_call_exit is used to mark the exit from a stack frame.
Normally I wouldn't worry about it but I deleted two comments about how
it's written a certain way to avoid a TLS lookup :)
I'm not sure the best way to test this to see whether it truly matters.
(Assignee)

Comment 21

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=6bccf78e6950
(Assignee)

Comment 22

3 years ago
I filed bug 1088731 for the undefined behavior.
(Assignee)

Comment 23

3 years ago
Created attachment 8511308 [details] [diff] [review]
clean up memory leaks and add ctor marking
Attachment #8510621 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8511308 - Flags: review?(bgirard)
Comment on attachment 8511308 [details] [diff] [review]
clean up memory leaks and add ctor marking

Review of attachment 8511308 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/GeckoProfilerImpl.h
@@ -446,5 @@
> -  // but for now it is simply use to save a call to
> -  // pthread_getspecific on exit. It also supports the
> -  // case where the sampler is initialized between
> -  // enter and exit.
> -  return stack;

This is important. Why is this gone? Also I'm not sure how this built because the types don't match.

@@ +452,5 @@
>  
> +  mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle;
> +  PseudoStack *stack = stackTls->get();
> +  if (stack) {
> +    stack->pop();

We're adding a TLS lookup and a branch in very hot code. This was an intentional design decision. This needs a justification to undo this design decission.

::: tools/profiler/PseudoStack.h
@@ -298,3 @@
>  
>    ~PseudoStack() {
> -    if (mStackPointer != 0) {

This is important. Why is this gone?
If the bug is that we're leaking a marker or something of that sort then we should focus on fixing that and leave the pseudostack leak alone. It's in the ignore list and not costing us much.
(Assignee)

Comment 26

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #24)
> Comment on attachment 8511308 [details] [diff] [review]
> clean up memory leaks and add ctor marking
> 
> Review of attachment 8511308 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: tools/profiler/GeckoProfilerImpl.h
> @@ -446,5 @@
> > -  // but for now it is simply use to save a call to
> > -  // pthread_getspecific on exit. It also supports the
> > -  // case where the sampler is initialized between
> > -  // enter and exit.
> > -  return stack;
> 
> This is important. Why is this gone? Also I'm not sure how this built
> because the types don't match.

I don't understand.  It isn't gone, it's been replaced with

+  return &tlsPseudoStack;

The apparent type mismatch isn't really, as the function returns void*.
The earlier return always returns null, so it is also safe.
But, it is confusing as-is and I changed it locally to explicitly
return nullptr.

> @@ +452,5 @@
> >  
> > +  mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle;
> > +  PseudoStack *stack = stackTls->get();
> > +  if (stack) {
> > +    stack->pop();
> 
> We're adding a TLS lookup and a branch in very hot code. This was an
> intentional design decision. This needs a justification to undo this design
> decission.

I think the justification is just what is given in the bug though -- to
fix the memory leaks so that the tests can be re-enabled (at least, after the other
test breakage is fixed, see bug 1084902).

Is there an existing performance test for this code?  I am curious how to find out
how much this costs.

Perhaps some other approach here is possible.  One idea that comes
to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack.

> ::: tools/profiler/PseudoStack.h
> @@ -298,3 @@
> >  
> >    ~PseudoStack() {
> > -    if (mStackPointer != 0) {
> 
> This is important. Why is this gone?

mozilla_sampler_shutdown now deletes the PseudoStack.  However, at this point
we may be between an enter and a leave, so the stack may not be empty.  Given
this change, the abort is no longer a correct sanity check -- it's now ok for
this situation to occur.
(Reporter)

Comment 27

3 years ago
At the moment, this bug is the main culprit for having more than half of the profiler tests disabled. Just take a look at [0] to see the impact. In the meantime, these disabled tests have started failing because of different reasons (e.g. bug 1084902), but this could've been prevented if the tests weren't disabled because of leaks in the first place.

With the upcoming work on the new Performance++ panel (meta bug 1075567), we're writing even more tests that are going to use the profiler module, as well as transferring a few of the existing ones over to the new tool -- we need those tests to pass, otherwise progress is hindered. As pointed out in the examples (e.g. comment 0, comment 7, comment 12), simply starting and stopping the profiler module results in an (albeit intermittent) leak; other activity in the meantime (like `console.profile()` and `console.profileEnd()`) is not entirely necessary, but guarantees leaks.

I might be missing out on some details and implementation concerns, but this is obviously flawed to me. However, I would not object to having those leaks on an "ignore list", as long it makes the test harness consider them passing and helps us re-enable the profiler tests.

[0] http://dxr.mozilla.org/mozilla-central/source/browser/devtools/profiler/test/browser.ini
(In reply to Victor Porof [:vporof][:vp] from comment #27)
> At the moment, this bug is the main culprit for having more than half of the
> profiler tests disabled. Just take a look at [0] to see the impact. In the
> meantime, these disabled tests have started failing because of different
> reasons (e.g. bug 1084902), but this could've been prevented if the tests
> weren't disabled because of leaks in the first place.

I don't dispute that this isn't important. I just don't think it's the right fix.(

In reply to Tom Tromey :tromey from comment #26)
> The apparent type mismatch isn't really, as the function returns void*.
> The earlier return always returns null, so it is also safe.
> But, it is confusing as-is and I changed it locally to explicitly
> return nullptr.
> 

Ahh right.

> > @@ +452,5 @@
> > >  
> > > +  mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle;
> > > +  PseudoStack *stack = stackTls->get();
> > > +  if (stack) {
> > > +    stack->pop();
> > 
> > We're adding a TLS lookup and a branch in very hot code. This was an
> > intentional design decision. This needs a justification to undo this design
> > decission.
> 
> I think the justification is just what is given in the bug though -- to
> fix the memory leaks so that the tests can be re-enabled (at least, after
> the other
> test breakage is fixed, see bug 1084902).

It would be nice to not leak the pseudostack on shutdown but the goal of the profiler is to make pseudolabel push/pop as cheap as possible so that people wouldn't think twice about adding them in. Adding any overhead to that is going to get a lot of push back, from myself atleast, unless we have a good reason for it.

I don't believe fixing this is important to fixing the test leaks. We're leaking memory when we start/stop the profiler on a test and this is fixing the one time leak for the pseudostack.

Do you know why we have to fix this (init/shutdown) for start/stop to not leak in a test setting?

> 
> Is there an existing performance test for this code?  I am curious how to
> find out
> how much this costs.

No there is not. But we've looked at the generated code at the callsite for functions where we added this. We haven't touched this code since.

> 
> Perhaps some other approach here is possible.  One idea that comes
> to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack.
> 

That would still make it more expensive, however not as much as the current patch. If I'm wrong and we do need to fix this then that's probably the way we want to go.
(Assignee)

Comment 29

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #28)


> > Perhaps some other approach here is possible.  One idea that comes
> > to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack.

> That would still make it more expensive, however not as much as the current
> patch. If I'm wrong and we do need to fix this then that's probably the way
> we want to go.

I'm looking into some options here.  One idea I've been trying out is
having the PseudoStack have a deferred-destruction mode to support this case.
It's basically the reference-counting idea, just reusing the stack depth
as a kind of reference count.

However, there's still more to figure out first; since if I re-enable the
tests and hack them to work (bug 1084902), I see leaks, even with my
patch applied.
(Assignee)

Comment 30

3 years ago
Created attachment 8514570 [details] [diff] [review]
clean up memory leaks and add ctor marking

Here's a new version of the patch.  It fixes several different leaks.
I can split this into a patch series if you would prefer that.

This version takes a slightly different approach to deleting the
PseudoStack.  It now treats mStackPointer as a kind of reference count
-- when it is zero, deletion happens immediately, but when it is
non-zero, deletion is deferred until the next pop.

PseudoStack::pop is still quite cheap, I believe.  It's certainly
cheaper than the corresponding approach in the previous version of the
patch.

~PseudoStack is now private to ensure that all callers take the
correct approach to destruction.

The earlier patch neglected to update mozilla_sampler_print_location1.
This deletion is now done in ~SyncProfile.

PseudoStack::ForgetStack is no longer needed and thus removed.

One newly discovered leak is in profiler_tracing.  This takes
ownership of the ProfilerBacktrace and so must delete it when
profiling is disabled.

Another "new" leak is that mozilla_sampler_unregister_thread was
checking sInitCount -- but this meant that if the profiler was shut
down before a thread was unregistered, then we would never destroy
that thread's PseudoStack.  Changing the test to check
stack_key_initialized fixes this efficiently.

I will push this to "try" soon.  I'll include my bug 1084902 hack when
I do this so that all the relevant tests are run.
Attachment #8511308 - Attachment is obsolete: true
Attachment #8511308 - Flags: review?(bgirard)
(Assignee)

Comment 31

3 years ago
The "try" push showed some apparent leaks of PseudoStacks coming
from plugin processes:

 TEST-UNEXPECTED-FAIL | leakcheck | plugin process: 24680 bytes leaked (PseudoStack) 

I debugged this and I think that adding MOZ_COUNT_CTOR to PseudoStack
isn't as easy as it may seem.

In XRE_InitChildProcess, the profiler is started and PROFILER_LABEL is used --
before NS_LogInit is called.  And, with the approach to PseudoStack destruction
taken in this patch, the final PseudoStack is not deleted until after
NS_LogTerm is called.  However, NS_LogTerm shuts down leak processing, leading
to an apparent, but not actual, leak.

It might be possible to solve this problem by invoking NS_LogInit before
initializing the profiler.  I will try this and see if anything breaks.
(Assignee)

Comment 32

3 years ago
(In reply to Tom Tromey :tromey from comment #31)

> It might be possible to solve this problem by invoking NS_LogInit before
> initializing the profiler.  I will try this and see if anything breaks.

This seems to have worked, but I think it is better to push the
MOZ_COUNT_* stuff and the initialization reordering into a different bug.
Making this change didn't reveal any new leaks.
(Assignee)

Comment 33

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=2d82d889a2a0
(Assignee)

Comment 34

3 years ago
Created attachment 8517579 [details] [diff] [review]
clean up memory leaks

Here's the updated version of the patch.  The try build is this patch
plus a hack to make the tests run.

This version omits the MOZ_COUNT_* additions.  As mentioned earlier I
think these are better put in a separate patch, as they involve
changes to the initialization code.

Walking through the patch a bit:

* profiler_tracing must take ownership of its argument even when
  profiling is inactive.

* PseudoStack is now responsible for clearing tlsPseudoStack.
  ~PseudoStack is private to ensure all users use the new "destroy"
  method.  "destroy" reuses the stack depth as a simple reference
  count; this ensures that the PseudoStack cannot be destroyed between
  an enter and its corresponding exit.

* SyncProfile owns its ThreadInfo, so ~SyncProfile destroys it.

* ThreadInfo never owns a PseudoStack any more, so the destructor is
  updated.  Consequently ForgetStack is not needed, either.

* There's no need for mozilla_sampler_stop to initialize the profiler.

* mozilla_sampler_unregister_thread must check stack_key_initialized,
  not sInitCount, because it may be called after the profiler is shut
  down.
(Assignee)

Updated

3 years ago
Attachment #8514570 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8517579 - Flags: review?(bgirard)
Comment on attachment 8517579 [details] [diff] [review]
clean up memory leaks

Review of attachment 8517579 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks fine. I'm pretty confident that the things you're fixing here is a -lot- more then the simple leak we add during profiler start/stop. But at the same time it's fixing a bunch of good stuff so thanks for that!

r- because this patches causes 2 regressions, I'll happily take it after that.

::: tools/profiler/GeckoProfilerImpl.h
@@ +250,4 @@
>                                      TracingMetadata aMetaData = TRACING_DEFAULT)
>  {
>    if (!stack_key_initialized)
>      return;

Add delete aCuse here too.

::: tools/profiler/platform.cpp
@@ +932,5 @@
>  
>  void mozilla_sampler_unregister_thread()
>  {
> +  // Don't check sInitCount count here -- we may be unregistering the
> +  // thread after the sampler was shut down.

This will regress the case where you do

...
register_thread
...
register_thread
...
unregister_thread (profiler is shutdown now)
...
unregister_thread (profiler should be shutdown then)

@@ +941,5 @@
>    PseudoStack *stack = tlsPseudoStack.get();
>    if (!stack) {
>      return;
>    }
> +  stack->destroy();

This will regress bug 1054110.

If we're profiling a thread and it goes away (the web worker is done for instance) we can still care about what it was doing and why it took so long to complete even if the thread is no longer live. Same for shutdown where we want to see whats running on shutdown even if at the end you're left with just the main thread to save the data you want to know what the, now dead, compositor thread did.
Attachment #8517579 - Flags: review?(bgirard) → review-
(Assignee)

Comment 36

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #35)

> >                                      TracingMetadata aMetaData = TRACING_DEFAULT)
> >  {
> >    if (!stack_key_initialized)
> >      return;
> 
> Add delete aCuse here too.

Funny, I got one case and missed the other.  Bah.
I've fixed this.

> ::: tools/profiler/platform.cpp
> @@ +932,5 @@
> >  
> >  void mozilla_sampler_unregister_thread()
> >  {
> > +  // Don't check sInitCount count here -- we may be unregistering the
> > +  // thread after the sampler was shut down.
> 
> This will regress the case where you do
> 
> ...
> register_thread
> ...
> register_thread
> ...
> unregister_thread (profiler is shutdown now)
> ...
> unregister_thread (profiler should be shutdown then)

I don't understand this one.  Would you mind explaining a bit more?
What is it that might go wrong?

> @@ +941,5 @@
> >    PseudoStack *stack = tlsPseudoStack.get();
> >    if (!stack) {
> >      return;
> >    }
> > +  stack->destroy();
> 
> This will regress bug 1054110.

I've updated the patch to fix this.
(Assignee)

Updated

3 years ago
Flags: needinfo?(bgirard)
Without the patch we support having multiple calls to register_thread. If we call register_thread 5 times let's say, when we see the first call to unregister_thread we don't drop the thread yet, we wait until all the 5 calls have been match with unregister_thread. I don't think your patch preserves this behavior.
Flags: needinfo?(bgirard) → needinfo?(ttromey)
(Assignee)

Comment 38

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #37)
> Without the patch we support having multiple calls to register_thread. If we
> call register_thread 5 times let's say, when we see the first call to
> unregister_thread we don't drop the thread yet, we wait until all the 5
> calls have been match with unregister_thread. I don't think your patch
> preserves this behavior.

Ok, thanks.  I understand now.  I'll make sure this works.
Flags: needinfo?(ttromey)
(Assignee)

Comment 39

3 years ago
 (In reply to Tom Tromey :tromey from comment #38)
> (In reply to Benoit Girard (:BenWa) from comment #37)
> > Without the patch we support having multiple calls to register_thread. If we
> > call register_thread 5 times let's say, when we see the first call to
> > unregister_thread we don't drop the thread yet, we wait until all the 5
> > calls have been match with unregister_thread. I don't think your patch
> > preserves this behavior.
> 
> Ok, thanks.  I understand now.  I'll make sure this works.

After looking more closely, I don't actually understand how this works in
the current code.  It seems to me that mozilla_sampler_register_thread
and mozilla_sampler_unregister_thread don't take any account of whether
calls have been nested.  And, Sampler::RegisterCurrentThread asserts
in this situation:

  int id = gettid();
  for (uint32_t i = 0; i < sRegisteredThreads->size(); i++) {
    ThreadInfo* info = sRegisteredThreads->at(i);
    if (info->ThreadId() == id && !info->IsPendingDelete()) {
      // Thread already registered. This means the first unregister will be
      // too early.
      ASSERT(false);
      return false;
    }
  }

I have observed threads that are registered, unregistered, and then
re-registered, but that is a different scenario.

I haven't yet tried to see this assertion in action though.
(Assignee)

Comment 40

3 years ago
(In reply to Tom Tromey :tromey from comment #39)

> I haven't yet tried to see this assertion in action though.

I hacked duplicate register/deregister calls into WorkerThreadPrimaryRunnable::Run
and it definitely hits that assert.

I'm testing my current patch.
https://tbpl.mozilla.org/?tree=Try&rev=85bffb3a7755
(The urls wind up in bug 1067547 since I'm testing a branch that has both
fixes, plus the re-enable-the-tests patch)

I'll post the patch momentarily.
(Assignee)

Comment 41

3 years ago
Created attachment 8524894 [details] [diff] [review]
clean up memory leaks

This revision preserves the ability to get a profile from a thread in
the case where profiling is enabled while a thread exits.

This required some changes from earlier revisions of the patch.  In
particular, now a ThreadInfo may own a reference to the PseudoStack.
In this patch the way this is done is:

* Introduce a StackOwningThreadInfo subclass, which claims a reference
  to a PseudoStack.  This is needed to avoid reference cycles from
  ThreadInfo objects pushed onto the PseudoStack.

* A PseudoStack can have only two owning references -- one via
  tlsPseudoStack and one via a StackOwningThreadInfo.

* However, as before, a PseudoStack may also outlive the
  tlsPseudoStack reference, in the case where profiling is stopped
  while a SamplerStackFrameRAII is active.

This is all managed by storing 3 reference bits and then accessing
them atomically.  This avoids the need for an explicit mutex, at the
cost of some complexity.

I think the new comment in PseudoStack explains the situation as well.

As mentioned earlier, I couldn't find a scenario where
register/unregister calls could be nested.  So, I did not change
anything here.
Attachment #8517579 - Attachment is obsolete: true
Blocks: 980109
(Assignee)

Comment 42

3 years ago
Created attachment 8525140 [details] [diff] [review]
clean up memory leaks

I attached the wrong patch to the bug.  It was missing a crucial line
in PseudoStack::ref.

I did manage to send the correct patch through "try".

If you look, though, you'll see the results look fairly messy.  That's
because I re-enabled all the profiler tests, and there are still some
bugs in the tests themselves.
Attachment #8524894 - Attachment is obsolete: true
(Assignee)

Comment 43

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=9d6b29dee5a4
(Assignee)

Updated

3 years ago
Attachment #8525140 - Flags: review?(bgirard)
Comment on attachment 8525140 [details] [diff] [review]
clean up memory leaks

Review of attachment 8525140 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/PseudoStack.h
@@ +378,5 @@
>      mStackPointer--;
> +    if (mStackPointer == 0) {
> +      int newValue = (mRefBits &= ~REF_DYING);
> +      if (newValue == 0) {
> +        delete this;

This will break if this class is allocated on the stack. You'll need to prevent stack allocation:
http://stackoverflow.com/questions/124880/is-it-possible-to-prevent-stack-allocation-of-an-object-and-only-allow-it-to-be

Maybe this should be renamed to popAndMaybeDelete. This means that every user has to watch out that after every pop the object may be deleted. You may want to return a bool.

@@ +470,5 @@
> +  // consistency while avoiding leaks.
> +  mozilla::Atomic<int> mRefBits;
> +
> +  // Possible values in mRefBits.
> +  enum {

Can't this simply be an atomic int that keeps track of how many people own this object? The stack can keep a reference to itself. When the stack pushes the first element is AddRef and it Release on the last element being poped.

@@ +477,5 @@
> +    // A reference is owned by tlsPseudoStack.
> +    REF_TLS_PSEUDOSTACK = 2,
> +    // A reference from tlsPseudoStack was removed, but the stack was
> +    // active, and so is in its "dying" mode.
> +    REF_DYING = 4

Maybe this should be an independent Atomic<bool>?

@@ +529,5 @@
> +    MOZ_ASSERT((mRefBits & REF_DYING) == 0);
> +    // The order here matters -- we must atomically set the "dying"
> +    // bit to avoid a deletion race with derefThread.
> +    if (mStackPointer > 0) {
> +      mRefBits |= REF_DYING;

AFAIK this operation isn't atomic. I think you might need to use the intrinsic:
http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h
I'm not familiar with all the gotchas of Atomic so we may want to get some help with this piece.

This code feels very difficult to prove that it's correct unfortunately. It would be better if we can think of something simpler.
(Assignee)

Comment 45

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #44)

> This will break if this class is allocated on the stack. You'll need to
> prevent stack allocation:

Will do.

> > +  // consistency while avoiding leaks.
> > +  mozilla::Atomic<int> mRefBits;
> > +
> > +  // Possible values in mRefBits.
> > +  enum {
> 
> Can't this simply be an atomic int that keeps track of how many people own
> this object? The stack can keep a reference to itself. When the stack pushes
> the first element is AddRef and it Release on the last element being poped.

Sure, I can do that.  I chose the current approach to avoid putting more work
into mozilla_sampler_call_enter, since we'd previously discussed how that
path is hot.  Making this into a more ordinary reference count will simplify
things though.

> @@ +477,5 @@
> > +    // A reference is owned by tlsPseudoStack.
> > +    REF_TLS_PSEUDOSTACK = 2,
> > +    // A reference from tlsPseudoStack was removed, but the stack was
> > +    // active, and so is in its "dying" mode.
> > +    REF_DYING = 4
> 
> Maybe this should be an independent Atomic<bool>?

It's moot now since I'll make the reference counting change; but I think
the answer is, no, it can't be independent, as then there is no way to 
guarantee atomicity.  The bad case is if the TLS reference is dropped
(setting the dying bit), then the ThreadInfo release can see a 0 reference
count, and it has no way to atomically check both the dying bit and
mRefBits at the same time.


> > +      mRefBits |= REF_DYING;

> AFAIK this operation isn't atomic. I think you might need to use the
> intrinsic:
> http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h
> I'm not familiar with all the gotchas of Atomic so we may want to get some
> help with this piece.

Yeah, it is atomic -- the various OP= operators are defined around here:
http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#1052
They use the intrinsics.  They're defined a bit funnily since the
intrinsics return the previous value but the operators return the
resulting value; but in our case that's what we want.

> This code feels very difficult to prove that it's correct unfortunately. It
> would be better if we can think of something simpler.

Yeah, I totally agree.  After working on it a bit I think I finally have
a decent idea of the lifetime issues, at least for the subset of objects
I've been dealing with.  So maybe we can find a good spot for some comments
to record the situation.
(Assignee)

Comment 46

3 years ago
(In reply to Benoit Girard (:BenWa) from comment #44)

> Maybe this should be renamed to popAndMaybeDelete. This means that every
> user has to watch out that after every pop the object may be deleted. You
> may want to return a bool.

I did this, but I believe the only caller is mozilla_sampler_call_exit,
and there's nothing to do there.
(Assignee)

Comment 47

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=d8bf0becf375
(Assignee)

Comment 48

3 years ago
Created attachment 8528492 [details] [diff] [review]
clean up memory leaks

I think this version addresses all the review comments.
Attachment #8525140 - Attachment is obsolete: true
Attachment #8525140 - Flags: review?(bgirard)
(Assignee)

Updated

3 years ago
Attachment #8528492 - Flags: review?(bgirard)
Comment on attachment 8528492 [details] [diff] [review]
clean up memory leaks

Review of attachment 8528492 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review. It looks good!
Attachment #8528492 - Flags: review?(bgirard) → review+
(Assignee)

Comment 50

3 years ago
Thanks!
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/53adf5b021fa
Keywords: checkin-needed
sorry had to back this out for memory leaks like https://treeherder.mozilla.org/logviewer.html#?job_id=4547839&repo=mozilla-inbound
Flags: needinfo?(ttromey)
(Assignee)

Comment 53

3 years ago
(In reply to Carsten Book [:Tomcat] from comment #52)
> sorry had to back this out for memory leaks like
> https://treeherder.mozilla.org/logviewer.html#?job_id=4547839&repo=mozilla-
> inbound

No problem.  It seems I'll probably have to roll in the fix for bug 1067547 after all.
Well, or back out the MOZ_COUNT_*TOR bits, though that seems like cheating.
Flags: needinfo?(ttromey)
(Assignee)

Comment 54

3 years ago
Created attachment 8535076 [details] [diff] [review]
clean up memory leaks

I looked into this.

The "final" patch had a MOZ_COUNT_DTOR marker left in it, probably due
to a rebase error on my part.  This caused the apparent "leak" -- but
note first that the leak reports a huge number due to underflow; and
second that there isn't actually a leak at all, but rather bad
placement of the leak detector calls vis a vis profiler initialization
and shutdown.

Despite what I said about cheating in comment #53, my intention here
was to push these markers into the fix for bug 1067547; see comment

So, I am attaching this patch and pushing it to try.  I plan to carry
over the r+ should that work out ok.
Attachment #8528492 - Attachment is obsolete: true
(Assignee)

Comment 55

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=8edf0753943d
(Assignee)

Comment 56

3 years ago
This version looks ok.  Once it is in I will deal with the follow-up patch.
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b0599030a5f
Keywords: checkin-needed
No longer blocks: 980109
Duplicate of this bug: 980109
https://hg.mozilla.org/mozilla-central/rev/6b0599030a5f
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Keywords: regressionwindow-wanted
You need to log in before you can comment on or make changes to this bug.