Closed Bug 1047124 Opened 11 years ago Closed 11 years ago

The profile module leaks on debug builds

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla37

People

(Reporter: vporof, Assigned: tromey)

References

Details

(Keywords: regression)

Attachments

(1 file, 8 obsolete files)

Run this as a browser mochitest: function test() { let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler); let ENTRIES = 1000000; let INTERVAL = 1; let FEATURES = ["js"]; profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length); console.profile(); finish(); } Leaks: Mutex ReentrantMonitor SyncProfile ThreadProfile ThreadResponsiveness nsTArray_base nsThread Full log: https://pastebin.mozilla.org/5738770 This is badly blocking bug 879008!
Blocks: 879008
when we find the source of this, we should get this included in our testsuites.
Flags: in-testsuite?
Keywords: regression
Does finish() here stop the profiler? If not it's a bug in the test. ... But in any case we shouldn't be leaking even if we forget to stop the profiler.
Spoke with Benwa to see if he knew of any recent changes that might have triggered this. He did mention that the test doesn't explicitly stop the profiler which is technically an error in the test, though I think this is still a new failure. Nothing preventing a user from running console.profile() and leaving that running. I don't think we explicitly state anywhere that the profiler must have a matching StopProfile for every StartProfile call.
(In reply to Benoit Girard (:BenWa) from comment #2) > Does finish() here stop the profiler? If not it's a bug in the test. > > ... But in any case we shouldn't be leaking even if we forget to stop the > profiler. no, I don't think so. Individual tests *could* register a cleanup function for every invocation of startProfiler but like I said, the profiler itself should be resilient enough to not leak if a dumb user left it on.
(In reply to Benoit Girard (:BenWa) from comment #2) > Does finish() here stop the profiler? If not it's a bug in the test. > No. It's an artifact of using waitForExplicitFinish in my head.js You can safely ignore that. > ... But in any case we shouldn't be leaking even if we forget to stop the > profiler. Stopping the profiler doesn't actually kill the leak either.
(In reply to Rob Campbell [:rc] (:robcee) from comment #3) > > I don't think we explicitly state anywhere that the profiler must have a > matching StopProfile for every StartProfile call. That would be extremely incorrect. If there are two clients (e.g. "toolboxes"), and one of them stops the profiler, the other one loses all the data. Anyway, that's a completely different discussion.
So this still leaks: function test() { let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler); let ENTRIES = 1000000; let INTERVAL = 1; let FEATURES = ["js"]; profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length); console.profile(); profiler.StopProfiler(); }
Try run, with memleaks and use-after-free: https://tbpl.mozilla.org/?tree=Try&rev=78bb6f3cc246
There is also a crash in there [@ libsystem_kernel.dylib + 0x12386], but it's possibly a different bug.
This is really weird. As opposed to comment 7, this *sometimes* doesn't leak: function test() { let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler); let ENTRIES = 1000000; let INTERVAL = 1; let FEATURES = ["js"]; profiler.StartProfiler(ENTRIES, INTERVAL, FEATURES, FEATURES.length); console.profile(); } registerCleanupFunction(() => { let profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler); profiler.StopProfiler(); }); The only change is that the profiler is stopped later.
What about calling console.profileEnd() in addition to StopProfiler()? Again, it still shouldn't be leaking, but it would be nice to narrow it down some more too.
Upon further investigation, calling console.profile() isn't absolutely necessary in order to get those leaks. Doing so, however, guarantees a permaleak.
Summary: The profile module leaks when invoking console.profile() on debug builds → The profile module leaks on debug builds
Blocks: 973974
Severity: normal → blocker
I tried this today. I was able to see leaks using the original test case. The only leaks I could see (after trying many times) for the tests in comment #7 and comment #10 were intermittent leaks like: TEST-INFO | leakcheck | default process: leaked 18446744073709551615 WeakReference<MessageListener> (0 bytes) TEST-UNEXPECTED-FAIL | leakcheck | default process: 16 bytes leaked (WeakReference<MessageListener>)
The leaks I see from the original test are: TEST-INFO | leakcheck | default process: leaked 4 Mutex (128 bytes) TEST-INFO | leakcheck | default process: leaked 3 ProfilerBacktrace (24 bytes) TEST-INFO | leakcheck | default process: leaked 1 ReentrantMonitor (40 bytes) TEST-INFO | leakcheck | default process: leaked 3 SyncProfile (504 bytes) TEST-INFO | leakcheck | default process: leaked 3 ThreadProfile (456 bytes) TEST-INFO | leakcheck | default process: leaked 3 ThreadResponsiveness (72 bytes) TEST-INFO | leakcheck | default process: leaked 1 nsTArray_base (8 bytes) TEST-INFO | leakcheck | default process: leaked 1 nsThread (240 bytes) TEST-UNEXPECTED-FAIL | leakcheck | default process: 1472 bytes leaked (Mutex, ProfilerBacktrace, ReentrantMonitor, SyncProfile, ThreadProfile, ...) The ProfilerBacktrace, SyncProfile, ThreadProfile, and ThreadResponsiveness leaks are due to bug 980109. I have a patch for this part, and I'll upload it -- but note that bug 980109 was wontfix'd, so it's unclear whether this will fly. I am not sure if there is a way to tell the test harness that some leaks are expected. If there is, that would be another way to handle the problem (similar to how bug 980109 was dealt with). I'm still investigating the other leaks.
Here's the patch that fixes a subset of the leaks. It arranges to delete the pseudostack object when shutting down the profiler. I have not run this through a try build yet. Note that this introduces a cost to mozilla_sampler_call_exit. Specifically, see the comment that was removed from mozilla_sampler_call_enter. I don't know how important this is.
The profiler is rarely (or theoretically should never have to be) shut down. Usually it's kept active and consumers simply fetch the entire circular buffer, splicing out the interesting data. Therefore, I'd say that the additional cost to mozilla_sampler_call_exit is not important.
Assignee: nobody → ttromey
Instrumenting a few more constructors (PseudoStack, Sampler, ThreadInfo, etc) shows a few more leaks as well. I think the nsThread leak comes from TableTicker.cpp:NewSyncProfile. This allocates a ThreadInfo, which holds a reference to nsThread; but nothing ever frees this ThreadInfo. I'm still debugging the fix.
Here's the patch that I'm testing. This includes the earlier patch, plus an update to remove an abort from ~PseudoStack. I added a few MOZ_COUNT_CTOR markers here and there. This revealed that we were leaking more memory than previously believed, but didn't reveal any new sources of leaks. The nsThread was kept alive by the ThreadInfo object as mentioned in comment #18. This patches fixes this by deleting the object in the SyncProfile destructor, SyncProfile objects are only created with a fresh ThreadInfo. This seems somewhat unclean to me; but a better fix would probably be more invasive. The mozilla_sampler_stop change is just to avoid initializing the profiler during shutdown. I don't think this is leaky, it just looked weird. The TickSample change is to make valgrind not complain about using uninitialized values. I think this is perhaps questionable (it may be papering over unintended behavior) and should be pushed off into another bug; but it's handy to make the output a bit quieter while working on this patch.
Attachment #8508806 - Attachment is obsolete: true
(In reply to Victor Porof [:vporof][:vp] from comment #17) > The profiler is rarely (or theoretically should never have to be) shut down. > Usually it's kept active and consumers simply fetch the entire circular > buffer, splicing out the interesting data. Therefore, I'd say that the > additional cost to mozilla_sampler_call_exit is not important. mozilla_sampler_call_exit is used to mark the exit from a stack frame. Normally I wouldn't worry about it but I deleted two comments about how it's written a certain way to avoid a TLS lookup :) I'm not sure the best way to test this to see whether it truly matters.
I filed bug 1088731 for the undefined behavior.
Attachment #8510621 - Attachment is obsolete: true
Attachment #8511308 - Flags: review?(bgirard)
Comment on attachment 8511308 [details] [diff] [review] clean up memory leaks and add ctor marking Review of attachment 8511308 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/GeckoProfilerImpl.h @@ -446,5 @@ > - // but for now it is simply use to save a call to > - // pthread_getspecific on exit. It also supports the > - // case where the sampler is initialized between > - // enter and exit. > - return stack; This is important. Why is this gone? Also I'm not sure how this built because the types don't match. @@ +452,5 @@ > > + mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle; > + PseudoStack *stack = stackTls->get(); > + if (stack) { > + stack->pop(); We're adding a TLS lookup and a branch in very hot code. This was an intentional design decision. This needs a justification to undo this design decission. ::: tools/profiler/PseudoStack.h @@ -298,3 @@ > > ~PseudoStack() { > - if (mStackPointer != 0) { This is important. Why is this gone?
If the bug is that we're leaking a marker or something of that sort then we should focus on fixing that and leave the pseudostack leak alone. It's in the ignore list and not costing us much.
(In reply to Benoit Girard (:BenWa) from comment #24) > Comment on attachment 8511308 [details] [diff] [review] > clean up memory leaks and add ctor marking > > Review of attachment 8511308 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: tools/profiler/GeckoProfilerImpl.h > @@ -446,5 @@ > > - // but for now it is simply use to save a call to > > - // pthread_getspecific on exit. It also supports the > > - // case where the sampler is initialized between > > - // enter and exit. > > - return stack; > > This is important. Why is this gone? Also I'm not sure how this built > because the types don't match. I don't understand. It isn't gone, it's been replaced with + return &tlsPseudoStack; The apparent type mismatch isn't really, as the function returns void*. The earlier return always returns null, so it is also safe. But, it is confusing as-is and I changed it locally to explicitly return nullptr. > @@ +452,5 @@ > > > > + mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle; > > + PseudoStack *stack = stackTls->get(); > > + if (stack) { > > + stack->pop(); > > We're adding a TLS lookup and a branch in very hot code. This was an > intentional design decision. This needs a justification to undo this design > decission. I think the justification is just what is given in the bug though -- to fix the memory leaks so that the tests can be re-enabled (at least, after the other test breakage is fixed, see bug 1084902). Is there an existing performance test for this code? I am curious how to find out how much this costs. Perhaps some other approach here is possible. One idea that comes to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack. > ::: tools/profiler/PseudoStack.h > @@ -298,3 @@ > > > > ~PseudoStack() { > > - if (mStackPointer != 0) { > > This is important. Why is this gone? mozilla_sampler_shutdown now deletes the PseudoStack. However, at this point we may be between an enter and a leave, so the stack may not be empty. Given this change, the abort is no longer a correct sanity check -- it's now ok for this situation to occur.
At the moment, this bug is the main culprit for having more than half of the profiler tests disabled. Just take a look at [0] to see the impact. In the meantime, these disabled tests have started failing because of different reasons (e.g. bug 1084902), but this could've been prevented if the tests weren't disabled because of leaks in the first place. With the upcoming work on the new Performance++ panel (meta bug 1075567), we're writing even more tests that are going to use the profiler module, as well as transferring a few of the existing ones over to the new tool -- we need those tests to pass, otherwise progress is hindered. As pointed out in the examples (e.g. comment 0, comment 7, comment 12), simply starting and stopping the profiler module results in an (albeit intermittent) leak; other activity in the meantime (like `console.profile()` and `console.profileEnd()`) is not entirely necessary, but guarantees leaks. I might be missing out on some details and implementation concerns, but this is obviously flawed to me. However, I would not object to having those leaks on an "ignore list", as long it makes the test harness consider them passing and helps us re-enable the profiler tests. [0] http://dxr.mozilla.org/mozilla-central/source/browser/devtools/profiler/test/browser.ini
(In reply to Victor Porof [:vporof][:vp] from comment #27) > At the moment, this bug is the main culprit for having more than half of the > profiler tests disabled. Just take a look at [0] to see the impact. In the > meantime, these disabled tests have started failing because of different > reasons (e.g. bug 1084902), but this could've been prevented if the tests > weren't disabled because of leaks in the first place. I don't dispute that this isn't important. I just don't think it's the right fix.( In reply to Tom Tromey :tromey from comment #26) > The apparent type mismatch isn't really, as the function returns void*. > The earlier return always returns null, so it is also safe. > But, it is confusing as-is and I changed it locally to explicitly > return nullptr. > Ahh right. > > @@ +452,5 @@ > > > > > > + mozilla::ThreadLocal<PseudoStack *> *stackTls = (mozilla::ThreadLocal<PseudoStack *> *) aHandle; > > > + PseudoStack *stack = stackTls->get(); > > > + if (stack) { > > > + stack->pop(); > > > > We're adding a TLS lookup and a branch in very hot code. This was an > > intentional design decision. This needs a justification to undo this design > > decission. > > I think the justification is just what is given in the bug though -- to > fix the memory leaks so that the tests can be re-enabled (at least, after > the other > test breakage is fixed, see bug 1084902). It would be nice to not leak the pseudostack on shutdown but the goal of the profiler is to make pseudolabel push/pop as cheap as possible so that people wouldn't think twice about adding them in. Adding any overhead to that is going to get a lot of push back, from myself atleast, unless we have a good reason for it. I don't believe fixing this is important to fixing the test leaks. We're leaking memory when we start/stop the profiler on a test and this is fixing the one time leak for the pseudostack. Do you know why we have to fix this (init/shutdown) for start/stop to not leak in a test setting? > > Is there an existing performance test for this code? I am curious how to > find out > how much this costs. No there is not. But we've looked at the generated code at the callsite for functions where we added this. We haven't touched this code since. > > Perhaps some other approach here is possible. One idea that comes > to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack. > That would still make it more expensive, however not as much as the current patch. If I'm wrong and we do need to fix this then that's probably the way we want to go.
(In reply to Benoit Girard (:BenWa) from comment #28) > > Perhaps some other approach here is possible. One idea that comes > > to mind is to have SamplerStackFrameRAII own a reference to the PseudoStack. > That would still make it more expensive, however not as much as the current > patch. If I'm wrong and we do need to fix this then that's probably the way > we want to go. I'm looking into some options here. One idea I've been trying out is having the PseudoStack have a deferred-destruction mode to support this case. It's basically the reference-counting idea, just reusing the stack depth as a kind of reference count. However, there's still more to figure out first; since if I re-enable the tests and hack them to work (bug 1084902), I see leaks, even with my patch applied.
Here's a new version of the patch. It fixes several different leaks. I can split this into a patch series if you would prefer that. This version takes a slightly different approach to deleting the PseudoStack. It now treats mStackPointer as a kind of reference count -- when it is zero, deletion happens immediately, but when it is non-zero, deletion is deferred until the next pop. PseudoStack::pop is still quite cheap, I believe. It's certainly cheaper than the corresponding approach in the previous version of the patch. ~PseudoStack is now private to ensure that all callers take the correct approach to destruction. The earlier patch neglected to update mozilla_sampler_print_location1. This deletion is now done in ~SyncProfile. PseudoStack::ForgetStack is no longer needed and thus removed. One newly discovered leak is in profiler_tracing. This takes ownership of the ProfilerBacktrace and so must delete it when profiling is disabled. Another "new" leak is that mozilla_sampler_unregister_thread was checking sInitCount -- but this meant that if the profiler was shut down before a thread was unregistered, then we would never destroy that thread's PseudoStack. Changing the test to check stack_key_initialized fixes this efficiently. I will push this to "try" soon. I'll include my bug 1084902 hack when I do this so that all the relevant tests are run.
Attachment #8511308 - Attachment is obsolete: true
Attachment #8511308 - Flags: review?(bgirard)
The "try" push showed some apparent leaks of PseudoStacks coming from plugin processes: TEST-UNEXPECTED-FAIL | leakcheck | plugin process: 24680 bytes leaked (PseudoStack) I debugged this and I think that adding MOZ_COUNT_CTOR to PseudoStack isn't as easy as it may seem. In XRE_InitChildProcess, the profiler is started and PROFILER_LABEL is used -- before NS_LogInit is called. And, with the approach to PseudoStack destruction taken in this patch, the final PseudoStack is not deleted until after NS_LogTerm is called. However, NS_LogTerm shuts down leak processing, leading to an apparent, but not actual, leak. It might be possible to solve this problem by invoking NS_LogInit before initializing the profiler. I will try this and see if anything breaks.
(In reply to Tom Tromey :tromey from comment #31) > It might be possible to solve this problem by invoking NS_LogInit before > initializing the profiler. I will try this and see if anything breaks. This seems to have worked, but I think it is better to push the MOZ_COUNT_* stuff and the initialization reordering into a different bug. Making this change didn't reveal any new leaks.
Attached patch clean up memory leaks (obsolete) — Splinter Review
Here's the updated version of the patch. The try build is this patch plus a hack to make the tests run. This version omits the MOZ_COUNT_* additions. As mentioned earlier I think these are better put in a separate patch, as they involve changes to the initialization code. Walking through the patch a bit: * profiler_tracing must take ownership of its argument even when profiling is inactive. * PseudoStack is now responsible for clearing tlsPseudoStack. ~PseudoStack is private to ensure all users use the new "destroy" method. "destroy" reuses the stack depth as a simple reference count; this ensures that the PseudoStack cannot be destroyed between an enter and its corresponding exit. * SyncProfile owns its ThreadInfo, so ~SyncProfile destroys it. * ThreadInfo never owns a PseudoStack any more, so the destructor is updated. Consequently ForgetStack is not needed, either. * There's no need for mozilla_sampler_stop to initialize the profiler. * mozilla_sampler_unregister_thread must check stack_key_initialized, not sInitCount, because it may be called after the profiler is shut down.
Attachment #8514570 - Attachment is obsolete: true
Attachment #8517579 - Flags: review?(bgirard)
Comment on attachment 8517579 [details] [diff] [review] clean up memory leaks Review of attachment 8517579 [details] [diff] [review]: ----------------------------------------------------------------- The patch looks fine. I'm pretty confident that the things you're fixing here is a -lot- more then the simple leak we add during profiler start/stop. But at the same time it's fixing a bunch of good stuff so thanks for that! r- because this patches causes 2 regressions, I'll happily take it after that. ::: tools/profiler/GeckoProfilerImpl.h @@ +250,4 @@ > TracingMetadata aMetaData = TRACING_DEFAULT) > { > if (!stack_key_initialized) > return; Add delete aCuse here too. ::: tools/profiler/platform.cpp @@ +932,5 @@ > > void mozilla_sampler_unregister_thread() > { > + // Don't check sInitCount count here -- we may be unregistering the > + // thread after the sampler was shut down. This will regress the case where you do ... register_thread ... register_thread ... unregister_thread (profiler is shutdown now) ... unregister_thread (profiler should be shutdown then) @@ +941,5 @@ > PseudoStack *stack = tlsPseudoStack.get(); > if (!stack) { > return; > } > + stack->destroy(); This will regress bug 1054110. If we're profiling a thread and it goes away (the web worker is done for instance) we can still care about what it was doing and why it took so long to complete even if the thread is no longer live. Same for shutdown where we want to see whats running on shutdown even if at the end you're left with just the main thread to save the data you want to know what the, now dead, compositor thread did.
Attachment #8517579 - Flags: review?(bgirard) → review-
(In reply to Benoit Girard (:BenWa) from comment #35) > > TracingMetadata aMetaData = TRACING_DEFAULT) > > { > > if (!stack_key_initialized) > > return; > > Add delete aCuse here too. Funny, I got one case and missed the other. Bah. I've fixed this. > ::: tools/profiler/platform.cpp > @@ +932,5 @@ > > > > void mozilla_sampler_unregister_thread() > > { > > + // Don't check sInitCount count here -- we may be unregistering the > > + // thread after the sampler was shut down. > > This will regress the case where you do > > ... > register_thread > ... > register_thread > ... > unregister_thread (profiler is shutdown now) > ... > unregister_thread (profiler should be shutdown then) I don't understand this one. Would you mind explaining a bit more? What is it that might go wrong? > @@ +941,5 @@ > > PseudoStack *stack = tlsPseudoStack.get(); > > if (!stack) { > > return; > > } > > + stack->destroy(); > > This will regress bug 1054110. I've updated the patch to fix this.
Flags: needinfo?(bgirard)
Without the patch we support having multiple calls to register_thread. If we call register_thread 5 times let's say, when we see the first call to unregister_thread we don't drop the thread yet, we wait until all the 5 calls have been match with unregister_thread. I don't think your patch preserves this behavior.
Flags: needinfo?(bgirard) → needinfo?(ttromey)
(In reply to Benoit Girard (:BenWa) from comment #37) > Without the patch we support having multiple calls to register_thread. If we > call register_thread 5 times let's say, when we see the first call to > unregister_thread we don't drop the thread yet, we wait until all the 5 > calls have been match with unregister_thread. I don't think your patch > preserves this behavior. Ok, thanks. I understand now. I'll make sure this works.
Flags: needinfo?(ttromey)
(In reply to Tom Tromey :tromey from comment #38) > (In reply to Benoit Girard (:BenWa) from comment #37) > > Without the patch we support having multiple calls to register_thread. If we > > call register_thread 5 times let's say, when we see the first call to > > unregister_thread we don't drop the thread yet, we wait until all the 5 > > calls have been match with unregister_thread. I don't think your patch > > preserves this behavior. > > Ok, thanks. I understand now. I'll make sure this works. After looking more closely, I don't actually understand how this works in the current code. It seems to me that mozilla_sampler_register_thread and mozilla_sampler_unregister_thread don't take any account of whether calls have been nested. And, Sampler::RegisterCurrentThread asserts in this situation: int id = gettid(); for (uint32_t i = 0; i < sRegisteredThreads->size(); i++) { ThreadInfo* info = sRegisteredThreads->at(i); if (info->ThreadId() == id && !info->IsPendingDelete()) { // Thread already registered. This means the first unregister will be // too early. ASSERT(false); return false; } } I have observed threads that are registered, unregistered, and then re-registered, but that is a different scenario. I haven't yet tried to see this assertion in action though.
(In reply to Tom Tromey :tromey from comment #39) > I haven't yet tried to see this assertion in action though. I hacked duplicate register/deregister calls into WorkerThreadPrimaryRunnable::Run and it definitely hits that assert. I'm testing my current patch. https://tbpl.mozilla.org/?tree=Try&rev=85bffb3a7755 (The urls wind up in bug 1067547 since I'm testing a branch that has both fixes, plus the re-enable-the-tests patch) I'll post the patch momentarily.
Attached patch clean up memory leaks (obsolete) — Splinter Review
This revision preserves the ability to get a profile from a thread in the case where profiling is enabled while a thread exits. This required some changes from earlier revisions of the patch. In particular, now a ThreadInfo may own a reference to the PseudoStack. In this patch the way this is done is: * Introduce a StackOwningThreadInfo subclass, which claims a reference to a PseudoStack. This is needed to avoid reference cycles from ThreadInfo objects pushed onto the PseudoStack. * A PseudoStack can have only two owning references -- one via tlsPseudoStack and one via a StackOwningThreadInfo. * However, as before, a PseudoStack may also outlive the tlsPseudoStack reference, in the case where profiling is stopped while a SamplerStackFrameRAII is active. This is all managed by storing 3 reference bits and then accessing them atomically. This avoids the need for an explicit mutex, at the cost of some complexity. I think the new comment in PseudoStack explains the situation as well. As mentioned earlier, I couldn't find a scenario where register/unregister calls could be nested. So, I did not change anything here.
Attachment #8517579 - Attachment is obsolete: true
Attached patch clean up memory leaks (obsolete) — Splinter Review
I attached the wrong patch to the bug. It was missing a crucial line in PseudoStack::ref. I did manage to send the correct patch through "try". If you look, though, you'll see the results look fairly messy. That's because I re-enabled all the profiler tests, and there are still some bugs in the tests themselves.
Attachment #8524894 - Attachment is obsolete: true
Attachment #8525140 - Flags: review?(bgirard)
Comment on attachment 8525140 [details] [diff] [review] clean up memory leaks Review of attachment 8525140 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/PseudoStack.h @@ +378,5 @@ > mStackPointer--; > + if (mStackPointer == 0) { > + int newValue = (mRefBits &= ~REF_DYING); > + if (newValue == 0) { > + delete this; This will break if this class is allocated on the stack. You'll need to prevent stack allocation: http://stackoverflow.com/questions/124880/is-it-possible-to-prevent-stack-allocation-of-an-object-and-only-allow-it-to-be Maybe this should be renamed to popAndMaybeDelete. This means that every user has to watch out that after every pop the object may be deleted. You may want to return a bool. @@ +470,5 @@ > + // consistency while avoiding leaks. > + mozilla::Atomic<int> mRefBits; > + > + // Possible values in mRefBits. > + enum { Can't this simply be an atomic int that keeps track of how many people own this object? The stack can keep a reference to itself. When the stack pushes the first element is AddRef and it Release on the last element being poped. @@ +477,5 @@ > + // A reference is owned by tlsPseudoStack. > + REF_TLS_PSEUDOSTACK = 2, > + // A reference from tlsPseudoStack was removed, but the stack was > + // active, and so is in its "dying" mode. > + REF_DYING = 4 Maybe this should be an independent Atomic<bool>? @@ +529,5 @@ > + MOZ_ASSERT((mRefBits & REF_DYING) == 0); > + // The order here matters -- we must atomically set the "dying" > + // bit to avoid a deletion race with derefThread. > + if (mStackPointer > 0) { > + mRefBits |= REF_DYING; AFAIK this operation isn't atomic. I think you might need to use the intrinsic: http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h I'm not familiar with all the gotchas of Atomic so we may want to get some help with this piece. This code feels very difficult to prove that it's correct unfortunately. It would be better if we can think of something simpler.
(In reply to Benoit Girard (:BenWa) from comment #44) > This will break if this class is allocated on the stack. You'll need to > prevent stack allocation: Will do. > > + // consistency while avoiding leaks. > > + mozilla::Atomic<int> mRefBits; > > + > > + // Possible values in mRefBits. > > + enum { > > Can't this simply be an atomic int that keeps track of how many people own > this object? The stack can keep a reference to itself. When the stack pushes > the first element is AddRef and it Release on the last element being poped. Sure, I can do that. I chose the current approach to avoid putting more work into mozilla_sampler_call_enter, since we'd previously discussed how that path is hot. Making this into a more ordinary reference count will simplify things though. > @@ +477,5 @@ > > + // A reference is owned by tlsPseudoStack. > > + REF_TLS_PSEUDOSTACK = 2, > > + // A reference from tlsPseudoStack was removed, but the stack was > > + // active, and so is in its "dying" mode. > > + REF_DYING = 4 > > Maybe this should be an independent Atomic<bool>? It's moot now since I'll make the reference counting change; but I think the answer is, no, it can't be independent, as then there is no way to guarantee atomicity. The bad case is if the TLS reference is dropped (setting the dying bit), then the ThreadInfo release can see a 0 reference count, and it has no way to atomically check both the dying bit and mRefBits at the same time. > > + mRefBits |= REF_DYING; > AFAIK this operation isn't atomic. I think you might need to use the > intrinsic: > http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h > I'm not familiar with all the gotchas of Atomic so we may want to get some > help with this piece. Yeah, it is atomic -- the various OP= operators are defined around here: http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#1052 They use the intrinsics. They're defined a bit funnily since the intrinsics return the previous value but the operators return the resulting value; but in our case that's what we want. > This code feels very difficult to prove that it's correct unfortunately. It > would be better if we can think of something simpler. Yeah, I totally agree. After working on it a bit I think I finally have a decent idea of the lifetime issues, at least for the subset of objects I've been dealing with. So maybe we can find a good spot for some comments to record the situation.
(In reply to Benoit Girard (:BenWa) from comment #44) > Maybe this should be renamed to popAndMaybeDelete. This means that every > user has to watch out that after every pop the object may be deleted. You > may want to return a bool. I did this, but I believe the only caller is mozilla_sampler_call_exit, and there's nothing to do there.
Attached patch clean up memory leaks (obsolete) — Splinter Review
I think this version addresses all the review comments.
Attachment #8525140 - Attachment is obsolete: true
Attachment #8525140 - Flags: review?(bgirard)
Attachment #8528492 - Flags: review?(bgirard)
Comment on attachment 8528492 [details] [diff] [review] clean up memory leaks Review of attachment 8528492 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the late review. It looks good!
Attachment #8528492 - Flags: review?(bgirard) → review+
Thanks!
Keywords: checkin-needed
Flags: needinfo?(ttromey)
(In reply to Carsten Book [:Tomcat] from comment #52) > sorry had to back this out for memory leaks like > https://treeherder.mozilla.org/logviewer.html#?job_id=4547839&repo=mozilla- > inbound No problem. It seems I'll probably have to roll in the fix for bug 1067547 after all. Well, or back out the MOZ_COUNT_*TOR bits, though that seems like cheating.
Flags: needinfo?(ttromey)
I looked into this. The "final" patch had a MOZ_COUNT_DTOR marker left in it, probably due to a rebase error on my part. This caused the apparent "leak" -- but note first that the leak reports a huge number due to underflow; and second that there isn't actually a leak at all, but rather bad placement of the leak detector calls vis a vis profiler initialization and shutdown. Despite what I said about cheating in comment #53, my intention here was to push these markers into the fix for bug 1067547; see comment So, I am attaching this patch and pushing it to try. I plan to carry over the r+ should that work out ok.
Attachment #8528492 - Attachment is obsolete: true
This version looks ok. Once it is in I will deal with the follow-up patch.
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: