Closed Bug 1196430 Opened 9 years ago Closed 9 years ago

Add MozLeakHelper to get allocation site for leaks

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla43

Tracking Flags:

Tracking

Status

firefox43

---

fixed

People

(Reporter: BenWa, Assigned: froydnj)

References

Details

Attachments

(9 files, 2 obsolete files)

MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj 9 years ago Benoit Girard (:BenWa) 40 bytes, text/x-review-board-request		Details
part 1 - rename serialNumberRecord to SerialNumberRecord 9 years ago Nathan Froyd [:froydnj] 4.43 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 2 - give SerialNumberRecord a proper constructor 9 years ago Nathan Froyd [:froydnj] 2.23 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 3 - remove unnecessary nsString.h include from nsTraceRefcnt.cpp 9 years ago Nathan Froyd [:froydnj] 1.42 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES 9 years ago Nathan Froyd [:froydnj] 3.90 KB, patch	mccr8 : review-	Details \| Diff \| Splinter Review
part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES 9 years ago Nathan Froyd [:froydnj] 1.51 KB, patch		Details \| Diff \| Splinter Review
part 6 - move cut-and-paste stack fixer code into mozrunner 9 years ago Nathan Froyd [:froydnj] 12.63 KB, patch	wlach : review+	Details \| Diff \| Splinter Review
part 7 - teach process_leak_log how to symbolicate leaked object stacks 9 years ago Nathan Froyd [:froydnj] 7.44 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES 9 years ago Nathan Froyd [:froydnj] 3.95 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES 9 years ago Nathan Froyd [:froydnj] 1.73 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
part 8 - use less reinterpret_cast in nsTraceRefcnt.cpp 9 years ago Nathan Froyd [:froydnj] 4.17 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review

Benoit Girard (:BenWa)

Reporter

Description

•

9 years ago

We have tools to track leaks and report the allocation sites (LSan, DMD etc...) but it's not easy to get them running in automation.

I'm looking for a simple helper to solve bug 1072313 and similar. I need a solution that will work in automation.

Benoit Girard (:BenWa)

Reporter

Comment 1

•

9 years ago

Attached file MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj — Details

Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=mccr8

Attachment #8650112 - Flags: review?(continuation)

Andrew McCreight [:mccr8]

Comment 2

•

9 years ago

(For the record, we are running LSan in automation, but only for a single platform, 64-bit opt Linux.)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Component: General → XPCOM

Andrew McCreight [:mccr8]

Comment 3

•

9 years ago

Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Sorry, I spent some time trying to work this out, but I think I'm going to have to bail on this. I think Nathan would be a better reviewer. My one comment is that at a high level it seems like it would make more sense to use the existing XPCOM refcount logging tables, which already can store per-object information, though given how much reworking of the patch that would require, maybe it isn't worth it.

Attachment #8650112 - Flags: review?(continuation) → review?(nfroyd)

Benoit Girard (:BenWa)

Reporter

Comment 4

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #3)
> though given how much reworking of the patch that
> would require, maybe it isn't worth it.

Most of the patch is getting stacks out of the profiler. I'm fine with reworking the rest but I didn't want to over-engineer stuff that no one really needs. At the same time I'm open to spending a bit more time building something that is useful for more people. Ideally I'd like to be able to watch objects that aren't refcount.

:nfroyd note that the profile traverse code is copy-pasted. If you're uncomfortable reviewing that then I could ask ehsan.

Benoit Girard (:BenWa)

Reporter

Updated

•

9 years ago

Assignee: nobody → bgirard

Nathan Froyd [:froydnj]

Assignee

Comment 5

•

9 years ago

Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

https://reviewboard.mozilla.org/r/16549/#review15157

mccr8's suggestion of tying this into the existing machinery makes a lot of sense, and might help provide some insight into some of those weird intermittent leaks we see on TBPL.  Though I guess we'd have to verify printing the allocation stacks for the leaks doesn't botch the output...

I assume when you say "Ideally I'd like to be able to watch objects that aren't refcount", you mean objects that aren't tied to XPCOM refcounting?  Because you can use things like MOZ_COUNT_CTOR/DTOR to tie non-refcounted things into the refcount logging stuff.  I want to try to understand your use-case (beyond just the intermittent this bug is for, for which this stuff seems useful) before sending you off to tie things into nsTraceRefcnt more deeply.

Bikesheddy comments below.

::: tools/profiler/core/platform.cpp
(Diff revision 1)
> -  if (!profiler_is_active()) {

Removing this seems...less than good.  Am I missing something?

::: xpcom/base/nsTraceRefcnt.h:60
(Diff revision 1)
> +  ~MozLeakWatcher();

I would think you'd want to delete copy/move constructors/operator= here to ensure people don't do anything funny.

::: xpcom/base/nsTraceRefcnt.h:66
(Diff revision 1)
> +  static void PrintWatchedObject();

This should be |PrintWatchedObjects|, yes?

::: xpcom/base/nsTraceRefcnt.h:68
(Diff revision 1)
> +  void Print() const;

Does this need to be public?

::: xpcom/base/nsTraceRefcnt.cpp:1351
(Diff revision 1)
> +struct MOZ_STACK_CLASS AutoLeakWatchLock final

Maybe instead of copying-pasting AutoTraceLogLock, we could have:

template<mozilla::Atomic<bool, mozilla::ReleaseAcquire>& LockLocation>
class AutoTraceLogLock final
{
  ....
};

and then use that appropriately for this and the current AutoTraceLogLock?  (This is bikeshedding, but this is grotty code already and we might as well not make it any worse...)

Attachment #8650112 - Flags: review?(nfroyd)

Benoit Girard (:BenWa)

Reporter

Comment 6

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #5)
> I assume when you say "Ideally I'd like to be able to watch objects that
> aren't refcount", you mean objects that aren't tied to XPCOM refcounting? 
> Because you can use things like MOZ_COUNT_CTOR/DTOR to tie non-refcounted
> things into the refcount logging stuff.  I want to try to understand your
> use-case (beyond just the intermittent this bug is for, for which this stuff
> seems useful) before sending you off to tie things into nsTraceRefcnt more
> deeply.
> 

I also want to avoid opting everything in gecko into expensive tracking. With this approach, when you expect something to leak then you can opt in something in particular into the track. This works well for objects that are allocates not overly frequently.


> ::: tools/profiler/core/platform.cpp
> (Diff revision 1)
> > -  if (!profiler_is_active()) {
> 
> Removing this seems...less than good.  Am I missing something?
> 

Yes, getting a backtrace is useful for general debugging, as I'm starting to use it here. I've moved the profiler_is_active() check into callers that only need backtrace to report to the profiler. But in general it's useful to get the best stack available (pseudo or native+pseudo) for debugging features regardless of the state of the profiler.

> and then use that appropriately for this and the current AutoTraceLogLock? 
> (This is bikeshedding, but this is grotty code already and we might as well
> not make it any worse...)

I don't mind.

Benoit Girard (:BenWa)

Reporter

Comment 7

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=88258db04ac0

Benoit Girard (:BenWa)

Reporter

Updated

•

9 years ago

Attachment #8650112 - Attachment description: MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=mccr8 → MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Attachment #8650112 - Flags: review?(nfroyd)

Benoit Girard (:BenWa)

Reporter

Comment 8

•

9 years ago

Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Nathan Froyd [:froydnj]

Assignee

Comment 9

•

9 years ago

Attached patch part 1 - rename serialNumberRecord to SerialNumberRecord — Details — Splinter Review

This is a small cleanup to make the class name conform to coding standards.

Attachment #8652926 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 10

•

9 years ago

Attached patch part 2 - give SerialNumberRecord a proper constructor — Details — Splinter Review

Using C++ style is so much nicer for these sorts of things.  This change
also paves the way for easily using non-POD things in SerialNumberRecord.

Attachment #8652927 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 11

•

9 years ago

Attached patch part 3 - remove unnecessary nsString.h include from nsTraceRefcnt.cpp — Details — Splinter Review

Noticed in passing.

Attachment #8652928 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 12

•

9 years ago

Attached patch part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES (obsolete) — Details — Splinter Review

Attachment #8652929 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 13

•

9 years ago

Attached patch part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES (obsolete) — Details — Splinter Review

Putting this XPCOM_MEM_BLOAT_LOG seemed reasonable, especially since it's only
for selected classes.

Attachment #8652930 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 14

•

9 years ago

Attached patch part 6 - move cut-and-paste stack fixer code into mozrunner — Details — Splinter Review

This code is cut-and-pasted in several different places around the tree.
Let's put it in a common place.

We talked about where to put it in #ateam.  If you don't like this choice, this
bug can be the official record of you saying so. :)

Attachment #8652931 - Flags: review?(wlachance)

Nathan Froyd [:froydnj]

Assignee

Comment 15

•

9 years ago

Attached patch part 7 - teach process_leak_log how to symbolicate leaked object stacks — Details — Splinter Review

This is not the greatest thing ever, but this way the allocation stacks come
out nicely in automation...

Attachment #8652932 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 16

•

9 years ago

OK, alternate patch series up that does the integration with the xpcom leak checking.  Testing at home says:

== BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 9263

     |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
     |                                      | Per-Inst   Leaked|   Total      Rem|
   0 |TOTAL                                 |       27        4| 4797014        1|
 187 |DeliberatelyLeaked                    |        4        4|       1        1|

nsTraceRefcnt::DumpStatistics: 1299 entries

Serial Numbers of Leaked Objects:
1 @0x7f7f1d0043b8 (0 references; 0 from COMPtrs)
allocation stack:
#01: nsConsoleService::nsConsoleService() (/home/froydnj/src/gecko-dev.git/xpcom/base/nsConsoleService.cpp:80)
#02: nsConsoleServiceConstructor (/home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:207)
#03: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1224)
#04: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1580)
#05: nsGetServiceByContractID::operator()(nsID const&, void**) const (/home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:281)
#06: nsCOMPtr<nsIConsoleService>::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsCOMPtr.h:1121)
#07: LogMessage(char const*, ...) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:189)
#08: ~nsACString_internal (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsTSubstring.h:95)
#09: nsComponentManagerImpl::RegisterModule(mozilla::Module const*, mozilla::FileLocation*) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:498 (discriminator 3))
#10: ~nsACString_internal (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsTSubstring.h:95)
#11: ParseManifest(NSLocationType, mozilla::FileLocation&, char*, bool, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:808 (discriminator 4))
#12: DoRegisterManifest (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:638)
#13: nsComponentManagerImpl::ManifestManifest(nsComponentManagerImpl::ManifestProcessingContext&, int, char* const*) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:660)
#14: ParseManifest(NSLocationType, mozilla::FileLocation&, char*, bool, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:808 (discriminator 4))
#15: DoRegisterManifest (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:638)
#16: nsComponentManagerImpl::RegisterManifest(NSLocationType, mozilla::FileLocation&, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:651)
#17: nsComponentManagerImpl::Init() (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:451)
#18: NS_InitXPCOM2 (/home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:701)
#19: ScopedXPCOMStartup::Initialize() (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:1560)
#20: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4387)
#21: XRE_main (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4479)
#22: do_main (/home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:204)
#23: main (/home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:399)
#24: __libc_start_main (/build/glibc-Ir_s5K/glibc-2.19/csu/libc-start.c:321)
#25: _start (/opt/build/froydnj/build-debug/dist/bin/firefox)
#26: ??? (???:???)
TEST-INFO | leakcheck | default process: leaked 1 DeliberatelyLeaked (4 bytes)
TEST-UNEXPECTED-FAIL | leakcheck | default process: 4 bytes leaked (DeliberatelyLeaked)

which is pretty nice.  You do need to set XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this patch series is that we can modify the per-platform test options to track different sets of objects across tests and platforms.  And there's no overhead if we're not tracking anything.

Andrew McCreight [:mccr8]

Comment 17

•

9 years ago

Comment on attachment 8652926 [details] [diff] [review]
part 1 - rename serialNumberRecord to SerialNumberRecord

Review of attachment 8652926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +598,5 @@
>    PLHashEntry** hep = PL_HashTableRawLookup(gSerialNumbers,
>                                              HashNumber(aPtr),
>                                              aPtr);
>    if (hep && *hep) {
> +    return &((reinterpret_cast<SerialNumberRecord*>((*hep)->value))->refCount);

Gross. Out of curiousity, can this hash table be converted to something less low level?

Attachment #8652926 - Flags: review?(continuation) → review+

Andrew McCreight [:mccr8]

Comment 18

•

9 years ago

Comment on attachment 8652927 [details] [diff] [review]
part 2 - give SerialNumberRecord a proper constructor

Review of attachment 8652927 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for cleaning this up.

::: xpcom/base/nsTraceRefcnt.cpp
@@ +591,2 @@
>      PL_HashTableRawAdd(gSerialNumbers, hep, HashNumber(aPtr),
>                         aPtr, reinterpret_cast<void*>(record));

Can this just be a static_cast?

Attachment #8652927 - Flags: review?(continuation) → review+

Andrew McCreight [:mccr8]

Comment 19

•

9 years ago

Comment on attachment 8652928 [details] [diff] [review]
part 3 - remove unnecessary nsString.h include from nsTraceRefcnt.cpp

Review of attachment 8652928 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +583,5 @@
>    PLHashEntry** hep = PL_HashTableRawLookup(gSerialNumbers,
>                                              HashNumber(aPtr),
>                                              aPtr);
>    if (hep && *hep) {
> +    return reinterpret_cast<SerialNumberRecord*>((*hep)->value)->serialNumber;

This change should go in part 1.

Attachment #8652928 - Flags: review?(continuation) → review+

Andrew McCreight [:mccr8]

Comment 20

•

9 years ago

Comment on attachment 8652929 [details] [diff] [review]
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES

Review of attachment 8652929 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +124,5 @@
>  
>    intptr_t serialNumber;
>    int32_t refCount;
>    int32_t COMPtrCount;
> +  std::vector<std::string> allocationStack;

Please add a comment that you are using this because we can't use leak checking data structures inside leak checking.

@@ +884,5 @@
> +RecordStackFrame(uint32_t aFrameNumber, void* aPC, void* aSP,
> +                 void* aClosure)
> +{
> +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> +  static const size_t buflen = 1024;

nit: bufLen ?

@@ +886,5 @@
> +{
> +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> +  static const size_t buflen = 1024;
> +  char buf[buflen];
> +  gCodeAddressService->GetLocation(aFrameNumber, aPC, buf, buflen);

I know this code doesn't require the highest possible performance, but I think it would be much faster to store a list of PCs, and only actually call into the location service to generate strings for objects that leak. I can be convinced this doesn't matter if you can record nsGlobalWindow allocations without 90% of the time being spent in the location service, which is what refcount logging ends up like (well, 99% but that records a lot more). r- for this. Otherwise this looks good.

@@ +922,5 @@
> +  if (!gCodeAddressService) {
> +    gCodeAddressService = new WalkTheStackCodeAddressService();
> +  }
> +  static const int kFramesToSkip =
> +    0 +                         // this frame gets inlined

Is there some reason these comments aren't over more to the left? Seems a little weird.

Attachment #8652929 - Flags: review?(continuation) → review-

Andrew McCreight [:mccr8]

Comment 21

•

9 years ago

Comment on attachment 8652930 [details] [diff] [review]
part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES

Review of attachment 8652930 [details] [diff] [review]:
-----------------------------------------------------------------

This will need to change, so clearing review for now.

Attachment #8652930 - Flags: review?(continuation)

William Lachance (:wlach)

Comment 22

•

9 years ago

Comment on attachment 8652931 [details] [diff] [review]
part 6 - move cut-and-paste stack fixer code into mozrunner

Review of attachment 8652931 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like a nice cleanup. r=me with my nit addressed.

::: testing/mozbase/mozrunner/mozrunner/utils.py
@@ +244,5 @@
>  
>      return env
> +
> +
> +def stackFixerFunction(utilityPath, symbolsPath):

To be consistent with the style elsewhere in this module, name this get_stack_fixer_function

Attachment #8652931 - Flags: review?(wlachance) → review+

Andrew McCreight [:mccr8]

Comment 23

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> which is pretty nice.  You do need to set
> XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> patch series is that we can modify the per-platform test options to track
> different sets of objects across tests and platforms.  And there's no
> overhead if we're not tracking anything.

This looks pretty good. How do you set environment variables like this in automation?

Andrew McCreight [:mccr8]

Comment 24

•

9 years ago

Comment on attachment 8652932 [details] [diff] [review]
part 7 - teach process_leak_log how to symbolicate leaked object stacks

Review of attachment 8652932 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozleak/mozleak/leaklog.py
@@ +16,5 @@
>  
>  
>  def process_single_leak_file(leakLogFileName, processType, leakThreshold,
> +                             ignoreMissingLeaks, log=None,
> +                             utilityPath=None, symbolsPath=None):

It would be a little nicer to just pass in a stack fixer as opposed to these weird path things, but okay.

Attachment #8652932 - Flags: review?(continuation) → review+

Nathan Froyd [:froydnj]

Assignee

Comment 25

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #23)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > which is pretty nice.  You do need to set
> > XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> > patch series is that we can modify the per-platform test options to track
> > different sets of objects across tests and platforms.  And there's no
> > overhead if we're not tracking anything.
> 
> This looks pretty good. How do you set environment variables like this in
> automation?

I believe the way to do it is by modifying files like:

individual mochitest suites: http://mxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/unittests/mac_unittest.py#50
all mochitests: http://mxr.mozilla.org/mozilla-central/source/testing/config/mozharness/mac_config.py#26

adding the --setenv XPCOM_MEM_LOG_CLASSES=$CLASS option.  Knowing our test suites, all of the harnesses probably have slightly different spellings for how to set an environment variable.

(In reply to Andrew McCreight [:mccr8] from comment #24)
> >  def process_single_leak_file(leakLogFileName, processType, leakThreshold,
> > +                             ignoreMissingLeaks, log=None,
> > +                             utilityPath=None, symbolsPath=None):
> 
> It would be a little nicer to just pass in a stack fixer as opposed to these
> weird path things, but okay.

I can do that.

(In reply to Andrew McCreight [:mccr8] from comment #18)
> ::: xpcom/base/nsTraceRefcnt.cpp
> @@ +591,2 @@
> >      PL_HashTableRawAdd(gSerialNumbers, hep, HashNumber(aPtr),
> >                         aPtr, reinterpret_cast<void*>(record));
> 
> Can this just be a static_cast?

Looks like virtually every reinterpret_cast in here should be static_cast.  I can fix that in this bug or in a followup.

(In reply to Andrew McCreight [:mccr8] from comment #20)
> ::: xpcom/base/nsTraceRefcnt.cpp
> @@ +124,5 @@
> >  
> >    intptr_t serialNumber;
> >    int32_t refCount;
> >    int32_t COMPtrCount;
> > +  std::vector<std::string> allocationStack;
> 
> Please add a comment that you are using this because we can't use leak
> checking data structures inside leak checking.

Yup.  I think that's why we're using PL_HashTable and not something slightly nicer.  I guess we could use std::unordered_set or something.  I thought PLDHashTable was leak-checked, but apparently not...we could try using that.

> @@ +884,5 @@
> > +RecordStackFrame(uint32_t aFrameNumber, void* aPC, void* aSP,
> > +                 void* aClosure)
> > +{
> > +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> > +  static const size_t buflen = 1024;
> 
> nit: bufLen ?

I was copying from the function above, but I can do that.

> @@ +886,5 @@
> > +{
> > +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> > +  static const size_t buflen = 1024;
> > +  char buf[buflen];
> > +  gCodeAddressService->GetLocation(aFrameNumber, aPC, buf, buflen);
> 
> I know this code doesn't require the highest possible performance, but I
> think it would be much faster to store a list of PCs, and only actually call
> into the location service to generate strings for objects that leak. I can
> be convinced this doesn't matter if you can record nsGlobalWindow
> allocations without 90% of the time being spent in the location service,
> which is what refcount logging ends up like (well, 99% but that records a
> lot more). r- for this. Otherwise this looks good.

That's a good point, I'll do that.

> @@ +922,5 @@
> > +  if (!gCodeAddressService) {
> > +    gCodeAddressService = new WalkTheStackCodeAddressService();
> > +  }
> > +  static const int kFramesToSkip =
> > +    0 +                         // this frame gets inlined
> 
> Is there some reason these comments aren't over more to the left? Seems a
> little weird.

That's where' emacs's M-; put them; I can move them over a little, I suppose.

Andrew McCreight [:mccr8]

Comment 26

•

9 years ago

If you could add a little blurb about how to set this in our test suites to this page I'd appreciate it:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Refcount_tracing_and_balancing

Nathan Froyd [:froydnj]

Assignee

Comment 27

•

9 years ago

Attached patch part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES — Details — Splinter Review

Now with more efficiency.

Attachment #8652929 - Attachment is obsolete: true

Attachment #8653049 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 28

•

9 years ago

Attached patch part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES — Details — Splinter Review

Attachment #8652930 - Attachment is obsolete: true

Attachment #8653050 - Flags: review?(continuation)

Nathan Froyd [:froydnj]

Assignee

Comment 29

•

9 years ago

Attached patch part 8 - use less reinterpret_cast in nsTraceRefcnt.cpp — Details — Splinter Review

Attachment #8653051 - Flags: review?(continuation)

Andrew McCreight [:mccr8]

Comment 30

•

9 years ago

Comment on attachment 8653049 [details] [diff] [review]
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES

Review of attachment 8653049 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.

::: xpcom/base/nsTraceRefcnt.cpp
@@ +918,5 @@
> +static void
> +WalkTheStackSavingLocations(std::vector<void*>& aLocations)
> +{
> +#ifdef MOZ_STACKWALKING
> +  if (!gCodeAddressService) {

nit: RecordStackFrame doesn't use the address service any more, so you don't have to initialize it here. That should get pushed into DumpSerialNumbers in the next patch.

Attachment #8653049 - Flags: review?(continuation) → review+

Andrew McCreight [:mccr8]

Comment 31

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #25)
> Looks like virtually every reinterpret_cast in here should be static_cast. 
> I can fix that in this bug or in a followup.
A followup would be fine. I just prefer to have non-scary casts be static_casts, so when I do see a reinterpret_cast I know to be more cautious.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Attachment #8653050 - Flags: review?(continuation) → review+

Andrew McCreight [:mccr8]

Comment 32

•

9 years ago

Thanks for working on this, Benoit and Nathan. It will be nice to have a little light LSan-ish thing that works on all of our platforms.

Nathan Froyd [:froydnj]

Assignee

Comment 33

•

9 years ago

Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Canceling this review, then.

Attachment #8650112 - Flags: review?(nfroyd)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Attachment #8653051 - Flags: review?(continuation) → review+

Benoit Girard (:BenWa)

Reporter

Comment 34

•

9 years ago

Looks nice, thanks for doing a better version :nfroyd!

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
>  You do need to set
> XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> patch series is that we can modify the per-platform test options to track
> different sets of objects across tests and platforms.

To bike-shed a bit. I hope that's well documented. We have so many platforms, and sub test configuration, often with their own obscure differences and exception, that from my experience I often end up activating a debug option in the wrong place to later notice that on that test' platform/configuration we don't run that for some reason and have to do another push. But if it's relatively simple that's fine.

(For instance how b2g mac desktop builds have random options from both b2g and mac set)

That's what I liked about just adding a member. I'm going to need to make a tree edit anyways and if I want to restrict to a particular platform I can #ifdef it.

Benoit Girard (:BenWa)

Reporter

Updated

•

9 years ago

Assignee: bgirard → nfroyd

Nathan Froyd [:froydnj]

Assignee

Comment 35

•

9 years ago

Just to confirm, we can get stacks on try:

08:37:12     INFO -  Serial Numbers of Leaked Objects:
08:37:12     INFO -  4363 @0x10bc151f0 (1 references; 0 from COMPtrs)
08:37:12     INFO -  allocation stack:
08:37:45     INFO -  #00: mozilla::AppleVDADecoder::OutputFrame(mozilla::CFRefPtr<__CVBuffer*>, mozilla::AppleVDADecoder::AppleFrameRef) [mfbt/nsRefPtr.h:285]
08:37:45     INFO -  #01: nsRunnableMethodImpl<nsresult (mozilla::AppleVDADecoder::*)(mozilla::CFRefPtr<__CVBuffer *>, mozilla::AppleVDADecoder::AppleFrameRef), true, mozilla::CFRefPtr<__CVBuffer *>, mozilla::AppleVDADecoder::AppleFrameRef>::Run [dom/media/platforms/apple/AppleUtils.h:82]
08:37:45     INFO -  #02: mozilla::TaskQueue::Runner::Run() [xpcom/threads/TaskQueue.h:135]
08:37:45     INFO -  #03: nsThreadPool::Run() [xpcom/glue/nsCOMPtr.h:403]
08:37:45     INFO -  #04: _ZThn8_N12nsThreadPool3RunEv [/var/folders/vs/t_n93bks6pd2b2mjk9q8pmdr00000w/T/tmpdywaKJ.cpp:242]
08:37:45     INFO -  #05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:864]
08:37:45     INFO -  #06: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/glue/nsThreadUtils.cpp:277]
08:37:45     INFO -  #07: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:355]
08:37:45     INFO -  #08: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:235]
08:37:45     INFO -  #09: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:520]
08:37:45     INFO -  #10: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:361]
08:37:46     INFO -  #11: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:215]
08:37:46     INFO -  #12: libSystem.B.dylib + 0x39fd6
08:37:46     INFO -  TEST-INFO | leakcheck | default process: leaked 1 MacIOSurface (40 bytes)

And this was the patch:

https://hg.mozilla.org/try/diff/b7a43d0d2700/testing/config/mozharness/mac_config.py

Nathan Froyd [:froydnj]

Assignee

Comment 36

•

9 years ago

OK, after looking through logs and tarballs, I think I understand what's going wrong with the b2g mochitests and this patch series.

The "common" test zip for mochitests gets unpacked at /builds/slave/test/build/tests.  Let's call this $ROOT from now on.  Unpacking this creates the files we're interested in, such as:

bin/fix_linux_stack.py
bin/fix_stack_using_bpsyms.py
bin/xpcshell

The problem is that the common package is completed unused by the emulator tests.  xpcshell (for mochitests) is obtained by downloading packages from tooltool (!) rather than using the xpcshell we just built.  So runtestsb2g.py never gets informed that there are interesting Python packages available at $ROOT/bin and we can't import them.

What's the right way to fix this state of affairs?  I guess one could upload new packages to tooltool with the fix*.py files?  It looks like I ought to be able to add a --utility-path option here:

http://mxr.mozilla.org/mozilla-central/source/testing/config/mozharness/b2g_emulator_config.py#63

but given that b2g configs eschew --utility-path, it's not clear to me that utility_path would actually be what I want there, and I can't tell what it would be from the logs.

Are there other options that don't descend into a morass of buildbot configury?

Flags: needinfo?(wlachance)

William Lachance (:wlach)

Comment 37

•

9 years ago

I'm not the right person to ask about this. :ahal, can you help ^^^?

Flags: needinfo?(wlachance) → needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 38

•

9 years ago

Yeah, it has to download the binaries from tooltool because the ones that just got built are for ARM and won't run on the test slaves.

Assuming this MozLeakHelper thing doesn't have the same problem, I don't think there's any particular reason that you couldn't use --utility-path. I glanced at the mochitest harness, and it looks like passing it in shouldn't interfere with any b2g things (as long as --xre-path is also passed in).

Failing that, you could either copy it to somewhere the harness knows about in the mozharness script, or create a new command line option to find it (would prefer to avoid this though). Either way, this should be solvable in mozharness only and shouldn't need modifications to buildbot-configs.

Flags: needinfo?(ahalberstadt)

Nathan Froyd [:froydnj]

Assignee

Comment 39

•

9 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #38)
> Yeah, it has to download the binaries from tooltool because the ones that
> just got built are for ARM and won't run on the test slaves.

...  I guess tests go faster if we have a native xpcshell serving things than if we have an ARM xpcshell.

> Assuming this MozLeakHelper thing doesn't have the same problem, I don't
> think there's any particular reason that you couldn't use --utility-path. I
> glanced at the mochitest harness, and it looks like passing it in shouldn't
> interfere with any b2g things (as long as --xre-path is also passed in).

Is there some way I can verify where --utility-path is going to point to?

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 40

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #39)
> ...  I guess tests go faster if we have a native xpcshell serving things
> than if we have an ARM xpcshell.

The way mochitest is set up the tests *need* to be served from the host, but it's probably also faster :)

> Is there some way I can verify where --utility-path is going to point to?

You could add a print statement to the test harness I guess. But you'll be building --utility-path manually in the mozharness script. What you call $ROOT in comment 36, is called 'abs_test_install_dir' in the mozharness script:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/b2g_emulator_unittest.py#157

You can use that as a starting point for building utility_path. Grep for 'xre_path' in that file to see where to set it. Then you'll be able to add --utility-path=%(utility_path)s to the config file:
https://dxr.mozilla.org/mozilla-central/rev/5fe9ed3edd6811a662d40d05e37b0d66e9520d82/testing/mozharness/configs/b2g/emulator_automation_config.py

Flags: needinfo?(ahalberstadt)

Nathan Froyd [:froydnj]

Assignee

Updated

•

9 years ago

Depends on: 1202716

Pulsebot

Comment 41

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/ae7368db9b08
https://hg.mozilla.org/integration/mozilla-inbound/rev/884124b5b4d8
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7734f2e5b0c
https://hg.mozilla.org/integration/mozilla-inbound/rev/ecd7d8204d45
https://hg.mozilla.org/integration/mozilla-inbound/rev/c14dc117d23f
https://hg.mozilla.org/integration/mozilla-inbound/rev/95b4e5d5b14c
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f6c4bdd6295
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b0c8bcaaa9a

Pulsebot

Comment 42

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1ae50174aea

(no longer active)

Comment 43

•

9 years ago

I landed this as a fix to a B2G Desktop Windows build bustage:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1ae50174aea
<http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-win32_gecko/1441938725/mozilla-inbound-win32_gecko-bm84-build1-build353.txt.gz>

Carsten Book [:Tomcat]

Comment 44

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/ae7368db9b08
https://hg.mozilla.org/mozilla-central/rev/884124b5b4d8
https://hg.mozilla.org/mozilla-central/rev/f7734f2e5b0c
https://hg.mozilla.org/mozilla-central/rev/ecd7d8204d45
https://hg.mozilla.org/mozilla-central/rev/c14dc117d23f
https://hg.mozilla.org/mozilla-central/rev/95b4e5d5b14c
https://hg.mozilla.org/mozilla-central/rev/3f6c4bdd6295
https://hg.mozilla.org/mozilla-central/rev/5b0c8bcaaa9a
https://hg.mozilla.org/mozilla-central/rev/d1ae50174aea

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox43: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla43

Daniel Holbert [:dholbert]

Updated

•

9 years ago

Depends on: 1206203

Daniel Holbert [:dholbert]

Updated

•

9 years ago

No longer depends on: 1206203

You need to log in before you can comment on or make changes to this bug.