Closed Bug 1196430 Opened 5 years ago Closed 5 years ago

Add MozLeakHelper to get allocation site for leaks

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: BenWa, Assigned: froydnj)

References

Details

Attachments

(9 files, 2 obsolete files)

40 bytes, text/x-review-board-request
Details
4.43 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.23 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.42 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
12.63 KB, patch
wlach
: review+
Details | Diff | Splinter Review
7.44 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
3.95 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.73 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
4.17 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
We have tools to track leaks and report the allocation sites (LSan, DMD etc...) but it's not easy to get them running in automation.

I'm looking for a simple helper to solve bug 1072313 and similar. I need a solution that will work in automation.
Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=mccr8
Attachment #8650112 - Flags: review?(continuation)
(For the record, we are running LSan in automation, but only for a single platform, 64-bit opt Linux.)
Component: General → XPCOM
Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Sorry, I spent some time trying to work this out, but I think I'm going to have to bail on this. I think Nathan would be a better reviewer. My one comment is that at a high level it seems like it would make more sense to use the existing XPCOM refcount logging tables, which already can store per-object information, though given how much reworking of the patch that would require, maybe it isn't worth it.
Attachment #8650112 - Flags: review?(continuation) → review?(nfroyd)
(In reply to Andrew McCreight [:mccr8] from comment #3)
> though given how much reworking of the patch that
> would require, maybe it isn't worth it.

Most of the patch is getting stacks out of the profiler. I'm fine with reworking the rest but I didn't want to over-engineer stuff that no one really needs. At the same time I'm open to spending a bit more time building something that is useful for more people. Ideally I'd like to be able to watch objects that aren't refcount.

:nfroyd note that the profile traverse code is copy-pasted. If you're uncomfortable reviewing that then I could ask ehsan.
Assignee: nobody → bgirard
Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

https://reviewboard.mozilla.org/r/16549/#review15157

mccr8's suggestion of tying this into the existing machinery makes a lot of sense, and might help provide some insight into some of those weird intermittent leaks we see on TBPL.  Though I guess we'd have to verify printing the allocation stacks for the leaks doesn't botch the output...

I assume when you say "Ideally I'd like to be able to watch objects that aren't refcount", you mean objects that aren't tied to XPCOM refcounting?  Because you can use things like MOZ_COUNT_CTOR/DTOR to tie non-refcounted things into the refcount logging stuff.  I want to try to understand your use-case (beyond just the intermittent this bug is for, for which this stuff seems useful) before sending you off to tie things into nsTraceRefcnt more deeply.

Bikesheddy comments below.

::: tools/profiler/core/platform.cpp
(Diff revision 1)
> -  if (!profiler_is_active()) {

Removing this seems...less than good.  Am I missing something?

::: xpcom/base/nsTraceRefcnt.h:60
(Diff revision 1)
> +  ~MozLeakWatcher();

I would think you'd want to delete copy/move constructors/operator= here to ensure people don't do anything funny.

::: xpcom/base/nsTraceRefcnt.h:66
(Diff revision 1)
> +  static void PrintWatchedObject();

This should be |PrintWatchedObjects|, yes?

::: xpcom/base/nsTraceRefcnt.h:68
(Diff revision 1)
> +  void Print() const;

Does this need to be public?

::: xpcom/base/nsTraceRefcnt.cpp:1351
(Diff revision 1)
> +struct MOZ_STACK_CLASS AutoLeakWatchLock final

Maybe instead of copying-pasting AutoTraceLogLock, we could have:

template<mozilla::Atomic<bool, mozilla::ReleaseAcquire>& LockLocation>
class AutoTraceLogLock final
{
  ....
};

and then use that appropriately for this and the current AutoTraceLogLock?  (This is bikeshedding, but this is grotty code already and we might as well not make it any worse...)
Attachment #8650112 - Flags: review?(nfroyd)
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #5)
> I assume when you say "Ideally I'd like to be able to watch objects that
> aren't refcount", you mean objects that aren't tied to XPCOM refcounting? 
> Because you can use things like MOZ_COUNT_CTOR/DTOR to tie non-refcounted
> things into the refcount logging stuff.  I want to try to understand your
> use-case (beyond just the intermittent this bug is for, for which this stuff
> seems useful) before sending you off to tie things into nsTraceRefcnt more
> deeply.
> 

I also want to avoid opting everything in gecko into expensive tracking. With this approach, when you expect something to leak then you can opt in something in particular into the track. This works well for objects that are allocates not overly frequently.


> ::: tools/profiler/core/platform.cpp
> (Diff revision 1)
> > -  if (!profiler_is_active()) {
> 
> Removing this seems...less than good.  Am I missing something?
> 

Yes, getting a backtrace is useful for general debugging, as I'm starting to use it here. I've moved the profiler_is_active() check into callers that only need backtrace to report to the profiler. But in general it's useful to get the best stack available (pseudo or native+pseudo) for debugging features regardless of the state of the profiler.

> and then use that appropriately for this and the current AutoTraceLogLock? 
> (This is bikeshedding, but this is grotty code already and we might as well
> not make it any worse...)

I don't mind.
Attachment #8650112 - Attachment description: MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=mccr8 → MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj
Attachment #8650112 - Flags: review?(nfroyd)
Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj
This is a small cleanup to make the class name conform to coding standards.
Attachment #8652926 - Flags: review?(continuation)
Using C++ style is so much nicer for these sorts of things.  This change
also paves the way for easily using non-POD things in SerialNumberRecord.
Attachment #8652927 - Flags: review?(continuation)
Noticed in passing.
Attachment #8652928 - Flags: review?(continuation)
Putting this XPCOM_MEM_BLOAT_LOG seemed reasonable, especially since it's only
for selected classes.
Attachment #8652930 - Flags: review?(continuation)
This code is cut-and-pasted in several different places around the tree.
Let's put it in a common place.

We talked about where to put it in #ateam.  If you don't like this choice, this
bug can be the official record of you saying so. :)
Attachment #8652931 - Flags: review?(wlachance)
This is not the greatest thing ever, but this way the allocation stacks come
out nicely in automation...
Attachment #8652932 - Flags: review?(continuation)
OK, alternate patch series up that does the integration with the xpcom leak checking.  Testing at home says:

== BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 9263

     |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
     |                                      | Per-Inst   Leaked|   Total      Rem|
   0 |TOTAL                                 |       27        4| 4797014        1|
 187 |DeliberatelyLeaked                    |        4        4|       1        1|

nsTraceRefcnt::DumpStatistics: 1299 entries

Serial Numbers of Leaked Objects:
1 @0x7f7f1d0043b8 (0 references; 0 from COMPtrs)
allocation stack:
#01: nsConsoleService::nsConsoleService() (/home/froydnj/src/gecko-dev.git/xpcom/base/nsConsoleService.cpp:80)
#02: nsConsoleServiceConstructor (/home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:207)
#03: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1224)
#04: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1580)
#05: nsGetServiceByContractID::operator()(nsID const&, void**) const (/home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:281)
#06: nsCOMPtr<nsIConsoleService>::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsCOMPtr.h:1121)
#07: LogMessage(char const*, ...) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:189)
#08: ~nsACString_internal (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsTSubstring.h:95)
#09: nsComponentManagerImpl::RegisterModule(mozilla::Module const*, mozilla::FileLocation*) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:498 (discriminator 3))
#10: ~nsACString_internal (/opt/build/froydnj/build-debug/xpcom/components/../../dist/include/nsTSubstring.h:95)
#11: ParseManifest(NSLocationType, mozilla::FileLocation&, char*, bool, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:808 (discriminator 4))
#12: DoRegisterManifest (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:638)
#13: nsComponentManagerImpl::ManifestManifest(nsComponentManagerImpl::ManifestProcessingContext&, int, char* const*) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:660)
#14: ParseManifest(NSLocationType, mozilla::FileLocation&, char*, bool, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:808 (discriminator 4))
#15: DoRegisterManifest (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:638)
#16: nsComponentManagerImpl::RegisterManifest(NSLocationType, mozilla::FileLocation&, bool) (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:651)
#17: nsComponentManagerImpl::Init() (/home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:451)
#18: NS_InitXPCOM2 (/home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:701)
#19: ScopedXPCOMStartup::Initialize() (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:1560)
#20: XREMain::XRE_main(int, char**, nsXREAppData const*) (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4387)
#21: XRE_main (/home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4479)
#22: do_main (/home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:204)
#23: main (/home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:399)
#24: __libc_start_main (/build/glibc-Ir_s5K/glibc-2.19/csu/libc-start.c:321)
#25: _start (/opt/build/froydnj/build-debug/dist/bin/firefox)
#26: ??? (???:???)
TEST-INFO | leakcheck | default process: leaked 1 DeliberatelyLeaked (4 bytes)
TEST-UNEXPECTED-FAIL | leakcheck | default process: 4 bytes leaked (DeliberatelyLeaked)

which is pretty nice.  You do need to set XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this patch series is that we can modify the per-platform test options to track different sets of objects across tests and platforms.  And there's no overhead if we're not tracking anything.
Comment on attachment 8652926 [details] [diff] [review]
part 1 - rename serialNumberRecord to SerialNumberRecord

Review of attachment 8652926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +598,5 @@
>    PLHashEntry** hep = PL_HashTableRawLookup(gSerialNumbers,
>                                              HashNumber(aPtr),
>                                              aPtr);
>    if (hep && *hep) {
> +    return &((reinterpret_cast<SerialNumberRecord*>((*hep)->value))->refCount);

Gross. Out of curiousity, can this hash table be converted to something less low level?
Attachment #8652926 - Flags: review?(continuation) → review+
Comment on attachment 8652927 [details] [diff] [review]
part 2 - give SerialNumberRecord a proper constructor

Review of attachment 8652927 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for cleaning this up.

::: xpcom/base/nsTraceRefcnt.cpp
@@ +591,2 @@
>      PL_HashTableRawAdd(gSerialNumbers, hep, HashNumber(aPtr),
>                         aPtr, reinterpret_cast<void*>(record));

Can this just be a static_cast?
Attachment #8652927 - Flags: review?(continuation) → review+
Comment on attachment 8652928 [details] [diff] [review]
part 3 - remove unnecessary nsString.h include from nsTraceRefcnt.cpp

Review of attachment 8652928 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +583,5 @@
>    PLHashEntry** hep = PL_HashTableRawLookup(gSerialNumbers,
>                                              HashNumber(aPtr),
>                                              aPtr);
>    if (hep && *hep) {
> +    return reinterpret_cast<SerialNumberRecord*>((*hep)->value)->serialNumber;

This change should go in part 1.
Attachment #8652928 - Flags: review?(continuation) → review+
Comment on attachment 8652929 [details] [diff] [review]
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES

Review of attachment 8652929 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +124,5 @@
>  
>    intptr_t serialNumber;
>    int32_t refCount;
>    int32_t COMPtrCount;
> +  std::vector<std::string> allocationStack;

Please add a comment that you are using this because we can't use leak checking data structures inside leak checking.

@@ +884,5 @@
> +RecordStackFrame(uint32_t aFrameNumber, void* aPC, void* aSP,
> +                 void* aClosure)
> +{
> +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> +  static const size_t buflen = 1024;

nit: bufLen ?

@@ +886,5 @@
> +{
> +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> +  static const size_t buflen = 1024;
> +  char buf[buflen];
> +  gCodeAddressService->GetLocation(aFrameNumber, aPC, buf, buflen);

I know this code doesn't require the highest possible performance, but I think it would be much faster to store a list of PCs, and only actually call into the location service to generate strings for objects that leak. I can be convinced this doesn't matter if you can record nsGlobalWindow allocations without 90% of the time being spent in the location service, which is what refcount logging ends up like (well, 99% but that records a lot more). r- for this. Otherwise this looks good.

@@ +922,5 @@
> +  if (!gCodeAddressService) {
> +    gCodeAddressService = new WalkTheStackCodeAddressService();
> +  }
> +  static const int kFramesToSkip =
> +    0 +                         // this frame gets inlined

Is there some reason these comments aren't over more to the left? Seems a little weird.
Attachment #8652929 - Flags: review?(continuation) → review-
Comment on attachment 8652930 [details] [diff] [review]
part 5 - dump allocation stacks for leaked objects in XPCOM_MEM_LOG_CLASSES

Review of attachment 8652930 [details] [diff] [review]:
-----------------------------------------------------------------

This will need to change, so clearing review for now.
Attachment #8652930 - Flags: review?(continuation)
Comment on attachment 8652931 [details] [diff] [review]
part 6 - move cut-and-paste stack fixer code into mozrunner

Review of attachment 8652931 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like a nice cleanup. r=me with my nit addressed.

::: testing/mozbase/mozrunner/mozrunner/utils.py
@@ +244,5 @@
>  
>      return env
> +
> +
> +def stackFixerFunction(utilityPath, symbolsPath):

To be consistent with the style elsewhere in this module, name this get_stack_fixer_function
Attachment #8652931 - Flags: review?(wlachance) → review+
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> which is pretty nice.  You do need to set
> XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> patch series is that we can modify the per-platform test options to track
> different sets of objects across tests and platforms.  And there's no
> overhead if we're not tracking anything.

This looks pretty good. How do you set environment variables like this in automation?
Comment on attachment 8652932 [details] [diff] [review]
part 7 - teach process_leak_log how to symbolicate leaked object stacks

Review of attachment 8652932 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozleak/mozleak/leaklog.py
@@ +16,5 @@
>  
>  
>  def process_single_leak_file(leakLogFileName, processType, leakThreshold,
> +                             ignoreMissingLeaks, log=None,
> +                             utilityPath=None, symbolsPath=None):

It would be a little nicer to just pass in a stack fixer as opposed to these weird path things, but okay.
Attachment #8652932 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight [:mccr8] from comment #23)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > which is pretty nice.  You do need to set
> > XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> > patch series is that we can modify the per-platform test options to track
> > different sets of objects across tests and platforms.  And there's no
> > overhead if we're not tracking anything.
> 
> This looks pretty good. How do you set environment variables like this in
> automation?

I believe the way to do it is by modifying files like:

individual mochitest suites: http://mxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/unittests/mac_unittest.py#50
all mochitests: http://mxr.mozilla.org/mozilla-central/source/testing/config/mozharness/mac_config.py#26

adding the --setenv XPCOM_MEM_LOG_CLASSES=$CLASS option.  Knowing our test suites, all of the harnesses probably have slightly different spellings for how to set an environment variable.

(In reply to Andrew McCreight [:mccr8] from comment #24)
> >  def process_single_leak_file(leakLogFileName, processType, leakThreshold,
> > +                             ignoreMissingLeaks, log=None,
> > +                             utilityPath=None, symbolsPath=None):
> 
> It would be a little nicer to just pass in a stack fixer as opposed to these
> weird path things, but okay.

I can do that.

(In reply to Andrew McCreight [:mccr8] from comment #18)
> ::: xpcom/base/nsTraceRefcnt.cpp
> @@ +591,2 @@
> >      PL_HashTableRawAdd(gSerialNumbers, hep, HashNumber(aPtr),
> >                         aPtr, reinterpret_cast<void*>(record));
> 
> Can this just be a static_cast?

Looks like virtually every reinterpret_cast in here should be static_cast.  I can fix that in this bug or in a followup.

(In reply to Andrew McCreight [:mccr8] from comment #20)
> ::: xpcom/base/nsTraceRefcnt.cpp
> @@ +124,5 @@
> >  
> >    intptr_t serialNumber;
> >    int32_t refCount;
> >    int32_t COMPtrCount;
> > +  std::vector<std::string> allocationStack;
> 
> Please add a comment that you are using this because we can't use leak
> checking data structures inside leak checking.

Yup.  I think that's why we're using PL_HashTable and not something slightly nicer.  I guess we could use std::unordered_set or something.  I thought PLDHashTable was leak-checked, but apparently not...we could try using that.

> @@ +884,5 @@
> > +RecordStackFrame(uint32_t aFrameNumber, void* aPC, void* aSP,
> > +                 void* aClosure)
> > +{
> > +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> > +  static const size_t buflen = 1024;
> 
> nit: bufLen ?

I was copying from the function above, but I can do that.

> @@ +886,5 @@
> > +{
> > +  auto locations = static_cast<std::vector<std::string>*>(aClosure);
> > +  static const size_t buflen = 1024;
> > +  char buf[buflen];
> > +  gCodeAddressService->GetLocation(aFrameNumber, aPC, buf, buflen);
> 
> I know this code doesn't require the highest possible performance, but I
> think it would be much faster to store a list of PCs, and only actually call
> into the location service to generate strings for objects that leak. I can
> be convinced this doesn't matter if you can record nsGlobalWindow
> allocations without 90% of the time being spent in the location service,
> which is what refcount logging ends up like (well, 99% but that records a
> lot more). r- for this. Otherwise this looks good.

That's a good point, I'll do that.

> @@ +922,5 @@
> > +  if (!gCodeAddressService) {
> > +    gCodeAddressService = new WalkTheStackCodeAddressService();
> > +  }
> > +  static const int kFramesToSkip =
> > +    0 +                         // this frame gets inlined
> 
> Is there some reason these comments aren't over more to the left? Seems a
> little weird.

That's where' emacs's M-; put them; I can move them over a little, I suppose.
If you could add a little blurb about how to set this in our test suites to this page I'd appreciate it:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Refcount_tracing_and_balancing
Now with more efficiency.
Attachment #8652929 - Attachment is obsolete: true
Attachment #8653049 - Flags: review?(continuation)
Attachment #8652930 - Attachment is obsolete: true
Attachment #8653050 - Flags: review?(continuation)
Comment on attachment 8653049 [details] [diff] [review]
part 4 - record allocation stacks for classes in XPCOM_MEM_LOG_CLASSES

Review of attachment 8653049 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.

::: xpcom/base/nsTraceRefcnt.cpp
@@ +918,5 @@
> +static void
> +WalkTheStackSavingLocations(std::vector<void*>& aLocations)
> +{
> +#ifdef MOZ_STACKWALKING
> +  if (!gCodeAddressService) {

nit: RecordStackFrame doesn't use the address service any more, so you don't have to initialize it here. That should get pushed into DumpSerialNumbers in the next patch.
Attachment #8653049 - Flags: review?(continuation) → review+
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #25)
> Looks like virtually every reinterpret_cast in here should be static_cast. 
> I can fix that in this bug or in a followup.
A followup would be fine. I just prefer to have non-scary casts be static_casts, so when I do see a reinterpret_cast I know to be more cautious.
Attachment #8653050 - Flags: review?(continuation) → review+
Thanks for working on this, Benoit and Nathan. It will be nice to have a little light LSan-ish thing that works on all of our platforms.
Comment on attachment 8650112 [details]
MozReview Request: Bug 1196430 - Add MozLeakHelper to get allocation site for leaks. r=froydnj

Canceling this review, then.
Attachment #8650112 - Flags: review?(nfroyd)
Attachment #8653051 - Flags: review?(continuation) → review+
Looks nice, thanks for doing a better version :nfroyd!

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
>  You do need to set
> XPCOM_MEM_LOG_CLASSES=DeliberatelyLeaked, but the nice thing about this
> patch series is that we can modify the per-platform test options to track
> different sets of objects across tests and platforms.

To bike-shed a bit. I hope that's well documented. We have so many platforms, and sub test configuration, often with their own obscure differences and exception, that from my experience I often end up activating a debug option in the wrong place to later notice that on that test' platform/configuration we don't run that for some reason and have to do another push. But if it's relatively simple that's fine.

(For instance how b2g mac desktop builds have random options from both b2g and mac set)

That's what I liked about just adding a member. I'm going to need to make a tree edit anyways and if I want to restrict to a particular platform I can #ifdef it.
Assignee: bgirard → nfroyd
Just to confirm, we can get stacks on try:

08:37:12     INFO -  Serial Numbers of Leaked Objects:
08:37:12     INFO -  4363 @0x10bc151f0 (1 references; 0 from COMPtrs)
08:37:12     INFO -  allocation stack:
08:37:45     INFO -  #00: mozilla::AppleVDADecoder::OutputFrame(mozilla::CFRefPtr<__CVBuffer*>, mozilla::AppleVDADecoder::AppleFrameRef) [mfbt/nsRefPtr.h:285]
08:37:45     INFO -  #01: nsRunnableMethodImpl<nsresult (mozilla::AppleVDADecoder::*)(mozilla::CFRefPtr<__CVBuffer *>, mozilla::AppleVDADecoder::AppleFrameRef), true, mozilla::CFRefPtr<__CVBuffer *>, mozilla::AppleVDADecoder::AppleFrameRef>::Run [dom/media/platforms/apple/AppleUtils.h:82]
08:37:45     INFO -  #02: mozilla::TaskQueue::Runner::Run() [xpcom/threads/TaskQueue.h:135]
08:37:45     INFO -  #03: nsThreadPool::Run() [xpcom/glue/nsCOMPtr.h:403]
08:37:45     INFO -  #04: _ZThn8_N12nsThreadPool3RunEv [/var/folders/vs/t_n93bks6pd2b2mjk9q8pmdr00000w/T/tmpdywaKJ.cpp:242]
08:37:45     INFO -  #05: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:864]
08:37:45     INFO -  #06: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/glue/nsThreadUtils.cpp:277]
08:37:45     INFO -  #07: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:355]
08:37:45     INFO -  #08: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:235]
08:37:45     INFO -  #09: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:520]
08:37:45     INFO -  #10: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:361]
08:37:46     INFO -  #11: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:215]
08:37:46     INFO -  #12: libSystem.B.dylib + 0x39fd6
08:37:46     INFO -  TEST-INFO | leakcheck | default process: leaked 1 MacIOSurface (40 bytes)

And this was the patch:

https://hg.mozilla.org/try/diff/b7a43d0d2700/testing/config/mozharness/mac_config.py
OK, after looking through logs and tarballs, I think I understand what's going wrong with the b2g mochitests and this patch series.

The "common" test zip for mochitests gets unpacked at /builds/slave/test/build/tests.  Let's call this $ROOT from now on.  Unpacking this creates the files we're interested in, such as:

bin/fix_linux_stack.py
bin/fix_stack_using_bpsyms.py
bin/xpcshell

The problem is that the common package is completed unused by the emulator tests.  xpcshell (for mochitests) is obtained by downloading packages from tooltool (!) rather than using the xpcshell we just built.  So runtestsb2g.py never gets informed that there are interesting Python packages available at $ROOT/bin and we can't import them.

What's the right way to fix this state of affairs?  I guess one could upload new packages to tooltool with the fix*.py files?  It looks like I ought to be able to add a --utility-path option here:

http://mxr.mozilla.org/mozilla-central/source/testing/config/mozharness/b2g_emulator_config.py#63

but given that b2g configs eschew --utility-path, it's not clear to me that utility_path would actually be what I want there, and I can't tell what it would be from the logs.

Are there other options that don't descend into a morass of buildbot configury?
Flags: needinfo?(wlachance)
I'm not the right person to ask about this. :ahal, can you help ^^^?
Flags: needinfo?(wlachance) → needinfo?(ahalberstadt)
Yeah, it has to download the binaries from tooltool because the ones that just got built are for ARM and won't run on the test slaves.

Assuming this MozLeakHelper thing doesn't have the same problem, I don't think there's any particular reason that you couldn't use --utility-path. I glanced at the mochitest harness, and it looks like passing it in shouldn't interfere with any b2g things (as long as --xre-path is also passed in).

Failing that, you could either copy it to somewhere the harness knows about in the mozharness script, or create a new command line option to find it (would prefer to avoid this though). Either way, this should be solvable in mozharness only and shouldn't need modifications to buildbot-configs.
Flags: needinfo?(ahalberstadt)
(In reply to Andrew Halberstadt [:ahal] from comment #38)
> Yeah, it has to download the binaries from tooltool because the ones that
> just got built are for ARM and won't run on the test slaves.

...  I guess tests go faster if we have a native xpcshell serving things than if we have an ARM xpcshell.

> Assuming this MozLeakHelper thing doesn't have the same problem, I don't
> think there's any particular reason that you couldn't use --utility-path. I
> glanced at the mochitest harness, and it looks like passing it in shouldn't
> interfere with any b2g things (as long as --xre-path is also passed in).

Is there some way I can verify where --utility-path is going to point to?
Flags: needinfo?(ahalberstadt)
(In reply to Nathan Froyd [:froydnj] from comment #39)
> ...  I guess tests go faster if we have a native xpcshell serving things
> than if we have an ARM xpcshell.

The way mochitest is set up the tests *need* to be served from the host, but it's probably also faster :)

> Is there some way I can verify where --utility-path is going to point to?

You could add a print statement to the test harness I guess. But you'll be building --utility-path manually in the mozharness script. What you call $ROOT in comment 36, is called 'abs_test_install_dir' in the mozharness script:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/b2g_emulator_unittest.py#157

You can use that as a starting point for building utility_path. Grep for 'xre_path' in that file to see where to set it. Then you'll be able to add --utility-path=%(utility_path)s to the config file:
https://dxr.mozilla.org/mozilla-central/rev/5fe9ed3edd6811a662d40d05e37b0d66e9520d82/testing/mozharness/configs/b2g/emulator_automation_config.py
Flags: needinfo?(ahalberstadt)
Depends on: 1202716
Depends on: 1206203
No longer depends on: 1206203
You need to log in before you can comment on or make changes to this bug.