Enable LeakSanitizer on TBPL Mochitests

RESOLVED FIXED in mozilla33

Status

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: mccr8, Assigned: mccr8)

Tracking

(Blocks 1 bug)

unspecified
mozilla33
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P1], )

Attachments

(2 attachments, 4 obsolete attachments)

No description provided.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
Are you planning to enable this in the ASan job or are you aiming at having a separate job for this?

Also, your experiments currently use the newer Clang from bug 957865, right?
(In reply to Christian Holler (:decoder) from comment #1)
> Are you planning to enable this in the ASan job or are you aiming at having
> a separate job for this?
I was going to enable it for the ASan job.  Enabling LSan doesn't affect the build, and supposedly the only additional cost is some analysis after the run is over, so I think that makes the most sense.

> Also, your experiments currently use the newer Clang from bug 957865, right?

I'm using whatever is on my local machine, plus whatever is on TBPL right now with a default ASAN push.
Here's what I have so far. I still haven't gotten the suppressions file set up properly for CJR Cpp or X, but it seems to be working for Mochitests.

Mochitests are still orange because I haven't given up on figuring out how to get better stacks for these clipped stacks that call into system libraries.
Attachment #8396821 - Attachment is obsolete: true
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP

Hi Ted, in this patch I'm trying to copy a small text file, lsan_suppressions.txt, over somewhere and then refer to it in the environment I set up when I run mochitests.  I had this working at some point, but I guess something changed (and my attempt to fix it didn't work).  What's the right way to generate thet path so I can put it in the environment?  My current attempt to do this is at the ??? line, but it isn't working.  Any other feedback is welcome, of course.
Attachment #8425169 - Flags: feedback?(ted)
Try run: https://tbpl.mozilla.org/?tree=Try&rev=bf0efdb0c1b6

These are orange mostly because I failed to set up the suppressions list properly.  I'll do a new run once that is done.  I think the orange Bo is because I still have stuff to enable LSAN for XPCShell, but the suppressions list is set up even less well, so there's just total failure.  I'll have to remove some of the other patches to see how that works.
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP

Maybe you can help me with my question from comment 5, Joel?  Thanks.
Attachment #8425169 - Flags: feedback?(ted) → feedback?(jmaher)
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP

Review of attachment 8425169 [details] [diff] [review]:
-----------------------------------------------------------------

a drive by- nothing in here is scary or too far away from r+

::: build/automationutils.py
@@ +528,5 @@
> +
> +        asanOptions.append('detect_leaks=1')
> +
> +        # ???
> +        suppressionsFile = os.path.abspath("_tests/testing/mochitest/lsan_suppressions.txt")

on tbpl we don't have the _tests/testing/mochitest folder, instead we want lsan_suppressions.txt to be referenced from the current directory.  I suspect you can just reference it as a local file, but to be safe I would do:
suppressionsFile = os.path.join(SCRIPT_DIR, 'lsan_suppressions.txt')
* SCRIPT_DIR is defined in runtests.py


question, is lsan_suppressions.txt added the the tests.zip?

@@ +539,5 @@
> +        env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> +
> +      if len(asanOptions):
> +        env['ASAN_OPTIONS'] = ' '.join(asanOptions)
> +        #env['ASAN_OPTIONS'] = ':'.join(asanOptions)

extra comment in here
Attachment #8425169 - Flags: feedback?(jmaher) → feedback+
Alias: LSAN-TBPL
Thanks!

> question, is lsan_suppressions.txt added the the tests.zip?
I added it to SERV_FILES in the Makefile.  I don't know if that's right or not.
Alias: LSAN-TBPL
Summary: Enable LeakSanitizer on TBPL → Enable LeakSanitizer on TBPL Mochitests
Depends on: 987918
The suppression list isn't going to work without the symbolizer working.
Depends on: 1020590
One odd thing I noticed is that running an individual mochitest directory doesn't produce the LSAN report, whereas if I use --total-chunks=5 --this-chunk=1 then it will.  I'm not sure why that is.  The stuff the logs print near the end looks mostly the same, aside from the LSAN report.  I'm pretty sure this used to work.
Another thing I should fix is to make TBPL failures show the top "interesting" frame of each leak.  The current thing where it shows the number of bytes is useless.  This will require modifying the Python test harness to parse an LSAN report, filter out boring top frames like calloc and malloc, and output the leak in some kind of format that TBPL understands, which will probably require modifying TBPL.
You might be interested in build/valgrind/output_handler.py, which is a Valgrind output parser that's used from build/valgrind/mach_commands.py. It prints the details of each error (error type and function names from the first four entries in the stack trace) in a single line.

Note particularly the "it may break" comment, which describes how I added some protection against any future changes in Valgrind's output that would break the parser.
Thanks, I'll have to look at that.  I tried to make it so that it would break the tree when the format changes so the parser isn't working, but I think I can improve that some more.

Here's the prototype of the report parser:
  https://hg.mozilla.org/try/rev/6975c9b79544

Right now I just had it print out the first frame, as that was already pretty large, but I guess it is a good idea to at least have a couple.  I also have a skip list that ignores various malloc frames.  I need to add some more to that.

Here's the latest run:
  https://tbpl.mozilla.org/?tree=Try&rev=25c047731f23

I need to investigate the two test sets with PORT_ZAlloc_Util leaks (bc1 and bc3), but over all I think it is getting close.
I use this regexp:

> r'^==\d+==.*0x[A-Z0-9]+: ([A-Za-z0-9_:\?]+)'

To extract just the function name (without arguments and other trailing junk). That shortens it a lot. If it fails to match, I fall back to '?!?'.

I also don't bother skipping names like |malloc| because four frames of context has been enough to distinguish things sufficiently thus far. But skipping things probably won't hurt.
Here's some sample output from my revised version:

 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at /usr/lib/libGL.so.1
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Nursery::reallocateSlots, JSObject::growSlots, JSObject::updateSlotsForSpan, JSObject::setLastProperty
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::NativeIterator::allocateIterator, js::GlobalObject::initIteratorClasses, js_InitIteratorClasses, js::GlobalObject::resolveConstructor
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at /usr/lib/libresolv.so.2
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Shape::entryCount, js::Shape::hashify, js::ObjectImpl::toDictionaryMode, js::ObjectImpl::replaceWithNewEquivalentShape
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Nursery::allocateSlots, AllocateSlots, JSObject::growSlots, JSObject::updateSlotsForSpan
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at CreateFunctionPrototype, js::GlobalObject::resolveConstructor, js::GlobalObject::ensureConstructor, CreateObjectConstructor
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::gc::AllocateObject, js::NewGCObject, JSObject::create, NewObject
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Shape::entryCount, js::Shape::hashify, js::ObjectImpl::toDictionaryMode, JSObject::addPropertyInternal
 0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::HashSet, HashChildren, js::PropertyTree::insertChild, js::PropertyTree::getChild


I couldn't quite use your regexp, because there's some stack frames that have a return type.  Instead, I just look for the first ( or < and take everything before that, then from that, take everything after the last space, if any.
> Here's some sample output from my revised version:

Looks good!
Posted patch Enable LSAN for Mochitests. (obsolete) — Splinter Review
I think this is almost ready for review.
Attachment #8425169 - Attachment is obsolete: true
Unfortunately I can't enable CLEANUP_MEMORY because that ends up with a shutdown assert on TBPL.  Instead, I added libfontconfig.so to the suppression list.
No longer depends on: 987918
try run: https://tbpl.mozilla.org/?tree=Try&rev=9958f982be3e

I don't know if anybody else should review this.
Attachment #8440147 - Flags: review?(jmaher)
Attachment #8439484 - Attachment is obsolete: true
Comment on attachment 8440147 [details] [diff] [review]
Enable LeakSanitizer for Mochitests.

Review of attachment 8440147 [details] [diff] [review]:
-----------------------------------------------------------------

a lot of questions or simple things.  Please ensure this works on android or we remove it from running on android.  I guess on that same note, ensuring this doesn't break b2g scripts as well.

::: build/automationutils.py
@@ +447,5 @@
>    Works only on unix-like platforms where `free` is in the path.
>    """
>    return int(os.popen("free").readlines()[1].split()[1])
>  
> +def environment(xrePath, env=None, crashreporter=True, debugger=False, dmdPath=None, lsanPath=None):

we still use build/automation.py.in for android (sadly so), so please modify the function signature there as well.

@@ +529,5 @@
> +        log.info("LSan enabled.")
> +
> +        asanOptions.append('detect_leaks=1')
> +
> +        suppressionsFile = os.path.join(lsanPath, 'lsan_suppressions.txt')

from what I see lsan_suppressions.txt will be in the mochitest/ directory with all the other scripts (i.e. runtests.py).

@@ +537,5 @@
> +          env["LSAN_OPTIONS"] = "exitcode=0:suppressions=" + suppressionsFile
> +        else:
> +          log.info("WARNING | runtests.py | LSan suppressions file does not exist! " + suppressionsFile)
> +
> +        env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'

do we still want to set this env variable without the suppressionsFile?

@@ +540,5 @@
> +
> +        env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> +
> +      if len(asanOptions):
> +        env['ASAN_OPTIONS'] = ':'.join(asanOptions)

so this will be set in two cases:
* <4GB RAM
* we defined a lsanPath (irregardless if we find the supressions file)

@@ +706,5 @@
> +  def __init__(self, logger):
> +    self.logger = logger
> +    self.inReport = False
> +    self.directLeak = False
> +    self.indirectLeak = False

we never use directLeak or indirectLeak?

@@ +718,5 @@
> +      "calloc", "js_calloc", "calloc_", "__interceptor_calloc", "moz_calloc", "moz_xcalloc",
> +      "realloc","js_realloc", "realloc_", "__interceptor_realloc", "moz_realloc", "moz_xrealloc",
> +      "new",
> +      "js::MallocProvider",
> +    ]

can you add a brief comment or link as to why we want to skip these?

@@ +730,5 @@
> +  def log(self, line):
> +    if re.match(self.startRegExp, line):
> +      self.inReport = True
> +      self.directLeak = False
> +      self.indirectLeak = False

you could just return here and save the elif and extra indentation for the rest of the function.

@@ +761,5 @@
> +        else:
> +          m = re.match(self.sysLibStackFrameRegExp, line)
> +          if m:
> +            if self.recordMoreFrames:
> +              self._recordFrame(m.group(1))

do we not need to ignore the skiplist here?

@@ +763,5 @@
> +          if m:
> +            if self.recordMoreFrames:
> +              self._recordFrame(m.group(1))
> +          # If we don't match either of these, just ignore the frame.
> +          # We'll end up with "unknown stack" if everything is ignored.

we could simplify this code by doing something like this:
frame = None
stackframe = re.match(self.stackFrameRegExp, line)
if stackframe:
  frame = m.group(1).split()[-1]
syslib = re.match(self.sysLibStackFrameRegExp, line)
if syslib:
  frame = m.group(1)

if frame and self.recordMoreFrames:
  if not re.match(self.skipListRegExp, frame):
    self._recordFrame(frame)

::: testing/mochitest/runtests.py
@@ +1016,5 @@
>  
>    def buildBrowserEnv(self, options, debugger=False):
>      """build the environment variables for the specific test and operating system"""
>      browserEnv = self.environment(xrePath=options.xrePath, debugger=debugger,
> +                                  dmdPath=options.dmdPath, lsanPath=SCRIPT_DIR)

we will probably trigger this in android builds:
http://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtestsremote.py#537.

One option to remove it from running on Android will be to pass in lsanPath to buildBrowserEnv, then we could set it to None in runtestsremote.py.

@@ +1224,5 @@
>          shutdownLeaks = ShutdownLeaks(log.info)
>        else:
>          shutdownLeaks = None
>  
> +      if mozinfo.info["asan"] and (mozinfo.isLinux or mozinfo.isMac):

will this work on 32 and 64 bit linux?
Attachment #8440147 - Flags: review?(jmaher) → review-
Attachment #8440147 - Attachment is obsolete: true
Attachment #8442431 - Flags: review?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #21)
> a lot of questions or simple things.  Please ensure this works on android or
> we remove it from running on android.  I guess on that same note, ensuring
> this doesn't break b2g scripts as well.

Will Android or B2G match this: "mozinfo.isLinux or mozinfo.isMac".  Also, we only run ASan builds on L64 on TBPL, so mozinfo.info.get("asan") should never be true for Android and B2G.  Or are you concerned about people running it themselves on these platforms?  Or just random other junk?  But yeah, I'll do a full push to make sure it is okay.  I did a non-ASan Linux push, so it isn't totally messed up without ASan.

> we still use build/automation.py.in for android (sadly so), so please modify
> the function signature there as well.
Done.  I will need to do this eventually anyways to get LSan working with reftests.

> from what I see lsan_suppressions.txt will be in the mochitest/ directory
> with all the other scripts (i.e. runtests.py).

I don't know exactly how the path stuff works, but the log ends up printing this out for suppressionsFile:
  LSan using suppression file /builds/slave/test/build/tests/mochitest/lsan_suppressions.txt
and it seems to find it, so I guess it works out okay?  Is there something I should do differently there?

> > +        env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> do we still want to set this env variable without the suppressionsFile?
Yeah, those are two separate mechanisms for avoiding false positives, more or less.  Should I document what this is doing?

>
> so this will be set in two cases:
> * <4GB RAM
> * we defined a lsanPath (irregardless if we find the supressions file)
Yeah.  In the low memory case we want to define this even if we're not running LSan (which is the existing behavior).  If an lsanPath is defined, but we didn't find the suppressions file, we still want to run LSan, we'll just get noisier output.

I'll go ahead and define the exitcode=0 LSAN_OPTIONS thing even if the suppression file isn't found, because otherwise it will exit with an error code, which will make TBPL show additional failures, which is not very useful.

> we never use directLeak or indirectLeak?
Good point.  I stopped using those at some point.

> can you add a brief comment or link as to why we want to skip these?
Sure.

> you could just return here and save the elif and extra indentation for the
> rest of the function.
Good point.  I early-returnified the rest of the function.

> do we not need to ignore the skiplist here?
The skiplist won't match anything that the sysLibStackFrameRegExp matches.  I added a comment to that effect.
> we could simplify this code by doing something like this:
I hoisted the recordMoreFrames check out and added an early return, but I didn't want to match frames twice, so I didn't quite change it the way you pointed out.  Let me know if I should change this further.

> One option to remove it from running on Android will be to pass in lsanPath
> to buildBrowserEnv, then we could set it to None in runtestsremote.py.

I think in practice it shouldn't matter because the uses of lsanPath in environment() are guarded by a check for whether ASan is enabled, but I guess it is a bit messy to rely on that.  I made it so we only set lsanPath when we're doing an ASan run.

> > +      if mozinfo.info["asan"] and (mozinfo.isLinux or mozinfo.isMac):
>
> will this work on 32 and 64 bit linux?
It should.  Or if it doesn't, the problems will be in ASan itself and not in the particular tweaks for LSan. :)
hopefully this full try run will not break:  https://tbpl.mozilla.org/?tree=Try&rev=a3e8222d658e
> a lot of questions or simple things.  Please ensure this works on android or
> we remove it from running on android.  I guess on that same note, ensuring
> this doesn't break b2g scripts as well.

Will Android or B2G match this: "mozinfo.isLinux or mozinfo.isMac".  Also, we only run ASan builds on L64 on TBPL, so mozinfo.info.get("asan") should never be true for Android and B2G.  Or are you concerned about people running it themselves on these platforms?  Or just random other junk?  But yeah, I'll do a full push to make sure it is okay.  I did a non-ASan Linux push, so it isn't totally messed up without ASan.

[jm] - I am more concerned with the automation we see on tbpl, your try push should suppress my worries


> from what I see lsan_suppressions.txt will be in the mochitest/ directory
> with all the other scripts (i.e. runtests.py).

I don't know exactly how the path stuff works, but the log ends up printing this out for suppressionsFile:
  LSan using suppression file /builds/slave/test/build/tests/mochitest/lsan_suppressions.txt
and it seems to find it, so I guess it works out okay?  Is there something I should do differently there?

[jm] - this was a comment that I should have not made so ambiguous.  There is nothing else to do here!


> > +        env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> do we still want to set this env variable without the suppressionsFile?
Yeah, those are two separate mechanisms for avoiding false positives, more or less.  Should I document what this is doing?

[jm] - I would appreciate a line or two of a comment to help future test file hackers put it all together.
Comment on attachment 8442431 [details] [diff] [review]
Enable LeakSanitizer for Mochitests.

Review of attachment 8442431 [details] [diff] [review]:
-----------------------------------------------------------------

a couple of small whitespace nits!  Thanks for the update.  As per my previous comment, just add another comment to help explain the environment variables as well.

::: build/automationutils.py
@@ +534,5 @@
> +        asanOptions.append('detect_leaks=1')
> +
> +        suppressionsFile = os.path.join(lsanPath, 'lsan_suppressions.txt')
> +
> +        lsanOptions = ["exitcode=0"]

nit: remove all the extra blank lines here

@@ +752,5 @@
> +
> +      # Only report direct leaks, in the hope that they are less flaky.
> +      self.recordMoreFrames = False
> +
> +      return

nit: extra blank lines in the above if block.
Attachment #8442431 - Flags: review?(jmaher) → review+
Thanks!

So, it turns out this actually does burn Android Mochitests.  (B2G looks okay so far.)

16:03:45     INFO -  Traceback (most recent call last):
16:03:45     INFO -    File "/builds/panda-0153/test/build/tests/mochitest/runtestsremote.py", line 743, in main
16:03:45     INFO -      retVal = mochitest.runTests(options)
16:03:45     INFO -    File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1356, in runTests
16:03:45     INFO -      return self.doTests(options, onLaunch)
16:03:45     INFO -    File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1419, in doTests
16:03:45     INFO -      self.browserEnv = self.buildBrowserEnv(options, debuggerInfo is not None)
16:03:45     INFO -    File "/builds/panda-0153/test/build/tests/mochitest/runtestsremote.py", line 538, in buildBrowserEnv
16:03:45     INFO -      browserEnv = Mochitest.buildBrowserEnv(self, options, debugger=debugger)
16:03:45     INFO -    File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1034, in buildBrowserEnv
16:03:45     INFO -      dmdPath=options.dmdPath, lsanPath=lsanPath)
16:03:45     INFO -  TypeError: environment() got an unexpected keyword argument 'lsanPath'

I'm not sure why that is happening.  I added an lsanPath argument to automation.py.in as you suggested.
Do you have a guess what might be happening?
Flags: needinfo?(jmaher)
so we define environment() in 4 places:
http://dxr.mozilla.org/mozilla-central/search?q=%22def+environment%28%22&case=false

I suspect we didn't get the mobile/remoteautomation.py case.
Flags: needinfo?(jmaher)
Thanks, I'll try that!  I attempted to search for all the definitions of environment() after it failed, but I managed to miss that one.
I posted a small guide to LSan on dev.platform, which I'll eventually wikify:
  https://groups.google.com/forum/#!topic/mozilla.dev.platform/pfSEIbEUSNM
I added a new suppression to cover some leaks that came in on the merge from m-c.  I filed bug 1028456 for that.
  https://hg.mozilla.org/integration/mozilla-inbound/rev/30c49f230867
https://hg.mozilla.org/mozilla-central/rev/9d11be2b1e82

\m/
https://hg.mozilla.org/mozilla-central/rev/30c49f230867
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
I'm not sure how bug filing should work in the long run for new suppressions, but for now I'll just pile them into this bug.

The XML parser leaks a single object a few times a day. Leaving it orange isn't really helpful.
Attachment #8444527 - Flags: review?(khuey)
You need to log in before you can comment on or make changes to this bug.