Closed
Bug 988041
Opened 11 years ago
Closed 10 years ago
Enable LeakSanitizer on TBPL Mochitests
Categories
(Testing :: General, defect)
Testing
General
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla33
People
(Reporter: mccr8, Assigned: mccr8)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [MemShrink:P1])
Attachments
(2 files, 4 obsolete files)
18.11 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
989 bytes,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink]
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 1•11 years ago
|
||
Are you planning to enable this in the ASan job or are you aiming at having a separate job for this?
Also, your experiments currently use the newer Clang from bug 957865, right?
Assignee | ||
Comment 2•11 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #1)
> Are you planning to enable this in the ASan job or are you aiming at having
> a separate job for this?
I was going to enable it for the ASan job. Enabling LSan doesn't affect the build, and supposedly the only additional cost is some analysis after the run is over, so I think that makes the most sense.
> Also, your experiments currently use the newer Clang from bug 957865, right?
I'm using whatever is on my local machine, plus whatever is on TBPL right now with a default ASAN push.
Assignee | ||
Comment 3•11 years ago
|
||
Here's what I have so far. I still haven't gotten the suppressions file set up properly for CJR Cpp or X, but it seems to be working for Mochitests.
Mochitests are still orange because I haven't given up on figuring out how to get better stacks for these clipped stacks that call into system libraries.
Assignee | ||
Comment 4•11 years ago
|
||
Attachment #8396821 -
Attachment is obsolete: true
Assignee | ||
Comment 5•11 years ago
|
||
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP
Hi Ted, in this patch I'm trying to copy a small text file, lsan_suppressions.txt, over somewhere and then refer to it in the environment I set up when I run mochitests. I had this working at some point, but I guess something changed (and my attempt to fix it didn't work). What's the right way to generate thet path so I can put it in the environment? My current attempt to do this is at the ??? line, but it isn't working. Any other feedback is welcome, of course.
Attachment #8425169 -
Flags: feedback?(ted)
Assignee | ||
Comment 6•11 years ago
|
||
Try run: https://tbpl.mozilla.org/?tree=Try&rev=bf0efdb0c1b6
These are orange mostly because I failed to set up the suppressions list properly. I'll do a new run once that is done. I think the orange Bo is because I still have stuff to enable LSAN for XPCShell, but the suppressions list is set up even less well, so there's just total failure. I'll have to remove some of the other patches to see how that works.
Assignee | ||
Comment 7•11 years ago
|
||
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP
Maybe you can help me with my question from comment 5, Joel? Thanks.
Attachment #8425169 -
Flags: feedback?(ted) → feedback?(jmaher)
Comment 8•11 years ago
|
||
Comment on attachment 8425169 [details] [diff] [review]
Enable LSAN for Mochitests. WIP
Review of attachment 8425169 [details] [diff] [review]:
-----------------------------------------------------------------
a drive by- nothing in here is scary or too far away from r+
::: build/automationutils.py
@@ +528,5 @@
> +
> + asanOptions.append('detect_leaks=1')
> +
> + # ???
> + suppressionsFile = os.path.abspath("_tests/testing/mochitest/lsan_suppressions.txt")
on tbpl we don't have the _tests/testing/mochitest folder, instead we want lsan_suppressions.txt to be referenced from the current directory. I suspect you can just reference it as a local file, but to be safe I would do:
suppressionsFile = os.path.join(SCRIPT_DIR, 'lsan_suppressions.txt')
* SCRIPT_DIR is defined in runtests.py
question, is lsan_suppressions.txt added the the tests.zip?
@@ +539,5 @@
> + env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> +
> + if len(asanOptions):
> + env['ASAN_OPTIONS'] = ' '.join(asanOptions)
> + #env['ASAN_OPTIONS'] = ':'.join(asanOptions)
extra comment in here
Attachment #8425169 -
Flags: feedback?(jmaher) → feedback+
Assignee | ||
Updated•11 years ago
|
Alias: LSAN-TBPL
Assignee | ||
Comment 9•11 years ago
|
||
Thanks!
> question, is lsan_suppressions.txt added the the tests.zip?
I added it to SERV_FILES in the Makefile. I don't know if that's right or not.
Assignee | ||
Updated•11 years ago
|
Alias: LSAN-TBPL
Summary: Enable LeakSanitizer on TBPL → Enable LeakSanitizer on TBPL Mochitests
Assignee | ||
Comment 10•11 years ago
|
||
The suppression list isn't going to work without the symbolizer working.
Depends on: 1020590
Assignee | ||
Comment 11•11 years ago
|
||
One odd thing I noticed is that running an individual mochitest directory doesn't produce the LSAN report, whereas if I use --total-chunks=5 --this-chunk=1 then it will. I'm not sure why that is. The stuff the logs print near the end looks mostly the same, aside from the LSAN report. I'm pretty sure this used to work.
Assignee | ||
Comment 12•11 years ago
|
||
Another thing I should fix is to make TBPL failures show the top "interesting" frame of each leak. The current thing where it shows the number of bytes is useless. This will require modifying the Python test harness to parse an LSAN report, filter out boring top frames like calloc and malloc, and output the leak in some kind of format that TBPL understands, which will probably require modifying TBPL.
Comment 13•10 years ago
|
||
You might be interested in build/valgrind/output_handler.py, which is a Valgrind output parser that's used from build/valgrind/mach_commands.py. It prints the details of each error (error type and function names from the first four entries in the stack trace) in a single line.
Note particularly the "it may break" comment, which describes how I added some protection against any future changes in Valgrind's output that would break the parser.
Assignee | ||
Comment 14•10 years ago
|
||
Thanks, I'll have to look at that. I tried to make it so that it would break the tree when the format changes so the parser isn't working, but I think I can improve that some more.
Here's the prototype of the report parser:
https://hg.mozilla.org/try/rev/6975c9b79544
Right now I just had it print out the first frame, as that was already pretty large, but I guess it is a good idea to at least have a couple. I also have a skip list that ignores various malloc frames. I need to add some more to that.
Here's the latest run:
https://tbpl.mozilla.org/?tree=Try&rev=25c047731f23
I need to investigate the two test sets with PORT_ZAlloc_Util leaks (bc1 and bc3), but over all I think it is getting close.
Comment 15•10 years ago
|
||
I use this regexp:
> r'^==\d+==.*0x[A-Z0-9]+: ([A-Za-z0-9_:\?]+)'
To extract just the function name (without arguments and other trailing junk). That shortens it a lot. If it fails to match, I fall back to '?!?'.
I also don't bother skipping names like |malloc| because four frames of context has been enough to distinguish things sufficiently thus far. But skipping things probably won't hurt.
Assignee | ||
Comment 16•10 years ago
|
||
Here's some sample output from my revised version:
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at /usr/lib/libGL.so.1
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Nursery::reallocateSlots, JSObject::growSlots, JSObject::updateSlotsForSpan, JSObject::setLastProperty
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::NativeIterator::allocateIterator, js::GlobalObject::initIteratorClasses, js_InitIteratorClasses, js::GlobalObject::resolveConstructor
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at /usr/lib/libresolv.so.2
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Shape::entryCount, js::Shape::hashify, js::ObjectImpl::toDictionaryMode, js::ObjectImpl::replaceWithNewEquivalentShape
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Nursery::allocateSlots, AllocateSlots, JSObject::growSlots, JSObject::updateSlotsForSpan
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at CreateFunctionPrototype, js::GlobalObject::resolveConstructor, js::GlobalObject::ensureConstructor, CreateObjectConstructor
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::gc::AllocateObject, js::NewGCObject, JSObject::create, NewObject
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::Shape::entryCount, js::Shape::hashify, js::ObjectImpl::toDictionaryMode, JSObject::addPropertyInternal
0:14.05 TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js::HashSet, HashChildren, js::PropertyTree::insertChild, js::PropertyTree::getChild
I couldn't quite use your regexp, because there's some stack frames that have a return type. Instead, I just look for the first ( or < and take everything before that, then from that, take everything after the last space, if any.
Comment 17•10 years ago
|
||
> Here's some sample output from my revised version:
Looks good!
Assignee | ||
Comment 18•10 years ago
|
||
I think this is almost ready for review.
Attachment #8425169 -
Attachment is obsolete: true
Assignee | ||
Comment 19•10 years ago
|
||
Unfortunately I can't enable CLEANUP_MEMORY because that ends up with a shutdown assert on TBPL. Instead, I added libfontconfig.so to the suppression list.
No longer depends on: 987918
Assignee | ||
Comment 20•10 years ago
|
||
try run: https://tbpl.mozilla.org/?tree=Try&rev=9958f982be3e
I don't know if anybody else should review this.
Attachment #8440147 -
Flags: review?(jmaher)
Assignee | ||
Updated•10 years ago
|
Attachment #8439484 -
Attachment is obsolete: true
Comment 21•10 years ago
|
||
Comment on attachment 8440147 [details] [diff] [review]
Enable LeakSanitizer for Mochitests.
Review of attachment 8440147 [details] [diff] [review]:
-----------------------------------------------------------------
a lot of questions or simple things. Please ensure this works on android or we remove it from running on android. I guess on that same note, ensuring this doesn't break b2g scripts as well.
::: build/automationutils.py
@@ +447,5 @@
> Works only on unix-like platforms where `free` is in the path.
> """
> return int(os.popen("free").readlines()[1].split()[1])
>
> +def environment(xrePath, env=None, crashreporter=True, debugger=False, dmdPath=None, lsanPath=None):
we still use build/automation.py.in for android (sadly so), so please modify the function signature there as well.
@@ +529,5 @@
> + log.info("LSan enabled.")
> +
> + asanOptions.append('detect_leaks=1')
> +
> + suppressionsFile = os.path.join(lsanPath, 'lsan_suppressions.txt')
from what I see lsan_suppressions.txt will be in the mochitest/ directory with all the other scripts (i.e. runtests.py).
@@ +537,5 @@
> + env["LSAN_OPTIONS"] = "exitcode=0:suppressions=" + suppressionsFile
> + else:
> + log.info("WARNING | runtests.py | LSan suppressions file does not exist! " + suppressionsFile)
> +
> + env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
do we still want to set this env variable without the suppressionsFile?
@@ +540,5 @@
> +
> + env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> +
> + if len(asanOptions):
> + env['ASAN_OPTIONS'] = ':'.join(asanOptions)
so this will be set in two cases:
* <4GB RAM
* we defined a lsanPath (irregardless if we find the supressions file)
@@ +706,5 @@
> + def __init__(self, logger):
> + self.logger = logger
> + self.inReport = False
> + self.directLeak = False
> + self.indirectLeak = False
we never use directLeak or indirectLeak?
@@ +718,5 @@
> + "calloc", "js_calloc", "calloc_", "__interceptor_calloc", "moz_calloc", "moz_xcalloc",
> + "realloc","js_realloc", "realloc_", "__interceptor_realloc", "moz_realloc", "moz_xrealloc",
> + "new",
> + "js::MallocProvider",
> + ]
can you add a brief comment or link as to why we want to skip these?
@@ +730,5 @@
> + def log(self, line):
> + if re.match(self.startRegExp, line):
> + self.inReport = True
> + self.directLeak = False
> + self.indirectLeak = False
you could just return here and save the elif and extra indentation for the rest of the function.
@@ +761,5 @@
> + else:
> + m = re.match(self.sysLibStackFrameRegExp, line)
> + if m:
> + if self.recordMoreFrames:
> + self._recordFrame(m.group(1))
do we not need to ignore the skiplist here?
@@ +763,5 @@
> + if m:
> + if self.recordMoreFrames:
> + self._recordFrame(m.group(1))
> + # If we don't match either of these, just ignore the frame.
> + # We'll end up with "unknown stack" if everything is ignored.
we could simplify this code by doing something like this:
frame = None
stackframe = re.match(self.stackFrameRegExp, line)
if stackframe:
frame = m.group(1).split()[-1]
syslib = re.match(self.sysLibStackFrameRegExp, line)
if syslib:
frame = m.group(1)
if frame and self.recordMoreFrames:
if not re.match(self.skipListRegExp, frame):
self._recordFrame(frame)
::: testing/mochitest/runtests.py
@@ +1016,5 @@
>
> def buildBrowserEnv(self, options, debugger=False):
> """build the environment variables for the specific test and operating system"""
> browserEnv = self.environment(xrePath=options.xrePath, debugger=debugger,
> + dmdPath=options.dmdPath, lsanPath=SCRIPT_DIR)
we will probably trigger this in android builds:
http://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtestsremote.py#537.
One option to remove it from running on Android will be to pass in lsanPath to buildBrowserEnv, then we could set it to None in runtestsremote.py.
@@ +1224,5 @@
> shutdownLeaks = ShutdownLeaks(log.info)
> else:
> shutdownLeaks = None
>
> + if mozinfo.info["asan"] and (mozinfo.isLinux or mozinfo.isMac):
will this work on 32 and 64 bit linux?
Attachment #8440147 -
Flags: review?(jmaher) → review-
Assignee | ||
Comment 22•10 years ago
|
||
Attachment #8440147 -
Attachment is obsolete: true
Attachment #8442431 -
Flags: review?(jmaher)
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #21)
> a lot of questions or simple things. Please ensure this works on android or
> we remove it from running on android. I guess on that same note, ensuring
> this doesn't break b2g scripts as well.
Will Android or B2G match this: "mozinfo.isLinux or mozinfo.isMac". Also, we only run ASan builds on L64 on TBPL, so mozinfo.info.get("asan") should never be true for Android and B2G. Or are you concerned about people running it themselves on these platforms? Or just random other junk? But yeah, I'll do a full push to make sure it is okay. I did a non-ASan Linux push, so it isn't totally messed up without ASan.
> we still use build/automation.py.in for android (sadly so), so please modify
> the function signature there as well.
Done. I will need to do this eventually anyways to get LSan working with reftests.
> from what I see lsan_suppressions.txt will be in the mochitest/ directory
> with all the other scripts (i.e. runtests.py).
I don't know exactly how the path stuff works, but the log ends up printing this out for suppressionsFile:
LSan using suppression file /builds/slave/test/build/tests/mochitest/lsan_suppressions.txt
and it seems to find it, so I guess it works out okay? Is there something I should do differently there?
> > + env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> do we still want to set this env variable without the suppressionsFile?
Yeah, those are two separate mechanisms for avoiding false positives, more or less. Should I document what this is doing?
>
> so this will be set in two cases:
> * <4GB RAM
> * we defined a lsanPath (irregardless if we find the supressions file)
Yeah. In the low memory case we want to define this even if we're not running LSan (which is the existing behavior). If an lsanPath is defined, but we didn't find the suppressions file, we still want to run LSan, we'll just get noisier output.
I'll go ahead and define the exitcode=0 LSAN_OPTIONS thing even if the suppression file isn't found, because otherwise it will exit with an error code, which will make TBPL show additional failures, which is not very useful.
> we never use directLeak or indirectLeak?
Good point. I stopped using those at some point.
> can you add a brief comment or link as to why we want to skip these?
Sure.
> you could just return here and save the elif and extra indentation for the
> rest of the function.
Good point. I early-returnified the rest of the function.
> do we not need to ignore the skiplist here?
The skiplist won't match anything that the sysLibStackFrameRegExp matches. I added a comment to that effect.
> we could simplify this code by doing something like this:
I hoisted the recordMoreFrames check out and added an early return, but I didn't want to match frames twice, so I didn't quite change it the way you pointed out. Let me know if I should change this further.
> One option to remove it from running on Android will be to pass in lsanPath
> to buildBrowserEnv, then we could set it to None in runtestsremote.py.
I think in practice it shouldn't matter because the uses of lsanPath in environment() are guarded by a check for whether ASan is enabled, but I guess it is a bit messy to rely on that. I made it so we only set lsanPath when we're doing an ASan run.
> > + if mozinfo.info["asan"] and (mozinfo.isLinux or mozinfo.isMac):
>
> will this work on 32 and 64 bit linux?
It should. Or if it doesn't, the problems will be in ASan itself and not in the particular tweaks for LSan. :)
Assignee | ||
Comment 24•10 years ago
|
||
hopefully this full try run will not break: https://tbpl.mozilla.org/?tree=Try&rev=a3e8222d658e
Comment 25•10 years ago
|
||
> a lot of questions or simple things. Please ensure this works on android or
> we remove it from running on android. I guess on that same note, ensuring
> this doesn't break b2g scripts as well.
Will Android or B2G match this: "mozinfo.isLinux or mozinfo.isMac". Also, we only run ASan builds on L64 on TBPL, so mozinfo.info.get("asan") should never be true for Android and B2G. Or are you concerned about people running it themselves on these platforms? Or just random other junk? But yeah, I'll do a full push to make sure it is okay. I did a non-ASan Linux push, so it isn't totally messed up without ASan.
[jm] - I am more concerned with the automation we see on tbpl, your try push should suppress my worries
> from what I see lsan_suppressions.txt will be in the mochitest/ directory
> with all the other scripts (i.e. runtests.py).
I don't know exactly how the path stuff works, but the log ends up printing this out for suppressionsFile:
LSan using suppression file /builds/slave/test/build/tests/mochitest/lsan_suppressions.txt
and it seems to find it, so I guess it works out okay? Is there something I should do differently there?
[jm] - this was a comment that I should have not made so ambiguous. There is nothing else to do here!
> > + env['MOZ_CC_RUN_DURING_SHUTDOWN'] = '1'
> do we still want to set this env variable without the suppressionsFile?
Yeah, those are two separate mechanisms for avoiding false positives, more or less. Should I document what this is doing?
[jm] - I would appreciate a line or two of a comment to help future test file hackers put it all together.
Comment 26•10 years ago
|
||
Comment on attachment 8442431 [details] [diff] [review]
Enable LeakSanitizer for Mochitests.
Review of attachment 8442431 [details] [diff] [review]:
-----------------------------------------------------------------
a couple of small whitespace nits! Thanks for the update. As per my previous comment, just add another comment to help explain the environment variables as well.
::: build/automationutils.py
@@ +534,5 @@
> + asanOptions.append('detect_leaks=1')
> +
> + suppressionsFile = os.path.join(lsanPath, 'lsan_suppressions.txt')
> +
> + lsanOptions = ["exitcode=0"]
nit: remove all the extra blank lines here
@@ +752,5 @@
> +
> + # Only report direct leaks, in the hope that they are less flaky.
> + self.recordMoreFrames = False
> +
> + return
nit: extra blank lines in the above if block.
Attachment #8442431 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 27•10 years ago
|
||
Thanks!
So, it turns out this actually does burn Android Mochitests. (B2G looks okay so far.)
16:03:45 INFO - Traceback (most recent call last):
16:03:45 INFO - File "/builds/panda-0153/test/build/tests/mochitest/runtestsremote.py", line 743, in main
16:03:45 INFO - retVal = mochitest.runTests(options)
16:03:45 INFO - File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1356, in runTests
16:03:45 INFO - return self.doTests(options, onLaunch)
16:03:45 INFO - File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1419, in doTests
16:03:45 INFO - self.browserEnv = self.buildBrowserEnv(options, debuggerInfo is not None)
16:03:45 INFO - File "/builds/panda-0153/test/build/tests/mochitest/runtestsremote.py", line 538, in buildBrowserEnv
16:03:45 INFO - browserEnv = Mochitest.buildBrowserEnv(self, options, debugger=debugger)
16:03:45 INFO - File "/builds/panda-0153/test/build/tests/mochitest/runtests.py", line 1034, in buildBrowserEnv
16:03:45 INFO - dmdPath=options.dmdPath, lsanPath=lsanPath)
16:03:45 INFO - TypeError: environment() got an unexpected keyword argument 'lsanPath'
I'm not sure why that is happening. I added an lsanPath argument to automation.py.in as you suggested.
Assignee | ||
Comment 28•10 years ago
|
||
Do you have a guess what might be happening?
Flags: needinfo?(jmaher)
Comment 29•10 years ago
|
||
so we define environment() in 4 places:
http://dxr.mozilla.org/mozilla-central/search?q=%22def+environment%28%22&case=false
I suspect we didn't get the mobile/remoteautomation.py case.
Flags: needinfo?(jmaher)
Assignee | ||
Comment 30•10 years ago
|
||
Thanks, I'll try that! I attempted to search for all the definitions of environment() after it failed, but I managed to miss that one.
Assignee | ||
Comment 31•10 years ago
|
||
L64 debug run: https://tbpl.mozilla.org/?tree=Try&rev=102e839bf6d0
Android opt run: https://tbpl.mozilla.org/?tree=Try&rev=2aa8dde05cb6
LSan run: https://tbpl.mozilla.org/?tree=Try&rev=38cd9d374e86
Thanks for all of your help, Joel!
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d11be2b1e82
Comment 32•10 years ago
|
||
\o/
Assignee | ||
Comment 33•10 years ago
|
||
I posted a small guide to LSan on dev.platform, which I'll eventually wikify:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/pfSEIbEUSNM
Assignee | ||
Comment 34•10 years ago
|
||
I added a new suppression to cover some leaks that came in on the merge from m-c. I filed bug 1028456 for that.
https://hg.mozilla.org/integration/mozilla-inbound/rev/30c49f230867
Comment 35•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/9d11be2b1e82
\m/
https://hg.mozilla.org/mozilla-central/rev/30c49f230867
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Assignee | ||
Comment 36•10 years ago
|
||
I'm not sure how bug filing should work in the long run for new suppressions, but for now I'll just pile them into this bug.
The XML parser leaks a single object a few times a day. Leaving it orange isn't really helpful.
Attachment #8444527 -
Flags: review?(khuey)
Attachment #8444527 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 37•10 years ago
|
||
Landed with DONTBUILD. try run: https://tbpl.mozilla.org/?tree=Try&rev=5d2aba8e2226
https://hg.mozilla.org/integration/mozilla-inbound/rev/271856d75e2f
Assignee | ||
Updated•10 years ago
|
Comment 38•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•