LeakSanitizer detects leak of 8 bytes in (<unknown module>) when running tests on Ubuntu 16.04

RESOLVED FIXED

Status

Core
Audio/Video: Playback
P3
normal
Rank: 25
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: dminor, Assigned: karlt)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fixed with bug 1323382])

(Reporter)

Description

2 years ago
This is from a recent try run:

[task 2016-09-20T15:52:43.983897Z] 15:52:43     INFO -  ==1050==ERROR: LeakSanitizer: detected memory leaks
[task 2016-09-20T15:52:43.983947Z] 15:52:43     INFO -  Direct leak of 8 byte(s) in 1 object(s) allocated from:
[task 2016-09-20T15:52:43.984015Z] 15:52:43     INFO -      #0 0x4b247b in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:52:3
[task 2016-09-20T15:52:43.986840Z] 15:52:43     INFO -      #1 0x7fe7f6104778  (<unknown module>)

I see a leak of the same size when I run locally. This blocks being able to run ASAN tests on Ubuntu 16.04 images.
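For triaging logs like the one above, a small script can pull out each leak record and flag the ones whose stack ends in "(<unknown module>)". This is only a sketch: the regular expressions are inferred from the four log lines quoted in this bug, and the function names are hypothetical, not part of any Mozilla harness.

```python
import re

# Header line, e.g. "Direct leak of 8 byte(s) in 1 object(s) allocated from:"
LEAK_RE = re.compile(r"(Direct|Indirect) leak of (\d+) byte\(s\) in (\d+) object\(s\)")
# Frame line, e.g. "#1 0x7fe7f6104778  (<unknown module>)"
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+)\s+(.*)")


def parse_leaks(log_text):
    """Return one record per leak header, with the frames that follow it."""
    leaks = []
    current = None
    for line in log_text.splitlines():
        header = LEAK_RE.search(line)
        if header:
            current = {
                "kind": header.group(1),
                "bytes": int(header.group(2)),
                "objects": int(header.group(3)),
                "frames": [],
            }
            leaks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group(3).strip())
    return leaks


def has_unknown_module(leak):
    """True if any frame could not be attributed to a loaded module."""
    return any("<unknown module>" in f for f in leak["frames"])


# The log excerpt from the description, copied verbatim.
SAMPLE = """\
[task 2016-09-20T15:52:43.983897Z] 15:52:43     INFO -  ==1050==ERROR: LeakSanitizer: detected memory leaks
[task 2016-09-20T15:52:43.983947Z] 15:52:43     INFO -  Direct leak of 8 byte(s) in 1 object(s) allocated from:
[task 2016-09-20T15:52:43.984015Z] 15:52:43     INFO -      #0 0x4b247b in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:52:3
[task 2016-09-20T15:52:43.986840Z] 15:52:43     INFO -      #1 0x7fe7f6104778  (<unknown module>)
"""

for leak in parse_leaks(SAMPLE):
    print(leak["bytes"], "bytes, unknown module:", has_unknown_module(leak))
```

Running the same parser over a local log and a try log would make it easy to compare leak sizes between the two environments.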
(Reporter)

Updated

2 years ago
Rank: 25
(Reporter)

Comment 1

2 years ago
If I restart the browser between mochitests, it looks like the following tests expose the leak:
dom/media/tests/mochitest/test_getUserMedia_audioCapture.html
dom/media/tests/mochitest/test_peerConnection_capturedVideo.html

I also get a bunch of leaks in libX11.so on my system, but those aren't present in the try run, possibly because I run ASAN+debug locally.
(Reporter)

Updated

2 years ago
Assignee: nobody → dminor
(Reporter)

Comment 2

2 years ago
If I turn on reporting of leaked memory locations and capture an rr run with a watch on the leak location, I get the following backtrace for the last allocation to touch that memory:

#0  __memset_avx2 ()
    at ../sysdeps/x86_64/multiarch/memset-avx2.S:102
#1  0x000000000042267b in __asan::Allocator::Allocate (
    can_fill=true, alloc_type=__asan::FROM_MALLOC, stack=0x2, 
    alignment=8, size=8, this=0x56ea60 <__asan::instance>)
    at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:448
#2  __asan::asan_malloc (size=size@entry=8, 
    stack=stack@entry=0x7ffd3ea859a0)
    at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:728
#3  0x00000000004be252 in __interceptor_malloc (size=8)
    at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:53
#4  0x00007f80155b1779 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#5  0x00007f80155ba648 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#6  0x00007f80155afde2 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#7  0x00007f807fb794ea in call_init (l=<optimized out>, 
    argc=argc@entry=5, argv=argv@entry=0x7ffd3ea89a08, 
    env=env@entry=0x618000067880) at dl-init.c:72
#8  0x00007f807fb795fb in call_init (env=0x618000067880, 
    argv=0x7ffd3ea89a08, argc=5, l=<optimized out>) at dl-init.c:30
#9  _dl_init (main_map=main_map@entry=0x61a00020be80, argc=5, 
    argv=0x7ffd3ea89a08, env=0x618000067880) at dl-init.c:120
#10 0x00007f807fb7e712 in dl_open_worker (a=a@entry=0x7ffd3ea86660)
    at dl-open.c:575
#11 0x00007f807fb79394 in _dl_catch_error (
    objname=objname@entry=0x7ffd3ea86650, 
    errstring=errstring@entry=0x7ffd3ea86658, 
    mallocedp=mallocedp@entry=0x7ffd3ea8664f, 
    operate=operate@entry=0x7f807fb7e300 <dl_open_worker>, 
    args=args@entry=0x7ffd3ea86660) at dl-error.c:187
#12 0x00007f807fb7dbd9 in _dl_open (
    file=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56",
    mode=-2147483646, caller_dlopen=
    0x444da5 <__interceptor_dlopen(char const*, int)+101>, nsid=-2, 
    argc=<optimized out>, argv=<optimized out>, env=0x618000067880)
    at dl-open.c:660
#13 0x00007f807ec88f09 in dlopen_doit (a=a@entry=0x7ffd3ea86890)
    at dlopen.c:66
#14 0x00007f807fb79394 in _dl_catch_error (
    objname=0x78c610 <calloc_memory_for_dlsym+16>, 
    errstring=0x78c618 <calloc_memory_for_dlsym+24>, 
    mallocedp=0x78c608 <calloc_memory_for_dlsym+8>, 
    operate=0x7f807ec88eb0 <dlopen_doit>, args=0x7ffd3ea86890)
    at dl-error.c:187
#15 0x00007f807ec89571 in _dlerror_run (
    operate=operate@entry=0x7f807ec88eb0 <dlopen_doit>, 
    args=args@entry=0x7ffd3ea86890) at dlerror.c:163
#16 0x00007f807ec88fa1 in __dlopen (file=<optimized out>, 
    mode=<optimized out>) at dlopen.c:87
#17 0x0000000000444da5 in __interceptor_dlopen (
    filename=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56", 
    flag=2)
    at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:5135
#18 0x00007f807e488e18 in pr_LoadLibraryByPathname (
    name=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56", 
    flags=<optimized out>)
    at /home/dminor/src/firefox-asan/nsprpub/pr/src/linking/prlink.c:803
#19 PR_LoadLibraryWithFlags (libSpec=..., flags=<optimized out>)
    at /home/dminor/src/firefox-asan/nsprpub/pr/src/linking/prlink.c:418
#20 0x00007f8066519b85 in mozilla::FFmpegRuntimeLinker::Init ()
    at /home/dminor/src/firefox-asan/dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp:62
#21 0x00007f80664b7a45 in mozilla::PDMFactoryImpl::PDMFactoryImpl (
    this=0x602000186e90)
    at /home/dminor/src/firefox-asan/dom/media/platforms/PDMFactory.cpp:74
#22 mozilla::PDMFactory::EnsureInit() const::$_0::operator()() const
    (this=<optimized out>)
    at /home/dminor/src/firefox-asan/dom/media/platforms/PDMFactory.cpp:196
#23 mozilla::detail::RunnableFunction<mozilla::PDMFactory::EnsureInit() const::$_0>::Run() (this=<optimized out>)
    at /home/dminor/src/firefox-asan/objdir-ff-asan/dist/include/nsThreadUtils.h:278

which makes it look like it might be a system library problem in libgomp.so.

If I set fast_unwind_on_malloc=0 in the ASAN options, the leak disappears, which makes me think this is something that we would normally suppress, except that we get a bad stack in LSAN. Presumably disabling fast_unwind_on_malloc has performance implications, but if the slowdown is too bad, perhaps we can set fast_unwind_on_malloc=0 only for the media job.
Status: NEW → ASSIGNED
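For anyone wanting to reproduce the fast_unwind_on_malloc=0 experiment, here is a minimal sketch of adding that flag to an existing ASAN_OPTIONS value before launching a test process. The helper name is hypothetical, and it assumes the simple colon-separated key=value form of ASAN_OPTIONS; values that themselves contain colons are not handled.

```python
def with_sanitizer_option(env, key, value, var="ASAN_OPTIONS"):
    """Return a copy of env with key=value set in a colon-separated options variable."""
    options = {}
    for item in env.get(var, "").split(":"):
        if "=" in item:
            name, _, val = item.partition("=")
            options[name] = val
    # Setting the key last overrides any earlier value for the same option.
    options[key] = str(value)
    new_env = dict(env)
    new_env[var] = ":".join(f"{k}={v}" for k, v in options.items())
    return new_env


env = with_sanitizer_option(
    {"ASAN_OPTIONS": "detect_leaks=1"}, "fast_unwind_on_malloc", 0
)
print(env["ASAN_OPTIONS"])  # detect_leaks=1:fast_unwind_on_malloc=0
```

The returned dictionary could be passed as the env argument of subprocess.run for just the media job, which keeps the slower unwinder out of the other suites.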
(Reporter)

Comment 4

2 years ago
I ran the dom/media/test tests and they are also affected, which is not surprising given the stack above.

Unless I've missed something, I think we're down to some unpleasant alternatives:
- add a suppression for <unknown module>
- stop running the media tests on asan builds
- investigate the libgomp leak and attempt to put together a patched version of Ubuntu 16.04 to use to run the tests. This isn't quite as bad as it sounds, since the test machines are docker images anyway.

With fast_unwind_on_malloc=0 set, the 8 bytes are reported as being leaked by libdl.so, so I guess there's no guarantee the leak is actually in libgomp.so.
Component: WebRTC → Audio/Video: Playback
Summary: LeakSanitizer detects leak of 8 bytes when running WebRTC tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes when running media tests on Ubuntu 16.04
(Reporter)

Comment 5

2 years ago
Andrew, any suggestions on the above? Have I missed something? Thanks!
Flags: needinfo?(continuation)

Comment 6

(In reply to Dan Minor [:dminor] from comment #5)
> Andrew, any suggestions on the above? Have I missed something? Thanks!

Yeah, that analysis sounds right to me. Maybe installing the debug symbols for libgomp on the machine would give us a better stack? I know that's improved the stacks for me locally when debugging LSan issues.

jib, do you know who might be able to look at this? Maybe there's some easy fix to this leak. Otherwise we may have to disable some tests. <unknown module> seems like a very broad suppression, though I haven't noticed it before so maybe it isn't so bad.
Flags: needinfo?(continuation) → needinfo?(jib)

Comment 7

I agree with the analysis also. I was going to suggest dminor ;) The stack suggests this is playback though, which is also how it's triaged, so maybe Anthony has someone who can look at it?
Flags: needinfo?(jib) → needinfo?(ajones)

Comment 8

Gerald - I'm not convinced that an 8 byte leak is an issue. Can you find a way to make the problem go away?
Flags: needinfo?(ajones) → needinfo?(gsquelart)
Priority: P2 → P3
(Reporter)

Comment 9

2 years ago
The only reason this is an issue is that the ateam is in the process of moving Linux test machines from 12.04 to 16.04, and this blocks them from being able to run ASAN jobs on 16.04.
Assignee: dminor → nobody
Status: ASSIGNED → NEW

Comment 10

(In reply to Dan Minor [:dminor] from comment #9)
> The only reason this is an issue is that the ateam in the process of moving
> linux test machines from 12.04 to 16.04 and this blocks them from being able
> to run ASAN jobs on 16.04.

Is it possible to install debug symbols for libgomp.so onto these machines? I don't know how difficult that is. I know some distros have separate packages for those.
Flags: needinfo?(dminor)
(Reporter)

Comment 11

2 years ago
The Dockerfile used to create the test machine lives in tree here: testing/docker/desktop1604-test/Dockerfile. I think all that is needed is to add 'apt install libgomp1-dbg' there.

You will also need to remove the special case to run the ASAN job on Ubuntu 12.04 like I did in this try push: https://hg.mozilla.org/try/rev/f3dce4beadaf.

Here's a try push with libgomp1-dbg: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4fa6d941c48d4dfaab853c5ca380c5e756877674
Flags: needinfo?(dminor)

Comment 12

One thing I could blindly do from my side is to unload the library when Firefox shuts down, in case that helps clear the allocated memory.

I'm not set up to run *SAN, so Dan, could you please try the following patch:
https://hg.mozilla.org/try/rev/75807390ff8bf89c6db8228374261f46eed57e2a
(From this try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f684b972412dfcd949a4f4b65b9f6541fb46046e )
Flags: needinfo?(gsquelart) → needinfo?(dminor)
(Reporter)

Comment 13

2 years ago
Unfortunately, the leak is still there with this patch in place.
Flags: needinfo?(dminor)

Comment 14

It looks like the debug symbols didn't help either.
Blocks: 976414
(Reporter)

Comment 15

2 years ago
Looks like this is now working: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e5846bab0c8b06cbcd110f695adf52fd90bc2a
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
Blocks: 1319782
Blocks: 1319792
Blocks: 1319801
Blocks: 1319804
Blocks: 1319807
(Reporter)

Comment 16

2 years ago
Reopening as it seems this leak now shows up (maybe intermittently?) on other test suites as well.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: LeakSanitizer detects leak of 8 bytes when running media tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes when running tests on Ubuntu 16.04

Comment 17

Permafail on some suites, intermittent for others.
Blocks: 1319863
(Assignee)

Comment 18

2 years ago
An existing comment indicates that leaks in "<unknown module>" "can not [sic]
be suppressed":
http://searchfox.org/mozilla-central/rev/5ee2bd8800b007d6c120d9521d5bf01131885afb/media/webrtc/trunk/webrtc/modules/audio_device/linux/latebindingsymboltable_linux.cc#53

Here's a failed attempt:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=864405473e689113f30b2921555d5b3142e37e3e&selectedJob=32511475

Looking at the sanitizer_stacktrace_printer.cc version in chromium, it seems
that "(<unknown module>)" is printed when there is no module to match against
suppressions.

https://cs.chromium.org/chromium/src/third_party/llvm/compiler-rt/test/asan/TestCases/Linux/stack-trace-dlclose.cc?sq=package:chromium&l=42&dr=C
tests that "(<unknown module>)" is printed when the module has been unloaded.
A different scenario could have another module at the same address as was in
the unloaded module.

https://bugs.chromium.org/p/webrtc/issues/detail?id=3402 is a related bug
involving pulseaudio.
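To illustrate why a frame in "(<unknown module>)" gives a suppression nothing to match, here is a toy model of the lookup. It is a deliberately simplified sketch, not the compiler-rt implementation: matching is reduced to a substring test on the module and function names, and the address range given to the library is made up for the example.

```python
class ModuleMap:
    """Toy model of the loaded-module table consulted at report time."""

    def __init__(self):
        self._modules = []  # list of (start, end, name)

    def load(self, start, end, name):
        self._modules.append((start, end, name))

    def unload(self, name):
        self._modules = [m for m in self._modules if m[2] != name]

    def lookup(self, addr):
        """Return the module mapped at addr, or None if nothing is mapped there."""
        for start, end, name in self._modules:
            if start <= addr < end:
                return name
        return None


def is_suppressed(frames, modules, suppressions):
    """frames is a list of (pc, function_name_or_None) pairs."""
    for pc, function in frames:
        for text in (modules.lookup(pc), function):
            if text and any(pattern in text for pattern in suppressions):
                return True
    return False


modules = ModuleMap()
# Hypothetical mapping range; only the pc below comes from the backtrace in comment 2.
modules.load(0x7F80155A0000, 0x7F80155C0000, "libgomp.so.1")
frames = [(0x4BE252, "__interceptor_malloc"), (0x7F80155B1779, None)]
suppressions = ["libgomp"]

before = is_suppressed(frames, modules, suppressions)
modules.unload("libgomp.so.1")  # models dlclose() before the leak report
after = is_suppressed(frames, modules, suppressions)

# The other scenario: a different module later lands at the same address.
modules.load(0x7F80155A0000, 0x7F80155C0000, "libother.so")
reused = modules.lookup(0x7F80155B1779)
print(before, after, reused)
```

In this model the suppression matches while the library is mapped and stops matching once it is unloaded, and a stale pc can be attributed to whatever module reuses the range.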
(Assignee)

Comment 19

2 years ago
Skipping all dlclose calls removes these leaks.  I hesitate to apply such a
workaround in general because it may suppress ASAN detection of one class of
use-after-free bugs.

When skipping dlclose removes these leaks in c1, there are no extra
suppressions.  That suggests that the library may be holding, in a static
variable, a reference to the leaking memory.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f8d1abb4aeb0cc9db69c8ed4422c2762b60584f&selectedJob=32534567

This try run confirms that libavcodec-ffmpeg.so.56 is the library that
triggers the leaks when dlclose()d.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=55e3970969db84d54301f73c5af15964cbcdac7e&selectedJob=32539723

If the library is not designed to be dlclose()d, then it probably should not
be dlclose()d.  However, comment 2 seems to indicate that libgomp is the
problem library, and so perhaps keeping that open may work around this.
Or perhaps a gcc runtime library update could fix this.
Summary: LeakSanitizer detects leak of 8 bytes when running tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes in (<unknown module>) when running tests on Ubuntu 16.04
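The static-variable hypothesis can be sketched with a toy reachability model. A leak checker of this kind treats memory as leaked when no pointer to it can be found from the roots it scans, and the global data of loaded modules is among those roots. The names below are hypothetical, and the model ignores stacks, registers and thread-local storage.

```python
def find_leaks(allocations, roots):
    """Return the allocations that are not reachable from any root region.

    allocations maps an allocation name to the set of allocations it points to.
    roots maps a root region name (for example a module's data segment) to the
    set of allocations referenced from that region.
    """
    reachable = set()
    stack = [a for refs in roots.values() for a in refs]
    while stack:
        a = stack.pop()
        if a in reachable:
            continue
        reachable.add(a)
        stack.extend(allocations.get(a, ()))
    return sorted(set(allocations) - reachable)


allocations = {"block_8_bytes": set()}
roots = {
    "library static data": {"block_8_bytes"},  # pointer held in a static variable
    "firefox globals": set(),
}

leaks_while_loaded = find_leaks(allocations, roots)

# Model dlclose(): the library's data segment is unmapped, so its statics
# are no longer scanned, but the block they pointed to was never freed.
del roots["library static data"]
leaks_after_dlclose = find_leaks(allocations, roots)
print(leaks_while_loaded, leaks_after_dlclose)
```

This matches the observation that skipping dlclose makes the leak disappear without any extra suppression: the block stays reachable for as long as the library stays mapped.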
(Assignee)

Comment 20

2 years ago
(In reply to Dan Minor [:dminor] from comment #15)
> Looks like this is now working:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e5846bab0c8b06cbcd110f695adf52fd90bc2a

I don't know why those runs were green, because the leaks were still output in the logs.
e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=31044459&repo=try
(Assignee)

Updated

2 years ago
Depends on: 1323382
(Assignee)

Comment 21

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=46b709c267874ac9853029609c1a33e93e1ac0d8&selectedJob=32614977
indicates that the proposed fix for bug 1323382 suppresses these leaks as they appear in mochitest-chrome tests.
Assignee: nobody → karlt
Status: REOPENED → ASSIGNED
(Assignee)

Updated

2 years ago
Blocks: 1323616
(Assignee)

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Whiteboard: [fixed with bug 1323382]