AddressSanitizer: Random crashes with toolkit/components/osfile/tests/mochi/test_osfile* tests

RESOLVED FIXED in mozilla26

Status

()

RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: decoder, Assigned: decoder)

Tracking

(Blocks: 1 bug)

unspecified
mozilla26
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [asan][asan-test-failure])

Attachments

(3 attachments, 2 obsolete attachments)

(Assignee)

Description

6 years ago
The mochitest-chrome tests under toolkit/components/osfile/ trigger random crashes and ASan traces on TBPL and locally. To reproduce locally, I used this command

./mach mochitest-chrome toolkit/components/osfile/

With an ASan opt build. In my case, I get a pretty reliable glibc abort message. I added symbols to the trace with a small script, yielding this:

*** glibc detected *** /srv/repos/browser/mozilla-central/objdir-ff-asan64opt/dist/bin/firefox-bin: free(): invalid pointer: 0x00007fcac67a7a40 ***
======= Backtrace: =========
malloc_printerr /build/buildd/eglibc-2.15/malloc/malloc.c:5007 
/srv/repos/browser/mozilla-central/objdir-ff-asan64opt/dist/bin/libxul.so(ffi_call_unix64+0x4c)
/srv/repos/browser/mozilla-central/objdir-ff-asan64opt/dist/bin/libxul.so(ffi_call+0x7b3)
js::ctypes::CDataFinalizer::CallFinalizer(js::ctypes::CDataFinalizer::Private*, int*, int*) /srv/repos/browser/mozilla-central/js/src/ctypes/CTypes.cpp:7083 
JSObject::finalize(js::FreeOp*) /srv/repos/browser/mozilla-central/js/src/jsobjinlines.h:243 
_ZL19FinalizeTypedArenasI8JSObjectEbPN2js6FreeOpEPPNS1_2gc11ArenaHeaderERNS4_9ArenaListENS4_9AllocKindERNS1_11SliceBudgetE /srv/repos/browser/mozilla-central/js/src/jsgc.cpp:391 
js::gc::ArenaLists::finalizeNow(js::FreeOp*, js::gc::AllocKind) /srv/repos/browser/mozilla-central/js/src/jsgc.cpp:1315 
~AutoSCC /srv/repos/browser/mozilla-central/js/src/gc/Statistics.h:223 
~AutoPhase /srv/repos/browser/mozilla-central/js/src/gc/Statistics.h:205 
~AutoGCSession /srv/repos/browser/mozilla-central/js/src/jsgc.cpp:4023 
Collect(JSRuntime*, bool, long, js::JSGCInvocationKind, JS::gcreason::Reason) /srv/repos/browser/mozilla-central/js/src/jsgc.cpp:4583 
_ZL16js_delete_poisonI9JSContextEvPT_ /srv/repos/browser/mozilla-central/./../../dist/include/js/Utility.h:530 
(anonymous namespace)::WorkerThreadRunnable::Run() /srv/repos/browser/mozilla-central/dom/workers/RuntimeService.cpp:545 
nsThread::ProcessNextEvent(bool, bool*) /srv/repos/browser/mozilla-central/xpcom/threads/nsThread.cpp:627 
NS_ProcessNextEvent(nsIThread*, bool) /srv/repos/browser/mozilla-central/objdir-ff-asan64opt/xpcom/build/nsThreadUtils.cpp:238 
nsThread::ShuttingDown() /srv/repos/browser/mozilla-central/xpcom/threads/nsThread.h:63 
_pt_root /srv/repos/browser/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:207 
__asan::AsanThread::ThreadStart() /srv/repos/llvm/projects/compiler-rt/lib/asan/asan_thread.cc:102


I would guess that something is corrupted/freed and then being collected by the GC. Ccing some people that might know who could help with this (hopefully^^).
The following indicates that this takes place during a finalization:
js::ctypes::CDataFinalizer::CallFinalizer(js::ctypes::CDataFinalizer::Private*, int*, int*)

We have exactly two calls that call free() during finalization:
* get_current_dir_name on Linux/Android;
* getwd_auto on MacOS X/BSD.

However, both calls are quite trivial and look correct.
By any chance, do you have the test logs?
(Assignee)

Comment 3

6 years ago
Thanks for looking into it. Yes, since the crashes are reproducible I can get all the info you need. I can attach the logs when I'm back at my computer. Fwiw, the tests were run on Linux.
(Assignee)

Updated

5 years ago
Blocks: 831491
(Assignee)

Comment 4

5 years ago
Created attachment 786664 [details]
Test log from optimized build
(Assignee)

Comment 5

5 years ago
Created attachment 786665 [details]
Test log from debug build
(Assignee)

Comment 6

5 years ago
The bug still cleanly reproduces for me and is blocking ASan testing on TBPL.

Yoric, can you take a look? Thanks! Let me know if you need help reproducing this locally or if I should try anything for you :)
Flags: needinfo?(dteller)
(Assignee)

Comment 7

5 years ago
This line looks problematic btw:

 0:43.87     #0 0x7f1c40e50956 _int_free (/build/buildd/eglibc-2.15/malloc/malloc.c:4076)

We shouldn't be calling free functions from glibc at all.
So how would you free memory that has been allocated by get_current_dir_name/getwd_auto?
Flags: needinfo?(dteller)
(In reply to David Rajchenbach Teller [:Yoric] from comment #8)
> So how would you free memory that has been allocated by
> get_current_dir_name/getwd_auto?

Use free().
(In reply to Mike Hommey [:glandium] from comment #9)
> (In reply to David Rajchenbach Teller [:Yoric] from comment #8)
> > So how would you free memory that has been allocated by
> > get_current_dir_name/getwd_auto?
> 
> Use free().

As discussed over irc, attempting to find out how to use the RTLD_DEFAULT version of free() instead of the libc version of free().
Created attachment 786894 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable

Christian, can you check whether this patch does the trick?
Attachment #786894 - Flags: feedback?(choller)
(Assignee)

Comment 12

5 years ago
Comment on attachment 786894 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable

Confirmed that this fixes the osfile tests on opt and debug builds. Thanks!
Attachment #786894 - Flags: feedback?(choller) → feedback+
Attachment #786894 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 14

5 years ago
For some reason, xpcshell on the first try push is orange. A second try push with just xpcshell shows green though: https://tbpl.mozilla.org/?tree=Try&rev=5adcc6bd5ea7
Since there was some doubting over this being the cause of the failures, the backout push was indeed green.
Created attachment 790068 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable, v2

Same code, but with a if() explicitly excluding Android.
Assignee: nobody → dteller
Attachment #786894 - Attachment is obsolete: true
Comment on attachment 790068 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable, v2

Do you understand why Android needs to be excluded here?  That would be a helpful comment if you do.  If not, you should probably find out. :)
Not really. I was kind of hoping glandium would have an idea :)
Flags: needinfo?(mh+mozilla)
On linux and osx, jemalloc is hooked such that libc allocations go through jemalloc. So calling free() calls jemalloc's free.
On android, that's not possible, and libc allocations go through libc's free().
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #22)
> On linux and osx, jemalloc is hooked such that libc allocations go through
> jemalloc. So calling free() calls jemalloc's free.
> On android, that's not possible, and libc allocations go through libc's
> free().

Actually, my question was the following: if we don't replace free() on Android, why does using a.out's version of free() have any effect at all on that platform?
Ah, forgot what this all was about. This all started with asan. Calling libc's free() as gotten from dlopen/dlsym with asan actually uses libc's free directly. But all allocations from libc itself go through the asan wrapper. So calling free() from libc directly with asan tracked memory is not supported. The patch allows to get the symbol from asan instead of libc's, by getting it through RTLD_DEFAULT.
NSPR has a special handle named "a.out", that corresponds to dlopen(RTLD_DEFAULT). But on android, that "a.out" handle is not available. So android does need to get the symbol from libc.

See:
  http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/linking/prlink.c#168
  http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/md/_linux.h#79
(Assignee)

Comment 25

5 years ago
I pushed Yoric's new version to try and it looks good so far:

https://tbpl.mozilla.org/?tree=Try&rev=64c931f37b43

I'll put the patch up for re-review when it's all done.
(Assignee)

Comment 26

5 years ago
Created attachment 790719 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable, v3

This is the latest version of the patch as provided by David, and it's green on try now, including Android.
Assignee: dteller → choller
Attachment #790068 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #790719 - Flags: review?(nfroyd)
Comment on attachment 790719 [details] [diff] [review]
Letting the system use default resolution for free() wherever applicable, v3

Review of attachment 790719 [details] [diff] [review]:
-----------------------------------------------------------------

It took me a while to figure out what is going on here, because this looks very close to the first patch David posted.  After some looking through sources, I finally realized that the default_lib.declare(...) call can actually throw.  (Android does in fact declare a.out but there's no associated dynamic library handle to go with it, so symbol lookup fails.)

r=me with the below comment change.

::: toolkit/components/osfile/modules/osfile_unix_back.jsm
@@ +243,5 @@
> +             /*return*/ ctypes.void_t,
> +             /*ptr*/    ctypes.voidptr_t);
> +
> +         } catch (ex) {
> +           // On other platforms, well, use the regular version of free.

Why don't we say "We don't have an a.out library or a.out doesn't contain free.  Either way, use the ordinary libc free."
Attachment #790719 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/94088b6bf161
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Thanks for finishing/landing this while I was on PTO.
You need to log in before you can comment on or make changes to this bug.