Closed Bug 1015898 Opened 10 years ago Closed 10 years ago

AddressSanitizer: Assertion failure: fd != NULL, at sslsock.c:177 blocking PKIX testing

Categories

(Core :: Security: PSM, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1022268
Tracking Status
firefox32 --- affected

People

(Reporter: decoder, Unassigned)

Details

(Keywords: sec-want)

Attachments

(1 file)

I tried running xpcshell tests with an ASan 64 bit debug build and I hit this abort with the test at security/manager/ssl/tests/unit/test_cert_overrides.js (trace is from m-c rev ab6d433e563b):


 0:06.07 TEST-INFO | (xpcshell/head.js) | test run_next_test 2 finished (2)
 0:06.09 Assertion failure: fd != NULL, at sslsock.c:177
 0:06.09 Redirecting call to abort() to mozalloc_abort
 0:06.09 
 0:06.09 Hit MOZ_CRASH() at memory/mozalloc/mozalloc_abort.cpp:30
 0:06.09 ASAN:SIGSEGV
 0:06.09 =================================================================
 0:06.09 ==850==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f9523e0773a sp 0x7fffceb5f410 bp 0x7fffceb5f420 T0)
 0:06.18     #0 0x7f9523e07739 in mozalloc_abort(char const*) memory/mozalloc/mozalloc_abort.cpp:30
 0:06.18     #1 0x7f9523e07026 in abort memory/mozalloc/mozalloc_abort.cpp:39
 0:06.19     #2 0x7f9523d7f843 in PR_Assert nsprpub/pr/src/io/prlog.c:554
 0:06.24     #3 0x7f9514a40e55 in ssl_FindSocket security/nss/lib/ssl/sslsock.c:177
 0:06.24     #4 0x7f9514a41cf3 in SSL_OptionSet security/nss/lib/ssl/sslsock.c:567
 0:07.86     #5 0x7f951ce51f3d in nsNSSSocketInfo::ActivateSSL() security/manager/ssl/src/nsNSSIOLayer.cpp:466
 0:07.86     #6 0x7f95175ff619 in NS_InvokeByIndex xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162
 0:07.88     #7 0x7f951a418a2c in CallMethodHelper::Invoke() js/xpconnect/src/XPCWrappedNative.cpp:2397
 0:07.88     #8 0x7f951a418a2c in CallMethodHelper::Call() js/xpconnect/src/XPCWrappedNative.cpp:1738
 0:07.88     #9 0x7f951a3dd4a2 in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) js/xpconnect/src/XPCWrappedNative.cpp:1705:12
 0:07.88     #10 0x7f951a3e205e in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1273
 0:07.89     #11 0x7f951f492658 in js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) js/src/jscntxtinlines.h:239
 0:07.90     #12 0x7f951f3f3050 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) js/src/vm/Interpreter.cpp:475:16
 0:07.90     #13 0x7f951f3d963d in Interpret(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:2620
 0:07.90     #14 0x7f951f3b8b4d in js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:422
 0:07.90     #15 0x7f951f3f306d in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) js/src/vm/Interpreter.cpp:494
 0:07.90     #16 0x7f951f3657d9 in js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) js/src/vm/Interpreter.cpp:531
 0:07.92     #17 0x7f951eeb01be in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) js/src/jsapi.cpp:5190
 0:07.92     #18 0x7f951a3c7c7a in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) js/xpconnect/src/XPCWrappedJSClass.cpp:1272
 0:07.92     #19 0x7f951760090b in PrepareAndDispatch xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:122
 0:07.92     #20 0x7f95175ff88a in SharedStub (objdir-ff-asan64dbg/dist/bin/libxul.so+0x1b9888a)
 0:07.92 
 0:07.92 AddressSanitizer can not provide additional info.
 0:07.92 SUMMARY: AddressSanitizer: SEGV memory/mozalloc/mozalloc_abort.cpp:30 mozalloc_abort(char const*)
 0:07.92 ==850==ABORTING
 0:07.94 PR_Recv failed: PR_END_OF_FILE_ERROR


I bisected the failure:

Due to skipped revisions, the first bad revision could be any of:
changeset:   183607:1d1935f80c97
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Wed Apr 16 09:52:43 2014 -0400
summary:     bug 378637 part 12 - proxy over TLS (i.e. https proxying) r=hurley

changeset:   183608:ab6d433e563b
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Fri May 16 11:46:12 2014 -0400
summary:     bug 378637 part 13 - https proxying changes to casting and trans lifecycle r=hurley


This blocks any further PKIX testing because my driver performing the TLS connections dies with exactly the same problem since this patch laneded, so it's crucial to get this fixed quickly. Patrick, mind taking a look? Thanks!
Flags: needinfo?(mcmanus)
Fwiw, a regular debug build does not die (otherwise, TBPL would probably have barked at that and we don't run tests on ASan debug builds, just opt). So it could be a timing-related issue, thread scheduling etc.
:decoder - which platform is this? (or more specifically - which set of prefs is it using?)
Flags: needinfo?(mcmanus)
This is Linux x86-64 just running the xpcshell tests with mach (default prefs therefore). Or did you mean mozconfig?
Flags: needinfo?(mcmanus)
Thanks christian - I have a build from:

https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan-debug/latest/

can you give me a quick paste of how to run the xpcshell test with that? I'm so accustomed to running it with a src directory I'd have to look up the details..
Flags: needinfo?(mcmanus)
Umm, I don't know how exactly you can run the xpcshell test with the downloaded build, I also ran it from my source repository because I do my own ASan builds. Maybe someone in #developers knows?
hmm - I've managed to make linux-64 asan debug builds (both with and without optimization) - so I've got a src tree and can run the xpchshell tests.. but I can't seem to reproduce yet.

do you run just that testcase, or a whole directory?
I just ran the one test using

./mach xpcshell-test security/manager/ssl/tests/unit/test_cert_overrides.js

Weird that it does not reproduce for you. Which Clang revision did you use and what's your mozconfig? Maybe there is some difference there. But it reproduced quite reliably for me, even across a lot of revisions. Otherwise the bisecting wouldn't have worked so well.
This isn't really an asan-logic detected bug, right? That stack is from a normal debug assert.. so maybe its not surprising that its more about timing that asan is influencing.

also, do you have any optimizations in your build you can turn off? ActivateSSL isn't actually an xpcom entry point and there are multiple paths to it..

followed directions in https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Firefox_and_Address_Sanitizer

mcmanus@ds9:~/src/llvm/build$ echo $REV
200213

/home/mcmanus/src/mozilla2/wd/gecko-dev>cat .mozconfig
ac_add_options --enable-application=browser

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../objdir-ff-asan
mk_add_options MOZ_MAKE_FLAGS=-j12

# Enable ASan specific code and build workarounds
ac_add_options --enable-address-sanitizer

# Ensure you set this to your LLVM_HOME path
export LLVM_HOME="/home/mcmanus/src/llvm"

# Set CC/CXX based on LLVM_HOME
export CC="$LLVM_HOME/build/bin/clang"
export CXX="$LLVM_HOME/build/bin/clang++"

# Add ASan to our compiler flags
export CFLAGS="-fsanitize=address -Dxmalloc=myxmalloc"
export CXXFLAGS="-fsanitize=address -Dxmalloc=myxmalloc"

# Additionally, we need the ASan flag during linking. Normally, our C/CXXFLAGS would
# be used during linking as well but there is at least one place in our build where
# our CFLAGS are not added during linking.
# Note: The use of this flag causes Clang to automatically link the ASan runtime :)
export LDFLAGS="-fsanitize=address"

# Avoid using ASan flags when building host tools like nsinstall
export HOST_CFLAGS=" "
export HOST_CXXFLAGS=" "
export HOST_LDFLAGS=" "

# These three are required by ASan
ac_add_options --disable-jemalloc
ac_add_options --disable-crashreporter
ac_add_options --disable-elf-hack

# Keep symbols to symbolize ASan traces later
export MOZ_DEBUG_SYMBOLS=1
ac_add_options --enable-debug-symbols
ac_add_options --disable-install-strip

# Settings for a debug+opt build
#ac_add_options --enable-optimize
ac_add_options --disable-optimize
ac_add_options --enable-debug

# Settings for an opt build
#ac_add_options --enable-optimize
#ac_add_options --disable-debug

ac_add_options --enable-tests
maybe if you could get a stack from your build with gdb and accompany it with an http log I could see something.. https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Actually my build is with --enable-optimize --enable-debug, so maybe that's the difference in timing. Could you try reproducing with that? I can also give it a try without optimizations and give the http logging a try, but that's likely not going to happen until Saturday because I'm leaving for a conference in Amsterdam very soon.
GDB trace for revision f28005b84ed0:


Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fc973a in mozalloc_abort (msg=<optimized out>) at  memory/mozalloc/mozalloc_abort.cpp:30
30          MOZ_CRASH();
(gdb) bt 32
#0  0x00007ffff7fc973a in mozalloc_abort (msg=<optimized out>) at  memory/mozalloc/mozalloc_abort.cpp:30
#1  0x00007ffff7fc9027 in abort () at  memory/mozalloc/mozalloc_abort.cpp:39
#2  0x00007ffff7f41844 in PR_Assert (s=<optimized out>, file=<optimized out>, ln=<optimized out>) at  nsprpub/pr/src/io/prlog.c:554
#3  0x00007fffe8a1673a in ssl_FindSocket (fd=0x0) at sslsock.c:177
#4  0x00007fffe8a175d6 in SSL_OptionSet (fd=0x0, which=1, on=1) at sslsock.c:567
#5  0x00007ffff0fa0026 in nsNSSSocketInfo::ActivateSSL (this=0x61200002fd40) at  security/manager/ssl/src/nsNSSIOLayer.cpp:467
#6  0x00007fffeb61698a in NS_InvokeByIndex (that=0x7fffe8e15180, methodIndex=<optimized out>, paramCount=<optimized out>, params=<optimized out>)
    at  xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162
#7  0x00007fffee52f16d in Invoke (this=<optimized out>, this=<optimized out>) at  js/xpconnect/src/XPCWrappedNative.cpp:2389
#8  CallMethodHelper::Call (this=0x7ffffffeb060) at  js/xpconnect/src/XPCWrappedNative.cpp:1730
#9  0x00007fffee4f3e2b in XPCWrappedNative::CallMethod (ccx=..., mode=<optimized out>) at  js/xpconnect/src/XPCWrappedNative.cpp:1697
#10 0x00007fffee4f89e7 in XPC_WN_CallMethod (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>) at  js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1273
#11 0x00007ffff35bc1b9 in js::CallJSNative (native=0x7fffee4f851f <XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)>, cx=<optimized out>, args=...)
    at  js/src/jscntxtinlines.h:239
#12 0x00007ffff351bb01 in js::Invoke (cx=<optimized out>, args=..., construct=<optimized out>) at  js/src/vm/Interpreter.cpp:455
#13 0x00007ffff350250a in Interpret (cx=0x615000046d00, state=...) at  js/src/vm/Interpreter.cpp:2561
#14 0x00007ffff34e17ee in js::RunScript (cx=<optimized out>, state=...) at  js/src/vm/Interpreter.cpp:402
#15 0x00007ffff351bb1e in js::Invoke (cx=<optimized out>, args=..., construct=<optimized out>) at  js/src/vm/Interpreter.cpp:474
#16 0x00007ffff34939aa in js::Invoke (thisv=..., fval=..., argv=<optimized out>, cx=<optimized out>, argc=<optimized out>, rval=...) at  js/src/vm/Interpreter.cpp:511
#17 0x00007ffff2fd280f in JS_CallFunctionValue (args=..., cx=<optimized out>, obj=..., fval=..., rval=...) at  js/src/jsapi.cpp:5200
#18 0x00007fffee4de3b6 in nsXPCWrappedJSClass::CallMethod (wrapper=<optimized out>, info_=<optimized out>, nativeParams=<optimized out>, this=<optimized out>, methodIndex=<optimized out>)
    at  js/xpconnect/src/XPCWrappedJSClass.cpp:1259
#19 0x00007fffeb617c7c in PrepareAndDispatch (self=<optimized out>, methodIndex=<optimized out>, gpregs=<optimized out>, fpregs=<optimized out>, args=<optimized out>)
    at  xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:122
#20 0x00007fffeb616bfb in SharedStub () from  objdir-ff-asan64dbg/dist/bin/libxul.so
#21 0x00007fffeb5a83c6 in nsOutputStreamReadyEvent::Run (this=<optimized out>) at  xpcom/io/nsStreamUtils.cpp:169
#22 0x00007fffeb5e3f7d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at  xpcom/threads/nsThread.cpp:766
#23 0x00007fffeb61698a in NS_InvokeByIndex (that=0x7fffe8e15180, methodIndex=<optimized out>, paramCount=<optimized out>, params=<optimized out>)
    at  xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162
#24 0x00007fffee52f16d in Invoke (this=<optimized out>, this=<optimized out>) at  js/xpconnect/src/XPCWrappedNative.cpp:2389
#25 CallMethodHelper::Call (this=0x7fffffff21a0) at  js/xpconnect/src/XPCWrappedNative.cpp:1730
#26 0x00007fffee4f3e2b in XPCWrappedNative::CallMethod (ccx=..., mode=<optimized out>) at  js/xpconnect/src/XPCWrappedNative.cpp:1697
#27 0x00007fffee4f89e7 in XPC_WN_CallMethod (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>) at  js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1273
Attached file NSPR log
NSPR log.
This still repros for me on tip and I attached all the requested information in the last two comments. If there is timing involved, then it might indeed be a good idea for you to test an --enable-debug --enable-optimize ASan build.
Flags: needinfo?(mcmanus)
1] a patch landed in 1022268 today that could be related. The reasoning would apply - but I'm not really sure. The issue came up from a different angle. :decoder can you try with a build that has that patch?
2] I return to kbd on thurs and can devote thu and fri to this as necessary.
Flags: needinfo?(mcmanus)
thanks for the log

it contains initiatesocket() twice for the same pointer, so that indicates a retry .. it also contains "trying again with next ip address" which is a signature for 1022268 .. the patch that landed on the 10th in 1022268 addressed exactly that problem and the result could be similar. in 1022268 the reporter had us using plaintext on socket that was supposed to be TLS, and your crash is a TLS socket that hasn't been initialized but expects to be.

so I'd like you to confirm with that patch the bug still exists.

I can't trigger the bug at all :(

your test tried 127.0.0.1 and ::1 so a difference in ipv6 config (or the host resolver libraries and how it deals with it) could explain the difference in reproducing.

if the issue is resolved you can just mark it as a dup.
Thanks Patrick :) I just tried a trunk build which includes the patch from bug 1022268 and I can confirm the issue is now gone \o/ Marking as a duplicate.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: