Closed Bug 833127 Opened 7 years ago Closed 7 years ago

TestStartupCache fails under ASan

Categories

(Core :: XPCOM, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: decoder, Assigned: froydnj)

Details

(Keywords: sec-want, Whiteboard: [asan][asan-test-failure])

Attachments

(3 files, 2 obsolete files)

The compiled test "TestStartupCache" crashes with ASan builds of mozilla-central revision 91b9995f9ac7:


make[1]: Entering directory `/srv/repos/browser/mozilla-central/objdir-ff-asan64dbg/startupcache/test'
cppunittests INFO | Running test TestStartupCache
=================================================================
==11267== ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete []) on 0x2b90a3713080
    #0 0x411cb2 in operator delete[](void*) ??:0
    #1 0x2b908f33012d in nsBufferedStream::Close() /srv/repos/browser/mozilla-central/netwerk/base/src/nsBufferedStreams.cpp:88
    #2 0x2b908f3340af in nsBufferedOutputStream::Close() /srv/repos/browser/mozilla-central/netwerk/base/src/nsBufferedStreams.cpp:597
    #3 0x2b909178506b in nsZipWriter::Close() /srv/repos/browser/mozilla-central/modules/libjar/zipwriter/src/nsZipWriter.cpp:696
    #4 0x2b908f7db918 in mozilla::scache::StartupCache::WriteToDisk() /srv/repos/browser/mozilla-central/startupcache/StartupCache.cpp:471
    #5 0x2b9098530543 in _pt_root /srv/repos/browser/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:156
0x2b90a3713080 is located 0 bytes inside of 65536-byte region [0x2b90a3713080,0x2b90a3723080)
allocated by thread T6 (StartupCache) here:
    #0 0x411632 in malloc ??:0
    #1 0x2b908f3304d2 in nsBufferedStream::Init(nsISupports*, unsigned int) /srv/repos/browser/mozilla-central/netwerk/base/src/nsBufferedStreams.cpp:77
    #2 0x2b908f31a565 in NS_NewBufferedOutputStream(nsIOutputStream**, nsIOutputStream*, unsigned int) /srv/repos/browser/mozilla-central/../../../../dist/include/nsNetUtil.h:1066
    #3 0x2b9091786679 in nsZipWriter::Open(nsIFile*, int) /srv/repos/browser/mozilla-central/modules/libjar/zipwriter/src/nsZipWriter.cpp:269
    #4 0x2b908f7db6c8 in mozilla::scache::StartupCache::WriteToDisk() /srv/repos/browser/mozilla-central/startupcache/StartupCache.cpp:438
    #5 0x2b9098530543 in _pt_root /srv/repos/browser/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:156
Thread T6 (StartupCache) created by T0 here:
    #0 0x40eea4 in pthread_create ??:0
    #1 0x2b909852beec in _PR_CreateThread /srv/repos/browser/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:393
    #2 0x2b909852b8a7 in PR_CreateThread /srv/repos/browser/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:476
    #3 0x2b9091e39201 in nsTimerImpl::Fire() /srv/repos/browser/mozilla-central/xpcom/threads/nsTimerImpl.cpp:482
    #4 0x2b9091e399f5 in nsTimerEvent::Run() /srv/repos/browser/mozilla-central/xpcom/threads/nsTimerImpl.cpp:565
    #5 0x2b9091e2e0aa in nsThread::ProcessNextEvent(bool, bool*) /srv/repos/browser/mozilla-central/xpcom/threads/nsThread.cpp:627
    #6 0x45510f in NS_ProcessPendingEvents(nsIThread*, unsigned int) /srv/repos/browser/mozilla-central/xpcom/glue/nsThreadUtils.cpp:188
    #7 0x41c607 in WaitForStartupTimer() /srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp:67
    #8 0x41cbd3 in TestStartupWriteRead() /srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp:104
    #9 0x420d3c in main /srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp:543
    #10 0x2b9098fbe76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
==11267== HINT: if you don't care about these warnings you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
Stats: 5M malloced (5M for red zones) by 8145 calls
Stats: 0M realloced by 513 calls
Stats: 2M freed by 3285 calls
Stats: 0M really freed by 0 calls
Stats: 48M (48M-0M) mmaped; 12 maps, 0 unmaps
  mmaps   by size class: 8:16383; 9:8191; 10:4095; 11:2047; 12:1024; 13:512; 14:256; 15:128; 16:64; 17:32; 18:16; 19:8;
  mallocs by size class: 8:5339; 9:1795; 10:438; 11:175; 12:63; 13:201; 14:27; 15:83; 16:8; 17:12; 18:3; 19:1;
  frees   by size class: 8:1322; 9:1238; 10:336; 11:137; 12:40; 13:185; 14:12; 15:5; 16:3; 17:7;
  rfrees  by size class:
Stats: malloc large: 16 small slow: 90
Stats: StackDepot: 0 ids; 0M mapped
==11267== ABORTING
cppunittests TEST-UNEXPECTED-FAIL | TestStartupCache | test failed with return code 256
For the message above (alloc-dealloc-mismatch) I don't know if ASan has false positives. However if I run the same test under an opt build, I'm getting something entirely different:

==11646== ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000047ef65 at pc 0x2b4d84777189 bp 0x7fff595bfc50 sp 0x7fff595bfc48
READ of size 1 at 0x00000047ef65 thread T0
    #0 0x2b4d84777188 in mozilla::scache::StartupCache::PutBuffer(char const*, char const*, unsigned int) /srv/repos/browser/mozilla-central/startupcache/StartupCache.cpp:344
    #1 0x41f1dc in TestEarlyShutdown() /srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp:364
    #2 0x420741 in main /srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp:552
    #3 0x2b4d8c95a76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
    #4 0x41ba44 in _start ??:0
0x00000047ef65 is located 27 bytes to the left of global variable '.str7 (/srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp)' (0x47ef80) of size 22
  '.str7 (/srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp)' is ascii string 'failed pre-write read'
0x00000047ef65 is located 34 bytes to the right of global variable '.str6 (/srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp)' (0x47ef40) of size 3
  '.str6 (/srv/repos/browser/mozilla-central/startupcache/test/TestStartupCache.cpp)' is ascii string 'id'
Shadow bytes around the buggy address:
  0x10000008fd90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10000008fda0: 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
  0x10000008fdb0: 00 05 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x10000008fdc0: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 f9
  0x10000008fdd0: f9 f9 f9 f9 00 07 f9 f9 f9 f9 f9 f9 00 00 00 00
=>0x10000008fde0: 03 f9 f9 f9 f9 f9 f9 f9 03 f9 f9 f9[f9]f9 f9 f9
  0x10000008fdf0: 00 00 06 f9 f9 f9 f9 f9 00 00 06 f9 f9 f9 f9 f9
  0x10000008fe00: 00 00 00 01 f9 f9 f9 f9 00 00 00 01 f9 f9 f9 f9
  0x10000008fe10: 00 00 00 07 f9 f9 f9 f9 00 00 00 00 06 f9 f9 f9
  0x10000008fe20: f9 f9 f9 f9 00 00 00 00 00 00 02 f9 f9 f9 f9 f9
  0x10000008fe30: 00 00 00 00 00 00 00 05 f9 f9 f9 f9 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
Stats: 5M malloced (5M for red zones) by 7944 calls
Stats: 0M realloced by 504 calls
Stats: 2M freed by 3369 calls
Stats: 0M really freed by 0 calls
Stats: 48M (48M-0M) mmaped; 12 maps, 0 unmaps
  mmaps   by size class: 8:16383; 9:8191; 10:4095; 11:2047; 12:1024; 13:512; 14:256; 15:128; 16:64; 17:32; 18:16; 19:8;
  mallocs by size class: 8:5174; 9:1822; 10:368; 11:175; 12:62; 13:204; 14:28; 15:81; 16:9; 17:17; 18:3; 19:1;
  frees   by size class: 8:1420; 9:1241; 10:307; 11:137; 12:42; 13:188; 14:13; 15:4; 16:4; 17:13;
  rfrees  by size class:
Stats: malloc large: 21 small slow: 94
Stats: StackDepot: 0 ids; 0M mapped
==11646== ABORTING


This is for sure not a false positive, memory is being read that belongs to a redzone (out of bounds read). Not sure if it's related to the debug message, but it would be a big coincidence if there were two such failures on the same test, where the whole tree is pretty much green :)
This looks fishy.  The source code in question:

http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsBufferedStreams.cpp#77
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsBufferedStreams.cpp#88

matches up new[] and delete[] just fine.  Looking at the object file here, I see matching calls to moz_malloc and moz_free.  So maybe this is a bug in asan?  Can you attach the output of:

objdump -dr netwerk/base/src/nsBufferedStreams.o

from an asan build?
Attached file objdump output
Output from the debug build as requested. If the debug trace in comment 0 doesn't make sense, then I suggest focusing on the second trace from the opt build, because that is for sure not a false positive and indicating a memory-unsafe operation. Maybe the two are somehow connected.
(In reply to Christian Holler (:decoder) from comment #2)
> This is for sure not a false positive, memory is being read that belongs to
> a redzone (out of bounds read). Not sure if it's related to the debug
> message, but it would be a big coincidence if there were two such failures
> on the same test, where the whole tree is pretty much green :)

This is definitely a bug, the argument order for the call to PutBuffer is bogus.
(In reply to Christian Holler (:decoder) from comment #4)
> Created attachment 704710 [details]
> objdump output
> 
> Output from the debug build as requested. If the debug trace in comment 0
> doesn't make sense, then I suggest focusing on the second trace from the opt
> build, because that is for sure not a false positive and indicating a
> memory-unsafe operation. Maybe the two are somehow connected.

Yeah, the debug build is busted.  Here's the call to new[]:

 12e:	e8 00 00 00 00       	callq  133 <_ZN16nsBufferedStream4InitEP11nsISupportsj+0x133>
			12f: R_X86_64_PC32	_ZnamRKN7mozilla10fallible_tE-0x4

and the corresponding function it's supposed to call from the same file:

Disassembly of section .text._ZnamRKN7mozilla10fallible_tE:

0000000000000000 <_ZnamRKN7mozilla10fallible_tE>:
   0:	e9 00 00 00 00       	jmpq   5 <_ZnamRKN7mozilla10fallible_tE+0x5>
			1: R_X86_64_PLT32	moz_malloc-0x4

So the buffer is allocated with malloc, as the trace says.  But here's the call in ::Close:

  b9:	e8 00 00 00 00       	callq  be <_ZN16nsBufferedStream5CloseEv+0xbe>
			ba: R_X86_64_PLT32	_ZdaPv-0x4

and its (supposed) corresponding function:

Disassembly of section .text._ZdaPv:

0000000000000000 <_ZdaPv>:
   0:	e9 00 00 00 00       	jmpq   5 <_ZdaPv+0x5>
			1: R_X86_64_PLT32	moz_free-0x4

_ZdaPv is operator delete[]...but notice that the call in ::Close is a PLT32 relocation, rather than a PC32 relocation.  So the linker is going to make a call through a PLT entry for the call to delete[], rather than finding the delete[] defined locally in the same file.  I'm pretty sure this is a problem with asan and symbol visibility of some kind.
Christian, can you give this patch a whirl on an opt build?
Attachment #704714 - Flags: review?(choller)
Comment on attachment 704714 [details] [diff] [review]
fix order of arguments given to PutBuffer

I can confirm the fix, the crash is gone. However the test now fails:

Running Startup Cache tests...
TEST-PASS | STARTUP_CACHE_AGE_HOURS histogram before test
TEST-PASS | STARTUP_CACHE_INVALID histogram before test
TEST-PASS | got a pointer?
TEST-PASS | passed pre-write read
TEST-PASS | passed simple write/read
TEST-PASS | buffer not available after invalidate
TEST-PASS | write object
TEST-PASS | buffer not available after ignoring disk cache
TEST-UNEXPECTED-FAIL | GetBuffer succeeded unexpectedly after early shutdown
TEST-PASS | STARTUP_CACHE_AGE_HOURS histogram after test
TEST-PASS | histogram records age samples
TEST-PASS | STARTUP_CACHE_INVALID histogram after test
TEST-PASS | histogram records invalid disk cache
Finished running Startup Cache tests.
cppunittests TEST-UNEXPECTED-FAIL | TestStartupCache | test failed with return code 256
Attachment #704714 - Flags: review?(choller) → review+
asan checking identified a problem: we had reversed arguments to PutBuffer.
Unfortunately, that reversal means that a test was inadvertently passing:
we stuck a (buf, id) pair into the cache and then looked up |id|, which of
course won't find a key.

I think the reason for that is that GetBuffer should follow PutBuffer in
checking gShutdownInitiated and returning NS_ERROR_NOT_AVAILABLE in that
case.  WDYT?
Attachment #704714 - Attachment is obsolete: true
Attachment #704963 - Flags: review?(mwu)
So I guess this sort of makes sense since that's what the test specifies, but it's somewhat harder for me to justify not being able to read startup cache data that's available and already written to disk. I wonder if this might impact shutdown times.
Comment on attachment 704963 [details] [diff] [review]
fix startupcache test issues identified by asan

Taras, what do you think? I don't know if we really need to block startupcache access during shutdown, but that's what the test assumes. There's also a chance of a shutdown time regression but that seems unlikely.
Attachment #704963 - Flags: review?(mwu) → review?(taras.mozilla)
(In reply to Michael Wu [:mwu] from comment #10)
> So I guess this sort of makes sense since that's what the test specifies,
> but it's somewhat harder for me to justify not being able to read startup
> cache data that's available and already written to disk. I wonder if this
> might impact shutdown times.

Hm, I can see that.  Should we say that the test is bogus and what it ought to be checking for is:

- you ought to be able to GetBuffer things even after early shutdown;
- you can't PutBuffer things after early shutdown.

?
(In reply to Nathan Froyd (:froydnj) from comment #12)
> Hm, I can see that.  Should we say that the test is bogus and what it ought
> to be checking for is:
> 
> - you ought to be able to GetBuffer things even after early shutdown;
> - you can't PutBuffer things after early shutdown.
> 
> ?

That sounds better to me.
With the changed test only this time.
Attachment #704963 - Attachment is obsolete: true
Attachment #704963 - Flags: review?(taras.mozilla)
Attachment #705529 - Flags: review?(mwu)
Comment on attachment 705529 [details] [diff] [review]
fix startupcache test issues identified by asan

Looks good, thanks!
Attachment #705529 - Flags: review?(mwu) → review+
https://hg.mozilla.org/mozilla-central/rev/dff72e90b3ac
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Assignee: nobody → nfroyd
> _ZdaPv is operator delete[]...but notice that the call in ::Close is a PLT32
> relocation, rather than a PC32 relocation.  So the linker is going to make a
> call through a PLT entry for the call to delete[], rather than finding the
> delete[] defined locally in the same file.  I'm pretty sure this is a
> problem with asan and symbol visibility of some kind.

The problem is with the #pragma magic we play. It reduces to

#pragma GCC visibility push(hidden)

#pragma GCC visibility push(default)
typedef long unsigned int size_t;
void operator delete[](void*) throw();
#pragma GCC visibility pop

namespace mozilla {
  struct fallible_t { };
}
inline
void operator delete[](void* ptr) throw()
{
}
inline
void* operator new[](size_t size, const mozilla::fallible_t&) throw()
{
}
void
Init()
{
  const mozilla::fallible_t fallible = mozilla::fallible_t();
  new (fallible) char[10];
}
char* mBuffer;
void Close()
{
  delete[] mBuffer;
}

The operator new gets hidden visibility because it is never forward declared in system headers (given that is uses a mozilla type). Operator delete is forward declared in system headers, so it is sandwiched with visibility pragmas and has default visibility.

I can't think of any way of really fixing this, but we can avoid the bug by forcing inlining (I will upload a test patch in a sec).
Attached patch test patchSplinter Review
Decoder, can you check if this fixes the problem for you?
Attachment #711469 - Flags: feedback?(choller)
Comment on attachment 711469 [details] [diff] [review]
test patch

Pretty sure this is gone by now.
Attachment #711469 - Flags: feedback?(choller)
You need to log in before you can comment on or make changes to this bug.