Closed Bug 1477490 Opened 3 years ago Closed 3 years ago

AddressSanitizer: stack-buffer-overflow with WRITE of size 360 through [@ patched_BaseThreadInitThunk] in WindowsDllBlocklist

Categories

(Core :: mozglue, defect)

x86_64
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- wontfix
firefox64 --- fixed

People

(Reporter: decoder, Assigned: away)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, regression)

Attachments

(3 files)

The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 63.0a1-20180721100146-https://hg.mozilla.org/mozilla-central/rev/9daa53881b7ae80bf6b093dac5d7744cf7fd18b1.

For detailed crash information, see attachment.


This is from a Windows ASan test build that Emanuel manually downloaded from Treeherder. Putting this under mozglue because WindowsDllBlocklist.cpp is the only code that is ours and on the stack. I also don't know how to get better symbols for the rest of the stack on Windows.
David, can you help answering the symbols question maybe as a start?
Flags: needinfo?(dmajor)
Also CCing some people that might know what this code exactly does. Any help is greatly appreciated.
We hook the Windows thread init function, so patched_BaseThreadInitThunk appears at the bottom of nearly every thread and probably isn't at fault here.

To get anything more out of this we'd either need to get a memory dump (does ASan Nightly collect such things?), and/or fix bug 1426176, and/or at least get the exact version of kernel32/ntdll/etc and disassemble those addresses from a local copy.
Flags: needinfo?(dmajor)
Hmm, if I'm reading this correctly, dumpbin says that

1) ntdll.dll+0x18001f3f2 corresponds to TppWorkerThread (0x18001f320),
2) KERNEL32.DLL+0x180013033 corresponds to BaseThreadInitThunk (0x180013020) and
3) ntdll.dll+0x180071430 corresponds to RtlUserThreadStart (0x180071410)

All are located in the Guard CF Function Table which seems to be related to Control Flow Guard (but that might just be a consequence of using CFG at all).

We don't seem to have symbols for clang_rt.asan_dynamic-x86_64.dll so we're probably missing the most important part.
Can you link me the exact Treeherder build that you used? I wonder if I could guess at a function name just from reading clang_rt.asan_dynamic-x86_64.dll disassembly.
Sure, it was this one: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=9daa53881b7ae80bf6b093dac5d7744cf7fd18b1&selectedJob=189310358

The file is also identical to the one included in the build of clang you currently get from mach bootstrap (30a8aceddc0675ca-clang.tar.bz2).
I don't understand this stack. These offsets like ntdll+0x180071430 are unreasonably large. If I take out 0x180000000 then ntdll+71430 is a sensible function on my machine. So OK, maybe there is some rebasing going on or something, who knows. But then clang_rt_asan_dynamic_x86_64+0x2f2ca still doesn't make sense, it's not instruction-aligned.

Also noteworthy is that ASan thinks this is "thread T16777215" or in other words 0xFFFFFF. Maybe ASan's bookkeeping is messed up somehow.

Emanuel, are you able to reproduce this? If so, can you share what steps you took? (I didn't get any problems from just starting that build locally) Even better, could you try running under WinDbg and see if it triggers a break?
Regarding the offsets, could they be a consequence of CFG-related indirection? The offsets in ntdll and kernel32 all corresponded to functions in the Guard CF Function Table (according to dumpbin). I don't know what to make of the alignment though - especially the odd-aligned offsets (do they correspond to the current instruction or the next one?).

I don't have a consistent STR, though I'll try to find some. I got a crash once trying to download the latest (regular) Nightly so there might be some connection with network activity - I also got a few crashes scrolling a Reddit post, but I wonder if I was hovering over links while scrolling and triggering some prefetching mechanism.

I tried attaching the Visual Studio debugger to a running process once and immediately got an exception (which I could choose to ignore), but I wasn't sure whether that was simply a quirk of ASAN.
On Win64, ASan uses a page fault handler to map shadow memory as needed, because mapping it all upfront would exceed OS limits. So you do get a lot of benign exceptions when running under a debugger. If you hit a non-continuable exception, then you know you're actually in trouble.
I'm having difficulty triggering these crashes on a newer build. On the build I was using previously, I stopped getting browser crashes but was still getting tab crashes, but only with my usual set of pinned tabs open. So this may require a very specific set of circumstances to trigger, but I'll keep browsing around to see if it happens again.
Any chance you could try that old build under a debugger?
I tried for a while but couldn't get it to crash, ugh. I'll try again tomorrow.
Group: core-security → core-security-release
Hmm, I should mention that I *have* gotten other crashes, but they're mostly assertion failures at [1]. I've also gotten a few stack-buffer-underflow crashes (very similar to the crashes here and equally unsymbolized).

I wonder if these crashes all have the same underlying cause. None of them have been very reproducible but the assertion failures seem to happen more frequently if something else in the system is using a lot of CPU.

[1] https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_thread.cc#L350
I set up a local build of clang to get symbols for the ASAN runtime; unfortunately the llvm-symbolizer doesn't seem to be working (not sure why). Here's the stack according to MSVC for one of the assertion failures:

>	clang_rt.asan_dynamic-x86_64.dll!__sanitizer::internal__exit(int exitcode) Line 748	C++
 	clang_rt.asan_dynamic-x86_64.dll!__sanitizer::Die() Line 60	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::AsanCheckFailed(const char * file, int line, const char * cond, unsigned __int64 v1, unsigned __int64 v2) Line 77	C++
 	clang_rt.asan_dynamic-x86_64.dll!__sanitizer::CheckFailed(const char * file, int line, const char * cond, unsigned __int64 v1, unsigned __int64 v2) Line 81	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::AsanThread::GetStackFrameAccessByAddr(unsigned __int64 addr, __asan::AsanThread::StackFrameAccess * access) Line 350	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::GetStackAddressInformation(unsigned __int64 addr, unsigned __int64 access_size, __asan::StackAddressDescription * descr) Line 205	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::AddressDescription::AddressDescription(unsigned __int64 addr, unsigned __int64 access_size, bool shouldLockThreadRegistry) Line 465	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::ErrorGeneric::ErrorGeneric(unsigned int tid, unsigned __int64 pc_, unsigned __int64 bp_, unsigned __int64 sp_, unsigned __int64 addr, bool is_write_, unsigned __int64 access_size_) Line 396	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan::ReportGenericError(unsigned __int64 pc, unsigned __int64 bp, unsigned __int64 sp, unsigned __int64 addr, bool is_write, unsigned __int64 access_size, unsigned int exp, bool fatal) Line 463	C++
 	clang_rt.asan_dynamic-x86_64.dll!__asan_wrap_memset(void * dst, int v, unsigned __int64 size) Line 764	C++
 	ntdll.dll!LdrpInitializeThread()	Unknown
 	ntdll.dll!_LdrpInitialize()	Unknown
 	ntdll.dll!LdrpInitialize()	Unknown
 	ntdll.dll!LdrInitializeThunk()	Unknown
I'd be very curious to see a full-memory dump if you hit it again.
Here's the dump for that stack. Do you need the local build I made?
I managed to get a crash on a fresh profile, here's the build, profile and minidump+heap saved by MSVC: https://mega.nz/#!vF9HxaKR!OjU_s-JWJGa-ydHI5ck8dhlPYmAkBjuqSIlWJvqGURw

Ther's no ASAN report since it was a content process crash and I forgot to turn the sandbox off (I tried turning it off but then I couldn't get it to crash anymore, alas).
Thanks Emanuel!

OK, I'm a little hazy on some details but I think I have a rough idea of what's happening.

Windows is creating a new thread and ntdll!LdrpInitializeThread is calling memset to zero out some data structure on the stack. That call gets redirected to __asan_wrap_memset. ASan thinks that area is a poison/redzone (maybe leftover from a previous thread that occupied that memory?) and tries to report an error.

Then, when attempting to get a backtrace for the first complaint, ASan runs into some other internal inconsistencies, and trips a fatal assertion. I don't know or care much about the details here because fixing the poisoning problem would make this moot.

We should ask the ASan folks whether there are any known issues around stack memory re-use.

In the meantime we might be able to work around this with the `replace_intrin:0` option or by adding `interceptor_name:memset` to a suppressions file.
> We should ask the ASan folks whether there are any known issues around stack memory re-use.

"Stack memory re-use" was a bad phrase here, as it sounds too much like a common C++ bug. What I really mean is issues with the OS creating a new thread using the stack memory area from a previous thread.
No need to keep this s-s, this is a bug with ASan on Windows.

I am currently testing with `replace_intrin:0` as suggested by :dmajor. I am not sure if this will make the crashes go away entirely but so far, it seems more stable. I would opt for making a patch that adds this to the ASan default options for reporter on Windows only, just to see if the crashes disappear. That would improve the overall stability on Windows for the reporter nightlies.
Group: core-security-release
Blocks: 1479385
Duplicate of this bug: 1478591
Hey Aaron, anything we should be concerned about here?
Flags: needinfo?(aklotz)
I believe these crashes were fixed / worked around by disabling ASAN's stack instrumentation in bug 1479385.
I don't think so.
Flags: needinfo?(aklotz)
See Also: → 1481745
I can reproduce on Windows 10 64bit using https://www.ibatt.ru/cart/step1/ This is a pretty common crash in bughunter. I don't have access to bug 1481745. If someone could cc me that would be great.
oops, mine was an underflow. sorry. Let me know if there is anything else you would like me to do.
Bob, do I need to do anything more than loading the URL in comment 26 to reproduce?
Flags: needinfo?(bob)
I just used nightly opt asan, enabled pop ups for the site and then used web console to open a tab from the loaded page, then in that new tab opened developer console then executed setInterval('opener.document.location.reload()', 30000) to load the original tab once every thirty seconds.

One attempt failed quickly with

==8620==AddressSanitizer CHECK failed: Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\asan\asan_thread.cc:350 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)

Another failed while I was away with

==7740==ERROR: AddressSanitizer: stack-use-after-scope on address 0x003b319ff860 at pc 0x7ff97fddfae2 bp 0x003b319fef80 sp 0x003b319fefc8
WRITE of size 56 at 0x003b319ff860 thread T16777215
==7740==WARNING: Failed to use and restart external symbolizer!
    #0 0x7ff97fddfb0a in __asan_wrap_memset Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:764
    #1 0x7ff9a7df1aa4 in RtlUnicodeStringToAnsiString+0x2e4 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180031aa4)
    #2 0x7ff9a7e330c0 in LdrInitializeThunk+0x100 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800730c0)
    #3 0x7ff9a7e3301a in LdrInitializeThunk+0x5a (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18007301a)
    #4 0x7ff9a7e32fcd in LdrInitializeThunk+0xd (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180072fcd)

Address 0x003b319ff860 is a wild pointer.
SUMMARY: AddressSanitizer: stack-use-after-scope Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:764 in __asan_wrap_memset
Shadow bytes around the buggy address:
  0x01ed33a3feb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3fec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3fed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3fee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3fef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x01ed33a3ff00: 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8[f8]f8 f3 f3
  0x01ed33a3ff10: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3ff20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3ff30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3ff40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x01ed33a3ff50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

The stack is useless but you should be able to find something with this via the reload kludge.
Flags: needinfo?(bob)
Unhiding :bc's comments so :dmajor can read through them. The issue here is not s-s anyways, this is a bug in ASan itself and not an issue in our codebase.
Comment 26 is private: false
Comment 27 is private: false
Comment 29 is private: false
bc, I have a proposed fix/workaround here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4bff18d6a54 -- could you take it for a spin and see if it addresses these issues? (The two BoR builds should be the same, I think that was just a misclick)
Flags: needinfo?(bob)
It's running. I've set asanreporter.clientid to bclary@mozilla.com. I'll leave it running for a while.
Flags: needinfo?(bob)
When I just checked on it, I found tab crashed but no information in the output related to asan. Perhaps it sent an asan report, but I can't check on it. Hopefully that helped and hopefully this fix will land upstream soon.
I've checked the results in the backend, there is one report from :bc but it is an OOM related crashed. No sign of the stack issues anymore, so I guess we can consider this fixed as soon as we have the new toolchain. Thanks :dmajor!
Committed upstream at 342652 (with a bonus change to the interceptor at 342649). I'll cherry-pick these into our tree. Note to self, don't forget to remove the asan-stack workaround.
Assignee: nobody → dmajor
Cherry-pick compiler-rt r342652.

r342649 is a ride-along patch that makes r342652 use a less-invasive hook. It's not strictly necessary on my Windows installation, but it doesn't hurt, and it may be helpful on other systems.
Comment on attachment 9010806 [details]
Bug 1477490: Merge upstream ASan patch to unpoison thread stacks.

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9010806 - Flags: review+
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8846c0a176c8
Merge upstream ASan patch to unpoison thread stacks. r=ted
https://hg.mozilla.org/mozilla-central/rev/8846c0a176c8
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
David, would this patch be a good candidate for uplifting to 63 beta?
Flags: needinfo?(dmajor)
There's no real benefit since the asan reporter project is nightly-only.
Flags: needinfo?(dmajor)
See Also: → 1532502
You need to log in before you can comment on or make changes to this bug.