Closed Bug 1565272 Opened 5 years ago Closed 5 years ago

AddressSanitizer: use-after-poison [@ __asan_memcpy] with WRITE of size 48 through [@ js::NewObjectCache::copyCachedToObject] and various other signatures

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- fixed

People

(Reporter: decoder, Assigned: pbone)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [jsbugmon:update][fuzzblocker])

Attachments

(3 files, 1 obsolete file)

The following testcase crashes on mozilla-central revision ad05396bfeed (build with --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --disable-debug --enable-address-sanitizer --disable-jemalloc --enable-optimize=-O2 --enable-fuzzing, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off --wasm-compiler=ion):

See attachment.

Backtrace:

==29939==ERROR: AddressSanitizer: use-after-poison on address 0x230d24d40000 at pc 0x55b2fa897da5 bp 0x7ffc1fc4ba30 sp 0x7ffc1fc4b1e0
WRITE of size 48 at 0x230d24d40000 thread T0
    #0 0x55b2fa897da4 in __asan_memcpy
    #1 0x55b2faba7d2a in js_memcpy(void*, void const*, unsigned long) js/src/jsutil.h:54:10
    #2 0x55b2faba7d2a in js::NewObjectCache::copyCachedToObject(js::NativeObject*, js::NativeObject*, js::gc::AllocKind) js/src/vm/Caches.h:216
    #3 0x55b2faba7d2a in js::NewObjectCache::newObjectFromHit(JSContext*, int, js::gc::InitialHeap) js/src/vm/Caches-inl.h:81
    #4 0x55b2fb06d9f3 in js::NewObjectWithClassProtoCommon(JSContext*, js::Class const*, JS::Handle<JSObject*>, js::gc::AllocKind, js::NewObjectKind) js/src/vm/JSObject.cpp:907:29
    #5 0x55b2faa98ed9 in js::NewObjectWithClassProto(JSContext*, js::Class const*, JS::Handle<JSObject*>, js::gc::AllocKind, js::NewObjectKind) js/src/vm/JSObject-inl.h:493:10
    #6 0x55b2faa98ed9 in js::NewBuiltinClassInstance(JSContext*, js::Class const*, js::gc::AllocKind, js::NewObjectKind) js/src/vm/JSObject-inl.h:526
    #7 0x55b2faa98ed9 in js::PlainObject* js::NewBuiltinClassInstance<js::PlainObject>(JSContext*, js::gc::AllocKind, js::NewObjectKind) js/src/vm/JSObject-inl.h:545
    #8 0x55b2faa98ed9 in js::CopyInitializerObject(JSContext*, JS::Handle<js::PlainObject*>, js::NewObjectKind) js/src/vm/NativeObject-inl.h:624
    #9 0x55b2faa99a8c in js::NewObjectOperationWithTemplate(JSContext*, JS::Handle<JSObject*>) js/src/vm/Interpreter.cpp:5212:7
    #10 0x55b2fbf26d3d in js::jit::DoNewObjectFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICNewObject_Fallback*, JS::MutableHandle<JS::Value>) js/src/jit/BaselineIC.cpp:4074:11
    #11 0x7fd2a1d6dacf  (<unknown module>)

Address 0x230d24d40000 is a wild pointer.
SUMMARY: AddressSanitizer: use-after-poison in __asan_memcpy
Shadow bytes around the buggy address:
  0x04622499ffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04622499ffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04622499fff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0462249a0000:[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0462249a0010: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0462249a0020: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Poisoned by user:        f7
==29939==ABORTING

This is causing quite a few distinct and hard-to-reproduce use-after-poison issues, so I am marking this as fuzzblocker. Also marking s-s until we know the source of the problem. Unfortunately, the testcase does not reduce further than what is attached, it reproduced reliably for me though. It seems very sensitive to memory allocations and branches.

Attached file Testcase
Type: task → defect

This look like this could be a rooting issue in the IC of Baseline, while I doubt about it this is my simplest hypothesis so far.
Matthew, can you (in/)validate this hypothesis, and maybe find more info and who might feel concerned by this issue.

Component: JavaScript Engine → JavaScript Engine: JIT
Flags: needinfo?(mgaudet)
Priority: -- → P1

This memory is "poisoned by user", so it's the ASan/Valgrind instrumentation we added to Poison. Requesting NI from paul because he made changes in that area recently.

Flags: needinfo?(pbone)

So, rr is not being terribly cooperative, but here's what I've got.

  1. I can reproduce, with the same stack as :decoder. However, trying to set a breakpoint at the precise point where the read occurs seems to cause rr to crash. I was able however to step through from the start of the function.

  2. The byte read is also the first first byte of the allocated object here. I don't know about ASAN + our GC, but it's perhaps notable that it thinks Address 0x24faf0440000 is a wild pointer?

  3. The crash happens during the js_memcpy reached from the next line, the call to copyCachedToObject.

  4. Specifically, it's during the __asan_memcpy that gets injected that the error fires; but, I don't really understand why: the supposed uninitialized value read is inside the dst of the memcpy! Is this an ASAN bug?

Some other notes that came up during my investigation, perhaps helpful to the next person who looks at this:

  1. The value was poisoned here, by watching the value itself.

    #0  __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:151
    #1  0x0000562c56fa2f1a in __asan_memset () at /builds/worker/workspace/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:27
    #2  0x0000562c5851b903 in js::AlwaysPoison (ptr=<optimized out>, value=47 '/', num=1048576, kind=MemCheckKind::MakeUndefined) at /home/mgaudet/unified/js/src/jsutil.h:352
    #3  js::Poison (ptr=0x7f73ad800000, value=47 '/', num=1048576, kind=MemCheckKind::MakeUndefined) at /home/mgaudet/unified/js/src/jsutil.h:368
    #4  js::NurseryChunk::poisonAndInit (this=0x7f73ad800000, rt=0x629000000200, extent=1048576) at /home/mgaudet/unified/js/src/gc/Nursery.cpp:77
    #5  js::Nursery::poisonAndInitCurrentChunk (this=<optimized out>, fullPoison=<optimized out>) at /home/mgaudet/unified/js/src/gc/Nursery.cpp:1253
    #6  js::Nursery::enable (this=0x6290000029e0) at /home/mgaudet/unified/js/src/gc/Nursery.cpp:301
    #7  0x0000562c5795b334 in JSRuntime::initSelfHosting (this=<optimized out>, cx=0x7ffd63cec340) at /home/mgaudet/unified/js/src/vm/SelfHosting.cpp:3158
    #8  0x0000562c57e78dee in JS::InitSelfHostedCode (cx=0x61d000000a80) at /home/mgaudet/unified/js/src/jsapi.cpp:433
    #9  0x0000562c56fe6eed in main (argc=-1957535376, argv=0x61d000000a80, envp=<optimized out>) at /home/mgaudet/unified/js/src/shell/js.cpp:11345
    

    however, it's peculiar, because I see a different place the poisoning occurs if watch the shadow byte that corresponds to that address directly for changes:

     #0  __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:151
     #1  0x0000559e0b065ed2 in __asan_poison_memory_region () at /builds/worker/workspace/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_poisoning.cc:141
     #2  0x0000559e0c5d8243 in js::gc::MarkPagesUnused (region=0x24faf0440000, length=782336) at /home/mgaudet/unified/js/src/gc/Memory.cpp:753
     #3  0x0000559e0c5d94a5 in js::NurseryDecommitTask::decommitRange (this=<optimized out>, lock=...) at /home/mgaudet/unified/js/src/gc/Nursery.cpp:191
     #4  js::NurseryDecommitTask::run (this=0x629000002d30) at /home/mgaudet/unified/js/src/gc/Nursery.cpp:156
     #5  0x0000559e0b763cc5 in js::GCParallelTask::runFromHelperThread (this=0x629000002d30, lock=...) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:1716
     #6  js::HelperThread::handleGCParallelWorkload (this=<optimized out>, lock=...) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:1737
     #7  0x0000559e0b7671a2 in js::HelperThread::threadLoop (this=0x611000000400) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:2515
     #8  0x0000559e0b7f5163 in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul> (this=0x602000000370)
         at /home/mgaudet/unified/js/src/threading/Thread.h:239
     #9  js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0x602000000370) at /home/mgaudet/unified/js/src/threading/Thread.h:232
     #10 0x00007f694c494182 in start_thread (arg=<optimized out>) at pthread_create.c:486
     #11 0x00007f694c044b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
  2. I think it's notable that the read is the first byte of a poisoned area, where the byte before is addressable.

  3. Watch points seem to be a little peculiar; if I watch the address for maniplulation, and reverse-continue, all I ever see are poison writes. On the flip side however, if I reverse to the first poisoning, and step forward, I see a non-poisioning write flagged here

    #0  js::gc::ChunkBitmap::isMarkedAny (this=0x28f67cafc0a0, cell=0x28f67caad128) at /home/mgaudet/unified/js/src/gc/Heap.h:644
    #1  js::gc::TenuredCell::isMarkedAny (this=0x28f67caad128) at /home/mgaudet/unified/js/src/gc/Cell.h:312
    #2  js::gc::Arena::finalize<js::Shape> (this=0x28f67caad000, fop=0x7f694875c8c0, thingKind=<optimized out>, thingSize=40) at /home/mgaudet/unified/js/src/gc/GC.cpp:585
    #3  0x0000559e0c501839 in FinalizeTypedArenas<js::Shape> (fop=0x7f694875c8c0, src=0x7f694875b6a0, dest=..., budget=..., thingKind=<optimized out>)
        at /home/mgaudet/unified/js/src/gc/GC.cpp:653
    #4  FinalizeArenas (fop=0x7f694875c8c0, src=0x7f694875b6a0, dest=..., thingKind=<optimized out>, budget=...) at /home/mgaudet/unified/js/src/gc/GC.cpp:681
    #5  0x0000559e0c500984 in js::gc::ArenaLists::backgroundFinalize (fop=<optimized out>, listHead=0x28f67cab2000, empty=<optimized out>)
        at /home/mgaudet/unified/js/src/gc/GC.cpp:3255
    #6  0x0000559e0c50b232 in js::gc::GCRuntime::sweepBackgroundThings (this=0x6290000006e8, zones=..., freeBlocks=...) at /home/mgaudet/unified/js/src/gc/GC.cpp:3724
    #7  0x0000559e0c50c356 in js::gc::GCRuntime::sweepFromBackgroundThread (this=0x6290000006e8, lock=...) at /home/mgaudet/unified/js/src/gc/GC.cpp:3805
    #8  0x0000559e0c50c041 in js::gc::BackgroundSweepTask::run (this=0x629000001950) at /home/mgaudet/unified/js/src/gc/GC.cpp:3790
    #9  0x0000559e0b763cc5 in js::GCParallelTask::runFromHelperThread (this=0x629000001950, lock=...) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:1716
    #10 js::HelperThread::handleGCParallelWorkload (this=<optimized out>, lock=...) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:1737
    #11 0x0000559e0b7671a2 in js::HelperThread::threadLoop (this=0x611000000400) at /home/mgaudet/unified/js/src/vm/HelperThreads.cpp:2515
    #12 0x0000559e0b7f5163 in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul> (this=0x602000000370)
        at /home/mgaudet/unified/js/src/threading/Thread.h:239
    #13 js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0x602000000370) at /home/mgaudet/unified/js/src/threading/Thread.h:232
    #14 0x00007f694c494182 in start_thread (arg=<optimized out>) at pthread_create.c:486
    #15 0x00007f694c044b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    

    which seems peculiar, as isMarkedAny shouldn't be doing writes? Right? Don't know how much I trust rr here.

In conclusion: I think this is either a GC thing, or, potentially just an ASAN bug exacerbated by the new poisoning?

Flags: needinfo?(mgaudet)

(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #4)

In conclusion: I think this is either a GC thing, or, potentially just an ASAN bug exacerbated by the new poisoning?

ASan bugs are of course not impossible, but highly unlikely. We have not seen notable bugs in ASan itself for about 2 years now. Also, this issue only started occurring on Tue 9 Jul 2019 (at a very high rate) and not before. I think the poisoning patch is older than that. It might help if we had an exact regression range. Gary, can you bisect this, even though it might be a little flaky?

Flags: needinfo?(nth10sd)

autobisectjs shows this is probably related to the following changeset:

The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/fbfcb9fbf6c5
user: Paul Bone
date: Fri Jul 05 13:18:17 2019 +0000
summary: Bug 1506733 - (part 4) Decommit unused portion of Nursery chunks r=jonco

Paul, is bug 1506733 a likely regressor?

(I fixed a bunch of bisection and/or asan compilation-related issues for this, I think this potential regressor makes sense)

Flags: needinfo?(nth10sd)
Regressed by: 1506733
Whiteboard: [jsbugmon:update,bisect][fuzzblocker] → [jsbugmon:update][fuzzblocker]
This is an automated crash issue comment:

Summary: AddressSanitizer: use-after-poison [@ __asan_memset] with WRITE of size 1048576
Build version: mozilla-central revision 3e793ca066f2
Build flags: --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --disable-debug --enable-address-sanitizer --disable-jemalloc --enable-optimize=-O2 --enable-fuzzing
Runtime options: --fuzzing-safe --ion-offthread-compile=off --ion-warmup-threshold=10

Testcase:

    while(1) {
      try {
        evalInWorker(`
          function testOneSize(current_size) {
            var eval_string = 'obj = {';
            for (var current = 0; current <= current_size; ++current)
              eval_string += 'k' + current + ':' + current + ','
          }
          testOneSize(1023);
          testOneSize(1024);
          gczeal(4);
        `);
      } catch (exc) {}
    }

Backtrace:

    ==19613==ERROR: AddressSanitizer: use-after-poison on address 0x343ff3588000 at pc 0x558984cc0f8d bp 0x7f7674d828b0 sp 0x7f7674d82060
    WRITE of size 1048576 at 0x343ff3588000 thread T13
        #0 0x558984cc0f8c in __asan_memset (dist/bin/js+0x770f8c)
        #1 0x5589861303ba in js::PoisonImpl(void*, unsigned char, unsigned long) js/src/jsutil.h:350:3
        #2 0x5589861303ba in js::Poison(void*, unsigned char, unsigned long, MemCheckKind) js/src/jsutil.h:371
        #3 0x5589861303ba in js::gc::Chunk::init(JSRuntime*) js/src/gc/Allocator.cpp:766
        #4 0x558986233ad5 in js::NurseryChunk::toChunk(JSRuntime*) js/src/gc/Nursery.cpp:106:10
        #5 0x558986233ad5 in js::NurseryDecommitTask::queueChunk(js::NurseryChunk*, js::AutoLockHelperThreadState const&) js/src/gc/Nursery.cpp:113
        #6 0x558986233ad5 in js::Nursery::freeChunksFrom(unsigned int) js/src/gc/Nursery.cpp:1484
        #7 0x558986233ad5 in js::Nursery::disable() js/src/gc/Nursery.cpp:325
        #8 0x55898619d7ac in JS::AutoDisableGenerationalGC::AutoDisableGenerationalGC(JSContext*) js/src/gc/GC.cpp:8068:19
        #9 0x558986275619 in js::VerifyPreTracer::VerifyPreTracer(JSRuntime*) js/src/gc/Verifier.cpp:109:9
        #10 0x558986275619 in js::VerifyPreTracer* js_new<js::VerifyPreTracer, JSRuntime* const&>(JSRuntime* const&) dist/include/js/Utility.h:545
        #11 0x558986275619 in js::gc::GCRuntime::startVerifyPreBarriers() js/src/gc/Verifier.cpp:194
        #12 0x558984e92e8f in Interpret(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:3096:9
        #13 0x558984e7456b in js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:425:10
        #14 0x558984eaf49b in js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::AbstractFramePtr, JS::Value*) js/src/vm/Interpreter.cpp:787:13
        #15 0x558984eafc6f in js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) js/src/vm/Interpreter.cpp:820:10
        #16 0x558985196eab in JS_ExecuteScript(JSContext*, JS::Handle<JSScript*>, JS::MutableHandle<JS::Value>) js/src/vm/CompilationAndEvaluation.cpp:475:10
        #17 0x558984d605e9 in WorkerMain(WorkerInput*) js/src/shell/js.cpp:4064:5
        #18 0x558984d80a12 in void js::detail::ThreadTrampoline<void (&)(WorkerInput*), WorkerInput*&>::callMain<0ul>(std::integer_sequence<unsigned long, 0ul>) js/src/threading/Thread.h:239:5
        #19 0x558984d80a12 in js::detail::ThreadTrampoline<void (&)(WorkerInput*), WorkerInput*&>::Start(void*) js/src/threading/Thread.h:232
        #20 0x7f767a5336b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
        #21 0x7f767939641c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
    
    Address 0x343ff3588000 is a wild pointer.
    SUMMARY: AddressSanitizer: use-after-poison (dist/bin/js+0x770f8c) in __asan_memset
    Shadow bytes around the buggy address:
      0x06887e6a8fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x06887e6a8ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    =>0x06887e6a9000:[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
      0x06887e6a9010: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Poisoned by user:        f7


I managed to find this smaller, reliable testcase. Maybe this helps in debugging? To me, the stack looks like we are attempting to poison memory that is already poisoned.

(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #6)

autobisectjs shows this is probably related to the following changeset:

The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/fbfcb9fbf6c5
user: Paul Bone
date: Fri Jul 05 13:18:17 2019 +0000
summary: Bug 1506733 - (part 4) Decommit unused portion of Nursery chunks r=jonco

Paul, is bug 1506733 a likely regressor?

(I fixed a bunch of bisection and/or asan compilation-related issues for this, I think this potential regressor makes sense)

Yes. From your bisect and especially Matthew's back traces it it certainly either caused by that patch if not it was pre-existing and the patch helped to tease it out.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

I think that Bug 1562550 fixes the first crash in comment 0, at least it does for me. decoder can you check this?

Flags: needinfo?(choller)

it looks like both those test cases/crashes are the same (they have different stacks where they crash so I checked) and that applying Bug 1562550 part 2 fixes it.

The problem was that entering zeal mode wouldn't clear the ASAN information for part of the chunk and so when we touched it later it would cause the failure.

I want to apply another patch though, to so there's no race between the background decommiter and either disabling the nursery or entering zeal mode. This was something else I investigated.

I think this is not a security bug because it's caused by improper asan calls. Not something that asan actually caught.

Group: javascript-core-security
  • Spelling fixes.
  • Make the meaning of a comment clearer.

Depends on D38043

(In reply to Paul Bone [:pbone] from comment #10)

it looks like both those test cases/crashes are the same (they have different stacks where they crash so I checked) and that applying Bug 1562550 part 2 fixes it.

Hm, I can still reproduce with the test in comment 7 on today's build (mozilla-central revision eae7203ff8a5), even though bug 1562550 has landed several days ago. Are you sure this is fixed by the patch?

Flags: needinfo?(choller) → needinfo?(pbone)

I'm sorry decoder. I must have missed something. I tested again today and Bug 1562550 fixes the first problem but not the second. they're different bugs.

Flags: needinfo?(pbone)
Attachment #9078089 - Attachment is obsolete: true
Attachment #9078087 - Attachment description: Bug 1565272 - Fix some comments r=jonco → Bug 1565272 - Fix some comments r=sfink
Attachment #9078088 - Attachment description: Bug 1565272 - Prevent races with the background decommit task r=jonco → Bug 1565272 - Prevent races with the background decommit task r=sfink

The 2nd problem was a race. The decommit task would attempt to decommit part of a chunk that was also being decommitted as a whole chunk while disabling the nursery. The solution is to (as in the patch I already wrote when I noticed this by reading code) to join() the decommit task before freeing the chunks while disabling the nursery.

Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e49dd8bce963
Fix some comments r=sfink
https://hg.mozilla.org/integration/autoland/rev/227a100b5eb0
Prevent races with the background decommit task r=sfink
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: