Closed Bug 914511 Opened 11 years ago Closed 11 years ago

Invalid read of size 8 [@ js::gc::MarkIonCodeUnbarriered] or [@ js::jit::Assembler::TraceJumpRelocations]

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox24 --- wontfix
firefox25 --- verified
firefox26 --- verified
firefox27 + verified
firefox28 + verified
firefox-esr24 --- verified
b2g18 --- verified
b2g-v1.1hd --- verified
b2g-v1.2 --- verified

People

(Reporter: gkw, Assigned: sunfish)

References

Details

(4 keywords)

Attachments

(3 files)

Attached file stack
try {
    Object.defineProperty(this, "x", {
        get: function() {
            x
        }
    });
    x;
} catch (e) {};
try {
    timeout(1);
    x;
} catch (e) {};
try {
    x;
} catch (e) {}
gc();

shows an invalid read of size 8 on js opt shell on m-c changeset f320b8c034bd without any CLI arguments.

s-s because this involves gc and is an invalid read.

I ran `valgrind --smc-check=all-non-file ./js testcase.js`

My configure flags are:

--enable-optimize --disable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --enable-more-deterministic --enable-valgrind --with-ccache --enable-threadsafe <other NSPR options>
Flags: needinfo?
This happens with js::jit::Assembler::TraceJumpRelocations sometimes at the top of the stack too.
Flags: needinfo?
Summary: Invalid read of size 8 [@ js::gc::MarkIonCodeUnbarriered] → Invalid read of size 8 [@ js::gc::MarkIonCodeUnbarriered] or [@ js::jit::Assembler::TraceJumpRelocations]
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/475c1655ce61
user:        Shu-yu Guo
date:        Tue Mar 19 23:26:08 2013 -0700
summary:     Bug 814795 - Remove v8-specific tools for selfhosted code in favor of CPP (r=till)

Shu-yu, is bug 814795 a possible regressor?
Blocks: 814795
Flags: needinfo?(shu)
I highly doubt it's bug 814795, which doesn't touch Ion at all. It just changes how we preprocess self-hosted JS files as part of the build process.
Flags: needinfo?(shu)
I suppose it's conceivable that the call to |DecompressString| for decompressing the self-hosted JS file is corrupting memory somehow, but I wouldn't know where to start investigating that.
Julian, is this likely related to bug 910477?
Flags: needinfo?(jseward)
No, I think completely unrelated to bug 910477.  This just looks
like a null pointer dereference.  Can you reproduce it not-on-valgrind?
Flags: needinfo?(jseward)
$ ./js-opt-64-dm-vg-ts-linux-e5ca10a2b3d0 914511.js
js_ReportOverRecursed called
js_ReportOverRecursed called
js_ReportOverRecursed called

Nope, I can't reproduce it without Valgrind, but I still can, with Valgrind.
This sounds like stack/frame corruption. Nicolas, Jan or myself should probably take a look at it. I'm swamped at the moment: Nicolas, do you have some spare cycles to investigate this?
Flags: needinfo?(nicolas.b.pierron)
I think Jan is busy too, but I also think that nbp is in Oslo, Norway this entire week.

Jan, I hope you're a better choice though. :)
Flags: needinfo?(nicolas.b.pierron) → needinfo?(jdemooij)
> nbp is in Oslo, Norway this entire week.

(for the B2G work week)
Gary, I can't reproduce this on Linux 64-bit. I updated to the revision in comment 0, used the exact same configure line and Valgrind command but it doesn't crash.

The timeout(1) call probably makes this harder to reproduce. Do you also get this crash if you use other timeout values? Maybe Nicolas or somebody else could debug this on your machine next week?

@decoder: i think you also reported a timeout-related bug this week, but I can't find it now?
Flags: needinfo?(choller)
(In reply to Jan de Mooij [:jandem] from comment #11)
> @decoder: i think you also reported a timeout-related bug this week, but I
> can't find it now?

Hm, I did another search and found it this time: bug 914174, but that seems ASan-only...
Flags: needinfo?(choller)
Indeed, bug 914174 was just about ASan stumbling over the new interrupt code that requires a custom signal handler just like ASM.js. If this bug just reproduces under valgrind, then it might well be a problem with the interrupt code as well. We currently have bug 913876 for an invalid write with ASM.js (which is the same interrupt stuff that timeout is using), so maybe that is related.
> The timeout(1) call probably makes this harder to reproduce. Do you also get
> this crash if you use other timeout values?

Yeah, other timeout values should work, it used to be timeout(1800) or something. I tested on Ubuntu Linux 12.04 LTS.
Nicolas can you try to reproduce this or maybe investigate on Gary's machine? It's still not crashing here with the info in comment 0.
Flags: needinfo?(jdemooij) → needinfo?(nicolas.b.pierron)
I checked on gary's computer, we are pushing the IonCode, and failing the GC code because we cannot read anymore the CompactBuffer containing the relocation table.

It seems to be a *use-after-free* of the code_ pointer of the IonCode which is allocated with execution rights in CodeGenerator::link (during IonScript generation).

I have no idea when the code get freed, but based on valgrind it seems ok, after it is allocated in the CodeGenerator::link function, but is no longer when we are seeing the IonCode in PushMarkStack.  The IonCode is not mutated, so we would need to track what might free a similar pointer.
Flags: needinfo?(nicolas.b.pierron)
Keywords: sec-critical
Thanks Nicolas, that sounds serious and may explain some other bug reports..

Can you take this? I can't repro here unfortunately :(
Setting a needinfo on myself to test this on ASan.
Flags: needinfo?(gary)
This doesn't reproduce for me in ASan (at least not in my build configuration), but it also doesn't crash without any of the two tools, although it should according to the trace. I assume the fact that Valgrind makes it crashing is just related to memory reordering or different initialization, not to a specific error detection capability.
Gary, I'm setting the tracking flags based on your comment 2 investigation. Please let me know if I goofed anywhere.
See also bug 917050.
I think this is probably a simple NULL deref. Nicolas, are the jump relocations stored in this buffer allowed to be NULL? If so, we simply need to guard, as in the attached patch. Otherwise, there is something more going on here.
Assignee: general → terrence
Status: NEW → ASSIGNED
Attachment #813333 - Flags: review?(nicolas.b.pierron)
Also, this showed up with the TraceJumpRelocations signature for me. I did not see the MarkIonCode signature; there may be two bugs here.
(In reply to Terrence Cole [:terrence] from comment #22)
> I think this is probably a simple NULL deref. Nicolas, are the jump
> relocations stored in this buffer allowed to be NULL? If so, we simply need
> to guard, as in the attached patch. Otherwise, there is something more going
> on here.

IonCode pointers should not be allowed to be NULL at any point in time, after the link phase.  Otherwise this means that we might just crash by executing the first page.

I guess we can assert that the target->raw() is not NULL such as here:
   http://dxr.mozilla.org/mozilla-central/source/js/src/jit/x64/Assembler-x64.h#l642

or do it as part of the writeRelocation functions.
Comment on attachment 813333 [details] [diff] [review]
fuzz_914511-v0.diff

Review of attachment 813333 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x64/Assembler-x64.cpp
@@ +251,5 @@
>      RelocationIterator iter(reader);
>      while (iter.read()) {
>          IonCode *child = CodeFromJump(code, code->raw() + iter.offset());
> +        if (!child)
> +            continue;

They are not supposed to be NULL, otherwise this means that we can generate code which can crash.
If you want we can assert so, and do the same either in writeRelocations or at the end of the link phase.
Attachment #813333 - Flags: review?(nicolas.b.pierron)
Attached file valgrind.stack
When I run a debug build, I get the attached valgrind stack, which looks similar to bug 917050. Regardless, whatever is going on here is /extremely/ weird. Here is the code for CompactBufferReader::readByte: 

   │<readByte()+1>   mov    %rsp,%rbp
   │<readByte()+4>   sub    $0x10,%rsp      ; bump stack 2 words
   │<readByte()+8>   mov    %rdi,-0x8(%rbp) ; store |this| to top-of-stack
   <<<< JS_ASSERT(buffer_ < end_); >>>>
   ... snip ...
   <<<< *buffer_++ >>>>
   │<readByte()+81>  mov    -0x8(%rbp),%rax  ; load |this| into $rax [[0xffeffcad0]]
   │<readByte()+85>  mov    (%rax),%rax      ; load |buffer_| into $rax [[0x4034138]]
  >│<readByte()+88>  movzbl (%rax),%edx      ; <- Valgrind think $rax is 0 here?
                                             ; Result should be 0xe0 in $edx.
   │<readByte()+91>  lea    0x1(%rax),%rcx
   │<readByte()+95>  mov    -0x8(%rbp),%rax
   │<readByte()+99>  mov    %rcx,(%rax)
   │<readByte()+102> mov    %edx,%eax
   │<readByte()+104> leaveq
   │<readByte()+105> retq

When I run +81 and +85 manually in gdb, I get the correct addresses I've listed after those instructions. The value that gdb finds in $rax after valgrind loads it is 0x0, however.

Julian, could V's emulation of movzbl be going off the rails here or just left in an invalid state?
Flags: needinfo?(jseward)
Terrence this has been flagged needinfo for a while. Can you make progress on this without Julian's input? Is there anyone else, perhaps njn, that could weigh in valgrind's emulation of this instruction?
Now that I have upgraded valgrind I am getting a different and more valid-looking error. Re-doing the investigation.
Flags: needinfo?(jseward)
This is only giving me the MarkIonCode signature now.

So far I have only managed to repro this in opt builds running under valgrind. The image valgrind attaches gdb to does not have thread info -- so it's impossible to say if the background thread might be to blame -- and it can't re-run the program, so I cannot step backwards through states. This leaves us with printf debugging as our last resort. I simply do not know enough about this code to attempt printf debugging on this. I'm afraid I need to hand this off to someone who actually knows how this is supposed to work. Nicolas, did you ever try to repro this locally?
Flags: needinfo?(nicolas.b.pierron)
No I have not yet tried to reproduce it locally, I will try later today.
So far I haven't managed to compile it with the set of configure flags given in comment 0.  I am trying to get as close as possible to the set of configure options but I cannot reproduce it at the moment.
I cannot reproduce this issue, Gary, can you tell me how to connect to the computer on which you are able to reproduce this issue?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #32)
> I cannot reproduce this issue, Gary, can you tell me how to connect to the
> computer on which you are able to reproduce this issue?

I can still reproduce on rev 8f8a683dfc42. Will contact nbp offline.
nbp has reproduced on my box. Terrence, let me know if you need credentials to work on this.
(In reply to Nicolas B. Pierron [:nbp] from comment #16)
> I have no idea when the code get freed, but based on valgrind it seems ok,
> after it is allocated in the CodeGenerator::link function, but is no longer
> when we are seeing the IonCode in PushMarkStack.  The IonCode is not
> mutated, so we would need to track what might free a similar pointer.

Just to backup what I was saying, this is what I am seeing under gdb:

Breakpoint 1, js::jit::IonCode::copyFrom (this=0x73651c0, masm=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:635
635     {
$397 = (js::jit::IonCode * const) 0x73651c0
$398 = (uint8_t *) 0x41c9008 ""
#0  js::jit::IonCode::copyFrom (this=0x73651c0, masm=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:635
#1  0x00000000006317f2 in js::jit::Linker::newCode (this=0xffef199a0, cx=0x64e9d30, execAlloc=0x673d5b0, kind=JSC::ION_CODE) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/IonLinker.h:60
#2  0x000000000065ddff in newCodeForIonScript (cx=0x64e9d30, this=0xffef199a0) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/IonLinker.h:93
#3  js::jit::CodeGenerator::link (this=0x673c4e0) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/CodeGenerator.cpp:5679
#4  0x000000000066a586 in js::jit::IonCompile (cx=0x64e9d30, script=<optimized out>, baselineFrame=0xffef19d68, osrPc=0x66cc3b0 "", constructing=224, executionMode=js::jit::SequentialExecution) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1662
#5  0x000000000066a7ec in js::jit::Compile (cx=0x64e9d30, script=..., osrFrame=0xffef19d68, osrPc=0x0, constructing=false, executionMode=js::jit::SequentialExecution) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1814
#6  0x000000000066b0d7 in js::jit::CompileFunctionForBaseline (cx=0x64e9d30, script=..., frame=0xffef19d68, isConstructing=false) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1974
#7  0x000000000060498d in EnsureCanEnterIon (jitcodePtr=<synthetic pointer>, pc=0x6702e69 "\232", script=..., frame=0xffef19d68, stub=<optimized out>, cx=0x64e9d30) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/BaselineIC.cpp:735
#8  js::jit::DoUseCountFallback (cx=0x64e9d30, stub=<optimized out>, frame=0xffef19d68, infoPtr=0xffef19d40) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/BaselineIC.cpp:918
#9  0x000000000402db04 in ?? ()

Program received signal SIGSEGV, Segmentation fault.
read (this=<synthetic pointer>) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/x64/Assembler-x64.cpp:221
221             offset_ = reader_.readUnsigned();
(gdb) up
#1  js::jit::Assembler::TraceJumpRelocations (trc=0x648b028, code=0x73651c0, reader=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/x64/Assembler-x64.cpp:253
253         while (iter.read()) {

(gdb) p code
$399 = (js::jit::IonCode *) 0x73651c0
(gdb) p code->code_
$400 = (uint8_t *) 0x41c9008 <Address 0x41c9008 out of bounds>

The pointer 0x41c9008 is an offset inside the the ExecutableAllocator, which is a page of 64kB starting at 0x41c9000 and allocated with mmap.  The ExecutableAllocator::systemRelease should munmap this pointer when this area is no longer used.

I do not know from where the annotation "<Address 0x41c9008 out of bounds>" reported within gdb is coming from.  One thing is that valgrind *seems* to have contradictory information about the state of the memory allocated.

I used "monitor get_vbits 0x41c9000 338"[1] (338 because the code pointer is offset by 8 within the buffer and its size is 330 bytes), and every part of it is marked as "defined" (00).  Note that the memory is not munmap at the time of the SEGV, otherwise it would be marked as "unaddressable" (__).

I traced[1] where the code pointer was flowing into.  It is duplicated for a while on JSScript::baselineOrIonRaw, and also multiple times on the stack of the main thread for all calls to js::jit::Cannon (to fill the EnterJitData jitcode field).  Except these locations, I cannot spot any other references.

I traced where the page pointer was flowing into.  It does not seems to be referenced by anything else than the ExecutableAllocator pool.

I also noticed that even if this bug was mostly reproducible, some time the bug cannot be reproduce.  This seems to appear often when you try to step a lot within gdb(*).  At this point I guess I would need to dig more into valgrind code to understand from where the annotation "<Address 0x41c9008 out of bounds>" might come from, and really see if I forgot a case or if this is really in contradiction with the result of "monitor get_vbits …".

Gary, do you still have the original test case, were you able to reduce/bisect with the original test case without valgrind?

(*) Within gdb attached to vgdb server.
[1] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands
Till, it looks like this is the crash you're seeing in bug 929374. Can you reproduce it locally with your patch when you run xpcshell without gdb? Hopefully I can also reproduce it and investigate it next week, there's a very subtle JIT and/or GC bug and we really need to know what's going on.

For now let's not post more info in bug 929374 to not draw attention to this signature..
Flags: needinfo?(till)
Huh, now that's unexpected. Yes, I can reproduce locally. Ping me over the next days if you want to do a debugging session.
Flags: needinfo?(till)
I can reproduce the xpcshell crash on Linux with Till's patch in bug 929374. Investigating...
(In reply to Jan de Mooij [:jandem] from comment #38)
> I can reproduce the xpcshell crash on Linux with Till's patch in bug 929374.
> Investigating...

OK, this crash is a different issue (xpcshell only, see bug 931861).
(unfortunately I'm only able to get back to this, earliest sometime mid/late next week..)
Blocks: 932143
I can reproduce this reliably on Linux x64, Ubuntu 12.10. I'm pretty sure it's caused by Valgrind not correctly returning from the AsmJS/Ion signal handler.

We mprotect the Ion code, TraceJumpRelocations then accesses the (mprotected) jump relocation table and ends up in the signal handler. There we patch the backedges and make the code accessible again, but when we return from the signal handler something is screwed up and we end up with a bogus offset for the jump.

Julian, does this sound like a known Valgrind issue?
Flags: needinfo?(jseward)
IIRC, we were looking at this same type of situation in a different bug.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(n.nethercote)
Understanding if this is a Valgrind bug is beyond my knowledge, sorry.  jseward is your only hope.
I just pinged Julian about this bug on #developers.
(In reply to Gary Kwong [:gkw] [:nth10sd] (yes, still catching up on bugmail) from comment #44)
> I just pinged Julian about this bug on #developers.

I haven't got a response. :(
I'm not going to be able to add anything to Nicolas' analysis in comment 35. I'd guess this is a valgrind bug unless we can reproduce something like it without valgrind. In any case it's hard enough to repro that we shouldn't be tracking it for release.
Assignee: terrence → nobody
I can't repro this on m-c tip on x86_64 Linux (Fedora 17).

In reply to comment #41:
> I can reproduce this reliably on Linux x64, Ubuntu 12.10. I'm pretty
> sure it's caused by Valgrind not correctly returning from the
> AsmJS/Ion signal handler.

That might be the case, and that is also what Luke was referring to
in comment #42.  However, when running it on top, I do not see any
signal deliveries happening as you describe in comment #41.

If such signal deliveries did happen, and you are expecting precise
restarts of faulting instructions, then that will be a problem for
Valgrind unless you run with
--vex-iropt-register-updates=allregs-at-mem-access
(or =allregs-at-each-insn)

By default V's JIT optimises away most simulated register writes, and
caches the simulated registers in host registers over entire basic
blocks.  As a result, simulated register values seen in signal
handlers can be very out of date.  The suggested flags disable this
optimisation and should improve the situation.

Since I can't reproduce the problem, though, and since I can't see any
of the expected signal deliveries -- the above is all speculation.
Flags: needinfo?(jseward)
If anyone can get me some STR I will happily look into this further,
if for no other reason than the segfault-fixup-restart thing is
something that sounds like Valgrind needs to handle robustly in
order to be useful for the JS engine now.
I would add (more, sigh) that --vex-iropt-register-updates=allregs-at-mem-access
more or less gets us "precise exceptions" (PX) on x86_64 and x86.  But getting
PX on the arm port of Valgrind is more difficult, because the LDMIA and STMIA 
instructions do multiple loads and stores and there's no way for V to back out
the result of such an instruction that faults part way through.

If the JS JIT can restrict itself to generating potentially faulting instructions
that only involve a single memory transaction -- that is, boring vanilla loads
and stores -- then we're probably OK.
I thought I had commented here earlier, but since I didn't...

In bug 917050, when using --vex-iropt-register-updates=allregs-at-mem-access two other invalid reads in the compact buffer reader went away, so I suspect the same is going to be true here.
I had enough cycles to go through that box again, and found that this actually got fixed by bug 916580.

autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/f9ae7613751c
user:        Dan Gohman
date:        Wed Sep 25 12:18:43 2013 -0700
summary:     Bug 916580 - Fix bugs related to the usage of calloc. r=luke

So this was an actual bug, after all. I've adjusted the flags for this bug.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: needinfo?(gary) → in-testsuite?
Resolution: --- → FIXED
Assignee: nobody → sunfish
Target Milestone: --- → mozilla27
Great! A heap overflow while the values are cached in registers would explain the behavior here -- including the difficulty of tracking it down.
-> VERIFIED since this was very reproducible only seemingly on one of my machines, and I've double checked that it was fixed there, as per comment 51.
Status: RESOLVED → VERIFIED
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: