Bugzilla

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 6

•

11 years ago

No, I think completely unrelated to bug 910477. This just looks like a null pointer dereference. Can you reproduce it not-on-valgrind?

Flags: needinfo?(jseward)

Reporter

Comment 7

•

11 years ago

$ ./js-opt-64-dm-vg-ts-linux-e5ca10a2b3d0 914511.js js_ReportOverRecursed called js_ReportOverRecursed called js_ReportOverRecursed called Nope, I can't reproduce it without Valgrind, but I still can, with Valgrind.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 8

•

11 years ago

This sounds like stack/frame corruption. Nicolas, Jan or myself should probably take a look at it. I'm swamped at the moment: Nicolas, do you have some spare cycles to investigate this?

Flags: needinfo?(nicolas.b.pierron)

Reporter

Comment 9

•

11 years ago

I think Jan is busy too, but I also think that nbp is in Oslo, Norway this entire week. Jan, I hope you're a better choice though. :)

Flags: needinfo?(nicolas.b.pierron) → needinfo?(jdemooij)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 10

•

11 years ago

> nbp is in Oslo, Norway this entire week. (for the B2G work week)

Comment 11

•

11 years ago

Gary, I can't reproduce this on Linux 64-bit. I updated to the revision in comment 0, used the exact same configure line and Valgrind command but it doesn't crash. The timeout(1) call probably makes this harder to reproduce. Do you also get this crash if you use other timeout values? Maybe Nicolas or somebody else could debug this on your machine next week? @decoder: i think you also reported a timeout-related bug this week, but I can't find it now?

Flags: needinfo?(choller)

Christian Holler (:decoder)

Comment 12

•

11 years ago

(In reply to Jan de Mooij [:jandem] from comment #11) > @decoder: i think you also reported a timeout-related bug this week, but I > can't find it now? Hm, I did another search and found it this time: bug 914174, but that seems ASan-only...

Flags: needinfo?(choller)

Comment 13

•

11 years ago

Indeed, bug 914174 was just about ASan stumbling over the new interrupt code that requires a custom signal handler just like ASM.js. If this bug just reproduces under valgrind, then it might well be a problem with the interrupt code as well. We currently have bug 913876 for an invalid write with ASM.js (which is the same interrupt stuff that timeout is using), so maybe that is related.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 14

•

11 years ago

> The timeout(1) call probably makes this harder to reproduce. Do you also get > this crash if you use other timeout values? Yeah, other timeout values should work, it used to be timeout(1800) or something. I tested on Ubuntu Linux 12.04 LTS.

Comment 15

•

11 years ago

Nicolas can you try to reproduce this or maybe investigate on Gary's machine? It's still not crashing here with the info in comment 0.

Flags: needinfo?(jdemooij) → needinfo?(nicolas.b.pierron)

Comment 16

•

11 years ago

I checked on gary's computer, we are pushing the IonCode, and failing the GC code because we cannot read anymore the CompactBuffer containing the relocation table. It seems to be a *use-after-free* of the code_ pointer of the IonCode which is allocated with execution rights in CodeGenerator::link (during IonScript generation). I have no idea when the code get freed, but based on valgrind it seems ok, after it is allocated in the CodeGenerator::link function, but is no longer when we are seeing the IonCode in PushMarkStack. The IonCode is not mutated, so we would need to track what might free a similar pointer.

Flags: needinfo?(nicolas.b.pierron)

Keywords: sec-critical

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 17

•

11 years ago

Thanks Nicolas, that sounds serious and may explain some other bug reports.. Can you take this? I can't repro here unfortunately :(

Reporter

Comment 18

•

11 years ago

Setting a needinfo on myself to test this on ASan.

Flags: needinfo?(gary)

Christian Holler (:decoder)

Comment 19

•

11 years ago

This doesn't reproduce for me in ASan (at least not in my build configuration), but it also doesn't crash without any of the two tools, although it should according to the trace. I assume the fact that Valgrind makes it crashing is just related to memory reordering or different initialization, not to a specific error detection capability.

David Bolter [:davidb] (NeedInfo me for attention)

Comment 20

•

11 years ago

Gary, I'm setting the tracking flags based on your comment 2 investigation. Please let me know if I goofed anywhere.

status-b2g-v1.2: --- → affected

status-firefox24: --- → affected

status-firefox25: --- → affected

status-firefox26: --- → affected

status-firefox27: --- → affected

status-firefox-esr24: --- → affected

tracking-firefox27: --- → +

Comment 21

•

11 years ago

Comment 22

•

11 years ago

Attached patch fuzz_914511-v0.diff — Details — Splinter Review

I think this is probably a simple NULL deref. Nicolas, are the jump relocations stored in this buffer allowed to be NULL? If so, we simply need to guard, as in the attached patch. Otherwise, there is something more going on here.

Assignee: general → terrence

Status: NEW → ASSIGNED

Attachment #813333 - Flags: review?(nicolas.b.pierron)

Comment 23

•

11 years ago

Also, this showed up with the TraceJumpRelocations signature for me. I did not see the MarkIonCode signature; there may be two bugs here.

Comment 24

•

11 years ago

(In reply to Terrence Cole [:terrence] from comment #22) > I think this is probably a simple NULL deref. Nicolas, are the jump > relocations stored in this buffer allowed to be NULL? If so, we simply need > to guard, as in the attached patch. Otherwise, there is something more going > on here. IonCode pointers should not be allowed to be NULL at any point in time, after the link phase. Otherwise this means that we might just crash by executing the first page. I guess we can assert that the target->raw() is not NULL such as here: http://dxr.mozilla.org/mozilla-central/source/js/src/jit/x64/Assembler-x64.h#l642 or do it as part of the writeRelocation functions.

Al Billings [:abillings - ex-MoCo]

Comment 25

•

11 years ago

Comment on attachment 813333 [details] [diff] [review] fuzz_914511-v0.diff Review of attachment 813333 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/x64/Assembler-x64.cpp @@ +251,5 @@ > RelocationIterator iter(reader); > while (iter.read()) { > IonCode *child = CodeFromJump(code, code->raw() + iter.offset()); > + if (!child) > + continue; They are not supposed to be NULL, otherwise this means that we can generate code which can crash. If you want we can assert so, and do the same either in writeRelocations or at the end of the link phase.

Attachment #813333 - Flags: review?(nicolas.b.pierron)

Updated

•

11 years ago

Blocks: 917050

Naveed Ihsanullah [:naveed]

Comment 26

•

11 years ago

Attached file valgrind.stack — Details

When I run a debug build, I get the attached valgrind stack, which looks similar to bug 917050. Regardless, whatever is going on here is /extremely/ weird. Here is the code for CompactBufferReader::readByte: │<readByte()+1> mov %rsp,%rbp │<readByte()+4> sub $0x10,%rsp ; bump stack 2 words │<readByte()+8> mov %rdi,-0x8(%rbp) ; store |this| to top-of-stack <<<< JS_ASSERT(buffer_ < end_); >>>> ... snip ... <<<< *buffer_++ >>>> │<readByte()+81> mov -0x8(%rbp),%rax ; load |this| into $rax [[0xffeffcad0]] │<readByte()+85> mov (%rax),%rax ; load |buffer_| into $rax [[0x4034138]] >│<readByte()+88> movzbl (%rax),%edx ; <- Valgrind think $rax is 0 here? ; Result should be 0xe0 in $edx. │<readByte()+91> lea 0x1(%rax),%rcx │<readByte()+95> mov -0x8(%rbp),%rax │<readByte()+99> mov %rcx,(%rax) │<readByte()+102> mov %edx,%eax │<readByte()+104> leaveq │<readByte()+105> retq When I run +81 and +85 manually in gdb, I get the correct addresses I've listed after those instructions. The value that gdb finds in $rax after valgrind loads it is 0x0, however. Julian, could V's emulation of movzbl be going off the rails here or just left in an invalid state?

Flags: needinfo?(jseward)

Comment 27

•

11 years ago

Terrence this has been flagged needinfo for a while. Can you make progress on this without Julian's input? Is there anyone else, perhaps njn, that could weigh in valgrind's emulation of this instruction?

Comment 28

•

11 years ago

Now that I have upgraded valgrind I am getting a different and more valid-looking error. Re-doing the investigation.

Flags: needinfo?(jseward)

Comment 29

•

11 years ago

This is only giving me the MarkIonCode signature now. So far I have only managed to repro this in opt builds running under valgrind. The image valgrind attaches gdb to does not have thread info -- so it's impossible to say if the background thread might be to blame -- and it can't re-run the program, so I cannot step backwards through states. This leaves us with printf debugging as our last resort. I simply do not know enough about this code to attempt printf debugging on this. I'm afraid I need to hand this off to someone who actually knows how this is supposed to work. Nicolas, did you ever try to repro this locally?

Flags: needinfo?(nicolas.b.pierron)

Comment 30

•

11 years ago

No I have not yet tried to reproduce it locally, I will try later today.

Comment 31

•

11 years ago

So far I haven't managed to compile it with the set of configure flags given in comment 0. I am trying to get as close as possible to the set of configure options but I cannot reproduce it at the moment.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 32

•

11 years ago

I cannot reproduce this issue, Gary, can you tell me how to connect to the computer on which you are able to reproduce this issue?

Flags: needinfo?(nicolas.b.pierron)

Reporter

Comment 33

•

11 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #32) > I cannot reproduce this issue, Gary, can you tell me how to connect to the > computer on which you are able to reproduce this issue? I can still reproduce on rev 8f8a683dfc42. Will contact nbp offline.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 34

•

11 years ago

nbp has reproduced on my box. Terrence, let me know if you need credentials to work on this.

Comment 35

•

11 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #16) > I have no idea when the code get freed, but based on valgrind it seems ok, > after it is allocated in the CodeGenerator::link function, but is no longer > when we are seeing the IonCode in PushMarkStack. The IonCode is not > mutated, so we would need to track what might free a similar pointer. Just to backup what I was saying, this is what I am seeing under gdb: Breakpoint 1, js::jit::IonCode::copyFrom (this=0x73651c0, masm=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:635 635 { $397 = (js::jit::IonCode * const) 0x73651c0 $398 = (uint8_t *) 0x41c9008 "" #0 js::jit::IonCode::copyFrom (this=0x73651c0, masm=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:635 #1 0x00000000006317f2 in js::jit::Linker::newCode (this=0xffef199a0, cx=0x64e9d30, execAlloc=0x673d5b0, kind=JSC::ION_CODE) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/IonLinker.h:60 #2 0x000000000065ddff in newCodeForIonScript (cx=0x64e9d30, this=0xffef199a0) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/IonLinker.h:93 #3 js::jit::CodeGenerator::link (this=0x673c4e0) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/CodeGenerator.cpp:5679 #4 0x000000000066a586 in js::jit::IonCompile (cx=0x64e9d30, script=<optimized out>, baselineFrame=0xffef19d68, osrPc=0x66cc3b0 "", constructing=224, executionMode=js::jit::SequentialExecution) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1662 #5 0x000000000066a7ec in js::jit::Compile (cx=0x64e9d30, script=..., osrFrame=0xffef19d68, osrPc=0x0, constructing=false, executionMode=js::jit::SequentialExecution) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1814 #6 0x000000000066b0d7 in js::jit::CompileFunctionForBaseline (cx=0x64e9d30, script=..., frame=0xffef19d68, isConstructing=false) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/Ion.cpp:1974 #7 0x000000000060498d in EnsureCanEnterIon (jitcodePtr=<synthetic pointer>, pc=0x6702e69 "\232", script=..., frame=0xffef19d68, stub=<optimized out>, cx=0x64e9d30) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/BaselineIC.cpp:735 #8 js::jit::DoUseCountFallback (cx=0x64e9d30, stub=<optimized out>, frame=0xffef19d68, infoPtr=0xffef19d40) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/BaselineIC.cpp:918 #9 0x000000000402db04 in ?? () Program received signal SIGSEGV, Segmentation fault. read (this=<synthetic pointer>) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/x64/Assembler-x64.cpp:221 221 offset_ = reader_.readUnsigned(); (gdb) up #1 js::jit::Assembler::TraceJumpRelocations (trc=0x648b028, code=0x73651c0, reader=...) at /home/gkwong/Desktop/js-opt-64-dm-vg-ts-linux-mozilla-central-148065-8f8a683dfc42-6Hbofa/compilePath/js/src/jit/x64/Assembler-x64.cpp:253 253 while (iter.read()) { (gdb) p code $399 = (js::jit::IonCode *) 0x73651c0 (gdb) p code->code_ $400 = (uint8_t *) 0x41c9008 <Address 0x41c9008 out of bounds> The pointer 0x41c9008 is an offset inside the the ExecutableAllocator, which is a page of 64kB starting at 0x41c9000 and allocated with mmap. The ExecutableAllocator::systemRelease should munmap this pointer when this area is no longer used. I do not know from where the annotation "<Address 0x41c9008 out of bounds>" reported within gdb is coming from. One thing is that valgrind *seems* to have contradictory information about the state of the memory allocated. I used "monitor get_vbits 0x41c9000 338"[1] (338 because the code pointer is offset by 8 within the buffer and its size is 330 bytes), and every part of it is marked as "defined" (00). Note that the memory is not munmap at the time of the SEGV, otherwise it would be marked as "unaddressable" (__). I traced[1] where the code pointer was flowing into. It is duplicated for a while on JSScript::baselineOrIonRaw, and also multiple times on the stack of the main thread for all calls to js::jit::Cannon (to fill the EnterJitData jitcode field). Except these locations, I cannot spot any other references. I traced where the page pointer was flowing into. It does not seems to be referenced by anything else than the ExecutableAllocator pool. I also noticed that even if this bug was mostly reproducible, some time the bug cannot be reproduce. This seems to appear often when you try to step a lot within gdb(*). At this point I guess I would need to dig more into valgrind code to understand from where the annotation "<Address 0x41c9008 out of bounds>" might come from, and really see if I forgot a case or if this is really in contradiction with the result of "monitor get_vbits …". Gary, do you still have the original test case, were you able to reduce/bisect with the original test case without valgrind? (*) Within gdb attached to vgdb server. [1] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands

Till Schneidereit [:till]

Comment 36

•

11 years ago

Till, it looks like this is the crash you're seeing in bug 929374. Can you reproduce it locally with your patch when you run xpcshell without gdb? Hopefully I can also reproduce it and investigate it next week, there's a very subtle JIT and/or GC bug and we really need to know what's going on. For now let's not post more info in bug 929374 to not draw attention to this signature..

Flags: needinfo?(till)

Comment 37

•

11 years ago

Huh, now that's unexpected. Yes, I can reproduce locally. Ping me over the next days if you want to do a debugging session.

Flags: needinfo?(till)

Comment 38

•

11 years ago

I can reproduce the xpcshell crash on Linux with Till's patch in bug 929374. Investigating...

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 39

•

11 years ago

(In reply to Jan de Mooij [:jandem] from comment #38) > I can reproduce the xpcshell crash on Linux with Till's patch in bug 929374. > Investigating... OK, this crash is a different issue (xpcshell only, see bug 931861).

Reporter

Comment 40

•

11 years ago

(unfortunately I'm only able to get back to this, earliest sometime mid/late next week..)

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Blocks: 932143

Matt Wobensmith [:mwobensmith][:matt:]

Comment 41

•

11 years ago

I can reproduce this reliably on Linux x64, Ubuntu 12.10. I'm pretty sure it's caused by Valgrind not correctly returning from the AsmJS/Ion signal handler. We mprotect the Ion code, TraceJumpRelocations then accesses the (mprotected) jump relocation table and ends up in the signal handler. There we patch the backedges and make the code accessible again, but when we return from the signal handler something is screwed up and we end up with a bogus offset for the jump. Julian, does this sound like a known Valgrind issue?

Flags: needinfo?(jseward)

Luke Wagner [:luke]

Comment 42

•

11 years ago

IIRC, we were looking at this same type of situation in a different bug.

Updated

•

11 years ago

status-firefox24: affected → wontfix

status-firefox25: affected → wontfix

status-firefox28: --- → affected

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

11 years ago

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 43

•

11 years ago

Understanding if this is a Valgrind bug is beyond my knowledge, sorry. jseward is your only hope.

David Bolter [:davidb] (NeedInfo me for attention)

Updated

•

11 years ago

tracking-firefox28: --- → +

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 44

•

11 years ago

I just pinged Julian about this bug on #developers.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 45

•

11 years ago

(In reply to Gary Kwong [:gkw] [:nth10sd] (yes, still catching up on bugmail) from comment #44) > I just pinged Julian about this bug on #developers. I haven't got a response. :(

Comment 46

•

11 years ago

I'm not going to be able to add anything to Nicolas' analysis in comment 35. I'd guess this is a valgrind bug unless we can reproduce something like it without valgrind. In any case it's hard enough to repro that we shouldn't be tracking it for release.

Assignee: terrence → nobody

Comment 47

•

11 years ago

I can't repro this on m-c tip on x86_64 Linux (Fedora 17). In reply to comment #41: > I can reproduce this reliably on Linux x64, Ubuntu 12.10. I'm pretty > sure it's caused by Valgrind not correctly returning from the > AsmJS/Ion signal handler. That might be the case, and that is also what Luke was referring to in comment #42. However, when running it on top, I do not see any signal deliveries happening as you describe in comment #41. If such signal deliveries did happen, and you are expecting precise restarts of faulting instructions, then that will be a problem for Valgrind unless you run with --vex-iropt-register-updates=allregs-at-mem-access (or =allregs-at-each-insn) By default V's JIT optimises away most simulated register writes, and caches the simulated registers in host registers over entire basic blocks. As a result, simulated register values seen in signal handlers can be very out of date. The suggested flags disable this optimisation and should improve the situation. Since I can't reproduce the problem, though, and since I can't see any of the expected signal deliveries -- the above is all speculation.

Flags: needinfo?(jseward)

Comment 48

•

11 years ago

If anyone can get me some STR I will happily look into this further, if for no other reason than the segfault-fixup-restart thing is something that sounds like Valgrind needs to handle robustly in order to be useful for the JS engine now.

Comment 49

•

11 years ago

I would add (more, sigh) that --vex-iropt-register-updates=allregs-at-mem-access more or less gets us "precise exceptions" (PX) on x86_64 and x86. But getting PX on the arm port of Valgrind is more difficult, because the LDMIA and STMIA instructions do multiple loads and stores and there's no way for V to back out the result of such an instruction that faults part way through. If the JS JIT can restrict itself to generating potentially faulting instructions that only involve a single memory transaction -- that is, boring vanilla loads and stores -- then we're probably OK.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 50

•

11 years ago

I thought I had commented here earlier, but since I didn't... In bug 917050, when using --vex-iropt-register-updates=allregs-at-mem-access two other invalid reads in the compact buffer reader went away, so I suspect the same is going to be true here.

Reporter

Comment 51

•

11 years ago

I had enough cycles to go through that box again, and found that this actually got fixed by bug 916580. autoBisect shows this is probably related to the following changeset: The first good revision is: changeset: http://hg.mozilla.org/mozilla-central/rev/f9ae7613751c user: Dan Gohman date: Wed Sep 25 12:18:43 2013 -0700 summary: Bug 916580 - Fix bugs related to the usage of calloc. r=luke So this was an actual bug, after all. I've adjusted the flags for this bug.

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

status-b2g18: --- → fixed

status-b2g-v1.1hd: --- → fixed

status-b2g-v1.2: affected → fixed

status-firefox25: wontfix → fixed

status-firefox26: affected → fixed

status-firefox27: affected → fixed

status-firefox28: affected → fixed

status-firefox-esr24: affected → fixed

Flags: needinfo?(gary) → in-testsuite?

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Assignee: nobody → sunfish

Target Milestone: --- → mozilla27