Closed Bug 738117 Opened 13 years ago Closed 13 years ago

Valgrind and Nulgrind show SIGSEGV with testcase on Snow Leopard even though testcase does not crash

Categories

(Core :: JavaScript Engine, defect)

x86_64
macOS
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: gkw, Unassigned)

Details

(Keywords: testcase, valgrind, Whiteboard: [valgrind bug])

Attachments

(1 file)

function h(sandboxType) {
  switch (sandboxType) {
  default:
    k = newGlobal("new-compartment");
  }
  return function(f, code) {
    try {
      evalcx(code, k);
    } catch (e) {}
  };
}
function p(code) {
  f = new Function(code);
  g(f, code);
}
function g(f, code, wtt) {
  var rv = f();
}
p("g=h()");
p("with(x=<y>></y>){(function(){return function(){1}})}");
p("for(let z=0;z<1;){c=z;t};");
p("c%-x");

(Without Valgrind, the testcase does not abort in any way.)

I ran with valgrind --dsymutil=yes ./js

(this happens on Snow Leopard)

Using 64-bit js opt shell on Mac 10.6 with Valgrind r12455 on m-i changeset ca353538c7f9, the following error is shown:

==17021== Invalid read of size 8
==17021==    at 0x10003FE8F: JSCompartment::wrap(JSContext*, JS::Value*) (jscompartment.cpp:214)
==17021==    by 0x10000488E: EvalInContext(JSContext*, unsigned int, JS::Value*) (js.cpp:2687)
==17021==    by 0x10009BC7C: js::InvokeKernel(JSContext*, js::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:314)
==17021==    by 0x10008C7EF: js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) (jsinterp.cpp:2685)
==17021==    by 0x10009AA54: js::RunScript(JSContext*, JSScript*, js::StackFrame*) (jsinterp.cpp:469)
==17021==    by 0x10009AE6D: js::ExecuteKernel(JSContext*, JSScript*, JSObject&, JS::Value const&, js::ExecuteType, js::StackFrame*, JS::Value*) (jsinterp.cpp:667)
==17021==    by 0x10009B9C7: js::Execute(JSContext*, JSScript*, JSObject&, JS::Value*) (jsinterp.cpp:709)
==17021==    by 0x10001E607: JS_ExecuteScript (jsapi.cpp:5229)
==17021==    by 0x100005770: Process(JSContext*, JSObject*, char const*, bool) (js.cpp:478)
==17021==    by 0x100005DEB: Shell(JSContext*, js::cli::OptionParser*, char**) (js.cpp:4730)
==17021==    by 0x100007045: main (js.cpp:5022)
==17021==  Address 0x7ffffffff000 is not stack'd, malloc'd or (recently) free'd
==17021== 
==17021== 
==17021== Process terminating with default action of signal 11 (SIGSEGV)
==17021==  Access not within mapped region at address 0x7FFFFFFFF000
==17021==    at 0x10003FE8F: JSCompartment::wrap(JSContext*, JS::Value*) (jscompartment.cpp:214)
==17021==    by 0x10000488E: EvalInContext(JSContext*, unsigned int, JS::Value*) (js.cpp:2687)
==17021==    by 0x10009BC7C: js::InvokeKernel(JSContext*, js::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:314)
==17021==    by 0x10008C7EF: js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) (jsinterp.cpp:2685)
==17021==    by 0x10009AA54: js::RunScript(JSContext*, JSScript*, js::StackFrame*) (jsinterp.cpp:469)
==17021==    by 0x10009AE6D: js::ExecuteKernel(JSContext*, JSScript*, JSObject&, JS::Value const&, js::ExecuteType, js::StackFrame*, JS::Value*) (jsinterp.cpp:667)
==17021==    by 0x10009B9C7: js::Execute(JSContext*, JSScript*, JSObject&, JS::Value*) (jsinterp.cpp:709)
==17021==    by 0x10001E607: JS_ExecuteScript (jsapi.cpp:5229)
==17021==    by 0x100005770: Process(JSContext*, JSObject*, char const*, bool) (js.cpp:478)
==17021==    by 0x100005DEB: Shell(JSContext*, js::cli::OptionParser*, char**) (js.cpp:4730)
==17021==    by 0x100007045: main (js.cpp:5022)

Nulgrind shows the latter error.
Summary: Valgrind and Nulgrind show SIGSEGV even though testcase does not crash → Valgrind and Nulgrind show SIGSEGV with testcase on Snow Leopard even though testcase does not crash
Hardware: x86 → x86_64
Last I checked, this happens on m-c changeset 4bdae514b9be as well.
I can reproduce this, but I can't figure out why it happens.  It
strikes me as likely that it is a bug in Valgrind, but I am not even
sure of that -- V can run Firefox on MacOSX no problem, for example.

It might also be a bug in JS, in that it makes some assumption about
memory layout or some such, which happens to be true natively but is
not true when running on V.  Unlikely, but such things have happened
in the past.

The test case fails with symptoms of JS heap corruption, despite
running fine natively.  A debug build also fails, somewhat earlier in
an assertion, also with symptoms of heap corruption.

I also tried to reproduce on x86_64-linux, but failed.

Currently I have no leads.  One line of approach would be to take the
failing case and cut it down to something smaller than the entire JS
engine.  That does not sound easy, though.

Is there any part of the JS engine that makes assumptions about memory
layout?  In particular (eg) whether objects can/can't exist in
particular 64-bit address ranges?  Isn't there some such bit twiddling
magic to do with the "new" 64-bit JSVal representation?
Math.atan2((-01.t),0)


(Without Valgrind, the testcase does not abort in any way.)

I ran with valgrind --dsymutil=yes ./js

(this happens on Snow Leopard)

Using 64-bit js opt shell on Mac 10.6 with Valgrind on m-c changeset 95df15895e02, the following error is shown:

==73463== Invalid read of size 8
==73463==    at 0x1000C1B90: js_GetMethod(JSContext*, JSObject*, long, unsigned int, JS::Value*) (jsscope.h:603)
==73463==    by 0x100107A71: js_ValueToSource(JSContext*, JS::Value const&) (jsstr.cpp:3315)
==73463==    by 0x100018514: JS_ValueToSource (jsapi.cpp:534)
==73463==    by 0x100005C36: Process(JSContext*, JSObject*, char const*, bool) (js.cpp:583)
==73463==    by 0x1000064CB: Shell(JSContext*, js::cli::OptionParser*, char**) (js.cpp:4731)
==73463==    by 0x100007165: main (js.cpp:5017)
==73463==  Address 0x7fffffffffff is not stack'd, malloc'd or (recently) free'd
==73463== 
==73463== 
==73463== Process terminating with default action of signal 11 (SIGSEGV)
==73463==  Access not within mapped region at address 0x7FFFFFFFFFFF
==73463==    at 0x1000C1B90: js_GetMethod(JSContext*, JSObject*, long, unsigned int, JS::Value*) (jsscope.h:603)
==73463==    by 0x100107A71: js_ValueToSource(JSContext*, JS::Value const&) (jsstr.cpp:3315)
==73463==    by 0x100018514: JS_ValueToSource (jsapi.cpp:534)
==73463==    by 0x100005C36: Process(JSContext*, JSObject*, char const*, bool) (js.cpp:583)
==73463==    by 0x1000064CB: Shell(JSContext*, js::cli::OptionParser*, char**) (js.cpp:4731)
==73463==    by 0x100007165: main (js.cpp:5017)
==73463==  If you believe this happened as a result of a stack
==73463==  overflow in your program's main thread (unlikely but
==73463==  possible), you can try to increase the size of the
==73463==  main thread stack using the --main-stacksize= flag.
==73463==  The main thread stack size used in this run was 8388608.
==73463== 
==73463== HEAP SUMMARY:
==73463==     in use at exit: 1,159,878 bytes in 1,408 blocks
==73463==   total heap usage: 2,112 allocs, 704 frees, 1,273,167 bytes allocated
==73463== 
==73463== LEAK SUMMARY:
==73463==    definitely lost: 16 bytes in 1 blocks
==73463==    indirectly lost: 0 bytes in 0 blocks
==73463==      possibly lost: 832 bytes in 16 blocks
==73463==    still reachable: 1,158,942 bytes in 1,390 blocks
==73463==         suppressed: 88 bytes in 1 blocks
==73463== Rerun with --leak-check=full to see details of leaked memory
==73463== 
==73463== For counts of detected and suppressed errors, rerun with: -v
==73463== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
Hi Julian.  I looked at this with Gary.  I put a printf on the first line of JS_ValueToString that looks like:

  printf("The bits: %p\n", (void *)JSVAL_TO_IMPL(v).asBits);

and ran:

  print(Math.atan2((-01.t),0))

in a 64-bit opt shell on OSX10.6.  For native, the result is a "canonical" NaN: 0xfff8000000000000.
Run from inside valgrind, the result is a decidedly non-canonical NaN: 0xffffffffffffffff.

So valgrind is technically returning a NaN, but it is breaking the implicit contract that we (as well as jsc and v8) depend on: that math functions only return canonical NaNs.  In theory, we could put a JS_CANONICALIZE_NAN after all math functions, but this is slow.  (I suppose we could put one in for --enable-valgrind builds, but I think ideally valgrind would return the same NaN value as the system).
Whiteboard: js-triage-needed → valgrind bug
Whiteboard: valgrind bug → [valgrind bug]
(In reply to Luke Wagner [:luke] from comment #4)
> Hi Luke, Gary,

Thanks for the lead.  I had been contemplating "address space
weirdness" (unlikely) or "integer arithmetic weirdness" (very
unlikely), so I was completely on the wrong track.

I can repro this with a 10-line C program that prints atan2(NaN,0),
which makes it a lot easier to track down.  I think it is due to the
kludging that V does for 80-vs-64 legacy (x87) floating point.  On the
case.
I think this will fix the reported problem.
Committed, r2276/r12497.
> Committed, r2276/r12497.

Confirming fixed by the Valgrind patch. Thus, this bug can be marked WFM since it is not a patch for Mozilla code.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME