tracer generated code reads stack memory it didn't write

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: jseward, Assigned: njn)

Tracking

Trunk
x86_64
Linux
Points:
---

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: hardblocker, [sg:critical?], fixed-in-tracemonkey)

Attachments

(2 attachments)

Reporter

Description

9 years ago
x64-linux build, MC of current.

No easy STR currently.  I only see it when running a production
workload of about 80 tabs.

I see this, which I interpret as tracer generated code doing a
conditional branch based on uninitialised value in its own stack
frame:

Conditional jump or move depends on uninitialised value(s)
   at 0x1CA518D1: ???
   by 0x6719171: js::ExecuteTree(JSContext*, js::TreeFragment*, unsigned int&,
                 js::VMSideExit**, js::VMSideExit**) (jstracer.cpp:6456)
   by 0x672DE9B: js::TraceRecorder::attemptTreeCall(js::TreeFragment*,
                 unsigned int&) (jstracer.cpp:6046)
   by 0x672E21C: js::TraceRecorder::recordLoopEdge(JSContext*,
                 js::TraceRecorder*, unsigned int&) (jstracer.cpp:6007)
   by 0x67375D2: js::RecordLoopEdge(JSContext*, unsigned int&)
                 (jstracer.cpp:7000)
   by 0x67E429E: js::Interpret(JSContext*, JSStackFrame*, unsigned int,
                 JSInterpMode) (jsinterp.cpp:3502)
   by 0x67219F7: js::RecordTracePoint(JSContext*, unsigned int&, bool*, bool)
                 (jstracer.cpp:16545)
   by 0x6721E21: js::MonitorTracePoint(JSContext*, unsigned int&, bool*,
                 void**, unsigned int*) (jstracer.cpp:16701)
   by 0x67A61BE: RunTracer(js::VMFrame&, js::mjit::ic::TraceICInfo&)
                 (InvokeHelpers.cpp:974)
   by 0x1C9FE777: ???
   by 0x674CC87: js::mjit::EnterMethodJIT(JSContext*, JSStackFrame*, void*,
                 js::Value*) (MethodJIT.cpp:748)
   by 0x674CF1C: js::mjit::JaegerShot(JSContext*) (MethodJIT.cpp:774)

 Uninitialised value was created by a stack allocation
   at 0x1CA517BC: ???


Catching it in GDB:

(gdb) where
#0  0x000000001ca518d1 in ?? ()
#1  0x000000001b0e8330 in ?? ()
#2  0x0000000020795e20 in ?? ()
#3  0xfffa800018000040 in ?? ()
#4  0x000000000670bcef in VisitFrameSlots<js::CountSlotsVisitor> (visitor=..., cx=0x7fffffffffff, depth=<value optimised out>, fp=0x7feffe590, next=0x0)
    at /space2/sewardj/MOZ/MC-13-01-2011/js/src/jstracer.cpp:1761
#5  0x0000000000000000 in ?? ()


Well, that doesn't tell us much.  But the assembly is interesting.
I've annotated reads and writes that are %rbp-based.  At
0x000000001ca518c9 (near the end) we read a 4 byte value from
-0x34(%rbp) .. -0x31(%rbp), and do a conditional branch on it, at
which point V complains.

Looking at all the rbp-based references, though, I can't see any place
where those 4 bytes are written.  Just prior to that we pass the value
-0x40(%rbp) as the 4th arg ("Value* vp") to
GetPropertyWithNativeGetter, but, even assuming it writes to that arg,
it would only hit -0x40 .. -0x38-1.

So .. it looks kinda suspicious to me.


Dump of assembler code from 0x1ca517bc to 0x1ca51900:
   0x000000001ca517bc:	push   %rbp
   0x000000001ca517bd:	mov    %rsp,%rbp
   0x000000001ca517c0:	sub    $0x50,%rsp
   0x000000001ca517c4:	mov    %rbx,-0x30(%rbp)   # W -0x30 .. -0x28 -1
   0x000000001ca517c8:	mov    %r12,-0x20(%rbp)   # W -0x20 .. -0x18 -1
   0x000000001ca517cc:	mov    %r13,-0x18(%rbp)   # W -0x18 .. -0x10 -1
   0x000000001ca517d0:	mov    %r14,-0x10(%rbp)   # W -0x10 .. -0x8 -1
   0x000000001ca517d4:	mov    %r15,-0x8(%rbp)    # W -0x8  .. -0x0 -1
   0x000000001ca517d8:	mov    %rdi,-0x28(%rbp)   # W -0x28 .. -0x20 -1
   0x000000001ca517dc:	mov    %rdi,%r13
   0x000000001ca517df:	mov    0x10(%r13),%rbx
   0x000000001ca517e3:	mov    0x0(%r13),%rdi
   0x000000001ca517e7:	mov    $0x15d40940,%r13d
   0x000000001ca517ed:	mov    0x0(%r13),%r12d
   0x000000001ca517f1:	cmp    $0x0,%r12d
   0x000000001ca517f5:	jne    0x1ca61751
   0x000000001ca517fb:	mov    $0x2296d9d0,%r13d
   0x000000001ca51801:	mov    0x0(%r13),%r12d
   0x000000001ca51805:	cmp    $0xc8,%r12d
   0x000000001ca5180c:	jge    0x1ca51816
   0x000000001ca5180e:	add    $0x1,%r12d
   0x000000001ca51812:	mov    %r12d,0x0(%r13)
   0x000000001ca51816:	mov    -0x8(%rbx),%rsi
   0x000000001ca5181a:	movl   $0x0,0x0(%rbx)
   0x000000001ca51821:	mov    0x14(%rsi),%r12d
   0x000000001ca51825:	cmp    $0x41ada,%r12d
   0x000000001ca5182c:	jne    0x1ca61776
   0x000000001ca51832:	mov    0x20(%rsi),%r13
   0x000000001ca51836:	cmp    $0x0,%r13
   0x000000001ca5183a:	je     0x1ca6179b
   0x000000001ca51840:	mov    0x14(%r13),%r12d
   0x000000001ca51844:	cmp    $0x41851,%r12d
   0x000000001ca5184b:	jne    0x1ca617c0
   0x000000001ca51851:	movq   $0x34716ae8,0x1c8(%rdi)
   0x000000001ca5185c:	lea    -0x40(%rbp),%rcx   # rcx == %rbp - 0x40; argreg4 (Value* vp)
   0x000000001ca51860:	mov    $0x3a4bebc8,%edx
   0x000000001ca51865:	mov    %rdi,%r12
   0x000000001ca51868:	callq  0x670a4e0 <GetPropertyWithNativeGetter>
   0x000000001ca5186d:	mov    %r12,%rdi
   0x000000001ca51870:	movsd  -0x40(%rbp),%xmm0  # R -0x40 .. -0x38 -1
   0x000000001ca51875:	movsd  %xmm0,0x8(%rbx)
   0x000000001ca5187a:	movq   $0x0,0x1c8(%rdi)
   0x000000001ca51885:	cmp    $0x0,%eax
   0x000000001ca51888:	je     0x1ca617e5
   0x000000001ca5188e:	mov    -0x40(%rbp),%r12   # R -0x40 .. -0x38 -1
   0x000000001ca51892:	movabs $0x7fffffffffff,%r13
   0x000000001ca5189c:	mov    %r12,%rsi
   0x000000001ca5189f:	and    %r13,%rsi
   0x000000001ca518a2:	shr    $0x2f,%r12
   0x000000001ca518a6:	cmp    $0x1fff5,%r12d
   0x000000001ca518ad:	jne    0x1ca6180a
   0x000000001ca518b3:	mov    %rsi,0x8(%rbx)
   0x000000001ca518b7:	mov    0x0(%rsi),%r12
   0x000000001ca518bb:	shr    $0x4,%r12
   0x000000001ca518bf:	cmp    $0x0,%r12
   0x000000001ca518c3:	jne    0x1ca6182f
   0x000000001ca518c9:	mov    -0x34(%rbp),%r12d  # R -0x34 .. -0x30 -1
   0x000000001ca518cd:	cmp    $0x0,%r12d
=> 0x000000001ca518d1:	je     0x1ca61854         # ERROR REPORTED HERE
   0x000000001ca518d7:	lea    -0x34(%rbp),%rdx
   0x000000001ca518db:	callq  0x674bb90
                        <_Z16js_StringToInt32P9JSContextP8JSStringPi>
   0x000000001ca518e0:	mov    -0x34(%rbp),%r12d
   0x000000001ca518e4:	cmp    $0x0,%r12d
   0x000000001ca518e8:	je     0x1ca61879
   0x000000001ca518ee:	mov    %eax,0x0(%rbx)
   0x000000001ca518f1:	cmp    $0x0,%eax
   0x000000001ca518f4:	jne    0x1ca6189e
   0x000000001ca518fa:	mov    -0x20(%rbx),%r12d
   0x000000001ca518fe:	add    $0x1,%r12d
Reporter

Comment 1

9 years ago
STR:
1.  create an empty (default) profile
2.  load http://www.theregister.co.uk  (on M-C on valgrind on x64-linux)

The error appears after the page is mostly fully drawn and is repeated
3 times.
Reporter

Comment 2

9 years ago
Also happens with a 64-bit build on MacOSX (unsurprisingly, as the
generated code is presumably identical).
Assignee

Comment 3

9 years ago
Julian, can you dig into this some more if you have time?  We need to know which fragment is doing the read.  Here's how I'd proceed:

- In jstracer.cpp is a function called tprint(), which allows you to insert prinft-style calls into a trace.  It's not well documented, look at its code and PrintOnTrace() to work out how to use it.

- Insert a call to it in each fragment, which prints the file/line number for the fragment.

- Run it in a (debug) shell build with TMFLAGS=afterdce to check that it works.

- Run it in a (debug) browser build on theregister.co.uk under Valgrind with TMFLAGS=afterdce,native and see if you can determine which fragment is causing the problem.  If you combine this with --db-attach in Valgrind you'll probably be able to pinpoint it to the exact LIR/native instruction in the fragment.

- Post that fragment here :)

Make sense?
Assignee

Comment 4

9 years ago
One complication: printing of LIR stuff is done to stdout, which is buffered.  Hopefully the interleaving with the stderr output from Valgrind will still make sense.

Actually, you mightn't even need the tprint stuff;  just having the code addresses from GDB should allow you to find the relevant parts of the LIR dumps.
Reporter

Comment 5

9 years ago
Here's the offending fragment.  Seems like it allocates
a bit of space, then loads the value and branches on it.

  ------------------------------ # JSOP_BITXOR
      allocp2 = allocp 4
      ldi3 = ldi.alloc allocp2[0]
  0x199ee8c9   movl r12d, -52(rbp)
      guard(oom) = eqi ldi3, undefined/*0*/
      xt2: xt guard(oom) -> exit=0x1b1136f8 pc=0x16820b1b imacpc=(nil) sp+16 rp+0 OOM
  0x199ee8cd   cmpl r12d, 0
  0x199ee8d1   je 0x199fe854          ################ Uninit value use
Reporter

Comment 6

9 years ago
the complete fragment
Assignee

Comment 7

9 years ago
Ok, bug 609440 changed js_StringToNumber's to this:

jsdouble FASTCALL
js_StringToNumber(JSContext* cx, JSString* str, JSBool *ok)
{
    double out = 0;  /* silence warnings. */
    *ok = StringToNumberType<jsdouble>(cx, str, &out);
    return out;
}

But left its callinfo unchanged, as this:

JS_DEFINE_CALLINFO_3(extern, DOUBLE, js_StringToNumber, CONTEXT, STRING, BOOLPTR,
                     1, ACCSET_NONE)

In other words, it's claiming that js_StringToNumber is pure and doesn't store to any LIR-readable memory.  But it has an outparam that gets written to, so both those claims are false.

What happened is that the return value was dead, so the function was removed, because it was marked as pure.  But the OOM check still checked allocp2, which was the outparam.

So it's a perf bug, we end up thinking we've OOM'd when we haven't.  I don't think it's a correctness bug, but I'm not certain, it might cause an OOM message to be presented.

It's an easy fix, and should be a blocker.
blocking2.0: --- → ?
Assignee

Comment 8

9 years ago
Oh, and this is something I've been worried about -- our purity/AccSet annotations on functions aren't checked at all, and if you get them wrong it can lead to subtle bugs.  I opened bug 625610 to add partial checking of annotations.
Assignee: general → nnethercote
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee

Comment 9

9 years ago
Simple fix.
Attachment #503690 - Flags: review?(lw)
Assignee

Comment 10

9 years ago
I forgot to say:  Julian confirmed that this patch stop's Memcheck complaining.
Comment on attachment 503690 [details] [diff] [review]
patch (against TM 60218:c8cb1ab7612a)

Thanks!
Attachment #503690 - Flags: review?(lw) → review+

Updated

9 years ago
blocking2.0: ? → betaN+
Whiteboard: hardblocker
Assignee

Comment 12

9 years ago
http://hg.mozilla.org/tracemonkey/rev/b824cec25ece
Whiteboard: hardblocker → hardblocker, [sg:critical?], fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/b824cec25ece
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.