Closed Bug 625333 Opened 14 years ago Closed 14 years ago

tracer generated code reads stack memory it didn't write

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jseward, Assigned: n.nethercote)

Details

(Whiteboard: hardblocker, [sg:critical?], fixed-in-tracemonkey)

Attachments

(2 files)

x64-linux build, MC of current. No easy STR currently. I only see it when running a production workload of about 80 tabs. I see this, which I interpret as tracer generated code doing a conditional branch based on uninitialised value in its own stack frame: Conditional jump or move depends on uninitialised value(s) at 0x1CA518D1: ??? by 0x6719171: js::ExecuteTree(JSContext*, js::TreeFragment*, unsigned int&, js::VMSideExit**, js::VMSideExit**) (jstracer.cpp:6456) by 0x672DE9B: js::TraceRecorder::attemptTreeCall(js::TreeFragment*, unsigned int&) (jstracer.cpp:6046) by 0x672E21C: js::TraceRecorder::recordLoopEdge(JSContext*, js::TraceRecorder*, unsigned int&) (jstracer.cpp:6007) by 0x67375D2: js::RecordLoopEdge(JSContext*, unsigned int&) (jstracer.cpp:7000) by 0x67E429E: js::Interpret(JSContext*, JSStackFrame*, unsigned int, JSInterpMode) (jsinterp.cpp:3502) by 0x67219F7: js::RecordTracePoint(JSContext*, unsigned int&, bool*, bool) (jstracer.cpp:16545) by 0x6721E21: js::MonitorTracePoint(JSContext*, unsigned int&, bool*, void**, unsigned int*) (jstracer.cpp:16701) by 0x67A61BE: RunTracer(js::VMFrame&, js::mjit::ic::TraceICInfo&) (InvokeHelpers.cpp:974) by 0x1C9FE777: ??? by 0x674CC87: js::mjit::EnterMethodJIT(JSContext*, JSStackFrame*, void*, js::Value*) (MethodJIT.cpp:748) by 0x674CF1C: js::mjit::JaegerShot(JSContext*) (MethodJIT.cpp:774) Uninitialised value was created by a stack allocation at 0x1CA517BC: ??? Catching it in GDB: (gdb) where #0 0x000000001ca518d1 in ?? () #1 0x000000001b0e8330 in ?? () #2 0x0000000020795e20 in ?? () #3 0xfffa800018000040 in ?? () #4 0x000000000670bcef in VisitFrameSlots<js::CountSlotsVisitor> (visitor=..., cx=0x7fffffffffff, depth=<value optimised out>, fp=0x7feffe590, next=0x0) at /space2/sewardj/MOZ/MC-13-01-2011/js/src/jstracer.cpp:1761 #5 0x0000000000000000 in ?? () Well, that doesn't tell us much. But the assembly is interesting. I've annotated reads and writes that are %rbp-based. At 0x000000001ca518c9 (near the end) we read a 4 byte value from -0x34(%rbp) .. -0x31(%rbp), and do a conditional branch on it, at which point V complains. Looking at all the rbp-based references, though, I can't see any place where those 4 bytes are written. Just prior to that we pass the value -0x40(%rbp) as the 4th arg ("Value* vp") to GetPropertyWithNativeGetter, but, even assuming it writes to that arg, it would only hit -0x40 .. -0x38-1. So .. it looks kinda suspicious to me. Dump of assembler code from 0x1ca517bc to 0x1ca51900: 0x000000001ca517bc: push %rbp 0x000000001ca517bd: mov %rsp,%rbp 0x000000001ca517c0: sub $0x50,%rsp 0x000000001ca517c4: mov %rbx,-0x30(%rbp) # W -0x30 .. -0x28 -1 0x000000001ca517c8: mov %r12,-0x20(%rbp) # W -0x20 .. -0x18 -1 0x000000001ca517cc: mov %r13,-0x18(%rbp) # W -0x18 .. -0x10 -1 0x000000001ca517d0: mov %r14,-0x10(%rbp) # W -0x10 .. -0x8 -1 0x000000001ca517d4: mov %r15,-0x8(%rbp) # W -0x8 .. -0x0 -1 0x000000001ca517d8: mov %rdi,-0x28(%rbp) # W -0x28 .. -0x20 -1 0x000000001ca517dc: mov %rdi,%r13 0x000000001ca517df: mov 0x10(%r13),%rbx 0x000000001ca517e3: mov 0x0(%r13),%rdi 0x000000001ca517e7: mov $0x15d40940,%r13d 0x000000001ca517ed: mov 0x0(%r13),%r12d 0x000000001ca517f1: cmp $0x0,%r12d 0x000000001ca517f5: jne 0x1ca61751 0x000000001ca517fb: mov $0x2296d9d0,%r13d 0x000000001ca51801: mov 0x0(%r13),%r12d 0x000000001ca51805: cmp $0xc8,%r12d 0x000000001ca5180c: jge 0x1ca51816 0x000000001ca5180e: add $0x1,%r12d 0x000000001ca51812: mov %r12d,0x0(%r13) 0x000000001ca51816: mov -0x8(%rbx),%rsi 0x000000001ca5181a: movl $0x0,0x0(%rbx) 0x000000001ca51821: mov 0x14(%rsi),%r12d 0x000000001ca51825: cmp $0x41ada,%r12d 0x000000001ca5182c: jne 0x1ca61776 0x000000001ca51832: mov 0x20(%rsi),%r13 0x000000001ca51836: cmp $0x0,%r13 0x000000001ca5183a: je 0x1ca6179b 0x000000001ca51840: mov 0x14(%r13),%r12d 0x000000001ca51844: cmp $0x41851,%r12d 0x000000001ca5184b: jne 0x1ca617c0 0x000000001ca51851: movq $0x34716ae8,0x1c8(%rdi) 0x000000001ca5185c: lea -0x40(%rbp),%rcx # rcx == %rbp - 0x40; argreg4 (Value* vp) 0x000000001ca51860: mov $0x3a4bebc8,%edx 0x000000001ca51865: mov %rdi,%r12 0x000000001ca51868: callq 0x670a4e0 <GetPropertyWithNativeGetter> 0x000000001ca5186d: mov %r12,%rdi 0x000000001ca51870: movsd -0x40(%rbp),%xmm0 # R -0x40 .. -0x38 -1 0x000000001ca51875: movsd %xmm0,0x8(%rbx) 0x000000001ca5187a: movq $0x0,0x1c8(%rdi) 0x000000001ca51885: cmp $0x0,%eax 0x000000001ca51888: je 0x1ca617e5 0x000000001ca5188e: mov -0x40(%rbp),%r12 # R -0x40 .. -0x38 -1 0x000000001ca51892: movabs $0x7fffffffffff,%r13 0x000000001ca5189c: mov %r12,%rsi 0x000000001ca5189f: and %r13,%rsi 0x000000001ca518a2: shr $0x2f,%r12 0x000000001ca518a6: cmp $0x1fff5,%r12d 0x000000001ca518ad: jne 0x1ca6180a 0x000000001ca518b3: mov %rsi,0x8(%rbx) 0x000000001ca518b7: mov 0x0(%rsi),%r12 0x000000001ca518bb: shr $0x4,%r12 0x000000001ca518bf: cmp $0x0,%r12 0x000000001ca518c3: jne 0x1ca6182f 0x000000001ca518c9: mov -0x34(%rbp),%r12d # R -0x34 .. -0x30 -1 0x000000001ca518cd: cmp $0x0,%r12d => 0x000000001ca518d1: je 0x1ca61854 # ERROR REPORTED HERE 0x000000001ca518d7: lea -0x34(%rbp),%rdx 0x000000001ca518db: callq 0x674bb90 <_Z16js_StringToInt32P9JSContextP8JSStringPi> 0x000000001ca518e0: mov -0x34(%rbp),%r12d 0x000000001ca518e4: cmp $0x0,%r12d 0x000000001ca518e8: je 0x1ca61879 0x000000001ca518ee: mov %eax,0x0(%rbx) 0x000000001ca518f1: cmp $0x0,%eax 0x000000001ca518f4: jne 0x1ca6189e 0x000000001ca518fa: mov -0x20(%rbx),%r12d 0x000000001ca518fe: add $0x1,%r12d
STR: 1. create an empty (default) profile 2. load http://www.theregister.co.uk (on M-C on valgrind on x64-linux) The error appears after the page is mostly fully drawn and is repeated 3 times.
Also happens with a 64-bit build on MacOSX (unsurprisingly, as the generated code is presumably identical).
Julian, can you dig into this some more if you have time? We need to know which fragment is doing the read. Here's how I'd proceed: - In jstracer.cpp is a function called tprint(), which allows you to insert prinft-style calls into a trace. It's not well documented, look at its code and PrintOnTrace() to work out how to use it. - Insert a call to it in each fragment, which prints the file/line number for the fragment. - Run it in a (debug) shell build with TMFLAGS=afterdce to check that it works. - Run it in a (debug) browser build on theregister.co.uk under Valgrind with TMFLAGS=afterdce,native and see if you can determine which fragment is causing the problem. If you combine this with --db-attach in Valgrind you'll probably be able to pinpoint it to the exact LIR/native instruction in the fragment. - Post that fragment here :) Make sense?
One complication: printing of LIR stuff is done to stdout, which is buffered. Hopefully the interleaving with the stderr output from Valgrind will still make sense. Actually, you mightn't even need the tprint stuff; just having the code addresses from GDB should allow you to find the relevant parts of the LIR dumps.
Here's the offending fragment. Seems like it allocates a bit of space, then loads the value and branches on it. ------------------------------ # JSOP_BITXOR allocp2 = allocp 4 ldi3 = ldi.alloc allocp2[0] 0x199ee8c9 movl r12d, -52(rbp) guard(oom) = eqi ldi3, undefined/*0*/ xt2: xt guard(oom) -> exit=0x1b1136f8 pc=0x16820b1b imacpc=(nil) sp+16 rp+0 OOM 0x199ee8cd cmpl r12d, 0 0x199ee8d1 je 0x199fe854 ################ Uninit value use
the complete fragment
Ok, bug 609440 changed js_StringToNumber's to this: jsdouble FASTCALL js_StringToNumber(JSContext* cx, JSString* str, JSBool *ok) { double out = 0; /* silence warnings. */ *ok = StringToNumberType<jsdouble>(cx, str, &out); return out; } But left its callinfo unchanged, as this: JS_DEFINE_CALLINFO_3(extern, DOUBLE, js_StringToNumber, CONTEXT, STRING, BOOLPTR, 1, ACCSET_NONE) In other words, it's claiming that js_StringToNumber is pure and doesn't store to any LIR-readable memory. But it has an outparam that gets written to, so both those claims are false. What happened is that the return value was dead, so the function was removed, because it was marked as pure. But the OOM check still checked allocp2, which was the outparam. So it's a perf bug, we end up thinking we've OOM'd when we haven't. I don't think it's a correctness bug, but I'm not certain, it might cause an OOM message to be presented. It's an easy fix, and should be a blocker.
blocking2.0: --- → ?
Oh, and this is something I've been worried about -- our purity/AccSet annotations on functions aren't checked at all, and if you get them wrong it can lead to subtle bugs. I opened bug 625610 to add partial checking of annotations.
Assignee: general → nnethercote
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Simple fix.
Attachment #503690 - Flags: review?(lw)
I forgot to say: Julian confirmed that this patch stop's Memcheck complaining.
Comment on attachment 503690 [details] [diff] [review] patch (against TM 60218:c8cb1ab7612a) Thanks!
Attachment #503690 - Flags: review?(lw) → review+
blocking2.0: ? → betaN+
Whiteboard: hardblocker
Whiteboard: hardblocker → hardblocker, [sg:critical?], fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: