Closed Bug 476122 Opened 15 years ago Closed 3 years ago

Patch to add misaligned load/store checking to Valgrind

Categories

(Core :: JavaScript Engine, defect)

Other
Other
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: n.nethercote, Unassigned)

Details

(Keywords: valgrind)

Attachments

(2 files)

(I wrote this patch for Andreas, and he found it useful and so suggested we record it in Bugzilla.  It might also be of use for components other than JavaScript.)

Attached is a patch that adds misaligned access checking to Memcheck.
It even includes proper error recording, with duplicate aggregation
and a couple of suppressions for libc.  Not bad for 90 minutes of work :)

To use it, save the patch to a directory where you want to put your
custom Valgrind, then do this:

  svn co svn://svn.valgrind.org/valgrind/trunk valgrind
  cd valgrind
  patch -p0 < ../diff
  sh autogen.sh
  ./configure --prefix=`pwd`/inst
  make install

(Nb: if you want this for Mac, you'll need the Darwin branch:
 svn co svn://svn.valgrind.org/valgrind/branches/DARWIN valgrind
)

Then run 'inst/bin/valgrind' as usual.  You'll need to use --smc-check=all.

You'll get messages like this:

==23947== Misaligned access of size 8
==23947==    at 0x8077AF9: js_NewDoubleInRootedValue (jsgc.cpp:2205)
==23947==    by 0x8077D17: js_NewWeaklyRootedDouble (jsgc.cpp:2237)
==23947==    by 0x807E9D9: js_InitRuntimeNumberState (jsnum.cpp:676)
==23947==    by 0x805CB22: js_NewContext (jscntxt.cpp:330)
==23947==    by 0x804D3F6: main (js.cpp:4468)
==23947==  Address 0xfe8a3064 is on thread 1's stack

The meaning of which should be obvious.

If you run with -v you'll get some additional Valgrind output.  Most
notably, at the end, for each misalignment reported, you can see how
many times it happens, which seems useful for misalignments.  The most
frequent one I saw on a partial run of trace-tests.js:

==23947== 1336 errors in context 40 of 40:
==23947== Misaligned access of size 8
==23947==    at 0x80956FF: PrimaryExpr(JSContext*, JSTokenStream*,
JSTreeContext*, JSTokenType, int) (jsparse.cpp:5890)
==23947==    by 0x8098FAD: MemberExpr(JSContext*, JSTokenStream*,
JSTreeContext*, int) (jsparse.cpp:4513)
==23947==    by 0x8099616: UnaryExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:4161)
==23947==    by 0x80996BF: MulExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:4000)
==23947==    by 0x8099747: AddExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3982)
==23947==    by 0x80997D9: ShiftExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3967)
==23947==    by 0x8099861: RelExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3942)
==23947==    by 0x809990A: EqExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3920)
==23947==    by 0x809997B: BitAndExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3908)
==23947==    by 0x80999DF: BitXorExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3895)
==23947==    by 0x8099A43: BitOrExpr(JSContext*, JSTokenStream*,
JSTreeContext*) (jsparse.cpp:3882)
==23947==    by 0x8099AA7: AndExpr(JSContext*, JSTokenStream*,
JSTreeContext*)(jsparse.cpp:3871)
==23947==  Address 0xfe8a2ebc is on thread 1's stack

ie. this misaligned access occurred 1336 times.

It seems to be working fairly well.  It checks 16, 32, 64 and 128 bit
loads and stores.  I'm not completely confident about the 128-bit
checks, 128-bit values are handled a little differently in Valgrind, I
can check more carefully if it would be helpful.  In error messages it
doesn't say if the access was a load or store, I can add that if it
would help.

All the errors I'm seeing are in the compiler itself, rather than in
generated code.  Any ones in generated code will give stack traces
without any debug/symbol info, of course.
This is awesome!  I wanted to distinguish reads/writes, so I sortof stumbled my way into getting it working.  Also fixed a rebase conflict.  It's all probably wrong, but it seems to work :)
This should be made part of the Cachegrind and Callgrind (profilers).
They already collect various kinds of bad-interactions-with-the-
microarchitecture info (cache misses, mispredicts, global bus locks)
and this falls naturally in that category.  They also have the
infrastructure to display large amounts of performance data.  Memcheck
is about correctness, not performance, so it doesn't seem like the
right place for this, long term.
(In reply to comment #2)
> This should be made part of the Cachegrind and Callgrind (profilers).
> They already collect various kinds of bad-interactions-with-the-
> microarchitecture info (cache misses, mispredicts, global bus locks)
> and this falls naturally in that category.  They also have the
> infrastructure to display large amounts of performance data.  Memcheck
> is about correctness, not performance, so it doesn't seem like the
> right place for this, long term.

Hmm, not necessarily.  The error message from bug 589527 comment 0 was this:

==11823== 9998 errors in context 111 of 111:
==11823== Misaligned 64-bit memory read
==11823==    at 0x4C63FA4: ???
==11823==    by 0x81B3E0A: js::ExecuteTrace(JSContext*, nanojit::Fragment*,
js::TracerState&) (jstracer.cpp:6662)
==11823==  Address 0x4464d4c is 844 bytes inside a block of size 2,004 alloc'd
==11823==    at 0x402433F: calloc (vg_replace_malloc.c:467)
==11823==    by 0x81A16DF: js_calloc (jsutil.h:197)
==11823==    by 0x81A2D65: nanojit::Allocator::allocChunk(unsigned int)
(jstracer.cpp:113)

Without the misaligned address (0x4464d4c) Luke wouldn't have found the generated code that was responsible.  Admittedly that is a difficult case because the misalignment was in generated code, but the misaligned address still seems to be important.

I guess it depends on how you look at misalignments.  In my mind they are not like cache misses or branch mispredicts, which are undesirable but unavoidable in large numbers.  Misalignments are something that you want to and can reduce to zero, like memory errors.

Now, doing this checking within Memcheck isn't great, but I think the messages of the above Memcheck style are more useful than a Cachegrind-style output.
njn, does this bug still have usefulness?
Potentially, yes.  I would have closed it otherwise :)
Assignee: general → nobody
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: