Closed Bug 572678 Opened 10 years ago Closed 10 years ago

valgrind integration with conservative gc stack scan

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: igor, Assigned: igor)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 6 obsolete files)

The conservative GC scanner may access a native stack regions that valgrind assumes should be unreachable. We should add valgrind hooks to workaround that.
Attached patch initial prototype (obsolete) — Splinter Review
The patch just calls VALGRIND_MAKE_MEM_DEFINED on the stack range. This is not right as it disables all valgrind checks for the stack, but at least it silence valgrind warnings.
Attached patch v2 (obsolete) — Splinter Review
The new version does not read words from the stack that valgrind considers as not initialized.
Attachment #451925 - Attachment is obsolete: true
Any valgrind guru to review the patch here?
Attachment #451935 - Flags: review?(nnethercote)
Comment on attachment 451935 [details] [diff] [review]
v2

>+    /*
>+     * To avoid bogus errors under valgrind we do not mark conservatively
>+     * words that the memcheck tool treats as not initialized.
>+     */
>+    bool canRead(jsuword *ptr) {
>+#ifdef JS_VALGRIND
>+        jsuword vbits;
>+        JS_ALWAYS_TRUE(VALGRIND_GET_VBITS(ptr, &vbits, sizeof(vbits)) <= 1);
>+        return !vbits;
>+#else
>+        return true;
>+#endif
>+    }
>+    

VALGRIND_GET_VBITS is almost certainly not what you want.  

Do you want to know if Valgrind thinks a word is undefined?  (Nb.
"undefined" basically means "uninitialized or derived from uninitialized
value(s)".)  If so, use VALGRIND_CHECK_MEM_IS_DEFINED.  See
http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs for
details.

And "canRead()" is a bad name because it's unclear if you're talking about
whether the memory address is accessible (ie. mapped, ie. can you access it
without causing a seg fault or similar) or defined.  "isDefined()" would be
better.

So I think you want something like this:

    bool isDefined(jsuword *ptr) {
#ifdef JS_VALGRIND
        return VALGRIND_CHECK_MEM_IS_DEFINED(ptr, sizeof(jsuword));
#else
        return true;
#endif
    }

which is simple enough that you might as well inline it.
Attachment #451935 - Flags: review?(nnethercote) → review-
It's unclear from the above whether the problem is with stack words
that V regards as unaccessible, or with ones that accessible but
undefined:

> The conservative GC scanner may access a native stack regions that valgrind
> assumes should be unreachable.

> The patch just calls VALGRIND_MAKE_MEM_DEFINED on the stack range.

Can you attach examples of the errors it reports, that you don't want
to see?
(In reply to comment #4)
> Do you want to know if Valgrind thinks a word is undefined?  (Nb.
> "undefined" basically means "uninitialized or derived from uninitialized
> value(s)".)  If so, use VALGRIND_CHECK_MEM_IS_DEFINED.  See
> http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs for
> details.

That is not suitable here. That prints an error message on errors while the stack scanner definitely accesses words that valgrind may consider as both inaccessible (between native stack frames, for example) or as accessible but not initialized. That is why I have used VALGRIND_GET_VBITS to find out if valgrind would complain about the access and skip the word if so as it the word cannot contain the root.

(In reply to comment #5)
> Can you attach examples of the errors it reports, that you don't want
> to see?

So far two errors were reported:

Conditional jump or move depends on uninitialised value(s)
Use of uninitialised value of size 8

like in
==17517== Conditional jump or move depends on uninitialised value(s)
==17517==    at 0x4294F0: JSString::isIntString(void*) (jsstr.h:278)
==17517==    by 0x42954B: JSString::isStatic(void*) (jsstr.h:287)
==17517==    by 0x489D4A: CheckValidGCThingPtr(void*) (jsgc.cpp:431)
==17517==    by 0x489FE6: GetGCThingMarkBit(void*, unsigned long&) (jsgc.cpp:504)
==17517==    by 0x48A03F: IsMarkedGCThing(void*) (jsgc.cpp:515)
==17517==    by 0x480C3C: js_IsAboutToBeFinalized(void*) (jsgc.cpp:893)
==17517==    by 0x4818C1: js::ConservativeGCStackMarker::markWord(unsigned long) (jsgc.cpp:1235)
==17517==    by 0x481A5D: js::ConservativeGCStackMarker::markRange(unsigned long*, unsigned long*) (jsgc.cpp:1263)
==17517==    by 0x481B7A: js::ConservativeGCStackMarker::markRoots() (jsgc.cpp:1291)
==17517==    by 0x485F4A: js_TraceRuntime(JSTracer*) (jsgc.cpp:2774)
==17517==    by 0x486672: GC(JSContext*) (jsgc.cpp:3324)
==17517==    by 0x486A33: GCUntilDone(JSContext*, JSGCInvocationKind) (jsgc.cpp:3708)
==17517==


==17517== Use of uninitialised value of size 8
==17517==    at 0x4817F3: js::ConservativeGCStackMarker::markWord(unsigned long) (jsgc.cpp:1212)
==17517==    by 0x481A5D: js::ConservativeGCStackMarker::markRange(unsigned long*, unsigned long*) (jsgc.cpp:1263)
==17517==    by 0x481B7A: js::ConservativeGCStackMarker::markRoots() (jsgc.cpp:1291)
==17517==    by 0x485F4A: js_TraceRuntime(JSTracer*) (jsgc.cpp:2774)
==17517==    by 0x486672: GC(JSContext*) (jsgc.cpp:3324)
==17517==    by 0x486A33: GCUntilDone(JSContext*, JSGCInvocationKind) (jsgc.cpp:3708)
==17517==    by 0x486B91: js_GC(JSContext*, JSGCInvocationKind) (jsgc.cpp:3762)
==17517==    by 0x42110C: JS_GC (jsapi.cpp:2421)
==17517==    by 0x4052E1: GC(JSContext*, unsigned int, long*) (js.cpp:1109)
==17517==    by 0x5F9338: js_Interpret (jsops.cpp:2146)
==17517==    by 0x496EA5: js_Execute (jsinterp.cpp:855)
==17517==    by 0x426EB6: JS_ExecuteScript (jsapi.cpp:4640)
==17517==
I still don't understand exactly what the problem is, and what you want to achieve, and judging from comment 5 Julian doesn't either.  Can you try to explain more clearly?  Nb. the terminology we usually use:

  accessible/inaccessible: to describe whether a chunk of memory can be accessed or not, eg. if it's mapped (this says nothing about the values within the chunk)

  defined/undefined: only applies to accessible memory, and describes whether the value within the chunk of memory is defined (ie. initialized and/or derived from initialized values)
(In reply to comment #6)
> So far two errors were reported:
> 
> Conditional jump or move depends on uninitialised value(s)
> Use of uninitialised value of size 8

If these are the only kinds of errors reported, and there are no
complaints about reading or writing at invalid addresses, then I
suggest the following:

 unsigned long get_and_sanitise_word ( unsigned long* p )
 {
     unsigned long w = *p;
     VALGRIND_MAKE_MEM_DEFINED(&w, sizeof(w));
     return w;
 }

This way, you will still get complaints if you read an invalid area,
but no complaint if you read uninitialised values from a valid area.
And unlike the initial prototype (comment 1) it does not affect the
stack range, so subsequent checking on it is unaffected.  The only
drawback is you'll be doing a client request once per stack word, and
that takes at least several hundred cycles.  So it's going to be a
performance hit if you do more than a few tens of thousand of stack
words per second.
(In reply to comment #7)
>   accessible/inaccessible: to describe whether a chunk of memory can be
> accessed or not, eg. if it's mapped (this says nothing about the values within
> the chunk)
> 
>   defined/undefined: only applies to accessible memory, and describes whether
> the value within the chunk of memory is defined (ie. initialized and/or derived
> from initialized values)

The conservative stack scanner definitely reads undefined cells. But it may access memory that is mapped, belongs to the range of pages reserved for a native stack but that is not withing the native stack base and the current stack pointer. If valgrind treats such memory as accessible, than the conservative stack scanner can only read undefined memory and cannot read inaccessible memory.

(In reply to comment #8)
> If these are the only kinds of errors reported, and there are no
> complaints about reading or writing at invalid addresses, then I
> suggest the following:
> 
>  unsigned long get_and_sanitise_word ( unsigned long* p )
>  {
>      unsigned long w = *p;
>      VALGRIND_MAKE_MEM_DEFINED(&w, sizeof(w));
>      return w;
>  }

Correct me if I am wrong, but that would make the word permanently defined and valgrind would not be able to tell if some code would later read that word. I.e. in the following example the valgrind would not be able to detect the undefined read from v after js_GC() as the GC would make v defined:

jsval v;
js_GC();
jsval v2 = v;

So my goal is to make sure that the conservative GC does not change the valgrind view of the world.
It calls VALGRIND_MAKE_MEM_DEFINED on a copy, "w", and leaves the original, "p", alone.  So it should do what you want.
(In reply to comment #10)
> It calls VALGRIND_MAKE_MEM_DEFINED on a copy, "w", and leaves the original,
> "p", alone.  So it should do what you want.

Thanks, I have missed that.
Attached patch v3 (obsolete) — Splinter Review
The new patch calls VALGRIND_MAKE_MEM_DEFINED on a copy of stack memory. To avoid performance losses when compiled with valgrind enabled the patch doing piecewise copy.
Attachment #451935 - Attachment is obsolete: true
Attachment #452446 - Flags: review?(nnethercote)
Comment on attachment 452446 [details] [diff] [review]
v3

>+    /*
>+     * The conservative scanner may access words on the stack that valgrind
>+     * considers as undefined. To avoid false positives and not to alter
>+     * valgrind view of the stack we copy the words to a buffer (piecewise if
>+     * necessary) and make the buffer defined. See bug 572678.
>+     */
>+    void markNativeStackRange(jsuword *begin, jsuword *end) {
>+#ifdef JS_VALGRIND
>+        static bool novalgrind = false;
>+        if (!novalgrind) {
>+            jsuword valgrindBuffer[128];
>+            for (;;) {
>+                size_t n = end - begin;
>+                if (n > JS_ARRAY_LENGTH(valgrindBuffer))
>+                    n = JS_ARRAY_LENGTH(valgrindBuffer);
>+                size_t copySize = n * sizeof *begin;
>+                memcpy(valgrindBuffer, begin, copySize);
>+                if (!VALGRIND_MAKE_MEM_DEFINED(valgrindBuffer, copySize)) {
>+                    novalgrind = true;
>+                    break;
>+                }
>+                markRange(valgrindBuffer, valgrindBuffer + n);
>+                begin += n;
>+                if (begin == end)
>+                    return;
>+            }
>+        }
>+#endif
>+        markRange(begin, end);
>+    }

I was expecting something similar to what Julian wrote in comment 8.  This
is much more complicated.  

Why copy the values to a new buffer?  Julian's code achieves the same effect
but does it one word at a time which is simpler.

Also, 'novalgrind' seems unnecessary.  Again, Julian's code wouldn't require
it.
(In reply to comment #13)
> I was expecting something similar to what Julian wrote in comment 8.

It is necessary to define values only when scanning the stack, not when scanning the registers copy. So something with more if and ifdefs would be necessary even without bulk copy. 

> This is much more complicated.  

From the comment 8:

> The only
> drawback is you'll be doing a client request once per stack word, and
> that takes at least several hundred cycles.  So it's going to be a
> performance hit if you do more than a few tens of thousand of stack
> words per second.

The above avoids an extra performance hit when checking with valgrind various tests that uses 500K of stack and than run the GC. 

> Also, 'novalgrind' seems unnecessary.  Again, Julian's code wouldn't require
> it.

This is to minimize any penalty for builds compiled with --enable-valgrind when they run without valgrind.
 
> > The only
> > drawback is you'll be doing a client request once per stack word, and
> > that takes at least several hundred cycles.  So it's going to be a
> > performance hit if you do more than a few tens of thousand of stack
> > words per second.
> 
> The above avoids an extra performance hit when checking with valgrind various
> tests that uses 500K of stack and than run the GC. 

Well, I kind of see what you're saying, but I'd be more convinced if
you had some performance numbers to show this extra complication is
necessary.

> > Also, 'novalgrind' seems unnecessary.  Again, Julian's code wouldn't require
> > it.
> 
> This is to minimize any penalty for builds compiled with --enable-valgrind when
> they run without valgrind.

I agree with Nick.  If you can reliably detect any performance penalty
for runs compiled --enable-valgrind, with/without the 'novalgrind'
logic, I would be absolutely amazed.  Typically it is difficult to
detect user time changes of less than about 0.5% reliably, and what
'novalgrind' will save is a few tens of thousands of cycles every few
billion cycles in total (I would guess).
Attached patch v4 (obsolete) — Splinter Review
Smaller patch without novalgrind
Attachment #452446 - Attachment is obsolete: true
Attachment #452700 - Flags: review?(nnethercote)
Attachment #452446 - Flags: review?(nnethercote)
Attached patch v5 (obsolete) — Splinter Review
Refreshed patch
Attachment #452700 - Attachment is obsolete: true
Attachment #453030 - Flags: review?(nnethercote)
Attachment #452700 - Flags: review?(nnethercote)
(In reply to comment #14)
> > I was expecting something similar to what Julian wrote in comment 8.
> 
> It is necessary to define values only when scanning the stack, not when
> scanning the registers copy.

I don't understand this sentence.  Can you clarify?  I know almost nothing about SM's GC, so feel free to include lots of detail.
(In reply to comment #18)
> > It is necessary to define values only when scanning the stack, not when
> > scanning the registers copy.
> 
> I don't understand this sentence.  Can you clarify?  I know almost nothing
> about SM's GC, so feel free to include lots of detail.

The conservative GC scans both the native stack and the copy of the registers using the same markRange function. The register copy is obtained using setjmp and as such should always be defined from the valgrind point of view. So it is necessary to set the defined status only when calling markRange on the native stack.
Attachment #453030 - Flags: review?(nnethercote) → review+
Its just a couple registers. Does it really hurt to set it for them too?
(In reply to comment #20)
> Its just a couple registers. Does it really hurt to set it for them too?

Good point.  This whole patch sequence has smelt like you're avoiding costs that you haven't measured.
(In reply to comment #19)

> The register copy is obtained using setjmp and as such should 
> always be defined from the valgrind point of view.

Perhaps I misunderstood something, but .. per se the above is
not correct.  Memcheck tracks definedness for registers too and
you can't assume that just because you dumped them in memory,
it now considers them to be defined.
Attached patch v6 (obsolete) — Splinter Review
To test the performance impact of calling VALGRIND_MAKE_MEM_DEFINED I tested valgrind-enabled built with the following test case that exercise the conservative GC against 500K of stack space:

var N = 350;
var obj = { get a() {
    if (!N) {
	print(gc());
	return;
    }
    --N;
    obj.a;
}};

obj.a;

Doing one call per word increased the running time by about 8%. Since the test case represents the worst case scenario which happens only with synthetic tests which are rarely executed under valgrind that does not justify the extra complexity in the patch.

So here is simple patch that takes advantage from changes from the bug 573709. The patch marks all words as defined per rightful observation from the comment 22.
Attachment #453030 - Attachment is obsolete: true
Attachment #453347 - Flags: review?(nnethercote)
+    VALGRIND_MAKE_MEM_DEFINED(w, sizeof w);

Is that what you intended, rather than ...(&w, sizeof w) ?
Attached patch v7Splinter Review
v6 had a wrong patch that I generated before fixing a typo that was also pointed by Julian.
Attachment #453347 - Attachment is obsolete: true
Attachment #453363 - Flags: review?(nnethercote)
Attachment #453347 - Flags: review?(nnethercote)
Comment on attachment 453363 [details] [diff] [review]
v7

Thanks for measuring!  This looks much better.   The comment is out-of-date, refers to copying to a buffer.  r=me with that fixed (no need for me to re-review the patch).
Attachment #453363 - Flags: review?(nnethercote) → review+
http://hg.mozilla.org/tracemonkey/rev/650a162fbb00
Whiteboard: fixed-in-tracemonkey
Blocks: 578233
http://hg.mozilla.org/mozilla-central/rev/650a162fbb00
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
blocking2.0: ? → betaN+
You need to log in before you can comment on or make changes to this bug.