Make SpiderMonkey helgrind-friendlier

RESOLVED DUPLICATE of bug 551155

Status

()

RESOLVED DUPLICATE of bug 551155
9 years ago
4 years ago

People

(Reporter: jorendorff, Assigned: jorendorff)

Tracking

(Blocks: 1 bug)

Other Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 3 obsolete attachments)

(Assignee)

Description

9 years ago
There are some great unsupported, undocumented helgrind features we can use in MOZ_VALGRIND builds to make helgrind understand ThinLocks. This is necessary to use helgrind usefully (and it's very useful—it found bug 547274, not to mention several bugs in shell workers).

Julian explicitly told me *not* to check these in, but I want to anyway. This stuff is too valuable to live in a rotting patch queue on my disk. I'll fix them up or take them out when a new version of valgrind breaks them.

Beyond that, the patch adds a JS_NON_THREADSAFE macro that marks fields that race in ways we don't care about (jitstats) or that are safe for subtle reasons beyond helgrind's understanding (protoHazardShape, gcPoke). The point of this change is that you can then suppress all messages about these fields by adding a few lines to your valgrind.supp file.
(Assignee)

Comment 1

9 years ago
Created attachment 428231 [details] [diff] [review]
v1
Assignee: general → jorendorff
Attachment #428231 - Flags: review?
(Assignee)

Updated

9 years ago
Attachment #428231 - Flags: review? → review?(igor)
(In reply to comment #1)
> Created an attachment (id=428231) [details]
> v1

Nitlet:

> void
> js_FinishLock(JSThinLock *tl)
> {
> #ifdef NSPR_LOCK
>     tl->owner = 0xdeadbeef;
>     if (tl->fat)
>         JS_DESTROY_LOCK(((JSLock*)tl->fat));
> #else
>     JS_ASSERT(tl->owner == 0);
>     JS_ASSERT(tl->fat == NULL);
>+    ANNOTATE__MUTEX_DESTROY_PRE(tl);
> #endif
> }

Really, the _PRE should happen before the real lock-destroy action,
not after it(that's what the _PRE/_POST convention is for).  Not that
it makes any difference in this case.

> #else
>+    ANNOTATE__MUTEX_DESTROY_PRE(tl);
>     JS_ASSERT(tl->owner == 0);
>     JS_ASSERT(tl->fat == NULL);
> #endif
(In reply to comment #0)

> Julian explicitly told me *not* to check these in, but I want to
> anyway.

On consideration, it's probably harmless to land it.  I was concerned
that it would cause problems for people using older valgrinds (3.3.x
or before).  But I think that's ok because an older Memcheck should
ignore any Helgrind annotations it sees.

--------

One other comment.  You could land this as-is, and have something that
works with Helgrind in 3.4.x and 3.5.x.  However, I have uncommitted
(V) patches to add two more annotations, to disable and re-enable
checking for a given address range.  You can use them to mark stats
counters or whatever that are non-thread-safe, with the result you
would no longer need a suppressions file.

Said patches are in a relatively good state and I would be happy to
commit them.  Downside is they would only take effect with trunk or
future-3.6.x releases of Valgrind.



+/* Tell H that an address range is not to be "tracked" until further
+   notice.  This puts it in the NOACCESS state, in which case we
+   ignore all reads and writes to it.  Useful for ignoring ranges of
+   memory where there might be races we don't want to see.  If the
+   memory is subsequently reallocated via malloc/new/stack allocation,
+   then it is put back in the trackable state.  Hence it is safe in
+   the situation where checking is disabled, the containing area is
+   deallocated and later reallocated for some other purpose. */
+#define VALGRIND_HG_DISABLE_CHECKING(_qzz_start, _qzz_len)    \
+   do {                                                       \
+     unsigned long _qzz_res;                                  \
+     VALGRIND_DO_CLIENT_REQUEST(                              \
+        (_qzz_res), 0, _VG_USERREQ__HG_ARANGE_MAKE_UNTRACKED, \
+        (_qzz_start), (_qzz_len), 0, 0, 0                     \
+     );                                                       \
+     (void)0;                                                 \
+   } while(0)

+/* And put it back into the normal "tracked" state, that is, make it
+   once again subject to the normal race-checking machinery.  This
+   puts it in the same state as new memory allocated by this thread --
+   that is, basically owned exclusively by this thread. */
+#define VALGRIND_HG_ENABLE_CHECKING(_qzz_start, _qzz_len)     \
+   do {                                                       \
+     unsigned long _qzz_res;                                  \
+     VALGRIND_DO_CLIENT_REQUEST(                              \
+        (_qzz_res), 0, _VG_USERREQ__HG_ARANGE_MAKE_TRACKED,   \
+        (_qzz_start), (_qzz_len), 0, 0, 0                     \
+     );                                                       \
+     (void)0;                                                 \
+   } while(0)
+
+
(Assignee)

Comment 4

9 years ago
(In reply to comment #2)
> >     JS_ASSERT(tl->owner == 0);
> >     JS_ASSERT(tl->fat == NULL);
> >+    ANNOTATE__MUTEX_DESTROY_PRE(tl);
> 
> Really, the _PRE should happen before the real lock-destroy action,
> not after it(that's what the _PRE/_POST convention is for).  Not that
> it makes any difference in this case.

Those are just asserts, though. Actually destroying the lock is a no-op.

Thanks very much for all the help and handholding, Julian. I'd like to get rid of the JS_NON_THREADSAFE junk, but balanced against requiring a pre-release valgrind it's sort of a toss-up. Please let me know when that feature is in, though, and I'll give it a spin...

Comment 5

9 years ago
Comment on attachment 428231 [details] [diff] [review]
v1

Sorry for a long overdue review.
Attachment #428231 - Flags: review?(igor) → review+
Helgrind's annotation set got revamped/updated in valgrind
trunk r11062 (2010-03-04 00:03:40 +0100 (Thu, 04 Mar 2010))
and I am hoping that it will be stable from now on, at least
past a V 3.6.0 release.

As a consequence the patch can be tidied up considerably:

* the DO_CREQ_*_* macros can be removed

* the ANNOTATE__ macros can be removed; any call to ANNOTATE__XX_YY
  now becomes VALGRIND_HG_XX_YY instead

* added new macros VALGRIND_HG_{DIS,EN}ABLE_CHECKING(address, size)
  to disable/enable tracking on arbitrary address ranges.  This could
  possibly be used to reduce or remove the need for
  js::NonThreadSafe<>, although it would need to be used carefully.
  See comments in helgrind.h about how to use, and caveats.
  With some care it should also be possible to remove the need for
  a suppressions file when Helgrinding SM.

* I also added definitions of

  #define __VALGRIND__          3
  #define __VALGRIND_MINOR__    6

  so it's possible to compile in the new stuff without breaking
  on headers from older versions of valgrind
(In reply to comment #1)
Jason: I'd be happy to update the patch to the new annotations if
you can advise me how to run a test workload that exercises js in
whatever way it needs to be exercised.  I just about managed to 
build a threaded js following your guidance on irc yesterday.
Created attachment 430773 [details] [diff] [review]
V1 patch re-done for annotation set for helgrind in valgrind-3.6 (WIP)
(In reply to comment #8)
> V1 patch re-done for annotation set for helgrind in valgrind-3.6 (WIP)

This is much shorter and cleaner.  The JS_NON_THREADSAFE macro and
class NonThreadSafe<> are gone, and replaced by 4 lines in
jscntxt.cpp, which tell the checker at runtime which fields it should
ignore.  Also, this patch does not require the use of a special
suppressions file.
Oh, and there's a bug in js_InitTitle.  js_FinishTitle calls js_FinishLock
on title->lock, but js_InitTitle never calls js_InitLock on it.  So Helgrind
complains that js_FinishLock is called on a bunch of locks it doesn't know
about, which is true.
(In reply to comment #10)
> Oh, and there's a bug in js_InitTitle.

File as a separate bug?
One conclusion from this is that jsengine races on at least the
following fields, presumably harmless and by design, but worth
documenting:

   JSRuntime::gcPoke
   JSRuntime::protoHazardShape

   JSContext::operationCallbackFlag

   JSGCStats::arenaStats

I'm sure I also saw races on some debug printing stuff in nanojit/,
but I can't reproduce those now.
Blocks: 551155
(In reply to comment #12)
> 
> I'm sure I also saw races on some debug printing stuff in nanojit/,
> but I can't reproduce those now.

This is a total guess, but... try making LIR.cpp:LabelMap::buf[] smaller, say 1000 or 500 chars.  That might make it more reproducible.

I'm in the middle of fixing up nanojit's debug printing for bug 531687.  Currently a couple of the printing functions share LabelMap::buf[] and always put stuff into it so they're almost certainly not re-entrant.  I'm changing things so that a buffer is always passed in to these functions from outside.
(In reply to comment #12)
> One conclusion from this is that jsengine races on at least the
> following fields, presumably harmless and by design, but worth
> documenting:
> 
>    JSRuntime::gcPoke
>    JSRuntime::protoHazardShape
> 
>    JSContext::operationCallbackFlag
> 
>    JSGCStats::arenaStats

We only write to gcPoke, and from within requests, which are serialized with the GC (Igor, please check my memory here).

rt->protoHazardShape is regenerated and read within requests, which could race, but when read it is then compared only to samples taken on the same thread, in thread-local storage, by loading the same member, in order to check assumptions about other writes to shared memory that happen after the protoHazardShapeWrite. We assume weak store order.

operationCallbackFlag is safe, the reader is sampling often and races with writers on purpose. We'll see it set soon enough if a script is taking too much realtime.

arenaStats are DEBUG only metering.

/be
Created attachment 431845 [details] [diff] [review]
better version of "V1 patch redone [...] for helgrind 3.6"

A revised version, which moves the "don't check this field"
annotations for JSRuntime::gcPoke/protoHazardShape/gcStats
to the right place -- JSRuntime::JSRuntime().
Attachment #430773 - Attachment is obsolete: true
These bugs are all part of a search I made for js bugs that are getting lost in
transit:

http://tinyurl.com/jsDeadEndBugs

They all have a review+'ed, non-obsoleted patch and are not marked
fixed-in-tracemonkey or checkin-needed but have not seen any activity in 300
days. Some of these got lost simply because the assignee/patch provider never
requested a checkin, or just because they were forgotten about.
Created attachment 514797 [details] [diff] [review]
usable markup, for SpiderMonkey only (not the entire browser)

Version of patch used for race-checking SpiderMonkey during
Jan 2011.  Works pretty well.  Note you'll need to use NSPR
as built in the tree, not a system NSPR, since some of the
annotations in the patch are for NSPR.  Do not commit --
needs cleanup first.
Attachment #428231 - Attachment is obsolete: true
Attachment #431845 - Attachment is obsolete: true
Currently assuming the races on the following fields are harmless:

JSRuntime::gcMallocBytes
JSRuntime::gcPoke
JSRuntime::protoHazardShape
JSRuntime::stringMemoryUsed
JSRuntime::mjitMemoryUsed
JSRuntime::shapeGen

Note this bug should be closed in favour of bug 551155, which contains
a more comprehensive and up to date patch.
In comment 18, Julian says this Helgrind bug should be closed obsoleted by bug 551155.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 551155
You need to log in before you can comment on or make changes to this bug.