debugger thread safety patch for spidermonkey

RESOLVED FIXED in mozilla1.9alpha1

Status

()

Core
JavaScript Engine
P1
normal
RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: Mike Moening, Assigned: brendan)

Tracking

Trunk
mozilla1.9alpha1
Points:
---
Bug Flags:
blocking1.8.1.2 -
blocking1.8.0.10 -
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Traps and watches are not thread safe in spidermonkey.  I have patches that will be attached for version 1.5

Reproducible: Always
(Reporter)

Comment 1

12 years ago
Created attachment 208964 [details]
thread safety patch for watches and traps
Um, your patch is garbled, sir.
Attachment #208964 - Attachment is patch: false
Attachment #208964 - Attachment mime type: text/plain → application/zip
> Um, your patch is garbled, sir.

nah, it's just a zip archive...

Updated

12 years ago
Flags: testcase-
(Reporter)

Comment 4

12 years ago
Created attachment 228729 [details] [diff] [review]
thread safety patch for traps/watches from current CVS 7/10/06

This patch is based off CVS head.
Attachment #208964 - Attachment is obsolete: true
(Assignee)

Comment 5

12 years ago
Mike, please cc: me and request my review on the patch by setting the review ? flag and entering brendan@moz (or longer, bugzilla will autocomplete) in the attachment manager.  I'll do this one.

/be
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 6

12 years ago
Comment on attachment 228729 [details] [diff] [review]
thread safety patch for traps/watches from current CVS 7/10/06

I won't get to this right away, but now it's in my queue.  Timeless, feel free to add your thoughts.

/be
Attachment #228729 - Flags: review?(brendan)

Comment 7

11 years ago
Comment on attachment 228729 [details] [diff] [review]
thread safety patch for traps/watches from current CVS 7/10/06

for the future, if you could use cvs diff -up that'd be appreciated
there is some strange whitespace in this patch, but i don't think that's from -w...

i believe canonical JSBool is 'ok' (not 'retVal' or whatever)

>+    /* Used to make the traps and watch lists thread safe for 
multi-threaded debuggers */

I assume we can say threadsafe as a single word.
unless I'm missing something, Object.watch() should be sufficient in a multithreaded spidermonkey to cause problems, so mention of a debugger shouldn't be necessary (i.e. multi-threaded consumers ?).

>+    PRLock              *debuggerLock;

Should we name this LockedFindTrap to make this explicit? (not sure if spidermonkey has that style, I think some systems do)

> FindTrap(JSRuntime *rt, JSScript *script, jsbytecode *pc)
>+    /* Do not acquire a debuggerLock here since locks cannot be nested */
>+    /* Lock is aquired by the parent caller. */


>+    /* Locks around everything to be sure its completely destroyed before releasing the lock*/

it's

missing spaces inside /**/
missing spaces between if and (
>+        /*This pc should already be a normal op code by now.*/
>+        /* If not prevent infinite loop */
>+        if(JSOP_TRAP == (jsint)*pc)

>+        *rval = INT_TO_JSVAL(*pc);		/*This pc should already be a normal op code by now.*/

>      * It's important that we not use 'trap->' after calling the callback --

>         if ((!scope || wp->object == scope->object) && wp->sprop == sprop)
>+        {
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>             return wp->setter;
>     }
>+     }
>+ 
>+    JS_RELEASE_LOCK(rt->debuggerLock);

oh, wow:
>     JSBool ok;
>+    JSBool retVal;

I don't see any reason not to recycle ok here:
>+            /*Be sure to wrap the drop inside a lock */
>+            retVal = DropWatchPoint(cx, wp);
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>+            return retVal;	

call this ok:
>+    JSBool retVal;

>+        if (wp->object == obj && !DropWatchPoint(cx, wp)) {
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>             return JS_FALSE;
>     }
>+    }
>+    JS_RELEASE_LOCK(rt->debuggerLock);

>-     OBJ_GET_CLASS(cx, JSVAL_TO_OBJECT(v)) == &js_FunctionClass)
>+     JS_GET_CLASS(cx, JSVAL_TO_OBJECT(v)) == &js_FunctionClass)

I seem to recall trying to make this change and having brendan tell me I was doing something wrong. iirc you really shouldn't be using this header outside spidermonkey itself. Which was my problem.

There's an alternative method you should call instead which I could find if necessary.

Basically, I hit:
js3250!JS_Assert+0x2a
js3250!JS_HandleTrap+0x2f
js3250!js_Interpret+0x9b2
js3250!js_Invoke+0x7a8
xpc3250!nsXPCWrappedJSClass::CallMethod+0x845
xpc3250!nsXPCWrappedJS::CallMethod+0x27

using venkman (main thread) while stepping through code. I don't actually see any other threads of interest, although I am using PAC which probably makes my life more interesting.

I have this crash as jsd_xpc-is_broken_1F4C_2006-12-18_17-38-50-408_1888.cab in case anyone really cares.
(Reporter)

Comment 8

11 years ago
Thanks for that review timeless.  
Can you update the patch for me?
I don't have cvs access unfortunately.
It's a stupid firewall issue I've never been able to resolve.
I generated the patch basically by hand. It really sucked.

With regards to the OBJ_GET_CLASS versus JS_GET_CLASS...
I have no choice but to use this header to make it compile.
When you are writing a debugger you need access to the internals.
The external API doesn't have enough to get the job done.
Can we make this simple change?

What was the "oh, wow:" comment for?

>Basically, I hit:
Are you saying my patch caused to crash somehow?
Please explain.

Thanks a lot.
(Assignee)

Comment 9

11 years ago
Created attachment 249052 [details] [diff] [review]
fix

Mike, please test this hard. It fixes several bugs in your patch (subtle threading issues not for the faint of heart, for the most part -- plus a JSOP_LIMIT return where a JSTRAP_* enum was wanted). This patch takes great pains never to nest locks.

mrbkap: this fixes another bug I recall you were working on; hope it saves you from having to hack a fix.

/be
Assignee: general → brendan
Attachment #228729 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #249052 - Flags: review?(mrbkap)
Attachment #228729 - Flags: review?(brendan)

Comment 10

11 years ago
the wow was because i was wondering why you would need to have retVal since the name ok is usually used, and then i realized that there was an ok you were trying not to collide with.

the basically i hit is me explaining that the reason i'm here is because i hit the problem you hit.

as for your OBJ_ change, I'd love it since I wanted it for jsd_xpc, but i do recall asking for it and being told no.

wrt patching.. http://viper.haque.net/~timeless/redbean has a cvsdo diff which can be used to get diffs that look like cvs diffs. i feel your pain. we have a firewall here too.

and i just midaired trying to say brendan was going to adopt this bug.
(Assignee)

Comment 11

11 years ago
(In reply to comment #8)
> With regards to the OBJ_GET_CLASS versus JS_GET_CLASS...
> I have no choice but to use this header to make it compile.
> When you are writing a debugger you need access to the internals.
> The external API doesn't have enough to get the job done.
> Can we make this simple change?

No, first you should say why your debugger needs jsfun.h.  It's true that jsd/jsd_xpc.cpp includes jsfun.h, but it should not (timeless, rs=me on the fix, which uses JS_GetFunctionArity).

/be
(Assignee)

Updated

11 years ago
Blocks: 355044
Flags: blocking1.8.1.2?
Flags: blocking1.8.0.10?
Priority: -- → P1
Target Milestone: --- → mozilla1.9alpha
(Assignee)

Comment 12

11 years ago
Created attachment 249055 [details] [diff] [review]
fix, v2

Typo fix that timeless caught, plus optimization to avoid extra debuggerLock trip under DropWatchPointAndUnlock.

/be
Attachment #249052 - Attachment is obsolete: true
Attachment #249055 - Flags: review?(mrbkap)
Attachment #249052 - Flags: review?(mrbkap)

Comment 13

11 years ago
Hypothetical pathological problem, not sure if this is real.

Thread zero (there are no other threads)
sets up a JSScript and calls JS_SetTrap for some part of it.
It now sets up two threads "victim" and debugger.

Thread victim calls js_DecompileValueGenerator on the script

     if (JS_UPTRDIFF(pc, script->code) >= (jsuword)script->length) {
         pc = fp->pc;
         if (!pc)
             goto do_fallback;
     }
/* code pointer is here at time T */
     op = (JSOp) *pc;
/* T+1 */
     if (op == JSOP_TRAP)
/* T+2 */
         op = JS_GetTrapOpcode(cx, script, pc);
Thread debugger calls JS_ClearTrap on the script at time T+1.

JS_ClearTrap finishes before T+2. Actually it hardly matters as long as JS_ClearTrap is called before JS_GetTrapOpcode because there's a lock and ClearTrap will hold it preventing GetTrapOpcode from doing anything until ClearTrap finishes.

Based on the bug i'm fixing. Does that mean that instead of

 #ifdef JS_THREADSAFE
 JS_PUBLIC_API(JSOp)
 JS_GetTrapOpcode(JSContext *cx, JSScript *script, jsbytecode *pc)
 {
     JSTrap *trap;
 
     DBG_LOCK_EVAL(cx->runtime, trap = FindTrap(cx->runtime, script, pc));
     if (!trap) {
         /*
          * If we lost a race with another thread, return JSOP_LIMIT so our
          * caller can detect this case and do something sane.
          */
         return JSOP_LIMIT;
     }
 #endif

we should return *pc ? There are only 3 instances (soon 4) consuming this function, at the moment all in one file. We could fix that file to properly handle this function.

Otherwise, I'm pretty sure that the crash I'm fixing will haunt me again.

I'm pretty sure that most of the consumers are broken atm.
 172 js_Disassemble1(JSContext *cx, JSScript *script, jsbytecode *pc, uintN loc,
 173                 JSBool lines, FILE *fp)
 201             op = JS_GetTrapOpcode(cx, script, pc);
 202             if (op == JSOP_LIMIT)
 203                 return 0;
I don't see it setting an error.

3790                 op = JS_GetTrapOpcode(cx, jp->script, pc);
actually looks correct. The other I am asserting is wrong as the problem statement.
Comment on attachment 249055 [details] [diff] [review]
fix, v2

We might want to separate the JSWatchPoint.nref->JSWatchPoint.flags patch into the bug that it's fixing. I think this looks OK, though I'm not sure I've properly vetted all of the MT issues.
Attachment #249055 - Flags: review?(mrbkap) → review+

Comment 15

11 years ago
Not blocking the upcoming releases on this one.  Still needs bake time on the Trunk.  Please renominate for future release if you feel this really needs to land on the branches and are worth the risk.
Flags: blocking1.8.1.2?
Flags: blocking1.8.1.2-
Flags: blocking1.8.0.10?
Flags: blocking1.8.0.10-
(Assignee)

Comment 16

11 years ago
MikeM: can you test this bug's patch and testify to its goodness or badness?  Thanks a ton,

/be
(Reporter)

Comment 17

11 years ago
Yes I will. Sorry for being so slow.  Most of my resources on this didn't get back from vacation until yesterday.  I'll get back to you soon. Thanks!

What this change checked into truck or do I have to apply the patch?
(Assignee)

Comment 18

11 years ago
Fix was committed yesterday by happy accident, when trying to commit the patch for bug 366468.

/be
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
(Reporter)

Comment 19

11 years ago
This patch is working fine so far.  Not 100% done testing yet.

However, I am noticing a contention problem on the JSRuntime->trapList once a trap has been set on a script that is being executed by multiple threads.

Can we change the debuggerLock in the runtime from a PRLock to a "ReadWriteLock" (Many readers but only 1 writer)??

That way findTrap() will be fast and won't slow other threads executions except when a trap is cleared or set (which isn't very often)

Do you already have a "ReadWriteLock" available in NSPR or JS that I can test this with?

(Assignee)

Comment 20

11 years ago
Yes, NSPR has multiple-reader/single-writer locks, see:

http://lxr.mozilla.org/mozilla/source/nsprpub/pr/include/prrwlock.h

We can use one unless wtc thinks it's not advisable in this case. Please file a new bug. Thanks,

/be

Comment 21

11 years ago
Yes, NSPR has a reader-writer lock.

To our surprise, switching to a reader-writer lock doesn't
always get the performance improvement that we expect.  So
make sure you experiment and measure first.
(Assignee)

Comment 22

11 years ago
Created attachment 253141 [details] [diff] [review]
make debuggerLock a r/w lock

As wtc said, testing needed.  MikeM, can you try this out and report performance results?  Not just whether it relieves the contention you saw, but how it compares for other cases.

/be
Attachment #253141 - Flags: review?(mrbkap)

Updated

11 years ago
Attachment #253141 - Flags: review?(mrbkap) → review+
(Reporter)

Comment 23

11 years ago
It looks like the PRRWLock NSPR is heavier and slower than the simple PRLock.
The implementation of PRRWLock has an internal PRLock as well as 2 PRCondVar's.
Each of these has its own internal lock.
I think on a platform that supports native Condition Variables in the kernel we might be ok, but not on Win32.  

Lets just leave it as a PRLock for now.
Thanks
Mike M. 
(Assignee)

Comment 24

11 years ago
Comment on attachment 253141 [details] [diff] [review]
make debuggerLock a r/w lock

Withdrawn.

/be
Attachment #253141 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.