Closed Bug 324010 Opened 14 years ago Closed 13 years ago

debugger thread safety patch for spidermonkey


(Core :: JavaScript Engine, defect, P1)






(Reporter: MikeM, Assigned: brendan)




(1 file, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Traps and watches are not thread safe in spidermonkey.  I have patches that will be attached for version 1.5

Reproducible: Always
Um, your patch is garbled, sir.
Attachment #208964 - Attachment is patch: false
Attachment #208964 - Attachment mime type: text/plain → application/zip
> Um, your patch is garbled, sir.

nah, it's just a zip archive...
Flags: testcase-
This patch is based off CVS head.
Attachment #208964 - Attachment is obsolete: true
Mike, please cc: me and request my review on the patch by setting the review ? flag and entering brendan@moz (or longer, bugzilla will autocomplete) in the attachment manager.  I'll do this one.

Ever confirmed: true
Comment on attachment 228729 [details] [diff] [review]
thread safety patch for traps/watches from current CVS 7/10/06

I won't get to this right away, but now it's in my queue.  Timeless, feel free to add your thoughts.

Attachment #228729 - Flags: review?(brendan)
Comment on attachment 228729 [details] [diff] [review]
thread safety patch for traps/watches from current CVS 7/10/06

for the future, if you could use cvs diff -up that'd be appreciated
there is some strange whitespace in this patch, but i don't think that's from -w...

i believe canonical JSBool is 'ok' (not 'retVal' or whatever)

>+    /* Used to make the traps and watch lists thread safe for 
multi-threaded debuggers */

I assume we can say threadsafe as a single word.
unless I'm missing something, should be sufficient in a multithreaded spidermonkey to cause problems, so mention of a debugger shouldn't be necessary (i.e. multi-threaded consumers ?).

>+    PRLock              *debuggerLock;

Should we name this LockedFindTrap to make this explicit? (not sure if spidermonkey has that style, I think some systems do)

> FindTrap(JSRuntime *rt, JSScript *script, jsbytecode *pc)
>+    /* Do not acquire a debuggerLock here since locks cannot be nested */
>+    /* Lock is aquired by the parent caller. */

>+    /* Locks around everything to be sure its completely destroyed before releasing the lock*/


missing spaces inside /**/
missing spaces between if and (
>+        /*This pc should already be a normal op code by now.*/
>+        /* If not prevent infinite loop */
>+        if(JSOP_TRAP == (jsint)*pc)

>+        *rval = INT_TO_JSVAL(*pc);		/*This pc should already be a normal op code by now.*/

>      * It's important that we not use 'trap->' after calling the callback --

>         if ((!scope || wp->object == scope->object) && wp->sprop == sprop)
>+        {
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>             return wp->setter;
>     }
>+     }
>+    JS_RELEASE_LOCK(rt->debuggerLock);

oh, wow:
>     JSBool ok;
>+    JSBool retVal;

I don't see any reason not to recycle ok here:
>+            /*Be sure to wrap the drop inside a lock */
>+            retVal = DropWatchPoint(cx, wp);
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>+            return retVal;	

call this ok:
>+    JSBool retVal;

>+        if (wp->object == obj && !DropWatchPoint(cx, wp)) {
>+            JS_RELEASE_LOCK(rt->debuggerLock);
>             return JS_FALSE;
>     }
>+    }
>+    JS_RELEASE_LOCK(rt->debuggerLock);

>-     OBJ_GET_CLASS(cx, JSVAL_TO_OBJECT(v)) == &js_FunctionClass)
>+     JS_GET_CLASS(cx, JSVAL_TO_OBJECT(v)) == &js_FunctionClass)

I seem to recall trying to make this change and having brendan tell me I was doing something wrong. iirc you really shouldn't be using this header outside spidermonkey itself. Which was my problem.

There's an alternative method you should call instead which I could find if necessary.

Basically, I hit:

using venkman (main thread) while stepping through code. I don't actually see any other threads of interest, although I am using PAC which probably makes my life more interesting.

I have this crash as in case anyone really cares.
Thanks for that review timeless.  
Can you update the patch for me?
I don't have cvs access unfortunately.
It's a stupid firewall issue I've never been able to resolve.
I generated the patch basically by hand. It really sucked.

With regards to the OBJ_GET_CLASS versus JS_GET_CLASS...
I have no choice but to use this header to make it compile.
When you are writing a debugger you need access to the internals.
The external API doesn't have enough to get the job done.
Can we make this simple change?

What was the "oh, wow:" comment for?

>Basically, I hit:
Are you saying my patch caused to crash somehow?
Please explain.

Thanks a lot.
Attached patch fix (obsolete) — Splinter Review
Mike, please test this hard. It fixes several bugs in your patch (subtle threading issues not for the faint of heart, for the most part -- plus a JSOP_LIMIT return where a JSTRAP_* enum was wanted). This patch takes great pains never to nest locks.

mrbkap: this fixes another bug I recall you were working on; hope it saves you from having to hack a fix.

Assignee: general → brendan
Attachment #228729 - Attachment is obsolete: true
Attachment #249052 - Flags: review?(mrbkap)
Attachment #228729 - Flags: review?(brendan)
the wow was because i was wondering why you would need to have retVal since the name ok is usually used, and then i realized that there was an ok you were trying not to collide with.

the basically i hit is me explaining that the reason i'm here is because i hit the problem you hit.

as for your OBJ_ change, I'd love it since I wanted it for jsd_xpc, but i do recall asking for it and being told no.

wrt patching.. has a cvsdo diff which can be used to get diffs that look like cvs diffs. i feel your pain. we have a firewall here too.

and i just midaired trying to say brendan was going to adopt this bug.
(In reply to comment #8)
> With regards to the OBJ_GET_CLASS versus JS_GET_CLASS...
> I have no choice but to use this header to make it compile.
> When you are writing a debugger you need access to the internals.
> The external API doesn't have enough to get the job done.
> Can we make this simple change?

No, first you should say why your debugger needs jsfun.h.  It's true that jsd/jsd_xpc.cpp includes jsfun.h, but it should not (timeless, rs=me on the fix, which uses JS_GetFunctionArity).

Blocks: js1.7src
Flags: blocking1.8.1.2?
Flags: blocking1.8.0.10?
Priority: -- → P1
Target Milestone: --- → mozilla1.9alpha
Attached patch fix, v2Splinter Review
Typo fix that timeless caught, plus optimization to avoid extra debuggerLock trip under DropWatchPointAndUnlock.

Attachment #249052 - Attachment is obsolete: true
Attachment #249055 - Flags: review?(mrbkap)
Attachment #249052 - Flags: review?(mrbkap)
Hypothetical pathological problem, not sure if this is real.

Thread zero (there are no other threads)
sets up a JSScript and calls JS_SetTrap for some part of it.
It now sets up two threads "victim" and debugger.

Thread victim calls js_DecompileValueGenerator on the script

     if (JS_UPTRDIFF(pc, script->code) >= (jsuword)script->length) {
         pc = fp->pc;
         if (!pc)
             goto do_fallback;
/* code pointer is here at time T */
     op = (JSOp) *pc;
/* T+1 */
     if (op == JSOP_TRAP)
/* T+2 */
         op = JS_GetTrapOpcode(cx, script, pc);
Thread debugger calls JS_ClearTrap on the script at time T+1.

JS_ClearTrap finishes before T+2. Actually it hardly matters as long as JS_ClearTrap is called before JS_GetTrapOpcode because there's a lock and ClearTrap will hold it preventing GetTrapOpcode from doing anything until ClearTrap finishes.

Based on the bug i'm fixing. Does that mean that instead of

 JS_GetTrapOpcode(JSContext *cx, JSScript *script, jsbytecode *pc)
     JSTrap *trap;
     DBG_LOCK_EVAL(cx->runtime, trap = FindTrap(cx->runtime, script, pc));
     if (!trap) {
          * If we lost a race with another thread, return JSOP_LIMIT so our
          * caller can detect this case and do something sane.
         return JSOP_LIMIT;

we should return *pc ? There are only 3 instances (soon 4) consuming this function, at the moment all in one file. We could fix that file to properly handle this function.

Otherwise, I'm pretty sure that the crash I'm fixing will haunt me again.

I'm pretty sure that most of the consumers are broken atm.
 172 js_Disassemble1(JSContext *cx, JSScript *script, jsbytecode *pc, uintN loc,
 173                 JSBool lines, FILE *fp)
 201             op = JS_GetTrapOpcode(cx, script, pc);
 202             if (op == JSOP_LIMIT)
 203                 return 0;
I don't see it setting an error.

3790                 op = JS_GetTrapOpcode(cx, jp->script, pc);
actually looks correct. The other I am asserting is wrong as the problem statement.
Comment on attachment 249055 [details] [diff] [review]
fix, v2

We might want to separate the JSWatchPoint.nref->JSWatchPoint.flags patch into the bug that it's fixing. I think this looks OK, though I'm not sure I've properly vetted all of the MT issues.
Attachment #249055 - Flags: review?(mrbkap) → review+
Not blocking the upcoming releases on this one.  Still needs bake time on the Trunk.  Please renominate for future release if you feel this really needs to land on the branches and are worth the risk.
Flags: blocking1.8.1.2?
Flags: blocking1.8.1.2-
Flags: blocking1.8.0.10?
Flags: blocking1.8.0.10-
MikeM: can you test this bug's patch and testify to its goodness or badness?  Thanks a ton,

Yes I will. Sorry for being so slow.  Most of my resources on this didn't get back from vacation until yesterday.  I'll get back to you soon. Thanks!

What this change checked into truck or do I have to apply the patch?
Fix was committed yesterday by happy accident, when trying to commit the patch for bug 366468.

Closed: 13 years ago
Resolution: --- → FIXED
This patch is working fine so far.  Not 100% done testing yet.

However, I am noticing a contention problem on the JSRuntime->trapList once a trap has been set on a script that is being executed by multiple threads.

Can we change the debuggerLock in the runtime from a PRLock to a "ReadWriteLock" (Many readers but only 1 writer)??

That way findTrap() will be fast and won't slow other threads executions except when a trap is cleared or set (which isn't very often)

Do you already have a "ReadWriteLock" available in NSPR or JS that I can test this with?

Yes, NSPR has multiple-reader/single-writer locks, see:

We can use one unless wtc thinks it's not advisable in this case. Please file a new bug. Thanks,

Yes, NSPR has a reader-writer lock.

To our surprise, switching to a reader-writer lock doesn't
always get the performance improvement that we expect.  So
make sure you experiment and measure first.
Attached patch make debuggerLock a r/w lock (obsolete) — Splinter Review
As wtc said, testing needed.  MikeM, can you try this out and report performance results?  Not just whether it relieves the contention you saw, but how it compares for other cases.

Attachment #253141 - Flags: review?(mrbkap)
Attachment #253141 - Flags: review?(mrbkap) → review+
It looks like the PRRWLock NSPR is heavier and slower than the simple PRLock.
The implementation of PRRWLock has an internal PRLock as well as 2 PRCondVar's.
Each of these has its own internal lock.
I think on a platform that supports native Condition Variables in the kernel we might be ok, but not on Win32.  

Lets just leave it as a PRLock for now.
Mike M. 
Comment on attachment 253141 [details] [diff] [review]
make debuggerLock a r/w lock


Attachment #253141 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.