Closed Bug 631742 Opened 11 years ago Closed 8 years ago

jsdService::EnumerateScripts() does the wrong locking dance

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
blocking2.0 --- .x+

People

(Reporter: sfink, Assigned: sfink)

Details

(Whiteboard: [softblocker])

Attachments

(2 files, 3 obsolete files)

Initial symptom: assertion failure in jsd_IterateScripts for JSD_SCRIPTS_LOCKED(jsdc).

What's actually happening is that we are enumerating over the scripts with JSDContext c1, and within one of those callbacks, debugging is deactivated and that context is destroyed (along with all JSDScripts, including the ones being enumerated over.) That would cause problems, except that something else goes wrong first: back within the calling EnumerateScripts(), mCx has been updated to the new JSDContext, which is not in use yet and so in particular its scripts subsystem is unlocked. This is detected for the next script in the original list.

One problem here is that we are invoking enumeration callbacks with a lock held, and in particular these callbacks are expected to invoke JS code. (Maybe even JS code that deactivates debugging, as is the case here.)

That can be resolved by constructing an array of jsdscripts and iterating over them after releasing the lock. That leads to the next problem:

Many of the JSScripts that these JSDScripts are referring to are actually dead upon entry to EnumerateScripts(). Specifically, they will be found during the next GC run. This GC gets triggered during the enumeration by calls made inside jsdScript::FromPtr, and as a result mangles the jsdscript->script (and probably the jsdscript as well, I guess?) I thought I watched for callbacks on the script destruction callback that jsdscript uses to detect this case and didn't see any, but I'll have to try that again to be sure.

Finally, as I mentioned the jsdScript::FromPtr invokes JS code. It appears to correctly enter compartments as needed, but it does not begin a request.

I'll dig into this more a bit later.
blocking2.0: --- → ?
blocking2.0: ? → final+
Whiteboard: [softblocker]
Thinking about this more, I think it's a regression.

The main problem happens because debugging gets turned off while enumerating over the scripts. Before I committed bug 626830, turning off debugging waited until no code was running, same as turning it on did and still does. But bug 626830 is also needed since without it debugging can get turned half-on and wreak havoc, which itself is a regression from making things asynchronous in the first place, and *that* was needed because of Jägermonkey.

What a tangled web we weave, when first we practice to... um, make things go fast. Shakespeare should've spent more time in the computer lab so he could have left me more topical quotes.

Two possible paths for fixing this -- fix this problem for real by finding all the places where we hang onto state that might get ripped out from underneath us (and track down why the jsdscripts aren't getting gently invalidated), or just make it an error to deactivate debugging within a callback. The latter is tempting, but it's also fairly likely to break functionality that Firebug relies upon.

Or maybe a middle ground? Instead of destroying things when deactivating, just mark them as invalid and clean them up later. Then all of the unsafe places would need to check the valid bits.
jjb - This bug causes crashes and memory corruption while executing the Firebug test suite (especially the "firebug/" section). Does Firebug need to deactivate debugging from within a JSD callback?

It might be possible to work around by suspending instead of deactivating, then deactivating on a timer callback (as long as you aren't in a nested event loop.) I'd hate to impose yet another difficult-to-workaround restriction on the debugger API, though.
(In reply to comment #2)
> jjb - This bug causes crashes and memory corruption while executing the Firebug
> test suite (especially the "firebug/" section). Does Firebug need to deactivate
> debugging from within a JSD callback?

Do you mean: "does Firebug need to call jsd.off() when another jsd function is on the callstack"? I don't think so, but I'm not sure how to know that we don't do it. It's not like we have a "isJSDCallbackActive()" test.  I guess we could think about it, after we decide if that is what you are asking.

If you are asking just about enumerateScripts, that is a much easier issue to address.

> 
> It might be possible to work around by suspending instead of deactivating, then
> deactivating on a timer callback (as long as you aren't in a nested event
> loop.) 

A lot of the time that we use jsd are in a nested event loop.
(In reply to comment #3)
> (In reply to comment #2)
> > jjb - This bug causes crashes and memory corruption while executing the Firebug
> > test suite (especially the "firebug/" section). Does Firebug need to deactivate
> > debugging from within a JSD callback?
> 
> Do you mean: "does Firebug need to call jsd.off() when another jsd function is
> on the callstack"? I don't think so, but I'm not sure how to know that we don't
> do it. It's not like we have a "isJSDCallbackActive()" test.  I guess we could
> think about it, after we decide if that is what you are asking.
> 
> If you are asking just about enumerateScripts, that is a much easier issue to
> address.

I think it's just enumerateScripts, enumerateFilters, and enumerateContexts.

Unfortunately, I think I've lumped a couple of problems together here, though, and I haven't managed to tease them apart yet. Inserting a forced GC() call before script enumeration helps some, but not for a reason as simple as I posited above (that the relevant jsdScripts were already dead.)

> > It might be possible to work around by suspending instead of deactivating, then
> > deactivating on a timer callback (as long as you aren't in a nested event
> > loop.) 
> 
> A lot of the time that we use jsd are in a nested event loop.

Does that change your answer above? As in, do you need to be able to spin a nested event loop within a script enumeration callback? (Or rather, do you need to be able to deactivate debugging from within that nested event loop?)
Attached patch WIP: JSD enumeration handling (obsolete) — Splinter Review
My current work in progress for dealing with the enumeration issues. This is not complete, and some of the stuff in here just isn't going to work. I think I need to figure out how to lock out most changes or something.
(In reply to comment #4)
> 
> > > It might be possible to work around by suspending instead of deactivating, then
> > > deactivating on a timer callback (as long as you aren't in a nested event
> > > loop.) 
> > 
> > A lot of the time that we use jsd are in a nested event loop.
> 
> Does that change your answer above? As in, do you need to be able to spin a
> nested event loop within a script enumeration callback? 

We never need to spin a nested event loop within any enumeration; we don't need to exit a nested event loop either.

(Or rather, do you need
> to be able to deactivate debugging from within that nested event loop?)

The scenario that I don't like goes:
   hit jsd breakpoint,
   spin event loop
   enumerateX
   bizarre exception,  
   resume javascript (exit nested event loop, deactivate debugging)
   crash
So it would better if that last thing did not happen. I guess we can audit our code to ensure we trap exceptions in enumerations.
(In reply to comment #6)

> The scenario that I don't like goes:

Now I realize that scenario makes no sense: the exception will just throw us back into the nestedEventLoop.  So the only problem is to create a new loop or exit explicitly from this one while in enumerateX.

One thing puzzles me: I don't believe we do any event loop work now in enumerateX; I doubt venkman does. So is there anything that really needs to be fixed here?
(In reply to comment #7)
> (In reply to comment #6)
> 
> > The scenario that I don't like goes:
> 
> Now I realize that scenario makes no sense: the exception will just throw us
> back into the nestedEventLoop.  So the only problem is to create a new loop or
> exit explicitly from this one while in enumerateX.
> 
> One thing puzzles me: I don't believe we do any event loop work now in
> enumerateX; I doubt venkman does. So is there anything that really needs to be
> fixed here?

At least one of the tests in the FBTest suite calls jsd.off() inside a script enumerate callback. I can't remember now whether a nested event loop was involved, but I doubt it. I only brought that up because if it was done, it would make it more likely that you'd want to turn off debugging inside an enumeration.
Any idea which test? I'd like to understand why that happens.
Attached patch JSD enumeration handling (obsolete) — Splinter Review
I think this patch should fix all the enumeration problems not related to threading. I don't understand the JSD threading story, so I'm not sure whether I need to add more locking or not.

I'm not even sure what JSD_LockScriptSubsystem() is supposed to protect. I took it away from the script enumeration callbacks, because it seems bad to hold a lock across arbitrarily expensive operations.

This is all mostly untested. I ran the Firebug test suite through it, and it's happy, but I'm not sure it's currently hitting any of this code. I guess I'll at least try to figure out a way of making the original problem fail earlier and more reliably.
Attachment #515182 - Flags: review?(timeless)
Comment on attachment 515182 [details] [diff] [review]
JSD enumeration handling

Cancelling the review request for now. As I write tests, I realize this patch isn't necessary for some of the things I thought were problematic. Some form of it will still be necessary, but I don't think the current patch is either necessary or sufficient.
Attachment #515182 - Flags: review?(timeless)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Attachment #515804 - Flags: review?(timeless)
Attached patch Fix JSD enumeration handling (obsolete) — Splinter Review
Somewhat strange patch: there are 3 different enumerators in JSD, and I handle each one differently.

For JSContexts, I left it alone. A doubly-linked list is pretty nice for enumeration, and it seems hard to mangle the set of contexts anyway. I'm not sure what keeps other threads from nuking contexts while you're iterating, so if there's something I need to deal with here for that, let me know.

For JSDFilters, I hang onto a filter while calling its enumeration callback to prevent it from being deleted (note that enumerations can nest). If the filter is deleted, I break out of the iteration because there's no way to get to another filter without constructing another data structure. (eg an already-visited table, or a separate list of things to iterate over, or a freed list, or whatever.) That does mean that you can't loop over filters and remove each one, which is a little annoying.

I have another version that creates an upfront list of filters to iterate over, locking all of them from deletion, and then the iteration can proceed over all survivors (and skip newborns). I can dig it up if that is preferable.

For JSDScripts, it's the same basic problem as with JSDFilters, but JSDScript is an opaque data type in jsd_xpc.cpp so if I added similar deleted/locked fields I would have to add accessors to the JSD API, which is a pain and is user-visible. So instead I make iteration n^2: for every script in the original list, check whether it's still in the list before invoking the callback. This is still not 100% correct, because theoretically the current script could be deleted and a new one could get allocated at the same address and be added to the list. That bothers me, but the result would just be that the new script would be included in the enumeration that started before it existed.

Both JSDFilters and JSDScripts are now also protected against debugging being deactivated while enumeration is happening.

Iterating while invoking callbacks that can mutate what you're iterating over is messy, and this patch plugs some of the holes. There are many alternative solutions. Some are more invasive. Some modify the API more. This patch isn't great, but it tries to strike a balance.
Attachment #510771 - Attachment is obsolete: true
Attachment #515182 - Attachment is obsolete: true
Attachment #515813 - Flags: review?(timeless)
(In reply to comment #13)
> Created attachment 515813 [details] [diff] [review]
> Fix JSD enumeration handling

Just some FYI

> For JSContexts, I left it alone. 

Firebug does not enumerate these, we don't know what they are.

> 
> For JSDFilters, ... That does mean that you can't loop over filters and remove
> each one, which is a little annoying.

As long as we know, it's not a big deal. Because the filters are global and contain variables they have to be dealt with in compact code anyway (vs various places calling the api).

> For JSDScripts, it's the same basic problem as with JSDFilters,

But from JS perspective they are completely different: we can't explicitly create or delete them. FWIW debuggers are not interested the scripts created during enumeration in a single event thread world, we'll skip them if we can.
Comment on attachment 515813 [details] [diff] [review]
Fix JSD enumeration handling

This bug is still marked blocking2.0=final+, and I haven't heard from timeless in a little while, so I'll try Brendan.
Attachment #515813 - Flags: review?(timeless) → review?(brendan)
** PRODUCT DRIVERS PLEASE NOTE **

This bug is one of 7 automatically changed from blocking2.0:final+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons:

 - it was marked as a soft blocking issue without a requirement for beta coverage
blocking2.0: final+ → .x+
Comment on attachment 515813 [details] [diff] [review]
Fix JSD enumeration handling

your handling of rev->filterObject here makes no sense:
+    if (rec->lockedByEnumeration > 0) {
+        NS_IF_RELEASE (rec->filterObject);
+        rec->filterObject = NULL;
+        rec->deleted = JS_TRUE;
+    } else {
+        if (rec->filterObject)
+            NS_IF_RELEASE (rec->filterObject);
+        PR_Free (rec);
+    }
 }


1. NS_IF_RELEASE() does a null check, so the null check in the else is superfluous.
2. you're doing the release in both branches, why not do it before the conditional?
 
 jsdService::EnumerateFilters (jsdIFilterEnumerator *enumerator) 
it seems like !enumerator would be an error...
+    if (!gFilters || !enumerator)
         return NS_OK;
if you're worried about threading then using ++ isn't really proper:
+        ++current->lockedByEnumeration;

has something removed this from the list:
+            /* No way to continue the iteration */
+            jsds_FreeFilter (current);

         current = reinterpret_cast<FilterRecord *>
                                   (PR_NEXT_LINK (&current->links));
Attachment #515813 - Flags: review?(brendan) → review-
(In reply to comment #17)
> Comment on attachment 515813 [details] [diff] [review]
> Fix JSD enumeration handling
> 
> your handling of rev->filterObject here makes no sense:
> +    if (rec->lockedByEnumeration > 0) {
> +        NS_IF_RELEASE (rec->filterObject);
> +        rec->filterObject = NULL;
> +        rec->deleted = JS_TRUE;
> +    } else {
> +        if (rec->filterObject)
> +            NS_IF_RELEASE (rec->filterObject);
> +        PR_Free (rec);
> +    }
>  }
> 
> 
> 1. NS_IF_RELEASE() does a null check, so the null check in the else is
> superfluous.

Ok

> 2. you're doing the release in both branches, why not do it before the
> conditional?

Good point.

>  jsdService::EnumerateFilters (jsdIFilterEnumerator *enumerator) 
> it seems like !enumerator would be an error...
> +    if (!gFilters || !enumerator)
>          return NS_OK;

I was just following the preexisting code, which checks

  if (enumerator) { ... }

before invoking it. Though it was doing it within the loop... oh! My change is wrong. RefreshFilters() calls EnumerateFilters(nsnull) to sync all the filters. 

> if you're worried about threading then using ++ isn't really proper:
> +        ++current->lockedByEnumeration;

This isn't for threading. This is for enumeration callbacks modifying what they're enumerating (either directly, or within a nested enumeration.)

Hmm... can you think of a better name? "locked" does imply a thread lock, which is not what's going on here. "held"?

> has something removed this from the list:
> +            /* No way to continue the iteration */
> +            jsds_FreeFilter (current);

Yes. This is conditional on the 'deleted' flag being set, which means jsds_FreeFilter has already been called on this filter (but it was "locked" by lockedByEnumeration > 0). That is only called after removing a filter from a list, assuming it was in a list in the first place (which this one was).

The 'deleted' flag means that a filter is dead but its memory cannot be released yet because something may still need to access it. Specifically, the 'deleted' and 'lockedByEnumeration' fields still contain valid information; nothing else makes sense to look at.
I believe this addresses all review comments so far.
Attachment #515813 - Attachment is obsolete: true
Attachment #517486 - Flags: review?(timeless)
Comment on attachment 517486 [details] [diff] [review]
Fix JSD enumeration handling

so...

i think you want 'refcount'

please note that jsdService is threadsafe. so i think you probably do want to use an atomic addref. the primary consumer that you're dealing with is firebug, but there could be others, and i'd like jsd to be friendly/proper.

in terms of unhappy patterns:

325         NS_IF_RELEASE(rec->filterObject);
326         NS_ADDREF(filter);
327         rec->filterObject = filter;

filterObject should be converted to an nsCOMPtr<>, the code long predates templates.

on the topic of reentrancy, jsdService can't be allowed to call out to xpcom while it holds one of its locks.

     while((script = JSD_IterateScripts(mCx, &iter))) {
+        scripts.AppendElement(script);
+    }
+
+    for (size_t i = 0; i < scripts.Length(); ++i) {
+        if(!JSD_IsActiveScript(mCx, scripts[i]))
+            continue;

Can't we move the if into the while loop?

If we do that, can we move the unlock subsystem to right after the while loop?

for jsdService::EnumerateFilters, i think the thing to do is to provide a reflection which is somehow not actually in the linked list (sorry, it's late, this is the best i can offer)
Attachment #517486 - Flags: review?(timeless) → review-
Comment on attachment 515804 [details] [diff] [review]
JSD enumeration tests

seems ok
Attachment #515804 - Flags: review?(timeless) → review+
Component: JavaScript Debugging/Profiling APIs → JavaScript Engine
JSD is dead.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.