Closed Bug 477999 Opened 15 years ago Closed 14 years ago

JS_SuspendRequest should suspend requests from all contexts

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #477627 +++

Currently SpiderMonkey allows to enter several requests on the same thread using different contexts. As a result of this API like JS_YieldRequest does not work with such nested requests and one has to be careful when using JS_SuspendRequest as the code must call the function for all running context on the current thread.

Thus the idea is to make JS_SuspendRequest to suspend requests for all contexts running on the current thread.
Changing the title to match the description.
Summary: Avoiding nested JS requests → JS_SuspendRequest should suspend requests from all contexts
Attached patch v1 (obsolete) — Splinter Review
Here is the untested patch. The main idea here is that JS_SuspendRequest does the same what ClaimTitle and js_GC do when they about to wait for the GC on another thread to complete. That is, JS_SuspendRequest does not change cx->requestDepth but rather just update JSRuntime.requestCount to remove from it all the contribution from the current thread.

Note that this is not compatible with an embedding that use several active contexts per thread and calls JS_SuspendRequest on each of them. On the other hand, it fully fixes JS_Yield problem. 

I will check how difficult it would be to preserve the current semantics but the simplest solution would be to introduce a new API.
Currently in the browser JS_SuspendRequest is used for 2 purposes:

1. To do what the name of API advertises: to suspend the request while potentially waiting for something.

2. To transfer the request counter from one context to another so JS_SuspendRequest will work when applied to the most nested context. With the patch it should be unnecessary. I will update the patch to reflect such usage.
Attached patch v2 (obsolete) — Splinter Review
Work in progress. The patch depends on the patch from bug 585686 comment 3 applied on top of TM revision f84b470314a8.
Attachment #361789 - Attachment is obsolete: true
Blocks: 585686
Blocks: 586252
Attached patch v3 (obsolete) — Splinter Review
The patch moves the requestDepth counter from JSContext to JSThread. This trivially allowed to make sure that JS_SuspendRequest always suspends the requests.

To stay compatible with the current code that may call JS_DestroyContext with request running and for cycle collector assumptions I added separated beginRequestCount so it is possible to know which context contributes to the requestDepth counter. These changes should fix the bug 586252 and bug 585059.

To patch also improves the detection of the stack boundary for conservative GC. Before the patch in the situation like:

JSAutoRequest run_request(cx);
{
    ....
    JSAutoSuspendRequest suspend_request(cx);
    {
        while (run_nested_application_loop_that_may_trigger_gc()) {
            if (event_needs_gc) {
                JSAutoRequest run_request_again(cx2);
                {
                    a_lot_of_js_stack_frames;
                    JSAutoSuspendRequest suspend_request_again(cx2);
                }
            }
       }
    }        
}

the suspend_request_again destructor that restarts the request would capture the native stack SP and that SP will continue to be used in subsequent event loop iterations. So if that nested event loop runs a GC, that GC scans the stack from the base until the SP including the space occupied by a_lot_of_js_stack_frames even if that space should be unused. That results in leaks.

With the patch in the above picture the destructor of run_request_again(cx2) would capture SP avoiding the space taken by a_lot_of_js_stack_frames.

This is still suboptimal as ideally after that destructor the SP should be that on the moment of suspend_request constructor call. But that is for another bug.

These conservative GC changes should also fix the bug 585686.
Attachment #464789 - Attachment is obsolete: true
Attachment #465224 - Flags: review?(jorendorff)
Attachment #465224 - Flags: review?(anygregor)
Comment on attachment 465224 [details] [diff] [review]
v3

>@@ -881,26 +879,24 @@ public:
>     }
>     NS_IMETHOD Unroot(void *n)
>     {
>         return NS_OK;
>     }
>     NS_IMETHODIMP Traverse(void *n, nsCycleCollectionTraversalCallback &cb)
>     {
>         JSContext *cx = static_cast<JSContext*>(n);
> 
>-        // Add cx->requestDepth to the refcount, if there are outstanding
>-        // requests the context needs to be kept alive and adding unknown
>+        // If cx is pushed it the context needs to be kept alive and adding unknown
>         // edges will ensure that any cycles this context is in won't be
>         // collected.

nit: This sentence is malformed.


>diff --git a/js/src/xpconnect/src/xpcthreadcontext.cpp b/js/src/xpconnect/src/xpcthreadcontext.cpp
>--- a/js/src/xpconnect/src/xpcthreadcontext.cpp
>+++ b/js/src/xpconnect/src/xpcthreadcontext.cpp
>@@ -90,37 +90,42 @@ XPCJSContextStack::Pop(JSContext * *_ret
>     NS_ASSERTION(!mStack.IsEmpty(), "ThreadJSContextStack underflow");
> 
>     PRUint32 idx = mStack.Length() - 1; // The thing we're popping
>     NS_ASSERTION(!mStack[idx].frame,
>                  "Shouldn't have a pending frame to restore on the context "
>                  "we're popping!");
> 
>     if(_retval)
>         *_retval = mStack[idx].cx;
>-
>+    
>     mStack.RemoveElementAt(idx);

whitespace nit.

I don't really know the XPC code.
Attachment #465224 - Flags: review?(anygregor) → review+
Attached patch v4 (obsolete) — Splinter Review
The new version syncs with the tip and addresses the nits above.

Note regarding xpcthreadcontext.cpp changes. They remove no longer necessary JS_SuspendRequest/JS_ResumeRequest call when pushing/poping cx on top of another cx. With manual code inspection I have found that XPCJSContextStack::Push(cx) with cx != null is never used just to suspend a request running on another cx. The intention is always to make cx an active one. Thus with requestDepth moved to JSThread the extra suspend/resume are no longer necessary.
Attachment #465224 - Attachment is obsolete: true
Attachment #466409 - Flags: review?(jorendorff)
Attachment #465224 - Flags: review?(jorendorff)
This should be 2.0 a blocker as it fixes few issues related to the interaction of the conservative GC with the request model.
blocking2.0: --- → ?
(In reply to comment #9)
> This should be 2.0 a blocker as it fixes few issues related to the interaction
> of the conservative GC with the request model.

Marked as betaN+. Should I move it up?
blocking2.0: ? → betaN+
Attachment #466409 - Flags: review?(jorendorff) → review?(mrbkap)
Comment on attachment 466409 [details] [diff] [review]
v4

Gregor, would you mind taking this from me? Thanks!
Attachment #466409 - Flags: review?(mrbkap) → review?(anygregor)
I already gave my r+ for the JS side but I don't know the XPC code.
Gregor, I can take the XPC code chunks just show me everything that you didn't review. I will be downstairs in a bit.
Is a try-server version of this fix somewhere? Mozilla/5.0 (Windows NT 6.1; rv:2.0b5pre) Gecko/20100824 Minefield/4.0b5pre hangs for me when Chromebug returns from the callback from jsd while processing the just JS code for "services-sync". Seems like a possible combo that could involve this bug.
(In reply to comment #15)
> Is a try-server version of this fix somewhere? 

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/ibukanov@mozilla.com-e35e5560b384 includes the fix IIRC.
Thank you! I do not see the hang with that build. I'll try the nightly as soon as this bug is fixed.
Blocks: 590746
Comment on attachment 466409 [details] [diff] [review]
v4

Blake, do you have to review the xpconnect part here?
Attachment #466409 - Flags: review?(anygregor) → review?(mrbkap)
I will steal the xpconnect bits review. blake is super busy.
Comment on attachment 466409 [details] [diff] [review]
v4


>+#endif /* JS_THREADSAFE */
>+
>+JS_PUBLIC_API(void)
>+JS_BeginRequest(JSContext *cx)
>+{
>+#ifdef JS_THREADSAFE
>+    cx->beginRequestCount++;
>+    StartRequest(cx);
> #endif
>+}

beginRequestCount is a terrible name. Maybe pendingRequests? 

>+    t->requestDepth = saveDepth;
>+    t->suspendCount--;
> #endif

Maybe requestCount like suspendCount instead of beginRequestCount?

> JS_PUBLIC_API(jsword)
> JS_SetContextThread(JSContext *cx)
> {
> #ifdef JS_THREADSAFE
>-    JS_ASSERT(cx->requestDepth == 0);

Why is this gone now?

>+    unsigned            beginRequestCount;

A quick comment what this is.

>+    MarkRangeConservatively(trc, ctd->registerSnapshot.words,
>+                            JS_ARRAY_END(ctd->registerSnapshot.words));

You can do this on a single line if you want.
>+    ConservativeGCThreadData *ctd = &JS_THREAD_DATA(cx)->conservativeGC;
>+   
>+#ifdef JS_THREADSAFE
>+    /* Record the stack top here only if we are called from a request. */
>+    JS_ASSERT(cx->thread->requestDepth >= ctd->ignoreBeginRequestCount);

Thats a pretty terrible name, too. Maybe something with "mark" or "threshold"?

>+    JS_ASSERT(cx->thread->requestDepth >= 1);
>+    JS_ASSERT(!cx->thread->data.conservativeGC.ignoreBeginRequestCount);
>+    if(cx->thread->requestDepth == 1)
>+        cx->thread->data.conservativeGC.ignoreBeginRequestCount = 1;
>+    JS_GC(cx);
>+    if(cx->thread->requestDepth == 1)
>+        cx->thread->data.conservativeGC.ignoreBeginRequestCount = 0;

This could use a comment explaining whats going on here.

Jorendorff might be interested in looking over this. Maybe feedback? him?
Attachment #466409 - Flags: review?(mrbkap) → review+
Attached patch v5 (obsolete) — Splinter Review
The new version addresses the nits above. Jason, do you have time for a quick look?
Attachment #466409 - Attachment is obsolete: true
Attachment #469985 - Flags: feedback?(jorendorff)
Attachment #469985 - Flags: feedback?(jorendorff)
http://hg.mozilla.org/tracemonkey/rev/eae8350841be
Whiteboard: fixed-in-tracemonkey
Backed out:

http://hg.mozilla.org/tracemonkey/rev/0d5d2ceb9436

Due to total trace-test failure on Mac with this stack:

#0  0x0000000100170c56 in JS_Assert (s=0x10022fa20 "!conservativeGC.hasStackToScan()", file=0x10022f21f "../jscntxt.cpp", ln=469) at ../jsutil.cpp:80
#1  0x000000010004485c in JSThreadData::finish (this=0x10053d5c0) at ../jscntxt.cpp:469
#2  0x00000001000448d7 in js_FinishThreads (rt=0x10053d000) at ../jscntxt.cpp:647
#3  0x000000010001cf1c in JSRuntime::~JSRuntime (this=0x10053d000) at ../jsapi.cpp:658
#4  0x000000010001d099 in JS_Finish (rt=0x10053d000) at ../jsapi.cpp:750
#5  0x000000010000bb37 in main (argc=7, argv=0x7fff5fbff4f8, envp=0x7fff5fbff538) at ../../shell/js.

Igor, please verify that I backed out properly on top of my big landing for bug 558451 (the giant patch I landed on top of your patch, but did *not* back out).

/be
Whiteboard: fixed-in-tracemonkey
Argh. One-line fix to hand-merge botch in back-out:

http://hg.mozilla.org/tracemonkey/rev/e6496cd735a6

/be
Attached patch v6Splinter Review
This fixes the issue from the comment 23 - the assert should only present in JS_THREADSAFE version.
Attachment #469985 - Attachment is obsolete: true
Attachment #470341 - Flags: review+
Attached patch proper v6Splinter Review
The prev attachment has a wrong version of the patch.
Attachment #470342 - Flags: review+
relanded - http://hg.mozilla.org/tracemonkey/rev/319b1a4e0883
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/319b1a4e0883
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: