Closed Bug 477021 Opened 15 years ago Closed 15 years ago

JSRuntime.contextList mutations must not happen when GC is running

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Keywords: fixed1.9.0.11, fixed1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 4 obsolete files)

As it was discovered in the bug 476934 the xpc code always calls JS_DestroyContext APIs outside JS request violating the JS API requirement. One way to fix this is to add a call to JS_BeginRequest to ReleaseJSContext implementation. But this would effectively mean taking an extra lock when destroying the context.

Thus I suggest to add a code to js_DestroyContext to wait for the GC to finish before unlinking the context from the runtime list. It will explicitly allow to call JS_DestroyContext outside requests. It will also allow for a C++ embedding to always use JSAutoRequest and similar classes removing a possibility of missing JS_EndRequest calls.
There is similar threading bug in js_NewContext. The function updates JSRuntime.contextList to add the new context there when the GC may run on a separated thread accessing the field when iterating over contexts.

I nominate this for 1.9.1 as DOM workers may expose the bug. I also make this bug to depend on the bug 476934 as for the fix I plan to use js_WaitIfGC from that bug.
Flags: blocking1.9.1?
Summary: js_DestroyContext should wait for GC to finish → JSRuntime.contextList must not happen when GC is running
About embeddings forgetting to call JS_DestroyContext from a request: it is not only xpconnet but also shell/js.cpp who does exactly that in all 3 cases when it calls JS_DestroyContext* APIs. At this point I would even suggest to require that JS_DestroyContext must be called outside the request.
Summary: JSRuntime.contextList must not happen when GC is running → JSRuntime.contextList mutations must not happen when GC is running
Attached patch v1 (obsolete) — Splinter Review
The patch fixes js_NewContext issue and allows to call js_DestroyContext outside a request. The patch depends on the patch from bug 476934 comment 12 so I will ask for its review after addressing the bug 476934.
Depends on: 476934
Attached patch v2Splinter Review
The new version of the patch is against TM tip.

It makes sure that js_(New|Destroy)Context wait for the GC if necessary. I also changed the initialization order for the context to make sure that it is fully initialized before it is put on the runtime list.
Attachment #360901 - Attachment is obsolete: true
Attachment #361029 - Flags: review?(brendan)
Comment on attachment 361029 [details] [diff] [review]
v2

Looks good -- best to get MikeM, Wes, and anyone else who can help to test hard on their MT server-side embeddings.

/be
Attachment #361029 - Flags: review?(brendan) → review+
Context list mutations are almost certainly causing this bug too: Bug 466182
Check the patch in. I'll test monday.  Is the trunk otherwise stable for MT guys?
(In reply to comment #6)
> Context list mutations are almost certainly causing this bug too: Bug 466182
> Check the patch in. I'll test monday.  Is the trunk otherwise stable for MT
> guys?

Trunk for us is http://hg.mozilla.org/tracemonkey -- we have full tinderbox mochi and unit test. As for MT testing, it's getting better all the time since Firefox 3.1 has web worker threads. Finally! But please help us test in any way you can. Thanks, MikeM -- and thanks for all your years of patience while we were idling in single-thread land.

/be
Hope this is the right bug to block bug 466182. If not, someone will fix the deps I'm sure.

/be
Blocks: 466182
landed to TM - http://hg.mozilla.org/tracemonkey/rev/7153a04e0968
Whiteboard: fixed-in-tracemonkey
It turned out that I could not wait until the full TM tinderbox cycle finishes. So if the patch causes troubles, just back it out.
Flags: blocking1.9.1? → blocking1.9.1+
http://hg.mozilla.org/mozilla-central/rev/7153a04e0968
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
Blocks: 478336
To land this in the 1.9.0 branch, I think we must first land bug 467441 there.  igor, does that sound right to you?
The interdiff is like:
>v1 added this extraneous line, which isn't relevant in the branch.
>-    JS_ASSERT(cx->resolveFlags == 0);
Attachment #366892 - Attachment is obsolete: true
(In reply to comment #13)
> To land this in the 1.9.0 branch, I think we must first land bug 467441 there. 
> igor, does that sound right to you?

Hm, does the trunk patch in this bug depends on that and not, for example, on bug 477021? In any case, if that makes backport life easier, I am very fine with that.
Did you mean some other bug?  Bug 477021 is this bug!
(In reply to comment #17)
> Did you mean some other bug?  Bug 477021 is this bug!

Yes, I meant some other bug - one from the list that you discovered for 1.9.0 backporting.
Comment on attachment 366905 [details] [diff] [review]
backport to 1.9.0 (for SpiderMonkey 1.8 source release) v2

Patch 3/4 to fix a crashing bug for embedders that blocks the SpiderMonkey 1.8 source release.  See bug 478336 comment 21 for the full list.
Attachment #366905 - Flags: approval1.9.0.10?
Assignee: general → jorendorff
Comment on attachment 366905 [details] [diff] [review]
backport to 1.9.0 (for SpiderMonkey 1.8 source release) v2

Approved for 1.9.0.10, a=dveditz for release-drivers
Attachment #366905 - Flags: approval1.9.0.10? → approval1.9.0.10+
landed to 1.9.0:

Checking in jscntxt.c;
/cvsroot/mozilla/js/src/jscntxt.c,v  <--  jscntxt.c
new revision: 3.139; previous revision: 3.138
done
Keywords: fixed1.9.0.10
Backed out from 1.9.0 - the landed patch depends on the patch from bug 476934 that has caused compilation error on Windows.
Keywords: fixed1.9.0.10
A "final" glance before landing the patch has revealed that it used the JS_STATIC_ASSERT macro after some code. This breaks older MSVC versions that are still supported on 1.9.0 because  JS_STATIC_ASSERT is implemented via an extern declaration that must come before any code in older versions of C.

The new patch just drops that static assert and removes `< garbage text in the comment.
Attachment #366905 - Attachment is obsolete: true
Attachment #373852 - Flags: review+
Attachment #373852 - Flags: approval1.9.0.10?
Assignee: jorendorff → igor
Here is the diff between v2 and v3 for 1.9.0:

--- jscntxt.c.base      2009-04-21 17:36:40.000000000 +0200                                                                                                        
+++ jscntxt.c   2009-04-21 17:37:32.000000000 +0200                                                                                                                
@@ -228,7 +228,6 @@
 #ifdef JS_THREADSAFE                                                                                                                                              
     js_InitContextThread(cx, thread);                                                                                                                             
 #endif                                                                                                                                                            
-    JS_STATIC_ASSERT(JSVERSION_DEFAULT == 0);                                                                                                                     
     JS_ASSERT(cx->version == JSVERSION_DEFAULT);                                                                                                                  
     JS_INIT_ARENA_POOL(&cx->stackPool, "stack", stackChunkSize, sizeof(jsval),                                                                                    
                        &cx->scriptStackQuota);                                                                                                                    
@@ -352,7 +351,7 @@
                                                                                                                                                                   
     if (last) {                                                                                                                                                   
 #ifdef JS_THREADSAFE                                                                                                                                              
-        /*`<                                                                                                                                                      
+        /*                                                                                                                                                        
          * If cx is not in a request already, begin one now so that we wait                                                                                       
          * for any racing GC started on a not-last context to finish, before                                                                                      
          * we plow ahead and unpin atoms.  Note that even though we begin a
Comment on attachment 373852 [details] [diff] [review]
backport to 1.9.0 (for SpiderMonkey 1.8 source release) v3

Yeah, this is fine.

Approved for 1.9.0.10. a=ss
Attachment #373852 - Flags: approval1.9.0.10? → approval1.9.0.10+
landed to 1.9.0:

Checking in jscntxt.c;
/cvsroot/mozilla/js/src/jscntxt.c,v  <--  jscntxt.c
new revision: 3.143; previous revision: 3.142
done
Keywords: fixed1.9.0.10
Dear Developers,
    I have got some updates with the latest js code in CVS, and I make sure these code has incorporated the final patch for the Bug 477021. However, now I can use only two threads and three semaphores (the method is similar to my patches published in bug #133773) to generate a order violation (js_FinishRuntimeScriptState can happen first to destroy the scriptFilenameTable hash table, and then js_MarkScriptFilenames and js_SweepScriptFilenames are called to try to access this hash 
table).

    Although the NULL check at the beginning of the js_MarkScriptFilenames and 
js_SweepScriptFilenames functions can hide this segmentation fault, but I am afraid the NULL check is not a perfect solution to this order violation. Also, it seems to me that the js_WaitForGC() before removing context from the contextlist does not help.

    Below is my output. I will submit my patch to the bug page.

    If you have any concerns, please let me know.

----------------------------------------------------------------------
heming@rcs:~/bugs/firefox-133773$ sh bug
Tid -138024048 END js_WaitForGC
Tid -146416752 END js_WaitForGC
DestroyContext tid -138024048 in critical region, js_ContextNum 2
Tid -138024048 sets -138024048 to sem 1 tid1
Tid -138024048 sets -138024048 to sem 2 tid2
Tid -138024048 js_WaitForGC in js_DestroyContext
Tid -138024048 END js_WaitForGC
enter js_GC tid -138024048, rt state 2, gcLevel 0
Tid -138024048 post on sem 1
Tid -138024048 wait on sem 2
DestroyContext tid -146416752 in critical region, js_ContextNum 1
Tid -146416752 sets -146416752 to sem 1 tid2
Tid -146416752 sets -146416752 to sem 2 tid1
Tid -146416752 wait on sem 1
Tid -146416752 wakes up on sem 1
Tid -146416752 js_WaitForGC in js_DestroyContext
Tid -146416752 END js_WaitForGC
enter js_GC tid -146416752, rt state 3, gcLevel 0
Tid -146416752 js_MarkScriptFilenames
Tid -146416752 destroy js_FinishRuntimeScriptState
Tid -146416752 post on sem 2
Tid -138024048 wakes up on sem 2
Tid -138024048 js_MarkScriptFilenames
Tid -138024048 js_MarkScriptFilenames NUUUUUUUL, will CRASH without the 
scriptFilenameTable check! 0
Tid -138024048 js_SweepScriptFilenames NUUUUUUUL, will CRASH without the 
scriptFilenameTable check!
Attachment #399119 - Flags: review+
Attachment #399119 - Attachment is patch: true
Attachment #399119 - Attachment mime type: application/octet-stream → text/plain
Here is the seg fault back trace if I remove the "if (!rt->scriptFilenameTable)" check in the js_SweepScriptFilenames() function (this check is added to fix another bug 131246, not for this bug or bug 133773).
The order violation in a legal schedule is: one thread is allowed to call js_FinishRuntimeScriptState() (in js_DestroyContext() function) to destroy the hash table, then another thread is allowd to call js_SweepScriptFilenames() (in js_GC() function) to access this table.


(gdb) bt
#0  0x080a5bbb in JS_HashTableEnumerateEntries (ht=0x0, f=0x80feaa0 <js_script_filename_sweeper>, arg=0x815e008) at jshash.c:360
#1  0x080feb2b in js_SweepScriptFilenames (rt=0x815e008) at jsscript.c:1283
#2  0x080a4e2a in js_GC (cx=0x8175708, gckind=GC_NORMAL) at jsgc.c:3426
#3  0x08071798 in js_DestroyContext (cx=0x8175708, mode=JSDCM_FORCE_GC) at jscntxt.c:453
#4  0x08058cb3 in JS_DestroyContext (cx=0x8175708) at jsapi.c:1035
#5  0x0804bf37 in RunJavascript ()
Attachment #399119 - Flags: review+ → review?(igor)
Comment on attachment 399119 [details] [diff] [review]
The js_WaitForGC() in js_DestroyContext does not help.

You can not review your own patch. A JavaScript peer must review your patch instead. I'll ask Igor to look at it but I'm not sure the problem you are trying to fix is clearly identified. It may be better to file a new bug rather than try to hijack this bug.
Yes, please file a new bug, attach a patch with r?igor@mir2.org set there, as I suggested elsewhere. Thanks,

/be
Comment on attachment 399119 [details] [diff] [review]
The js_WaitForGC() in js_DestroyContext does not help.

The patch went to bug 515403.
Attachment #399119 - Attachment is obsolete: true
Attachment #399119 - Flags: review?(igor)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: