Closed Bug 515403 Opened 10 years ago Closed 10 years ago

The bug order (js_FinishRuntimeScriptState --> js_{Mark,Sweep}ScriptFilenames) can still occur and cause crash with the fix in bug 133773 and 477021

Categories

(Core :: JavaScript Engine, defect, P2, critical)

All
Linux
defect

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta3-fixed

People

(Reporter: hc2428, Assigned: igor)

Details

(Whiteboard: [ccbr] fixed-in-tracemonkey)

Attachments

(4 files, 1 obsolete file)

User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Maxthon; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; InfoPath.2; MS-RTC LM 8; CIBA; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
Build Identifier: The bug order (js_FinishRuntimeScriptState --> js_SweepScriptFilenames) can still occur and cause crash

Dear developers:
    I am investigating the bug order (js_FinishRuntimeScriptState --> js_SweepScriptFilenames) in latest javascript code in hg repository. In this bug order, js_FinishRuntimeScriptState will destroy a hash table, then js_SweepScriptFilenames will access this table, causing a crash. I have used sempahores to successfully enforce a feasible schedule in latest source code to prove that this bug order can still happen.
    
    Although this bug can be hidden by another fix for another bug 131246 (the fix for this bug is adding NULL check at js_{Mark,Sweep}AtomState), this bug order is very dangerous. In the past 7 years (starting from the bug 133773 at 2002), developers have tried some ordering mechanisms like js_WaitForGC to avoid this bug order, but failed.

    The corresponding bug order in old code (js-1.5) is (js_FreeAtomState --> js_{Mark,Sweep}AtomState). Regarding the fixes in bug 133773, my schedule can cause a crash (please refer to the 133773 and 477021 bug pages for more details), because the NULL check at js_{Mark,Sweep}AtomState has not been deployed when bug 133773 is being fixed. This means that the fix in bug 133773 actually can not fix 133773, and a crash can still occur after the patches in 133773 are added.

    Since I am using semaphores, this bug can happen at each execution.

Reproducible: Always

Steps to Reproduce:
1. Add my patch to latest js source code in hg repository.
2. Run my simple testcase (just start two threads and only two threads).
3. Bug occur.
Actual Results:  
Below is my output, which shows js_FinishRuntimeScriptState is called in one thread, and then js_{Mark,Sweep}ScriptFilenames is called in another thread:

---------------------------------------------------------------------------
DestroyContext tid -1220404336 in critical region, js_ContextNum 2
Tid -1220404336 sets -1220404336 to sem 1 tid1
Tid -1220404336 sets -1220404336 to sem 2 tid2
Tid -1220404336 js_WaitForGC in js_DestroyContext
enter js_GC tid -1220404336, rt state 2, gcLevel 0
Tid -1220404336 post on sem 1
Tid -1220404336 wait on sem 2
DestroyContext tid -1229329520 in critical region, js_ContextNum 1
Tid -1229329520 sets -1229329520 to sem 1 tid2
Tid -1229329520 sets -1229329520 to sem 2 tid1
Tid -1229329520 wait on sem 1
Tid -1229329520 wakes up on sem 1
Tid -1229329520 js_WaitForGC in js_DestroyContext
enter js_GC tid -1229329520, rt state 3, gcLevel 0
Tid -1229329520 js_MarkScriptFilenames
Tid -1229329520 destroy js_FinishRuntimeScriptState
Tid -1229329520 post on sem 2
Tid -1220404336 wakes up on sem 2
Tid -1220404336 js_MarkScriptFilenames
Tid -1220404336 js_MarkScriptFilenames NUUUUUUUL, will CRASH without the scriptFilenameTable check! 0
Tid -1220404336 js_SweepScriptFilenames NUUUUUUUL, will CRASH without the scriptFilenameTable check!


A crash and its gdb backtrace if I remove the NULL check at js_{Mark,Sweep}ScriptFilenames functions (we can see that ht=0x0, because js_FinishRuntimeScriptState has already destroyed this table):

(gdb) bt
#0  JS_HashTableEnumerateEntries (ht=0x0, f=0x80ccb10 <js_script_filename_sweeper>, arg=0x8135818) at jshash.cpp:361
#1  0x080cd1e0 in js_SweepScriptFilenames (rt=0x8135818) at jsscript.cpp:1310
#2  0x08080a83 in js_GC (cx=0x813d310, gckind=GC_NORMAL) at jsgc.cpp:3744
#3  0x0805d8f5 in js_DestroyContext (cx=0x813d310, mode=JSDCM_FORCE_GC) at jscntxt.cpp:768
#4  0x0804e759 in JS_DestroyContext (cx=0x813d310) at jsapi.cpp:1074
#5  0x0804a4d7 in RunJavascript ()
My patch for adding semaphores, my testcase, and the pseudo code for the feasible schedule will come soon.
For the old code in js-1.5, I need 3 threads and 4 semphores to expose this bug order (please refer to the bug 133773 page for more details). However, in latest source code in hg, I just need 2 threads and 2 semaphores. It seems to me that the latest code are more vulnerable to this bug.
Assume only two threads are setup, so after thread T2 remove its context from list, the list becomes empty, so T2 will try to destroy the hash table. However, while T2 is destroying the hash table, the garbage collection of T1 has not finished yet.
The source code version in hg repository is up to 2009-9-8. Especially, jscntxt.cpp is Aug 20, and jsgc.cpp is Aug 24.



hg log jscntxt.cpp 
changeset:   31911:371c845dfd5d
user:        Jason Orendorff <jorendorff@mozilla.com>
date:        Thu Aug 20 14:13:21 2009 -0500
summary:     Bug 511418 - static-analysis error in jsobj.cpp:4257: cannot access JS_REQUIRES_STACK variable JSContext::fp and another trivial error in jstracer.cpp. r=gal.



hg log jsgc.cpp
changeset:   31939:7b7f9ae673cf
user:        David Anderson <danderson@mozilla.com>
date:        Mon Aug 24 17:09:44 2009 -0700
summary:     Removed JSStackFrame::callee (bug 512029, r=brendan).
Dear Developers,
    Is my description clear? If there is any another information I need to provide, please tell me.

Thanks,
Heming
Heming, thanks for all the work -- I have not had time to look yet, way too busy with other commitments. I'm hoping Igor has time.

/be
Assignee: general → igor
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [ccbr]
Attached patch fix v1 (obsolete) — Splinter Review
The fix just removes the call to FinishRuntimeScriptState and this bug race from js_DestroyContext. The same job is done when JS_DestroyContext calls js_FreeRuntimeScriptState.
Attachment #402557 - Flags: review?(brendan)
Attached patch fix v2Splinter Review
The new version moves clearance of rt->builfinFunctions into js_GC as doing that in js_DestroyContext has the same race as this bug and theoretically could lead to a shutdown leak. The patch also removes useless js_UnlockGCThing from js_FinishNumberState as NaN and +-oo are long explicitly marked.
Attachment #402557 - Attachment is obsolete: true
Attachment #402564 - Flags: review?(brendan)
Attachment #402557 - Flags: review?(brendan)
(In reply to comment #8)
> Created an attachment (id=402557) [details]
> fix v1
> 
> The fix just removes the call to FinishRuntimeScriptState and this bug race
> from js_DestroyContext. The same job is done when JS_DestroyContext calls
> js_FreeRuntimeScriptState.

JS_DestroyRuntime calls js_FreeRuntimeScriptState, rather -- which leaves the space tied up during zero-contexts-active-on-runtime intervals, which was what the over-complicated code patched by this bug's patch was trying to avoid.

Isn't there a race-free way to save that space, by analogy to the clearing of rt->builtinFunctions in the patch, by calling js_FreeRuntimeScriptState from js_GC?

/be
(In reply to comment #10)
> JS_DestroyRuntime calls js_FreeRuntimeScriptState, rather -- which leaves the
> space tied up during zero-contexts-active-on-runtime intervals, which was what
> the over-complicated code patched by this bug's patch was trying to avoid.

The entries from rt->scriptFilenameTable are removed when they are swept using  JS_HashTableEnumerateEntries. That function shrinks the hashtable when necessary leaving just MINBUCKETS or 16 entries in the hasharray when all entries are removed. Thus for an embedding that is frequently in the zero context state, the space that the patch waste is minuscule. Moreover, for such embedding the patch would avoid recreation of the table after compiling a new script.

> 
> Isn't there a race-free way to save that space, by analogy to the clearing of
> rt->builtinFunctions in the patch, by calling js_FreeRuntimeScriptState from
> js_GC?

This is what I did in the initial version of the patch until I realized that that code is not necessary due to the above reasoning.
Comment on attachment 402564 [details] [diff] [review]
fix v2

Ok, maybe a comment about the space held being tiny is appropriate. r=me, thanks.

/be
Attachment #402564 - Flags: review?(brendan) → review+
http://hg.mozilla.org/tracemonkey/rev/3fc33a44b5ef

I nominate this for branches as this bug race could lead to accessing freed memory in some embedding or with Firefox addons. Note that in the browser itself the race cannot happen as thread workers are shutdown strictly before the last context is destroyed.
Flags: wanted1.9.0.x?
Flags: blocking1.9.2?
Whiteboard: [ccbr] → [ccbr] fixed-in-tracemonkey
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
http://hg.mozilla.org/mozilla-central/rev/3fc33a44b5ef
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to comment #6)
> Dear Developers,
>     Is my description clear? If there is any another information I need to
> provide, please tell me.
> 
> Thanks,
> Heming

Heming, 

I am a PhD student from UCF and my research involves techniques for debugging. 
I would like to study this bug, but I am new to Mozilla. Can you please help me reproduce the bug. 

Here is what I have so far: 

- I checked out an old version of the code from 9-8-2009.

$ hg clone http://hg.mozilla.org/releases/comm-1.9.1/
$ python client.py --comm-rev=b7f6edb75fa --mozilla-rev=dd9b10ed7eeb checkout

- Then I applied your patch with the semaphores and finally, I compiled mozilla. 

- However, I am not able to compile your test program: test-js.cpp
Can u please tell me how to compile and run that, in order to trigger the bug. 

Thank you very much, 

Martin
Hi, 

I figured out how to reproduce the bug. 

Thanks, 
Martin
Flags: wanted1.9.0.x?
You need to log in before you can comment on or make changes to this bug.