DOM Worker can destroy JSContext while Cycle Collector is running

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
3 months ago

People

(Reporter: igor, Assigned: bent.mozilla)

Tracking

({regression})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: [softblocker])

Attachments

(1 attachment)

Reporter

Description

9 years ago
With the Cycle Collector decoupled from JS GC it is possible that a DOM worker thread shutdowns during the cycle collection leading to a destruction of its context. Yet the CC does not account for that leading to the CC accessing a freed memory.

If DOM worker JSContext::global cannot participate in a cycle, then an easy fix would be not to enumerate over non-main JSContext instances. But if we cannot rule out that then I presume we can turn js_DestroyContext into something that just moves the context into to-be-destroyed later list with the actual destruction done during the next GC.

Ben, could you comment on that?

Comment 1

9 years ago
Great catch. Thanks for looking into this.
A worker's global object is set and unset from the JSContext every time a worker runs. While it's set a cycle may exist between the main thread's Worker object and the worker's global object (main thread Worker object holds onto inner WorkerGlobalScope object). Does that answer your question?
Reporter

Comment 3

9 years ago
(In reply to comment #2)
> A worker's global object is set and unset from the JSContext every time a
> worker runs. While it's set a cycle may exist between the main thread's Worker
> object and the worker's global object (main thread Worker object holds onto
> inner WorkerGlobalScope object). Does that answer your question?

Yes - this means that CC must properly deal with concurrently running workers and be prepared to JSContext destruction when the latter is destroyed on thread poll pressure. it also must properly deal with concurrent refcount updates. 

One way to deal with is to suspend the workers. Before CC and GC separation the GC request machinery took care about that. I suppose we can reuse that again.

Comment 4

9 years ago
I don't remember the code suspecting workers. Can you elaborate?
Reporter

Comment 5

9 years ago
(In reply to comment #4)
> I don't remember the code suspecting workers. Can you elaborate?

That is a part of request serialization that the GC is doing. That ensured that the GC would wait until workers leaves or suspend their request. See also SetProtoCheckingForCycles and TraceRuntime, of http://hg.mozilla.org/tracemonkey/file/fb8de56a6925/js/src/jsgc.cpp#l2601 , http://hg.mozilla.org/tracemonkey/file/fb8de56a6925/js/src/jsgc.cpp#l2674 , that also rely on request synchronization.
Reporter

Comment 6

9 years ago
I have just irced with Ben. Workers do not participate in CC so as a simple workaround we should just skip workers contexts. But he also pointed out that we cannot just skip any non-main context in CC as some extension may use  nsiThreadManager.
(In reply to comment #6)
> I have just irced with Ben. Workers do not participate in CC so as a simple
> workaround we should just skip workers contexts. But he also pointed out that
> we cannot just skip any non-main context in CC as some extension may use 
> nsiThreadManager.

Yeah, sorry, I misinterpreted your question. Those cycles I pointed out are real, they're just not traversed by the CC.

Updated

9 years ago
blocking2.0: ? → betaN+
Is this bug still the current strategy for attacking bug 600622?
Moving up since this is supposed to be the strategy for bug 600622.
blocking2.0: betaN+ → beta9+
Bug 600622 is fixed, so I guess it's back to betaN.

Can anyone comment on how serious this issue is? Is it likely to cause a lot of crashes, or is it more theoretical?

Updated

9 years ago
Assignee: general → bent.mozilla

Updated

9 years ago
Assignee: bent.mozilla → nobody
Component: JavaScript Engine → DOM
QA Contact: general → general
Assignee: nobody → bent.mozilla
Keywords: regression

Comment 12

9 years ago
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if  you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
blocking2.0: beta9+ → betaN+

Updated

9 years ago
No longer blocks: 600622

Updated

9 years ago
Blocks: 600622
Whiteboard: softblocker
Whiteboard: softblocker → [softblocker]
Posted patch Patch, v1Splinter Review
This fixes us up by proxying context destruction to the main thread. It's a little ugly, but gal and I chatted and think that this is the best quick fix we can think of.
Attachment #502967 - Flags: review?(jst)
Attachment #502967 - Flags: review?(jst) → review+
Reporter

Comment 14

9 years ago
(In reply to comment #13)
> This fixes us up by proxying context destruction to the main thread.

Could it happen with the patch that the context will be destructed on shutdown after the main thread JS context is destroyed?
(In reply to comment #14)
> Could it happen with the patch that the context will be destructed on shutdown
> after the main thread JS context is destroyed?

I don't think so... The JSRuntime is destroyed in XPConnect's module destructor, and we shut down the worker thread pool and then call ProcessPendingEvents long before that. I believe we're safe.
http://hg.mozilla.org/mozilla-central/rev/19565050334a
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to comment #10)
> Can anyone comment on how serious this issue is? Is it likely to
> cause a lot of crashes, or is it more theoretical?

I reckon I saw it happen a couple of days back.  At least, if the
error below is it.  Sounds like it fits the bill though.

Thread 3:
Invalid read of size 4
   at 0x5F19125: JSContextParticipant::Traverse(void*, nsCycleCollectionTraversalCallback&) (nsXPConnect.cpp:804)
   by 0x64A02EE: GCGraphBuilder::Traverse(PtrInfo*) (nsCycleCollector.cpp:1525)
   by 0x64A036B: nsCycleCollector::MarkRoots(GCGraphBuilder&) (nsCycleCollector.cpp:1766)
   by 0x64A1F52: nsCycleCollector::BeginCollection(int, nsICycleCollectorListener*) (nsCycleCollector.cpp:2644)
   by 0x64A2B1E: nsCycleCollectorRunner::Run() (nsCycleCollector.cpp:3323)
   by 0x64921DD: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
   by 0x644F2F3: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
   by 0x6492C84: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
   by 0x7B4BE72: _pt_root (ptthread.c:187)
   by 0x4E349C9: start_thread (pthread_create.c:300)
   by 0xB3E270C: clone (clone.S:112)
 Address 0x660cabe8 is 376 bytes inside a block of size 576 free'd
   at 0x4C2748D: free (vg_replace_malloc.c:366)
   by 0x5F184B7: nsXPConnect::ReleaseJSContext(JSContext*, int) (nsXPConnect.cpp:2098)
   by 0x5C3E734: nsDOMThreadService::OnThreadShuttingDown() (nsDOMThreadService.cpp:1399)
   by 0x6494F82: nsThreadPool::Run() (nsThreadPool.cpp:226)
   by 0x64921DD: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
   by 0x644F2F3: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
   by 0x6492C84: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
   by 0x7B4BE72: _pt_root (ptthread.c:187)
   by 0x4E349C9: start_thread (pthread_create.c:300)
   by 0xB3E270C: clone (clone.S:112)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.