Closed
Bug 1075572
Opened 9 years ago
Closed 8 years ago
Intermittent valgrind-test | Invalid read of size 4 at pthread_cond_broadcast / PR_Unlock / js::GCHelperState::work / js::HelperThread::handleGCHelperWorkload
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla38
Tracking | Status | |
---|---|---|
firefox36 | --- | fixed |
firefox37 | --- | fixed |
firefox38 | --- | fixed |
firefox-esr31 | --- | unaffected |
People
(Reporter: cbook, Assigned: terrence)
References
()
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
2.21 KB,
patch
|
bhackett1024
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Linux x86-64 mozilla-inbound valgrind https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2682357&repo=mozilla-inbound TEST-UNEXPECTED-FAIL | valgrind-test | Invalid read of size 4 at pthread_cond_broadcast / PR_Unlock / js::GCHelperState::work / js::HelperThread::handleGCHelperWorkload ==24883== Invalid read of size 4 ==24883== at 0x4E3BCBC: pthread_cond_broadcast@@GLIBC_2.3.2 (in /lib64/libpthread-2.12.so) ==24883== by 0x65C97D4: PR_Unlock (ptsynch.c:212) ==24883== by 0xA5FD234: js::GCHelperState::work() (GCRuntime.h:361) ==24883== by 0xA662C48: js::HelperThread::handleGCHelperWorkload() (HelperThreads.cpp:1200) ==24883== by 0xA676D62: js::HelperThread::threadLoop() (HelperThreads.cpp:1257) ==24883== by 0xA676E57: js::HelperThread::ThreadMain(void*) (HelperThreads.cpp:864) ==24883== by 0x65CEE9D: _pt_root (ptthread.c:212) ==24883== by 0x4E377F0: start_thread (in /lib64/libpthread-2.12.so) ==24883== by 0x5CDD92C: clone (in /lib64/libc-2.12.so) ==24883== Address 0x148e2f80 is 16 bytes inside a block of size 168 free'd ==24883== at 0x4C2806A: free (vg_replace_malloc.c:446) ==24883== by 0x65BC008: PR_Free (prmem.c:458) ==24883== by 0x65C951B: PR_DestroyLock (ptsynch.c:167) ==24883== by 0xA5B965D: js::gc::GCRuntime::finish() (jsgc.cpp:1395) ==24883== by 0xA6BBB6B: JSRuntime::~JSRuntime() (Runtime.cpp:421) ==24883== by 0xA57C8AF: JS_DestroyRuntime(JSRuntime*) (Utility.h:526) ==24883== by 0x8325331: mozilla::CycleCollectedJSRuntime::~CycleCollectedJSRuntime() (CycleCollectedJSRuntime.cpp:517) ==24883== by 0x93A21A4: (anonymous namespace)::WorkerThreadPrimaryRunnable::Run() (RuntimeService.cpp:864) ==24883== by 0x835C289: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:830) ==24883== by 0x837A236: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:265) ==24883== by 0x85AD9FD: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (MessagePump.cpp:339) ==24883== by 0x859D056: MessageLoop::RunInternal() (message_loop.cc:230) ==24883== by 0x859D062: MessageLoop::RunHandler() (message_loop.cc:223) ==24883== by 0x859D2EB: MessageLoop::Run() (message_loop.cc:197) ==24883== by 0x83601C0: nsThread::ThreadFunc(void*) (nsThread.cpp:350) ==24883== by 0x65CEE9D: _pt_root (ptthread.c:212) ==24883== by 0x4E377F0: start_thread (in /lib64/libpthread-2.12.so) ==24883== by 0x5CDD92C: clone (in /lib64/libc-2.12.so)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 2•9 years ago
|
||
Possibly some kind of CC/GC shutdown race?
Comment 3•9 years ago
|
||
The stack involves the GC helper thread, it looks like.
Component: General → JavaScript: GC
Assignee | ||
Comment 4•8 years ago
|
||
Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to fire, somehow. Brian, I think you are most familiar with how the GC helper thread interact with HelperThreads: could you take a quick look?
Flags: needinfo?(bhackett1024)
Comment 5•8 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #4) > Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to > fire, somehow. Brian, I think you are most familiar with how the GC helper > thread interact with HelperThreads: could you take a quick look? Are you sure this is the only possibility? GCRuntime::finish calls waitBackgroundSweepEnd, not waitBackgroundSweepOrAllocEnd, and what is keeping a helper thread from doing background allocation when the runtime is destroyed? There isn't a GC helper thread anymore (now or when this bug was filed) and everything is done on helper threads using GCHelperState and GCParallelTask, but the problem still seems to exist. The interaction here is pretty simple: off thread GC work should not be happening once we start destroying the GCRuntime. With the addition of GCParallelTasks though I don't know the best way to make sure that no GC work is happening, so can you take a look at this?
Flags: needinfo?(bhackett1024) → needinfo?(terrence)
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Brian Hackett (:bhackett) from comment #5) > (In reply to Terrence Cole [:terrence] from comment #4) > > Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to > > fire, somehow. Brian, I think you are most familiar with how the GC helper > > thread interact with HelperThreads: could you take a quick look? > > Are you sure this is the only possibility? GCRuntime::finish calls > waitBackgroundSweepEnd, not waitBackgroundSweepOrAllocEnd, and what is > keeping a helper thread from doing background allocation when the runtime is > destroyed? There isn't a GC helper thread anymore (now or when this bug was > filed) and everything is done on helper threads using GCHelperState and > GCParallelTask, but the problem still seems to exist. Yes. I should have said "... how the GCHelperState interacts with HelperThreads ...". > The interaction here is pretty simple: off thread GC work should not be > happening once we start destroying the GCRuntime. With the addition of > GCParallelTasks though I don't know the best way to make sure that no GC > work is happening, so can you take a look at this? Good point, we definitely need to join our outstanding tasks too! I'll file a separate bug. However, I don't think that could possibly explain the bug here as GCParallelTask doesn't touch GCHelperState::done. The problem from the backtrace in this bug seems to be that GCHelperState::work (run from a helper thread) is trying to use the |done| CV after GCRuntime::finish has deleted it; however, GCRuntime::finish waits on |done| before deleting it. I think that means that either the synchronization isn't synchronizing, or we're starting the background helper thread after ::finish. Moreover, as demonstrated by bug 1072931, we are shutting down without finalizing all things. In combination, I thought that makes a synchronization mistake our most likely culprit. I was hoping you'd be able to suggest some other likely options because I don't see the synchronization error either!
Flags: needinfo?(terrence)
See Also: → 1072931
Comment 7•8 years ago
|
||
I think this is a UAF of the GC lock, not the |done| cvar, at least that is what the stack trace for the free call corresponded to (e.g. PR_DestroyLock, not PR_DestroyCondVar).
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to Brian Hackett (:bhackett) from comment #7) > I think this is a UAF of the GC lock, not the |done| cvar, at least that is > what the stack trace for the free call corresponded to (e.g. PR_DestroyLock, > not PR_DestroyCondVar). Sorry, I should have read the stacks more carefully.
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Attachment #8554769 -
Flags: review?(bhackett1024)
Comment 9•8 years ago
|
||
Comment on attachment 8554769 [details] [diff] [review] join_on_alloc_task-v0.diff Review of attachment 8554769 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8554769 -
Flags: review?(bhackett1024) → review+
Assignee | ||
Comment 10•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9076fc40dad4
https://hg.mozilla.org/mozilla-central/rev/9076fc40dad4
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment 12•8 years ago
|
||
Any reason not to get this on Aurora/Beta?
status-firefox36:
--- → affected
status-firefox37:
--- → affected
status-firefox38:
--- → fixed
status-firefox-esr31:
--- → unaffected
Flags: needinfo?(terrence)
Assignee | ||
Comment 13•8 years ago
|
||
None! I was going to ask for permission to uplift today, but probably would have forgotten. Thanks for the reminder!
Flags: needinfo?(terrence)
Assignee | ||
Comment 14•8 years ago
|
||
Comment on attachment 8554769 [details] [diff] [review] join_on_alloc_task-v0.diff Approval Request Comment [Feature/regressing bug #]: Splitting bg alloc to a separate task. [User impact if declined]: Occasional shutdown leaks and crashes (e.g. very low). [Describe test coverage new/current, TreeHerder]: A day on m-i. [Risks and why]: Very low: this is an obvious two line fix. The impact would also be very low except that it occasionally triggers test assertions. We mostly want to uplift for the sheriffs' sakes. [String/UUID change made/needed]: None.
Attachment #8554769 -
Flags: approval-mozilla-beta?
Attachment #8554769 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
|
Attachment #8554769 -
Flags: approval-mozilla-beta?
Attachment #8554769 -
Flags: approval-mozilla-beta+
Attachment #8554769 -
Flags: approval-mozilla-aurora?
Attachment #8554769 -
Flags: approval-mozilla-aurora+
Comment 15•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/8145e28863d1 https://hg.mozilla.org/releases/mozilla-beta/rev/40c010f30f39
You need to log in
before you can comment on or make changes to this bug.
Description
•