Closed Bug 1075572 Opened 5 years ago Closed 5 years ago

Intermittent valgrind-test | Invalid read of size 4 at pthread_cond_broadcast / PR_Unlock / js::GCHelperState::work / js::HelperThread::handleGCHelperWorkload

Categories

(Core :: JavaScript: GC, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- fixed
firefox37 --- fixed
firefox38 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: cbook, Assigned: terrence)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Linux x86-64 mozilla-inbound valgrind

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2682357&repo=mozilla-inbound

TEST-UNEXPECTED-FAIL | valgrind-test | Invalid read of size 4 at pthread_cond_broadcast / PR_Unlock / js::GCHelperState::work / js::HelperThread::handleGCHelperWorkload
==24883== Invalid read of size 4
==24883== at 0x4E3BCBC: pthread_cond_broadcast@@GLIBC_2.3.2 (in /lib64/libpthread-2.12.so)
==24883== by 0x65C97D4: PR_Unlock (ptsynch.c:212)
==24883== by 0xA5FD234: js::GCHelperState::work() (GCRuntime.h:361)
==24883== by 0xA662C48: js::HelperThread::handleGCHelperWorkload() (HelperThreads.cpp:1200)
==24883== by 0xA676D62: js::HelperThread::threadLoop() (HelperThreads.cpp:1257)
==24883== by 0xA676E57: js::HelperThread::ThreadMain(void*) (HelperThreads.cpp:864)
==24883== by 0x65CEE9D: _pt_root (ptthread.c:212)
==24883== by 0x4E377F0: start_thread (in /lib64/libpthread-2.12.so)
==24883== by 0x5CDD92C: clone (in /lib64/libc-2.12.so)
==24883== Address 0x148e2f80 is 16 bytes inside a block of size 168 free'd
==24883== at 0x4C2806A: free (vg_replace_malloc.c:446)
==24883== by 0x65BC008: PR_Free (prmem.c:458)
==24883== by 0x65C951B: PR_DestroyLock (ptsynch.c:167)
==24883== by 0xA5B965D: js::gc::GCRuntime::finish() (jsgc.cpp:1395)
==24883== by 0xA6BBB6B: JSRuntime::~JSRuntime() (Runtime.cpp:421)
==24883== by 0xA57C8AF: JS_DestroyRuntime(JSRuntime*) (Utility.h:526)
==24883== by 0x8325331: mozilla::CycleCollectedJSRuntime::~CycleCollectedJSRuntime() (CycleCollectedJSRuntime.cpp:517)
==24883== by 0x93A21A4: (anonymous namespace)::WorkerThreadPrimaryRunnable::Run() (RuntimeService.cpp:864)
==24883== by 0x835C289: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:830)
==24883== by 0x837A236: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:265)
==24883== by 0x85AD9FD: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (MessagePump.cpp:339)
==24883== by 0x859D056: MessageLoop::RunInternal() (message_loop.cc:230)
==24883== by 0x859D062: MessageLoop::RunHandler() (message_loop.cc:223)
==24883== by 0x859D2EB: MessageLoop::Run() (message_loop.cc:197)
==24883== by 0x83601C0: nsThread::ThreadFunc(void*) (nsThread.cpp:350)
==24883== by 0x65CEE9D: _pt_root (ptthread.c:212)
==24883== by 0x4E377F0: start_thread (in /lib64/libpthread-2.12.so)
==24883== by 0x5CDD92C: clone (in /lib64/libc-2.12.so)
Possibly some kind of CC/GC shutdown race?
The stack involves the GC helper thread, it looks like.
Component: General → JavaScript: GC
Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to fire, somehow. Brian, I think you are most familiar with how the GC helper thread interact with HelperThreads: could you take a quick look?
Flags: needinfo?(bhackett1024)
(In reply to Terrence Cole [:terrence] from comment #4)
> Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to
> fire, somehow. Brian, I think you are most familiar with how the GC helper
> thread interact with HelperThreads: could you take a quick look?

Are you sure this is the only possibility?  GCRuntime::finish calls waitBackgroundSweepEnd, not waitBackgroundSweepOrAllocEnd, and what is keeping a helper thread from doing background allocation when the runtime is destroyed?  There isn't a GC helper thread anymore (now or when this bug was filed) and everything is done on helper threads using GCHelperState and GCParallelTask, but the problem still seems to exist.

The interaction here is pretty simple: off thread GC work should not be happening once we start destroying the GCRuntime.  With the addition of GCParallelTasks though I don't know the best way to make sure that no GC work is happening, so can you take a look at this?
Flags: needinfo?(bhackett1024) → needinfo?(terrence)
(In reply to Brian Hackett (:bhackett) from comment #5)
> (In reply to Terrence Cole [:terrence] from comment #4)
> > Somehow waitBackgroundSweepEnd must be failing to wait for the |done| CV to
> > fire, somehow. Brian, I think you are most familiar with how the GC helper
> > thread interact with HelperThreads: could you take a quick look?
> 
> Are you sure this is the only possibility?  GCRuntime::finish calls
> waitBackgroundSweepEnd, not waitBackgroundSweepOrAllocEnd, and what is
> keeping a helper thread from doing background allocation when the runtime is
> destroyed? There isn't a GC helper thread anymore (now or when this bug was
> filed) and everything is done on helper threads using GCHelperState and
> GCParallelTask, but the problem still seems to exist.

Yes. I should have said "... how the GCHelperState interacts with HelperThreads ...".

> The interaction here is pretty simple: off thread GC work should not be
> happening once we start destroying the GCRuntime.  With the addition of
> GCParallelTasks though I don't know the best way to make sure that no GC
> work is happening, so can you take a look at this?

Good point, we definitely need to join our outstanding tasks too! I'll file a separate bug. However, I don't think that could possibly explain the bug here as GCParallelTask doesn't touch GCHelperState::done. 

The problem from the backtrace in this bug seems to be that GCHelperState::work (run from a helper thread) is trying to use the |done| CV after GCRuntime::finish has deleted it; however, GCRuntime::finish waits on |done| before deleting it. I think that means that either the synchronization isn't synchronizing, or we're starting the background helper thread after ::finish. Moreover, as demonstrated by bug 1072931, we are shutting down without finalizing all things. In combination, I thought that makes a synchronization mistake our most likely culprit. I was hoping you'd be able to suggest some other likely options because I don't see the synchronization error either!
Flags: needinfo?(terrence)
See Also: → 1072931
I think this is a UAF of the GC lock, not the |done| cvar, at least that is what the stack trace for the free call corresponded to (e.g. PR_DestroyLock, not PR_DestroyCondVar).
(In reply to Brian Hackett (:bhackett) from comment #7)
> I think this is a UAF of the GC lock, not the |done| cvar, at least that is
> what the stack trace for the free call corresponded to (e.g. PR_DestroyLock,
> not PR_DestroyCondVar).

Sorry, I should have read the stacks more carefully.
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Attachment #8554769 - Flags: review?(bhackett1024)
Comment on attachment 8554769 [details] [diff] [review]
join_on_alloc_task-v0.diff

Review of attachment 8554769 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8554769 - Flags: review?(bhackett1024) → review+
https://hg.mozilla.org/mozilla-central/rev/9076fc40dad4
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Any reason not to get this on Aurora/Beta?
Flags: needinfo?(terrence)
None! I was going to ask for permission to uplift today, but probably would have forgotten. Thanks for the reminder!
Flags: needinfo?(terrence)
Comment on attachment 8554769 [details] [diff] [review]
join_on_alloc_task-v0.diff

Approval Request Comment
[Feature/regressing bug #]: Splitting bg alloc to a separate task.
[User impact if declined]: Occasional shutdown leaks and crashes (e.g. very low).
[Describe test coverage new/current, TreeHerder]: A day on m-i.
[Risks and why]: Very low: this is an obvious two line fix. The impact would also be very low except that it occasionally triggers test assertions. We mostly want to uplift for the sheriffs' sakes. 
[String/UUID change made/needed]: None.
Attachment #8554769 - Flags: approval-mozilla-beta?
Attachment #8554769 - Flags: approval-mozilla-aurora?
Attachment #8554769 - Flags: approval-mozilla-beta?
Attachment #8554769 - Flags: approval-mozilla-beta+
Attachment #8554769 - Flags: approval-mozilla-aurora?
Attachment #8554769 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.