Closed Bug 1095433 Opened 10 years ago Closed 9 years ago

Task Tracer: crash in the preallocated process after Task Tracer is turned on

Categories

(Core :: Gecko Profiler, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: cyu, Assigned: cyu)

References

Details

Attachments

(1 file, 5 obsolete files)

The preallocated process sometimes crashes after Task Tracer is turned on. I captured the crash stack in gdb as follows: Thread 2 (Thread 28755.28905): #0 pthread_mutex_lock_impl (mutex=0x5a5a5a5a) at bionic/libc/bionic/pthread.c:588 #1 0xb6edd606 in __wrap_pthread_mutex_lock (mtx=0x5a5a5a5a) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:1176 #2 0xb69ba6b4 in PR_Lock (lock=0x5a5a5a5a) at /mnt/SSD/data/hg/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:177 #3 0xb59b3c26 in BaseAutoLock (aLock=..., this=0xb393bab4) at ../../dist/include/mozilla/Mutex.h:164 #4 mozilla::tasktracer::TraceInfo::AppendLog (this=this@entry=0xb3a35d00) at /mnt/SSD/data/hg/mozilla-central/tools/profiler/GeckoTaskTracer.cpp:168 #5 0xb59b3d1c in mozilla::tasktracer::LogEnd (aTaskId=<optimized out>, aSourceEventId=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/tools/profiler/GeckoTaskTracer.cpp:308 #6 0xb4e65d38 in MessageLoop::RunTask (this=0xb393bba0, task=0xb3a3498c) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:361 #7 0xb4e681be in MessageLoop::DeferOrRunPendingTask (this=<optimized out>, pending_task=...) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:369 #8 0xb4e69d24 in DoWork (this=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:447 #9 MessageLoop::DoWork (this=0xb393bba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:426 #10 0xb4e65e04 in base::MessagePumpDefault::Run (this=0xb3a86ba0, delegate=0xb393bba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_pump_default.cc:34 #11 0xb4e65cc4 in MessageLoop::RunInternal (this=this@entry=0xb393bba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:233 #12 0xb4e65d78 in RunHandler (this=0xb393bba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:226 #13 MessageLoop::Run (this=this@entry=0xb393bba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:200 #14 0xb4e6b2c8 in base::Thread::ThreadMain (this=0xb3a85b80) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/thread.cc:170 #15 0xb4e60e48 in ThreadFunc (closure=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:39 #16 0xb6edcafe in _thread_create_startup (arg=0xb3a64400) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:631 #17 thread_create_startup (arg=0xb3a64400) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:662 #18 0xb6e5f22c in __thread_entry (func=0xb6edcac1 <thread_create_startup(void*)>, arg=0xb3a64400, tls=0xb3a64400) at bionic/libc/bionic/pthread_create.cpp:105 #19 0xb6e5f3c4 in pthread_create (thread_out=0xb3a85b88, attr=<optimized out>, start_routine=0xb6edcac1 <thread_create_startup(void*)>, arg=0x78) at bionic/libc/bionic/pthread_create.cpp:224 #20 0x00000000 in ?? ()
When the crash happens, the BufferMgrChild thread in Nuwa is frozen in: In Nuwa the BufferMgrChild thread is frozen when holding the lock: (gdb) bt #0 __futex_syscall3 () at bionic/libc/arch-arm/bionic/futex_arm.S:39 #1 0xb6eac7c8 in _normal_lock (shared=0, mutex=0xb6f583e4 <sThreadFreezeLock>) at bionic/libc/bionic/pthread.c:472 #2 pthread_mutex_lock_impl (mutex=0xb6f583e4 <sThreadFreezeLock>) at bionic/libc/bionic/pthread.c:594 #3 0xb6f2963a in __wrap_pthread_mutex_lock (mtx=0xb6a601a0) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:1177 #4 0xb69ba6b4 in PR_Lock (lock=0xb6a601a0) at /mnt/SSD/data/hg/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:177 #5 0xb59a99a6 in BaseAutoLock (aLock=..., this=0xb3977abc) at ../../dist/include/mozilla/Mutex.h:164 #6 mozilla::tasktracer::(anonymous namespace)::IsStartLogging (aInfo=aInfo@entry=0xb31c7240) at /mnt/SSD/data/hg/mozilla-central/tools/profiler/GeckoTaskTracer.cpp:153 #7 0xb59aa650 in mozilla::tasktracer::LogEnd (aTaskId=<optimized out>, aSourceEventId=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/tools/profiler/GeckoTaskTracer.cpp:314 #8 0xb4e537f0 in MessageLoop::RunTask (this=0xb3977ba0, task=0xb3ab2b6c) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:361 #9 0xb4e55c9e in MessageLoop::DeferOrRunPendingTask (this=<optimized out>, pending_task=...) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:369 #10 0xb4e57804 in DoWork (this=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:447 #11 MessageLoop::DoWork (this=0xb3977ba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:426 #12 0xb4e538bc in base::MessagePumpDefault::Run (this=0xb3a829c0, delegate=0xb3977ba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_pump_default.cc:34 #13 0xb4e5377c in MessageLoop::RunInternal (this=this@entry=0xb3977ba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:233 #14 0xb4e53830 in RunHandler (this=0xb3977ba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:226 #15 MessageLoop::Run (this=this@entry=0xb3977ba0) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:200 #16 0xb4e58da8 in base::Thread::ThreadMain (this=0xb31c8a60) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/thread.cc:170 #17 0xb4e4e8fc in ThreadFunc (closure=<optimized out>) at /mnt/SSD/data/hg/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:39 #18 0xb6f28afe in _thread_create_startup (arg=0xb3a5ec00) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:631 #19 thread_create_startup (arg=0xb3a5ec00) at /mnt/SSD/data/hg/mozilla-central/mozglue/build/Nuwa.cpp:662 #20 0xb6eab22c in __thread_entry (func=0xb6f28ac1 <thread_create_startup(void*)>, arg=0xb3a5ec00, tls=0xb3977dd0) at bionic/libc/bionic/pthread_create.cpp:105 #21 0xb6eab3c4 in pthread_create (thread_out=0xb31c8a68, attr=<optimized out>, start_routine=0xb6f28ac1 <thread_create_startup(void*)>, arg=0x78) at bionic/libc/bionic/pthread_create.cpp:224 #22 0x00000000 in ?? () This is really bad because in the app process, this thread can be recreated, return from IsStartLogging() and release the lock, get preempted, and the main thread calls Cleanup() and InitTaskTracer() to free aInfo to cause the crash. We need to change the way we check whether the Task Tracer is started in an atomic way that is safe for thread freezing in Nuwa.
I think the lock described in comment 0 is caused by |mLogsMutex|, GetOrCreateTraceInfo() is thread-safe. And I agree on we should find a better way of signalling processes to start logging TaskTracer.
Attached patch WIP (obsolete) — Splinter Review
TODO: Fix the race in Cleanup()
Attached patch Fix the race condition (obsolete) — Splinter Review
The crash results from non-atomic read/write to the status of whether the Task Tracer is started. The Nuwa process could freeze a thread that is stuck in mozilla::tasktracer::(anonymous namespace)::IsStartLogging(), which involves acquiring a mutex. In the forked process, the thread frozen in IsStartLogging() can use a TraceInfo instance that is destroyed after fork() from Nuwa. The patch fixes the race by making IsStartLogging() really atomic. It also fixes the way we clean up sTraceInfos, the shared nsTArray storing TraceInfo instances for each traced thread. Instead of freeing TraceInfo instances on the main thread, we use the atomic generation number to obsolete TraceInfo instances and let each thread free the instance it uses when it observes the obsoleteness.
Attachment #8538431 - Attachment is obsolete: true
Attachment #8540011 - Flags: review?(tlee)
Attached patch Fix the race condition (v2) (obsolete) — Splinter Review
Changes to v1: don't use NS_ENSURE_TRUE() that spits out warnings on debug build.
Attachment #8540011 - Attachment is obsolete: true
Attachment #8540011 - Flags: review?(tlee)
Attachment #8542487 - Flags: review?(tlee)
Attached patch Fix the race condition (v3) (obsolete) — Splinter Review
Changes to v2: fix a build break in build build. #undef ENSURE_TRUE_VOID at the end of GeckoTaskTracer.cpp to avoid polluting other source files built together in unified build.
Attachment #8542487 - Attachment is obsolete: true
Attachment #8542487 - Flags: review?(tlee)
Blocks: 1118209
Blocks: 1113562
Attached patch Fix the race condition (obsolete) — Splinter Review
Attachment #8544545 - Attachment is obsolete: true
Attachment #8579971 - Flags: review?(tlee)
The revision without the nonobvious use of generation. Try push log: https://treeherder.mozilla.org/#/jobs?repo=try&revision=be24d8c37d13
Attachment #8579971 - Attachment is obsolete: true
Attachment #8579971 - Flags: review?(tlee)
Attachment #8589505 - Flags: review?(tlee)
Comment on attachment 8589505 [details] [diff] [review] Fix the race condition Review of attachment 8589505 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/GeckoTaskTracerImpl.h @@ +31,3 @@ > , mLogsMutex("TraceInfoMutex") > { > + MOZ_ASSERT(aGeneration > 0); Please remove this line.
Attachment #8589505 - Flags: review?(tlee) → review+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: