Bug 740719 - b2g-gonk is hanging on shutdown again
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: General
Version: Trunk
Platform: ARM Gonk (Firefox OS)
Importance: -- normal
Target Milestone: mozilla16
Assigned To: Jim Straus
Mentors:
Depends on: 742797 776132
Blocks:
Reported: 2012-03-30 00:12 PDT by Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
Modified: 2012-08-17 17:01 PDT (History)
ryanvm: in‑testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Fix in GonkSensor (3.69 KB, patch)
2012-04-25 18:37 PDT, Jim Straus
cjones.bugs: feedback-
Changes to nsAppShell (1.91 KB, patch)
2012-05-01 13:09 PDT, Jim Straus
cjones.bugs: review+
Changes to nsAppShell (1.35 KB, patch)
2012-05-16 08:35 PDT, Jim Straus
cjones.bugs: review+
Changes to nsAppShell (1.36 KB, patch)
2012-05-31 07:33 PDT, Jim Straus
cjones.bugs: review+

Description Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-03-30 00:12:52 PDT
Bug 739137 was introduced mostly as a temporary hack to work around the shutdown hang, although it is something fairly generally useful.

But we need to fix the underlying bug.  To investigate, add this code

  window.setTimeout(function() {
    let appStartup = Cc["@mozilla.org/toolkit/app-startup;1"].getService(Ci.nsIAppStartup);
    appStartup.quit(appStartup.eForceQuit);
  }, 30000);

to here: http://mxr.mozilla.org/mozilla-central/source/b2g/chrome/content/shell.js#83
This causes b2g to attempt to shut down after 30 seconds.

Note, the last shutdown hang was wifi related, so this is best tested while connected to a wifi network.
Comment 1 Blake Kaplan (:mrbkap) (please use needinfo!) 2012-04-06 06:42:16 PDT
Bug 742797 fixed one possible cause of this, but when I try to reproduce this with my fix in that bug, I see:

E/GeckoConsole(   86): [JavaScript Error: "[Exception... "'JavaScript component does not have a method named: "onStopListening"' when calling method: [nsIServerSocketListener::onStopListening]"  nsresult: "0x80570030 (NS_ERROR_XPC_JSOBJECT_HAS_NO_FUNCTION_NAMED)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: no]"]

which seems like a possible second cause.
Comment 2 Blake Kaplan (:mrbkap) (please use needinfo!) 2012-04-06 06:43:05 PDT
Also, is this ICS only or does it happen on GB?
Comment 3 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-04-06 19:30:12 PDT
I saw this on Nexus S ICS.  jstraus reports seeing it on GB sgs2.
Comment 4 Faramarz [:faramarz] 2012-04-24 16:07:46 PDT
Jim: what's the status here?  It's been a while...
Comment 5 Jim Straus 2012-04-24 17:09:41 PDT
Looks like this is hal/gonk/GonkSensor not shutting down its thread.
Comment 6 Jim Straus 2012-04-25 18:37:17 PDT
Created attachment 618504
Fix in GonkSensor

It appears that GonkSensor is dispatching events to the thread fast enough to keep the queue from ever emptying.  My solution is to have GonkSensor check whether the app is shutting down and, if so, stop dispatching events.
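
A minimal sketch of that idea, not the attached patch: the producer checks a shutdown flag before queuing each reading, so the target queue gets a chance to drain. `EventQueue`, `gShuttingDown` and `MaybeDispatchSensorEvent` are hypothetical names used only for illustration; the real code would consult whatever shutdown state Gecko exposes.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a thread's event queue (not the real nsThread).
class EventQueue {
public:
  void Dispatch(std::function<void()> event) {
    std::lock_guard<std::mutex> lock(mMutex);
    mEvents.push_back(std::move(event));
  }

  std::size_t Length() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mEvents.size();
  }

private:
  std::mutex mMutex;
  std::deque<std::function<void()>> mEvents;
};

// Hypothetical flag, assumed to be set once shutdown begins.
std::atomic<bool> gShuttingDown{false};

// Called for every sensor reading. Returns true if an event was queued,
// false if the reading was dropped because shutdown is in progress.
bool MaybeDispatchSensorEvent(EventQueue& queue, float x, float y, float z) {
  if (gShuttingDown.load()) {
    return false;  // Drop the reading so the queue can empty.
  }
  queue.Dispatch([x, y, z] {
    // A real event would notify observers of (x, y, z) here.
    (void)x; (void)y; (void)z;
  });
  return true;
}
```

Once `gShuttingDown` is set, further readings are dropped and the queue length stops growing.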

At this point, we get further through the shutdown process, but it still is hanging somewhere.  If I attach-gdb and set a breakpoint on nsThread::Dispatch(nsIRunnable*, unsigned int) and continue, the process terminates and restarts.

At this point, can we review, check-in and close this bug and open a new one with a similar title?
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-04-30 21:23:31 PDT
Comment on attachment 618504
Fix in GonkSensor

Was this hang nondeterministic, by any chance?  I'm a bit surprised it
was reproducible.  But definitely something we should fix.

The symptom here is caused by the nsAppShell accelerometer observer
staying registered too long.  If it unregisters at the right time,
then we're back to maintaining the contract of xpcom-shutdown-threads.
Beyond that, we should start being better citizens and shut down the
sensor thread too, but that's not on the critical path here.

bent, do you have a suggestion for what to listen for / how to listen
for it in nsAppShell?  See widget/gonk/nsAppShell,
OrientationSensorObserver.  Currently its lifetime matches that of the
nsAppShell.
Comment 8 Jim Straus 2012-05-01 09:43:04 PDT
It was very reproducible.  Steven Lee and I were both able to reproduce it.  The comment wasn't clear.  Do you want me to look at an alternate fix in nsAppShell instead of the posted patch?
Comment 9 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-01 10:40:29 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> bent, do you have a suggestion for what to listen for / how to listen
> for it in nsAppShell?

Looks like you could just override Exit(). Make sure you call the base class version though!
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 11:37:05 PDT
(In reply to Jim Straus from comment #8)
> It was very reproducible.  Steven Lee and I were both able to reproduce it. 
> The comment wasn't clear.  Do you want me to look at an alternate fix in
> nsAppShell instead of the posted patch?

Yes, follow comment 9:
 - in widget/gonk/nsAppShell, override Exit()
 - unregister the orientation-sensor in the overridden Exit()
 - call the base-class Exit()
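
The three steps above can be sketched as follows. This is an illustration only, assuming simplified stand-in classes: `BaseAppShell`, `GonkLikeAppShell` and `gLog` are hypothetical and are not the Gecko types, and the real method would use the XPCOM macros rather than a plain `int` return.

```cpp
#include <string>
#include <vector>

// Records the order of operations so the sequence can be inspected.
std::vector<std::string> gLog;

// Hypothetical stand-in for the base app shell.
class BaseAppShell {
public:
  virtual ~BaseAppShell() {}
  virtual int Exit() {
    gLog.push_back("base Exit");
    return 0;
  }
};

// Hypothetical stand-in for the gonk app shell.
class GonkLikeAppShell : public BaseAppShell {
public:
  GonkLikeAppShell() : mObserverRegistered(true) {}

  int Exit() override {
    // Step 2: unregister the orientation-sensor observer first.
    if (mObserverRegistered) {
      mObserverRegistered = false;
      gLog.push_back("unregister orientation observer");
    }
    // Step 3: always chain to the base-class implementation.
    return BaseAppShell::Exit();
  }

  bool ObserverRegistered() const { return mObserverRegistered; }

private:
  bool mObserverRegistered;
};
```

Calling `Exit()` through a base-class pointer runs the override, so the observer is gone before the base class continues with its own exit work.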
Comment 11 Jim Straus 2012-05-01 13:09:08 PDT
Created attachment 620041
Changes to nsAppShell

Here are the changes to nsAppShell as specified.  In testing, it doesn't seem to keep the GonkSensor thread from hanging.  I did try combining this change with the previous one, and while that keeps the GonkSensor thread from hanging, we're still hanging in the main thread somewhere (as soon as I try to look where with gdb, the main thread exits and the restart occurs, so I haven't figured that out yet).
I didn't invalidate the previous patch, because, while this may be useful, it doesn't seem to fix the problem.
Comment 12 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 13:37:32 PDT
bent, does the thread manager forcibly shut down nsIThreads at some point or will it politely block indefinitely waiting for them to die off on their own?
Comment 13 Jim Straus 2012-05-01 13:49:58 PDT
cjones:  I don't think so.  Here is the core of what happens:

nsThreadManager shuts down each of the threads:

  // Shutdown all threads that require it (join with threads that we created).
  LOG(("ThreadManager:Shutdown about to shutdown threads\n"));
  for (PRUint32 i = 0; i < threads.Length(); ++i) {
    nsThread *thread = threads[i];
    if (thread->ShutdownRequired() && thread != mMainThread) {
      thread->Shutdown();
    }
  }
  // In case there are any more events somehow...
  NS_ProcessPendingEvents(mMainThread);

So, if the thread doesn't shut down when requested (because something is constantly pumping in events), it will just hang.  It might be worthwhile putting a timer in the shutdown process to make sure it goes away.
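
As a rough illustration of the timer idea only (nothing like this is in the patches on this bug), a bounded wait for a shutdown acknowledgement could look like the sketch below. `ShutdownAck` is a hypothetical helper built on the standard library, not an XPCOM type.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: lets one thread wait, with a deadline, for another
// thread to acknowledge a shutdown request.
class ShutdownAck {
public:
  // Called by the thread being shut down once it has finished.
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mAcked = true;
    }
    mCond.notify_all();
  }

  // Returns true if the acknowledgement arrived before the timeout elapsed,
  // false if the caller gave up waiting.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCond.wait_for(lock, timeout, [this] { return mAcked; });
  }

private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mAcked = false;
};
```

A caller that gets `false` back still has to decide what to do with a thread that never acknowledged, which is the hard part this sketch does not address.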
Comment 14 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 16:10:29 PDT
Are we reaching that code?
Comment 15 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-01 16:47:53 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)

It will politely wait. But it's trivial to let your thread stop posting events now that you've overridden nsAppShell::Exit, right? Much simpler than trying to make a timer at this late stage in shutdown, I would think.
Comment 16 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 16:59:08 PDT
I don't have a good picture of what's going wrong here yet.
Comment 17 Jim Straus 2012-05-01 17:21:58 PDT
Chris, yes, we're reaching the nsAppShell::Exit code.
Comment 18 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 17:23:46 PDT
I meant the code in comment 13.  Are we reaching that code, and if so is the thread manager calling Shutdown() on the sensor thread?
Comment 19 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-01 23:33:21 PDT
Comment on attachment 620041
Changes to nsAppShell

>diff --git a/widget/gonk/nsAppShell.cpp b/widget/gonk/nsAppShell.cpp

>+nsresult
>+nsAppShell::Exit(void)

This is an XPCOM method, and C++ style omits "void", so you want

  NS_IMETHODIMP
  nsAppShell::Exit()

plz.

>diff --git a/widget/gonk/nsAppShell.h b/widget/gonk/nsAppShell.h

>     nsresult Init();

Let's put a newline here to indicate that Init()/Exit() aren't
related, even though it seems like they should be :/.  Init() is a
leaf method on this class.

>+    nsresult Exit();

(and here should be)

    NS_IMETHOD Exit() MOZ_OVERRIDE;

r=me with those fixes.  Then we need to figure out why the sensor
events aren't stopping and/or the thread isn't being shut down when
it's supposed to be.
Comment 20 Jim Straus 2012-05-02 08:43:43 PDT
In regard to comment 18, yes.
Comment 21 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-02 12:14:08 PDT
Yes to the code being reached?  Is Shutdown() being called on the sensor thread?

When nsAppShell unregisters its sensor observer, do the sensor events stop?
Comment 22 Jim Straus 2012-05-02 15:20:21 PDT
Yes, the code is being reached.  Yes, Shutdown() is being called on the sensor thread.  My Ubuntu VM is hosed at the moment.  I'm recovering a backup, which will take some time, so I can't test if the unregistering causes the events to stop.  As soon as I'm recovered, I'll check it out.
Comment 23 Jim Straus 2012-05-03 15:01:32 PDT
Finally recovered and tested.  Yes, GonkSensor stops dispatching events when nsAppShell::Exit() is called.
Comment 24 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-03 15:49:25 PDT
What's the stack from the shutdown hang?
Comment 25 Jim Straus 2012-05-03 23:40:35 PDT
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:200
#1  0xafd11428 in __pthread_cond_timedwait_relative (cond=0x40201374, 
    mutex=0x402124c4, reltime=0x0) at bionic/libc/bionic/pthread.c:1457
#2  0xafd11514 in __pthread_cond_timedwait (cond=0x40201374, mutex=0x402124c4, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1480
#3  0x80115c78 in PR_WaitCondVar (cvar=0x40201370, timeout=4294967295)
    at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:417
#4  0x80115ca0 in PR_Wait (mon=0x402124c0, timeout=128)
    at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:614
#5  0x82ab3a4c in Wait (interval=<optimized out>, this=<optimized out>)
    at ../../dist/include/mozilla/ReentrantMonitor.h:122
#6  Wait (interval=<optimized out>, this=<optimized out>)
    at ../../dist/include/mozilla/ReentrantMonitor.h:224
#7  nsEventQueue::GetEvent (this=0x4021248c, mayWait=true, result=0xbef67940)
    at ../../../xpcom/threads/nsEventQueue.cpp:83
#8  0x82ab4656 in nsThread::ProcessNextEvent (this=0x40212460, mayWait=true, 
    result=0xbef6796f) at ../../../xpcom/threads/nsThread.cpp:642
#9  0x82a913ea in NS_ProcessNextEvent_P (thread=0xfffffe00, mayWait=true)
    at nsThreadUtils.cpp:245
#10 0x82ab481a in nsThread::Shutdown (this=0x41bfd4c0)
    at ../../../xpcom/threads/nsThread.cpp:503
#11 0x82ab544e in nsThreadManager::Shutdown (this=0x8333be94)
    at ../../../xpcom/threads/nsThreadManager.cpp:170
Comment 26 Jim Straus 2012-05-03 23:42:46 PDT
Just as an experiment, in nsThread.cpp::Shutdown() [lines 502-503], I commented out:

  // Process events on the current thread until we receive a shutdown ACK.
  while (!context.shutdownAck)
    NS_ProcessNextEvent(context.joiningThread);

and Gecko restarted successfully.  So, we're definitely having an issue with thread(s) not finishing up.

Since running under GDB changes things, I'm inserting lots of debug prints to see if I can figure out what is going on.
Comment 27 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-04 00:10:39 PDT
Next time it hangs you should see if you can get both stacks (main thread and sensor thread). But yeah, basically the sensor thread isn't responding to the shutdown request. Either something in PollSensor::Poll is blocking or the sensor thread has too many runnables in its queue and just can't get to the shutdownAck. 

Looking at the code there's at least one race (on sActivatedSensors) that could cause multiple PollSensor::Poll events to be queued at once, so it's not entirely impossible. Still seems unlikely though. And we should fix that race.
Comment 28 Jim Straus 2012-05-16 08:35:31 PDT
Created attachment 624392
Changes to nsAppShell

Here is a new version of nsAppShell due to changes from GonkSensor to a separate API for orientation.
I'm still seeing some thread hanging (an Android thread this time), and trying to run under gdb changes things.  The interesting thing is that while it is hanging, if any hard button is pressed the thread exits and the restart of Gecko is successful.
Comment 29 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-16 08:39:32 PDT
Once it hangs can't you just attach the debugger (rather than trying to run under debugger from the start)?
Comment 30 Jim Straus 2012-05-16 09:33:42 PDT
Yes, though if I try to do anything (step, next) the exit completes.  Here is what I get when I attach.  Below this back trace is the list of threads and each of their back traces.  It appears that the InputReaderThread, which is an Android component, is hanging in trying to exit.

Thread 1's back trace:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x347858, 
    mutex=0x347854, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x347858, mutex=0x347854, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x415d2590 in android::Condition::wait (this=0x347848)
    at frameworks/base/include/utils/threads.h:455
#4  android::Thread::requestExitAndWait (this=0x347848)
    at frameworks/base/libs/utils/Threads.cpp:884
#5  0x40b75754 in ~nsAppShell (this=0x1d3ed0, __in_chrg=<value optimized out>)
    at /Volumes/B2G/m-c/widget/gonk/nsAppShell.cpp:530
#6  0x40b75802 in ~nsAppShell (this=0x347858, __in_chrg=<value optimized out>)
    at /Volumes/B2G/m-c/widget/gonk/nsAppShell.cpp:534
#7  0x4055f906 in nsFileProtocolHandler::Release (this=0x1d3ed0)
    at /Volumes/B2G/m-c/netwerk/protocol/file/nsFileProtocolHandler.cpp:87
#8  0x40561ab4 in ~nsRefPtr (this=0x1d3e90, __in_chrg=<value optimized out>)
    at ../../../dist/include/nsAutoPtr.h:908
#9  0x40ac04da in ~nsCOMPtr (this=0x1d3e78)
    at ../../../dist/include/nsCOMPtr.h:480
#10 ~nsAppStartup (this=0x1d3e78)
    at /Volumes/B2G/m-c/toolkit/components/startup/nsAppStartup.h:82
#11 nsAppStartup::Release (this=0x1d3e78)
    at /Volumes/B2G/m-c/toolkit/components/startup/nsAppStartup.cpp:245
#12 0x40561ab4 in ~nsRefPtr (this=0xbeab99c4, __in_chrg=<value optimized out>)
    at ../../../dist/include/nsAutoPtr.h:908
#13 0x405146d2 in ~nsCOMPtr (this=0xee88, __in_chrg=<value optimized out>)
    at ../../dist/include/nsCOMPtr.h:480
#14 ~ScopedXPCOMStartup (this=0xee88, __in_chrg=<value optimized out>)
    at /Volumes/B2G/m-c/toolkit/xre/nsAppRunner.cpp:1130
#15 0x405169a4 in XREMain::XRE_main (this=0xbeab9a04, 
    argc=<value optimized out>, argv=<value optimized out>, 
    aAppData=<value optimized out>)
    at /Volumes/B2G/m-c/toolkit/xre/nsAppRunner.cpp:3879
#16 0x40516afa in XRE_main (argc=1, argv=0xbeabbbf4, aAppData=0xa100)
    at /Volumes/B2G/m-c/toolkit/xre/nsAppRunner.cpp:3933
#17 0x000089ee in do_main (argc=1, argv=0xbeabbbf4)
    at /Volumes/B2G/m-c/b2g/app/nsBrowserApp.cpp:186
#18 main (argc=1, argv=0xbeabbbf4)
    at /Volumes/B2G/m-c/b2g/app/nsBrowserApp.cpp:269

The list of threads:
  9 Thread 4314.4352  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182
  8 Thread 4314.4348  syscall () at bionic/libc/arch-arm/bionic/syscall.S:50
  7 Thread 4314.4347  __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:9
  6 Thread 4314.4346  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182
  5 Thread 4314.4345  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182
  4 Thread 4314.4344  read () at bionic/libc/arch-arm/syscalls/read.S:9
  3 Thread 4314.4330  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182
  2 Thread 4314.4329  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182
* 1 Thread 4314.4314  __futex_syscall3 ()
    at bionic/libc/arch-arm/bionic/atomics_arm.S:182

Back traces for the other threads
Thread 2:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x19abdc, 
    mutex=0x19e5f0, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x19abdc, mutex=0x19e5f0, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x401ce1b0 in PR_WaitCondVar (cvar=0x19abd8, timeout=4294967295)
    at /Volumes/B2G/m-c/nsprpub/pr/src/pthreads/ptsynch.c:385
#4  0x40ed3924 in js::GCHelperThread::threadLoop (arg=<value optimized out>)
    at /Volumes/B2G/m-c/js/src/jsgc.cpp:2657
#5  js::GCHelperThread::threadMain (arg=<value optimized out>)
    at /Volumes/B2G/m-c/js/src/jsgc.cpp:2639
#6  0x401d1868 in _pt_root (arg=<value optimized out>)
    at /Volumes/B2G/m-c/nsprpub/pr/src/pthreads/ptthread.c:155
#7  0x400bcc28 in __thread_entry (func=0x401d1809 <_pt_root>, arg=0x19e638, 
    tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#8  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0xbeab94f4, start_routine=0x401d1809 <_pt_root>, arg=0x19e638)
    at bionic/libc/bionic/pthread.c:357
#9  0x00000000 in ?? ()

Thread 3:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x7104c, 
    mutex=0x1a5470, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x7104c, mutex=0x1a5470, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x401ce1b0 in PR_WaitCondVar (cvar=0x71048, timeout=4294967295)
    at /Volumes/B2G/m-c/nsprpub/pr/src/pthreads/ptsynch.c:385
#4  0x40a11c7a in XPCJSRuntime::WatchdogMain (arg=<value optimized out>)
    at /Volumes/B2G/m-c/js/xpconnect/src/XPCJSRuntime.cpp:946
#5  0x401d1868 in _pt_root (arg=<value optimized out>)
    at /Volumes/B2G/m-c/nsprpub/pr/src/pthreads/ptthread.c:155
#6  0x400bcc28 in __thread_entry (func=0x401d1809 <_pt_root>, arg=0x1a54b8, 
    tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#7  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0xbeab953c, start_routine=0x401d1809 <_pt_root>, arg=0x1a54b8)
    at bionic/libc/bionic/pthread.c:357
#8  0x00000000 in ?? ()

Thread 4:
#0  read () at bionic/libc/arch-arm/syscalls/read.S:9
#1  0x40b7662a in frameBufferWatcher ()
    at /Volumes/B2G/m-c/widget/gonk/nsWindow.cpp:140
#2  0x400bcc28 in __thread_entry (func=0x40b76595 <frameBufferWatcher>, 
    arg=0x0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#3  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0x400e3c2c, start_routine=0x40b76595 <frameBufferWatcher>, arg=0x0)
    at bionic/libc/bionic/pthread.c:357
#4  0x00000000 in ?? ()

Thread 5:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x32e348, 
    mutex=0x32e344, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x32e348, mutex=0x32e344, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x432ddc58 in _mali_osu_lock_wait (lock=0x32e340, 
    mode=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/base/os/linux/mali_osu_locks.c:360
#4  0x42dee4bc in __egl_worker_thread_wait_for_message (
    start_param=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/egl/egl_worker.c:54
#5  __egl_worker_thread (start_param=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/egl/egl_worker.c:104
#6  0x400bcc28 in __thread_entry (func=0x42dee49c <__egl_worker_thread>, 
    arg=0x32e308, tls=<value optimized out>)
    at bionic/libc/bionic/pthread.c:217
#7  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0x400e3c2c, start_routine=0x42dee49c <__egl_worker_thread>, 
    arg=0x32e308) at bionic/libc/bionic/pthread.c:357
#8  0x00000000 in ?? ()

Thread 6:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x330058, 
    mutex=0x330054, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x330058, mutex=0x330054, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x432ddc58 in _mali_osu_lock_wait (lock=0x330050, 
    mode=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/base/os/linux/mali_osu_locks.c:360
#4  0x42dee4bc in __egl_worker_thread_wait_for_message (
    start_param=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/egl/egl_worker.c:54
#5  __egl_worker_thread (start_param=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/egl/egl_worker.c:104
#6  0x400bcc28 in __thread_entry (func=0x42dee49c <__egl_worker_thread>, 
    arg=0x330018, tls=<value optimized out>)
    at bionic/libc/bionic/pthread.c:217
#7  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0x400e3c2c, start_routine=0x42dee49c <__egl_worker_thread>, 
    arg=0x330018) at bionic/libc/bionic/pthread.c:357
#8  0x00000000 in ?? ()

Thread 7:
#0  __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:9
#1  0x400d1900 in ioctl (fd=<value optimized out>, request=1168187032)
    at bionic/libc/bionic/ioctl.c:41
#2  0x432de55c in mali_driver_ioctl (context=0x21, command=1168187020, 
    args=0x45a11e98)
    at hardware/arm/mali-samsung-dev/driver/./src/base/os/linux/mali_uku.c:306
#3  0x432decdc in arch_worker_thread (callback_thread_id=<value optimized out>)
    at hardware/arm/mali-samsung-dev/driver/./src/base/arch/arch_011_udd/base_arch_main.c:380
#4  0x400bcc28 in __thread_entry (func=0x432deca4 <arch_worker_thread>, 
    arg=0x0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#5  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0x400e3c2c, start_routine=0x432deca4 <arch_worker_thread>, arg=0x0)
    at bionic/libc/bionic/pthread.c:357
#6  0x00000000 in ?? ()

Thread 8:
#0  syscall () at bionic/libc/arch-arm/bionic/syscall.S:50
#1  0x40caf76e in epoll_wait (epfd=38, events=0x347548, maxevents=16, 
    timeout=<value optimized out>)
    at /Volumes/B2G/m-c/ipc/chromium/src/third_party/libevent/epoll_sub.c:51
#2  0x40b79558 in android::EventHub::getEvents (this=0x3474f0, 
    timeoutMillis=<value optimized out>, buffer=<value optimized out>, 
    bufferSize=<value optimized out>)
    at /Volumes/B2G/m-c/widget/gonk/libui/EventHub.cpp:753
#3  0x40b8b6d2 in android::InputReader::loopOnce (this=0x348740)
    at /Volumes/B2G/m-c/widget/gonk/libui/InputReader.cpp:277
#4  0x40b81b24 in android::InputReaderThread::threadLoop (
    this=<value optimized out>)
    at /Volumes/B2G/m-c/widget/gonk/libui/InputReader.cpp:838
#5  0x415d2198 in android::Thread::_threadLoop (user=<value optimized out>)
    at frameworks/base/libs/utils/Threads.cpp:834
#6  0x415d27de in thread_data_t::trampoline (t=<value optimized out>)
    at frameworks/base/libs/utils/Threads.cpp:127
#7  0x400bcc28 in __thread_entry (
    func=0x415d2749 <thread_data_t::trampoline(thread_data_t const*)>, 
    arg=0x347898, tls=<value optimized out>)
    at bionic/libc/bionic/pthread.c:217
#8  0x400bc77c in pthread_create (thread_out=<value optimized out>, 
    attr=0xbeab9494, 
    start_routine=0x415d2749 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x347898) at bionic/libc/bionic/pthread.c:357
#9  0x00000000 in ?? ()

Thread 9:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
#1  0x400bc36c in __pthread_cond_timedwait_relative (cond=0x429fe5c8, 
    mutex=0x429fe5cc, reltime=0x0) at bionic/libc/bionic/pthread.c:1477
#2  0x400bc420 in __pthread_cond_timedwait (cond=0x429fe5c8, mutex=0x429fe5cc, 
    abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500
#3  0x429f8c68 in MeasureSNGLoop ()
   from /Volumes/B2G/out/target/product/galaxys2/system/lib/libakm.so
#4  0x429f8c68 in MeasureSNGLoop ()
   from /Volumes/B2G/out/target/product/galaxys2/system/lib/libakm.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
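
One possible reading of these traces, offered as an assumption rather than a diagnosis: thread 1 is waiting in requestExitAndWait() while thread 8 sits in epoll_wait() and only notices the exit request once some input event wakes it, which would match a hard-button press unblocking the exit. A common way to avoid that kind of stall is to give the waiter a second descriptor that the exiting side can write to. The sketch below uses plain POSIX `poll()` and pipes; `WaitForInputOrWake` and `RequestExit` are hypothetical names and this is not the Android EventHub code.

```cpp
#include <poll.h>
#include <unistd.h>

enum class WakeReason { Input, ExitRequested, Timeout, Error };

// Blocks until wakeFd or inputFd becomes readable, or timeoutMs elapses.
// The wake descriptor is checked first so an exit request wins over input.
// For brevity, an interrupted poll() (EINTR) is reported as Error.
WakeReason WaitForInputOrWake(int inputFd, int wakeFd, int timeoutMs) {
  struct pollfd fds[2];
  fds[0].fd = wakeFd;
  fds[0].events = POLLIN;
  fds[0].revents = 0;
  fds[1].fd = inputFd;
  fds[1].events = POLLIN;
  fds[1].revents = 0;

  int ready = poll(fds, 2, timeoutMs);
  if (ready < 0) {
    return WakeReason::Error;
  }
  if (ready == 0) {
    return WakeReason::Timeout;
  }
  if (fds[0].revents & POLLIN) {
    return WakeReason::ExitRequested;
  }
  if (fds[1].revents & POLLIN) {
    return WakeReason::Input;
  }
  return WakeReason::Error;
}

// Called by the thread that wants the reader to exit: writing one byte to
// the wake pipe makes a blocked WaitForInputOrWake() return immediately.
bool RequestExit(int wakeWriteFd) {
  const char byte = 1;
  return write(wakeWriteFd, &byte, 1) == 1;
}
```

With this shape, the reader loop would leave as soon as it sees `ExitRequested` instead of waiting for the next input event.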
Comment 31 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-16 10:17:25 PDT
Yeah, that stack is not about the GonkSensorThread...
Comment 32 Jim Straus 2012-05-16 15:16:04 PDT
How about we approve the changes to nsAppShell, close this bug and open a new one against the InputReaderThread?
Comment 33 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-23 15:22:19 PDT
Comment on attachment 624392
Changes to nsAppShell

Looks good.  Let's get the remaining hang figured out.
Comment 34 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-05-23 15:27:02 PDT
Comment on attachment 624392
Changes to nsAppShell

Review of attachment 624392:
-----------------------------------------------------------------

::: widget/gonk/nsAppShell.cpp
@@ +553,5 @@
>      return rv;
>  }
>  
> +nsresult
> +nsAppShell::Exit()

Wait, this needs to be NS_IMETHODIMP
Comment 35 Jim Straus 2012-05-31 07:33:49 PDT
Created attachment 628729
Changes to nsAppShell

Note, with this in place now we successfully reboot many times.  The specific case that I've found at this point is the device unlocked and the screen off.  That hangs, but other variations reboot fine.
Comment 36 Jim Straus 2012-05-31 19:00:20 PDT
Hi Chris.  Can you push this up into m-c?
Comment 37 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-31 19:04:22 PDT
Bugs are only resolved when the fix(es) land on m-c.

checkin-needed is the flag you want.
Comment 38 Ryan VanderMeulen [:RyanVM] 2012-06-09 14:18:02 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a6d022c96b7

Thanks for the patch, Jim! To make life easier for those checking in on your behalf, please follow the directions below to make sure your future patches contain all the necessary commit information in them. Also, apologies for the delay.
https://developer.mozilla.org/en/Creating_a_patch_that_can_be_checked_in
Comment 39 Ryan VanderMeulen [:RyanVM] 2012-06-09 19:46:16 PDT
https://hg.mozilla.org/mozilla-central/rev/6a6d022c96b7
