Open Bug 926522 Opened 11 years ago Updated 2 years ago

JSIL game "Bytown Lumberjack" causes Firefox hang for about 30 seconds on exit

Categories

(Core :: General, defect)

25 Branch
defect

Tracking


Tracking Status
firefox24 --- unaffected
firefox25 --- affected
firefox26 --- affected
firefox27 --- affected

People

(Reporter: cpeterson, Assigned: karlt)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: hang, perf, regression, Whiteboard: [leave open][games])

Attachments

(4 files, 3 obsolete files)

STR:
1. Load http://hildr.luminance.org/Lumberjack/Lumberjack.html
2. After the game's main menu loads, quit Firefox.

RESULT:
Firefox will hang for at least 30 seconds before the application icon disappears. (I have only tested OSX 10.8.5.)

This hang seems to be related to some resource cleanup because if you close the Lumberjack game window and wait ~30 seconds before quitting Firefox, then Firefox does not hang.

The hang manifests differently in Firefox 24 and 25+:

* When quitting Firefox 24 on OSX, the Firefox application icon disappears and the process exits (as expected), but the application menu bar remains visible for about 30 seconds.

* When quitting Firefox 25, 26, or 27 on OSX, the Firefox application icon, menu bar, and process hang around for about 30 seconds.

This icon hang seems to be a regression in Firefox 25, maybe around Nightly 25 build 2013-07-31 or 2013-08-01.
Regression window (m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/a4e9c9c9dbf9
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130912025238
Bad:
http://hg.mozilla.org/mozilla-central/rev/b83f6d80af5f
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130912040539
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a4e9c9c9dbf9&tochange=b83f6d80af5f


Regression window (m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/e5e75d914d5b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911211939
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/3c13601b7924
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911213111
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e5e75d914d5b&tochange=3c13601b7924

Suspected : Bug 914030
In local build
Last Good: c5e3442b758e
First Bad: 8edc48525bc6
Regressed by:
	8edc48525bc6	Karl Tomlinson — b=914030 remove MediaBufferDecoder::Shutdown(), no longer necessary since bug 900711, as part of making AudioContext::Shutdown() idempotent r=ehsan
Component: General → XPCOM
Keywords: perf
Version: unspecified → 25 Branch
Can we get a profile using Instruments or something equivalent?
Component: XPCOM → General
Flags: needinfo?(cpeterson)
Karl, can you take a look please?
Stack during shutdown (process remains in Task Manager for about 30 seconds)
bp-7c1acae1-44c4-4840-bd8b-3bade2131015
The main thread is pretty late, shutting down the thread manager and therefore joining a bunch of threads. It's blocked on some other thread, but I'm not sure which.

Thread 23 is in `anonymous namespace'::wasapi_stream_render_loop
 	media/libcubeb/src/cubeb_wasapi.cpp

Not sure whether that's relevant or not! But it's not *directly* the cause, since it's not an XPCOM thread.
Assignee: nobody → karlt
bsmedberg: Here is an Instruments trace of the shutdown hang. I hope this helps.
Flags: needinfo?(cpeterson)
Do opt builds disable CC during shutdown?

I see quite different behavior in a debug build:
The penultimate CC takes 17.6 seconds and frees 623677 objects, and this seems
to trigger deletion of the AudioContext soon after.  This is a much faster
shutdown than the opt build.

The opt build has 0 objects to visit during the late CC runs, even when the game main menu has displayed long enough that shutdown is fast.
The AudioContext is never deleted.

In 8edc48525bc6, posting the event to shut down the decoder thread pool moved
from window shutdown notification to AudioContext destruction.
With AudioContext never deleted in opt builds, this event is not posted.

I haven't yet worked out why this holds up shutdown, or why XPCOM shutdown doesn't save the day by cleaning up remaining thread pools.
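
As a loose illustration of that hazard (plain C++ with hypothetical names, not the actual Gecko classes): if the only code that requests pool shutdown is a destructor, and that destructor only runs once the cycle collector breaks a cycle, then skipping the shutdown CC means the request is never made.

#include <memory>

struct HypotheticalDecoderPool {
  void RequestShutdown() { /* post an event that joins the worker threads */ }
};

struct HypotheticalAudioContext {
  std::shared_ptr<HypotheticalDecoderPool> mPool;

  // After 8edc48525bc6: teardown is tied to destruction, which in an opt
  // build may never happen once the shutdown CC is skipped.
  ~HypotheticalAudioContext() {
    if (mPool) {
      mPool->RequestShutdown();
    }
  }

  // Before 8edc48525bc6: teardown was tied to an explicit notification
  // (window shutdown), which fires regardless of whether the object itself
  // is still trapped in a reference cycle.
  void OnWindowShutdown() {
    if (mPool) {
      mPool->RequestShutdown();
    }
  }
};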
(In reply to Karl Tomlinson (:karlt) from comment #9)
> Do opt builds disable CC during shutdown?

IIRC yes.
Yes, CC is disabled in opt builds during shutdown.
nsThreadManager on the main thread is trying to shut down the thread named
"MediaBufferDecoder #NN", which is waiting on
mThreadPool::mEvents.mReentrantMonitor with a timeout of up to 60 seconds.

Running an opt build with XPCOM_CC_RUN_DURING_SHUTDOWN=1 still pauses for 40+
seconds before nsCycleCollector::ShutdownCollect() runs, so I think the
difference between debug and opt builds is just that the debug build takes so
long to load that the timeout expires before shutdown.

If we are not collecting objects to check for leaks and we haven't explicitly
shutdown() the nsIThreadPool, then I assume we don't need to wait for the
threads in the pool to finish before exiting the process.

In NS_FREE_PERMANENT_DATA builds, however, we still need to do something to
ensure the thread finishes before application exit.
The way idle nsThreadPool threads wait with a timeout doesn't work well with the nsIThread shutdown from nsThreadManager.  These are single-event threads, so that shutdown process doesn't really make sense for them.

WIP at https://tbpl.mozilla.org/?tree=Try&rev=3dea98fe6c24 is for shutting down nsThreadPool threads separately from nsThreadManager threads.  It avoids the hang, but it still needs to hook into XPCOM shutdown.  I think that is better than requiring every client to shut down its pool, usually off an event.
> If we are not collecting objects to check for leaks and we haven't explicitly
> shutdown() the nsIThreadPool, then I assume we don't need to wait for the
> threads in the pool to finish before exiting the process.

Exiting the process isn't the issue, it's what those threads might be doing, including using the component manager or thread manager globals and locks, which will already be dead. Callers really are responsible for shutting down threadpools during xpcom-shutdown-threads, just as they are responsible for shutting down individual threads, to make sure that no XPCOM code is running at the time of shutdown.
Yeah, seems like AudioContext and other things using threads (ConvolverNode comes to mind) should just handle xpcom-shutdown-threads and shut down their threads gracefully.
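
For concreteness, a rough sketch of the pattern being suggested: a pool owner observes "xpcom-shutdown-threads" and shuts its pool down there, so no pool thread is still running XPCOM code once the thread and component managers go away.  This is not taken from any patch in this bug, and the class and member names are made up.

#include <string.h>

#include "mozilla/Services.h"
#include "nsCOMPtr.h"
#include "nsIObserver.h"
#include "nsIObserverService.h"
#include "nsIThreadPool.h"

class HypotheticalPoolOwner final : public nsIObserver {
 public:
  NS_DECL_THREADSAFE_ISUPPORTS

  void Init() {
    // Register (on the main thread) for the notification sent just before
    // XPCOM starts shutting threads down.
    nsCOMPtr<nsIObserverService> obs = mozilla::services::GetObserverService();
    if (obs) {
      obs->AddObserver(this, "xpcom-shutdown-threads", /* ownsWeak */ false);
    }
  }

  NS_IMETHOD Observe(nsISupports*, const char* aTopic,
                     const char16_t*) override {
    if (!strcmp(aTopic, "xpcom-shutdown-threads") && mPool) {
      // Joins the pool's threads while the managers are still alive, so no
      // pool thread touches dead XPCOM state later.
      mPool->Shutdown();
      mPool = nullptr;
    }
    return NS_OK;
  }

 private:
  ~HypotheticalPoolOwner() = default;
  nsCOMPtr<nsIThreadPool> mPool;
};

NS_IMPL_ISUPPORTS(HypotheticalPoolOwner, nsIObserver)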
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #15)
> Yeah, seems like AudioContext and other things using threads (ConvolverNode
> comes to mind) should just handle xpcom-shutdown-threads and shut down their
> threads gracefully.

I don't think ReverbConvolver uses XPCOM on its background thread and it uses
base::Thread to manage the thread.  I don't know what shutdown requirements
base::Thread has but the xpcom shutdown reasoning doesn't necessarily apply to
ReverbConvolver.
While threads are processing their initial runnable, which is the nsThreadPool itself, the pool will not be destroyed.  The threads remove themselves from mThreads before returning from Run().

This includes removal of changes from bug 900711.
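
A minimal standard-C++ sketch of that keep-alive invariant (not the XPCOM implementation; the pool would need to be created via std::make_shared for shared_from_this to work): each worker's entry point holds a strong reference to the pool, and deregisters itself before returning.

#include <memory>
#include <mutex>
#include <thread>

class SketchPool : public std::enable_shared_from_this<SketchPool> {
 public:
  void SpawnWorker() {
    // The lambda captures a strong reference, so the pool cannot be
    // destroyed while any worker is still inside Run().
    std::shared_ptr<SketchPool> self = shared_from_this();
    {
      std::lock_guard<std::mutex> lock(mMutex);
      ++mActiveThreads;
    }
    std::thread([self] { self->Run(); }).detach();
  }

 private:
  void Run() {
    // ... process queued events until the idle timeout expires ...
    std::lock_guard<std::mutex> lock(mMutex);
    // Deregister before returning, mirroring how nsThreadPool workers
    // remove themselves from mThreads before leaving Run().
    --mActiveThreads;
  }

  std::mutex mMutex;
  int mActiveThreads = 0;
};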
Attachment #818822 - Flags: review?(benjamin)
Blocks: 928222
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #14)
> Exiting the process isn't the issue, it's what those threads might be doing,
> including using the component manager or thread manager globals and locks,
> which will already be dead.

OK, thanks.  nsThreadPool threads don't necessarily need to run XPCOM code
(though they do in the current implementation), but it is likely that they
will, so joining them during xpcom-shutdown-threads sounds good.

> Callers really are responsible for shutting down
> threadpools during xpcom-shutdown-threads,

That's the existing model but we have a choice of either doing that from every
client or from nsThreadPool, the code creating the threads.

I'd prefer to join the threads from nsThreadPool so that each client
doesn't need to do this and so that similar bugs like this can't happen.
Shutting down thread pools is difficult to get right, and running nested event
loops leads to problems.

> just as they are responsible for
> shutting down individual threads, to make sure that no XPCOM code is running
> at the time of shutdown.

It would be nice if nsIThread clients didn't need to call shutdown before
removing the last reference, but at least nsThreadManager does already ensure
that shutdown() is called on nsIThreads after xpcom-shutdown-threads.
NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID is sent before gXPCOMThreadsShutDown,
which leads to a race condition if thread creation is blocked on
gXPCOMThreadsShutDown and objects are created after
NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID is sent.
nsObserverList::NotifyObservers does not notify observers registered during
notification.  nsThreadPool can be used from multiple threads, and so it is
subject to this race condition.  The timer thread is also not shut down until
after NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID has been sent.
nsIObserverService::addObserver() is not thread-safe.
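
A condensed illustration of that ordering gap, in plain C++ with stand-in functions rather than the real observer service and flag:

#include <atomic>

std::atomic<bool> gThreadsShutDown{false};   // stands in for gXPCOMThreadsShutDown

void NotifyObservers(const char*) {}          // stand-in for the observer service
void RegisterShutdownObserver() {}            // stand-in for addObserver()

// Main thread, during XPCOM shutdown:
void ShutdownSequence() {
  NotifyObservers("xpcom-shutdown-threads");  // 1. existing observers notified
  gThreadsShutDown = true;                    // 2. flag only set afterwards
}

// Another thread, creating a pool lazily:
void MaybeCreatePool() {
  if (!gThreadsShutDown) {        // may still be false between steps 1 and 2
    RegisterShutdownObserver();   // too late: observers registered during the
                                  // notification are never called, so this
                                  // pool would never be told to shut down
  }
}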

This all makes it better to shut down nsThreadPools at the same time as the
nsThreadManager than to have each nsThreadPool try to observe
NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID.

Calling ShutdownPools() before other nsIThread::Shutdown() happens to mean this is enough to resolve the issues in this bug.  I've filed bug 928222 for
simplifying the thread management in nsThreadPool, which makes this ordering
irrelevant.
Attachment #818828 - Flags: review?(benjamin)
It looks like WMFByteStream may have a similar bug to AudioContext.

In RasterImage, bug 857367 comment 15 indicates that dispatching
DecodePool::RequestDecode() to the main thread after xpcom-threads-shutdown was
a side effect of code clean-up, so failing to dispatch after shutdown will be
fine.  It looks like RasterImage works around the
xpcom-threads-shutdown/gXPCOMThreadsShutDown race only because the DecodePool
is usually instantiated before xpcom-threads-shutdown.

I suspect TransactionThreadPool needs its explicit nsIThreadPool shutdown to
ensure ordering so I've left that.

Please let me know if you'd prefer I ask the individual module owners to review these changes.
Attachment #818829 - Flags: review?(benjamin)
Attachment #818822 - Flags: review?(benjamin) → review+
Comment on attachment 818828 [details] [diff] [review]
join nsThreadPool threads from nsThreadPool during xpcom thread shutdown

I'm a bit torn on this. I think that clients should still be expected to shut down their own thread pools at an appropriate time (long before xpcom-shutdown-threads, in most cases!). However for stability, it's good that we are adding a last-ditch shutdown system. I really would like an assertion that there are no active threadpools, though.

I don't think that a hashmap is the right data structure for storing these, since random lookup really isn't required. Please use mozilla::LinkedList for this.
Attachment #818828 - Flags: review?(benjamin) → review-
Comment on attachment 818829 [details] [diff] [review]
part 5: remove now-unnecessary nsThreadPool shutdown code

I don't really understand what this code is doing, but it appears to mean that we hold a static ref sThreadPool and never call Shutdown on that threadpool? That should trigger the leak detector and I think we should continue to shut down every threadpool at some point.
Attachment #818829 - Flags: review?(benjamin)
Run() runs only once on each thread, so it always needs to shut down the
thread if it is exiting/returning.

The next patch uses this invariant.
Attachment #819601 - Flags: review?(benjamin)
Thank you for looking at this Benjamin.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> I'm a bit torn on this. I think that clients should still be expected to
> shut down their own thread pools at an appropriate time (long before
> xpcom-shutdown-threads, in most cases!). However for stability, it's good
> that we are adding a last-ditch shutdown system. I really would like an
> assertion that there are no active threadpools, though.

nsIThreadPools are different from nsIThreads, so I'm not understanding why you
want to require an explicit shutdown() call.

2 things we want when using these interfaces are:

1) Ensure that threads are joined at some point, so they are not leaked.

2) Maintain ordering requirements for operations on the thread or pool and
   those on the owning thread.

The patches here ensure that we have each of these at shutdown, so I assume
there is only the question of what needs to happen when shutdown is not
happening in the near future.

With nsIThreads, the thread will not be joined if shutdown() is not called, in
which case it would leak, and so explicit shutdown() is necessary.  I would
like to find another solution for this, but not in this bug.

With nsIThreadPools, however, the pool's threads will initiate their join when
they have no more tasks to run.  Provided we stop dispatching more events, the
pool's threads will be joined at some point, and we automatically have what we
want for (1), even without an explicit shutdown.
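
Roughly, the idle behaviour being relied on looks like this (plain C++ with a hypothetical queue and timeout, rather than the real mEvents monitor and mIdleThreadTimeout):

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

struct SketchWorkerState {
  std::mutex mutex;
  std::condition_variable cv;
  std::queue<std::function<void()>> events;
  std::chrono::seconds idleTimeout{60};  // analogous to mIdleThreadTimeout
};

void WorkerLoop(SketchWorkerState& s) {
  std::unique_lock<std::mutex> lock(s.mutex);
  for (;;) {
    if (!s.events.empty()) {
      auto task = std::move(s.events.front());
      s.events.pop();
      lock.unlock();
      task();        // run the event without holding the lock
      lock.lock();
      continue;
    }
    // Wait for more work; if nothing arrives before the idle timeout, return,
    // so the thread finishes (and can be joined) without an explicit
    // shutdown() call on the pool.
    if (s.cv.wait_for(lock, s.idleTimeout) == std::cv_status::timeout &&
        s.events.empty()) {
      return;
    }
  }
}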

Some clients will want to do additional cleanup only after the nsThreadPool
has run all its events, and so will call shutdown() on the pool, using nested
event loops to get a notification that the events have all run.

Other clients don't need to wait for all the events to run, and so don't need
the nested event loops.  It is better if nested event loops higher in the
stack are allowed to unroll.  e.g. JS code that runs after an alert will not
run until nested event loops lower on the stack, such as thread shutdowns,
unwind.  This is of course a bug in the alert implementation or specification,
but all nested event loops contribute to the problems of other nested event
loops.

The only advantage I see for having these clients explicitly shut down
nsIThreadPools is that the threads will not wait for mIdleThreadTimeout before
they are joined.  IMO that is not worth the cost of nested event loops.  If
there is a real need to join immediately, even when nothing is waiting for the
threads to finish, then I'd like to use an alternative approach that does not
use nested event loops.

> I don't think that a hashmap is the right data structure for storing these,
> since random lookup really isn't required.

Yes, you are right, of course.  A hashmap was not the right structure, thanks,
and there was an incorrect assumption that adding a strong reference to a weak
reference in the hashtable would be enough to keep the objects alive.

I've changed the approach so that pools are always removed from the list
before their last strong reference is removed.  This was easier with the
atomic changes to the number of threads from the patch in bug 928222, and so
this patch applies on top of that.

> Please use mozilla::LinkedList for this.

mozilla::LinkedListElement is designed for single thread use and LinkedList is
not suitable for a static variable.  The behavior of ~LinkedListElement() to
remove the element from the list could be suppressed by explicitly removing
while a lock is held in ~nsThreadPool, but allocation of a LinkedList object
on the heap is not required here.

I've used a simple implementation for the limited use of the linked list here.
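
For reference, the kind of "simple implementation" meant here might look like the following lock-protected intrusive list (hypothetical names; not the actual patch): live pools add themselves on creation and remove themselves before their last strong reference goes away, so shutdown can walk the list and join them.

#include <mutex>

class SketchPoolList;

// Base class a pool would inherit from so it can sit in the list.
class SketchPoolEntry {
  friend class SketchPoolList;
  SketchPoolEntry* mNext = nullptr;
};

class SketchPoolList {
 public:
  void Add(SketchPoolEntry* aPool) {
    std::lock_guard<std::mutex> lock(mMutex);
    aPool->mNext = mHead;
    mHead = aPool;
  }

  void Remove(SketchPoolEntry* aPool) {
    std::lock_guard<std::mutex> lock(mMutex);
    for (SketchPoolEntry** link = &mHead; *link; link = &(*link)->mNext) {
      if (*link == aPool) {
        *link = aPool->mNext;
        return;
      }
    }
  }

 private:
  std::mutex mMutex;
  SketchPoolEntry* mHead = nullptr;
};

// A static instance avoids heap-allocating the list object itself.
static SketchPoolList sLivePools;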

Part 3 is the patch in bug 928222.

See https://tbpl.mozilla.org/?tree=Try&rev=3be0734b8a83 for complete files with all the changes.
Attachment #819604 - Flags: review?(benjamin)
Comment on attachment 818829 [details] [diff] [review]
part 5: remove now-unnecessary nsThreadPool shutdown code

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #22)
> Comment on attachment 818829 [details] [diff] [review]
> remove now-unnecessary nsThreadPool shutdown code
> 
> I don't really understand what this code is doing, but it appears to mean
> that we hold a static ref sThreadPool and never call Shutdown on that
> threadpool? That should trigger the leak detector and I think we should
> continue to shut down every threadpool at some point.

Yes, this changes the code to never call shutdown() on that pool.
However, the patch still leaves the code that releases sThreadPool and sets it
to null.  The thread pool cleans up its own threads and nothing is leaked.

WMFByteStream does not need to wait for the events on the pool to run or for
the threads to join, but these do happen.
Attachment #818829 - Attachment description: remove now-unnecessary nsThreadPool shutdown code → part 5: remove now-unnecessary nsThreadPool shutdown code
Attachment #818828 - Attachment is obsolete: true
Comment on attachment 818822 [details] [diff] [review]
remove unnecessary shutdown code from ~nsThreadPool

Checked in for bug 928222.
https://hg.mozilla.org/integration/mozilla-inbound/rev/936fe240cb00
Attachment #818822 - Flags: checkin+
No longer blocks: 928222
Depends on: 928222
Whiteboard: [leave open]
Comment on attachment 819601 [details] [diff] [review]
part 2: turn always-true conditional for thread shutdown into an assert

This is not always true until we have the change in shutdown implementation from bug 928222.
Attachment #819601 - Flags: review?(benjamin)
Comment on attachment 819604 [details] [diff] [review]
part 4: join nsThreadPool threads from nsThreadPool during xpcom thread shutdown v2

Removing review request here until bug 928222 is sorted out.
Attachment #819604 - Flags: review?(benjamin)
Comment on attachment 819601 [details] [diff] [review]
part 2: turn always-true conditional for thread shutdown into an assert

This is now part of attachment 824412 [details] [diff] [review] in bug 928222.
Attachment #819601 - Attachment is obsolete: true
Merged with changes in bug 928222.  See comment 24 for explanation.
Attachment #819604 - Attachment is obsolete: true
Attachment #824416 - Flags: review?(benjamin)
Whiteboard: [leave open] → [leave open][games]
Blocks: gecko-games
Blocks: 936317
Attachment #824416 - Flags: review?(benjamin) → review+
Severity: normal → S3