Closed Bug 687726 Opened 13 years ago Closed 13 years ago

Shutdown the async thread of each connection before xpcom-shutdown.

Categories

(Core :: SQLite and Embedded Database Bindings, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 744294

People

(Reporter: espindola, Assigned: espindola)

References

Details

Attachments

(1 file)

This is part of 673017. In

https://tbpl.mozilla.org/?tree=Try&rev=fb8a6119739d

I have a patch that causes the shutdown to spin. Ignore the XPCOM bits since that will be fixed by 687696.

The question is, is this the right thing to do or should there be another API to cause it to spin (and the xpcom shutdown procedure should use it in some way)
From what I can tell, Storage just relies on thread destruction doing the spinning and merging the async thread to main thread. This means runnables posted to the async thread thread may run pretty late (I guess at xpcom-shutdown-threads, that comes well after xpcom-shutdown), and that may be an issue with some runnables.
Thus the question here is mostly whether this is something Storage should take care of (maybe anticipating async thread spinning and closing in xpcom-will-shutdown?), or it should be handled apart in each Storage consumer who posts runnables to the Storage async thread.
If it is something the consumers should take care of, is there an API for it already? If so I would love to use to fix 673017. If not, I would propose implementing this first in the places shutdown (similar to what the above patch does, but without the bits that cover for not having a profile) and moving it to the consumers afterwards if so desired.
There is no such API, an implementable idea may be to have Storage shutdown() the async thread at xpcom-will-shutdown, ideally consumers should stop posting stuff to that thread much earlier, being profile shutdown dependent.

This requires feedback from Sdwilsh and Asuth though.
It's not immediately obvious from that bug what the problem Places is experiencing is.  It looks like the Places callbacks are being invoked after most of the system is dead, and Places isn't checking for that, so all kinds of invariants are being violated and crashes are happening.  Does that sound right?

Spinning a nested event loop is almost always a bad idea, definitely so if you are not very low level code.  What happens if that spin happens while the cursed Sync nested-event-loop spinning happens?

How about if we just have the Storage component cancel all outstanding statements at XPCOM shutdown time so that we don't call your callbacks.  If Places really needs to be doing something, it should probably interrupt the shutdown process via the mechanisms that exist for that (when querying whether shutdown is possible), rather than dragging them out.
(In reply to Marco Bonardo [:mak] from comment #3)
> There is no such API, an implementable idea may be to have Storage
> shutdown() the async thread at xpcom-will-shutdown, ideally consumers should
> stop posting stuff to that thread much earlier, being profile shutdown
> dependent.

I like the idea. We can probably use the close callback as in the above patch, but move the loop waiting for it to a xpcom shutdown message. I will give it a try.
 
> This requires feedback from Sdwilsh and Asuth though.
(In reply to Andrew Sutherland (:asuth) from comment #4)
> It's not immediately obvious from that bug what the problem Places is
> experiencing is.  It looks like the Places callbacks are being invoked after
> most of the system is dead, and Places isn't checking for that, so all kinds
> of invariants are being violated and crashes are happening.  Does that sound
> right?
> 
> Spinning a nested event loop is almost always a bad idea, definitely so if
> you are not very low level code.  What happens if that spin happens while
> the cursed Sync nested-event-loop spinning happens?
> 
> How about if we just have the Storage component cancel all outstanding
> statements at XPCOM shutdown time so that we don't call your callbacks.  If
> Places really needs to be doing something, it should probably interrupt the
> shutdown process via the mechanisms that exist for that (when querying
> whether shutdown is possible), rather than dragging them out.

Note that the crash I was seeing is because a callback created in places. There is stack trace in

https://bugzilla.mozilla.org/show_bug.cgi?id=673017#c237
(In reply to Marco Bonardo [:mak] from comment #3)
> There is no such API, an implementable idea may be to have Storage
> shutdown() the async thread at xpcom-will-shutdown, ideally consumers should
> stop posting stuff to that thread much earlier, being profile shutdown
> dependent.

Sorry, I skimmed this bit a little too much.  This sounds fairly reasonable, although there's definitely the UX issue of "I am trying to shutdown Firefox, but Places is doing something that takes forever and I don't want to wait for that", so I think canceling a bunch of stuff also sounds pretty reasonable.
Places creates a thread and is responsible for calling shutdown() on it during or before the xpcom-shutdown-threads notification. I believe this does what you want:

62    * Shutdown the thread.  This method prevents further dispatch of events to
63    * the thread, and it causes any pending events to run to completion before
64    * the thread joins (see PR_JoinThread) with the current thread.  During this
65    * method call, events for the current thread may be processed.

You may also want to cancel statements or do other work, but joining the thread is the minimum requirement from the XPCOM perspective.
Now that the simpler patches are in, I have pushed the remaining bits to

https://tbpl.mozilla.org/?tree=Try&rev=60830f3153d7

Note that according to Benjanmin's comment, this is not complete as it still doesn't shut down the thread. It should fix this observed failure as the event causing the crash was coming from a connection and those are closed.

So, is this OK or should this spinning be done when we observe xpcom-shutdown?

There is currently no easy way to access the thread that is used by the connection. We should probably add a Connection::Shutdown method, but we can do that as an independent patch.
(In reply to Andrew Sutherland (:asuth) from comment #4)
> It looks like the Places callbacks are being invoked after
> most of the system is dead, and Places isn't checking for that, so all kinds
> of invariants are being violated and crashes are happening.  Does that sound
> right?

the runnables we dispatch to the Storage async thread (to be serialized with the queries) are run too late, one of these runnables includes a scriptBlocker that runs after xpcom-shutdown when layout is dead, and causes all sort of cries. Other runnables we use should not have issues since they are really simple.

> How about if we just have the Storage component cancel all outstanding
> statements at XPCOM shutdown time so that we don't call your callbacks.

There is some problem with this, supposing the user requested a clear history on shutdown and there is outstanding async work before it, we'd cancel the user request and create a privacy problem. I suppose this may happen for other consumers.

(In reply to Andrew Sutherland (:asuth) from comment #7)
> Sorry, I skimmed this bit a little too much.  This sounds fairly reasonable,
> although there's definitely the UX issue of "I am trying to shutdown
> Firefox, but Places is doing something that takes forever and I don't want
> to wait for that", so I think canceling a bunch of stuff also sounds pretty
> reasonable.

We are already doing this now, when the thread is destroyed it gets spinned and then merged, this would not be different, would just happen earlier than at xpcom-shutdown-threads. Places doesn't do a lot of work at shutdown, but there may be some backlog due to link coloring and visits that delays this small amount of needed work so that it ends up being still queued up when we hit xpcom-shutdown. In this case is most likely due to the test shutting down too late (the patch for that has already been pushed, so that may already fix the only known case).

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Places creates a thread and is responsible for calling shutdown() on it
> during or before the xpcom-shutdown-threads notification.

as it is now I think this already happens at xpcom-shutdown-threads, as for any other Storage consumer. the thread is actually created by Storage, is the thread that it makes available to all it consumers, that's why I'm arguing it may be fixed more globally than for a single one.

> You may also want to cancel statements or do other work, but joining the
> thread is the minimum requirement from the XPCOM perspective.

That's my suggestion, shutdown all the Storage async threads for all consumers at xpcom-will-shutdown.
just a quick summary from a chat on IRC:

A proposed solution is
*) Record all created connections (or at least the threads they create).
*) Add a shutdown method to the connection that just shutdowns the thread.
*) In mozStorageService.cpp's Service::shutdown, call shutdown on every created connection.
Comment on attachment 561855 [details] [diff] [review]
Remember the connections we create and shutdown their threads.

Review of attachment 561855 [details] [diff] [review]:
-----------------------------------------------------------------

So, first of all I think is better if this goes through the module owner (sdwilsh), I'm mostly worried about thread safety issues I'm not aware of. He may also have better ideas.
I also think if this is going to be approved, it may want a cpp test that gets a handle to the thread, invokes shutdown and checks it has been correctly shutdown (but I'd suggest to wait approval before writing it)

::: storage/src/mozStorageConnection.h
@@ +158,5 @@
>    nsCString getFilename();
>  
> +  /**
> +   * Calls shutdown on mAsyncExecutionThread. The connection must be
> +   * closing.

the comment may be improved specifying the effect of shutting down async thread, the fact all events on it are processed and it gets merged to the service thread.

::: storage/src/mozStorageService.cpp
@@ +320,5 @@
>  {
> +  MutexAutoLock mutex(mRegistrationMutex);
> +  for (std::vector<Connection*>::iterator I = mConnections.begin(),
> +         E = mConnections.end(); I != E; ++I) {
> +    Connection *C = *I;

this looks strange, casting problems?

@@ +327,1 @@
>    NS_IF_RELEASE(sXPConnect);

you may want to sub scope the added part, the mutex doesn't need to protect the release

::: storage/src/mozStorageService.h
@@ +105,5 @@
>     */
>    static PRInt32 getSynchronousPref();
>  
> +  /**
> +   * Remeber that we have created this connection so that we can shut it down.

wants @param too, and wants comment regarding the fact it uses a certain mutex
Attachment #561855 - Flags: review?(mak77) → review?(sdwilsh)
I'll look at this later this week.
there's another thing I think it's wrong in the patch, the connection must be supported as a weak reference, it can go away before the service, thus you'd crash in that case. Only survived connections should be shutdown.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Places creates a thread and is responsible for calling shutdown() on it
> during or before the xpcom-shutdown-threads notification. I believe this
> does what you want:
It was my understanding that `nsIThreadManager` shut down threads at the right time.  Is that not correct?

Even then, "xpcom-shutdown-threads" is far later than "places-shutdown", which would be problematic, right?  This patch just makes it happen at "xpcom-shutdown", but even that is later than "places-shutdown" (IIRC).
What is "the right time"? In general no, you are responsible for shutting down threads explicitly. The thread manager *should* assert if there are any dangling threads that it has to clean up, and this is one of the culprits that are left.
(In reply to Shawn Wilsher :sdwilsh from comment #16)
> Even then, "xpcom-shutdown-threads" is far later than "places-shutdown",
> which would be problematic, right?  This patch just makes it happen at
> "xpcom-shutdown", but even that is later than "places-shutdown" (IIRC).

places-shutdown is just an event used to tell all places users to stop using the database and fire any remaining task, at that point asynchronous stuff has just to be executed and it doesn't matter much when that happens, thanks to asyncClose().
It would be fine for places if Storage shuts down the thread at xpcom-will-shutdown, the issue we hit is trying to do work after xpcom-shutdown. It would make even more sense if we'd catch Storage users trying to do work at xpcom-shutdown (bad) since those are profile-driven.
Blocks: 658305
So, espindola, if you may fix or answer issues in comment 13 and comment 15, and shutdown the thread at xpcom-will-shutdown (this is late enough since Storage uses profile files, nothing should use it after profile-before-change), I think that may be the right way to go. It may be possible to test that too by passing the shutdown notification to the service and trying to use some async API, that should fail.
I will. Just debugging the memory leak that showed up on the debug+opt builds on windows. This bug is next on my list.
Blocks: 686237
Comment on attachment 561855 [details] [diff] [review]
Remember the connections we create and shutdown their threads.

clearing request since not yet ready, will reask for review when we feel like it's ready for the tree.
Attachment #561855 - Flags: review?(sdwilsh)
Depends on: 696399, 696404
Summary: places shutdown process doesn't wait for connections to close → Storage should not shutdown async threads after xpcom-shutdown.
Rafael, is there any other bug that should be added to dependencies, before this one can be worked on? I added the ones I'm aware of.
Depends on: 702815
Summary: Storage should not shutdown async threads after xpcom-shutdown. → Shutdown the async thread of each connection before xpcom-shutdown.
I spun off bug 702815 to just add the list of connections.  This bug can deal with the shutdown issues.
Blocks: 704030
This is now a dup of 744294, no?
yeah, not exactly 100% the same (this was expected to spin earlier) but not worth to keep it open until we decide how to move on from the current spinning
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Product: Toolkit → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: