687726 - Shutdown the async thread of each connection before xpcom-shutdown.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Description

•

13 years ago

This is part of 673017. In

https://tbpl.mozilla.org/?tree=Try&rev=fb8a6119739d

I have a patch that causes the shutdown to spin. Ignore the XPCOM bits since that will be fixed by 687696.

The question is, is this the right thing to do or should there be another API to cause it to spin (and the xpcom shutdown procedure should use it in some way)

Marco Bonardo [:mak]

Comment 1

•

13 years ago

From what I can tell, Storage just relies on thread destruction doing the spinning and merging the async thread to main thread. This means runnables posted to the async thread thread may run pretty late (I guess at xpcom-shutdown-threads, that comes well after xpcom-shutdown), and that may be an issue with some runnables.
Thus the question here is mostly whether this is something Storage should take care of (maybe anticipating async thread spinning and closing in xpcom-will-shutdown?), or it should be handled apart in each Storage consumer who posts runnables to the Storage async thread.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 2

•

13 years ago

If it is something the consumers should take care of, is there an API for it already? If so I would love to use to fix 673017. If not, I would propose implementing this first in the places shutdown (similar to what the above patch does, but without the bits that cover for not having a profile) and moving it to the consumers afterwards if so desired.

Marco Bonardo [:mak]

Comment 3

•

13 years ago

There is no such API, an implementable idea may be to have Storage shutdown() the async thread at xpcom-will-shutdown, ideally consumers should stop posting stuff to that thread much earlier, being profile shutdown dependent.

This requires feedback from Sdwilsh and Asuth though.

Andrew Sutherland [:asuth] (he/him)

Comment 4

•

13 years ago

It's not immediately obvious from that bug what the problem Places is experiencing is.  It looks like the Places callbacks are being invoked after most of the system is dead, and Places isn't checking for that, so all kinds of invariants are being violated and crashes are happening.  Does that sound right?

Spinning a nested event loop is almost always a bad idea, definitely so if you are not very low level code.  What happens if that spin happens while the cursed Sync nested-event-loop spinning happens?

How about if we just have the Storage component cancel all outstanding statements at XPCOM shutdown time so that we don't call your callbacks.  If Places really needs to be doing something, it should probably interrupt the shutdown process via the mechanisms that exist for that (when querying whether shutdown is possible), rather than dragging them out.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 5

•

13 years ago

(In reply to Marco Bonardo [:mak] from comment #3)
> There is no such API, an implementable idea may be to have Storage
> shutdown() the async thread at xpcom-will-shutdown, ideally consumers should
> stop posting stuff to that thread much earlier, being profile shutdown
> dependent.

I like the idea. We can probably use the close callback as in the above patch, but move the loop waiting for it to a xpcom shutdown message. I will give it a try.
 
> This requires feedback from Sdwilsh and Asuth though.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 6

•

13 years ago

(In reply to Andrew Sutherland (:asuth) from comment #4)
> It's not immediately obvious from that bug what the problem Places is
> experiencing is.  It looks like the Places callbacks are being invoked after
> most of the system is dead, and Places isn't checking for that, so all kinds
> of invariants are being violated and crashes are happening.  Does that sound
> right?
> 
> Spinning a nested event loop is almost always a bad idea, definitely so if
> you are not very low level code.  What happens if that spin happens while
> the cursed Sync nested-event-loop spinning happens?
> 
> How about if we just have the Storage component cancel all outstanding
> statements at XPCOM shutdown time so that we don't call your callbacks.  If
> Places really needs to be doing something, it should probably interrupt the
> shutdown process via the mechanisms that exist for that (when querying
> whether shutdown is possible), rather than dragging them out.

Note that the crash I was seeing is because a callback created in places. There is stack trace in

https://bugzilla.mozilla.org/show_bug.cgi?id=673017#c237

Andrew Sutherland [:asuth] (he/him)

Comment 7

•

13 years ago

(In reply to Marco Bonardo [:mak] from comment #3)
> There is no such API, an implementable idea may be to have Storage
> shutdown() the async thread at xpcom-will-shutdown, ideally consumers should
> stop posting stuff to that thread much earlier, being profile shutdown
> dependent.

Sorry, I skimmed this bit a little too much.  This sounds fairly reasonable, although there's definitely the UX issue of "I am trying to shutdown Firefox, but Places is doing something that takes forever and I don't want to wait for that", so I think canceling a bunch of stuff also sounds pretty reasonable.

Benjamin Smedberg

Comment 8

•

13 years ago

Places creates a thread and is responsible for calling shutdown() on it during or before the xpcom-shutdown-threads notification. I believe this does what you want:

62    * Shutdown the thread.  This method prevents further dispatch of events to
63    * the thread, and it causes any pending events to run to completion before
64    * the thread joins (see PR_JoinThread) with the current thread.  During this
65    * method call, events for the current thread may be processed.

You may also want to cancel statements or do other work, but joining the thread is the minimum requirement from the XPCOM perspective.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 9

•

13 years ago

Now that the simpler patches are in, I have pushed the remaining bits to

https://tbpl.mozilla.org/?tree=Try&rev=60830f3153d7

Note that according to Benjanmin's comment, this is not complete as it still doesn't shut down the thread. It should fix this observed failure as the event causing the crash was coming from a connection and those are closed.

So, is this OK or should this spinning be done when we observe xpcom-shutdown?

There is currently no easy way to access the thread that is used by the connection. We should probably add a Connection::Shutdown method, but we can do that as an independent patch.

Marco Bonardo [:mak]

Comment 10

•

13 years ago

(In reply to Andrew Sutherland (:asuth) from comment #4)
> It looks like the Places callbacks are being invoked after
> most of the system is dead, and Places isn't checking for that, so all kinds
> of invariants are being violated and crashes are happening.  Does that sound
> right?

the runnables we dispatch to the Storage async thread (to be serialized with the queries) are run too late, one of these runnables includes a scriptBlocker that runs after xpcom-shutdown when layout is dead, and causes all sort of cries. Other runnables we use should not have issues since they are really simple.

> How about if we just have the Storage component cancel all outstanding
> statements at XPCOM shutdown time so that we don't call your callbacks.

There is some problem with this, supposing the user requested a clear history on shutdown and there is outstanding async work before it, we'd cancel the user request and create a privacy problem. I suppose this may happen for other consumers.

(In reply to Andrew Sutherland (:asuth) from comment #7)
> Sorry, I skimmed this bit a little too much.  This sounds fairly reasonable,
> although there's definitely the UX issue of "I am trying to shutdown
> Firefox, but Places is doing something that takes forever and I don't want
> to wait for that", so I think canceling a bunch of stuff also sounds pretty
> reasonable.

We are already doing this now, when the thread is destroyed it gets spinned and then merged, this would not be different, would just happen earlier than at xpcom-shutdown-threads. Places doesn't do a lot of work at shutdown, but there may be some backlog due to link coloring and visits that delays this small amount of needed work so that it ends up being still queued up when we hit xpcom-shutdown. In this case is most likely due to the test shutting down too late (the patch for that has already been pushed, so that may already fix the only known case).

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Places creates a thread and is responsible for calling shutdown() on it
> during or before the xpcom-shutdown-threads notification.

as it is now I think this already happens at xpcom-shutdown-threads, as for any other Storage consumer. the thread is actually created by Storage, is the thread that it makes available to all it consumers, that's why I'm arguing it may be fixed more globally than for a single one.

> You may also want to cancel statements or do other work, but joining the
> thread is the minimum requirement from the XPCOM perspective.

That's my suggestion, shutdown all the Storage async threads for all consumers at xpcom-will-shutdown.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 11

•

13 years ago

just a quick summary from a chat on IRC:

A proposed solution is
*) Record all created connections (or at least the threads they create).
*) Add a shutdown method to the connection that just shutdowns the thread.
*) In mozStorageService.cpp's Service::shutdown, call shutdown on every created connection.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 12

•

13 years ago

Attached patch Remember the connections we create and shutdown their threads. — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=698841da005c

Assignee: nobody → respindola

Status: NEW → ASSIGNED

Attachment #561855 - Flags: review?(mak77)

Marco Bonardo [:mak]

Comment 13

•

13 years ago

Comment on attachment 561855 [details] [diff] [review]
Remember the connections we create and shutdown their threads.

Review of attachment 561855 [details] [diff] [review]:
-----------------------------------------------------------------

So, first of all I think is better if this goes through the module owner (sdwilsh), I'm mostly worried about thread safety issues I'm not aware of. He may also have better ideas.
I also think if this is going to be approved, it may want a cpp test that gets a handle to the thread, invokes shutdown and checks it has been correctly shutdown (but I'd suggest to wait approval before writing it)

::: storage/src/mozStorageConnection.h
@@ +158,5 @@
>    nsCString getFilename();
>  
> +  /**
> +   * Calls shutdown on mAsyncExecutionThread. The connection must be
> +   * closing.

the comment may be improved specifying the effect of shutting down async thread, the fact all events on it are processed and it gets merged to the service thread.

::: storage/src/mozStorageService.cpp
@@ +320,5 @@
>  {
> +  MutexAutoLock mutex(mRegistrationMutex);
> +  for (std::vector<Connection*>::iterator I = mConnections.begin(),
> +         E = mConnections.end(); I != E; ++I) {
> +    Connection *C = *I;

this looks strange, casting problems?

@@ +327,1 @@
>    NS_IF_RELEASE(sXPConnect);

you may want to sub scope the added part, the mutex doesn't need to protect the release

::: storage/src/mozStorageService.h
@@ +105,5 @@
>     */
>    static PRInt32 getSynchronousPref();
>  
> +  /**
> +   * Remeber that we have created this connection so that we can shut it down.

wants @param too, and wants comment regarding the fact it uses a certain mutex

Attachment #561855 - Flags: review?(mak77) → review?(sdwilsh)

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Updated

•

13 years ago

Depends on: 688913

Shawn Wilsher :sdwilsh

Comment 14

•

13 years ago

I'll look at this later this week.

Marco Bonardo [:mak]

Comment 15

•

13 years ago

there's another thing I think it's wrong in the patch, the connection must be supported as a weak reference, it can go away before the service, thus you'd crash in that case. Only survived connections should be shutdown.

Shawn Wilsher :sdwilsh

Comment 16

•

13 years ago

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Places creates a thread and is responsible for calling shutdown() on it
> during or before the xpcom-shutdown-threads notification. I believe this
> does what you want:
It was my understanding that `nsIThreadManager` shut down threads at the right time.  Is that not correct?

Even then, "xpcom-shutdown-threads" is far later than "places-shutdown", which would be problematic, right?  This patch just makes it happen at "xpcom-shutdown", but even that is later than "places-shutdown" (IIRC).

Benjamin Smedberg

Comment 17

•

13 years ago

What is "the right time"? In general no, you are responsible for shutting down threads explicitly. The thread manager *should* assert if there are any dangling threads that it has to clean up, and this is one of the culprits that are left.

Marco Bonardo [:mak]

Comment 18

•

13 years ago

(In reply to Shawn Wilsher :sdwilsh from comment #16)
> Even then, "xpcom-shutdown-threads" is far later than "places-shutdown",
> which would be problematic, right?  This patch just makes it happen at
> "xpcom-shutdown", but even that is later than "places-shutdown" (IIRC).

places-shutdown is just an event used to tell all places users to stop using the database and fire any remaining task, at that point asynchronous stuff has just to be executed and it doesn't matter much when that happens, thanks to asyncClose().
It would be fine for places if Storage shuts down the thread at xpcom-will-shutdown, the issue we hit is trying to do work after xpcom-shutdown. It would make even more sense if we'd catch Storage users trying to do work at xpcom-shutdown (bad) since those are profile-driven.

Marco Bonardo [:mak]

Updated

•

13 years ago

Blocks: 658305

Marco Bonardo [:mak]

Comment 19

•

13 years ago

So, espindola, if you may fix or answer issues in comment 13 and comment 15, and shutdown the thread at xpcom-will-shutdown (this is late enough since Storage uses profile files, nothing should use it after profile-before-change), I think that may be the right way to go. It may be possible to test that too by passing the shutdown notification to the service and trying to use some async API, that should fail.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 20

•

13 years ago

I will. Just debugging the memory leak that showed up on the debug+opt builds on windows. This bug is next on my list.

Marco Bonardo [:mak]

Updated

•

13 years ago

Blocks: 686237

Marco Bonardo [:mak]

Comment 21

•

13 years ago

Comment on attachment 561855 [details] [diff] [review]
Remember the connections we create and shutdown their threads.

clearing request since not yet ready, will reask for review when we feel like it's ready for the tree.

Attachment #561855 - Flags: review?(sdwilsh)

Marco Bonardo [:mak]

Updated

•

13 years ago

Depends on: 696399, 696404

Summary: places shutdown process doesn't wait for connections to close → Storage should not shutdown async threads after xpcom-shutdown.

Marco Bonardo [:mak]

Comment 22

•

13 years ago

Rafael, is there any other bug that should be added to dependencies, before this one can be worked on? I added the ones I'm aware of.

Nicholas Nethercote [inactive]

Updated

•

13 years ago

Depends on: 702815

Marco Bonardo [:mak]

Updated

•

13 years ago

Summary: Storage should not shutdown async threads after xpcom-shutdown. → Shutdown the async thread of each connection before xpcom-shutdown.

Nicholas Nethercote [inactive]

Comment 23

•

13 years ago

I spun off bug 702815 to just add the list of connections.  This bug can deal with the shutdown issues.

Marco Bonardo [:mak]

Updated

•

13 years ago

Blocks: 704030

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 24

•

13 years ago

This is now a dup of 744294, no?

Marco Bonardo [:mak]

Comment 25

•

13 years ago

yeah, not exactly 100% the same (this was expected to spin earlier) but not worth to keep it open until we decide how to move on from the current spinning

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → DUPLICATE

BMO Automation

Updated

•

28 days ago

Product: Toolkit → Core