Closed Bug 718449 Opened 8 years ago Closed 8 years ago

History may dispatch events that use the database after asyncClose

Categories

(Toolkit :: Places, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: espindola, Assigned: espindola)

References

Details

Attachments

(2 files, 2 obsolete files)

The sequence of events is:

* NotifyVisitObservers is queued
* we send TOPIC_PROFILE_BEFORE_CHANGE
* The Database.cpp observer calls asyncclose on the connection
* NotifyVisitObservers::Run tries to use the connection
I can't see where it tries to use the connection, unless it's an onVisit observer using it?

Btw, I think it's not the only issue on shutdown, for example if we remove a visit to a download we notify the download manager of the removal, but this notification may happen after profile-before-change, at that point the DM connection has gone.
This latter case may be fixed spinning the events loop in Places profile-before-change till it's closed.
It is in a onVisit, yes:

      navHistory->NotifyOnVisit(uri, mPlace.visitId, mPlace.visitTime,
                                mPlace.sessionId, mReferrer.visitId,
                                mPlace.transitionType, mPlace.guid);

The funny thing is that I found this trying to spin the loop in places :-)
I am testing a combined patch that uses  TOPIC_PLACES_SHUTDOWN to make sure the NotifyVisitObservers are processed in time. If that works I will split it and attached it to this bug.
https://tbpl.mozilla.org/?tree=Try&rev=e9a31e9f5a06

In the first patch I had missed the ! in

 NS_PRECONDITION(!NS_IsMainThread(

and was thinking that both Run methods were in the main thread. This patch fixes this by using a single atomic counter. It also uses an RAII helper to make sure we always decrement the counters.
Attachment #588961 - Attachment is obsolete: true
Attachment #588961 - Flags: review?(mak77)
Attachment #589472 - Flags: review?(mak77)
Comment on attachment 589472 [details] [diff] [review]
wait for NotifyVisitObservers and InsertVisitedURIs to finish

Review of attachment 589472 [details] [diff] [review]:
-----------------------------------------------------------------

I was thinking of an alternative to the counter, for a couple reasons: atomic increment/decrement is not cheap, and it will be done for each visit even if we actually won't use it till shutdown and we may not even need it (in most cases will be 0 on shutdown). Both the counter and IsShutdown are accessed by both threads, luckily we check on the same thread as the increment so shouldn't be problematic, but I fear future changes here (so if we keep this approach I want some deeper documenting comment).

We may have a spinner class implementing nsrunnable, similar to the one we use in tests, we may post it to the async thread and immediately start spinning mainthread, like you do here. It would be enqueued after any async statement or event. When that event will run it will post an event to the mainthread, this event should then be enqueued after anything created by the other async events. We just have to spin till this last mainthread event arrives and sets something (A property may be fine). insertVisitedURIs::Start should bail out on shutdown, if it doesn't already.
The downside of this approach is that we will have to spin on shutdown regardless of having pending stuff or less, that is still not funny.

Now, to the review, since this is still a plausible approach, we can just discuss what's the best way on.

::: toolkit/components/places/History.cpp
@@ +67,5 @@
>  using namespace mozilla::dom;
>  using mozilla::unused;
>  
> +namespace {
> +  class AtomicDecAux {

I's prefer a slightly more verbose AtomicDecrementScoper

@@ +73,5 @@
> +  public:
> +    AtomicDecAux(PRInt32 *a) : m(a) {}
> +    ~AtomicDecAux() {
> +      PR_AtomicDecrement(m);
> +    }

Please add some debug code+assert that constructor and destructor are invoked on the same thread.

@@ +1662,5 @@
>  
>  void
>  History::Shutdown()
>  {
>    NS_ASSERTION(!mShuttingDown, "Shutdown was called more than once!");

please add a moz_assert NS_IsMainThread, also please convert this assertion to a MOZ_ASSERT.

@@ +1666,5 @@
>    NS_ASSERTION(!mShuttingDown, "Shutdown was called more than once!");
>  
> +  nsCOMPtr<nsIThread> thread = do_GetCurrentThread();
> +  while (mNumNotifyVisitObservesAndInsertVisitedURIs != 0)
> +    NS_ProcessNextEvent(thread);

always brace loops

::: toolkit/components/places/History.h
@@ +72,5 @@
>    NS_DECL_NSIOBSERVER
>  
>    History();
>  
> +  PRInt32 mNumNotifyVisitObservesAndInsertVisitedURIs;

let's just call it mNumPendingVisitEvents, and add a nice comment on what it tracks and why (mostly a description of this bug)
Also ensure to make clear this may be accessed across threads, that the increments/decrements should only be atomic.
Attachment #589472 - Flags: review?(mak77)
Hm, my above thinking on events and spinning may maybe be used in Database.cpp before asyncClose() as a centralized solution? What do you think?
> I was thinking of an alternative to the counter, for a couple reasons:
> atomic increment/decrement is not cheap,

This code uses a mutex to get a pointer to a thread object that will parse a SQL command. The additional cost of an atomic increment on intel is 0 and on ARM it will not be worse than a memory barrier.

> We may have a spinner class implementing nsrunnable, similar to the one we
> use in tests, we may post it to the async thread and immediately start
> spinning mainthread, like you do here. It would be enqueued after any async
> statement or event. When that event will run it will post an event to the
> mainthread, this event should then be enqueued after anything created by the
> other async events. We just have to spin till this last mainthread event
> arrives and sets something (A property may be fine).

This is a *lot* more complex! How would you implement an enqueue operation faster than an atomic inc? The overall options we have in here fall in the following categories

*) Keep track of things that can use the connection and spin on them (this patch).
*) Mark that the connection is going away (atomically!), and make NotifyVisitObservers and InsertVisitedURIs Run methods do nothing if that is the case. This assumes that the semantics of both is such that they can be aborted. With this we don't need to spin.
*) Mark that the connection is going away like in the second approach, but just prevent InsertVisitedURIs from creating NotifyVisitInfoCallback. With this we only have to spin on NotifyVisitInfoCallback.


I have opened a new bug for the changes to existing code, since they are unrelated. I will upload a new patch to this bug with the remaining comments.
No longer depends on: 720554
> This is a *lot* more complex! How would you implement an enqueue operation
> faster than an atomic inc? 

The point was not making it absolutely "faster", but paying a cost once, rather than on each visit. The main positive point though, would be that we may implement this in Database.cpp and have it as a catch-all solution, rather than having different spinners in different points.
At a certain point we'll have to spin in Database.cpp regardless, to be sure to handle all notifications before asyncClose, and while this patch fixes the History case, I'm pretty sure there are another couple cases: one surely in icons that have a similar structure to these History runnables, one surely in downloads that may try to handle notifications really too late. That would require dedicated spinners in each component, that would be a mess. Thus having a centralized solution would be a win in the long term.
Though, I'm not saying we should do that immediately, just that I'd like you to evaluate that, and file a bug to implementing something like and remove these changes, since would then be obsoleted.
Attachment #590952 - Flags: review? → review?(mak77)
> The point was not making it absolutely "faster", but paying a cost once,
> rather than on each visit. The main positive point though, would be that we
> may implement this in Database.cpp and have it as a catch-all solution,
> rather than having different spinners in different points.

Let me be clear on this. Your assert that the semantics of these two methods is such that it is OK if them to do *nothing* after we call async close? If that is not the case, we *must* spin. When async close is called it will atomically and immediately set the connection in a closed state.

At this point InsertVisitedURIs::Run might be active in another thread, with the NotifyTitleObservers it creates already allocated. That NotifyTitleObservers will fail do to its job.

Since there is work being done in two threads, just spinning the loop is not sufficient. The other thread can add work just after we think we are done.

If the Run methods can do nothing after the call to async close, what History::Shutdown can do is set a flag while being guarded by a lock. Similar to the way connection SetClosedState works. The Run methods would then grab that look too and if we are shutting down, they would return.

> At a certain point we'll have to spin in Database.cpp regardless, to be sure
> to handle all notifications before asyncClose, and while this patch fixes
> the History case, I'm pretty sure there are another couple cases: one surely
> in icons that have a similar structure to these History runnables, one
> surely in downloads that may try to handle notifications really too late.
> That would require dedicated spinners in each component, that would be a
> mess. Thus having a centralized solution would be a win in the long term.
> Though, I'm not saying we should do that immediately, just that I'd like you
> to evaluate that, and file a bug to implementing something like and remove
> these changes, since would then be obsoleted.

As I noted in the thread dolske started, any async work done by an observer can causes this kind of problem, since the caller of NotifyObservers expects the work to be done when NotifyObservers returns. We need an Observe method that takes a callback and indicates it is done by calling it. This would centralize *all* the spinning coming from observers doing async work (like in this case).

I would really like to at least be able to assert that we close the connections before side tracking into another painful task.
We are basically saying the same things, and proposing two different approaches that will bring to the same result, so no reasons to discuss what's the goal.
My solution can be centralized to a single point and involves 2 runnables and a spin that must always run, your solution is per-component, and involves atomic counters and a spin per-component that may have to run. Both will work, no doubts.

That said, I'm fine taking this, if we make a bug to investigate the centralized solution, I'd not want another spinning in another component like favicons, when we can just have one "global" (to Places) that handles all Places observers. Also because as you said in future we may have a callback solution that avoid to spin, then we could remove the single spin point.
You have not answered the question of both events being cancelable or not.

I insist that spinning just in places *will not do*.

Places sends TOPIC_PLACES_SHUTDOWN. By the time notifyObverser returns, we call asyncClose. Since asyncClose sets the connection state to closed, anything that might use that connection has to be schedule *before* the call to asyncClose.

That include things running in *other threads*. Places can spin as much as it wants and the kernel might not even schedule that thread during that time.

Now, places could do the counting. It could pass a pointer to an atomic counter to anything that might use the database. If you don't want atomics, those tasks could send Increment and Decrement messages. But then we would have to spin the loop in places twice: once to make sure anything that might use a connection is scheduled and after we call asyncClose to make sure the thread is gone.

Both are wasteful and involve refactoring to implement a bad solution. The real problem we have is the mismatch of a sync notify/observe api and observers doing async work.

When you say you would not want another spin. Does that mean I should give up all hope of fixing bug 711076 if any other dependency like this shows up?
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #12)

I don't think we should cancel anything for now. It's something to be evaluated later for the risk of regressions.

> Places sends TOPIC_PLACES_SHUTDOWN. By the time notifyObverser returns, we
> call asyncClose.

Yep, I'd spin before we call asyncClose, at that time anything that the observers may have generated should be scheduled. The double runnable would do the rest.

> Both are wasteful and involve refactoring to implement a bad solution.

I disagree, my solution doesn't involve any refactoring, just adding a couple runnable classes, dispatching one to the asyncthread and spinning, just before asyncClose.

> When you say you would not want another spin. Does that mean I should give
> up all hope of fixing bug 711076 if any other dependency like this shows up?

Yes, if they involve spinning in multiple Places components, we should try to find a centralized solution, rather than having spinners spread everywhere. spinners may introduce subtle bugs, so the less we have the better is.
> Yep, I'd spin before we call asyncClose, at that time anything that the
> observers may have generated should be scheduled. The double runnable would
> do the rest.
>
> > Both are wasteful and involve refactoring to implement a bad solution.
> 
> I disagree, my solution doesn't involve any refactoring, just adding a
> couple runnable classes, dispatching one to the asyncthread and spinning,
> just before asyncClose.

You have to send two messages. Basically the current atomic in becomes an send "Something that uses the db wants to run" and the dec becomes a message "Something that uses the db has finished running".

And you still spin the loop twice! Once before calling asyncClose. One after.
 
> > When you say you would not want another spin. Does that mean I should give
> > up all hope of fixing bug 711076 if any other dependency like this shows up?
> 
> Yes, if they involve spinning in multiple Places components, we should try
> to find a centralized solution, rather than having spinners spread
> everywhere. spinners may introduce subtle bugs, so the less we have the
> better is.

I guess I just have to find something else to work then.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #14)
> You have to send two messages. Basically the current atomic in becomes an
> send "Something that uses the db wants to run" and the dec becomes a message
> "Something that uses the db has finished running".
> 
> And you still spin the loop twice! Once before calling asyncClose. One after.

Why should I spin after it?
1. TOPIC SHUTDOWN
2. Observers enqueue stuff to async thread
3. enqueue final-runnable to the async thread
4. spin till we get mainthread-runnable-topic
5. async-runnables run, they may enqueue mainthread runnables
6. final-runnable runs (no more async runnables exist), enqeues mainthread-runnable
7. mainthread-runnable runs (no more mainthread runnables) and fires mainthread-runnable-topic
8. asyncClose()

Now, surely it may be possible that mainthread runnables enqueued by async runnables enqueue another async runnable, but that'd be an edge case we should try to avoid.

> I guess I just have to find something else to work then.

This negative attitude doesn't help, I don't think discussing solutions to find good approaches has anything wrong in it.
> Why should I spin after it?
> 1. TOPIC SHUTDOWN
> 2. Observers enqueue stuff to async thread
> 3. enqueue final-runnable to the async thread
> 4. spin till we get mainthread-runnable-topic
> 5. async-runnables run, they may enqueue mainthread runnables
> 6. final-runnable runs (no more async runnables exist), enqeues
> mainthread-runnable
> 7. mainthread-runnable runs (no more mainthread runnables) and fires
> mainthread-runnable-topic
> 8. asyncClose()

9. The connection thread keeps running until after xpcom-shutdown-threads if nothing waits for it. Which is how we started finding these bugs in the first place.

> > I guess I just have to find something else to work then.
> 
> This negative attitude doesn't help, I don't think discussing solutions to
> find good approaches has anything wrong in it.

Really? I say that blocking fixing P1 bugs on a refactoring that would take months (adding an ObserveAsync) or moving the code in a direction that would be reverted once that refactoring is done is what is not very productive.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #16)
> 9. The connection thread keeps running until after xpcom-shutdown-threads if
> nothing waits for it. Which is how we started finding these bugs in the
> first place.

I don't see how your patch here solves this problem... That is a separate bug that should be fixed in storage, doesn't look related to this bug or its solution.

> Really? I say that blocking fixing P1 bugs on a refactoring that would take
> months (adding an ObserveAsync)

I never suggested this... My local Places solution may take a couple days maximum, I'd never block on waiting a large refactoring like ObserveAsync.

> or moving the code in a direction that would
> be reverted once that refactoring is done is what is not very productive.

Right, that was my fear and the reason I started this discussion, your changes here will be reverted once we implement a solution in Places Database.cpp. As I said I'm not interested in throwing away this work, but if we start having to special case each Places component I'd prefer starting now to think to a central solution before having to do it later and revert all of this work.
> I don't see how your patch here solves this problem... That is a separate
> bug that should be fixed in storage, doesn't look related to this bug or its
> solution.

It doesn't. It adds on loop spinning. The next bug would add the next one. You proposal just adds a loop spinning in a different location.


> > or moving the code in a direction that would
> > be reverted once that refactoring is done is what is not very productive.
> 
> Right, that was my fear and the reason I started this discussion, your
> changes here will be reverted once we implement a solution in Places
> Database.cpp. As I said I'm not interested in throwing away this work, but
> if we start having to special case each Places component I'd prefer starting
> now to think to a central solution before having to do it later and revert
> all of this work.

And your solution which is just a mechanical replacement of inc/dec with message sends is an ever larger bunch of code to be reverted.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #18)
> It doesn't. It adds on loop spinning. The next bug would add the next one.
> You proposal just adds a loop spinning in a different location.

Sure, that's why I said we are trying to reach the same goal.  On the other side it has the potential to solve this issue for any other places observer and the async favicons service, while the current solution here only covers History.
Surely both may have pro and cons, that is what we are supposed to evaluate.

> And your solution which is just a mechanical replacement of inc/dec with
> message sends is an ever larger bunch of code to be reverted.

It would be larger, though being a couple classes would be mostly adjacent code in the anonymous namespace, and we should not remove it from history, favicons and other components, but just from Database.

As I said I don't pretend we do this immediately, but if we end up doing this same inc/dec trick in multiple components that depend on Places shutdown, my solution would gain more pros. That's why I'd suggest to file a bug to evaluate a more global approach before going head down in special casing each Places observer, and in the meanwhile take this to proceed.
> As I said I don't pretend we do this immediately, but if we end up doing
> this same inc/dec trick in multiple components that depend on Places
> shutdown, my solution would gain more pros. That's why I'd suggest to file a
> bug to evaluate a more global approach before going head down in special
> casing each Places observer, and in the meanwhile take this to proceed.

OK. So one step at a time. My patch adds one loop spinning. As would yours. Is my patch OK? What would I *have* to change on *only* this patch to fix *only* this bug for it to be OK?
(In reply to Marco Bonardo [:mak] from comment #13)
> I don't think we should cancel anything for now. It's something to be
> evaluated later for the risk of regressions.

Marco i think we *should* explore that here. The code is broken as is and one way to fix it is to allow the request to be cancelled. It's the simpler way and it would avoid slowing down our shutdown further(as spinning the event loop would).

Here is Rafael's summary on how broken this is now:
10:15 <espindola> we call asyncClose, which sets the state to closed
10:15 <espindola> but doesn't close it for real
10:15 <espindola> when an async stmt is created, it gets a pointer to the exec thread
10:16 <espindola> so if one was already created before asyncClose was called, it will execute normally
10:16 <espindola> if we try to create one after (as we don now)
10:16 <espindola> it gets a NULL thread and fails
10:17 <espindola> leaks, doesn't notify anything that might be waiting for it, etc
10:17 <espindola> NS_ASSERTION returns an error code
10:18 <espindola> I found this when I tried to spin the loop after places closes its database

We know we need to start figuring out how to cancel things soon, so we might as well start here.
I will try to upload a patch with taras suggestion. It will also avoid all spinning in History.cpp, since we don't have to wait for a method we know will do nothing :-)
(In reply to Taras Glek (:taras) from comment #21)
> (In reply to Marco Bonardo [:mak] from comment #13)
> > I don't think we should cancel anything for now. It's something to be
> > evaluated later for the risk of regressions.
> 
> Marco i think we *should* explore that here. The code is broken as is and
> one way to fix it is to allow the request to be cancelled. It's the simpler
> way and it would avoid slowing down our shutdown further(as spinning the
> event loop would).

Then this won't land till we merge, since I feel like we are underestimating the risk here, and there is no reason to rush things just to try if something may work.

> Here is Rafael's summary on how broken this is now:

This describes well the problem, it's known from some time and part of it is by design (that may be wrong, but it's how things were supposed to work).

> We know we need to start figuring out how to cancel things soon, so we might
> as well start here.

Sure, in another bug, since it's not the scope of this bug.  Then we can evaluate which approach we prefer and wontfix the other one.
I exactly didn't want to rush this fix to evaluate from a larger point of view the problem, so I can only be happy if we start really evaluating it.
https://tbpl.mozilla.org/?tree=Try&rev=94cd9987869f

This patches uses a technique very similar to the SetCloseState in Connection. We grab a lock for setting mShuttingDown and other threads that might schedule uses of the connection grab it too.

That is the easy part. Figuring out what we are not executing is the hard one. I was able to track:

* InsertVisitedURIs::DoDatabaseInserts. This is what records the URI being visited.
*  navHistory->NotifyOnVisit. This calls OnVisit on many observers. I think they can be nsNavHistoryQueryResultNode::OnVisit, nsNavHistoryResult::OnVisit, nsNavBookmarks::OnVisit or  nsDownloadManager::OnVisit. I haven't tracked them all to find which ones might write to disk.
* obsService->NotifyObservers(uri, URI_VISIT_SAVED, nsnull). This is for testing only. We should probably move it inside an #ifdef ...
*  history->NotifyVisited(uri). Calls link->SetLinkState(eLinkState_Visited) which calls UpdateState on the ellement. I guess anything could plug in here...

So the executive summary is that we would not be recording the last visited url and there might be something more on down the notifications paths.

Is this OK? Despite the above issues, I think it is. The reasoning is:

* This already happens most of the time. I notice places being used after asyncClose because I added a spin to wait for the closing and got failures coming from:

  nsIEventTarget *target = aConnection->getAsyncExecutionTarget();
  NS_ENSURE_TRUE(target, NS_ERROR_NOT_AVAILABLE);

Given that we don't normally see them, this implies that we normally quit before this has a chance to run. We are just making that decision explicit and the code well behaved when it does run.

Additionally, "visited" is a fuzzy concept. Do we provide any guarantees on what shows up as visited? Any URI we asked for? Any URI we finished downloading?

Other additional advantages are that we don't have to spin (since the Run methods will do nothing) and this is also a step reducing work we have to do during shutdown.
Attachment #591924 - Flags: review?(mak77)
Attachment #591924 - Flags: feedback?(taras.mozilla)
Looks like the previous try got a bad base. A better looking one is at

https://tbpl.mozilla.org/?tree=Try&rev=3fafaa878355
Comment on attachment 591924 [details] [diff] [review]
Don't run NotifyVisitObservers or InsertVisitedURIs after shutdown

this looks reasonable to me, but this should've gone in a new bug.
Attachment #591924 - Flags: feedback?(taras.mozilla) → feedback+
Comment on attachment 590952 [details] [diff] [review]
wait for NotifyVisitObservers and InsertVisitedURIs to finish

Per discussion, I don't particularly like this local spinning solution. Looking into the other one.
Attachment #590952 - Flags: review?(mak77) → feedback-
Comment on attachment 591924 [details] [diff] [review]
Don't run NotifyVisitObservers or InsertVisitedURIs after shutdown

Review of attachment 591924 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, I'm fine with taking this on early FF13, so we have enough time to bake it and see if it causes any troubles.

::: toolkit/components/places/History.cpp
@@ +511,5 @@
>    }
>  private:
>    VisitData mPlace;
>    VisitData mReferrer;
> +  History *mHistory;

I'd prefer keeping a strong nsRefPtr here, we don't want history to go away before us. Even if with your change that should really not happen. But better being paranoid.

@@ +1451,5 @@
>  History* History::gService = NULL;
>  
>  History::History()
> +  : mShuttingDown(false),
> +    mShutdownMutex("History::mShutdownMutex")

per coding style, comma aligned with the semicolon please (it saves 1 blame change). So
 : a
 , b

::: toolkit/components/places/History.h
@@ +143,5 @@
>  
> +  bool IsShuttingDown() const {
> +    return mShuttingDown;
> +  }
> +  Mutex &GetShutdownMutex() {

per code style, decorators towards types please.

@@ +185,5 @@
>    static History* gService;
>  
>    // Ensures new tasks aren't started on destruction.
>    bool mShuttingDown;
> +  Mutex mShutdownMutex;

Add some better comment above the mutex explaining when it's locked and what it prevents. Mostly a sum up of this bug for future reference.
Attachment #591924 - Flags: review?(mak77) → review+
Summary: places database can be used after asyncshutdown is called → History may dispatch events that use the database after asyncClose
https://hg.mozilla.org/mozilla-central/rev/decd5f642268
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
You need to log in before you can comment on or make changes to this bug.