Closed Bug 556631 Opened 14 years ago Closed 14 years ago

Make frecency update async

Categories

(Toolkit :: Places, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b9
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: stechz, Assigned: mak)

References

Details

(Whiteboard: [fixed-in-places])

Attachments

(3 files, 6 obsolete files)

Figure out a way to make changes to a site's frecency asynchronous when docshell calls VisitURI.
Depends on: 556400
if we don't fix it, this will hurt navigation since now frecency calculation is done just after a visit is added, and visits are no more added lazily (but asynchronously).
Asking blocking because this should get some priority and could cause problems with ui responsiveness.
blocking2.0: --- → ?
OS: Mac OS X → All
Hardware: x86 → All
(In reply to comment #1)
> if we don't fix it, this will hurt navigation since now frecency calculation is
> done just after a visit is added, and visits are no more added lazily (but
> asynchronously).
> Asking blocking because this should get some priority and could cause problems
> with ui responsiveness.

Surely if we don't fix it things stay as they are now, or is this a regression caused by something else?
(In reply to comment #2)
> Surely if we don't fix it things stay as they are now, or is this a regression
> caused by something else?

it's an expected regression due to the fact now visits are async but still using sync frecency.
so, to clarify, it could be more noticeable by users because before it was happening at "random" times, now it happens on visit addition that is when user clicks a link or whatever similar.
blocking2.0: ? → beta4+
Assignee: nobody → mak77
Not clear to me why this was blocking beta4 specifically.  At best, we need beta coverage on this fix (although I'd probably be OK with just making sure it's in the final release).  Moving to betaN.
blocking2.0: beta4+ → betaN+
btw, patch is proceeding nicely (will attach wip soon), but fixing tests could require some time.
I'm splitting work in 2 parts.

Part 1 will change frecency calculation code to simplify it. currently we have a bunch of methods to do it, and making all of that stuff async would be crazy. So the idea is to make that as easy as:
"UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id)"
This part should be in a reviewable state, I ran Places tests locally and they were fine (still needs a full run though).

At this point it should be easy to make it async changing execute to executeAsync, that is Part 2, that will include test changes that could be needed to avoid timing issues.

sdwilsh, take your time, I won't be able to work on this for a week, but in case you have comments, I'll fix them when i'll be back.
Attachment #465907 - Flags: review?(sdwilsh)
Comment on attachment 465907 [details] [diff] [review]
Part1: change frecency calculation code

For comments with expandable context, please see http://reviews.visophyte.org/r/465907/.

on file: toolkit/components/places/src/SQLFunctions.h line 225
>  * @param [optional] typed
>  *        Whether the page has been typed in.
>  * @param [optional] fullVisitCount
>  *        Count of all the visits (All types).
>  * @param [optional] isBookmarked
>  *        Whether the page is bookmarked.

Please specify what the defaults are if it's optional.


on file: toolkit/components/places/src/SQLFunctions.cpp line 350
>     if (pageId > 0) {

I think we actually want to raid an error in this situation.


on file: toolkit/components/places/src/SQLFunctions.cpp line 365
>       rv = getPageInfo->BindUTF8StringByName(NS_LITERAL_CSTRING("anno_name"),
>                                              NS_LITERAL_CSTRING("livemark/feedURI"));

We really need to figure out how to get that out of this table so we don't
need to worry about it anymore...


on file: toolkit/components/places/src/SQLFunctions.cpp line 394
>       while (NS_SUCCEEDED(getVisits->ExecuteStep(&hasResult)) && hasResult) {
>         numSampledVisits++;

Just to be clear, we are now looking at all visits, right?  Isn't this sorta
expensive?


on file: toolkit/components/places/src/SQLFunctions.cpp line 404
>         // if bonus was zero, we can skip the work to determine the weight

nit: comment is in our style


on file: toolkit/components/places/src/SQLFunctions.cpp line 408
>           if (ageInDays <= history->GetFrecencyBucketCutoff(1))
>             weight = history->GetFrecencyBucketWeight(1);
>           else if (ageInDays <= history->GetFrecencyBucketCutoff(2))
>             weight = history->GetFrecencyBucketWeight(2);
>           else if (ageInDays <= history->GetFrecencyBucketCutoff(3))
>             weight = history->GetFrecencyBucketWeight(3);
>           else if (ageInDays <= history->GetFrecencyBucketCutoff(4))
>             weight = history->GetFrecencyBucketWeight(4);
>           else
>             weight = history->GetFrecencyBucketWeight(0);

thought our style these days was to brace ifs, even if they are only one
liners


on file: toolkit/components/places/src/SQLFunctions.cpp line 472
>     // use NS_ceilf() so that we don't round down to 0, which
>     // would cause us to completely ignore the place during autocomplete

nit: comment doesn't follow our convention


on file: toolkit/components/places/src/nsNavBookmarks.cpp line 989
>   rv = transaction.Commit();
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>   // Re-calculate the frecency for this moz_place entry since it was set to -1.
>   rv = history->UpdateFrecency(childID);

Why are we updating frecency post commit now?  Because we can't read committed
data?  This will make adding a bookmark even slower for mobile, who is already
complaining that this is slow.


I'm r- this because I think it fundamentally changes how we are doing frecency, and I want to here why we need to do that.

I'm also a bit concerned about some thread safety issues and would like to see some assertions added to make sure some things are only used on one thread (such as the statement we use in the background).
Attachment #465907 - Flags: review?(sdwilsh) → review-
(In reply to comment #8)
> on file: toolkit/components/places/src/SQLFunctions.cpp line 394
> >       while (NS_SUCCEEDED(getVisits->ExecuteStep(&hasResult)) && hasResult) {
> >         numSampledVisits++;
> 
> Just to be clear, we are now looking at all visits, right?  Isn't this sorta
> expensive?

no, we are looking to the last N visits where N is controlled by a pref and it's currently 10.

> on file: toolkit/components/places/src/nsNavBookmarks.cpp line 989
> >   rv = transaction.Commit();
> >   NS_ENSURE_SUCCESS(rv, rv);
> > 
> >   // Re-calculate the frecency for this moz_place entry since it was set to -1.
> >   rv = history->UpdateFrecency(childID);
> 
> Why are we updating frecency post commit now?  Because we can't read committed
> data?  This will make adding a bookmark even slower for mobile, who is already
> complaining that this is slow.

Hm, I think I just did it wrong order here

> I'm r- this because I think it fundamentally changes how we are doing frecency,
> and I want to here why we need to do that.

Is this just about bookmarks issue above or the overall approach?

> I'm also a bit concerned about some thread safety issues and would like to see
> some assertions added to make sure some things are only used on one thread
> (such as the statement we use in the background).

can you define "things" please? those statements are ran in the thread where the query running them is. which kind of check do you want on them?
(In reply to comment #9)
> no, we are looking to the last N visits where N is controlled by a pref and
> it's currently 10.
I hadn't realized it was stored in the SQL itself.  Can you please add a comment explaining this in OnFunctionCall please?

> Is this just about bookmarks issue above or the overall approach?
The approach is fine, I just felt like there were enough issues (mostly threadsafety assumptions that were not documented, plus the (incorrect) thought that frencency was changing.

> can you define "things" please? those statements are ran in the thread where
> the query running them is. which kind of check do you want on them?
Looks like things I was worried about we either cannot assert that we are not on the main thread (like in OnFunctionCall), or aren't actually accessed anywhere but on the main thread.  This code is big and scary (all of places), which makes it hard to fact check.

If you address the above comments, I suspect it'll be r+
So I have fixed patches for both this change and making frecency async but I see an assertion in AsyncStatement:~AsyncStatement when running fixInvalidFrecencies (that is a query I create as an mozIStorageAsyncStatement)...

Practically here:

    Connection *forgottenConn = nsnull;
    mDBConnection.swap(forgottenConn);
    (void)::NS_ProxyRelease(mDBConnection->threadOpenedOn, forgottenConn);

mDBConnection is nsnull and NS_ProxyRelease is trying to use it.
This happens at shutdown, any idea what could cause such a thing?
cc-ing Asuth that worked on AsyncStatement, see comment 11 please.
Attached file stack at the assertion (obsolete) —
(In reply to comment #11)
>     Connection *forgottenConn = nsnull;
>     mDBConnection.swap(forgottenConn);
>     (void)::NS_ProxyRelease(mDBConnection->threadOpenedOn, forgottenConn);
> 
> mDBConnection is nsnull and NS_ProxyRelease is trying to use it.
> This happens at shutdown, any idea what could cause such a thing?

So, this code snippet does seem to be aggressively shooting itself in the foot since the swap command is causing mDBConnection to become null.

As the comment ahead of the block suggests:
  // If we are getting destroyed on the wrong thread, proxy the connection
  // release to the right thread.  I'm not sure why we do this.
I think I may just have cargo culted that bit.

I think the rationale for the disclaimer comment is the CompletionNotifier in mozStorageAsyncStatementExecution which tries to make sure that no statements die on the asynchronous thread by keeping them alive until it is dispatched back to the originating thread.

It looks like the problem here is that we have 3 threads going on:
A: the thread that opened the connection
B: the thread that is dispatching the async statements
C: the mozStorage async thread

B is issuing the request, C is servicing it and sending the CompletionNotifier back to B.  B goes to release the async statement which decides that the connection needs to be closed on A.

I think we can all agree that the foot-shooting logic should is buggy and either should be removed or made happier.
worth filing a bug? If you have ideas for a patch that'd be welcome.
Depends on: 596970
this is the refactoring part, the algo does not change nor the points where we do it, just the implementation.
Attachment #465907 - Attachment is obsolete: true
Attachment #475822 - Attachment is obsolete: true
Attachment #476059 - Flags: review?(sdwilsh)
And this uses part 1 to do updates async.
Attachment #476060 - Flags: review?(sdwilsh)
this (with Storage assertion fix) passed tryserver. I'm waiting talos still.
Whiteboard: [has patch][needs review sdwilsh]
Comment on attachment 476059 [details] [diff] [review]
part 1: refactor frecency calculation code

+++ b/toolkit/components/places/src/SQLFunctions.cpp
Note that I'd really really like to see this method split up into some smaller methods so it's easier to understand what is going on.  It's one long method now with is all sorts of sadfaces :(

+    nsNavHistory* history = nsNavHistory::GetHistoryService();
+    NS_ENSURE_TRUE(history, NS_ERROR_OUT_OF_MEMORY);
We do need to be careful not to touch this object for anything other than the database here (we could be off of the main thread).  As a result, we should do one of two things:
1) scope the history variable so it goes away once we get the db connection
2) make a static helper method on this class that caches the database connection
I'm leaning towards (2) because it means on each function call we'll be doing less.
Although, as I read more of this patch I see we use it elsewhere too.  So, I'm OK with us accessing history as long as the method is a const method, and we take advantage of the compiler doing this checking for us.  We'd need to use a non-const version still to get the db connection, but going with (2) makes that easy.  Then, in this method, we'd simply use a const version of the class (I cannot recall the right syntax and cannot look it up on the plane, but |const nsNavHistory*| is not what we want).  Basically, we cannot mutate the nsNavHistory object on the background thread, and that's what I'm trying to protect.  I guess I was wrong in comment 10 when I said that OnFunctionCall was OK.  We'll be changing when we do it, so we need to be threadsafe.

+    // If pageId is invalid the page is not yet in the database, we use
nit: "If pageId is invalid, the...", and "...in the database.  We..."

+      // Check connection status.
+      PRBool ready;
+      if (NS_FAILED(DBConn->GetConnectionReady(&ready)) || !ready) {
+        NS_WARNING("Database connection not ready for frecency calculation.");
+        return NS_ERROR_UNEXPECTED;
+      }
I'm 99% sure that we couldn't possibly be getting called if the connection was not ready.  We could assert that the connection is ready if you are worried about it though.

+      nsCOMPtr<mozIStorageStatement> getPageInfo =
+        history->GetStatementById(DB_PAGE_INFO_FOR_FRECENCY);
+      NS_ENSURE_STATE(getPageInfo);
And this is dangerous to access because of threadsafety issues.  This function could be called on the background and the main thread, right?  We should only use the cached statement when on the main thread, and I guess recreate it (ugh) anytime on the background thread.  The alternative is to have a cached copy for this to use on the async thread too.

r- only because of the threadsafety stuff mentioned above.  If you add a part 1.5 between this version and part 2, this can be r+ with the non-threadsafety comments addressed
Attachment #476059 - Flags: review?(sdwilsh) → review-
(In reply to comment #20)
> +    nsNavHistory* history = nsNavHistory::GetHistoryService();
> +    NS_ENSURE_TRUE(history, NS_ERROR_OUT_OF_MEMORY);
> We do need to be careful not to touch this object for anything other than the
> database here (we could be off of the main thread).

All the calls to history are read-only getters of properties (read from prefs or cached), making all of them static could be tricky. I'll try the const object strategy, not sure how well that will work.

> +      nsCOMPtr<mozIStorageStatement> getPageInfo =
> +        history->GetStatementById(DB_PAGE_INFO_FOR_FRECENCY);
> +      NS_ENSURE_STATE(getPageInfo);
> And this is dangerous to access because of threadsafety issues.  This function
> could be called on the background and the main thread, right?  We should only
> use the cached statement when on the main thread

hm, I thought statements objects were threadsafe by themselves, looks like this is not the case? the getter is not creating statements, it is just passing out cached statements, and we can use these statements both in the main or in the pool async execution, I thought this was pretty similar usage.
(In reply to comment #21)
> All the calls to history are read-only getters of properties (read from prefs
> or cached), making all of them static could be tricky. I'll try the const
> object strategy, not sure how well that will work.
I do not think it makes sense for them to be static, fwiw.

> hm, I thought statements objects were threadsafe by themselves, looks like this
> is not the case? the getter is not creating statements, it is just passing out
> cached statements, and we can use these statements both in the main or in the
> pool async execution, I thought this was pretty similar usage.
Not threadsafe, but uses threadsafe refcounting because they can be passed around.  It looked to me (but hard to be sure because of our growing amount in indirection in this code) that we create it on the first call to get the getter, and then end up using the cached copy.

Gosh, I can't wait until all of our code is async :/
Blocks: 595574
Comment on attachment 476060 [details] [diff] [review]
part 2: make frecency updates async

I'm clearing this request because I think it is best for me to see this once we shake out part 1[.5].
Attachment #476060 - Flags: review?(sdwilsh)
Whiteboard: [has patch][needs review sdwilsh] → [needs review comments addressed]
This tries to address comments on part 1.

I did not split the function though, I tried different ways to do so, but it's complicated, I should build helpers with a large amount of input params. And I also want to avoid any risk to change the algorithm, so far this is a 1:1 port.
For now I tried to add/clarify some comment to make it more readable instead.
Attachment #478148 - Flags: review?(sdwilsh)
Comment on attachment 478148 [details] [diff] [review]
part 1.5: address comments on part 1

>+++ b/toolkit/components/places/src/SQLFunctions.cpp
>+    // This is a const version of the history object for thread-safety.
>+    const nsNavHistory* history = nsNavHistory::GetConstHistoryService();
We want |nsNavHistory const* history| here.  See http://www.parashift.com/c++-faq-lite/const-correctness.html#faq-18.5
 
>+      // Tthe page is already in the database, and we can fetch current
nit: typo

>+++ b/toolkit/components/places/src/nsNavHistory.h
>   /**
>+   * Used by other components in the places directory to get a reference to a
>+   * const version of this history object.
>+   *
>+   * @return A pointer to a const version of the service if it exists,
>+   *         NULL otherwise.
>+   */
>+  static const nsNavHistory* GetConstHistoryService()
static nsNavHistory const* GetConstHistoryService
However, we should probably just call it GetHistoryService and let the overriding do what it does.

>+  mozIStorageStatement* GetStatementByIdAndThread(
>+    enum mozilla::places::HistoryStatementId aStatementId
>+  ) const
>+  {
>+    using namespace mozilla::places;
>+
>+    switch(aStatementId) {
>+      case DB_PAGE_INFO_FOR_FRECENCY:
>+        return NS_IsMainThread() ? mDBPageInfoForFrecency
>+                                 : mDBAsyncThreadPageInfoForFrecency;
>+      case DB_VISITS_FOR_FRECENCY:
>+        return NS_IsMainThread() ? mDBVisitsForFrecency
>+                                 : mDBAsyncThreadVisitsForFrecency;
>+    }
>+    return nsnull;
>+  }
This isn't really checking by thread id, just main thread or not.  Name seems funny because of it.

r=sdwilsh with comments addressed.
Attachment #478148 - Flags: review?(sdwilsh) → review+
Comment on attachment 476059 [details] [diff] [review]
part 1: refactor frecency calculation code

r=sdwilsh with comment 20 addressed
Attachment #476059 - Flags: review- → review+
Comment on attachment 476060 [details] [diff] [review]
part 2: make frecency updates async

this is mostly tests fixes, personally I'd suggest to concentrate on code changes and have just a quick look at tests. Should apply cleanly still since I have it applied in my queue.
Attachment #476060 - Flags: review?(sdwilsh)
qfolded part 1 with part 1.5 (that was most comments fix), fixed latest comments apart the const* one and the overloading one, as discussed on IRC they are fine as they are.
Attachment #476059 - Attachment is obsolete: true
Attachment #478148 - Attachment is obsolete: true
Whiteboard: [needs review comments addressed] → [needs review]
Comment on attachment 476060 [details] [diff] [review]
part 2: make frecency updates async

>+class AsyncStatementCallbackNotifier : public AsyncStatementCallback
>+{
>+public:
>+  AsyncStatementCallbackNotifier(const char* aTopic)
>+    : mTopic(aTopic)
>+  {
>+  }
>+
>+  NS_DECL_ISUPPORTS
We should just make AsyncStatementCallback implent nsISupports for everyone.  People who need a different QI can go ahead and implement it.

>+  const char* mTopic;
const char* const here since we don't expect to every change this pointer either, yeah?

>+NS_IMETHODIMP
>+AsyncStatementCallbackNotifier::HandleResult(mozIStorageResultSet *aResultSet)
>+{
>+  NS_ASSERTION(PR_FALSE, "You cannot use AsyncStatementCallbackNotifier to get a resultset");
>+  return NS_OK;
NS_ABORT_IF_FALSE?

>+#define ASYNC_BIND(_stmt) \
>+  { \
PR_BEGIN_MACRO and PR_END_MACRO

r=sdwilsh
Attachment #476060 - Flags: review?(sdwilsh) → review+
addressed comments but ISUPPORTS inheritance (new part coming), fixed a couple  frecency oranges in tests.
Attachment #476060 - Attachment is obsolete: true
Shawn, do you want to skim through this changes?
Attachment #480916 - Flags: review?(sdwilsh)
ok, so far everything is nice but test_history_removeAllPages and test_database_sync_after_shutdown_with_removeAllPages. Both are failing randomly on different platforms due to frecency updates on idle. I hate these tests, and I also wrote them!
(In reply to comment #32)
> ok, so far everything is nice but test_history_removeAllPages and
> test_database_sync_after_shutdown_with_removeAllPages. Both are failing
> randomly on different platforms due to frecency updates on idle. I hate these
> tests, and I also wrote them!
Can we just disable idle service stuff (by overriding the xpcom that does that stuff) in our tests?
I have disabled idle-daily removing the history observer, but I fear there is more to do because with it enabled platforms A,C fail, with it enabled platforms B,D fail. I'm sure this stuff works fine, and tests are lying me. I'll recover changes I did to those tests in the temp tables patch and try to bring out something that works. If I could reproduce locally! At least there is the trychooser and I only need xpcshell.
Good news:
I've been able to fix all the tests that were randomly failing, a couple tests will have to be modified again for temp tables removals, but one will disappar and the other one should just require removing a couple lines.
So this is mostly ready to land, just need to know if you want to review the iSupports inheritance stuff (part 3).

Bad news:
Talos is showing a 15% Ts_cold regression on Linux. The amount is the same regardless MAX,MED,MIN profiles, so does not look like related to a query.
Ts (not-cold) does not show any change (something around a slight 1% improvement instead, but could be in noise).
I can't explain this, the only different stuff we do at startup is creating the 2 statements for the async thread and registering the function, but then I'd expect those to be visible on Ts too. Ts cold should just remove the OS libraries/exes caches so I can't see a relation unless there is more cached stuff(?). I'll try to ask Adw since he investigated cold startups.
Any ideas?
Attachment #480915 - Attachment is obsolete: true
Comment on attachment 480916 [details] [diff] [review]
part3: cleanup AsyncStatementCallback implementation

r=sdwilsh

Yay for simpler code!
Attachment #480916 - Flags: review?(sdwilsh) → review+
Whiteboard: [needs review] → [showing ts_cold regression, not ts, needs investigation on reasons]
unbitrotted to land on top of temp tables removal and landed in Places:
part 1: http://hg.mozilla.org/projects/places/rev/45fbd37ec7fe
part 2: http://hg.mozilla.org/projects/places/rev/76861281f7a9
part 3: http://hg.mozilla.org/projects/places/rev/9299905b7242
Whiteboard: [showing ts_cold regression, not ts, needs investigation on reasons] → [showing ts_cold regression, ts is fine][fixed-in-places]
Depends on: 603340
Whoops - we forgot to remove a comment about this bug in History.cpp :)
Whiteboard: [showing ts_cold regression, ts is fine][fixed-in-places] → [fixed-in-places]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b9
filed Bug 619784 for a global sweep od all places code for nsCOMPtr to concrete classes, thanks.
Depends on: 620362
Depends on: 620365
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: