Closed Bug 610442 Opened 9 years ago Closed 9 years ago

TRANSITION_EMBED visits should be session persistent on Places branch

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b9
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: mak, Assigned: mak)

References

Details

(Whiteboard: [fixed-in-places])

Attachments

(1 file, 4 obsolete files)

After the removal of temp tables embed visits are again being added to the database persistently.
We currently use them only for isVisited check on eventual links that could point to them, we were holding them only in temp tables, so they should stay session persistent and die with the session.
We can probably cache those uris in a memory hash and let isVisited query it first.
That should save some disk IO.

Notice this is something we need only for TRANSITION_EMBED and that some test could need to be updated (embeds won't come out of queries).
blocking2.0: --- → ?
This may be responsible for most of the increase in the size of writes we do on the places branch right now!
blocking2.0: ? → betaN+
will try to have a test patch sooner, so we can get numbers.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Status: ASSIGNED → NEW
blocking2.0: betaN+ → ?
Status: NEW → ASSIGNED
blocking2.0: ? → betaN+
Attached patch patch v1.0 (obsolete) — Splinter Review
I thought this would have been more straight-forward but it needed more attention.

The difference is that these visits won't have a database entry, thus I have to bail out in all cases where we try to set values for them, plus redirects in frames should be embed unless user activated (or adding a redirect would add the origin uri as a visibile element, when it should not be!)

I've fixed a bunch of tests, but I think most of the changes there can be ignored or just have a quick look. The changes involve:
- AddVisit does not always return a visitId, it was not always doing that already but some test was assuming the opposite. the new behavior here is that adding a embed visit can't return a visitId. I've clarified the bit in the idl. I don't think it needs SR since it could already return 0 for a lot of reasons (disabled history, privatebrowsing, not acceptable uri, ...).
- embed visits won't appear anymore in includeHidden queries. There is no workaround for this, not that I think these queries are really useful. They will still return framed_link visits though, that are far more important.

I still have a run on tryserver to check if I've skipped some fancy test in DOM or whatever other test.
Apart this it should be fine to measure I/O!
Attached patch patch v1.1 (obsolete) — Splinter Review
typo fix
Attachment #489500 - Attachment is obsolete: true
Attached patch patch v1.2 (obsolete) — Splinter Review
Fixed a failure in browser_markPageAsFollowedLink (it was expecting embed visit in the db).

As before, I think most of tests changes can be skipped once we agree on comment 3.
Attachment #489791 - Attachment is obsolete: true
Attachment #489877 - Flags: review?(sdwilsh)
Comment on attachment 489877 [details] [diff] [review]
patch v1.2

>+++ b/toolkit/components/places/src/History.cpp
>+/**
>+ * Notifies that the visited status of a URI has been resolved.
>+ */
>+class NotifyVisitedStatus : public nsRunnable
>+{
>+public:
>+  NotifyVisitedStatus(nsIURI* aURI, bool aVisited)
javadoc style comment on this
nit: aVisited on newline

However, ignore those two comments, and let's just use VisitedQuery here and add a method that does the notification for you. Then just use NS_NewRunnableMethod (http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsThreadUtils.h#356).  Less code FTW :)

>+    nsNavHistory* history = nsNavHistory::GetHistoryService();
>+    NS_ENSURE_STATE(history);
>+    if (history->hasEmbedVisit(aURI)) {
>+      // Note that the link is not yet tracking visited status, thus we have to
>+      // enqueue the notification.
>+      nsCOMPtr<nsIRunnable> event = new NotifyVisitedStatus(aURI, true);
>+      nsresult rv = NS_DispatchToMainThread(event);
>+      NS_ENSURE_SUCCESS(rv, rv);
>+      return NS_OK;
>+    }
Here is where you'd use NS_NewRunnableMethod.  The comment could use some work though.  How about something like this:
As per IHistory contract, we must notify asynchronously.

>   NS_IMETHOD HandleCompletion(PRUint16 aReason)
>   {
>     if (aReason != mozIStorageStatementCallback::REASON_FINISHED) {
>       return NS_OK;
>     }
> 
>-    if (mIsVisited) {
>-      History::GetService()->NotifyVisited(mURI);
>-    }
>-
>-    // Notify any observers about that we have resolved the visited state of
>-    // this URI.
>-    nsCOMPtr<nsIObserverService> observerService =
>-      mozilla::services::GetObserverService();
>-    if (observerService) {
>-      nsAutoString status;
>-      if (mIsVisited) {
>-        status.AssignLiteral(URI_VISITED);
>-      }
>-      else {
>-        status.AssignLiteral(URI_NOT_VISITED);
>-      }
>-      (void)observerService->NotifyObservers(mURI,
>-                                             URI_VISITED_RESOLUTION_TOPIC,
>-                                             status.get());
>-    }
>+    nsCOMPtr<nsIRunnable> event = new NotifyVisitedStatus(mURI, mIsVisited);
>+    nsresult rv = NS_DispatchToMainThread(event);
>+    NS_ENSURE_SUCCESS(rv, rv);
And then here just call that new method on the class too.  No need to enque another event on the main thread (especially because we could get a bunch of events there).

>   NotifyVisitObservers(VisitData& aPlace,
>                        VisitData& aReferrer)
>   : mPlace(aPlace)
>   , mReferrer(aReferrer)
>   {
>-    NS_PRECONDITION(!NS_IsMainThread(),
>-                    "This should not be called on the main thread");
>+    // This object may be created on both threads.
I think it is safe to not include this comment.

>+  // Embed visits should not be added to the database, and the same is valid for
>+  // redirects across frames.  Thus all of these visits are considered embed.
>+  if (!(aFlags & IHistory::TOP_LEVEL) &&
>+      !(recentFlags & nsNavHistory::RECENT_ACTIVATED)) {
>+    // A frame redirected to a new site without user interaction.
>+    place.transitionType = nsINavHistoryService::TRANSITION_EMBED;
>+  }
A few things:
1) recentFlags & nsNavHistory::RECENT_ACTIVATED is used twice.  Please pull it into a descriptive local variable and use it.
2) why did you put this in the beginning?
3) how we've done comments elsewhere in this block, the outer comment feels out of place.  It seems like you are talking about how all the if-elses work, which is not the case.
4) the inner comment feels redundant with the outer comment.  Pick one please and lose the other.

>+  // EMBED visits are session-persistent and should not go through the database.
>+  // They exist only to keep track of isVisited status during the session.
>+  if (place.transitionType == nsINavHistoryService::TRANSITION_EMBED) {
>+    history->registerEmbedVisit(place.uri, place.visitTime);
We talked before about using history to refer to mozilla::places::History service, and navHistory to refer to nsNavHistory service.  We should update this method here now because it's getting confusing.

>+    // Finally, enqueue an event to notify observers.
>+    nsCOMPtr<nsIRunnable> event = new NotifyVisitObservers(place, place);
Is the page really its own referrer?  I think this is wrong here.

>   // First, ensure that our hash table is setup.
>   if (!mObservers.IsInitialized()) {
>-    NS_ENSURE_TRUE(mObservers.Init(), NS_ERROR_OUT_OF_MEMORY);
>+    NS_ENSURE_TRUE(mObservers.Init(128), NS_ERROR_OUT_OF_MEMORY);
why this change?  Pretty sure 128 is the default.

>+  // Embed visits don't have a database entry, thus don't set a title on them.
>+  if (history->hasEmbedVisit(aURI)) {
>+    return NS_OK;
>+  }
s/history/navHistory/ here too please

>+++ b/toolkit/components/places/src/nsNavHistory.cpp
>+// Initial size of the embed visits cache.
>+#define EMBED_VISITS_INITIAL_CACHE_SIZE 128
Fairly certain that that is the default size of hash tables.

>+  if (aTransitionType == TRANSITION_EMBED) {
>+    registerEmbedVisit(aURI, GetNow());
>+    *aVisitID = 0;
>+    return NS_OK;
>+  }
Would be nice to have a comment here.

> NS_IMETHODIMP
> nsNavHistory::RemovePages(nsIURI **aURIs, PRUint32 aLength, PRBool aDoBatchNotify)
> {
>+  // Clear the registered embed visits.
>+  clearEmbedVisits();
I think we should only do this if the database work succeeds.  If it doesn't, we've done work, but then threw and cannot undo our work.
Also, while we are here, add the main thread assertion check please.

> NS_IMETHODIMP
> nsNavHistory::RemovePagesFromHost(const nsACString& aHost, PRBool aEntireDomain)
> {
>   NS_ASSERTION(NS_IsMainThread(), "This can only be called on the main thread");
> 
>+  // Clear the registered embed visits.
>+  clearEmbedVisits();
Ditto.

> NS_IMETHODIMP
> nsNavHistory::RemovePagesByTimeframe(PRTime aBeginTime, PRTime aEndTime)
> {
>   NS_ASSERTION(NS_IsMainThread(), "This can only be called on the main thread");
> 
>+  // Clear the registered embed visits.
>+  clearEmbedVisits();
ditto

> NS_IMETHODIMP
> nsNavHistory::RemoveVisitsByTimeframe(PRTime aBeginTime, PRTime aEndTime)
> {
>   NS_ASSERTION(NS_IsMainThread(), "This can only be called on the main thread");
> 
>+  // Clear the registered embed visits.
>+  clearEmbedVisits();
ditto

> NS_IMETHODIMP
> nsNavHistory::RemoveAllPages()
> {
>   NS_ASSERTION(NS_IsMainThread(), "This can only be called on the main thread");
> 
>+  // Clear the registered embed visits.
>+  clearEmbedVisits();
ditto

>+  if (hasEmbedVisit(aURI)) {
>+    *_retval = PR_TRUE;
>+  }
>+  else {
>+    *_retval = IsURIStringVisited(utf8URISpec);
>+  }
An optionally more compact way of doing this:
  *_retval = hasEmbedVisit(aURI) ? PR_TRUE : IsURIStringVisited(utf8URISpec);

>+void
>+nsNavHistory::registerEmbedVisit(nsIURI* aURI, PRInt64 aTime)
>+{
nit: aTime on newline

>+  if (!mEmbedVisits.IsInitialized()) {
>+    mEmbedVisits.Init(EMBED_VISITS_INITIAL_CACHE_SIZE);
>+  }
Is there any reason to not just do this in our constructor?  What's the likelihood of us not having an embed visit (apart from Ts and Tp, which I'd rather not game in this instance)?

>+  PRInt64 lastVisitTime;
>+  if (mEmbedVisits.Get(aURI, &lastVisitTime)) {
>+    // Only keep the latest visit.
>+    mEmbedVisits.Remove(aURI);
>+  }
>+  if (NS_FAILED(mEmbedVisits.Put(aURI, aTime))) {
>+    NS_WARNING("Unable to register a EMBED visit.");
>+  }
It would be more efficient if we used an nsTHashtable here, with a KeyClass (probably call it something else) like this (see History.h for more details):
class KeyClass : public nsURIHashKey
{
public:
  ...
  PRTime visitTime;
};
And then in this method, just do:
KeyClass* visit = mEmbedVisists.PutEntry(aURI);
visit->visitTime = aTime;

It's only going to fail for out of memory, so probably want to check that (uses malloc, so not infallible).

>+bool
>+nsNavHistory::hasEmbedVisit(nsIURI* aURI) {
>+ NS_ASSERTION(NS_IsMainThread(), "This can only be called on the main thread");
>+
>+  if (!mEmbedVisits.IsInitialized()) {
>+    return false;
>+  }
>+  PRInt64 lastVisitTime;
>+  return mEmbedVisits.Get(aURI, &lastVisitTime);
return !!mEmbedVisits.GetEntry(aURI);

>+++ b/toolkit/components/places/src/nsNavHistory.h
>+  /**
>+   * Registers a TRANSITION_EMBED visit for the session.
>+   *
>+   * @param aSpec
>+   *        Url of the page.
This is not the variable you are looking for

>+  void registerEmbedVisit(nsIURI* aURI, PRInt64 aTime);
>+  /**
nit: newline between please

>+   * Returns whether the specified url has a embed visit.
>+   *
>+   * @param aSpec
>+   *        Url of the page.
This, too, is not the variable you are looking for

>+   * @return whether the page has a embed visit.
I thought it was @returns that we are supposed to use.  That's what you used in the idl at least.

>+  bool hasEmbedVisit(nsIURI* aURI);
>+  /**
nit: newline between please

>+++ b/toolkit/components/places/tests/queries/head_queries.js
>@@ -343,17 +343,17 @@ function compareArrayToResult(aArray, aR
>   // check expected number of results against actual
>   var expectedResultCount = aArray.filter(function(aEl) { return aEl.isInQuery; }).length;
>   do_check_eq(expectedResultCount, aRoot.childCount);
> 
>   var inQueryIndex = 0;
>   for (var i = 0; i < aArray.length; i++) {
>     if (aArray[i].isInQuery) {
>       var child = aRoot.getChild(inQueryIndex);
>-      LOG("testing testData[" + i + "] vs result[" + inQueryIndex + "]");
>+      //LOG("testing testData[" + i + "] vs result[" + inQueryIndex + "]");
don't think you meant for this to be in the diff

>+++ b/toolkit/components/places/tests/unit/test_248970.js
> // Initial batch of history items (7) + Bookmark_A (1)
> // This number should not change after tests enter private browsing
> // it will be set to 10 with the addition of Bookmark-B during private
> // browsing mode.
>-var num_places_entries = 9;
>+let num_places_entries = 8;
Was this comment right before, or is it right now?

>+++ b/toolkit/components/places/tests/unit/test_454977.js
>   // Add a visit, check that hidden is not overwritten and check place id
>   // - We expect that the place has still hidden = 0, while retaining the same
>   //   place_id and a correct visit_count.
>-  do_check_eq(add_visit(testURI, Date.now()*1000, hs.TRANSITION_EMBED), placeId);
>+  add_visit(TEST_URI, Date.now()*1000, TRANSITION_EMBED);
This comment is no longer accurate

While this feels like a lot of comments to say r+ with comments, there's nothing major here.  r=sdwilsh
Attachment #489877 - Flags: review?(sdwilsh) → review+
> >+  // Embed visits should not be added to the database, and the same is valid
for
> >+  // redirects across frames.  Thus all of these visits are considered embed.
> >+  if (!(aFlags & IHistory::TOP_LEVEL) &&
> >+	!(recentFlags & nsNavHistory::RECENT_ACTIVATED)) {
> >+    // A frame redirected to a new site without user interaction.
> >+    place.transitionType = nsINavHistoryService::TRANSITION_EMBED;
> >+  }
> A few things:
> 2) why did you put this in the beginning?

I've clarified the comment, handling embeds as last step is going to give us issues
since redirects or other visits could then try to reinsert the missing referrer page in
the database, bypassing embed check. We would then end with non-hidden embed visits that would
be marked as redirects or even worse links.

> 3) how we've done comments elsewhere in this block, the outer comment feels out
> of place.  It seems like you are talking about how all the if-elses work, which
> is not the case.
> 4) the inner comment feels redundant with the outer comment.  Pick one please
> and lose the other.

the external one was a global comment on the if/else indeed, i've clarified it a bit.

> >   // First, ensure that our hash table is setup.
> >   if (!mObservers.IsInitialized()) {
> >-    NS_ENSURE_TRUE(mObservers.Init(), NS_ERROR_OUT_OF_MEMORY);
> >+    NS_ENSURE_TRUE(mObservers.Init(128), NS_ERROR_OUT_OF_MEMORY);
> why this change?  Pretty sure 128 is the default.

default size is 64, I feel it's small for pages with lots of links.

> >+++ b/toolkit/components/places/src/nsNavHistory.cpp
> >+// Initial size of the embed visits cache.
> >+#define EMBED_VISITS_INITIAL_CACHE_SIZE 128
> Fairly certain that that is the default size of hash tables.

default size is 64, I feel it's small for pages with lots of links.

> >+  if (!mEmbedVisits.IsInitialized()) {
> >+    mEmbedVisits.Init(EMBED_VISITS_INITIAL_CACHE_SIZE);
> >+  }
> Is there any reason to not just do this in our constructor?  What's the
> likelihood of us not having an embed visit (apart from Ts and Tp, which I'd
> rather not game in this instance)?

The time we spend is the same, and the start page does not have frames, so, while it's
probably an ignorable amount, it would be a useless increase on startup.
I don't have strong preferences, this looked better because all related code is contained in a common location.
If you feel like the isInitialized() calls are too expensive to do for visits, I can move it to construction.

> >+   * @return whether the page has a embed visit.
> I thought it was @returns that we are supposed to use.	That's what you used in
> the idl at least.

uh, the idl is wrong then, since @return is the correct one
But I can't find where I used it...

> >+++ b/toolkit/components/places/tests/queries/head_queries.js
> >@@ -343,17 +343,17 @@ function compareArrayToResult(aArray, aR
> >   // check expected number of results against actual
> >   var expectedResultCount = aArray.filter(function(aEl) { return
aEl.isInQuery; }).length;
> >   do_check_eq(expectedResultCount, aRoot.childCount);
> > 
> >   var inQueryIndex = 0;
> >   for (var i = 0; i < aArray.length; i++) {
> >     if (aArray[i].isInQuery) {
> >	var child = aRoot.getChild(inQueryIndex);
> >-	LOG("testing testData[" + i + "] vs result[" + inQueryIndex + "]");
> >+	//LOG("testing testData[" + i + "] vs result[" + inQueryIndex + "]");
> don't think you meant for this to be in the diff

I meant. there is no ifdef debug for tests, this generates too much noise but is useful to uncomment for debug.

> >+++ b/toolkit/components/places/tests/unit/test_248970.js
> > // Initial batch of history items (7) + Bookmark_A (1)
> > // This number should not change after tests enter private browsing
> > // it will be set to 10 with the addition of Bookmark-B during private
> > // browsing mode.
> >-var num_places_entries = 9;
> >+let num_places_entries = 8;
> Was this comment right before, or is it right now?

it was right before some pb change to the test, then it was wrong, now it is right again :)


feel free to reiterate over these answers if you don't agree, I'm fixing all the other comments (so far I just still have to change the hash format)
Attached patch patch v1.3 (obsolete) — Splinter Review
Attachment #489877 - Attachment is obsolete: true
(In reply to comment #7)
> > >   // First, ensure that our hash table is setup.
> > >   if (!mObservers.IsInitialized()) {
> > >-    NS_ENSURE_TRUE(mObservers.Init(), NS_ERROR_OUT_OF_MEMORY);
> > >+    NS_ENSURE_TRUE(mObservers.Init(128), NS_ERROR_OUT_OF_MEMORY);
> > why this change?  Pretty sure 128 is the default.
> 
> default size is 64, I feel it's small for pages with lots of links.
Please do not use a magic number then.

> If you feel like the isInitialized() calls are too expensive to do for visits,
> I can move it to construction.
It does mean we get to lose the isInitialized checks in the three methods that we have.  I think it's probably worth it.
Fix remaining comments, a typo in a js test and a cpp test failure.
Attachment #490087 - Attachment is obsolete: true
Changes in places I/O stuff:
Places Trunk:
Read: 1,317 (43,090,120 bytes)
Write: 9,736 (295,124,928 bytes)
Flush: 844

With Patch:
Read: 1,330 (43,516,104 bytes)
      negligible change
Write: 9,530 (288,667,280 bytes)
       -206 (2%), -6,457,648 bytes (2%)
Flush: 833
       -11 (1%)

Marginally better I/O pattern here, so we should take it.
uh, I can't explain the reads, there can't be any more read, at least less.. could be there is some noise
(In reply to comment #13)
> uh, I can't explain the reads, there can't be any more read, at least less..
> could be there is some noise
I'm sure there is some noise, so I wouldn't worry about it too much.
Depends on: 612592
http://hg.mozilla.org/mozilla-central/rev/20311c05703e
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b9
Blocks: 605468
Attachment #490198 - Attachment description: patch v1.4 → patch v1.4 [Checked in: Comment 12 & 15]
Blocks: 585713
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.