Closed Bug 1261386 Opened 4 years ago Closed 4 years ago

rarely-visited page shows up in most-visited list with absurdly high visit count

Categories

(Firefox :: Bookmarks & History, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Firefox 50
Tracking Status
firefox50 --- fixed

People

(Reporter: dietrich, Assigned: mak)

References

Details

Attachments

(1 file)

STR:

1. open history sidebar
2. sort by "most visited"

Expected: See the sites you visit all the time

Actual: See some that you've only been to a few times!

Example: I have http://smile.amazon.com/gp/product/B00YF5YZXE?redirect=true&pldnSite=1 in my top 10. I looked in places.sqlite and moz_places.visit_count is 1468, and that number matches the number of records in moz_historyvisits for that place_id.

I've been to that page maybe 5 times.

I asked another Firefox user to take a look at their most visited list, and they found the same: Entries of pages that they've only been to a few times are listed in their top 10 most visited.
Priority: -- → P2
Blocks: 1209027
I plan 2 things here:
1. improve the current anti-flooding system to be time-based rather than entries based
2. introduce a new history transition, TRANSITION_RELOAD, that should have a much smaller impact on frecency

Unfortunately we cannot retro-actively fix the existing data, but with time things should stabilize.
I just have to figure out how to make this transition reduce the frecency value.
Assignee: nobody → mak77
as a side note, the new transition should also help with the chrome.history web-extension effort, since it's one of the transitions we are missing.
Thanks for letting me know, Marco!
this is still running on Try, but I'd like to give you a bit more time to look at it.

There are some interesting changes here, it may be scary, but also a tangible improvement.

first, the previous flooding protection was based on a fixed number of 8 previous entries. In the current world, it's very likely we get far more than 8 entries for a single page, so it was clearly ineffective. The new system collects uris visited in the last 6 minutes. It may take some more memory, but we track that, and in case we can reduce it if we figure it's excessive. It shouldn't be.
So basically, if the same url is being revisited in less than 6 minutes, we don't add it to history.

the second things is the new transition. it will act similarly to other "less important" transitions we have, like downloads or framed_links. The idea is that these visits are stored, but they don't increase the visit_count value that we use for various APIs.
Additionally I'm simplifying a bit the frecency calculation code, that in years collected lots of half-done ideas. It also changed a bit, it's going to use visit_count instead of the full_visit_count, that means it won't account for those less-important visits. That means a page only having those visits will have a 0 frecency, thus it won't appear in the awesomebar.
The original intent was indeed to make downloads and embed not appear in the awesomebar, and along the years we have broken that (I may add a test for this).
The main advantage is that frecency recalculations become cheaper, since we don't have to count visits again.
The expected outcome is also that reload visits don't count much for frecency, that is very much based on the visit count.
Comment on attachment 8757368 [details]
MozReview Request: Bug 1261386 - Avoid history floding by repeated reloads. r=adw

https://reviewboard.mozilla.org/r/55830/#review54942

It seems like a good idea to me to base the "is recent" logic on time and not count, so I think this patch is definitely worth trying.

::: toolkit/components/places/History.h:40
(Diff revision 1)
>  
> -// Max size of History::mRecentlyVisitedURIs
> -#define RECENTLY_VISITED_URI_SIZE 8
> +// Initial size of mRecentlyVisitedURIs.
> +#define RECENTLY_VISITED_URIS_SIZE 64
> +// Microseconds after which a visit can be expired from mRecentlyVisitedURIs.
> +// If an URI is reloaded before this time, we don't store it in history.
> +// Since a commonly found case is to reload every 5 minutes, we pick 6 minutes.

There's a subtle ambiguity that this comment should probably mention/clarify I think.  If I'm reading the patch correctly, if a URI is constantly reloaded every five minutes, then it will always be considered recent, so only the first visit will count, right?  It is not the case that one visit for that URI will be counted every six minutes.

::: toolkit/components/places/History.cpp:2410
(Diff revision 1)
>  }
>  
>  void
>  History::AppendToRecentlyVisitedURIs(nsIURI* aURI) {
> -  if (mRecentlyVisitedURIs.Length() < RECENTLY_VISITED_URI_SIZE) {
> -    // Append a new element while the array is not full.
> +  // Add a new entry, if necessary.
> +  RecentURIKey* entry = mRecentlyVisitedURIs.GetEntry(aURI);

We should be able to save some memory by changing mRecentlyVisitedURIs from an array to a hash, no?  (Keyed on URIs, with a value type that stores most recent visit date.)  I mention it since it occurred to me after reading your comment about memory use (comment 5 in the bug).  But also if I were designing this from scratch, a hash is what I would use.  And it would make IsRecentlyVisitedURI more efficient.

It probably wouldn't make a big difference on memory in practice though, so if that's a hard change to make, the array is fine with me.

::: toolkit/components/places/History.cpp:2421
(Diff revision 1)
> -    mRecentlyVisitedURIsNextIndex++;
> +  }
> +
> +  // Expire oldest entries.
> +  for (auto iter = mRecentlyVisitedURIs.Iter(); !iter.Done(); iter.Next()) {
> +    RecentURIKey* entry = iter.Get();
> +    if ((PR_Now() - entry->time) > RECENTLY_VISITED_URIS_MAX_AGE) {

Could you factor out the conditional/logic in IsRecentlyVisitedURI into IsRecentlyVisitedEntry so that you wouldn't have to duplicate it here?  But it's not a big deal, so don't worry about it if you don't think it's worth it.

::: toolkit/components/places/History.cpp:2422
(Diff revision 1)
> +
> +  // Expire oldest entries.
> +  for (auto iter = mRecentlyVisitedURIs.Iter(); !iter.Done(); iter.Next()) {
> +    RecentURIKey* entry = iter.Get();
> +    if ((PR_Now() - entry->time) > RECENTLY_VISITED_URIS_MAX_AGE) {
> +      iter.Remove();

I'm very rusty on my COM arrays... is it safe to mutate mRecentlyVisitedURIs while you're iterating over it?

::: toolkit/components/places/nsINavHistoryService.idl:1242
(Diff revision 1)
>     * a frame.
>     */
>    const unsigned long TRANSITION_FRAMED_LINK = 8;
>  
>    /**
> +   * This transition type means the user reloaded the page.

This comment is misleading if I understand right.  "User reloaded the page" means that the user pressed the reload button (for example).  But doesn't TRANSITION_RELOAD cover any case where the same URI was loaded more than once in 6 minutes, however that load was triggered?
Attachment #8757368 - Flags: review?(adw) → review+
(In reply to Drew Willcoxon :adw from comment #6)
> We should be able to save some memory by changing mRecentlyVisitedURIs from
> an array to a hash, no?  (Keyed on URIs, with a value type that stores most
> recent visit date.)

That's what the patch is doing (it was an array, the patch changes it to an hash):
+ nsTHashtable<RecentURIKey> mRecentlyVisitedURIs;

> It probably wouldn't make a big difference on memory in practice though, so
> if that's a hard change to make, the array is fine with me.

I thought of ways to make the hash take less space, currently nsURIHashKey keeps a reference to the uri, for our use case we may be fine with a large enough numeric hash of the URI(likely 64bits, but we don't have a quality 64 bit hashing in mfbt). I didn't want to complicate the hash until we figure whether it's needed.

> ::: toolkit/components/places/History.cpp:2422
> (Diff revision 1)
> > +
> > +  // Expire oldest entries.
> > +  for (auto iter = mRecentlyVisitedURIs.Iter(); !iter.Done(); iter.Next()) {
> > +    RecentURIKey* entry = iter.Get();
> > +    if ((PR_Now() - entry->time) > RECENTLY_VISITED_URIS_MAX_AGE) {
> > +      iter.Remove();
> 
> I'm very rusty on my COM arrays... is it safe to mutate mRecentlyVisitedURIs
> while you're iterating over it?

I checked again, to be sure, this is in nsTHashTable.h:
// This is an iterator that also allows entry removal.
so, it sounds like I'm respecting the contract.
(In reply to Drew Willcoxon :adw from comment #6)
> ::: toolkit/components/places/History.cpp:2421
> (Diff revision 1)
> > -    mRecentlyVisitedURIsNextIndex++;
> > +  }
> > +
> > +  // Expire oldest entries.
> > +  for (auto iter = mRecentlyVisitedURIs.Iter(); !iter.Done(); iter.Next()) {
> > +    RecentURIKey* entry = iter.Get();
> > +    if ((PR_Now() - entry->time) > RECENTLY_VISITED_URIS_MAX_AGE) {
> 
> Could you factor out the conditional/logic in IsRecentlyVisitedURI into
> IsRecentlyVisitedEntry so that you wouldn't have to duplicate it here?  But
> it's not a big deal, so don't worry about it if you don't think it's worth
> it.

Not going to do this cause one is checking "greater than" and the other one is checking "smaller than". This also confused me until I saw test failures :) I'll add a couple comment.
(In reply to Marco Bonardo [::mak] - Away 9-13 June from comment #7)
> (In reply to Drew Willcoxon :adw from comment #6)
> > We should be able to save some memory by changing mRecentlyVisitedURIs from
> > an array to a hash, no?  (Keyed on URIs, with a value type that stores most
> > recent visit date.)
> 
> That's what the patch is doing (it was an array, the patch changes it to an
> hash):
> + nsTHashtable<RecentURIKey> mRecentlyVisitedURIs;

Whoops, somehow I missed that!
(In reply to Marco Bonardo [::mak] - Away 9-13 June from comment #8)
> Not going to do this cause one is checking "greater than" and the other one
> is checking "smaller than".

Right right, but in one case you would just negate the return value and in the other case you wouldn't.  But not factoring it out is fine with me.
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/fx-team/rev/5119ac972910
Avoid history flooding from repeated reloads. r=adw
https://hg.mozilla.org/mozilla-central/rev/5119ac972910
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 50
You need to log in before you can comment on or make changes to this bug.