Closed Bug 737836 Opened 12 years ago Closed 7 years ago

Frecency redirects bonuses are messed up

Categories

(Toolkit :: Places, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: mak, Assigned: standard8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxsearch])

Attachments

(2 files, 2 obsolete files)

Right now we have
places.frecency.permRedirectVisitBonus;0
places.frecency.tempRedirectVisitBonus;0
places.frecency.linkVisitBonus;100

This means I visit page A (gets a 100 bonus) that redirects to page B (gets a 0 bonus).  I'd expect to find page B higher in the frecency ranking since it's the destination page I'm actually using and not just a redirecting page, though it's currently the opposite.
If a page is redirecting to another one, we should probably half the bonus, and just use link for redirect transitions since they represent targets, not sources.
Assignee: nobody → mak77
Depends on: 738762
Summary: Frecency redirects bonus are messed up → Frecency redirects bonuses are messed up
Blocks: 407760
So after investigating a bit, the problem is sort of different from what I was looking into. The frecency of redirect targets is calculated using the source page visit_type.
So, the problem is actually that source and target have the same frecency, what we can do is override the transition bonus if the page is a redirect, but we'd still apply the bookmark bonus, and the target page would still gain the proper frecency.
Basically, unless the redirect is a bookmark, push it quite down the list.
Attached patch patch v1.0 (obsolete) — Splinter Review
needs more brainstorming and tests!
Blocks: 487813
Attached patch patch v1.1 (obsolete) — Splinter Review
Fixed a couple issues, particularly when adding a new visit we can't rely on past visits to figure if it's a redirect, thus we have to pass the info to the calculate_frecency function.

I also figured places/tests/network is not testing anything, since it doesn't have a docshell... I will try to move the meaningful parts into a browser-chrome test in next version.
Attachment #712009 - Attachment is obsolete: true
Priority: -- → P2
Priority: P2 → P3
Whiteboard: [fxsearch]
Severity: normal → major
Priority: P3 → P2
Blocks: 1285350
Assignee: mak77 → nobody
not working on this atm, feel free to continue on it.
the attached patch seems to be the latest version I have, the patch I have locally looks identical.

There's still comment 3 to address: figure out if we have a test for what test_history_redirect was trying to do, or eventually make one, it should be a test with a docshell, like b-c for example. It should also check frecency.
And probably the frecency calculation part was broken by recent simplifications to that code path, so it may need an unbitrot.
(In reply to Marco Bonardo [::mak] from comment #5)
> There's still comment 3 to address: figure out if we have a test for what
> test_history_redirect was trying to do, or eventually make one, it should be
> a test with a docshell, like b-c for example. It should also check frecency.
> And probably the frecency calculation part was broken by recent
> simplifications to that code path, so it may need an unbitrot.

test_history_redirect was introduced with bug 542941. From how I read it, it looks like an integration/functional type test - pretend the browser actually visits a url and check it gets logged in the places database.

Looking at other tests that reference TRANSITION_* it seems most of those are based around `PlacesTestUtils.addVisits`.

It looks like this test is doing a similar thing, and with a frecency test integrated:

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/places/tests/browser/browser_redirect.js

Marco, does that look right to you? If so I propose we remove test_history_redirect in another bug (as it is quick & simple to do and isn't testing anything), then look at updating this patch and adjusting the test for it.
Flags: needinfo?(mak77)
Sure, we can do that.
I have nothing against removing that test, it's testing nothing!
Flags: needinfo?(mak77)
Depends on: 1316287
Test removal split out to bug 1316287, I'll start looking at the patch here over the next few days.
Assignee: nobody → standard8
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

https://reviewboard.mozilla.org/r/92020/#review91966

::: toolkit/components/places/SQLFunctions.h:193
(Diff revision 1)
>   * otherwise they will be fetched from it automatically.
>   *
>   * @param pageId
>   *        The id of the page.  Pass -1 if the page is being added right now.
> - * @param [optional] typed
> - *        Whether the page has been typed in.  Default is false.
> + * @param [optional] redirect
> + *        Whether the page visit is a redirect.  Default is false.

I followed the original patch that passes in the redirect. However, I'll admit I'm not 100% if the CALCULATE_FRECENCY function is just meant to get it out the database similar to how the other parameters were changed in https://hg.mozilla.org/mozilla-central/rev/5119ac972910
Attachment #719530 - Attachment is obsolete: true
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

https://reviewboard.mozilla.org/r/92020/#review92000

In this case we cannot extract the information from the database when a _new_ visit is inserted, because every visit has a visit_type that tells how we _reached_ the page, but doesn't tell how we moved away from it. So given a visit in the database we know if it was a redirect target, but we don't know if it will be redirecting to something else. The docshell knows and that's what we are passing into.
For previous visits instead we can check if there's a visit that is a redirect target and has visit_from pointing to the currently analyzed visit.
And that's why the SQL function in case redirect is not provided tries to extract that information from the database. But there's a problem here... explained below.

::: browser/app/profile/firefox.js:862
(Diff revision 1)
> -
> +// 0 would hide these pages, instead we want them low ranked.  Thus we use
> +// linkVisitBonus - bookmarkVisitBonus, so that a bookmarked source is in par
> +// with a common link.
> +pref("places.frecency.redirectSourceVisitBonus", 25);
> +// The bookmarks bonus is always added on top of any other bonus, included
> +// the redirect source and the bookmark ones.

I think here should read "and the typed ones."... it would not make much sense to add the bookmark bonus on top of the bookmark bonus

::: browser/app/profile/firefox.js:863
(Diff revision 1)
> +// linkVisitBonus - bookmarkVisitBonus, so that a bookmarked source is in par
> +// with a common link.
> +pref("places.frecency.redirectSourceVisitBonus", 25);
> +// The bookmarks bonus is always added on top of any other bonus, included
> +// the redirect source and the bookmark ones.
> +pref("places.frecency.bookmarkVisitBonus", 75);

I'm not sure why in the original patch bookmarkVisitBonus was moved down in the file, I'd leave it where it was and just att the comment above it.
So we save some blame.

::: toolkit/components/places/SQLFunctions.cpp:569
(Diff revision 1)
>             NS_SUCCEEDED(getVisits->ExecuteStep(&hasResult)) && hasResult;
>             numSampledVisits++) {
>          int32_t visitType;
>          rv = getVisits->GetInt32(1, &visitType);
>          NS_ENSURE_SUCCESS(rv, rv);
> -        bonus = history->GetFrecencyTransitionBonus(visitType, true);
> +        if (!isRedirect) {

I think the problem is here.
Basically the isRedirect information, when available, is only valid for the currently being added visit.
But here we are applying this info to all of the selected visits.

We should probably make isRedirect a tri-state bool, or just track whether a meaningful value was passed into (null is not good).
And if it's available it should override the information only for the _first_ returned visit that is the most recent.
Attachment #8809424 - Flags: review?(mak77)
Marco, can I try and clarify what we're expecting for the output based on what you've said?

So currently, what happens in nightly is:

- Visit redirect
=> frecency of redirect = 2000, frecency of target 2000 (the typed url visit bonus).

- Visit the same redirect a second time
=> frecency of redirect = 4000, frecency of target 4000

What I think you're saying, is that because we don't know a page is a rediect, until *after* we've visited it once.

So I think the best we can do is:

- Visit redirect
=> frecency of redirect = 2000, frecency of target 2000.

- Visit the same redirect a second time
=> frecency of redirect = 2025, frecency of target 4000

Is that what you're expecting? If so, I think I can work from that and get some tests to show it.
Flags: needinfo?(mak77)
I'll try to clarify, but feel free to further nag me if it's still unclear.

Basically, you are adding to the calculate_frecency function this "redirect" argument, that can be valid (0 or 1), or can be NULL. it's a TRISTATE bool indeed, but you use it as a normal boolean.

The 3 states are:
a. numEntries == 1. so value is NULL. We don't know anything about this visit, thus we are not adding a new visit through the docshell (and History.cpp). Could be we are not adding a visit at all, or we are adding it through the History API. Unfortunately we can't do much about this, but it's not a user navigation.
b. numEntries == 2 and value = 1. we are adding a new visit from the docshell and it's a redirect.
c. numEntries == 2 and value = 0. we are adding a new visit from the docshell and it's not a redirect.

When we fetch the 10 most recent visits, we also fetch the just newly added visit, AND if the redirect value we pass is valid (case b or c) the isRedirect we got is only valid for the most recent visit.
For the other 9 visits it's still unknown whether they are redirects or not.

So, we should always use the value read from the db BUT when we are in case b or c AND we are in the first iteration of the query. Any other case should use the db value.

Fixing this, should make the frecency values correct WHEN the visit happens from the docshell. So the browser-chrome test should properly see a target frecency bigger than the source frecency.
If this ends up being untrue, please let me know and I can do more proper debugging and calculations.
Flags: needinfo?(mak77)
Marco: Thank you for the comments, that made it a lot clearer.

I've implemented the changes, and added a couple more tests for repeat/non-typed visits.
Blocks: 329983
Blocks: 349844
Blocks: 576405
Let me rephrase here what I said on IRC, for reference:

The first problem I see, is that the first redirect seems to be using the originating page transition_type, even if it's a redirect, if it's a redirect it should not use the originating page transition_type.
This may be solved with some logic (if) in the SqlFunction. Basically if a visit is a redirect, it should use its own transition_type, not its source. If a page is a redirect target BUT NOT a redirect, it should use its ancestor transition_type.

The second problem is that if a page is a redirect and comes from a redirect, it gets bonus 0, and that causes us to end up with frecency -1 (http://searchfox.org/mozilla-central/rev/790b2cb423ea1ecb5746722d51633caec9bab95a/toolkit/components/places/SQLFunctions.cpp#584)
I'm thinking we should probably change redirect bonuses for this. Since they indicate we arrived through a redirect, 0 doesn't sound good (we want to reduce score of sources, not targets!).
Both should be set at a value bigger than redirectSourceVisitBonus but smaller than bookmarkVisitBonus. 40 may work.

The final scope is to have final target frecency bigger than any previous redirects.
No longer blocks: 329983
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

This is an updated patch with the latest comments & discussions.

It passes the tests, but there's some strange inconsistency when trying the multiple redirect sequences with entering the url in the url bar (the second redirect is seen as not a redirect after the first entry).

I'm trying to narrow down what is causing this at the moment.
Attachment #8809424 - Flags: review?(mak77)
I expect that bug to be elsewhere (lower level, likely a missing flag in the docshell code that hands visits to VisitURI), and as such we could even proceed here, and handle that problem separately.
(In reply to Marco Bonardo [::mak] from comment #21)
> I expect that bug to be elsewhere (lower level, likely a missing flag in the
> docshell code that hands visits to VisitURI), and as such we could even
> proceed here, and handle that problem separately.

Yeah, I've researched it enough to be convinced it is somewhere in the docshell code. I'll get the patch setup for landing.
See Also: → 1332288
(In reply to Mark Banner (:standard8) from comment #22)
> (In reply to Marco Bonardo [::mak] from comment #21)
> > I expect that bug to be elsewhere (lower level, likely a missing flag in the
> > docshell code that hands visits to VisitURI), and as such we could even
> > proceed here, and handle that problem separately.
> 
> Yeah, I've researched it enough to be convinced it is somewhere in the
> docshell code. I'll get the patch setup for landing.

Now filed as bug 1332288.
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

https://reviewboard.mozilla.org/r/92020/#review106718

::: toolkit/components/places/SQLFunctions.cpp:583
(Diff revision 4)
> -        bonus = history->GetFrecencyTransitionBonus(visitType, true);
>  
> +        // If we don't know if this is a redirect or not, or this is not the
> +        // most recent visit that we're looking at, then we use the redirect
> +        // value from the database.
> +        if (isRedirect == eRedirectUnknown || numSampledVisits >= 1) {

There's still something I don't understand here.
The redirect status is per visit, but this code passes the previous value to the next loop iteration.

supposing the 2nd visit sets it to true/false, the third visit will skip this if.
I guess it would be less error prone to have 2 vars, one for the passed-in value and one for the local value.
Attachment #8809424 - Flags: review?(mak77)
Comment on attachment 8827995 [details]
Bug 737836 - Improve frecency handling for multiple redirect sequences, and add a test.

https://reviewboard.mozilla.org/r/105552/#review106720

::: browser/app/profile/firefox.js:868
(Diff revision 2)
>  // linkVisitBonus - bookmarkVisitBonus, so that a bookmarked source is in par
>  // with a common link.
>  pref("places.frecency.redirectSourceVisitBonus", 25);
>  pref("places.frecency.downloadVisitBonus", 0);
> -pref("places.frecency.permRedirectVisitBonus", 0);
> -pref("places.frecency.tempRedirectVisitBonus", 0);
> +pref("places.frecency.permRedirectVisitBonus", 50);
> +pref("places.frecency.tempRedirectVisitBonus", 50);

Just off-hand, permanent redirects are a bit "stronger" than temp ones, what about 40 for temp redirects? it's mostly a guess, honestly.

::: toolkit/components/places/SQLFunctions.cpp:541
(Diff revision 2)
>        rv = getPageInfo->GetInt32(3, &isQuery);
>        NS_ENSURE_SUCCESS(rv, rv);
> +      nsAutoCString url;
> +      rv = getPageInfo->GetUTF8String(4, url);
> +      NS_ENSURE_SUCCESS(rv, rv);
> +      printf("Processing url %s, isRedirect %d\n", url.get(), (int)isRedirect);

debug leftover

::: toolkit/components/places/SQLFunctions.cpp:582
(Diff revision 2)
>        for (int32_t maxVisits = history->GetNumVisitsForFrecency();
>             numSampledVisits < maxVisits &&
>             NS_SUCCEEDED(getVisits->ExecuteStep(&hasResult)) && hasResult;
>             numSampledVisits++) {
> +
> +          // if null or redirect -> use main visit type not origin

nit: UCfirst and end with a period.

::: toolkit/components/places/SQLFunctions.cpp:587
(Diff revision 2)
> +          // if null or redirect -> use main visit type not origin
>          int32_t visitType;
> -        rv = getVisits->GetInt32(1, &visitType);
> +        if (isRedirect == eIsRedirect) {
> +          rv = getVisits->GetInt32(2, &visitType);
> +          NS_ENSURE_SUCCESS(rv, rv);
> +          printf("Using redirect visit type %d\n", visitType);

leftover

::: toolkit/components/places/SQLFunctions.cpp:589
(Diff revision 2)
> -        rv = getVisits->GetInt32(1, &visitType);
> +        if (isRedirect == eIsRedirect) {
> +          rv = getVisits->GetInt32(2, &visitType);
> +          NS_ENSURE_SUCCESS(rv, rv);
> +          printf("Using redirect visit type %d\n", visitType);
> +        } else {
> +          bool isNull = false;

probably to increase readability a bit, I'd GetIsNull regardless, and then (pseudocode)

if (eIsRedirect || isNull) {
  // Use main visit_type.
  ...
} else {
  // This is a redirect target, so use the
  // origin visit_type.
  ...
}

The GetIsNull operation should be quite cheap.

::: toolkit/components/places/SQLFunctions.cpp:595
(Diff revision 2)
> +          rv = getVisits->GetIsNull(1, &isNull);
> -        NS_ENSURE_SUCCESS(rv, rv);
> +          NS_ENSURE_SUCCESS(rv, rv);
>  
> +          rv = getVisits->GetInt32(isNull ? 2 : 1, &visitType);
> +          NS_ENSURE_SUCCESS(rv, rv);
> +          printf("Using non-redirect visit type %d, isNull %d\n", visitType, isNull);

leftover

::: toolkit/components/places/tests/browser/browser_multi_redirect_frecency.js:77
(Diff revision 2)
> +add_task(function* test_multiple_redirect() {
> +  // Used to verify the redirect bonus overrides the typed bonus.
> +  PlacesUtils.history.markPageAsTyped(REDIRECT_URI);
> +
> +  expectedRedirectSourceFrecency += REDIRECT_SOURCE_VISIT_BONUS;
> +  // XXX Bug 487813 - This should be TYPED_VISIT_BONUS, however as we don't

s/XXX/TODO/

::: toolkit/components/places/tests/browser/browser_multi_redirect_frecency.js:90
(Diff revision 2)
> +  yield Promise.all([visitedURIPromise, newTabPromise]);
> +
> +  yield testURIFields(REDIRECT_URI, expectedRedirectSourceFrecency, 1);
> +  yield testURIFields(INTERMEDIATE_URI_1, expectedRedirectSourceFrecency, 1);
> +  yield testURIFields(INTERMEDIATE_URI_2, expectedRedirectSourceFrecency, 1);
> +  yield testURIFields(TARGET_URI, expectedTargetFrecency, 0); // XXX -1

This comment is not very clear

::: toolkit/components/places/tests/browser/redirect_thrice.sjs:4
(Diff revision 2)
> +/**
> + * Any copyright is dedicated to the Public Domain.
> + * http://creativecommons.org/publicdomain/zero/1.0/
> + */

nit: in tests the PD license can be omitted, test code defaults to PD if doesn't have a license header.
Attachment #8827995 - Flags: review?(mak77)
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

https://reviewboard.mozilla.org/r/92020/#review106718

> There's still something I don't understand here.
> The redirect status is per visit, but this code passes the previous value to the next loop iteration.
> 
> supposing the 2nd visit sets it to true/false, the third visit will skip this if.
> I guess it would be less error prone to have 2 vars, one for the passed-in value and one for the local value.

Thanks for pointing that out, I'd missed that.
Comment on attachment 8827995 [details]
Bug 737836 - Improve frecency handling for multiple redirect sequences, and add a test.

https://reviewboard.mozilla.org/r/105552/#review106720

> Just off-hand, permanent redirects are a bit "stronger" than temp ones, what about 40 for temp redirects? it's mostly a guess, honestly.

Yep, let's give it a try.

> debug leftover

Bah, how did I leave those in there :-(

> This comment is not very clear

I'd left those in by mistake, after I was trying to figure a few things out.
Comment on attachment 8809424 [details]
Bug 737836 - Don't rely on past visits for frecency calcuations for redirects.

https://reviewboard.mozilla.org/r/92020/#review107084

It looks great now!
Attachment #8809424 - Flags: review?(mak77) → review+
Comment on attachment 8827995 [details]
Bug 737836 - Improve frecency handling for multiple redirect sequences, and add a test.

https://reviewboard.mozilla.org/r/105552/#review107086

::: browser/app/profile/firefox.js:867
(Diff revision 3)
>  // 0 would hide these pages, instead we want them low ranked.  Thus we use
>  // linkVisitBonus - bookmarkVisitBonus, so that a bookmarked source is in par
>  // with a common link.
>  pref("places.frecency.redirectSourceVisitBonus", 25);
>  pref("places.frecency.downloadVisitBonus", 0);
> -pref("places.frecency.permRedirectVisitBonus", 0);
> +pref("places.frecency.permRedirectVisitBonus", 50);

please add a comment that explains these redirects bonuses represent targets, not sources.

::: toolkit/components/places/SQLFunctions.cpp:514
(Diff revision 3)
>  
>      // Fetch the page stats from the database.
>      {
>        RefPtr<mozIStorageStatement> getPageInfo = DB->GetStatement(
>          "SELECT typed, visit_count, foreign_count, "
> -               "(substr(url, 0, 7) = 'place:') "
> +               "(substr(url, 0, 7) = 'place:'), url "

url still looks like a leftover.

::: toolkit/components/places/SQLFunctions.cpp:539
(Diff revision 3)
>        NS_ENSURE_SUCCESS(rv, rv);
>        hasBookmark = foreignCount > 0;
>        rv = getPageInfo->GetInt32(3, &isQuery);
>        NS_ENSURE_SUCCESS(rv, rv);
> +      nsAutoCString url;
> +      rv = getPageInfo->GetUTF8String(4, url);

ditto, looks like a leftover
Attachment #8827995 - Flags: review?(mak77) → review+
Thanks for the comments, updated & pushed to autoland :-)
Pushed by mbanner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9c0c8b1a1250
Don't rely on past visits for frecency calcuations for redirects. r=mak
https://hg.mozilla.org/integration/autoland/rev/05848bf39248
Improve frecency handling for multiple redirect sequences, and add a test. r=mak
https://hg.mozilla.org/mozilla-central/rev/9c0c8b1a1250
https://hg.mozilla.org/mozilla-central/rev/05848bf39248
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Blocks: 624458
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: