Closed
Bug 409271
Opened 18 years ago
Closed 7 years ago
Weight the main website address higher than sub-pages
Categories
(Firefox :: Address Bar, defect)
Firefox
Address Bar
Tracking
()
RESOLVED
INACTIVE
People
(Reporter: ancestor.ak, Unassigned)
References
Details
Attachments
(5 files)
I think that we should add extra weight to root pages of websites. All other factors being equal, "www.goats.com" should be ranked higher in search results than sub-pages.
This way, if you visit this website once and then search for "goats", you will be more likely to be presented with the main page than with some random, obscure subpage like "www.goats.com/aspx?FamilyID=ad724ae0-e72d-4f54-9ab3-75b8eb148356&displaylang=en".
Of course, it would be just one of the factors - other factors could still cause the sub-page to be ranked higher.
I believe this is how the Firefox2-Places autocomplete used to work and as far as I recall it was much praised.
Comment 1•18 years ago
|
||
Bug 394066 comment #6 briefly mentioned something along these lines..
"url length (shorter is better?)"
Just a data point..
SELECT MAX(LENGTH(url)), MIN(LENGTH(url)) FROM moz_places
32116, 13
where length 13 is place:sort=4& and http://cs598/ (14 has javascript:NaN, place:folder=4, place:folder=3)
Comment 2•18 years ago
|
||
Give weight to shorter urls by grouping them into buckets by length (divide and round to 1 decimal)
| Reporter | ||
Comment 3•18 years ago
|
||
Is checking length used as a hack for detecting website roots, or is it because you think that shorter URLs in general should be ranked higher?
The most correct way to detect website root would be to create an nIURI and check the path. But that's probably out of the question for performance reasons, even if done using a custom SQL function, am I right? The length hack might be good enough but it may also generate some undesired results in some cases.
Comment 4•18 years ago
|
||
Right. This is to estimate the root because..
1) callback and creating a nsIURI for *every* row in the moz_places table will probably be slow
2) figuring out what's exactly a root and not isn't always easy -- / vs /index.html ?
Comment 5•18 years ago
|
||
Here's typing in "bug" on trunk. I've cleared out my adaptive searching history and fixed all the pages to be "recent".
Comment 6•18 years ago
|
||
Prefer shorter urls for the same "bug"
Comment 7•18 years ago
|
||
Comment 8•18 years ago
|
||
Note that I never visit forums.mozillazine.org -- only to the firefox builds page directly, so it doesn't show up here, but it's something the adaptive bar will bubble up.
Updated•18 years ago
|
Attachment #294160 -
Attachment description: bug (trunk) → bug (short)
Updated•18 years ago
|
Attachment #294161 -
Attachment description: bug (short) → bug (trunk)
Attachment #294161 -
Attachment filename: bug short.png → bug trunk.png
Updated•18 years ago
|
Attachment #294162 -
Attachment description: moz (trunk) → moz (short)
Attachment #294162 -
Attachment filename: moz trunk.png → moz short.png
Updated•18 years ago
|
Attachment #294163 -
Attachment description: moz (short) → moz (trunk)
Attachment #294163 -
Attachment filename: moz short.png → moz trunk.png
Comment 9•18 years ago
|
||
Whoops.. named the files the wrong way ;)
As a side note.. this also includes the patch to include typed in the rank (bug 352923), and some pages like wiki.mozilla.org and bonsai.mozilla.org actually redirect to another page. Should we put weight for the targets of redirects?
Comment 10•18 years ago
|
||
Adam: Can you try these builds to see if it does what you want? (Only the mac build is finished right now)
https://build.mozilla.org/tryserver-builds/2007-12-20_19:46-edward.lee@engineering.uiuc.edu-awbar.weight.dlmgr.faster/
It's got a bunch of patches, but you should be able to see the effect of weighting the main website over sub-pages.. (note that if you start selecting stuff, the adaptive will kick in)
Bug 395735 - Instrument the location bar auto-complete to report accuracy
Bug 406359 - Unify the logic for url bar searches and drop down items
Bug 406358 - Location bar drop down ignores frequency of visits
Bug 395739 - adaptive learning (match entered text to selected item) in url bar
autocomplete
Bug 393678 - location bar autocomplete should take word boundaries in account
Bug 406422 - Globally decay adaptive input history to allow for new entries
Bug 406425 - Limit how many adaptive input history entries to track
Bug 406427 - Penalize skipped results to allow adaptive learning of shifting
preferences
Bug 352923 - Autocomplete should prefer typed URLs to URLs that were merely
visited
Bug 394066 - use "if bookmark" to weight url autocomplete results
Bug 409271 - Weight the main website address higher than sub-pages
Bug 408696 - Don't show leading 0 for download dates
Bug 409311 - I see beach ball instead of download manager
Bug 408745 - Download Manager clobbers the UI thread and is slow to respond
when updating its list
Bug 409105 - Just finished download shows start time instead of completed time
Bug 409314 - Default download manager size should be golden (485x300)
Bug 409317 - Download date/time should show the full date and time as a tooltip
Bug 409326 - Complement eTLD+1 status with a full host tooltip
Bug 394516 - Figure out a remaining-time rounding scheme for minutes ->
hours/days
Bug 406857 - Don't clear referrer when restarting downloads
Bug 401316 - Open-with downloads are readonly
Just to be clear, the full query is..
// ?1 Starting chunk time
// ?2 Ending chunk time
// ?3 Search string for %LIKE%
// ?4 Plain search string
"SELECT h.url, h.title, f.url, b.id, b.parent, "
// Find the best rank for a given id/url.
"MAX("
// Adaptive ranking relies on inputhistory which can be NULL
"IFNULL(0 "
// adaptive: Factor in use_count if the input matches (0 to 10)
"+ (i.input LIKE ?3 ESCAPE '/') * i.use_count "
// adaptive: Give extra weight for exact matches (0 or 5)
"+ (i.input = ?4) * 5 "
// adaptive: Check "word boundary" for starting matches (0 or 5)
"+ (SUBSTR(i.input, 1, LENGTH(?4)) = ?4) * 5 "
", 0) "
// visit type: Give extra weight for typed results (0 or 1)
"+ h.typed "
// visit type: Give extra weight for bookmark visits (0 or 1)
"+ (v.visit_type = 3) "
// url size: Shorter urls (main page) over longer (sub page) (0 to .5)
"+ ROUND(5.0 / LENGTH(h.url), 1) "
") rank "
"FROM moz_places h "
"JOIN moz_historyvisits v ON h.id = v.place_id "
"LEFT OUTER JOIN moz_bookmarks b ON b.fk = h.id "
"LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id "
"LEFT OUTER JOIN moz_inputhistory i ON i.place_id = h.id "
"WHERE v.visit_date >= ?1 AND v.visit_date <= ?2 AND h.hidden <> 1 AND "
" v.visit_type <> 0 AND v.visit_type <> 4 AND "
"(h.title LIKE ?3 ESCAPE '/' OR h.url LIKE ?3 ESCAPE '/') "
"GROUP BY h.id "
"ORDER BY rank DESC, h.visit_count DESC, MAX(v.visit_date) DESC;");
Comment 11•18 years ago
|
||
why using visit_type=3 instead of b.id NOT NULL?
this query could be probably an ideal query to use triggers: add a trigger that updates rank on every visit insert and add a rank column to moz_places... should really be effective and fast
Comment 12•18 years ago
|
||
Marco, you seem to better understand the performance aspect better.. I was thinking about changing the above query into sub queries instead of JOINing on moz_historyvisits and moz_inputhistory. Both of those together will effectively cross join on both of those tables for each moz_place.id.
So this query here is changed to the 2 sub-queries per moz_place.id
SELECT COUNT(*) FROM (
SELECT h.url, h.title, h.visit_count,
(SELECT MAX(v.visit_date) FROM moz_historyvisits v WHERE v.place_id = h.id AND v.visit_type <> 0 AND v.visit_type <> 4) visit_date,
IFNULL((SELECT MAX((i.input LIKE '%%' ESCAPE '/') * i.use_count + (i.input = '') * 5 + (SUBSTR(i.input, 1, LENGTH('')) = '') * 5) FROM moz_inputhistory i WHERE i.place_id = h.id), 0) + h.typed + NOT b.id ISNULL + ROUND(5.0 / LENGTH(h.url), 1) rank
FROM moz_places h
LEFT OUTER JOIN moz_bookmarks b ON b.fk = h.id
LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
WHERE h.hidden <> 1 AND
(h.title LIKE '%%' ESCAPE '/' OR h.url LIKE '%%' ESCAPE '/')
GROUP BY h.id
ORDER BY rank DESC, h.visit_count DESC, visit_date DESC
)
And the original query..
SELECT COUNT(*) FROM (
SELECT h.url, h.title, h.visit_count,
MAX(v.visit_date) visit_date,
MAX(IFNULL((i.input LIKE '%%' ESCAPE '/') * i.use_count + (i.input = '') * 5 + (SUBSTR(i.input, 1, LENGTH('')) = '') * 5, 0) + h.typed + NOT b.id ISNULL + ROUND(5.0 / LENGTH(h.url), 1)) rank
FROM moz_places h
JOIN moz_historyvisits v ON v.place_id = h.id
LEFT OUTER JOIN moz_bookmarks b ON b.fk = h.id
LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
LEFT OUTER JOIN moz_inputhistory i ON i.place_id = h.id
WHERE h.hidden <> 1 AND v.visit_type <> 0 AND v.visit_type <> 4 AND
(h.title LIKE '%%' ESCAPE '/' OR h.url LIKE '%%' ESCAPE '/')
GROUP BY h.id
ORDER BY rank DESC, h.visit_count DESC, visit_date DESC
)
Both the queries perform about the same for my places, history and inputhistory. I suppose the former would be better if there are many historyvisits for a place id that also has many inputhistory for that same place id. But most likely there'll be at most a couple inputhistory per place id and that table is small to begin with.
(In reply to comment #11)
> updates rank on every visit insert and add a rank column to moz_places...
I believe that's what Seth is doing for bug 394038. Essentially all the static rank stuff can be pre-computed (v.visit_count, MAX(v.visit_date), h.typed, b.id ISNULL, LENGTH(h.url)) but the adaptive stuff that depends on the user's input depend on whatever the user types in.
Comment 13•18 years ago
|
||
subqueries should be faster than joining on big tables, but this needs some testing, i think however that the first query is better, i'd move the rank calculation directly into the order by clause, but again this should be tested to have timings, this could prevent calculating the rank for visits that will be discarded by the group by clause
for max visit_date see bug 392399, with a new index and the new fragment (limited order by) it will perform faster, but for now using max is ok
i'll try to take some timings on the queries
Comment 14•18 years ago
|
||
(In reply to comment #13)
> i'd move the rank
> calculation directly into the order by clause, but again this should be tested
> to have timings, this could prevent calculating the rank for visits that will
> be discarded by the group by clause
Would that do the right thing? Like for the MAX(visit_date), the MAX is aggregating over all the GROUPed place.id. Similarly, the rank is computed to be the best out of the GROUPed place.id, so the GROUP BY can't really discard computations.
Comment 15•18 years ago
|
||
i'm not sure if sqlite makes the computation before or after the group by... i think that you end up having
h.id | v.id | rank
12 32 5
12 34 5
12 56 5
and then the group unify the rows, while putting in the order by shoul be
h.id | v.id
12 32
12 34
12 56
then group by unify the rows and then calls order by that do the calculation once per place_id
Comment 16•18 years ago
|
||
mmh wait that is true when joining on moz_historyvisits, but since you don't join anymore i'm not sure the group by is needed at all...
Comment 17•18 years ago
|
||
the only things affected by group by are b.id and f.id, for a single place_id you have only one favicon, so that does not require grouping... while if a place_id has more than one bookmark (is this possibile now? i don't think) how can you choose the bookmark.id? So, if you don't join on moz_historyvisits you don't need to group by at all...
so the only think affecting moving rank to order by is some memory saving when returning the results from sqlite to places (you don't return rank), but since rank is an int will be really a little saving
Comment 18•18 years ago
|
||
moz_inputhistory can have multiple entries for a given h.id
'mozilla' and 'moz' can both lead to the same place.id
If the user types in 'moz', the 'moz' entry will get extra weight for a perfect match so it's rank would be higher than the 'mozilla' entry.
And rank is a float in this case.
Comment 19•18 years ago
|
||
yes, but since you are not joining on moz_inputhistory, but doing the work on a subselect, the group id of the main query is not used, it applies only to joined tables (favicon and bookmarks).
the only thing that the group by can change in rank calculation is the b.id, but if a place_id cannot have different bookmarks it does nothing.
FROM moz_places h
LEFT OUTER JOIN moz_bookmarks b ON b.fk = h.id
LEFT OUTER JOIN moz_favicons f ON h.favicon_id = f.id
GROUP BY h.id
the group by will catch in if a place_id has more than one entry in bookmarks or more than one entry in favicons... subselected columns cannot add rows to resultset, while joins do
Comment 20•18 years ago
|
||
> this query could be probably an ideal query to use triggers: add a trigger that
> updates rank on every visit insert and add a rank column to moz_places...
> should really be effective and fast
a drive by thought on triggers:
while visits happen one (or a few, think about opening n tabs at once) at a time, visit removal happens in bulk (think clear private data). I'm concerned that if we do calculation on a trigger, we could impact performace.
another drive by thought:
in bug #394038, the general approach is to pay the cost of the frecency calculation on visit time, and not on autocomplete query time. a visit happens one at a time, but we do several queries per key stroke (because of chunking) when typing in the url bar.
I haven't read all the recent comments and thoughts from Edward, I'll go do that so I better understand what he is doing.
I'll go write up the approach in bug #394038 in more details (so you don't have to read the patch to figure it out).
Comment 21•18 years ago
|
||
(In reply to comment #19)
> yes, but since you are not joining on moz_inputhistory, but doing the work on a
> subselect, the group id of the main query is not used, it applies only to
> joined tables (favicon and bookmarks).
Yup. Thought you were referring to the first query I pasted that joined on historyvisits and inputhistory.
> the group by will catch in if a place_id has more than one entry in bookmarks
> or more than one entry in favicons
I don't see there being more than one favicon for a url, but there's potentially multiple bookmarks. But would we want to show multiple bookmark entries? They could have different titles as well.
Seth: There's 2 separate aspects to my stack of patches: the adaptive feedback and a query-time ranking. The query-time ranking is similar to the frecency stuff in that it ranks results, but bug 394038 will do things at visit time. I'm assuming eventually there'll be a split of ranks that we can pre-compute and weights that can only be applied with inputs.
Comment 23•18 years ago
|
||
I just typed "gateway" into the location bar, and instead of http://gateway.2wire.net/, which is what I wanted, I got http://www.cvs.com/CVSApp/cvs/gateway/cvsstores on the top of the list.
Comment 24•18 years ago
|
||
That's a separate issue. This bug is about pages from the same domain. You're talking more about bug 409895.
That should be addressed with this build..
https://build.mozilla.org/tryserver-builds/2007-12-26_15:26-edward.lee@engineering.uiuc.edu-awesom.domain/
| Reporter | ||
Comment 25•17 years ago
|
||
Mardak, are you going to try to get this in for Firefox 3 by any chance?
Comment 26•17 years ago
|
||
This shouldn't be as much as a problem anymore now that adaptive learning is in. There are only so many root sites that you can visit compared to the subpages, and you're likely to select the root site multiple times causing it to be adapted.
I doubt that we'll do ranking of results when querying they've gotten in the DB for perf reasons.
Seth: I suppose it would be possible to tweak the frecency to give higher rank to "main websites", but as I've pointed out before, it's difficult to determine what's a root. It's not just http://site/ because it could be http://site/MainPage.html. Or what I did before with an earlier patch.. just prefer shorter urls.
Comment 27•17 years ago
|
||
> Seth: I suppose it would be possible to tweak the frecency to give higher rank
> to "main websites"
I haven't thought as long or as hard about this as you, but it seems to me that your adaptive feature would be enough to solve this problem, as over time, that should improve results to give users what they want.
or is this bug more about improving the quality of results before we have have adaptive data?
either way, I wouldn't block fx 3 for this one.
| Reporter | ||
Comment 28•17 years ago
|
||
(In reply to comment #27)
> > Seth: I suppose it would be possible to tweak the frecency to give higher rank
> > to "main websites"
>
> I haven't thought as long or as hard about this as you, but it seems to me that
> your adaptive feature would be enough to solve this problem, as over time, that
> should improve results to give users what they want.
>
> or is this bug more about improving the quality of results before we have have
> adaptive data?
>
> either way, I wouldn't block fx 3 for this one.
>
Adaptive learning can compensate any kind shortcomings in the frecency algorithm, because the initial ordering of the results is going be gradually changed to match the user's behaviour. However, this shouldn't be a reason not to refine the initial results.
So yes, this bug is about improving the quality of results before we have have
adaptive data. I don't think it should block Firefox 3 either, but it would be a nice-to-have.
Comment 30•13 years ago
|
||
Bug 240397 is a duplicate
Updated•13 years ago
|
| Reporter | ||
Comment 32•13 years ago
|
||
Dao, I'm reopening for reconsideration. This bug is probably more useful because it contains research and discussion from Mardak and Marco, whereas bug 240397 is old and predates Awesomebar. Feel free to mark as duplicate again if you don't agree, though.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 33•7 years ago
|
||
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 7 years ago
Resolution: --- → INACTIVE
You need to log in
before you can comment on or make changes to this bug.
Description
•