Google redirect links on search suggestion can be misleading
Categories
(Firefox :: Address Bar, enhancement, P3)
Tracking
()
People
(Reporter: florencia.diciocco, Assigned: mcheang)
References
(Blocks 1 open bug)
Details
(Keywords: papercut, Whiteboard: [snt-scrubbed][search-papercut])
Attachments
(2 files)
[Affected platforms:]
mac 10.14.6
Ubuntu 18.04.2 LTS
Windows: will test it by eod
[Affected versions:]
Nightly 86.0a1 (2021-01-21)
Beta 85.0b9
ESR 78.7.0esr
Release 84.0.2
[Description:]
FF is showing redirect links as history suggestions when you search for a visited link. This is a bit misleading since the user might believe that it's opening google but ends up in another site.
[Steps:]
- Go to google.com and google something.
- Open a result/link in a new tab
- Close those tabs
- In a new tab, in the url bar type the first letters of the link opened in the new tab.
- It appears the google link (the one in grey). Click on it.
[Actual result:]
It opens the link from step 2
[Expected result:]
It should open a google link.
Comment 1•3 years ago
•
|
||
In general we don't hide redirect sources, because sometimes they can be more useful or readable than targets. Often the user also remembers what they typed, rather than the destination.
This is a special case though, it's a redirect where the url contains another embedded url... It sounds like we should be able to do something about these, that may also just mean hiding them (since the destination visit should be in history anyway).
Updated•3 years ago
|
Comment 2•3 years ago
|
||
The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Comment 3•2 years ago
|
||
We should investigate de-dupe urls containing embedded urls for search results.
For example, there’s a url to the recipe for ice tea, but there’s another result that’s for google to redirect to the same url with the recipe for ice tea. Therefore, this is a duplicate
Updated•2 years ago
|
Updated•1 year ago
|
Comment 4•1 year ago
•
|
||
I was thinking of picking this up, but I feel like I need more info to know how to move forward.
If I'm understanding this correctly, the issue is that target url is also contained in the Google search results URL. So if the user happens to type a string that matches both the target URL as well as something in the source URL, then source link may show up in suggest.
Other providers (i.e. Bing) may also use redirects but they may not be including the target URL in the request URL, so we avoid these duplicate entries.
I feel like I should know how URL's are added to history. There are times when I visit a link that redirects but it doesn't appear in my list of recent history. Thus I think it's worth investigating the process of adding a link to history. I'm going to needsinfo you :mak because you probably have a quick answer to this (and more insight).
I feel into the weeds and stepped through the backend logic of adding URI visits which called VisitURI
in History.cpp. Will do more testing to see how the behaviour of this method changes depending on the conditions.
Comment 5•1 year ago
|
||
(In reply to James Teow [:jteow] from comment #4)
If I'm understanding this correctly, the issue is that target url is also contained in the Google search results URL. So if the user happens to type a string that matches both the target URL as well as something in the source URL, then source link may show up in suggest.
that's correct, both source and target may show.
Other providers (i.e. Bing) may also use redirects but they may not be including the target URL in the request URL, so we avoid these duplicate entries.
I didn't check what others are doing, off-hand it may be interesting and useful to check if Bing, DuckDuckGo, Yahoo, Yandex, Baidu (I think these are still the most used engines today) do something similar.
I feel into the weeds and stepped through the backend logic of adding URI visits which called
VisitURI
in History.cpp. Will do more testing to see how the behaviour of this method changes depending on the conditions.
When the docshell visits a url, is is indeed invoking VisitURI and SetURITitle (this is not a single call because the title must be extracted later from the DOM, there was some discussion about how to do a single call but it's low priority). The docshell knows the most about of the page was visited, and passes that info through flags
History may decide that some visits are not worth showing in the UI, and then sets the hidden field that can be used later by history views to exclude them. The main reason for that is performance of views showing hundreds/thousands of entries and providing to the user history lists that are more useful (like in the history menu). I think the urlbar, like searching through history, completely ignores hidden, the reason is if you are looking for something you should be able to find it.
The other part of this is frecency, when updating frecency from a new visit, IIRC we pass down the redirect info, otherwise we try to extract it from history itself. If we detect a visit is a redirect source, that is 0.
Sometimes we exclude frecency = 0 entries in the urlbar, but I suspect we're not very consistent with that, some queries may not do it, or frecency of these entries may not be exactly zero.
Comment 6•1 year ago
|
||
Note ideally you could just dedupe in the Urlbar Muxer, there is already code doing deduplication of other urls there.
Comment 7•1 year ago
|
||
Thanks for the background info, Marco. I'll do a deeper dive on how the main search engines code their URLs.
If just de-duping from the Muxer, I feel the simplest way without causing potential side effects would be to check if the URL comes from a search engine domain and it contains a URL in one of the query params.
The key point is making sure the URL could be from a SERP, which isn't trivial since the URL isn't necessarily going to have the same sub domain and path of a SERP, hence making parseSubmissionURL
not applicable.
Some ideas:
- UrlbarSearchUtils.enginesForDomainPrefix(): While I can extract the domain from the result URL, the helper is async and it's overall implementation could add a performance hit to
_canAddResult
- SearchUtils.GENERAL_SEARCH_ENGINE_IDS: Check if the result domain starts with a
www
and contains a name that starts with any of the general search engines before checking the query params. It's not entirely comprehensive since you can't check engines outside of those defined engines, and if the provider switches or uses subdomains for the redirect this won't work - Hard code an exception if this is predominantly an issue with Google
- Dig deeper to find out why this result is being shown whereas other redirects from other SERPs aren't always shown even if the user types a terms that are similar to the redirect URL: Maybe something could be done prevent the result from appearing in
_canAddResult
?
Comment 8•1 year ago
|
||
I took a stab at conducting searches in the urlbar, opening both non-ads and ads into new tabs and then checked the places.sqlite -> MozPlaces database for changes.
Non-ad links:
- Google and Yahoo US can include a URL as a param value.
- Yahoo US will also use a different sub-domain to trigger the redirect (r.search.yahoo.com), so strictly matching against www.yahoo.com won't work.
- DDG, Yandex don't add hidden entries.
- Bing and Baidu do add hidden entries but they don't encode the target URLs in a human readable format.
Ad-links:
- Google and Yahoo can include the target domain in their ad URLs, though the domains are different from the search engine.
- Bing doesn't include the target domain in the ad url.
- DDG uses their own URL that does include the target domain, before loading the Bing ad URL.
- Baidu doesn't include a target domain in the ad URL that I saw. Likewise Yandex.
Also interesting is that other browsers may choose to omit these ad URLs in history even if you paste the ad manually: Perhaps they prevent specific URLs that redirect from being added to the user history? Or maybe they cherry pick a specific set of domains that they know are ad serving.
Assignee | ||
Comment 9•4 months ago
|
||
Assignee | ||
Comment 10•4 months ago
|
||
I have a WIP patch for this: https://phabricator.services.mozilla.com/D186578
It's not exactly ready yet and it's deduping but in some cases it's not. So I'm still testing it out.
A question I had was should we only apply this to the google case? Or should we parse the nested url for other search engines?
I did a manual test and the problem is only with google redirects.
Reproducible in
Not Reproducible
- DDG
- Bing
- Qwant
- Yahoo
- ask.com
- Baidu
Updated•3 months ago
|
Comment 11•3 months ago
|
||
(In reply to Mandy Cheang [:mcheang][she/her] from comment #10)
A question I had was should we only apply this to the google case? Or should we parse the nested url for other search engines?
The best would ideally be to apply it only to redirecting urls... unfortunately we don't have that info available easily, so for now we could use a short blocklist with partial origins and file a follow-up bug to return redirecting info from history.
Comment 12•3 months ago
|
||
Pushed by mcheang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cacd1aee41f8 Dedupe google redirect links results. r=mak
Comment 13•3 months ago
|
||
bugherder |
Updated•3 months ago
|
Comment 14•2 months ago
|
||
Confirming enhancement is implemented on Nightly 119.0a1(build ID: 20230925093027) on Ubuntu 22, macOS 12, Windows 10.
Description
•