Closed Bug 1688019 Opened 5 years ago Closed 2 years ago

Google redirect links on search suggestion can be misleading

Tracking

()

Status:

VERIFIED FIXED

Milestone:

119 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

wontfix

firefox84

---

wontfix

firefox85

---

wontfix

firefox86

---

wontfix

firefox119

---

verified

People

(Reporter: florencia.diciocco, Assigned: mcheang)

References

(Blocks 1 open bug)

Details

(Keywords: papercut, Whiteboard: [snt-scrubbed][search-papercut])

Attachments

(2 files)

Screenshot 5 years ago Florencia Di Ciocco, NI to vbalducci 1.90 MB, image/png		Details
Bug 1688019 - Dedupe google redirect links results. r=mak 2 years ago Mandy Cheang [:mcheang][she/her] 48 bytes, text/x-phabricator-request		Details \| Review

Florencia Di Ciocco, NI to vbalducci

Reporter

Description

•

5 years ago

Attached image Screenshot — Details

[Affected platforms:]
mac 10.14.6
Ubuntu 18.04.2 LTS
Windows: will test it by eod

[Affected versions:]
Nightly 86.0a1 (2021-01-21)
Beta 85.0b9
ESR 78.7.0esr
Release 84.0.2

[Description:]
FF is showing redirect links as history suggestions when you search for a visited link. This is a bit misleading since the user might believe that it's opening google but ends up in another site.

[Steps:]

Go to google.com and google something.
Open a result/link in a new tab
Close those tabs
In a new tab, in the url bar type the first letters of the link opened in the new tab.
It appears the google link (the one in grey). Click on it.

[Actual result:]
It opens the link from step 2

[Expected result:]
It should open a google link.

Marco Bonardo [:mak]

Comment 1

•

5 years ago

•

Edited

In general we don't hide redirect sources, because sometimes they can be more useful or readable than targets. Often the user also remembers what they typed, rather than the destination.

This is a special case though, it's a redirect where the url contains another embedded url... It sounds like we should be able to do something about these, that may also just mean hiding them (since the destination visit should be in history anyway).

Blocks: qb-results-papercuts

Severity: -- → S3

Points: --- → 5

Keywords: papercut

Priority: -- → P3

Mark Banner (:standard8)

Updated

•

5 years ago

Component: Search → Address Bar

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

5 years ago

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.

Priority: P3 → --

Marco Bonardo [:mak]

Updated

•

5 years ago

Priority: -- → P3

James Teow [:jteow]

Comment 3

•

3 years ago

We should investigate de-dupe urls containing embedded urls for search results.

For example, there’s a url to the recipe for ice tea, but there’s another result that’s for google to redirect to the same url with the recipe for ice tea. Therefore, this is a duplicate

Assignee: nobody → mcheang

Whiteboard: [snt-triaged][search-papercut]

Jira Integration Bot

Updated

•

3 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SNT-167

Chris Bellini (:cbellini)

Updated

•

3 years ago

Whiteboard: [snt-triaged][search-papercut] → [snt-scrubbed][search-papercut]

James Teow [:jteow]

Comment 4

•

3 years ago

•

Edited

I was thinking of picking this up, but I feel like I need more info to know how to move forward.

If I'm understanding this correctly, the issue is that target url is also contained in the Google search results URL. So if the user happens to type a string that matches both the target URL as well as something in the source URL, then source link may show up in suggest.

Other providers (i.e. Bing) may also use redirects but they may not be including the target URL in the request URL, so we avoid these duplicate entries.

I feel like I should know how URL's are added to history. There are times when I visit a link that redirects but it doesn't appear in my list of recent history. Thus I think it's worth investigating the process of adding a link to history. I'm going to needsinfo you :mak because you probably have a quick answer to this (and more insight).

I feel into the weeds and stepped through the backend logic of adding URI visits which called VisitURI in History.cpp. Will do more testing to see how the behaviour of this method changes depending on the conditions.

Flags: needinfo?(mak)

Marco Bonardo [:mak]

Comment 5

•

3 years ago

(In reply to James Teow [:jteow] from comment #4)

If I'm understanding this correctly, the issue is that target url is also contained in the Google search results URL. So if the user happens to type a string that matches both the target URL as well as something in the source URL, then source link may show up in suggest.

that's correct, both source and target may show.

Other providers (i.e. Bing) may also use redirects but they may not be including the target URL in the request URL, so we avoid these duplicate entries.

I didn't check what others are doing, off-hand it may be interesting and useful to check if Bing, DuckDuckGo, Yahoo, Yandex, Baidu (I think these are still the most used engines today) do something similar.

I feel into the weeds and stepped through the backend logic of adding URI visits which called VisitURI in History.cpp. Will do more testing to see how the behaviour of this method changes depending on the conditions.

When the docshell visits a url, is is indeed invoking VisitURI and SetURITitle (this is not a single call because the title must be extracted later from the DOM, there was some discussion about how to do a single call but it's low priority). The docshell knows the most about of the page was visited, and passes that info through flags
History may decide that some visits are not worth showing in the UI, and then sets the hidden field that can be used later by history views to exclude them. The main reason for that is performance of views showing hundreds/thousands of entries and providing to the user history lists that are more useful (like in the history menu). I think the urlbar, like searching through history, completely ignores hidden, the reason is if you are looking for something you should be able to find it.
The other part of this is frecency, when updating frecency from a new visit, IIRC we pass down the redirect info, otherwise we try to extract it from history itself. If we detect a visit is a redirect source, that is 0.
Sometimes we exclude frecency = 0 entries in the urlbar, but I suspect we're not very consistent with that, some queries may not do it, or frecency of these entries may not be exactly zero.

Flags: needinfo?(mak)

Marco Bonardo [:mak]

Comment 6

•

3 years ago

Note ideally you could just dedupe in the Urlbar Muxer, there is already code doing deduplication of other urls there.

James Teow [:jteow]

Comment 7

•

3 years ago

Thanks for the background info, Marco. I'll do a deeper dive on how the main search engines code their URLs.

If just de-duping from the Muxer, I feel the simplest way without causing potential side effects would be to check if the URL comes from a search engine domain and it contains a URL in one of the query params.

The key point is making sure the URL could be from a SERP, which isn't trivial since the URL isn't necessarily going to have the same sub domain and path of a SERP, hence making parseSubmissionURL not applicable.

Some ideas:

UrlbarSearchUtils.enginesForDomainPrefix(): While I can extract the domain from the result URL, the helper is async and it's overall implementation could add a performance hit to _canAddResult
SearchUtils.GENERAL_SEARCH_ENGINE_IDS: Check if the result domain starts with a www and contains a name that starts with any of the general search engines before checking the query params. It's not entirely comprehensive since you can't check engines outside of those defined engines, and if the provider switches or uses subdomains for the redirect this won't work
Hard code an exception if this is predominantly an issue with Google
Dig deeper to find out why this result is being shown whereas other redirects from other SERPs aren't always shown even if the user types a terms that are similar to the redirect URL: Maybe something could be done prevent the result from appearing in _canAddResult?

James Teow [:jteow]

Comment 8

•

3 years ago

I took a stab at conducting searches in the urlbar, opening both non-ads and ads into new tabs and then checked the places.sqlite -> MozPlaces database for changes.

Non-ad links:

Google and Yahoo US can include a URL as a param value.
Yahoo US will also use a different sub-domain to trigger the redirect (r.search.yahoo.com), so strictly matching against www.yahoo.com won't work.
DDG, Yandex don't add hidden entries.
Bing and Baidu do add hidden entries but they don't encode the target URLs in a human readable format.

Ad-links:

Google and Yahoo can include the target domain in their ad URLs, though the domains are different from the search engine.
Bing doesn't include the target domain in the ad url.
DDG uses their own URL that does include the target domain, before loading the Bing ad URL.
Baidu doesn't include a target domain in the ad URL that I saw. Likewise Yandex.

Also interesting is that other browsers may choose to omit these ad URLs in history even if you paste the ad manually: Perhaps they prevent specific URLs that redirect from being added to the user history? Or maybe they cherry pick a specific set of domains that they know are ad serving.

Mandy Cheang [:mcheang][she/her]

Assignee

Comment 9

•

2 years ago

Attached file Bug 1688019 - Dedupe google redirect links results. r=mak — Details

Mandy Cheang [:mcheang][she/her]

Assignee

Comment 10

•

2 years ago

I have a WIP patch for this: https://phabricator.services.mozilla.com/D186578
It's not exactly ready yet and it's deduping but in some cases it's not. So I'm still testing it out.

A question I had was should we only apply this to the google case? Or should we parse the nested url for other search engines?
I did a manual test and the problem is only with google redirects.

Reproducible in

Google

Not Reproducible

DDG
Bing
Qwant
Yahoo
ask.com
Baidu

Phabricator Automation

Updated

•

2 years ago

Attachment #9349636 - Attachment description: WIP: Bug 1688019 - Dedupe google redirect links results. r=mak → Bug 1688019 - Dedupe google redirect links results. r=mak

Marco Bonardo [:mak]

Comment 11

•

2 years ago

(In reply to Mandy Cheang [:mcheang][she/her] from comment #10)

A question I had was should we only apply this to the google case? Or should we parse the nested url for other search engines?

The best would ideally be to apply it only to redirecting urls... unfortunately we don't have that info available easily, so for now we could use a short blocklist with partial origins and file a follow-up bug to return redirecting info from history.

Pulsebot

Comment 12

•

2 years ago

Pushed by mcheang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cacd1aee41f8 Dedupe google redirect links results. r=mak

Serban Stanca [:SerbanS]

Comment 13

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cacd1aee41f8

Status: NEW → RESOLVED

Closed: 2 years ago

status-firefox119: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 119 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

status-firefox84: affected → wontfix

status-firefox85: affected → wontfix

status-firefox86: affected → wontfix

status-firefox-esr78: affected → wontfix

Ardelean Oana, Desktop Test Engineering [:oardelean]

Comment 14

•

2 years ago

Confirming enhancement is implemented on Nightly 119.0a1(build ID: 20230925093027) on Ubuntu 22, macOS 12, Windows 10.

Status: RESOLVED → VERIFIED

status-firefox119: fixed → verified

You need to log in before you can comment on or make changes to this bug.