Closed Bug 681904 Opened 13 years ago Closed 13 years ago

[SEO] Google Sitelinks show 3.0.x whatsnew page

Categories

(www.mozilla.org :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: kohei, Assigned: rik)

Details

(Whiteboard: r=97767,97768 b=trunk)

+++ This bug was initially created as a clone of Bug #681753 +++

While the Sitelinks were updated as part of Bug 629787, Google recently changed the system. Sitelinks are now difficult to control; we cannot block specific URLs. Either way, we should pay more attention to them.

http://www.google.com/search?hl=en&q=Firefox

Such links should not be there.

> Firefox 4
> www.mozilla.com/firefox/fx/

> Firefox Updated
> http://www.mozilla.com/en-US/firefox/3.0.x/whatsnew/

1. Use Google Webmaster Tools to "demote" them.
2. Use the "noindex" meta tag (or HTTP header) for the in-product pages (whatsnew, firstrun, etc.)
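
A minimal sketch of option 2, assuming the in-product pages can be recognized by a trailing /whatsnew/ or /firstrun/ path segment. The pattern and the helper names are hypothetical and not taken from the site's code; X-Robots-Tag is the HTTP header equivalent of the robots meta tag.

```python
import re

# Hypothetical pattern for in-product pages; adjust to the real URL layout.
IN_PRODUCT = re.compile(r"/(whatsnew|firstrun)/?$")

# Meta tag variant, to be placed in the <head> of each in-product page.
NOINDEX_META = '<meta name="robots" content="noindex">'


def is_in_product(path):
    """Return True if `path` looks like a whatsnew or firstrun page."""
    return IN_PRODUCT.search(path) is not None


def robots_headers(path):
    """Return the extra response headers to send for `path`.

    In-product pages get X-Robots-Tag: noindex; other pages get nothing.
    """
    if is_in_product(path):
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Either the header or the meta tag is enough on its own; the header has the advantage of not requiring a template change.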
Official Google Webmaster Central Blog: Introducing new and improved sitelinks
http://googlewebmastercentral.blogspot.com/2011/08/introducing-new-and-improved-sitelinks.html
I'm OK with removing the whatsnew page from there, but I'm not sure we should remove the firefox/fx/ one. Sure, the title is bad, but we need to poke Google to find out why.
Assignee: nobody → anthony
What we'd like Google to index is www.mozilla.com/firefox and we don't want IE users to visit firefox/fx, right?
(In reply to Kohei Yoshino from comment #3)
> What we'd like Google to index is www.mozilla.com/firefox and we don't want
> IE users to visit firefox/fx, right?

Either way the user will be redirected to the proper download page. The only out-of-date item here is the "Firefox 4" title tag (?) which is no longer live. That said, I'm not sure what we can do to remove that now. 

Ideally in the future it should not give a version number, but instead, say something like "Download Firefox".
(In reply to Kohei Yoshino from comment #3)
> What we'd like Google to index is www.mozilla.com/firefox and we don't want
> IE users to visit firefox/fx, right?
Oh you're right!

Laura: No, if you end up on /firefox/fx, we don't do a redirection depending on your user-agent. That's on purpose, we want people to be sure that they can share the same page.
(In reply to Anthony Ricaud (:rik) from comment #5)
> (In reply to Kohei Yoshino from comment #3)
> > What we'd like Google to index is www.mozilla.com/firefox and we don't want
> > IE users to visit firefox/fx, right?
> Oh you're right!
> 
> Laura: No, if you end up on /firefox/fx, we don't do a redirection depending
> on your user-agent. That's on purpose, we want people to be sure that they
> can share the same page.

That's fine. 

Main objective of this bug - do what we can to remove old version numbers from Google search results.
Blocks: 667557
Whiteboard: r=94607,94608,94609 b=trunk
I have verified ownership of mozilla.org. I cannot get ownership of mozilla.com since we now redirect. So I guess we should just wait for the sitelinks to be updated. I couldn't find info on when that happens.

I'm not sure what to do about /fx ? We don't want this page to be in the search results, but we do want the page that does the redirection to be. By putting a noindex here, we might lose some rankings.
Whiteboard: r=94607,94608,94609 b=trunk → r=94607,94608,94609,94696 b=trunk
Removed indexation of firstrun and whatsnew pages.
Whiteboard: r=94607,94608,94609,94696 b=trunk → r=94607,94608,94609,94696,94700 b=trunk
(In reply to Anthony Ricaud (:rik) from comment #7)
> I'm not sure what to do about /fx ?

My idea is to stop redirecting non-Firefox users to /firefox/new/ and use /firefox/ as the main landing page. Firefox users should be redirected to /firefox/fx/ as before. If non-Firefox users (including Googlebots) visit /firefox/fx/, just redirect them to /firefox/ with (probably) 303 See Other.
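
The routing proposed above could be sketched roughly as follows. This is only an illustration: the user-agent check is deliberately naive, locale prefixes such as /en-US/ are ignored, and the 302 used for Firefox users is an assumption (the comment only suggests 303 for the /firefox/fx/ case).

```python
def is_firefox(user_agent):
    """Naive user-agent check, an assumption for illustration only."""
    return "Firefox/" in user_agent


def route(path, user_agent):
    """Return (status, location); location is None when the page is served.

    - Firefox users hitting /firefox/ are sent to /firefox/fx/.
    - Everyone else (including Googlebot) hitting /firefox/fx/ is sent
      back to /firefox/ with 303 See Other.
    - All other requests are served directly.
    """
    if path == "/firefox/" and is_firefox(user_agent):
        return 302, "/firefox/fx/"  # status code is an assumption
    if path == "/firefox/fx/" and not is_firefox(user_agent):
        return 303, "/firefox/"
    return 200, None
```

With this layout a crawler only ever sees /firefox/ as the landing page, which is the URL we want indexed.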
(In reply to Anthony Ricaud (:rik) from comment #8)
> Removed indexation of firstrun and whatsnew pages.

Did you send removal requests at least for http://www.mozilla.com/en-US/firefox/3.6.13/whatsnew/ ?
With this, Google will remove the blocked URLs within 24 hours. Otherwise we have to wait days.
https://www.google.com/webmasters/tools/crawl-access?siteUrl=http://www.mozilla.org/&tid=removal-list
(In reply to Kohei Yoshino from comment #10)
> Did you send removal requests at least for http://www.mozilla.com/en-US/firefox/3.6.13/whatsnew/ ?

I mean http://www.mozilla.org/en-US/firefox/3.6.13/whatsnew/
(In reply to Anthony Ricaud (:rik) from comment #7)
> I cannot get ownership of mozilla.com since we now redirect.

Via Bug 629787 Comment 16, you have access to Google Webmaster Tools for mozilla.com.
If so this "Change of address" tool should work:
https://www.google.com/webmasters/tools/change-address?siteUrl=http://www.mozilla.com/
(In reply to Kohei Yoshino from comment #12)
> (In reply to Anthony Ricaud (:rik) from comment #7)
> > I cannot get ownership of mozilla.com since we now redirect.
> 
> Via Bug 629787 Comment 16, you have access to Google Webmaster Tools for
> mozilla.com.
> If so this "Change of address" tool should work:
> https://www.google.com/webmasters/tools/change-address?siteUrl=http://www.mozilla.com/

I don't have access anymore :( You need to keep the file online to keep ownership.

Even if we had access, the change of address form is only applicable if you transfer one website to a new one. Here, we were moving to an existing website. It's more like merging than moving.
(In reply to Kohei Yoshino from comment #9)
> (In reply to Anthony Ricaud (:rik) from comment #7)
> > I'm not sure what to do about /fx ?
> 
> My idea is to stop redirecting non-Firefox users to /firefox/new/ and use
> /firefox/ as the main landing page. Firefox users should be redirected to
> /firefox/fx/ as before. If non-Firefox users (including Googlebots) visit
> /firefox/fx/, just redirect them to /firefox/ with (probably) 303 See Other.
That might be a good idea. I'm gonna ask in the Google Webmasters Forum to get opinions from experts.
(In reply to Anthony Ricaud (:rik) from comment #8)
> Removed indexation of firstrun and whatsnew pages.

I just learned that Google supports an unofficial Noindex directive in robots.txt. It has the same effect as <meta name="googlebot" content="noindex">: Googlebot won't show the corresponding pages on its SERP but still follows (and passes PageRank to) links on those pages. So it's better than using Disallow to block crawling.

> User-agent: *
> Disallow: /*/products/download.html
> Disallow: /*/download/
> Disallow: /*/whatsnew/
> Disallow: /*/firstrun/
> 
> User-agent: Googlebot
> Disallow:
> Noindex: /*/products/download.html
> Noindex: /*/download/
> Noindex: /*/whatsnew/
> Noindex: /*/firstrun/

As always you can test robots.txt with Google Webmaster Tools:
https://www.google.com/webmasters/tools/crawl-access?siteUrl=http://www.mozilla.org/
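
For a quick local check of which paths the wildcard rules above would cover, here is a small sketch. It assumes Google-style matching, where `*` matches any run of characters, a trailing `$` anchors the end, and rules otherwise match as prefixes. The helper names are hypothetical, and this is not a replacement for the Webmaster Tools tester.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style semantics: '*' is a wildcard, a trailing '$'
    anchors the end of the path, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if `path` is covered by the robots.txt `pattern`."""
    return robots_pattern_to_regex(pattern).match(path) is not None
```

For example, `matches("/*/whatsnew/", "/en-US/firefox/3.6.13/whatsnew/")` is true, while the same pattern does not cover `/en-US/firefox/`.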
Target Milestone: --- → 3.10
Target Milestone: 3.10 → 3.12
Anthony - Any updates here?
Target Milestone: 3.12 → 4.0
Target Milestone: 4.0 → 4.1
Target Milestone: 4.1 → 4.2
Target Milestone: 4.2 → 4.3
Target Milestone: 4.3 → 4.4
Anthony - Any updates here?

If not, please add it to the "Future" milestone or a milestone later in Q4 if it's out of scope for next Tuesday's release.
Target Milestone: 4.4 → 4.5
Anthony, ping, can you provide an update on where you are with this bug?
Target Milestone: 4.5 → 4.6
Hey Anthony - A friendly ping on this - can you update with your latest status? 

In the meantime I'm moving this to Future, since it's not a likely candidate for the upcoming release, but please place it back into a milestone once it's ready to launch.
Target Milestone: 4.6 → Future
Anthony, we would like this in the 4.7 release. If it's not possible, please update in the bug as to why and the steps you will take to get it into X.X release (fill in with the release).
Target Milestone: Future → 4.7
My sincere apologies for the late answer.

The answer on the Google forums suggests using rel=canonical, so that's what I'm implementing.

Fixed with r97767.
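
For readers unfamiliar with rel=canonical, a rough sketch of generating the tag is below. This is not the code from r97767: the helper name is hypothetical, and the example URL in the usage note is an assumption about which page is declared canonical. The trailing slash is appended on the assumption that the server would otherwise redirect the slashless URL.

```python
from html import escape


def canonical_link(url):
    """Build a <link rel="canonical"> tag for `url`.

    A trailing slash is added when missing so the canonical URL points
    at the final address rather than at one that redirects.
    Note: this naive version does not handle query strings or fragments.
    """
    if not url.endswith("/"):
        url += "/"
    return '<link rel="canonical" href="%s">' % escape(url, quote=True)
```

Called as `canonical_link("http://www.mozilla.org/firefox")`, this returns a tag whose href is `http://www.mozilla.org/firefox/`; the tag goes in the `<head>` of the duplicate page.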
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thank you for the update and fix Anthony, glad to close this one out :-)
Quick followup. I added a trailing slash to avoid a redirect. r97768.

Gonna push it as soon as it's QA-ed
Whiteboard: r=94607,94608,94609,94696,94700 b=trunk → r=97767,97768 b=trunk
qa-verified
Pushed with r97870.
verified fixed https://www.google.com/search?hl=en&q=Firefox
Status: RESOLVED → VERIFIED
Component: www.mozilla.org/firefox → www.mozilla.org
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org