Closed Bug 1144956 Opened 9 years ago Closed 9 years ago

Fix Google search result for 'bugzil.la' to list primary URL 'bugzilla.mozilla.org'

Categories

(bugzilla.mozilla.org :: Infrastructure, defect)

Production
defect
Not set
normal


RESOLVED FIXED

People

(Reporter: Atoll, Assigned: reed)

Details

Attachments

(1 file)

Right now, searching Google for "bugzil.la" results in the BMO home page and Google's discovered sitemap being shown for the domain "bugzil.la", which is incorrect - Google should be informed, somehow (meta tags or webmaster tools or whatever), that the canonical URL for that result page is 'https://bugzilla.mozilla.org/'.

Unsure where to file, so starting out in BMO :: General, but assigning directly to :cmore as he requested.
If the network monitor is not broken, bugzil.la responds with status code 200 and a Location: header field.
* The response status should be 301 rather than 200.
* Is it expected that Location: works even with 200? (Maybe Core/Networking bug)
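The mismatch described above (a Location: header paired with a non-redirect status) can be flagged mechanically. The following is only a sketch: the function name `redirect_problem` and the status sets are illustrative choices, not anything bugzil.la or Bugzilla ships.

```python
from typing import Mapping, Optional

# Status codes that conventionally carry a Location header as a redirect target.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}
PERMANENT_STATUSES = {301, 308}


def redirect_problem(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Describe what is wrong with a would-be permanent redirect, or return None."""
    # Header names are case-insensitive, so compare in lowercase.
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if location is None:
        return "no Location header"
    if status not in REDIRECT_STATUSES:
        return f"Location sent with non-redirect status {status}"
    if status not in PERMANENT_STATUSES:
        return f"redirect is temporary ({status}), not permanent"
    return None
```

For the response reported here, `redirect_problem(200, {"Location": "https://bugzilla.mozilla.org/"})` reports the non-redirect status, while the same header with status 301 yields `None`.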
(In reply to Masatoshi Kimura [:emk] from comment #1)
> * Is it expected that Location: works even with 200? (Maybe Core/Networking
> bug)

Hm, at least IE 8 and Chrome also see a 200 response with Location:.
Note that bugzil.la isn't owned by Mozilla; :reed owns and maintains that domain.
Component: General → Infrastructure
QA Contact: mcote
Do you want Google to just completely ignore bugzil.la URLs complete for bugzilla.mozilla.org pages in the results page?
I need to verify the website with Google's webmaster tools. What would be easier: adding a meta page to the base html template in bugzilla or hosting a one-off single html page at the root of the domain?
Do you have an example that Google returns a bugzil.la result with a search other than "bugzil.la" ?
(In reply to Chris More [:cmore] from comment #4)
> Do you want Google to just completely ignore bugzil.la URLs complete for
> bugzilla.mozilla.org pages in the results page?

bugzilla.mozilla.org doesn't include canonical URLs on its pages (<link rel="canonical">):

https://support.google.com/webmasters/answer/139066?hl=en

Which means that Google thinks that https://bugzil.la/ is a distinct site from https://bugzilla.mozilla.org/.

If we add canonical URLs, then that solves the problem without any changes on the bugzil.la side, since Google will detect that the bugzil.la result is a duplicate for another canonical URL and purge it.
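As a rough illustration of what such a tag would look like, here is a sketch that builds the `<link rel="canonical">` element for a given request path. The helper name `canonical_link` and the constant `CANONICAL_ORIGIN` are hypothetical; Bugzilla's templates would generate this in their own way.

```python
from html import escape
from urllib.parse import urljoin

# Assumed canonical origin, taken from the URL named in this bug.
CANONICAL_ORIGIN = "https://bugzilla.mozilla.org/"


def canonical_link(path_and_query: str) -> str:
    """Build a canonical link element for a path served by the site."""
    # Strip leading slashes so the path is always resolved under the origin.
    url = urljoin(CANONICAL_ORIGIN, path_and_query.lstrip("/"))
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'
```

Calling `canonical_link("/show_bug.cgi?id=1144956")` produces a tag pointing at `https://bugzilla.mozilla.org/show_bug.cgi?id=1144956`, whichever hostname the page was actually reached through.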

(In reply to Chris More [:cmore] from comment #5)
> I need to verify the website with Google's webmaster tools. What would be
> easier: adding a meta page to the base html template in bugzilla or hosting
> a one-off single html page at the root of the domain?

Both of these options would require cooperation from :reed, so I can't offer any guidance there. But if we add canonical tags, this becomes somewhat unnecessary.

(In reply to Chris More [:cmore] from comment #6)
> Do you have an example that Google returns a bugzil.la result with a search
> other than "bugzil.la" ?

Nope. I did notice that bugzil.la has a robots.txt that bans indexing, so Google *knows* about a lot of bugzil.la bug ID URLs, but doesn't index them.
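To illustrate how a blanket robots.txt rule keeps a crawler away from those URLs, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content below is an assumption made for the example; the actual file served by bugzil.la is not quoted in this bug.

```python
from urllib.robotparser import RobotFileParser

# ASSUMED content: a blanket disallow, not a copy of the real bugzil.la file.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the assumed rule, `may_crawl(ASSUMED_ROBOTS_TXT, "Googlebot", "https://bugzil.la/1144956")` is false. A crawler that honors the rule never fetches the URL, so it cannot see the page content or any redirect behind it, even if it has learned the URL from links elsewhere.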
(In reply to Richard Soderberg [:atoll] from comment #7)
> (In reply to Chris More [:cmore] from comment #4)
> > Do you want Google to just completely ignore bugzil.la URLs complete for
> > bugzilla.mozilla.org pages in the results page?
> 
> bugzilla.mozilla.org doesn't include canonical URLs on its pages (<link
> rel="canonical">):
> 
> https://support.google.com/webmasters/answer/139066?hl=en
> 
> Which means that Google thinks that https://bugzil.la/ is a distinct site
> from https://bugzilla.mozilla.org/.
> 
> If we add canonical URLs, then that solves the problem without any changes
> on the bugzil.la side, since Google will detect that the bugzil.la result is
> a duplicate for another canonical URL and purge it.

I think canonical URLs on bugzilla.mozilla.org has no effect to make it canonical itself. Otherwise any spam site can claim "I'm canonical." The canonical URLs must be added on bugzil.la.
By the way, did you read comment #1?
> * The response status should be 301 rather than 200.
(In reply to Masatoshi Kimura [:emk] from comment #8)
> I think canonical URLs on bugzilla.mozilla.org has no effect to make it
> canonical itself. Otherwise any spam site can claim "I'm canonical." The
> canonical URLs must be added on bugzil.la.

When you publish the canonical link tag at site A and site B, with the tag pointing to site A, Google removes site B from the results and stops penalizing you for duplication of content.

So any spam site can claim "I'm canonical", and they will promptly be delisted from the search index, because they're site B delisting themselves in favor of the canonical site A. This would be an extremely ineffective form of spam, but, yes, it's technically possible today for *any* site to do that.

Practically, this would result in all content served through BMO being treated as canonically from 'bugzilla.mozilla.org', regardless of shorteners or whatever.

(In reply to Masatoshi Kimura [:emk] from comment #9)
> By the way, did you read comment #1?
> > * The response status should be 301 rather than 200.

I did. I'm not authorized to alter bugzil.la to fix this, nor do I have history to confirm that this is accidental and can be corrected easily. I can only request that we fix this *somehow*, propose solutions, and let the BMO admins and the bugzil.la admin decide what to do.
from reading https://support.google.com/webmasters/answer/139066?hl=en it seems that changing bugzil.la to return 301 redirects instead of 200 should be sufficient.

reed, what do you think about making this change?
Flags: needinfo?(reed)
(In reply to Byron Jones ‹:glob› from comment #11)
> from reading https://support.google.com/webmasters/answer/139066?hl=en it
> seems that changing bugzil.la to return 301 redirects instead of 200 should
> be sufficient.
> 
> reed, what do you think about making this change?

No idea why it's sending 200 instead of 301. Looks set up correctly to me. I should just move this from Apache to nginx. Will do that either today or Sunday to see if that fixes it.
Flags: needinfo?(reed)
Moved bugzil.la from Apache to nginx, which let me fix a few other things as well.

$ curl -I https://bugzil.la
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Mon, 23 Mar 2015 08:47:50 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://bugzilla.mozilla.org/
Strict-Transport-Security: max-age=63113852; includeSubDomains; preload

Can folks confirm this solves the problem?
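For anyone who wants to script the same check, here is a small sketch that parses `curl -I` style output and confirms a permanent redirect to the expected target. The function names are illustrative, and `SAMPLE_HEAD` is a trimmed copy of the output above.

```python
# Trimmed copy of the header block shown above.
SAMPLE_HEAD = """\
HTTP/1.1 301 Moved Permanently
Server: nginx
Content-Type: text/html
Location: https://bugzilla.mozilla.org/
"""


def parse_head(raw):
    """Split a raw header block into (status code, lowercase-keyed headers)."""
    lines = [line for line in raw.splitlines() if line.strip()]
    status = int(lines[0].split()[1])  # "HTTP/1.1 301 Moved Permanently" -> 301
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values containing colons survive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_permanent_redirect_to(raw, target):
    """True if the response is a 301 or 308 whose Location equals target."""
    status, headers = parse_head(raw)
    return status in (301, 308) and headers.get("location") == target
```

`is_permanent_redirect_to(SAMPLE_HEAD, "https://bugzilla.mozilla.org/")` holds for the response above; the earlier 200 response would fail the status check.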
Assignee: chrismore.bugzilla → reed
Status: NEW → RESOLVED
Closed: 9 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
(In reply to Reed Loden [:reed] (use needinfo?) from comment #13)
> Moved bugzil.la from Apache to nginx, which let me fix a few other things as
> well.
> 
> $ curl -I https://bugzil.la
> HTTP/1.1 301 Moved Permanently
> Server: nginx
> Date: Mon, 23 Mar 2015 08:47:50 GMT
> Content-Type: text/html
> Content-Length: 178
> Connection: keep-alive
> Location: https://bugzilla.mozilla.org/
> Strict-Transport-Security: max-age=63113852; includeSubDomains; preload
> 
> Can folks confirm this solves the problem?

It will probably take a few days for the search index to update and eliminate 301 redirects from the results.
Verified "bugzilla.mozilla.org" is now the top search result for "bugzil.la", with the second result being a "denied by robots.txt" link that can't be purged while robots.txt is in place. Thank you!