Closed Bug 1924611 Opened 4 months ago Closed 4 months ago

Firefox does not accurately recognize language of various Spanish language websites.

Categories

(Firefox :: Translations, defect)

Firefox 131
defect

Tracking

RESOLVED DUPLICATE of bug 1859081

People

(Reporter: houselooknow, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0

Steps to reproduce:

Load http://www.cespe.gob.mx/public/ or https://www.cfe.mx/Pages/default.aspx with English selected as the preferred language for displaying pages.

Actual results:

The sites are not recognized as Spanish-language pages: there is no offer to translate and no translate icon in the address field. If "Translate page" is selected from the Firefox menu, it offers to translate from English for both pages.

The same thing occurs in both Linux and Windows.

Expected results:

The sites should be recognized as being in Spanish, with the appropriate translation options offered.

The Bugbug bot thinks this bug should belong to the 'Firefox::Translations' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Translations

Hey houselooknow, thanks for filing this!

It looks like both of these are examples where the developers of the websites have incorrectly labeled their page as being in English.

For http://www.cespe.gob.mx/public/ the markup contains:

<html lang="en" dir="ltr">

For https://www.cfe.mx/Pages/default.aspx the markup contains:

<html dir="ltr" lang="en-US">
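
As a rough illustration of how a declared language like the two above can be read out of a page's markup, here is a minimal Python sketch using the standard library's html.parser. The names HtmlLangParser and declared_language are hypothetical, and this is not a description of how Firefox reads the attribute internally.

```python
from html.parser import HTMLParser


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_language(markup):
    """Return the language declared on <html>, or None if there is none."""
    parser = HtmlLangParser()
    parser.feed(markup)
    return parser.lang


print(declared_language('<html lang="en" dir="ltr">'))     # en
print(declared_language('<html dir="ltr" lang="en-US">'))  # en-US
```

Both pages declare English, even though their visible content is in Spanish.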

Our current algorithm trusts the page's language tag as the true language of the page (since the developers put it there explicitly). However, there are many cases such as these where the tag does not match the page's actual content.

We discussed in Bug 1859081 (comment 3) a potential algorithmic change to consider running the language detector even when the page developers have specified a language.

However, even under the discussed algorithm, the specified page language and the detected language would likely disagree for pages like these, and we would continue to not offer translations.

The algorithm hasn't been implemented yet, and we may consider further improvements when we prioritize improving the language detection.
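
To make the comparison concrete, here is a minimal Python sketch of the two behaviors as described in this comment. It is an illustration under stated assumptions, not Firefox's implementation: the function names, the detect callback, and the fallback to the detector when no language is declared are all hypothetical.

```python
def base_language(tag):
    """Reduce a language tag such as "en-US" to its primary subtag "en"."""
    return tag.split("-")[0].lower()


def current_behavior(declared, detect, preferred):
    """Trust the declared language; assume the detector runs only when none is declared."""
    page_lang = base_language(declared) if declared else detect()
    offer = page_lang is not None and page_lang != base_language(preferred)
    return page_lang, offer


def discussed_behavior(declared, detect, preferred):
    """Always run the detector; make no offer when it disagrees with the declared language."""
    detected = detect()
    if declared and detected and base_language(declared) != detected:
        return None, False  # declared and detected disagree: no offer
    page_lang = base_language(declared) if declared else detected
    offer = page_lang is not None and page_lang != base_language(preferred)
    return page_lang, offer


# A Spanish page mislabeled as English, viewed by a user who prefers English.
# The lambda stands in for a real language detector (assumption).
print(current_behavior("en-US", lambda: "es", "en"))    # ('en', False)
print(discussed_behavior("en-US", lambda: "es", "en"))  # (None, False)
```

In both versions the mislabeled page gets no translation offer, which is why the discussed change alone would not resolve this report.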

For the moment, I am going to mark this bug as a duplicate of the one I have been referring to.

In the meantime, you may want to reach out to the site developers and ask them to correct or remove the lang attribute that specifies the page is in English.

If you have any further questions or comments, feel free to comment here or in Bug 1859081.

Status: UNCONFIRMED → RESOLVED
Closed: 4 months ago
Duplicate of bug: 1859081
Resolution: --- → DUPLICATE

(In reply to Erik Nordin [:nordzilla] from comment #2)

Unfortunately, developers in Mexico regularly mislabel their pages, and there are many more examples of this problem.

It would be helpful if the translate icon in the URL field could be turned on by default.

Thanks for the response.
