Bug 1907597 Comment 5 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I think it's worth considering. 

On one hand, it appears to users as though we do the wrong thing here (even though we are intentionally respecting the language tag). On the other hand, there are cases in which the specified language tags _do_ match the content, and in those cases not translating is the correct behavior.

I think the key goal here should be feature parity with other popular browsers, as well as maximizing the perception of "correctness."

In Bug 1859081, we audited Google's algorithm for when to offer Translations for a page as a whole, and came up with a list of changes to make that would improve feature parity on our end. 

I think it would be worth doing the same thing here, before we make a final decision. 

---

**Surface-Level Investigation**

I experimented with Google Chrome a bit (without looking at their code yet), and I can't exactly tell what they do. 

They certainly seem to have a "detected language" mode. I tested our [Select Translations test file](https://searchfox.org/mozilla-central/source/toolkit/components/translations/tests/browser/translations-tester-select.html), which correctly uses `lang` attributes and has content in 3 different languages, and Google Chrome seems to have a full-page "translate from detected language" mode that is capable of translating the entire page correctly to a single target language.

However, I also experimented with manually translating the `<select>` items in https://www.soeren-hentzschel.at/kontakt/ from German to Chinese (Simplified), then replacing them in the markup while retaining the `lang="en-US"` attribute. This resulted in Google hilariously translating everything incorrectly and, curiously, leaving one of the strings in Chinese (see attached screenshot). I assume it ran the Chinese strings in the context of `de -> en`, though if that's the case I'm surprised the end results are even coherent despite being incorrect translations.

---

**Conclusion**

It appears that Google Chrome is doing some fine-grained language detection, but is also not 100% correct when the `lang` attributes don't match the actual content. 

My stance right now is that we shouldn't make any changes to outright _ignore_ `lang` attributes in favor of fixing this single case, but we should consider cases in which we can run the language detector to try to achieve better feature parity with other popular browsers.
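
To make the "respect `lang`, but consider running the detector" idea more concrete, here is a minimal sketch of one possible policy. Everything in it is hypothetical: the `Detection` type, the `chooseSourceLanguage` function, and the `0.9` threshold are assumptions for illustration, not anything from our Translations code or from Chrome.

```typescript
// Hypothetical detector result; not an existing Firefox or Chrome type.
interface Detection {
  language: string;   // language tag reported by the detector, e.g. "zh"
  confidence: number; // assumed to be in the range 0..1
}

// Reduce a language tag to its primary subtag: "en-US" -> "en".
function primarySubtag(tag: string): string {
  return tag.trim().toLowerCase().split("-")[0];
}

// Decide which source language to use for an element's text.
// The declared `lang` attribute wins unless the detector disagrees
// with high confidence. The threshold is an assumed value that would
// need tuning against real pages.
function chooseSourceLanguage(
  declaredLang: string | null,
  detected: Detection | null,
  overrideThreshold = 0.9,
): string | null {
  const declared = declaredLang ? primarySubtag(declaredLang) : null;
  if (!detected) {
    return declared; // no detector result: fall back to the attribute
  }
  const detectedLang = primarySubtag(detected.language);
  if (!declared) {
    return detectedLang; // no usable attribute: detection is all we have
  }
  if (declared === detectedLang) {
    return declared; // attribute and detector agree
  }
  // Disagreement: only override the attribute when the detector is confident.
  return detected.confidence >= overrideThreshold ? detectedLang : declared;
}
```

With this policy, the experiment above (Chinese text under `lang="en-US"`) would be treated as Chinese only if the detector reported high confidence, and pages whose `lang` attributes match their content would behave exactly as they do today.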
