Firefox treats some html lang="en" pages as "other writing systems" instead of Roman alphabet [due to invalid lang tag on https://www.aclu.org/ ]
Categories
(Web Compatibility :: Site Reports, defect, P3)
Tracking
(Not tracked)
People
(Reporter: erwinm, Unassigned)
References
()
Details
(Keywords: webcompat:site-wait)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:77.0) Gecko/20100101 Firefox/77.0
Steps to reproduce:
I use Andika for Roman and Cyrillic alphabets, Skeirs for Other Writing Systems.
I visited https://www.aclu.org/news/free-speech/police-are-attacking-journalists-at-protests-were-suing/
Actual results:
It displayed in Skeirs.
Checking the page source, it's html lang="en" and also uses JavaScript.
Expected results:
It should display in Andika.
The alphabet switch is a bit awkward.
See also bug 1633627
Comment 1•5 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Comment 2•5 years ago
(In reply to MarjaE from comment #0)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:77.0) Gecko/20100101 Firefox/77.0
Steps to reproduce:
I use Andika for Roman and Cyrillic alphabets, Skeirs for Other Writing Systems.
I visited https://www.aclu.org/news/free-speech/police-are-attacking-journalists-at-protests-were-suing/
Actual results:
It displayed in Skeirs.
Checking the page source, it's html lang="en" and also uses JavaScript.
Looking at the page in the Inspector, I see that the <html> tag actually has an attribute lang="en en", which is invalid and therefore ignored.
(The original page that the server delivers does seem to have lang="en", so I presume something in its JavaScript subsequently updates this (along with lots of other changes -- e.g. I see that it adds a style attribute to the root element), and does so incorrectly. Indeed, the original page has <html lang="en" data-n-head="%7B%22lang%22:%7B%221%22:%22en%22%7D%7D">, where the data-n-head attribute would decode to {"lang":{"1":"en"}}, which looks very much like it could be getting used by a loading script to add the spurious extra en. I guess they should either remove the original lang attribute, if they're always relying on this being added by script, or make the script smarter so that it replaces the existing value instead of appending to it and creating an invalid tag.)
I see the same lang="en en" when inspecting the loaded page in Chrome, so it does look like this is a site bug rather than a Firefox bug.
Comment 3•5 years ago
I sent a message to an engineering leader via LinkedIn.
Comment 4•5 years ago
(I got a reply that the bug would be passed along to the right folks.)
I've tried reporting similar errors to webcompat, but they get closed as unable to reproduce, even though I am unable to avoid them.
Comment 6•4 years ago
The ACLU page where this was originally reported no longer seems to show the invalid lang tag problem as described in comment 2, so I guess that has been fixed. @MarjaE, if you're seeing similar problems again please indicate the specific page (or pages) involved.
Here's one: http://www.digitalattic.org/home/war/vegetius/
View Page Info shows content-language English.
Comment 8•4 years ago
The issue that MarjaE is talking about is
https://github.com/webcompat/web-bugs/issues/68809
Comment 9•4 years ago
(In reply to MarjaE from comment #7)
Here's one: http://www.digitalattic.org/home/war/vegetius/
View Page Info shows content-language English.
This page ends up with the Other Writing Systems font preference because it does not have a lang=en (or equivalent) attribute.
Here's the beginning of the document, from View Source:
<!-- HEADER -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<head>
<title>The Military Institutions of the Romans (De Re Militari)</title>
<meta http-equiv="content-Type" content="text/html;charset=utf-8">
<meta http-equiv="content-Language" content="English">
<meta name="author" content="Mads Brevik">
<meta name="Robots" content="all">
<meta name="description" content="'De Re Militari' by Flavius Vegetius Renatus">
<meta name="keywords" content="flavius vegetius renatus, de re militari">
<link href="/favicon.ico" rel="shortcut icon">
<link href="/home/_include/attic.css" type="text/css" rel="stylesheet">
</head>
<!-- BODY -->
<body>
<div class="Body">
<div id="Mainmenu">
<div class="MainTitle">
<a href="/" style="color: black">Digital Attic</a>
</div>
<div class="TopMenuLink">
<a href="/home/read/asoiaf/">A Song of Ice and Fire</a> : <a href="/home/war/">Warfare</a>
</div>
</div>
Note the absence of any lang attribute. What Page Info shows (content-Language: English) is not a language tag (lang attribute) but a meta tag, which (despite its name) is not the correct way to declare the language of the document itself; see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Language; its purpose is slightly different.
In addition, even if this were the relevant place to declare the document language, it wouldn't work as intended because it literally says "English" as the value (which Page Info dutifully shows); but the value is supposed to be a formally-defined language tag such as "en-US" that can be parsed in a well-defined way for processing. The string "English" is not a defined language tag and so would be ignored anyway.
So to sum up: the page does not declare its document language because of two authoring errors: trying to use an arbitrary language name rather than a well-formed language tag, and putting it in the http-equiv="content-Language" meta tag rather than as an HTML lang attribute. As a result, Firefox ends up using the Other Writing Systems font preference to resolve sans-serif for this content.
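The distinction between the two declarations can be checked mechanically. The following is a rough Python sketch using the standard library's html.parser; the class name LangDeclarations and the function find_declarations are hypothetical, and this is not how Firefox itself determines the document language.

```python
from html.parser import HTMLParser


class LangDeclarations(HTMLParser):
    """Collect the <html lang> attribute and any content-language <meta> value."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.meta_content_language = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        attr = dict(attrs)
        if tag == "html" and self.html_lang is None:
            self.html_lang = attr.get("lang")
        elif tag == "meta" and (attr.get("http-equiv") or "").lower() == "content-language":
            self.meta_content_language = attr.get("content")


def find_declarations(source: str):
    """Return (html_lang, meta_content_language) found in the given HTML source."""
    parser = LangDeclarations()
    parser.feed(source)
    parser.close()
    return parser.html_lang, parser.meta_content_language


if __name__ == "__main__":
    sample = (
        "<head>"
        '<meta http-equiv="content-Language" content="English">'
        "</head><body></body>"
    )
    print(find_declarations(sample))
```

Run against a fragment shaped like the source quoted above, the first element of the result is None (no lang attribute anywhere) and the second is the literal string "English" from the meta tag.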
Reporter | ||
Comment 10•3 years ago
Also occurs on Project Gutenberg:
https://www.gutenberg.org/ebooks/search/?query=test+search&submit_search=Go%21
Where line 11 of the page source reads:
--><html lang="en_US">
Comment 11•3 years ago
(In reply to MarjaE from comment #10)
Also occurs on Project Gutenberg:
https://www.gutenberg.org/ebooks/search/?query=test+search&submit_search=Go%21
Where line 11 of the page source reads:
--><html lang="en_US">
In this case, the issue is that "en_US" is not a well-formed language tag; it should be "en-US".
See for example https://datatracker.ietf.org/doc/html/rfc5646#section-2.
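As a rough illustration of why "en_US" fails while "en-US" passes, here is a Python sketch of a deliberately simplified shape check. It is an assumption-laden subset of RFC 5646, not the full grammar: it accepts only a 2- or 3-letter primary language subtag, an optional 4-letter script, and an optional region (2 letters or 3 digits), and it ignores variants, extensions, private-use and grandfathered tags. The function name looks_like_simple_language_tag is hypothetical.

```python
import re

# Simplified subset of the RFC 5646 syntax (see the caveats above):
#   language  = 2-3 letters
#   script    = optional, 4 letters
#   region    = optional, 2 letters or 3 digits
# Subtags are separated by hyphens only; underscores and spaces do not match.
SIMPLE_TAG = re.compile(
    r"[A-Za-z]{2,3}"
    r"(?:-[A-Za-z]{4})?"
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"
)


def looks_like_simple_language_tag(value: str) -> bool:
    """Return True if value matches the simplified language-tag shape."""
    return SIMPLE_TAG.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["en", "en-US", "en_US", "en en", "English"]:
        print(candidate, looks_like_simple_language_tag(candidate))
```

Under this simplified check, the underscore form from Project Gutenberg, the doubled "en en" from comment 2, and the plain name "English" from comment 9 are all rejected, while "en" and "en-US" are accepted.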
Comment 12•3 years ago
Although comment 10 is really a website error, it appears that both Blink and WebKit recognize such "broken" lang tags for the purpose of font selection. So I've filed bug 1757578 to propose making Firefox do the same, in the interests of compatibility.
Comment 13•3 years ago
It seems that the HTML tag now has a correctly formed attribute, "en-US", so I am not able to reproduce the issue.
Marja, is the issue reproducible on your side?
Tested with:
Browser / Version: Firefox Release 102.0 (64-bit)/ Firefox Nightly 104.0a1 (2022-06-28) (64-bit)
Operating System: Mac OSX Catalina 10.15.7
Comment 15•3 years ago
Thanks for the update. I will be closing this issue.