pages are sometimes interpreted as windows-1252 instead of UTF-8 until a reload is done
Categories: Core :: DOM: Core & HTML, defect
People: Reporter: vincent-moz, Unassigned
Attachments: 4 files
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160607223741
Steps to reproduce:
1. Start firefox -safe-mode -no-remote
2. Create a fresh profile
3. Open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249
4. From it, open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528
Actual results:
For both pages, the text encoding is windows-1252, so that accented characters are incorrect.
Expected results:
The correct encoding is UTF-8, which is obtained when I do Ctrl-Shift-R to force a reload.
This may be a cache issue, because I did the following with my main Firefox profile: When I reloaded https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528 directly, it was in UTF-8 (confirmed with Live HTTP Headers). Then I opened this URL via the link on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=790825#84 and Live HTTP Headers showed nothing (so I assume that it came from the cache), but the accented characters are now incorrect and "View Page Info" says windows-1252 for the text encoding.
Comment 1 (Reporter) • 8 years ago
I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).
Comment 2 (Reporter) • 8 years ago
(In reply to Vincent Lefevre from comment #1)
> I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).
Actually I could reproduce it with Nightly.
Comment 3 (Alex)
When I load the page initially (or via shift+reload) it's served as UTF-8 by the server, but when I reload, the server instead claims it's "ISO-8859-1".
Comment 4 (Reporter) • 8 years ago
More precisely, by using Web Developer → Network:
1. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 and get a page with incorrect accented characters and in the response headers: Content-Type: "text/html; charset=ISO-8859-1"
2. I do Ctrl-Shift-R to force a reload, and the page is now correct. In the response headers: Content-Type: "text/html; charset=utf-8"
3. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 again and I get the page from the cache, which is correct (contrary to what happened with Firefox 45.2.0).
So, there are two problems:
1. For the initial URL open, I get charset=ISO-8859-1, which is incorrect. This is specific to Firefox: no such problem with wget, lynx, w3m and Opera. This happens with both Firefox 45.2.0 and Nightly. This problem disappears with Ctrl-Shift-R.
2. When the page is obtained from the cache after a Ctrl-Shift-R, the charset is incorrect with Firefox 45.2.0, but I couldn't reproduce this problem with Nightly.
Comment 5 (Reporter) • 8 years ago
(In reply to Alex from comment #3)
> When I load the page initially (or via shift+reload) it's served as UTF-8 by
> the server, but when I reload the server instead claims it's "ISO-8859-1"
Yes, I confirm that a simple Ctrl-R in Firefox gives ISO-8859-1 (but with lynx, a Ctrl-R still gives UTF-8).
Comment 6 (Reporter) • 8 years ago
I now see the same problem with w3m when restarting it with the same URL. So, problem (1) is a server issue, while problem (2) seems to be an issue with Firefox 45.2.0.
Comment 7 (Alex)
Chrome, Safari and Firefox all behave the same for me (and none of them seem to use the cache; the server reports 200 with the same ETag instead of a 304).
Does lynx do re-validation? That's when the server returns the incorrect headers for me.
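The revalidation theory can be checked outside a browser by comparing an unconditional request with a conditional one. The following is only a sketch: the helper names `charset_of`, `fetch` and `compare` are made up for illustration, the request headers only approximate what a forced or plain reload sends, and it assumes the server returns an ETag that can be echoed back in If-None-Match.

```python
import urllib.error
import urllib.request
from email.message import Message


def charset_of(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def fetch(url, extra_headers=None):
    """Return (status, etag, charset) for one GET request."""
    req = urllib.request.Request(url, headers=extra_headers or {})
    try:
        with urllib.request.urlopen(req) as resp:
            headers, status = resp.headers, resp.status
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified (and other non-2xx codes) as an exception.
        headers, status = err.headers, err.code
    return status, headers.get("ETag"), charset_of(headers.get("Content-Type"))


def compare(url):
    """Fetch once unconditionally, then revalidate with the returned ETag."""
    first = fetch(url, {"Cache-Control": "no-cache"})  # roughly a forced reload
    etag = first[1]
    conditional = {"If-None-Match": etag} if etag else {}
    second = fetch(url, conditional)  # roughly a plain reload
    return first, second
```

Calling `compare("https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249")` returns the two (status, etag, charset) tuples; a different charset in the second tuple would point at the server rather than the browser.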
Comment 8 (Reporter) • 8 years ago
(In reply to Alex from comment #7)
> Chrome, Safari and Firefox all behave the same for me (And none of them seem
> to use the cache, the server reports 200 with the same etag instead of a 304)
Firefox uses the cache when one opens the URL very shortly after a (forced) reload.
> Does lynx do re-validation? That's when the server returns the incorrect
> headers for me.
Like w3m, same problem with lynx when I restart it on the same URL. After a Ctrl-R, I get the correct charset. I don't know what Ctrl-R does exactly; it is just documented as "Reload current file and refresh the screen". Perhaps it's like Ctrl-Shift-R in Firefox, which would explain the behavior.
Comment 9 (Reporter) • 8 years ago
It seems that I can no longer reproduce problem (2).
Comment 10 • 8 years ago
Thanks for the report and details. If you can reproduce, please re-open this bug.
Updated (Assignee) • 5 years ago
Comment 11 • 2 years ago
Looks like this problem has reappeared in 100.0.1 mint001 on Linux Mint, 64 Bit. Also seen on Windows 7, 64 Bit.
Visit http://biblio.aktionsgruppe.de/obiblio/opac/index.php and search for Marek. The site is an instance of Obiblio library software.
The pages that it generates are clearly marked <META http-equiv="content-type" content="text/html; charset=iso-8859-1"> which Firefox also reports when accessing Ctrl-I. There Firefox states
content-type text/html; charset=iso-8859-1
but above that, it reads
Text encoding Windows1252
and all accented characters are wrong. They are correct in the database (checked with phpMyAdmin), the web page is correctly marked, but still...
Possibly linked to this behaviour is an odd problem I have seen for a couple of weeks on various systems, Obiblio among them, but also instances of Tiki Wiki. These are sites that worked flawlessly for years and did NOT get an update of the PHP software or a change of collation in the database.
When a user fills in a text entry field that the PHP program will use to perform a search: is any collation information sent from Firefox to the website?
When you enter text to be searched, the (PHP) programs now throw an error "illegal mix of collations". It looks like, for text entry fields, Firefox does not pass the page's own encoding as the collation (which would be latin1_german2_ci for iso8859) but utf8_general_ci, if accented characters are present in the user input. It does not happen when there are no accented characters in the user input.
Thanks
hman
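On the collation question: as far as the HTML form submission rules go, a browser sends only the bytes of the field value, percent-encoded in the page's encoding unless the form specifies accept-charset; no collation name is part of the request. A minimal sketch of how the same search term looks on the wire under the two encodings (the helper name `encoded_query` is made up for illustration):

```python
from urllib.parse import quote


def encoded_query(term, page_encoding):
    """Percent-encode a search term using the given page encoding."""
    return quote(term, encoding=page_encoding)
```

With `"Märek"`, the iso-8859-1 form contains `%E4` and the UTF-8 form contains `%C3%A4`. If the collation error only appears with accented input, one possible explanation (an assumption, not confirmed in this thread) is a mismatch on the server side between the connection character set and the column collation.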
Comment 12 (Reporter) • 2 years ago
The HTTP response headers contain:
Content-Type: text/html; charset=OBIB_CHARSET
OBIB_CHARSET is not a valid charset. So the issue would be the configuration of the web server, not Firefox.
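A quick way to spot such a header is to extract the charset parameter and check whether it names a known encoding. This is a rough sketch: the helper names are made up, and Python's codec registry is used as a stand-in for the browser's list of encoding labels, which is not identical.

```python
import codecs
from email.message import Message


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def is_known_charset(label):
    """Return True if Python's codec registry recognises the label."""
    if not label:
        return False
    try:
        codecs.lookup(label)
    except LookupError:
        return False
    return True
```

For `"text/html; charset=OBIB_CHARSET"` the declared charset is `obib_charset`, which `is_known_charset` rejects, while `iso-8859-1` and `utf-8` are accepted.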
Comment 13 • 2 years ago
In HTML the meta statement is <META http-equiv="content-type" content="text/html; charset=iso-8859-1">
Comment 14 • 2 years ago
And Firefox detects ISO 8859-1, but does not use it, see screenshot. Hm, cannot attach a screenshot here?
Comment 15 • 2 years ago
Codepage detection info page.
Comment 16 • 2 years ago
Comment 17 • 2 years ago
Uh, yes, despite the META, network analysis showed indeed Content-Type: text/html; charset=OBIB_CHARSET. Thanks for pointing me to that.
And I got lucky. I did not write Open Biblio software, but this was the result of a bug that was quickly found and corrected, thank you.
Now the correct Content-Type text/html; charset=iso-8859-1 is written in the HTTP response header.
But - Firefox still doesn't do it right. Firefox still correctly recognizes iso-8859-1, but still uses windows-1252 and produces dysfunctional diacritics...
Comment 18 (Reporter) • 2 years ago
The use of windows-1252 while the page is declared as iso-8859-1 is related to this bug. This is bug 897302 / bug 890478.
Comment 19 (Reporter) • 2 years ago
I meant is unrelated to this bug.
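For background on why Page Info can show windows-1252 for a page declared as iso-8859-1: the WHATWG Encoding Standard, which browsers follow, treats the label iso-8859-1 as an alias of windows-1252. The two encodings agree on every byte except 0x80 to 0x9F, so accented Latin letters decode the same either way. A small check using Python's codecs (named latin-1 and cp1252 there; Python leaves a few cp1252 bytes undefined, which this sketch counts as differing; the function name is made up):

```python
def differing_bytes():
    """Return the byte values that decode differently in latin-1 and cp1252."""
    diff = []
    for value in range(256):
        raw = bytes([value])
        latin = raw.decode("latin-1")
        try:
            windows = raw.decode("cp1252")
        except UnicodeDecodeError:
            windows = None  # byte is undefined in Python's cp1252 codec
        if latin != windows:
            diff.append(value)
    return diff
```

The returned list covers only 0x80 to 0x9F, so the windows-1252 label by itself does not explain broken umlauts or accents.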
Comment 20 • 2 years ago
That bug was closed as invalid. Declaring windows1252 can never be correct on a Linux machine, because that code page does not exist. Also, the actual text rendering is obviously wrong and NOT iso8859-1, just look at it.
Comment 21 • 2 years ago
Table is set to biblio_status_dm InnoDB 10 Dynamic 9 1820 16384 0 0 0 NULL 2022-05-29 11:57:30 NULL NULL latin1_german2_ci NULL row_format=DYNAMIC
latin1_german2_ci = iso-8859-1 character set. And this is the rendered result:
Comment 22 • 2 years ago
Render result.
Comment 23 • 2 years ago
Text content in the table .
Comment 24 (Reporter) • 2 years ago
(In reply to Oliver Kluge from comment #22)
Bildschirmfoto vom 2022-06-11 02-22-06.png
This looks incorrect because the characters are encoded in UTF-8 while the server declares iso-8859-1. You need to fix the configuration of the server so that it declares utf-8 (at the same time, this will avoid the windows-1252 nonsense).
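The effect can be reproduced in a few lines. This is an illustrative sketch with made-up helper names; it uses Python's cp1252 codec to stand in for the browser's windows-1252 decoder.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="cp1252"):
    """Undo misdecode() by reversing the two steps."""
    return garbled.encode(wrong_encoding).decode("utf-8")
```

`misdecode("Müller")` gives `"MÃ¼ller"`: each accented letter becomes two characters, which is the typical sign of UTF-8 data served under an iso-8859-1 or windows-1252 label.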