Closed Bug 1279982 Opened 9 years ago Closed 9 years ago

pages are sometimes interpreted as windows-1252 instead of UTF-8 until a reload is done

Status:

RESOLVED WORKSFORME

People

(Reporter: vincent-moz, Unassigned)

Details

Attachments

(4 files)

Bildschirmfoto vom 2022-06-10 23-41-45.png 3 years ago Oliver Kluge 65.72 KB, image/png		Details
Bildschirmfoto vom 2022-06-10 23-41-45.png 3 years ago Oliver Kluge 65.72 KB, image/png		Details
Bildschirmfoto vom 2022-06-11 02-22-06.png 3 years ago Oliver Kluge 11.45 KB, image/png		Details
Bildschirmfoto vom 2022-06-11 02-26-16.png 3 years ago Oliver Kluge 11.11 KB, image/png		Details

Vincent Lefevre

Reporter

Description

9 years ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160607223741

Steps to reproduce:

1. Start firefox -safe-mode -no-remote
2. Create a fresh profile
3. Open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249
4. From it, open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528

Actual results:

For both pages, the text encoding is windows-1252, so that accented characters are incorrect.

Expected results:

The correct encoding is UTF-8, which is obtained when I do Ctrl-Shift-R to force a reload.

This may be a cache issue, because I did the following with my main Firefox profile: When I reloaded https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528 directly, it was in UTF-8 (confirmed with Live HTTP Headers). Then I opened this URL via the link on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=790825#84 and Live HTTP Headers showed nothing (so I assume that it came from the cache), but the accented characters are now incorrect and "View Page Info" says windows-1252 for the text encoding.

Vincent Lefevre

Reporter

Comment 1

9 years ago

I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).

Vincent Lefevre

Reporter

Comment 2

9 years ago

(In reply to Vincent Lefevre from comment #1)
> I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).

Actually I could reproduce it with Nightly.

Alex

Comment 3

9 years ago

When I load the page initially (or via shift+reload), it's served as UTF-8 by the server, but when I reload, the server instead claims it's "ISO-8859-1".

Vincent Lefevre

Reporter

Comment 4

9 years ago

More precisely, by using Web Developer → Network:

1. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 and get a page with incorrect accented characters and in the response headers: Content-Type: "text/html; charset=ISO-8859-1"
2. I do Ctrl-Shift-R to force a reload, and the page is now correct. In the response headers: Content-Type: "text/html; charset=utf-8"
3. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 again and I get the page from the cache, which is correct (contrary to what happened with Firefox 45.2.0).

So, there are two problems:

1. For the initial URL open, I get charset=ISO-8859-1, which is incorrect. This is specific to Firefox: no such problem with wget, lynx, w3m and Opera. This happens with both Firefox 45.2.0 and Nightly. This problem disappears with Ctrl-Shift-R.
2. When the page is obtained from the cache after a Ctrl-Shift-R, the charset is incorrect with Firefox 45.2.0, but I couldn't reproduce this problem with Nightly.
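
The header comparison in the steps above can also be done outside the browser. The following is a minimal Python sketch, not something used in this report: `charset_of` and `fetch_charset` are hypothetical helper names, and treating `Cache-Control: no-cache` plus `Pragma: no-cache` as a rough equivalent of a forced reload (Ctrl-Shift-R) is an assumption.

```python
import urllib.request
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def fetch_charset(url: str, force: bool = False) -> Optional[str]:
    """Fetch url and return the charset declared by the server (network I/O).

    With force=True, no-cache request headers are added. Treating that as an
    approximation of a forced reload is an assumption of this sketch.
    """
    headers = {"Cache-Control": "no-cache", "Pragma": "no-cache"} if force else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        # resp.headers is an email.message.Message subclass.
        return resp.headers.get_content_charset()


# The two header values quoted in the steps above:
print(charset_of("text/html; charset=ISO-8859-1"))
print(charset_of("text/html; charset=utf-8"))
```

Calling `fetch_charset(url)` and `fetch_charset(url, force=True)` on the same URL would show whether the declared charset depends on the request headers, which is what the two response headers above suggest.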

Vincent Lefevre

Reporter

Comment 5

9 years ago

(In reply to Alex from comment #3)
> When I load the page initially (or via shift+reload) it's served as UTF-8 by
> the server, but when I reload the server instead claims it's "ISO-8859-1"

Yes, I confirm that a simple Ctrl-R in Firefox gives ISO-8859-1 (but with lynx, a Ctrl-R still gives UTF-8).

Vincent Lefevre

Reporter

Comment 6

9 years ago

I now see the same problem with w3m when restarting w3m with the same URL. So, problem (1) is a server issue, while problem (2) seems to be an issue with Firefox 45.2.0.

Alex

Comment 7

9 years ago

Chrome, Safari and Firefox all behave the same for me (and none of them seem to use the cache: the server reports 200 with the same etag instead of a 304).

Does lynx do re-validation? That's when the server returns the incorrect headers for me.
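
The re-validation case mentioned above can be sketched as a conditional request. This is a hedged Python example, not something taken from this report: `build_revalidation_request` and `revalidate` are hypothetical helpers, and the ETag value has to come from an earlier response.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_revalidation_request(url: str, etag: str) -> urllib.request.Request:
    """Build a conditional GET that asks the server to revalidate a cached copy."""
    return urllib.request.Request(url, headers={"If-None-Match": etag})


def revalidate(url: str, etag: str) -> Tuple[int, Optional[str]]:
    """Send the conditional GET; return (status, Content-Type). Network I/O."""
    req = build_revalidation_request(url, etag)
    try:
        with urllib.request.urlopen(req) as resp:
            # A 200 here means the server sent a full response despite the ETag.
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return 304, None
        raise
```

A 200 status together with an unchanged ETag and a different `charset` in the returned Content-Type would match the behaviour described in this comment.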

Vincent Lefevre

Reporter

Comment 8

9 years ago

(In reply to Alex from comment #7)
> Chrome, Safari and Firefox all behave the same for me (And none of them seem
> to use the cache, the server reports 200 with the same etag instead of a 304)

Firefox uses the cache when one opens the URL very shortly after a (forced) reload.

> Does lynx do re-validation? That's when the server returns the incorrect
> headers for me.

Like w3m, same problem with lynx when I restart it on the same URL. After a Ctrl-R, I get the correct charset. I don't know what Ctrl-R does exactly; it is just documented as "Reload current file and refresh the screen". Perhaps it's like Ctrl-Shift-R in Firefox, which would explain the behavior.

Loic

Updated

9 years ago

Component: Untriaged → DOM

Product: Firefox → Core

Vincent Lefevre

Reporter

Comment 9

9 years ago

It seems that I can no longer reproduce problem (2).

Andrew Overholt [:overholt]

Comment 10

9 years ago

Thanks for the report and details. If you can reproduce, please re-open this bug.

Status: UNCONFIRMED → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Assignee

Updated

6 years ago

Component: DOM → DOM: Core & HTML

Oliver Kluge

Comment 11

3 years ago

Looks like this problem has reappeared in 100.0.1 mint001 on Linux Mint, 64 Bit. Also seen on Windows 7, 64 Bit.

Visit http://biblio.aktionsgruppe.de/obiblio/opac/index.php and search for Marek. The site is an instance of Obiblio library software.

The pages that it generates are clearly marked <META http-equiv="content-type" content="text/html; charset=iso-8859-1"> which Firefox also reports when accessing Ctrl-I. There Firefox states

content-type text/html; charset=iso-8859-1
but above that, it reads

Text encoding Windows1252
and all accented characters are wrong. They are correct in the database (checked with phpMyAdmin), the web page is correctly marked, but still...

Possibly linked to this behaviour is an odd problem that I have been seeing for a couple of weeks on various systems, Obiblio among them, but also instances of Tiki Wiki. These sites worked flawlessly for years and did NOT get a software update of the PHP software or a change of collation in the database.

When a user fills in a text entry field that the PHP program will use to perform a search: is any collation information sent from Firefox to the website?

When you enter text to be searched, the (PHP) programs now throw an error "illegal mix of collations". It looks as if, in text entry fields, Firefox does not pass the page's own encoding as collation (which would be latin1_german2_ci for iso8859) but utf8_general_ci when accented characters are present in the user input. It does not happen when there are no accented characters in the user input.

Thanks
hman

Vincent Lefevre

Reporter

Comment 12

3 years ago

The HTTP response headers contain:

Content-Type: text/html; charset=OBIB_CHARSET

OBIB_CHARSET is not a valid charset. So the issue would be the configuration of the web server, not Firefox.
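
A quick way to sanity-check a charset label is to ask a codec registry whether it knows the name. The sketch below uses Python's `codecs.lookup`; `is_known_charset` is a hypothetical helper, and Python's registry is not the same list of labels that browsers use, so this is only an approximation.

```python
import codecs


def is_known_charset(label: str) -> bool:
    """Return True if Python's codec registry recognises the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


for label in ("OBIB_CHARSET", "iso-8859-1", "utf-8"):
    print(label, is_known_charset(label))
```

An unrecognised label in the HTTP header leaves the client to determine the encoding by other means, which is consistent with the configuration being the place to fix this.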

Oliver Kluge

Comment 13

3 years ago

In HTML the meta statement is <META http-equiv="content-type" content="text/html; charset=iso-8859-1">

Oliver Kluge

Comment 14

3 years ago

And Firefox detects ISO 8859-1, but does not use it, see screenshot. Hm, cannot attach a screenshot here?

Oliver Kluge

Comment 15

3 years ago

Attached image Bildschirmfoto vom 2022-06-10 23-41-45.png — Details

Codepage detection info page.

Oliver Kluge

Comment 16

3 years ago

Attached image Bildschirmfoto vom 2022-06-10 23-41-45.png — Details

Oliver Kluge

Comment 17

3 years ago

Uh, yes, despite the META, network analysis showed indeed Content-Type: text/html; charset=OBIB_CHARSET. Thanks for pointing me to that.

And I got lucky. I did not write Open Biblio software, but this was the result of a bug that was quickly found and corrected, thank you.

Now the correct Content-Type text/html; charset=iso-8859-1 is written in the HTTP response header.

But - Firefox still doesn't do it right. Firefox still correctly recognizes iso-8859-1, but still uses windows-1252 and produces dysfunctional diacritics...

Vincent Lefevre

Reporter

Comment 18

3 years ago

Concerning windows-1252 while the page is declared as iso-8859-1 is related to this bug. This is bug 897302 / bug 890478.

Vincent Lefevre

Reporter

Comment 19

3 years ago

I meant "is unrelated to this bug".
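
As background to the windows-1252 label discussed here: to my understanding, the WHATWG Encoding Standard that browsers follow lists iso-8859-1 as a label for windows-1252, and the two encodings differ only in the byte range 0x80 to 0x9F. The sketch below checks that range with Python's codecs. `differing_bytes` is a hypothetical helper, and because Python's cp1252 codec leaves five bytes undefined, those are decoded with a replacement character here.

```python
from typing import List


def differing_bytes() -> List[int]:
    """Return the byte values that decode differently in latin-1 and cp1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        as_latin1 = raw.decode("latin-1")
        # errors="replace" covers the few bytes Python's cp1252 leaves undefined.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        if as_latin1 != as_cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} differing bytes, from {min(diffs):#04x} to {max(diffs):#04x}")
```

Accented Latin letters sit at 0xC0 and above, where the two encodings agree, so the windows-1252 label alone would not be expected to corrupt them.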

Oliver Kluge

Comment 20

3 years ago

That bug was closed as invalid. Declaring windows1252 can never be correct on a Linux machine, because that code page does not exist. Also, the actual text rendering is obviously wrong and NOT iso8859-1, just look at it.

Oliver Kluge

Comment 21

3 years ago

Table is set to

biblio_status_dm InnoDB 10 Dynamic 9 1820 16384 0 0 0 NULL 2022-05-29 11:57:30 NULL NULL latin1_german2_ci NULL row_format=DYNAMIC

latin1_german2_ci = iso-8859-1 character set. And this is the rendered result:

Oliver Kluge

Comment 22

3 years ago

Attached image Bildschirmfoto vom 2022-06-11 02-22-06.png — Details

Render result.

Oliver Kluge

Comment 23

3 years ago

Attached image Bildschirmfoto vom 2022-06-11 02-26-16.png — Details

Text content in the table.

Vincent Lefevre

Reporter

Comment 24

3 years ago

(In reply to Oliver Kluge from comment #22)

Bildschirmfoto vom 2022-06-11 02-22-06.png

This looks incorrect because the characters are encoded in UTF-8 while the server declares iso-8859-1. You need to fix the configuration of the server so that it declares utf-8 (at the same time, this will avoid the windows-1252 nonsense).
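
The diagnosis above (UTF-8 bytes served under an iso-8859-1 declaration) can be illustrated with a small Python sketch. The sample string and the helper names `mojibake` and `repair` are made up for illustration, and cp1252 stands in for how a browser treats the iso-8859-1 label.

```python
def mojibake(text: str, wrong: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    # Note: a few UTF-8 byte values are undefined in Python's cp1252 codec,
    # so some inputs raise UnicodeDecodeError instead of producing garbage.
    return text.encode("utf-8").decode(wrong)


def repair(garbled: str, wrong: str = "cp1252") -> str:
    """Undo mojibake() by reversing the two steps."""
    return garbled.encode(wrong).decode("utf-8")


sample = "Müller"
garbled = mojibake(sample)
print(garbled)          # two characters appear where "ü" should be
print(repair(garbled))  # round-trips back to the original
```

Each accented character turning into two characters is the typical sign that the bytes are UTF-8 while the declared charset is a single-byte one.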
