pages are sometimes interpreted as windows-1252 instead of UTF-8 until a reload is done

RESOLVED WORKSFORME

Status

Core
DOM
RESOLVED WORKSFORME
2 years ago
2 years ago

People

(Reporter: Vincent Lefevre, Unassigned)

Tracking

45 Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160607223741

Steps to reproduce:

1. Start firefox -safe-mode -no-remote
2. Create a fresh profile
3. Open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249
4. From it, open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528


Actual results:

For both pages, the text encoding is windows-1252, so that accented characters are incorrect.
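This kind of garbling is what results when UTF-8 bytes are decoded as windows-1252. A minimal Python sketch follows; the sample string and the helper name `misdecode` are hypothetical and are not taken from the affected pages:

```python
# Illustration only: decode UTF-8 bytes with the wrong encoding.
def misdecode(text: str, wrong_encoding: str = "windows-1252") -> str:
    """Encode text as UTF-8, then decode the bytes as wrong_encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

sample = "Lefèvre"  # hypothetical sample with one accented character
garbled = misdecode(sample)
print(garbled)  # LefÃ¨vre

# The damage is reversible as long as every byte maps to a character.
restored = garbled.encode("windows-1252").decode("utf-8")
print(restored == sample)  # True
```

Each accented character occupies two bytes in UTF-8, so it shows up as two unrelated characters when read as windows-1252.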


Expected results:

The correct encoding is UTF-8, which is obtained when I do Ctrl-Shift-R to force a reload.

This may be a cache issue, because I did the following with my main Firefox profile:

When I reloaded

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807528

directly, it was in UTF-8 (confirmed with Live HTTP Headers). Then I opened this URL via the link on:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=790825#84

and Live HTTP Headers showed nothing (so I assume that it came from the cache), but the accented characters are now incorrect and "View Page Info" says windows-1252 for the text encoding.
(Reporter)

Comment 1

2 years ago
I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).
(Reporter)

Comment 2

2 years ago
(In reply to Vincent Lefevre from comment #1)
> I can't reproduce this problem with the current Nightly: 50.0a1 (2016-06-13).

Actually I could reproduce it with Nightly.

Comment 3

2 years ago
When I load the page initially (or via shift+reload), it's served as UTF-8 by the server, but when I reload, the server instead claims it's "ISO-8859-1".
(Reporter)

Comment 4

2 years ago
More precisely, by using Web Developer → Network:

1. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 and get a page with incorrect accented characters and in the response headers:

    Content-Type: "text/html; charset=ISO-8859-1"

2. I do Ctrl-Shift-R to force a reload, and the page is now correct. In the response headers:

    Content-Type: "text/html; charset=utf-8"

3. I open https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827249 again and I get the page from the cache, which is correct (contrary to what happened with Firefox 45.2.0).

So, there are two problems:

1. For the initial URL open, I get charset=ISO-8859-1, which is incorrect. This is specific to Firefox: no such problem with wget, lynx, w3m and Opera. This happens with both Firefox 45.2.0 and Nightly. This problem disappears with Ctrl-Shift-R.

2. When the page is obtained from the cache after a Ctrl-Shift-R, the charset is incorrect with Firefox 45.2.0, but I couldn't reproduce this problem with Nightly.
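The two header values observed above can be compared programmatically. The sketch below uses Python's standard `email.message.Message` to extract the charset parameter; the helper name `charset_of` is hypothetical. Browsers that follow the WHATWG Encoding Standard treat the label ISO-8859-1 as windows-1252, which would explain why Page Info reports windows-1252 when the header says ISO-8859-1.

```python
from email.message import Message
from typing import Optional

def charset_of(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Header values as reported in steps 1 and 2 above.
first_load = charset_of("text/html; charset=ISO-8859-1")
forced_reload = charset_of("text/html; charset=utf-8")

print(first_load)                   # iso-8859-1
print(forced_reload)                # utf-8
print(first_load == forced_reload)  # False
```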
(Reporter)

Comment 5

2 years ago
(In reply to Alex from comment #3)
> When I load the page initially (or via shift+reload) it's served as UTF-8 by
> the server, but when I reload the server instead claims it's "ISO-8859-1"

Yes, I confirm that a simple Ctrl-R in Firefox gives ISO-8859-1 (but with lynx, a Ctrl-R still gives UTF-8).
(Reporter)

Comment 6

2 years ago
I now see the same problem with w3m when restarting it with the same URL. So problem (1) is a server issue, while problem (2) seems to be an issue with Firefox 45.2.0.

Comment 7

2 years ago
Chrome, Safari and Firefox all behave the same for me (and none of them seem to use the cache: the server reports 200 with the same ETag instead of a 304).

Does lynx do re-validation? That's when the server returns the incorrect headers for me.
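One way to test the re-validation hypothesis outside a browser is to send the same URL with two sets of request headers and compare the returned Content-Type. The Python sketch below is only an approximation: the header sets are assumptions about what a normal reload and a forced reload typically send, not captured Firefox traffic, and the helper names `reload_headers` and `fetch_content_type` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

def reload_headers(forced: bool, etag: Optional[str] = None) -> Dict[str, str]:
    """Approximate the request headers of a forced or a normal reload."""
    if forced:
        # Assumption: a forced reload bypasses the cache entirely.
        return {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    # Assumption: a normal reload re-validates the cached copy.
    headers = {"Cache-Control": "max-age=0"}
    if etag:
        headers["If-None-Match"] = etag
    return headers

def fetch_content_type(url: str, headers: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status code, Content-Type) for url requested with headers."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses, including 304 Not Modified.
        return err.code, err.headers.get("Content-Type")
```

Calling `fetch_content_type(url, reload_headers(True))` and then `fetch_content_type(url, reload_headers(False, etag))`, with the ETag taken from the first response, would show whether the charset changes only on the conditional request.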
(Reporter)

Comment 8

2 years ago
(In reply to Alex from comment #7)
> Chrome, Safari and Firefox all behave the same for me (And none of them seem
> to use the cache, the server reports 200 with the same etag instead of a 304)

Firefox uses the cache when one opens the URL very shortly after a (forced) reload.

> Does lynx do re-validation? That's when the server returns the incorrect
> headers for me.

As with w3m, I see the same problem with lynx when I restart it on the same URL. After a Ctrl-R, I get the correct charset. I don't know what Ctrl-R does exactly; it is just documented as "Reload current file and refresh the screen". Perhaps it's like Ctrl-Shift-R in Firefox, which would explain the behavior.

Updated

2 years ago
Component: Untriaged → DOM
Product: Firefox → Core
(Reporter)

Comment 9

2 years ago
It seems that I can no longer reproduce problem (2).
Thanks for the report and details. If you can reproduce it again, please reopen this bug.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME