Closed Bug 639121 Opened 13 years ago Closed 2 years ago

View/Page Source shows in windows-1252, even when Shift_JIS is detected by auto-detect and page is shown in Shift_JIS as expected

Categories

(Core :: Internationalization, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: World, Unassigned)

Details

View/Page Source shows in windows-1252, even when Shift_JIS is detected by auto-detect and page is shown in Shift_JIS as expected.
To get View/Page Source to display in Shift_JIS, one of the following is required.
- In the page display window,
    Change View/Character Encoding to something other than Shift_JIS,
    then change it back to Shift_JIS
    => Disk Cache, charset: changes from windows-1252 to Shift_JIS
- In the View/Page Source window,
    Change View/Character Encoding to Shift_JIS
    => Disk Cache, charset: changes from windows-1252 to Shift_JIS

[Build ID]
> Build identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0b12pre) Gecko/20110220 Firefox/4.0b12pre
[Test Web page]
> http://bugzilla-attachments.mozilla.gr.jp/attachment.cgi?id=3762
  Written in Shift_JIS.
  The meta tag is intentionally commented out to test auto-detect.
  <!-- <meta http-equiv="Content-Type" content="...; charset=Shift_JIS"> -->
[NSPR log(all:5) for load of above Web page]
> http://bugzilla-attachments.mozilla.gr.jp/attachment.cgi?id=4122
[Extracted NSPR log lines to show test procedure]
> http://bugzilla-attachments.mozilla.gr.jp/attachment.cgi?id=4123
 [0] Auto-Detect = Japanese (Japanese is not required; other choices behave the same)
 [1] Cache clear, initial load
     Page is shown in Shift_JIS.
     View/Character Encoding : Shift_JIS is shown.
     At this step, Disk Cache, charset: windows-1252 is set.
     => View/Page Source shows in windows-1252
 [2] Change auto-detect language choice of View/Character Encoding/Auto-Detect
     Change from Japanese to Chinese etc.
     Page is still shown in Shift_JIS.
     View/Character Encoding : Shift_JIS is still shown.
     At this step, Disk Cache, charset: windows-1252 is still set.
     => View/Page Source shows in windows-1252
 [3] Change charset choice of View/Character Encoding to UTF-8
     Page is shown in UTF-8
     View/Character Encoding : UTF-8 is shown.
     At this step, Disk Cache, charset: UTF-8 is set.
     => View/Page Source shows in UTF-8
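
For illustration, the visible symptom in steps [1] and [2] can be reproduced outside the browser. This is only a sketch in Python, not Gecko code: it assumes a sample Japanese string (not taken from the test page) and uses Python's "cp1252" codec as a stand-in for windows-1252.

```python
# Sample text is an assumption for illustration; it is not from the test page.
text = "日本語のテスト"

# Bytes as a server would send them for a Shift_JIS page.
raw = text.encode("shift_jis")

# Page display: auto-detect picks Shift_JIS, so the text round-trips.
as_rendered = raw.decode("shift_jis")

# View/Page Source: the same bytes decoded as windows-1252 (cp1252 here).
# errors="replace" keeps the sketch from raising on bytes cp1252 leaves undefined.
as_view_source = raw.decode("cp1252", errors="replace")

print(as_rendered)
print(as_view_source)
```

The second line printed is mojibake, which is what View/Page Source shows while the Disk Cache charset is still windows-1252.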

When this web page is loaded in Fx3, a "double HTTP GET" is always observed.
  First HTTP GET for the initial load.
  Second HTTP GET caused by the charset change from auto-detect.
Same problem as bug 597820?

In this bug's case, the only additional resource requested via HTTP GET is the favicon. Is the favicon a subresource of this page?
I don't think "to hit the network again for the loads we started before finding the <meta>" is required in this bug's case.
I think saving the charset detected by auto-detect in the Disk Cache is sufficient.
Note:
mozilla.gr.jp sends the test page with the following header, as bugzilla.mozilla.org does for a bug attachment whose MIME type has no charset specified.
> Content-Type: text/html; charset=; name="testcase.html"
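
To make the point about the header concrete: the charset parameter is present but empty, so it gives the browser nothing to use and detection has to decide. A minimal Python sketch follows. The helper names parse_content_type and declared_charset are hypothetical, and the parsing is deliberately simplified (it does not handle quoted semicolons or RFC 2231 continuations).

```python
def parse_content_type(header):
    """Split a Content-Type value into (media_type, params). Simplified parser."""
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue
        name, _, value = part.partition("=")
        # Parameter names are case-insensitive; strip optional quotes from values.
        params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


def declared_charset(header):
    """Return the declared charset, or None when it is missing or empty."""
    _, params = parse_content_type(header)
    return params.get("charset") or None


header = 'text/html; charset=; name="testcase.html"'
print(parse_content_type(header))
print(declared_charset(header))
```

With the header above, declared_charset returns None, the same result as for a header with no charset parameter at all.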
Hmm.  We almost certainly don't run the charset sniffer for the view-source load, right?

Arguably, we should store the sniffed charset in the shentry....

> Same problem as bug 597820?

The double-get thing?  No.
(In reply to comment #2)
> Hmm.  We almost certainly don't run the charset sniffer for the view-source
> load, right?

I don't know what the current code does. For the new code, I was planning on treating view-source: GETs like http: POSTs: Sniffing only the first 1024 bytes and not allowing reloads. Does that make sense as the plan going forward?
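
A rough sketch of what "sniff only the first 1024 bytes, no reload" could look like, written in Python for brevity. This is not Gecko's detector: the trial-decode heuristic, the candidate list, and the name sniff_charset are all assumptions made for illustration.

```python
import codecs

SNIFF_LIMIT = 1024  # only this many leading bytes are examined


def sniff_charset(body, candidates=("utf-8", "shift_jis"), fallback="windows-1252"):
    """Pick the first candidate that decodes the leading bytes without error."""
    prefix = body[:SNIFF_LIMIT]
    for encoding in candidates:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multibyte character cut off at the limit.
            decoder.decode(prefix, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    # Nothing matched: keep the fallback; the load is never restarted.
    return fallback


print(sniff_charset("日本語のテスト".encode("shift_jis")))
print(sniff_charset(b"a" * SNIFF_LIMIT + b"\xff"))
```

The second call shows the trade-off of the limit: bytes past the first 1024 never influence the result, so a late non-ASCII byte cannot trigger a second load.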

> Arguably, we should store the sniffed charset in the shentry....

I had expected the charset to go into the history entry already, but I hadn't tested. :-(
> Does that make sense as the plan going forward?

Yes, but imo we should be propagating the charset that was used to view the page to view-source, which will look like a channel charset to the parser...

> I had expected the charset to go into the history entry already

Doesn't seem to.

I think the right fix to this bug is to store the charset in the shentry and propagate it to the view-source load.
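
The proposed fix can be sketched as a small model. This is Python, not the docshell code; HistoryEntry, finish_page_load, and charset_for_view_source are hypothetical names used only to show the data flow, and the windows-1252 fallback is taken from the behavior reported above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryEntry:
    """Stand-in for a session history entry (shentry); fields are assumptions."""
    url: str
    charset: Optional[str] = None


def finish_page_load(entry, channel_charset, sniffed_charset):
    """Record the charset actually used to display the page."""
    # An explicit channel charset wins; otherwise keep what the sniffer found.
    entry.charset = channel_charset or sniffed_charset
    return entry.charset


def charset_for_view_source(entry, fallback="windows-1252"):
    """View-source reuses the stored charset as if it were a channel charset."""
    return entry.charset or fallback


entry = HistoryEntry(url="http://example.invalid/testcase.html")
print(charset_for_view_source(entry))   # nothing stored yet, fallback is used
finish_page_load(entry, channel_charset=None, sniffed_charset="Shift_JIS")
print(charset_for_view_source(entry))   # stored charset is propagated
```

In this model the view-source load never needs to run a sniffer itself; it only reads what the page load stored.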

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: smontagu → nobody
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME