Closed
Bug 212158
Opened 22 years ago
Closed 4 years ago
HTTP Content-Type charset property should be inserted as a META tag when saving HTML page to disk
Categories
(Core :: DOM: Serializers, defect, P5)
RESOLVED
WORKSFORME
People
(Reporter: sgautherie, Unassigned)
References
Details
(Keywords: intl)
Attachments
(2 files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624
Saved pages (EUC-KR in my example) that carry no indication of their charset
are displayed as ISO-8859-1.
I'll attach a page sent with
"Content-Type: text/html; charset=EUC-KR"
in the HTTP response header:
1) with an equivalent <META> tag: this one behaves correctly.
2) without an equivalent <META> tag: this one behaves incorrectly.
Reproducible: Always
Steps to Reproduce:
NB: This testcase is for information only, as one needs an HTTP server, and I
don't have a public URL. (My test is a JSP page on a "J2EE" server.)
1. Display the page from the HTTP server, which sends the charset in the HTTP
response. (get EUC-KR :-))
2. Save the page to disk.
3. Display the page from disk. (get ISO-8859-1 :-()
Actual Results:
The saved page displays as ISO-8859-1.
Expected Results:
The saved page displays as EUC-KR.
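For anyone without a JSP/J2EE setup, the conditions in the steps above can probably be approximated with a small local server. This is only a sketch: the page content, the `serve` helper, the handler name, and port 8000 are assumptions made for illustration and are not part of the original testcase. It sends the charset in the HTTP response header only and deliberately omits any META declaration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test page: Korean text and deliberately no <META> charset declaration.
PAGE = "<html><head><title>EUC-KR test</title></head><body>한국어</body></html>"
BODY = PAGE.encode("euc_kr")


class EucKrHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # The charset is announced only here, in the HTTP response header.
        self.send_header("Content-Type", "text/html; charset=EUC-KR")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)


def serve(port: int = 8000) -> None:
    # Port 8000 on localhost is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", port), EucKrHandler).serve_forever()
```

Calling `serve()` and loading http://127.0.0.1:8000/ should correspond to step 1; saving that page and reopening it from disk corresponds to steps 2 and 3.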
One way to achieve this could be:
Add the
<META http-equiv="Content-Type" content="text/html; charset=EUC-KR">
line at the beginning of the <HEAD> section.
(This could be "tricky" if there is already some kind of {META
http-equiv="Content-Type"} line present.)
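The proposal above, including the "tricky" case of an already present declaration, might look roughly like the following. This is a minimal sketch and not how Mozilla's persistence or serializer code works: `declare_charset` and both regular expressions are hypothetical names, and regex matching stands in for a real HTML parser.

```python
import re

# Matches an existing <meta http-equiv="Content-Type" ...> tag, case-insensitively.
META_RE = re.compile(
    r"<meta\b[^>]*http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*>",
    re.IGNORECASE,
)
# Matches the opening <head> tag (but not <header>).
HEAD_RE = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def declare_charset(html: str, charset: str) -> str:
    """Insert or replace a Content-Type META tag declaring `charset`."""
    tag = f'<META http-equiv="Content-Type" content="text/html; charset={charset}">'
    if META_RE.search(html):
        # A declaration already exists: replace the first one instead of adding a second.
        return META_RE.sub(lambda m: tag, html, count=1)
    head = HEAD_RE.search(html)
    if head:
        # No declaration yet: put it at the very beginning of the <HEAD> section.
        return html[: head.end()] + "\n" + tag + html[head.end():]
    # No <HEAD> at all: this sketch leaves the document unchanged.
    return html
```

Replacing rather than appending avoids ending up with two conflicting declarations in the saved file.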
Both versions (with and without the META tag) are displayed correctly by MsIE v6.0sp1!
I believe adding a META tag is one of the right things to do;
but it also seems that MsIE v6 has some "auto-detect" capability that Mozilla
lacks. (Is there another bug about this second issue?)
NB: This issue is much like the one in MailNews bug 186407.
Comment 1•22 years ago (Reporter)
EUC-KR always used :-)
Comment 2•22 years ago (Reporter)
MsIE uses (= auto-detects !?) EUC-KR (or alike);
Mozilla defaults to ISO-8859-1 :-(
Updated•22 years ago (Reporter)
Attachment #127352 - Attachment description: page woithout META: MsIE right, Mozilla wrong → page without META: MsIE right, Mozilla wrong
Comment 4•22 years ago (Reporter)
[Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.4) Gecko/20030624]
Same bug.
NB:
*In EUC-KR, my Moz/W95 displays Korean characters as '?' (question mark), since
I cancel the (2.1 MB) font download.
*(My (Moz/)W2K appears to support that charset, I'll have to check how it was
installed.)
*In "ISO-8859-1", Moz/W95 or W2K displays the raw data: each Korean character
appears as 2 "latin" characters.
[Netscape® Communicator 4.8 : en-20020722] (W95)
*"EUC": same as Mozilla, except I get "empty square" instead of '?'.
*"ISO": same as Mozilla, except PageInfo says "Charset: Unknown".
*NB: View menu always says "ISO" :-<
(Not a '4xp' bug.)
[Microsoft Internet Explorer, version 3.0 (4.70.1158)] (W95)
Irrelevant: displays everything as "ISO"-like, seems not to support other charsets.
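The "2 latin characters per Korean character" effect noted for Mozilla above follows from EUC-KR using two bytes for each Hangul character while ISO-8859-1 maps every byte to one character. A small sketch, with an arbitrary sample string chosen for illustration:

```python
text = "한국어"                       # three Korean characters (sample text, not from the attachment)
raw = text.encode("euc_kr")          # EUC-KR: two bytes per Hangul character
misread = raw.decode("iso-8859-1")   # ISO-8859-1: every byte becomes one character

# Each Korean character turns into two single-byte characters when misread.
assert len(raw) == 2 * len(text)
assert len(misread) == 2 * len(text)

# Nothing is lost: reinterpreting the same bytes as EUC-KR restores the text.
assert misread.encode("iso-8859-1").decode("euc_kr") == text
```

Since the bytes on disk are intact, the saved page is only mislabeled, not corrupted, which is why manually selecting EUC-KR fixes the display.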
Comment 5•22 years ago (Reporter)
Comment 6•22 years ago
Sounds like something that the persistence object would have to do...
Comment 7•21 years ago
*** Bug 259246 has been marked as a duplicate of this bug. ***
Comment 8•21 years ago
*** Bug 248865 has been marked as a duplicate of this bug. ***
Comment 9•21 years ago
*** Bug 264333 has been marked as a duplicate of this bug. ***
Comment 10•21 years ago
*** Bug 218407 has been marked as a duplicate of this bug. ***
Updated•21 years ago
Component: Browser-General → DOM to Text Conversion
OS: Windows 2000 → All
Hardware: PC → All
Comment 11•21 years ago
Copied from my comment in bug 259246 :
My idea about the subject :
- At an evangelism level, page authors should be informed about this effect, and
recommended to include the META tag, and not only use HTTP header, for correct
off-line viewing of their page.
- Save page as "Web page, HTML only" is expected to save the exact page that was
received and I don't think it is a good thing if it modifies the page in any way
when saving. Keep as is.
- Save page as "Web Page, Complete" already modifies the page it saves, so it
should be enhanced to save the encoding selected for display on disk (the
encoding selected, not the one in the HTTP header. For the case everything
failed, and the user has had to manually select the correct encoding).
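The preference order suggested in the last point (the encoding actually selected for display first, the HTTP header second) could be sketched as below. The function names and the ISO-8859-1 fallback are assumptions for illustration, not existing Mozilla behavior; the header parsing relies on the standard library's `email.message.Message`.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset lowercased, or None when no charset parameter is present.
    return msg.get_content_charset()


def charset_to_save(user_selected: Optional[str],
                    http_content_type: Optional[str],
                    fallback: str = "iso-8859-1") -> str:
    """Pick the charset to record in the saved page (hypothetical policy)."""
    if user_selected:
        # A manual choice in the encoding menu wins over everything else.
        return user_selected.lower()
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    # Assumed default when nothing else is known.
    return fallback
```

For example, `charset_to_save(None, "text/html; charset=EUC-KR")` yields `"euc-kr"`, while a manual override such as `"UTF-8"` takes precedence over the header.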
Comment 12•21 years ago
(In reply to comment #11)
> Copied from my comment in bug 259246 :
>
> My idea about the subject :
> - At an evangelism level, page authors should be informed about this effect, and
> recommended to include the META tag, and not only use HTTP header, for correct
> off-line viewing of their page.
I agree that an author should try to ensure the proper viewing of his page by
including the encoding information. Not setting this parameter is very common in
Western countries that use ISO-8859-1, and according to HTML they are doing
nothing wrong. However, with the slow introduction of UTF-8 and the need for
symbols outside that character set (e.g. the euro sign, present in ISO-8859-15)
it is clear that page writers (as well as software writers) should be encouraged
to take notice of encoding issues and solutions.
However, the importance and priority of the HTTP header is not to be forgotten.
Reasons are the possibility of multi-language page serving in different charsets
or modifications done by the web server to the author's original page.
> - Save page as "Web page, HTML only" is expected to save the exact page that was
> received and I don't think it is a good thing if it modifies the page in any way
> when saving. Keep as is.
I am not very sure about this except as an effort to mimic Internet Explorer.
"Web page, HTML only" states that the HTML content will be saved exclusively,
but nothing else. Bug 125729 is an example of a nice feature which would require
modification of the original even when saving only the HTML. If one wants access
to the page as served by the server, I would recommend using "View Page/Frame
Source". Maybe adding a "Save" menu entry in the source visualization screen
should be submitted as an enhancement. Adding a new save type option "Original
HTML" is another possibility.
> - Save page as "Web Page, Complete" already modifies the page it saves, so it
> should be enhanced to save the encoding selected for display on disk (the
> encoding selected, not the one in the HTTP header. For the case everything
> failed, and the user has had to manually select the correct encoding).
It seems a good idea to use the encoding used for display when saving. The
program should add or modify the HTML Content-Type meta tag of the page/frame
when saving --or more radical and more dangerous, change all the encoding.
However, as reasoned above, I think this should be done both for the "Web Page,
HTML only" and "Web Page, complete" options.
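The "more radical" option mentioned above, converting the whole document to another encoding when saving, amounts to decoding the bytes with the display encoding and re-encoding them. A minimal sketch, assuming UTF-8 as the target and a hypothetical helper name:

```python
def transcode_to_utf8(raw: bytes, display_charset: str) -> bytes:
    """Re-encode page bytes from the display charset to UTF-8."""
    # Raises UnicodeDecodeError if the bytes are not valid in display_charset,
    # which is one reason this route is riskier than only adding a META tag.
    text = raw.decode(display_charset)
    return text.encode("utf-8")
```

After such a conversion any charset declaration inside the page would have to say UTF-8, otherwise the saved file would be mislabeled in the opposite direction.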
Comment 13•21 years ago
*** Bug 280282 has been marked as a duplicate of this bug. ***
Updated•16 years ago
Assignee: general → nobody
QA Contact: general → dom-to-text
Comment 14•5 years ago
Bulk-downgrade of the priority of unassigned, untouched DOM/Storage bugs.
If you have reason to believe this is wrong, please write a comment and ni :jstutte.
Severity: normal → S4
Priority: -- → P5
Comment 15•4 years ago
If saving as complete, the serializer adds a meta charset. Even when saving verbatim, Gecko autodetects the encoding of HTML loaded from file: URLs these days.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME