Closed Bug 212158 Opened 22 years ago Closed 4 years ago

HTTP Content-Type charset property should be inserted as a META tag when saving HTML page to disk

Categories

(Core :: DOM: Serializers, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: sgautherie, Unassigned)

References

Details

(Keywords: intl)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624

A saved EUC-KR page (in my example) that carries no indication of its charset is displayed as ISO-8859-1 when viewed from disk.

I'll attach a page sent with "Content-Type: text/html; charset=EUC-KR" in the HTTP response header:
1) with a "same" <META> tag: which behaves correctly.
2) without "same" <META> tag: which behaves incorrectly.

Reproducible: Always

Steps to Reproduce:
NB: This testcase is for information only, as one needs an HTTP server, and I don't have a public URL. (My test is a JSP page on a "J2EE" server.)
1. Display the page from the HTTP server, which sends the charset in the HTTP response. (get EUC-KR :-))
2. Save the page to disk.
3. Display the page from disk. (get ISO-8859-1 :-()

Actual Results: The saved page displays as ISO-8859-1.

Expected Results: The saved page displays as EUC-KR.

One means to achieve this could be: add the <META http-equiv="Content-Type" content="text/html; charset=EUC-KR"> line at the beginning of the <HEAD> section. (This could be "tricky" if there is already some kind of {META http-equiv="Content-Type"} line present.!?.)

Both versions (with/without META tag) are displayed correctly by MsIE v6.0sp1 ! I believe adding a META tag is one of the right things to do; but it also seems that MsIEv6 has some "auto-detect" capability that Mozilla lacks. (Is there another bug about this second issue ?)

NB: This issue is much like the one in MailNews bug 186407.
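For illustration only, the proposal in this comment (take the charset from the HTTP Content-Type header and insert a matching META line at the beginning of HEAD, leaving the page alone when a charset declaration is already present) could be sketched as below. The helper names `charset_from_header` and `add_meta_charset` are hypothetical, the regular expressions are a simplification rather than a real HTML parser, and none of this is Gecko's actual save code.

```python
import re
from email.message import Message

# Hypothetical helpers for illustration; not Gecko's serializer.
# Matches any <meta ...> tag that already mentions a charset.
META_RE = re.compile(r"<meta[^>]+charset\s*=", re.IGNORECASE)
# Matches the opening <head> tag (but not <header>).
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def charset_from_header(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def add_meta_charset(html, content_type):
    """Insert a Content-Type META right after <head> if no charset is declared."""
    charset = charset_from_header(content_type)
    if charset is None or META_RE.search(html):
        return html  # nothing to add, or the page already declares a charset
    head = HEAD_RE.search(html)
    if head is None:
        return html  # no <head> to anchor on; left untouched in this sketch
    meta = '<meta http-equiv="Content-Type" content="text/html; charset=%s">' % charset
    return html[:head.end()] + meta + html[head.end():]


page = "<html><head><title>t</title></head><body></body></html>"
saved = add_meta_charset(page, "text/html; charset=EUC-KR")
print(saved)
```

The "tricky" case mentioned above is handled here in the most conservative way: an existing declaration is never rewritten, so a second call on the saved page returns it unchanged.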
EUC-KR always used :-)
MsIE uses (= auto-detects !?) EUC-KR (or alike); Mozilla defaults to ISO-8859-1 :-(
Adding (K) 'intl'. [I believe it's appropriate !?]
Keywords: intl
Attachment #127352 - Attachment description: page woithout META: MsIE right, Mozilla wrong → page without META: MsIE right, Mozilla wrong
[Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.4) Gecko/20030624]
Same bug.
NB:
* In EUC-KR, my Moz/W95 displays Korean characters as '?' (question mark), since I cancel the (2.1 MB) font download.
* (My (Moz/)W2K appears to support that charset, I'll have to check how it was installed.)
* In "ISO-8859-1", Moz/W95orW2K displays the raw data: each Korean character appears as 2 "latin" characters.

[Netscape® Communicator 4.8 : en-20020722] (W95)
* "EUC": same as Mozilla, except I get "empty square" instead of '?'.
* "ISO": same as Mozilla, except PageInfo says "Charset: Unknown".
* NB: View menu always says "ISO" :-< (Not a '4xp' bug.)

[Microsoft Internet Explorer, version 3.0 (4.70.1158)] (W95)
Irrelevant: displays everything as "ISO" like, seems not to support other charsets.
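As a small illustration of the "2 latin characters per Korean character" observation, the following sketch encodes a sample string as EUC-KR and decodes the same bytes as ISO-8859-1. The sample text is an arbitrary choice for the example, not content from the attached test pages.

```python
# Sample Korean text (two Hangul syllables), written with escapes so the
# example does not depend on the encoding of this source file.
text = "\ud55c\uae00"

raw = text.encode("euc_kr")        # bytes as the server would send them
wrong = raw.decode("iso-8859-1")   # what an ISO-8859-1 fallback displays
right = raw.decode("euc_kr")       # what the correct charset displays

# Each Hangul syllable is two bytes in EUC-KR, and ISO-8859-1 maps every
# byte to exactly one character, so the wrong decoding is twice as long.
print(len(text), len(raw), len(wrong))
print(right == text, wrong == text)
```

Because ISO-8859-1 assigns a character to every byte value, the wrong decoding never fails outright; it only produces the doubled "latin" text described above.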
Addition to comment 4: Comment 4 is about viewing the attached files only: not about the save step.
Sounds like something that the persistence object would have to do...
*** Bug 259246 has been marked as a duplicate of this bug. ***
*** Bug 248865 has been marked as a duplicate of this bug. ***
*** Bug 264333 has been marked as a duplicate of this bug. ***
*** Bug 218407 has been marked as a duplicate of this bug. ***
Component: Browser-General → DOM to Text Conversion
OS: Windows 2000 → All
Hardware: PC → All
Copied from my comment in bug 259246 :

My idea about the subject :
- At an evangelism level, page authors should be informed about this effect, and recommended to include the META tag, and not only use the HTTP header, for correct off-line viewing of their page.
- Save page as "Web page, HTML only" is expected to save the exact page that was received and I don't think it is a good thing if it modifies the page in any way when saving. Keep as is.
- Save page as "Web Page, Complete" already modifies the page it saves, so it should be enhanced to save on disk the encoding selected for display (the encoding selected, not the one in the HTTP header, for the case where everything failed and the user has had to manually select the correct encoding).
(In reply to comment #11)
> Copied from my comment in bug 259246 :
>
> My idea about the subject :
> - At an evangelism level, page authors should be informed about this effect, and
> recommended to include the META tag, and not only use HTTP header, for correct
> off-line viewing of their page.

I agree that an author should try to ensure the proper viewing of his page by including the encoding information. Not setting this parameter is very common in Western countries that use ISO-8859-1, and according to HTML they are doing nothing wrong. However, with the slow introduction of UTF-8 and the need for symbols outside that character set (e.g. the euro sign, present in ISO-8859-15), it is clear that page writers (as well as software writers) should be encouraged to take notice of encoding issues and solutions.

However, the importance and priority of the HTTP header is not to be forgotten. Reasons are the possibility of multi-language page serving in different charsets, or modifications done by the web server to the author's original page.

> - Save page as "Web page, HTML only" is expected to save the exact page that was
> received and I don't think it is a good thing if it modifies the page in any way
> when saving. Keep as is.

I am not very sure about this except as an effort to mimic Internet Explorer. "Web page, HTML only" states that only the HTML content will be saved, and nothing else. Bug 125729 is an example of a nice feature which would require modification of the original even when saving only the HTML.

If one wants access to the page as served by the server, I would recommend using "View Page/Frame Source". Maybe adding a "Save" menu entry in the source visualization screen should be submitted as an enhancement. Adding a new save type option "Original HTML" is another possibility.

> - Save page as "Web Page, Complete" already modifies the page it saves, so it
> should be enhanced to save the encoding selected for display on disk (the
> encoding selected, not the one in the HTTP header. For the case everything
> failed, and the user has had to manually select the correct encoding).

It seems a good idea to use the encoding used for display when saving. The program should add or modify the HTML Content-Type meta tag of the page/frame when saving, or, more radical and more dangerous, change the encoding of the whole page. However, as reasoned above, I think this should be done both for the "Web Page, HTML only" and "Web Page, complete" options.
*** Bug 280282 has been marked as a duplicate of this bug. ***
Assignee: general → nobody
QA Contact: general → dom-to-text

Bulk-downgrade of unassigned, untouched DOM/Storage bug's priority.

If you have reason to believe this is wrong, please write a comment and ni :jstutte.

Severity: normal → S4
Priority: -- → P5

If saving as complete, the serializer adds a meta charset. Even when saving verbatim, Gecko autodetects the encoding of HTML loaded from file: URLs these days.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME