Closed Bug 190302 Opened 23 years ago Closed 22 years ago

any xsl transformations show wrong codepage, but draw normal page

Categories
Product/Component: Core :: XSLT
Type: enhancement
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED FIXED

People
Reporter: andrew_v
Assigned: peterv

Attachments: 1 file, 2 obsolete files

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Any XSL transformation shows the wrong codepage, but the page itself is drawn normally.

Reproducible: Always

Steps to Reproduce:
1. Do any XSL transformation with any encoding (the default value is "UTF-8").
2. Press Ctrl+I and look at the "Encoding" field: it shows "ISO-8859".
3. Why?
Well, we actually should set the encoding to UTF-16, regardless of what the stylesheet says, as that is what we use. (Andrew, encoding is a serialisation issue, and Mozilla does not serialize its output but generates content directly. Internally, all string data is stored as 2-byte strings, which, AFAICT, is UCS-2, the closest thing to UTF-16.) I should look up our options before rambling.
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Most of the time the codepage is pretty useless information, but in some cases it is used as the default encoding when loading linked resources such as stylesheets. If that is done according to some spec, we should allow control over it even for XSLT-generated pages. However, if it is done strictly because we want to be intelligent, then we should probably set the codepage of the result document to be the same as that of the source document.
Could be as simple as:

    mDocument->SetDocumentCharacterSet("UTF-16");
    mDocument->SetDocumentCharacterSetSource(kCharsetFromOtherComponent);

There are 3 options, though:
1) Use the stylesheet's hint
2) Use the source's codepage
3) Always use UTF-16

Right now I'm leaning to 1, but I'm not sure.
I think it mostly depends on whether we indeed use it to load external resources such as stylesheets. If we do, then I think the most important thing is that we get that part right, i.e. then we should use 2). On the other hand, if the encoding in the document isn't used for anything other than showing something in the Page Info dialog, then 1) gets my vote, since that is what the author explicitly requests. 3) doesn't really make any sense to me, since we're not using UTF-16. We're not "really" using any encoding, since encodings only exist in serialized documents, which we of course don't have. CC-ing bz to get his input on if/how the style code uses the document codepage.
The document charset is used in a few places:
1) When loading CSS sheets, it's used if there is no HTTP header listing the charset, no @charset rule in the sheet, and no charset attribute on the element or PI loading the sheet (in other words, 99% of the time).
2) When unescaping URIs (since the unescaped version of a URI is a byte array, which then needs to be converted to characters). This case is quite evil, since we have no real way to tell whether a given URI string is coming from the original XML document the XSLT is applied to or from the XSL sheet itself. I'd use one or the other for the encoding there, and maybe have an NS_WARNING when they don't match.
3) When saving the document as "web page, complete" (a serialization issue); in fact, when performing any sort of serialization to a byte stream.
4) The same as for stylesheets, but for scripts loaded via <script src=""> (used if there is no HTTP header, no charset attribute on the loading element, and no BOM).
There are probably other uses that I can't think of at the moment.
Summary: any xsl transformaions show wrong codepage, but draw normal page → any xsl transformations show wrong codepage, but draw normal page
Point three in comment 5 convinces me that we should use the encoding specified in the stylesheet, if one is specified. But from the other points, I think we should use either the stylesheet's or the document's encoding as the default. I can't really say I have any arguments for whether the document's or the stylesheet's encoding is more important.
IMO, if the sheet specifies an encoding, we should just use that. If the author is confused, that's the author's problem. This gives authors the freedom to not specify an encoding if they want us to "just do something".
Attached patch: patch to fix (obsolete)
This patch makes us use the encoding specified in the <xsl:output> element, or fall back to the source document's encoding if none is specified. We really should refactor the code in txMozillaTextOutput::createResultDocument and txMozillaXMLOutput::createResultDocument into a common function, but that's for another bug.
Attachment #133706 - Flags: superreview?(peterv)
Attachment #133706 - Flags: review?(axel)
Comment on attachment 133706: patch to fix

<Pike> like, three Sets and one Get
<sicking> doh!

Could you move the charset up in txMozillaXMLOutput, so that the call to ResetWithSource is part of the patch? That makes it easier to grok.
Attachment #133706 - Flags: review?(axel) → review-
Attachment #133706 - Flags: superreview?(peterv)
Attached patch: patch v2 (obsolete)
Attachment #133706 - Attachment is obsolete: true
Attachment #133776 - Flags: superreview?(peterv)
Attachment #133776 - Flags: review?(axel)
Attachment #133776 - Flags: superreview?(peterv) → superreview+
Attachment #133776 - Flags: review?(axel) → review+
checked in
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED