Closed Bug 57164 Opened 24 years ago Closed 23 years ago

[charset]loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]

Categories

(Core :: XML, defect, P3)

x86
Windows NT
defect

RESOLVED FIXED
mozilla1.0

People

(Reporter: ftang, Assigned: bzbarsky)

Details

(Keywords: css2, relnote, Whiteboard: relnote-devel (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset))

Attachments

(7 files)

According to "Jun '99: W3C Recommendation:  Associating stylesheets with XML
documents" http://www.w3.org/TR/xml-stylesheet/

....
The following pseudo attributes are defined

....
charset CDATA #IMPLIED
....

This means we should honor the charset pseudo-attribute of the <?xml-stylesheet ?>
PI when loading the stylesheet. Currently we don't.

Steps to reproduce:
1. visit http://ftang/ftang/css2/kanji/bug.xml
2. visit http://ftnag/ftang/css2/kanji/correct.xml
The two pages should look the same.

bug.xml uses bug.css and includes the charset information in the <?xml-stylesheet
?> tag (as charset="Shift_JIS").
correct.xml uses correct.css and includes the charset information on the first
line of the CSS by using @charset "Shift_JIS";

correct.xml currently works, since @charset is already implemented. The
<?xml-stylesheet charset="Shift_JIS" ?> form does not work yet.

I think this is not important for Netscape6 rtm, but it will be nice if we can
fix this right after.
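
For illustration only, here is a minimal Python sketch of pulling the charset pseudo-attribute out of the data of an <?xml-stylesheet ?> PI. The function name and the sample PI data are hypothetical (the type value in particular is assumed, not taken from bug.xml), and this is not how Mozilla's XML content sink does it.

```python
import re

# Pseudo-attributes look like name="value" or name='value' inside the PI data.
_PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attributes(pi_data):
    """Return the pseudo-attributes of an xml-stylesheet PI as a dict."""
    attrs = {}
    for name, double_quoted, single_quoted in _PSEUDO_ATTR.findall(pi_data):
        # Only one of the two quoted groups can be non-empty for a given match.
        attrs[name] = double_quoted if double_quoted else single_quoted
    return attrs


# Hypothetical PI data modelled on the report; only charset matters here.
pi_data = 'href="bug.css" type="text/css" charset="Shift_JIS"'
print(parse_pseudo_attributes(pi_data).get("charset"))
```

A PI without the pseudo-attribute simply yields None from .get("charset"), which is the case where @charset or the document's own encoding has to decide.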
Future.
Target Milestone: --- → Future
Frank: Nice catch!

I'm assuming your server does not return authoritative character set 
information, e.g. in the Content-Type field? If it does, that would override the
charset field of the stylesheet PI.

RELEASE NOTE ITEM:
   Mozilla currently does not support the 'charset' pseudo-attribute of the 
   XML Stylesheet Linking PI. Workaround: Use the CSS2 @charset rule to specify
   the encoding as the first rule in your stylesheet.
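A rough sketch of the override mentioned above, assuming the stylesheet's Content-Type header value is available as a string: an HTTP charset parameter, when present, wins over the charset given in the stylesheet PI. The helper names are hypothetical; the header parsing uses Python's standard email.message module.

```python
from email.message import Message


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param returns None when the header has no charset parameter.
    return msg.get_param("charset")


def charset_for_sheet(content_type, pi_charset):
    """Authoritative HTTP charset first, then the PI's charset."""
    return http_charset(content_type) or pi_charset


print(charset_for_sheet("text/css; charset=UTF-8", "Shift_JIS"))
print(charset_for_sheet("text/css", "Shift_JIS"))
```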
Component: Style System → XML
Whiteboard: (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset)
I don't think we need to fix this at this time. It would be nice if we could fix
this ASAP after RTM.
Whiteboard: (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset) → relnote-devel (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset)
QA Contact: chrisd → petersen
Nom. nsbeta1 on grounds of standards compliance correctness and enablement of
international content.
Keywords: nsbeta1
Franck, could you attach the two testcases to the bug report?  Thanks.
Reassigned to ftang.  Franck, please attach the two testcases to the bug report 
and reassign the bug to me.

Related bugs are bug 66190 and bug 63502
Assignee: pierre → ftang
Target Milestone: Future → ---
pierre- just visit
1. visit http://ftang/ftang/css2/kanji/bug.xml
2. visit http://ftnag/ftang/css2/kanji/correct.xml
Assignee: ftang → pierre
Summary: loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter → loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]
Frank, please attach the pages, as there are people outside of Netscape who might
be interested in this bug.
Attached file ftang's bug.xml
Attached file ftang's correct.xml
Attached file ftang's bug.css
Attached file ftang's correct.css
Boris: another charset bug... Do you want to take it?
Target Milestone: --- → mozilla1.0
um.... Let me wrap up my other ones first.. I have no idea where to even start
on this one.  But I'll keep it in mind.  :)
using build 2001100903 win32

both testcases do not work. Is @charset also broken?
Blocks: 104166
I think the files were converted when they were uploaded with a non-Japanese
browser encoding or something like that. I zipped up the original
4 files and attached them above. Unarchive the file with WinZip and
you should see @charset working with the correct.xml & correct.css files.
Keywords: nsbeta1
ok, with the zip attachment it works.
Summary: loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT] → [charset]loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]
Frank, I have the fix for bug 72658 in my tree, so the included testcase
worksforme (since the document charset and the stylesheet charset are the same).

I can disable that code while I work on this, but could you possibly create a
stylesheet in a _different_ charset from the document for testing and
verification purposes?
Explanation of test cases from "attachcomments.txt" file:

The following test should be conducted with default browser encoding set to
Western (ISO-8859-1), Edit | Prefs | Navigator | Languages. auto-detection must
be OFF.

** For test cases 1-4: Element names are in Japanese: all XML & CSS files are in
Shift_JIS Japanese.

1. shiftjisA.xml/shiftjisa.css -- stylesheet charset; no @charset in .css file.
(Patch for 72658 or patch for this bug should work)

2. shiftjisB.xml/shiftjisb.css -- no stylesheet charset; @charset in .css file.
(should work now with no patches)

3. shiftjisC.xml/shiftjisc.css -- stylesheet charset; @charset in .css file.
(should work now with no patches)

4. shiftjisD.xml/shiftjisd.css -- no stylesheet charset; no @charset in .css
file. (Only the patch for 72658 can fix this problem.)


** All CSS files in the following tests are encoded in UTF-8. XML files are
either in Shift_JIS Japanese or UTF-8.

5. utf8a.xml/utf8a.css -- XML in Shift_JIS; stylesheet charset=UTF-8; no
@charset in .css file. (Color style works because the element names are in
ASCII. Character display is incorrect. Only Patch for this bug can fix the
latter problem.)

6. utf8b.xml/utf8b.css -- XML in Shift_JIS; stylesheet charset=UTF-8; no
@charset in .css file. (NO styling applied. Unlike 5, element names are UTF-8
Japanese in .css file. Only patch for this bug can fix it.)

7. utf8c.xml/utf8c.css -- XML in Shift_JIS; no stylesheet charset; @charset
exists in .css file. Element names in UTF-8 Japanese in .css file. (This should
work now without any patches)

8. utf8d.xml/utf8d.css -- XML doc in UTF-8 but no encoding declaration; no
stylesheet charset; no @charset in .css file. Element names in UTF-8 Japanese in
.css file. (Only the patch for 72658 can fix this problem.)

9. utf8e.xml/utf8e.css -- XML doc in UTF-8 but no encoding declaration; no
stylesheet charset; @charset=UTF-8 in .css file. Element names in UTF-8 Japanese
in .css file. (This should work now without any patches.)

Test cases 5 & 6 can be viewed correctly only with the fix for this
bug.

Test cases 4 & 8 can be viewed correctly only with the fix for Bug 72658.

Test case 1 can be fixed with the patch for this bug or for Bug 72658.

** These test cases also show that Mozilla can handle non-ASCII element
names in CSS definitions. (IE6 cannot currently.) Mozilla can also
handle non-ASCII attribute names, values, and IDs in CSS definitions 
but these are not in the current test cases.
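As a small illustration of why test 6 gets no styling while test 5 still gets its colors, here is a Python sketch of what happens when UTF-8 stylesheet bytes are decoded with the wrong charset. The element name used here is hypothetical; the actual names in the attached test cases are not reproduced.

```python
# A Japanese element name as it might be written in a UTF-8 stylesheet.
selector = "漢字"
raw = selector.encode("utf-8")

# Decoding with the wrong charset yields a different string, so the rule
# no longer matches the element in the document.
as_latin1 = raw.decode("iso-8859-1")
as_shift_jis = raw.decode("shift_jis", errors="replace")
as_utf8 = raw.decode("utf-8")

print(as_latin1 == selector)     # False
print(as_shift_jis == selector)  # False
print(as_utf8 == selector)       # True

# ASCII names survive all three decodings, which is why selectors written
# in ASCII keep matching even when the sheet's charset is wrong.
ascii_raw = "title".encode("utf-8")
print(ascii_raw.decode("iso-8859-1") == "title")  # True
print(ascii_raw.decode("shift_jis") == "title")   # True
```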
Thanks for the testcases!

My build currently passes all of them except utf8d.xml/utf8d.css

At a guess, this is because the stylesheet is loaded _before_ we've done charset
sniffing on the XML document (I assume that's how we get the XML doc's charset).

In particular, we ask the document for its charset in that case and the document
tells us that it's in ISO-8859-1....
I'll take this one after all... :)

Patch fixes this and also bug 72658 and bug 83207
Assignee: pierre → bzbarsky
Keywords: patch, review
The comment below is not correct, since the default charset of an XML document
is UTF-8. I would advise deleting ", falling back to ISO-8859-1".

+    // NOTE: the SetCharset method will always get the preferred
+    // charset from the charset passed in unless it is the
+    // emptystring, which causes the default charset (that of the
+    // document, falling back to ISO-8859-1) to be set
Hmm..  Perhaps I should clarify that to:

"that of the document, falling back to ISO-8859-1 if no document is present"

But that being said, would UTF-8 be a more reasonable fallback for the default
charset if we have absolutely no other way of getting it?

Blocks: 83207
The default charset for XML is UTF-8, I have no idea what the default charset
for CSS would be. See if the spec has anything to say. If not, I think
ISO-8859-1 is good for CSS.
according to http://www.w3.org/TR/REC-CSS2/syndata.html#q23 

<quote>
When a style sheet resides in a separate file, user agents must observe the
following priorities when determining a document's character encoding (from
highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.
2. The @charset at-rule.
3. Mechanisms of the language of the referencing document (e.g., in HTML, the
"charset" attribute of the LINK element).

</quote>
Yes. And 

4.  Use the document's character encoding

What's the fallback in case all of 1-4 fail, though?  (yes, we _do_ have a case
in which this is necessary due to other issues that are sort of outside the
scope of this bug, imo).
Oops! Quoted the wrong part. Wanted to quote this part:
<quote>
For transmission and storage, these characters must be encoded by a character
encoding that supports the set of characters available in US-ASCII(e.g., ISO
8859-x, SHIFT JIS, etc.).
</quote>

It doesn't say what the default should be, though. I'm wondering if it should be
the same as however Mozilla treats HTML pages?
Ok.... tracing through the code, the _only_ time that we actually need that #5
fallback is when we are loading the agent sheets.  There are ways to restructure
the code that would make this fallback unnecessary, as I said.  Not going to do
it as part of this patch.

But our internal sheets are fine in ISO-8859-1. So we can just leave it at that.
So, with my proposed change to that comment, reviews?
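The priority order discussed in the comments above (CSS2 items 1-3, the document's encoding as #4, and ISO-8859-1 as the #5 fallback for agent sheets) can be sketched as follows. This is a Python illustration with hypothetical parameter names, not the nsCSSLoader code from the patch.

```python
def resolve_sheet_charset(http_charset=None, at_charset=None,
                          link_charset=None, document_charset=None,
                          fallback="ISO-8859-1"):
    """Pick a stylesheet charset, highest-priority source first."""
    candidates = (
        http_charset,      # 1. HTTP "charset" parameter in Content-Type
        at_charset,        # 2. the @charset at-rule
        link_charset,      # 3. referencing document (e.g. PI or LINK charset)
        document_charset,  # 4. the referencing document's own encoding
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # 5. nothing known at all (per the thread, only the agent sheets)
    return fallback


# Test 1: charset given in the PI only, no @charset in the sheet.
print(resolve_sheet_charset(link_charset="Shift_JIS", document_charset="Shift_JIS"))
# Test 8: nothing labelled; the XML document's UTF-8 propagates to the sheet.
print(resolve_sheet_charset(document_charset="UTF-8"))
# Agent sheets: no information, so the fallback applies.
print(resolve_sheet_charset())
```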
> What's the fallback in case all of 1-4 fail, though?  
> (yes, we _do_ have a case in which this is necessary 
> due to other issues that are sort of outside the
> scope of this bug, imo).

I meant testcase #8 to prove that whatever document 
encoding the browser currently determines should
propagate into unlabeled (in terms of charset/encoding) 
CSS files. You should just check what the final document
encoding is and then use that for the CSS, too. 
My intent was that that encoding should be UTF-8, as 
required by XML 1.0, NOT ISO-8859-1. 
Yep.  That's what my patch does.  The XML document was actually reporting its
own encoding incorrectly.  That's what my change to nsXMLDocument.cpp fixes.
Replace fprintf(stderr) with a debug macro.  We have a couple of other instances 
of 'stderr' in nsCSSLoader that need to be removed.

Rename parameters to OnStreamComplete() as "a-Uppercase" (ie. aContext, 
aString...)

Why in CSSLoaderImpl::SetCharset() do you look for "@charset" in 
strStyleDataUndecoded instead of in aStyleSheetData directly?
Comment on attachment 53578 [details] [diff] [review]
Proposed patch (works correctly on all of the attached testcases)

r=pierre with minor changes above
Attachment #53578 - Flags: review+
Oops. the stderr was not meant to be in there at all.  removed.  :)

Parameters renamed.

aStyleSheetData is a char*.  It used to be a nsString, but I've changed that...
basically, the creation of the nsString moved from OnStreamComplete to SetCharset().

bug 80106 will address further improvements to how we parse @charset; that's
what I plan to work on once this is done... 

Or did I misunderstand the comment?
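For readers following the @charset question, here is a minimal Python sketch of sniffing an @charset rule directly from the raw, undecoded bytes of a sheet. It assumes an ASCII-compatible encoding and no byte order mark, and it is not the approach taken in the patch or in bug 80106; the function name is hypothetical.

```python
import re

# @charset has to be the very first thing in the sheet, so anchor the match
# at the start and work on bytes: nothing is decoded before the encoding
# is known.
_AT_CHARSET = re.compile(rb'^@charset\s+"([^"]*)"\s*;')


def sniff_at_charset(data):
    """Return the charset named by a leading @charset rule, or None."""
    match = _AT_CHARSET.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace")


print(sniff_at_charset(b'@charset "Shift_JIS";\nfoo { color: red }'))
print(sniff_at_charset(b'foo { color: red }'))
```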
Comment on attachment 53578 [details] [diff] [review]
Proposed patch (works correctly on all of the attached testcases)

sr=attinasi
Attachment #53578 - Flags: superreview+
Checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
QA Contact: petersen → rakeshmishra