Closed Bug 226786 Opened 22 years ago Closed 22 years ago

Blanks And New Line Characters Get Stripped From CDATA when prettyprinting

Categories

Product: Core
Component: XML
Type: defect
Priority: Not set
Severity: normal
Platform: x86, Windows 2000

Tracking

Status: RESOLVED WORKSFORME

People

(Reporter: vgendler, Assigned: hjtoi-bugzilla)

Attachments (4 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007

Blanks and newline characters get stripped from XML CDATA sections. Also, the CDATA keyword and the "[" and "]" symbols are not shown.

Reproducible: Always

Steps to Reproduce:
1. See the attached pictures for Mozilla and IE and the sample XML file.

Actual Results:
Incorrect picture in Mozilla.

Expected Results:
Correct picture in IE.
Attached file Sample XML File
Attached image Correct View In IE
Summary: Blanks And New Line Characters Get Stripped From CDATA → Blanks And New Line Characters Get Stripped From CDATA when prettyprinting
Attachment #136328 - Attachment mime type: text/plain → text/xml
Last change was:
bz-vacation@mit.edu changed the mime type of Attachment #136328 from text/plain to text/xml.

PLEASE DO NOT CHANGE IT! THIS BUG REPORTS THAT Mozilla DOES NOT OUTPUT XML FILES CORRECTLY, SO THIS CHANGE PRODUCES THE WRONG EXAMPLE OUTPUT.

I will attach this XML file again as text/plain.
Please DO NOT change it to text/xml!
This is more or less by design. There is nothing in any spec that says that newlines and whitespace in CDATA sections are more important than newlines and whitespace in other textnodes. So why should we preserve that whitespace in one case but not the other?
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
By design?! CDATA contains DATA, and every space and newline character belongs to the data and has meaning. See IE, XMLSpy, ... ANY XML editor.

If this is "by design", then the design is wrong. A couple of examples:

1. CDATA contains an EJB QL statement (J2EE), for instance
SELECT Object(p) FROM schema WHERE schema.attr AS p LIKE 'abc de%'
In Mozilla we see
SELECT Object(p) FROM schema WHERE schema.attr AS p LIKE 'abc de%'
And this is wrong!!!

2. CDATA contains a code snippet written in Python. Not only will we see a complete mess, we will also see WRONG code, because in Python line indentation has syntactic meaning.

I am now reading the J2EE 1.4 documentation, which links to XML code. These links show WRONG code, so I have to use MSIE. I can give you hundreds of examples of why this is wrong. You can reformat XML markup and still have a well-formed XML document, even a valid one (if there is a DTD or XML Schema), but CDATA content has nothing to do with XML markup. It is data, and as such it should be preserved as it is.
As stated, the whitespace in normal textnodes can be just as important as whitespace in cdata-sections, so if we preserve 'formatting' in one we should preserve it in the other. You can put Python or images or whatever in textnodes without using cdata sections.
Up to you guys, up to you. Anyway, one last attempt.

In Additional Comment #6 (Jonas Sicking, 2003-11-26 09:36) you wrote: "There is nothing in any spec that says that newlines and whitespace in CDATA sections are more important than newlines and whitespace in other textnodes."

I am not talking about what is MORE or LESS important. I said that CDATA sections must be output preserving ALL the characters in them. Here is an excerpt from the W3C XML specification (http://www.w3.org/TR/REC-xml):

======================================================
Section 3.3.3 Attribute-Value Normalization
If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
======================================================

Here we clearly see that CDATA MUST NOT be processed. I think that by marking this issue "RESOLVED INVALID" you prevent other developers from expressing their opinion on it.
Whitespace in cdata sections is preserved in the DOM, so we're not breaking any specs here; we're simply using a css-style that doesn't do what you want it to do. Note that the xml-spec says the same thing about whitespace in textnodes as in cdata sections: it must be preserved. If you were to write an application that used the data from the XML (through DOM or any other method), you would find that the whitespace is there. As stated, cdata-sections and other text should not be treated differently when it comes to whitespace. Whitespace is equally important in both. It could be argued that we should have some mode in the prettyprint that preserves all whitespace, but then IMHO we should do that for both textnodes and cdata-sections.
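The claim that the whitespace is still there for any DOM consumer can be checked with an ordinary DOM parser. The sketch below uses Python's `xml.dom.minidom` as a stand-in (it is not Mozilla's parser), and the `SAMPLE` document and the `element_text` helper are hypothetical, made up for illustration. It parses the same multi-line content once inside a CDATA section and once as plain character data, then reads both back.

```python
from xml.dom import minidom

# Hypothetical sample: similar multi-line content stored once in a CDATA
# section and once as ordinary character data.
SAMPLE = (
    "<root>"
    "<query><![CDATA[SELECT Object(p)\n"
    "    FROM schema\n"
    "    WHERE p.attr LIKE 'abc  de%']]></query>"
    "<plain>SELECT Object(p)\n"
    "    FROM schema</plain>"
    "</root>"
)


def element_text(doc, tag):
    """Join the data of every text and CDATA child of the first <tag>."""
    elem = doc.getElementsByTagName(tag)[0]
    return "".join(
        child.data
        for child in elem.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


doc = minidom.parseString(SAMPLE)
first = doc.getElementsByTagName("query")[0].firstChild

# minidom keeps the CDATA section as its own node type...
print(first.nodeType == first.CDATA_SECTION_NODE)
# ...and both the CDATA content and the plain text keep their newlines
# and indentation exactly as written in the source.
print(element_text(doc, "query"))
print(element_text(doc, "plain"))
```

Both strings come back with their line breaks and indentation intact. This matches the point above: the data survives parsing either way, and what is lost is only in how the view renders it.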
You wrote in Additional Comment #11 (Jonas Sicking, 2003-11-27 22:02): "we're not breaking any specs here, we're simply using a css-style that doesn't do what you want it to do"

NO! Nobody wants this. It is not only me: check the W3C specs, check ANY XML book. A CSS style is not enough; in the CSS you could define the Wingdings font, right? That would create a very funky document. Try that too.

"Note that the xml-spec says the same thing about whitespace in textnodes as in cdata sections, they must be preserved." So preserve them!

"If you were to write an application that used the data from the XML (through DOM or any other method) you will find that the whitespace is there." Right! But I also want TO SEE WHAT I GET! This is what I see in IE but unfortunately NOT in Mozilla. I see the same in ANY XML editor. Do you want people to abandon Mozilla and use IE?

"As stated, cdata-sections and other text should not be treated differently when it comes to whitespace." Dead wrong, and the W3C (and ALL XML books) say so, as I quoted in my previous message. If you want to HURT Mozilla and make it unusable for XML, go ahead, do it.
Some additions. I have checked many books on this subject. All of them state the same as the W3C XML specification. Here is an example from "J2EE™ Web Services" by Richard Monson-Haefel (Addison Wesley, ISBN 0-321-14618-2), section 2.1.2.5 CDATA Section:

"A CDATA section allows you to mark a section of text as LITERAL so that it will NOT be PARSED for tags and SYMBOLS, but will instead be considered just a string of characters."

As we see again, a CDATA section must be treated differently. It is not the same as other text nodes. An XML document is a TEXT document, and as such ALL its elements are text, but that does not mean all of them must be treated in the same way. Comments are one more example: they must not be parsed either. It is similar to the HTML <pre></pre> tag; Mozilla, of course, does not parse the text in this tag.
You have only found quotes that say that whitespace in cdata should be preserved, nothing that says that it doesn't need to in textnodes, so my statement still holds true: textnodes and cdata-sections are no different when it comes to whitespace, it needs to be preserved in both.
Regarding Additional Comment #14 (Jonas Sicking, 2003-11-28 14:46), you said: "You have only found quotes that say that whitespace in cdata should be preserved, nothing that says that it doesn't need to in textnodes"

First of all, an XML document is a text document; that is, it contains TEXT, only TEXT, nothing but TEXT. I do not understand what you mean by "textnodes". Everything there is text. Next, the W3C XML specification clearly says that CDATA must not be parsed. All XML book authors say the same, all XML editors implement it, and of course IE implements it. Once again, CDATA is different from other "textnodes", as the XML spec says.

Mozilla also does not show the CDATA markup together with "[" and "]". Many control centers for various J2EE application servers are implemented as browser applications. All of them have screens for editing component descriptors, which are XML documents. This handling of CDATA makes it impossible to present these descriptors for viewing and editing, as I showed with the EJB QL example.
Vladimir, what Jonas is referring to as nodes are DOM nodes, I believe. When Mozilla parses the XML, it is treated per the XML spec. But then it is put into the DOM, whose rules differ slightly from XML. Then the XML is transformed with XSLT, which cannot preserve all constructs in the original XML (for example, CDATA sections and namespace declarations). Finally, the result DOM is displayed with CSS, whose rules are again different. But, supposing the result DOM still has the whitespace you need, could we change the CSS style to preserve whitespace? Does anybody see any problems with that? We could even have an alternate CSS stylesheet for that case. Reopening while we discuss that.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Yeah, I'd be fine with adding more alternative stylesheets that preserve whitespace. We'd have to add some capabilities to the XSLT engine if we wanted to preserve things specifically for cdata-sections, though, which I'm less sure I want to do.
Correct, Heikki. I can add that with this version of Mozilla we are not able to read the latest J2EE specification. Unfortunately, we can in IE. As much as I hate IE and love Mozilla, I have no choice but to use damn IE, regardless of all the "for" and "gains". Do we want this? I do NOT.
What do you know - the Monospace alternative stylesheet we supply already contains the rule to preserve whitespace (http://lxr.mozilla.org/seamonkey/source/content/xml/document/resources/XMLMonoPrint.css). Vladimir, see if this is enough for you: when you open an unstyled XML document (like the first attachment in this bug), select the Monospace alternative stylesheet (View > Use style > Monospace). Does it now look like you wanted?
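The difference between the default view and a whitespace-preserving stylesheet comes down to the CSS `white-space` property: under the default `normal` value, runs of spaces, tabs and line breaks collapse into a single space, while `pre` shows the text as stored. The function below is a deliberately simplified Python approximation of those two modes, written only for illustration. It is not Mozilla code, `render_white_space` and its `mode` argument are hypothetical names, and it ignores line wrapping and the finer CSS Text rules (such as trimming at line starts).

```python
import re


def render_white_space(text: str, mode: str) -> str:
    """Simplified approximation of CSS 'white-space' for one text run.

    'normal': every run of spaces, tabs and line breaks becomes one space
              (line wrapping and edge trimming are ignored here).
    'pre':    the text is shown exactly as stored.
    """
    if mode == "pre":
        return text
    if mode == "normal":
        return re.sub(r"[ \t\r\n]+", " ", text)
    raise ValueError(f"unsupported mode: {mode!r}")


snippet = "def f():\n    return 1\n"
print(render_white_space(snippet, "normal"))  # indentation and newlines collapse
print(render_white_space(snippet, "pre"))     # unchanged
```

This is exactly the Python-indentation complaint from earlier in the thread: the stored data is identical in both cases, and only the rendering mode decides whether the indentation is visible.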
Almost, Heikki!!! I mean, this is enough to see the correct contents of CDATA sections. Shouldn't the CDATA keyword and the "[" and "]" brackets also be preserved? Thank you, sir!
Yes, they should be preserved, but at the moment they are not. This is because we produce the pretty-printed view via an XSLT transformation, and as far as I know XSLT is not able to preserve them. We would need an extension in our XSLT engine to handle them. There are also other things that we lose because of XSLT, but they are all covered by bug 175946. Closing this as WORKSFORME.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
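Why XSLT loses the CDATA markers: XSLT 1.0 operates on the XPath data model, which has no CDATA node type, so the content of a CDATA section arrives as ordinary text and the `<![CDATA[` and `]]>` delimiters are gone before any stylesheet runs. Python's `xml.etree.ElementTree` has the same property, so it is used below as a stand-in (it is not Mozilla's XSLT engine). The `SOURCE` string and the `round_trip` helper are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: markup-like characters and indentation inside CDATA.
SOURCE = "<query><![CDATA[a < b\n    AND c > d]]></query>"


def round_trip(xml_text: str) -> str:
    """Parse and re-serialize. ElementTree, like the XPath/XSLT data model,
    has no CDATA node, so the section is folded into ordinary text."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


elem = ET.fromstring(SOURCE)
print(repr(elem.text))     # newline and indentation survive as plain text
print(round_trip(SOURCE))  # CDATA delimiters gone; '<' and '>' now escaped
```

The whitespace survives, which is why the Monospace stylesheet is enough to display it, but the CDATA boundaries cannot be recovered from a model that never recorded them. Showing them would need the kind of engine extension mentioned above.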
One more thing: maybe make this style the default for XML. Once again, thank you very much!
*** Bug 300593 has been marked as a duplicate of this bug. ***