User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031016 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031016 When literal containing newline characters serialized into XML newline characters insert as is into the attribute of RDF:Description. Then, while reading this value from XML parser treats these newline characters as spaces, so writing and rereading of XML leads to lost line breaks. Reproducible: Always Steps to Reproduce: text_res = this.RDF.GetLiteral("Some \n text \n goes \n here"); this.ds.Assert(this.selected, this.rdf_text, text_res, true); this.flush(); Actual Results: In the RDF file: <RDF:Description about="rdf:#$XHK122" otl:text="Some text goes here" /> There are two possible ways to resolve this prob: 1) Quote newline characters while serializing literal into the Description's attribute. works fine. Heh, I've got no idea about possible issues with different newline representation in different OS, could it be a prob here? 2) Serialize complex literals as tag, not attribute. I.e. <otl:text>Some text goes here</otl:text> Both parses correctly, but second is more readable (IMHO).
I don't have a tree at the moment, but this isn't a dup, and I'm pretty sure this problem exists as described. Tentatively confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Created attachment 138913 [details] [diff] [review] patch This patch adopts method #2. I'm not totally thrilled with the design of this whole section of the code, since the tests on each property can be run redundantly, but I'd rather not fix that here.
I think it would be best to see what the XML/RDF specs say, if anything, about this matter. [ One interesting, somewhat-related note that I stumbled across: http://www.w3.org/TR/REC-xml#sec-line-ends ]
Actually, I looked it up before writing the patch, because I wasn't sure either. I'm pretty sure, though, that this is the write thing to do. Normalization of whitespace in XML attributes is defined here: http://www.w3.org/TR/REC-xml#AVNormalize The first of several steps is the normalization of line breaks as described in the link you posted. The critical portion, however, is the third bullet under "3.": * For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value. In other words, our attribute parsing is correct -- when reading something of the form <rdf:Description NC:foo="a value that spans several lines" /> It should be processed as "a value that spans several lines". However, this changes the value of the RDF literal! Plain RDF literals are simply Unicode strings, and comparison of two literals is accomplished through bytewise comparison of their respective Unicode characters. http://www.w3.org/TR/2003/PR-rdf-concepts-20031215/#dfn-literal Monkeying with literal values is -- as this bug correctly points out -- naughty behavior. By serializing the literal as an XML element body, we can preserve the whitespace as we originally read it.
Comment on attachment 138913 [details] [diff] [review] patch Looks acceptable to me, r=rjc
Attachment #138913 - Flags: review?(rjc) → review+
Attachment #138913 - Flags: superreview?(bz-vacation)
Comment on attachment 138913 [details] [diff] [review] patch Yeah, seems ok. sr=bzbarsky. Use nsnull instead of NULL maybe? And is GetValueConst guaranteed to give a valid nonzero ptr that you can deref?
Attachment #138913 - Flags: superreview?(bz-vacation) → superreview+
Assignee: rjc → tingley
Fix checked in, with bz's suggested changes.
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.