Closed Bug 301260 Opened 19 years ago Closed 19 years ago

[FIX]XMLSerializer gives inaccurate results

Categories

(Core :: XML, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: bedney, Assigned: bzbarsky)

References

(Depends on 1 open bug)

Details

Attachments

(5 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Basically, elements with a 'null' namespace don't get serialized that way such
that when a serialize / parse cycle occurs, the resulting document is not at all
what the original document intended.

The test case has much more explanatory text and example code.

Reproducible: Always

Steps to Reproduce:
1. Run the given test case
2.
3.

Actual Results:  
The serializer did not serialize properly.

Expected Results:  
The serializer to have serialized properly.
Attached file And another one
Note that I'm not changing the testcase text, just the code...  So ignore the text, please.
Attached patch Proposed patch (obsolete) — Splinter Review
I _think_ this covers everything...  We should really write a semi-exhaustive test suite to check.
Attachment #202334 - Flags: superreview?(peterv)
Attachment #202334 - Flags: review?(peterv)
Assignee: general → xml
Status: UNCONFIRMED → NEW
Component: General → XML
Ever confirmed: true
Priority: -- → P1
Product: Mozilla Application Suite → Core
QA Contact: general → ashshbhatt
Version: unspecified → Trunk
Assignee: xml → bzbarsky
OS: Windows XP → All
Hardware: PC → All
Summary: XMLSerializer gives inaccurate results → [FIX]XMLSerializer gives inaccurate results
Target Milestone: --- → mozilla1.9alpha
Attachment #202334 - Flags: superreview?(peterv)
Attachment #202334 - Flags: superreview+
Attachment #202334 - Flags: review?(peterv)
Attachment #202334 - Flags: review+
Attached patch Some tests (obsolete) — Splinter Review
These found some regressions from "proposed patch" (eg we roundtripped <root/> into <root xmlns=""/>).
Attachment #202334 - Attachment is obsolete: true
Attachment #202463 - Flags: superreview?(peterv)
Attachment #202463 - Flags: review?(bugmail)
Attachment #202464 - Flags: superreview?(peterv)
Attachment #202464 - Flags: review?(peterv)
Comment on attachment 202463 [details] [diff] [review]
Some tests

Note to self -- when checking this in, make sure to chmod u+x the shell script.
Comment on attachment 202464 [details] [diff] [review]
Updated patch that passes all those tests

Let's hope we got all the cases now.
Attachment #202464 - Flags: superreview?(peterv)
Attachment #202464 - Flags: superreview+
Attachment #202464 - Flags: review?(peterv)
Attachment #202464 - Flags: review+
Comment on attachment 202463 [details] [diff] [review]
Some tests

>+function do_compare_attrs(e1, e2) {
>+  const xmlns = "http://www.w3.org/2000/xmlns/";
>+
>+  var a1 = e1.attributes;
>+  var a2 = e2.attributes;
>+  for (var i = 0; i < a1.length; ++i) {
>+    var att = a1.item(i);
>+    // Don't test for namespace decls, since those can just sorta be
>+    // scattered about
>+    if (att.namespaceURI != xmlns) {
>+      var att2 = a2.getNamedItemNS(att.namespaceURI, att.localName);
>+      if (!att2) {
>+        do_throw("Missing attribute with namespaceURI '" + att.namespaceURI +
>+		 "' and localName '" + att.localName + "'");
>+      }

Don't you want to do_check_eq(att.value, att2.value) too?

>+function do_check_equiv(dom1, dom2) {
>+  do_check_eq(dom1.nodeType, dom2.nodeType);
>+  switch (dom1.nodeType) {
>+  case nsIDOMNode.TEXT_NODE:
>+  case nsIDOMNode.CDATA_SECTION_NODE:
>+  case nsIDOMNode.PROCESSING_INSTRUCTION_NODE:

For PIs you need to check that the target is the same.

>+  case nsIDOMNode.COMMENT_NODE:
>+    do_check_eq(dom1.data, dom2.data);
>+    break;
>+  case nsIDOMNode.ELEMENT_NODE:
>+    do_check_eq(dom1.namespaceURI, dom2.namespaceURI);
>+    do_check_eq(dom1.localName, dom2.localName);
>+    // Compare attrs in both directions -- do_compare_attrs does a
>+    // subset check.
>+    do_compare_attrs(dom1, dom2);
>+    do_compare_attrs(dom2, dom1);

If you check that dom1.attribute.length == dom2.attributes.length then one call should be enough, no?


>Index: content/test/unit/test_xml_serializer.js
...
>+function testString(str) {
>+  var doc = ParseXML(str);
>+  do_check_serialize(doc);
>+  do_check_eq(SerializeXML(doc), str);
>+}

Won't this test the same thing twice? It seems equivalent to 

  var doc = ParseXML(str);
  do_check_eq(SerializeXML(doc), str);
  var doc2 = ParseXML(str);
  do_check_equiv(doc, doc2);

Where the last two lines seem only vaugly usefull. You're only testing roundtripping of documents that come straight from the parser. Ideally you should muck around with |doc| a bit before calling do_check_serialize(doc).

I'll do the rest of the file tomorrow. In general there is some nice test infrastructure here. What i'm worried about is that since the markup is included directly in the code it'll get messy to do bigger files. And in the end I do think that we want to run tests within the browser (though possibly in addition to tests like this) to be able to access all the things that are available there.
(In reply to comment #9)
> Don't you want to do_check_eq(att.value, att2.value) too?

Yes, I do.  Added.

> >+  case nsIDOMNode.PROCESSING_INSTRUCTION_NODE:
> 
> For PIs you need to check that the target is the same.

Indeed.  I also realized that I was missing some QIs throughout this code; classinfo is very addictive...  ;)  I've added those now.

> If you check that dom1.attribute.length == dom2.attributes.length then one
> call should be enough, no?

I don't think we can enforce that.  For example, say I create a DOM (via DOM methods) that looks like:

qname = root ns = null
  qname = child ns = xxx

with no xmlns decls in sight (so I created child with createElementNS()).  Now if I serialize it we'll generate an xmlns decl on the child, and when we reparse that we'll have a DOM with a different number of attributes.

The invariant we should be enforcing (and which I aimed for) is that all attributes that are not in the xmlns namespace should be the same between the two elements.

> >+function testString(str) {
> >+  var doc = ParseXML(str);
> >+  do_check_serialize(doc);
> >+  do_check_eq(SerializeXML(doc), str);
> >+}
> 
> Won't this test the same thing twice? It seems equivalent to 
> 
>   var doc = ParseXML(str);
>   do_check_eq(SerializeXML(doc), str);
>   var doc2 = ParseXML(str);
>   do_check_equiv(doc, doc2);

Hmm...  Yes, given the |do_check_eq(SerializeXML(doc), str);| there's no real reason for the |do_check_serialize(doc);| check, is there?  I can remove that if you'd like.

This section really is just testing that we serialize right back to what we started with if it's not too hard to do it.  So the impotant part is the |do_check_eq(SerializeXML(doc), str);| check.

> What i'm worried about is that since the markup is
> included directly in the code it'll get messy to do bigger files.

We can add a ParseFile() method that takes a file name, opens up a stream on the file, and passes the stream to the DOMParser.  Want me to set that up?

> And in the end I do think that we want to run tests within the browser (though
> possibly in addition to tests like this) to be able to access all the things
> that are available there.

Oh, I agree.  This isn't even testing the effect of classinfo on this stuff, so it really is a test of the basic functionality of the parser and serializer.  I think we want both unit tests like this and whole-system DOM tests.

Thanks for looking at this, Jonas!  Let me know if you want to see an updated patch, ok?
Comment on attachment 202463 [details] [diff] [review]
Some tests

>+function test2() {
>+  // Test setting of "xmlns" attribute in the null namespace
>+
>+  // XXXbz are these tests needed?  What should happen here?  These
>+  // may be bogus.
>+
>+  var doc = ParseXML('<root/>');
>+  doc.documentElement.setAttribute("xmlns", "ns1");
>+  // XXXbz We fail this right now, but as I said, this may be a bogus test.
>+  // do_check_serialize(doc);

I think you should remove this entierly. IMO we should actually throw on that setAttribute call. 

>+  // Setting random "xmlns" attribute
>+  doc = ParseXML('<root xmlns="ns1"/>');
>+  doc.documentElement.setAttribute("xmlns", "ns2");
>+  do_check_serialize(doc);

I think this test should stay though.


>+function test4() {
...
>+  doc = ParseXML('<root xmlns="ns1" xmlns:a="ns2">'+
>+                 '<child xmlns:b="ns2" xmlns:a="ns3">'+
>+                 '<child2/></child></root>');
>+  root = doc.documentElement;
>+  // Have to QI here -- no classinfo flattening in xpcshell, apparently
>+  var node = root.firstChild.firstChild.QueryInterface(nsIDOMElement);
>+  node.setAttributeNS("ns4", "l1", "v1");
>+  node.setAttributeNS("ns4", "p2:l2", "v2");
>+  node.setAttributeNS("", "p3:l3", "v3");

This should actually throw so you might not want to include that. We end up dropping the prefix right away.

>+  node.setAttributeNS("ns3", "l4", "v4");
>+  node.setAttributeNS("ns3", "p5:l5", "v5");
>+  node.setAttributeNS("ns3", "a:l6", "v6");
>+  node.setAttributeNS("ns2", "l7", "v7");
>+  node.setAttributeNS("ns2", "p8:l8", "v8");
>+  node.setAttributeNS("ns2", "b:l9", "v9");
>+  node.setAttributeNS("ns2", "a:l10", "v10");
>+  node.setAttributeNS("ns1", "a:l11", "v11");
>+  node.setAttributeNS("ns1", "b:l12", "v12");
>+  node.setAttributeNS("ns1", "l13", "v13");
>+  do_check_serialize(doc);
>+  //  Note: we end up with "a2" as the prefix on "l11" and "l12" because we use
>+  //  "a1" earlier, and discard it in favor of something we get off the
>+  //  namespace stack, apparently

This seems slighly bad, though not really a big deal. Ideally wouldn't be the case though.

>+  do_check_eq(SerializeXML(doc),
>+              '<root xmlns="ns1" xmlns:a="ns2">'+
>+              '<child xmlns:b="ns2" xmlns:a="ns3">'+
>+              '<child2 a0:l1="v1" xmlns:a0="ns4"' +
>+              ' a0:l2="v2"' +

Shouldn't we ideally use 'p2' as prefix for l1 and l2? Not that you need to fix that now, but do we really want to test for the current behaviour?

>+              ' a:l5="v5"' +

Is this really what we want to do? Shouldn't we allow explicitly specified prefixes to stick even if there is another one around? Especially given that the prefix isn't used for another namespace. I guess I'm fine either way as long as it's really what we want to do. Same for l8
Attachment #202463 - Flags: review?(bugmail) → review+
(In reply to comment #10)
> The invariant we should be enforcing (and which I aimed for) is that all
> attributes that are not in the xmlns namespace should be the same between the
> two elements.

Ah, forgot that you had those xmlns checks in there.

> Hmm...  Yes, given the |do_check_eq(SerializeXML(doc), str);| there's no real
> reason for the |do_check_serialize(doc);| check, is there?  I can remove that
> if you'd like.

Yeah, i think it is redundant.

> > What i'm worried about is that since the markup is
> > included directly in the code it'll get messy to do bigger files.
> 
> We can add a ParseFile() method that takes a file name, opens up a stream on
> the file, and passes the stream to the DOMParser.  Want me to set that up?

As you please. It's probably a good idea to do if bigger testcases are written.
(In reply to comment #11)
> I think you should remove this entierly. IMO we should actually throw on that
> setAttribute call. 
...
> I think this test should stay though.

I'm not sure I really see the difference between those two tests.  Wouldn't they both throw?  Waiting on the answer to this before checking anything in.

> >+  node.setAttributeNS("", "p3:l3", "v3");
> 
> This should actually throw so you might not want to include that. We end up
> dropping the prefix right away.

I'd rather include it for completeness and then change the test if we ever change the core.  At the moment, this creates an attribute "l3" in the null namespace, which is what I test for...

> This seems slighly bad, though not really a big deal. Ideally wouldn't be the
> case though.

I really don't see how to avoid it, nor do I see a good reason to try.

> >+              ' a0:l2="v2"' +
> 
> Shouldn't we ideally use 'p2' as prefix for l1 and l2?

We go to serialize l1.  We see that it's in a namespace that's not declared, so we push an xmlns decl for it.  Then we go to serialize l2.  We see that it's got a prefix, but the prefix doesn't map to anything useful.  Then we see that we already have a prefix declared for the namespace it's in, so we reuse that.

Short of scanning the list of all attributes first, before starting any serializing, I don't see how we can improve on this...

> but do we really want to test for the current behaviour?

Yes; this is testing the "find an existing prefix for our namespace when our prefix isn't declared" code.

> >+              ' a:l5="v5"' +
> 
> Is this really what we want to do? Shouldn't we allow explicitly specified
> prefixes to stick even if there is another one around?

Good question.  For now, our algorithm tries to push as few xmlns decls as possible, so "no".  If we decide to change that, that's fine, but I'd like to test how our algorithm is succeeding for the time being.

All through this, I'm aiming to test for what our impl does; that way if we change the impl we'll know we changed behavior and will be able to evaluate whether the change is desirable.
(In reply to comment #13)
> (In reply to comment #11)
> > I think you should remove this entierly. IMO we should actually throw on that
> > setAttribute call. 
> ...
> > I think this test should stay though.
> 
> I'm not sure I really see the difference between those two tests.  Wouldn't
> they both throw?  Waiting on the answer to this before checking anything in.

The first one creates an attribute named 'xmlns' but that does not live in the xmlns namespace. This should throw.

The second one changes the value of an existing attribute (this isn't exactly what the DOM spec says to do, but IMHO our way of dealing with mixed namespaced and non-namespaced calls are saner).

> > >+              ' a0:l2="v2"' +
> > 
> > Shouldn't we ideally use 'p2' as prefix for l1 and l2?
...
> Short of scanning the list of all attributes first, before starting any
> serializing, I don't see how we can improve on this...

That is what I think we should do. Usually there aren't enough attributes that it matters. But changing that is for later so if you want to test the current impl then keep this one too.
This just adds ParseFile() and removes the xmlns thing that we should be throwing for.
Attachment #202463 - Attachment is obsolete: true
Attachment #202463 - Flags: superreview?(peterv)
Fixed.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Thanks Boris for fixing this and to Jonas and Peter for the reviews!

Cheers,

- Bill
Attachment #204412 - Flags: superreview+
Depends on: 335071
Depends on: 326994
Depends on: 335080
*** Bug 343470 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: