Last Comment Bug 301260 - [FIX]XMLSerializer gives inaccurate results
: [FIX]XMLSerializer gives inaccurate results
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XML (show other bugs)
: Trunk
: All All
: P1 normal (vote)
: mozilla1.9alpha1
Assigned To: Boris Zbarsky [:bz] (TPAC)
: Ashish Bhatt
Mentors:
: 343470 (view as bug list)
Depends on: 335080 326994 335071
Blocks:
  Show dependency treegraph
 
Reported: 2005-07-18 19:58 PDT by William J. Edney
Modified: 2006-07-05 05:06 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Attachment showing that the XMLSerializer forgets about null namespaces (2.31 KB, text/html)
2005-07-18 19:58 PDT, William J. Edney
no flags Details
Similar testcase along those lines... (2.45 KB, text/html)
2005-11-08 19:29 PST, Boris Zbarsky [:bz] (TPAC)
no flags Details
And another one (2.31 KB, text/html)
2005-11-08 20:12 PST, Boris Zbarsky [:bz] (TPAC)
no flags Details
Proposed patch (4.74 KB, patch)
2005-11-08 20:35 PST, Boris Zbarsky [:bz] (TPAC)
peterv: review+
peterv: superreview+
Details | Diff | Splinter Review
Some tests (24.59 KB, patch)
2005-11-09 22:09 PST, Boris Zbarsky [:bz] (TPAC)
jonas: review+
Details | Diff | Splinter Review
Updated patch that passes all those tests (9.71 KB, patch)
2005-11-09 22:11 PST, Boris Zbarsky [:bz] (TPAC)
peterv: review+
peterv: superreview+
Details | Diff | Splinter Review
Tests updated to comments (26.07 KB, patch)
2005-11-28 20:38 PST, Boris Zbarsky [:bz] (TPAC)
peterv: superreview+
Details | Diff | Splinter Review

Description William J. Edney 2005-07-18 19:58:13 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Basically, elements with a 'null' namespace don't get serialized that way such
that when a serialize / parse cycle occurs, the resulting document is not at all
what the original document intended.

The test case has much more explanatory text and example code.

Reproducible: Always

Steps to Reproduce:
1. Run the given test case
2.
3.

Actual Results:  
The serializer did not serialize properly.

Expected Results:  
The serializer to have serialized properly.
Comment 1 William J. Edney 2005-07-18 19:58:57 PDT
Created attachment 189734 [details]
Attachment showing that the XMLSerializer forgets about null namespaces
Comment 2 Boris Zbarsky [:bz] (TPAC) 2005-11-08 19:29:11 PST
Created attachment 202330 [details]
Similar testcase along those lines...
Comment 3 Boris Zbarsky [:bz] (TPAC) 2005-11-08 20:12:53 PST
Created attachment 202332 [details]
And another one

Note that I'm not changing the testcase text, just the code...  So ignore the text, please.
Comment 4 Boris Zbarsky [:bz] (TPAC) 2005-11-08 20:35:29 PST
Created attachment 202334 [details] [diff] [review]
Proposed patch

I _think_ this covers everything...  We should really write a semi-exhaustive test suite to check.
Comment 5 Boris Zbarsky [:bz] (TPAC) 2005-11-09 22:09:42 PST
Created attachment 202463 [details] [diff] [review]
Some tests

These found some regressions from "proposed patch" (eg we roundtripped <root/> into <root xmlns=""/>).
Comment 6 Boris Zbarsky [:bz] (TPAC) 2005-11-09 22:11:45 PST
Created attachment 202464 [details] [diff] [review]
Updated patch that passes all those tests
Comment 7 Boris Zbarsky [:bz] (TPAC) 2005-11-09 22:18:49 PST
Comment on attachment 202463 [details] [diff] [review]
Some tests

Note to self -- when checking this in, make sure to chmod u+x the shell script.
Comment 8 Peter Van der Beken [:peterv] 2005-11-10 05:35:03 PST
Comment on attachment 202464 [details] [diff] [review]
Updated patch that passes all those tests

Let's hope we got all the cases now.
Comment 9 Jonas Sicking (:sicking) No longer reading bugmail consistently 2005-11-10 18:45:14 PST
Comment on attachment 202463 [details] [diff] [review]
Some tests

>+function do_compare_attrs(e1, e2) {
>+  const xmlns = "http://www.w3.org/2000/xmlns/";
>+
>+  var a1 = e1.attributes;
>+  var a2 = e2.attributes;
>+  for (var i = 0; i < a1.length; ++i) {
>+    var att = a1.item(i);
>+    // Don't test for namespace decls, since those can just sorta be
>+    // scattered about
>+    if (att.namespaceURI != xmlns) {
>+      var att2 = a2.getNamedItemNS(att.namespaceURI, att.localName);
>+      if (!att2) {
>+        do_throw("Missing attribute with namespaceURI '" + att.namespaceURI +
>+		 "' and localName '" + att.localName + "'");
>+      }

Don't you want to do_check_eq(att.value, att2.value) too?

>+function do_check_equiv(dom1, dom2) {
>+  do_check_eq(dom1.nodeType, dom2.nodeType);
>+  switch (dom1.nodeType) {
>+  case nsIDOMNode.TEXT_NODE:
>+  case nsIDOMNode.CDATA_SECTION_NODE:
>+  case nsIDOMNode.PROCESSING_INSTRUCTION_NODE:

For PIs you need to check that the target is the same.

>+  case nsIDOMNode.COMMENT_NODE:
>+    do_check_eq(dom1.data, dom2.data);
>+    break;
>+  case nsIDOMNode.ELEMENT_NODE:
>+    do_check_eq(dom1.namespaceURI, dom2.namespaceURI);
>+    do_check_eq(dom1.localName, dom2.localName);
>+    // Compare attrs in both directions -- do_compare_attrs does a
>+    // subset check.
>+    do_compare_attrs(dom1, dom2);
>+    do_compare_attrs(dom2, dom1);

If you check that dom1.attribute.length == dom2.attributes.length then one call should be enough, no?


>Index: content/test/unit/test_xml_serializer.js
...
>+function testString(str) {
>+  var doc = ParseXML(str);
>+  do_check_serialize(doc);
>+  do_check_eq(SerializeXML(doc), str);
>+}

Won't this test the same thing twice? It seems equivalent to 

  var doc = ParseXML(str);
  do_check_eq(SerializeXML(doc), str);
  var doc2 = ParseXML(str);
  do_check_equiv(doc, doc2);

Where the last two lines seem only vaugly usefull. You're only testing roundtripping of documents that come straight from the parser. Ideally you should muck around with |doc| a bit before calling do_check_serialize(doc).

I'll do the rest of the file tomorrow. In general there is some nice test infrastructure here. What i'm worried about is that since the markup is included directly in the code it'll get messy to do bigger files. And in the end I do think that we want to run tests within the browser (though possibly in addition to tests like this) to be able to access all the things that are available there.
Comment 10 Boris Zbarsky [:bz] (TPAC) 2005-11-10 19:50:28 PST
(In reply to comment #9)
> Don't you want to do_check_eq(att.value, att2.value) too?

Yes, I do.  Added.

> >+  case nsIDOMNode.PROCESSING_INSTRUCTION_NODE:
> 
> For PIs you need to check that the target is the same.

Indeed.  I also realized that I was missing some QIs throughout this code; classinfo is very addictive...  ;)  I've added those now.

> If you check that dom1.attribute.length == dom2.attributes.length then one
> call should be enough, no?

I don't think we can enforce that.  For example, say I create a DOM (via DOM methods) that looks like:

qname = root ns = null
  qname = child ns = xxx

with no xmlns decls in sight (so I created child with createElementNS()).  Now if I serialize it we'll generate an xmlns decl on the child, and when we reparse that we'll have a DOM with a different number of attributes.

The invariant we should be enforcing (and which I aimed for) is that all attributes that are not in the xmlns namespace should be the same between the two elements.

> >+function testString(str) {
> >+  var doc = ParseXML(str);
> >+  do_check_serialize(doc);
> >+  do_check_eq(SerializeXML(doc), str);
> >+}
> 
> Won't this test the same thing twice? It seems equivalent to 
> 
>   var doc = ParseXML(str);
>   do_check_eq(SerializeXML(doc), str);
>   var doc2 = ParseXML(str);
>   do_check_equiv(doc, doc2);

Hmm...  Yes, given the |do_check_eq(SerializeXML(doc), str);| there's no real reason for the |do_check_serialize(doc);| check, is there?  I can remove that if you'd like.

This section really is just testing that we serialize right back to what we started with if it's not too hard to do it.  So the impotant part is the |do_check_eq(SerializeXML(doc), str);| check.

> What i'm worried about is that since the markup is
> included directly in the code it'll get messy to do bigger files.

We can add a ParseFile() method that takes a file name, opens up a stream on the file, and passes the stream to the DOMParser.  Want me to set that up?

> And in the end I do think that we want to run tests within the browser (though
> possibly in addition to tests like this) to be able to access all the things
> that are available there.

Oh, I agree.  This isn't even testing the effect of classinfo on this stuff, so it really is a test of the basic functionality of the parser and serializer.  I think we want both unit tests like this and whole-system DOM tests.

Thanks for looking at this, Jonas!  Let me know if you want to see an updated patch, ok?
Comment 11 Jonas Sicking (:sicking) No longer reading bugmail consistently 2005-11-28 19:41:23 PST
Comment on attachment 202463 [details] [diff] [review]
Some tests

>+function test2() {
>+  // Test setting of "xmlns" attribute in the null namespace
>+
>+  // XXXbz are these tests needed?  What should happen here?  These
>+  // may be bogus.
>+
>+  var doc = ParseXML('<root/>');
>+  doc.documentElement.setAttribute("xmlns", "ns1");
>+  // XXXbz We fail this right now, but as I said, this may be a bogus test.
>+  // do_check_serialize(doc);

I think you should remove this entierly. IMO we should actually throw on that setAttribute call. 

>+  // Setting random "xmlns" attribute
>+  doc = ParseXML('<root xmlns="ns1"/>');
>+  doc.documentElement.setAttribute("xmlns", "ns2");
>+  do_check_serialize(doc);

I think this test should stay though.


>+function test4() {
...
>+  doc = ParseXML('<root xmlns="ns1" xmlns:a="ns2">'+
>+                 '<child xmlns:b="ns2" xmlns:a="ns3">'+
>+                 '<child2/></child></root>');
>+  root = doc.documentElement;
>+  // Have to QI here -- no classinfo flattening in xpcshell, apparently
>+  var node = root.firstChild.firstChild.QueryInterface(nsIDOMElement);
>+  node.setAttributeNS("ns4", "l1", "v1");
>+  node.setAttributeNS("ns4", "p2:l2", "v2");
>+  node.setAttributeNS("", "p3:l3", "v3");

This should actually throw so you might not want to include that. We end up dropping the prefix right away.

>+  node.setAttributeNS("ns3", "l4", "v4");
>+  node.setAttributeNS("ns3", "p5:l5", "v5");
>+  node.setAttributeNS("ns3", "a:l6", "v6");
>+  node.setAttributeNS("ns2", "l7", "v7");
>+  node.setAttributeNS("ns2", "p8:l8", "v8");
>+  node.setAttributeNS("ns2", "b:l9", "v9");
>+  node.setAttributeNS("ns2", "a:l10", "v10");
>+  node.setAttributeNS("ns1", "a:l11", "v11");
>+  node.setAttributeNS("ns1", "b:l12", "v12");
>+  node.setAttributeNS("ns1", "l13", "v13");
>+  do_check_serialize(doc);
>+  //  Note: we end up with "a2" as the prefix on "l11" and "l12" because we use
>+  //  "a1" earlier, and discard it in favor of something we get off the
>+  //  namespace stack, apparently

This seems slighly bad, though not really a big deal. Ideally wouldn't be the case though.

>+  do_check_eq(SerializeXML(doc),
>+              '<root xmlns="ns1" xmlns:a="ns2">'+
>+              '<child xmlns:b="ns2" xmlns:a="ns3">'+
>+              '<child2 a0:l1="v1" xmlns:a0="ns4"' +
>+              ' a0:l2="v2"' +

Shouldn't we ideally use 'p2' as prefix for l1 and l2? Not that you need to fix that now, but do we really want to test for the current behaviour?

>+              ' a:l5="v5"' +

Is this really what we want to do? Shouldn't we allow explicitly specified prefixes to stick even if there is another one around? Especially given that the prefix isn't used for another namespace. I guess I'm fine either way as long as it's really what we want to do. Same for l8
Comment 12 Jonas Sicking (:sicking) No longer reading bugmail consistently 2005-11-28 19:43:51 PST
(In reply to comment #10)
> The invariant we should be enforcing (and which I aimed for) is that all
> attributes that are not in the xmlns namespace should be the same between the
> two elements.

Ah, forgot that you had those xmlns checks in there.

> Hmm...  Yes, given the |do_check_eq(SerializeXML(doc), str);| there's no real
> reason for the |do_check_serialize(doc);| check, is there?  I can remove that
> if you'd like.

Yeah, i think it is redundant.

> > What i'm worried about is that since the markup is
> > included directly in the code it'll get messy to do bigger files.
> 
> We can add a ParseFile() method that takes a file name, opens up a stream on
> the file, and passes the stream to the DOMParser.  Want me to set that up?

As you please. It's probably a good idea to do if bigger testcases are written.
Comment 13 Boris Zbarsky [:bz] (TPAC) 2005-11-28 19:57:05 PST
(In reply to comment #11)
> I think you should remove this entierly. IMO we should actually throw on that
> setAttribute call. 
...
> I think this test should stay though.

I'm not sure I really see the difference between those two tests.  Wouldn't they both throw?  Waiting on the answer to this before checking anything in.

> >+  node.setAttributeNS("", "p3:l3", "v3");
> 
> This should actually throw so you might not want to include that. We end up
> dropping the prefix right away.

I'd rather include it for completeness and then change the test if we ever change the core.  At the moment, this creates an attribute "l3" in the null namespace, which is what I test for...

> This seems slighly bad, though not really a big deal. Ideally wouldn't be the
> case though.

I really don't see how to avoid it, nor do I see a good reason to try.

> >+              ' a0:l2="v2"' +
> 
> Shouldn't we ideally use 'p2' as prefix for l1 and l2?

We go to serialize l1.  We see that it's in a namespace that's not declared, so we push an xmlns decl for it.  Then we go to serialize l2.  We see that it's got a prefix, but the prefix doesn't map to anything useful.  Then we see that we already have a prefix declared for the namespace it's in, so we reuse that.

Short of scanning the list of all attributes first, before starting any serializing, I don't see how we can improve on this...

> but do we really want to test for the current behaviour?

Yes; this is testing the "find an existing prefix for our namespace when our prefix isn't declared" code.

> >+              ' a:l5="v5"' +
> 
> Is this really what we want to do? Shouldn't we allow explicitly specified
> prefixes to stick even if there is another one around?

Good question.  For now, our algorithm tries to push as few xmlns decls as possible, so "no".  If we decide to change that, that's fine, but I'd like to test how our algorithm is succeeding for the time being.

All through this, I'm aiming to test for what our impl does; that way if we change the impl we'll know we changed behavior and will be able to evaluate whether the change is desirable.
Comment 14 Jonas Sicking (:sicking) No longer reading bugmail consistently 2005-11-28 20:22:06 PST
(In reply to comment #13)
> (In reply to comment #11)
> > I think you should remove this entierly. IMO we should actually throw on that
> > setAttribute call. 
> ...
> > I think this test should stay though.
> 
> I'm not sure I really see the difference between those two tests.  Wouldn't
> they both throw?  Waiting on the answer to this before checking anything in.

The first one creates an attribute named 'xmlns' but that does not live in the xmlns namespace. This should throw.

The second one changes the value of an existing attribute (this isn't exactly what the DOM spec says to do, but IMHO our way of dealing with mixed namespaced and non-namespaced calls are saner).

> > >+              ' a0:l2="v2"' +
> > 
> > Shouldn't we ideally use 'p2' as prefix for l1 and l2?
...
> Short of scanning the list of all attributes first, before starting any
> serializing, I don't see how we can improve on this...

That is what I think we should do. Usually there aren't enough attributes that it matters. But changing that is for later so if you want to test the current impl then keep this one too.
Comment 15 Boris Zbarsky [:bz] (TPAC) 2005-11-28 20:38:41 PST
Created attachment 204412 [details] [diff] [review]
Tests updated to comments

This just adds ParseFile() and removes the xmlns thing that we should be throwing for.
Comment 16 Boris Zbarsky [:bz] (TPAC) 2005-11-29 10:15:22 PST
Fixed.
Comment 17 William J. Edney 2005-11-29 10:24:44 PST
Thanks Boris for fixing this and to Jonas and Peter for the reviews!

Cheers,

- Bill
Comment 18 Peter Van der Beken [:peterv] 2006-07-05 05:06:15 PDT
*** Bug 343470 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.