Closed Bug 126669 Opened 23 years ago Closed 23 years ago

Save As XML adds traling slashes to start tags when saving XHTML


(Core :: DOM: Serializers, defect, P1)






(Reporter: brentmh, Assigned: hjtoi-bugzilla)




(Keywords: dataloss, regression, xhtml, Whiteboard: [Hixie-1.0][Hixie-P1][ADT2])


(1 file, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8+) Gecko/20020220
BuildID:    2002022003

If you go to an XML page in the browser and then Save As an xml file, the saved
file does not match what was browsed.  Mozilla is adding a trailing slash to
some (but not all) of the start tags.  It doesn't remove the content of the tag
or the end tag.  It just adds a slash, making the start tag into an empty tag. 
This causes the saved page to be not well-formed XML.

Reproducible: Always
Steps to Reproduce:
1. Go to the above URL
2. Select "Save As", either under the file menu or the context menu
3. Save the page as XML
4. View the page in a text editor 

Actual Results:  The <body> start tag is now an empty tag: <body/>

Expected Results:  <body> should be preserved as <body>
Confirmed.  Over to Adam -- this is a webbrowserpersist problem.
Assignee: law → adamlock
Ever confirmed: true
Keywords: dataloss, mozilla1.0
Not persist but an encoder problem. Who actually "owns" the document encoder
since I have 2 or 3 bugs which that component is causing?
There are also some private -moz attributes that show up. I use a lot of these 
for various annotations to modulate MathML rendering. (Such attributes can only 
be set programmatically since Expat would flag an XML parsing error if an 
attribute starts with '-'). I would like the SaveAs to skip such attributes too. 
They turn the saved document into an invalid document anyway.
Oh how I hate our -moz attribute crap, it's wrong, wrong, wrong. And the DOM
API's shouldn't allow those being set either, one day we will validate attribute
names when attributes are set too, like the DOM spec says we should...

(just wenting...)
Perhaps the DOM WG could consider formalizing anonymous attributes in the same
way as there are anonymous nodes. What I find very useful with the -moz
attributes is that they also allow to automatically match desired style rules
with the Style System without further fuss.
All that comes to my mind when I see those -moz attributes is "hack", I don't
see any good uses of those attributes, and I don't think advocating their use is
a good idea, at all. Poking attributes into the DOM tree just for the purpose of
making something happen simply because it will due to other surrounding systems
that happen to be in place is just wrong. We should be taking out those
attributes from our code, not adding more.
It's off-topic for this bug, but the -moz- attribute stuff could be rewritten to
use the architecture suggested in bug 115462.

Regarding this bug, it's rather serious dataloss. (To the unexperienced user, it
makes their documents unusable.)
Whiteboard: [Hixie-1.0][Hixie-P1]
-> DOM to text conversion
Assignee: adamlock → harishd
Component: File Handling → DOM to Text Conversion
QA Contact: sairuh → sujay
*** Bug 131620 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
A workaround: Do a 'view source' and then 'save as'.
*** Bug 132263 has been marked as a duplicate of this bug. ***
Tanu, could you have a look at this? Tentatively setting milestones and
priorities and reassigning. Reassign to me if you can't take this.

This might potentially be a dupe of bug 127300 or bug 120556.
Assignee: harishd → tmutreja
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Attached patch Patch to fix it... (obsolete) — Splinter Review
It does not look like an exact dupe of the other bugs. Patch solves this one...
Keywords: patch
Whiteboard: [Hixie-1.0][Hixie-P1] → [Hixie-1.0][Hixie-P1][needs r=]
This won't do it, there are several XML mime types. I will look into this as
well. Thanks for finding where this happens.
Checking this based on the mSerializer was a good thing to do, as this is the 
one created on the basis of mime types but I could not do that. So just assumed 
that other xml mime type which are like "Application/..." or "img/..." may not 
need the serilaizer to come in picture. Please see if somehow we can check if 
the mSerilaizer is pointing to nsXMLContentSerializer.
Looking at the code I think FixupNode() clones some tags but does not clone
their children, which means that nsXMLContentSerializer::AppendElementStart()
will think the element is empty and end the starting element with "/>". So it
looks like FixupNode() is screwing up the XML serializer.

Also, we need FixupNode() for XHTML elements so we can not just bybass it.

I guess the easiest fix is to add one more parameter to AppendElementStart()
called aEmptyElement or something. HTML will just ignore it, but XML will end
empty elements with "/>". The has-children check from nsXMLContentSerializer.cpp
around line 514 needs to move into nsDocumentEncoder.cpp's
SerialializeNodeStart(), line 310 or so.
This does not affect most people, but for those that work with XML (like me)
this is so bad that I would not use a product with this kind of bug in it.

I have a fix, will attach.

Assignee: tmutreja → heikki
Summary: Save As XML adds traling slashes to start tags → Save As XML adds traling slashes to start tags when saving XHTML
Whiteboard: [Hixie-1.0][Hixie-P1][needs r=] → [Hixie-1.0][Hixie-P1][ADT3]
Attachment #76378 - Attachment is obsolete: true
What about the private [-|_]moz* attributes? Would be best to filter them too.
File a separate bug please, with a testcase, and I can do that.
Follow-up: bug 134298 - SaveAs leaves private [-|_]moz attributes in the output 
Comment on attachment 76799 [details] [diff] [review]
Forgot to diff public

The patch looks fine to me, with one concern, that of changing the serializer
API (I'm not sure what the status is of making that a w3c standard).  r=akkana,
provided that jst or vidur or someone involved with the API also r's or sr's

Also, please run the TestOutput test (which will turn Tinderbox orange if it
breaks) before checking in.

If Tanu sees this in time, I hope she can also take a look and make sure she
has no objections to the API change.
Attachment #76799 - Flags: review+
Comment on attachment 76799 [details] [diff] [review]
Forgot to diff public

Attachment #76799 - Flags: superreview+
jst is the editor of the DOM Level 3 Load and Save module, he gave sr= Besides,
this is internal Mozilla interface and has nothing to do with that DOM draft.
Whiteboard: [Hixie-1.0][Hixie-P1][ADT3] → [Hixie-1.0][Hixie-P1][ADT3][fixinhand]
Also, I will run the tests before checkin in, although I don't see how this
could break HTML or plaintext serializers since they don't care about the API
As Hiekki mentioned, I also find this internal API change to be fine with 
serializer. Patch works fine and succesfully passes all the test cases too.
adt1.0.0+ approval for checkin.

Chaning to ADT2 because this negatively affects Web Developers using our product.
Keywords: adt1.0.0adt1.0.0+
Whiteboard: [Hixie-1.0][Hixie-P1][ADT3][fixinhand] → [Hixie-1.0][Hixie-P1][ADT2][fixinhand]
Comment on attachment 76799 [details] [diff] [review]
Forgot to diff public

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76799 - Flags: approval+
Checked in.
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [Hixie-1.0][Hixie-P1][ADT2][fixinhand] → [Hixie-1.0][Hixie-P1][ADT2]
Kathy thinks that this has regressed composer -- on empty documents, composer is
now outputting a single br tag, which comes from composer's bogus node, which
should look like: <br _moz_editor_bogus_node>.  I don't understand why this
would have caused that, but I'm cc'ing Kathy and Joe for discussion.
I think you mean bug 134298. It had fixes for XML and HTML side (I did XML, rbs
did HTML). rbs checked in the HTML side fixes on 4/13 on trunk and 4/15 on 1.0
From further looking, the bug/regression has been there before my checkin.
You need to log in before you can comment on or make changes to this bug.