Closed Bug 209152 Opened 21 years ago Closed 14 years ago

msword structure comments <![if ... ]> are rewritten into <!--[if ... ]--> which breaks Internet Explorer

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking


RESOLVED INVALID

People

(Reporter: truelsen, Unassigned)

Details

(Keywords: topembed, Whiteboard: editorbase)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.5a) Gecko/20030610
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.5a) Gecko/20030610

When text is copied to the clipboard and pasted into Mozilla Composer,
Word-specific <![if ...]> ... <![endif]> markers are rewritten into <!--[if
...]--> ... <!--[endif]-->. This is harmless in Mozilla, but Internet Explorer
displays the markup as plain text, as IE does not seem to recognize "<!--[" as
the start of a comment.

This also happens when using Composer to edit a .htm produced by Word using Save
as Web Page.

This makes it hard to use Mozilla to work with Word documents, which leads our
users to opt out of Mozilla.

Reproducible: Always

Steps to Reproduce:
1. Open Composer and click the Source pane.
2. Insert "<![if]>" (without the quotes) in the body section
3. Switch to "Normal" and then back to "Source"
4. The string above is rewritten into "<!--[if]-->"

Actual Results:  
The <!--[if]--> is shown as text in Internet Explorer

Expected Results:  
It should have left the "<![if]>" (and the like) as-is.
This problem is seen in 1.4b and 1.4 (May 29) on Windows XP and in the current
nightly build on Solaris 8.
Keywords: topembed
Whiteboard: editorbase
It has also been confirmed to happen in 1.0.2-2 from Debian 3.0r1 on i386.

Furthermore, it is independent of the settings "Retain original source
formatting" and "Reformat HTML source", which is to be expected since it occurs
while editing (more precisely, when switching display mode).
Sounds like a parser issue. Shouldn't <![if ...]> be treated as an unknown tag
rather than a comment?

--> Reassigning to parser for further investigation.
Assignee: jfrancis → harishd
Component: Editor: Core → Parser
QA Contact: sairuh → dsirnapalli
Why would it be treated as an unknown tag? The entire premise of this bogosity 
is that non-IE browsers will treat them as comments, as we do. The real 
question is "Should Composer be able to preserve and emit an invalid MS 
markup 'innovation'?", which I tend to feel it shouldn't, although this is of 
course up to the module owners.
Severity: major → minor
As far as I understand the SGML documents, "<!" is not a comment in itself; it
is a "comment declaration" marking a region in the file where comments can
occur. The actual comment starts at the "--" sequence.

Hence it is perfectly legal SGML to use the "<! ... >" comment declaration for
something else.

If Composer keeps "fixing" the "<! ... >" sequences into "<!-- ... -->", then
users of M$ Word will not be able to copy and paste parts of Word text into
Composer and view the result successfully in M$ Internet Explorer. Try it
yourselves.

IMHO this will stop any attempt to persuade Office users to use Mozilla
Composer instead of other web editors, and you might as well stop developing
the Composer part of Mozilla. I would actually vote to remove it.

If pages made with Composer cannot reliably be viewed in MSIE, which 95% of the
world's surfers use, then there is no need for Mozilla Composer.

Just my ¢.02
Severity: minor → normal
Had a long IM conversation with Harish about this.  We've worked out a solution
which he will implement.  Thanks, Harish!

The gist of the solution is that we will continue to treat markup like <![ ...
]> as comments but add delimiter information to comment nodes so that they can
be serialized correctly.
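A minimal C++ sketch of the approach described above, assuming a simplified, hypothetical `CommentNode` type (Gecko's real comment nodes and serializer are structured differently): the node keeps the opening and closing delimiters it was parsed with next to its data, so the serializer can write `<![if ...]>` back out instead of forcing `<!--[if ...]-->`. Keeping the delimiters separate from the data also leaves the comment text itself free of extra dashes.

```cpp
#include <string>

// Hypothetical, simplified comment node (not Gecko's nsIContent API).
// The delimiters seen by the parser are stored next to the comment data.
struct CommentNode {
    std::string data;   // text between the delimiters, e.g. "[if !supportLists]"
    std::string open;   // "<!--" for ordinary comments, "<!" for MS Word markup
    std::string close;  // "-->" for ordinary comments, ">" for MS Word markup
};

// Behaviour reported in this bug: every comment is wrapped in "<!--" ... "-->",
// so "<![if !supportLists]>" comes back out as "<!--[if !supportLists]-->".
std::string SerializeWithDashes(const CommentNode& node) {
    return "<!--" + node.data + "-->";
}

// Sketch of the proposed fix: write back the delimiters recorded at parse time.
std::string SerializeWithOriginalDelimiters(const CommentNode& node) {
    return node.open + node.data + node.close;
}
```

For a node parsed from `<![if !supportLists]>` (data `[if !supportLists]`, open `<!`, close `>`), the first function yields `<!--[if !supportLists]-->` and the second yields `<![if !supportLists]>` again.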
Status: UNCONFIRMED → NEW
Ever confirmed: true
Not to start a flame war or anything, but Brian, you said:

> Hence it is perfectly legal sgml to use the "<! ... >" comment declaration
> for something else.

SGML is very specific about what's allowed between "<!" and ">", and what MS
Word inserts there is by no means valid SGML. The only thing that's allowed
between "<!" and ">" is an optional list of optionally whitespace separated
sequences of "--...--", and nothing else. Therefore, it's not unreasonable for
Mozilla to clean up those invalid comments when serializing.

Having said that, I'm not arguing that we shouldn't fix this, but we shouldn't
pretend that what MS Word is doing is by any means standards compliant, so our
fix should apply to quirks mode documents only (as I just discussed with Harish).
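The comment-declaration rule Johnny describes can be checked mechanically. Below is a minimal C++ sketch, assuming the input is the text between `<!` and `>` and following the SGML grammar as I read it: the declaration is either empty or starts directly with a `--...--` comment, optionally followed by further comments and whitespace. It only covers comment declarations, not other markup declarations such as `<!DOCTYPE ...>`, and the function name is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `body` (the text between "<!" and ">")
// is a valid SGML comment declaration, i.e. empty, or a "--...--" comment
// followed by any mix of further comments and whitespace.
bool IsSgmlCommentDeclaration(const std::string& body) {
    std::size_t i = 0;
    const std::size_t n = body.size();
    bool first = true;
    while (i < n) {
        if (!first) {
            // Only whitespace may separate (or trail) the comments.
            while (i < n && std::isspace(static_cast<unsigned char>(body[i]))) {
                ++i;
            }
            if (i == n) {
                break;
            }
        }
        first = false;
        // Every non-whitespace token must open with "--" ...
        if (body.compare(i, 2, "--") != 0) {
            return false;
        }
        // ... and run up to the next "--", which closes it.
        const std::size_t end = body.find("--", i + 2);
        if (end == std::string::npos) {
            return false;
        }
        i = end + 2;
    }
    return true;
}
```

Under this check, `-- hi --` (from `<!-- hi -->`) passes, while Word's `[if !supportLists]` (from `<![if !supportLists]>`) fails, which is the sense in which the Word markup is not a valid comment declaration.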
Johnny, may I start with the same phrase that you were "Not to start a flame war
or anything" ?

Your point may be valid for plain HTML within the <body> section as proposed by
W3C. But let me remind you that all HTML files already have another markup using
the <! ... > sequence, namely <!DOCTYPE ... >.

Furthermore, taking it to the next step, XHTML, which is the first convergence
towards XML (itself a subset of the larger SGML), allows the use of <!ENTITY
... > and more, still outside the <body> section. I know that Composer does
not allow you to edit the <!DOCTYPE ... > line, which makes my point irrelevant
to this bug as such.

I think Composer should be liberal in what it accepts and strict in what it
produces itself. To "correct" what Windows produces, resulting in pages that
are unusable in Internet Explorer, would be like sawing off the branch you are
sitting on.
Correct, on all accounts. I should have clarified in my comment that what's
allowed between "<!" and ">" within the root element is only SGML comments, and
what MS Word is doing is thus invalid *ML markup.
Do not always add <!-- and --> to a comment on serialization in quirks mode.
ccing parser purity task force.
Comment on attachment 126778 [details] [diff] [review]
patch v1.0 (not final yet; needs testing etc.)

>Index: content/base/src/nsXMLContentSerializer.cpp
>@@ -56,6 +56,34 @@
>+InStandardsMode(nsIContent* aContent,

>+  // Should content with no document default to quirks mode?

>+      nsCompatibility mode;
>+      rv = htmldoc->GetCompatibilityMode(mode);
>+      *_retval = (mode != eCompatibility_NavQuirks);

there are now two quirks modes:
  eCompatibility_FullStandards   = 1,
  eCompatibility_AlmostStandards = 2,
  eCompatibility_NavQuirks       = 3
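To make the review point concrete, here is a minimal C++ sketch of the patch's mode test, assuming the three enum values listed in the review and a simplified signature (the real `InStandardsMode` in the patch takes an `nsIContent*` and reports its result through an out-parameter). Because the test is `mode != eCompatibility_NavQuirks`, almost-standards documents land on the standards side, so only NavQuirks documents would keep Word's raw `<!` ... `>` delimiters. `KeepOriginalCommentDelimiters` is a hypothetical name.

```cpp
// Values as listed in the review comment.
enum nsCompatibility {
    eCompatibility_FullStandards   = 1,
    eCompatibility_AlmostStandards = 2,
    eCompatibility_NavQuirks       = 3
};

// Simplified version of the patch's check: anything other than NavQuirks
// counts as standards mode, including AlmostStandards.
bool InStandardsMode(nsCompatibility mode) {
    return mode != eCompatibility_NavQuirks;
}

// Hypothetical policy following the patch description: keep the original
// comment delimiters only when the document is in quirks mode.
bool KeepOriginalCommentDelimiters(nsCompatibility mode) {
    return !InStandardsMode(mode);
}
```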
Assignee: harishd → parser
QA Contact: dsirnapalli
Summary: msword structure comments <![if ... ]> are rewritten into <!--[if ... ]--> which groks Internet Explorer → msword structure comments <![if ... ]> are rewritten into <!--[if ... ]--> which breaks Internet Explorer
Taking. As a note: patch v1.0 isn't quite ready (it "breaks" the DOMI's display
of comments, showing "-- hi --" instead of simply " hi ").
Assignee: parser → mrbkap
Hello,

I have the same kind of problem with Mozilla Thunderbird.
If I copy and paste text from Word that contains a list, I get something like this in the source code:

<!--[if !supportEmptyParas]--> <!--[endif]-->
Positive points:
<!--[if !supportLists]-->-         l’outil aulation<!--[endif]-->
<!--[if !supportLists]-->-         la mise en œuv)<!--[endif]-->
<!--[if !supportLists]-->-         les essaparaison <!--[endif]-->
<!--[if !supportEmptyParas]--> <!--[endif]-->

This displays correctly in Thunderbird, but when I send it, Outlook or a webmail client doesn't recognize this markup and displays it, which makes the mail unreadable...
Assignee: mrbkap → nobody
QA Contact: parser
This should be WONTFIX per HTML5 as far as Web-facing parser features go. I suggest providing an editor feature for discarding comments that look like IE conditional comments.
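A minimal C++ sketch of the suggested editor feature, under stated assumptions: an HTML5 parser turns Word's `<![if !supportLists]>` and `<![endif]>` into (bogus) comments whose data is `[if !supportLists]` and `[endif]`, so an editor could recognize those data strings and drop the comment nodes while keeping the content between them, which is what non-IE browsers display anyway. The `Node` type and both function names are hypothetical, and the prefix test is a heuristic rather than IE's actual conditional-comment grammar.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified DOM node: either a comment or some other content.
struct Node {
    bool isComment;
    std::string data;
};

// Heuristic: the data is "[endif]", or starts with "[if" followed by a
// space or "]" (e.g. "[if !supportLists]" or the bare "[if]").
bool LooksLikeIeConditionalComment(const std::string& data) {
    if (data == "[endif]") {
        return true;
    }
    if (data.rfind("[if", 0) != 0 || data.size() == 3) {
        return false;
    }
    return data[3] == ' ' || data[3] == ']';
}

// Removes conditional-comment markers but keeps the nodes between them.
void DropConditionalComments(std::vector<Node>& nodes) {
    nodes.erase(std::remove_if(nodes.begin(), nodes.end(),
                               [](const Node& n) {
                                   return n.isComment &&
                                          LooksLikeIeConditionalComment(n.data);
                               }),
                nodes.end());
}
```

Ordinary comments such as `<!-- keep me -->` are left untouched; only nodes whose data matches the conditional-comment pattern are removed.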
Invalid, even.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID