Closed Bug 133044 Opened 22 years ago Closed 22 years ago

Single line HTML comments get mangled in textarea

Categories

(Core :: DOM: HTML Parser, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: bugzilla, Assigned: hjtoi-bugzilla)

References

Details

(4 keywords, Whiteboard: [ADT1 RTM][Fixed on the trunk and branch 05/22 ],custrtm-)

Attachments

(4 files, 2 obsolete files)

When editing HTML in a text area, single line comments have the opening and
closing comment tags removed.

<!--
this is fine, cuz it's multi-line
-->

<!-- THISLLBREAK -->

The latter turns into:

    THISLLBREAK    

This has the potential for pretty big data corruption.
Sorry Build ID: 2002032306
not critical
Severity: critical → normal
You should escape the "<" and ">" because as the HTML spec says, "comments are
markup".

Over to parser.
Assignee: rods → harishd
Status: UNCONFIRMED → NEW
Component: HTML Form Controls → Parser
Ever confirmed: true
QA Contact: madhur → moied
Probably related to bug 105973.
The problem is that the parser should be ignoring any HTML within the
<textarea></textarea>.  This is critical as it can result in dataloss!  This
will make Mozilla unsafe to use for managing webpages, from a webpage.

I am attaching a more complete test case.  In my experience, multi-line doesn't
make a difference (build 2002032803).
Attached file Simple Test Case
Possible dupe of bug 64799 or bug 127043 particularly the bits to do with
<SCRIPT> tags within TEXTAREAs from those bugs.
I still don't find any reference in the HTML 4.01 Specification that says there
can't be html within a <textarea> although I did find it in the 3.2 Spec and I
suppose that it probably does make sense.

However, this is not consistent with any other browsers and WILL RESULT IN
DATALOSS in many applications.  Therefore, IMHO, it should only be enforced in
Strict mode.
Marking "compat". The only tags in which unescaped HTML is not parsed are those 
with declared content CDATA, such as <script> and <style>. <textarea> is not one 
of them.
Keywords: compat, testcase
If you are writing for one spec and put it on a different doctype you are gonna
get problems.. especially with depreciated tags in newer versions of standards.
Heikki, could you please take a look at this? Thanx.
Assignee: harishd → heikki
*** Bug 135364 has been marked as a duplicate of this bug. ***
Hmm... seems I broke this with my comment optimization :(
Status: NEW → ASSIGNED
OS: Linux → All
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Whiteboard: [ADT2]
*** Bug 138494 has been marked as a duplicate of this bug. ***
I get something similar. Go to http://chuckr.bravepages.com for an example.
Sometimes the page is rendered and missing the whole table, a single line
comment seems to mess it up. Other times it works. Using Mozilla 1.0rc1. Here is
a partial code sample as viewed from Mozilla 1.0rc1. Everything after the BODY
tag seems to be commented out, until you get to the end of the table. 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
<head>
<title>Freezone freeware</title>
<meta name="keywords" content="freeware freezone computer software free Unrealed
Unrealed2">
<meta NAME="description" CONTENT="Freezone Freeware, 600+ freeware programs">
<meta NAME="author" CONTENT="croberts@gilsongfx.com">
<body bgcolor="#c0c0c0" TOPMARGIN=0 LEFTMARGIN=5 MARGINHEIGHT=0 MARGINWIDTH=5>
<!--></noscript></div></xmp></style>
<center>
<!partner is bravepages 192.168.100.13>
<SCRIPT SRC='http://www.bravenet.com/jsbanner.php?size=666'></SCRIPT><SCRIPT
SRC='http://www.bravenet.com/jsbanner.php?size=999'></SCRIPT>
</center>
<table border=1>
<tr>
<td width=20% bgcolor=lightblue valign=top>
<p>
<center>
<a href="http://www.cdrinfo.com"><img border="0"
src="http://www.cdrinfo.com/images/button6.gif" alt="Click here to Visit
CDR-Info!"></a>

First view the page with IE, and view source in IE. Now do the same in Mozilla.
You'll get different results. 

croberts@gilsongfx.com
I support Stuart Johnston in that it's vital that HTML should not be parsed with
&lt;textarea&gt; tags.

IMHO this should extend beyond HTML comments, and include HTML characters (the
ones that begin with "&" and end with ";", and have "nbsp" or "lt" etc. in the
middle).
This is a showstopper for our use of the browser.  Can someone specify what we
need to do to proceed?  If unescaped HTML inside a TEXTAREA is actually in
conformance with the HTML spec, do we need to approach W3C to discuss the issue
and possibly request a modification the spec?
*** Bug 144013 has been marked as a duplicate of this bug. ***
Whiteboard: [ADT2] → [ADT1 RTM]
*** Bug 145238 has been marked as a duplicate of this bug. ***
Blocks: 143200
Attached patch WIP, does not work (obsolete) — Splinter Review
This shows the idea I am after. New comment parsing has been implemented for
strict mode (not really new, just create comment string between '<' and '>' and
hold iters into the string for actual data start and end). Will do quirks after
I see this works in practice. Create comment node with that string and iters.
To minimize changes, don't change GetStringValue() member, but expose the new
stuff with a new method that gets the iters (or possibly substring). Most
callers of GetStringValue are fine with just that value (for example view
source and serialization), but the data that is exposed to DOM must be the
substring between iters. I think we'll need a new interface, nsINSDOMComment or
something, that we use internally for comments instead of nsICOMComment.
Additional twist is if someone edits the comment data via script: we must
create new data, but retain the old delimiters (for serialization).
In my opinion, it would be sufficient if there was a clear rule like "HTML has
to be quoted in textareas". At the moment, it is neither parsed correctly nor
displayed as is. If it was parsed correctly, then a comment would be removed
completely, but in the current situation, the parser just removes the opening
and closing tags.

On the other hand, I would like it if textareas just would allow unparsed
content, simply for practical reasons. I run a web form where people can modify
XML documents (which are then evaluated) and it would be rather inconvenient to
quote every brace and ampersand.

P.S. Mozilla only began behaving this way in 0.9.9 (or RC1?) ...
That won't do. There are several things that go wrong, all of which were caused
by bug 110544. The things that come to mind right now are:

* Comments in textarea elements
* View source (noticeable with bad comments)
* Serialization

My fix will fix all of them. To put it simply, the problem is that we need to
retain information about how the comments started and ended in order to make
round tripping possible. All of the above problems are caused by this basic
problem, or they are variations of it. Currently the code only stores what is
between the comment delimiters.
Turns out the full fix seems to be pretty big, with some non-trivial changes.
While I am working on that, harishd will be working on making a smaller band-aid
fix to fix just the textarea part.
Attached patch patch v1.0 (obsolete) — Splinter Review
This patch should fix the problem with comments in textarea and in viewsource.

I've added an extra member to the comment token that stores the complete
comment. AFAIK, the only overhead is the extra member there is no extra copy (
I think :-/ ) because the member variable is a slidingsubstring.

FYI: The patch needs clean up and extensive testing.
is backing out bug 110544 an option for rc3/1.0?  and we could land the
optimization and the bigger clean up patches for 1.01?
Backing out is not a good option in my opinion: in addition to some performance,
the fix did fix some pages we had never gotten right, including pages on top100
sites, and it made us more standards compliant and more tolerant to really bad
comments. Also there have been quite a bit of changes on top of these changes as
well. I think harishd's patch would be good enough for RC3/1.0, need to confirm
with a bit of testing tomorrow.
Attached file testcase
8 different comments in textarea
I can understand how the patch fixes view source, and that looks good. However,
I don't know how this would fix textareas. Maybe something was missing from the
patch?
>I can understand how the patch fixes view source, and that looks good. However,
>I don't know how this would fix textareas. Maybe something was missing from the
>patch?

Textarea content is collected in CNavDTD::CollectSkippedContent() by calling
AppendSourceTo on the token ( comment in our case ).
CCommentToken::AppendSourceTo will provide us with the comment as seen on the
source.
Attached patch patch v1.1Splinter Review
This patch in addition to fixing the stated problem it also fixes a problem,
pertaining to comments, in the viewsource. I've replaced the nsString member,
in CCommentToken, with nsSlidingSubstring. This would avoid one extra copy!

Heikki: I have also removed a hack that was in place to avoid a hang ( bug
112943 ) since using nsSlidingSubstring avoids the problem.
Attachment #84404 - Attachment is obsolete: true
Attachment #84575 - Attachment is obsolete: true
Comment on attachment 84675 [details] [diff] [review]
patch v1.1

>+    mNewlineCount = !(aFlag & NS_IPARSER_FLAG_VIEW_SOURCE) ? mComment.CountChar(kNewLine) : -1;

We should count newlines from mCommentDecl member.

With that, r=heikki.
Attachment #84675 - Flags: review+
>We should count newlines from mCommentDecl member.

Yes it should be. Made that change on my local tree.
Comment on attachment 84675 [details] [diff] [review]
patch v1.1

sr=jst
Attachment #84675 - Flags: superreview+
Oh, and get rid of the old unused member variable in CCommentToken :)
This testcase shows the minor dataloss problem that still remains after this
fix. The issue is that if you have malformed comments and you save page as
complete, we always save with the correct comment open/close so your bad markup
probably changed to better.
Comment on attachment 84675 [details] [diff] [review]
patch v1.1

with all of heikki's comments attended to, a=brendan@mozilla.org, please get
this in tonight.

/be
Attachment #84675 - Flags: approval+
adding adt1.0.0+ for checkin to the 1.0 branch.
Keywords: adt1.0.0adt1.0.0+
Fix landed ( 05/22 )on the trunk and branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [ADT1 RTM] → [ADT1 RTM][Fixed on the trunk and branch 05/22 ]
*** Bug 146576 has been marked as a duplicate of this bug. ***
Verified fixed with trunk and branch build 20020524 on winxp 
Status: RESOLVED → VERIFIED
Whiteboard: [ADT1 RTM][Fixed on the trunk and branch 05/22 ] → [ADT1 RTM][Fixed on the trunk and branch 05/22 ],custrtm-
Keywords: verified1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: