Closed Bug 88952 Opened 23 years ago Closed 20 years ago

Remove record trailing content

Categories

(Core :: DOM: HTML Parser, defect)

x86
Windows NT
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: harishd, Assigned: mrbkap)

References

Details

(Whiteboard: [fixinhand?])

Attachments

(1 file, 9 obsolete files)

 
Attached patch Patch v1.1 (obsolete) — Splinter Review
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
Attached patch Patch v1.2 (obsolete) — Splinter Review
QA Contact: bsharma → moied
What's the status on this?
This bug needs a description...
Whiteboard: [fixinhand?]
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Futuring since I don't have the time right now.
Target Milestone: mozilla1.2beta → Future
Stealing this so I can update it to the current tree.

bz, this patch would make all of our <textarea> bugs go away and I think I could
update it. The patch makes a lot of sense. Would a patch this large be
considered, though?

I'm a bit confused about how the patch handles <noscript>, but I need to
investigate more to find out how the DTD handles those sorts of tags. I'll give
this bug a better description (per comment 6) sometime when it isn't 2:00am.
Assignee: harishd → mrbkap
Status: ASSIGNED → NEW
Ok, I understand the NOSCRIPT/NOFRAMES stuff now: if script/frames are disabled,
we want to parse it as actual HTML content so we can display it on the screen.
Otherwise, we want to parse it as text for easy discarding (this has bad effects
on view-source, for tags like this, I'm going to need to special-case view-source).
Blake, if we can fix all the textarea stuff, that would be wonderful.  As for
size, if it's large but understandable, we can take it in an alpha.... ;)
Attached patch patch v1 (obsolete) — Splinter Review
This is a first pass at making this all work. There are a couple of problems
still (i.e., this isn't the final patch, I'm attaching it so that whoever's
reviewing can take a look). There are some questions remaining about this:

1) This needs testing, especially <noframes> and <noscript>. <xmp> should work
(thanks to the elementtable changes). <script>/<style> should "just work", but
they should be tested also. The logic around there changed fairly dramatically,
and I'm worried about edge cases.
2) IE has a bug where if you don't close <title>, it doesn't display the
document. However, if you don't close <textarea> everything down to EOF becomes
part of the <textarea>. Currently, I emulate these exactly, but this regresses
bug 42945, so I think ConsumeParsedCharacterData() needs to deal with both
cases (since the CNavDTD.cpp solution will no longer apply). Any opinions?

Comments welcome.
Attachment #40989 - Attachment is obsolete: true
Attachment #45651 - Attachment is obsolete: true
Attachment #45655 - Attachment is obsolete: true
Attachment #45697 - Attachment is obsolete: true
We don't really want to regress bug 42945
Comment on attachment 165855 [details] [diff] [review]
patch v1

In the interests of sanity, I'm going to make new return values to return from
ConsumeParsedCharacterData(). I'm also going to add another parameter to it to
indicate which behavior to follow (either stopping at the first < as for
<title> or consuming all the way to EOF as for <textarea>).

I'll mark this obsolete when I have a new patch.
Attached patch patch v2 (obsolete) — Splinter Review
Updated patch. We now handle the lack of a </title> the same way as before. As
a note: I think "skipped content" in CNavDTD could possibly go away with this
patch, but that's another bug for another time.

I removed CopyState from nsITokenizer as it's no longer needed (the only state
to copy was the trailing content target).

I've commented things the best I can. We also now handle <xmp> not followed by
</xmp> the same way IE does (correctly).

I'm not sure about <noscript>, <noframes> they seem to do the right thing
(i.e., not show their content when frames, script are enabled. <noscript> does
show when scripts are disabled too. Turning frames off messes up my browser too
much to test).

I can't think of anything else except to test and get reviews. Comments are
welcome.
Attachment #165855 - Attachment is obsolete: true
Comment on attachment 165870 [details] [diff] [review]
patch v2

After running through the regression tests, I'm pretty confident with this
patch. There were differences because with this patch we parse the contents of
<noscript> and friends as actual content instead of PCDATA, and some stray
<newline> tokens hanging around (which are fine).

The only real difference was with our parsing of:
<title><!-- </title> --></title>

With this patch, we have a title of "<!--" and text content "-->". Is this a
problem? I could make ConsumePCDATA() respect comments, but I don't feel it's
worth it. Is this a common problem?
> Turning frames off messes up my browser too much to test

Erm... it should not... please file bugs?

> I could make ConsumePCDATA() respect comments, but I don't feel it's worth it.

Agreed.
Attached patch patch v3 (obsolete) — Splinter Review
This fixes all known problems with the previous patch. The only differences
shown by the regression tests are the <!-- </title> --> difference mentioned
above and some differences handling newlines (more correctly!) around </xmp>.

<noscript> works as expected. As mentioned above, turning frames off in the
browser breaks it too much to test, but <noframes> *should* have precisely the
same behavior.

I'm going to ask for review on this patch, but I have two questions
outstanding: I've added a parameter to CParserContext::GetTokenizer() so that
the HTMLTokenizer can know the status of the script/frames prefs. Is this
really the right place to do this and what should theFlags be if the QI fails
(can it fail in that branch?)?

Note: The change to nsLoggingSink.h is useful so that aResult can't go back
uninitialized.
Attachment #165870 - Attachment is obsolete: true
Comment on attachment 165970 [details] [diff] [review]
patch v3

rbs, could you give this a look?
Attachment #165970 - Flags: review?(rbs)
Attached patch new nsHTMLTokenizer changes (obsolete) — Splinter Review
The last patch's nsHTMLTokenizer.cpp had a problem with <script
/>content</script> (we'd lose the content). I cleaned up the /> code some and
fixed that problem. I also made us copy our old behavior on missing </iframe>,
</xmp>, </noscript>, </noframes> (consume until the end of the document, don't
act as if we had <xmp></xmp>).
Comment on attachment 166026 [details] [diff] [review]
new nsHTMLTokenizer changes

rbs, please look at these nsHTMLTokenizer.cpp changes instead of the ones in
patch v3.
Attachment #166026 - Flags: review?(rbs)
I won't be able to review your patch(es) in the next few days. I have to deal
with something urgent in my plate now. Feel free to request other reviewers in
the meantime.
Comment on attachment 165970 [details] [diff] [review]
patch v3

Obsoleting both this and the next patch since rbs won't be able to review them
for a couple of days. In the meantime I've given PCDATA the ability to respecct
comments, and we shouldn't regress that behavior anyway. I'll attach a new
patch tomorrow.
Attachment #165970 - Attachment is obsolete: true
Attachment #165970 - Flags: review?(rbs)
Attachment #166026 - Attachment is obsolete: true
Attachment #166026 - Flags: review?(rbs)
Attached patch latest and greatest (obsolete) — Splinter Review
This is the latest and greatest version of the patch. It handles <title> <!--
</title> --> Hi! </title> properly, it handles missing close tags properly, and
it passes all of the parser regression tests. It also fixes the bug where the
<! of an unterminated comment in strict mode gets dropped (since that behavior
is bad if we're going to be using it to display PCDATA).
Comment on attachment 166246 [details] [diff] [review]
latest and greatest

jst, could you give this a look?
Attachment #166246 - Flags: review?(jst)
Blocks: 144902
Attached patch updated to tipSplinter Review
The changes required to make this merge with bryner's string changes were
non-trivial, and there was a slight "bug" where we'd assert if we didn't find a
</script>, so I've updated this patch. Sorry for the bugspam!
Attachment #166246 - Attachment is obsolete: true
Attachment #167406 - Flags: review?(jst)
Attachment #166246 - Flags: review?(jst)
Comment on attachment 167406 [details] [diff] [review]
updated to tip

+nsHTMLTokenizer::nsHTMLTokenizer(PRInt32 aParseMode,
+				  eParserDocType aDocType,
+				  eParserCommands aCommand,
+				  PRUint16 aPrefs) :
   nsITokenizer(), mTokenDeque(0)
 {
+  mFlags = aPrefs;

Use member initializer syntax here, and shouldn't aPrefs be named aFlags?

Other than that this looks good, but given a change of this size in this code
the only way to really know if it does the right thing is to land it and keep
an eye on incoming parser bugs. r=jst
Attachment #167406 - Flags: review?(jst) → review+
Comment on attachment 167406 [details] [diff] [review]
updated to tip

Looking for sr= now.
Attachment #167406 - Flags: superreview?(rbs)
Comment on attachment 167406 [details] [diff] [review]
updated to tip

sr=rbs

+	 if (eHTMLTag_iframe == theTag	 && (mFlags &
NS_IPARSER_FLAG_FRAMES_ENABLED) ||
+	     eHTMLTag_noframes == theTag && (mFlags &
NS_IPARSER_FLAG_FRAMES_ENABLED) ||
+	     eHTMLTag_noscript == theTag && (mFlags &
NS_IPARSER_FLAG_SCRIPT_ENABLED)) {
+	   isCDATA = PR_TRUE;
	 }

Missing some parentheses there. You want:
if ( (a && b) ||
     (c && d)
...

The problems you reported with <noframes> may be due to the fact that it has
"display:none" in html.css, and is not toggled back to really display when
frames are disabled... (see bug 236450 for pointers where this is done for
<noscript>).
Attachment #167406 - Flags: superreview?(rbs) → superreview+
Fix checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Blocks: 196953
This could be unrelated, but since you're touching textarea parsing anyway...

Right now the content in textareas doesn't get passed to the sink as a normal
textnode. Instead the sink has to call CollectSkippedContent and then manually
shuffel it to the right place. The 'shuffeling to the right place' actually ends
up just creating a normal textnode as a child of the textarea.

see the comment at
http://lxr.mozilla.org/mozilla/source/content/html/document/src/nsHTMLContentSink.cpp#902
as well as a few lines further down. It would be great if this whole dance was
removed and the 'skipped content' just got passed as a textnode.
Yeah, we can easily disable skipped content for <textarea>. It would be nice to
remove skipped content altogether... but there are a couple of more issues to be
cleared up before doing that (like how to handle <script>). I'll open a bug
tomorrow on these things.
Great! Please cc me on the new bug!
Blocks: 271184
Depends on: 288991
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: