Closed Bug 137628 Opened 18 years ago Closed 18 years ago

commented out HTML code is shown

Categories

(Core :: DOM: HTML Parser, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: Daniel.Steinberger, Assigned: hjtoi-bugzilla)

References

()

Details

(Keywords: dataloss, regression, Whiteboard: [ADT1 RTM][fixed on trunk 5/16] custrtm-)

Attachments

(4 files, 5 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:0.9.9+)
Gecko/20020415
BuildID:    2002041503 (trunk)

the page at the given URL does include commented out HTML code that gets shown.
IMHO the comments are very valid, although the page does NOT validate as
HTML4.0/strict. (because wrapping <div> or <p> around text is missing)

Reproducible: Always (do you really need steps to reproduce??)
david: i added you, since it's your page (btw: your fix to Bug 5693 rules =)
Worksforme on Linux, although my tree is a few days old.
Seeing this on a branch nightly build id 2002041306.

The comment works as expected execpt for the fact that it does not hide the
neding comment line, for example..

"--> This test page used to contain..."

is what is displayed on the screen.
This works in the linux 2002-04-12-21 build but not in the 2002-04-15-21 build.
Keywords: regression
OS: Windows XP → All
Hardware: PC → All
--> Heikki
Assignee: harishd → heikki
This might be a dupe of bug 133044, or get fixed by that if not dupes.

Other's have reported these kinds of symptoms but I am still unable to reproduce.
Status: NEW → ASSIGNED
Keywords: nsbeta1+
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Attached file testcase (obsolete) —
To test Shift+Reload the testcase.
It will not be reproducible from local FS or from cache.

The first tag inside the comment closes it, and anything that follows it gets
rendered. This seems to affect only strict and not quirks pages.
It also depends on the ammount of code above the comment closure (including
priot to comment start) and the position of the <tag> inside the comment.

My guess is that, this happens when the <!-- and <tag> arrive in one IP packet,
and --> arrives in the next one. But could be something else.
Comment on attachment 79714 [details]
testcase

damn doesn't work here.
Bugzilla server probably uses different packet size.
Attachment #79714 - Attachment is obsolete: true
Whiteboard: [ADT2]
I can reproduce it with mozilla 1.0 rc1.
but it works with netscape 622.

maybe a regression bug?
I want to make one simple testcase for it.
but I failed.
I save the source of the page to one html using mozilla1.0rc1, then  to open it,
it works.
*** Bug 138710 has been marked as a duplicate of this bug. ***
A slightly modified (lengthened) copy of the attached testcase has been put up
at http://www-124.ibm.com/developerworks/oss/jikes/79714.html, I've pulled this
file from a couple different locations and it recreates the problem 100%. Where
as Alexey notes it doesn't recreate pulling from b.m.o - tcpdump confirms that
pulling it from b.m.o results in fewer, larger packets than from
www-124.ibm.com. In theory you could just keep increasing it untill it fails
from b.m.o, or you can use the one I put up on www-124.ibm.com (shh... don't
tell the boss ;)
*** Bug 138971 has been marked as a duplicate of this bug. ***
*** Bug 138790 has been marked as a duplicate of this bug. ***
another site leaking comments these days is http://www.digi.no
I was finally able to reproduce this once on the IBM site (when I went there the
first time). But only once, reload or shift+reload worked perfectly after that.
The Norvegian site hangs for me, but I have some work in progress regarding
comments so that might be it.

Since this is so elusive I would be happy to get access to more test cases, so
if you see this somewhere just post the URL and I'll see if I could find a
reliably failing site for me.
A Url I mentioned in a dupe: <http://www.mozilla.org/news.html>

I copied the testcase to <http://home.in.tum.de/~hohe/moz/79714.html> and can
see the problem there sometimes.
I can reproduce it if I double-click the reload button.
I'm running Moz 1.0RC1 on Linux and I just spotted this problem on one of our sites.

http://www.employmentwizard.com/perl/vaui/Register

The problem comes up intermittently (try shift-reloading). Look for -> over the
USER ID & PASSWORD table, and look for the Secret Question field. The secret
question field is inside a comment. All of the comments in the file looked legit
to me.
The site that I reported does not seem to be accessible from outside .au, so I
was preparing an attachment containing the files.  When I used the page editor
to save the files, it butchered the source, and was able to display the result,

surprisingly.

In order to get back to something like the problematical source, I did a cut
and
paste from the View Source display, and corrected the image filenames.	The
"corrected" result does not display incorrectly, except for the logo display
(DSTC, Optus and Sun logos), which are truncated or eliminated, but which
appear
in the butchered source.  Whether there is any relationship to the parsing
problem I don't know.

I will attach here a bundle with the original output of the page editor, and in
a separate attachment, the html file I created from the View Source.
Accompanies previous attachment.
When the mail with my second attachment (the bare html file) arrived, I tried
the link and the comment problem showed up.
*** Bug 139331 has been marked as a duplicate of this bug. ***
*** Bug 139454 has been marked as a duplicate of this bug. ***
Heikki: are you behind a caching proxy like squid or something? that might
explain why you aren't able to recreate it as easy as the rest of us. The jikes
homepage from my dup'd bug ( http://www-124.ibm.com/developerworks/oss/jikes )
shows it 100% of the time with a shift-reload (or in theory by disabling the
local disk and memory cache.)
I am not behind a proxy (as far as I know), but I have unually fast connection
to the web (Netscape has fat pipes;).
*** Bug 135635 has been marked as a duplicate of this bug. ***
WFM with Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0rc1) Gecko/20020425
*** Bug 142167 has been marked as a duplicate of this bug. ***
Heikki asked for reliably failing test cases.  Please see Bug 142167.

Test case url:

 http://www.aracnet.com/~cmorgan/mozilla_comment_bug.html

If the test page renders properly, it will only show a single line of text. 
When the page fails, it exposes a bunch of ASCII line "art".

If you don't see the ASCII line art with the first page load, simply reload.

This test case is 100% reproducible with the very latest Mozilla bits (as of
last night).
Clark, as with all the other test cases for this bug, the exact network
connection from you to the server seems to play a HUGE role in how reproducable
it is. Your example is 0% successfull in reproducing it from here. Of course I
have a very different network connection to your server than you do.
dunno if it helps anything, but I'm also seeing this at
http://spamcop.net/spamstats.shtml with both win2k RC1 and win2k latest trunk
builds (currently 2002050308)
I've also occasionally seen this on the Bugzilla search pages, though not with
any consistency.  See bug 131863.
*** Bug 142721 has been marked as a duplicate of this bug. ***
*** Bug 143224 has been marked as a duplicate of this bug. ***
I seem to be able to reproduce this on my home laptop via AT&T cable using URL.
It is interesting to note that view source works correctly, though.

What seems to be happening occasionally is that for the second comment
(commenting out <div class="one"...) we don't find the comment end. I don't know
if we simply don't even have that yet, or if we have it in the next string
fragment but we fail to look into it (in FindInReadable()). Because that
happens, the comment parsing code falls back to looking at the closest '>', so
we have the following comment:

<!--
<div class="one">

Everything else in the real comment is displayed. Once we encounter --> it is
just regular text to us, and we display it as text.

I changed this code in bug 110544 (fixed 3/15/2002); we used to go directly to
the scanner (GetChar, Peek, ReadUntil). I would guess that 1) FindInReadable is
broken and/or 2) the scanner takes into account new content brought in by necko
that the string does not, or 3) something else.
harish: could this be related to the mysterious extra null byte you showed me
yesterday?
Ok, a little more information. The fragment we are interested in begins with 

<!--
<div class="one">

and ends with 

...activating this
||<

(where | is unprintable char, at least in VC++ debugger).

While doing FindInReadable, "fast inner loop" reaches the end of this fragment,
and goes into nsSlidingSubstring::GetReadableFragment() asking for next
fragment,  where |current_buffer == mEnd.mBuffer|, so we cannot advance to the
next fragment (the string thinks there isn't one).
I can reproduce this 100% with build 2002050708 on WinME:

1. Go to http://www.people.fas.harvard.edu/~dbaron/css/test/sec051103b
2. Both divs are visible.
2. Use the back and forward buttons.
3. The div with class one has disappeared.
4. Reload.
5. see step 2.
My guess for now is that you just don't have all the data available yet and need
to roll your own searching code that can deal with searching for the "-->" in
buffers appended at a later time.
This was just an experiment to see if it would help moving some string
operations to the scanner in the hope that it would see new content that the
strings classes wouldn't. This patch did not help.

I have a hard time believing my comment parsing changes could have been the
cause of this. The old code did not do anything special to wait for more
content to appear if it did not find a proper comment close, as far as I can
tell.

I'll see some old builds to try to pinpoint the problem start.
Note to self:  nsHTMLTokenizer::ConsumeStartTag() actually unwinds the input if
it run out of data in "this part of the stream". Try the same for
nsHTMLTokenizer::ConsumeComment(). If that works, we probably have a problem
with end tags and every other consume method as well, but for some reason they
just don't show up that often.
dunno if relevant at all, but there are some vaguely resemblant bugs out there,
where code leaks through or content randomly vanish:
bug 82948, bug 137159
> If that works, we probably have a problem
> with end tags and every other consume method as well, but for some reason they
> just don't show up that often.

Just after reading this I viewed <http://www.ftd.de/ub/in/1014399081894.html>
and saw a "</FONT>" after "Phil Morris" on position 24 of the shown list. In the
page source there is a normal closing "font"-tag which was rendered as plain text.
Unfortunatly I cannot reproduce it.

I can confirm that my fix for bug 110544 did NOT cause this bug to happen. When
I tried 3/16 build the URL worked perfectly every time.
I checked old commercial builds, and this problem does not appear in 3/19 build
(around 10am), but it does appear in 3/20 build (also around 10am).

This Bonsai URL shows checkins between 3am on the morning of 3/19 to 3pm on 3/20:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=SeaMonkeyAll&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Change+Size&hours=2&date=explicit&mindate=03%2F19%2F2002+03%3A00%3A00&maxdate=03%2F20%2F2002+15%3A00%3A00&cvsroot=%2Fcvsroot
Attached patch Proposed fix for strict mode (obsolete) — Splinter Review
Turns out it was me after all.

This patch assumes that if a comment starts with <!--, we want to wait for the
end of the document for -->, and only if we reach the end without finding the
close we will take the nearest >.

I have been unable to find a testcase that fails in quirks mode. Please mention
this if you see it happen for documents without a DOCTYPE declaration (which
means we enter quirks). dbaron, could you provide a quirks mode document of
your testcase on the same server?
Attachment #83005 - Attachment is obsolete: true
Changing impact to ADT1. This is more likely to happen with slower connection. I
do not see this at work at all, but see this very often at home through AT&T
cable. I can image this will fail nearly 100% with a regular modem. When this
happens, you cannot trust the page to work at all, and you can not always tell
the difference visually.
Keywords: adt1.0.0, dataloss
Whiteboard: [ADT2] → [ADT1 RTM][fixinhand]
Comment on attachment 83131 [details] [diff] [review]
Proposed fix for strict mode

This patch is really small, btw, just 3 lines added (indentation shows up in
normal diff)

+  // If beginData == end, we did not find opening '--'
+  if (beginData == end) {

+  }
Thanks David! I do not see the bug on the quirks page. I think my fix is
sufficient, unless someone can come up with a quirks test that fails.

Reviews?
Comment on attachment 83131 [details] [diff] [review]
Proposed fix for strict mode

Looks OK to me. r=choess.
Attachment #83131 - Flags: review+
Attachment #83131 - Flags: superreview+
Comment on attachment 83131 [details] [diff] [review]
Proposed fix for strict mode

sr=jag
Actually I can reproduce this in quirks mode as well. I realized the beginning
DOCTYPE is significant because it affects where the buffers begin/end. By
changing dbaron's strict testcase so that all the strings in the DOCTYPE were
replaced with equal length "X" strings I saw this bug in quirks mode. I think a
similar fix would work for quirks mode, but since the logic is much more
complicated and fragile I need to verify this carefully (tomorrow). I can land
strict and quirks part separately if needed.
I have an observation, take it for what it's worth....

I don't think Mozilla 1.0 should be released with this bug left unfixed.  This,
in my opinion, is the kind of bug that rattles the confidence of html coders. 
Hypothetical conversation:

Coder1:  "Say, did you know that if you comment out html code, Mozilla may show
it anyway?"

Coder2:  "No way."

Coder1:  "Way."

Coder2:  "You've got to be kidding...that's whacked!"

I'm admittedly not conversant with Mozilla bug severity/priority schemes, but it
doesn't look like this bug is going to stop the 1.0 release.  It should.
confirming patch 83131 atop a fresh 1_0_0 branch pull on linux with the half
dozen reliably recreatable test cases I've seen (all have been 4.0 strict or 4.0
transitional). Yeah Heikki!
*** Bug 144205 has been marked as a duplicate of this bug. ***
Let's get this verified on the trunk before getting checked into the branch. 
I noticed line numbers get out of whack if we return kEOF, so adding a small
check for return values before counting the lines.

I also did some more testing regarding quirks mode, and it seems what I thought
was quirks mode was in fact strict mode. It seems if document begins with:

<!DOCTYPE HTML PUBLIC "xxx..." "xxx...">

seems to be enough to be enough to trigger strict mode. I thought we scanned
the public ID for HTML 4.0 string, or something.

After changing the DOCTYPE even further I did not see this bug in the
quaranteed-to-be-quirks mode document that was exactly as many bytes as the
strict one. So it seems it does not appear in quirks, but if anyone observes
otherwise please let me know.

I still need to do a little bit more testing, especially for documents that
have no proper comment end, and go through some regressions tests.
Attachment #83131 - Attachment is obsolete: true
Unknown doctypes trigger quirks mode, for future-compatibility.  See
http://mozilla.org/docs/web-developer/quirks/ .
Blocks: 143047
dbaron: I think you meant to say "unknown doctypes trigger *standards* mode, for
future compatibility". :)

FWIW, I just played around with a trunk (nightly binary, not with these patches)
from a couple of days ago, and I can't make this bug happen with quirks mode in
the couple of pages where I was seeing it in standards mode.
This is a major issue, so maybe someone should set the mozilla1.0 keyword.
This testcase is mainly for regression testing, although this patch shows we
have 2 more unreported bugs at our hands. My next patch will fix one (<!foo>
followed by <!--) and I will open a new bug for the other case (the contents of
comment node should be the insides of -- -- sections, even when there are
several; currently we include the inner --). Another good regression test is

http://lxr.mozilla.org/mozilla/source/htmlparser/tests/logparse/comments.html
It turns out we can do a little better by checking the scanner's state
(IsIncremental) in comment parsing before returning kEOF and causing unwind. If
we are not incremental, by returning NS_OK we effectively skip this comment
start (it was not a strict comment).

I will open a new bug to finish this work; not 1.0 critical.
Attachment #83786 - Attachment is obsolete: true
Filed bug 144901 for minor problems with comments that have several comment
delimiter pairs, and bug 144902 regarding unclosed comments in strict mode. Both
are minor issues, although the latter is more important.
Comment on attachment 83816 [details] [diff] [review]
Make changes harishd requested

+  #if 0
+	   // XXX We should do this, but it HANGS until bug 112943 is fixed:
+	   aString = Substring(beginData, current);
+  #else
...

sr=jst, but please leave all pre-processor directives at column 0 like they
normally are.
Attachment #83816 - Flags: superreview+
Comment on attachment 83816 [details] [diff] [review]
Make changes harishd requested

r=harishd
Attachment #83816 - Flags: review+
Fixed on trunk.
Whiteboard: [ADT1 RTM][fixinhand] → [ADT1 RTM][fixed on trunk 5/16]
Please get this verified on the trunk.  What areas of code do we need to test
around to be sure this didn't break something else?
This can affect any HTML page we parse in strict mode that has comments; both
viewing the page itself and viewing page source. Basically anything on those
pages could break in the worst case (but shouldn't).
adding adt1.0.0+.  Please get drivers approval and checkin to the 1.0 branch.
Keywords: adt1.0.0adt1.0.0+
Comment on attachment 83816 [details] [diff] [review]
Make changes harishd requested

a=chofmann,brendan
please check in to 1.0 branch asap to get this in rc3.
Attachment #83816 - Flags: approval+
Fixed on the branch too.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Keywords: adt1.0.0+fixed1.0.0
Resolution: --- → FIXED
*** Bug 145887 has been marked as a duplicate of this bug. ***
*** Bug 146086 has been marked as a duplicate of this bug. ***
Marking verified fixed, tested URL and testcases with branch build ID 20020523
on winxp adding KW:verified1.0.0
Status: RESOLVED → VERIFIED
Keywords: verified1.0.0
Whiteboard: [ADT1 RTM][fixed on trunk 5/16] → [ADT1 RTM][fixed on trunk 5/16] custrtm-
You need to log in before you can comment on or make changes to this bug.