Closed
Bug 154304
Opened 22 years ago
Closed 14 years ago
nested <dl>'s inconsistently parse, 1 byte difference (scanner is confused)
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: ftobin+bugzilla, Unassigned)
Details
Attachments
(9 obsolete files)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020606
BuildID: 2002060614
This is quite difficult to explain (Mozilla is acting *very* inconsistently, and
I'm trying to be exacting), so please bear with me.
Note: I'm going to attach the pages which have problems, instead of giving a URL,
because Mozilla is acting differently depending on extensions, and the problems
are seen most vividly if loaded from the local HD.
Each file I'm going to attach has a list of two-level nested <dl>'s. You'll
notice that the files are exactly the same, except for the absence of exactly 3
'class' attributes on some elements in the second page. Note that there is no
CSS at all in this page, or linked to. If the files are named, say, 'out2' and
'out3' (no extension), then Mozilla renders these files differently. However,
if 'out2' is renamed to 'out2.html', it is suddenly rendered the same as 'out3'.
Reproducible: Always
Steps to Reproduce:
1. Save the two attached files to the local disk, and name them 'out2', and 'out3'.
2. Diff 'out2' and 'out3' to verify the differences are minimal:
$ diff out2 out3
11,13c11,13
< <dt class="date">Date</dt>
< <dd class="date">2002-06-17</dd>
< <dt class="picked_up">Picked Up</dt>
---
> <dt>Date</dt>
> <dd>2002-06-17</dd>
> <dt>Picked Up</dt>
3. View each file in Mozilla. Note that <dl> inside the first item (the one
with Date: 2002-06-17) is rendered differently in each case.
4. Rename out2 to out2.html, and note that it is now rendered the same as out3.
Actual Results: The files render differently, even though the differences in
the files are strictly non-presentational.
Expected Results: The files should render the same, specifically as 'out3' is
rendered.
Each file is XHTML 1.1 conformant.
It seems that there is a variety of small, similar, should-be-non-affecting
changes that I can make to out2 to get it to render the same as out3, so
isolating this change is extremely difficult for me. I have merely picked one
small change that should obviously not affect the rendering.
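The check in step 2 can also be scripted. The following is a minimal Python sketch, not part of the original report: it assumes the only intended difference between the files is the presence of double-quoted 'class' attributes, and the helper name differs_only_by_class is hypothetical.

```python
import re

# Matches a double-quoted class attribute together with its leading whitespace.
CLASS_ATTR = re.compile(r'\s+class="[^"]*"')


def differs_only_by_class(a: str, b: str) -> bool:
    """Return True if the two texts are identical once class attributes are removed."""
    return CLASS_ATTR.sub("", a) == CLASS_ATTR.sub("", b)


# The three changed lines from the diff in step 2.
out2_lines = (
    '<dt class="date">Date</dt>\n'
    '<dd class="date">2002-06-17</dd>\n'
    '<dt class="picked_up">Picked Up</dt>\n'
)
out3_lines = "<dt>Date</dt>\n<dd>2002-06-17</dd>\n<dt>Picked Up</dt>\n"

print(differs_only_by_class(out2_lines, out3_lines))
```

To compare the real attachments, the same function could be applied to the full contents of 'out2' and 'out3' read from disk.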
Reporter | ||
Comment 1•22 years ago
|
||
Reporter | ||
Comment 2•22 years ago
|
||
Reporter | ||
Comment 3•22 years ago
|
||
I'm going to have to replace the out2 file I've uploaded and give it a different content type, because when it was uploaded, Bugzilla modified the data inside the file (e.g., removed the XML declaration), and that modification magically 'corrects' the problem. I'll use octet-stream so that Bugzilla doesn't dare touch it.
Attachment #89207 -
Attachment is obsolete: true
Reporter | ||
Comment 4•22 years ago
|
||
bz: When it doesn't have an extension, is it going to come into the parser with a text/xml MIME type (from the unknown decoder) or with a blank mime type?
Reporter | ||
Comment 6•22 years ago
|
||
Note: this screenshot is identical to 'out2' being renamed to 'out2.html'.
Comment on attachment 89209 [details]
out2 (with the class statements, renders incorrectly)
Changing this one to text/xml.
Attachment #89209 -
Attachment mime type: application/octet-stream → text/xml
Mmm, never mind, it's not really XHTML.
Reporter | ||
Comment 9•22 years ago
|
||
Re: comment #8: What do you mean it's not XHTML? W3 thinks it is: http://validator.w3.org/check?uri=http%3A%2F%2Fbugzilla.mozilla.org%2Fattachment.cgi%3Fid%3D89209%26action%3Dview&charset=%28detect+automatically%29&doctype=Inline
It doesn't have an xmlns attribute on the root element, so it can't usefully be served as text/xml. (Validating XHTML with an SGML parser is a bit of a joke, too.)
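The xmlns point can be checked mechanically. This is a rough sketch using Python's standard xml.etree.ElementTree, not anything Mozilla or the validator does; the function name root_is_xhtml is made up for illustration. It assumes the input is well-formed XML and uses no entities that only a DTD would define (such entities would make the parse fail).

```python
import xml.etree.ElementTree as ET

# The XHTML namespace URI defined by the XHTML 1.0 specification.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_is_xhtml(source: str) -> bool:
    """Return True if the root element is <html> in the XHTML namespace."""
    root = ET.fromstring(source)
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    return root.tag == "{" + XHTML_NS + "}html"


with_ns = '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>'
without_ns = "<html><head/><body/></html>"

print(root_is_xhtml(with_ns), root_is_xhtml(without_ns))
```

An XML processor sees the second document as a root element named html in no namespace, which is why serving it as text/xml is not useful.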
In any case, I do see the bug when I download it to a local disk, and my initial guess would be that it's a Parser bug, both since the parser is the main thing that does evil things with MIME types and because the problems seem like the result of misparsing (perhaps even due to packet sizes, though).
Assignee: attinasi → harishd
Component: Layout → Parser
QA Contact: petersen → moied
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 12•22 years ago
|
||
db: Out of curiosity, what validator did you use to check it? Something online or something in your head? :)
Component: Parser → Layout
Comment 13•22 years ago
|
||
The unknown decoder would flag that one (luckily) as text/html, because it does not flag anything as text/xml. We should correct that... And yes, this is most likely a parser bug, and most likely a dup of our other nested <dl> bugs.
Component: Layout → Parser
Whiteboard: DUPEME
Comment 14•22 years ago
|
||
Frank, see http://www.w3.org/TR/xhtml1/#normative, item #3.
Reporter | ||
Comment 15•22 years ago
|
||
Oh, I'm not disagreeing about the validity anymore, I was just wondering how you picked up on it, since the validator most people probably use, w3.org's, didn't. As a side note, adding the xmlns attribute to make it XHTML 1.1 doesn't fix the problem.
Comment 16•22 years ago
|
||
It's a matter of working with this stuff and trying to implement it... leads to a certain memorization of the spec. ;)
Reporter | ||
Comment 17•22 years ago
|
||
I've been able to isolate a trigger of the bug. The attachment I'm making now, 'out4', renders incorrectly (the same as 'out2'). The diff from 'out3' (which renders correctly) to 'out4' is:
11c11
< <dt>Date</dt>
---
> <dt>Date 1234</dt>
As you can see, the only addition is 5 characters. Note that 'out4' is 2038 bytes. I've repeatedly found that removing 1 byte from anywhere in the page, bringing the size down to 2037 bytes, causes the bug *not* to be triggered. For example, the 'out5' I will be attaching next is 2037 bytes. It renders correctly (the same as 'out3'). The diff from 'out4' (correct) to 'out5' (incorrect) is:
11c11
< <dt>Date 1234</dt>
---
> <dt>Date 123</dt>
The bug is most definitely triggered by moving from 2037 to 2038 bytes, anywhere in the page.
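The "remove 1 byte from anywhere" experiment could be automated along these lines. This is only a sketch under assumptions: the helper name one_byte_shorter_variants is hypothetical, and it removes single space bytes, which is just one convenient way of shortening a page by one byte.

```python
from typing import Iterator


def one_byte_shorter_variants(data: bytes, byte: bytes = b" ") -> Iterator[bytes]:
    """Yield copies of data with one occurrence of `byte` removed, one copy per occurrence.

    `byte` is expected to be a single byte, so every variant is exactly
    one byte shorter than the input.
    """
    start = 0
    while True:
        pos = data.find(byte, start)
        if pos == -1:
            return
        yield data[:pos] + data[pos + 1:]
        start = pos + 1


sample = b"<dt>Date 1234</dt> <dd>2002-06-17</dd>"
variants = list(one_byte_shorter_variants(sample))
print(len(sample), [len(v) for v in variants])
```

Each variant would then be written to disk in binary mode and loaded in the browser, to see whether the result depends on the file size or on where the byte was removed.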
Reporter | ||
Comment 18•22 years ago
|
||
This file is referenced in the comment attached to the 'out4' attachment.
Reporter | ||
Comment 19•22 years ago
|
||
Ack, I need to correct what I said earlier in comment #17. I said:
> The diff from 'out4' (correct) to 'out5' (incorrect) is:
I reversed the situation; it should be:
The diff from 'out4' (incorrect) to 'out5' (correct) is:
Reporter | ||
Updated•22 years ago
|
Summary: nested <dl>'s inconsistently indent, depending on things it shouldn't → nested <dl>'s inconsistently indent, depending 1 byte difference
Reporter | ||
Comment 20•22 years ago
|
||
The issue definitely seems to come down to a parsing issue. This is a screenshot of inspecting the DOM tree of 'out4'. The DL highlighted in this screenshot and the upcoming screenshot for 'out5' represents the DL that is not being indented properly. As you'll note, the highlighted DL for 'out4' is incorrectly located in the document. The highlighted DL in the 'out5' DOM screenshot shows it being in the DOM correctly.
Reporter | ||
Comment 21•22 years ago
|
||
This screenshot of the 'out5' DOM tree is mentioned in comment #20.
Comment 22•22 years ago
|
||
What are the _exact_ steps to reproduce here? When I save "out4" as "test.html" and open in Mozilla, it renders fine...
Reporter | ||
Comment 23•22 years ago
|
||
First of all, I'm changing the summary to reflect the fact that this is a parsing, not rendering issue. The rendering issue is due to an incorrect DOM. The sample files out4 and out5 now BOTH render correctly in Mozilla 1.1. I'm not sure when they suddenly started working. However, I have NEW example files which, again, differ by one byte, yet produce different DOMs in 1.1. I shan't bother with the screenshots or DOM tree attachments anymore; they were probably just confusing.
Summary: nested <dl>'s inconsistently indent, depending 1 byte difference → nested <dl>'s inconsistently parse, 1 byte difference
Reporter | ||
Updated•22 years ago
|
Attachment #89208 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89209 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89210 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89211 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89275 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89276 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89280 -
Attachment is obsolete: true
Reporter | ||
Updated•22 years ago
|
Attachment #89279 -
Attachment is obsolete: true
Comment 24•22 years ago
|
||
Frank, could you get me those files that are showing the problem? I'm investigating <dl> parsing as it is, and I'd like to see how my changes affect this bug.... (you're right that screenshots are not necessary as long as the files show the problem).
Reporter | ||
Comment 25•22 years ago
|
||
I was hoping to attach them, but due to a combination of bug #179290 and bug #87404 I can't. So, I'll provide URLs and hope that there aren't any CR/NL issues. I recommend downloading them locally, and then viewing them in Mozilla (if they are delivered as text/html the bug does not present itself).
http://www.neverending.org/~ftobin/tmp/out6 is a valid, 1053-byte XHTML 1.1 document and is parsed into a DOM correctly.
http://www.neverending.org/~ftobin/tmp/out7 is a valid, 1054-byte XHTML 1.1 document and is parsed into a DOM *incorrectly*.
It really doesn't matter where the extra byte happens to be; I just put a ! in the last dd.
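Since the sizes matter to the byte, a downloaded copy can be sanity-checked before testing. A minimal sketch, assuming the files have been saved locally; the helper names describe and differ_by_one_byte are hypothetical, and the sample data below is made up rather than taken from out6 or out7.

```python
def describe(data: bytes) -> dict:
    """Report the exact byte size and whether any carriage returns crept in."""
    return {"size": len(data), "has_cr": b"\r" in data}


def differ_by_one_byte(a: bytes, b: bytes) -> bool:
    """Return True if the two byte strings differ in length by exactly one."""
    return abs(len(a) - len(b)) == 1


# Made-up stand-ins for the two files; the second has one extra '!' byte.
first = b"<dd>last item</dd>\n"
second = b"<dd>last item!</dd>\n"

print(describe(first), describe(second), differ_by_one_byte(first, second))
```

For the real files, the bytes would come from something like pathlib.Path("out6").read_bytes(), and the reported sizes should be 1053 and 1054 with no carriage returns.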
Reporter | ||
Comment 26•22 years ago
|
||
Here are my new agent/build specs:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020912
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020912
Reporter | ||
Comment 27•22 years ago
|
||
FYI, it should be clear from the rendering difference between the two files where the DOM trees differ. Basically, the <dd> after the "Linux Security Newsletters" <dt> is closed prematurely, and the <dl> that is supposed to be a child of said <dd> becomes the succeeding sibling of it instead.
Comment 28•22 years ago
|
||
Tried saving those two files as .xml, .html, .xhtml. They render correctly (and identically) in all three cases. Linux build 2002-11-01-21 here.
Reporter | ||
Comment 29•22 years ago
|
||
Interesting; I just noticed that if you have an extension on the filename, then it renders correctly. But if you don't, it doesn't render correctly. For example, have out7 be just plain 'out7', with no extension; it renders incorrectly (at least for me).
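The extension dependence is easy to picture with any extension-based type lookup. The sketch below uses Python's standard mimetypes module purely as an analogy; it is not Mozilla's logic, which per comment 13 falls back to content sniffing (the unknown decoder) when there is nothing else to go on. The function name guessed_type is made up.

```python
import mimetypes
from typing import Optional


def guessed_type(filename: str) -> Optional[str]:
    """Return the MIME type guessed from the filename alone, or None."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


for name in ("out7", "out7.html"):
    print(name, "->", guessed_type(name))
```

With no extension there is nothing to look up, so a different code path has to decide the type, and that is the path on which the misparse shows up.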
Comment 30•22 years ago
|
||
OK, I see that with a linux trunk 2002-11-01-21 build. I get the following warnings when loading out7 as "out7" and not as "out7.html":
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(result)) failed, file /home/bzbarsky/mozilla/debug/mozilla/htmlparser/src/nsHTMLTokens.cpp, line 343
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(result)) failed, file /home/bzbarsky/mozilla/debug/mozilla/htmlparser/src/nsHTMLTokenizer.cpp, line 801
The second warning is a corollary of the first. The first happens because FillBuffer() is returning an end-of-file error, because the scanner's mInputStream is null! Which seems very very wrong...
Summary: nested <dl>'s inconsistently parse, 1 byte difference → nested <dl>'s inconsistently parse, 1 byte difference (scanner is confused)
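One way a premature end-of-file can turn a single byte into a different DOM is when a tag straddles a buffer boundary. The toy tokenizer below illustrates that general failure mode only. It is not Gecko's code, every name in it is hypothetical, and nothing in this bug confirms that this is the actual mechanism.

```python
from typing import List, Tuple

Token = Tuple[str, str]


def tokenize(chunks: List[str], resumable: bool) -> List[Token]:
    """Tokenize chunked input into ("tag", name) and ("text", data) tokens.

    With resumable=False, an unfinished tag at the end of a chunk is treated
    as if the input had ended there, and is emitted as text (the toy "bug").
    With resumable=True, the unfinished tag is held back until more data arrives.
    """
    tokens: List[Token] = []
    pending = ""
    for chunk in chunks:
        buf = pending + chunk
        pending = ""
        i = 0
        while i < len(buf):
            if buf[i] == "<":
                end = buf.find(">", i)
                if end == -1:
                    if resumable:
                        pending = buf[i:]  # wait for the rest of the tag
                    else:
                        tokens.append(("text", buf[i:]))  # premature EOF
                    break
                tokens.append(("tag", buf[i + 1:end]))
                i = end + 1
            else:
                end = buf.find("<", i)
                if end == -1:
                    end = len(buf)
                tokens.append(("text", buf[i:end]))
                i = end
    if pending:
        tokens.append(("text", pending))
    return tokens


def tag_names(tokens: List[Token]) -> List[str]:
    """Keep only the tag tokens, in order."""
    return [value for kind, value in tokens if kind == "tag"]


def split_at(data: str, n: int) -> List[str]:
    """Split data into two chunks at offset n."""
    return [data[:n], data[n:]]


data = "<dd>a<dl>b</dl></dd>"
print(tag_names(tokenize(split_at(data, 7), resumable=True)))
print(tag_names(tokenize(split_at(data, 7), resumable=False)))
```

In the second call the nested dl start tag is lost because the chunk boundary falls inside it. Moving the boundary by a byte or two (for example split_at(data, 9)) makes both modes agree again, which is the kind of size sensitivity described in this bug.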
Updated•15 years ago
|
Assignee: harishd → nobody
QA Contact: moied → parser
Comment 31•14 years ago
|
||
I don't see a problem here anymore.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME