Closed Bug 225667 Opened 22 years ago Closed 21 years ago

various HTML constructs are mangled by composer

Categories

(SeaMonkey :: Composer, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 141338

People

(Reporter: ddyer, Unassigned)

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5b) Gecko/20030827
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5b) Gecko/20030827

Some HTML constructs that Mozilla interprets correctly are mangled by Composer, so a page that passes through Composer is damaged and no longer works. Here is a sample page that exhibits two independent cases of this.

Here is the sample text:

<applet ><!--#include virtual="/cgi-bin/gs_AppletTag.cgi" --> height="550" width="500" >
<param name=test value=test>
</applet>
<table><caption>Demonstrate that tag structure is mangled</caption>
<form action=/cgi-bin/process.pl>
<tr><td>row 1 1</td></tr>
<tr><td>row 2 1</td></td>
<input type=submit>
</form>
</table>

There are two "odd" things about this sample. First, the <applet .. > tag has a server-side include for Apache. In the result, a new > is added in the wrong place, and the real > is converted to &gt;. Second, in the original, the <form> ... </form> construct spans the table's <tr></tr> pairs. This works perfectly, but Composer rewrites it with a new </form> immediately after the <form>. This destroys the form.

Reproducible: Always

Steps to Reproduce:
1. Save the sample text as a document. View it in Mozilla.
2. View source; note that all is well.
3. Use "Edit Page", then "View Source" in Composer.
4. Note the new </applet> and </form> tags added.

Actual Results:

<html>
<head>
</head>
<body>
<applet =""><!--#include virtual="/cgi-bin/gs_AppletTag.cgi" --> height="550" width="500" &gt;
<param value="test" name="test"></applet>
<input type="submit">
<table>
<caption>Demonstrate that tag structure is mangled</caption>
<form action="/cgi-bin/process.pl"></form>
<tbody>
<tr>
<td>row 1 1</td>
</tr>
<tr>
<td>row 2 1</td>
</tr>
</tbody>
</table>
</body>
</html>

Expected Results:

I expect the output to have the same tree structure as the input, at least if the input tree is well formed, as is the case here.
The worst thing about this bug is that the damage is pretty silent, and could easily not be noticed until long after the page was edited.
Since the test steps don't involve any actual editing, I believe this means it is a serializer problem.
Assignee: composer → dom-to-text
Component: Editor: Composer → DOM to Text Conversion
Um... you are loading something that's not HTML. We parse it into a DOM. Then when you save, we serialize out the DOM. If you give an app that processes HTML something that's not HTML, it'll try its best to make it look like HTML, which is what we do. This is a dup. And it's not a DOM-to-Text bug, since given the DOM we have the conversion is exactly correct.
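The parse-then-serialize point above can be illustrated with a minimal sketch. It uses Python's standard html.parser as a stand-in; it is not Gecko's parser or Composer's serializer, and the names Reserializer and round_trip are hypothetical. The sketch only shows that output is regenerated from what the parser understood, not copied from the source text, so attribute quoting changes and a literal > in text comes back as &gt;.

```python
from html import escape
from html.parser import HTMLParser


class Reserializer(HTMLParser):
    """Hypothetical helper: re-emit markup from parser events, not source text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are written back in a normalized, double-quoted form.
        parts = [tag]
        for name, value in attrs:
            parts.append(f'{name}="{escape(value or "", quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is escaped on output, so ">" becomes "&gt;".
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def round_trip(source):
    """Parse the source and serialize what the parser reported."""
    parser = Reserializer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

For example, round_trip('<param name=test value=test>') gives back the tag with quoted attributes, which matches the kind of rewriting visible in the reporter's "Actual Results".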
Assignee: dom-to-text → composer
Component: DOM to Text Conversion → Editor: Composer
Whiteboard: DUPEME
The testcase seems confusing. Is the first snippet the input or the output? It looks like a mixture of the two. And if there's any "exactly correct" conversion of the broken form tag, then according to a little-implemented corner of the HTML spec, it's something like this: <form action="">cgi-bin</form>process.pl&gt;
For what it's worth, the greater-than character > is also converted to &gt; (in some cases) when attempting to send properly constructed JavaScript in email. This would point to a common problem in the serializer, no?
No, that would point to a totally different issue from this bug.
What is the doctype associated with the snippet in the original bug description? Is the comment with the #include correctly formatted or does it need a space before the #? Is there an extra > in the applet tag's line?
The original documents are plain HTML. There are no "extra" brackets in the source. In the case of the applet tag, embedded < > inside the tag is a syntactically correct comment, which should be parsed and passed through to the output. In the case of the table tag, all the constructs are well formed and properly nested, but the sequence of constructs doesn't match some preconvieved idea of what can be found below a <table> tag. The REALLY unacceptable aspect of this bug is that it damages pages with no warning. An acceptable halfway measure would be to at least complain that the input was ill-formed and the output is possibly damaged.
Stewart: NET only applies to tags with no attribute specification list, and the presence of "action=" would presumably initiate recognition of aforesaid, but good guess. "In the case of the applet tag, embedded < > inside the tag is a syntactically correct comment, which should be parsed and passed through to the output." No, it isn't. While it would be syntactically correct in document content, comments are not permitted within start-tags (or end-tags, for that matter). "...but the sequence of constructs doesn't match some preconvieved [sic] idea of what can be found below a <table> tag." Right, that's called a "DTD", and it's par for the course in HTML (although our parser is considerably more lenient than the standard in that respect). Because well-formedness is not required in HTML, some limits on the arrangement of tags are required. Bear in mind that whatever source is output will produce exactly the same results in the browser as would feeding it your original source (with the exception in this case of the SSI processing). As far as generating warnings, I think that would effectively require a validating parser attached to Composer, and I don't think anyone's prepared to undertake that level of architectural rearrangement in the near future.
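As a rough illustration of the DTD point above, here is a minimal sketch of a content-model check. It assumes the HTML 4.01 content model for TABLE (CAPTION, COL, COLGROUP, THEAD, TFOOT, TBODY, with TR reachable because the TBODY start tag may be omitted). The names TABLE_CONTENT, IMPLIED_VIA_TBODY, and misplaced_in_table are hypothetical, and as the comment notes, Mozilla's actual parser is more lenient than this.

```python
# Elements the HTML 4.01 DTD allows directly inside TABLE.
TABLE_CONTENT = ("caption", "col", "colgroup", "thead", "tfoot", "tbody")

# TR is accepted in source because the TBODY start tag is optional.
IMPLIED_VIA_TBODY = ("tr",)


def misplaced_in_table(child_tags):
    """Return the child tag names that the content model does not allow."""
    allowed = set(TABLE_CONTENT) | set(IMPLIED_VIA_TBODY)
    return [tag for tag in child_tags if tag.lower() not in allowed]
```

Applied to the direct children of the reporter's table, misplaced_in_table(["caption", "form"]) flags the form element, which is the construct a parser has to relocate or close early.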
> Bear in mind that whatever source is output will produce exactly the same
> results in the browser as would feeding it your original source

Actually, not. Not with the screwed-up form nesting. Note that the input is no longer inside the form in the result. (In cases like that, we keep out-of-band info on which input goes with which form, and that info is lost at serialization time.)
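The loss described above can be modeled with a toy tree. This is a sketch under assumptions, not Gecko's data structures: the Element class, its form_owner field, and the helper names are all hypothetical. The idea is that the in-memory input keeps a pointer to its form, but the serializer only writes the tree, so a re-parse that derives ownership from ancestors finds no form.

```python
VOID_TAGS = {"input"}  # elements written without an end tag


class Element:
    """Toy DOM node with a hypothetical out-of-band form_owner link."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.parent = None
        self.form_owner = None  # not part of the tree, never serialized

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def serialize(node):
    """Write only the tree structure; form_owner is silently dropped."""
    if node.tag in VOID_TAGS:
        return f"<{node.tag}>"
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def owner_from_ancestors(node):
    """What a fresh parse of the serialized output could recover."""
    current = node.parent
    while current is not None:
        if current.tag == "form":
            return current
        current = current.parent
    return None


def build_demo():
    """Mirror the shape of the 'Actual Results': input outside an empty form."""
    body = Element("body")
    submit = body.append(Element("input"))
    table = body.append(Element("table"))
    form = table.append(Element("form"))
    submit.form_owner = form  # remembered in memory only
    return body, form, submit
```

With body, form, submit = build_demo(), submit.form_owner is still the form, while owner_from_ancestors(submit) returns None, which is all the serialized text can express.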
> Bear in mind that whatever source is output will produce exactly the same
> results in the browser as would feeding it your original source

Actually, not in either case. The <applet > tag that emerges is completely mangled, and the intended side effect of the comment is completely lost, since the comment is gone and would have been interpreted by Apache.
(In reply to comment #8)
> Stewart: NET only applies to tags with no attribute specification
> list

What on earth is NET?

> "In the case of the applet tag, embedded < > inside the tag is a
> syntactically correct comment, which should be parsed and passed
> through to the output."

There's no embedded < > in the reporter's example, unless the immediately preceding > somehow counts as 'embedded'.

> Because well-formedness is not required in HTML, some limits on the
> arrangement of tags are required.

What is meant by well-formedness, exactly?
(In reply to comment #11)
> What on earth is NET?

"Null End Tag". The thing that says <a /> and <a></a>&gt; are the same thing.

> There's no embedded < > in the reporter's example, unless the
> immediately preceding > somehow counts as 'embedded'.

Actually, the original example has a stray '>' after '<applet' that needs to be removed to get the mangling described under "actual results". So we're looking at markup like:

<applet <!--#include virtual="/cgi-bin/gs_AppletTag.cgi" --> height="550" width="500" >

> What is meant by well-formedness, exactly?

Choess meant well-formedness as defined in the XML 1.0 spec.
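To see why the stray '>' matters, the as-reported applet line can be tokenized. This is a minimal sketch using Python's standard html.parser as a stand-in, so it says nothing about how Gecko tokenizes the same text; EventLog, tokenize, and AS_REPORTED are hypothetical names. With the '>' present, the start tag ends immediately, the include is an ordinary comment in content, and the intended attributes are just text.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical helper: record tokenizer events in order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(source):
    """Return the list of events the parser reports for the source."""
    log = EventLog()
    log.feed(source)
    log.close()
    return log.events


# The applet line as pasted in the report, including the '>' after '<applet'.
AS_REPORTED = (
    '<applet ><!--#include virtual="/cgi-bin/gs_AppletTag.cgi" -->'
    ' height="550" width="500" ></applet>'
)
```

Calling tokenize(AS_REPORTED) reports an applet start tag with no attributes, then the comment, then the height and width as text. The markup without the stray '>' is a different input, and this sketch makes no claim about how any parser recovers from it.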
Product: Browser → Seamonkey
Sounds like bug 141338 to me, and bz. *** This bug has been marked as a duplicate of 141338 ***
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
Whiteboard: DUPEME