Closed Bug 59985 Opened 25 years ago Closed 13 years ago

Parser needs to tag <? (PI/PHP) and <% (ASP/JSP) so that composer can decide not to corrupt them when it saves PHP/JSP/ASP files

Categories

(Core :: DOM: HTML Parser, enhancement, P1)

enhancement

Tracking

()

RESOLVED WONTFIX
mozilla1.6alpha

People

(Reporter: Michael.Kolmodin, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted)

Attachments

(1 file, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS 5.8 sun4u; en-US; m18) Gecko/20001109
BuildID: [Mozilla] Mozilla M18 Mozilla/5.0 (X11; U; SunOS 5.8 sun4u; en-US; m18) Gecko/20001109
JSP files typically contain things like <%@ taglib href="uri" %>. The '<%' and '%>' are converted to &lt;% and %&gt; when saved from the editor. This breaks the JSP syntax.
Reproducible: Always
Steps to Reproduce:
- Load the editor with a legal JSP file containing stuff like the above (that is, almost any one).
- Save it.
Actual Results: JSP file corrupted, <% => &lt;%
Expected Results: Should leave all '<%' and '%>' character combinations as-is. Besides this, I see no major problem using the composer on JSP pages :-)
accepting for Beth and moving to Future; add keyword helpwanted
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: helpwanted
Summary: JSP files corrupted by composer → JSP files corrupted by composer: '<%' and '%>'
Target Milestone: --- → Future
-->brade
Assignee: beppe → brade
Status: ASSIGNED → NEW
spam composer change
Component: Editor: Core → Editor: Composer
-->akkana for investigation/reassignment
Assignee: brade → akkana
Target Milestone: Future → ---
We need a policy on this. The parser, of course, doesn't recognize these tags, so it passes them through into the dom as &lt; and &gt; entities. It's correct for an html editor to leave those entities in entity form (outputting them as jsp wants would be incorrect html). So we're doing the right thing, as regards an html editor. However, it would also be great to be able to edit .jsp and .php and other types of super-html scripted files. What we need here is some mechanism for composer to know that it's editing something that isn't an ordinary html file, and have some plugged-in behavior to treat certain types of tags specially. For extra credit, perhaps we could have pluggable types where a user could define the character sequences he wanted treated specially for a particular file extension. Then somehow composer needs to be able to tell the serializer not to entity-ize the tags: there's no current mechanism to do that -- it's easy to add a mechanism in the serializer over the whole document, but there's no current way to tell the serializer to treat one character or one section of a document specially (e.g. suspend entity translation but only for this section of the document).
Target Milestone: --- → Future
*If* we should be able to handle JSP, ASP etc. there are more aspects. Quoting myself from n.p.m.editor a long time ago, there are three issues related to this:
* To handle the syntax. JSP uses blocks like /<% code... %>/; ASP and PHP have a similar syntax. Today, we read it, but when saved the block is converted to /&lt;% code... %&gt;/, which just does not work. I've filed bug 59985 concerning JSP about this.
* To display this kind of block in running text. Today, such a block is represented by the "yellow markers". Although this works, I think it would be a big advantage to use another colour, e.g. red, for this kind of "active code", which definitely is something else than normal, unknown tags.
* To display a tag (i.e., an input field) which contains active code. This might be something like / <input type="text" size="<? print $size? >" > /. The only thing we could and should do is to display the complete tag as some kind of "red box"; by definition, we don't have sufficient information to render it.
I'm definitely not sure whether this is a correct and complete picture...
Are these files coming to us with a mimetype of text/html or ? Perhaps the problem is in the mimetype code? (Since Composer is not ready or able to handle these types of files right now.)
The very idea of these files is that they represent some kind of source code which is converted to HTML at runtime. Thus, there are no registered types for them to my knowledge. The mime type is used when transferred from server to client. At this point, all kinds of scripted files have been converted to legal "text/html", and this is the type used. Bottom line: scripted files are a large and increasing part of web content today. I think it might be justified to make an effort to enhance the composer to be able to handle them. Of course, there are tradeoffs here...
OS: Solaris → All
Hardware: Sun → All
Summary: JSP files corrupted by composer: '<%' and '%>' → JSP/PHP files corrupted by composer: '<%' and '%>' <?php ?>
*** Bug 100672 has been marked as a duplicate of this bug. ***
Useful links: JSP: http://java.sun.com/products/jsp PHP: http://www.php.net/ ASP: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/iisref/html/psdk/asp/iiwawelc.asp I guess our main focus should be PHP and JSP - ASP is a MS solution and I doubt we will become strong in this area. Probably, future scripting technologies will be focused on XML; that is, they should not become relevant until we want to make an XML editor.
At the very least I would like to see Composer stop obliterating those tags, as it makes it unusable for any page that has even the slightest PHP call (like a simple function to append a common footer to each page).
I'd like that too. I use PHP for footers on my own site. But unless you can convince my boss (and his boss) that it's a company priority (not likely), all I can say right now (that I haven't already said) is "We're accepting patches".
Okay, the folks involved with this bug feel strongly about getting a partial solution to the problem (private mail) -- just having a way to output the < without it getting munged into &lt; (and same for >). And I agree -- I'm bothered that we have no way of preserving the user's original source, and this isn't the only bug in which this issue has come up. But that part of it isn't an editor issue, it's a parser/serializer issue. The parser doesn't save any information indicating whether the original file had < vs &lt; so the serializer assumes that since < is an error in html, that character must have been an entity, and changes it to &lt; because there's no way of storing < in the dom tree. If the parser stored the entity &lt; differently from the ascii (well, ucs-2) character <, then the serializer could do the right thing on output. Now that the serializer is owned by the parser group, perhaps there's a chance this could happen. Reassigning to the parser group, but I'm staying on the cc list and will be happy to do anything that might be needed in the editor, or even help with the serializer end of the fix if the parser offers some way to differentiate.
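The information loss described in the comment above can be illustrated with a toy model. This is only a sketch in Python, not Mozilla's parser or serializer: `parse_text` and `serialize_text` are hypothetical stand-ins that assume the tree stores plain decoded characters and that output always escapes markup characters.

```python
from html import escape, unescape


def parse_text(source: str) -> str:
    """Toy 'parser': decode entities into the characters kept in the tree."""
    return unescape(source)


def serialize_text(node_text: str) -> str:
    """Toy 'serializer': escape markup characters on the way out."""
    return escape(node_text, quote=False)


literal = "<% cart.remove(bookId); %>"
entity = "&lt;% cart.remove(bookId); %&gt;"

# Both spellings collapse to the same in-memory text, so the origin is lost.
same_in_tree = parse_text(literal) == parse_text(entity)

# The serializer can therefore only choose one spelling for both inputs.
round_tripped = serialize_text(parse_text(literal))
```

In this model the only way to get the literal form back out is for the parse step to record which spelling it saw, which is the change to the parser that the comment asks for.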
Assignee: akkana → harishd
Component: Editor: Composer → Parser
QA Contact: sujay → bsharma
Talking about storing '<': Wouldn't it be easier to parse '<%', '<?', '%>' and '?>' instead? If we define these as "startCode" and "endCode" respectively, can't we just handle the things in between as a comment?
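The startCode/endCode idea above could look roughly like the following Python sketch. It is not Mozilla code: the regular expression, the `split_segments` name and the 'code'/'text' labels are assumptions made for illustration, and nested or quoted delimiters are not handled.

```python
import re

# Hypothetical delimiters: "<% ... %>" (JSP/ASP) and "<? ... ?>" (PHP).
CODE_BLOCK = re.compile(r"<%.*?%>|<\?.*?\?>", re.DOTALL)


def split_segments(source):
    """Return a list of (kind, text) pairs, where kind is 'code' or 'text'."""
    segments = []
    pos = 0
    for match in CODE_BLOCK.finditer(source):
        if match.start() > pos:
            # Ordinary markup between code blocks goes to the HTML parser.
            segments.append(("text", source[pos:match.start()]))
        # The code block is kept verbatim, like an opaque comment.
        segments.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(source):
        segments.append(("text", source[pos:]))
    return segments


page = '<p>Hi</p><?php footer(); ?><%@ taglib href="uri" %>'
segments = split_segments(page)
```

Because the 'code' segments are never decoded or escaped, joining all segment texts back together reproduces the input exactly; only the 'text' segments would be handed to the normal HTML pipeline.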
Blocks: 101917
I've filed a new bug 101917 which covers handling of tags containing scripted code. This means that from now on this bug (59985) is only concerned about preserving the pieces of scripted code such as <% jsp code %> or <? php code ?> in HTML source files.
*** Bug 101917 has been marked as a duplicate of this bug. ***
*** Bug 152909 has been marked as a duplicate of this bug. ***
Summary: JSP/PHP files corrupted by composer: '<%' and '%>' <?php ?> → Parser needs to tag <? (PI/PHP) and <% (ASP/JSP) so that composer can decide not to corrupt them when it saves PHP/JSP/ASP files
*** Bug 153019 has been marked as a duplicate of this bug. ***
*** Bug 166609 has been marked as a duplicate of this bug. ***
Mozilla would be a great CMS for php systems when this bug is fixed. . . In terms of its interaction with Composer, I don't think editors should change code without express permission, but I'm not sure of the connections with the browser code.
Why would there be *any* modification being made to tags that are not strictly known to be HTML/XML/etc.???? I would argue that no modification should be made at all to tags that are not known entities, nor to their scope: no closing tag generated, and no modification made to their params or body. There's a huge assortment of other tag-based embedded languages that have no DOM and use characters beyond just '%' and '?', including a-zA-Z, !, etc., to represent the parser entry point, or they have non-standard characters in their params. Consider:
Standard SSI:
<!--#echo var="DOCUMENT_ROOT"-->
Standard JSTL ( http://java.sun.com/webservices/docs/1.0/tutorial/doc/JSTL5.html ):
<c:set var="bookId" value="${param.Remove}"/>
<jsp:useBean id="bookId" type="java.lang.String" />
<% cart.remove(bookId); %>
<sql:query var="books" dataSource="${applicationScope.bookDS}"> select * from PUBLIC.books where id = ? <sql:param value="${bookId}" /> </sql:query>
Both are mangled -- even though the JSTL sample has a DOM hierarchy and is a good model for an embedded language. So, leave any tag alone -- no matter *what* character(s) follow the '<', if it's not strictly a known HTML/XML/etc. entity. If not always, at least when "Prefs:Composer:retain Orig HTML Source Formatting" is selected. If the issue is just that we need to be able to open an original text-format file and automatically translate it to HTML meta characters, shouldn't that be enabled strictly by the [file-open,files-of-type(html || text)] dialog?
Ken (comment 22 -- perhaps I'm missing a subtlety in your point? if so, please restate in a more concise manner): Please don't distract from the specific issue covered in this bug. There is at least one other bug on the issue of changing the output in general (when it shouldn't have). This bug is specific to php/asp/jsp tags which the parser doesn't hand over to Composer. Comment 14 is the best summary I have seen for what is going on. Please read or re-read it to understand.
Then I would presume that, given the many other types of tags beyond just <? and <%, the parser folks should try to preserve *any* unknown tag. I think that could also apply to the subject of this bug: 'tag any unknown tag, not just <? and <%, so that composer..'. Would this be a more accurate objective? "Parser needs to tag {[mark?] unknown tags/entities} so that composer can decide not to corrupt them when it saves PHP/JSP/ASP files" I definitely acknowledge that this is a hard one... Hope this is more clear. -k
Ken, why did you remove a long list of people from the cc list? Shouldn't that be our choice to make?
I haven't really delved into the source so I have no idea what the implementation issues here are. I gather from these comments that they are far from trivial. However I'd REALLY like to see some kind of handling added to Mozilla in this respect. Once I've added some PHP to some HTML, I can no longer use Composer to edit the file at all. So I tend to save php-free templates of my pages, and then cut-and-paste all my php back into them every time I make changes. :( It would be a godsend if Composer would tag everything between <? and ?> as a php block, ignore it on rendering, and save it as is. I'm not asking for advanced handling of PHP and every other special scripting type under the sun; I'd just like to see the bare minimum of recognizing certain types and leaving them unmolested (and unrendered) during editing and saving. I'd love to try helping out here but I wouldn't even know where to begin, not having any clear idea of how Mozilla operates under the hood.
Version 1.2b has gotten worse; it seems to strip out everything between <? and ?> rather than just mangling it.
Why can't we just do the same thing we do with a COMMENT OR A JAVASCRIPT -- leave everything inside the <!-- --> OR <script> </script> tags UNTOUCHED (though I think there are some problems with changing line endings even within comments that must be fixed). Just leaving everything inside of <?php ?> untouched would make Composer at least usable on these file types for editing the static HTML parts of the page, and should be very easy to accomplish! CCing composer gurus burpmaster@truffula.net & daniel@glazman.org
Well, a comment gets output as <!-- and a script tag as <script>, so obviously it would have to be flagged specially in order to get it output as <?php. Maybe we could add an attribute to the comment node, or even a pair of attributes, one for open (="?php" in this case) and another for close (="?"). The parser code to figure out where the special-comment sequence begins and ends still needs to be written, though (that's what the helpwanted keyword is there for). The fixes to the serializer and editor are probably straightforward if someone steps up to fix the parser to stick these tags into the dom in the first place.
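A minimal sketch of the "pair of attributes on the comment node" idea from the comment above, written in Python for brevity. The `CommentNode` class and its `open_marker` / `close_marker` fields are hypothetical names invented for this illustration; they do not correspond to any real Gecko DOM interface.

```python
from dataclasses import dataclass


@dataclass
class CommentNode:
    """Toy comment node carrying the delimiters it was parsed with."""
    data: str
    open_marker: str = "!--"   # hypothetical attribute; default is a normal comment
    close_marker: str = "--"   # hypothetical attribute; default is a normal comment


def serialize_comment(node: CommentNode) -> str:
    """Write the node back out using its recorded delimiters."""
    return "<" + node.open_marker + node.data + node.close_marker + ">"


ordinary = CommentNode(" just a note ")
php_block = CommentNode(" footer(); ", open_marker="?php", close_marker="?")

ordinary_out = serialize_comment(ordinary)
php_out = serialize_comment(php_block)
```

With the defaults an ordinary comment serializes unchanged, while a node flagged with "?php" and "?" is written back as a PHP block. The parser-side work of producing such nodes is the part the comment says is still missing.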
Um, special comment? "<? ... >" is a marked section, perfectly valid in HTML, and should be entirely preserved; I suspect parser is just screwing up, plain and simple. "<%" is, unfortunately, an entirely different beast. I'm of the opinion that the code for marked section handling should be shared among the content sinks as we do for comments; maybe this would be a good time.
> Um, special comment? "<? ... >" is a marked section, perfectly valid in HTML No. It is a processing instruction. I am taking this bug but I am warning readers that it will probably be solved ONLY in standalone composer. stay tuned.
Assignee: harishd → daniel
Severity: normal → enhancement
Priority: P3 → P1
Target Milestone: Future → mozilla1.6alpha
Attached patch: this is part of the solution (obsolete)
this patch preserves PHP PIs (and only them) in HTML documents opened/saved in the forthcoming Standalone Composer. It is not a complete fix since I still have a problem with direct modifications of the PHP PIs themselves in Composer's source view.
There is still one little problem, unrelated to my new code, but more probably related to nsHTMLEditor::InsertHTMLWithContext(). Investigating. If I can't find it, I'll ask for reviews for the current patch.
Neat! Daniel, why is it only applicable to standalone composer? Isn't there any way we could add it in the non-standalone case, rather than just dropping it? If there's no way, then please leave this bug open for the non-standalone case after the standalone fix is checked in.
> Neat! Daniel, why is it only applicable to standalone composer? > Isn't there any way we could add it in the non-standalone case, > rather than just dropping it? The problem is adding some sort of Processing Support to the html parser *for the editor* without adding it to the same html parser *for the browser*... That's why a model where the browser and the editor are distinct applications is VERY helpful. We can make modifications/adaptations to core modules that are specific to a high-level application. I am here thinking of the DOM : the editor will need additions that the browser will probably not want to integrate.
What would the browser do with it if we added it in browser mode? Would it cause bad behavior? Is there any way we could add it as a comment (when not in standalone editor mode), as long as you've written the code to notice it in the first place?
We could add support for PIs in HTML to the browser too (though the DOM spec limits them to XML). Not sure how frequent they are and what the exact side-effects would be. IIRC PIs end with a single > in SGML/HTML but attachment 131141 seems to cut 2 chars from the end.
Yes, PI's end with just a '>' in SGML, and thus they should do the same also in HTML, if we choose to "support" them, IMO.
Hmmmm. The problem is the following one : nobody uses processing instructions in HTML 4 *but* for PHP code. So what we really want/need here is the <? ... ?> form, not the SGML one. Peter, Jst, what do you suggest here? Support both forms in HTML? Or...
It kinda doesn't matter, does it, but if we follow the SGML rules we're more likely to "properly" deal with non-PHP PI's that are real SGML PI's too. If we go with XML rules, then we'll fail to parse SGML PI's, but if we go with SGML rules we'll deal with both types, at no cost. The only difference is that the '?' before the closing '>' will be considered part of the PI's content, and we'll need to make the serializers deal with this situation, and there's a couple of options there...
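The SGML-rule option discussed above can be sketched as follows. This is an illustrative Python model, not the attached patch: `read_sgml_pi` and `serialize_pi` are hypothetical helpers, and the scan simply assumes a PI ends at the first '>'.

```python
def read_sgml_pi(source: str, start: int = 0):
    """Scan a PI SGML-style: it starts with '<?' and ends at the first '>'.

    Returns (content, index just past the closing '>').
    """
    if not source.startswith("<?", start):
        raise ValueError("not a processing instruction")
    end = source.find(">", start + 2)
    if end == -1:
        raise ValueError("unterminated processing instruction")
    return source[start + 2:end], end + 1


def serialize_pi(content: str) -> str:
    """Write the PI back; a trailing '?' is simply part of the content."""
    return "<?" + content + ">"


php_source = "<?php footer(); ?>"
php_content, php_end = read_sgml_pi(php_source)

sgml_source = "<?page break>"
sgml_content, _ = read_sgml_pi(sgml_source)
```

In this sketch both forms round-trip through `serialize_pi`, because the XML-style '?' travels inside the content. One limitation of the first-'>' rule as modelled here is that PHP code containing a '>' character would be cut short at that point.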
Supporting SGML PIs would improve round-tripping :-).
Wouldn't implementing entity reference nodes solve the problem of "the DOM doesn't know whether the '<' came from &lt; or from normal text"?
Yes, that would solve the problem for non-character entities.
*** Bug 236912 has been marked as a duplicate of this bug. ***
*** Bug 256908 has been marked as a duplicate of this bug. ***
*** Bug 286002 has been marked as a duplicate of this bug. ***
*** Bug 247589 has been marked as a duplicate of this bug. ***
Has anyone some news about this bug? Mozilla composer is a very good editor and it's a pity that this bug completely prevents me from using it.
*** Bug 317390 has been marked as a duplicate of this bug. ***
*** Bug 355188 has been marked as a duplicate of this bug. ***
QA Contact: bsharma → parser
Per HTML5, this should be WONTFIX as a Web-exposed feature.
Assignee: daniel → nobody
This is WONTFIX on the parser level, since this would violate the HTML spec and PHP/ASP/JSP stuff inside HTML tags (e.g <foo <?php /* generate attributes */?>>) wouldn't work anyway. For the simple case, composer could try to recover stuff from comment nodes with good probability (maybe BlueGriffon does?).
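The "recover stuff from comment nodes" suggestion could be approximated like this. The sketch assumes, as the HTML parsing algorithm specifies for a '<?' in markup, that the block ends up as a comment whose data starts with '?' and stops at the first '>'. `recover_pi` is a hypothetical helper, not an existing Composer or BlueGriffon function.

```python
from typing import Optional


def recover_pi(comment_data: str) -> Optional[str]:
    """Return '<?...?>' if the comment data looks like a swallowed PI, else None.

    Heuristic only: a genuine comment written as <!--?x?--> would also match.
    """
    looks_like_pi = (
        len(comment_data) >= 2
        and comment_data.startswith("?")
        and comment_data.endswith("?")
    )
    if looks_like_pi:
        return "<" + comment_data + ">"
    return None


# Comment data as it would appear for the source "<?php footer(); ?>".
recovered = recover_pi("?php footer(); ?")
untouched = recover_pi(" ordinary comment ")
```

This only covers the simple case mentioned in the comment. Code inside a tag, or code containing a '>' before the closing '?>', would not survive parsing intact and cannot be rebuilt this way.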
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX