Parser needs to tag <? (PI/PHP) and <% (ASP/JSP) so that composer can decide not to corrupt them when it saves PHP/JSP/ASP files

RESOLVED WONTFIX

Status


P1
enhancement
RESOLVED WONTFIX
18 years ago
6 years ago

People

(Reporter: Michael.Kolmodin, Unassigned)

Tracking

(Blocks: 1 bug, {helpwanted})

Trunk
mozilla1.6alpha
helpwanted
Points:
---

Firefox Tracking Flags

(Not tracked)


Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

18 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS 5.8 sun4u; en-US; m18) Gecko/20001109
BuildID:    [Mozilla]
Mozilla M18
Mozilla/5.0 (X11; U; SunOS 5.8 sun4u; en-US; m18) Gecko/20001109

JSP files typically contain things like <%@ taglib href="uri" %>.
The '<%' and '%>' are converted to &lt;% and %&gt; when saved from
the editor. This breaks the JSP syntax.

Reproducible: Always
Steps to Reproduce:
- Load the editor with a legal JSP file containing stuff like
  above (that is, almost any one).
- Save it

Actual Results:  JSP file corrupted, <% => &lt;%

Expected Results:  Should leave all '<%' and '%>' character combinations as-is.

Besides this, I see no major problem using the composer on JSP
pages :-)

Comment 1

18 years ago
accepting for Beth and moving to Future; add keyword helpwanted
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: helpwanted
Summary: JSP files corrupted by composer → JSP files corrupted by composer: '<%' and '%>'
Target Milestone: --- → Future

Comment 2

17 years ago
-->brade
Assignee: beppe → brade
Status: ASSIGNED → NEW

Comment 3

17 years ago
spam composer change
Component: Editor: Core → Editor: Composer

Comment 4

17 years ago
-->akkana for investigation/reassignment
Assignee: brade → akkana
Target Milestone: Future → ---

Comment 5

17 years ago
We need a policy on this.  The parser, of course, doesn't recognize these tags,
so it passes them through into the dom as &lt; and &gt; entities.  It's correct
for an html editor to leave those entities in entity form (outputting them as
jsp wants would be incorrect html).  So we're doing the right thing, as regards
an html editor.

However, it would also be great to be able to edit .jsp and .php and other types
of super-html scripted files.  What we need here is some mechanism for composer
to know that it's editing something that isn't an ordinary html file, and have
some plugged-in behavior to treat certain types of tags specially.  For extra
credit, perhaps we could have pluggable types where a user could define the
character sequences he wanted treated specially for a particular file extension.

Then somehow composer needs to be able to tell the serializer not to entity-ize
the tags: there's no current mechanism to do that -- it's easy to add a
mechanism in the serializer over the whole document, but there's no current way
to tell the serializer to treat one character or one section of a document
specially (e.g. suspend entity translation but only for this section of the
document).
Target Milestone: --- → Future
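As a minimal sketch of the per-section idea above, here is a toy serializer in which each text node carries a hypothetical `raw` flag: flagged nodes are written verbatim and everything else is entity-escaped. The node classes and the flag are illustrative assumptions, not Mozilla's serializer API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    data: str
    # Hypothetical flag: when True, the serializer suspends entity
    # translation and writes `data` exactly as stored.
    raw: bool = False


@dataclass
class Element:
    name: str
    children: List[Union["Element", Text]] = field(default_factory=list)


def escape_text(s: str) -> str:
    # Minimal text escaping; '&' is handled first so that the entities
    # produced for '<' and '>' are not escaped a second time.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def serialize(node: Union[Element, Text]) -> str:
    if isinstance(node, Text):
        return node.data if node.raw else escape_text(node.data)
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


doc = Element("p", [Text("a < b "), Text("<% out.print(1); %>", raw=True)])
print(serialize(doc))  # <p>a &lt; b <% out.print(1); %></p>
```

The escaping decision is made per node rather than per document, which is exactly the granularity the comment says is missing.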
(Reporter)

Comment 6

17 years ago
*If* we should be able to handle JSP, ASP etc. there are more aspects.
Quoting myself from n.p.m.editor a long time ago, there are three issues
related to this.

   * To handle the syntax. JSP uses blocks like <% code... %>; ASP and
     PHP have a similar syntax. Today, we read it, but when saved the
     block is converted to &lt;% code... %&gt; which just does not
     work. I've filed bug 59985 concerning JSP about this.
   * To display this kind of block in running text. Today, such a block
     is represented by the "yellow markers". Although this works, I
     think it would be a big advantage to use another colour, e.g. red,
     for this kind of "active code", which definitely is something other
     than normal, unknown tags.
   * To display a tag (e.g., an input field) which contains active
     code. This might be something like
     <input type="text" size="<? print $size ?>" >.
     The only thing we could and should do is to display the complete tag
     as some kind of "red box"; by definition, we don't have sufficient
     information to render it.

I'm definitely not sure whether this is a correct and complete picture...
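The third point, a tag whose attribute value contains active code, is hard for a plain tokenizer because the `>` in `?>` looks like the end of the tag. The sketch below contrasts the two views; the helper names are hypothetical, and it assumes code blocks are always delimited by `<?...?>` or `<%...%>`.

```python
def naive_tag_end(source: str, start: int) -> int:
    # A plain HTML tokenizer ends the tag at the first '>'.
    return source.index(">", start) + 1


def code_aware_tag_end(source: str, start: int) -> int:
    # Hypothetical scanner: jump over embedded <? ... ?> and <% ... %>
    # blocks before looking for the '>' that really closes the tag.
    i = start + 1
    while i < len(source):
        if source.startswith("<?", i):
            i = source.index("?>", i + 2) + 2
        elif source.startswith("<%", i):
            i = source.index("%>", i + 2) + 2
        elif source[i] == ">":
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated tag")


def contains_code(tag: str) -> bool:
    # A tag like this could be shown as a "red box" instead of rendered.
    return "<?" in tag or "<%" in tag


src = '<input type="text" size="<? print $size ?>" > rest'
print(src[:naive_tag_end(src, 0)])       # cut off inside the attribute
print(src[:code_aware_tag_end(src, 0)])  # the whole tag
```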

Comment 7

17 years ago
Are these files coming to us with a mimetype of text/html or ?
Perhaps the problem is in the mimetype code?  (Since Composer is not ready or 
able to handle these types of files right now.)
(Reporter)

Comment 8

17 years ago
The very idea of these files is that they represent some kind
of source code which is converted to HTML in runtime. Thus,
there are no registered types for them to my knowledge.
The mime type is used when transferred from server to
client. At this point, all kind of scripted files have been
converted to legal "text/html", and this is the type used.

Bottom line: scripted files are a large and increasing part of web
content today. I think it might be justified to make an effort
to enhance the composer to be able to handle them.  Of course, there
are tradeoffs here...

Comment 9

17 years ago
see example php file:  http://bugzilla.mozilla.org/showattachment.cgi?attach_id=50012
OS: Solaris → All
Hardware: Sun → All
Summary: JSP files corrupted by composer: '<%' and '%>' → JSP/PHP files corrupted by composer: '<%' and '%>' <?php ?>

Comment 10

17 years ago
*** Bug 100672 has been marked as a duplicate of this bug. ***
(Reporter)

Comment 11

17 years ago
Useful links:
JSP: http://java.sun.com/products/jsp
PHP: http://www.php.net/
ASP: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/iisref/html/psdk/asp/iiwawelc.asp

I guess our main focus should be PHP and JSP - ASP is a MS solution and I doubt
we will become strong in this area. Probably, future scripting technologies
will be focused on XML; that is, they should not be relevant until we want to
make an XML editor.

Comment 12

17 years ago
At the very least I would like to see Composer stop obliterating those tags, as
it makes it unusable for any page that has even the slightest PHP call (like a
simple function to append a common footer to each page).

Comment 13

17 years ago
I'd like that too.  I use PHP for footers on my own site.  But unless you can
convince my boss (and his boss) that it's a company priority (not likely), all I
can say right now (that I haven't already said) is "We're accepting patches".

Comment 14

17 years ago
Okay, the folks involved with this bug feel strongly about getting a partial
solution to the problem (private mail) -- just having a way to output the <
without it getting munged into &lt; (and same for >).  And I agree -- I'm
bothered that we have no way of preserving the user's original source, and this
isn't the only bug in which this issue has come up.

But that part of it isn't an editor issue, it's a parser/serializer issue.  The
parser doesn't save any information indicating whether the original file had <
vs &lt; so the serializer assumes that since < is an error in html, that
character must have been an entity, and changes it to &lt; because there's no
way of storing < in the dom tree.
If the parser stored the entity &lt; differently from the ascii (well, ucs-2)
character <, then the serializer could do the right thing on output.  Now that
the serializer is owned by the parser group, perhaps there's a chance this could
happen.  Reassigning to the parser group, but I'm staying on the cc list and
will be happy to do anything that might be needed in the editor, or even help
with the serializer end of the fix if the parser offers some way to differentiate.
Assignee: akkana → harishd
Component: Editor: Composer → Parser
QA Contact: sujay → bsharma
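The mechanism described in comment 14 can be illustrated with a toy text tokenizer that remembers, for each run of characters, whether it came from an entity such as `&lt;` or from a literal `<` in the source. Only `&lt;`, `&gt;` and `&amp;` are handled, and the `(text, came_from_entity)` segment list is a hypothetical stand-in for whatever the parser would actually store.

```python
import re
from typing import List, Tuple

# Only these three entities are handled in this sketch.
DECODE = {"&lt;": "<", "&gt;": ">", "&amp;": "&"}
ENCODE = {v: k for k, v in DECODE.items()}
ENTITY_RE = re.compile(r"&(?:lt|gt|amp);")

# Hypothetical per-run record: (decoded text, came_from_entity).
Segment = Tuple[str, bool]


def parse_text(source: str) -> List[Segment]:
    segments: List[Segment] = []
    pos = 0
    for m in ENTITY_RE.finditer(source):
        if m.start() > pos:
            segments.append((source[pos:m.start()], False))
        segments.append((DECODE[m.group()], True))
        pos = m.end()
    if pos < len(source):
        segments.append((source[pos:], False))
    return segments


def dom_value(segments: List[Segment]) -> str:
    # What the DOM keeps today: the origin of each '<' is lost.
    return "".join(text for text, _ in segments)


def serialize_lossy(segments: List[Segment]) -> str:
    # Current behaviour: every '<' is assumed to have been an entity.
    value = dom_value(segments)
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def serialize_roundtrip(segments: List[Segment]) -> str:
    # Re-encode only what was an entity in the source; literal runs
    # are written back unchanged.
    return "".join(ENCODE[text] if from_entity else text
                   for text, from_entity in segments)


src = "1 &lt; 2 <% x %>"
print(serialize_lossy(parse_text(src)))      # 1 &lt; 2 &lt;% x %&gt;
print(serialize_roundtrip(parse_text(src)))  # 1 &lt; 2 <% x %>
```

Text typed by the user in the editor would have no source form and would still need escaping; the sketch only covers round-tripping unmodified source.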
(Reporter)

Comment 15

17 years ago
Talking about storing '<': Wouldn't it be easier to parse '<%', '<?',
'%>' and '?>' instead? If we define these as "startCode" and "endCode"
respectively, couldn't we just handle the things in between as a comment?
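The suggestion above, treating everything from `<%`/`<?` to the matching `%>`/`?>` as an opaque block, can be sketched as a pre-pass that splits the source into ordinary markup and untouched code blocks. The token kinds "html" and "code" are hypothetical; a real implementation would live in the parser's tokenizer rather than in a regular expression.

```python
import re
from typing import List, Tuple

# Non-greedy match from an opening delimiter to the first matching close.
CODE_BLOCK = re.compile(r"<%.*?%>|<\?.*?\?>", re.DOTALL)


def split_code_blocks(source: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for m in CODE_BLOCK.finditer(source):
        if m.start() > pos:
            tokens.append(("html", source[pos:m.start()]))
        tokens.append(("code", m.group()))  # kept byte-for-byte
        pos = m.end()
    if pos < len(source):
        tokens.append(("html", source[pos:]))
    return tokens


def rejoin(tokens: List[Tuple[str, str]]) -> str:
    # Saving simply concatenates the tokens; code blocks are never escaped.
    return "".join(text for _, text in tokens)


src = '<p>Hi <?php echo $name; ?></p><%@ taglib uri="x" %>'
for kind, text in split_code_blocks(src):
    print(kind, repr(text))
```

A `?>` or `%>` inside a string literal in the embedded code would end the block early, and an unterminated block is left as ordinary markup.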
(Reporter)

Updated

17 years ago
Blocks: 101917
(Reporter)

Comment 16

17 years ago
I've filed a new bug  101917 which covers handling of tags containing 
scripted code. This means that from now on this bug (59985)  is only 
concerned about preserving the pieces of scripted code such as 
<% jsp code %> or <? php code ?> in HTML source files.
*** Bug 101917 has been marked as a duplicate of this bug. ***

Comment 18

17 years ago
*** Bug 152909 has been marked as a duplicate of this bug. ***

Updated

17 years ago
Summary: JSP/PHP files corrupted by composer: '<%' and '%>' <?php ?> → Parser needs to tag <? (PI/PHP) and <% (ASP/JSP) so that composer can decide not to corrupt them when it saves PHP/JSP/ASP files

Comment 19

17 years ago
*** Bug 153019 has been marked as a duplicate of this bug. ***

Comment 20

16 years ago
*** Bug 166609 has been marked as a duplicate of this bug. ***

Comment 21

16 years ago
Mozilla would be a great CMS for php systems when this bug is fixed. . .
In terms of its interaction with Composer, I don't think editors should change
code without express permission, but I'm not sure of the connections with the
browser code.

Comment 22

16 years ago
Why would there be *any* modification being made to tags that are not strictly
known to be HTML/XML/etc???? I would argue that no modification should be made
at all to tags that are not known entities, nor to their scope. -No closing tag
generated, and no modification made to its params, nor body. There's a huge
assortment of other tag-based embedded languages that have no DOM and use
characters beyond just '%' and '?', including a-zA-Z, !, etc, to represent the
parser entry point, or they have non-standard characters in their params.

Consider Standard SSI:
   <!--#echo var="DOCUMENT_ROOT"-->
Standard JSTL ( http://java.sun.com/webservices/docs/1.0/tutorial/doc/JSTL5.html ):
   <c:set var="bookId" value="${param.Remove}"/>
   <jsp:useBean id="bookId" type="java.lang.String" />
   <% cart.remove(bookId); %>
   <sql:query var="books" 
     dataSource="${applicationScope.bookDS}">
     select * from PUBLIC.books where id = ?
     <sql:param value="${bookId}" />
   </sql:query>

Both are mangled --even though the JSTL sample has a DOM hierarchy and is a good
model for an embedded language.

So, leave any tag alone --no matter *what* character(s) follow the '<', if it's
not strictly a known HTML/XML/etc entity. If not always, at least when the
"Prefs:Composer:retain Orig HTML Source Formatting" is selected.

If the issue is just that we need to be able to open an original text-format file
and automatically translate it to HTML meta characters, shouldn't that be enabled
strictly by the [file-open,files-of-type(html || text)] dialog?

Comment 23

16 years ago
Ken (comment 22 -- perhaps I'm missing a subtlety in your point? if so, please
restate in a more concise manner):

Please don't distract from the specific issue covered in this bug.  There is at
least one other bug on the issue of changing the output in general (when it
shouldn't have).  This bug is specific to php/asp/jsp tags which the parser
doesn't hand over to Composer.  Comment 14 is the best summary I have seen for
what is going on.  Please read or re-read it to understand.

Comment 24

16 years ago
Then I would presume that, given the many other type of tags beyond just <? and
<%, that the parser folks should try to preserve *any* unknown tag. I think that
could also apply to the subject of this bug; 'tag any unknown tag, not just <?
and <%, so that composer..'.

Would this be a more accurate objective?
 "Parser needs to tag {[mark?] unknown tags/entities} so that composer can
decide not to corrupt them when it saves PHP/JSP/ASP files"

I definitely acknowledge that this is a hard one... Hope this is clearer. -k

Comment 25

16 years ago
Ken, why did you remove a long list of people from the cc list?  Shouldn't that
be our choice to make?

Comment 26

16 years ago
I haven't really delved into the source so I have no
idea what the implementation issues here are. I gather
from these comments that they are far from trivial.

However I'd REALLY like to see some kind of handling
added to Mozilla in this respect. Once I've added some
PHP to some HTML, I can no longer use Composer to edit
the file at all. So I tend to save php-free templates of 
my pages, and then cut-and-paste all my php back into
them every time I make changes. :(

It would be a godsend if Composer would tag everything
between <? and ?> as a php block, ignore it on rendering,
and save it as is. I'm not asking for advanced handling
of PHP and every other special scripting type under the
sun; I'd just like to see the bare minimum of recognizing
certain types and leaving them unmolested (and unrendered)
during editing and saving.

I'd love to try helping out here but I wouldn't even
know where to begin, not having any clear idea of how
Mozilla operates under the hood. 

Comment 27

16 years ago
Version 1.2b has gotten worse; it seems to strip out everything between <? and ?>
rather than just mangling it.

Comment 28

16 years ago
Why can't we just do the same thing we do with a COMMENT OR A JAVASCRIPT --
leave everything inside the 
<!--          -->   OR
<script> 
</script> 
tags UNTOUCHED (though I think there are some problems with changing line
endings even within comments that must be fixed).  

Just leaving everything inside of 
<?php     
?>
untouched would make Composer at least usable on these file types for editing
the static HTML parts of the page, and should be very easy to accomplish!

CCing composer gurus burpmaster@truffula.net & daniel@glazman.org

Comment 29

16 years ago
Well, a comment gets output as <!-- and a script tag as <script>, so obviously
it would have to be flagged specially in order to get it output as <?php.  Maybe
we could add an attribute to the comment node, or even a pair of attributes, for
open (="?php" in this case) and another for close (="?").

The parser code to figure out where the special-comment sequence begins and ends
still needs to be written, though (that's what the helpwanted keyword is there
for).  The fixes to the serializer and editor are probably straightforward if someone
steps up to fix the parser to stick these tags into the dom in the first place.
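The attribute idea in comment 29 can be sketched with a comment-like node that carries its own opening and closing delimiters. The `open`/`close` fields and the `parse_php_block` helper are hypothetical illustrations; the default values reproduce an ordinary `<!-- -->` comment, so existing comments serialize unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Comment:
    data: str
    # Hypothetical delimiter attributes; the defaults give a normal comment.
    open: str = "!--"
    close: str = "--"


def serialize_comment(node: Comment) -> str:
    # The data is written verbatim, as comment contents already are.
    return f"<{node.open}{node.data}{node.close}>"


def parse_php_block(source: str, start: int) -> Tuple[Comment, int]:
    # Hypothetical parser hook: turn '<?php ... ?>' at `start` into a
    # Comment node; returns the node and the index just past '?>'.
    if not source.startswith("<?php", start):
        raise ValueError("no '<?php' at start")
    end = source.index("?>", start + 5)
    node = Comment(source[start + 5:end], open="?php", close="?")
    return node, end + 2


node, end = parse_php_block("a<?php echo 1; ?>b", 1)
print(serialize_comment(node))  # <?php echo 1; ?>
```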
Um, special comment? "<? ... >" is a marked section, perfectly valid in HTML, 
and should be entirely preserved; I suspect parser is just screwing up, plain 
and simple. "<%" is, unfortunately, an entirely different beast. I'm of the 
opinion that the code for marked section handling should be shared among the 
content sinks as we do for comments; maybe this would be a good time.
> Um, special comment? "<? ... >" is a marked section, perfectly valid in HTML

No. It is a processing instruction.
I am taking this bug but I am warning readers that it will probably be solved
ONLY in standalone composer. stay tuned.
Assignee: harishd → daniel
Severity: normal → enhancement
Priority: P3 → P1
Target Milestone: Future → mozilla1.6alpha
Created attachment 131097
this is part of the solution

this patch preserves PHP PIs (and only them) in HTML documents opened/saved in
the forthcoming Standalone Composer. It is not a complete fix since I still
have a problem with direct modifications of the PHP PIs themselves in
Composer's source view.
Created attachment 131141
aaah, this is MUCH better

There is still one little problem, unrelated to my new code, but more
probably related to nsHTMLEditor::InsertHTMLWithContext().
Investigating. If I can't find it, I'll ask for reviews for the current patch.
Attachment #131097 - Attachment is obsolete: true

Comment 34

15 years ago
Neat!  Daniel, why is it only applicable to standalone composer? Isn't there any
way we could add it in the non-standalone case, rather than just dropping it?

If there's no way, then please leave this bug open for the non-standalone case
after the standalone fix is checked in.
> Neat!  Daniel, why is it only applicable to standalone composer?
> Isn't there any way we could add it in the non-standalone case,
> rather than just dropping it?

The problem is adding some sort of Processing Support to the html parser *for
the editor* without adding it to the same html parser *for the browser*...
That's why a model where the browser and the editor are distinct applications
is VERY helpful. We can make modifications/adaptations to core modules that
are specific to a high-level application. I am here thinking of the DOM : the
editor will need additions that the browser will probably not want to
integrate.

Comment 36

15 years ago
What would the browser do with it if we added it in browser mode?  Would it
cause bad behavior?

Is there any way we could add it as a comment (when not in standalone editor
mode), as long as you've written the code to notice it in the first place?
We could add support for PIs in HTML to the browser too (though the DOM spec
limits them to XML). Not sure how frequent they are and what the exact
side-effects would be.
IIRC PIs end with a single > in SGML/HTML but attachment 131141 seems to cut 2
chars from the end.
Yes, PI's end with just a '>' in SGML, and thus they should do the same also in
HTML, if we choose to "support" them, IMO.
Hmmmm. The problem is the following one : nobody uses processing instructions
in HTML 4 *but* for PHP code. So what we really want/need here is the
<? ... ?> form, not the SGML one. Peter, Jst, what do you suggest here?
Support both forms in HTML? Or...
It kinda doesn't matter does it, but if we follow the SGML rules we're more
likely to "properly" deal with non PHP PI's that are real SGML PI's too. If we
go with XML rules, then we'll fail to parse SGML PI's, but if we go with SGML
rules we'll deal with both types, at no cost. The only difference is that the
'?' before the closing '>' will be considered part of the PI's content, and we'll
need to make the serializers deal with this situation, and there's a couple of
options there...
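The difference between the two termination rules discussed above can be made concrete with two small scanners and their matching serializers. This is a sketch with illustrative function names. Under the SGML rule the trailing `?` of a PHP block becomes part of the content, and a PHP block that itself contains `>` is cut short; under the XML rule an SGML-style PI such as `<?target data>` has no `?>` and cannot be parsed at all.

```python
def pi_content_sgml(source: str, start: int = 0) -> str:
    # SGML/HTML 4 rule: "<?" ... ">", ending at the first ">".
    end = source.index(">", start + 2)
    return source[start + 2:end]


def pi_content_xml(source: str, start: int = 0) -> str:
    # XML rule: "<?" ... "?>", ending at the first "?>".
    end = source.index("?>", start + 2)
    return source[start + 2:end]


def serialize_pi_sgml(content: str) -> str:
    # Any '?' before '>' is already inside the content, so add only '>'.
    return "<?" + content + ">"


def serialize_pi_xml(content: str) -> str:
    return "<?" + content + "?>"


php = "<?php echo 1; ?>"
print(repr(pi_content_sgml(php)))  # 'php echo 1; ?'
print(repr(pi_content_xml(php)))   # 'php echo 1; '
```

Each serializer round-trips the content produced by its own scanner, which is the "serializers deal with this situation" part of the comment.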
Supporting SGML PIs would improve round-tripping :-).
Wouldn't implementing entity reference nodes solve the problem of "the DOM
doesn't know whether the '<' came from &lt; or from normal text"?
Yes, that would solve the problem for non-character entities.

Comment 44

15 years ago
*** Bug 236912 has been marked as a duplicate of this bug. ***

Comment 45

14 years ago
*** Bug 256908 has been marked as a duplicate of this bug. ***

Comment 46

14 years ago
*** Bug 286002 has been marked as a duplicate of this bug. ***

Comment 47

13 years ago
*** Bug 247589 has been marked as a duplicate of this bug. ***

Comment 48

13 years ago
Has anyone some news about this bug? Mozilla composer is a very good editor and it's a pity that this bug completely prevents me from using it. 

*** Bug 317390 has been marked as a duplicate of this bug. ***
*** Bug 355188 has been marked as a duplicate of this bug. ***
QA Contact: bsharma → parser
Per HTML5, this should be WONTFIX as a Web-exposed feature.

Updated

6 years ago
Assignee: daniel → nobody
This is WONTFIX on the parser level, since this would violate the HTML spec and PHP/ASP/JSP stuff inside HTML tags (e.g. <foo <?php /* generate attributes */?>>) wouldn't work anyway.

For the simple case, composer could try to recover stuff from comment nodes with good probability (maybe BlueGriffon does?).
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
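The comment-node recovery suggested in the last comment relies on how an HTML5 parser treats `<?`: it starts a bogus comment that runs to the first `>`, so `<?php echo 1; ?>` becomes a Comment node with data `?php echo 1; ?` and is serialized as `<!--?php echo 1; ?-->`. The sketch below turns such comments back into `<?...>`. It is a heuristic assumption, not a BlueGriffon feature: it would also rewrite a genuine comment whose text happens to start with `?`, and it cannot help with `<%`, which the HTML5 tokenizer emits as plain text.

```python
import re

# Serialized bogus comments whose data starts with '?'.
BOGUS_PI = re.compile(r"<!--(\?.*?)-->", re.DOTALL)


def restore_pi_comment(comment_data: str) -> str:
    # Serialize one comment node, restoring '<?...>' when its data
    # starts with '?'; other comments are written normally.
    if comment_data.startswith("?"):
        return "<" + comment_data + ">"
    return "<!--" + comment_data + "-->"


def restore_pis(markup: str) -> str:
    # The same heuristic applied to already-serialized markup.
    return BOGUS_PI.sub(lambda m: "<" + m.group(1) + ">", markup)


print(restore_pis("<p><!--?php echo 1; ?--></p>"))  # <p><?php echo 1; ?></p>
```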