57248 - Options|Format|Plaintext is not converting contents of HTMLcompose message to

No, it doesn't, not for me. The msg source is wrong already, and 4.x doesn't have a chance to recover from that, just like our viewer. It is a problem during composition. -> Composition. This bug does already appear on a 2000-09-13 build, so it is not caused by NOXIF. If you do Debug|OutputText during composition, everything is fine.

Ben Bucksch (:BenB)

Comment 5

•

25 years ago

.

Assignee: putterman → ducarroz

Component: Mail Window Front End → Composition

Ben Bucksch (:BenB)

Comment 6

•

25 years ago

Update: The last sentence suggests that this is a bug in Composition. We seem have a very odd data path during send (somthing like HTML->TXT->HTML->TXT or similar*), which, IIRC, exposed various bugs already (which is partly good and partly bad). IMO, it is time to clean this up. The only thing I am wondering is what exactly inserts the &. This is propably an additional bug, because, in a perfect world, HTML->TXT->HTML->TXT would produce the same as HTML->TXT. *IIRC, in previous bugs, we found that it is not just HTML/TXT conversions - there was some other converter involved, but I don't remember the details. Do you akk? Or am I erronously confusing this with the bugs we found during reply?

Summary: URL is htmlified (& = &, etcs..) in revieved mails that contains attached webpages → URL is htmlified (& -> & etc.)

Akkana Peck

Comment 7

•

25 years ago

The path followed by entity conversion changed with the noxif landing. This bug was fixed at one point; perhaps it has regressed. Adding jst since he knows the current status of entity handling in the output converters.

Ben Bucksch (:BenB)

Comment 8

•

25 years ago

akk, I don't understand. And I said, I see the exact same behavious in an old build.

Akkana Peck

Comment 9

•

25 years ago

Isn't this the same issue as in bug 44372 (entities inside tags not converted properly?) That was fixed some time ago, so if this is the same issue, it would help to know what regressed. What does the href look like if you do Debug->Output HTML during composition?

Ben Bucksch (:BenB)

Comment 10

•

25 years ago

> Isn't this the same issue as in bug 44372 (entities inside tags not converted > properly?) Dunno. Note that here, the part in the tag has &. We has a similar bug (tags not processed corrently) in libmime's flowed->HTML converter, but that was supposed to be fixed, too. Ccing Daniel Bratell nevertheless. > What does the href look like if you do Debug->Output HTML during composition? <a href="http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1">http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1</a> (which is very odd, because the link is not blue in the HTML composer). OutputText gives << http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1 >> because the converter correctly recognized that text and href are identical.

Akkana Peck

Comment 11

•

25 years ago

Oh, I think I get it. This is the opposite of bug 44372. That bug was that entities like & inside of a tags were getting converted to &, which was not considered to be correct html. This bug is that you WANT the & inside the a tag and don't want it converted to &. I don't know what is technically correct, or whether both are correct. Once they're in the dom, though, we can't tell the two cases apart; the parser makes them equivalent, so we have to choose one or the other form on output.

Ben Bucksch (:BenB)

Comment 12

•

25 years ago

akk, the output is plaintext - the & is wrong. Daniel, I enabled some debugging output and is seems that the libmime flowed->HTML converter is not hit at all during send (which is correct).

Johnny Stenback (:jst)

Comment 13

•

25 years ago

The HTML srializer will convert '&' to '&', and doing that is absolutely corect, not doing that conversion will produce non-wellformed HTML (ha ha, I know, there's no well-formed HTML:-). If the plaintext serializer grabs attribute values it should unescape the value before using it.

Jean-Francois Ducarroz

Comment 14

•

25 years ago

Accepting. I am confuse, is it a real bug?

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9

Ben Bucksch (:BenB)

Comment 15

•

25 years ago

> is it a real bug? Yes. We send out plaintext. "&" is valid in URLs - no need to escape. Actually the way we escape gives a wrong URL, since "&" is HTML-escaping, not URL-escaping. I guess the only reason why it works nevertheless in Mozilla (you can click on either link and get to the same page) is that Mozilla is "smart" and fixes up the URL. The only question is *where* the bug is. Is it in ScanHTML, ScanTXT, Compose, ...?

Ben Bucksch (:BenB)

Comment 16

•

25 years ago

Daniel, akk, could HTML->TXT be the problem? Here's the code: Opening a tag: nsAutoString url; if (NS_SUCCEEDED(GetAttributeValue(nsHTMLAtoms::href, url)) && !url.IsEmpty()) { mURL = url; } Closing a tag: nsAutoString temp; temp.AssignWithConversion(" <"); temp += mURL; temp.AppendWithConversion(">"); Write(temp); mURL.Truncate(); Does |GetAttributeValue| or |Write| any HTMLunescaping, i.e. "&" -> "&"? If not, that would be the culprit.

Akkana Peck

Comment 17

•

25 years ago

I don't believe Write does any unescaping. GetAttributeValue calls mContent's GetAttribute and mParserNode's GetValueAt, neither of which, AFAIK, should do any unescaping.

Ben Bucksch (:BenB)

Comment 18

•

25 years ago

Based on akk's comments, over to DOM->TXT. BTW: How do we unescape normal text in the converter? Didn't find any Unescape (*scape*) function call.

Assignee: ducarroz → akkana

Status: ASSIGNED → NEW

Component: Composition → DOM to Text Conversion

Product: MailNews → Browser

QA Contact: esther → sujay

Daniel Bratell

Comment 19

•

25 years ago

The parser splits up text containing entitys in parts so foo<bar would enter the converter as text: "foo" entity: "lt" text: "bar" You will se an entity handler in DoAddLeaf (IIRC).

Akkana Peck

Comment 20

•

25 years ago

Anthony is taking over dom-text conversion. I really need to figure out who to ask to change the default component owner. :-) Any unescaping in the plaintext serializer comes from the charset converters, AFAIK.

Assignee: akkana → anthonyd

rubydoo123

Comment 21

•

25 years ago

this should be resolved for moz0.9

Keywords: correctness, nsbeta1

Priority: P3 → P2

Whiteboard: [nsbeta1+]

Kathleen :Brade

Comment 22

•

25 years ago

reassign to myself since anthonyd has a large buglist

Assignee: anthonyd → brade

Kathleen :Brade

Updated

•

24 years ago

Target Milestone: mozilla0.9 → mozilla0.9.1

Kathleen :Brade

Comment 23

•

24 years ago

based on RFC1738, & when it's not a delimiter should be %26. If it's a delimiter, then we need to do the html-escaping (&)

Status: NEW → RESOLVED

Closed: 24 years ago

Keywords: correctness

Resolution: --- → INVALID

sujay

Comment 24

•

24 years ago

verified in 3/5 build.

Status: RESOLVED → VERIFIED

Ben Bucksch (:BenB)

Comment 25

•

24 years ago

> based on RFC1738, & when it's not a delimiter should be %26. Please quote the exact sentence, with paragraph number. I think, you are wrong. It must be %26, if it isn't used as query specifier. In this case (the common case), it is a query specifier, so the "?" is completely legal. See RFC2396 (which updates RFC1793 - please use the former, not the latter), Section 3.. REOPENing. > If it's a delimiter, then we need to do the html-escaping (&) We're talking about the case where it is *not* a delimiter. > verified Did you actually check the RFC? > in 3/5 build. That's complete nonsense.

Status: VERIFIED → REOPENED

Resolution: INVALID → ---

Kathleen :Brade

Comment 26

•

24 years ago

reassign to beppe for explanation/resolution Ben--in the case where it is *not* a delimiter, then the user should have entered "%26" in the url. Am I mistaken?

Assignee: brade → beppe

Status: REOPENED → NEW

rubydoo123

Comment 27

•

24 years ago

item 1: Reserved: Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme. Usually a URL has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL. Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose. Item 2: 3.1. Common Internet Scheme Syntax While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data: //<user>:<password>@<host>:<port>/<url-path> Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded. The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the following rules: user An optional user name. Some schemes (e.g., ftp) allow the specification of a user name. password An optional password. If present, it follows the user name separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password field, any ":", "@", or "/" must be encoded. NOTE: ampersand is not listed Item 3: ip-schemepart = "//" login [ "/" urlpath ] login = [ user [ ":" password ] "@" ] hostport hostport = host [ ":" port ] host = hostname | hostnumber hostname = *[ domainlabel "." ] toplabel domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit alphadigit = alpha | digit hostnumber = digits "." digits "." digits "." digits port = digits user = *[ uchar | ";" | "?" | "&" | "=" ] password = *[ uchar | ";" | "?" | "&" | "=" ] urlpath = *xchar ; depends on protocol see section 3.1 NOTE: for user and password the & is present, and must be encoded Item 4: ; FTP (see also RFC959) ftpurl = "ftp://" login [ "/" fpath [ ";type=" ftptype ]] fpath = fsegment *[ "/" fsegment ] fsegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ] ftptype = "A" | "I" | "D" | "a" | "i" | "d" ; FILE fileurl = "file://" [ host | "localhost" ] "/" fpath ; HTTP httpurl = "http://" hostport [ "/" hpath [ "?" search ]] hpath = hsegment *[ "/" hsegment ] hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ] search = *[ uchar | ";" | ":" | "@" | "&" | "=" ] NOTE: & in fsegment,ftptype, hsegment and search, and must be encoded Item 5: ; NEWS (see also RFC1036) newsurl = "news:" grouppart grouppart = "*" | group | article group = alpha *[ alpha | digit | "-" | "." | "+" | "_" ] article = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host NOTE: the ampersand is within article, and must be encoded Item 6: ; PROSPERO prosperourl = "prospero://" hostport "/" ppath *[ fieldspec ] ppath = psegment *[ "/" psegment ] psegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ] fieldspec = ";" fieldname "=" fieldvalue fieldname = *[ uchar | "?" | ":" | "@" | "&" ] fieldvalue = *[ uchar | "?" | ":" | "@" | "&" ] NOTE: the ampersand is within psegment, fieldname and fieldvalue, and must be encoded Item 7: reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" NOTE: the & is listed as a reserved value -- now go back up and re-read item 1 To further illustrate: 2.1. The main parts of URLs A full BNF description of the URL syntax is given in Section 5. In general, URLs are written as follows: <scheme>:<scheme-specific-part> A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme. Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http"). Thus, if the & is being utilized as a delimiter, it must be escaped, consequently not be translated when passing through necko. If the & is used for any other purpose, it must be encoded so it does get translated when passing through necko

Status: NEW → RESOLVED

Closed: 24 years ago → 24 years ago

Ben Bucksch (:BenB)

Comment 28

•

24 years ago

First, with "delimiter", I mean a charater that shows where the URL starts and ends, i.e. is not part of the URL, i.e. is part of the normal text. The usual delimiters are "<" and ">", e.g. <http://www.mozilla.org>. RFC2396, 2.4.3.: > The angle-bracket "<" and ">" and double-quote (") characters are > excluded because they are often used as the delimiters around URI in > text documents and protocol fields. The character "#" is excluded > because it is used to delimit a URI from a fragment identifier in URI > references (Section 4). The percent character "%" is excluded because > it is used for the encoding of escaped characters. > delims = "<" | ">" | "#" | "%" | If you mean "delimiter" = HTTP query variable separator, e.g. "&" in <http://www.mozilla.org/cgi/script.pl?var1=string1&var2=string2>, then, yes, we're talking about a "delimiter" in this bug. Second, apart from the discussion if to encode or not, even *if* you had to encode the "&", using "&" would be wrong: It is HTML-encoding, while we're talking about a URL in plaintext. *If* you'd encode, you'd have to use %26 or similar. So, this is a bug in *any* case. You quote RFC1738: > Reserved: > Many URL schemes reserve certain characters for a special meaning: > their appearance in the scheme-specific part of the URL has a > designated semantics. "&" is such a "certain", "reserved" charater for the HTTP scheme - it separates query variables. > If the character corresponding to an octet is > reserved in a scheme, the octet must be encoded. ...for the non-reserved use! If you encoded the variable separator "&", how would you represent a "&" in the variable content? You'd have to encode it, too, and it would be impossible to tell, if the encoded "&" was meant as variable content or separator. To avoid exactly *that* is the whole point of the encoding! The variable separator appears as raw "&", while "&" as variable content has to be encoded as %26 or similar. > The characters ";", > "/", "?", ":", "@", "=" and "&" are the characters which may be > reserved for special meaning within a scheme. But, again, please do not use RFC1738, it is outdated. In fact, RFC2396 is much more clear: > 2.2. Reserved Characters > Many URI include components consisting of or delimited by, certain > special characters. These characters are called "reserved", since > their usage within the URI component is limited to their reserved > purpose. If the data for a URI component would conflict with the > reserved purpose, then the conflicting data must be escaped before > forming the URI. > reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | > "$" | "," Basically exactly what I said above. > 3.4. Query Component [...] > query = *uric > Within a query component, the characters ";", "/", "?", ":", "@", > "&", "=", "+", ",", and "$" are reserved. REOPENing.

Status: RESOLVED → REOPENED

rubydoo123

Comment 29

•

24 years ago

and reclosing

Status: REOPENED → RESOLVED

Closed: 24 years ago → 24 years ago

Resolution: --- → INVALID

Ben Bucksch (:BenB)

Comment 30

•

24 years ago

Huh? Please say why I am wrong.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Ben Bucksch (:BenB)

Comment 31

•

24 years ago

I must say I find it pretty barfling that you just reclose, completely ignoring my well-founded comment.

rubydoo123

Comment 32

•

24 years ago

ok, let me try this one last time before I reclose the bug. From a user perspective, when they are entering a value they are normally doing that in some sort of dialog such as the link dialog. As the user types, the expectation is that they will actually see the ampersand. The reason I bring this up, is because we need to account for that instance during this whole process. So, if a user types in (or copies) the following string into such a dialog: http://ruby:ban&far@www.foo.bar/foo.html?p="1"&q="2"&r="bert&ernie" the user would expect to see just that, however, in the backend there will be a need for the translations. Based on the above example, the ampersand (&) is being utilized in three different instances. The first instance is within a password and is treated as string value, the second instance in which it is being used is as a delimiter, the thrid and last instance in which it is being used is within an attribute value and is again treated as a text string value. The ampersand cannot be used in any other fashion within the url -- this is it, no other uses in the url string. In such cases where the ampersand is being used as a text string value: it MUST BE encoded to %26. It is done so, so it is not translated when retrieving the value. In such cases where it is used as a delimiter: it MUST BE html-entitied (& or &#nn;). It is done so, so it IS translated when retrieving the value. In either case, there is an additional translation required for rendering to the user. Now, hopefully for the last time -- if the ampersand is not being used as a delimiter -- which would mean that no matter where it is placed in the string, if it not specifially being used as the delimiter, it is translated as text, if it is translated as text, then it gets encoded - that means if it is at the beginning, the middle or the end -- doesn't matter. Closing yet again

Status: REOPENED → RESOLVED

Closed: 24 years ago → 24 years ago

Resolution: --- → INVALID

rubydoo123

Updated

•

24 years ago

Whiteboard: [nsbeta1+] → [nsbeta1-]

Ben Bucksch (:BenB)

Comment 33

•

24 years ago

> let me try this one last time before I reclose the bug. You seem to be very sure that you are right and I am too dumb to understand you. Somebody else has an opinion here? > the second instance in which it is being used is as a delimiter (variable separator) > In such cases where it is used as a delimiter: it MUST BE html-entitied > (& or &#nn;). Why? How do you come to HTML? This is a URL, sent in *plaintext*. No HTML at all here. URLs are not "HTML-entitied". Plaintext isn't either. *If* were talking about HTML source code, you were right - & has spepcial memaning in HTML and thus must be "entitied" in HTML. But "&" appears in the plaintext, on-the-wire msg source.

rubydoo123

Comment 34

•

24 years ago

I'm was not alluding that you're dumb or any such thing. why you interpreted the comment as such is an incorrect interpretation. I was addressing this issue via html based context and in that context, it is working correctly. However, in the context of plaintext mail, what I believe is really happening is that the string (the url) is actually being filtered as if it is an anchor. When you receive the mail message, the string is viewed as a link -- do you see that? So, what I think is happening is that on send, the url is being wrapped within an anchor, which in turn is/was getting escaped and what you saw is happening because of that. However, using the current builds, the string is still embedded in an anchor, but the last ampersand is being displayed correctly. In that respect, the delimited (variable separator) is being handled correctly. If in the plaintext context, you are correct in that it should not be escaped, but I do not think that we are actually doing plaintext in that sense, I believe there is still some level of html. variable separator == delimiter

Ben Bucksch (:BenB)

Comment 35

•

24 years ago

beppe, do the following: Reproduction: 1. Go to <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> 2. File|Send Link... 3. Add yourself as recipient 4. Send, as Plaintext Only (this is the default, at least in Mozilla) 5. Recieve mail, in any mailer, e.g. Mozilla or 4.x 6. View Source Actual result: http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1 <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> Expected result: http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1 As for *why* this happens, see the discussion above.

rubydoo123

Comment 36

•

24 years ago

I did the test and I do not see the & in the source -- regardless if I open it in mozilla or 4.x -- In fact, if it is sent via plaintext, in the source there is only this: This is a multi-part message in MIME format. --------------030401010301090802020903 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> --------------030401010301090802020903 However, in HTML compose, the source looks like this: This is a multi-part message in MIME format. --------------010300030303040103020804 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1 <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> Notice, that the html compose source is exactly what was mentioned in this bug -- not the plaintext. Also note, that the second url line is wrapped in the < > delimiters, and is treated as if it is the content (value) of the href attribute. What does this lead me to believe: that the issue is within html compose and not plaintext compose, due to 1. the contstructs of the source within the plaintext and html files verses the discussion and samples in this bug are consistent with the source of html compose, 2. within the html source, the 2nd url is treated as a link (easily tested by displaying the mail and mousing over the line). Consequently, since the first line being displayed is the text string of the url, and the ampersand is not encoded, it is correct. The second url is being treated as the content of the href atribute, the encoding of the ampersand is correct.

sujay

Comment 37

•

24 years ago

verified in 3/8 build.

Status: RESOLVED → VERIFIED

Ben Bucksch (:BenB)

Comment 38

•

24 years ago

beppe, I'm sorry that I didn't give very exact instructions for reproduction. Next try: Reproduction: 1. Set your prefs to use the HTML composer 2. Go to <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> 3. File|Send Link... 4. Add yourself as recipient 5. Options|Format|Plaintext Only 6. Send 5. Recieve mail, in any mailer, e.g. Mozilla or 4.x 6. View Source Yes, you do have to use the HTML composer. Nevertheless, the HTML composer can send as Plaintext, after the HTML is converted. This is the default, at least in Mozilla: Compose in HTML editor, send as Plaintext. No matter how you compose is, if you send as plaintext, it has to be plaintext. An "&" is completely misplaced in plaintext. Yes, IIRC, the bug comes from "&" being escaped in the href attribute. But that doesn't change the fact that the endresult is bogus and thus this is a bug. As for where exactly the bug has to be fixed, see above.

Akkana Peck

Comment 39

•

24 years ago

I just tried Ben's instructions: the mail I got back, viewed in a plaintext mail UA, said: http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1 <http://gemal.dk/browserspy/ws.cgi?server=http://gemal.dk&test=1> If I tried to go to the URL inside the angle brackets, I would go to the wrong URL. So I have to agree with BenB that this is still a bug. Beth, am I missing something? Ben, any idea what the right fix for this is? Perhaps the plaintext serializer is doing entity conversion and shouldn't? But I don't see anywhere in nsPlaintextSerializer where it's doing entity conversion except for entity tags (which obviously aren't the problem since this is inside an anchor tag). Doesn't look like the plaintext serializer ever calls the charset encoder either. Not sure where this conversion is happening if it isn't in the html source inside the edit window (Beth, how are you viewing the source? I wish mail would enable the debug menu in their compose window, or offer a source pane tab).

Status: VERIFIED → REOPENED

Resolution: INVALID → ---

Ben Bucksch (:BenB)

Comment 40

•

24 years ago

> Ben, any idea what the right fix for this is? See the comments from 2000-12-08 17:45 to 2000-12-09 10:43. I suggest to put some code that HTML-unescapes into |nsPlaintextSerializer::GetAttributeValue|. (The actualy unescaping should probably be written as a function in a more central place and just be called from |GetAttributeValue|.)

Ben Bucksch (:BenB)

Comment 41

•

24 years ago

Actually, I have an even simpler reproduction, which surely doesn't involve the TXT->HTML converter: 1. Open HTMl composer 2. Enter "bla1", select it 3. Creat Link, with link-target (href) "bla&bla" 4. Options|Format|Plaintext Only 5. Send 6. Recieve with any mailer 7. View Source Acutal result: bla1 <bla&bpa> Expected result: bla1 <bla&bpa>

Akkana Peck

Comment 42

•

24 years ago

If I follow Ben's simpler repro instructions in the html editor (not mail compose), so I can get the Debug menu and see what's going on, if I then Output Text, I see: bla <bla&bla> If I follow the longer (mail send) instructions and check the value of url at line 672 of nsPlaintextSerializer, just before it sets mURL = url, I see that the &amp is already there. url came straight from mParserNode->GetValueAt(i) and hasn't gone through any extra url escapes. This suggests that the mail (libmime?) code which inserts the initial text into the compose window did the url escaping. That's at odds with Ben's latest experiment, though, where he typed the value in to the link dialog in html compose. His experiment suggested to me (since I didn't see the same thing happening in Output Text from the editor) that the problem likely had something to do with the fact that mail saves as html, then reads the html back in and converts it to plaintext, instead of just writing it as plaintext in the first place. In any case, I'm leaning strongly toward thinking this is a mail problem, unless someone can find a way to demonstrate it in the editor without going through mail code at all. I can't seem to repro it in the editor.

rubydoo123

Comment 43

•

24 years ago

ok, yes that one step is quite important (Options|Format|Plaintext), if selecting that option is indeed supposed to convert the content of the message to plaintext, then yes that is an issue. But it is quite evident that that is not what is happening (the content of the web page is sent as an inline html file and should not if it truly is plaintext). I contacted Seth and have asked him to take a look at this (will also cc him now).

rubydoo123

Comment 44

•

24 years ago

forgot to mention, if that is the case, then this should be assigned to mail team to work out the conversion issues.

rubydoo123

Updated

•

24 years ago

Component: DOM to Text Conversion → Mail Window Front End

Product: Browser → MailNews

Summary: URL is htmlified (& -> & etc.) → Options|Format|Plaintext is not converting contents of HTML compose message to plaintext

rubydoo123

Comment 45

•

24 years ago

reassigning to the mail team. The real problem: Options|Format|Plaintext is not converting contents of HTML compose message to plaintext updating Summary with the correct issue

rubydoo123

Comment 46

•

24 years ago

reassigning to mail team

Assignee: beppe → sspitzer

Status: REOPENED → NEW

QA Contact: sujay → esther

Ben Bucksch (:BenB)

Comment 47

•

24 years ago

beppe, no, the msg is converted. You did "Send Page", not "Send Link". Even if you do Send Page, the behaviour is correct, because the page is an attachment, and the conversion applies only to the main body, i.e. the text you write. This is exactly the same as 4.x. Restoring SUMMARY.

Summary: Options|Format|Plaintext is not converting contents of HTML compose message to plaintext → URL is htmlified (& -> & etc.)

rubydoo123

Comment 48

•

24 years ago

no Ben, when you send page with the initial mail message set to plaintext it works just fine, when you set the default method to be html and then select via the options menu it doesn't work fine -- get it? changing the summary back to THE CORRECT SUMMARY STATEMENT

Summary: URL is htmlified (& -> & etc.) → Options|Format|Plaintext is not converting contents of HTML compose message to

Akkana Peck

Comment 49

•

24 years ago

Beth and I talked about it and agreed that it is converting to plaintext, it just isn't doing it correctly (since it's expanding the entities). Seth can change the summary to whatever he feels best describes the bug. One interesting thing I just discovered: when I view a folder that has a message like this, I see both instances of the url linkified (the first one, which corresponds to the link text, and the second one in angle brackets, which is the real url) and the one in angle brackets still displays with the & in it; but if I click on the one in angle brackets, it removes the & before it goes to the link, so it goes to the right link. Perhaps someone noticed this problem earlier, and wrote a workaround so that the links would work right, but forgot that they'd still look wrong and they wouldn't work in other mail clients. (And if someone ever did want to mail a link that had & in it, it wouldn't work right because it would be converted to &.)

Summary: Options|Format|Plaintext is not converting contents of HTML compose message to → Options|Format|Plaintext is not converting contents of HTMLcompose message to

scottputterman

Comment 50

•

24 years ago

reassigning to ducarroz and marking nsbeta1+

Whiteboard: [nsbeta1-] → [nsbeta1+]

scottputterman

Comment 51

•

24 years ago

forgot to reassign

Assignee: sspitzer → ducarroz

varada

Comment 52

•

24 years ago

Re-assigning to varada

Assignee: ducarroz → varada

varada

Comment 53

•

24 years ago

CCing harishd and jst.

Johnny Stenback (:jst)

Comment 54

•

24 years ago

AFAICT (from looking at the code and discussing this with harishd and varada) this problem is in the HTML serializer (and is visible in the HTML editor as well), the seiralizer is converting '&'s in anchor href attributes into '&amp'. Attributes that are url's need to be special cased for HTML to not do the entity expansion since that's just the way they work. Reassigning to anthonyd, fixing this sohuld be fairly straight forward in the HTML serializer. Also cc:ing nhotta since he's working on related changes to getting the characterset of href's correct, or something.

Assignee: varada → anthonyd

nhottanscp

Comment 55

•

24 years ago

I am changing the HTML serializer for bug 74137 to escape URI. For URI attributes like 'href' and 'src', do a charset conversion (if not ASCII) then apply URI encode (%xx). I think '&' will not be URL escaped so it still needs a special handling for that case separately from bug 74137.

Akkana Peck

Comment 56

•

24 years ago

Johnny: did you read my comment of 2001-03-08 14:35 ? This does not happen when we go straight from the editor to the plaintext serializer, only when we go through mailnews to the html serializer, then re-parse, after which the extra characters are already in the dom and the serializer can't do anything about that. That convinces me that the serializer is doing the right thing, and the problem happens when mail/news sends the contents back around to be re-parsed and then re-serialized. Are you seeing something I'm not? Have you come up with a way to reproduce this without going through mime code?

Johnny Stenback (:jst)

Comment 57

•

24 years ago

Akkana, start composer, click on the intsert link button, enter "foo" as the text to show in the link and type "http://foo.bar/foo.html&a=b" as the href and insert the link. Then save the file and look at what the serializer produced, you'll see <a href="http://foo.bar/foo.html&a=b">foo</a> when it should've been <a href="http://foo.bar/foo.html&a=b">foo</a>

Ben Bucksch (:BenB)

Comment 58

•

24 years ago

jst, I believe that the current behaviour you describe is correct. [Note for all: I am discussing that with jst in private mail, will log results here.] Anyways, I think you can reproduce this bug even without URLs (so changing the HTML serializer for URLs only won't fix this bug completely): Reproduction: 1. New mail 2. Insert Image 3. Enter as alternative text: "An image & a test". 4. Send as 'Plaintext and HTML'. 5. Recieve 6. View Source Actual result (similar to): [...] Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit ghkfguk [an image & a text] [...] Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit <html><head></head><body>ghkfguk<img src="cid:part1.03000002.09080505@netscape.com" alt="an image & a text" width="191" height="79"></body></html> [...] I.e. in both the plaintext part and the HTML part, an "&" appears for "&". Expected result: [...] Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit ghkfguk [an image & a text] [...] Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit <html><head></head><body>ghkfguk<img src="cid:part1.03000002.09080505@netscape.com" alt="an image & a text" width="191" height="79"></body></html> [...] I.e. in the HTML part, an "&" appears for "&", while in the plaintext part, a normal "&" appears. This shows that this bug has nothing to do with URL-encoding at all, although URLs are the worst and most common case.

Johnny Stenback (:jst)

Comment 59

•

24 years ago

Yeah, you're right, this is not specific to url attributes, it applies to all attributes, and my suggestion would only fix this for href attributes. My suggestion would fix the bug, bit it would make the serializer produce non-standard (non-well formed if you like) HTML, but all browsers out there deals with '&'s in href attributes correctly, I bet the common useage on the web is to not encode '&' in an URL as '&', but it's still wrong... So the problem here is probably that the serializer needs to expand entities in attribute values when it's driven by the contentsink API (whihc it really shouldn't be, but oh well), just like the HTML content sink does (HTMLContentSink::GetAttributeValueAt()), the parser is really the one that should be doing this, but for some odd reason it doesn't, it leaves it up to the sink to do it. So it seems like the real fix for this problem, in the long run, is to make mailnews use the plaintext serializer to generate the plaintext output directly from the DOM in stead of first serializing it to HTML and then doign a HTML to text conversion (which is an expensive operation). But in the short run the fix is to make the plaintext serializer expand entities in attribute values that are gotten from the parser.

rubydoo123

Comment 60

•

24 years ago

reassigning back to the mail team, as suggested back on 09-03-2001, the Format|Options|Plaintext option needs to do the correct thing

Assignee: anthonyd → ducarroz

Akkana Peck

Comment 61

•

24 years ago

Adding extra code to "fix" the serializer (which one, html or plaintext?) to do the wrong thing intentionally, to compensate for problems in the interaction between incorrect mail code mail and the parser caused by doing extra work that we know we shouldn't be doing, seems like the wrong approach. Please, let's not do that.

harishd

Assignee

Comment 62

•

24 years ago

It looks like the entity reduction should happen in the parser than in the sinks ( currently only the HTMLContentSink supports this ). So, moving over to the parser and will come up with a fix based on risk factors.

Assignee: ducarroz → harishd

Component: Mail Window Front End → Parser

Product: MailNews → Browser

Johnny Stenback (:jst)

Comment 63

•

24 years ago

Akkana, I wasn't suggesting we'd add code to the serializers (plaintext) to do the wrong thing, what I suggested was to add code to the plaintext serializer to make it work correctly in the current model (that we've lived with for 3 years now, but the model is IMO wrong) where the parser doesn't expand entities in attribute values. To fix the plaintext serializer we wouldn't even need to write any new code, we could use the code in the HTML content sink. I'd feel better about adding the code to the plaintext serializer for now and not change the parser since that change can and will change the interaction with the HTML sink so we could screw up normal parsing, who knows what will regress, and we're getting close to a critical milestone. Harish is looking into changing the parser to do the right thing, and if that change doesn't look to scary then I'm ok with that, but if that gets complicated then I'd say we should fix the plaintext serializer (which is trivial). In either case, this is not a mailnews problem, they might be using an API to the serializer that they shouldn't be using but since that API exists, it should work correctly.

clayton

Comment 64

•

24 years ago

Adding nsbeta1+ in keywords

Keywords: nsbeta1 → nsbeta1+

harishd

Assignee

Comment 65

•

24 years ago

Attached patch Patch v1.1 ( Note: This is not the final patch ) — Details — Splinter Review

harishd

Assignee

Comment 66

•

24 years ago

Attached patch Patch v1.2 — Details — Splinter Review

harishd

Assignee

Comment 67

•

24 years ago

I have moved the attribute entity parsing code from the sinks to the parser. The patch also covers the issue raised by dbaron in bug 63188 ( where the parameter aMode was being used inconsistently ). This change in addition to fixing the bug should also enhance performance, a little bit ( sorry I don't have the numbers ..yet), since it completely avoids attribute parsing in the sink.

Status: NEW → ASSIGNED

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 68

•

24 years ago

You could add a little whistespace to make things more readable, for example on these lines: mFlags |= (aCommand==eViewSource)? NS_IPARSER_FLAG_VIEW_SOURCE:NS_IPARSER_FLAG_VIEW_NORMAL; isUsableAttr=!(mFlags & (NS_IPARSER_FLAG_STRICT_MODE|NS_IPARSER_FLAG_TRANSITIONAL_MODE)); This comment does not make sense: //Enabling strict comment parsing for Bug 53011 and 2749 contradicts!!!! In ConsumeAttributeEntity() you have two declarations for theNCRValue. Easiest would be to just use the outer, I guess. There are some other things in that function I do not really like, but since they were copied from the old implementation they can stay unless you want to do extra work (cast of PRInt32 to PRUnichar, ignoring error value from ToInteger). Also I have trouble reading this: + // Nav 4.x does not support entities greater than 255 however IE + // does support them if terminated with semicolon. Let's do the same... + // ...atleast in the quirks mode. + if (-1!=theNCRValue && + (theNCRValue<255 || !(aFlag & NS_IPARSER_FLAG_QUIRKS_MODE) || theTermChar == ';')) { + aString.Append(PRUnichar(theNCRValue)); To me the comment and code don't seem to match, and I think you changed how it works from the old code. How would this work: + if ( (theNCRValue > 255) && (theTermChar == ';') && (aFlag & NS_IPARSER_FLAG_QUIRKS_MODE) ) { + aString.Append(PRUnichar(theNCRValue));

Ben Bucksch (:BenB)

Comment 69

•

24 years ago

> // Nav 4.x does not support entities greater than 255 however IE > // does support them if terminated with semicolon. Let's do the same... > // ...atleast in the quirks mode. HTML 4.0 spec <http://www.w3.org/TR/REC-html40/charset.html#h-5.3>: > Note. In SGML [, on which HTML is based], it is possible to eliminate the > final ";" after a character reference in some cases (e.g., at a line break > or immediately before a tag). In other circumstances it may not be > eliminated (e.g., in the middle of a word). We strongly suggest using > the ";" in all cases to avoid problems with user agents that require this > character to be present.

harishd

Assignee

Comment 70

•

24 years ago

Heikki, I don't think that the logic is incorrect but will try to simplify it further so that it can be more readable.

harishd

Assignee

Comment 71

•

24 years ago

Ben: Are you suggesting that we should always look for ";", for entities greater than 255, whether the mode is strict or not? According to the spec., from what I understand, the semicolon is not a MUST but strongly recommends using it. So, according to my comment, mozilla will not care for a semicolon, for entity values greater than 255, in strict mode. That is, in strict mode, the entities would get reduced irrespective of a semicolon.

Ben Bucksch (:BenB)

Comment 72

•

24 years ago

harishd, I don't know enough about the actual web (pages) to make a good suggestion, but I would try to allow what SGML allows, at least in strict mode. The "recommondation" is for web authors to guard them from non-SGML-complying user agents (what Mozilla would be with the logic in your patch, if I interpret it correctly).

harishd

Assignee

Comment 73

•

24 years ago

Attached patch Patch v1.3 [ addressing heikki's concerns. r=heikki ] — Details — Splinter Review

harishd

Assignee

Comment 74

•

24 years ago

Attached patch patch v1.4 [ ignore my previous patch ] r=heikki — Details — Splinter Review

harishd

Assignee

Comment 75

•

24 years ago

Attached patch patch v1.5 [ forgot to include content changes. sorry! ] — Details — Splinter Review

Johnny Stenback (:jst)

Comment 76

•

24 years ago

sr=jst

harishd

Assignee

Comment 77

•

24 years ago

Fix is in. Marking FIXED.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago → 24 years ago

Resolution: --- → FIXED

esther

Comment 78

•

24 years ago

Using build 2001-05-22 on win, linux and mac this is fixed per original scenario. the & is not in the viewed msgs after doing a send page in html and or plain text. Side note: when viewing this page with linux I sometimes crash. We had crashing bugs in today's build with links and viewing msgs. I will check this page again tomorrow (with a fix) to see if it still crashes. If so, I will log a new bug. Verified.

Status: RESOLVED → VERIFIED

Patch v1.1 ( Note: This is not the final patch ) 24 years ago harishd 40.47 KB, patch		Details \| Diff \| Splinter Review
Patch v1.2 24 years ago harishd 39.83 KB, patch		Details \| Diff \| Splinter Review
Patch v1.3 [ addressing heikki's concerns. r=heikki ] 24 years ago harishd 32.28 KB, patch		Details \| Diff \| Splinter Review
patch v1.4 [ ignore my previous patch ] r=heikki 24 years ago harishd 31.29 KB, patch		Details \| Diff \| Splinter Review
patch v1.5 [ forgot to include content changes. sorry! ] 24 years ago harishd 40.24 KB, patch		Details \| Diff \| Splinter Review