User Agent: Mozilla/5.0 (Windows NT 6.2; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150212154903

Steps to reproduce:
Sending an email with a link like ..id=T000000445&timestamp_sent=-.... in HTML only will result in encoding the &times as x (the multiplication sign ×). Because Outlook does not translate the &times in the URL (maybe depending on version), I assume that Thunderbird does so for some reason.

Actual results:
..id=T000000445&timestamp_sent=-.... breaks the URL to id=T000000445×tamp_sent=

Expected results:
but it should look like _id=T000000445&stamp_sent=11
The correct result should be: _id=T000000445&timestamp_sent=11
Btw. the html link in the receiving message is not properly formatted (not enclosed by double quotes) so maybe the url recognition does not work perfectly.
Edit again: The sender placed a link into the HTML part of the message without using the correct HTML syntax for links, beginning with <a href= .... > ....; only the "plain" URL is placed somewhere in the HTML part. A part of this link contains the &timestamp= variable, which should redirect a user to the URL. Instead, the &times is being displayed as x because of "wrong" recognition as the encoded character &times;
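The effect described above can be reproduced outside a mail client. The following is a minimal sketch, assuming Python's standard `html` module as a stand-in for an HTML5-style text decoder; the host `example.com` and the variable names are hypothetical and not taken from the reported message. It shows that a raw `&timestamp_sent` in HTML text decodes to `×tamp_sent`, while the same URL with the ampersand written as `&amp;` survives a round trip.

```python
import html

# Hypothetical URL modelled on the reporter's fragment; the host is an assumption.
url = "http://example.com/?_id=T000000445&timestamp_sent=11"

# What an HTML5-style decoder makes of the raw URL when it sits in HTML text:
# "&times" is matched as a legacy named reference even without a semicolon.
broken = html.unescape(url)

# What a sender should place into the HTML source: "&" written as "&amp;".
escaped = html.escape(url)

# Decoding the properly escaped form restores the original URL.
restored = html.unescape(escaped)

print(broken)
print(escaped)
print(restored)
```

If this sketch matches what the sending mailer did, the fix belongs on the sending side: the HTML source should carry `&amp;timestamp_sent`, not `&timestamp_sent`.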
&times; is the HTML entity name for U+00D7.
http://en.wikipedia.org/wiki/Multiplication_sign
http://www.fileformat.info/info/unicode/char/d7/index.htm
HTML5 definition of "named-character-references":
http://www.w3.org/TR/html5/syntax.html#named-character-references
Both "&times;" and "&times" are seen in the definition.
times; U+000D7 ×
times U+000D7 ×

The composer of Thunderbird 31.5.0 generated the following.
In the Subject header: placed as-is, because Subject: != HTML.
Subject: &times; &times; &times;
In the HTML body: Tb 31.5.0 generated the HTML entity name for "&" (== "&amp;") for the letter "&" in the typed/pasted string "&times;".
<a href="http://www,google.com?id=&amp;times;AAA,&amp;timetamp=BBB">&amp;times; &amp;times; &amp;times;</a><br> <br>

If the HTML source in the text/html part is manually changed to the following,
<br> <a href="http://www,google.com?id=&times;AAA">&times;AAA &times;BBB &times;CCC</a><br> <br>
Tb 31.5.0 showed the link as:
×AAA ×BBB ×CCC
"Copy link location" of this link (UTF-8 of U+00D7 == 0xC397):
http://www,google.com/?id=%C3%97AAA

I could see the following only:
The composer of Tb doesn't support typing "&times;" as an HTML character entity in the HTML message body. Tb replaces "&" by "&amp;".
In HTML mail display, Tb correctly shows the HTML character entity represented by "&times;".
I don't know about "times" (no semicolon) in the table of the HTML5 specification (represented as &times in HTML source).

To the bug opener: What is wrong with Tb in this bug? Who (what mailer) generated the HTML mail? What is the actual HTML source in the mail?
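The byte values quoted in this comment can be checked with a short sketch, assuming Python's standard library only. It decodes `&times;AAA` the way an HTML decoder would, then UTF-8 encodes and percent-encodes the result, which should line up with the "Copy link location" string above. The variable names are illustrative.

```python
import html
from urllib.parse import quote

# The query value as written in the manually edited HTML source.
source_value = "&times;AAA"

# Step 1: the HTML decoder turns the named reference into U+00D7.
decoded = html.unescape(source_value)

# Step 2: U+00D7 is two bytes in UTF-8.
utf8_bytes = "\u00d7".encode("utf-8")

# Step 3: percent-encoding the decoded value for use in a URL.
percent_encoded = quote(decoded)

print(decoded, utf8_bytes.hex().upper(), percent_encoded)
```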
Gotcha!
HTML source:
<br> <a href="http://www,google.com?id=&timestampAAA">&timestampAAA &timestampBBB &timestampCCC</a><br> <br>
Link display by Tb:
×stampAAA ×stampBBB ×stampCCC
"Copy link location" by Tb:
http://www,google.com/?id=×stampAAA

What is the meaning of "times U+000D7 ×" in the table of http://www.w3.org/TR/html5/syntax.html#named-character-references? With "times", is there no delimiting character for the character entity name, such as ";", a space, or the symbols which are defined in SGML only? IIRC, HTML 5 used the rule of "start with &, end with ;" for character entities...
Firefox 36.0 also showed: ×stampAAA ×stampBBB ×stampCCC
Component: Filters → Backend
Product: Thunderbird → MailNews Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 8 → All
Hardware: x86 → All
Version: 35 → 31
Summary: &times in HTML URL interpreted as x → &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times;
Description of the parsing of character references in http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references

> Anything else
> Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner).
> If no match can be made, then no characters are consumed, and nothing is returned. In this case, if the characters after the U+0026 AMPERSAND character (&) consist of a sequence of one or more alphanumeric ASCII characters followed by a U+003B SEMICOLON character (;), then this is a parse error.
> If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a "=" (U+003D) character, then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.
> Otherwise, a character reference is parsed. If the last character matched is not a ";" (U+003B) character, there is a parse error.
> Return one or two character tokens for the character(s) corresponding to the character reference name (as given by the second column of the named character references table).
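The quoted rule can be sketched in a few lines. This is a simplified illustration, not Gecko's or any browser's implementation: it assumes Python's `html.entities.html5` table as the "named character references table", handles named references only (numeric references such as `&#215;` are left untouched), and ignores parse-error reporting. The function names `consume_named_reference` and `decode` are hypothetical.

```python
from html.entities import html5  # keys include both "times;" and "times"

# Longest identifier in the table, used to bound the search window.
MAX_NAME_LEN = max(len(name) for name in html5)


def consume_named_reference(text, pos, in_attribute=False):
    """text[pos] must be '&'. Return (replacement, new_pos).

    replacement is None when nothing is consumed.
    """
    start = pos + 1
    # "Consume the maximum number of characters possible": try longest first.
    for end in range(min(len(text), start + MAX_NAME_LEN), start, -1):
        name = text[start:end]
        if name in html5:
            if in_attribute and not name.endswith(";"):
                nxt = text[end:end + 1]
                # Historical exception for attribute values: unconsume.
                if nxt == "=" or (nxt.isascii() and nxt.isalnum()):
                    return None, pos + 1
            return html5[name], end
    return None, pos + 1


def decode(text, in_attribute=False):
    """Replace named character references in text, leaving the rest as-is."""
    out = []
    pos = 0
    while pos < len(text):
        if text[pos] == "&":
            replacement, new_pos = consume_named_reference(text, pos, in_attribute)
            out.append(replacement if replacement is not None else "&")
            pos = new_pos
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)


in_text = decode("id=T000000445&timestamp_sent=11")
in_attr = decode("id=T000000445&timestamp_sent=11", in_attribute=True)
print(in_text)
print(in_attr)
```

Under these assumptions the same string decodes differently by context: in text, `&times` is matched and the remainder `tamp_sent=11` follows the ×; inside an attribute value, the match is unconsumed because the next character is alphanumeric, so the URL stays intact. That would be consistent with the reporter's observation that the link was a plain URL in the HTML text, not an `href` value.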
Character entities up to HTML 4 produced many confusions because they are based on SGML. IIUC, that is the reason why "based on SGML" was stopped in HTML 5 and "start with &, end with ;" was introduced. https://mathiasbynens.be/notes/ambiguous-ampersands
Why "&timestamp" === "&times;tamp"? How about "&times;;tamp"? "x;stamp"? Or "xstamp"? Historical reason? Or practical reason? (Are "&times[SP]", "&times,", "&times/" etc. pretty widely used?)
Summary: &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times; → &times in HTML text is interpreted as x, even though it's &timestamp... in HTML text instead of &times;
Table of named-character-references in 2011: http://www.w3.org/TR/2011/WD-html5-20110113/named-character-references.html
Only "&times;" is written in the table. "&times" is not seen.
Another document of w3.org: http://dev.w3.org/html5/html-author/charref
Only "&times;" is written in the table. "&times" is not seen.
Definition of character-references in the HTML Standard: https://html.spec.whatwg.org/#character-references
> HTML
> Living Standard, Last Updated 21 February 2015
>
> 12.1.4 Character references
> Named character references
> The ampersand must be followed by one of the names given in the named character references section, using the same case.
> The name must be one that is terminated by a U+003B SEMICOLON character (;).
This is the definition I knew. "times" (no semicolon at the end) in the current table of HTML 5 is a violation of this "12.1.4" of the HTML Living Standard.
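To see which names the table accepts without a trailing semicolon, the following sketch lists them. It assumes that Python's `html.entities.html5` mirrors the HTML5 named character references table, which is an assumption about the Python standard library rather than something stated in this bug. The names `legacy_names` and `strict_names` are illustrative.

```python
from html.entities import html5

# Names that appear in the table without a terminating ";".
legacy_names = sorted(name for name in html5 if not name.endswith(";"))

# Names that appear with a terminating ";".
strict_names = {name for name in html5 if name.endswith(";")}

# Every semicolon-less name also has a semicolon-terminated twin.
all_have_twin = all(name + ";" in strict_names for name in legacy_names)

print("times" in legacy_names, "times;" in strict_names, all_have_twin)
```

One possible reading, offered here only as a hedged interpretation: section 12.1.4 describes what conforming documents must write, while the semicolon-less rows in the table describe what parsers must still accept for compatibility, so the two need not contradict each other.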