&times in HTML text is interpreted as x, even though it's &timestamp... in HTML text instead of &times;





(Reporter: Thomas Brunnthaler, Unassigned)


3 years ago
User Agent: Mozilla/5.0 (Windows NT 6.2; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150212154903

Steps to reproduce:

Sending an email with a link like ..id=T000000445&timestamp_sent=-.... in the HTML part only results in the &times being decoded as x.

Because Outlook does not translate the &times in the URL (possibly depending on the version), I assume that Thunderbird does so for some reason.

Actual results:

..id=T000000445&timestamp_sent=-.... breaks the URL into id=T000000445×tamp_sent=

Expected results:

It should look like _id=T000000445&stamp_sent=11

3 years ago
Correction: the expected result is _id=T000000445&timestamp_sent=11

3 years ago
By the way, the HTML link in the received message is not properly formatted (it is not enclosed in double quotes), so perhaps the URL recognition does not work perfectly.

3 years ago
Edit again: The sender placed the link into the HTML part of the message without using correct HTML link syntax (<a href= .... > .... ); only the "plain" URL appears somewhere in the HTML part. Part of this link contains the &timestamp= parameter, and the link should take the user to that URL. Instead, the &times is displayed as x because it is ("wrongly") recognized as the encoded character &times;.
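The behaviour described above can be reproduced outside Thunderbird. Python's `html.unescape` follows the HTML5 rules for character references in text content (this is only an illustration; it is not the parser Thunderbird uses). The second call shows that escaping the ampersand as `&amp;` in the sender's HTML would keep the URL intact:

```python
import html

link = "id=T000000445&timestamp_sent=11"
# In text content, the legacy name 'times' (no ';') is the longest table entry
# that prefixes 'timestamp_sent=11', so it is decoded and the rest stays literal.
print(html.unescape(link))  # id=T000000445×tamp_sent=11

# If the sender had written the '&' as '&amp;', the URL would survive decoding.
escaped_link = "id=T000000445&amp;timestamp_sent=11"
print(html.unescape(escaped_link))  # id=T000000445&timestamp_sent=11
```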


3 years ago
Component: Message Reader UI → Filters
&times; is the HTML entity name for U+00D7.

HTML5 definition of named character references:
Both "&times;" and "&times" appear in the definition.

times; 	U+000D7 	×
times 	U+000D7 	×
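Python's standard library ships a copy of this HTML5 table as `html.entities.html5`. Assuming that copy matches the W3C table referenced above, a quick lookup shows both spellings, and that "timestamp" itself is not an entry:

```python
from html.entities import html5  # HTML5 named character reference table

# Both the ';'-terminated and the legacy spelling map to U+00D7.
for name in ("times;", "times"):
    print(name, hex(ord(html5[name])), html5[name])

# 'timestamp' is not an entry; it merely starts with the entry 'times'.
print("timestamp" in html5)
```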

The composer of Thunderbird 31.5.0 generated the following:

In the Subject header, the text was placed as-is, because the Subject: header is not HTML:
    Subject: &times; &times;  &times;

In the HTML body, Tb 31.5.0 replaced each "&" in the typed/pasted string "&times;" with "&amp;", the HTML entity name for &:
    <a href="http://www,google.com?id=&amp;times;AAA,&amp;timetamp=BBB">&amp;times;
      &amp;times; &amp;times;</a><br>
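The composer output above is ordinary HTML escaping of "&". As an analogy only (this is Python's `html.escape`, not Thunderbird's code), the same substitution turns a typed "&times;" into text that displays literally rather than as ×:

```python
import html

typed = "&times;"
escaped = html.escape(typed)   # '&' becomes '&amp;'
print(escaped)                 # &amp;times;

# Decoding once restores the typed text, so the reader sees '&times;', not '×'.
print(html.unescape(escaped))  # &times;
```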

If the HTML source in the text/html part is manually changed to the following:

    <a href="http://www,google.com?id=&times;AAA">&times;AAA &times;BBB &times;CCC</a><br>

Tb 31.5.0 showed the link as:


"Copy link location" of this link: the UTF-8 encoding of U+00D7 is 0xC397.
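The byte value above can be checked directly. This sketch assumes the copied link location is percent-encoded the way a browser usually encodes non-ASCII characters in a URL:

```python
from urllib.parse import quote

sign = "\u00d7"                    # U+00D7 MULTIPLICATION SIGN
print(sign.encode("utf-8").hex())  # c397
print(quote(sign))                 # %C3%97, the form it takes inside a URL
```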


I could only observe the following:
    Tb's compose window doesn't support typing "&times;" as an HTML character entity in the HTML message body; Tb replaces "&" with "&amp;".
    In HTML mail display, Tb correctly renders the HTML character entity "&times;".
I don't know about "times" (no semicolon) in the HTML5 specification table (represented as &times in HTML source).

To the bug reporter:
What is Tb doing wrong in this bug?
Who (what mailer) generated the HTML mail?
What is the actual HTML source of the mail?
HTML source:
    <a href="http://www,google.com?id=&timesstampAAA">&timesstampAAA &timesstampBBB &timesstampCCC</a><br>
Link as displayed by Tb:
    ×stampAAA ×stampBBB ×stampCCC
"Copy link location" by Tb:

What is the meaning of "times U+000D7 ×" in the table at http://www.w3.org/TR/html5/syntax.html#named-character-references?
Is "times" allowed as a character entity name with no delimiting character, such as ";", a space, or the symbols that were defined only in SGML?
IIRC, HTML5 uses the rule "start with &, end with ;" for character entities...
Firefox 36.0 also showed:
   ×stampAAA ×stampBBB ×stampCCC
Component: Filters → Backend
Product: Thunderbird → MailNews Core
Version: 37 → 35
Ever confirmed: true
OS: Windows 8 → All
Hardware: x86 → All
Version: 35 → 31
Summary: &times in HTML URL interpreted as x → &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times;
Description of character reference parsing in http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references:

Anything else
 Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner).
 If no match can be made, then no characters are consumed, and nothing is returned. In this case, if the characters after the U+0026 AMPERSAND character (&) consist of a sequence of one or more alphanumeric ASCII characters followed by a U+003B SEMICOLON character (;), then this is a parse error.
 If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a "=" (U+003D) character, then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.
 Otherwise, a character reference is parsed. If the last character matched is not a ";" (U+003B) character, there is a parse error.
 Return one or two character tokens for the character(s) corresponding to the character reference name (as given by the second column of the named character references table).
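The quoted steps can be sketched as a small decoder. This is a simplified illustration, not Gecko's tokenizer: it handles only named references (no numeric `&#...;` forms), skips parse-error reporting, and uses Python's `html.entities.html5` as the named character references table. The function name `decode_named_refs` and its `in_attribute` flag are hypothetical, introduced here to show the text-versus-attribute difference:

```python
from html.entities import html5  # HTML5 named character reference table

LONGEST_NAME = max(len(name) for name in html5)


def decode_named_refs(text: str, in_attribute: bool = False) -> str:
    """Hypothetical sketch of HTML5 named character reference decoding."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Consume the maximum number of characters that match a table entry.
        match = None
        for length in range(min(LONGEST_NAME, len(text) - i - 1), 0, -1):
            candidate = text[i + 1 : i + 1 + length]
            if candidate in html5:
                match = candidate
                break
        if match is None:
            # No match: nothing is consumed; the '&' stays literal.
            out.append("&")
            i += 1
            continue
        after = i + 1 + len(match)
        next_char = text[after] if after < len(text) else ""
        if (in_attribute and not match.endswith(";")
                and (next_char == "=" or (next_char.isascii() and next_char.isalnum()))):
            # Historical rule for attributes: unconsume the match, keep '&'.
            out.append("&")
            i += 1
            continue
        out.append(html5[match])
        i = after
    return "".join(out)
```

For example, `decode_named_refs("id=T000000445&timestamp_sent=11")` yields `id=T000000445×tamp_sent=11` (the bare-text case in this bug), while the same string with `in_attribute=True`, as inside a quoted `href`, is returned unchanged because "times" is followed by the alphanumeric "t".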
Character entities up to HTML 4 caused a lot of confusion because they were based on SGML.
IIUC, that is why HTML5 stopped being based on SGML and introduced the "start with &, end with ;" rule.

Why "6timestamp" === "&times;tamp"?
How about "6times;;tamp"?  "x;stamp"? or "xstamp"?
Historical reason? Or practical reason? ("&times [SP]", "&times,", "&times/" etc. are pretty widely used?)
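One way to look at the historical-versus-practical question is to list which table entries lack the semicolon. A sketch using Python's copy of the HTML5 table (assumed to match the W3C table); the exact names and count it prints are not asserted here:

```python
from html.entities import html5

# Names in the table that are NOT terminated by ';' (the legacy spellings).
legacy = sorted(name for name in html5 if not name.endswith(";"))
print(len(legacy), "legacy names, e.g.:", legacy[:8])

# Check whether every legacy name also has a ';'-terminated form.
print(all(name + ";" in html5 for name in legacy))
```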
Summary: &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times; → &times in HTML text is interpreted as x, even though it's &timestamp... in HTML text instead of &times;
Table of named character references in 2011:
  Only "&times;" is listed in the table; "&times" does not appear.
Another w3.org document:
  Only "&times;" is listed in the table; "&times" does not appear.
Definition of character references in the HTML Standard:
> Living Standard — Last Updated 21 February 2015
> 12.1.4 Character references
> Named character references
>   The ampersand must be followed by one of the names given in the named character references section, using the same case.
>   The name must be one that is terminated by a U+003B SEMICOLON character (;).
This is the definition I knew.
"times" (with no trailing semicolon) in the current HTML5 table violates section 12.1.4 of the HTML Living Standard.
