Open
Bug 1133691
Opened 10 years ago
Updated 3 years ago
&times in HTML text is interpreted as x, even though it's &timestamp... in HTML text instead of &times;
Categories
(MailNews Core :: Backend, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: donauinsel, Unassigned)
Details
User Agent: Mozilla/5.0 (Windows NT 6.2; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150212154903
Steps to reproduce:
Sending an email with a link like ..id=T000000445&timestamp_sent=-.... in HTML only results in the &times being decoded as x.
Because Outlook does not translate the &times in the URL (maybe depending on version), I assume that Thunderbird does so for some reason.
Actual results:
..id=T000000445&timestamp_sent=-.... breaks the URL into id=T000000445×tamp_sent=
Expected results:
but it should look like _id=T000000445&stamp_sent=11
Reporter
Comment 1•10 years ago
Correct result should be: _id=T000000445&timestamp_sent=11
Reporter
Comment 2•10 years ago
Btw. the HTML link in the received message is not properly formatted (not enclosed in double quotes), so maybe the URL recognition does not work perfectly.
Reporter
Comment 3•10 years ago
Edit again: The sender placed a link into the HTML part of the message without using the correct HTML link syntax (<a href= .... > ....); only the "plain" URL is placed somewhere in the HTML part. Part of this link contains the &timestamp= variable, which should redirect the user to the URL. Instead, the &times is displayed as x because it is "wrongly" recognized as the encoded character &times.
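A minimal Python sketch of the behaviour described in this comment, assuming Python's standard `html.unescape` (which applies the HTML5 named-character-reference rules for text content) is a fair stand-in for how the plain URL in the message body gets decoded. The fragment is the one from Comment 1; the variable names are illustrative only.

```python
import html

# Bare URL text as it sits in the HTML part: not inside <a href>, "&" not escaped.
raw = "id=T000000445&timestamp_sent=11"

# Assumption: html.unescape mirrors text-content decoding. The longest legacy
# name "times" (no ";") is matched and replaced by U+00D7, leaving "tamp...".
decoded = html.unescape(raw)
print(decoded)  # id=T000000445×tamp_sent=11

# A sender that escapes "&" as "&amp;" avoids the problem: the text round-trips.
escaped = html.escape(raw)
print(escaped)  # id=T000000445&amp;timestamp_sent=11
print(html.unescape(escaped) == raw)  # True
```

Escaping "&" as "&amp;" on the sending side is what keeps the parameter name intact, as the last line shows.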
Reporter
Updated•10 years ago
Component: Message Reader UI → Filters
Comment 4•10 years ago
&times; is the HTML entity name for U+00D7.
http://en.wikipedia.org/wiki/Multiplication_sign
http://www.fileformat.info/info/unicode/char/d7/index.htm
HTML5 definition of "named-character-references"
http://www.w3.org/TR/html5/syntax.html#named-character-references
Both "&times;" and "&times" appear in the definition.
times; U+000D7 ×
times U+000D7 ×
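Python ships a copy of the HTML5 named character reference table as `html.entities.html5`. Assuming it mirrors the current WHATWG table, a quick lookup confirms both rows quoted above:

```python
from html.entities import html5

# Both spellings map to U+00D7 MULTIPLICATION SIGN.
times_semicolon = html5["times;"]
times_legacy = html5["times"]  # legacy form without ";", kept for old content
print(times_semicolon == times_legacy == "\u00d7")  # True

# Only a limited set of older names also have a semicolon-less form.
print("hearts;" in html5, "hearts" in html5)  # True False
```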
The Thunderbird 31.5.0 composer generated the following.
In the Subject header (placed as-is, because Subject: != HTML):
Subject: &times; &times; &times;
In the HTML body, Tb 31.5.0 generated "&amp;" (the HTML entity name for "&") for the "&" in the typed/pasted string "&times;":
<a href="http://www,google.com?id=&amp;times;AAA,&amp;timetamp=BBB">&amp;times;
&amp;times; &amp;times;</a><br>
<br>
If the HTML source in the text/html part is manually changed to the following,
<br>
<a href="http://www,google.com?id=&times;AAA">&times;AAA &times;BBB &times;CCC</a><br>
<br>
Tb 31.5.0 showed the link as:
×AAA ×BBB ×CCC
"Copy link location" on this link gives the following (UTF-8 of U+00D7 == 0xC3 0x97):
http://www,google.com/?id=%C3%97AAA
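The percent-encoded value follows directly from UTF-8. A small Python check, using only `str.encode` and `urllib.parse.quote` from the standard library:

```python
from urllib.parse import quote

# U+00D7 is two bytes in UTF-8.
utf8_bytes = "\u00d7".encode("utf-8")
print(utf8_bytes)  # b'\xc3\x97'

# Percent-encoding those bytes gives the query value seen in "Copy link location".
query_value = quote("\u00d7AAA")
print(query_value)  # %C3%97AAA
```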
I could only observe the following:
The Tb composer doesn't support typing "&times;" as an HTML character entity in the HTML message body; Tb replaces "&" with "&amp;".
In HTML mail display, Tb correctly shows the HTML character entity "&times;" (as ×).
I don't know about "times" (no semicolon) in the table of the HTML5 specification (written as &times in HTML source).
To the bug opener:
What is wrong with Tb in this bug?
Who (which mailer) generated the HTML mail?
What is the actual HTML source of the mail?
Comment 5•10 years ago
Gotcha!
HTML source.
<br>
<a href="http://www,google.com?id=×stampAAA">×stampAAA ×stampBBB ×stampCCC</a><br>
<br>
Link display by Tb.
×stampAAA ×stampBBB ×stampCCC
"Copy link location" by Tb.
http://www,google.com/?id=×stampAAA
What is the meaning of "times U+000D7 ×" in the table at http://www.w3.org/TR/html5/syntax.html#named-character-references?
Is "times", with no delimiting character (such as ";", a space, or the symbols defined only in SGML), valid as a character entity name?
IIRC, HTML 5 uses the rule "start with &, end with ;" for character entities...
Comment 6•10 years ago
Firefox 36.0 also showed:
×stampAAA ×stampBBB ×stampCCC
Component: Filters → Backend
Product: Thunderbird → MailNews Core
Updated•10 years ago
Version: 37 → 35
Updated•10 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 8 → All
Hardware: x86 → All
Version: 35 → 31
Updated•10 years ago
Summary: &times in HTML URL interpreted as x → &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times;
Comment 7•10 years ago
Description of character reference parsing in http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references:
Anything else
Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner).
If no match can be made, then no characters are consumed, and nothing is returned. In this case, if the characters after the U+0026 AMPERSAND character (&) consist of a sequence of one or more alphanumeric ASCII characters followed by a U+003B SEMICOLON character (;), then this is a parse error.
If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a "=" (U+003D) character, then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.
Otherwise, a character reference is parsed. If the last character matched is not a ";" (U+003B) character, there is a parse error.
Return one or two character tokens for the character(s) corresponding to the character reference name (as given by the second column of the named character references table).
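The steps quoted above can be sketched in Python. This is a simplified illustration, not Gecko's tokenizer: `decode_named_refs` is a hypothetical helper, it handles named references only (no `&#...;`), it does not report parse errors, and it uses Python's `html.entities.html5` as the named character references table.

```python
import re
from html.entities import html5

# Candidate name after "&": ASCII letters/digits, optionally ending in ";".
_CANDIDATE = re.compile(r"&([A-Za-z0-9]+;?)")


def decode_named_refs(s: str, in_attribute: bool) -> str:
    """Simplified sketch of the "Anything else" steps for named references.

    Assumptions: numeric references are not handled, parse errors are not
    reported, and html.entities.html5 stands in for the spec's table.
    """
    out = []
    pos = 0
    for m in _CANDIDATE.finditer(s):
        cand = m.group(1)
        # Consume the maximum number of characters that match a table entry.
        match = next(
            (cand[:n] for n in range(len(cand), 0, -1) if cand[:n] in html5),
            None,
        )
        if match is None:
            continue  # no match: nothing consumed, "&" stays as text
        end = m.start() + 1 + len(match)  # index just after the matched name
        nxt = s[end:end + 1]
        if (in_attribute and not match.endswith(";")
                and (nxt == "=" or (nxt.isascii() and nxt.isalnum()))):
            continue  # historical rule: leave "&name..." unconsumed
        out.append(s[pos:m.start()])
        out.append(html5[match])
        pos = end
    out.append(s[pos:])
    return "".join(out)


print(decode_named_refs("id=T000000445&timestamp_sent=11", in_attribute=False))
# id=T000000445×tamp_sent=11
print(decode_named_refs("id=T000000445&timestamp_sent=11", in_attribute=True))
# id=T000000445&timestamp_sent=11
```

With the reporter's fragment, text content yields "×tamp_sent=11", while the same characters inside an attribute value are left as "&timestamp_sent=11": "times" is not followed by ";" and the next character, "t", is alphanumeric.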
Comment 8•10 years ago
Character entities up to HTML 4 caused a lot of confusion because they were based on SGML.
IIUC, that is why HTML 5 stopped being "based on SGML" and introduced the rule "start with &, end with ;".
https://mathiasbynens.be/notes/ambiguous-ampersands
Why is "&timestamp" === "&times;tamp"?
How about "&times;;tamp"? "x;stamp"? Or "xstamp"?
Historical reasons? Or practical reasons? (Are "&times [SP]", "&times,", "&times/" etc. widely used?)
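For the text-content cases listed in this comment, Python's `html.unescape` (assumed here to follow the same HTML5 longest-match rule as a browser's text decoding) gives the following. It only shows what the rule produces, not why it was chosen:

```python
import html

# Assumption: html.unescape behaves like HTML5 text-content decoding
# (it has no special case for attribute values).
samples = ["&times b", "&times,", "&times/", "&times;;tamp", "&timestamp"]
results = {s: html.unescape(s) for s in samples}

for s, r in results.items():
    print(f"{s!r} -> {r!r}")
# '&times b' -> '× b'
# '&times,' -> '×,'
# '&times/' -> '×/'
# '&times;;tamp' -> '×;tamp'
# '&timestamp' -> '×tamp'
```

A legacy name without ";" is decoded whenever it is the longest match, whatever character follows it; in "&times;;tamp" only the first, complete reference is decoded.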
Updated•10 years ago
Summary: &times in HTML text is interpreted as x, even though it's &timestamp... in HRML text instead of &times; → &times in HTML text is interpreted as x, even though it's &timestamp... in HTML text instead of &times;
Comment 9•10 years ago
Table of named-character-references in 2011
http://www.w3.org/TR/2011/WD-html5-20110113/named-character-references.html
Only "&times;" is listed in the table; "&times" does not appear.
Another w3.org document:
http://dev.w3.org/html5/html-author/charref
Only "&times;" is listed in the table; "&times" does not appear.
Comment 10•10 years ago
Definition of character references in the HTML Standard:
https://html.spec.whatwg.org/#character-references
> HTML
> Living Standard — Last Updated 21 February 2015
>
> 12.1.4 Character references
> Named character references
> The ampersand must be followed by one of the names given in the named character references section, using the same case.
> The name must be one that is terminated by a U+003B SEMICOLON character (;).
This is the definition I knew.
"times" (with no trailing semicolon) in the current HTML 5 table violates this section 12.1.4 of the HTML Living Standard.
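The authoring rule quoted from section 12.1.4 can be checked mechanically. A minimal sketch, assuming Python's `html.entities.html5` as the table; `missing_semicolon_refs` is a hypothetical lint helper, not part of Thunderbird or any library:

```python
import re
from html.entities import html5

_REF = re.compile(r"&([A-Za-z0-9]+)(;?)")


def missing_semicolon_refs(markup: str) -> list:
    """Hypothetical lint helper for the authoring rule quoted above.

    Returns the "&name" prefixes a parser would still decode even though the
    author did not write a complete, known "&name;" reference.
    """
    found = []
    for m in _REF.finditer(markup):
        name, semicolon = m.groups()
        if semicolon and name + ";" in html5:
            continue  # conforming reference such as "&times;"
        # Longest semicolon-less (legacy) table entry a parser could match.
        for n in range(len(name), 0, -1):
            if name[:n] in html5:
                found.append("&" + name[:n])
                break
    return found


print(missing_semicolon_refs("id=T000000445&timestamp_sent=11"))  # ['&times']
print(missing_semicolon_refs("id=&times;AAA"))  # []
```

A mailer running such a check on outgoing HTML would flag the reporter's bare URL before it is sent.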
Updated•3 years ago
Severity: normal → S3