Closed
Bug 1512647
Opened 6 years ago
Closed 6 years ago
Incomplete Unicode support for detecting *marked-up* words in plain text email
Categories
(Core :: Networking, enhancement, P5)
Tracking
RESOLVED
FIXED
mozilla66
| | Tracking | Status |
---|---|---|
firefox66 | --- | fixed |
People
(Reporter: jfkthame, Assigned: jfkthame)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(2 files)
6.28 KB, patch | valentin: review+
1.87 KB, patch | valentin: review+
+++ This bug was initially created as a clone of Bug #1505911 +++
Steps to reproduce:
Use Thunderbird 52.9.1 and mark up the Deseret-script string *𐐔𐐯𐑅𐐨𐑉𐐯𐐻* with asterisks.
Also try supplementary-plane Chinese _𠜎𠜱𠝹𠱓𠱸_ and /áçčẻñțëḍ/ Latin using combining marks (rather than precomposed characters).
Actual results:
When a plain text message containing these strings is sent, Thunderbird does not apply the expected bold/underline/italic formatting.
Expected results:
These strings should be formatted just as similarly-marked ASCII text would be.
In bug 1505911, mozTXTToHTMLConv.cpp was patched to extend the definition of "alphabetic" characters used when scanning for such marked strings. However, the Unicode support is still incomplete, as it just looks at individual char16_t code units. Therefore, it fails to recognize supplementary-plane characters that should be treated as letters.
It also doesn't handle combining marks; they are regarded as non-letters, rather than being treated as part of a single cluster that has the category of its base character.
We can improve the behavior here if we iterate over the text using mozilla::unicode::ClusterIterator, and check for surrogate pairs when looking up character types.
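To illustrate the two issues described above, here is a minimal standalone sketch, not the actual patch. It decodes surrogate pairs into full code points and folds combining marks into the preceding base character. The names DecodeAt, IsCombiningMark and ClusterBases are hypothetical, and the combining-mark test is a deliberate simplification that only covers U+0300..U+036F; the real code would rely on mozilla::unicode::ClusterIterator and proper Unicode character properties instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode the code point that starts at text[i] and
// advance i past it. A high surrogate followed by a low surrogate yields one
// supplementary-plane code point; an unpaired surrogate is returned unchanged.
char32_t DecodeAt(const std::u16string& text, std::size_t& i) {
  char16_t hi = text[i++];
  if (hi >= 0xD800 && hi <= 0xDBFF && i < text.size()) {
    char16_t lo = text[i];
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      ++i;
      return 0x10000 + ((char32_t(hi) - 0xD800) << 10) +
             (char32_t(lo) - 0xDC00);
    }
  }
  return hi;
}

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as combining. Real code needs full Unicode
// property data.
bool IsCombiningMark(char32_t cp) { return cp >= 0x0300 && cp <= 0x036F; }

// Hypothetical helper: return the base code point of each cluster. Combining
// marks are absorbed into the preceding cluster, so a caller would classify
// the whole cluster by its base character.
std::vector<char32_t> ClusterBases(const std::u16string& text) {
  std::vector<char32_t> bases;
  std::size_t i = 0;
  while (i < text.size()) {
    char32_t cp = DecodeAt(text, i);
    if (IsCombiningMark(cp) && !bases.empty()) {
      continue;  // part of the previous cluster
    }
    bases.push_back(cp);
  }
  return bases;
}
```

With this approach, a Deseret letter that occupies two char16_t code units is seen as one character, and "a" followed by U+0301 is seen as a single cluster whose base is the letter "a".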
Assignee
Updated•6 years ago
Mentor: valentin.gosu
Assignee
Comment 1•6 years ago
This should fix the examples mentioned, afaict.
Attachment #9029974 - Flags: review?(valentin.gosu)
Assignee
Updated•6 years ago
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Assignee
Comment 2•6 years ago
Attachment #9029975 - Flags: review?(valentin.gosu)
Updated•6 years ago
Attachment #9029975 - Flags: review?(valentin.gosu) → review+
Updated•6 years ago
Whiteboard: [necko-triaged]
Updated•6 years ago
Attachment #9029974 - Flags: review?(valentin.gosu) → review+
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8097285ea1c0
Use unicode::ClusterIterator and decode surrogates when scanning text for marked-up words. r=valentin
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb44497f6c0
Add mozTXTToHTMLConv testcases including supplementary-plane letters and combining marks. r=valentin
Comment 4•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8097285ea1c0
https://hg.mozilla.org/mozilla-central/rev/1bb44497f6c0
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox66: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla66