Closed Bug 1512647 Opened 6 years ago Closed 6 years ago

Incomplete Unicode support for detecting *marked-up* words in plain text email

Categories

(Core :: Networking, enhancement, P5)

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

Details

(Whiteboard: [necko-triaged])

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1505911 +++

Steps to reproduce:
Use Thunderbird 52.9.1 and mark up the Deseret-script string *𐐔𐐯𐑅𐐨𐑉𐐯𐐻* with asterisks. Also try supplementary-plane Chinese _𠜎𠜱𠝹𠱓𠱸_ and /áçčẻñțëḍ/ Latin using combining marks (rather than precomposed characters).

Actual results:
When a plain text message containing these strings is sent, Thunderbird does not apply the expected bold/underline/italic formatting.

Expected results:
These strings should be formatted just as similarly-marked ASCII text would be.

In bug 1505911, mozTXTToHTMLConv.cpp was patched to extend the definition of "alphabetic" characters used when scanning for such marked strings. However, the Unicode support is still incomplete, because the scanner only looks at individual char16_t code units. As a result, it fails to recognize supplementary-plane characters that should be treated as letters, since each of these is encoded as a surrogate pair of two code units. It also doesn't handle combining marks; they are regarded as non-letters, rather than being treated as part of a single cluster that has the category of its base character.

We can improve the behavior here if we iterate over the text using mozilla::unicode::ClusterIterator, and check for surrogate pairs when looking up character types.
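To illustrate the difference between per-code-unit and per-cluster scanning, here is a minimal standalone sketch. It is not the actual patch and does not use mozilla::unicode::ClusterIterator; all function names (DecodeAt, IsLetter, IsCombiningMark, NaiveAllLetters, AllClustersAreLetters) are hypothetical, and the "letter" and "combining mark" tests are toy range checks covering only the scripts in the examples above, where real code would consult Unicode general-category data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the code point starting at index i and advance i past it.
// A valid high+low surrogate pair is combined into one supplementary-plane
// code point; an unpaired surrogate is returned unchanged.
char32_t DecodeAt(const std::u16string& s, std::size_t& i) {
  char16_t hi = s[i++];
  if (hi >= 0xD800 && hi <= 0xDBFF && i < s.size()) {
    char16_t lo = s[i];
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      ++i;
      return 0x10000 + ((char32_t(hi) - 0xD800) << 10) +
             (char32_t(lo) - 0xDC00);
    }
  }
  return hi;
}

// Toy classification (assumption): only the Combining Diacritical Marks block.
bool IsCombiningMark(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Toy classification (assumption): a few letter ranges, not real Unicode data.
bool IsLetter(char32_t c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= 0x00C0 && c <= 0x024F && c != 0x00D7 && c != 0x00F7) ||
         (c >= 0x1E00 && c <= 0x1EFF) ||    // Latin Extended Additional
         (c >= 0x10400 && c <= 0x1044F) ||  // Deseret
         (c >= 0x20000 && c <= 0x2A6DF);    // CJK Extension B
}

// The incomplete approach: classify each UTF-16 code unit on its own.
bool NaiveAllLetters(const std::u16string& s) {
  if (s.empty()) return false;
  for (char16_t unit : s) {
    if (!IsLetter(unit)) return false;
  }
  return true;
}

// Cluster-aware approach: decode surrogate pairs, then let each cluster
// (base character plus trailing combining marks) take the category of
// its base character.
bool AllClustersAreLetters(const std::u16string& s) {
  std::size_t i = 0;
  bool sawCluster = false;
  while (i < s.size()) {
    char32_t base = DecodeAt(s, i);
    if (!IsLetter(base)) return false;
    // Absorb any combining marks that follow the base character.
    while (i < s.size()) {
      std::size_t next = i;
      if (!IsCombiningMark(DecodeAt(s, next))) break;
      i = next;
    }
    sawCluster = true;
  }
  return sawCluster;
}
```

With these definitions, NaiveAllLetters rejects both a Deseret letter such as u"\U00010414" (its two surrogate code units are not letters individually) and u"e\u0301" (the combining acute is a non-letter), while AllClustersAreLetters accepts both and still rejects text containing punctuation.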
Mentor: valentin.gosu
This should fix the examples mentioned, afaict.
Attachment #9029974 - Flags: review?(valentin.gosu)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Attachment #9029975 - Flags: review?(valentin.gosu) → review+
Whiteboard: [necko-triaged]
Attachment #9029974 - Flags: review?(valentin.gosu) → review+
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8097285ea1c0
Use unicode::ClusterIterator and decode surrogates when scanning text for marked-up words. r=valentin
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb44497f6c0
Add mozTXTToHTMLConv testcases including supplementary-plane letters and combining marks. r=valentin
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66