Closed Bug 352842 Opened 13 years ago Closed 13 years ago

Bad import of UTF-8 ICS file with non-ASCII characters

Categories

(Calendar :: Import and Export, defect)


Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mattwillis, Unassigned)

References

Details

(Keywords: dataloss, regression, Whiteboard: [patch in hand])

Attachments

(3 files)

iCal.app's Birthdays calendar uses a right curved single quote (’) in the SUMMARY (e.g. "SUMMARY:Matthew Willis’s Birthday").

When this file (which opens properly in BBEdit as UTF-8, no BOM) is imported into Sunbird, the quote becomes mangled and is displayed as "AcAA", where the c is a cent sign, the first A has a tilde above it, and the other two have carets above them.

Something's wacky here.
Attached file ICS from iCal.app
The character in question, in HexEdit, from iCal's ICS:
E2 80 99

This matches the info at 
http://www.fileformat.info/info/unicode/char/2019/index.htm

The character in question, in HexEdit, from Sunbird's ICS:
C3 83 C2 A2 C3 82 C2 80 C3 82 C2 99 

We're adding C3 83 C2 to the first, and C3 82 C2 to the second and third, and the first byte is becoming A2 rather than E2.
Flags: blocking0.3+
Keywords: dataloss
(In reply to comment #3)
> The character in question, in HexEdit, from iCal's ICS:
> E2 80 99
> The character in question, in HexEdit, from Sunbird's ICS:
> C3 83 C2 A2 C3 82 C2 80 C3 82 C2 99 

These bytes are getting misinterpreted as ISO-8859-1 and converted to UTF-8, not once but twice:

ISO-8859-1              UTF-8
----------              -----
E2 80 99          -->   C3 A2 C2 80 C2 99
C3 A2 C2 80 C2 99 -->   C3 83 C2 A2 C3 82 C2 80 C3 82 C2 99
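
The two rows above can be reproduced with a short script. This is only an illustrative sketch in Python, not Sunbird code; the helper names misread_as_latin1 and hexdump are made up for the example. It takes the UTF-8 octets of U+2019, decodes them as ISO-8859-1 and re-encodes the result as UTF-8, then repeats the step once more.

```python
def misread_as_latin1(octets: bytes) -> bytes:
    """Decode octets as ISO-8859-1, then re-encode the result as UTF-8.

    Hypothetical helper: it imitates one round of the misinterpretation
    described in this bug, it is not the importer's actual code.
    """
    return octets.decode("latin-1").encode("utf-8")


def hexdump(octets: bytes) -> str:
    """Format octets the way a hex editor shows them, e.g. 'E2 80 99'."""
    return " ".join(f"{b:02X}" for b in octets)


# U+2019 RIGHT SINGLE QUOTATION MARK, as written by iCal.app
original = "\u2019".encode("utf-8")
once = misread_as_latin1(original)
twice = misread_as_latin1(once)

print(hexdump(original))  # E2 80 99
print(hexdump(once))      # C3 A2 C2 80 C2 99
print(hexdump(twice))     # C3 83 C2 A2 C3 82 C2 80 C3 82 C2 99
```

The last line matches the twelve bytes found in Sunbird's ICS, which is consistent with two rounds of misinterpretation.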
Attached patch use octet arrays
Somehow, using a string as an array of octets didn't really work. This patch makes the importer use only octet arrays until the data has been decoded from UTF-8 into a string.
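
The idea can be sketched as follows. This is a Python analogue under stated assumptions, not the patch itself (the real importer is Mozilla code, and the function names octets_as_chars and decode_once are invented for illustration). It contrasts treating each octet as a character with keeping the data as octets and decoding it from UTF-8 exactly once.

```python
def octets_as_chars(octets: bytes) -> str:
    """Hypothetical 'string as array of octets': one character per octet.

    Any later step that encodes this string as UTF-8 will expand every
    non-ASCII octet into two bytes, which is the mangling seen in this bug.
    """
    return "".join(chr(b) for b in octets)


def decode_once(octets: bytes) -> str:
    """Keep the data as octets until this single UTF-8 decode."""
    return octets.decode("utf-8")


# Octets as they would appear in the ICS file from iCal.app
raw = "SUMMARY:Matthew Willis\u2019s Birthday".encode("utf-8")

wrong = octets_as_chars(raw)
right = decode_once(raw)

print(len(raw), len(wrong), len(right))
print(right)
```

With octets_as_chars the quote turns into three separate characters (one per octet), while decode_once yields the single character U+2019.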
Attachment #238908 - Flags: second-review?(dmose)
Attachment #238908 - Flags: first-review?(mattwillis)
Whiteboard: [needs review dmose]
Comment on attachment 238908 [details] [diff] [review]
use octet arrays

This fixed it on my birthdays calendar with the smart quotes.

r=lilmatt
Attachment #238908 - Flags: first-review?(mattwillis) → first-review+
regression from bug 315672
Keywords: regression
Whiteboard: [needs review dmose] → [patch in hand][needs review dmose]
*** Bug 352964 has been marked as a duplicate of this bug. ***
Updating summary to be more general
Blocks: UTF-import
Summary: Bad import of UTF-8 ICS file with smart-quotes from iCal.app → Bad import of UTF-8 ICS file with non-ASCII characters
Comment on attachment 238908 [details] [diff] [review]
use octet arrays

r=dmose
Attachment #238908 - Flags: second-review?(dmose) → second-review+
Whiteboard: [patch in hand][needs review dmose] → [patch in hand][needs checkin]
patch checked in
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
verified with
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060921 Calendar/0.3a2+
Status: RESOLVED → VERIFIED
Whiteboard: [patch in hand][needs checkin] → [patch in hand][litmus testcase wanted]
Litmus testcase 2693 created
Whiteboard: [patch in hand][litmus testcase wanted] → [patch in hand]