Closed Bug 13246 Opened 25 years ago Closed 25 years ago

LDIF import to deal with UTF-8 data

Categories

(SeaMonkey :: MailNews: Address Book & Contacts, defect, P3)

All
Other
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: chuang)

Details

(Whiteboard: 4.5x is fixed. Need UI for 4.0x ldif file.)

** Observed with 9/2/99 Win32 build **

We need to be able to import LDIF files from 4.x, 4.5x and other clients
and show original non-ASCII data correctly.

Here's what needs to be done:

1. From 4.5x and later clients: assume LDIF files are in UTF-8 and
   use the proper procedure to display them.
   The above URL points to an LDIF file containing multi-lingual
   entries in UTF-8. If successfully imported, it should show
   names in 13-14 languages.

   To see what the entries should look like, you can look at the same
   entries with a 4.5x or later client at (type "*" as the search key):

   DS: polyglot.mcom.com
   port: 389
   search root: o=Airius.com

2. From 4.0x clients, if the user has changed the extension to
   .4ld(if), then this is a sign that the original data is not in
   UTF-8 but in one of the native encodings. In this case, we should
   do one of two things:

   A. Ask the user which encoding the file is in, and convert from
      the encoding supplied to an appropriate internal encoding.
   B. Assume that the encoding is the default for the current OS, e.g.
      assume Shift_JIS for Japanese Windows. I18n should have a
      locale-to-default-charset mapping function available.

   I guess B is the easier option since it would not involve UI
   additions.
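
Item 1 and option 2B together amount to a small charset-selection rule keyed on the file extension. A minimal Python sketch, assuming a hypothetical DEFAULT_CHARSET_BY_LOCALE table standing in for the I18n locale-to-default-charset mapping function (its real name and contents are not given here) and an assumed cp1252 fallback for unknown locales:

```python
import os

# Hypothetical stand-in for the I18n locale-to-default-charset mapping.
# Entries are illustrative only, not the real table.
DEFAULT_CHARSET_BY_LOCALE = {
    "ja_JP": "shift_jis",  # Japanese Windows
    "zh_TW": "big5",
    "ko_KR": "euc_kr",
    "en_US": "cp1252",
}

# A .4ld/.4ldif extension marks a 4.0x export in a native encoding.
LEGACY_EXTENSIONS = {".4ld", ".4ldif"}


def pick_charset(path: str, locale_name: str) -> str:
    """Return the charset to decode an LDIF file with.

    Legacy extensions get the locale's default charset (option B);
    everything else is assumed to be a 4.5x+ export, i.e. UTF-8.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext in LEGACY_EXTENSIONS:
        # Assumed fallback when the locale is not in the table.
        return DEFAULT_CHARSET_BY_LOCALE.get(locale_name, "cp1252")
    return "utf-8"


def read_ldif(path: str, locale_name: str) -> str:
    """Read an LDIF file and decode it with the selected charset."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(pick_charset(path, locale_name))
```

With this rule, `pick_charset("addr.ldif", "ja_JP")` gives `"utf-8"` and `pick_charset("addr.4ld", "ja_JP")` gives `"shift_jis"`; no UI is involved, which is the appeal of option B.
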

I also created a non-UTF-8 (legacy) LDIF data file and placed it
in the same directory as the UTF-8 data. The file is named:

latin1.4ld

This file contains only Latin 1 data generated by the 4.06 Address Book.
Status: NEW → ASSIGNED
Target Milestone: M11
Whiteboard: 4.5x is fixed. Need UI for 4.0x ldif file.

Case 1 (4.5x and later clients) is fixed.

For case 2, no charset information is stored in the file when the .4ld(if)
file is exported, and according to Naoki, the charset may differ each time the
user exports one. To solve this problem, we need a UI that lets the user pick
the original charset so we can convert the data to UTF-8.
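
Once the user has picked a charset, the conversion itself reduces to decoding with that charset and re-encoding as UTF-8. A minimal sketch, assuming the (not yet existing) import dialog returns a Python codec name; `convert_to_utf8` is a hypothetical helper, not a Mozilla API:

```python
def convert_to_utf8(raw: bytes, user_charset: str) -> bytes:
    """Re-encode a legacy 4.0x LDIF file as UTF-8.

    `user_charset` is whatever the hypothetical import dialog returned.
    bytes.decode raises LookupError for an unknown charset name and
    UnicodeDecodeError if the bytes are invalid in that charset, so the
    caller can ask the user again instead of importing garbled names.
    """
    return raw.decode(user_charset).encode("utf-8")
```

For example, `convert_to_utf8(b"cn: Jos\xe9", "iso-8859-1")` returns `b"cn: Jos\xc3\xa9"`, the UTF-8 form of "cn: José".
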

Regarding case 2, the charset information is stored in pref.js, so we could
convert the data if we can link the imported file to that pref information.
The problem, however, is that when exporting as 4.0 LDIF, 4.x generates a file
with the .ldif extension. Although the extension is .ldif, the file is not
UTF-8, so we cannot infer its charset from the extension (a real LDIF file is
always UTF-8 regardless of the pref setting, so it needs no conversion).
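
Linking an imported file to the pref information would at minimum mean reading the charset pref back out of pref.js. A minimal sketch, assuming pref.js uses the `user_pref("name", "value");` line format and using a placeholder pref name (`example.export.charset`), since the actual 4.x pref name is not given here. It only helps if the pref still holds the charset in effect at export time, which, per Naoki, may not be the case:

```python
import re
from typing import Optional

# Matches lines of the form: user_pref("some.pref", "value");
USER_PREF_RE = re.compile(r'^\s*user_pref\(\s*"([^"]+)"\s*,\s*"([^"]*)"\s*\);')


def charset_from_prefs(prefs_text: str, pref_name: str) -> Optional[str]:
    """Return the string value of `pref_name` from pref.js text, or None."""
    for line in prefs_text.splitlines():
        m = USER_PREF_RE.match(line)
        if m and m.group(1) == pref_name:
            return m.group(2)
    return None
```
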

The ".4ld(if)" extension was a workaround even for 4.5.
We asked users to manually change the extension to
".4ld" if they wanted to import 4.0x LDIF into the 4.5
Address Book. In this case, 4.5 does nothing: the data is
simply passed through. (This means that it would not work
if you imported Latin 1 data into a Communicator running under
a Japanese locale.)
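
The passthrough failure can be illustrated by interpreting the same Latin 1 bytes under two locale charsets. A minimal Python sketch; `passthrough_as` is a hypothetical stand-in for an import that applies no conversion, and Python's shift_jis codec stands in for the Japanese locale's charset:

```python
def passthrough_as(raw: bytes, locale_charset: str) -> str:
    """Interpret raw bytes in the locale's charset with no conversion step,
    substituting U+FFFD for undecodable sequences instead of failing."""
    return raw.decode(locale_charset, errors="replace")


latin1_bytes = "cn: Müller".encode("latin-1")  # u-umlaut becomes the single byte 0xFC

print(passthrough_as(latin1_bytes, "latin-1"))    # cn: Müller
print(passthrough_as(latin1_bytes, "shift_jis"))  # garbled under a Japanese locale
```

The bytes on disk are identical in both calls; only the locale's assumed charset changes, which is why a passthrough import works for Latin 1 data only when the running locale happens to match.
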

In the same spirit, I think this can remain a workaround for
those wanting to import from 4.0x into 5.0. This is why I think
option B, which I suggested, might be OK in this case.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Summary: LDIF import to deal with UTF-8 and non-UTF-8 legacy data → LDIF import to deal with UTF-8 data

Per discussion with nhotta, I am going to separate out the part
that deals with legacy data and file another bug for it.
We agreed to do it the right way this time rather than
keep piling one hack on top of another as we did in 4.5 and 4.61.

So this bug will now be restricted to the issue I mentioned in item
#1 of the original report. I have also modified the summary accordingly.
QA Contact: lchiang → momoi
I'm signing up to verify this.
Status: RESOLVED → VERIFIED
** Checked with 9/16/99 Win32 build **

LDIF files containing Japanese and other language data can now be
successfully imported from Communicator 4.5 and later, which writes
LDIF files in UTF-8. I tried both single-language data and
multilingual data containing entries in some 14 languages,
and both worked.

Marking it verified/fixed.
Product: Browser → Seamonkey