Closed Bug 57151 Opened 24 years ago Closed 24 years ago

Addressbook doesn't display accented characters when migrating from 4.x to Netscape 6.

Categories

(MailNews Core :: Internationalization, defect, P3)

x86
All
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: msanz, Assigned: sspitzer)

Details

(Keywords: dataloss, Whiteboard: [rtm++])

Attachments

(5 files)

Addressbook doesn't display names with accented characters. For example, if you
have a card with the name Antonio Martín (Alt+161 for í), addressbook displays
only "Antonio Mar"; the accented character and anything after it are not displayed.
Using build ID 2000101608, on Win NT US.
This must be a regression. Marina, do you know when it started?
On my Win95-J, I can see accented characters on Address Book.
I'm using 2000101709 branch build.
i see them perfectly in 2000-10-13 build
i am not able to reproduce it with 2000-10-17 build either, all 8 bit chars
including latin-2 and Cyrillic are displayed correctly, new cards are fine as
well.
How about today's build?
today's build is OK as well, i viewed the entries in all non-ascii ABs and they
look fine
worksforme
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
no,  it doesn't work for me. I have a question, are the addressbooks that you
are using created with Netscape 6 or are they migrated from Netscape 4.x?
Because that may be a difference. Reopening.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
i have both : migrated 4.x with accented chars and cyrillic entries and i have 
ones that are created with N6, i created some today as well. All of them are 
correctly displayed.
QA Contact: esther → laurel
could that be related to Themes change?
This problem happens when migrating address book from 4.x to Netscape 6.
I saw this when migrating from 4.72
Here are the steps to reproduce the problem.
1. Use 4.x to create an address book card containing the name of
   "Antonio Martín"
2. Launch Netscape 6, migrating the same profile used at step 1.
3. Open Address Book, you'll see "Martín" is cut off from "t"
Summary: Addressbook doesn't display accented characters → Addressbook doesn't display accented characters when migrating from 4.x to Netscape 6.
i just did export from 4.72 and imported it into seamonkey, looks absolutely
correct (see the attached screenshot), using 2000-10-18 build
Attached image import image
So the profile conversion messed up the entries in Address Book.
I migrated a 4.75 Ja address book containing some Japanese cards to Netscape 6.
The Japanese cards display blank in Netscape 6.
there were two bugs i entered a long time ago # 13909 and # 32251. I can not
reproduce the problems described in those two bug reports (same as you see) with
my export/import. The only difference i can come up with is that i am the only
one using 4.72 and you guys are using 4.73 and 4.75. Could that be a reason??
now i am almost positive that Seamonkey is NOT at fault, it is Communicator 4.73
and older versions, here is what i did:
- took Montse's na2 file, imported it into 4.73 Communicator (looks fine);
- saved it and imported into Seamonkey (no problem with accented chars display);
//note: in both cases the display is correct - 4.72 and 6 have no compatibility
problems
i take it all back, i was able to reproduce it with Montse's file when migrating
a profile from 4.72. The display was cut off. This is not an Import/Export
problem, the utility works fine, this is a migration problem. 
Marina or Xianglan, please summarize what is working and what is not.
Also attach test data. Is this rtm, or any work around available?
It should be rtm but there is a workaround: the workaround is to use Import
utility which is working ok, the migration though fails.
How do I import from Netscape itself? It doesn't give me the option in Import
Utilities.
the way to Import is to Export it first, meaning save in any format (ldif,
csv, tab) and then Import AB into Nscp6, this way nothing gets corrupted
Okay, so 4.x -> 6.0 by export/import works, right?
But if migrated, the data is broken. Put rtm to keyword (data loss but has a
work around).
Candice, should this go to you or Seth?
Someone please attach test data (4.x ab file).
Status: REOPENED → NEW
Keywords: rtm
changed component to I18n
Component: Address Book → Internationalization
Keywords: dataloss
Reassign to sspitzer.
Assignee: nhotta → sspitzer
yikes.  I'm able to reproduce.  accepting.
Status: NEW → ASSIGNED
OS: Windows NT → All
Whiteboard: [rtm need info]
I honestly don't think that the workaround is acceptable. The workaround means
all users internationally will have to export and then import their ab. I don't
think we can expect them to do that. 
Is this really a regression?  Seems unlikely unless migration itself was changed
around that date.  What is the export/import method doing that migration doesn't
do?  Is it that export does an encoding that migration is skipping?
I don't think this is a regression.  I just think it never worked.

I noticed that when I export to ldif in 4.x, I get different values than when I
export to ldif using the ab upgrader code.  (ns/mailnews/addrbook/upgrade)

still investigating.
Export (ldif) uses UTF-8 which is same as 6.0 AB's charset but 4.x AB's charset
is specified in the pref (cc to mscott, he knows better).
But if the upgrade code also outputs ldif then it should also be UTF-8 and no
problem using it in 6.0?

I don't think the 6.0 ab upgrader code is exporting LDIF to UTF-8

I think it is doing iso-8859-1.  (that is addressbook's csid pref in my 4.x
prefs file.)

my guess is we are not filtering from the csid of the ab to utf8.
I'm curious if an addressbook with Japanese characters has the same problems.

nhotta, can you test that?
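
To illustrate the guess above, here is a small Python sketch (not the upgrader code; the function names are made up for this example). It shows that the iso-8859-1 bytes for "Antonio Martín" stop being valid UTF-8 at the í, so a consumer expecting UTF-8 can only use the part before it. Where exactly Netscape 6 cuts the string is an assumption here; the sketch only shows why an unconverted entry cannot be read in full.

```python
def valid_utf8_prefix(raw: bytes) -> bytes:
    """Return the longest leading run of raw that decodes as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8.
        return raw[:err.start]


def convert_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode raw from source_charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# Bytes as a 4.x address book with csid iso-8859-1 would store the name.
latin1_name = "Antonio Martín".encode("iso-8859-1")

# Without conversion, only the ASCII part before the accented byte is usable.
unconverted = valid_utf8_prefix(latin1_name)

# With conversion from the address book's charset, the whole name survives.
converted = convert_to_utf8(latin1_name, "iso-8859-1")
```

With the conversion step in place the full name round-trips; without it, everything from the accented byte onward is unusable as UTF-8.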
Tested branch build id2000102008 on WinNT 4 JA.
Japanese name was not migrated correctly, Japanese turned to empty strings after
the migration.
I think this is serious data loss. I have my 4.x addressbook and I can export
and import it, but users may uninstall 4.x because of space, then all data is lost!
oh boy.  after doing some more debugging, here's what I've found:

1) the code that creates the filter to convert from the 4.x addressbooks char
set to utf8 is turned off.
2) when I turn on and fix that code, the filter code will not work because it
needs INTL_ConvertLineWithoutAutoDetect() and INTL_DefaultWinCharSetID()

I'll go look around the 6.0 code base for the equivalents of those functions.
cool, I found the 6.0 equivalent.  working on fix now.

nhotta, can you export (to ldif) the 4.x japanese address book, and attach it
here?
nhotta, what was the char set for that ab?

(look in your 4.x prefs.js file)

user_pref("ldap_2.servers.pab.csid", "iso-8859-1");
user_pref("ldap_2.servers.pab.csid", "Shift_JIS");

So, it was "Shift_JIS".
But the file I attached was exported as LDIF so the charset there is "UTF-8".
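
For anyone checking their own 4.x profile, a rough Python sketch of pulling the csid value out of prefs.js text follows. The pref name pattern is taken from the lines quoted above; the function name, the regular expression, and the rule that a later line overrides an earlier one are assumptions made for this example, not how Communicator or the upgrader reads prefs.

```python
import re
from typing import Optional

# Matches lines such as: user_pref("ldap_2.servers.pab.csid", "Shift_JIS");
_CSID_PREF = re.compile(
    r'user_pref\("ldap_2\.servers\.([^."]+)\.csid",\s*"([^"]*)"\);'
)


def read_csid(prefs_text: str, book: str = "pab") -> Optional[str]:
    """Return the csid value for the given address book, or None if unset or empty."""
    value = None
    for match in _CSID_PREF.finditer(prefs_text):
        if match.group(1) == book:
            # Assumption: if the pref appears more than once, the last line wins.
            value = match.group(2)
    return value or None
```

A result of None covers both a missing pref and a pref set to "", which are the cases where a fallback charset would be needed.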
ok, I've got a fix.  I had one fix, then I re-tooled it so as to be as minimal
as possible.  (thanks to bienvenu for the suggestions.)

here's the fix:

for each addressbook, while migrating addressbooks:

get the "csid" pref for the addressbook.
(in 4.x, that pref was the charset.  ex, "iso-8859-1", "Shift_JIS")

store that charset as the "current charset" in the AbUpgrader for use by the
import code.

In the import code, when converting .na2 to LDIF, get the "current charset" and
use that to convert entries from "current charset" to "UTF8".

If the "current charset" is not set, get and use the system charset.

I've tested that this works fine when the csid pref is not set or is set to "".
To fully test, I need to try this fix out on a machine where the system charset
is "Shift_JIS"

I'll attach the patches.

assuming I get all the necessary reviews and approval, I'd like to land this on
the trunk first, and then test trunk builds with nhotta, and then land on the
branch.
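
The steps described above can be sketched roughly as follows. This is Python for illustration only, not the attached patch: the function names are hypothetical, and using locale.getpreferredencoding() as the "system charset" is an assumption standing in for whatever the 6.0 code base actually queries.

```python
import locale
from typing import Optional


def pick_charset(csid: Optional[str], system_charset: Optional[str] = None) -> str:
    """Use the address book's csid pref if set, otherwise the system charset."""
    if csid:
        return csid
    # Fallback for a missing or empty csid pref (assumed system charset lookup).
    return system_charset or locale.getpreferredencoding(False)


def entry_to_utf8(raw: bytes, csid: Optional[str],
                  system_charset: Optional[str] = None) -> bytes:
    """Convert one address book value from the current charset to UTF-8."""
    charset = pick_charset(csid, system_charset)
    return raw.decode(charset).encode("utf-8")
```

For example, entry_to_utf8(value, "Shift_JIS") converts a Japanese entry, while entry_to_utf8(value, "") falls back to the system charset, which only gives a correct result when the address book was written in that charset.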
fix attached.  bienvenu, can you review?
Whiteboard: [rtm need info] → [rtm need info] fix attached.
r=bienvenu
> get the "csid" pref for the addressbook.
> (in 4.x, that pref was the charset.  ex, "iso-8859-1", "Shift_JIS")

I hope you have a fallback in case there is no charset
specified for an address book in 4.x. The charset entry
for the address book started with 4.5, I believe. I think earlier
versions of Communicator relied on the system charset.
Laurel, can we take this from you?
QA contact to marina.
QA Contact: laurel → marina
No problem.  Do you want any more? :-)
momoi:  yes, if no csid was specified, I assume the system charset.
sr=mscott
r=bienvenu, sr=mscott, marking rtm+

pdt:  you want this on the tip first, and then branch?
Whiteboard: [rtm need info] fix attached. → [rtm+]
This is big and scary.  We want to see this on the trunk and get 48 hours of 
cooking.  In addition, we want to see the I18N testing that proves this is a 
correct fix without regressions.
Whiteboard: [rtm+] → [rtm need info]
I agree this is big and scary. But I want to make sure it's clear this won't get
tested unless people migrate profiles, so we need to do more than just land it
on the trunk - we need to get some active testing of profile migration on the trunk.
I'm going to land this on the trunk, so we can have more eyeballs on it.

adding gbush and dbragg to the cc list.  they do a lot of migration testing, so
this will give them a heads up.
we need to get I18N coverage on tomorrow's trunk builds as well.
I will try with the same file that I originally discover this in. Marina,
Xianglan, I need you to do this with JA text, I only have Latin-1.
fix landed on trunk.
Yes, David, you beat me to my comments!  I was going to send email to ask folks 
to try this specific test since migrating address books aren't a normal test 
which people do every day :-)

gbush is on vacation.  I'll cc: suresh to coordinate a few people to try 
migrating especially address books.  Montse has done the same already, I see.  
Pls post results here.
to test this code, you'd need today's (10-26-00) trunk commercial builds.

you'd need to remigrate your 4.x profile.  the code doesn't get executed until
you startup mailnews (or addressbook) for the first time.
The 10/26/ trunk build looks good to me for the following
types of Address Book folder migration on Windows NT4-J:

1. PABook with Shift_JIS charset
2. PABook with no charset (under Japanese Windows)
3. LDAP directory files with content in it (UTF-8)
4. PABook with UTF-8 charset

The only thing I didn't like was that only for type 1,
you see the original name of the folder.
For types 2-4, you see an .ldif suffix appended to the
original names. This is not that bad for Type 3, but
for Type 2 and Type 4, it looks funny. The optimal
thing would be not to show the LDIF suffix unless
the entries were *directly* imported from an LDIF file.
For a Japanese address book name created at 4.75-J, after migration the name
shows as a strange English name with .ldif extension. But you can only see this
the first time you migrate it to Netscape 6. After you restart the application,
the Ja address book name shows correctly and it doesn't have .ldif extension.
So, the LDIF extension is temporary.
This is kind of confusing with an additional
fact that any LDAP directory gets imported temporarily
with LDIF extension, and you can see the content
in them if they happen to have content from prior search,
but then when you re-start, all LDAP directory folders
disappear along with the extension. So in my particular
example above, Type 3 folder was there initially, but
is gone after a re-start.
I think these things happened before in English AB's as well. Candice, can you
comment on this?
Adding Cindy to cc: list.
Yes, the .ldif extension problem appears in English too and I believe it is 
still there.  This problem should be filed as a separate bug.  
comments:

the "first time the addressbook are named *.ldif" is a known bug, assigned to chuang.

kat just showed me a valid bug which is this:

if you have a saved LDAP query (as an addressbook in 4.x) it shows up the first
time you run addressbook after migration, but then disappears if you quit and
restart.  (the other LDAP addressbooks do this too.)

since we don't support LDAP yet, we should only be migrating local addressbooks.

fixing this would prevent addressbooks that "disappear" from ever showing up.

I'll start a new bug on this, and look for the 4.x pref that I can use to tell
local from LDAP (and LDAP search results.)

thanks for testing, kat.
#58110 is the new bug (about not migrating LDAP and LDAP queries).

the other bug (about the *.ldif names on first launch) is #41887
I tried again with the trunk build today and it failed for me. But I was having
problems migrating anyway. Kat, Xianglan, I won't be in the office tomorrow, but
can you try with my address book again? I have placed the 4.7x profile files in
babel
http://babel/5x_tests/misc/montse/msanz/
you can access it through the network //babel/babel_docs2/5x_tests/misc/montse
msanz, what migration problems were you having?
I tried msanz' profile under NT4-US and migration
succeeded with 2000102604 Win32 build. For example,
the personal Address Book contains this entry mentioned
earlier by msanz, Antonio Martín Herreros, and this entry
migrated without a problem. Her prefs.js contains this entry

user_pref("ldap_2.servers.pab.csid", "iso-8859-1");
user_pref("ldap_2.servers.pab.filename", "pab.na2");

There's nothing amiss here.
Let me write down the test procedures I took so that we
are all on the same wavelength:

1. I created a profile directory called "msanz2" using
    Comm 4.75.
2. Quit Communicator and copied all the profile dir files into
    this new msanz2 directory.
3. Re-start Comm 4.75 and ascertain that accented entries are
    there.
4. Now quit Comm 4.75.
5. Start Mozilla(NS6) 10262000 MN build (Build ID: 2000102604).

    netscpe6.exe -installer

     This will bring up the migration manager.
6. Choose to migrate msanz2 which I set up earlier.
   Wait until it is finished.

7. When NS6/Mozilla comes up, open the Address Book and look at the Personal
    AdBook folder to see if the entries migrated correctly.

I was able to migrate all accented entries using the above procedures.

marina, please try out the test steps under US Windows, and
report here -- without using the msanz profile but by creating your
own entries.
   
kat, thanks for doing the testing on msanz account.

just curious, did you have to edit any of her prefs to get it to work after
moving the files?
No. I didn't. I used her prefs.js as is.
There could be other causes for her problem, e.g.
not having deleted the profile files migrated earlier.
I think we can trust the test only if we have deleted any
migration files first.
I'll now check to see if it will migrate OK without the
charset line in the prefs.js.
Under Windows NT4-US, the accented entries migrated
without a problem with the 2000102604 build even
without the charset line in the prefs.js.

Let's look at msanz's machine next week.
In the meantime, we should keep on looking at this
with the trunk build.

ji, can you try the msanz profile under Japanese Windows?
That should work.
OK, it looks like Kat was able to migrate my files. Kat, I was using the Dell
machine that is in my cube, you can use it. I think the profile name was "test"
or "iqa". I was using that machine with Windows NT JA.

Seth, I was having the problems described in the bug, I logged it. All entries
get truncated when the string has an accented character and nothing gets
displayed after it.
I forgot to mention, I rely on Kat's testing more than on mine (a note for PDT :-)
msanz,

were you using the branch, and not the most recent trunk build?
did you forget to remove your profile and re-migrate?

I was using the trunk, but I was also having problems in that machine displaying 
accented characters in 4.x. When I was creating the profile in 4.x I noticed 
there weren't accents displayed. I wouldn't worry, if more people with the same 
data have been able to migrate it, I'd say this is fixed. Kat, use the machine 
in my cube if you want. I'm out today (believe it or not)
Marking rtm+ since msanz wants to review this at PDT today
Whiteboard: [rtm need info] → [rtm+]
I migrated Montse's profile using 10/27 trunk build on Japanese Win95.
The accented characters in the address book are not lost.
Ok, here is the test i've performed today with 2000-10-27 trunk build on French
Win NT (with sp5):
- In communicator4.61 created a new profile;
- in the AB under Personal AB created several entries containing accented chars;
- exit Communicator;
- installed trunk build;
- open trunk build with 4.61 profile, migrated it;
- open AB , all entries show accented chars correctly!
I used Montse's machine and her profile. Migration works fine with today's
trunk build. Accented characters in her address book are displayed correctly 
after migration. I didn't get the problem that Montse has.
Montse's test machine was Japanese Windows-NT4 but with
Address Book files created under US Windows --> thus the
profile entry for the Personal Address Book has "ISO-8859-1"
charset associated with the 4.7x PAB. Given sspitzer's fix,
even this kind of non-standard migration will work and
as ji's 2 tests showed today, it does. Of course, if the
prefs.js does not contain the charset info with the PAB, then
migration of accented character entries will fail unless
the charset of the PAB matches that of the OS.

So the fix works for normal cases where the migrated AdBook's
charset matches that of the OS and also for those cases
where the OS has a different charset as long as the charset
info associated with AdBook files is in the prefs.js file.

The only remaining mystery is the result of msanz's original
test, but later tests by ji show that it works also.

summing up all test results i feel like resolving this bug as fixed
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
well to be politically correct (because the fix is in the trunk and didn't land
into the branch yet) i have to leave it as assigned (not fixed) and the only way
to do it is to reopen, so reopening...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
leaving it as ASSIGNED to seth
Status: REOPENED → ASSIGNED
PDT marking [rtm need info]. After talking to Kat, we believe that the patch
fixes the problem. Next, we'd like to make sure that there are no regressions.
Please report test results back and mark rtm+ in order to get reconsidered for RTM.
Whiteboard: [rtm+] → [rtm need info]
note to kat:  I just tested migrating 4.x mailing lists (with accented
characters in every text field) and everything worked fine.

[Summary of the testing status]

After discussing the details of the fix with sspitzer and nhotta, and testing
some more among ji, marina, and momoi, I am happy to report the following
conclusions:

1. The fix works for all the key cases involving non-ASCII characters under
    Latin 1 (including US) and Japanese Windows.
2. The actual lines of changed code are fairly extensive but
   the area affected is limited to just the conversion process from
   4.7x na2 Adbook file format to LDIF data format. sspitzer's code
    changes do not affect the process from the LDIF to .mab NS6 Adbook format.
3. The code changes do not affect migration of files at all. In fact, na2 Adbook
   files are simply copied into a new NS6 profile directory first.
   And only when Adbook or Mail is opened, the na2 to LDIF format conversion
   takes place.
4. Thus, the only area of possible regression is the ASCII character conversion.
    Before the fix with the 10/27 Branch build, ASCII conversion worked but
    non-ASCII conversion did not.
5. After the fix with 10/27 trunk build, non-ASCII conversion now works.
   I tested about 500 entries of ASCII only data with Branch and Trunk builds from
    today. Both builds were able to migrate the exact same number of entries.
    (Note: There were 2 entries which disappeared but this problem occurred in
           both builds and will be filed as a separate bug unrelated to sspitzer's
           fix.)

ji will conduct ASCII data migration on Monday with a larger set of entries --
something like 2000 entries just to cover all the bases for a possible regression,
but I am now satisfied that 1) the fix works, and 2) there are no ASCII data
regressions. Unless ji finds some startling differences with a larger set of
data, there is no more doubt in my mind.

Marking the status as [rtm+] again for consideration.
Whiteboard: [rtm need info] → [rtm+]
One additional note regarding point #2 above.

sspitzer's code additions for na2 to LDIF format conversion
are essentially the same code used in exporting Personal Adbook into
the LDIF file format from Comm 4.5 through 4.76. This code has been
in existence for 2 years or so in 4.x and we have not
heard of any problems with it. This gives us another dose of
confidence for the current fix. 
Tested 2000103006-Mtrunk build with a large set of pb data. All the cards 
(about 3200 ASCII cards) can be migrated from 4.75 to Netscape 6 without any 
problem.
No errors occurred in the migration process and none of the cards is corrupted
after migration.
2..4..6..8
let's land this before it's too late.
PDT marking [rtm++]. This bug is now out of limbo and approved for checkin to
the branch. Please check in ASAP.

Whiteboard: [rtm+] → [rtm++]
fixed on both trunk and branch.

thanks for the excellent QA help verifying that the fix worked.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
** Checked with Windows 11/2/2000 MN build **

Looks very good. The following were migrated without a problem:

1. ASCII-only entries (over 600)
2. Mixed Japanese and ASCII data with charset in the original
AdBook file. (Shift_JIS)
3. Mixed Japanese and ASCII data with no charset in the original
AdBook file.
4. Mixed Latin 1 & ASCII data (msanz profile) with ISO-8859-1
   charset.

marina and ji, can you help with the following?

marina -- Mac with similar testing
ji -- Linux with similar testing.
tested Mac branch build 2000-11-03 , migrated Montse's AB from 4.7 into 6.0 ,
looks good, now when ji tells us that it looks OK on Linux, i'll mark it as
verified.
After conferring with marina and ji, we have decided
to mark this fix verified as fixed.
The Linux problem is actually a general problem
and would be also applicable to other platforms
for Traditional and Simplified Chinese locales.
But the problem there does not threaten the core
integrity of this fix. There is a workaround
and also migration/conversion occurs when the charset
name is changed to match what is expected by the current
code. These problems are described in Bug 59073
and Bug  58577.
We have to release note the remaining problem
but this bug is now verified.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core