Closed
Bug 14349
Opened 26 years ago
Closed 25 years ago
[FEATURE] implement migration tool to convert 4.x prefs to UTF8
Categories
(Core :: Internationalization, defect, P1)
VERIFIED
FIXED
M14
People
(Reporter: neeti, Assigned: sspitzer)
References
Details
(Whiteboard: [PDT+] eta 2-17-00)
Need to implement a migration tool to convert all 4.x prefs to UTF-8.
Updated•26 years ago
Assignee: ftang → neeti
Comment 1•26 years ago
neeti, who owns the migration tool?
I can provide the function calls needed to support the migration tool, but I
don't think I should own the tool.
Since the text in a pref could be in the native encoding or in UTF-8, we need
the following algorithm in the migration tool:
1. Read in one line of pref text as char*.
2. Call the i18ngrp support routine IsUTF8String.
3. If IsUTF8String returns true, the pref is already in UTF-8. Save that pref as
is to the new pref.
4. If IsUTF8String returns false, that line is in the native encoding. Use the
nsIPlatformCharset API to get the charset.
5. Then get an nsIUnicodeDecoder to decode that char* to PRUnichar*. Then you
can use nsString::ToNewUTF8String to generate the UTF-8 and save it into the
new 5.0 pref.
Reassigning this back to Neeti since I do not own the tool. When do we need
these APIs?
All the needed functions are already in the code, except IsUTF8String, which I
will attach to this bug report later.
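For readers following the five steps above, here is a minimal, self-contained sketch in plain C. It is not the Mozilla code: looks_like_utf8 is a simplified stand-in for IsUTF8String, and latin1_to_utf8 stands in for the nsIPlatformCharset / nsIUnicodeDecoder step under the assumption that the platform charset is ISO-8859-1. All three function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified structural check (hypothetical helper): each lead byte must be
   followed by the right number of 10xxxxxx trail bytes. It does not reject
   overlong forms or surrogates, which a stricter validator would. */
static int looks_like_utf8(const char *text)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t i = 0, n = strlen(text);
    while (i < n) {
        size_t need;
        if (s[i] < 0x80)                need = 0;
        else if ((s[i] & 0xE0) == 0xC0) need = 1;
        else if ((s[i] & 0xF0) == 0xE0) need = 2;
        else if ((s[i] & 0xF8) == 0xF0) need = 3;
        else return 0;                      /* stray trail byte or invalid lead */
        if (need > n - i - 1) return 0;     /* truncated sequence */
        for (size_t j = 1; j <= need; j++)
            if ((s[i + j] & 0xC0) != 0x80) return 0;
        i += need + 1;
    }
    return 1;
}

/* Stand-in for the platform-charset decode step, ASSUMING ISO-8859-1:
   every Latin-1 byte maps to the code point with the same value. */
static char *latin1_to_utf8(const char *text)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t n = strlen(text), o = 0;
    char *out = malloc(2 * n + 1);          /* each byte grows to at most 2 */
    if (!out) return NULL;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0x80) {
            out[o++] = (char)s[i];
        } else {
            out[o++] = (char)(0xC0 | (s[i] >> 6));
            out[o++] = (char)(0x80 | (s[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}

/* Steps 2-5: returns a newly allocated UTF-8 copy of one pref value.
   The caller frees the result. */
static char *migrate_pref_value(const char *value)
{
    if (looks_like_utf8(value)) {           /* step 3: keep as is */
        char *copy = malloc(strlen(value) + 1);
        if (copy) strcpy(copy, value);
        return copy;
    }
    return latin1_to_utf8(value);           /* steps 4-5: convert */
}
```

One caveat with any detect-then-convert approach: a short native-encoded string can happen to be valid UTF-8, in which case it is kept unchanged.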
Comment 2•26 years ago
Here is the IsUTF8String function
#define kLeft1BitMask 0x80
#define kLeft2BitsMask 0xC0
#define kLeft3BitsMask 0xE0
#define kLeft4BitsMask 0xF0
#define kLeft5BitsMask 0xF8
#define kLeft6BitsMask 0xFC
#define kLeft7BitsMask 0xFE
#define k2BytesLeadByte kLeft2BitsMask
#define k3BytesLeadByte kLeft3BitsMask
#define k4BytesLeadByte kLeft4BitsMask
#define k5BytesLeadByte kLeft5BitsMask
#define k6BytesLeadByte kLeft6BitsMask
#define kTrialByte kLeft1BitMask
#define UTF8_1Byte(c) ( 0 == ((c) & kLeft1BitMask))
#define UTF8_2Bytes(c) ( k2BytesLeadByte == ((c) & kLeft3BitsMask))
#define UTF8_3Bytes(c) ( k3BytesLeadByte == ((c) & kLeft4BitsMask))
#define UTF8_4Bytes(c) ( k4BytesLeadByte == ((c) & kLeft5BitsMask))
#define UTF8_5Bytes(c) ( k5BytesLeadByte == ((c) & kLeft6BitsMask))
#define UTF8_6Bytes(c) ( k6BytesLeadByte == ((c) & kLeft7BitsMask))
#define UTF8_ValidTrialByte(c) ( kTrialByte == ((c) & kLeft2BitsMask))
PRBool IsUTF8Text(const char* utf8, int32 len);

PRBool IsUTF8String(const char* utf8)
{
  if(NULL == utf8)
    return TRUE;
  return IsUTF8Text(utf8, strlen(utf8));
}

PRBool IsUTF8Text(const char* utf8, int32 len)
{
  /* Read the bytes as unsigned so that comparisons such as 0xED == s[i]
     also work where plain char is signed */
  const unsigned char* s = (const unsigned char*) utf8;
  int32 i;
  int32 j;
  int32 clen;
  for(i = 0; i < len; i += clen)
  {
    if(UTF8_1Byte(s[i]))
    {
      clen = 1;
    } else if(UTF8_2Bytes(s[i])) {
      clen = 2;
      /* Not enough trail bytes */
      if( (i + clen) > len)
        return FALSE;
      /* 0000 0000 - 0000 007F : should encode in fewer bytes */
      if(0 == (s[i] & 0x1E ))
        return FALSE;
    } else if(UTF8_3Bytes(s[i])) {
      clen = 3;
      /* Not enough trail bytes */
      if( (i + clen) > len)
        return FALSE;
      /* A single surrogate should not show up in 3-byte UTF-8. Instead, the
         pair should be interpreted as one single UCS4 char and encoded as
         UTF-8 in 4 bytes */
      if((0xED == s[i] ) && (0xA0 == (s[i+1] & 0xA0 ) ))
        return FALSE;
      /* 0000 0000 - 0000 07FF : should encode in fewer bytes */
      if((0 == (s[i] & 0x0F )) && (0 == (s[i+1] & 0x20 ) ))
        return FALSE;
    } else if(UTF8_4Bytes(s[i])) {
      clen = 4;
      /* Not enough trail bytes */
      if( (i + clen) > len)
        return FALSE;
      /* 0000 0000 - 0000 FFFF : should encode in fewer bytes */
      if((0 == (s[i] & 0x07 )) && (0 == (s[i+1] & 0x30 )) )
        return FALSE;
    } else if(UTF8_5Bytes(s[i])) {
      clen = 5;
      /* Not enough trail bytes */
      if( (i + clen) > len)
        return FALSE;
      /* 0000 0000 - 001F FFFF : should encode in fewer bytes */
      if((0 == (s[i] & 0x03 )) && (0 == (s[i+1] & 0x38 )) )
        return FALSE;
    } else if(UTF8_6Bytes(s[i])) {
      clen = 6;
      /* Not enough trail bytes */
      if( (i + clen) > len)
        return FALSE;
      /* 0000 0000 - 03FF FFFF : should encode in fewer bytes */
      if((0 == (s[i] & 0x01 )) && (0 == (s[i+1] & 0x3E )) )
        return FALSE;
    } else {
      return FALSE;
    }
    for(j = 1; j < clen; j++)
    {
      if(! UTF8_ValidTrialByte(s[i+j])) /* Trail bytes invalid */
        return FALSE;
    }
  }
  return TRUE;
}
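The lead-byte tests in the macros above can be easier to follow as a single function. The following is an illustrative sketch only, and utf8_sequence_length is a hypothetical name that does not come from the attached code. It returns the sequence length implied by a lead byte under the same 1-to-6-byte scheme. Note that the current UTF-8 definition (RFC 3629) allows at most 4 bytes, so a modern validator would reject the 5- and 6-byte forms.

```c
#include <stddef.h>

/* Returns the sequence length implied by a lead byte under the original
   1-6 byte UTF-8 scheme, or 0 if the byte cannot start a sequence. */
static int utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    if ((lead & 0xFC) == 0xF8) return 5;  /* 111110xx */
    if ((lead & 0xFE) == 0xFC) return 6;  /* 1111110x */
    return 0;                             /* 10xxxxxx trail byte, 0xFE, 0xFF */
}
```

Taking the byte as unsigned char avoids sign-extension surprises on platforms where plain char is signed.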
Comment 3•26 years ago
Frank, profile migration is being done by Don Bragg, and Steve Elmer's group
deals with profiles. I think the question of who is going to implement
the migration tool needs to be worked out between internationalization, Steve
Elmer and Don Bragg.
Comment 4•26 years ago
Frank, you need to write the migration portion of the tool, and then it can hook
into Don's migrator. None of us are signed up to create every migration tool
required; each team needs to fit their part into the framework we've created.
Updated•26 years ago
Target Milestone: M12
Updated•26 years ago
Status: NEW → ASSIGNED
Comment 5•26 years ago
selmer: I understand that part; see my email for details. I need to know who is
working on the migration tool in your team (so we can chat about engineering
details). Our team will do the text conversion of the pref file content to
UTF-8, but it would be nice to understand how the code can fit into your
tool code. Let's chat over the phone.
Comment 6•26 years ago
Don, Seth, it sounds like Frank needs to hook in at your level. Can one of you
help him with the details?
Updated•26 years ago
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Comment 7•26 years ago
bobj- I have no bandwidth to handle this right now. Please find an owner for it.
Summary: implement migration tool to convert 4.x prefs to UTF8 → [BETA] implement migration tool to convert 4.x prefs to UTF8
Updated•25 years ago
Assignee: bobj → nhotta
Status: ASSIGNED → NEW
Comment 8•25 years ago
nhotta kindly volunteered to take this bug. Reassigning to him.
Updated•25 years ago
Status: NEW → ASSIGNED
Summary: [BETA] implement migration tool to convert 4.x prefs to UTF8 → [BETA][FEATURE] implement migration tool to convert 4.x prefs to UTF8
Comment 9•25 years ago
I have a couple of questions.
1) Where should IsUTF8String go? In mozilla/intl?
2) Where (in which directory) is the migration code?
Comment 10•25 years ago
I found that nsPrefMigration does the migration.
http://lxr.mozilla.org/seamonkey/source/profile/pref-migrator/src/nsPrefMigration.cpp
Here is what I can do. I can add member functions to
1) detect a UTF-8 string
2) get the platform charset
3) convert a string to UTF-8
Then I will reassign this to dougt or sspitzer (as they appear in the CVS log).
Updated•25 years ago
Target Milestone: M13 → M14
Comment 11•25 years ago
I can probably check in my part next week, but it will take time before it is
actually used in the migration code. Moving to M14.
Updated•25 years ago
Assignee: nhotta → sspitzer
Status: ASSIGNED → NEW
Comment 12•25 years ago
Checked in the I18N functions; reassigning to sspitzer.
Here is the usage note from the header file (nsPrefMigration.h).
// I18N pref migration:
//
// 5.0 stores pref strings as UTF-8, while 4.x stores them as either platform
// charset or UTF-8, depending on the pref.
// The functions here provide two possible ways to deal with the I18N
// migration.
//
// 1) Use the knowledge of which 4.x pref strings are platform charset.
//    If PrefStringNeedsCharsetConversion() returns true, then the string is
//    to be converted to UTF-8.
//
// 2) Apply UTF-8 detection to all string prefs. Apply the conversion if
//    the string is not detected as UTF-8.
//
// The user of the functions needs to decide between 1) and 2).
// Functions to get the platform charset and to convert a string to UTF-8
// are also provided.
//
| Assignee | ||
Comment 13•25 years ago
selmer, is this something your team can handle?
I don't think this is a mail-news specific bug, and I'm overloaded.
Comment 14•25 years ago
Seth, why isn't nhotta finishing the implementation? It's certainly not mail
specific...
Comment 15•25 years ago
I do not have knowledge of pref migration. All I could do was to provide I18N
functions (Unicode conversion etc.); see my comment on 2000-01-05 15:05.
I assigned this to sspitzer because he and dougt are the owners of the source
files according to CVS.
Basically, the I18N group provides I18N functions and consultation, and it is
each group's responsibility to support I18N features. I agree this is not a
mail/news issue, so sspitzer is not the person to resolve this.
Seth, please reassign this to the person who is responsible for the pref
migration tool. We can help with implementing I18N support.
| Assignee | ||
Updated•25 years ago
Status: NEW → ASSIGNED
| Assignee | ||
Comment 16•25 years ago
accepting for now, but I may re-assign.
| Assignee | ||
Updated•25 years ago
Priority: P3 → P2
| Assignee | ||
Comment 17•25 years ago
marking p2
| Assignee | ||
Updated•25 years ago
Priority: P2 → P1
| Assignee | ||
Comment 18•25 years ago
this is migration related, so marking p1.
Summary: [BETA][FEATURE] implement migration tool to convert 4.x prefs to UTF8 → [FEATURE] implement migration tool to convert 4.x prefs to UTF8
Whiteboard: [PDT+]
| Assignee | ||
Comment 20•25 years ago
it was suggested by nhotta that I call PrefStringNeedsCharsetConversion() for
each prefname, and if that returned true, check if the pref was in utf-8 (by
calling IsUTF8String()), and if not, I'd convert that pref to utf-8 by using
GetPlatformCharset() and ConvertStringToUTF8().
that seems like a waste.
looking at PrefStringNeedsCharsetConversion() it looks like he plans on listing
all the prefs that may have been stored in the platform charset.
I'm going to just do this:
if we migrate a profile, after ReadUserPrefs(), I'll call ConvertPrefsToUTF8()
which will go through a list of prefs (currently the list is in
PrefStringNeedsCharsetConversion()) and then migrate those prefs from the
platform charset to utf8, if necessary.
this way, we will only migrate the prefs we know we need to.
I'll get rid of PrefStringNeedsCharsetConversion().
the only drawbacks:
1) we need to know all the prefs that might need conversion
2) if the user doesn't migrate from 4.x to 5.0 using the migrator, their prefs
won't get migrated.
those seem acceptable.
I'll work on doing what I propose, then get nhotta to review my changes (and
help me test), and then it will be up to nhotta to provide the full list of prefs
that may need migration.
I should be able to have this ready tomorrow.
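As an illustration of the list-driven approach proposed in this comment, here is a minimal sketch in plain C. It is not the checked-in ConvertPrefsToUTF8(): the Pref struct, convert_listed_prefs and the two demo_ helpers are hypothetical stand-ins, the in-memory array replaces the real pref service, and the converter assumes the platform charset is ISO-8859-1. The two pref names are taken from the list posted later in this bug.

```c
#include <stddef.h>
#include <string.h>

#define PREF_VALUE_MAX 64

/* Hypothetical in-memory stand-in for the pref store. */
typedef struct {
    const char *name;
    char value[PREF_VALUE_MAX];
} Pref;

typedef int  (*IsUtf8Fn)(const char *value);
typedef void (*ToUtf8Fn)(const char *in, char *out, size_t out_size);

/* Two names from the list posted later in this bug; the real list is longer. */
static const char *const kPrefsToConvert[] = {
    "mail.identity.username",
    "mail.signature_file",
};

/* Visits only the listed prefs and rewrites those that are not already
   UTF-8. Returns the number of values rewritten. */
static int convert_listed_prefs(Pref *prefs, size_t count,
                                IsUtf8Fn is_utf8, ToUtf8Fn to_utf8)
{
    int converted = 0;
    size_t n_names = sizeof kPrefsToConvert / sizeof kPrefsToConvert[0];
    for (size_t k = 0; k < n_names; k++) {
        for (size_t p = 0; p < count; p++) {
            char buf[PREF_VALUE_MAX];
            if (strcmp(prefs[p].name, kPrefsToConvert[k]) != 0) continue;
            if (is_utf8(prefs[p].value)) continue;   /* "if necessary" */
            to_utf8(prefs[p].value, buf, sizeof buf);
            strcpy(prefs[p].value, buf);
            converted++;
        }
    }
    return converted;
}

/* Demo detector: structural check only (lead byte plus trail bytes). */
static int demo_is_utf8(const char *v)
{
    const unsigned char *s = (const unsigned char *)v;
    while (*s) {
        int trail = (*s < 0x80) ? 0 : ((*s & 0xE0) == 0xC0) ? 1 :
                    ((*s & 0xF0) == 0xE0) ? 2 : ((*s & 0xF8) == 0xF0) ? 3 : -1;
        if (trail < 0) return 0;
        for (s++; trail > 0; trail--, s++)
            if ((*s & 0xC0) != 0x80) return 0;       /* also stops at NUL */
    }
    return 1;
}

/* Demo converter, ASSUMING ISO-8859-1 input. Truncates if out is too small. */
static void demo_latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    size_t o = 0;
    if (out_size == 0) return;
    for (; *in && o + 3 <= out_size; in++) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80) {
            out[o++] = (char)c;
        } else {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
}
```

Because the detector runs before each conversion, calling convert_listed_prefs a second time on the same data leaves already converted values alone. Prefs that are not on the list are never touched, which matches drawback 1) above.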
Whiteboard: [PDT+] → [PDT+] eta 2-17-00
| Assignee | ||
Comment 21•25 years ago
if it turns out that there are prefs that we can't hard code ahead of time, we
can always extend ConvertPrefsToUTF8() [the thing I am writing now] to use
nsIPref::EnumerateChildren to look for prefs that begin with "foo.bar", for
example.
| Assignee | ||
Comment 22•25 years ago
I have a fix in hand.
I need nhotta to review and we need that list of prefs to be converted
Comment 23•25 years ago
You can attach your diff in the bug report. You can ask momoi for the list of
the prefs to be converted.
Comment 24•25 years ago
How about if I gave you a prefs.js file which contains
probably almost all possible non-ASCII pref items?
| Assignee | ||
Comment 25•25 years ago
|
and I can use it to test!
Comment 26•25 years ago
I walked through the change Seth made with him. I reviewed the basic concept of
the conversion. I didn't review the code in detail, so it needs another reviewer
for the change.
| Assignee | ||
Comment 27•25 years ago
alecf has reviewed my code.
these are the prefs I'm trying to convert:
"mail.identity.username"
"mail.signature_file"
"mail.identity.organization"
"li.server.ldap.userbase"
"editor.image_editor"
"editor.html_editor"
"editor.author"
"custtoolbar.personal_toolbar_folder"
"browser.cache.directory"
"mail.directory"
"news.directory"
"mail.imap.root_dir"
"premigration.mail.directory"
"premigration.news.directory"
"browser.bookmark_file"
"browser.history_file"
"browser.sarcache.directory"
"browser.user_history_file"
"helpers.private_mailcap_file"
"helpers.private_mime_types_file"
"mail.default_fcc"
"mail.default_templates"
"news.default_fcc"
note, many of those are paths. any pref that is a path could have characters in
the system charset.
I'm also converting any pref that matches these patterns:
"ldap_2.server.*.description"
"intl.font*.fixed_font"
"intl.font*.prop_font"
a=phil, checking in soon.
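The wildcard patterns above could be matched against pref names along the following lines. This is a sketch under assumptions, not the checked-in code: pref_name_matches is a hypothetical helper, and it assumes each pattern contains at most one '*' standing for any run of characters, including an empty one.

```c
#include <stddef.h>
#include <string.h>

/* Matches name against a pattern containing at most one '*', which stands
   for any run of characters (possibly empty). A pattern without '*' must
   match exactly. */
static int pref_name_matches(const char *name, const char *pattern)
{
    const char *star = strchr(pattern, '*');
    size_t prefix_len, suffix_len, name_len;

    if (!star)
        return strcmp(name, pattern) == 0;

    prefix_len = (size_t)(star - pattern);
    suffix_len = strlen(star + 1);
    name_len = strlen(name);

    /* The name must be long enough to hold prefix and suffix without overlap. */
    if (name_len < prefix_len + suffix_len)
        return 0;

    return strncmp(name, pattern, prefix_len) == 0 &&
           strcmp(name + name_len - suffix_len, star + 1) == 0;
}
```

A caller would enumerate the pref names (for example under the "ldap_2.server." branch) and convert each value whose name matches one of the patterns.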
Comment 28•25 years ago
The following could also be saved locally:
"mail.default_drafts"
| Assignee | ||
Updated•25 years ago
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 29•25 years ago
fixed.
if we find more prefs to migrate, let me know, and I'll add them.
| Assignee | ||
Comment 30•25 years ago
I just added it.