5130 - [FEATURE]Prefs backend code must be changed to use UTF8

While the majority of prefs are probably limited to ASCII values, some can contain non-ASCII strings. Examples:
- File or directory names (encoded in the localized OS script/codepage)
- Font names (encoded in the localized OS script/codepage)
- Real names (e.g., mail/vcard identities) can have non-ASCII characters
- LDAP v3 prefs are stored in UTF-8

The 4.x mish-mash is due to historical evolution. This worked sorta OK as long as these files were used only on the local system.

We want to improve 5.0 and beyond. This is a chance for us to change the file format. After 5.0, we will want to preserve forward compatibility.

If we ever want prefs to be served remotely (a la something like Location Independence), the 4.x model breaks. It is a known problem that LI does not interoperate between platforms.

Any code reading the prefs should convert from UTF-8 to the appropriate encoding (e.g., UCS-2 for internal processing or the OS script/codepage to access the filesystem). Hopefully the pref API will provide some easy mechanism for this.

Since the 4.x data is not tagged, the migration tool should probably guess that the old prefs are encoded in the current OS script/codepage (I18N can provide APIs to determine this) and then convert them to UTF-8. The tool will need to special-case the LDAP prefs (already in UTF-8) and not convert those entries. Do we want to provide a UI for the end-user to change the default (guessed) conversion?

The other issue is how to let users edit non-ASCII prefs. Most text editors today, on all platforms, do not support UTF-8 (NT editors can handle UCS-2). Two options:
1. Provide something similar to the Java native2ascii tool to convert from UTF-8 to another editable encoding.
2. Rely upon Ender to edit plain text UTF-8 files. Theoretically this should work since Ender supports both plain text and UTF-8. But we'd need to do some QA. Also, it may be a chicken-and-egg problem if you need to hack the prefs to bring up Communicator. But maybe that's an edge case that can be handled by creating a new profile.
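
The migration step described above (guess the legacy codepage, convert to UTF-8, leave the LDAP prefs alone) could look roughly like the sketch below. It is only an illustration under stated assumptions: ISO-8859-1 stands in for "the current OS script/codepage", the `ldap_` name prefix is a made-up way of recognizing the LDAP prefs, and `Latin1ToUtf8`, `IsAlreadyUtf8Pref`, `PrefMap` and `MigratePrefs` are hypothetical names, not part of libpref or of any existing migration tool.

```cpp
#include <map>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Latin-1 is an ASSUMPTION standing in for
// the guessed OS codepage; a real tool would ask the I18N layer which one to use.
std::string Latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                  // ASCII is unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}

// HYPOTHETICAL rule for "this pref is already UTF-8": any name starting with
// "ldap_". The real 4.x LDAP pref names are not given in this bug.
bool IsAlreadyUtf8Pref(const std::string& name) {
    return name.compare(0, 5, "ldap_") == 0;
}

using PrefMap = std::map<std::string, std::string>;

// Returns a copy of the prefs with every value in UTF-8.
PrefMap MigratePrefs(const PrefMap& old_prefs) {
    PrefMap migrated;
    for (const auto& entry : old_prefs) {
        migrated[entry.first] = IsAlreadyUtf8Pref(entry.first)
                                    ? entry.second                  // keep as-is
                                    : Latin1ToUtf8(entry.second);   // convert
    }
    return migrated;
}
```

A UI for overriding the guessed codepage would amount to letting the user pick which converter is used in place of `Latin1ToUtf8`.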

Chris McAfee

Comment 8

•

26 years ago

M9, giving to don for re-assigning.

don

Updated

•

26 years ago

Assignee: don → dp

Target Milestone: M9

don

Comment 9

•

26 years ago

dp, can your team eventually handle this issue?

Suresh Duddi (gone)

Updated

•

26 years ago

Assignee: dp → neeti

neeti

Assignee

Updated

•

26 years ago

Status: NEW → ASSIGNED

Target Milestone: M11

neeti

Assignee

Updated

•

26 years ago

Summary: Prefs backend code must be changed to use UTF8 → [Feature]Prefs backend code must be changed to use UTF8

neeti

Assignee

Updated

•

26 years ago

Summary: [Feature]Prefs backend code must be changed to use UTF8 → [FEATURE]Prefs backend code must be changed to use UTF8

neeti

Assignee

Updated

•

26 years ago

Blocks: 11408

leger

Comment 10

•

26 years ago

Adding lyecies@netscape.com to cc list.

neeti

Assignee

Updated

•

26 years ago

Assignee: neeti → bobj

Status: ASSIGNED → NEW

neeti

Assignee

Comment 11

•

26 years ago

Bob, I have discussed this bug with Erik van der Poel and Frank Tang to figure out the changes required for the prefs backend code (libpref) to use UTF8. Currently libpref uses char* in its APIs, e.g.

    PREF_SetCharPref(const char *pref_name, const char *value)
    PREF_GetCharPref(const char *pref_name, char * return_buffer, int * length)

Both Frank and Erik agree that we should not be adding or changing APIs in libpref, but should always treat the char* in the above APIs as UTF8 strings. The code calling these APIs should make sure that the char* is a UTF8 string if the data is non-ASCII. So, once the old prefs files are converted to UTF8 using the migration tool, libpref could handle UTF8.

I am assigning you the bug so that you can reassign it to the appropriate persons to write the migration tool utility.

Neeti
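
Under the "char* means UTF8" convention, the check falls on the caller. A minimal sketch of such a check, assuming the modern (RFC 3629) definition of well-formed UTF-8 with at most four bytes per character: `IsValidUtf8` is a hypothetical helper, not an existing libpref function.

```cpp
// Returns true if the NUL-terminated string is well-formed UTF-8.
// HYPOTHETICAL helper: a caller could run this before handing a value to a
// char*-based pref setter.
bool IsValidUtf8(const char* s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p) {
        unsigned int cp;
        int extra;  // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            if ((*p & 0xC0) != 0x80) return false;  // also catches an early NUL
            cp = (cp << 6) | (*p & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        static const unsigned int kMin[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < kMin[extra]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
    }
    return true;
}
```

Pure ASCII always passes, so callers that only store ASCII values are unaffected by the convention.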

Katsuhiko Momoi

Updated

•

26 years ago

Assignee: bobj → ftang

Katsuhiko Momoi

Comment 12

•

26 years ago

Bob is on sabbatical and Erik is on vacation. Re-assigning to ftang for now.

Frank Tang

Updated

•

26 years ago

Status: NEW → ASSIGNED

Frank Tang

Comment 13

•

26 years ago

The issue should be solved in the code that calls

    PREF_SetCharPref(const char *pref_name, const char *value)
    PREF_GetCharPref(const char *pref_name, char * return_buffer, int * length)

and not in the functions themselves. For these two functions, make sure we clearly spec out that the char* means UTF8. Neeti, could you do the spec-out part?

neeti

Assignee

Comment 14

•

26 years ago

Frank, I can spec out that the char* means UTF8. But I think this will only be effective once we have the migration tool ready.

Frank Tang

Comment 15

•

26 years ago

I think we need to change the API in http://lxr.mozilla.org/mozilla/source/modules/libpref/public/nsIPref.idl

    void SetCharPref(in string pref, in string value);
should be
    void SetCharPref(in string pref, in wstring value);

and

    void SetDefaultCharPref(in string pref, in string value);
should be
    void SetDefaultCharPref(in string pref, in wstring value);

and

    string CopyCharPref(in string pref);
should be
    wstring CopyCharPref(in string pref);

If we do that, then we also need to change the following callers:
- http://lxr.mozilla.org/mozilla/ident?i=CopyCharPref
- http://lxr.mozilla.org/mozilla/ident?i=SetDefaultCharPref
- http://lxr.mozilla.org/mozilla/ident?i=SetCharPref

A safer plan is to first add the new wstring version (in C++ it will be PRUnichar* or nsString), then gradually change the callers, and finally remove the string version. Reassigning this back to neeti for adding the wstring version.

Assignee: ftang → neeti

Status: ASSIGNED → NEW

neeti

Assignee

Updated

•

26 years ago

Status: NEW → ASSIGNED

Depends on: 14349

neeti

Assignee

Comment 16

•

26 years ago

Opened a new bug for the migration tool utility: 14349. Bug 5130 is for implementing the three new APIs which Frank mentioned above.

neeti

Assignee

Updated

•

26 years ago

Target Milestone: M11 → M15

neeti

Assignee

Comment 17

•

26 years ago

Moving target milestone to M15

sairuh (rarely reading bugmail)

Comment 18

•

26 years ago

spam: added self to cc list as this might affect my realm.

neeti

Assignee

Comment 19

•

26 years ago

I think we should not be adding or changing apis in libpref, but always treat the char* in the above apis as UTF8 strings. The code calling these apis should make sure that the char* is a UTF8 string, if the data is non-ascii. So, once the old prefs files are converted to UTF8 using the migration tool, libpref could handle utf8.

neeti

Assignee

Comment 20

•

26 years ago

The libpref API need not know anything about the various encodings. It just stores the string it was passed and returns the same. The callers of the API should make sure that char* means UTF-8.

Erik van der Poel

Comment 21

•

26 years ago

Alec has already started adding APIs to libpref to allow callers to get/set Unicode (PRUnichar) strings. I think we need to continue going down this path, because it is too painful for the rest of Mozilla to do the UTF-8 <-> Unicode conversion all over the place, when it could easily be done in one place (libpref).

Erik van der Poel

Comment 22

•

26 years ago

Alec, see my previous comment in this bug.

Suresh Duddi (gone)

Comment 23

•

26 years ago

Erik, how would that be? Any code using libpref should already be doing that. I see two cases:
1. Code storing strings into prefs is already using UTF-8, in which case there is no change.
2. Code storing strings into prefs is using ASCII or whatever and needs to be converted to UTF-8.

The APIs you are adding aren't going to solve the problem: instead of changing this code to convert to UTF-8, you would be changing it to convert to Unicode. Is the latter so much simpler to do? I don't understand the reasoning behind this API addition to libpref.

Erik van der Poel

Comment 24

•

26 years ago

The existing APIs are defined in terms of XPIDL "string". These are, and should remain, UTF-8. I am asking for the addition of APIs defined in terms of XPIDL "wstring", which are of course PRUnichar (Unicode). And as I said, Alec has already started adding such APIs. I'm asking for that work to be completed. There is a separate bug report for that completion work. I don't have the bug number, off-hand.

Suresh Duddi (gone)

Comment 25

•

26 years ago

OK, I talked with Erik about this. I think we need to talk about this together. I will try to huddle with alecf, neeti, Erik and selmer to close on this.

Alec Flett

Comment 26

•

26 years ago

I agree with erik that the API needs to be extended.. there are many places where you need to retrieve a unicode string from prefs, and it makes no sense to make all these call sites do the conversion over and over, in possibly different or incorrect ways. ... please take a look at http://lxr.mozilla.org/seamonkey/source/modules/libpref/src/nsPref.cpp#708 The way I did it there was approved by the i18n team some time back... if there are any revisions or improvements, we only have to do it in one place, not fifty.
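
The "do the conversion in one place" argument can be sketched as below. This is not the nsPref.cpp code linked above; it is a self-contained illustration in which `std::u16string` stands in for PRUnichar strings, `g_prefs` stands in for the UTF-8 pref store, and `SetUnicharPref` / `GetUnicharPref` are names chosen for the example. The decoder assumes the stored bytes are already well-formed UTF-8.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Stand-in for the byte-oriented pref store; values are UTF-8 (ASSUMPTION).
static std::map<std::string, std::string> g_prefs;

void SetCharPref(const std::string& name, const std::string& utf8) { g_prefs[name] = utf8; }

std::string GetCharPref(const std::string& name) {
    auto it = g_prefs.find(name);
    return it == g_prefs.end() ? std::string() : it->second;
}

std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate becomes U+FFFD
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Assumes well-formed UTF-8 input; malformed input is not diagnosed here.
std::u16string Utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        for (; extra > 0 && i < in.size(); --extra) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        if (cp >= 0x10000) {  // needs a surrogate pair in UTF-16
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// The Unicode entry points: every caller shares the same conversion code.
void SetUnicharPref(const std::string& name, const std::u16string& value) {
    SetCharPref(name, Utf16ToUtf8(value));
}

std::u16string GetUnicharPref(const std::string& name) {
    return Utf8ToUtf16(GetCharPref(name));
}
```

With this layering the stored form stays UTF-8, the char* accessors keep working for callers that want bytes, and a fix to either converter reaches every Unicode caller at once.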

Suresh Duddi (gone)

Comment 27

•

26 years ago

I see. So are we planning on continuing both the Unicode and char * APIs? Sounds like we should actively deprecate the char * API.

Erik van der Poel

Comment 28

•

26 years ago

I think it would be good to deprecate the char* (string) APIs. However, it might be a good idea to see how much code will be affected by such a change.

Alec Flett

Comment 29

•

26 years ago

No! That's not right either. Sometimes you just want a char* pref. For instance, in mail we use a char* pref to determine the server type, like "imap" or "pop3". It would be ridiculous to convert those from UTF8 to PRUnichar* and back to UTF8 if I just want to compare them with a char string constant like "imap".
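
The server-type case works without any conversion because UTF-8 encodes ASCII characters as the same single bytes, and no byte of a multi-byte sequence falls in the ASCII range. A tiny sketch, where `IsServerType` is a hypothetical helper and not a function from the mail code:

```cpp
#include <cstring>

// Byte-wise comparison of a UTF-8 pref value against an ASCII constant.
// A match is only possible if the pref value is exactly that ASCII string.
bool IsServerType(const char* utf8_pref_value, const char* ascii_type) {
    return std::strcmp(utf8_pref_value, ascii_type) == 0;
}
```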

Erik van der Poel

Comment 30

•

26 years ago

OK, I don't feel too strongly about this, so leave the char* APIs in, but add PRUnichar* APIs correspondingly.

Alec Flett

Comment 31

•

26 years ago

ok, I've checked in the fix to add get/set of default Unichar prefs, so I'd say mark this INVALID or WONTFIX.

neeti

Assignee

Comment 32

•

25 years ago

Marking this invalid, since the existing APIs are defined in terms of XPIDL "string". These are UTF-8. Alec has checked in the APIs defined in terms of XPIDL "wstring", which are PRUnichar (Unicode).

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → INVALID

leger

Comment 33

•

25 years ago

Moving all libPref component bugs to new Preferences: Backend component. libPref component will be deleted.

Component: libPref → Preferences: Backend

shrirang khanzode

Comment 34

•

25 years ago

sorry for the spam, changing QA contact.

QA Contact: paulmac → sairuh

sairuh (rarely reading bugmail)

Comment 35

•

25 years ago

verif.

Status: RESOLVED → VERIFIED