Bug 5130 - [FEATURE] Prefs backend code must be changed to use UTF8
Status: VERIFIED INVALID
Product: Core
Classification: Components
Component: Preferences: Backend
Version: Trunk
Platform: All Mac System 8.5
Importance: P3 normal
Target Milestone: M15
Assigned To: neeti
QA Contact: sairuh (rarely reading bugmail)
Depends on: 14349
Blocks: 7228 11408
Reported: 1999-04-15 12:32 PDT by mcmullen
Modified: 2000-04-24 16:47 PDT


Description mcmullen 1999-04-15 12:32:39 PDT
I'm adding this task as a bug to help me track it.
Comment 1 mcmullen 1999-04-15 12:33:59 PDT
Setting the right component.
Comment 2 mcmullen 1999-04-15 12:41:59 PDT
Platform=all
Comment 3 mcmullen 1999-04-15 12:45:59 PDT
Changing plumbing bugs to M5 milestone, and accepting them.
Comment 4 don 1999-04-19 14:39:59 PDT
Changed target milestone to M6.
Comment 5 mcmullen 1999-05-20 11:57:59 PDT
Changing to m7
Comment 6 mcmullen 1999-06-02 19:01:59 PDT
This has been scheduled for M8
Comment 7 bobj 1999-06-11 14:11:59 PDT
While the majority of prefs are probably limited to ASCII values, some can
contain non-ASCII strings.  Examples:

  - File or directory names (encoded in the localized OS script/codepage)
  - Font names (encoded in the localized OS script/codepage)
  - Real names (e.g., mail/vcard identities) can contain non-ASCII characters
  - LDAP v3 prefs are stored in UTF-8

The 4.x mish-mash is due to historical evolution.  This worked sorta OK as long
as these files were used only on the local system.

We want to improve 5.0 and beyond.  This is a chance for us to change the
file format.  After 5.0, we will want to preserve forward compatibility.

If we ever want prefs to be served remotely (a la something like Location
Independence), the 4.x model breaks.  It is a known problem that LI does
not interoperate between platforms.

Any code reading the prefs should convert from UTF-8 to the appropriate
encoding (e.g., UCS-2 for internal processing or the OS script/codepage
to access the filesystem).  Hopefully the pref API will provide some
easy mechanism for this.

Since the 4.x data is not tagged, the migration tool should probably guess
that the old prefs are encoded in the current OS script/codepage (I18N
can provide APIs to determine this) and then convert them to UTF-8.  The
tool will need to special-case the LDAP prefs (already in UTF-8) and not
convert those entries.  Do we want to provide a UI for the end-user to
change the default (guessed) conversion?
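
A minimal sketch of the migration step described above. It assumes, purely
for illustration, that the guessed legacy codepage is ISO-8859-1; a real tool
would ask the I18N APIs for the actual codepage. The function names and the
"ldap" name prefix used to spot already-UTF-8 entries are hypothetical and do
not come from the 4.x code.

```cpp
#include <cassert>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte has the same numeric
// value as its Unicode code point, so bytes >= 0x80 become two UTF-8 bytes.
// ASSUMPTION: Latin-1 stands in for "the current OS script/codepage".
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// HYPOTHETICAL rule: treat any pref whose name starts with "ldap" as
// already stored in UTF-8, so it must not be converted a second time.
bool isAlreadyUtf8Pref(const std::string& prefName) {
    return prefName.compare(0, 4, "ldap") == 0;
}

// Return the value to write into the migrated prefs file.
std::string migratePrefValue(const std::string& prefName,
                             const std::string& legacyValue) {
    if (isAlreadyUtf8Pref(prefName)) {
        return legacyValue;  // special case: leave LDAP entries untouched
    }
    return latin1ToUtf8(legacyValue);
}
```

Skipping the LDAP entries matters because converting UTF-8 bytes as if they
were Latin-1 would double-encode every non-ASCII character.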

The other issue is how to let users edit non-ASCII prefs.  Most text
editors today, on all platforms, do not support UTF-8 (NT editors can
handle UCS-2).  Two options:

   1. Provide something similar to the Java native2ascii tool to convert
      from UTF-8 to another editable encoding.
   2. Rely upon Ender to edit plain text UTF-8 files.  Theoretically this
      should work since Ender supports both plain text and UTF-8.  But
      we'd need to do some QA.  Also, it may be a chicken-and-egg
      problem if you need to hack the prefs to bring up Communicator.
      But maybe that's an edge case that can be handled by creating a
      new profile.
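
Option 1 above could look roughly like the following sketch, which rewrites
UTF-8 text as ASCII with \uXXXX escapes in the spirit of Java's native2ascii.
The function names are hypothetical, the exact escape format of the real tool
is not reproduced, and malformed input is simply replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Decode one code point from UTF-8 at s[i] and advance i past it.
// Malformed or truncated sequences yield U+FFFD and consume a single byte.
// (Overlong forms are not rejected; this is a sketch, not a validator.)
static char32_t decodeUtf8(const std::string& s, std::size_t& i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t cp;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { ++i; return 0xFFFD; }
    if (s.size() - i < len) { ++i; return 0xFFFD; }
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp > 0x10FFFF ? char32_t{0xFFFD} : cp;
}

// Append one 16-bit unit as a six-character \uXXXX escape.
static void appendEscape(std::string& out, unsigned unit) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04x", unit);
    out += buf;
}

// Rewrite UTF-8 text as pure ASCII: ASCII passes through, everything else
// becomes \uXXXX (a surrogate pair for code points above U+FFFF).
std::string utf8ToAsciiEscapes(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decodeUtf8(utf8, i);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x10000) {
            appendEscape(out, static_cast<unsigned>(cp));
        } else {
            cp -= 0x10000;
            appendEscape(out, static_cast<unsigned>(0xD800 + (cp >> 10)));
            appendEscape(out, static_cast<unsigned>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The escaped form can be edited in any ASCII-capable editor; a matching
reverse conversion would be needed before the file is read back as UTF-8.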
Comment 8 Chris McAfee 1999-07-06 14:27:59 PDT
M9, giving to don for re-assigning.
Comment 9 don 1999-07-23 10:51:59 PDT
dp, can your team eventually handle this issue?
Comment 10 leger 1999-08-30 15:25:59 PDT
Adding lyecies@netscape.com to cc list.
Comment 11 neeti 1999-09-10 12:08:59 PDT
Bob,
I have discussed this bug with Erik van der Poel and Frank Tang to figure out
the changes required for the prefs backend code (libpref) to use UTF8. Currently
libpref uses char* in its APIs, e.g.:
PREF_SetCharPref(const char *pref_name, const char *value)
PREF_GetCharPref(const char *pref_name, char * return_buffer, int * length).

Both Frank and Erik agree that we should not be adding or changing apis in
libpref, but always treat the char* in the above apis as UTF8 strings. The code
calling these apis should make sure that the char* is a UTF8 string, if the data
is non-ascii. So, once the old prefs files are converted to UTF8 using the
migration tool, libpref could handle utf8.
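
One way a caller could honor the "char* means UTF8" contract described above
is to check the bytes before handing them to the setter. The sketch below is
a standalone validator with a hypothetical name; it is not part of libpref
and makes no claim about how existing callers behave.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return true when s is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong encodings, UTF-16 surrogates and
// code points above U+10FFFF. Pure ASCII is always valid.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) { ++i; continue; }

        std::size_t len;
        char32_t cp;
        char32_t minCp;  // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;            // continuation byte or invalid lead

        if (n - i < len) return false;  // sequence runs off the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += len;
    }
    return true;
}
```

A value in a legacy codepage, such as a Latin-1 "é" stored as the single byte
0xE9, fails this check, which is exactly the kind of data the migration tool
is meant to convert.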

I am assigning you the bug so that you could reassign it to the appropriate
persons to write the migration tool utility.

Neeti

The r
Comment 12 Katsuhiko Momoi 1999-09-10 12:30:59 PDT
Bob is on sabbatical and Erik is on vacation.
re-assigning to ftang for now..
Comment 13 Frank Tang 1999-09-11 00:59:59 PDT
The issue should be solved in the code which CALLS
PREF_SetCharPref(const char *pref_name, const char *value)
PREF_GetCharPref(const char *pref_name, char * return_buffer, int * length).

but not in the functions themselves. For these two functions, make sure we
clearly spec out that the char* means UTF8.

Neeti, could you do the spec-out part?
Comment 14 neeti 1999-09-13 09:44:59 PDT
Frank, I can spec out that the char* means UTF8. But I think this will only
be effective once we have the migration tool ready.
Comment 15 Frank Tang 1999-09-17 14:00:59 PDT
I think we need to change the API - in
http://lxr.mozilla.org/mozilla/source/modules/libpref/public/nsIPref.idl
100   void SetCharPref(in string pref, in string value);
should be
100   void SetCharPref(in string pref, in wstring value);

and
126   void SetDefaultCharPref(in string pref, in string value);
should be
126   void SetDefaultCharPref(in string pref, in wstring value);

and
137   string CopyCharPref(in string pref);
should be
137   wstring CopyCharPref(in string pref);

If we do that, then we also need to change the following callers-

http://lxr.mozilla.org/mozilla/ident?i=CopyCharPref
http://lxr.mozilla.org/mozilla/ident?i=SetDefaultCharPref
http://lxr.mozilla.org/mozilla/ident?i=SetCharPref

A safer plan is to first add the new wstring version (in C++ it will be
PRUnichar* or nsString), then gradually change the callers, and finally remove
the string version.

Reassigning this back to neeti for adding the wstring version.
Comment 16 neeti 1999-09-20 11:57:59 PDT
Opened a new bug for the migration tool utility - 14349. Bug 5130 is for
implementing the three new APIs which Frank mentioned above.
Comment 17 neeti 1999-09-20 14:36:59 PDT
Moving target milestone to M15
Comment 18 sairuh (rarely reading bugmail) 2000-01-06 15:01:59 PST
spam: added self to cc list as this might affect my realm.
Comment 19 neeti 2000-01-13 11:32:59 PST
I think we should not be adding or changing apis in libpref, but always treat
the char* in the above apis as UTF8 strings. The code calling these apis should
make sure that the char* is a UTF8 string, if the data is non-ascii. So, once
the old prefs files are converted to UTF8 using the migration tool, libpref
could handle utf8.
Comment 20 neeti 2000-01-13 11:38:59 PST
The libpref API need not know anything about the various encodings. It just
stores the string it was passed, and returns the same.

The callers of the API should make sure that char* means UTF-8.
Comment 21 Erik van der Poel 2000-01-13 11:42:59 PST
Alec has already started adding APIs to libpref to allow callers to get/set
Unicode (PRUnichar) strings. I think we need to continue going down this path,
because it is too painful for the rest of Mozilla to do the UTF-8 <-> Unicode
conversion all over the place, when it could easily be done in one place
(libpref).
Comment 22 Erik van der Poel 2000-01-13 11:42:59 PST
Alec, see my previous comment in this bug.
Comment 23 Suresh Duddi (gone) 2000-01-13 11:54:59 PST
Erik, how would that be?

Any code using libpref should already be doing that. I see two cases:

1. Code storing strings into prefs is already using UTF-8, in which case there
is no change.

2. Code storing strings into prefs is using ASCII or whatever and needs to be
converted to UTF-8. The APIs you are adding aren't going to solve the problem:
instead of changing this code to convert to UTF-8, you would be changing it to
convert to Unicode. Is the latter really so much simpler to do?

I don't understand the reasoning behind this API addition to libpref.
Comment 24 Erik van der Poel 2000-01-13 12:05:59 PST
The existing APIs are defined in terms of XPIDL "string". These are, and should
remain, UTF-8. I am asking for the addition of APIs defined in terms of XPIDL
"wstring", which are of course PRUnichar (Unicode). And as I said, Alec has
already started adding such APIs. I'm asking for that work to be completed.
There is a separate bug report for that completion work. I don't have the bug
number, off-hand.
Comment 25 Suresh Duddi (gone) 2000-01-13 12:26:59 PST
OK, I talked with Erik about this. I think we need to talk about this together.
I will try to huddle with alecf, neeti, Erik and selmer to close on this.
Comment 26 Alec Flett 2000-01-13 15:02:59 PST
I agree with erik that the API needs to be extended.. there are many places
where you need to retrieve a unicode string from prefs, and it makes no sense to
make all these call sites do the conversion over and over, in possibly different
or incorrect ways.
... please take a look at
http://lxr.mozilla.org/seamonkey/source/modules/libpref/src/nsPref.cpp#708

The way I did it there was approved by the i18n team some time back... if there
are any revisions or improvements, we only have to do it in one place, not
fifty.
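
To illustrate the "convert in one place" argument, here is a small sketch of
a pref store that keeps values as UTF-8 internally and offers UTF-16 accessors
on top. It is not the nsPref.cpp implementation linked above; the class and
method names are hypothetical, and std::u16string stands in for PRUnichar*.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Append code point cp to out as UTF-8.
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16 -> UTF-8; unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}

// UTF-8 -> UTF-16; malformed bytes become U+FFFD (overlongs not checked).
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = b0 < 0x80 ? 1 : (b0 & 0xE0) == 0xC0 ? 2
                        : (b0 & 0xF0) == 0xE0 ? 3 : (b0 & 0xF8) == 0xF0 ? 4 : 0;
        char32_t cp = (len <= 1) ? b0 : (b0 & (0xFF >> (len + 1)));
        bool ok = len != 0 && in.size() - i >= len;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            ok = (b & 0xC0) == 0x80;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { cp = 0xFFFD; len = 1; }
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// HYPOTHETICAL store: values live as UTF-8; the Unichar accessors are thin
// wrappers, so the conversion logic exists exactly once.
class PrefStore {
public:
    void setCharPref(const std::string& name, const std::string& utf8) {
        values_[name] = utf8;
    }
    bool getCharPref(const std::string& name, std::string& out) const {
        auto it = values_.find(name);
        if (it == values_.end()) return false;
        out = it->second;
        return true;
    }
    void setUnicharPref(const std::string& name, const std::u16string& v) {
        setCharPref(name, utf16ToUtf8(v));
    }
    bool getUnicharPref(const std::string& name, std::u16string& out) const {
        std::string utf8;
        if (!getCharPref(name, utf8)) return false;
        out = utf8ToUtf16(utf8);
        return true;
    }
private:
    std::map<std::string, std::string> values_;
};
```

With this layering, a fix to the conversion touches two functions rather
than every call site that reads a Unicode pref.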
Comment 27 Suresh Duddi (gone) 2000-01-13 16:31:59 PST
I see. So are we planning on continuing both the Unicode and char* APIs? Sounds
like we should actively deprecate the char* API.
Comment 28 Erik van der Poel 2000-01-13 17:33:59 PST
I think it would be good to deprecate the char* (string) APIs. However, it might
be a good idea to see how much code will be affected by such a change.
Comment 29 Alec Flett 2000-01-13 17:58:59 PST
No! That's not right either. Sometimes you just want a char* pref... for
instance in mail we use a char* pref to determine the server type like "imap" or
"pop3" - it would be ridiculous to convert those from UTF8 to PRUnichar* and
back to UTF8 if I just want to compare it with a char string constant like
"imap"
Comment 30 Erik van der Poel 2000-01-13 18:00:59 PST
OK, I don't feel too strongly about this, so leave the char* APIs in, but add
PRUnichar* APIs correspondingly.
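
The server-type case mentioned above works without any conversion because
ASCII text is byte-for-byte identical in UTF-8. A minimal sketch, with a
hypothetical helper name:

```cpp
#include <cassert>
#include <cstring>

// Compare a UTF-8 pref value against an ASCII constant. A plain byte
// comparison is enough, since ASCII characters encode as themselves in
// UTF-8. A null value (pref not set) never matches.
bool prefEquals(const char* utf8Value, const char* asciiConstant) {
    return utf8Value != nullptr && std::strcmp(utf8Value, asciiConstant) == 0;
}
```

A caller holding the char* value of a server-type pref could then write
prefEquals(value, "imap") without ever widening the string.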
Comment 31 Alec Flett 2000-01-18 14:06:59 PST
ok, I've checked in the fix to add get/set of default Unichar prefs, so I'd say
mark this INVALID or WONTFIX.
Comment 32 neeti 2000-01-25 11:08:31 PST
Marking this invalid, since the existing APIs are defined in terms of XPIDL 
"string". These are UTF-8. Alec has checked in the APIs defined in terms of 
XPIDL "wstring", which are PRUnichar (Unicode).
Comment 33 leger 2000-02-14 08:32:01 PST
Moving all libPref component bugs to new Preferences: Backend component.  
libPref component will be deleted.
Comment 34 shrirang khanzode 2000-04-24 16:23:24 PDT
sorry for the spam, changing QA contact.
Comment 35 sairuh (rarely reading bugmail) 2000-04-24 16:47:41 PDT
verif.
