Closed
Bug 262385
Opened 20 years ago
Closed 20 years ago
nsIScriptableUnicodeConverter is not very scriptable
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla1.8alpha5
People
(Reporter: darin.moz, Assigned: Biesinger)
References
Details
Attachments
(1 file, 1 obsolete file)
8.31 KB,
patch
|
jshin1987
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
nsIScriptableUnicodeConverter is not very scriptable. this interface is used by some JS code to convert unicode data: http://lxr.mozilla.org/mozilla/search?string=nsIScriptableUnicodeConverter but, the interface has methods like this: string ConvertFromUnicode(in wstring aSrc); |string| is only capable of representing ascii codepoints, and xpconnect will decimate non-ASCII codepoints, so this interface is completely useless for javascript applications. cc'ing rginda since venkman is affected. calendar is apparently also affected.
Comment 1•20 years ago
|
||
Darin, |string ConvertFromUnicode(in wstring aSrc)| is inevitable because what we want out of that is a sequence of bytes(octets) that represent a multi-byte string ('multi' including one). Crossing XPCOM, it will be "inflated" (with zero-padding) but the information is still there. Needless to say, clients have to be careful. Javascript being typeless, I don't know what better way there is for this.
Assignee | ||
Comment 2•20 years ago
|
||
ACString may be slightly better... or [array,size_is(foo)] out octet
Reporter | ||
Comment 3•20 years ago
|
||
jshin: i totally understand how it manages to work today, but xpconnect asserts (in debug mode) that the high-bit of any element of a |string| will be zero. it's a hack to rely on this behavior, and we should do something better. an octet array is probably the best option (disregarding performance issues) since you really don't want to treat the resulting character array as a JS string. another option would be to use a nsISupportsCString instance to represent the non-unicode encoded string.
Comment 4•20 years ago
|
||
I wrote a comment concurring with comment #2 (and now with comment #3 as well), but somehow I didn't press the submit button :-). In comment #1, I didn't mention using an octet array because I saw once a method in an interface being marked as unscriptable for using it, which made me (incorrectly) believe that an octet array is no better than |string| when crossing xpconnect. Btw, I heard a complaint that the calendar in Mozilla-suite has problems with non-ASCII characters while Sunbird doesn't. It occurred to me that this discrepancy may have to do with different treatments of 'pseudo-Unicode' JS strings (for multi-byte encoded C strings) in two versions of calendar (if there's a fork between two.). Chatzilla seems to have managed to work so far (this interface was added for it in bug 54857).
Assignee | ||
Comment 5•20 years ago
|
||
*** Bug 269649 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 6•20 years ago
|
||
how about this?
Assignee: smontagu → cbiesinger
Status: NEW → ASSIGNED
Assignee | ||
Updated•20 years ago
|
Attachment #165841 -
Flags: review?(jshin)
Comment 7•20 years ago
|
||
Comment on attachment 165841 [details] [diff] [review] suggested patch r=jshin thanks. >+ wstring convertByteArrayToUnicode(in unsigned long aCount, >+ [const,array,size_is(aCount)] in octet aData); >+ void convertToByteArray(in wstring aString, >+ out unsigned long aLen, >+ [array, size_is(aLen),retval] out octet aData); How about |convertFromByteArray| and |convertToByteArray| these methods being always used in scripts where 'Unicode' is implicitly assumed? It's unfortunate that two existing methods did exactly the opposite. Alternatively, |convertByteArrayToUnicode| and |convertUnicodeToByteArray| would do.
Attachment #165841 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 8•20 years ago
|
||
now using convertFromByteArray/convertToByteArray. this also improves the documentation in the idl file.
Attachment #165841 -
Attachment is obsolete: true
Assignee | ||
Comment 9•20 years ago
|
||
Comment on attachment 165906 [details] [diff] [review] patch v2 since I'm not quite sure my Finish comment change is correct, re-requesting review
Attachment #165906 -
Flags: review?(jshin)
Assignee | ||
Updated•20 years ago
|
Target Milestone: --- → mozilla1.8alpha5
Comment 10•20 years ago
|
||
Comment on attachment 165906 [details] [diff] [review] patch v2 r=jshin
Attachment #165906 -
Flags: review?(jshin) → review+
Assignee | ||
Updated•20 years ago
|
Attachment #165906 -
Flags: superreview?(darin)
Reporter | ||
Comment 11•20 years ago
|
||
So, since STRICT_CHECK_OF_UNICODE is not defined in xpcconvert.cpp (contrary to my comment about it being defined in DEBUG builds), what's the real advantage to using these octet array methods over the ones that use |string|? I've heard that octet arrays are pretty slow in JS :(
Assignee | ||
Comment 12•20 years ago
|
||
well, for example, you can convert to UTF-16 or UTF-32... (where string would be truncated due to the null byte)
Comment 13•20 years ago
|
||
It could be used in nsIStreamLoaderObserver.onStreamComplete. That gives on array of octets, not a string. (i guess because it isn't a string, it can contain nulls)
Reporter | ||
Comment 14•20 years ago
|
||
Comment on attachment 165906 [details] [diff] [review] patch v2 OK, those sound like good reasons :) >Index: idl/nsIScriptableUConv.idl > /** > * Converts the data from Unicode to one Charset. > */ > string ConvertFromUnicode(in wstring aSrc); Bonus points: It would be good to better document this result. > wstring ConvertToUnicode([const] in string aSrc); [const] here is redundant. sr=darin
Attachment #165906 -
Flags: superreview?(darin) → superreview+
Comment 15•20 years ago
|
||
Comment on attachment 165906 [details] [diff] [review] patch v2 Darin: let's optimize JS byte arrays (ECMA TG1 is back in the game, working on 262 Edition 4); someone file a bug if performance is urgently bac. Why does this interface have StudlyCaps method names? IDL, especially scriptable IDL, should use camelCaps. /be
Assignee | ||
Comment 16•20 years ago
|
||
brendan: why it is using them, I don't know. but I left them unchanged to keep compat with existing JS code using this interface. if people don't think that's necessary, sure, I can change it.
Assignee | ||
Comment 17•20 years ago
|
||
(In reply to comment #14) > Bonus points: It would be good to better document this result. I added this commnet: 56 * Converts the data from Unicode to one Charset. 57 * Returns the converted string. After converting, Finish should be called 58 * and its return value appended to this return value. Checking in idl/nsIScriptableUConv.idl; /cvsroot/mozilla/intl/uconv/idl/nsIScriptableUConv.idl,v <-- nsIScriptableUConv.idl new revision: 1.8; previous revision: 1.7 done Checking in src/nsScriptableUConv.cpp; /cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.cpp,v <-- nsScriptableUConv.cpp new revision: 1.15; previous revision: 1.14 done Checking in src/nsScriptableUConv.h; /cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.h,v <-- nsScriptableUConv.h new revision: 1.8; previous revision: 1.7 done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•