Closed
Bug 262385
Opened 21 years ago
Closed 21 years ago
nsIScriptableUnicodeConverter is not very scriptable
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla1.8alpha5
People
(Reporter: darin.moz, Assigned: Biesinger)
References
Details
Attachments
(1 file, 1 obsolete file)
8.31 KB,
patch
|
jshin1987
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
nsIScriptableUnicodeConverter is not very scriptable.
this interface is used by some JS code to convert unicode data:
http://lxr.mozilla.org/mozilla/search?string=nsIScriptableUnicodeConverter
but, the interface has methods like this:
string ConvertFromUnicode(in wstring aSrc);
|string| is only capable of representing ascii codepoints, and xpconnect will
decimate non-ASCII codepoints, so this interface is completely useless for
javascript applications.
cc'ing rginda since venkman is affected. calendar is apparently also affected.
Comment 1•21 years ago
|
||
Darin, |string ConvertFromUnicode(in wstring aSrc)| is inevitable because what
we want out of that is a sequence of bytes(octets) that represent a multi-byte
string ('multi' including one). Crossing XPCOM, it will be "inflated" (with
zero-padding) but the information is still there. Needless to say, clients have
to be careful. Javascript being typeless, I don't know what better way there is
for this.
Assignee | ||
Comment 2•21 years ago
|
||
ACString may be slightly better... or [array,size_is(foo)] out octet
![]() |
Reporter | |
Comment 3•21 years ago
|
||
jshin: i totally understand how it manages to work today, but xpconnect asserts
(in debug mode) that the high-bit of any element of a |string| will be zero.
it's a hack to rely on this behavior, and we should do something better. an
octet array is probably the best option (disregarding performance issues) since
you really don't want to treat the resulting character array as a JS string.
another option would be to use a nsISupportsCString instance to represent the
non-unicode encoded string.
Comment 4•21 years ago
|
||
I wrote a comment concurring with comment #2 (and now with comment #3 as well),
but somehow I didn't press the submit button :-). In comment #1, I didn't
mention using an octet array because I saw once a method in an interface being
marked as unscriptable for using it, which made me (incorrectly) believe that an
octet array is no better than |string| when crossing xpconnect.
Btw, I heard a complaint that the calendar in Mozilla-suite has problems with
non-ASCII characters while Sunbird doesn't. It occurred to me that this
discrepancy may have to do with different treatments of 'pseudo-Unicode' JS
strings (for multi-byte encoded C strings) in two versions of calendar (if
there's a fork between two.). Chatzilla seems to have managed to work so far
(this interface was added for it in bug 54857).
Assignee | ||
Comment 5•21 years ago
|
||
*** Bug 269649 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 6•21 years ago
|
||
how about this?
Assignee: smontagu → cbiesinger
Status: NEW → ASSIGNED
Assignee | ||
Updated•21 years ago
|
Attachment #165841 -
Flags: review?(jshin)
Comment 7•21 years ago
|
||
Comment on attachment 165841 [details] [diff] [review]
suggested patch
r=jshin
thanks.
>+ wstring convertByteArrayToUnicode(in unsigned long aCount,
>+ [const,array,size_is(aCount)] in octet aData);
>+ void convertToByteArray(in wstring aString,
>+ out unsigned long aLen,
>+ [array, size_is(aLen),retval] out octet aData);
How about |convertFromByteArray| and |convertToByteArray| these methods being
always used in scripts where 'Unicode' is implicitly assumed? It's unfortunate
that two existing methods did exactly the opposite.
Alternatively, |convertByteArrayToUnicode| and |convertUnicodeToByteArray|
would do.
Attachment #165841 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 8•21 years ago
|
||
now using convertFromByteArray/convertToByteArray. this also improves the
documentation in the idl file.
Attachment #165841 -
Attachment is obsolete: true
Assignee | ||
Comment 9•21 years ago
|
||
Comment on attachment 165906 [details] [diff] [review]
patch v2
since I'm not quite sure my Finish comment change is correct, re-requesting
review
Attachment #165906 -
Flags: review?(jshin)
Assignee | ||
Updated•21 years ago
|
Target Milestone: --- → mozilla1.8alpha5
Comment 10•21 years ago
|
||
Comment on attachment 165906 [details] [diff] [review]
patch v2
r=jshin
Attachment #165906 -
Flags: review?(jshin) → review+
Assignee | ||
Updated•21 years ago
|
Attachment #165906 -
Flags: superreview?(darin)
![]() |
Reporter | |
Comment 11•21 years ago
|
||
So, since STRICT_CHECK_OF_UNICODE is not defined in xpcconvert.cpp (contrary to
my comment about it being defined in DEBUG builds), what's the real advantage to
using these octet array methods over the ones that use |string|? I've heard
that octet arrays are pretty slow in JS :(
Assignee | ||
Comment 12•21 years ago
|
||
well, for example, you can convert to UTF-16 or UTF-32... (where string would be
truncated due to the null byte)
![]() |
||
Comment 13•21 years ago
|
||
It could be used in nsIStreamLoaderObserver.onStreamComplete. That gives on
array of octets, not a string. (i guess because it isn't a string, it can
contain nulls)
![]() |
Reporter | |
Comment 14•21 years ago
|
||
Comment on attachment 165906 [details] [diff] [review]
patch v2
OK, those sound like good reasons :)
>Index: idl/nsIScriptableUConv.idl
> /**
> * Converts the data from Unicode to one Charset.
> */
> string ConvertFromUnicode(in wstring aSrc);
Bonus points: It would be good to better document this result.
> wstring ConvertToUnicode([const] in string aSrc);
[const] here is redundant.
sr=darin
Attachment #165906 -
Flags: superreview?(darin) → superreview+
![]() |
||
Comment 15•21 years ago
|
||
Comment on attachment 165906 [details] [diff] [review]
patch v2
Darin: let's optimize JS byte arrays (ECMA TG1 is back in the game, working on
262 Edition 4); someone file a bug if performance is urgently bac.
Why does this interface have StudlyCaps method names? IDL, especially
scriptable IDL, should use camelCaps.
/be
Assignee | ||
Comment 16•21 years ago
|
||
brendan: why it is using them, I don't know. but I left them unchanged to keep
compat with existing JS code using this interface. if people don't think that's
necessary, sure, I can change it.
Assignee | ||
Comment 17•21 years ago
|
||
(In reply to comment #14)
> Bonus points: It would be good to better document this result.
I added this commnet:
56 * Converts the data from Unicode to one Charset.
57 * Returns the converted string. After converting, Finish should be called
58 * and its return value appended to this return value.
Checking in idl/nsIScriptableUConv.idl;
/cvsroot/mozilla/intl/uconv/idl/nsIScriptableUConv.idl,v <--
nsIScriptableUConv.idl
new revision: 1.8; previous revision: 1.7
done
Checking in src/nsScriptableUConv.cpp;
/cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.cpp,v <-- nsScriptableUConv.cpp
new revision: 1.15; previous revision: 1.14
done
Checking in src/nsScriptableUConv.h;
/cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.h,v <-- nsScriptableUConv.h
new revision: 1.8; previous revision: 1.7
done
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•