Closed
Bug 239303
Opened 21 years ago
Closed 21 years ago
provide basic string conversion routines for the Gecko SDK
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla1.7final
People
(Reporter: darin.moz, Assigned: darin.moz)
Details
Attachments
(1 file, 1 obsolete file)
19.59 KB,
patch
|
benjamin
:
review+
dbaron
:
superreview+
chofmann
:
approval1.7+
|
Details | Diff | Splinter Review |
provide basic string conversion routines to embedding string API.
bsmedberg and i discussed this issue, and we came up with the following API:
//--------------------------------------------------------------------------
/**
* Available single-byte charsets.
*/
enum {
/* Conversion between ASCII and UTF-16 assumes that all single-byte
* characters can be inflated to UTF-16 by inserting zero bytes. Reverse
* conversion is done by truncating every other byte. No attempt is made
* to protect against data loss in this case. */
NS_CHARSET_ASCII = 0,
/* Conversion between UTF-8 and UTF-16 is non-lossy. */
NS_CHARSET_UTF8 = 1,
/* Conversion from UTF-16 to the native filesystem charset may result in a
* loss of information. No attempt is made to protect against data loss in
* this case. The native filesystem charset applies to strings passed to
* the "Native" method variants on nsIFile and nsILocalFile. */
NS_CHARSET_NATIVE_FILESYSTEM = 2
};
/**
* NS_CStringToUTF16
*
* This function converts the characters in a nsACString to an array of UTF-16
* characters, in the platform endianness. The result is stored in a nsAString
* object.
*
* @param aSource abstract string reference containing source string
* @param aCharset character set of the source string
* @param aDest abstract string reference to hold the result
*/
NS_STRINGAPI(nsresult)
NS_CStringToUTF16(const nsACString &aSource, PRUint32 aCharset,
nsAString &aDest);
/**
* NS_UTF16ToCString
*
* This function converts the UTF-16 characters in a nsAString to a single-byte
* encoding. The result is stored in a nsACString object. In some cases this
* conversion may be lossy. In such cases, the conversion may succeed with a
* return code indicating loss of information. The exact behavior is not
* specified at this time.
*
* @param aSource abstract string reference containing source string
* @param aCharset character set of the resulting string
* @param aDest abstract string reference to hold the result
*/
NS_STRINGAPI(nsresult)
NS_UTF16ToCString(const nsAString &aSource, PRUint32 aCharset,
nsACString &aDest);
//--------------------------------------------------------------------------
This API provides embedders and external component developers with an easy way
to convert between the primary string encodings used by the Gecko SDK. The
current state of affairs makes the Gecko SDK difficult to use and/or promotes
the use of non-frozen uconv interfaces or methods from nsReadableUtils.h and
nsNativeCharsetUtils.h.
I'd like to see these methods added to the Gecko SDK for Mozilla 1.7 final.
Assignee | ||
Comment 1•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #145207 -
Flags: superreview?(dbaron)
Attachment #145207 -
Flags: review?(bsmedberg)
Assignee | ||
Comment 2•21 years ago
|
||
dougt: can you also please review these additions to XPCOM and the glue. thx!
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.7final
Comment 3•21 years ago
|
||
> enum {
why not give it a name and use the name of that enum in the function declaration
instead of PRUint32. that way, the compiler can do some typechecks and it makes
it easier to find out what the excepted parameter value is.
also, the new functions in glue/standalone/nsXPCOMGlue.cpp don't return a value
despite being declared nsresult. but should they be nsresult? can they fail?
NS_ConvertUTF16toUTF8 fails by returning an empty string (iirc), should these
behave the same?
I agree with comment 3, and I'd also prefer NS_ENCODING_* instead of NS_CHARSET_*
Assignee | ||
Comment 5•21 years ago
|
||
> > enum {
>
> why not give it a name and use the name of that enum in the function...
yeah, i thought about doing that, but i couldn't make up my mind what to call
it. nsCharset seems too generic. suggestions?
> also, the new functions in glue/standalone/nsXPCOMGlue.cpp don't return a value
ack.. good catch. why didn't the compiler complain? :-/
> can they fail?
yes, see the comments in the header file.
> NS_ConvertUTF16toUTF8 fails by returning an empty string (iirc), should these
> behave the same?
not sure yet. the comments in the header file leave the error handling vaguely
defined. i'm not happy with that, but i want to think about it some more. i
don't want to be too specific about how it behaves since i might not be able to
implement that without major changes to the conversion routines. i might regret
not nailing down such behavior for 1.7 though :-/
Assignee | ||
Comment 6•21 years ago
|
||
(In reply to comment #4)
> I agree with comment 3, and I'd also prefer NS_ENCODING_* instead of NS_CHARSET_*
Yeah, NS_ENCODING_* sounds better to me too.
Are we in general agreement with the premise for adding these functions? And,
does anyone have any other suggestions about what the function signatures should be?
Is nsStringEncoding the right name for the enum type?
Comment 7•21 years ago
|
||
> why not give it a name and use the name of that enum in the function declaration
> instead of PRUint32. that way, the compiler can do some typechecks and it makes
> it easier to find out what the excepted parameter value is.
Technically, this is not portable across C compilers, enum types can be smaller
than int. I think we should keep PRUint32.
I definitely agree that we should have these, and the names sound fine.
Although I wonder if the function names should say "SingleByte" rather than
"CString", since "CString" could be interpreted in multiple ways and
"SingleByte" is conceptually closer to "UTF16".
Assignee | ||
Comment 9•21 years ago
|
||
"SingleByte" works for me.
NS_SingleByteToUTF16
NS_UTF16ToSingleByte
Assignee | ||
Comment 10•21 years ago
|
||
Actually, what about "MultiByte" instead of "SingleByte"? I say that because it
would then be more consistent with WIN32's WideCharToMultiByte and the wchar_t
functions, like wcstombs. thoughts?
The term "multi-byte" for "characters are variable length" has never made much
sense to me. Maybe CString is best.
Assignee | ||
Comment 12•21 years ago
|
||
CString is good because it matches nsACString, and the encoding is clearly
variable as the API suggests. OK, I'll stick with CString.
Assignee | ||
Comment 13•21 years ago
|
||
ok, here's the revised patch. i added some test code.
Attachment #145207 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #145207 -
Flags: superreview?(dbaron)
Attachment #145207 -
Flags: review?(bsmedberg)
Assignee | ||
Updated•21 years ago
|
Attachment #145217 -
Flags: superreview?(dbaron)
Attachment #145217 -
Flags: review?(bsmedberg)
Comment 14•21 years ago
|
||
Comment on attachment 145217 [details] [diff] [review]
v1.3 patch
>+NS_STRINGAPI(nsresult)
>+NS_CStringToUTF16(const nsACString &aSrc, PRUint32 aSrcEncoding, nsAString &aDest)
>+{
>+ // XXX handle errors
>+ case NS_ENCODING_UTF8:
>+ CopyUTF8toUTF16(aSrc, aDest);
>+ break;
Do we want to do an IsEmpty check to see if the conversion was successful?
Otherwise, looks good to me.
Attachment #145217 -
Flags: review?(bsmedberg) → review+
Assignee | ||
Comment 15•21 years ago
|
||
i thought about doing IsEmpty checks... i'd need to verify that the source is
not empty also before returning an error. i guess i could do that though.
Assignee | ||
Comment 16•21 years ago
|
||
dbaron and i talked about this. the converters currently behave very
inconsistently. it is impossible to discern OOM errors from conversion errors
in some cases, and in other cases conversion errors result in garbage in the
output.
my intent with the nsresult return value is to allow us to one-day do better
error checking.
Attachment #145217 -
Flags: superreview?(dbaron) → superreview+
Assignee | ||
Updated•21 years ago
|
Attachment #145217 -
Flags: approval1.7?
Comment 17•21 years ago
|
||
Comment on attachment 145217 [details] [diff] [review]
v1.3 patch
a=chofmann for 1.7
Attachment #145217 -
Flags: approval1.7? → approval1.7+
Assignee | ||
Comment 18•21 years ago
|
||
dougt mentioned over AIM that this patch "is fine"
fixed-on-trunk for 1.7 final
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 19•21 years ago
|
||
The patch for this bug broke the AIX Tinderbox. I have filed Bug 239441 for the
issue.
Comment 20•21 years ago
|
||
Re: comment 7: What is not portable across C compilers? Using an enum type for
a formal parameter should be fine, and using an enum tag to define a type name
is fine too. Why not do that and let the compiler pack, or not, as it sees fit?
Is there some need to interface with PRUint32 on only one side of a calling
convention?
Enum is int in C, packable if the domain allows and the compiler wants to, IIRC.
Not sure whether C++ changes this.
Sorry for the late drive-by.
/be
Assignee | ||
Comment 21•21 years ago
|
||
We can add the enum type back in before marking these APIs as frozen. See bug
239123.
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•