Closed Bug 84186 Opened 24 years ago Closed 24 years ago

Add ACString, AUTF8String, AString types to XPIDL

Categories

Core :: XPCOM, defect
Priority: Not set
Severity: normal

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla0.9.9
People

(Reporter: dmosedale, Assigned: nisheeth_mozilla)

References

Details

Attachments

(4 files, 8 obsolete files)

- 16.87 KB, text/plain (Details)
- 85.61 KB, patch (Details | Diff | Splinter Review)
- 61.90 KB, patch; dbradley: review+, jband_mozilla: superreview+, shaver: approval+ (Details | Diff | Splinter Review)
- 567 bytes, patch (Details | Diff | Splinter Review)
shaver and I discussed this briefly in #mozilla a while ago. The short story is that nsIURI is moving in the direction of being UTF8 encoded rather than ASCII encoded. Since we need to be able to deal with this reasonably from JS, and as a way to allow folks to write less bloated scriptable interfaces, shaver suggested, with some hesitation, adding a [utf8] qualifier to xpidl. Thoughts?
Please No. If we *need* a UTF8 type then add a UTF8 type. Don't make it a funky attribute. There would still need to be a distinct xpt type anyway. You better have plans for how to do that before you launch off into this. scc and I exchanged ideas about extending the AString abstract interface style string idl types and *trying* to retire the flat "null terminated array of chars" type as much as possible. So, did someone want to take this bug and develop a solid plan and justification for imposing a new type on the world? Or should we just mark this future/WONTFIX now and forget about it? :)
I agree with John. I think a new type would make more sense and be cleaner.
Status: NEW → ASSIGNED
This is ONE of many possible character encodings, that happen to fit in 8-bit characters. I don't think this is a proper use of the attribute mechanism, to single out a particular character encoding. Let's either add a new type, or pressure the nsIURI folks to just move to UNICODE. Either way, the nsIURI implementation will have to become UNICODE aware.
beard: UTF8 _is_ a Unicode encoding. Do you mean UCS2? Anyway, I'm fine with a new type. The important thing is that there's some standard way to write scriptable interfaces that is: a) i18n friendly b) doesn't require 16-bit-wide chars Since i18n is required everywhere, and a non-trivial number of folks (including the nsIURI owners) are unwilling to accept the footprint cost of 16-bit chars. UTF8 is a particularly good choice for the encoding in question because at least a few popular protocols (IMAP, LDAP, ...) use it natively as their wire format, and this cuts down on the number of conversions necessary.
Blocks: 86000
Updating the summary. So if we're gonna make it a separate type, wouldn't it be better to base it around an |ACString| type rather than a |string|?
Summary: [utf8] attribute for xpidl |string| type → UTF8 string type for XPIDL
Marking this won't fix, since an alternative solution was presented and seems to be accepted. If incorrect feel free to change it back and discuss the issue further.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → VERIFIED
Marking Verified -
Um, what alternate solution are you referring to?
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Sorry Dan, I forgot you had changed the summary. I still had this tagged as creating a utf8 qualifier.
I'd like to see this for Mozilla 1.0, as the LDAP string APIs are a mess, and I think the only sane way out of their current state is to move to UTF8 encoding, since that's how they come over the wire. I'd be interested in seeing nsACString exposed to idl, so that we keep all the nifty advantages of the new string fu.
John, Suppose I have a function: void foo (in string a) Is 'a' required to be in ASCII? If so, many interfaces violate this, and can we safely extend the meaning to any encoding (such as UTF8)?
Yes they do and no we can't. We've had many discussions about this on the xpcom newsgroup. Take a look at some branches of the thread entitled "Encoding Wars --- more in the Big String Story", circa 4/13/00. We might concoct a new type, but we are not going to redefine the conventional meaning of xpidl's 'string' type. Regardless of the amount of abuse, I think it would be a big mistake to rewrite the contract for 'string'.
sounds fair to me. there are a few interfaces that want to pass around utf8 strings. URI/URL interfaces will want to do this sometime too. Question for you, is it possible to have a string type that could contain either UTF8 or ASCII? /me goes reads ng history....
Yes, ASCII is a subset of UTF8. If we had a UTF8 type then putting ASCII in it is fine. The thing we don't want to do is force users of our ASCII type to have to be prepared to get non-ASCII UTF8. If we *really* need clean support for UTF8 then we need to suffer the pain of bringing a new type into the world.
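The subset claim above is easy to verify; here is a quick sketch (Python used purely for illustration, not Mozilla code) showing that ASCII is a byte-for-byte subset of UTF8, while the reverse does not hold:

```python
# ASCII text encodes to identical bytes under ASCII and UTF-8, so any
# ASCII value is automatically a valid UTF-8 value.
ascii_text = "hello, world"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The reverse fails: non-ASCII characters need multi-byte UTF-8
# sequences, which a strict ASCII consumer must reject.
utf8_bytes = "héllo".encode("utf-8")
assert utf8_bytes == b"h\xc3\xa9llo"
try:
    utf8_bytes.decode("ascii")
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
assert not decoded_as_ascii
```

This is exactly why putting ASCII into a UTF8-typed parameter is safe, but handing non-ASCII UTF8 to an ASCII-typed consumer is not.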
watching this bug.
Moving out past 1.0. If someone wants this for 1.0, let me know and I'll see about rearranging my bug list.
Target Milestone: --- → mozilla1.0.1
If we are still thinking about making the nsIURI/nsIURL/nsIFile interfaces pass utf8 strings, we are going to need this to be fixed.
Ok, that's what I needed to know. Is there a bug for that change or is that what bug 86000 implies?
Target Milestone: mozilla1.0.1 → mozilla0.9.7
If "1.0" means frozen interfaces does that include the typelib structure or accepted .idl types?
This is not a bug, it's a feature. I'm convinced that adding it will require making typelibs that cannot be read by old typelib readers (e.g. NS6.2). I'm against this.
UTF-8 can be passed as char* or unsigned char*, so this does not block making nsIURI to use UTF-8.
that is illegal to do. The "string" keyword in xpidl is not intended for utf8 encoding. John can give you the reasoning behind this rule.
One reason is that xpconnect assumes ascii and will royally screw any interface that's trying to pass UTF8 through a string API. John's arguments seem to lead to the conclusion that it's better to do it before 1.0 rather than after. I don't really see the problem if 6.2 gets broken, 6.2 isn't going to work with components compiled against mozilla1.0 anyway due to interface and c++ signature changes.
Let's make an effort to fix this soon, for mozilla1.0. We should of course already be using the typelib file major version to control compatibility. See http://www.mozilla.org/scriptable/typelib_file.html. /be
dveditz: I think you are confusing form and content. The fact that there will be additional frozen interfaces has little to do with the distributed overhead of making a new incompatible typelib version. It is not just the interfaces in our tree that take the hit. A new version impacts vendors of components and plugins, and embedded clients. If I ship a plugin with its own interfaces for scriptability I'd like to be able to ship a version that works in 6.2 as well as with future clients. This implies that we'll need to support reading and writing both old and new versions. And, we'll need to add support to the xpidl compiler to spit out the older versions on demand. We can do this. But it is not cheap. We also need to add support to all the consumers of type info: xpconnect, PyXPCOM, xpcom/proxy, etc.

I have yet to see a compelling argument that says we really need to support moving UTF8 across scriptable interfaces. How many interfaces need to do this? Why can't they be converted to use wstring or AString? How many cases can't be so converted? Why? Perhaps UTF8 can be an implementation detail within components? Maybe there is some deCOMtamination that ought to be done here? How much code do we have that has to deal with UTF8ness? Would it not be better to move more of the code to work with unicode? If we make an AString-ish UTF8 scheme are people going to whine that they can't easily move raw char* through such interfaces; i.e. are we going to need 5(!) string types?

Why are non-scriptable methods not an option? You can declare a native type right now for UTF8. You just can't call such methods using JS. But you could have parallel scriptable methods that use unicode (xpconnect is going to have to convert to unicode anyway). Anyway, arguments that say we need something have to be backed up with facts and need to address the tradeoffs and alternate solutions.
Can someone come up with a compelling argument for this enhancement that jband's suggestions wouldn't address? I'd like to get a decision one way or another fairly soon so I can assess the impact.
I'm going to take the silence as agreement. Moving -> 1.1
Target Milestone: mozilla0.9.7 → mozilla1.1
It seems like the *only* pressing reason to provide a UTF-8 datatype is for the nsIURI interface -- which must be frozen for mozilla 1.0. I have no problem creating 'parallel' interfaces (i.e. nsIURI_UCS2) to access data as unicode. I guess my only question is when the unicode versions should be used vs. the ASCII ones :-) I suppose the ASCII ones could *always* return data as URL-encoded 7-bit data while the unicode ones would just return the 'raw' data as unicode. Is this sufficient? Or can we simply ditch the ASCII ones and always access the data as unicode -- and possibly translate it to URL-encoded 7-bit ASCII if necessary? Thoughts? Opinions? -- rick
It really depends on the protocol, and we should not escape twice. With the new urlparser rewrite we already store the url escaped. Depending on the protocol (could be set by the protocol flags) we should call the escape functions, which always do the normal escaping for the first 127 chars (this is a must!) and sometimes drop the escaping of the other chars.
I think the DOM APIs would like to have this too, but i could be wrong.
Nope, no need for this in the DOM interfaces.
what we found with nsIURI is that the ASCII network ready string and string-parts are frequently accessed. this means that the ASCII url-encoded form is what we should optimize for. if we need unicode [gs]etters then they can live on an alternate interface (as rick suggests), and the underlying implementation could be changed if the unicode methods start to get called more frequently than the ASCII-network-ready ones.
Sorry to get back to this discussion late. The thing that prompted me to file this bug is that the LDAP XPCOM SDK does currently use AString/wstring in its interfaces. However, the XPCOM SDK is really just a wrapper around the LDAP C SDK, which is all UTF8, because that's how LDAP stuff is encoded on the wire. Typically, clients of the XPCOM SDK call into it a number of times in a row, using data from each previous call in the subsequent one. So this means that most data will be unnecessarily converted a bunch of times in a row. As far as deCOMtamination goes, I'm not sure that's the right thing in this case, because the LDAP code really is used in more than one part of the app (xpfe, mailnews, autoconfig).

jband asks:

> Would it not be better to move more of the code to work with unicode?

For data that's often mostly ASCII, isn't this unnecessary bloat?

> If we make an AString-ish UTF8 scheme are people going to whine that they
> can't easily move raw char* through such interfaces; i.e. are we going to need
> 5(!) string types?

This sounds like a straw man to me.

> Why are non-scriptable methods not an option? You can declare a native type
> right now for UTF8. You just can't call such methods using JS.

Well, for one thing, the LDAP datasource (which uses the XPCOM SDK) is written in JS.

> But you could have parallel scriptable methods that use unicode (xpconnect is
> going to have to convert to unicode anyway).

Well, maybe this is what I'll have to do, or even use arrays of bytes, and then convert them in JS. But it's worth keeping in mind when asking all these questions not just "is it possible to do without a new type?" but also "is it performant and footprint-efficient?"
After further discussion it was decided 1.0.1 makes more sense as a post 1.0 milestone.
Target Milestone: mozilla1.1 → mozilla1.0.1
Kevin McClusky and I were talking last week about how much space we potentially could save if we were to use utf8 instead of UCS2 wherever possible, converting at output time (like inside of widget) when UCS2 is needed. i just got wind of this bug, thought I'd at least add my observation that this might be a good thing for helping to combat bloat.
Most of the huge UCS2 bloat is inside of layout, not from XPCOM boundary crossing, I think. We've been talking about this for literally years, but I was always under the impression that it was Too Hard to fix layout. We have bugs related to using UCS-2 even when the data is all-ASCII because of defects in AppendData, if people are looking for a place to start.
Most of the string data in layout (or at least in the DOM) is stored as char*'s if it's ASCII data, which is what most data on most pages is. Do we have other huge consumers of string data in layout? Style system?
I'm seeing a lot more reasons to *not* do this than reasons to do it. I do want to reiterate the thought that if we *do* decide to add such support to xpidl then it should be as a way to talk about an aString-like *class* (yet to be written!) and not about a pointer to an array of UTF8 chars. We want to get rid of uses of string and wstring and their ilk, not encourage their use.
Changing bug summary to more accurately describe what is desired after discussion with bryner and jag in irc.
OS: Linux → All
Hardware: PC → All
Summary: UTF8 string type for XPIDL → 8-bit UTF8-capable string type for XPIDL
I don't see how the summary change from "UTF8 string type" to "UTF8-capable string type" helps. The current string type is "utf8 capable", the problem is that xpconnect or other idl users can't know whether any given string does in fact contain utf8 and so can't safely convert to UCS2 if desired.
dveditz: according to jband, passing UTF8 through |string| is illegal; why this is so, I don't entirely understand. Despite it being illegal, it does work. What we need is a type for which this both works and is legal. Perhaps jband can shed some light on why this is illegal for |string|.
It's "illegal" because xpconnect assumes ASCII and will mangle the conversion to UCS2; utf8 can be made to work, though, in non-scriptable interfaces. My question was about the distinction between a UTF8 type and a UTF8 "capable" type, and why that distinction was important enough to change the summary for. I know why we want a UTF8 type, what I don't know is what else a "capable" type would do. I imagine some sort of argument pair (char* plus charset arg) would be utf8 capable, but I suspect most of the people who want a UTF8 type want to declare "This routine takes UTF8" in their interfaces not "This routine will handle any arbitrary charset you throw at it". I know some people would like the latter, but I believe there are other bugs for that, and if we have to wait for the arbitrary case we might not ever get it. Since UTF8 is an encoding not a character set it's much easier to deal with than arbitrary character sets, and in western languages saves memory over UCS2 which is one reason we want it now.
> It's "illegal" because xpconnect assumes ASCII and will mangle the conversion
> to UCS2; utf8 can be made to work, though, in non-scriptable interfaces.

That's actually not correct. It doesn't do any conversion at all, but merely narrows or widens the chars as appropriate. So as a result, you can take the string and convert it by hand from inside JS between UCS2 <-> UTF8. There's even code that does this in the tree now.

> My question was about the distinction between a UTF8 type and a UTF8 "capable"
> type, and why that distinction was important enough to change the summary for.
> I know why we want a UTF8 type, what I don't know is what else a "capable"
> type would do.

This is an interesting distinction because an "encoding-agnostic" |nsACString| (or even |string|) would fulfill this requirement. Special knowledge of UTF8 would be cool, but is not necessary.
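The narrow/widen behavior described in this comment can be illustrated outside Mozilla. A small sketch (Python standing in for xpconnect; its byte-wise widening with a zero high byte is equivalent to a latin-1 decode) showing both the apparent mangling and the by-hand recovery from script:

```python
# xpconnect-style widening: each 8-bit unit becomes a 16-bit code unit
# with a zero high byte; latin-1 decoding models this exactly.
utf8_bytes = "é".encode("utf-8")          # b'\xc3\xa9'
widened = utf8_bytes.decode("latin-1")    # two garbage-looking code units
assert widened == "\u00c3\u00a9"          # not 'é'

# Because no information is lost, script code can convert by hand:
# narrow back to bytes, then decode as UTF-8.
recovered = widened.encode("latin-1").decode("utf-8")
assert recovered == "é"
```

This is why the comment says conversion "can be done by hand from inside JS": the widening is lossless, just wrong-looking until re-decoded.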
so, i've got an example of why having a UTF8 string type would be very useful. nsIURI::spec, nsIURI::host, and nsIURI::prePath can all contain UTF8 characters, yet these are declared |string| in nsIURI.idl. all components of an URL with the exception of the hostname are escaped. this is done to support internationalized domain names, which must be passed down to the bowels of necko in UTF8 format. up until very recently, mozilla did not support internationalized domain names (see bug 42898), and it has always been assumed that these attributes of nsIURI would just be plain ASCII. now that assumption has changed.

for C++ code, i've had to manually convert consumers from making ASCII assumptions to treating these attributes more generally as UTF8. no problem so far. the problem comes in when i consider the JS consumers. what happens when JS code accesses some of the attributes? if |string| means ASCII, then my JS consumers will get garbage. so, i either need |string| to mean UTF8 to xpconnect, or i need some other string type that i can use to designate UTF8 strings in my IDL. and, since nsIURI is an interface that must be frozen by mozilla1.0, i think we need some sort of solution to this bug soon.

oh, and making these interfaces use wstring or AString is not going to cut it. these strings need to be UTF8... for performance reasons, i can't afford to convert them every time someone asks for them. moreover, storing a UCS2 version alongside a UTF8 version is just going to bloat mozilla unnecessarily. please please please add a UTF8 string type :-) (my preference would be for |string| to be treated as UTF8, unless it can be shown that this would be a performance problem.)
Keywords: mozilla1.0
Can we cut the Gordian knot? Brute-force and ignorance: redefine XPIDL string to mean UTF-8, instead of ASCII. Make sure all our code that was inflating ASCII to UCS-2 (or is it UTF-16? Neither, actually) now converts from UTF-8 to PRUnichar vectors. Darin has had to go down this path. Can we follow and take our chances with the performance hit? My little finger says "yes". Jband, throw your trump card. Backward compatibility in the pre-1.0 world, which mixed ASCII and UTF-8 in string, doesn't count with me -- except to underscore that (by design) UTF-8 is a superset of ASCIIZ, so the only hazard is the possible loss of performance converting from string to wstring. /be
I don't suppose I'm going to go on fighting the whole friggin' world. But... I've argued from the beginning that the real reason for 'string' even existing is to interface with code that needs to use flat C style strings (even in a Unicode world with a modern 16 bit string class available). And, the point of the ASCII restrictions is so that code using these strings can reasonably make assumptions about the nature of the strings. If you say that 'string' now means UTF8 then you've erased any way of tracking that part of the *explicit* contract about the strings. And, *all* code that receives such strings is on its own to figure out if they are ASCII or not - and have *some* sort of strategy to deal with the unexpected. There is no way to track mismatches between code that vends UTF8 and code that does not know how to deal with UTF8 (and has always been promised ASCII!).

I was serious in my comments above that I think it is simply *wrong* to unleash another 'dumb' string type. It is also wrong (but apparently an unavoidable legacy) that we are passing around flat ASCII buffers rather than objects whose length is determined *once*. It is even whackier to reuse that ancient C string paradigm for UTF8 when we *ought* to have an object that can answer both the question of length and the question of whether or not any 'extended' chars are present (when usually they are not).

I was arguing that if people think support for UTF8 is so important then they should commit the resources to construct a modern class for representing them. *Then* we can decide if we are willing to pay the cost of extending the typelib spec (and client code) to support the new class at interface boundaries. Just changing the spec and hacking xpconnect (et al.) a little will get you UTF8 support. But this is like proclaiming that there is no difference between signed and unsigned numbers and turning off static type checking.
It will mostly work, but some code will invariably get screwed when handed an instance whose value does not conform to expectations. And you'd be throwing away the tools to track the distinction.
jband: I think we're already in the world you describe in your last paragraph, and XPConnect's string-must-be-ASCII checks won't save C++ code called directly from C++ code (of which we have plenty). I agree we're also risking handing UTF-8 with high bit set off to old ASCII-only stdio routines, or whatnot (though aren't some platforms modernizing? Or do the C and C++ standards dictate that char[] memory passed, e.g., to printf, is ASCII only?). This in addition to any perf hit from string => wstring where now we merely inflate. But worse-is-better is a harsh master, and it is beating us about the face and shoulders here. Unless someone (darin with jag help?) is willing to evolve a "strongly typed" UTF-8 nsAString subclass and get dbradley to extend typelib format and XPConnect *quickly*, I predict string will be redefined to be UTF-8. Then at least our current ASCII vs. UTF-8 confusion will yield to UTF-8 unity, even where we do not want it. :-/. /be
i'm a little confused about darin's argument for needing a utf-8 string type. As far as i can tell, he is saying that we need a utf-8 datatype for javascript callers. but then, he says that AString or wstring won't be sufficient for *performance* reasons. performance for what? the javascript consumers? i can't imagine that performing a utf-8 to unicode conversion before calling into javascript will kill our performance. Right now, nsIURI is the *only* public API that could use utf-8 strings. It seems crazy to completely eliminate ASCII strings just for this one interface !! -- rick
> i can't imagine that performing a utf-8 to unicode conversion before calling
> into javascript will kill our performance.

Right, that's not the problem. I think darin meant C++ code performance if we used wstring or anything widening for utf-8. dmose has use-cases for utf-8 strings, it's not just nsIURI. But what I would like to emphasize (not sure this helps, tell me why not, if not) is that we aren't losing ASCII strings -- UTF-8 is a proper superset of ASCII (ASCIIZ, I mean). We are losing the current string=>wstring performance profile of inflating char to PRUnichar, if we go this route. Instead, for an ASCII string, we'll inflate only after testing bit 7 in each char. I don't fear that right now. I'm probably missing something, however! There are lots of different cases. /be
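The inflate-only-after-testing-bit-7 scheme brendan describes might look like this (a sketch for illustration only, not Mozilla code; the function name is made up):

```python
def inflate_string(buf: bytes) -> str:
    """Widen an 8-bit buffer to 16-bit text, UTF-8-aware."""
    # Fast path: a buffer with no byte having bit 7 set is pure ASCII
    # and can be zero-extended one code unit per byte, like the old
    # char -> PRUnichar inflation.
    if all(b < 0x80 for b in buf):
        return buf.decode("ascii")
    # Slow path: decode multi-byte UTF-8 sequences (the output may have
    # fewer code units than the input has bytes).
    return buf.decode("utf-8")

assert inflate_string(b"plain ascii") == "plain ascii"
assert inflate_string("héllo".encode("utf-8")) == "héllo"
```

For the common all-ASCII case the cost over plain inflation is just the per-byte high-bit test, which is the perf tradeoff being debated here.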
> we aren't losing ASCII strings But you *are* losing ASCII strings. You are making *everyone* pay the cost of not being able to have a contract that says ASCII and only ASCII when a *few* want a superset. The cost is more checking everywhere and/or more uncertainty in general.
I see, we're losing a "contract" that only XPConnect enforced. I'm full of insecurity right now. UTF-8 strings will leak, I bet they are now. How less secure are we by facing this problem and switching string from "I hope it's ASCII and XPConnect debug builds may spank me" to "I know it's UTF-8 and maybe it'll leak into some old stdio and print garbage (to a debug-only log)"? /be
I don't understand this concern about a contract for "ASCII and only ASCII". Few if any contracts will accept arbitrary strings taken from the entire range of ASCII, including control characters. A contract will normally specify some strict subset of the language defined by NUL-terminated ASCII strings as being legal.
rpotts: what brendan said. currently nsIURI::host is declared |string|, but it's really UTF-8 to allow for internationalized domain names (this also applies to nsIURI::spec and nsIURI::prePath since they contain the hostname). at one time the hostname was URL escaped, but in order to solve a security problem with embedded %00 sequences, i (perhaps naively) decided to do away with URL escaping the hostname. (this actually matches the behavior of other browsers (such as IE and Nav4) as far as i can tell.) i say that this was perhaps done naively because i was unaware at the time that |string| was meant to "strictly?" represent a null terminated string of ASCII characters. as i've learned more about this issue, i see that there are many pitfalls resulting from this... namely having to do with xpconnect, since it will not know to treat nsIURI::host as UTF-8.

so, if this usage of |string| is invalid (and if there will never be a UTF-8 string type), then i need to find some other solution for nsIURI. it probably needs to have AString attributes in addition to ASCII attributes for the portions that need to be in network ready format. i'm not sure what this will mean to the implementation of nsStandardURL. there will likely be foot-print / run-time performance issues to consider. maybe all i need to do is go back to URL escaping the hostname and then try to solve the issues w.r.t. %00 and other fun escape sequences in another way :-/
darin, it's not just nsIURI. I wrote "UTF-8 strings will leak" and probably confused people -- by "leak", I meant that these char arrays will flow through C++ and even C code into places that do not expect UTF-8, and be wrongly widened to PRUnichar arrays simply by inflating each char with a zero high byte. I don't know that this is a worse problem than the "leakage" of UTF-8 strings into code that expects only ASCII, but I suspect it is. Mozilla is a fairly closed system, apart from debug logging, for portability and localizability. Code tends to do AssignWithConversion more than pass a char array off to a stdio or other legacy ASCII-only routine. Anyway, I think a "contract" that's enforced only by debug builds for data flowing through XPConnect is likely to be observed in the breach more than we would like. I prefer to think we can change such a contract still, document the change, and flush out bugs. Who is with me? /be
darin: how does UTF8 help with embedded nulls? If you're using 0xC080 to represent null then you're violating the UTF8 spec which says that each char must be represented in shortest form. NULL must be 0x00 as in ASCII. Our UTF8 converters probably don't validate for speed reasons, but don't count on it staying that way. We don't want to leak invalid UTF8 outside Mozilla so we may end up adding validation to debug builds.
Our UTF8 converters validate strictly for security reasons.
internally, our xpcom-centered utf8 converters do not validate (i.e. NS_ConvertUTF8toUCS2).
NS_ConvertUTF8toUCS2() calls class ConvertUTF8ToUCS2, which does validate. Note the use of minUCS4 in ConvertUTF8ToUCS2::write() -- a C0 80 sequence will decode to U+FFFD.
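The shortest-form rule being discussed is easy to check with any strict UTF-8 decoder. A sketch (Python for illustration; note its 'replace' handler substitutes one U+FFFD per invalid byte, so C0 80 yields two replacement characters rather than the single U+FFFD described above for ConvertUTF8ToUCS2):

```python
# C0 80 is an overlong (non-shortest-form) encoding of U+0000, which the
# UTF-8 spec forbids; strict decoders reject it outright.
overlong_nul = b"\xc0\x80"
try:
    overlong_nul.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False
assert not accepted

# With replacement, each invalid byte becomes U+FFFD in this decoder.
assert overlong_nul.decode("utf-8", errors="replace") == "\ufffd\ufffd"

# The only legal UTF-8 encoding of NUL is the single byte 0x00, as in ASCII.
assert "\x00".encode("utf-8") == b"\x00"
```

So an embedded NUL cannot legally be smuggled through validating UTF-8 as 0xC080, which is dveditz's point.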
It seems like we need to choose our evil here. We need UTF8 functionality one way or another. If no one can afford the time to evolve a new UTF8 type & typelib functionality (and I don't yet see any volunteers :-), then I agree with brendan that we need to redefine |string| slightly. If the extra checking turns out to be a real perf problem, we could go the more-uncertainty route: |string| could be defined in general to be UTF8, but XPConnect could continue to treat |string| as it does today, as ASCII: |string| would be inflated with zeros and UCS2 would be truncated to get back to |string|. This would force the few JS callers that need to use these interfaces with UTF8 data to do the UCS2<->UTF8 munging by hand.
Note that in my previous comment, when I say "ASCII", I really mean "8-bit ASCII". XPconnect currently does treat |string| that way.
By "8-bit ASCII" I think you really mean "ISO-8859-1".
dveditz: i should have been more specific: i was talking about the case of a href with an embedded %00 sequence in the hostname, such as: www.amazon.com%00www.nasty.com

back in the days when we used to URL escape hostnames, this would have gotten unescaped in certain places... which has the unintended side effect of introducing an embedded null. a security hole was discovered that exploited this unescaping, so we decided to disable URL escaping and unescaping of the hostname to counter this exploit (see bug 110418).

ok, that aside... i've thought more about my previous comments. URL escaping is not the solution for nsIURI. perhaps it's worth it for me to explain why. nsIURI setters currently allow for unescaped input, and nsIURI getters are expected to generate unescaped output. this means that for attributes like nsIURI::host, escaping will not help us, because the nsIURI::host getter will still want to return a UTF-8 string.

so, what does this mean for nsIURI? well, if |string| were modified to mean UTF-8, then nsIURI, its consumers, and all of its implementations would remain as is. if |string| were to mean ASCII (or ISO-Latin-1) exclusively, then nsIURI would have to be changed to use wstring (or AString) attributes. this would be a costly change, not only in development time but also in performance (resulting in either increased foot-print or increased execution time). you might be tempted to solve nsIURI's problems by changing the escaping policy, such that the attribute values would always be in escaped form, but this would also be a very difficult solution to implement.

OK, i think i've convinced myself (perhaps twice now) that nsIURI and its consumers would really benefit from |string| becoming UTF-8. existing code that uses |string| to represent raw network data can be modified to use [array] octet, without even breaking any existing C++ code :-) so, brendan's plan gets my vote!
This bug needs to be pulled back from mozilla1.0.1 to mozilla0.9.9 -- dbradley, can you do it? We can certainly muster helpers, jag, darin, dmose, and myself among them, to check for NS_ConvertASCIItoUCS2 calls that need to be converted to NS_ConvertUTF8toUCS2. /be
Pulling back to 0.9.9. I'll be in Mountain View next week. We can discuss it then. I'll take a look at it this weekend.
Target Milestone: mozilla1.0.1 → mozilla0.9.9
yeah, if i can help out with any of the grunt work just let me know.
I certainly see where jband is coming from - tracking down encoding related errors could be gruesome. Take PyXPCOM for example; this change gives me a number of dilemmas about how to handle string conversions. Every option has a significant downside, including too much encoding magic (Python has a default encoding, generally strict 7-bit ASCII; Moz uses UTF-8; on Windows all the narrow CRT functions use "mbcs" encoding; etc.) causing strange and hard-to-reproduce failures when first tested with non-7-bit-ASCII data. So I would prefer to see this not go ahead, but jband's idea of a new IDL type explored. I do, however, appreciate the reality of the situation and that my 2c doesn't change it :)
Trying to put the finishing touches on a patch, I ran into something that I wanted to get a consensus on. For a type T_CHAR (single character) that is being translated into UCS2, how should I handle that? Should I still call the UTF8 to UCS2 function or assume that it can only be ASCII and leave it as it is? There's warning code for testing the high bit, but I'm not familiar enough with UTF8 encoding to know if there's a valid UTF8 character that is one byte and has the high bit set. My take is that it doesn't make sense to use T_CHAR for UTF8 and that it should be assumed ASCII, leaving the assert in. Conversely, if a conversion from a UCS2 character to T_CHAR would result in a non-ASCII character, I should probably fail the conversion?
Any UTF8 byte with the high-bit set is part of a multi-byte sequence (or invalid).
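That property is what makes the T_CHAR decision above safe; a quick check of the claim (Python for illustration only):

```python
# Code points below U+0080 encode as a single byte with bit 7 clear...
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))

# ...and everything else encodes as two or more bytes, each with the
# high bit set. So a single byte with bit 7 set can never be a complete
# UTF-8 character on its own.
for ch in ("é", "\u20ac", "\U0001d11e"):   # 2-, 3-, and 4-byte cases
    encoded = ch.encode("utf-8")
    assert len(encoded) >= 2
    assert all(b & 0x80 for b in encoded)
```

Hence treating T_CHAR as ASCII-only and asserting on the high bit loses nothing: no valid one-byte UTF8 character has it set.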
Attached patch Prelim patch (obsolete) — Splinter Review
Here's a first whack. I haven't double checked all the uses of ASCII strings yet. What I wanted to get feedback on is the use of NS_ConvertXXXtoXXX classes and whether there are more efficient methods to do this. The complicated part was the fact that JSData2Native is externally told whether or not to allocate the string. And if told not to alloc, it just returns back the deflated string JS provides, which from what I could tell truncates and does not translate the UCS2 characters. So I adjusted the callers (where I think it matters) to call it with useAllocator set to true and set the is-allocated flag on the variant. It's not a great solution; if someone comes up with a good alternative, let me know. In the meantime I'll double check to see if I missed other conversions and whether I can come up with a better solution for the useAllocator issue. Also need some UTF8 test cases to pump true UTF8 through this code.
dbradley, please contact ftang, I think he wrote the old code.
dbradley writes:
> My take is that it doesn't make sense to use T_CHAR for UTF8 and
> that it should be assumed ASCII, leaving the assert in.
>
> Conversely if a conversion from UCS2 character converted to T_CHAR I probably
> should fail the conversion if it would result in a non-ASCII character?

Both of these behaviors sound to me like the right thing.
I'm w/ jband here. 'string' has a strong meaning of ASCII only. Changing that to include UTF8 changes the ground rules and would be bad (too many conversion/leak bugs to track down w/ consumers). It's a pre-1.0 contract sure, but a contract nonetheless. There's grey area around contract integrity pre-1.0, but this one lands on the "let's not break this contract" side of things IMO. We have lots of pre-1.0 contracts that can't be broken (see http://lxr.mozilla.org/seamonkey/search?string=us+frozen). Brendan: Mozilla is not a "fairly closed system." We have *many* consumers of 'string' (both JS and C++). Pulling a switcheroo on them would be rude. "Oh, that's UTF8 all of a sudden? Now you tell me; I've been leaking all along. Oh, I need to do all this encoding/conversion magic; that bites!" Of course, if they're going to be thanking us later for it, that's another thing, but my crystal ball broke a couple of years ago. As for the reneging on the nsIURI::host 'string' contract (now it can include UTF8), Darin, please add a comment to the interface indicating that it can be UTF8 so unsuspecting consumers know it's an exception (for now). I see LDAP's and nsIURI::host's need for UTF8 to pale in comparison to the system (open and closed) wide impact of this mod. IMHO, we pound out (I can't sign up for the work :-/ ) a new string type, or special-case the UTF8 needs in a new interface. dmose, customers needing your LDAP data can use said interface, and we can either leave exceptions (nsIURI::host) as exceptions, or push them to use the new interface.
Extending string from US-ASCII only to be capable of handling UTF-8 does not "break" existing contracts unless such contracts use the current undocumented semantics of permitting ISO-8859-1. Existing contracts restrict the set of legal values for string arguments to some subset of the language of US-ASCII strings; these languages are still valid subsets of the language of UTF-8 strings. Put another way, extending string to handle UTF-8 does not itself require any existing contract to start accepting UTF-8. Interfaces that handle data which originate from untrusted sources cannot now rely on the "string" type carrying data conforming to the US-ASCII subset. They are already required to fail safe when given non-ASCII data.
No. The proposed change *would* change contracts. The current xpidl spec is that *all* 'string' values are zero terminated arrays of 7bit ASCII chars. Any consumer of type 'string' can reasonably expect to only receive 7bit ASCII strings. That is part of the contract. Saying that 'string' is extended to include other values can result in those implementations being given values that fall outside the range of values they were promised. They had no preexisting need to protect themselves from such values. This is not only possible, but we'd be explicitly changing XPConnect code such that JSStrings that previously did not convert to anything other than 7bit ASCII would now create UTF-8. This is a big change. (Yes, it can be argued that the existing conversion code *ought* to detect the loss of information). This is absolutely a unilateral change of contract for all 'string' consumers.
No to both points. The current xpidl spec may say that all 'string' values are zero terminated arrays of 7bit ASCII characters, but the current implementation treats these as zero terminated arrays of 8bit ISO-8859-1 characters. Security sensitive implementations currently *must* safely deal with non-7bit characters in these strings, as the current implementation permits such values to be passed from untrusted sources. Changing the definition of 'string' does not change existing contracts using 'string'. Take for example nsIClassInfo::contractID. A contractID has a specific syntax. Consumers of nsIClassInfo::contractID do not currently have to handle the full range of US-ASCII strings in values received from this interface; they only have to handle values that are valid per the syntax specified by the nsIClassInfo contract. Just as consumers of this interface don't have to handle contractID values consisting of, say, 103945 consecutive Control-G characters, consumers of this interface after the proposed change would still not be required to handle a contractID value containing, say, a U+0234 character.
All this talk about contracts is making me queasy; it reminds me of the scene in "Play It Again, Sam" where Woody Allen's wife is leaving him:

Wife: "My lawyer will call your lawyer."
Woody: "I don't have a lawyer -- have him call my doctor."

We have more than a "little bit" of contract violation in the current codebase, and even a little bit can hurt. I don't see how valeski's comment 78 request that darin warn consumers of nsIURI attributes about the UTF-8 hazard helps, if those callers are written in JS or another language that currently inflates char arrays to 16-bit-character-type arrays. Such callers will get bogus ISO-Latin-1 code points instead of proper UTF-8 decoding (and JS ones will be spanked by XPConnect's 8th-bit-must-be-zero check in debug builds). Jud, is the proposal that we remove the XPConnect sanity check and make all such callers use a UTF-8 decoder? That removes all "contract" enforcement and adds avoidable costs to such callers. See comment 44 for darin's rejection of using other existing, scriptable string types. I am not proposing a utopia here. I still view switching string as the lesser evil, given all the constraints. /be
John Myers, the point I'm making (and that I believe jband is making as well) is that regardless of ASCII being a subset of UTF8, there is an implied usage semantic that 'string' is null terminated, and that no decoding has to occur. Wrong or right (technically it's wrong, I agree), it's a contract we'd be breaking. Brendan, if you're queasy now, maybe you should jump into some of the UA string mod discussions going on elsewhere ;-). I'm not proposing switching XPConnect's definition of 'string' (from ASCII to UTF8) and pushing the UTF8 decoding onto callers. IMO, that's asking callers to do a lot of work for what, AFAICT, is an edge case that can be handled by a "new" (non existing) string type that explicitly defines itself as UTF8. If I can be convinced this isn't an edge case, then my view changes. If there is true benefit to exposing UTF8 all over the place, then, asking callers to explicitly decode doesn't feel as bad. I'll note here that whatever version of IE I'm running has a preference checkbox to "treat all URL strings as UTF8." That to me feels like IE is testing the water w/ the very contract change we're discussing here (changing char * to mean UTF8 instead of ASCII).
The semantic that 'string' is null terminated would not be changed by the proposal. I have no idea what you mean by "no decoding has to occur". Any language that uses 16-bit string types has to decode 'string' somehow.
(I just bet that IE already has their entire story except "what does the user expect us to do with the incompatible content on the web?" nicely in hand.) I want us to have a widely-used UTF8 type, because I want to be able to gently migrate us away from the pervasive bloat of UCS2 when the vast (vast!) majority of the data we handle is ASCII. If this change (string == utf8) will get us closer to that (and I think it will), I am _all_for_it_. (Even if we have to sacrifice counted strings to do it.)
It may be interesting to think about exactly how the additional XPConnect decoding load would affect interfaces that don't need it. As jgmyers just pointed out, we need to traverse the string once anyway to inflate it even in the current world. So for interfaces that only actually use 7-bit ASCII, the cost would still be O(n) for strings of length n, with the constant being made slightly larger because each char will have a single test of the high bit, which will fail. Exactly what this means in terms of real-time cost is left as an exercise to the reader, but my suspicion is that this is gonna be lost in the noise of crossing XPConnect. In languages that support 8-bit strings natively (eg C++), there's no "decoding" that needs to be done at all.
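The inflation loop in question can be sketched as follows; the function name and signature are illustrative, not XPConnect's actual code. For pure 7-bit input the cost stays O(n), with exactly one extra high-bit test per char that never fires.

```cpp
#include <cassert>
#include <string>

// Sketch of single-pass inflation from 8-bit chars to 16-bit units, with
// the per-char high-bit test discussed above. Returns false on the first
// non-ASCII byte, signaling that the caller needs a real decoder (e.g.
// UTF-8) instead of simple zero-extension.
bool InflateAsciiToUCS2(const char* src, std::u16string& dest)
{
    dest.clear();
    for (const unsigned char* p =
             reinterpret_cast<const unsigned char*>(src); *p; ++p) {
        if (*p & 0x80)
            return false;                          // non-ASCII: bail out
        dest.push_back(static_cast<char16_t>(*p)); // zero-extend to 16 bits
    }
    return true;
}
```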
I wanted to comment briefly saying that jband set me straight on XPConnect's current enforcement code being not just for DEBUG builds -- he gleefully clears the eighth bit in each char when inflating string to JS string. Sorry if this was known, I missed it and kept babbling as if it weren't the case. It means that our scripted uses of nsIURI attributes are not going to work with IDNS, but that's also a big "duh". So a bunch of people got together and hashed out an alternative proposal that rpotts, jband, and vidur presented: implement an nsAString subclass for UTF-8 and use that in nsIURI and LDAP. Darin'll be commenting with the details soon, but while I was here I thought I'd spread the word about the new proposal. /be
brendan: sorry, we miscommunicated. I did not mean that xpconnect clears the high bit of each 8bit char, but that it uses only the low 8 bits of each 16bit char - throwing away the high 8 bits. Really it just lets the JS engine do the conversion in its simple way by calling JS_GetStringBytes... http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/src/xpcconvert.cpp#637 XPConnect does not currently enforce in release builds that nothing other than 7bit ASCII is passed. It just punishes you (by mangling your data) if you do pass something else. This is certainly not ideal. There is an old bug of mine on dbradley's list about defining and enforcing a coherent set of data conversion rules for xpconnect - esp. in edge cases where the caller is actually passing 'illegal' values. John.
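The mangling jband describes can be shown with a minimal stand-in (the real path goes through JS_GetStringBytes; this sketch only illustrates the keep-the-low-8-bits behavior, not the engine code):

```cpp
#include <cassert>
#include <string>

// Deflate 16-bit units to 8-bit chars the "simple way": keep only the low
// 8 bits of each unit. Anything outside Latin-1 is silently corrupted --
// no error, no warning, just wrong data.
std::string DeflateUCS2(const char16_t* src)
{
    std::string out;
    for (; *src; ++src)
        out.push_back(static_cast<char>(*src & 0xFF)); // high 8 bits lost
    return out;
}
```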
Ok, so the proposal that brendan alluded to in comment #81 is the following:

1) modify nsAString to have setters that take UTF-8 strings |const char *|.
2) create a nsAString implementation (nsSharableUTF8String??) that implements nsAString with UTF-8 storage instead of UCS2 storage.
3) change all uses of |string| that really intend UTF-8 to use AString.

This solution is good for a number of reasons. First of all, it does not impact the typelibs at all, since we can use an existing string type (namely AString). This means that there would be no changes required to xpconnect. Also, callers of the nsIURI getters could now choose between the different implementations of AString and either get a UTF-8 result or a UCS2 result. It is even possible that some of the nsIURI attributes could now be "shared" instead of strdup'd. This feels like the right compromise between adding a new UTF-8 specific string type to xpidl and forcing |string| to mean UTF-8. If there is rough consensus that this is the right thing to do, then I'll file a bug and get started :-)
so jag and shaver demonstrated on IRC why this solution is not so straightforward... it goes against the idea that charset conversions should be explicit for one thing, and that can mean trouble. another problem: nsAString::BeginReading cannot be easily implemented if the storage type is UTF-8. perhaps jag and/or shaver will want to elaborate further on these problems.
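The BeginReading problem can be made concrete with a sketch. The class below is hypothetical (and its decoder handles only 1- and 2-byte sequences, to stay short); the point is that a UTF-8-backed implementation of a UCS-2 reading interface must materialize a converted 16-bit buffer behind an innocent-looking accessor.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical nsAString-like class with compact UTF-8 storage. To honor
// a BeginReading() contract that exposes 16-bit units, it has to perform
// a hidden UTF-8 -> UCS-2 conversion into a scratch buffer -- exactly the
// implicit conversion (and duplicate allocation) the design tried to avoid.
class UTF8String {
public:
    explicit UTF8String(std::string utf8) : mBytes(std::move(utf8)) {}

    const char16_t* BeginReading() {
        mScratch.clear();  // hidden conversion happens here, on every call
        for (size_t i = 0; i < mBytes.size(); ) {
            unsigned char b = mBytes[i];
            if (b < 0x80) {                       // one-byte (ASCII) char
                mScratch.push_back(b);
                i += 1;
            } else {                              // assume 110xxxxx 10xxxxxx
                unsigned char b2 = mBytes[i + 1];
                mScratch.push_back(static_cast<char16_t>(
                    ((b & 0x1F) << 6) | (b2 & 0x3F)));
                i += 2;
            }
        }
        mScratch.push_back(0);
        return mScratch.data();
    }

private:
    std::string mBytes;                     // compact UTF-8 storage
    std::basic_string<char16_t> mScratch;   // the duplicate UCS-2 copy
};
```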
See also the log I'll attach... The point of BeginReading was that once a UTF8String is passed through a nsAString interface, you don't know whether it's a UTF8String or a nsString or whatever, so it should behave like a UCS-2 string and hand you an iterator on a UCS-2 buffer there. In other words, you're going to end up with a lot of implicit conversions. The alternative is adding UTF-8 (read: const char*) methods to nsAString (and all its derivatives), which means adding support for implicit (expensive) conversions back and forth to UCS-2, which is something we've been trying to avoid in the current string design. My question is, say we accept the above as a necessary complication to avoid having to add another xpidl type and to avoid having to go through and fix our functions that can't cleanly deal with UTF-8 input ('coz they're expecting pure ASCII or some form of extended ASCII), and prevent our code from handing non-ASCII input to outside systems, how is xpconnect supposed to know that for one |AString| it's supposed to hand over the string as 16-bit UCS-2 encoded characters, and for another |AString| it's supposed to take the string and perform a UCS-2 to UTF-8 conversion on it and store it in a UTF8String? Or alternatively, assuming that the string is already in UTF-8 (but in 16-bit characters), chop off the high 8 bits and store it in a UTF8String? 
One way would be to mark the parameter somehow (which I think was rejected earlier as an option on |string|, but if possible and chosen I'd prefer the generated interface to have |nsUTF8String&| as its parameter), the other is that you don't discriminate between the two, pass UCS-2 across the XPCOM boundary, and let UTF-8 conversion happen automatically (since all nsAStrings then will have the hidden auto UCS-2 to UTF-8 conversion), in which case I suggest you set up a non-scriptable interface for the |const char*| version of these methods that takes UTF-8 encoded strings, and a scriptable version that takes UCS-2 encoded strings, which would do the conversion explicitly (or not until necessary, as suggested by shaver).
ok, i'm very discouraged by all of this. there does not seem to be a good solution for nsIURI given the current constraints of XPCOM. people at yesterday's meeting all seemed to think that nsIURI should use AString attributes (myself included). now jag tells me that AString cannot be used without there being implicit or explicit conversions to UCS-2 (and i agree), which is NOT acceptable for nsIURI. why should we bloat our URL structures to use UCS-2 just to support intl domain names? mozilla cannot even use intl domain names w/o a plugin from i-dns.net! and their plugin is not open-source! any solution to the nsIURI problem that imposes a UCS-2 encoding is an unnecessary performance burden. all we need is a way to safely transmit UTF-8 across an XPCOM border w/o converting to another character encoding. are we really happy with the fact that there is no way to do this in an efficient manner? why is it ok that one should have to convert UTF-8 to UCS-2 just to play with XPCOM? i do not think it is correct to slow down mozilla just to be conformant w/ the current rules of XPCOM when XPCOM itself could change to avoid the cost. the original bug description by dmose sums up this issue the best, and i think we've gone around in circles trying to get around it: "The short story is the nsIURI is moving in the direction of being UTF8 encoded rather than ASCII encoded. Since we need to be able to deal with this *reasonably* from JS, and as a way to allow folks to write *less bloated* scriptable interfaces, shaver suggested, with some hesitation, adding a [utf8] qualifier to xpidl." i really believe that this problem would be best served by the addition of a new string type that is UTF-8 specific. i don't care too much if this manifests itself as |char *| in C++ or if it uses some special string class (how about nsACString?). i think our C++ code can deal...
besides we already have things like NS_ConvertUCS2toUTF8, which introduce UTF-8 into our C++ code, and there is no protection against using the results from NS_ConvertUCS2toUTF8 with code that doesn't support UTF-8... grep 'printf.*UCS2toUTF8' if you're not convinced!! so what's the problem with this simple approach. if making a change to the typelibs is difficult now... i imagine it will only get more difficult later on. why not do the grunt work of expanding the typelibs now (before 1.0)? add a utf-8 string type... we have good use for it. and |string| should not imply any particular charset encoding IMO. it seems wrong to put any restrictions on |string|. anyways, we don't enforce a charset in optimized code, so what's the big deal? there are so many places in our code where |string| is used for passing around byte arrays that make use of the 8th bit... i think it is unreasonable to say that we should go now and "fix" all of these "broken" uses of |string|. there's nothing broken about them. |string| just means null terminated array of bytes, and |wstring| just means null terminated array of double-bytes... can't we just live with this definition? i'm suggesting that we really have no choice... this definition has grown up all around us... interfaces make use of it, and if mozilla is ever to reach 1.0 we can't start enforcing a particular charset on |string| now... not ASCII, not UTF8. i think we really have to just leave |string| alone... it is a null terminated array of bytes whether we like it or not. too late to change this now, but we can add another string type that would enforce a specific character encoding, namely UTF-8. why are we making our lives so difficult? let's just add a UTF-8 string type and be done with it. haven't we tested all other options? i don't see any other solution besides something along the lines of the original proposal.
darin:
> there are so many places in our code where |string| is used for passing
> around byte arrays that make use of the 8th bit...

Really? Apart from image data going through nsIInputStream, where else?

> i think it is unreasonable to say that we should go now and "fix" all of
> these "broken" uses of |string|. there's nothing broken about them.
> |string| just means null terminated array of bytes, and |wstring| just
> means null terminated array of double-bytes... can't we just live with
> this definition?

First, and perhaps least important, you are arguing against history. Jband is right that string originally and mostly meant ASCII, even if that meaning wasn't enforced well. Second, the stronger reason we can't just treat string as a nul-terminated byte string is that XPConnect and layers like it need to know how to convert string to JS string or a similar UCS2-like string (or to UTF-16, or to UCS-4, or whatever wider code another language or subsystem may require). If we merely inflate by interleaving zeroes, then we *are* saying something more about such strings: that they are ISO-Latin-1 character sequences. That contradicts your premise. We can't have it both ways: string means a character sequence, not a byte array with no zero bytes. This character sequence definition begs the question of "which charset" (as well as what about NULs, which we have answered both ways: NUL terminated unless a [size_is(aCount)] property is used). The answer has been "ASCII". I am proposing "UTF-8" without much conviction, but to cut the Gordian knot or smoke out a better plan. Say we leave string as ASCII and fix the places that want a byte array to use [array, size_is(aCount)] in octet aBuffer, or whatever. For "inplaceout" type data transfer, _a la_ Unix read(2), we'd need these methods to be [noscript] or even [notxpcom]. Ok, back to the question of UTF-8.
Can we not do as darin says, and extend typelib format to include a UTF-8 string type *now*, before 1.0, more easily than later, after 1.0? Won't it always be harder, never easier? Jband, dbradley: can you guys provide more detail on the costs (like, it would take dbradley three weeks) of adding a new string type and supporting old typelibs? /be
> Really? Apart from image data going through nsIInputStream, where else?

ok, some examples from netwerk/base/public where the ASCII requirement of |string| is currently violated:

interface nsIIDNService {
  // input is UTF-8
  string UTF8ToIDNHostName(in string input);
};

interface nsIIOService {
  // all of the uses of |string| here are actually UTF-8
  nsIURI newURI(in string aSpec, in nsIURI aBaseURI);
  nsIChannel newChannel(in string aSpec, in nsIURI aBaseURI);
  string extractScheme(in string urlString, out unsigned long schemeStartPos, out unsigned long schemeEndPos);
  string extractUrlPart(in string urlString, in short flag, out unsigned long startPos, out unsigned long endPos);
  long extractPort(in string str);
  string getURLSpecFromFile(in nsIFile file);
  void initFileFromURLSpec(in nsIFile file, in string url);
};

interface nsIProtocolHandler {
  // aSpec is UTF-8 encoded
  nsIURI newURI(in string aSpec, in nsIURI aBaseURI);
};

interface nsIProtocolProxyService {
  // host is UTF-8 encoded
  nsIProxyInfo newProxyInfo(in string type, in string host, in long port);
  void addNoProxyFor(in string host, in long port);
  void removeNoProxyFor(in string host, in long port);
  // url is UTF-8 encoded
  void configureFromPAC(in string url);
};

interface nsISocketTransportService {
  // host is UTF-8 encoded
  nsITransport createTransport(in string host, ...);
  nsITransport createTransportOfType(in string socketType, in string host, ...);
  nsITransport createTransportOfTypes(..., in string host, ...);
};

interface nsIStreamLoaderObserver {
  // result is not limited to ASCII... it is network data.
  void onStreamComplete(..., in unsigned long resultLength, [size_is(resultLength)] in string result);
};

interface nsIURIChecker {
  // uri is UTF-8 encoded
  nsIRequest asyncCheckURI(in string uri, ...);
};

interface nsIURI {
  // these are all UTF-8 encoded
  attribute string spec;
  attribute string prePath;
  attribute string host;
};

interface nsIStandardURL {
  // initialSpec is UTF-8 encoded
  void init(..., in string initialSpec, ...);
};

interface nsIURLParser {
  // spec is UTF-8 encoded
  void parseURL(in string spec, ...);
  // authority is UTF-8 encoded
  void parseAuthority(in string authority, ...);
  // serverinfo is UTF-8 encoded
  void parseServerInfo(in string serverinfo, ...);
};

here's some examples from xpcom/io where the ASCII requirement of |string| is violated:

interface nsIStringInputStream {
  void setData(in string data, in long dataLen);
  [noscript] void shareData(in string data, in long dataLen);
};

interface nsIScriptableInputStream {
  // returns null terminated data
  string read(in unsigned long aCount);
};

interface nsIOutputStream {
  // buf is just data, length is count
  unsigned long write(in string buf, in unsigned long count);
};

interface nsISearchableInputStream {
  // forString can be any null terminated hunk of bytes
  void search(in string forString, ...);
};

interface nsIFastLoadService {
  // not that fastload should include iDNS hostnames, but...
  void startMuxedDocument(..., in string aURISpec, ...);
};

interface nsIFastLoadFileControl {
  // not that fastload should include iDNS hostnames, but...
  void startMuxedDocument(..., in string aURISpec);
};

interface nsIBinaryInputStream {
  // documentation says that this is a 8-bit string
  string readStringZ();
};

interface nsIBinaryOutputStream {
  // documentation says that this is a 8-bit string
  void writeStringZ(in string aString);
};

OK, so if |string| means 7-bit ASCII, then all of these interfaces would need to change. I really don't look forward to changing all of this code before mozilla 1.0.
Why place such strict requirements on what |string| can be? I don't see what it buys us. Just let it be an opaque string of bytes. Always inflate to double byte using zero padding. If this results in something that cannot be displayed, so be it. It probably doesn't matter since the |string| value probably wasn't meant to be displayed. This will work nicely if we have a UTF-8 string type. Then we can fix the string attributes/parameters that know they will not be opaque but rather UTF-8 if 8-bit at all.
Darin, I was asking for places where we pump opaque, zero-terminated (or not if you use size_is, as I pointed out) octet arrays. The UTF-8 cases you cite, which are relevant to this bug, are not germane to the question of "should string be an octet array, whether terminated or counted?" The xpcom/io cases that matter should use octet arrays, I think. Then a caller such as JS won't get the unwanted inflation that presumes the octets are code points in a certain character set (never mind that JS-without-JS2-or-extensions will get a big fat JS Array of uint8-domain elements). If the transfer mode is not scriptable, then forget about JS -- the C++ signature will still be char *. It seems to me you are trying to simplify the work before 1.0 at the expense of the (string) => (JS or other, wider-char string) problem. But I admit, you are matching current (ab)usage and minimizing effort! You're in danger of winning right there. So perhaps the opaque octet array type is off-topic for the current bug, although I hope it's still of interest to the cc: list. In all this, you and I have not disagreed on the need for a new XPIDL and typelib string type for UTF-8, but we're not helping design or implement that. We need someone to say "it'll take me two weeks" or "I did it, here's the patch!" dbradley, jband: can you spew some words of advice? Maybe someone else will do the grunt-work (I swear, I will soon if I get fed up enough with this logjam). /be
brendan: right, my examples do not exactly answer your question, but i needed to spew out that list because i think a lot of folks don't appreciate how many interfaces actually "mis-use" |string|. your question just prompted me to spew out the list of such instances. but, my point stands unanswered: why is it important to be strict about the character encoding implied by |string| attributes? why not just let it be the case that playing w/ the 8th bit leaves you to fend for yourself... that XPCOM will not guarantee that the string will be interpreted as anything but an array of bytes (meaning that |string| to double-byte representation would result in 0 padding, and double-byte to |string| would result in truncation of the upper 8 bits). this is what we currently do (with the exception of a debug warning), and i think it is OK. it works if the data is 7-bit ASCII... ie. the result in JS land can be displayed... can be matched to a charset. but, when people play with the 8th bit they are on their own. would this really be too flexible? do we need the added structure of requiring |string| to be 7-bit ASCII? i don't see why. moving to the octet solution requires support for inplaceout, which we don't have. moreover it'll slow down JS code (like chatzilla) which doesn't need to be any slower. JS2 is not here yet, so we need to deal with performance issues given the JS that we got. after all, if we've gotten this far without enforcing |string| == 7-bit ASCII, then what's the real problem? anyways, back to the discussion of UTF-8... jag has said that he is willing to work on and support a separate UTF-8 string hierarchy. he'll get some help from me, but as for the required changes to XPIDL to support a UTF-8 string class... i'm obviously not going to be of much help there.
Will the new variant class need to handle UTF8 strings or can it get away with ignoring it?
Also would adding a type to the idl affect/break the Python and maybe Perl connects?
It may possibly break Python :( I will ensure that whatever is decided will work eventually, though :)
I want to make sure I understand the latest direction. We're looking at building a new string class to support UTF8. We need a new type in the idl that will map to this. We're willing to break typelib compatibility. XPConnect conversion routines will have to be adjusted to convert this type. Python, nsIVariant, and Perl may be affected. At this point I have a pretty full list of 0.9.9 bugs, plus I'm tasked with a non-mozilla project as well. I can get this done, but I suspect some of my 0.9.9 bugs may have to be sacrificed.
Status: REOPENED → ASSIGNED
David, I can help out. If you list out the to do items, I'll go hack on the code. Hopefully, you won't delay your 0.9.9 bugs and we'll get this in too.
What are we going to call the string class and the type in IDL? Is UTF8String for IDL ok?
UTF8String sounds good to me as the name. I'm psyched that nisheeth can help -- big thanks to him! /be
UTF8String or AUTF8String... given that we have AString, shouldn't it be AUTF8String? however, i'd have to say that UTF8String is slightly easier to read.
The 'A' in "AString" is for "Abstract". If UTF8String is a concrete type implementing (derived from) nsAString, then I see no need for an 'A'. /be
concrete class, really? i didn't think we'd want to pass concrete classes via XPCOM. shouldn't it be an abstract class, with virtual functions only? so as to hide the implementation, etc, etc.? maybe i'm missing something.
On the one hand, we don't bother to hide the "implementation" (memory layout) of string or wstring right now, so why do we need it for utf8string? On the other, we apparently needed a bunch of options for AString, so we only specified the abstract base. Do we not need all these options (Sliding, Sharing, COW, whatever) for UTF-8 encoded stuff? If so, I think we need UTF8String to be abstract. If not, why not just specify it as a sequence of bytes?
I mentioned concrete because we want static type checking benefits: we don't want the compiler to silently swallow lots of hidden runtime conversions to and from UTF-8. That's an argument against implementing nsAString with UTF-8 storage and declaring victory in this bug. But if we support nsACString in typelibs, and make a concrete class nsUTF8String that implements nsACString, do we need a more specific IDL type? The issue is what .Length() and iteration over elements numbered by [0, length) count -- bytes or characters? I understand current nsACString users want Length to count bytes. But UTF8String users would want Length to count characters, and string users are too comfortable already with calling Length (to the point of ignoring IsEmpty). Jag, please advise. If we do need a more specific IDL type for UTF-8, I don't see the need to put a gratuitous 'A' in its name. We were talking about all this just now, and I believe a consensus in favor of UTF-8 first class support in typelibs and strings emerged. We (I especially) feel that UTF-8 should have been used in the ancient days instead of PRUnichar vectors (wstring, "UCS-2" but not really). We're facing plane 1 character codes coming in via certain locales (Korean, I hear), and if we have to switch to an encoding that doesn't store characters in equal-sized units, we might as well use UTF-8 and save space in this hemisphere. /be
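The Length() ambiguity above can be made concrete. Both helpers below are illustrative (not the string classes' actual API): for the same UTF-8 buffer, a byte count and a character (code point) count disagree as soon as any non-ASCII character appears, so whichever one Length() returns will surprise some callers.

```cpp
#include <cassert>
#include <cstring>

// Length in bytes: what current nsACString users expect.
size_t ByteLength(const char* utf8) { return std::strlen(utf8); }

// Length in characters (code points): count every byte that is NOT a
// UTF-8 continuation byte (continuation bytes look like 10xxxxxx).
size_t CharLength(const char* utf8)
{
    size_t n = 0;
    for (const unsigned char* p =
             reinterpret_cast<const unsigned char*>(utf8); *p; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```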
"Suicide is Painless" Words by Mike Altman Music by Johnny Mandel Through early morning fog I see Visions of the things to be The pains that are withheld for me I realize and I can see that Chorus: Suicide is painless It brings on many changes And I can take or leave it if I please. I try to find a way to make All our little joys relate Without that ever-present hate But now I know that it's too late, and (chorus) The game of life is hard to play I'm going to lose it anyway The losing card I'll someday lay And this is all I have to say, that (chorus) The only way to win is cheat And lay it down before I'm beat And to another give a seat For that's the only painless feat, cause (chorus) The sword of time will pierce our skins It doesn't hurt when it begins But as it works its way on in The pain grows stronger - watch it grin (chorus) A brave man once requested me To answer questions that are key Is it to be or not to be? And I replied, "Oh why ask me?", cause (chorus) And you can do the same thing if you please. ====== Let's choose carefully :-) -- rick
OK, just so that everyone is on the same page, here's what I plan to do:

a) Add two new types to XPIDL: UTF8String and CString. (I know there's a holy war on to decide what to name them. I'm gonna avoid the "A" prefix for now. I'll change it to the consensus demand before I land the patch)
b) In XPConnect, hook up UTF8String and CString types to the native nsACString object for now.
c) Add a cmd line option to XPIDL that generates old style type libs.
d) Make XPIDL barf at IDL that uses the new types if the old style type lib cmd line option is set.
e) Jag will create a new nsUTF8String class that inherits from the ACString abstract base class. Once that is done, I will change XPConnect to hook up the UTF8String type to native nsUTF8String objects.

Sounds good?
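Under step (a), interface authors could declare UTF-8 and raw 8-bit parameters explicitly. A sketch of the intended usage, not the final syntax (the bug summary shows the names eventually settled as ACString, AUTF8String, and AString):

```idl
interface nsIURI : nsISupports
{
    // carried as UTF-8 across the XPCOM boundary; no forced
    // inflation to UCS2, and scripted callers get real decoding
    attribute UTF8String spec;
    attribute UTF8String host;

    // single-byte data with no charset implied
    attribute CString userPass;
};
```

With these types in the typelib, XPConnect can pick the right conversion per parameter instead of guessing what a |string| holds.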
For (e), shouldn't it be distinct from both nsACString and nsAString (i.e, nsAUTF8String or something of the sort)?
weren't people concerned with adding lots of new types? (not that i don't think we need them)
dbaron: that was actually the plan. Make a new nsAUTF8String line parallel to nsAString and nsACString.
We should also bump the version number as well -> http://lxr.mozilla.org/seamonkey/source/xpcom/typelib/xpt/public/xpt_struct.h#118 I think we should bump the major, the interface manager will then refuse to process the file. This will keep a new component from causing problems with an older version of Netscape. Also read the comment below that and bump the incompatible value as well.
dbradley: Brendan, Nisheeth, and I convinced ourselves that we could get away with only incrementing the minor version number. See the discussion in the first comment of bug 65762. We are flirting with tripping up much older browsers by doing this - especially if we add more than one type.

Except for the issue with *real* old browsers with the shorter lookup table in xpcconvert.cpp, we are free (as far as xpconnect is concerned) to add more type tags. XPConnect will simply fail to reflect methods with unsupported types. We *may* be playing a little fast and loose here, but my hope is to avoid crossing the XPT_MAJOR_INCOMPATIBLE_VERSION threshold until we find a compelling reason to actually change the pattern of bytes in the structs that get deserialized from the typelibs. At that point the problem would be that the old readers would literally be unable to correctly read the files, and not just a problem of types that clients of the reader would be unable to proxy. Certainly other classes of change might come along that might make us cross that threshold. But we are thinking that we can get away with not doing it now.

Also, before we go hog wild adding new types, let's not forget that there is still a very short list of available type tags, and that adding *any* types adds work for supporting those types at various places in our xpconnect and proxy code. Also, as Brendan pointed out, the last available tag should absolutely be reserved for future use when we (may) reach the point where some type tags will require an additional byte. That tag value would allow us to flag that case while still leaving the existing tags unchanged. (Of course, that *would* be an incompatible version change - and would be hell on typeinfo clients too!)

As I said in that other bug, the tools (the xpidl compiler *and* xpt_link) should support creating the current xpt version 1.1 files on request and fail with an error if non-1.1 types are used. I think this is necessary.
Though, I suspect, some plugin vendors (etc.) who might have use of this feature will likely fail to discover it and will be blindly playing hit and miss depending on what types they happen to attempt to pass through their interfaces.
Taking...
Assignee: dbradley → nisheeth
Status: ASSIGNED → NEW
nisheeth: what's the ETA on this one looking like? i'd like to squeeze the nsIURI changes in by 0.9.9 if possible.
I expect to land this by Wednesday (Feb 13) of next week. Would that work for you?
sure, that'll be fine.
Blocks: 124042
Just to clarify. I am only gonna get the XPIDL and XPConnect changes landed. Jag or some string guy needs to do the new string classes needed for the world to be happy.
No longer blocks: 124042
Also want to acknowledge David Bradley's contribution. Even though he is bogged down with 0.9.9 bugs, he's sent me the XPIDL/XPConnect changes for adding UTF8String. Hopefully, I'm just gonna have to replicate them (famous last words! :)) for CString and then test, test, test.
Blocks: 124042
here are my comments:
1. I think adding a new UTF8String type is better than changing an existing type, so we are on the right path.
2. Even if we add UTF8String, I want to make sure all the callers are careful when they use such a data type.
Darin, Rick, Jag and I had a meeting today to firm up what needs to happen here. Here's what we decided:

1) AUTF8String and ACString in XPIDL will map to nsACString objects in the native world.
2) For AUTF8Strings, XPConnect will convert JS data from UCS2 to UTF8 and back.
3) For ACStrings, XPConnect will convert JS data from UCS2 to ASCII and back.
4) We will not implement a new nsAUTF8String class. Until now, abstract string classes have only specified the storage size of the string, not enforced the encoding. nsACString and nsAString dictate single-byte and double-byte storage respectively. We do not want to break this model and create a new abstract string class that enforces encoding. We will only add the notion of a UTF8String to XPIDL so that XPConnect can convert strings appropriately. In the future, we can get the benefit of type checking, admittedly at run time not compile time, by implementing a concrete class, nsUTF8String, that enforces the encoding behind the nsACString interface. The fact that we need to do less work this way is also a big factor in making this decision.

Comments?
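The UCS-2 to UTF-8 transcoding that XPConnect would perform for AUTF8Strings can be sketched as follows. This is a hypothetical, BMP-only illustration (no surrogate-pair handling), not the actual XPConnect conversion code:

```cpp
#include <string>

// Encode a UCS-2 (BMP-only) string as UTF-8: 1 byte below U+0080,
// 2 bytes below U+0800, otherwise 3 bytes.
std::string UCS2ToUTF8(const std::u16string& in)
{
    std::string out;
    for (char16_t u : in) {
        if (u < 0x80) {
            out += static_cast<char>(u);
        } else if (u < 0x800) {
            out += static_cast<char>(0xC0 | (u >> 6));
            out += static_cast<char>(0x80 | (u & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (u >> 12));
            out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (u & 0x3F));
        }
    }
    return out;
}
```

The reverse direction (UTF-8 back to UCS2 for JS) is the mirror image; the real code must also deal with surrogates once plane 1 characters show up.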
Status: NEW → ASSIGNED
> 3) For ACStrings, XPConnect will convert JS Data from UCS2 to ASCII and back.

also, XPConnect will preserve the 8th bit as it currently does for |string|.
So in other words, for ACStrings, XPConnect will convert JS data between UCS2 and ISO-8859-1.
Um, this has nothing to do with ISO-8859-1. I believe the conversion in XPConnect will be UCS2 -> UTF8 -> UCS2...
sure, that would be one interpretation... but i really just meant that ACString should be sufficient for passing raw data provided both ends understand that that's what they're getting ;-)
If you don't expose a new interface and implementation for UTF8 now, you'll probably never be able to roll one out without breaking the world. You're giving up on the whole idea of C++ static type checking to distinguish between ASCII v. UTF8 expectations? ACString is going to support embedded nulls? The fact that xpconnect does not enforce the 0 in the high bit of any 8-bit string types does not mean that it *shouldn't* do that enforcement. What do people *really* want the rules to be for the various string types?
jband: darin sought my comments on the plan in comment #118, and I lobbied him and jag strongly in favor of static type checking. Jag was game. We didn't want to rock the boat too much in the bug yet, but since you raised the same issue, I thought I'd give a brief update. For details, jag should say more. /be
jband: 'You're giving up on the whole idea of C++ static type checking to distinguish between ASCII v. UTF8 expectations?'

yes :-( at the C++ level i've given up on trying to maintain static type checking. Part of the reason is simply time pressure -- it requires more work to create a C++ abstract UTF8 interface and concrete implementation... the other part of the reason is that treating aCString as an encoding-free representation of the data is (actually) following the same design pattern as was chosen for aString.

Currently, we are using aString to represent *both* UCS-2 and UTF-16 data. Unfortunately, at the IDL level we do *not* have a UTF16 string type... so xpconnect is potentially broken :-( this is unfortunate, but does not alter the fact that we have chosen aString to be a representation of 16-bit storage units (without explicitly enforcing an encoding)... so, at the C++ interface level, we are expanding that notion by saying that aCString represents 8-bit storage units without specifying a particular encoding.

I know what you're going to say -- this is just BS. and i agree with you :-) but at this point, we are in so deep and so screwed up, that i think this is the best that we can hope for (at this point in time)... at least, at the IDL level we are providing encoding information - allowing native <-> JS string conversions to work. anything more will require MAJOR work on the codebase :-( and we just don't have the time/resources to make it happen :-(

the one issue that we may want to address is whether we should *also* introduce a UTF-16 string type at the IDL level. This would allow us to properly encode/decode UTF16 strings at the XPConnect boundary (in the future)... Of course the C++ interface would still translate to aString :-(

i think that everyone will agree that this is not an 'ideal' solution... i just need to decide if this approach solves 'enough' of the problem.

-- rick
*i* just need to decide if this approach solves 'enough' of the problem. should be changed to: *everyone* here needs to decide if this approach solves 'enough' of the problem. -- rick
On ACString: yes, it supports embedded nulls and is really just a container of 8-bit data, though the methods on it heavily lean towards interpreting that data as strings ;-) I think we should do a C++ nsAUTF8String branch, where we could just copy the nsACString branch and do appropriate renames, which shouldn't take more than a few hours, at least for 0.9.9, and then decide what we need to do to give it more value (other than being able to use the type system to discern between any 8-bit string type and UTF8 strings). The 1.0 cycle should give us plenty of time for that :-) Yes, that all sounds rather ugly, but it will allow us to move forward without painting ourselves into a corner. I don't think we need to make all of our code use this nsAUTF8String type immediately, and for darin it'll be just as much work to convert to AUTF8String as it would be to convert to ACString.
Rick: I fear what happens when we change the meaning of single-byte within our codebase, without compiler/xpconnect assistance in tracking down mismatched expectations. Are we going to be repaying that effort savings with interest, tracking down bugs that only occur on crazy international pages, or when people have non-ASCII characters in passwords, etc.? Wasn't that the whole point behind not just changing the |string| documentation to permit UTF-8?
This begs the question of why people shouldn't be able to have non-ASCII characters in passwords.
I thought we permitted non-ASCII characters in passwords now. A patch I recently reviewed certainly seemed to think so, since it was using |wstring| for password values in interfaces. But I can see us wanting to use UTF8String for those interfaces, especially in places like IMAP where the on-the-wire encoding will need to be UTF8 anyway.
rick: brendan did a pretty good job of convincing me that we shouldn't miss this albeit rushed opportunity to somewhat fix our string story... that we should try if possible to work in a way to have static type checking. jag says he'll be able to make it happen... the scariest thing is perhaps the fact that nsAUTF8String will also need to be frozen for mozilla 1.0 :-/
In comment #130, it is mentioned: > I thought we permitted non-ASCII characters in passwords now. > A patch I recently reviewed certainly seemed to think so, I would like to see us support non-ASCII characters in passwords, and if Mozilla is supporting them, that is good. Note that some protocols such as HTTP have a recommended way of dealing with such password input. (Cf. http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2617.txt)
Who said any nsA*String is frozen for 1.0? If they are going to be, we've got some work to do.
dbaron: if nsIURI is to be frozen and if its attributes are of type nsAUTF8String & nsACString, then it seems like we have no choice but to also freeze these string interfaces. i thought this situation was well understood... after all, isn't nsAString freezing? as i understood it, nsAString is used on some interfaces that are frozen or are going to be frozen.
If nsAString is going to be frozen for 1.0, then there are some changes that need to be made first. I was under the impression that it was considered acceptable not to freeze it.
Rick mentioned both UCS2 and UTF16. How is the code in the tree split between these two?
hey dbaron, unfortunately since we are freezing the DOM interfaces we need to freeze nsAString as well... since it is used extensively in these interfaces... similarly, since nsIURI is freezing, we'll need to freeze nsAUTF8String too. Currently, no public interfaces expose the nsACString interface so this one is still up in the air ;-) but realistically, if AString and AUTF8String freeze, then ACString may end up being frozen too... -- rick
All I've heard about freezing the DOM interfaces is bug 110795 comment 11. Who's willing to make the changes needed to nsAString needed before we freeze it? (At a minimum we need the changes that are needed to implement nsStackString (the nsAutoString in the new world), PromiseSingleFragmentString (15 minutes of work), API cleanup for Mid/Right/Left, and that's just what I thought of off the top of my head without even looking at the header file.) Anyone who thinks nsAString needs to be frozen should file a bug on the string component requesting that it happen. (And please cc: me, and/or note the bug here.)
rick: nsIURI will be using nsACString as well for the ASCII compatible versions of the spec and host attributes. this is not necessary i guess since it could just use |string| for these, but i would prefer |ACString| since it would result in something potentially more efficient.
The bad news is that I couldn't meet tonight's deadline to land my changes. The good news is that there is enough done to unblock Darin.

Darin, the patch I'm about to attach should get you going with your nsIURI related changes. With it in your tree, you should be able to define 'AUTF8String' and 'ACString' types in your IDL files. These types will map into 'nsACString' in the generated header file and XPConnect will perform the appropriate string conversions for them. Just as a sanity check, run the following test to make sure things work for you:

1) Run dist/bin/xpcshell and pass in the JS file "js/src/xpconnect/tests/js/old/xpctest_echo.js" as an argument.
2) The JS file does a bunch of tests and prints the results to the console. Make sure that all lines that begin with "In2OutOneUTF8String" and "EchoIn2OutOneUTF8String" end with "passed".

If the above works, you are all set to start using these new types in IDL and revamping your nsIURI interfaces.

Here's what I plan to do next *before* I land:

1) Make an XPIDL compiler flag that forces old-style typelib generation and reports errors when new types are used.
2) Add a similar flag to the XPT linker that forces it to report errors if it comes across an XPT file that uses these new types.

I will post the patches for these two changes to this bug.

Here's what I will do *after* I land:

1) Change nsIVariant and XPCOM's proxy event mechanism to support these new string types. I have filed bug 125465 and bug 125466 on myself to track these two changes respectively. I'll post patches on those bugs.

General question: The DOMString type uses an XPCVoidableString class that inherits from nsAutoString and adds the ability to store a JS null value. So, a boolean comparison of a null DOMString with the null JS value yields TRUE. When the DOMString is passed into the JS print() method, it prints out "null". Do we need to replicate this behavior for CStrings and UTF8Strings?
jband, dbradley, shaver, I'd like you guys to r/sr this patch and the upcoming compiler/linker patches. You can start on this one now or wait until all the patches are on here. Up to you... :-) An extra but currently unavoidable copy of UTF8Strings is made in the XPConnect conversion routines. Jag is aware of this problem and will fix them when he lands his string changes. CString conversions are fine, AFAIK.
Attachment #66086 - Attachment is obsolete: true
Just a blind copy of nsACString and kids so you can see the ramifications of this addition and the changes needed to ns{Reading,Writing}Iterator.
I've made and tested XPIDL and XPTLink changes that generate typelibs of a version specified on the command line. I'm about to attach a unified patch. Now, I really am ready for a code review. jband, dbradley, shaver, please jump in! Thanks!
This patch subsumes attachment 69457 [details] [diff] [review] and adds the [-t output version number] command line argument to the XPIDL compiler and the XPT Linker. When the output version number is specified, the compiler checks that the input IDL file only uses constructs that are supported in that version. The linker checks that all typelib files it reads are of the version specified or below. Both the compiler and the linker report errors and abort if these checks fail.
Attachment #69457 - Attachment is obsolete: true
We'd hold mozilla1.0 for this, but I think that means holding for followup bugfixes to the main patches that land in 0.9.9. The main patches need talkback and ~300-400K downloads' worth of testing. /be
Blocks: 122050
Keywords: mozilla0.9.9
I forgot to add this earlier. Thanks a lot to David Bradley, Jag, and John Bandhauer for their help on this bug!
This is a variation on just the xpcom/typelib part of Nisheeth's patch. It changes the following:
- don't use 3 different schemes to parse the version string.
- combine that parsing into shared code in libxpt
- don't declare vars in library code and implement them in exes
- don't use // comments in .c files
- don't have big comment blocks that go past column 80
- allow -t to specify *current* version

Nisheeth, I didn't really test this. If you like these changes then please try them using the tests you have. I'll look at the xpconnect part of the changes soon.
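The shared version-string parsing mentioned above can be as simple as the following sketch (ParseVersionString is an invented name; the actual libxpt code may differ):

```cpp
#include <cstdio>

// Parse a "major.minor" typelib version string such as "1.1" or "1.2".
// Returns true and fills in the two components on success.
bool ParseVersionString(const char* str, int* major, int* minor)
{
    return str && std::sscanf(str, "%d.%d", major, minor) == 2;
}
```

The xpidl compiler's -t option and xpt_link would then compare the parsed pair against the versions they know how to emit, failing with an error otherwise.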
I assume the OS/2 changes are needed. I notice that your lines use tab chars and the lines in that file use spaces. I have no idea if it matters, but you ought to not use the tabs just in case.

In xpconnect we use 'if(' not 'if ('. Please fix those. Please wrap lines longer than 80 columns.

In XPCConvert::NativeData2JS the T_UTF8STRING case looks reasonable. The T_CSTRING case makes me uncomfortable. I like avoiding the extra string copy, but I don't like the implicit assumption that the JS string system uses the same allocator as the xpcom global allocator. I'd rather see us do a one-time registration of a finalizer using JS_AddExternalStringFinalizer and then use JS_NewExternalString in this T_CSTRING string creation case.

For JSData2Native I don't think you gain anything by trying to factor out GetJSStringInternals because... You tried to copy the T_DOMSTRING stuff here, but that stuff has weird behavior specifically for the DOM. I think you should follow the pattern used for the other string types here and just convert void and null jsvals to nsnull and be done with them. Also, in the T_CSTRING case you can use JS_GetStringBytes to get the 8-bit char arrays that the JS engine caches for us. This saves us a conversion when the same string is passed through more than once.

Also, is the plan to land all of this using nsACString for both types? Or is AUTF8String going in too?

Please address these issues (and whatever you want from my previous attachment) and post us another patch when you have it. Thanks! I'll look at doing the nsIVariant changes.
Looks like John got most of the big things. Some minor issues:

I'm wondering if we really need the create_old_typelib flag used outside of the parameter checking for additional -t's. It eliminates some version checks, but those don't look too costly. From what I can tell, all the other checks for create_old_typelib can be eliminated safely and would make the code a little simpler.

The outFileName declared in the main function in xpt_link.c should be const char *.

You commented out the Foo attribute in interface nsITestXPCFoo in xpctest.idl. Just curious if that was accidental or not.
One more issue came up in a short offline discussion between nisheeth, jband and myself.

We currently have special code in XPConnect for string conversions from JS to native DOMStrings that special-cases |undefined| for backwards compatibility in the DOM. Currently, passing |undefined| from JS as a DOMString into native code will convert the |undefined| value into the string "undefined". This is different from how we're treating |undefined| for string and wstring (where |undefined| is treated the same as |null|), and it will be different from how we treat |undefined| in AUTF8String and ACString.

My suggestion for avoiding this special case for the probably most commonly used (non-DOM) XPIDL string type is to separate DOMString from AString. Now that we're extending the typelibs anyway, taking this additional step for consistency between non-DOM strings would IMO be worth the trouble, and others seemed to agree with me. DOMString is after all the special case from an XPConnect point of view, not the other way around. The amount of changes needed to do this should be trivial, since the treatment of DOMString would remain almost identical to the treatment of AString, the only difference being how we treat conversion from JS |undefined| to native strings.

IOW, let's add one more string type, AString (!= DOMString) :-)
WOOT! Please do :-)
Sorry for the delay. I was only able to start looking at jband and dbradley's comments today (Tuesday) afternoon.

Replies to jband's comments:
--

Your patch to xpcom/typelib looks good except that <math.h> was included in xpt_link.c. Now that you've done away with the math-based parsing, we no longer need to include that file.

> Also, is the plan to land all of this using nsACString for both types? Or is
> AUTF8String going in too?

The plan was to land using nsACString for both types. When Jag lands nsUTF8String, he will change XPConnect to use it for the AUTF8String type. But the final decision on this depends on how far along Jag and Darin think they are with the nsUTF8String class and nsIURI interface changes respectively. Darin, Jag, please add your 2 cents about what you think the landing strategy for our changes should be. Thanks!

> For JSData2Native I don't think you gain anything by trying to factor out the
> GetJSStringInternals because... You tried to copy the T_DOMSTRING stuff here,
> but that stuff has weird behavior specifically for the DOM. I think you should
> follow the pattern used for the other string types here and just convert void
> and null jsvals to nsnull and be done with them. Also, because in the T_CSTRING
> case you can use JS_GetStringBytes to get the 8-bit char arrays that the JS
> engine caches for us. This saves us a conversion when the same string is passed
> through more than once.

After consulting with Johnny and other DOM hackers, we had decided to make the null and void JS string cases behave the same across DOMString, UTF8String, and CString. That is why you see the DOMString code copied for the UTF8String and CString cases. Bug 125481 (Make ACStrings and AUTF8Strings voidable) was filed with the same strategy in mind. In a followup discussion with jband and Johnny, we changed our minds on this. We will now make UTF8Strings and CStrings behave like the char and wchar strings.
The patch I'm about to attach reflects this change. Bug 125481 now becomes invalid. I've used JS_GetStringBytes() as you suggested to save a conversion in the CString case.

Replies to dbradley's comments:
--

> You commented out the Foo attribute in interface nsITestXPCFoo in xpctest.idl.
> Just curious if that was accidental or not.

Yup, this was accidental. I caught it when I did a clobber build and fixed it, but didn't attach a new patch just for the one-line change.

> I'm wondering if we really need the create_old_typelib flag used outside of the
> parameter checking for additional -t's. It eliminates some version checks, but
> those don't look too costly. From what I can tell, all the other checks for
> create_old_typelib can be eliminated safely and would make the code a little
> simpler.

I agree that create_old_typelib isn't necessary anymore. I've left it as a local variable in xpidl.c and removed it from other places.

> The outFileName declared in the main function in xpt_link.c should be const
> char *.

Done!

Miscellaneous comments:
--

- I've line-wrapped better, fixed if-statement whitespace in XPConnect, and converted tabs to spaces in the OS/2 .asm file.
- I've used JS_AddExternalStringFinalizer() for the case where the unicode string buffer is created by XPCOM.
Attachment #69802 - Attachment is obsolete: true
Attachment #70041 - Attachment is obsolete: true
In comment 153, s/bug 125481/bug 125841. Darin, how are your nsIURI changes coming along? The plan is still to land our changes together for 0.9.9 without waiting for nsUTF8String, right? jband/dbradley/shaver, please review attachment 70474 [details] [diff] [review]. Thanks!
Yes, that's the plan as far as I know.
OK, I think this patch addresses everything so far except Johnny's request to add a new string type. Please look through this patch to see if anything bad jumps out at you. I will go through the code to add a new string type tomorrow. You guys have one day to object! :-)

Once I'm done there will be four IDL string types (among others):

DOMString - handles void (undefined) and null JS string values in a special DOM-compatible way as described in comment 151 from Johnny.
AString, ACString, and AUTF8String - will treat null and void JS string values as empty strings.
Attachment #70474 - Attachment is obsolete: true
Um, I had the |undefined| conversion a bit wrong in my mind: non-DOM strings throw an exception when asked to convert |undefined| into a string value, whereas conversion into a DOMString yields the string "undefined". I vote for parity between all the non-DOMString string types.
jst: You mean when converting from JS to native, right? For the normal non-domstring types JSVAL_VOID *does* convert to null. It is only in the (unlikely) case of a 'out' [w]string declared as a C++ reference that this is an error - only because it is not legal for a C++ reference to hold null. http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/src/xpcconvert.cpp#690
Um, my previous comment was based on what Nisheeth told me happens when we try to convert |undefined| in JS into a native string or wstring, maybe I didn't understand what he was testing... Either way, my vote is for consistency between the non-DOMString string types...
In attachment 70479 [details] [diff] [review], xpt_link.c still includes <math.h> even though it doesn't need to. Please ignore that nit because I've fixed it in my local build. The rest of the patch looks good. Hang in there for a final patch that will add an additional string type to IDL.
I should have realized this earlier but better late than never. Just to clarify: I am not really adding a new string type (AString) in addition to DOMString, AUTF8String, and ACString. AString was already an accepted string type in XPIDL but it was mapped behind the scenes to do *exactly* what DOMStrings do. What I am doing is changing things so that AStrings behave *mostly* like DOMStrings except that they assign empty strings into themselves when null and void jsvals are passed into them. So, please ignore my earlier posts about a whole new string type.
Sure, you are not adding *another* new string C++ type for AString (abstract or otherwise). But you *are* adding a new xpidl type.
Thanks for the clarification, jband. Here's the final patch with support for the new AString XPIDL type. jband, shaver, dbradley are requested to r/sr. Thanks!
Attachment #70479 - Attachment is obsolete: true
Just a nit, if the dipper/idl type is "astring", shouldn't it be "acstring" and "autf8string" too for consistency?
Looks good to me. We need someone with a good handle on strings to make sure everything string wise is correct. Might want someone familar with the OS/2 assembler code take a quick look at the small addition. Some minor cleanup issues: Looks like you interjected some additional spaces here in the xpidl.c - break; - + break; + In XPCConvert::JSData2Native the jschar* chars = nsnull; initialization is unnecessary in the T_UTF8STRING case. Looks like you may have some tab/space issues in xpctest_echo.js. Braces and code look a little misaligned.
Thanks for the comments, David. I'll make the changes you suggest in my build. I don't know anyone familiar with OS/2 assembler code off the top of my head. Anyone out there know anyone? Jag, would you please give me a code review of the string related code in the XPConnect conversion routines? Thanks! Jband, would you please sr? Thanks!
Attached patch Version 2.0 of final patch! :-) (obsolete) — Splinter Review
Jband pointed out a performance problem in JSData2Native() which is fixed in this patch. Dbradley's comments are also addressed. David, would you please give this patch a review stamp also? You need only look at the xpcconvert.cpp diffs. The rest of the files haven't changed since the last patch. Jag, please review the string code in the conversion routines in xpcconvert.cpp. Jband, please sr. Thanks!
Attachment #70643 - Attachment is obsolete: true
Rearranged code in the UTF8String and CString cases inside JSData2Native() in xpcconvert.cpp per jband's request. Rest of the patch is the same as earlier. David, please r=. Jband, please sr=. Thanks!
Attachment #71031 - Attachment is obsolete: true
Comment on attachment 71051 [details] [diff] [review]
version 3.0 of patch (I've given up on calling it final.)

sr=jband. This is the one. I agree that we should land it. Note that I will be very annoyed if we *don't* follow through with all of the following:
- utf8 string class.
- changes to this to use that class.
- changes to nsVariant.

This change by itself leaves things in a state that we should pass through quickly. But let's get it in and get on with things.
Attachment #71051 - Flags: superreview+
Comment on attachment 71051 [details] [diff] [review]
version 3.0 of patch (I've given up on calling it final.)

I thought the non-DOM strings were going to do what |string| and |wstring| do, that is to say, no support for null/void/undefined converting from or to.

>Index: js/src/xpconnect/src/xpcconvert.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/js/src/xpconnect/src/xpcconvert.cpp,v
>retrieving revision 1.73
>diff -u -r1.73 xpcconvert.cpp
>--- js/src/xpconnect/src/xpcconvert.cpp 5 Feb 2002 06:45:02 -0000 1.73
>+++ js/src/xpconnect/src/xpcconvert.cpp 23 Feb 2002 06:04:30 -0000
>@@ -623,7 +740,7 @@
>         {
>             nsAWritableString* ws = *((nsAWritableString**)d);
>
>-            if(JSVAL_IS_NULL(s))
>+            if(JSVAL_IS_NULL(s) || (!isDOMString && JSVAL_IS_VOID(s)))
>             {
>                 ws->Truncate();
>                 ws->SetIsVoid(PR_TRUE);
>@@ -726,6 +843,116 @@
>             return JS_TRUE;
>         }
>
>+        case nsXPTType::T_UTF8STRING:
>+        {
>+            jschar* chars;
>+            PRUint32 length;
>+            JSString* str;
>+
>+            if(JSVAL_IS_NULL(s) || JSVAL_IS_VOID(s))
>+            {
>+                if(useAllocator)
>+                {
>+                    nsACString *rs = new nsCString();
>+                    if(!rs)
>+                        return JS_FALSE;
>+
>+                    rs->SetIsVoid(PR_TRUE);

nsCString's SetIsVoid does nothing, there's no point in setting this.

>+                    *((nsACString**)d) = rs;
>+                }
>+                else
>+                {
>+                    nsCString* rs = *((nsCString**)d);
>+                    rs->Truncate();
>+                    rs->SetIsVoid(PR_TRUE);

Same here and further below.

r=jag on the string changes, but check with jst on the use of DOMString-like IsVoid.
Jag, we will have the notion of "voidness" for AStrings, CStrings and UTF8Strings (see bug 125841). You are right that SetIsVoid() doesn't do anything for CStrings and UTF8Strings currently. The plan is to implement something similar to XPCVoidableString for CStrings and UTF8Strings that *does* implement "voidness". The calls to SetIsVoid() are no-ops right now but will do real work in the future. Thanks for the review!
Comment on attachment 71051 [details] [diff] [review] version 3.0 of patch (I've given up on calling it final.) r=dbradley
Attachment #71051 - Flags: review+
when i tried the very first version of this patch, i had to explicitly add

%{C++
#include "nsAString.h"
%}

at the top of each .idl file that referenced AUTF8String. is this still required?
Darin: that has long been the case for AString idl users. It is debatable whether that block should be added to the idl or if the #include'rs of the generated .h should be expected to pre-#include the string header. Nevertheless, there is no automatic generation of that #include line. There is a bug around suggesting that this would be nice, but it requires xpidl hacking far beyond the scope of this bug.
Let #includers of those headers #include "nsAString.h". Scott Meyers says so, if you need a reason!
Comment on attachment 71051 [details] [diff] [review] version 3.0 of patch (I've given up on calling it final.) a=shaver for 0.9.9, but I won't cry if you take out those %{C++%} header warts. Nice work, all.
Attachment #71051 - Flags: approval+
The fix is in. Finally! :-) I've updated the status summary to reflect reality.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Summary: 8-bit UTF8-capable string type for XPIDL → Add ACString, AUTF8String, and AString to XPIDL
jband, shaver, "scott-meyers": so adding #include "nsAString.h" to nsrootidl.idl is bad because it causes everyone to include that file? How about at least adding class nsAString; class nsACString; to nsrootidl.idl?
Summary: Add ACString, AUTF8String, and AString to XPIDL → 8-bit UTF8-capable string type for XPIDL
Summary: 8-bit UTF8-capable string type for XPIDL → Add ACString, AUTF8String, AString types to XPIDL
>sr=jband This is the one. I agree that we should land it. Note that I will be >very annoyed if we *don't* follow through with any of the following: >- utf8 string class. >- changes to this to use that class. I've filed bug 127789 on jag to implement utf8strings. I've added a comment on it to go fix up the extra string copies in xpcconvert.cpp. >- changes to nsVariant. I'd filed bug 125465 on myself last week to track this.
I like darin's most recent suggestion. The bug in question that jag mentioned earlier is bug 78848.
I can't think of any good reason to not do the forward declarations that Darin suggested.
OK, I'll make the nsrootidl.idl change and check it in with my changes to add the new string types to nsVariant (bug 125465).
I'm attaching the patch suggested by Darin that will get checked in as part of bug 125465's fix. Please holler if you see problems with it. Thanks!
Yeah, don't #include when you can forward-declare the class name. /be
Brendan, please clarify. Exactly what should I not #include? Are you saying that I should remove the #include lines that were there in nsrootidl.idl before my changes?
nisheeth, sorry I was unclear. I was supporting your final patch, which adds class nsACString; to nsrootidl.idl's C++ section, thereby relieving others from having to #include "nsACString.h" or repeat the class forward decl. IOW, I'm all happy here. /be
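The change being agreed on here amounts to forward declarations in nsrootidl.idl's C++ block; a sketch of the idea (not the exact checked-in attachment 71980):

```
%{C++
/*
 * Forward-declare the abstract string classes so that consumers of
 * generated headers need not #include the full string headers.
 * A forward declaration suffices because the generated signatures
 * only use these types by reference or pointer.
 */
class nsAString;
class nsACString;
%}
```

Since every generated header includes nsrootidl.h, these two lines relieve each individual .idl file (and each #includer) of the burden described in the earlier comments.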
Attachment 71980 [details] [diff] got checked into the 0.9.9 branch and the trunk just now as part of the fix to bug 125465. Jag, when you implement nsAUTF8String, please remember to add a forward declaration to it in nsrootidl.idl. Thanks!
Marking Verified -
Status: RESOLVED → VERIFIED
Blocks: 129613
Component: xpidl → XPCOM
QA Contact: pschwartau → xpcom