Closed Bug 127789 Opened 23 years ago Closed 15 years ago

Implement nsAUTF8String and nsUTF8String

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: nisheeth_mozilla, Assigned: jag+mozilla)

References

Details

(Keywords: embed, Whiteboard: [adt3])

You know what I mean!  :-)
When you implement this, please make sure that you go into xpcconvert.cpp and
fix up the places where we are currently doing extra string copying.

Thanks!
OS: Windows 2000 → All
Hardware: PC → All
Keywords: nsbeta1, topembed
Target Milestone: --- → mozilla1.0
Nav triage team needs info: Why is this needed for MachV? (we don't know what
you mean ;-))
Whiteboard: [need info]
At a high level, this is needed to complete out the implementation of new string
types in XPIDL that got checked in the 0.9.9 milestone.  The current situation
is that the AUTF8String type in XPIDL (used to store strings encoded in the UTF8
encoding) uses a simple C string class for storage which does not enforce UTF8
encoding on its contents.  The implementation of nsUTF8String is needed so that
the already added AUTF8String XPIDL type can start using the nsUTF8String
concrete class for storage of UTF8 strings.

Darin, I'm not sure about this, but don't your nsIURI related changes also
depend on a nsUTF8String implementation?  If you do depend on the nsUTF8String
implementation, then that is another reason why this is needed for Mach V
because your changes have also been checked in to the 0.9.9 milestone.

CCing jband, who will probably argue much more eloquently about why this should
be allowed into Mach V and Mozilla 1.0.
Blocks: 129613
nisheeth: right, the URL changes I made would need to be fixed-up once the
string code provides UTF8 string class(es).

required abstract classes:

nsAUTF8String
nsASingleFragmentUTF8String
nsAFlatUTF8String

and we'd of course need an impl "nsUTF8String"... but unless we want to penalize
performance, we need a stack based version of this.

and the classes would need to be able to act like nsACString, etc. in some cases
to avoid extra string copies.  and then there's the conversion classes (e.g.,
NS_ConvertUTF8toUCS2)... we need to deal w/ those as well.
Does that mean that internationalized URLs (and strings passed around
internally) are broken?  What is the end user impact?  Thanks.
trudelle:

the impact of this bug isn't really measurable.  however, clearly without strong
type safety, there may be many nasty i18n bugs lurking around.  most would agree
that such bugs are very likely given the way we currently abuse |char*| and
|nsCString|.

post mozilla 1.0 it will be next to impossible to add these UTF-8 string
classes.  so IMO, we really shouldn't miss this opportunity to get help from the
compiler to prevent mismatched string encodings and related i18n bugs.
embed triage: topembed-, not critical for immediate embedding needs. Adding
embed keyword since this impacts the usage of the embedding API's.
Keywords: topembedembed, topembed-
Cc'ing valeski to appeal the topembed-.  Marking mozilla1.0+.

/be
Keywords: mozilla1.0+
Marking topembed+.
Keywords: embed, topembed-topembed+
nsbeta1+ per Nav triage team, adt3
Keywords: nsbeta1nsbeta1+
Whiteboard: [need info] → [need info][adt3]
Whiteboard: [need info][adt3] → [adt3]
Mass moving nsbeta1+/adt3 bugs assigned to Navigator team engineers out to
target milestone 1.0.1.  Please confine your attentions to driving down our list
of TM 1.0 bugs for beta.  Better to help, debug, or test one of them than fix
one of these.
Target Milestone: mozilla1.0 → mozilla1.0.1
Changing nsbeta1+ [adt3] bugs to nsbeta1- on behalf of the adt.  If you have any
questions about this, please email adt@netscape.com.  You can search for
"changing adt3 bugs" to quickly find and delete these bug mails.
Keywords: nsbeta1-
Changing nsbeta1+ [adt3] bugs to nsbeta1- on behalf of the adt.  If you have any
questions about this, please email adt@netscape.com.  You can search for
"changing adt3 bugs" to quickly find and delete these bug mails.
Keywords: nsbeta1+
Blocks: 125389
-> future
Target Milestone: mozilla1.0.1 → Future
MOving KW to embed as no immediate customer need (per edt)
Keywords: topembed+embed
As a small step in this direction, could we typedef nsAUTF8String as nsACString
so that C++ code can use it for argument types?  That should make it clearer
what functions are expecting as input, at least...
> As a small step in this direction, could we typedef nsAUTF8String as nsACString

I agree 100%... patches welcome ;-)
So what would we need exactly?  Changes to nsStringAPI, nsStringFwd, and
nsrootidl, and a change to xpidl to generate nsAUTF8String headers for AUTF8String?
QA Contact: jag → string
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
I still think we should fix this....  We've had numerous bugs where people stuffed UTF-8 into non-UTF-8 strings and vice versa because on the C++ API level you can't tell them apart.
I don't think we should fix this unless we have a scheme to actually be completely correct and then use compilers and static analysis to enforce certain invariants. I don't think the reward of that is sufficient to actually devote the necessary effort to this bug.
(In reply to comment #20)
> I don't think we should fix this unless we have a scheme to actually be
> completely correct

Isn't this an instance of allowing the perfect to be the enemy of the good?
I'm saying that partial utf8string classes wouldn't be good, just confusing.
That seems like it would depend heavily on what was implemented and what wasn't.
(In reply to comment #23)
> That seems like it would depend heavily on what was implemented and what
> wasn't.

Examples? I find Benjamin's scenario -- developers thinking that they're getting some protection from using nsAUTF8String, but not actually getting it because of an infelicity underneath them -- plausible, but I would find examples of where we would use the classes pretty interesting too, since you seem to have some such in mind.
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.