Closed Bug 256785 Opened 21 years ago Closed 7 years ago

conversion from unicode to |string| should be stricter

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED INACTIVE

People

(Reporter: darin.moz, Unassigned)

Details

conversion from unicode to |string| should be stricter. currently, xpconnect happily truncates unicode characters to 8-bits when passing through a |string| IDL parameter. this obviously results in data loss if the caller passes non-ASCII values to xpconnect. however, more than just data loss, this creates major trouble for a lot of code. take this example: interface nsIFoo { void doStuff(in string aURL); }; now, imagine that nsIFoo is implemented in C++, like so: NS_IMETHODIMP nsFoo::DoStuff(const char *aURL) { nsCOMPtr<nsIURI> uri; NS_NewURI(getter_AddRefs(uri), nsDependentCString(aURL)); ... } in this code, lurks trouble. if DoStuff is ever passed a non-ASCII URL string from JavaScript code, that URL string will be destructively truncated to 8-bits and the result will be passed to NS_NewURI. NS_NewURI expects a UTF-8 input string, but in this example, aURL is not valid UTF-8. currently, necko deals with this problem by validating the input. that solution works for necko, but in effect it means that every XPCOM API, which uses AUTF8String, would need similar IsUTF8 checks. i think we might be well-served by having xpconnect perform a more strict conversion from unicode to |string|. xpconnect could truncate to 7-bits or perhaps it should throw an exception if the input is invalid! ;-) see bug 250900 for reference.
There's a more general problem. When one type is being converted to another type and the conversion result is out of range of the destination type. For instance an int containing -1 converted to an unsigned int. Or an int containing 40000 converted to a short. Should those throw an exception? This has been brought up before and something in the back of my mind One issue to consider is the performance hit we'll take by either trimming every string to 7 bits or checking every character to see if its out of range. If we're going to take such a hit when strings cross JS/XPCOM boundaries it makes me wonder if we wouldn't be better off with a common string representation and provide the appropriate rendering of the string when needed. I wonder if that existed if the number of conversions would actually go down.
Yeah, performance concerns are the real kicker. > If we're going to take such a hit when strings cross JS/XPCOM boundaries it > makes me wonder if we wouldn't be better off with a common string > representation and provide the appropriate rendering of the string when > needed. Of course, this isn't an option for |string| :-(
Right, it's unfortunate that the implementation of "string" was exposed by XPCOM. Ignoring performance, the question still remains should this be an exception or should it "truncate" similar to what currently happens with numbers?
I think we should try throwing an exception. This may cause problems for people using nsIOutputStream::write, which probably should not be a scriptable method in the first place.
for the record, i'm already paying an incredible performance penalty with the stuff i do, at times we are reading/writing a character at a time, and triggering threadsafety warnings for each poke. -- I will eventually post some classinfo patches to reuces the number of threadsafety warnings. further, we use nsIOutputStream nsITransport.openOutputStream. it's marked as scriptable, is not deprecated and for all i know we have no alternatives. afaik our code is both intl safe and binary safe, as we send arbitrary images through our code and they render correctly (and we have qa who are picky enough to complain about images).
You may be paying the cost, but is the rest of Mozilla? The majority of the browser code may be happily ignorant and injecting this overhead in such a well travelled place could noticeably slow things down. Not to say that that code is safe and/or correct. I'll try and get some measurements, but if anyone else is able to put a patch together or willing to do time trials that would be a great help.
i'm suggesting that the cost i pay will only rise.
timeless: it sounds to me as if you should be wrapping your nsIOutputStream instance with a nsIBinaryOutputStream instance. then you would be able to safely write binary data to the stream. using nsIOutputStream::write to send anything other than ASCII from JS is a bad idea.
Assignee: dbradley → nobody
QA Contact: pschwartau → xpconnect
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.