Closed Bug 288877 Opened 20 years ago Closed 20 years ago

Need stream-based Unicode converter apis

Categories

(Core :: Internationalization, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8beta2

People

(Reporter: bzbarsky, Assigned: Biesinger)

References

Details

Attachments

(2 files, 1 obsolete file)

Per discussion on irc, currently the very simple operation of taking a string
and uploading it to a server in a given encoding is incredibly painful if done
right (and all the non-painful ways screw up non-ISO-8859-1 text).  biesi and I
were talking about this, and it seems that the solution is to have a
stream-based Unicode conversion api that consumers can use.  Something like
(leaving off exactly how the encoding is specified for clarity):

  nsIInputStream convertFromUnicode(in AString string);
  AString convertToUnicode(in nsIInputStream stream);

This second one has a little problem if the stream is non-blocking; how to
address that?  Also, do we want to make nsIUnicharInputStream usefully
scriptable and allow conversions to/from it?  We already have that in the form
of the nsConverterInputStream, sorta....
> Also, do we want to make nsIUnicharInputStream usefully scriptable

bug 260281

> We already have that in the form of the nsConverterInputStream, sorta....

that should really be scriptable as well. conversion unicharinputstream ->
iinputstream is probably not so useful, imo...
Agreed on conversion from unichar stream to input stream.  I don't really see a
use case for that.
Attached patch possible patch (obsolete) — Splinter Review
OK, this implements that stuff on nsIScriptableUnicodeConverter.
I used wstring instead of AString to follow the style of this interface (if
reviewers want, I can convert the entire interface to use AString)

nonblocking streams just return NS_BASE_STREAM_WOULD_BLOCK (if they run out of
data before EOF); that seemed to make the most sense to me and was easiest to
implement.
Attachment #179511 - Flags: superreview?(bzbarsky)
Attachment #179511 - Flags: review?(jshin1987)
some testcases...

js> conv.charset = "UTF-16"
UTF-16
js> conv.convertFromInputStream(conv.convertToInputStream("foo"))
foo

js> str=conv.convertToInputStream("foo: \u00FC");
[xpconnect wrapped nsIInputStream]
[... snipped wrapping of str with an nsIBinaryInputStream bin ...]
js> bin.readByteArray(12)
102,0,111,0,111,0,58,0,32,0,252,0

js>
conv.convertFromInputStream(conv.convertToInputStream("\u00FC")).charCodeAt(0).toString(16)
fc
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta2
Comment on attachment 179511 [details] [diff] [review]
possible patch

>Index: uconv/idl/nsIScriptableUConv.idl

I'd definitely prefer AString usage here.... string sharing is possiblem then,
and these strings can get long.

The rest looks good!
Attachment #179511 - Flags: superreview?(bzbarsky) → superreview+
So, there's also the existing stream converter concept.  Maybe we just need a
way to wrap a nsIInputStream around an AString.  Then, you could use
nsIStreamConverter::ConvertData.

Since nsIInputStream provides a non-blocking API, it makes sense for your
converter to be a nsIInputStream layered on top of the source nsIInputStream.
> Maybe we just need a way to wrap a nsIInputStream around an AString.

And just deliver the UTF-16 bytes in native endianness?  That seems like a lot
more code to implement and a lot harder to use from JS (have to create the input
stream, then have to create a converter, then have to put one into the other,
hope it implements sync conversions, etc).
I agree that the API in the proposed patch is nice and simple.  It works great
for small streams.  It doesn't work for large streams.  Of course, we don't
really have a good upload story for large data.  We expect all of the data to be
there when we start uploading :-(

If you wanted to use this sort of thing in a general way with nsIInputStream as
your source, obviously it would fail in some cases.  It's interesting that using
this API, my input stream might give some data, then fail with
NS_BASE_STREAM_WOULD_BLOCK.  The data that it gave out would be lost.  Should
this code expect IsNonBlocking() to return false?  Maybe, maybe not.  Should
this API give me a way to say convert up to a certain number of bytes?

Also, what if my input stream supports ReadSegments?  We're missing an
opportunity to optimize for that case.  Maybe that doesn't matter too much.
I agree that the stream-to-string conversion is suboptimal here.  Perhaps I
should describe the use case we are trying to solve... uploading data via
nsIUploadChannel from JS.  So we start with a JSString and want to end up with
an nsIInputStream.

At the moment this is simply not possible.  The closest to it would be if JS
could create nsIPipe implementations (for which I see no contractid).  Then it
could do:

1)  Convert string to byte array using nsIScriptableUnicodeConverter.
2)  Create a pipe.
3)  Create a binary output stream; wrap it around the output stream of the pipe.
4)  Write byte array to binary output stream.
5)  Pass input stream of the pipe to the upload channel.

The convertFromUnicode() method biesi is adding addresses this need very nicely.
 convertToUnicode is a bigger problem, but I think we need _some_ solution
there...  If nothing else, we need a reasonable way of actually reading text
data in JS from network streams.  Note that in those cases the stream will in
fact typically be non-blocking (it'll be the stream passed to onDataAvailable).
instead of nsIPipe, you could nsIStorageStream.  this is exactly what people use
today to upload from JS.

stream-to-string should be useful from a JS nsIStreamListener::OnDataAvailable
implementation.  that requires support for non-blocking streams, and consuming
only up to a certain byte count.  it therefore also requires converter state
being held over to the next OnDataAvailable call.  so, maybe this is the wrong
API for that.

maybe this patch should just focus on getting string-to-stream conversion done well.
> be non-blocking (it'll be the stream passed to onDataAvailable).

that stream can be blocking or nonblocking, all you know is that you can read
aCount bytes without blocking.

darin: It seems that nsIConverterInputStream it something like what you are
asking for here:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIConverterInputStream.h
You can give it a stream, and can read from it as an nsIUnicharInputStream.
Alas, this is not scriptable :(

>I agree that the API in the proposed patch is nice and simple.  It works great
>for small streams.  It doesn't work for large streams.

That's true; I would say we should make nsIConverterInputStream.h scriptable and
suggest using that for large streams, and have this API here for small streams.

> It's interesting that using
> this API, my input stream might give some data, then fail with
> NS_BASE_STREAM_WOULD_BLOCK. 

hm, yeah, that is true.

I considered ensuring/requiring that isNonBlocking returns false; I thought it
might be useful to allow for the case where the stream does not actually return
WOULD_BLOCK.

> Should this API give me a way to say convert up to a certain number of bytes?

Hm, that byte boundary might fall inside a character (eg in UTF-8, or any other
charset where a character can take more than one byte). what to do in that case?

>Also, what if my input stream supports ReadSegments?  We're missing an
>opportunity to optimize for that case.  Maybe that doesn't matter too much.

that's true, I will fix that.

bz: Loading text data from js could be implemented via
http://lxr.mozilla.org/seamonkey/source/netwerk/base/public/nsIUnicharStreamLoader.idl#83,
I think (once unicharstream becomes scriptable)
> maybe this patch should just focus on getting string-to-stream conversion
> done well.

that's probably a good idea, I'll split the patch up
Attachment #179511 - Flags: review?(jshin1987)
ok, this only does string->stream conversion; uses AString instead of wstring.
Attachment #179511 - Attachment is obsolete: true
Attachment #179529 - Flags: review?(jshin1987)
this patch (applied on top of the previous one) adds stream->string conversion
(not for review)
Comment on attachment 179529 [details] [diff] [review]
patch v2: Only string->stream conversion (checked in)

r=jshin


>+nsScriptableUnicodeConverter::ConvertToUnicode(const char *aSrc, nsAString& _retval)
> {
>   return ConvertFromByteArray(NS_REINTERPRET_CAST(const PRUint8*, aSrc),
>                               strlen(aSrc),
>                               _retval);

It's not this bug, but it wouldn't work with non-null-safe charsets.
Attachment #179529 - Flags: review?(jshin1987) → review+
(In reply to comment #15)
> It's not this bug, but it wouldn't work with non-null-safe charsets.

Yeah... I want to stay backwards-compatible in this method though... maybe it
could take an const nsACString&
Comment on attachment 179529 [details] [diff] [review]
patch v2: Only string->stream conversion (checked in)

Checking in intl/uconv/idl/nsIScriptableUConv.idl;
/cvsroot/mozilla/intl/uconv/idl/nsIScriptableUConv.idl,v  <-- 
nsIScriptableUConv.idl
new revision: 1.9; previous revision: 1.8
done
Checking in intl/uconv/src/nsScriptableUConv.cpp;
/cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.cpp,v  <-- 
nsScriptableUConv.cpp
new revision: 1.16; previous revision: 1.15
done
Checking in intl/uconv/src/nsScriptableUConv.h;
/cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.h,v  <--  nsScriptableUConv.h
new revision: 1.9; previous revision: 1.8
done
Attachment #179529 - Attachment description: patch v2: Only string->stream conversion → patch v2: Only string->stream conversion (checked in)
Priority: P1 → P2
a better solution for the convertToUnicode suggestion here may be to just use
bug 295047's nsConverterInputStream.
Depends on: 295047
I claim this is fixed by bug 295047's scriptable nsIConverterInputStream.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: