Closed
Bug 288877
Opened 20 years ago
Closed 20 years ago
Need stream-based Unicode converter apis
Categories
(Core :: Internationalization, defect, P2)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla1.8beta2
People
(Reporter: bzbarsky, Assigned: Biesinger)
References
Details
Attachments
(2 files, 1 obsolete file)
|
10.06 KB,
patch
|
jshin1987
:
review+
|
Details | Diff | Splinter Review |
|
1.74 KB,
patch
|
Details | Diff | Splinter Review |
Per discussion on irc, currently the very simple operation of taking a string and uploading it to a server in a given encoding is incredibly painful if done right (and all the non-painful ways screw up non-ISO-8859-1 text). biesi and I were talking about this, and it seems that the solution is to have a stream-based Unicode conversion api that consumers can use. Something like (leaving off exactly how the encoding is specified for clarity): nsIInputStream convertFromUnicode(in AString string); AString convertToUnicode(in nsIInputStream stream); This second one has a little problem if the stream is non-blocking; how to address that? Also, do we want to make nsIUnicharInputStream usefully scriptable and allow conversions to/from it? We already have that in the form of the nsConverterInputStream, sorta....
| Assignee | ||
Comment 1•20 years ago
|
||
> Also, do we want to make nsIUnicharInputStream usefully scriptable bug 260281 > We already have that in the form of the nsConverterInputStream, sorta.... that should really be scriptable as well. conversion unicharinputstream -> iinputstream is probably not so useful, imo...
| Reporter | ||
Comment 2•20 years ago
|
||
Agreed on conversion from unichar stream to input stream. I don't really see a use case for that.
| Assignee | ||
Comment 3•20 years ago
|
||
OK, this implements that stuff on nsIScriptableUnicodeConverter. I used wstring instead of AString to follow the style of this interface (if reviewers want, I can convert the entire interface to use AString) nonblocking streams just return NS_BASE_STREAM_WOULD_BLOCK (if they run out of data before EOF); that seemed to make the most sense to me and was easiest to implement.
Attachment #179511 -
Flags: superreview?(bzbarsky)
Attachment #179511 -
Flags: review?(jshin1987)
| Assignee | ||
Comment 4•20 years ago
|
||
some testcases...
js> conv.charset = "UTF-16"
UTF-16
js> conv.convertFromInputStream(conv.convertToInputStream("foo"))
foo
js> str=conv.convertToInputStream("foo: \u00FC");
[xpconnect wrapped nsIInputStream]
[... snipped wrapping of str with an nsIBinaryInputStream bin ...]
js> bin.readByteArray(12)
102,0,111,0,111,0,58,0,32,0,252,0
js>
conv.convertFromInputStream(conv.convertToInputStream("\u00FC")).charCodeAt(0).toString(16)
fc
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta2
| Reporter | ||
Comment 5•20 years ago
|
||
Comment on attachment 179511 [details] [diff] [review] possible patch >Index: uconv/idl/nsIScriptableUConv.idl I'd definitely prefer AString usage here.... string sharing is possiblem then, and these strings can get long. The rest looks good!
Attachment #179511 -
Flags: superreview?(bzbarsky) → superreview+
Comment 6•20 years ago
|
||
So, there's also the existing stream converter concept. Maybe we just need a way to wrap a nsIInputStream around an AString. Then, you could use nsIStreamConverter::ConvertData. Since nsIInputStream provides a non-blocking API, it makes sense for your converter to be a nsIInputStream layered on top of the source nsIInputStream.
| Reporter | ||
Comment 7•20 years ago
|
||
> Maybe we just need a way to wrap a nsIInputStream around an AString.
And just deliver the UTF-16 bytes in native endianness? That seems like a lot
more code to implement and a lot harder to use from JS (have to create the input
stream, then have to create a converter, then have to put one into the other,
hope it implements sync conversions, etc).
Comment 8•20 years ago
|
||
I agree that the API in the proposed patch is nice and simple. It works great for small streams. It doesn't work for large streams. Of course, we don't really have a good upload story for large data. We expect all of the data to be there when we start uploading :-( If you wanted to use this sort of thing in a general way with nsIInputStream as your source, obviously it would fail in some cases. It's interesting that using this API, my input stream might give some data, then fail with NS_BASE_STREAM_WOULD_BLOCK. The data that it gave out would be lost. Should this code expect IsNonBlocking() to return false? Maybe, maybe not. Should this API give me a way to say convert up to a certain number of bytes? Also, what if my input stream supports ReadSegments? We're missing an opportunity to optimize for that case. Maybe that doesn't matter too much.
| Reporter | ||
Comment 9•20 years ago
|
||
I agree that the stream-to-string conversion is suboptimal here. Perhaps I should describe the use case we are trying to solve... uploading data via nsIUploadChannel from JS. So we start with a JSString and want to end up with an nsIInputStream. At the moment this is simply not possible. The closest to it would be if JS could create nsIPipe implementations (for which I see no contractid). Then it could do: 1) Convert string to byte array using nsIScriptableUnicodeConverter. 2) Create a pipe. 3) Create a binary output stream; wrap it around the output stream of the pipe. 4) Write byte array to binary output stream. 5) Pass input stream of the pipe to the upload channel. The convertFromUnicode() method biesi is adding addresses this need very nicely. convertToUnicode is a bigger problem, but I think we need _some_ solution there... If nothing else, we need a reasonable way of actually reading text data in JS from network streams. Note that in those cases the stream will in fact typically be non-blocking (it'll be the stream passed to onDataAvailable).
Comment 10•20 years ago
|
||
instead of nsIPipe, you could nsIStorageStream. this is exactly what people use today to upload from JS. stream-to-string should be useful from a JS nsIStreamListener::OnDataAvailable implementation. that requires support for non-blocking streams, and consuming only up to a certain byte count. it therefore also requires converter state being held over to the next OnDataAvailable call. so, maybe this is the wrong API for that. maybe this patch should just focus on getting string-to-stream conversion done well.
| Assignee | ||
Comment 11•20 years ago
|
||
> be non-blocking (it'll be the stream passed to onDataAvailable). that stream can be blocking or nonblocking, all you know is that you can read aCount bytes without blocking. darin: It seems that nsIConverterInputStream it something like what you are asking for here: http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIConverterInputStream.h You can give it a stream, and can read from it as an nsIUnicharInputStream. Alas, this is not scriptable :( >I agree that the API in the proposed patch is nice and simple. It works great >for small streams. It doesn't work for large streams. That's true; I would say we should make nsIConverterInputStream.h scriptable and suggest using that for large streams, and have this API here for small streams. > It's interesting that using > this API, my input stream might give some data, then fail with > NS_BASE_STREAM_WOULD_BLOCK. hm, yeah, that is true. I considered ensuring/requiring that isNonBlocking returns false; I thought it might be useful to allow for the case where the stream does not actually return WOULD_BLOCK. > Should this API give me a way to say convert up to a certain number of bytes? Hm, that byte boundary might fall inside a character (eg in UTF-8, or any other charset where a character can take more than one byte). what to do in that case? >Also, what if my input stream supports ReadSegments? We're missing an >opportunity to optimize for that case. Maybe that doesn't matter too much. that's true, I will fix that. bz: Loading text data from js could be implemented via http://lxr.mozilla.org/seamonkey/source/netwerk/base/public/nsIUnicharStreamLoader.idl#83, I think (once unicharstream becomes scriptable)
| Assignee | ||
Comment 12•20 years ago
|
||
> maybe this patch should just focus on getting string-to-stream conversion
> done well.
that's probably a good idea, I'll split the patch up| Assignee | ||
Updated•20 years ago
|
Attachment #179511 -
Flags: review?(jshin1987)
| Assignee | ||
Comment 13•20 years ago
|
||
ok, this only does string->stream conversion; uses AString instead of wstring.
Attachment #179511 -
Attachment is obsolete: true
Attachment #179529 -
Flags: review?(jshin1987)
| Assignee | ||
Comment 14•20 years ago
|
||
this patch (applied on top of the previous one) adds stream->string conversion (not for review)
Comment 15•20 years ago
|
||
Comment on attachment 179529 [details] [diff] [review] patch v2: Only string->stream conversion (checked in) r=jshin >+nsScriptableUnicodeConverter::ConvertToUnicode(const char *aSrc, nsAString& _retval) > { > return ConvertFromByteArray(NS_REINTERPRET_CAST(const PRUint8*, aSrc), > strlen(aSrc), > _retval); It's not this bug, but it wouldn't work with non-null-safe charsets.
Attachment #179529 -
Flags: review?(jshin1987) → review+
| Assignee | ||
Comment 16•20 years ago
|
||
(In reply to comment #15) > It's not this bug, but it wouldn't work with non-null-safe charsets. Yeah... I want to stay backwards-compatible in this method though... maybe it could take an const nsACString&
| Assignee | ||
Comment 17•20 years ago
|
||
Comment on attachment 179529 [details] [diff] [review] patch v2: Only string->stream conversion (checked in) Checking in intl/uconv/idl/nsIScriptableUConv.idl; /cvsroot/mozilla/intl/uconv/idl/nsIScriptableUConv.idl,v <-- nsIScriptableUConv.idl new revision: 1.9; previous revision: 1.8 done Checking in intl/uconv/src/nsScriptableUConv.cpp; /cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.cpp,v <-- nsScriptableUConv.cpp new revision: 1.16; previous revision: 1.15 done Checking in intl/uconv/src/nsScriptableUConv.h; /cvsroot/mozilla/intl/uconv/src/nsScriptableUConv.h,v <-- nsScriptableUConv.h new revision: 1.9; previous revision: 1.8 done
Attachment #179529 -
Attachment description: patch v2: Only string->stream conversion → patch v2: Only string->stream conversion (checked in)
| Assignee | ||
Updated•20 years ago
|
Priority: P1 → P2
| Assignee | ||
Comment 18•20 years ago
|
||
a better solution for the convertToUnicode suggestion here may be to just use bug 295047's nsConverterInputStream.
Depends on: 295047
| Assignee | ||
Comment 19•20 years ago
|
||
I claim this is fixed by bug 295047's scriptable nsIConverterInputStream.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•