Closed Bug 170585 Opened 22 years ago Closed 22 years ago

Scriptable Streams are broken.

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jst, Assigned: rginda)

References

Details

Attachments

(6 files, 10 obsolete files)

2.27 KB, patch
darin.moz
: review+
Details | Diff | Splinter Review
682 bytes, patch
Details | Diff | Splinter Review
1.29 KB, patch
Details | Diff | Splinter Review
1.40 KB, application/x-javascript
Details
8.83 KB, patch
darin.moz
: review+
Details | Diff | Splinter Review
841 bytes, text/plain
Details
The interface nsIScriptableInputStream interface's read() method is completely broken and can't be used from script with any data that's not string data. The definition of the method is like this: string read(in unsigned long aCount); but this just doesn't work w/ embedded nulls in the string since those will terminate the string output and any data past the first embedded null will be unreachable from JS. This interface is of course frozen, but the reasons for freezing this interface are unknown to me. Who relies on it? Can we un-freeze it and fix it (and change the IID)? If not, it's kindof a broken useless interface...
chatzilla uses this interfaces. i don't know of any embedders that use this interface.
jst, how would you redefine this interface to support embeded nulls?
That should be something like: void read(in unsigned long aCount, [retval, array, size_is(aCount)] out PRInt8 aResult); Not tested, but should work... Alternatively the return type could be an nsACString, but I don't know that we want one of those in a *stream* interface.
I don't want to have to convert that array to a string myself. What a total PITA for simple text based protocols like IRC, TELNET, etc. Perhaps jst's method could be called readBytes, and clients could decide which read method was right for their application.
right, nsIScriptableInputStream::read is totally fine for text based streams. we should just document this restriction and introduce a new interface for streaming binary data to JS components.
Then maybe nsACString is the answer after all?
expanding bug.
Summary: nsIScriptableInputStream broken. → Scriptable Streams are broken.
How about something like: http://lxr.mozilla.org/mozilla/source/xpcom/io/nsIBinaryInputStream.idl jst said: I'd say readBytes() should return an ACString, cheaper implementation (i.e. no need to malloc for every call), we just need to document that the string you get back should be treated as an array of bytes, nothing else. What about byte-order?
Comment on attachment 108916 [details] [diff] [review] patch v.1 - marks interface UNDER_REVIEW sr=jst
Attachment #108916 - Flags: superreview+
Comment on attachment 108916 [details] [diff] [review] patch v.1 - marks interface UNDER_REVIEW r=darin (how about a comment indicating that some methods like setEOF might be unimplemented if the underlying stream does not support truncation -- e.g., maybe the stream is readonly)
Attachment #108916 - Flags: review+
I just checked in this bustage fix: Index: nsISeekableStream.idl =================================================================== RCS file: /cvsroot/mozilla/xpcom/io/nsISeekableStream.idl,v retrieving revision 3.6 diff -u -6 -d -r3.6 nsISeekableStream.idl --- nsISeekableStream.idl 11 Dec 2002 00:01:36 -0000 3.6 +++ nsISeekableStream.idl 11 Dec 2002 00:51:42 -0000 @@ -87,13 +87,13 @@ /** * tell * * This method reports the current offset, in bytes, from the start of the * stream. */ - long tell(); + unsigned long tell(); /** * setEOF * * This method truncates the stream at the current offset. (This was PRUint32 before the patch.)
Has there been any movement on getting nsIScriptableInputStream to work with files containing nulls? There doesn't seem to be any way of reliably reading files from JS without this. http://lxr.mozilla.org/mozilla/source/extensions/python/xpcom/src/PyIInputStream.cpp#33 http://mozdev.org/bugs/show_bug.cgi?id=3064 ...
Can someone post a test case text file w/ embedded nulls that breaks nsIScriptableInputStream? This affects jslib as well, since 'read' is one of the most widely used methods. Thanks --pete
Sure, here's an xpcshell script that does it.
Looks like the problem is in PR_Read. http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsFileStreams.cpp#327 It fills the buffer and returns the correct bytes read but the buffer isn't filled to the actual byte amount. I haven't looked at PR_Read internals yet. --pete
pete, check out the first comment in this bug. The problem is that js expects all strings to be null terminated. If you read in some binary data with embedded nulls, javascript will truncate the data at the first null. The problem is well understood, we just need to agree on an acceptable interface for reading binary data. There are two proposals on the table, as I understand it. One is to have a readBinary (or readBytes, or something) method that returns an nsACString, and the other is to return an array of integers. The only issue with nsACString is that it is documented as always being encoded in UTF-8, whereas arbitrary binary data cannot be guaranteed as being in any encoding.
I'd say go with ACString readBytes (in unsigned long aCount); Easiest to implement. Callers can choose which read they want to use. What interface would this live in? nsIScriptableInputStream is FROZEN. I'll implement if there is agreement to go this way. --pete
Attached patch rough patch (obsolete) — Splinter Review
What about implementing something like this?
Using the file from the text case provided "null.txt" My output is now this: This is a text file with a null (0) inside.
I'd rather not replace nulls if possible: it would mean that JS code still couldn't reliably read binary files. Besides, if we're using an ACString, is there any need to replace nulls? I thought the point of using an ACString was that it was null-safe.
If I don't replace the null I get the same behavior as char*, js truncates it. --pete
Hmm. Is it possible to get it to return an array of bytes, instead? (Thinking about it, most people reading binary files probably would rather have an array of bytes than a string.)
yes it is possible. void read(in unsigned long count, [array, size_is(len)] out octet buf, out unsigned long bufLen); but this will not be handled very efficiently by the JS engine (or so i've been told).
Inefficiency isn't the greatest problem here, though: at present there is _no_ method available, efficient or inefficient, that can reliably read binary data (PeteC's example above is the best yet, but still doesn't allow the JS program to distinguish nulls and zeroes). I doubt the method can be anywhere near as inefficent as the only currently reliable way to read a file: you must discover the file's length, and then use read() to load the file. Whenever read() returns a string shorter than the length you requested, you must assume there was a null immediately afterwards. You must then reposition the stream to the byte after the null (since the data after the null was read in, even though it's not available) and continue reading as before.
What does an nsACString look like to javascript? That's the part that I'm not clear on. Does it look like a regular js string, that just happens to be able to contain nulls, or doe it look like some XPCOM class?
rginda: the nsACString is copied to a JS string, so there is no difference from the point of view of JS.
XPConnect will translate the nsACString (IDL type ACString) to a JS string, including any embedded nulls (If you find otherwise let me know as that would be a bug).
Attached patch test case 2 (obsolete) — Splinter Review
> If you find otherwise let me know as that would be a bug) This sample assigns the entire buffer to nsACString. js treats the nsACString as a regular string and truncates at the first null. Output of test case using this patch is: This is a text file with a null (
Hmm, I went back and double checked the code and XPConnect is passing along the length in both directions of the conversion. I suspect that the output functions within the test aren't handling the null. Generally printf is used, which will terminate when it encounters a null. Printing the length of the string might be a good way to see if this is true.
Are you using the test case posted as the second attachment? It seems to print the length, what did that report?
Yea, already considered the printf problem. The actual length of the file is 44. The length of the js var is 33 My output: This is a text file with a null ( 33
I really don't think this is an XPConnect issue, as it uses the length's reported by nsACString and the JS string during conversions, but I can't be 100% certain of that. I tried to get the test case running, but wasn't able to build the extension. Setting MOZ_EXTENSIONS=inspector didn't get it, as the resource directory under there doesn't get built.
dbradley, I wasn't using the test js code provided in the test case (it didn't work for me either), just the "null.txt" file. Steps to test: 1. apply patch 114431 and recompile. 2. copy attachment 114487 [details] [diff] [review] to you mozilla dist dir 3. run attachment 114487 [details] [diff] [review] in xpcshell from the command line 4. output should look like this: /run-mozilla.sh xpcshell -w -s stream.js true len: 44 This is a text file with a null ( 33
With nsScriptableInputStream::ReadBytes properly implemented it works fine. You have to pass in the length to the string, otherwise it uses the null terminator as the end of the string. nsCAutoString cString(buffer); The string was truncated before it ever got to XPConnect or JS.
Ah, duh. I changed the patch to: - nsCAutoString cString(buffer); + nsCAutoString cString(buffer, count); It does indeed work. It prints the same but the var size is now 44.
Attached patch init nsACAutoString w/ size (obsolete) — Splinter Review
Ok, the test patch hooks into nsIScriptableInputStream which is frozen. Any suggestions as to where the public idl declaration should live?
I got comment 37 after 38 in my mail Hmm, so available is reporting 44 bytes. Is the stream including the final null terminator in that count?
Yes, available is 44, fileSize is 44, js str.length is 44 as well, where the patch before initing nsAString w/ "count" size was 33 (the first null terminator). If js print is wrapping printf, then that would explain why stdout is truncated. --pete
Oh, excellent. Does str.charCodeAt(34)==0 ?
Yes, print/dump use printf, so it's not going to print the full string.
str.charCodeAt(33)==0 Yes. Remember we start at 0.
What about writing to a stream? Is that unaffected by nulls, or do we need a fix for that too?
Comment on attachment 114493 [details] [diff] [review] init nsACAutoString w/ size +nsScriptableInputStream::ReadBytes(PRUint32 aCount, nsACString & _retval) { ... + nsCAutoString cString(buffer, count); + nsMemory::Free(buffer); + _retval = cString; How about just this in stead: + _retval = nsDependentCString(buffer, count); That'll save you a string copy and potentially an allocation too.
I don't know of anything in our code that says an nsACString is in any way tied to UTF8, AUTF8String is, but ACString is not, AFAIK.
Excellent jst. It's been a while since I messed w/ Mozilla strings. jst, perhaps you can offer up some guidence here. nsIScriptableInputStream is FROZEN. So we can't add the method there. Any sugestions where the idl declararion should be? Thanks --pete
Assignee: dougt → petejc
We could either invent a new scriptable interface stream interface and make up a new name for it, or we could go the COM way with this and make an nsIScriptableInputStream2 interface that inherits nsIScriptableInputStream (that way there's no vtable cost in the implementations). I kinda like the latter approach myself. Ideally we could leave the new interface unfrozen for a while until we're more certain that it really does work... Then that too could be frozen (whenever we feel comfortable enough about this).
Agree, nsIScriptableInputStream2 sounds logical. There will be room to add other methods if the need arises. I'll do this in my next pass and make the changes to the patch from jst comment #46. Accepting bug.
Status: NEW → ASSIGNED
we already have http://lxr.mozilla.org/mozilla/source/xpcom/io/nsIBinaryInputStream.idl you just can't create one from script because there is no factory registered.
Yea, I noticed that. I don't see a heck of a lot of consumers of "nsIBinaryInputStream". http://lxr.mozilla.org/seamonkey/search?string=nsIBinaryInputStream Is the interface even used anywhere directly? --pete
nsIOjbectInputStream inherits nsIBinaryInputStream. Not that that means we can't change it, but it is used...
Just to make this clear, if we give the implementation of nsIScriptableInputStream2 or whatever we'll call it) classinfo (if it doesn't already have it) then the name of the interface doesn't really matter since all interfaces that are implemented by the object can be flattened and interface names are not an issue any more (for JS callers, that is).
Is this bug a dup of bug 128215? or separate problem?
*** Bug 128215 has been marked as a duplicate of this bug. ***
Ok, I think this is what we want here.
I just got the tab in the new idl file so please ignore it.
Why not just create a factory for the existing binary streams? Even if we add this interface, we still need to worry about *writing* binary streams. nsIBinaryInputStream and nsIBinaryOutputStream would take care of both situations.
- This bug is flagged for Help. - The requirement is for script to read binary files. - It was discussed that the direction to take was to create an nsIScriptableInputStream2 interface. - It has now been implemented. - The patch is posted. Be my guest, otherwise i'll take review on what's posted here please.
> - This bug is flagged for Help. That's not a reason at all. > - The requirement is for script to read binary files. The subject of the bug is "scriptable streams are broken". They are broken in two places, not one. > - It was discussed that the direction to take was to create an nsIScriptableInputStream2 interface. It was discussed that that was a possible direction. I didn't see anyone say it was the "correct" solution. In fact, I think it's a pretty ugly solution, considering we already *have* interfaces that fix *both* of the problems here. That is why I mentioned the existing interfaces in comment 51. Perhaps I should have made myself clearer then. > - It has now been implemented. That's not a good reason to check it in. > - The patch is posted. Neither is that.
Here you go pete. This patch adds the factories required to instantiate the binary stream classes. These classes need buffered input streams because file streams don't do ReadSegments, or something like that. It's an extra step, but it isn't that bad. Perhaps the binary stream classes could provide a convenience method to do that. I think the signature of the readBytes method should change to *return* the result, instead of using an out parameter, and |setInputStream| and |setOutputStream| should be both be renamed to |init| to match the other stream interfaces.
Here is a testcase showing nsIBinaryInputStream usage on the sample file.
Cool Rob. That's all i'm saying. Post the code. I have very little time to code up multiple Mozilla patches. A few questions. Which solution is best for performance? Which has a smaller mem footprint? Why was the factory never created in the first place? I am w/ Rob here this appears to be the better solution. So lets get review on any of these patches and get this issue fixed.
Taking.
Assignee: petejc → rginda
Status: ASSIGNED → NEW
This patch has readBytes *return* the string, instead of send it as an out param. This required a small change in nsHashtable.cpp, to reverse the order of parameters. I didn't bother renaming the set*Stream() methods to init(), because that would shadow the init() method of some inherited interfaces. I also added readByteArray and writeByteArray, which are just wrappers around readBytes and writeBytes methods. The wrappers work in terms of PRUint8 arrays. Arrays of numbers are more natural for JavaScript code that wants to write binary data of its own creation, as any string created in js will be truncated as wherever the first null happens to be. There are a few other places in the tree where ReadBytes needs to be updated. I'll have a patch for that soon.
Attachment #115208 - Attachment is obsolete: true
changes to the rest of the tree.
Attachment #114363 - Attachment is obsolete: true
Attachment #114382 - Attachment is obsolete: true
Attachment #114431 - Attachment is obsolete: true
Attachment #114493 - Attachment is obsolete: true
Attachment #115164 - Attachment is obsolete: true
Attachment #115165 - Attachment is obsolete: true
Attachment #115210 - Attachment is obsolete: true
Attachment #115346 - Flags: superreview?(dougt)
Attachment #115346 - Flags: review?(darinf)
Attachment #115346 - Flags: review?(darinf) → review?(darin)
Comment on attachment 115346 [details] [diff] [review] add nsIBinary*Stream factories patch, take 2 >Index: io/nsBinaryStream.cpp >+nsBinaryInputStream::Create(nsISupports *aOuter, REFNSIID aIID, void **aResult) why not use the GENERIC_FACTORY macro instead? >Index: io/nsIBinaryInputStream.idl >- void readBytes([size_is(aLength)] out string aString, in PRUint32 aLength); >+ void readBytes(in PRUint32 aLength, >+ [size_is(aLength), retval] out string aString); what's up with this change? read methods usually pass the buffer, followed by the length. i know out params typically occur as the last parameter, but this is against convention for read-like methods. >+ void readByteArray(in PRUint32 aLength, >+ [array, size_is(aLength), retval] out PRUint8 aBytes); same goes for this method. >Index: io/nsIBinaryOutputStream.idl > void writeBytes([size_is(aLength)] in string aString, in PRUint32 aLength); notice writeBytes follows the buffer before length convention. don't mix it up; it'll just cause headaches for developers.
Attachment #115346 - Flags: review?(darin) → review-
>why not use the GENERIC_FACTORY macro instead? NS_GENERIC_FACTORY_CONSTRUCTOR requires that the class have a constructor that takes no arguments. The nsBinaryInputStream and nsBinaryOutputStream classes both take a parameter. I'll attach a new patch that changes the constructors and uses a genercic factory instead. It's a tradeoff either way. >what's up with this change? read methods usually pass the buffer, >followed by the length. i know out params typically occur as the last >parameter, but this is against convention for read-like methods. This isn't just an out param anymore, it's a now a [retval]. [retval]s *have* to be the last parameter. Using out parameters makes for ugly javascript code, like: o = new Object(); bin.readBytes(o, f.fileSize); var str = o.value; Which becomes: str = readBytes(f.fileSize); when a [retval] is used. I'd prefer to leave it like this for this reason. >notice writeBytes follows the buffer before length convention. don't mix it >up; >it'll just cause headaches for developers. writeBytes doesn't have a [retval] (or even an out) parameter, they're both in, so I didn't change anything here.
Use NS_GENERIC_FACTORY_CONSTRUCTOR instead, which requires changing the signatures of the constructors for nsBinary(Input|Output)Stream. The signature change involves two small changes in nsFastLoadFile.h, and possibly other changes outside of xpcom/. If there are other changes, I'll post them when my tree completes.
Attachment #115346 - Attachment is obsolete: true
No need for those constructors in the nsBinaryStreams.cpp anymore.
Attachment #115437 - Attachment is obsolete: true
Comment on attachment 115441 [details] [diff] [review] add nsIBinary*Stream factories patch, take 4 I didn't find any other bustage related to the constructor changes. Darin, what say you re: the [retval] argument?
Attachment #115441 - Flags: review?(darin)
rob: ah, i see... well, then it's a tradeoff. what's nice for javascript is not so nice for C++, but today nsIBinaryInputStream is only used from C++ code. cc'ing brendan to see what his thoughts are on this matter. at the very least, i think it is wrong to have inconsistent argument ordering between ReadBytes and WriteBytes. this will frustrate C++ coders... "was it ReadBytes or WriteBytes that has length before buffer?!?" also, should ReadBytes and WriteBytes really be scriptable? i mean, don't they have the same problem as existing read/write methods? if scriptable, then we must document the fact that these cannot be used to read/write data with embedded nulls from script.
Comment on attachment 115441 [details] [diff] [review] add nsIBinary*Stream factories patch, take 4 I don't mind the argument order difference -- read and write and not the same kind of function here, as they might seem to be in Unix system call terms. Even there, the kernel copies user data (or page-flips it) in on write, and out on read. I think the patch looks great, FWIW. /be
>should ReadBytes and WriteBytes really be scriptable? i mean, don't they > have the same problem as existing read/write methods? No, ReadBytes doesn't have the same problem. It turns out that the only problem with nsIScriptableInputStream.read is the missing [size_is()] attribute. If that method appeared as... void read (in unsigned long length, [size_is(length), retval] out string bytes) ...everything on the read side would be fine. Without this attribute, XPConnect attempts to discover the length by searching for the first null. As proof, here is the output of the readBytes test in the ``new testcase, shows readByteArray usage'' attachment... exists: true length: 43 escaped string: ``This is a text file with a null (\0) inside.'' length of original string: 43 Writing data won't be a problem if you're writing a string that came directly from readBytes (or any other string that came from outside JavaScript, and had the size properly set.) ChatZilla might use writeBytes like this to write binary data it received from an input stream during a DCC transfer. There is no way to create a string in JavaScript that can contain embedded nulls, so [size_is] can't help us here. For this, the JavaScript programmer would manipulate the data as an array of numbers, and use writeByteArray. An application that wanted to create an image from scratch might do this. An application that wanted to manupulate a binary input stream before sending it to an output stream would use a combination of readByteArray and writeByteArray.
> There is no way to create a string in JavaScript that can contain embedded > nulls, so [size_is] can't help us here. Don't believe the hype! Ever since the very-early Mocha string rewrite, I believe, JS has been able to handle embedded NULs: js> "foo\0null".charCodeAt(3) 0 js> "foo\0null".length 8
> Don't believe the hype! Ever since the very-early Mocha string rewrite, I > believe, JS has been able to handle embedded NULs Huh, I thought I verified that before I spoke. I wonder what I was smoking. Still, the fact that js arrays are mutable could be useful to folks who want to manipulate the data without causing a bunch of string copies.
all other stream "read" methods in the tree pass a buffer reference before the buffer length. this would be the one notable exception. is that not awkward? maybe this is a worthwhile exception, and i suppose the C++ compiler helps. maybe it just doesn't matter that much ;-) anyone else have thoughts on this?
Comment on attachment 115441 [details] [diff] [review] add nsIBinary*Stream factories patch, take 4 ok, i give in... C++ coders aren't likely to call these methods since they can easily call the methods directly on nsIInputStream/nsIOutputStream. so, what's nice for JS is probably best here. r=darin
Attachment #115441 - Flags: review?(darin) → review+
Attachment #115441 - Flags: superreview?(dougt)
Comment on attachment 115441 [details] [diff] [review] add nsIBinary*Stream factories patch, take 4 Is this something that you want to freeze?
No, not yet. I'd like to make it available first and take it from there.
Comment on attachment 115441 [details] [diff] [review] add nsIBinary*Stream factories patch, take 4 looks safe to me.
Attachment #115441 - Flags: superreview?(dougt) → superreview+
checked in.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I used the code from attachment 127940 [details] to read a binary file with the new binaryinputstream and it worked fine.
Attachment #115346 - Flags: superreview?(dougt)
*** Bug 148264 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: