Open Bug 329448 Opened 18 years ago Updated 2 years ago

Want a useful scriptable input and output stream

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

mozilla1.8.1

People

(Reporter: Biesinger, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

Neither our "scriptable" input nor our scriptable output stream can handle data containing null bytes. additionally, the |string| data type they use does not guarantee that non-ASCII characters work.
FWIW, the binary input/output streams that can wrap them are fine for streams of bytes (and scriptable enough). I guess what you're after here is something else?
I think we want something simpler than nsIBinaryInputStream.  We should just create a nsIScriptableInputStream2 that replaces |string| with |ACString|.  The implementation of that interface should use ClassInfo so consumers do not need to QI.
(In reply to comment #1)
> FWIW, the binary input/output streams that can wrap them are fine for streams
> of bytes (and scriptable enough). I guess what you're after here is something
> else?

Oh, I meant to mention that. Their read/write semantics differ from the usual stream semantics (they fail if the requested number of bytes couldn't be read/written, rather than succeed with a partial number of bytes)

And yeah, they are complex interfaces with a single general-purpose method.

Target Milestone: --- → mozilla1.8.1
Why ACString instead of octet array?  That encourages people to treat the result as a string, which is wrong for the output of read(), imo.

Brendan, JSString can contain embedded nulls in a reasonable way, right?  It looks to me like converting data with an embedded null from ACString to JSString should work, but....
We have been treating the ACString IDL type as the preferred way to handle binary data, over octet-arrays, mainly because octet arrays are very cumbersome to code against and the conversion to JS strings avoids a lot of boxing overhead associated with arrays (unless I'm missing an XPConnect special-case conversion of octet arrays).
Octet arrays are not special-cased, so the JS reflection is an Array of numbers, not very efficient.

Yes, JS strings can contain NULs.

/be
(In reply to comment #4)
> Why ACString instead of octet array?  That encourages people to treat the
> result as a string, which is wrong for the output of read(), imo.

yeah, that's my opinion too...

(In reply to comment #5)
> We have been treating the ACString IDL type as the preferred way to handle
> binary data

I guess that depends on who wrote the interface.
Please, ACString is the way to go here.  Arrays of octets are not reflected in a very efficient manner to JS.  Let's not go there.  See nsIStringInputStream2, where I added an ACString attribute.  ACString is the best container we have for arrays of octets.
(It turns out that nsIStringInputStream2 is gone, replaced by nsISupportsCString IIRC, but a comment about it remains :-/)
(In reply to comment #9)
> (It turns out that nsIStringInputStream2 is gone, replaced by
> nsISupportsCString IIRC, but a comment about it remains :-/)

heh.. how could i forget? ;-)  I checked in a correction to the comment in nsStringStream.h.  Thanks for noticing.
Attached patch patch (obsolete) — Splinter Review
so, I decided to provide a byte array method too. it can be more convenient in some places. (especially in combination with interfaces that use byte arrays)

if they are inefficient we can fix the js engine
Attachment #214733 - Flags: superreview?(bzbarsky)
Attachment #214733 - Flags: review?(darin)
> if they are inefficient we can fix the js engine

How might you do that?  I mean, we're talking about a JS Array Object, right? ;-)
Comment on attachment 214733 [details] [diff] [review]
patch

>diff --git a/xpcom/io/nsIScriptableInputStream2.idl
...
>+ * The Original Code is Mozilla Communicator client code, released
>+ * March 31, 1998.
>+ *
>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998-1999
>+ * the Initial Developer. All Rights Reserved.

License block is wrong.


>+[scriptable, uuid(2606ee92-9bcd-4415-a5b7-c34b640671bc)]
>+interface nsIScriptableInputStream2 : nsISupports
...
>+    ACString readData(in unsigned long aCount);

Why not call this method readString?  We can document the fact that it
reads a string of bytes from the underlying stream.  The resulting
elements of the string are the raw bytes from the stream.  Call out
the fact that no charset conversion is performed.


>+    /**
>+     * Read data from the stream. This method is more convenient than readData
>+     * if no textual interpretation of the data is desired.

You should call out the fact that readBytes may be less efficient
than readString.


>diff --git a/xpcom/io/nsScriptableInputStream.cpp 

>-NS_IMPL_ISUPPORTS1(nsScriptableInputStream, nsIScriptableInputStream)
>+NS_IMPL_ISUPPORTS2_CI(nsScriptableInputStream, nsIScriptableInputStream,
>+                      nsIScriptableInputStream2)

Since there are common methods on both interfaces, and the interfaces
are only meant to be used by script, maybe you don't need to bother
repeating methods like init, close, and available.  The idea here is
that anyone using the scriptable input stream object would just have 
access to all of the methods via classinfo.

I guess you went with the current approach so that nsIScriptableInputStream
could just be deprecated, right?  And, because C++ hooks up the duplicate
methods properly, it didn't really matter.  That's OK too I guess.  Let's
add a deprecated notice to nsIScriptableInputStream then :)



>+NS_IMETHODIMP
>+nsScriptableInputStream::ReadBytes(PRUint32 aCount, PRUint32* aReadData,
>+                                   PRUint8 **_retval) {
>+    if (!mInputStream)
>+        return NS_ERROR_NOT_INITIALIZED;
>+
>+    // NOTE: We allocate one byte more for implementing Read() conveniently
>+    PRUint8* data = NS_REINTERPRET_CAST(PRUint8*, NS_Alloc(aCount + 1));
>+    if (!data)
>+        return NS_ERROR_OUT_OF_MEMORY;

Shouldn't we limit aCount to 4096 like we do for the Read method?


>+    data[*aReadData] = '\0';
>+    *_retval = data;

Add a comment about null termination?
(In reply to comment #13)
> License block is wrong.

I dunno, I started by copying nsIScriptableInputStream... did I change it enough for a new header?

> Why not call this method readString?  We can document the fact that it
> reads a string of bytes from the underlying stream.  The resulting
> elements of the string are the raw bytes from the stream.  Call out
> the fact that no charset conversion is performed.

Hm, ok.

> You should call out the fact that readBytes may be less efficient
> than readString.

will do that.

> I guess you went with the current approach so that nsIScriptableInputStream
> could just be deprecated, right?  And, because C++ hooks up the duplicate
> methods properly, it didn't really matter. 

Right :-)

> That's OK too I guess.  Let's
> add a deprecated notice to nsIScriptableInputStream then :)

I considered that, I wasn't sure how the best way would be :-) Should I change FROZEN to DEPRECATED (like we have other @status DEPRECATED interfaces)? or just mention it in addition?

> Shouldn't we limit aCount to 4096 like we do for the Read method?

Well, I did it for backwards compatibility only... do you think we should do it in general here?

(In reply to comment #12)
> > if they are inefficient we can fix the js engine
> 
> How might you do that?  I mean, we're talking about a JS Array Object, right?

That's not a problem -- we'll optimize object allocation in general, and dense arrays in particular.  See bug 322889 for one piece of work.  With JS2 work that is on the boards, we should get polymorphic arrays (Array(T) for a type parameter T), which can be optimized to use even less storage.  That's coming in a bit.

/be
Attached patch patch v2Splinter Review
Attachment #214733 - Attachment is obsolete: true
Attachment #214783 - Flags: superreview?(bzbarsky)
Attachment #214783 - Flags: review?(darin)
Attachment #214733 - Flags: superreview?(bzbarsky)
Attachment #214733 - Flags: review?(darin)
Comment on attachment 214783 [details] [diff] [review]
patch v2

>diff --git a/xpcom/io/nsScriptableInputStream.cpp 
>+nsScriptableInputStream::ReadString(PRUint32 aCount, nsACString& _retval) {
>+    _retval.SetLength(aCount);

Shouldn't this use SetCapacity()?  This use of SetLength is explicitly deprecated as far as I can tell.

>+    _retval.SetLength(read);

And that should Truncate(), imo.

>+    return NS_OK;
>+nsScriptableInputStream::Read(PRUint32 aCount, char **_retval) {
>+    // Some code passes PR_UINT32_MAX as the count. We can't allocate this much

Shouldn't we worry about that in all three of these methods?

sr=bzbarsky either way
Attachment #214783 - Flags: superreview?(bzbarsky) → superreview+
(summary of our irc discussion)

SetCapacity is documented in a way that allows it to do nothing, so I'd rather not use it. Plus, the XXX comment for SetLength says it is used for my purpose. Although it also says SetLength is the wrong function for this. So, either one is wrong, and I'll stay with SetLength for now.

I'll switch to Truncate() for the second part though.

Putting the limit in just this function was intentional and for backwards compat, but it occurs to me that this way there's a integer overflow leading to a heap overflow if someone does pass -1 to ReadBytes, so I'll move the length check there and add a comment about the overflow.
Your use of SetLength + Length + BeginWriting is perfect.  That is the correct way to modify the contents of a string's buffer.  You tell the string how big it should be, you verify that it is that big, and then you get the raw pointer to its buffer.

The string docs are horribly broken.  That's the problem here.
FYI, bug 284219 is about auditing our code to ensure that everyone is using SetLength + BeginWriting properly.  Part of that patch should (must!) include corrections to the documentation for SetLength.
Comment on attachment 214783 [details] [diff] [review]
patch v2

>+NS_IMETHODIMP
>+nsScriptableInputStream::Read(PRUint32 aCount, char **_retval) {
>+    // Some code passes PR_UINT32_MAX as the count. We can't allocate this much
>+    // memory, so we don't try (bug 278786)
>+    if (aCount == PR_UINT32_MAX)
>+        aCount = 4096;

We probably also cannot allocate (PR_UINT32_MAX-1) bytes.  Maybe what
we are looking for here is something like this:

      if (aCount > SOME_MAX)
          aCount = SOME_MAX;

And, then it would be good to apply this consistently to all of the read
functions.  Maybe you could use (PR_UINT32_MAX / 2) as SOME_MAX?  Or, 
should it be some smaller value, like N * 4096 for some small value of N?
I could, sure... but why not just have callers pass a sensible value instead? the limits you are proposing seem rather arbitrary to me...
Yeah, they are arbitrary! ;-)  Since ReadBytes null terminates, why don't we do this then:

  if (aCount == PR_UINT32_MAX)
    --aCount;

It seems like we'd need this in ReadBytes instead of Read to protect against overflow (aCount + 1).  I think the string code will fail to increase the length of the string beyond PR_UINT32_MAX/2.
Comment on attachment 214783 [details] [diff] [review]
patch v2

marking review- to get attention.  see last comment ;)
Attachment #214783 - Flags: review?(darin) → review-
Blocks: 316178
Assignee: cbiesinger → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: