329448 - Want a useful scriptable input and output stream

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Description

•

19 years ago

Neither our "scriptable" input nor our scriptable output stream can handle data containing null bytes. additionally, the |string| data type they use does not guarantee that non-ASCII characters work.

James Ross

Comment 1

•

19 years ago

FWIW, the binary input/output streams that can wrap them are fine for streams of bytes (and scriptable enough). I guess what you're after here is something else?

Darin Fisher

Comment 2

•

19 years ago

I think we want something simpler than nsIBinaryInputStream. We should just create a nsIScriptableInputStream2 that replaces |string| with |ACString|. The implementation of that interface should use ClassInfo so consumers do not need to QI.

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 3

•

19 years ago

(In reply to comment #1) > FWIW, the binary input/output streams that can wrap them are fine for streams > of bytes (and scriptable enough). I guess what you're after here is something > else? Oh, I meant to mention that. Their read/write semantics differ from the usual stream semantics (they fail if the requested number of bytes couldn't be read/written, rather than succeed with a partial number of bytes) And yeah, they are complex interfaces with a single general-purpose method.

Target Milestone: --- → mozilla1.8.1

Boris Zbarsky [:bzbarsky]

Comment 4

•

19 years ago

Why ACString instead of octet array? That encourages people to treat the result as a string, which is wrong for the output of read(), imo. Brendan, JSString can contain embedded nulls in a reasonable way, right? It looks to me like converting data with an embedded null from ACString to JSString should work, but....

Benjamin Smedberg

Comment 5

•

19 years ago

We have been treating the ACString IDL type as the preferred way to handle binary data, over octet-arrays, mainly because octet arrays are very cumbersome to code against and the conversion to JS strings avoids a lot of boxing overhead associated with arrays (unless I'm missing an XPConnect special-case conversion of octet arrays).

Brendan Eich [:brendan]

Comment 6

•

19 years ago

Octet arrays are not special-cased, so the JS reflection is an Array of numbers, not very efficient. Yes, JS strings can contain NULs. /be

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 7

•

19 years ago

(In reply to comment #4) > Why ACString instead of octet array? That encourages people to treat the > result as a string, which is wrong for the output of read(), imo. yeah, that's my opinion too... (In reply to comment #5) > We have been treating the ACString IDL type as the preferred way to handle > binary data I guess that depends on who wrote the interface.

Darin Fisher

Comment 8

•

19 years ago

Please, ACString is the way to go here. Arrays of octets are not reflected in a very efficient manner to JS. Let's not go there. See nsIStringInputStream2, where I added an ACString attribute. ACString is the best container we have for arrays of octets.

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 9

•

19 years ago

(It turns out that nsIStringInputStream2 is gone, replaced by nsISupportsCString IIRC, but a comment about it remains :-/)

Darin Fisher

Comment 10

•

19 years ago

(In reply to comment #9) > (It turns out that nsIStringInputStream2 is gone, replaced by > nsISupportsCString IIRC, but a comment about it remains :-/) heh.. how could i forget? ;-) I checked in a correction to the comment in nsStringStream.h. Thanks for noticing.

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 11

•

19 years ago

Attached patch patch (obsolete) — Details — Splinter Review

so, I decided to provide a byte array method too. it can be more convenient in some places. (especially in combination with interfaces that use byte arrays) if they are inefficient we can fix the js engine

Attachment #214733 - Flags: superreview?(bzbarsky)

Attachment #214733 - Flags: review?(darin)

Darin Fisher

Comment 12

•

19 years ago

> if they are inefficient we can fix the js engine How might you do that? I mean, we're talking about a JS Array Object, right? ;-)

Darin Fisher

Comment 13

•

19 years ago

Comment on attachment 214733 [details] [diff] [review] patch >diff --git a/xpcom/io/nsIScriptableInputStream2.idl ... >+ * The Original Code is Mozilla Communicator client code, released >+ * March 31, 1998. >+ * >+ * The Initial Developer of the Original Code is >+ * Netscape Communications Corporation. >+ * Portions created by the Initial Developer are Copyright (C) 1998-1999 >+ * the Initial Developer. All Rights Reserved. License block is wrong. >+[scriptable, uuid(2606ee92-9bcd-4415-a5b7-c34b640671bc)] >+interface nsIScriptableInputStream2 : nsISupports ... >+ ACString readData(in unsigned long aCount); Why not call this method readString? We can document the fact that it reads a string of bytes from the underlying stream. The resulting elements of the string are the raw bytes from the stream. Call out the fact that no charset conversion is performed. >+ /** >+ * Read data from the stream. This method is more convenient than readData >+ * if no textual interpretation of the data is desired. You should call out the fact that readBytes may be less efficient than readString. >diff --git a/xpcom/io/nsScriptableInputStream.cpp >-NS_IMPL_ISUPPORTS1(nsScriptableInputStream, nsIScriptableInputStream) >+NS_IMPL_ISUPPORTS2_CI(nsScriptableInputStream, nsIScriptableInputStream, >+ nsIScriptableInputStream2) Since there are common methods on both interfaces, and the interfaces are only meant to be used by script, maybe you don't need to bother repeating methods like init, close, and available. The idea here is that anyone using the scriptable input stream object would just have access to all of the methods via classinfo. I guess you went with the current approach so that nsIScriptableInputStream could just be deprecated, right? And, because C++ hooks up the duplicate methods properly, it didn't really matter. That's OK too I guess. Let's add a deprecated notice to nsIScriptableInputStream then :) >+NS_IMETHODIMP >+nsScriptableInputStream::ReadBytes(PRUint32 aCount, PRUint32* aReadData, >+ PRUint8 **_retval) { >+ if (!mInputStream) >+ return NS_ERROR_NOT_INITIALIZED; >+ >+ // NOTE: We allocate one byte more for implementing Read() conveniently >+ PRUint8* data = NS_REINTERPRET_CAST(PRUint8*, NS_Alloc(aCount + 1)); >+ if (!data) >+ return NS_ERROR_OUT_OF_MEMORY; Shouldn't we limit aCount to 4096 like we do for the Read method? >+ data[*aReadData] = '\0'; >+ *_retval = data; Add a comment about null termination?

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 14

•

19 years ago

(In reply to comment #13) > License block is wrong. I dunno, I started by copying nsIScriptableInputStream... did I change it enough for a new header? > Why not call this method readString? We can document the fact that it > reads a string of bytes from the underlying stream. The resulting > elements of the string are the raw bytes from the stream. Call out > the fact that no charset conversion is performed. Hm, ok. > You should call out the fact that readBytes may be less efficient > than readString. will do that. > I guess you went with the current approach so that nsIScriptableInputStream > could just be deprecated, right? And, because C++ hooks up the duplicate > methods properly, it didn't really matter. Right :-) > That's OK too I guess. Let's > add a deprecated notice to nsIScriptableInputStream then :) I considered that, I wasn't sure how the best way would be :-) Should I change FROZEN to DEPRECATED (like we have other @status DEPRECATED interfaces)? or just mention it in addition? > Shouldn't we limit aCount to 4096 like we do for the Read method? Well, I did it for backwards compatibility only... do you think we should do it in general here?

Brendan Eich [:brendan]

Comment 15

•

19 years ago

(In reply to comment #12) > > if they are inefficient we can fix the js engine > > How might you do that? I mean, we're talking about a JS Array Object, right? That's not a problem -- we'll optimize object allocation in general, and dense arrays in particular. See bug 322889 for one piece of work. With JS2 work that is on the boards, we should get polymorphic arrays (Array(T) for a type parameter T), which can be optimized to use even less storage. That's coming in a bit. /be

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 16

•

19 years ago

Attached patch patch v2 — Details — Splinter Review

Attachment #214733 - Attachment is obsolete: true

Attachment #214783 - Flags: superreview?(bzbarsky)

Attachment #214783 - Flags: review?(darin)

Attachment #214733 - Flags: superreview?(bzbarsky)

Attachment #214733 - Flags: review?(darin)

Boris Zbarsky [:bzbarsky]

Comment 17

•

19 years ago

Comment on attachment 214783 [details] [diff] [review] patch v2 >diff --git a/xpcom/io/nsScriptableInputStream.cpp >+nsScriptableInputStream::ReadString(PRUint32 aCount, nsACString& _retval) { >+ _retval.SetLength(aCount); Shouldn't this use SetCapacity()? This use of SetLength is explicitly deprecated as far as I can tell. >+ _retval.SetLength(read); And that should Truncate(), imo. >+ return NS_OK; >+nsScriptableInputStream::Read(PRUint32 aCount, char **_retval) { >+ // Some code passes PR_UINT32_MAX as the count. We can't allocate this much Shouldn't we worry about that in all three of these methods? sr=bzbarsky either way

Attachment #214783 - Flags: superreview?(bzbarsky) → superreview+

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 18

•

19 years ago

(summary of our irc discussion) SetCapacity is documented in a way that allows it to do nothing, so I'd rather not use it. Plus, the XXX comment for SetLength says it is used for my purpose. Although it also says SetLength is the wrong function for this. So, either one is wrong, and I'll stay with SetLength for now. I'll switch to Truncate() for the second part though. Putting the limit in just this function was intentional and for backwards compat, but it occurs to me that this way there's a integer overflow leading to a heap overflow if someone does pass -1 to ReadBytes, so I'll move the length check there and add a comment about the overflow.

Darin Fisher

Comment 19

•

19 years ago

Your use of SetLength + Length + BeginWriting is perfect. That is the correct way to modify the contents of a string's buffer. You tell the string how big it should be, you verify that it is that big, and then you get the raw pointer to its buffer. The string docs are horribly broken. That's the problem here.

Darin Fisher

Comment 20

•

19 years ago

FYI, bug 284219 is about auditing our code to ensure that everyone is using SetLength + BeginWriting properly. Part of that patch should (must!) include corrections to the documentation for SetLength.

Darin Fisher

Comment 21

•

19 years ago

Comment on attachment 214783 [details] [diff] [review] patch v2 >+NS_IMETHODIMP >+nsScriptableInputStream::Read(PRUint32 aCount, char **_retval) { >+ // Some code passes PR_UINT32_MAX as the count. We can't allocate this much >+ // memory, so we don't try (bug 278786) >+ if (aCount == PR_UINT32_MAX) >+ aCount = 4096; We probably also cannot allocate (PR_UINT32_MAX-1) bytes. Maybe what we are looking for here is something like this: if (aCount > SOME_MAX) aCount = SOME_MAX; And, then it would be good to apply this consistently to all of the read functions. Maybe you could use (PR_UINT32_MAX / 2) as SOME_MAX? Or, should it be some smaller value, like N * 4096 for some small value of N?

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Comment 22

•

19 years ago

I could, sure... but why not just have callers pass a sensible value instead? the limits you are proposing seem rather arbitrary to me...

Darin Fisher

Comment 23

•

19 years ago

Yeah, they are arbitrary! ;-) Since ReadBytes null terminates, why don't we do this then: if (aCount == PR_UINT32_MAX) --aCount; It seems like we'd need this in ReadBytes instead of Read to protect against overflow (aCount + 1). I think the string code will fail to increase the length of the string beyond PR_UINT32_MAX/2.

Darin Fisher

Comment 24

•

19 years ago

Comment on attachment 214783 [details] [diff] [review] patch v2 marking review- to get attention. see last comment ;)

Attachment #214783 - Flags: review?(darin) → review-

Wayne Mery (:wsmwk)

Updated

•

13 years ago

Blocks: 316178

Christian :Biesinger (don't email me, ping me on IRC)

Reporter

Updated

•

12 years ago

Assignee: cbiesinger → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

patch 19 years ago Christian :Biesinger (don't email me, ping me on IRC) 12.35 KB, patch		Details \| Diff \| Splinter Review
patch v2 19 years ago Christian :Biesinger (don't email me, ping me on IRC) 12.71 KB, patch	darin.moz : review- bzbarsky : superreview+	Details \| Diff \| Splinter Review