audit all uses of nsI*InputStream.readBytes(stream.available()) for potentially not reading all data, in c-c

ASSIGNED
Assigned to

Status

ASSIGNED
a year ago
14 days ago

People

(Reporter: aceman, Assigned: aceman)

Tracking

({dataloss})

Trunk
dataloss

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 obsolete attachment)

(Assignee)

Description

a year ago
+++ This bug was initially created as a clone of Bug #1356306 comment 9 +++

In bug 1356306 it was found by Jorg that reading a stream like this won't read all the whole file or stream:
let stream = Components.classes["@mozilla.org/binaryinputstream;1"]
     .createInstance(Components.interfaces.nsIBinaryInputStream);
let data = stream.readBytes(stream.available());

stream.available() only returns the first ~550KB. This is confirmed by
https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIInputStream:

available()
Determine number of bytes available in the stream.
Note: This method should not be used to determine the total size of a stream, even if the stream corresponds to a local file. Moreover, since a stream may make available more than 2^32 bytes of data, this method is incapable of expressing the entire size of the underlying data source.

It seems we need to query stream.available() in a loop as only after reading the first block from the stream will reveal there is more data available. This pattern is confirmed e.g. at:
https://dxr.mozilla.org/comm-central/rev/abf145ebd05fe105efbc78b761858c34f7690154/mozilla/addon-sdk/source/lib/sdk/io/byte-streams.js#36
https://dxr.mozilla.org/comm-central/rev/95da4e9ee8dc6cbf96b056b3e0e70ae90d0fff46/mail/components/cloudfile/nsHightail.js#1030

So we need to look through our tree and check all the patterns described above and change them into:

data = "";
while (stream.available() > 0) {
  data += stream.readBytes(stream.available());
}

It is up for discussion whether we do this automatically for all occurrences, or only for those where we expect more than 500KB of data. On the other hand, this 500KB block size of nsIInputStream could change anytime under us so I'm not sure we should do any assumptions on what a safe cut-off limit is.

The callers of nsIInputStream.available() to audit are here:
https://dxr.mozilla.org/comm-central/search?q=.available()&redirect=false

I mark this 'dataloss' as not in all cases the skipping of tail of the data may be as noticeable to the user as in the image bug 1356306 and real data may get lost.
(Assignee)

Updated

a year ago
Summary: audit all uses of stream.readBytes(stream.available()) for potentially not reading all data → audit all uses of nsI*InputStream.readBytes(stream.available()) for potentially not reading all data, in c-c
(Assignee)

Updated

a year ago
Depends on: 1356306

Comment 1

a year ago
Wow, the world has caught up with me.

I noticed while I was checking File I/O against remote file system that TB fails
to read all the available data if |read| issued via network did not return all the expected data:

I filed a few bugzilla entries:

Bug 1170564 - [META] Failure to deal with short read

Bug 1170578 - RFC: Buggy NSS dbm code ought to be updated
   <== this is shared with FF.

Bug 1170606 - C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read)

Bug 1170646 - A few M-C fixes to handle short read in Cache code ( from [META] Failure to deal with short read)
  <--- This one was backed out, but I know what the bug was: I think it was "<" vs. "<=".

Bug 1170668 - Failure to handle short read: the case of reading a JSON file into JavaScript 'String' type. ( [META] Failure to deal with short read)

Bug 1172304 - A M-C fix to handle short read in Plugin code ( from [META] Failure to deal with short read)

The above are what I found out by replacing read() system call with a hand-crafted read() to simulate
a short read (and see TB fails to read the remaining octets if an earlier read did not read all the expected octets.

Basically, under linux, I used LD_PRELOAD to replace |read| with my own input routine that reads a few bytes less than requested
in |read| from TB. This revealed many problems.

Bug 1170606 - C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read)
has the patches. But they are slightly outdated due to the output buffering changes.

There  *were* (maybe still are) places in M-C portions of the code that exhibit similar issues.

I whitelist them since I was focusing on TB code.

TIA

Comment 2

a year ago
I tested the short read issues by running |make mozmill| so unless the reloading of 
images in a draft is tested in |make mozmill|, my patch may have missed the particular place where the
issue arises.

I am happy to explain how exactly I analyzed the issue (I need refresh my memory, too).

OTOH, I am still struggling to set up my new PC: short of time. Busy office, long commute, and some family business.
Not much time to set up a new PC properly.

TIA
(Assignee)

Comment 3

a year ago
Thanks, but I'll focus on the Javascript callers of stream.available() in this bug. It's good you have bugs for the C++ issues already filed.

Comment 4

a year ago
(In reply to :aceman from comment #3)
> Thanks, but I'll focus on the Javascript callers of stream.available() in
> this bug. It's good you have bugs for the C++ issues already filed.

Back when I tried to fix c++ callers, I was not sure of JavaScript callers.
It is good that this JS bug in this bugzilla entry was noticed.

Comment 5

a year ago
Since the goal here is audit, rather than a bustage fix fire drill, I would suggest a more strategic approach rather than just switching to while() in an xpcom stream interface with likely a short life left.  Meaning OS.File for these simple file reads - comes with async. Or even FileReader (which would at least demonstrate getting more webby rather than just endlessly talking about it).
(Assignee)

Comment 6

a year ago
Thanks, does OS.File work on anything other than a local file?

But I see possibilities even if it doesn't. There is one occurrence of nsIFileInputStream which I would convert like this:

mailnews/extensions/newsblog/content/feed-subscriptions.js
@@ -2553,18 +2553,21 @@ var FeedSubscriptions = {
     let opmlDom, statusReport;
     let stream = Cc["@mozilla.org/network/file-input-stream;1"].
                  createInstance(Ci.nsIFileInputStream);

     // Read in file as raw bytes, so Expat can do the decoding for us.
     try {
       stream.init(aFile, FileUtils.MODE_RDONLY, FileUtils.PERMS_FILE, 0);
       let parser = new DOMParser();
-      opmlDom = parser.parseFromStream(stream, null, stream.available(),
-                                       "application/xml");
+      let data = "";
+      while (stream.available() > 0) {
+        data += stream.readBytes(stream.available());
+      }
+      opmlDom = parser.parseFromString(data, "application/xml");

So in this case OS.File or FileReader could be used.
Can you please convert this case to the new API (which one looks better to you) to show us the pattern?

Comment 7

a year ago
In this specific case, the file is xml, so I would use xhr and let it do all the work of returning a dom tree; a method like below. If you like, I can take care of the opml case you cite.
https://dxr.mozilla.org/comm-central/rev/d4ab2edcaf02449a999eb7eeb3cda8307e720450/mailnews/extensions/newsblog/content/FeedUtils.jsm#1414

For remote links I would use xhr otherwise File/FileReader for everything else, and if File can't do something, then OS.File. There are already some FileReader examples in c-c. Note that, obviously, these methods are for http/file urls and if a specific case involves chrome or mailnews urls (content policy, etc.) that's extra investigation.

Perhaps the goal should be to try to stop using nsIFileInputStream and variants.  If that's not the goal, then patching available() is fast and fine, but is can kicking.
(Assignee)

Comment 8

a year ago
Yes, please take the feed parts.

Comment 9

a year ago
Created attachment 8859197 [details] [diff] [review]
opmlFile.patch

Here's the ompl file import part. The intention is to remain sync and remove nsIFileInputStream dependency.
Attachment #8859197 - Flags: review?(acelists)
(Assignee)

Comment 10

a year ago
And now it depends on "@mozilla.org/xmlextras/xmlhttprequest;1"].createInstance(Ci.nsIXMLHttpRequest). Is that a net gain?

Comment 11

a year ago
Sure, our chairs are now moved much closer to the band on the Titanic. ;)

XMLHttpRequest is the web api. I didn't use the constructor form since it complains about sync, which isn't necessary here. Using a webidl api (xhr or filereader) would mean having to adjust the two callers for promises/await.  Absent that, the question is which xpcom interface is more likely to be deprecated first.  It's up to you.
(Assignee)

Comment 12

a year ago
Comment on attachment 8859197 [details] [diff] [review]
opmlFile.patch

Review of attachment 8859197 [details] [diff] [review]:
-----------------------------------------------------------------

I'm fine with this if you like it better.
Attachment #8859197 - Flags: review?(acelists) → review+
(Assignee)

Comment 13

a year ago
Any idea how to convert this?
 let data = NetUtil.readInputStreamToString(
                inputStream, inputStream.available());

NetUtil is a m-c module and what to do when inputStream.available() does not return the whole file size?

Comment 14

a year ago
(In reply to :aceman from comment #13)
> Any idea how to convert this?
>  let data = NetUtil.readInputStreamToString(
>                 inputStream, inputStream.available());
> 
> NetUtil is a m-c module and what to do when inputStream.available() does not
> return the whole file size?

You probably have to open-code this: i.e., 
str = ""
while {
  tstr <- data read at a time.
  str = concatenation of str and tstr.
} while (input stream is not EOF)

data <- str

Details of EOF handling is left to the reader.

[I have to do similar stuff for C++ side of the issues in some places.]

Comment 15

a year ago
(In reply to :aceman from comment #13)
> Any idea how to convert this?
>  let data = NetUtil.readInputStreamToString(
>                 inputStream, inputStream.available());
> 
> NetUtil is a m-c module and what to do when inputStream.available() does not
> return the whole file size?

This is the method used to read session.json (I didn't look at the others).  If you look at toolkit/modules/JSONFile.jsm, there is an async method (using OS.File) and a sync method (using nsIFileInputStream). There isn't any use of while() to read bytes when using nsIFileInputStream there or most (any) place in the codebase.  So while using nsIBinaryInputStream has evidence of byte loss, is there any evidence of a problem with nsIFileInputStream?  If not, perhaps the audit should just deal with nsIBinaryInputStream.

But for json at least, it would be nice to use toolkit and await for session.json.  Since my fingerprints are on that one, I'll see about converting it; toolkit even renames bad files which was done manually in session store.
(Assignee)

Comment 16

a year ago
(In reply to alta88 from comment #15)
>There isn't any use
> of while() to read bytes when using nsIFileInputStream there or most (any)
> place in the codebase.  So while using nsIBinaryInputStream has evidence of
> byte loss, is there any evidence of a problem with nsIFileInputStream?  If
> not, perhaps the audit should just deal with nsIBinaryInputStream.

Since nsIFileInputStream inherits all of nsIInputStream and only redefines 'init' method, I would say it may contain the same problem of the 'available' method. Do we really want a proof first before obeying the docs? I would convert all occurrences just for safety.

Comment 17

a year ago
Well, given that Firefox, with an order of magnitude more pounding, doesn't include while() for nsIFileInputStream reads in its JSONFile api and elsewhere indicates it's not a problem.  And implementing init is different than nsIBinaryInputStream implementing setInputStream so both inheriting from nsIInputStream doesn't mean equivalence of available().  I'm sure using while() won't harm anything and could help, but it's unlikely a big win.  As I said, it would be more useful to use core apis that abstract or don't depend on xpcom calls that are on the block for deprecation.  But it's jmo.

It wasn't hard to get session store converted to JSONFile and using async/await and the OS.File promise. I'll file a bug for that one once the tests are adjusted.

Comment 18

a year ago
Session store conversion in Bug 1358906.

Comment 19

14 days ago
Comment on attachment 8859197 [details] [diff] [review]
opmlFile.patch

moved to bug 1488897.
Attachment #8859197 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.