Closed Bug 1170668 Opened 9 years ago Closed 2 years ago

Failure to handle short read: the case of reading a JSON file into JavaScript 'String' type. (failure to deal with short read)

Categories

(Core :: XPCOM, defect)

All
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: ishikawa, Assigned: nika)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

This blocks bug 1170564. This bugzilla about > 4. Case of reading JSON file from JavaScript. FYI, see bug 1170564 quoted below for details about "short read". During the testing of C-C TB |make mozmill| test suite with simulated short read I noticed an error caused by the failure to handle short read against a file named _tests/mozmill/mozmillprofile/session.json below my MOZOBJ /REF-OBJ-DIR/objdir-tb3/ What happened was that a simulated short read occurred. Only the initial portion of the JSON file was returned to the C-C TB, thus the program misbehaved since it did not have all the info in JSON file. Thus I noticed a few errors starting with TEST-UNEXPECTED-FAIL | /REF-COMM-CENTRAL/comm-central/mail/test/mozmill/session-store/test-session-store.js | test-session-store.js::test_restore_single_3pane_persistence NOTE; BEWARE FF developers. Since this has something to do with JSON reading from JavaScript (found by analyzing the stack trace), this can affect FF as well, I think. Explanation. JSON files One of the short read issue manifests itself when JavaScript interpreter reads a JSON file into an array of known size (I think it is a String). It is case (a)-1 explained in the original META bugzilla entry. If a |read| fails to read all the available data, the JS interpreter doesn't care and proceed and so eventually we have an error during |make mozmill| test because not all the relevant data in JSON file is read! This operation involves a JavaScript interpreter operation and CPP I/O operation and I am not sure how to fix it: while I tried to fix it naively, I get EINTR error(!) (See the discussion in [1]. So There may be a case that EINTR needed to be deal with for real after all!) With my naive fix, the program got into an infinite loop :-( This is hard to fix and so I am submitting this as a separate entry to ask for wider attention. [1] Bug 1126881 - Mapping of errno to NS_* macros (nsresultForErrno(int aErr)) is not as complete as mapping to PR_* macros ++ This bug was initially created as a clone of Bug #1170564 +++ After spending some time to weed out the failure to detect I/O errors from C-C TB code during message download [1], I have come to realize there are issues regarding short read. Short read refers to |read| system call returning without the full number of requested octets although the subsequent |read| calls would return the missing octets. That is, short read returns prematurely. Short read can happen during a read operation against a remotely mounted file. High-load on the remote server, or a transient network issue can cause such short read. It is not a stranger to FF developers. Network code is full of cautious programming to take care of such short read. Unfortunately, C-C TB (under linux at least) fails to handle the short read very well and as a result we see errors which I noticed during tests. So I set out to identify and fix these issues and this is a meta entry to track such errors. It turns out that performing additional writes after a short-write is automagically handled by |pt_Write|. Isn't it great? Then why not pt_Read? As I explain later in this post, we sometimes don't know how many we should read in advance. That's why. Short read simulated: I have come to learn that mounting one's profile to remote file server seems to cause strange errors. Since I don't have a CIFS server that seems to cause such short read very often under heavy I/O work load, I resort to simulation. I inject short read by preparing some functions such as |read| and |read64| that simulates short read and preload them using "export LD_PRELOAD=that-file.so" before running C-C thunderbird to find errors caused by short read. How to FIX after identifying the trouble spot: When you deal with short read issues there are a few things that we must keep in mind. At the level of |read| system call, we have no idea whether a short read ought to be followed by additional |read| or not. Then where should we make the decision? It depends on the type of |read| operation. There are two types of |read| operations. (a) # of octets to read is known in advance. The caller of |read| knows exactly that there are certain number of octets to be read in advance. E.g. (1) we check the size of JSON file by calling |stat| in advance and try to read that many octets in one |read|. [As it turns out this is a real-world problem.] (2) we try to read a "block" of a known data structure and sizeof(block) is already known in advance (at compile time, etc). A case in point is the "block" for CACHE operation. (There was an issue and it was fixed.) In these cases, we ought to perform additional |read| operations until all the expected number of octets are read (or until error condition or EOF is encountered.) in the face of short read. So we must provide a mechanism for such additional reads. In my patch, I have introduced a couple of variants of |Read| operation and |PR_Read| that performs additional read operations if we failed to read the desired # of octets in one run until all the octets are read. (b) # of octets to read is not known in advance. E.g., (1) There is a mime-encoded data in a file (as in attachment), and the program needs to read as many data as possible until the end of the data marker line is reached. (2) In general, suppose there is a data structure written in ASCII representation in a file, and also suppose the caller of |read| can know the end of the logical data structure by reading a certain textual construct that signals the end of the data. In this case, the caller of |read| needs to perform the |read| repeatedly until this end of data marker is read. Case (1) is obviously a particular case of (2). (3) There are similar variants. In these cases, we can NOT rely on the number of octets to read in advance. The size of the buffer passed to |read| is just an estimate, a ball park figure. We may have more data to read or much less data to read actually. We ought to perform additional |read| until the end of data marker is reached (or the error is signaled or EOF is reached.) Between (a) and (b), the main difference is case (a) knows the fixed number of octets to be read in advance, and case (b) does not know it. In practice, |read| calls of case (b) tend to happen inside loops and are relatively easy to spot, and no modification is necessary most of the time even if short read has to be considered. (Well there are cases where the ERROR condition was not detected at all in the current code and I tried to fix them.) The short read in case (a) needs to be handled at the spot where the required number of octets is known in the call chain. At the |read| system call level, we have no idea whether a particular short |read| ought to be followed by additional reads at all. To figure out where such problematic short read is encountered, and to learn whether the |read| is type (a) or type (b) above, I had to dump stack trace to figure out where such short read is NOT followed by necessary additional read operations. Using the stack trace, I tried to figure out where the decision to perform additional reads can be made and, if necessary, replaced the read with a version that performs such additional read operations. I think I explained enough background. There will be four bugzilla entries I intend to submit today and block this meta entriy. 1. NSS "dbm" code used for certificate and key handling. (Already reported in newsgroup/mailing list.) 2. Mostly C-C T-B fixes to take care of short read. 3. A few M-C fixes. Yes, there *ARE* places where short read is not handled in M-C! I was surprised myself. Maybe storing one's profile under remote file server was an arcane idea when mozilla code base was written years ago. But today, in corporate setup, PC is often used like a thin client and everything including your profile may be stored on a central server. 4. Case of reading JSON file from JavaScript. There is a known bug which I don't know how to fix. It is filed as a separate bugzilla. It is concerned with a reading of JSON file from JavaScript. There are a few more unresolved cases that manifest themselves during local |make mozmill| test suite operation with simulated short read injection, but I am not sure if they are real and not timing-dependent random errors that are seen before. [Yes, there are such random errors in |make mozmill|.] I will add them to known C-C T-B fixes and M-C fixes *iff* they turn out to be real bugs in the future. TIA [1] Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends.
No longer depends on: 1170564, 1170578
Assignee: nobody → ishikawa
No longer blocks: 1170606
No longer blocks: 1170646
Blocks: 1170564
See Also: → 1170578
See Also: → 1170606
See Also: → 1170646
If I put #if 1 instead of #if 0 in the uploaded patch, I get EINTR errors repeatedly. The |busy_beaver_Stream_Read| in the patch is indeed a very quick hack and there may be an issue when it needs to work with NS_FillArray semantics? Anyway, this is the first time I see EINTR happening for real. TIA
Component: JavaScript: Standard Library → XPCOM

The meta keyword is there, the bug doesn't depend on other bugs and there is no activity for 12 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)

AFAIK, this bug still applies. Ishikawa-san, is that correct?

Flags: needinfo?(nfroyd) → needinfo?(ishikawa)
Keywords: meta
Keywords: meta
Summary: Failure to handle short read: the case of reading a JSON file into JavaScript 'String' type. ( [META] Failure to deal with short read) → Failure to handle short read: the case of reading a JSON file into JavaScript 'String' type. (failure to deal with short read)

(In reply to Nathan Froyd [:froydnj] from comment #3)

AFAIK, this bug still applies. Ishikawa-san, is that correct?

Yes, I believe so.

My plan is

    • to land the POP3 fix to reduce the # of system calls (write/seek done for each line today).
      E.G. A photo attachment (in MIME format) causes easily 40K system calls... Kills the
      performance of a File server if the profile is on a remote file system.
      I have seen more than x10 write speed gain under local test. I mean copying about 1000 messages took more than two minutes while with my patch finished under 5 seconds in a test.
    • to land the series of short-read fixes mentioned here and elsewhere.

Trying to finish the fiscal-year-end hassle at the office, AND cope with the corona virus outbreak in Japan.

Flags: needinfo?(ishikawa)
QA Whiteboard: qa-not-actionable

The NS_FillArray function definitely only reads a single time from the underlying input stream, so can read less than the full capacity of aDest without reaching EOF. That being said, the signature of the function does describe its behavior that way, so it doesn't look like it's necessarily wrong unless the callers aren't handling it correctly.

Looking at the tree right now, the only caller appears to be nsConverterInputStream, which calls it as part of the Fill method (https://searchfox.org/mozilla-central/rev/7b9d23ece4835bf355e5195f30fef942d376a1c7/intl/uconv/nsConverterInputStream.cpp#189). This method does appear to have a slight bug when reading unicode data with short reads, though I don't know for sure if it's specifically the cause of the issue mentioned in comment 0.

Specifically, the call will read at least 1 byte from the source stream, but 1 byte is often not enough data to decode a full utf-16 character for whichever encoding is used as the source encoding. This could mean that the value of written returned from the decoder (https://searchfox.org/mozilla-central/rev/7b9d23ece4835bf355e5195f30fef942d376a1c7/intl/uconv/nsConverterInputStream.cpp#212-221) is zero, which would lead to a NS_OK code with 0 bytes written being returned despite the number of bytes filled being 0, leading to the stream being terminated early.

I think the code in nsConverterInputStream should instead call NS_FillArray and try to decode in a loop until either at least 1 utf-16 character has been decoded from the source stream, or the NS_FillArray call returns an error.

Flags: needinfo?(nika)

This patch changes how nsConverterInputStream handles passing data
through to the underlying unicode converter in order to make it more
reliably handle propagating errors and deal with short reads from the
underlying input stream.

This was done by making the code continuously read within the Fill
method until at least one character has been decoded from the input
stream, so that we don't spuriously communicate an EOF to the caller due
to a short read not producing enough bytes for the decoder to produce a
UTF-16 character.

In addition, while making this change it became easier to signal to
the decoder about the final read from the input stream, meaning that
partial characters at the end of the stream will now generate a
replacement character, rather than being ignored.

Assignee: nobody → nika
Status: NEW → ASSIGNED
Attachment #9286934 - Attachment description: Bug 1170668 - Improve short read handling ns nsConverterInputStream, r=hsivonen → Bug 1170668 - Improve short read handling in nsConverterInputStream, r=hsivonen

The following patch is waiting for review from an inactive reviewer:

ID Title Author Reviewer Status
D152682 Bug 1170668 - Improve short read handling in nsConverterInputStream, r=hsivonen nika hsivonen: Back Aug 7, 2022

:nika, could you please find another reviewer?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nika)

I am glad to see this finally handled by someone in the know.
Thank you!

We're about a week out from :hsivonen being back, and it'd be nice if they could look over the use of the encoding API to make sure I didn't screw anything up, so leaving the reviewer as-is for now.

Flags: needinfo?(nika)
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2c6bb36b0667 Improve short read handling in nsConverterInputStream, r=hsivonen

Backed out for causing build bustages on TestShortRead.cpp

Backout link

Push with failures

Failure log

Flags: needinfo?(nika)
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/465d9c96cdee Improve short read handling in nsConverterInputStream, r=hsivonen
Flags: needinfo?(nika)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: