Closed Bug 1198095 Opened 4 years ago Closed 4 years ago
Reader APIs return incorrect data when the file read has been updated
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36 Steps to reproduce: I'm using the FileReader.readAsArrayBuffer() method to read from an <input> field on a form. Under normal circumstances, everything works correctly. However, if the user modifies the file after selecting it in the <input> field, readAsArrayBuffer can get very confused. I've created a jsFiddle that shows everything in just a few dozen lines of code here: https://jsfiddle.net/Lv5y9m2u/4/ More specifically, the following steps reproduce the problem: 1. Have an <input type="file" /> field on a form 2. Select a text file for that input that has say 10 characters 3. Use a FileReader.readAsArrayBuffer to read the file 4. Everything is fine at this point. The buffer will have 10 characters 5. Now edit that file (while the webpage is still running) and remove 5 characters. 6. Use the readAsArrayBuffer again and this time it will still return a buffer with 10 characters. The first 5 characters are correct, but the last 5 are all character code 90 (capital letter Z). Actual results: The returned buffer has the same size as the file did originally even though the file is smaller now. The 'excess' in the buffer is filled with capital letter Zs Expected results: The buffer should have been of length 5 since that is the length of the file.
I found another case where the behavior is unexpected. If you start with a smaller file and then make it larger, readAsArrayBuffer will return the new data but only as much data as was in the file originally. So if the file starts off as "AAA" and then you change it to "BBBBBB", the FileReader will only return the bytes for "BBB".
So there are a few questions here: 1) What should happen if the same File instance is passed to readAsArrayBuffer but the file has actually been modified? The answer, per spec, is "whatever you want". The relevant bit is this: If a File object is a reference to a byte sequence originating from a file on disk, then its snapshot state should be set to the state of the file on disk at the time the File object is created. followed by a note explaining that this is a "should", not a "must", because in practice this is quite hard to do reliably. See https://w3c.github.io/FileAPI/#file Ignoring that for the moment, there is informative text that claims that if the snapshot state has changed the read should fail, though I can't find any normative text that unambiguously leads to that behavior. https://w3c.github.io/FileAPI/#dfn-error-codes is the closest, I guess, though that just says what error code to return in some cases if an error is returned, not that an error actually needs to be returned. 2) When the file is modified, should input.files become a new object? Arguably per https://html.spec.whatwg.org/multipage/forms.html#concept-input-type-file-selected it should, but that has some of the same issues as in the Note mentioned above...
Component: DOM → DOM: Core & HTML
I can appreciate that this behavior may be non-trivial to implement and it sounds like the spec doesn't require it. So I guess what I'm asking for is that the 'whatever you want' behavior be at least consistent with other File APIs in the browser. Based on this interpretation of the spec, it seems like IE is the most 'spec-compliant'. It appears to do the snapshot logic in most cases (although, not in all). For example, it will continue to report the original Last Modified Date of the File even if the underlying file on disk changes. Also, trying to read from a File that has been modified throws a SecurityError. However, the snapshotting is not perfect because doing a FormData.append of the file ends up sending along the latest contents of the file. That's a bit of a digression I guess. In the end, I understand this may not be a bug per the spec but it sure seems odd. If I use FileReader.readAsText or FileReader.readAsBinaryString everything works as I expect. But for whatever reason, using FileReader.readAsArrayBuffer works differently. That inconsistency just 'feels' wrong. As for 2), my immediate guess is that the object can be the same. I think that is how I've seen things work in other browsers and so I'm selfish and want things to be consistent. Which way is best per the spec...well, that's another issue. Thanks for the detailed response.
That inconsistency between the different fileReader methods definitely seems weird. Andrea, do you have time to look into that? Seems like per spec we should throw an exception from all of those if the same File instance is used after modification of the underlying storage...
Er, and actually needinfoing Andrea.
Nothing can prevent that the file is read/modified between readAsSomething and the real operation. This patch just does a size check. In case we really want to proceed with a lastModified check, we must change way too many things because blobs take this value when they are created (or at the first time this value is needed - non e10s) or when "mysteries" are taken from the parent process (in e10s).
Assignee: nobody → amarchesini
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #8694695 - Flags: review?(bzbarsky)
Comment on attachment 8694695 [details] [diff] [review] fileReader.patch r=me to at least make sure we don't end up with totally garbage data. That said, shouldn't this be the spec error for this situation, not NS_ERROR_FAILURE? Looks like the inconsistency comment 3 mentions is because one of the cases used mTotal and the other used mDataLen? :(
Attachment #8694695 - Flags: review?(bzbarsky) → review+
Backout without any reason/comment?
There was a test failure, if you look at inbound tbpl for the push from comment 8: 1276 INFO TEST-UNEXPECTED-FAIL | dom/base/test/test_fileapi.html | error set to NotFoundError for nonexistent files - got "NotReadableError", expected "NotFoundError"
Backed out for being a likely cause of https://treeherder.mozilla.org/logviewer.html#?job_id=18307492&repo=mozilla-inbound https://hg.mozilla.org/integration/mozilla-inbound/rev/3ab7d2736c11
You need to log in before you can comment on or make changes to this bug.