Closed Bug 1378228 Opened 7 years ago Closed 3 months ago

XHR Range Requests on LARGE local files (via file://) takes forever to return and sometimes freeze up the browser

Categories

(Core :: DOM: Networking, defect, P2)

54 Branch
defect

RESOLVED INVALID

People

(Reporter: sharun.msgs, Unassigned)

Details

(Whiteboard: [domcore-bugbash-triaged])

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36

Steps to reproduce:

Doing an XHR Range request on large local files (I am using 50GB Wikipedia dumps) takes forever to return. Firefox appears to search for the end of the file before returning a response.

A couple weeks back, it would try to read the entire file into memory and the browser would hang. It now looks like it's not doing that, but it is still probably searching for the end of file given the amount of time it takes to return. 

My code just needs the file metadata in the header (the first 80 bytes) of these files, and it uses the basic XHR function below to get it. This works instantly in Chrome.

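// Request bytes bytestart..byteend (inclusive) of the archive and hand the
// load event to oncomplete. `params` is defined elsewhere on the page
// (presumably the parsed query string, see ?archive=<filename>).
function read(bytestart, byteend, oncomplete) {
	var req = new XMLHttpRequest();
	req.addEventListener("load", oncomplete);
	req.open('GET', params["archive"], true);
	req.overrideMimeType('text/plain; charset=x-user-defined');
	req.responseType = "arraybuffer";
	req.setRequestHeader('Range', 'bytes=' + bytestart + '-' + byteend);
	try {
		req.send(null);
	} catch (ex) {
		console.log(ex);
	}
}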
Component: Untriaged → DOM
Product: Firefox → Core
baku, do you know what's up here? Something to do with file:// sandboxing?
Flags: needinfo?(amarchesini)
Hi, thanks for this bug report.
But I cannot reproduce it. Can you please send me a testcase? Plus, I would like to know a couple of things:
1. are you using Firefox 54? Or nightly?
2. In e10s mode or not?
3. Is the file a zip file?
Flags: needinfo?(amarchesini) → needinfo?(sharun.msgs)
Attached file loadZIM.html
Flags: needinfo?(sharun.msgs)
1. I am using Firefox 54.0.1 32 bit on win 10.
2. Not using nightly. Don't know what e10s mode is. 
3. The file I am using is a ZIM file, parts of which are compressed
[ http://www.openzim.org/wiki/ZIM_file_format ]. It is the format used by Kiwix, the offline Wikipedia reader.

I have attached a loadZIM.html. 
Place a ZIM in the same directory as the html file. 
Here's a test file https://github.com/kiwix/kiwix-js/blob/master/tests/wikipedia_en_ray_charles_2015-06.zim

To load it, open file:///C:/loadZIM.html?archive=<filename>
It should print out the file header in console.

If you want to test with larger files - http://wiki.kiwix.org/wiki/Special:MyLanguage/Content_in_all_languages

Current behaviour
Firefox doesn't get to the XHR load handler
Chrome and Edge print out the header, but Chrome has to be started with the --allow-file-access-from-files flag.
Thanks for looking into this and please let me know if you need any further details.
The issue here is that Firefox ignores req.setRequestHeader('Range', 'bytes='+bytestart+'-'+byteend) for file:// URLs.
If you inspect the response, the ArrayBuffer contains all the bytes of the file.
The fastest way to fix it on your side is to set responseType to 'blob'. When you have the response, slice it: var blob = this.response.slice(bytestart, byteend + 1); (slice takes an exclusive end offset, not a length, and the Range is inclusive). After this, pass the sliced blob to FileReader's readAsArrayBuffer().
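A minimal sketch of that workaround, for illustration only. The names readRange and sliceInclusive and the url parameter are hypothetical and do not come from the attached test case. Blob.slice takes a start offset and an exclusive end offset, so an inclusive byte range maps to slice(bytestart, byteend + 1).

```javascript
// Map an inclusive byte range (as used in a Range header) onto slice(),
// which expects a half-open [start, end) pair. Works for anything with a
// Blob-like slice(start, end) method.
function sliceInclusive(sliceable, bytestart, byteend) {
  return sliceable.slice(bytestart, byteend + 1);
}

// Hypothetical helper: request the file as a Blob, slice out the wanted
// bytes, and deliver them to oncomplete as an ArrayBuffer.
function readRange(url, bytestart, byteend, oncomplete) {
  var req = new XMLHttpRequest();
  req.open('GET', url, true);
  req.responseType = 'blob';
  req.addEventListener('load', function () {
    var part = sliceInclusive(req.response, bytestart, byteend);
    var reader = new FileReader();
    reader.addEventListener('load', function () {
      oncomplete(reader.result); // ArrayBuffer holding only the sliced bytes
    });
    reader.readAsArrayBuffer(part);
  });
  req.send(null);
}
```

For the 80-byte header this would be called as readRange(params["archive"], 0, 79, callback).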
Flags: needinfo?(sharun.msgs)
If it ignores the Range request for file://, isn't it going to read the entire file as a blob?

My problem is large files. 

Just some context-
Wiki dumps are 50-60 GB. Similar story for most other offline web content - stackexchange, khan academy, zealdocs etc.  
The problem with the File object is that it can't be easily created or stored across sessions, so we have to keep asking users to reselect files via the file selector. XHR allows us to bypass this, but as these files grow in size, not having Range requests makes access through XHR unusable as well. Fixing this issue would greatly benefit offline web content access.

PS: I did try the blob approach with my large files and they don't return.
Flags: needinfo?(sharun.msgs)
> The problem with File object is it can't be easily created or stored across
> sessions. So we have to keep asking users to reselect files via the
> fileselector.

I'll probably work on this issue. But in the meantime, you have a workaround.
Here an example.
Flags: needinfo?(sharun.msgs)
This just made my day :) Thanks a lot for the workaround! We have been struggling with this and its consequences for a while.

And great to hear you might work on the File object issue!!!
Flags: needinfo?(sharun.msgs)
Ok I got a bit carried away there seeing the results return immediately for the request. To retrieve and render a wikipedia page takes hundreds of range requests and it gets slow very fast. I guess because the big blob keeps getting reloaded or whatever.
Anyways...the workaround is still useful for tests/usecases requiring a small number of requests. So thanks for it.
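
If re-requesting the blob on every read does turn out to be a cost (a guess, not something measured here), one option is to request it once per URL and slice the cached Blob for each range. The sketch below is an assumption-laden illustration: memoizeByUrl, loadBlob and readRangeCached are hypothetical names, and it assumes a browser with Promise and Blob.prototype.arrayBuffer support (a FileReader could be used instead where that is missing).

```javascript
// Wrap a loader so that each URL is only loaded once; later calls get
// the same cached result (here, a Promise for the Blob).
function memoizeByUrl(load) {
  var cache = {};
  return function (url) {
    if (!Object.prototype.hasOwnProperty.call(cache, url)) {
      cache[url] = load(url);
    }
    return cache[url];
  };
}

// Hypothetical loader: fetch the whole file as a Blob via XHR.
var loadBlob = memoizeByUrl(function (url) {
  return new Promise(function (resolve, reject) {
    var req = new XMLHttpRequest();
    req.open('GET', url, true);
    req.responseType = 'blob';
    req.addEventListener('load', function () { resolve(req.response); });
    req.addEventListener('error', function () {
      reject(new Error('XHR failed for ' + url));
    });
    req.send(null);
  });
});

// Read an inclusive byte range from the cached Blob as an ArrayBuffer.
function readRangeCached(url, bytestart, byteend) {
  return loadBlob(url).then(function (blob) {
    return blob.slice(bytestart, byteend + 1).arrayBuffer();
  });
}
```

With this shape, hundreds of readRangeCached calls against the same archive share a single XHR.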

Ideally, the ultimate fix for this issue should make multiple XHR file:// range requests perform as well as http:// range requests. In theory they should be much faster, as everything is on disk.

A related comment, for the sake of completeness:
A fix for this issue will help with reloading a file from session to session without the need for a File object. But since this is XHR, (range) requests will only work on a file found in the local directory where the initiating code resides. Specifying a relative path or a file://full-path-to-diff-directory will still require the file selector approach.
Please ignore my performance comment. Sorry! Bug in my code. Workaround seems to work.
Priority: -- → P2
I think we can move it to P3 or P4.
I am still seeing performance issues with the workaround. It's not clear to me what is going on or how to pin it down, but there is a noticeable difference between multiple XHR range requests for "http://" and "file://" over a single large file. "file://" is, for some reason, very slow.

Will just add - supporting range based access on "file://" (in addition to the fileselector dependency issue mentioned above) is crucial to supporting offline web content access. It's the simplest route compared to webframes/apps/addons/extensions/firefoxOS etc. There is no reason to be connected to the increasingly noisy and distracting internet all the time, if one can store KhanAcademy, StackExchange, Wikipedia and other increasing numbers of high quality web archives on a little SD card. 

It's unbelievable to me that even though all this great web content is now able to fit on my disk with tons of space to spare, the platforms make it more efficient and easy to access the content online!!!
Mozilla can really do something about this.
The workaround doesn't seem to be working in 55.0.3. 
Same issue as before, range requests on large files never seem to complete loading. Do I need to change anything in the workaround code? baku any suggestions?
Flags: needinfo?(amarchesini)
> ... loading. Do I need to change anything in the workaround code? baku any
> suggestions?

Can you share your code with the workaround again? Thanks.
Flags: needinfo?(amarchesini)
This is where I am using it - https://github.com/sharun-s/kiwix-html5/blob/dev/www/js/lib/util.js#L241 
Does that help? Let me know if you need something else. Thanks for looking into it!
Flags: needinfo?(amarchesini)
sharun, can you please check it again? I made some improvements for FF57. Let me know if this issue is already fixed in nightly.
Thanks!
Flags: needinfo?(amarchesini) → needinfo?(sharun.msgs)
I tried nightly (https://archive.mozilla.org/pub/firefox/nightly/2017/10/2017-10-20-22-11-29-mozilla-central/firefox-58.0a1.en-US.win64.zip) and 57. 

Still causes my machine to freeze. I tried with both arraybuffer and blob as response type. 
As mentioned above the key here is file size. 

When I tried with a 600MB file, the XHR returns (I used the loadZIM.html attached above). If I try this with a 50-60GB file (Wikipedia/StackOverflow dumps), it causes the freeze-up.
Component: DOM → DOM: Core & HTML
Severity: normal → S3

[domcore-bugbash-triaged] Doing domcore random bug triage : if this is still valid, please file a new bug; providing new test cases is going to be a huge help for us.

Status: UNCONFIRMED → RESOLVED
Closed: 3 months ago
Component: DOM: Core & HTML → DOM: Networking
Flags: needinfo?(sharun.msgs)
Resolution: --- → INVALID
Whiteboard: [domcore-bugbash-triaged]

The demo now fails in both Chrome and Firefox with a CORS request for the XHR.
If I set security.fileuri.strict_origin_policy to false it still happens.

The problem with this is that it expects the Range request header to do something for a file channel, and that's not the case.
Andrea's suggestion with the blob works much better because it doesn't actually copy the entire file into memory.
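
A page can check for this case itself rather than assume the header took effect. An HTTP server that honours a Range request answers with status 206 (Partial Content); any other status, or a body longer than the requested range, suggests the header was ignored. The sketch below is illustrative only, and rangeWasHonored is a hypothetical helper name.

```javascript
// Decide whether a response looks like a real partial response for the
// inclusive byte range bytestart..byteend.
// status:     XMLHttpRequest.status of the finished request
// byteLength: size of the received body in bytes
function rangeWasHonored(status, byteLength, bytestart, byteend) {
  var requested = byteend - bytestart + 1;
  // 206 means Partial Content; the body may be shorter than requested
  // if the range runs past the end of the file, but never longer.
  return status === 206 && byteLength <= requested;
}
```

In a load handler this could be called as rangeWasHonored(req.status, req.response.byteLength, bytestart, byteend), falling back to the blob-and-slice approach when it returns false.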
