Bug 945281 (Open), opened 10 years ago, updated 2 years ago

indexedDB and/or Blob constructor - Blob incremental storage seems very slow for large files

Categories

(Core :: Storage: IndexedDB, defect, P5)

25 Branch
x86
Windows Vista
defect

People

(Reporter: aymeric, Unassigned)

Details

(Keywords: perf)

User Agent: Mozilla/5.0 (Windows NT 6.0; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release)
Build ID: 20131112160018

Steps to reproduce:

See bug 944918 for the use case. The files are roughly 250 MB each; after about 100 "put" operations, each appending 2 MB to the Blob, we open a transaction to get the record and have to wait about 260 seconds for onsuccess to fire (probably because all pending put operations must be processed first).

The following code seems to reproduce the same behavior:

var db;
var DB = indexedDB.open('test', 1);
DB.onupgradeneeded = function (evt) {
	var db = evt.target.result;
	db.createObjectStore('test', {keyPath: 'id'});
};
DB.onsuccess = function (evt) {
	db = evt.target.result;
	var open_db = function () {
		return db.transaction(['test'], 'readwrite').objectStore('test');
	};
	var a = new Blob();
	// "Append" 2 MB a hundred times by re-creating the blob and putting it again each time.
	for (var i = 0; i < 100; i++) {
		var t = open_db();
		var b = new Uint8Array(2097152);  // 2 MB chunk
		a = new Blob([a, b]);             // new blob wrapping the old blob plus the chunk
		t.put({id: 0, data: a});
	}
	console.log("get");
	var c = open_db();
	var t0 = Date.now();
	var d = c.get(0);
	d.onsuccess = function (evt) {
		console.log(Date.now() - t0);     // time until the record can be read back
	};
};
Component: Untriaged → DOM: IndexedDB
Product: Firefox → Core
Incrementally building a Blob for large files, as shown above, is definitely slow; it is easy to reproduce.

Or is the method incorrect? Should the blob not be built incrementally but instead reconstituted once from all the chunks (new Blob([chunk1,...,chunk100000]))?
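
For reference, a minimal sketch of that build-once approach, assuming the 2 MB chunks arrive one by one as ArrayBuffers and reusing the open db and 'test' store from the snippet above (onChunk and onDownloadComplete are just placeholder names):

var chunks = [];

// Called for each incoming 2 MB chunk: just remember it, no Blob is built yet.
function onChunk(arrayBuffer) {
	chunks.push(arrayBuffer);
}

// Called once when all chunks have arrived: build the Blob a single time and do a single put.
function onDownloadComplete() {
	var store = db.transaction(['test'], 'readwrite').objectStore('test');
	store.put({id: 0, data: new Blob(chunks)});
}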
It seems that stored files are slow especially on Windows.
I made some benchmarks with IndexedDB and FileHandle:

Mac:
1) number of appends: 750, append size: 1200*1000 = took ~20 seconds
2) number of appends: 750*4, append size: 1200*1000/4 = took ~28 seconds

Windows:
1) number of appends: 750*4, append size: 1200*1000/4 = took ~404 seconds
2) number of appends: 750, append size: 1200*1000 = took ~136 seconds
3) number of appends: 750/2, append size: 1200*1000*2 = took ~96 seconds

So it is clear that Windows is much slower than Mac, and also that on Windows increasing the size of each append while decreasing the number of appends significantly improves performance.
Did you run the benchmark on the same hardware?

I suspect that fsync is much slower on Windows.
Status: UNCONFIRMED → NEW
Ever confirmed: true
A discussion that confirms my suspicion:
http://stackoverflow.com/questions/18276554/windows-fsync-flushfilebuffers-performance-with-large-files

I'll investigate the FILE_FLAG_NO_BUFFERING flag.
(In reply to Jan Varga [:janv] from comment #4)
> Did you run the benchmark on the same hardware?
> 
> I suspect that fsync is much slower on Windows.

No, that was on two different machines with different hardware. I will try to benchmark on the same machine.
I can confirm that the same problem exists on Ubuntu 14.04 (with Firefox 33.0).

The cause is obvious: data is not appended but copied into the new blob.

This means that if you save a blob incrementally by appending a 1 MB chunk each time, it gets slower the larger the file becomes. For example, when the file is 512 MB and you have already downloaded and persisted 500 MB, appending the next 1 MB chunk actually writes 501 MB to disk (I guess the first 500 MB are copied from the old blob and the new chunk is appended at the end), and the next 1 MB chunk then writes 502 MB. That is 1003 MB written to disk instead of only 2 MB, and it gets even worse for the "last" 10 MB of the file.
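
As a rough back-of-the-envelope sketch (my own estimate, not measured): with a fixed chunk size, the total amount of data written grows quadratically, because put number n rewrites all n chunks:

// Rough estimate of the total bytes written when the blob is re-created on every append.
function totalBytesWritten(chunkCount, chunkSize) {
	// put number i writes i chunks, so 1 + 2 + ... + chunkCount chunks are written in total
	return chunkSize * chunkCount * (chunkCount + 1) / 2;
}
// A 512 MB file appended in 1 MB chunks: ~131328 MB (~128 GB) written instead of 512 MB.
console.log(totalBytesWritten(512, 1024 * 1024) / (1024 * 1024));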
(In reply to Peter from comment #7)
> The cause is obvious: data is not appended but copied into the new blob.

Yeah, this is just how blobs work: they represent one indivisible blob of data once you create them (i.e. they lose the notion that they are somehow composed of multiple chunks).

An alternative strategy for your download case is to store each chunk as an individual blob, and then when you're finished you create a new blob that wraps each of those chunks.
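
A minimal sketch of that strategy, assuming a 'chunks' object store with keyPath 'id' keyed by a numeric chunk index (the store name and callback are illustrative only):

// Persist each chunk as its own record; earlier data is never copied or rewritten.
function saveChunk(db, index, chunkBlob) {
	var store = db.transaction(['chunks'], 'readwrite').objectStore('chunks');
	store.put({id: index, data: chunkBlob});
}

// When the download is finished, read the stored chunk blobs back in key order
// and wrap them once in a single Blob.
function assembleFile(db, callback) {
	var parts = [];
	var req = db.transaction(['chunks'], 'readonly').objectStore('chunks').openCursor();
	req.onsuccess = function (evt) {
		var cursor = evt.target.result;
		if (cursor) {
			parts.push(cursor.value.data);
			cursor.continue();
		} else {
			callback(new Blob(parts));
		}
	};
}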
Then you spend your time concatenating and slicing... as I am doing for the Peersm project: save chunks, concat, load blob, slice, encrypt, decrypt, hash, concat, etc.

This is completely inefficient and inept.

Please take a look at http://lists.w3.org/Archives/Public/public-webapps/2014JulSep/0332.html (read the whole thread if you want), which demonstrates that really basic things cannot be done with File and IndexedDB, and apparently will not be feasible with Streams either...

An older thread is http://lists.w3.org/Archives/Public/public-webapps/2013OctDec/0657.html; I have been requesting since then that both APIs handle partial data, without success so far. I don't even think it has been considered for IndexedDB v2.
(In reply to Aymeric Vitte from comment #9)

Let's take this to another bug. This bug is about our Windows implementation flushing file data more slowly than other platforms.
Priority: -- → P5
Severity: normal → S3