Closed Bug 161783 Opened 19 years ago Closed 12 years ago

Need a more scaleable implementation for download data storage

Categories

(SeaMonkey :: Download & File Handling, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 380250

People

(Reporter: jrgmorrison, Unassigned)

References

Details

(Whiteboard: [adt2])

Attachments

(1 file)

I had mentioned this earlier to blake and sgehani. Currently, we store download
information in downloads.rdf as RDF/XML. I'm concerned that this is not a
scalable format for the long haul (i.e., > 4 months of usage). 

The problem with RDF/XML is that it is read-everything/write-everything. In
other words, to access or update any entry, we have to pull the whole thing into
memory and re-serialize the whole thing after a change (although we avoid
flushing when doing a batch update/deletion).

Possible alternatives:

1) automatically expire entries after some number of days (would need to start 
   tracking creation/modified time).

2) don't track download information unless the user has elected to use the 
   download manager window when downloading (in which case, it is now the user's
   responsibility to prune the list of downloads).

3) switch to a different storage format and implement a RDF datasource object 
   to access/update the data. (Possibly mork, although I've never understood 
   why we aren't using a public domain berkeley-ish dbm for this sort of thing).

1) and 2) have drawbacks in terms of the functionality offered to end users. 
[Apologies: I haven't reviewed other download managers myself, so I don't know
what basic feature set is required]. 3) is a greater amount of engineering and 
testing time.
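
To make alternatives 1) and 3) concrete, here is a minimal sketch, assuming an SQLite-backed store in place of downloads.rdf. The table layout and the function names (`open_store`, `add_download`, `mark_done`, `expire_older_than`) are hypothetical and are not taken from the Mozilla tree. The point is only that a single entry can be updated or expired without re-serializing everything else.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a hypothetical one-row-per-download table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " state TEXT NOT NULL,"
        " modified REAL NOT NULL)"
    )
    return db


def add_download(db, url, target, now=None):
    """Insert one entry; nothing else in the store is rewritten."""
    now = time.time() if now is None else now
    cur = db.execute(
        "INSERT INTO downloads (url, target, state, modified)"
        " VALUES (?, ?, 'active', ?)",
        (url, target, now),
    )
    db.commit()
    return cur.lastrowid


def mark_done(db, download_id, now=None):
    """Update a single entry in place and refresh its modified time."""
    now = time.time() if now is None else now
    db.execute(
        "UPDATE downloads SET state = 'done', modified = ? WHERE id = ?",
        (now, download_id),
    )
    db.commit()


def expire_older_than(db, days, now=None):
    """Alternative 1): drop completed entries older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    cur = db.execute(
        "DELETE FROM downloads WHERE state = 'done' AND modified < ?",
        (cutoff,),
    )
    db.commit()
    return cur.rowcount
```

Expiry needs the modified time that alternative 1) says we would have to start tracking, which is why the sketch stores it on every row.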
Keywords: mozilla1.2, nsbeta1
Here's a large 'downloads.rdf' with 1500 entries to ponder the future with.
I guess one question is "how many entries per week will a typical and a heavy
user make". Obviously, some users will download very little, but in other
bugs, I have already seen users who had 600 entries and more. Note also that
triggered RealAudio, etc., links will also put an additional entry in
downloads.rdf.
QA Contact: sairuh → petersen
nsbeta1+/adt3 per the nav triage team.

Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt3]
Should be adt2

Whiteboard: [adt3] → [adt2]
1) is similar to bug 136054
2) is similar to bug 132755
(or both are both...)

Bug 159107 is about the performance problem with the current implementation, so
I'd *propose* to make this bug concentrate on 3) only - a new implementation.
I also attached some more example download.rdf's to bug 159107.
Link: 
Sorry for the spam.
Grrrrr....
The missing link from my last comment:
http://bugzilla.mozilla.org/attachment.cgi?id=110879&action=view
...really sorry now... :-(
There are two related issues I would like to mention:
- to make the problem easier to avoid, there could be a preference
to keep only the latest N completed downloads. This would probably be easier to
implement than last N days and more effective, since the download frequency
no longer matters.
- There could also be an option to "zap" the whole list, enabled only when
no download is still in progress: this option would simply delete the file
(or set it to 0 length) instead of individually removing every single entry
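
The two suggestions above can be sketched roughly as follows. This is only an illustration: the `Entry` record and the names `prune_latest` and `can_zap` are made up for the example and do not correspond to anything in the download manager code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    completed: bool
    finished_at: float  # seconds since epoch; 0.0 while still running


def prune_latest(entries, keep):
    """Keep every unfinished download plus the `keep` most recently completed."""
    done = sorted(
        (e for e in entries if e.completed),
        key=lambda e: e.finished_at,
        reverse=True,
    )
    survivors = {id(e) for e in done[:keep]}
    # Preserve the original list order for whatever survives.
    return [e for e in entries if not e.completed or id(e) in survivors]


def can_zap(entries):
    """The 'zap' option is only offered when nothing is still downloading."""
    return all(e.completed for e in entries)
```

Because the limit is a count rather than an age, the result has the same size whether the user downloads ten files a day or ten a month.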

Don't we already have a general-purpose random-access database in the tree and
being used: Mork?
-> varga
Assignee: blaker → varga
*** Bug 191006 has been marked as a duplicate of this bug. ***
Target Milestone: --- → mozilla1.4alpha
*** Bug 191708 has been marked as a duplicate of this bug. ***
*** Bug 159107 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla1.4alpha → mozilla1.4beta
*** Bug 197765 has been marked as a duplicate of this bug. ***
Just an observation: even downloading JPEGs creates an entry, so it's quite
easy to get to high numbers in the rdf. Even my wife, who teaches and uses web
images as documentation, had a large number of JPEGs in the database. I strongly
vote for a more efficient format, or a more efficient parser, as I cannot
imagine why 1000 entries take 5 seconds to read/update.
I also vote to eliminate the - very slow - deletion process. I gave up after
more than 15 minutes on a 1000-entry or so database.
There should be a note in the release notes about the slowness of deleting
entries, and about the easy workaround of deleting the whole downloads.rdf to
get rid of all entries. I did this when I noticed I had to do something, since
deleting individual entries was way too slow. I wasn't sure whether deleting the
whole file would kill Mozilla, but it survived.
*** Bug 210282 has been marked as a duplicate of this bug. ***
*** Bug 210778 has been marked as a duplicate of this bug. ***
It appears as though the code causing this problem is:

RDFContainerUtilsImpl::IndexOf( nsIRDFDataSource* aDataSource, nsIRDFResource*  ...
http://lxr.mozilla.org/seamonkey/source/rdf/base/src/nsRDFContainerUtils.cpp#484

The trouble is that this search loops - sequentially - through each "Arc," then
loops - sequentially - through each element in that Arc. This results in just
barely worse than O(N) (linear) performance of a search. It does so using a
".getNext()" method of traversing the arcs.

This method should either be rewritten to allow a binary search of arcs and
their elements, or be overridden in the nsDownloadManager.cpp file alone. The
former is more difficult but would fix several bugs (including one claiming slow
Bookmark performance). The latter is simpler but fixes only this one.

Note that the .getNext() architecture suggests a binary search will demand a
different, more memory-intensive container implementation than the present
"Linked List" structure.
This bug affects not only the speed of saving a file to disk, but also
attempts to clear the list by hand.  Attempting to remove more than one item
from the list at a time increases the time it takes in proportion to how many
items there are.  Thus, if your list becomes TOO large, you are no longer able
to clear it at all.  There needs to be a way to zap the list completely.
This is getting pretty annoying, and dupes are coming in as well.
My 4 MB+ download list wasn't deletable via the download manager (over 10 min freeze),
and "Save As" downloads are slowed extremely by the large and slow logs!

see bug 160624 and bug 208576
Target Milestone: mozilla1.4beta → ---
Blocks: 159107
Blocks: 160624
*** Bug 228061 has been marked as a duplicate of this bug. ***
*** Bug 248652 has been marked as a duplicate of this bug. ***
*** Bug 251254 has been marked as a duplicate of this bug. ***
*** Bug 256323 has been marked as a duplicate of this bug. ***
Implementation of bug 251337 may provide an elegant workaround...
Hm... I'm sorry, that would be alternative 1) of the original post. Although
it's not a complete solution, I think it's worth the effort. It would have
additional benefits (for instance, improved privacy) and would, generally
speaking, make Mozilla (Firefox) more consistent overall...
*** Bug 185142 has been marked as a duplicate of this bug. ***
*** Bug 263205 has been marked as a duplicate of this bug. ***
What about using the new "@mozilla.org/storage" components for the download
list?  I believe in the future global history and cache are planned to be moved
to it.
Product: Browser → Seamonkey
*** Bug 294979 has been marked as a duplicate of this bug. ***
*** Bug 293430 has been marked as a duplicate of this bug. ***
*** Bug 326405 has been marked as a duplicate of this bug. ***
Is there even a plan to fix this?  Even 50 entries or less cause major slowdowns and disk thrashing.
Given the growing number of Firefox dups here and the suggestion in comment #4:

Shouldn't this bug be moved to the Core product?
I think that the Firefox version of this is bug #240525. Should they be consolidated?  Lots more dups there too.  Perhaps bug #304251 is a firefox dup also.
*** Bug 363045 has been marked as a duplicate of this bug. ***
I guess that, since this really is a problem, the download manager does not run in a separate thread from the Firefox browsing thread. Multi-threading is another way to address the issue.

Thread-list:
1. Firefox browsing
2. Download Manager gui
3. Download Manager downloader
4. Download Manager history updater & xml parser

Maybe 2 & 3 could be in the same thread. If implemented correctly, with this approach, Firefox wouldn't hang even if the Download Manager history thread hangs.
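
A minimal sketch of the idea behind thread 4, assuming the history update can be handed to a worker through a queue. The `HistoryWriter` class and its `save` callback are hypothetical and are not part of any Mozilla API. The browsing thread only enqueues a record and returns, so a slow write cannot block it.

```python
import queue
import threading


class HistoryWriter:
    """Persists download records on a background thread."""

    def __init__(self, save):
        self._save = save  # hypothetical (possibly slow) function storing one record
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, entry):
        """Called from the UI thread; returns immediately."""
        self._queue.put(entry)

    def close(self):
        """Flush pending records, then stop the worker."""
        self._queue.put(None)  # sentinel: no more work
        self._thread.join()

    def _run(self):
        while True:
            entry = self._queue.get()
            if entry is None:
                break
            self._save(entry)
```

This only moves the cost off the browsing thread. A slow storage format would still be slow, so it complements a better storage format rather than replacing one.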

Since this bug is well over 4 years old, it should be safe to assume there is not much happening on this part.

The severity of this bug should be upped.
*** Bug 363414 has been marked as a duplicate of this bug. ***
I can understand if you want to use an embedded database or something like that, but I would really regret seeing this XML format (or XML in general) go away. I've written a video clip manager that reads downloads.rdf to get metadata about downloaded video clips and saves it in a database where you can add other metadata. For this application it's a great help to have information about the download.
In general it's a huge problem to lose all context information after you've downloaded a file, so downloads.rdf is the only way to get anything. I don't know which implementation you are using, but I would recommend using a SAX parser instead of a DOM implementation to speed up processing. I frequently deal with XML in my applications as well, and XML files of a few MB are usually no problem. Both Java and .NET implementations process them at pretty good speed.
I would also recommend not freeing the memory every time the download manager closes. Especially with the settings "open on download" plus "close on completion", the download manager will clearly open and close many times. If you completely initialize it every time, it is clear why you have performance issues. You could at least add a configuration option to let users choose whether they prefer speed over memory consumption or vice versa.
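
For what the SAX suggestion could look like, here is a small sketch using Python's standard `xml.sax` module. It assumes a simplified file in which each download is an `RDF:Description` element carrying an `RDF:about` attribute. The real downloads.rdf has more structure than this, and the names `DownloadLister` and `list_downloads` are invented for the example.

```python
import xml.sax


class DownloadLister(xml.sax.ContentHandler):
    """Collects one attribute per entry as the parser streams the file."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so `name` is the raw
        # qualified name as written in the file.
        if name == "RDF:Description" and "RDF:about" in attrs:
            self.targets.append(attrs["RDF:about"])


def list_downloads(xml_bytes):
    """Stream-parse the document without building a tree in memory."""
    handler = DownloadLister()
    xml.sax.parseString(xml_bytes, handler)
    return handler.targets
```

A streaming parser helps with read time and memory, but it does not change the fact that any update still rewrites the whole file.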
If clearing the downloads succeeds, but File > Save As still freezes,
you must exit Firefox and delete the downloads.rdf file from your profile.
After saving a page with hundreds of icons, I could not even change directory
in the file open dialog and had to kill Firefox.
see bug 159107 page saving/downloads takes too much time (is slow) ('marooned' entries in downloads.rdf) 
https://bugzilla.mozilla.org/show_bug.cgi?id=159107
I have had this problem: long delays when saving web content or when starting the download manager.  I solved it by clearing the download manager list.  It seems that Firefox parses the file that contains the list of previously downloaded items, and if the list is very long, this delays Firefox.  My suggestion is simply to clear your download list periodically.  It would be handy if there were a setting to trim the download list in the download manager automatically.
Depends on: 380250
Apologies if I am misunderstanding something but I think I have been experiencing this bug in firefox on various versions of linux for some time and I don't think it is related to the *size* of downloads.rdf, only its existence, under some conditions that I have not been able to identify.

I currently use this version of firefox: Gecko/20070309 Firefox/2.0.0.3
I don't think it existed prior to firefox 2.

The slowness occurs in at least three contexts:
   1. save page/link as ...
   2. open file
   3. open with other..

In all cases it seems to be concerned with reading the directory tree, and the time it takes seems to be related to the number of files in the directories traversed (this got much worse when X.org merged all its binaries with standard linux binaries in /usr/bin/ so that 'open with' became intolerably slow.)

Clearing downloads, using 'cleanup' button on the download manager does not seem to make it go away.

I still had the problem when downloads.rdf was quite small (a few hundred bytes).

However, the problem went away, if I deleted downloads.rdf, without any need to restart firefox. Since then I have downloaded a few dozen files, but the slowness has not returned.

All this shows that firefox is perfectly capable of reading large directories and displaying them quickly, but that under some conditions it takes an intolerably long time to read directories -- and this seems to be connected with the existence of downloads.rdf and perhaps the time since it was last deleted, though not necessarily its size.

I have been using mozilla (SeaMonkey 1.0.5), though not as often as Firefox.
So far I have never noticed this problem in mozilla. It seems to be firefox specific in my experience. But maybe that's because I use it much more (because of extensions -- otherwise I would stick with mozilla as I much prefer its 'preferences' mechanism).

I cannot see any reason why firefox should even look at downloads.rdf while the user is specifying a directory in which to save something, find a file to open, or find an application to run. If a file is being saved there might possibly be some reason to check whether it was saved previously, but I can't see why: all that's needed is to check whether a file of the same name exists and give the user the option to overwrite or save to a different name. At that point the download manager may wish to access downloads.rdf in order to record the selected location for the download, but not before.

I could have inserted this comment in one of the other bug reports concerned with slowness, but I chose this one because I wonder whether it is possible that a problem attributed to the size of the downloads list is actually caused by a bug in the directory traversal code.

I have not looked at any of the innards of firefox, so I apologise if I have raised a red herring, and should have described my experiences in another place.
Once the move to toolkit is complete, this issue will have been fixed by bug 380250.
Depends on: 381157
No longer depends on: 380250
It is nonsensical that this blocks a resolved bug.  Editing.
No longer blocks: 160624
Assignee: Jan.Varga → nobody
QA Contact: chrispetersen → download-manager
Dupe of (i.e. resolved by) bug 380250 now that we landed bug 381157.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 380250