Closed
Bug 161783
Opened 22 years ago
Closed 16 years ago
Need a more scalable implementation for download data storage
Categories
(SeaMonkey :: Download & File Handling, defect)
SeaMonkey
Download & File Handling
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 380250
People
(Reporter: jrgmorrison, Unassigned)
References
Details
(Whiteboard: [adt2])
Attachments
(1 file)
51.89 KB,
application/octet-stream
I had mentioned this earlier to blake and sgehani. Currently, we store download
information in downloads.rdf as RDF/XML. I'm concerned that this is not a
scalable format for the long haul (i.e., > 4 months of usage).
The problem with RDF/XML is that it is read-everything/write-everything. In
other words, to access or update any entry, we have to pull the whole thing into
memory and re-serialize the whole thing after a change (although we avoid
flushing when doing a batch update/deletion).
Possible alternatives:
1) automatically expire entries after some number of days (would need to start
tracking creation/modified time).
2) don't track download information unless the user has elected to use the
download manager window when downloading (in which case, it is now the user's
responsibility to prune the list of downloads).
3) switch to a different storage format and implement an RDF datasource object
to access/update the data. (Possibly Mork, although I've never understood
why we aren't using a public domain Berkeley-ish dbm for this sort of thing).
1) and 2) have drawbacks in terms of the functionality offered to end users.
[Apologies: I haven't reviewed other download managers myself, so I don't know
what basic feature set is required]. 3) is a greater amount of engineering and
testing time.
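To make alternative 3) a bit more concrete, here is a minimal sketch of a keyed, dbm-style store in Python. It is an illustration only, not a proposal for the actual datasource: the function names, the choice of the source URL as key, and the JSON record layout are all assumptions, and `dbm.dumb` is used only because it ships with Python (it keeps its key index in memory, which an on-disk hashed dbm would not need to do). The point is that a single record can be written or read without re-serializing the others.

```python
import dbm.dumb
import json
import os
import tempfile


def record_download(db_path, url, info):
    """Insert or update one download record, keyed by its source URL."""
    with dbm.dumb.open(db_path, "c") as db:
        db[url.encode("utf-8")] = json.dumps(info).encode("utf-8")


def lookup_download(db_path, url):
    """Fetch one record by key; returns None if the URL was never recorded."""
    with dbm.dumb.open(db_path, "c") as db:
        raw = db.get(url.encode("utf-8"))
    return None if raw is None else json.loads(raw.decode("utf-8"))


# Example usage with a throwaway store (the path is hypothetical).
store = os.path.join(tempfile.mkdtemp(), "downloads")
record_download(store, "http://example.org/a.zip", {"state": "active"})
record_download(store, "http://example.org/a.zip", {"state": "done"})
print(lookup_download(store, "http://example.org/a.zip"))
```

The trade-off named in the description still applies: a store like this would need an RDF datasource wrapper on top before the existing UI could use it.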
Reporter | ||
Updated•22 years ago
|
Keywords: mozilla1.2,
nsbeta1
Reporter | ||
Comment 1•22 years ago
|
||
Here's a large 'downloads.rdf' with 1500 entries to ponder the future with.
I guess one question is "how many entries per week will a typical and a heavy
user make". Obviously, some users will download very little, but in other
bugs, I have already seen users who had 600 entries and more. Note also that
triggered RealAudio, etc., links will also put an additional entry in
downloads.rdf.
Updated•22 years ago
|
QA Contact: sairuh → petersen
Comment 2•22 years ago
|
||
nsbeta1+/adt3 per the nav triage team.
Comment 4•22 years ago
|
||
1) is similar to bug 136054
2) is similar to bug 132755
(or both are both...)
Bug 159107 is about the performance problem with the current implementation, so
I'd *propose* to make this bug concentrate on 3) only - a new implementation.
Comment 5•22 years ago
|
||
I also attached some more example download.rdf's to bug 159107.
Link:
Sorry for the spam.
Comment 6•22 years ago
|
||
Grrrrr....
The missing link from my last comment:
http://bugzilla.mozilla.org/attachment.cgi?id=110879&action=view
...really sorry now... :-(
Comment 7•22 years ago
|
||
There are two related issues I would like to mention:
- To make the problem easier to avoid, there could be a preference to keep only
the latest N completed downloads. This would probably be easier to implement
than "last N days" and more effective, since the download frequency no longer
matters.
- There could also be an option to "zap" the whole list, enabled only when no
download is still in progress: this option would simply delete the file
(set it to 0 length) instead of removing every single entry individually.
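The two suggestions above can be sketched roughly as follows. This is a hypothetical illustration in Python, not code from the download manager: the `Download` record, its `finished_at` field, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Download:
    url: str
    finished_at: Optional[float]  # completion timestamp; None while in progress


def prune_completed(downloads: List[Download], keep_latest: int) -> List[Download]:
    """Keep every unfinished download plus the newest `keep_latest` completed ones."""
    active = [d for d in downloads if d.finished_at is None]
    done = sorted(
        (d for d in downloads if d.finished_at is not None),
        key=lambda d: d.finished_at,
        reverse=True,
    )
    return active + done[:keep_latest]


def can_zap(downloads: List[Download]) -> bool:
    """The whole list may be dropped only if nothing is still downloading."""
    return all(d.finished_at is not None for d in downloads)
```

Pruning by count bounds the size of the list regardless of how often the user downloads, which is the advantage over an age-based cutoff mentioned above.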
Comment 8•22 years ago
|
||
Don't we already have a general-purpose random-access database in the tree and
being used: Mork?
Reporter | ||
Comment 10•22 years ago
|
||
*** Bug 191006 has been marked as a duplicate of this bug. ***
Updated•22 years ago
|
Target Milestone: --- → mozilla1.4alpha
Comment 11•22 years ago
|
||
*** Bug 191708 has been marked as a duplicate of this bug. ***
Comment 12•22 years ago
|
||
*** Bug 159107 has been marked as a duplicate of this bug. ***
Updated•22 years ago
|
Target Milestone: mozilla1.4alpha → mozilla1.4beta
Comment 13•22 years ago
|
||
*** Bug 197765 has been marked as a duplicate of this bug. ***
Comment 14•22 years ago
|
||
Just an observation: even downloading JPEGs causes an entry, so it's quite
easy to get to high numbers in the rdf. Even my wife, who teaches and uses web
images as documentation, had a large number of JPEGs in the database. I strongly
vote for a more efficient format, or a more efficient parser, as I cannot
imagine why 1000 entries take 5 seconds to read/update.
I also vote to eliminate the very slow deletion process. I gave up after
more than 15 minutes on a database of 1000 entries or so.
Comment 15•22 years ago
|
||
There should be a note in the release notes about the slowness of deleting
entries, and about the easy workaround: delete the whole downloads.rdf, which
gets rid of all entries. I did this when I noticed I had to do something, since
deleting individual entries was way too slow. I wasn't sure whether deleting the
whole file would kill Mozilla, but it survived.
Reporter | ||
Comment 16•21 years ago
|
||
*** Bug 210282 has been marked as a duplicate of this bug. ***
Comment 17•21 years ago
|
||
*** Bug 210778 has been marked as a duplicate of this bug. ***
Comment 18•21 years ago
|
||
It appears as though the code causing this problem is:
RDFContainerUtilsImpl::IndexOf( nsIRDFDataSource* aDataSource, nsIRDFResource* ...
http://lxr.mozilla.org/seamonkey/source/rdf/base/src/nsRDFContainerUtils.cpp#484
The trouble is that this search loops - sequentially - through each "Arc," then
loops - sequentially - through each element in that Arc. This results in just
barely worse than O(N) (linear) performance of a search. It does so using a
".getNext()" method of traversing the arcs.
This method should either be rewritten to allow a binary search of arcs and
their elements, or be overridden in the nsDownloadManager.cpp file alone. The
former is more difficult but would fix several bugs (including one claiming slow
Bookmark performance). The latter is simpler but fixes only this one.
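As a rough illustration of the difference, here is a Python sketch contrasting a getNext()-style sequential scan with a container that keeps a side index. Note that this swaps in a hash index rather than the binary search proposed above, and that none of these names come from the RDF code; the 1-based positions and the -1 "not found" result are assumptions made for the example.

```python
def index_of_linear(elements, target):
    """Walk the elements one by one, the way a getNext()-style enumerator would."""
    for position, element in enumerate(elements, start=1):
        if element == target:
            return position
    return -1


class IndexedContainer:
    """Append-only container that remembers where each element was stored."""

    def __init__(self):
        self._elements = []
        self._position = {}  # element -> 1-based position of its first occurrence

    def append(self, element):
        self._elements.append(element)
        self._position.setdefault(element, len(self._elements))

    def index_of(self, element):
        # One dictionary lookup instead of a walk over every element.
        return self._position.get(element, -1)
```

The extra dictionary costs memory proportional to the number of elements, and supporting removal would require renumbering the positions that follow the removed element.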
Note that the .getNext() architecture suggests a binary search will demand a
different, more memory-intensive container implementation than the present
"Linked List" structure.
Comment 19•21 years ago
|
||
This bug affects not only the speed of saving a file to disk, but also
attempts to clear the list by hand. Removing more than one item from the list
at a time increases the time it takes by a factor of however many items there
are. Thus, if your list becomes TOO large, you are no longer able to clear it
at all. There needs to be a way to zap the list completely.
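One way to picture why multi-item removal can scale so badly is the cost model below. It is only a sketch under an assumption: that every individual removal triggers a re-serialization of the whole remaining list. Whether that is what actually happens here has not been verified. The class and its counter are hypothetical.

```python
class DownloadList:
    """Toy list that counts how many entries get written out on each flush."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.entries_written = 0  # stand-in for serialization work

    def _flush(self):
        # Writing the file costs time proportional to the entries still in it.
        self.entries_written += len(self.entries)

    def remove_one_at_a_time(self, urls):
        for url in urls:
            self.entries = [e for e in self.entries if e != url]
            self._flush()  # full rewrite after every single removal

    def remove_batch(self, urls):
        doomed = set(urls)
        self.entries = [e for e in self.entries if e not in doomed]
        self._flush()  # one rewrite for the whole selection
```

With 10 entries and 3 removals, the first method writes 9 + 8 + 7 = 24 entries in total, while the batch method writes 7. The gap grows with both the list size and the size of the selection.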
Comment 20•21 years ago
|
||
This is getting pretty annoying, and dupes are coming in as well.
My 4 MB+ download list wasn't deletable via the download manager (over 10 min
freeze), and "save as" downloads are slowed extremely by the large and slow logs!
See bug 160624 and bug 208576.
Target Milestone: mozilla1.4beta → ---
Comment 21•21 years ago
|
||
*** Bug 228061 has been marked as a duplicate of this bug. ***
Comment 22•20 years ago
|
||
*** Bug 248652 has been marked as a duplicate of this bug. ***
Comment 23•20 years ago
|
||
*** Bug 251254 has been marked as a duplicate of this bug. ***
Comment 24•20 years ago
|
||
*** Bug 256323 has been marked as a duplicate of this bug. ***
Comment 25•20 years ago
|
||
Implementation of bug 251337 may provide an elegant workaround...
Comment 26•20 years ago
|
||
Hm... I'm sorry, that would be alternative 1) of the original post. Although
it's not a complete solution, I think it's worth the effort. It would have
additional benefits (for instance, improved privacy) and would, generally
speaking, make Mozilla (Firefox) more consistent as a whole...
Comment 27•20 years ago
|
||
*** Bug 185142 has been marked as a duplicate of this bug. ***
Comment 28•20 years ago
|
||
*** Bug 263205 has been marked as a duplicate of this bug. ***
Comment 29•20 years ago
|
||
What about using the new "@mozilla.org/storage" components for the download
list? I believe in the future global history and cache are planned to be moved
to it.
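For a sense of what an SQLite-backed download list could look like, here is a small sketch using Python's standard `sqlite3` module rather than the storage components themselves. The table layout, column names, and function names are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the download table; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS downloads (
               id INTEGER PRIMARY KEY,
               source TEXT NOT NULL,
               target TEXT NOT NULL,
               state TEXT NOT NULL,
               end_time REAL
           )"""
    )
    return conn


def add_download(conn, source, target):
    """Insert a single row; existing rows are not rewritten."""
    with conn:
        cur = conn.execute(
            "INSERT INTO downloads (source, target, state) VALUES (?, ?, 'active')",
            (source, target),
        )
    return cur.lastrowid


def finish_download(conn, download_id, end_time):
    """Update one row in place when its download completes."""
    with conn:
        conn.execute(
            "UPDATE downloads SET state = 'done', end_time = ? WHERE id = ?",
            (end_time, download_id),
        )


def clear_finished(conn):
    """Delete all completed downloads in one statement; returns the row count."""
    with conn:
        return conn.execute("DELETE FROM downloads WHERE state = 'done'").rowcount
```

Each operation touches only the affected rows, and clearing the finished entries is a single statement instead of one removal per entry.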
Updated•20 years ago
|
Product: Browser → Seamonkey
Comment 30•20 years ago
|
||
*** Bug 294979 has been marked as a duplicate of this bug. ***
Comment 31•19 years ago
|
||
*** Bug 293430 has been marked as a duplicate of this bug. ***
Comment 32•19 years ago
|
||
*** Bug 326405 has been marked as a duplicate of this bug. ***
Comment 33•19 years ago
|
||
Is there even a plan to fix this? Even 50 entries or less cause major slowdowns and disk thrashing.
Comment 34•19 years ago
|
||
Given the growing number of Firefox dups here and the suggestion in comment #4:
Shouldn't this bug be moved to the Core product?
Comment 35•19 years ago
|
||
I think that the Firefox version of this is bug #240525. Should they be consolidated? Lots more dups there too. Perhaps bug #304251 is a firefox dup also.
Comment 36•18 years ago
|
||
*** Bug 363045 has been marked as a duplicate of this bug. ***
Comment 37•18 years ago
|
||
I guess the reason this is such a problem is that the download manager does not run in a separate thread from the Firefox browsing thread. Multi-threading is another way to address the issue.
Thread-list:
1. Firefox browsing
2. Download Manager gui
3. Download Manager downloader
4. Download Manager history updater & xml parser
Maybe 2 & 3 could be in the same thread. If implemented correctly, with this approach, Firefox wouldn't hang even if the Download Manager history thread hangs.
Since this bug is well over 4 years old, it should be safe to assume there is not much happening on this part.
The severity of this bug should be upped.
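The idea behind item 4 of the thread list could be sketched like this in Python: history updates are queued and persisted by a worker thread, so the caller never waits on the slow write. This is a generic producer/consumer illustration, not a description of how Firefox threads are organized; the class name and the `persist` callback are hypothetical.

```python
import queue
import threading


class HistoryWriter:
    """Persists download-history entries on a background thread."""

    def __init__(self, persist):
        self._persist = persist  # slow function that stores one entry
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, entry):
        """Called from the UI side; returns as soon as the entry is queued."""
        self._queue.put(entry)

    def close(self):
        """Ask the worker to stop, then wait until queued entries are written."""
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            entry = self._queue.get()
            if entry is None:  # sentinel placed by close()
                break
            self._persist(entry)
```

A design like this hides the latency from the user but does not reduce the total work, so a slow storage format would still burn CPU and disk time in the background.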
Comment 38•18 years ago
|
||
*** Bug 363414 has been marked as a duplicate of this bug. ***
Comment 39•18 years ago
|
||
I can understand if you want to use an embedded database or something like that, but I would really regret seeing this XML format (or XML in general) be lost, because I've written a video clip manager that grabs the downloads.rdf to get metadata about downloaded video clips and saves them in a database where you can add other metadata. For this application it's a great help to have information about the download.
In general it's a huge problem to lose all context information after you've downloaded a file, so your downloads.rdf is the only way to get something. I don't know which implementation you are using, but I would recommend using a SAX parser instead of a DOM implementation to speed up processing. I frequently deal with XML in my applications as well, and XML files of a few MB are usually no problem. Both Java and .NET implementations process them at pretty good speed. I would also recommend not freeing the memory every time the download manager closes. Especially with the settings "open on download" plus "close on completion", it is quite clear that the download manager will open and close many times. If you completely initialize it every time, it is clear why you have performance issues. You could at least add a configuration option to let users choose whether they prefer speed over memory consumption or vice versa.
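To illustrate the SAX suggestion, here is a small Python sketch that streams through an RDF/XML document and collects one field per entry without building a DOM tree. The sample document is invented for the example and is only loosely shaped like a downloads.rdf; the element names it uses should be treated as assumptions, not as the real schema.

```python
import xml.sax

# Invented sample; not copied from a real downloads.rdf.
SAMPLE = b"""<?xml version="1.0"?>
<RDF:RDF xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:NC="http://home.netscape.com/NC-rdf#">
  <RDF:Description RDF:about="/tmp/a.jpg">
    <NC:Name>a.jpg</NC:Name>
  </RDF:Description>
  <RDF:Description RDF:about="/tmp/b.jpg">
    <NC:Name>b.jpg</NC:Name>
  </RDF:Description>
</RDF:RDF>
"""


class DownloadNames(xml.sax.ContentHandler):
    """Collects the text of every NC:Name element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # list of text chunks while inside NC:Name

    def startElement(self, name, attrs):
        if name == "NC:Name":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "NC:Name" and self._buffer is not None:
            self.names.append("".join(self._buffer))
            self._buffer = None


def read_names(xml_bytes):
    handler = DownloadNames()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names
```

A streaming parser keeps memory use flat while reading, but it does not by itself help with updates: changing one entry in an XML file still means rewriting the file.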
Comment 40•18 years ago
|
||
If clearing the downloads succeeds but file save as still freezes,
you must exit Firefox and delete the downloads.rdf file from your profile.
After saving a page with hundreds of icons, I could not even change directory
in the file open dialog and had to kill Firefox.
see bug 159107 page saving/downloads takes too much time (is slow) ('marooned' entries in downloads.rdf)
https://bugzilla.mozilla.org/show_bug.cgi?id=159107
Comment 41•18 years ago
|
||
I have had this problem: long delays when saving web content or when starting the download manager. I solved it by clearing the download manager list. It seems that Firefox parses the file that contains the list of previously downloaded items, and if the list is very long, this delays Firefox. My suggestion is simply to clear your download list periodically. It would be handy if there were a setting that automatically trimmed the download list in the download manager.
Comment 42•17 years ago
|
||
Apologies if I am misunderstanding something but I think I have been experiencing this bug in firefox on various versions of linux for some time and I don't think it is related to the *size* of downloads.rdf, only its existence, under some conditions that I have not been able to identify.
I currently use this version of firefox: Gecko/20070309 Firefox/2.0.0.3
I don't think it existed prior to firefox 2.
The slowness occurs in at least three contexts:
1. save page/link as ...
2. open file
3. open with other..
In all cases it seems to be concerned with reading the directory tree, and the time it takes seems to be related to the number of files in the directories traversed (this got much worse when X.org merged all its binaries with standard linux binaries in /usr/bin/ so that 'open with' became intolerably slow.)
Clearing downloads, using 'cleanup' button on the download manager does not seem to make it go away.
I still had the problem when downloads.rdf was quite small (a few hundred bytes).
However, the problem went away, if I deleted downloads.rdf, without any need to restart firefox. Since then I have downloaded a few dozen files, but the slowness has not returned.
All this shows that firefox is perfectly capable of reading large directories and displaying them quickly, but that under some conditions it takes an intolerably long time to read directories -- and this seems to be connected with the existence of downloads.rdf and perhaps the time since it was last deleted, though not necessarily its size.
I have been using mozilla (SeaMonkey 1.0.5), though not as often as Firefox.
So far I have never noticed this problem in mozilla. It seems to be firefox specific in my experience. But maybe that's because I use it much more (because of extensions -- otherwise I would stick with mozilla as I much prefer its 'preferences' mechanism).
I cannot see any reason why firefox should even look at downloads.rdf while the user is specifying a directory in which to save something, find a file to open, or find an application to run. If a file is being saved there might possibly be some reason to check whether it was saved previously, but I can't see why: all that's needed is to check whether a file of the same name exists and give the user the option to overwrite or save to a different name. At that point the download manager may wish to access downloads.rdf in order to record the selected location for the download, but not before.
I could have inserted this comment in one of the other bug reports concerned with slowness, but I chose this one because I wonder whether it is possible that a problem attributed to the size of the downloads list is actually caused by a bug in the directory traversal code.
I have not looked at any of the innards of firefox, so I apologise if I have raised a red herring, and should have described my experiences in another place.
Comment 43•17 years ago
|
||
Once the move to toolkit is complete, this issue will have been fixed by Bug 380250.
Updated•17 years ago
|
Comment 44•17 years ago
|
||
It is nonsensical that this blocks a resolved bug. Editing.
No longer blocks: 160624
Updated•17 years ago
|
Assignee: Jan.Varga → nobody
QA Contact: chrispetersen → download-manager
Comment 45•16 years ago
|
||
Dupe of (i.e. resolved by) bug 380250 now that we landed bug 381157.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE