Closed Bug 388004 Opened 18 years ago Closed 18 years ago

Cleanup takes up to 10 minutes and 90% CPU with large number of downloads

Categories

(Toolkit :: Downloads API, defect)

1.8 Branch
x86
Windows Server 2003
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: mariusads, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4

I have been hired by a company to create a backup of a website which contains about 320,000 documents of various sizes. They no longer have access to the server, and the documents are generated from a database. Long story short, I was forced to write an application that moves the mouse on the screen, clicks on the document to download it, moves the mouse again and clicks on the Next page link, and repeats this process for up to 20,000 documents at a time. The pages load very fast; a cycle takes about 3 seconds.

The problem usually appears once more than 2000 documents have been downloaded: Firefox starts needing a lot of time to open the download window and also needs more time to change the page after the Next button is clicked. I suspect this is because downloads.rdf grows to over 2 MB and history.dat to over 5.5 MB, so it probably takes longer to add entries to these files. This causes the application moving the mouse to fall out of step: it clicks twice on the download link, downloading the document twice, or clicks twice on the Next arrow, skipping a document completely.

Because of this, after about 3000 files I stop the process and click on the Downloads link in the menu. The window takes up to *1 minute* to appear (is it necessary to load all entries in the Downloads window when it's shown?). After I click the Cleanup button, it needs about 7-8 minutes to remove all entries while using between 90 and 99 percent of the CPU (1.8 GHz, 512 MB DDR). This is actually the biggest problem. It makes me think it removes one entry at a time and rewrites downloads.rdf over and over. If it were a simple query like "delete from downloads.rdf" it would take seconds, maybe even less than a second.

I think a limit on how many files are shown in the Downloads window, or how many are kept in downloads.rdf, would be useful, and the same goes for history.dat. I know most people don't visit 10,000 pages or download 10,000 small files daily, but it wouldn't hurt. As a side note, after about 6,000 pages and documents downloaded, Firefox consumes about 200 MB of memory and about 240 MB of virtual memory, which also doesn't seem right.

Reproducible: Always

Steps to Reproduce:
1. Have lots of files in the Downloads window
2. Click Cleanup
3. Wait

Actual Results:
Cleanup takes 7-8 minutes and uses 90-100% CPU while removing items from the list. Right now it seems to work as if it deletes one entry at a time from the list, which wouldn't make sense to me.

Expected Results:
The Cleanup button should work very fast. After all, it removes all the entries without any conditions (except that it doesn't remove the files that are still downloading).

So some features could be:
* make the Cleanup button work faster and use less CPU
* implement options to restrict the number of files that are shown in the Downloads window or kept in history, or automatically remove the oldest downloaded files if the number of files exceeds a certain limit
* implement options to restrict the number of pages kept in the history.dat file, so that Firefox would be more responsive throughout its use

Thank you for taking the time to read this feature request / possible bug.
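To illustrate the reporter's hypothesis, here is a minimal sketch of why removing entries one at a time with a full rewrite after each removal would scale so badly compared to a single pass. It is not based on the actual Firefox 2 download manager code; the `Download` type and both functions are hypothetical, and the "rewrite" is only simulated by counting how many entries would be serialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Download:
    """Hypothetical stand-in for one entry in the download list."""
    url: str
    finished: bool


def cleanup_entry_by_entry(downloads):
    """Remove finished entries one by one, 'rewriting' the list after each removal.

    Returns the remaining entries and the total number of entries that
    would have been serialized across all rewrites.
    """
    remaining = list(downloads)
    entries_written = 0
    for download in downloads:
        if download.finished:
            remaining.remove(download)
            # Simulated full rewrite of the store after every single removal.
            entries_written += len(remaining)
    return remaining, entries_written


def cleanup_single_pass(downloads):
    """Filter out all finished entries first, then 'rewrite' the list once."""
    remaining = [d for d in downloads if not d.finished]
    return remaining, len(remaining)
```

With n finished entries, the first approach serializes on the order of n squared entries in total, while the second serializes only what is left. If the real cleanup behaved like the first function, that alone could plausibly account for minutes of CPU time at a few thousand entries.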
OFF TOPIC: Why don't you use an automatic spider such as http://www.httrack.com/ to download the files?

Could you try the latest nightly build (Trunk/Minefield)? The download manager backend has been revised, and you could tell us how it performs compared to the old one.
Severity: enhancement → normal
Summary: Cleanup takes up to 10 minutes @90% on large number of files. → Cleanup takes up to 10 minutes and 90% CPU with large number of downloads
Version: unspecified → 2.0 Branch
Thank you for answering. I have tried using MetaProducts Offline Explorer, which worked very well for other websites in the past, but this one is somewhat different. The documents I need to save can only be accessed by selecting a category and then a subcategory that is loaded using Ajax. After this, I have to enter a few keywords related to that category in a form to get the actual list of documents, one document per page, up to 35,000-40,000 documents one after another. If I don't enter any keywords, only the first 12 results are shown, and that doesn't help. The link is also in an iframe on the page, along with a text preview of that document. It's really nasty. So it's not really possible to automate it as much as I'd like.

I'll try the latest nightly build later on, and I hope I won't forget to update you with a message. Right now it's a bit hard to change the browser, because I have to reach a daily quota of downloaded files if I want to download the whole site by the end of the month, and the server was timing out for a few hours today (it's somewhere in Zair, South Africa, on a relatively unstable connection). So it's a bit hard to stop the script right now.
I would love to get a copy of your downloads.rdf file for perf testing if you were alright with that. For Firefox 3 we actually switched from using rdf to store this information to using sqlite, so I suspect there will be a large performance win.
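As a rough illustration of why a SQL-backed store makes cleanup cheap, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `downloads` table. The schema, the state codes, and the function names are assumptions made for this example and are not the actual Firefox 3 download manager schema. It shows the "one query" cleanup the reporter asked for, plus the suggested cap on how many entries are kept.

```python
import sqlite3

# Hypothetical state codes, chosen for this example only.
DOWNLOADING, FINISHED = 0, 1


def open_store(path=":memory:"):
    """Open (or create) a download store with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " state INTEGER NOT NULL)"
    )
    return conn


def cleanup(conn):
    """Delete every entry that is not still downloading, in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM downloads WHERE state != ?", (DOWNLOADING,)
        )
    return cur.rowcount


def prune_oldest(conn, keep):
    """Keep only the newest `keep` rows; never delete active downloads."""
    with conn:
        cur = conn.execute(
            "DELETE FROM downloads"
            " WHERE state != ?"
            " AND id NOT IN ("
            "   SELECT id FROM downloads ORDER BY id DESC LIMIT ?)",
            (DOWNLOADING, keep),
        )
    return cur.rowcount
```

Both operations are a single statement inside a single transaction, so the cost is one pass over the table and one commit rather than one rewrite per removed entry.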
The downloads.rdf file, compressed with rar. The domain was replaced with mysite.com, but the old domain had exactly the same length as "mysite.com", so this file has the exact number of bytes, the exact URL lengths and so on. There are about 2012 records inside (this is the number the replace function returned).
Thanks - trunk is looking a lot better than branch, but there are still some perf issues to be looked into (namely, updating the UI)
Similar bugs were duped to bug 240525, but Shawn will probably keep this bug open for extra work on the UI.
Actually, I'm not sure what to do with this bug. We won't have to worry about the UI after the UI overhaul since the cleanup button is going away (still available in clear private data). The backend doesn't seem to have a problem handling this at all :)
Resolving old UNCONFIRMED Download Manager bugs as INCOMPLETE. If you still see this issue, please reopen. To mark all these bug changes as read, filter on ONOMATOPOEIA.
Status: UNCONFIRMED → RESOLVED
Closed: 18 years ago
Resolution: --- → INCOMPLETE
Product: Firefox → Toolkit