Closed Bug 69009 Opened 24 years ago Closed 24 years ago

Loading bookmarks is O(n**2)

Categories

(SeaMonkey :: Bookmarks & History, defect)

All
Windows NT
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: dveditz, Assigned: waterson)

Details

(Keywords: perf)

Attachments

(2 files)

Loading bookmarks--which happens at startup--has O(n**2) behavior. On my Windows machine my accumulated ~250 bookmarks take about 2.5% of my normal 6-second startup. With a 1111-bookmark file my startup takes 42 seconds, and with larger files it gets worse quickly.

The culprit appears to be BookmarkParser::ParseBookmarkInfo, which gets called once per bookmark and calls RDFContainerImpl::IndexOf() to weed out duplicate bookmarks. 85% of the time spent in the 1111 IndexOf() calls is in 797,956 calls to InMemoryAssertionEnumeratorImpl::HasMoreElements, and 10% of the time is in 398,978 calls to RDFContainerUtilsImpl::IndexToOrdinalResource. (There were an additional 401,223 calls to HasMoreElements from InMemoryAssertionEnumeratorImpl::GetNext, but they account for only a tenth of a percent of the time spent in that function.) There were an astounding 224 MILLION calls to Assertion::Release, which is a quick routine but in that volume still managed to account for about 8% of the total startup time.

The sad thing is that there are NEVER any duplicates on startup, because they've long since been weeded out (except in the rare case someone has hand-edited their bookmarks file).

Might be related to the crash in bug 46882, although if the guy's bookmark file is really close to a meg I'm surprised it ever finished -- I tried an 8000-item bookmarks file and gave up after 10 minutes or so trying to start the product. Probably also related to bug 62907, high CPU use with large bookmarks from all those IndexOf calls. Bug 10198, assigned to Ben, is an odd one: it says we should warn the user if he tries to add a duplicate, but the current code will simply make it go away. Bug 52199 reports that adding duplicates fails silently, and asks (without ever receiving an answer) whether this was a formal UI decision and if so why. Good question! In light of the performance cost of making them go away, I hope we have a very good reason.
If we really do have to strip duplicate URLs (and again I have to ask why) then we might want to switch the algorithm to match on a hash table rather than using O(n**2) RDFContainer::IndexOf. It'd cost more memory (especially on a large bookmark file which is where we'd need the performance gain) but its memory we could probably toss at the end of the bookmark parsing step. Doing an IndexOf() when adding a single bookmark might not kill us as badly (though of course we need to verify that).
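The hash-table idea above can be sketched roughly as follows. This is a minimal, hypothetical illustration (ParseUnique is not a real function in the bookmarks service; the real code operates on RDF resources, not strings): a hash set makes each duplicate check O(1) on average instead of the O(n) walk that RDFContainer::IndexOf does, turning the whole parse from O(n**2) into O(n). The set can be thrown away when parsing finishes, so the extra memory is only held during the load.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: dedupe bookmark URLs during parsing with a hash set
// instead of a per-bookmark linear IndexOf() scan over the container.
std::vector<std::string> ParseUnique(const std::vector<std::string>& urls) {
    std::unordered_set<std::string> seen;   // O(1) average lookup/insert
    std::vector<std::string> kept;
    for (const auto& url : urls) {
        // insert() returns {iterator, bool}; .second is true iff url was new
        if (seen.insert(url).second)
            kept.push_back(url);
    }
    return kept;   // `seen` is freed here, so the memory cost is transient
}
```

For 1111 bookmarks this replaces roughly 1111 container walks with 1111 hash lookups, which is where the startup win would come from.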
IMHO: 1) Why check for dups at startup? If the user duplicated a bookmark that's their own problem; besides, there may be reasons why dups are wanted in complex hierarchies (Shopping->Computers->Bookstores->Amazon, MyBooks->Amazon, but obviously more complex). 2) How do we determine a duplicate? Based on what keys? URL? Even this may prevent the user from doing what they want to do. 3) We should be looking at 2) when we decide whether a new bookmark addition is a duplicate. I'm not convinced dup-checking is needed at all, at least not off the top of my head. Just some thoughts.
perf keyword, sorry for spam
Keywords: perf
It is a little bit off topic, but I checked what files Mozilla (including bookmarks) opens on startup. Results are below. Isn't it strange that every file is opened 3 times??? Maybe there is a way to reduce it, so Mozilla will start faster.

mhankus opened file .mozilla/Default/izb81h28.slt/prefs.js read=Yes write=No (numopen=1)
mhankus opened file .mozilla/Default/izb81h28.slt/prefs.js read=Yes write=No (numopen=1)
mhankus opened file .mozilla/Default/izb81h28.slt/prefs.js read=Yes write=No (numopen=1)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-skins.rdf read=Yes write=No (numopen=2)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-skins.rdf read=Yes write=No (numopen=2)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-skins.rdf read=Yes write=No (numopen=2)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-locales.rdf read=Yes write=No (numopen=3)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-locales.rdf read=Yes write=No (numopen=3)
mhankus opened file .mozilla/Default/izb81h28.slt/chrome/user-locales.rdf read=Yes write=No (numopen=3)
mhankus opened file .mozilla/Default/izb81h28.slt/localstore.rdf read=Yes write=No (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/localstore.rdf read=Yes write=No (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/localstore.rdf read=Yes write=No (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/bookmarks.html read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/bookmarks.html read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/bookmarks.html read=Yes write=No (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/search.rdf read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/search.rdf read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/search.rdf read=Yes write=No (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/history.dat read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/history.dat read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/history.dat read=Yes write=Yes (numopen=4)
mhankus opened file .mozilla/Default/izb81h28.slt/cookies.txt read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/cookies.txt read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/cookies.txt read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/panels.rdf read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/panels.rdf read=Yes write=No (numopen=5)
mhankus opened file .mozilla/Default/izb81h28.slt/panels.rdf read=Yes write=No (numopen=4)
Blocks: 7251
Stripping duplicate bookmarks is BAD! I shall not be denied my God-given right to make duplicate bookmarks! But I do understand the reasoning behind it. It isn't difficult to accidentally create a bookmark either by slip-of-the-mouse or by hitting Ctrl+D, and there are many people out there who, having created a bookmark, would have no clue how to remove it (yes, indeed, I support many such users at my company; they are invariably the same ones who always have two or three "New Folder"s scattered around their desktop and Start menu :) Anyway, the new prompt-for-info-when-adding-a-bookmark dialogue (which doesn't appear in my current build for some reason, but I know I have seen it) protects such people from themselves far better than any silly duplicate stripping would.
No longer blocks: 7251
Another thought: I can see why you might not want duplicate bookmarks in the same folder, so would it help performance much if you only searched for a duplicate within the same bookmark folder?
> If we really do have to strip duplicate URLs (and again I have to ask why)

Unfortunately we do, because RDF uses URLs as unique references. 99% of us probably agree that the bug lies there and should be fixed.
This appears to have been a kludge put in by rjc to fix bug 51021. I suspect that removing this will cause some people to start seeing duplicate bookmarks again, and we'll have to then fix that problem the right way.

http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvsroot&subdir=mozilla/xpfe/components/bookmarks/src&command=DIFF_FRAMESET&file=nsBookmarksService.cpp&rev2=1.146&rev1=1.145
Target Milestone: --- → mozilla0.9
I won't argue about when and whether to strip duplicate bookmarks, but know this: allowing duplicate bookmarks opens a huge can of worms, and I don't think the code handles them well at all. There are already a bunch of bugs in the system related to this. In a nutshell, creating or manipulating a duplicate bookmark typically resulted in the user silently losing one under several different scenarios.
Yeah, I remember those bugs. But I never understood why we had them: we're using RDF sequences to maintain the list of bookmarks in a folder, so (in theory) we should be able to distinguish >1 bookmark in the same folder with the same URL.
I think some of those bugs might have been related to copying folders from one place to another, which in the old FE would just add the copied folder resource into the destination, creating an alias to the original rather than a bona fide copy. The new FE actually creates a new anonymous resource for the folder on a copy, and places the contents into the new resource.
This patch seems fine, given that in a container, adding a duplicate would yield:

                      -- _1 --> http://www.foo.com/
  Container Resource  -- _2 --> http://www.bar.com/
                      -- _3 --> http://www.foo.com/

Delete _3:

                      -- _1 --> http://www.foo.com/
  Container Resource  -- _2 --> http://www.bar.com/
                      -- X ---> http://www.foo.com/

http://www.foo.com/ still exists in the graph, as item _1. _1 and _3 are completely in sync, and are updated as such (change a property on _1 and it changes on _3). This is a bug, but seems harmless enough. r=ben@netscape.com for the patch to remove the hack.
oops, thought I sr'ed here already. darn bugzilla security!
Fix checked in. Be vigilant for the re-appearance of bug 51021.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Possible recurrence of bug 51021. I had a bookmark file grow to 1.8MB the other day, with 2 or 3 entries in one folder duplicated over and over. Unfortunately, I have no idea what caused it, and it isn't happening again. But a heads-up that, indeed, it may be possible to trigger it.
Oy. After some investigation, I discovered that this seems to be causing bug 74969 (importing a bookmarks file into itself causes the file to blow up). It seems to be related to duplicates; the old checking code was preventing this. Rather than just adding that old code back, here's a proposed solution: bookmark folders are currently being stored with an ID set to the anonymous resource URI. Scrap this. Generate the resource URI for each folder as it is created. This means that the new folder structure that is imported is unique.
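The proposal above amounts to minting a fresh URI for every folder as it is created, rather than reusing the stored one. A minimal sketch, under stated assumptions (NewFolderURI and the "rdf:#$folder" prefix are hypothetical stand-ins; the real service would use its own anonymous-resource naming and uniqueness source):

```cpp
#include <string>

// Hypothetical sketch of the proposed fix: never reuse the anonymous
// resource URI stored in the bookmarks file for a folder. Minting a new
// URI per created folder means importing a file into itself produces
// distinct folder resources instead of aliases of the originals.
std::string NewFolderURI() {
    static unsigned long gNextFolderID = 0;   // stand-in uniqueness source
    return "rdf:#$folder" + std::to_string(++gNextFolderID);
}
```

Because two calls never return the same URI, an imported folder can no longer collide with the folder it was copied from, which is what makes the self-import blow-up possible.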
mass-verifying claudius' Fixed bugs which haven't changed since 2001.12.31. if you think this particular bug is not fixed, please make sure of the following before reopening: a. retest with a *recent* trunk build. b. query bugzilla to see if there's an existing, open bug (new, reopened, assigned) that covers your issue. c. if this does need to be reopened, make sure there are specific steps to reproduce (unless already provided and up-to-date). thanks! [set your search string in mail to "AmbassadorKoshNaranek" to filter out these messages.]
Status: RESOLVED → VERIFIED
Product: Browser → Seamonkey