Closed Bug 287066 Opened 17 years ago Closed 5 years ago
History automatic indexing (full text indexing of pages content)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1

History is useful for finding pages you visited some time ago. However, when you browse a site, every page often has the same title, so the history shows tens of links with the same title and no way to tell which page is relevant to you. Also, when you browse the Web, you can go through a very large number of sites and not remember the title, name or URL of a site at all. You only remember some keywords related to the content of the page (e.g. firefox, block, image for http://adblock.mozdev.org/).

A very nice feature would be automatic indexing of the history (as performed by a web crawler). When you access a web page, it is added to the history and automatically indexed for future searches. When you do a fast history search, the search is performed on the title and also on the index database.

Related to https://bugzilla.mozilla.org/show_bug.cgi?id=286544, which is about bookmarks.

Bug https://bugzilla.mozilla.org/show_bug.cgi?id=126621 presents a cache-based solution for Mozilla. However, this solution has many drawbacks (limited number of pages cached, slower than keyword indexing, the cache can be flushed, etc.).

Reproducible: Always
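To make the proposal concrete, here is a minimal in-memory sketch of "index on visit, search title plus index". It assumes a plain inverted index (word -> set of URLs) and a naive tokenizer; the class and method names (HistoryIndex, add_visit, search) are hypothetical and not part of any Firefox API.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lower-case alphanumeric runs only.
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


class HistoryIndex:
    """Hypothetical model: each visited page is indexed like a crawler would."""

    def __init__(self):
        self.titles = {}                  # url -> page title
        self.postings = defaultdict(set)  # word -> urls whose content contains it

    def add_visit(self, url, title, content):
        """Record a visit and add every content word to the inverted index."""
        self.titles[url] = title
        for word in tokenize(content):
            self.postings[word].add(url)

    def search(self, query):
        """Return URLs whose title or indexed content contains every query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = []
        for url, title in self.titles.items():
            title_words = set(tokenize(title))
            # .get() avoids creating empty postings for unknown words
            if all(w in title_words or url in self.postings.get(w, ()) for w in words):
                hits.append(url)
        return sorted(hits)
```

With two pages that share the title "Adblock", a title-only search cannot tell them apart, but the query "firefox block image" only matches the page whose content mentions those words. The linear scan over titles keeps the sketch short; a real implementation would index titles too.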
I have an argument similar to the one in https://bugzilla.mozilla.org/show_bug.cgi?id=286544. The usual starting point for finding a site is Google. However, the first pages of results do not always contain the links you are looking for, because:
- the site is not well ranked
- you do not remember the important keywords
One solution would be to automatically build your own database of sites and search in it. A unified database of bookmarks and history links would be an elegant solution: bookmarks would be history links with a given rating.
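The "unified database" idea can be sketched as a single SQLite table in which a bookmark is just a history row with a non-NULL rating. This is a hypothetical schema using Python's standard sqlite3 module; the table name, columns and the 1..5 rating scale are assumptions, not the actual Places schema.

```python
import sqlite3

# Hypothetical unified schema: one table of visited links; a bookmark is a
# link whose rating is not NULL (the 1..5 scale is an assumption).
SCHEMA = """
CREATE TABLE links (
    id     INTEGER PRIMARY KEY,
    url    TEXT NOT NULL UNIQUE,
    title  TEXT,
    rating INTEGER            -- NULL = plain history entry, 1..5 = bookmark
);
"""


def open_db():
    """Create an in-memory database with the unified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def visit(conn, url, title):
    """Insert a history entry; an existing row (and its rating) is kept."""
    conn.execute("INSERT OR IGNORE INTO links (url, title) VALUES (?, ?)", (url, title))


def bookmark(conn, url, rating):
    """Turn an existing history entry into a bookmark by giving it a rating."""
    conn.execute("UPDATE links SET rating = ? WHERE url = ?", (rating, url))


def bookmarks(conn):
    """Bookmarks are simply the rated links, best-rated first."""
    rows = conn.execute(
        "SELECT url FROM links WHERE rating IS NOT NULL ORDER BY rating DESC, url"
    )
    return [url for (url,) in rows]
```

One search over the links table then covers history and bookmarks alike, and the rating can be used to rank bookmarked results first.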
Status: RESOLVED → UNCONFIRMED
Resolution: EXPIRED → ---
History and bookmarks are being reworked for 2.0 (http://wiki.mozilla.org/Places). However, I don't think full text indexing is being implemented.
Assignee: bugs → nobody
QA Contact: mozilla → history
Re-sorting this enhancement into Places...
Component: History → Places
QA Contact: history → places
Very interesting, at least as an extension. I'm wondering how large the database would be, though. Let's guess for English: 5000 words are supposed to cover 98.5% of the words used. The average English word length is said to be 5 characters. We have to add 1 byte for the varchar, 4 bytes for the (integer) count and 4 bytes for the page id (an integer too). So we have 5000 x (5+1+4+4) = 70 000 bytes ~= 68 KiB. Admittedly 5000 is very few, but even with 500 000 words (a little less than all English words), we end up with a database of less than 7 MiB, so the size is not a problem.
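The estimate is easy to re-run. A small sketch that reuses the comment's own per-word assumptions (5 characters, 1 byte of varchar overhead, a 4-byte count and a 4-byte id, i.e. 14 bytes per word); the constant names are made up for illustration.

```python
# Per-word cost, using the comment's assumptions (names are illustrative).
AVG_WORD_LEN = 5       # characters per English word
VARCHAR_OVERHEAD = 1   # length byte for the varchar
COUNT_BYTES = 4        # integer count
ID_BYTES = 4           # integer id

BYTES_PER_WORD = AVG_WORD_LEN + VARCHAR_OVERHEAD + COUNT_BYTES + ID_BYTES  # 14


def words_table_bytes(n_words):
    """Estimated size of the words table for n_words distinct words."""
    return n_words * BYTES_PER_WORD


for n in (5_000, 500_000):
    size = words_table_bytes(n)
    print(f"{n:>7} words: {size:>9} bytes = {size / 1024:8.1f} KiB = {size / 1024**2:5.2f} MiB")
```

At 14 bytes per word, 5000 words come to 70 000 bytes and 500 000 words to 7 000 000 bytes, which is indeed just under 7 MiB.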
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: History automatic indexing → History automatic indexing (full text indexing of pages content)
Read "word id" instead of "page id". But the size estimate is wrong anyway, since SQLite uses variable-size integers. It would be more like 5+1+3+3, since word ids and page ids aren't likely to exceed 8 388 607 (the largest value a 3-byte signed integer can hold). The words table could be smaller than 7 MiB, but what would be big is the table that stores page/word pairs. Its structure would be id - word id - page id - word count, and the size per record would be 4+3+3+2 = 12 bytes. But how many records would we have? Say the mean number of unique words per page is 1000. For the default maximum number of pages (40 000), we would get 1000 x 40 000 x 12 = 480 000 000 bytes, a little more than 450 MiB. Plus a little more for the words table, not counting the indexes.
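A sketch of the revised estimate. The integer sizes follow SQLite's record format, where an integer column value is stored in 1, 2, 3, 4, 6 or 8 bytes (serial types 1 to 6) depending on its magnitude; the 0-byte encodings for the constants 0 and 1, the per-record header and page overhead are all ignored here. The function names are hypothetical.

```python
# (bytes, exclusive bound) for SQLite integer serial types 1-6: 8-, 16-, 24-,
# 32-, 48- and 64-bit signed big-endian integers.
INT_SIZES = [(1, 2**7), (2, 2**15), (3, 2**23), (4, 2**31), (6, 2**47), (8, 2**63)]


def sqlite_int_bytes(value):
    """Bytes SQLite needs to store `value` in a record (0/1 special cases ignored)."""
    for size, bound in INT_SIZES:
        if -bound <= value < bound:
            return size
    raise OverflowError("value does not fit in a 64-bit integer")


def pairs_table_bytes(pages, unique_words_per_page, record_bytes):
    """Estimated size of the page/word table (data only, no indexes)."""
    return pages * unique_words_per_page * record_bytes


# Record layout from the comment: id (4) + word id (3) + page id (3) + word count (2).
RECORD_BYTES = 4 + 3 + 3 + 2

size = pairs_table_bytes(40_000, 1_000, RECORD_BYTES)
print(size, "bytes =", round(size / 1024**2, 1), "MiB")
```

The 3-byte figure for ids holds up to 8 388 607; the next id would need 4 bytes, so the estimate degrades gracefully rather than breaking.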
Places received great feedback from Firefox 3 beta users because it partially solves a very old problem with bookmarks: we bookmark a lot but do not remember accurately enough what we bookmarked to find it again. With full page indexing of bookmarks and history, you bring users a Google-like service tailored to their needs. This is really a killer feature. Indeed, a lot of people use Google as a start page, and this is nonsense, as what you want is very often something you have already looked for, and therefore something already in your bookmarks or history.
If you tell me it takes 450 MiB to have this feature for 40 000 pages (which sounds like a lot), I would say it is fine. We all already spend much more on IMAP offline support and various indexing services. Moreover, simple heuristics can reduce the size of the indexes:
1) Limit the number of words indexed per page (for instance to 100), indexing the most frequent words first (ignoring words like a, the, etc.). Words in the title and HTML headers should always be indexed.
2) Implement the long-awaited bookmark sanity check that checks all bookmarks on demand and proposes removing those that return a 404 error.
3) Give a UI preference to index bookmarks only, or bookmarks and history.
4) Set a hard limit on the size of the index and then use a FIFO policy. FIFO is probably not the best policy in this case, as pages browsed a long time ago are the ones that really need help to be found (you are unlikely to remember where the bookmark is).
I strongly believe this is a must-have feature that everybody wants, as it will dramatically improve the usability of bookmarks and history.
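Heuristics 1 and 4 can be sketched in a few lines. This assumes a tiny hypothetical stop-word list and counts index size in (word, page) entries; words_to_index and CappedIndex are illustrative names. FIFO eviction is used only because the comment proposes it; as the comment points out, it may evict exactly the old pages users most need help finding, so another policy could be swapped in.

```python
import re
from collections import Counter, OrderedDict

# Hypothetical stop-word list; a real one would be longer and per-language.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}
WORD_RE = re.compile(r"[a-z0-9]+")


def words_to_index(title, headings, body, limit=100):
    """Heuristic 1: always keep title/heading words, then add the most frequent
    body words (stop words skipped) until `limit` words are selected."""
    selected, seen = [], set()
    for text in [title, *headings]:
        for w in WORD_RE.findall(text.lower()):
            if w not in STOP_WORDS and w not in seen:
                seen.add(w)
                selected.append(w)
    counts = Counter(w for w in WORD_RE.findall(body.lower()) if w not in STOP_WORDS)
    for w, _ in counts.most_common():  # ties keep first-seen order
        if len(selected) >= limit:
            break
        if w not in seen:
            seen.add(w)
            selected.append(w)
    return selected


class CappedIndex:
    """Heuristic 4: hard cap on (word, page) entries with FIFO eviction."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.pages = OrderedDict()  # url -> indexed words, oldest first
        self.entries = 0

    def add(self, url, words):
        if url in self.pages:  # re-indexing a page replaces its old entries
            self.entries -= len(self.pages.pop(url))
        stored = list(words)
        self.pages[url] = stored
        self.entries += len(stored)
        # Evict the oldest pages, but never the page just added.
        while self.entries > self.max_entries and len(self.pages) > 1:
            _, old_words = self.pages.popitem(last=False)
            self.entries -= len(old_words)
```

For example, a page titled "Adblock" with body "the block image block image block ads" and limit=4 keeps the title and heading words first, then "block" and "image", and drops "ads".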
I just stumbled across Breadcrumbs(1), which does exactly what this bug is about. It hasn't been updated since 10.02.2007, but it works perfectly, even under the soon-to-be-released Firefox 3 with Nightly Tester Tools(2). It looks like the author needs a little bit of motivation, so feel free to tell him how much you like his add-on by adding a review on the add-on's page. (1)https://addons.mozilla.org/fr/firefox/addon/2954 (2)https://addons.mozilla.org/fr/firefox/addon/6543
We're not going to index content of the pages in Places (it may happen in Activity Stream or elsewhere, but for sure not in Places).
Status: NEW → RESOLVED
Closed: 16 years ago → 5 years ago
Resolution: --- → WONTFIX