Quick suggest uses the KeywordTree class to map keywords to suggestions and to look up a suggestion given a keyword. KeywordTree is essentially a trie (prefix tree): each node in the tree is a single character, and each keyword in the dataset corresponds to a path in the tree. In theory this is a perfectly good way to store the mapping from keywords to suggestions, but while investigating the memory usage of larger dataset sizes, I found that its memory footprint is far bigger than I expected. In fact it's much bigger than keeping a single global Map whose keys are the full keywords. Judging by about:memory, the problem seems to be all the Map instances that are created as the nodes of the tree.
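To illustrate where the Map instances come from, here's a minimal sketch of a Map-per-node trie in the style of KeywordTree. This is not the actual UrlbarQuickSuggest code; the class and method names are made up for illustration.

```javascript
// Minimal trie sketch: every node is its own Map, so a dataset with many
// keywords allocates one Map per distinct character position in the tree.
class KeywordTreeSketch {
  constructor() {
    this.root = new Map();
  }

  add(keyword, suggestion) {
    let node = this.root;
    for (let char of keyword) {
      // One Map allocated per character node -- this is where the memory goes.
      if (!node.has(char)) {
        node.set(char, new Map());
      }
      node = node.get(char);
    }
    // Empty-string key marks the end of a complete keyword.
    node.set("", suggestion);
  }

  get(keyword) {
    let node = this.root;
    for (let char of keyword) {
      node = node.get(char);
      if (!node) {
        return null;
      }
    }
    return node.get("") ?? null;
  }
}
```

A keyword of length n touches n + 1 Maps on lookup, and the tree as a whole holds roughly one Map per unique prefix in the dataset, which is why the per-suggestion overhead dwarfs a single flat Map keyed on full keywords.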
I used about:memory to run some numbers with several different lookup implementations. I'll attach a (messy) patch that includes them and that also generates mock suggestion data so large datasets can be tested.
The implementations are:
KeywordTree - The current implementation as described above.
KeywordTree2 - The same as KeywordTree except it uses plain JS objects instead of Maps. I wanted to see if using plain JS objects would improve the memory footprint. It did.
resultsByKeyword - Uses a single Map that maps from full keyword strings to suggestions (called "results" inside UrlbarQuickSuggest).
resultsByID - Uses two Maps. One Map, resultIDsByKeyword, maps from full keyword strings to suggestion IDs. The other Map, resultsByID, maps from suggestion IDs to suggestions. I wanted to see if specifically keeping a single reference to each suggestion object, instead of one per keyword, would improve the footprint, but of course it did not: objects are not copied when they are added as values to a Map, so resultsByKeyword already stores only one copy of each suggestion.
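Here's a sketch of the two flat-Map variants for comparison. The data is invented; only the shapes of the Maps mirror the implementations described above.

```javascript
// resultsByKeyword: one Map from full keyword strings to suggestion objects.
let suggestion = { id: 1, title: "Example suggestion" };
let resultsByKeyword = new Map();
for (let keyword of ["example", "examples", "exampel"]) {
  // The Map stores a reference, not a copy, so every keyword for the same
  // suggestion points at the same underlying object.
  resultsByKeyword.set(keyword, suggestion);
}

// resultsByID: two Maps, keyword -> suggestion ID and ID -> suggestion.
let resultIDsByKeyword = new Map(
  ["example", "examples", "exampel"].map(k => [k, suggestion.id])
);
let resultsByID = new Map([[suggestion.id, suggestion]]);

function lookupByID(keyword) {
  return resultsByID.get(resultIDsByKeyword.get(keyword));
}
```

Since both variants end up holding exactly one reference per suggestion object, the indirection through IDs buys nothing memory-wise, which matches the near-identical numbers in the resultsByKeyword vs resultsByID comparison below.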
Currently the Suggest dataset contains 1,007 suggestions. I tested mock datasets with 1x, 2x, 10x, and 100x suggestions, where "x" is the current number of suggestions.
Below are three sets of results. Each set is labeled "X vs Y", meaning the memory diff between X and Y according to about:memory. For each set, I tested datasets with 1x, 2x, 10x, and 100x suggestions. A negative value means Y had a smaller footprint than X. A positive value means Y had a bigger footprint. "B" means bytes.
KeywordTree vs resultsByKeyword
1x: -11,841,120 B
2x: -82,398,492 B
10x: -638,278,891 B
100x: -6,584,932,099 B
KeywordTree vs KeywordTree2
1x: -8,565,470 B
2x: -65,262,421 B
10x: -516,379,233 B
100x: -5,337,783,878 B
resultsByKeyword vs resultsByID
1x: +1,459,470 B
2x: +78,123 B
10x: -831,868 B
100x: +6,139,777 B
resultsByKeyword had the smallest footprint, although it's kind of a wash with resultsByID. I don't think the diff between those two is very meaningful, so we should prefer the simpler implementation with a single Map.
KeywordTree had the largest footprint by far.
KeywordTree2 improved on that but its footprint is still huge.
I also compared the memory footprints of different dataset sizes within implementations. The following are the memory diffs between when Suggest is disabled vs. particular dataset sizes, per implementation. I'll only list KeywordTree and resultsByKeyword, since KeywordTree is the current implementation and resultsByKeyword is the best performing one.
KeywordTree:
1x: +13,570,062 B
2x: +93,375,734 B
10x: +715,341,030 B
100x: +7,375,708,991 B
resultsByKeyword:
1x: +2,805,774 B
2x: +12,054,074 B
10x: +78,138,971 B
100x: +791,853,724 B
These results say that increasing the dataset size by 100x uses ~7 GiB of memory (!!!) with the current KeywordTree implementation but ~755 MiB with the resultsByKeyword implementation.
755 MiB is still a lot, and at some point we may need to think about not keeping all suggestions in memory all the time. But right now let's switch to the resultsByKeyword implementation.