Open
Bug 470471
Opened 17 years ago
Updated 3 years ago
hunspell should use memory more intelligently
Categories
(Core :: Spelling checker, defect)
Tracking
()
NEW
People
(Reporter: vlad, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: memory-footprint, perf)
The first time hunspell is initialized, it loads the dictionary from disk. This uses about 2MB of memory, most of it allocated in HashMgr::add_word. Every entry in the hash table is a separate one-off malloc, so there could also be a lot of slop if the allocator rounds each entry up to a larger size.
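One common remedy for "one malloc per small entry" is an arena (bump) allocator. It carves entries out of a few large blocks and frees them all together. The sketch below is illustrative only. `Arena`, `Entry` and `make_entry` are hypothetical names, and `Entry` is not Hunspell's actual hash entry struct or `HashMgr` code. Each word is stored inline right after a small header, so adding a word costs one pointer bump instead of one heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <string_view>
#include <vector>

// Hypothetical bump ("arena") allocator: hands out many small allocations from
// a few large blocks instead of calling malloc once per dictionary word.
// Everything is released together when the arena is destroyed.
class Arena {
 public:
  explicit Arena(std::size_t block_size = 64 * 1024) : block_size_(block_size) {}

  // `align` must be a power of two no larger than alignof(std::max_align_t).
  void* allocate(std::size_t n, std::size_t align) {
    std::size_t offset = (used_ + align - 1) & ~(align - 1);  // round up
    if (blocks_.empty() || offset + n > current_size_) {
      // Start a new block; oversized requests get a block of their own.
      std::size_t size = n > block_size_ ? n : block_size_;
      blocks_.push_back(std::make_unique<unsigned char[]>(size));
      current_size_ = size;
      offset = 0;
    }
    used_ = offset + n;
    return blocks_.back().get() + offset;
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  std::size_t block_size_;
  std::size_t current_size_ = 0;
  std::size_t used_ = 0;
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical hash entry: a fixed header immediately followed by the word's
// bytes, so each word costs one bump allocation rather than one malloc.
struct Entry {
  Entry* next;      // hash-chain link
  std::size_t len;  // word length in bytes
  std::string_view word() const {
    return {reinterpret_cast<const char*>(this + 1), len};
  }
};

Entry* make_entry(Arena& arena, std::string_view w, Entry* next) {
  void* mem = arena.allocate(sizeof(Entry) + w.size(), alignof(Entry));
  Entry* e = new (mem) Entry{next, w.size()};
  std::memcpy(e + 1, w.data(), w.size());  // copy the word bytes after the header
  return e;
}
```

Entries are trivially destructible, so nothing has to be freed one by one. Per-entry rounding slop is limited to alignment padding, and unused space appears only at the tail of each block.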
Instead, since the dictionary data is static, I'd suggest that we come up with an on-disk binary format that we can just mmap instead of rebuilding the table on every run. We would also need to figure out how to mix in non-static data (user dicts and the like).
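One possible way to mix non-static data into a static table is to keep two layers. The first is an immutable sorted table that could live in a read-only mapping. The second is a small mutable set for words added at runtime, such as a user dictionary. A lookup consults both. This is a minimal sketch under that assumption. `LayeredDictionary` is a hypothetical name, and a sorted vector with binary search stands in for whatever on-disk structure would actually be used.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical two-layer word list: an immutable, sorted table of views into
// static data (e.g. an mmap'd file) plus a small owned set for user words.
class LayeredDictionary {
 public:
  explicit LayeredDictionary(std::vector<std::string_view> sorted_static)
      : static_(std::move(sorted_static)) {}

  // Runtime additions never touch the static layer.
  void add_user_word(std::string word) { user_.insert(std::move(word)); }

  bool contains(std::string_view word) const {
    if (std::binary_search(static_.begin(), static_.end(), word)) return true;
    return user_.count(std::string(word)) > 0;
  }

 private:
  std::vector<std::string_view> static_;  // views into never-modified data
  std::unordered_set<std::string> user_;  // owned copies of words added at runtime
};
```

Only the user layer allocates per word, and it is typically tiny compared with the main dictionary.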
Comment 1•17 years ago
Hrm... if we're guaranteed that the on-disk file won't change (is that a safe assumption?), we could just mmap it and store pointer+length pairs for everything, no?
I'm certainly hoping we won't have to muck with the file format: it's a standard external format.
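The "mmap it and store pointer+length pairs" idea can be sketched without changing the file format. Map the standard .dic file read-only and keep `std::string_view`s (pointer + length) into the mapping instead of copying each word. This is a hedged, POSIX-only sketch. `MappedFile` and `parse_dic` are hypothetical names. The parser is deliberately simplified: it skips the first line (the entry count), ignores escaped slashes, and drops everything after `/` or a tab. As comment 1 notes, this relies on the file not changing while it is mapped. A file truncated underneath a mapping can fault on access.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file; pages are loaded lazily by the OS and
// can be shared between processes that map the same file.
class MappedFile {
 public:
  explicit MappedFile(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                       MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const char*>(p);
        size_ = static_cast<std::size_t>(st.st_size);
      }
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }
  ~MappedFile() {
    if (data_) ::munmap(const_cast<char*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool ok() const { return data_ != nullptr; }
  std::string_view view() const { return {data_, size_}; }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
};

// Split .dic-style text into views of each word (no copies). Simplified: the
// first line is skipped and anything after '/' or a tab is dropped.
std::vector<std::string_view> parse_dic(std::string_view text) {
  std::vector<std::string_view> words;
  bool first = true;
  while (!text.empty()) {
    std::size_t nl = text.find('\n');
    std::string_view line = text.substr(0, nl);
    text = (nl == std::string_view::npos) ? std::string_view{} : text.substr(nl + 1);
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (first) { first = false; continue; }
    std::string_view word = line.substr(0, line.find_first_of("/\t"));
    if (!word.empty()) words.push_back(word);
  }
  return words;
}
```

Typical use would be `MappedFile f(path); auto words = parse_dic(f.view());`. The views stay valid only as long as `f` is alive.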
Comment 2•17 years ago
It's not like it'd be rocket science to make a python script to convert it at build time though, right? Or are you concerned about being able to drop a new dictionary file into an existing build?
Comment 3•17 years ago
Yeah, people ship additional dictionaries in extensions, copying them wholesale from OOo. I'd hate to lose that ability, and I don't think changing the disk format is necessary to solve this bug.
Comment 4•17 years ago
See bug 468779.
Comment 5•17 years ago (Reporter)
Bug 468779 looks like a great place to start. I'm not proposing that we remove the ability to use additional dictionaries, only that we optimize the main and most common use case. Those optimizations can apply to additional dictionaries as well: it's not unreasonable to do a one-time preprocessing step that dumps data to disk the first time we see a dictionary, and we can preprocess the default dictionary when we run compreg before packaging things up.
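The "preprocess once, reuse afterwards" step could look roughly like the following. This is a sketch under several assumptions. The function names and the cache layout are hypothetical. A simple modification-time comparison stands in for cache invalidation, where a content hash would be more robust. The cache is written in native byte order, so it is not portable across architectures. The layout is a `u32` count, then `u32` offsets `[count + 1]`, then all word bytes back to back. That keeps it mmap-friendly, since word `i` is simply the byte range `[offsets[i], offsets[i + 1])`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if `cache` is missing or older than `source`, i.e. the one-time
// preprocessing step has to run (again) for this dictionary.
bool cache_is_stale(const fs::path& source, const fs::path& cache) {
  std::error_code ec;
  if (!fs::exists(cache, ec)) return true;
  auto src_time = fs::last_write_time(source, ec);
  if (ec) return true;
  auto cache_time = fs::last_write_time(cache, ec);
  if (ec) return true;
  return cache_time < src_time;
}

// Hypothetical cache layout (native byte order): u32 count,
// u32 offsets[count + 1], then the concatenated word bytes.
bool write_word_cache(const fs::path& cache, const std::vector<std::string>& words) {
  std::ofstream out(cache, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  auto put_u32 = [&out](std::uint32_t v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
  };
  put_u32(static_cast<std::uint32_t>(words.size()));
  std::uint32_t offset = 0;
  put_u32(offset);
  for (const auto& w : words) {
    offset += static_cast<std::uint32_t>(w.size());
    put_u32(offset);
  }
  for (const auto& w : words) out.write(w.data(), static_cast<std::streamsize>(w.size()));
  out.flush();
  return static_cast<bool>(out);
}

// Run the expensive preprocessing only when needed. Returns true if the
// cache was (re)built successfully on this call, false if it was reused.
bool ensure_cache(const fs::path& source, const fs::path& cache,
                  const std::function<std::vector<std::string>(const fs::path&)>& build) {
  if (!cache_is_stale(source, cache)) return false;
  return write_word_cache(cache, build(source));
}
```

The same routine would serve both cases in the comment. It could run at packaging time for the default dictionary and lazily for a dictionary dropped in by an extension.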
Updated•16 years ago
Updated•9 years ago
Whiteboard: [platform-rel-Intel]
Updated•9 years ago
platform-rel: --- → ?
Updated•9 years ago
platform-rel: ? → ---
Whiteboard: [platform-rel-Intel]
Updated•3 years ago
Severity: minor → S4