Open Bug 470471 Opened 17 years ago Updated 3 years ago

hunspell should use memory more intelligently

Categories

(Core :: Spelling checker, defect)

x86
All
defect

People

(Reporter: vlad, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: memory-footprint, perf)

The first time hunspell gets initialized, it loads the dictionary from disk. This uses about 2MB of memory, most of it allocated in HashMgr::add_word. Every entry in the hash table is a separate malloc, so there could also be a lot of slop if the allocator rounds entry sizes up. Since the dictionary data is static, I'd suggest that we define an on-disk binary format that we can simply mmap instead of rebuilding the table on every run. We would still need to figure out how to mix in non-static data (user dictionaries and the like).
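A minimal sketch of what a position-independent table could look like, assuming a simplified entry of word plus affix-flag string (this is not hunspell's actual hentry layout or any existing Mozilla format). All entries live in one contiguous buffer and the hash buckets hold byte offsets rather than pointers, so the buffer could be written to disk once and later mmap'd read-only, with no per-entry malloc and no pointer fix-ups. The names `build_image` and `lookup`, the FNV-1a hash, and the linear-probing layout are all illustrative choices. The image is in native byte order, so a file produced this way would only be valid on the architecture that built it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// FNV-1a: a simple, deterministic 32-bit string hash.
static uint32_t fnv1a(std::string_view s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Append the raw bytes of a trivially copyable value (native byte order).
template <typename T>
static void append_raw(std::vector<char>& out, const T& v) {
  const char* p = reinterpret_cast<const char*>(&v);
  out.insert(out.end(), p, p + sizeof(T));
}

// Hypothetical image layout:
//   [uint32 nbuckets][uint32 bucket[nbuckets]][entry][entry]...
//   entry = [uint16 word_len][uint16 flags_len][word bytes][flag bytes]
// bucket[i] is the byte offset of an entry, or 0 for an empty slot
// (offset 0 is always the header, so it can never be an entry).
std::vector<char> build_image(
    const std::vector<std::pair<std::string, std::string>>& words) {
  const uint32_t nbuckets = static_cast<uint32_t>(words.size() * 2 + 1);
  std::vector<uint32_t> buckets(nbuckets, 0);
  std::vector<char> image(sizeof(uint32_t) * (1 + nbuckets));  // header space
  for (const auto& [word, flags] : words) {
    assert(word.size() <= UINT16_MAX && flags.size() <= UINT16_MAX);
    const uint32_t off = static_cast<uint32_t>(image.size());
    append_raw(image, static_cast<uint16_t>(word.size()));
    append_raw(image, static_cast<uint16_t>(flags.size()));
    image.insert(image.end(), word.begin(), word.end());
    image.insert(image.end(), flags.begin(), flags.end());
    uint32_t i = fnv1a(word) % nbuckets;
    while (buckets[i] != 0) i = (i + 1) % nbuckets;  // linear probing
    buckets[i] = off;
  }
  std::memcpy(image.data(), &nbuckets, sizeof nbuckets);
  std::memcpy(image.data() + sizeof(uint32_t), buckets.data(),
              buckets.size() * sizeof(uint32_t));
  return image;
}

// Look up `word` in an image; works on heap memory or an mmap'd file alike.
// Returns the entry's flag string, or nullopt if the word is absent.
std::optional<std::string_view> lookup(const char* image, std::string_view word) {
  uint32_t nbuckets;
  std::memcpy(&nbuckets, image, sizeof nbuckets);
  uint32_t i = fnv1a(word) % nbuckets;
  for (uint32_t probes = 0; probes < nbuckets; ++probes) {
    uint32_t off;
    std::memcpy(&off, image + sizeof(uint32_t) * (1 + i), sizeof off);
    if (off == 0) return std::nullopt;  // empty slot ends the probe chain
    uint16_t wl, fl;
    std::memcpy(&wl, image + off, sizeof wl);
    std::memcpy(&fl, image + off + 2, sizeof fl);
    if (std::string_view(image + off + 4, wl) == word)
      return std::string_view(image + off + 4 + wl, fl);
    i = (i + 1) % nbuckets;
  }
  return std::nullopt;
}
```

Because `lookup` only reads through a `const char*` with `memcpy`, the same function works on a `std::vector` in memory or on an unaligned pointer returned by `mmap`.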
Hrm... if we're guaranteed that the on-disk file won't change (is that a safe assumption?), we could just mmap it and store pointer+length pairs for everything, no? I'm certainly hoping we won't have to muck with the file format: it's a standard external format.
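A minimal sketch of the "mmap it and store pointer+length pairs" idea against the unmodified .dic format, assuming POSIX `mmap` and hunspell's documented layout (first line is an approximate word count, then one `word/FLAGS` entry per line). The class name `MappedDic` is hypothetical. It only holds under the assumption raised above: if the file is modified or truncated while mapped, reads through the views can fault. Note that a real lookup would still need a hash index over these views, so each entry still costs one `string_view` (pointer + length), just without a per-entry malloc or a copy of the word itself.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Maps an unmodified hunspell .dic file read-only and records one
// (pointer, length) pair per entry line; no entry is copied onto the heap.
// The views stay valid only while this object (and the mapping) lives.
class MappedDic {
 public:
  explicit MappedDic(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("cannot stat " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("cannot mmap " + path);
      }
      data_ = static_cast<const char*>(p);
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    split_lines();
  }
  ~MappedDic() {
    if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
  }
  MappedDic(const MappedDic&) = delete;
  MappedDic& operator=(const MappedDic&) = delete;

  // Raw entry lines such as "word/FLAGS"; the leading count line is skipped.
  const std::vector<std::string_view>& entries() const { return entries_; }

 private:
  void split_lines() {
    if (data_ == nullptr) return;
    std::string_view all(data_, size_);
    bool first = true;
    std::size_t pos = 0;
    while (pos < all.size()) {
      std::size_t nl = all.find('\n', pos);
      if (nl == std::string_view::npos) nl = all.size();
      std::string_view line = all.substr(pos, nl - pos);
      pos = nl + 1;
      if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
      if (first) {  // first line is the approximate word count
        first = false;
        continue;
      }
      if (!line.empty()) entries_.push_back(line);
    }
  }

  const char* data_ = nullptr;
  std::size_t size_ = 0;
  std::vector<std::string_view> entries_;
};
```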
It's not like it'd be rocket science to write a Python script to convert it at build time, though, right? Or are you concerned about being able to drop a new dictionary file into an existing build?
Yeah, people ship additional dictionaries in extensions, copying them wholesale from OpenOffice.org (OOo). I'd hate to lose that ability, and I don't think changing the disk format is necessary to solve this bug.
Bug 468779 looks like a great place to start. I'm not proposing that we remove the ability to use additional dictionaries, only that we optimize the main and most common use case. Those optimizations can apply to additional dictionaries as well: it's not unreasonable to do a one-time preprocessing step that dumps data to disk if we haven't seen that dictionary before, and we can preprocess the default dictionary when we run compreg before packaging things up...
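A minimal sketch of that one-time preprocessing step, assuming the converted form is cached in a separate file and that staleness is judged only by comparing modification times (a deliberately simple assumption; a real check might also cover the .aff file). `load_or_build`, `cache_is_fresh`, and the `build` callback are hypothetical names; `build` stands in for whatever conversion produces the binary form. Third-party dictionaries would pay the conversion cost once, on first use, and the default dictionary could be converted ahead of time.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if `cache` exists and was written no earlier than `dic` was last
// modified. Timestamps only; a real check might also cover the .aff file.
static bool cache_is_fresh(const fs::path& dic, const fs::path& cache) {
  std::error_code ec;
  const auto cache_time = fs::last_write_time(cache, ec);
  if (ec) return false;  // no cache yet (or it cannot be read)
  const auto dic_time = fs::last_write_time(dic, ec);
  if (ec) return false;
  return cache_time >= dic_time;
}

// Returns the preprocessed form of `dic`, running the (hypothetical) `build`
// conversion only when no up-to-date cache exists, then saving its output.
std::vector<char> load_or_build(
    const fs::path& dic, const fs::path& cache,
    const std::function<std::vector<char>(const fs::path&)>& build) {
  if (cache_is_fresh(dic, cache)) {
    std::ifstream in(cache, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
  }
  std::vector<char> image = build(dic);
  // Write to a temporary name and rename, so a crash mid-write never
  // leaves a truncated cache that looks fresh.
  fs::path tmp = cache;
  tmp += ".tmp";
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    out.write(image.data(), static_cast<std::streamsize>(image.size()));
  }
  fs::rename(tmp, cache);
  return image;
}
```

In a browser the cached file would presumably be the mmap'd image itself, so the read path here would become a mapping rather than a copy into a `std::vector`.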
Keywords: footprint, perf
Severity: normal → minor
Depends on: 468779
Whiteboard: [platform-rel-Intel]
platform-rel: --- → ?
platform-rel: ? → ---
Whiteboard: [platform-rel-Intel]
Severity: minor → S4