Open Bug 691176 Opened 13 years ago Updated 2 years ago

Investigate reducing calls to malloc() from hunspell

Categories

(Core :: Spelling checker, defect)

x86_64
Linux
defect

People

(Reporter: justin.lebar+bug, Unassigned)

Details

(Whiteboard: [MemShrink:P3])

My preliminary analysis from bug 688979 indicates that about 4% of mallocs in the browser are due to hunspell.  This places hunspell just after nsTArray and ns{C,}String as the third-heaviest user of malloc in the browser.

We should figure out if all this malloc'ing is necessary, and whether it's contributing to fragmentation.

57880 malloc's ( 2%):
  malloc at memory/jemalloc/jemalloc.c:6075
  hunspell_malloc(unsigned long) at extensions/spellcheck/hunspell/src/hunspell_alloc_hooks.h:61
  HashMgr::add_word(char const*, int, int, unsigned short*, int, char const*, bool) at extensions/spellcheck/hunspell/src/hashmgr.cpp:176

44553 malloc's ( 2%):
  malloc at src/memory/jemalloc/jemalloc.c:6075
  hunspell_malloc(unsigned long) at extensions/spellcheck/hunspell/src/hunspell_alloc_hooks.h:61
  HashMgr::decode_flags(unsigned short**, char*, FileMgr*) at extensions/spellcheck/hunspell/src/hashmgr.cpp:606
Whiteboard: [MemShrink]
Here are the same stats but with allocation sizes:

57880 malloc's ( 2%, cuml 24%) size (min 41, max 63, median 47, mean 47.6):
  malloc at src/memory/jemalloc/jemalloc.c:6075
  hunspell_malloc(unsigned long) at src/extensions/spellcheck/hunspell/src/hunspell_alloc_hooks.h:61
  HashMgr::add_word(char const*, int, int, unsigned short*, int, char const*, bool) at src/extensions/spellcheck/hunspell/src/hashmgr.cpp:176

44553 malloc's ( 2%, cuml 28%) size (min 2, max 22, median 2, mean 3.7):
  malloc at src/memory/jemalloc/jemalloc.c:6075
  hunspell_malloc(unsigned long) at src/extensions/spellcheck/hunspell/src/hunspell_alloc_hooks.h:61
  HashMgr::decode_flags(unsigned short**, char*, FileMgr*) at src/extensions/spellcheck/hunspell/src/hashmgr.cpp:606

Note that the decode_flags mallocs are all very small.  AIUI, hunspell's affix flags are not used heavily in simpler languages like English.  So in another language, we might see larger malloc sizes in decode_flags.

But maybe we can hack hunspell to avoid malloc'ing unless it has a very large flags array.
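To make that idea concrete, here is a minimal sketch of a flag container that keeps short flag lists inline and only falls back to malloc() for long ones. SmallFlagVec, kInline, and the threshold of 4 are all hypothetical; none of this is existing hunspell code, and the right threshold would depend on the dictionary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of hunspell: stores up to kInline flags
// inside the object itself and calls malloc() only for longer flag lists.
class SmallFlagVec {
 public:
  static const std::size_t kInline = 4;  // assumed threshold, not measured

  SmallFlagVec() : heap_(nullptr), len_(0) {}
  ~SmallFlagVec() { std::free(heap_); }
  SmallFlagVec(const SmallFlagVec&) = delete;
  SmallFlagVec& operator=(const SmallFlagVec&) = delete;

  // Copies n flags in. Returns false only if a required malloc() fails,
  // in which case the previous contents are left untouched.
  bool assign(const unsigned short* flags, std::size_t n) {
    assert(flags != nullptr || n == 0);
    unsigned short* fresh = nullptr;
    unsigned short* dest = inline_;
    if (n > kInline) {
      fresh = static_cast<unsigned short*>(
          std::malloc(n * sizeof(unsigned short)));
      if (!fresh) return false;
      dest = fresh;
    }
    // memmove rather than memcpy so that re-assigning from our own
    // storage is still well defined.
    if (n) std::memmove(dest, flags, n * sizeof(unsigned short));
    std::free(heap_);  // release any previous heap block (no-op if null)
    heap_ = fresh;
    len_ = n;
    return true;
  }

  const unsigned short* data() const { return heap_ ? heap_ : inline_; }
  std::size_t size() const { return len_; }
  bool on_heap() const { return heap_ != nullptr; }

 private:
  unsigned short inline_[kInline];  // used when len_ <= kInline
  unsigned short* heap_;            // non-null only for long flag lists
  std::size_t len_;
};
```

The trade-off is that every entry pays for the inline buffer even when it holds a single flag, so this only helps if the saved malloc() calls and allocator overhead outweigh the larger per-entry footprint.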
Thankfully both of these hotspots occur during HashMgr::load_tables, which I presume happens only once.  So I'd guess that the fragmentation effects of these allocations are small.
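If the load-time allocations did turn out to matter, one option would be to carve them out of a few large chunks instead of calling malloc() once per word. The following is only a sketch of that technique (a bump allocator); LoadArena and its methods are hypothetical names, not anything in hunspell, and it assumes the memory is released all at once when the dictionary is unloaded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bump allocator, not part of hunspell. It hands out pieces of
// large malloc()ed chunks and frees everything together in the destructor.
class LoadArena {
 public:
  explicit LoadArena(std::size_t chunk_size = 64 * 1024)
      : chunk_size_(chunk_size), cur_(nullptr), used_(0), cap_(0) {}
  ~LoadArena() {
    for (void* c : chunks_) std::free(c);
  }
  LoadArena(const LoadArena&) = delete;
  LoadArena& operator=(const LoadArena&) = delete;

  // Returns n bytes aligned to `align`, or nullptr if malloc() fails.
  // `align` must be a power of two no larger than malloc's own alignment.
  void* alloc(std::size_t n,
              std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(std::max_align_t));
    if (n == 0) n = 1;  // always hand back a distinct, valid pointer
    std::size_t pad = (align - used_ % align) % align;
    if (cur_ == nullptr || pad + n > cap_ - used_) {
      // Start a new chunk. Any space left in the old one is simply wasted,
      // which is acceptable for a sketch.
      std::size_t size = n > chunk_size_ ? n : chunk_size_;
      chunks_.push_back(nullptr);  // reserve the slot before allocating
      void* c = std::malloc(size);
      if (!c) {
        chunks_.pop_back();
        return nullptr;
      }
      chunks_.back() = c;
      cur_ = static_cast<char*>(c);
      used_ = 0;
      cap_ = size;
      pad = 0;  // a fresh chunk is already maximally aligned
    }
    void* p = cur_ + used_ + pad;
    used_ += pad + n;
    return p;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  std::size_t chunk_size_;
  char* cur_;          // base of the current chunk
  std::size_t used_;   // bytes consumed in the current chunk
  std::size_t cap_;    // size of the current chunk
  std::vector<void*> chunks_;
};
```

A call such as `arena.alloc(2, alignof(unsigned short))` would then replace a 2-byte malloc() in the loading path. The catch is that individual blocks can no longer be freed, so anything added or removed after loading would still need ordinary malloc()/free().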
Caolan, Nemeth: do you know of anybody familiar with the hunspell code who can help us determine how best to improve the situation here?  Thanks!
Over to Justin to investigate whether we need to worry about this etc.
Assignee: nobody → justin.lebar+bug
Whiteboard: [MemShrink] → [MemShrink:P2]
jlebar, do we still care about this?
Maybe?  It really is a lot of malloc() calls.  But I haven't had time to do malloc() hunting lately.
So what is the concern?  Fragmentation?  The time taken for all those calls?  If it's the latter, this doesn't need to be a MemShrink bug.
Fragmentation, yes.

It seems unlikely that this is actually a major cause of fragmentation, because most (all?) of the calls happen all at once.  But my previous instincts about what would and wouldn't be a large cause of fragmentation have been off the mark, so I don't attach a lot of weight to that speculation.

I'd be OK downgrading this to a P3.
Whiteboard: [MemShrink:P2] → [MemShrink:P3]
Assignee: justin.lebar+bug → nobody
Severity: normal → S3