Last Comment Bug 669116 - Add memory reporter for the spell checker
: Add memory reporter for the spell checker
Status: RESOLVED FIXED
[MemShrink:P3]
:
Product: Core
Classification: Components
Component: Spelling checker (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla8
Assigned To: :Ehsan Akhgari (busy, don't ask for review please)
:
Mentors:
Depends on:
Blocks: DarkMatter MemShrinkTools MatchStartupMem
  Show dependency treegraph
 
Reported: 2011-07-03 21:12 PDT by Nicholas Nethercote [:njn]
Modified: 2011-09-07 14:18 PDT (History)
15 users (show)
khuey: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
output from DMD (11.86 KB, text/plain)
2011-08-08 23:27 PDT, Nicholas Nethercote [:njn]
no flags Details
WIP (6.52 KB, patch)
2011-08-09 22:33 PDT, :Ehsan Akhgari (busy, don't ask for review please)
no flags Details | Diff | Review
Patch (v1) (6.88 KB, patch)
2011-08-10 10:22 PDT, :Ehsan Akhgari (busy, don't ask for review please)
n.nethercote: review+
khuey: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] 2011-07-03 21:12:49 PDT
From bug 653817 comment 7, HashMgr::add_word accounts for about 2.7MB of memory usage on my Linux64 machine.  And there's probably a bit more in other parts of the hash table, eg. the main array.

It'd be good to have memory reporters for this.  AFAICT the dictionary isn't created at browser start-up, but once it is created is hangs around forever?  So a memory reporter would be helpful in answering the question "why does Firefox use more memory once I close all tabs than it did at startup?"

RyanVM, I can attempt this, but I'm happy to defer to you if you want to do it;  you seem to know this component well judging from hg logs.
Comment 1 Justin Lebar (not reading bugmail) 2011-07-03 22:05:04 PDT
I suggest considering dropping the hash on memory-pressure in bug 669119.
Comment 2 Ryan VanderMeulen [:RyanVM] 2011-07-04 08:07:59 PDT
njn, I just keep the in-tree Hunspell in sync with upstream. Olli and/or Ehsan are probably better to have involved.
Comment 3 Nicholas Nethercote [:njn] 2011-08-08 23:27:44 PDT
Created attachment 551681 [details]
output from DMD

Here's some output from DMD (see bug 676724) after starting the browser and loading Gmail.  Notable things:

- Almost 4MB of space is used by the spelling dictionary!

- Lots of slop in the decode_flags entry:

==13617== Unreported: 713,744 bytes in 44,556 heap block(s) in record 10 of 2686
7:
==13617==  Requested bytes unreported: 164,772 / 164,772
==13617==  Slop      bytes unreported: 548,972 / 548,972
==13617==    at 0x4029F90: malloc (vg_replace_malloc.c:235)
==13617==    by 0x403B040: moz_malloc (mozalloc.cpp:120)
==13617==    by 0x7749BD5: HashMgr::decode_flags(unsigned short**, char*, FileMg
r*) (hashmgr.cpp:605)

  Now, DMD's allocator adds more slop than jemalloc does for really small blocks (eg. < 8 bytes), but still!  That's a lot of slop.
Comment 4 Ryan VanderMeulen [:RyanVM] 2011-08-09 15:03:21 PDT
CCing upstream Hunspell devs also.
Comment 5 :Ehsan Akhgari (busy, don't ask for review please) 2011-08-09 15:50:28 PDT
So, this would require us to modify the upstream project's code.  Nick, do we usually do that for code that we borrow from other projects which needs to have memory reporters?

Also, can you please point me to some docs on how to add the memory reporter?  The stuff that I found in the code are a bit confusing.  I know that I saw a howto for that somewhere, but I can't find it right now...
Comment 6 Justin Lebar (not reading bugmail) 2011-08-09 15:52:45 PDT
The memory reporters in nsMemoryReporterManager.cpp are pretty simple.  For example, http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#240
Comment 7 Nicholas Nethercote [:njn] 2011-08-09 17:07:34 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #5)
> So, this would require us to modify the upstream project's code.  Nick, do
> we usually do that for code that we borrow from other projects which needs
> to have memory reporters?

There isn't a standard procedure.  SQLite is the only upstream project for which we have memory reporter, AFAIK, and we use its existing memory measurement API.
Comment 8 :Ehsan Akhgari (busy, don't ask for review please) 2011-08-09 22:33:57 PDT
Created attachment 552028 [details] [diff] [review]
WIP

I think I've got everything right, but "spellcheck" doesn't appear in about:memory?verbose.  Do you have any idea what I've missed?
Comment 9 Nicholas Nethercote [:njn] 2011-08-09 23:06:17 PDT
Comment on attachment 552028 [details] [diff] [review]
WIP

Review of attachment 552028 [details] [diff] [review]:
-----------------------------------------------------------------

You need to register the memory reporter with NS_RegisterMemoryReporter.  Wherever hunspell is initialized, that's probably the best place to register the reporter.
Comment 10 Nicholas Nethercote [:njn] 2011-08-09 23:19:19 PDT
Comment on attachment 552028 [details] [diff] [review]
WIP

Review of attachment 552028 [details] [diff] [review]:
-----------------------------------------------------------------

One concern:  the macros override malloc/calloc/realloc/etc, but do not affect |operator new| et al.  ATM malloc accounts for all the big allocations but that's not guaranteed to remain true in the future.

Also, I wonder if calling malloc_usable_size for every alloc/free is expensive.

And I won't pretend to understand how the configure.in change works.
Comment 11 :Ehsan Akhgari (busy, don't ask for review please) 2011-08-10 10:21:22 PDT
(In reply to Nicholas Nethercote [:njn] from comment #9)
> You need to register the memory reporter with NS_RegisterMemoryReporter. 
> Wherever hunspell is initialized, that's probably the best place to register
> the reporter.

Ah, right.  Yes, this fixes the problem!

(In reply to Nicholas Nethercote [:njn] from comment #10)
> One concern:  the macros override malloc/calloc/realloc/etc, but do not
> affect |operator new| et al.  ATM malloc accounts for all the big
> allocations but that's not guaranteed to remain true in the future.

Well, you can't really "override" the C++ allocation operators defined in mozalloc.h.  I don't have a solution for how we should deal with that scenario yet, so I decided that we should cross that bridge when we come to it.

> Also, I wonder if calling malloc_usable_size for every alloc/free is
> expensive.

The jemalloc implementation seems pretty simple.  And also there are lots of stuff in the hunspell code that can be made more efficient.  If we want to spend energy on it, I'd rather us mix some of the other craziness in that code.  :-)

Which is my way of saying that I'm not too worried about that for now.

> And I won't pretend to understand how the configure.in change works.

It's easy.  Those changes put the conditional include at the end of mozilla-config.h, which is force-included in all of our source files.  The preprocessor macro is however defined only for files in the hunspell directory, which enables us to include alloc_hooks.h only for those files.

This is hackish, but it was the best way that I could think of in order for us to make as little change to hunspell as we can afford to (and indeed, this patch achieves this with *no* change to hunspell, which makes me happy).
Comment 12 :Ehsan Akhgari (busy, don't ask for review please) 2011-08-10 10:22:02 PDT
Created attachment 552114 [details] [diff] [review]
Patch (v1)
Comment 13 Nicholas Nethercote [:njn] 2011-08-10 17:53:15 PDT
Comment on attachment 552114 [details] [diff] [review]
Patch (v1)

Review of attachment 552114 [details] [diff] [review]:
-----------------------------------------------------------------

This malloc shim business is clever-but-gross.  I'm giving r+ but I want a second opinion from someone who knows more about build system stuff:  paging khuey!

::: configure.in
@@ +9392,5 @@
> +/* Force-include alloc_hooks.h for hunspell, so that we don't need to modify it
> + * directly. */
> +#if defined(HUNSPELL_STATIC)
> +#include "alloc_hooks.h"
> +#endif

This introduces coupling between configure.in and extensions/spellcheck/hunspell/src/Makefile.in which is ugly but I can live with it if you:

- Add to this comment something like "HUNSPELL_STATIC is defined in extensions/spellcheck/hunspell/src/Makefile.in unless --enable-system-hunspell is defined".

- Rename alloc_hooks.h to hunspell_alloc_hooks.h.

- Add a comment in extensions/spellcheck/hunspell/src/Makefile.in pointing out that HUNSPELL_STATIC is referred to in configure.in.

::: extensions/spellcheck/hunspell/src/alloc_hooks.h
@@ +39,5 @@
> + * reporting to hunspell without modifying its code, in order to ease future
> + * upgrades.
> + *
> + * This file is force-included through mozilla-config.h which is generated
> + * during the configure step.

Can you mention here that operator new/new[] isn't covered by this file, but we live with that because those allocators don't account for much of the memory used by Hunspell.

@@ +42,5 @@
> + * This file is force-included through mozilla-config.h which is generated
> + * during the configure step.
> + */
> +
> +// Prevent the standard macros to be redefined

Nit: "from being redefined"

::: extensions/spellcheck/hunspell/src/mozHunspell.cpp
@@ +94,5 @@
>  
> +// Memory reporting stuff
> +static PRInt64 gHunspellAllocatedSize = 0;
> +
> +void hunspell_report_memory_allocation(void* ptr) {

Nit: Wouldn't HunspellReportMemoryAllocation be more standard style?  Likewise for the other functions.

@@ +109,5 @@
> +    "explicit/spellcheck",
> +    KIND_HEAP,
> +    UNITS_BYTES,
> +    hunspell_get_current_allocated_size,
> +    "Memory used by the Hunspell spellchecking engine.  This number accounts "

Nit "spell checking" seems to be the preferred usage, so I'd use "explicit/spell-check" for the path and "spell checking" in the description.

@@ +110,5 @@
> +    KIND_HEAP,
> +    UNITS_BYTES,
> +    hunspell_get_current_allocated_size,
> +    "Memory used by the Hunspell spellchecking engine.  This number accounts "
> +    "for the memory in use by Hunspell's internal data structures."

Would "Most of this space is taken up by the dictionary." be a better 2nd sentence?

@@ +135,5 @@
>  
>  mozHunspell::~mozHunspell()
>  {
>    mPersonalDictionary = nsnull;
>    delete mHunspell;

You should unregister the reporter here.  That'll require adding a nsCOMPtr<nsIMemoryReporter> member to mozHunspell.

(Hmm, this is beyond the scope of your patch, but I see that the observer created by mozHunspell::Init().  Is that a leak, or do observer objects get auto-released by the observer service when they're destroyed?)
Comment 14 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-10 17:56:28 PDT
Comment on attachment 552114 [details] [diff] [review]
Patch (v1)

Yeah, this is a little nasty, but I don't want to build infrastructure for force including headers for this one case.  If we need to do this again in the future we'll do it right.
Comment 15 :Ehsan Akhgari (busy, don't ask for review please) 2011-08-15 14:07:47 PDT
(In reply to Nicholas Nethercote [:njn] from comment #13)
> @@ +110,5 @@
> > +    KIND_HEAP,
> > +    UNITS_BYTES,
> > +    hunspell_get_current_allocated_size,
> > +    "Memory used by the Hunspell spellchecking engine.  This number accounts "
> > +    "for the memory in use by Hunspell's internal data structures."
> 
> Would "Most of this space is taken up by the dictionary." be a better 2nd
> sentence?

Actually, I don't think so.  It's less precise, so I'd rather keep the existing description.

> (Hmm, this is beyond the scope of your patch, but I see that the observer
> created by mozHunspell::Init().  Is that a leak, or do observer objects get
> auto-released by the observer service when they're destroyed?)

The last parameter passed to AddObserver is true, which means that the observer is going to hold a weak reference to the object, which makes this not be a leak.
Comment 16 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-16 04:01:30 PDT
http://hg.mozilla.org/mozilla-central/rev/44f89f14c84a
Comment 17 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-16 04:03:32 PDT
and
http://hg.mozilla.org/mozilla-central/rev/74bf2e6459e3
Comment 18 Matthew Gregan [:kinetik] 2011-08-25 21:17:53 PDT
Is the mHunspellReporter leak intentional?  NS_UnregisterMemoryReporter does not free the reporter.
Comment 19 Nicholas Nethercote [:njn] 2011-08-25 21:23:34 PDT
(In reply to Matthew Gregan (:kinetik) from comment #18)
> Is the mHunspellReporter leak intentional?  NS_UnregisterMemoryReporter does
> not free the reporter.

Good question.  If mHunspellReporter was a nsCOMPtr(nsIMemoryReporter) it would be freed because the refcount would drop to zero when unregistered.  But it's not.  I think it should be.  Ehsan?
Comment 20 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-26 03:17:44 PDT
Huh?  There's no leak here.  Memory reporters are reference counted and will be destroyed when their refcount drops to 0 (which, in this case, occurs when NS_UnregisterMemoryReporter removes it from the com array.
Comment 21 Matthew Gregan [:kinetik] 2011-08-26 05:25:52 PDT
    4.66 +  mHunspellReporter = new NS_MEMORY_REPORTER_NAME(Hunspell);
...
    5.44 +  nsIMemoryReporter* mHunspellReporter;

mHunspellReporter is not deleted in the destructor and is owned by a raw pointer.
Comment 22 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-26 05:28:13 PDT
No, it's not owned by that pointer.

XPCOM objects start with a refcount of 0, not 1.
Comment 23 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-26 05:28:43 PDT
And if this were leaking, tinderbox would be permaorange.
Comment 24 Matthew Gregan [:kinetik] 2011-08-26 05:39:32 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #22)
> XPCOM objects start with a refcount of 0, not 1.

Good point, I forgot about that.

The asymmetric nature of this code (allocate and register, but only unregister) doesn't make it super easy to understand ownership at a glance.
Comment 25 Nicholas Nethercote [:njn] 2011-08-26 19:43:08 PDT
Should it be a nsCOMPtr<nsIMemoryReporter>, though?  I thought raw pointers to XPCOM objects were a bad idea in general, even if in this case it happens to work.
Comment 26 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-07 14:18:36 PDT
(In reply to Nicholas Nethercote [:njn] (on vacation Sep 12--Oct 17) from comment #25)
> Should it be a nsCOMPtr<nsIMemoryReporter>, though?  I thought raw pointers
> to XPCOM objects were a bad idea in general, even if in this case it happens
> to work.

Why?  Global nsCOMPtrs are a bad idea since they hold a reference to the object till the end of time (well, the end of application runtime).

Note You need to log in before you can comment on or make changes to this bug.