Closed Bug 669116 Opened 13 years ago Closed 13 years ago

Add memory reporter for the spell checker

Categories

(Core :: Spelling checker, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: n.nethercote, Assigned: ehsan.akhgari)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P3])

Attachments

(2 files, 1 obsolete file)

From bug 653817 comment 7, HashMgr::add_word accounts for about 2.7MB of memory usage on my Linux64 machine.  And there's probably a bit more in other parts of the hash table, eg. the main array.

It'd be good to have memory reporters for this.  AFAICT the dictionary isn't created at browser start-up, but once it is created is hangs around forever?  So a memory reporter would be helpful in answering the question "why does Firefox use more memory once I close all tabs than it did at startup?"

RyanVM, I can attempt this, but I'm happy to defer to you if you want to do it;  you seem to know this component well judging from hg logs.
I suggest considering dropping the hash on memory-pressure in bug 669119.
njn, I just keep the in-tree Hunspell in sync with upstream. Olli and/or Ehsan are probably better to have involved.
Whiteboard: [MemShrink] → [MemShrink:P3]
Version: unspecified → Trunk
Attached file output from DMD
Here's some output from DMD (see bug 676724) after starting the browser and loading Gmail.  Notable things:

- Almost 4MB of space is used by the spelling dictionary!

- Lots of slop in the decode_flags entry:

==13617== Unreported: 713,744 bytes in 44,556 heap block(s) in record 10 of 2686
7:
==13617==  Requested bytes unreported: 164,772 / 164,772
==13617==  Slop      bytes unreported: 548,972 / 548,972
==13617==    at 0x4029F90: malloc (vg_replace_malloc.c:235)
==13617==    by 0x403B040: moz_malloc (mozalloc.cpp:120)
==13617==    by 0x7749BD5: HashMgr::decode_flags(unsigned short**, char*, FileMg
r*) (hashmgr.cpp:605)

  Now, DMD's allocator adds more slop than jemalloc does for really small blocks (eg. < 8 bytes), but still!  That's a lot of slop.
Assignee: nobody → ehsan
CCing upstream Hunspell devs also.
So, this would require us to modify the upstream project's code.  Nick, do we usually do that for code that we borrow from other projects which needs to have memory reporters?

Also, can you please point me to some docs on how to add the memory reporter?  The stuff that I found in the code are a bit confusing.  I know that I saw a howto for that somewhere, but I can't find it right now...
The memory reporters in nsMemoryReporterManager.cpp are pretty simple.  For example, http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#240
(In reply to Ehsan Akhgari [:ehsan] from comment #5)
> So, this would require us to modify the upstream project's code.  Nick, do
> we usually do that for code that we borrow from other projects which needs
> to have memory reporters?

There isn't a standard procedure.  SQLite is the only upstream project for which we have memory reporter, AFAIK, and we use its existing memory measurement API.
Attached patch WIP (obsolete) — Splinter Review
I think I've got everything right, but "spellcheck" doesn't appear in about:memory?verbose.  Do you have any idea what I've missed?
Attachment #552028 - Flags: feedback?(nnethercote)
Comment on attachment 552028 [details] [diff] [review]
WIP

Review of attachment 552028 [details] [diff] [review]:
-----------------------------------------------------------------

You need to register the memory reporter with NS_RegisterMemoryReporter.  Wherever hunspell is initialized, that's probably the best place to register the reporter.
Attachment #552028 - Flags: feedback?(nnethercote)
Comment on attachment 552028 [details] [diff] [review]
WIP

Review of attachment 552028 [details] [diff] [review]:
-----------------------------------------------------------------

One concern:  the macros override malloc/calloc/realloc/etc, but do not affect |operator new| et al.  ATM malloc accounts for all the big allocations but that's not guaranteed to remain true in the future.

Also, I wonder if calling malloc_usable_size for every alloc/free is expensive.

And I won't pretend to understand how the configure.in change works.
(In reply to Nicholas Nethercote [:njn] from comment #9)
> You need to register the memory reporter with NS_RegisterMemoryReporter. 
> Wherever hunspell is initialized, that's probably the best place to register
> the reporter.

Ah, right.  Yes, this fixes the problem!

(In reply to Nicholas Nethercote [:njn] from comment #10)
> One concern:  the macros override malloc/calloc/realloc/etc, but do not
> affect |operator new| et al.  ATM malloc accounts for all the big
> allocations but that's not guaranteed to remain true in the future.

Well, you can't really "override" the C++ allocation operators defined in mozalloc.h.  I don't have a solution for how we should deal with that scenario yet, so I decided that we should cross that bridge when we come to it.

> Also, I wonder if calling malloc_usable_size for every alloc/free is
> expensive.

The jemalloc implementation seems pretty simple.  And also there are lots of stuff in the hunspell code that can be made more efficient.  If we want to spend energy on it, I'd rather us mix some of the other craziness in that code.  :-)

Which is my way of saying that I'm not too worried about that for now.

> And I won't pretend to understand how the configure.in change works.

It's easy.  Those changes put the conditional include at the end of mozilla-config.h, which is force-included in all of our source files.  The preprocessor macro is however defined only for files in the hunspell directory, which enables us to include alloc_hooks.h only for those files.

This is hackish, but it was the best way that I could think of in order for us to make as little change to hunspell as we can afford to (and indeed, this patch achieves this with *no* change to hunspell, which makes me happy).
Attached patch Patch (v1)Splinter Review
Attachment #552028 - Attachment is obsolete: true
Attachment #552114 - Flags: review?(nnethercote)
Comment on attachment 552114 [details] [diff] [review]
Patch (v1)

Review of attachment 552114 [details] [diff] [review]:
-----------------------------------------------------------------

This malloc shim business is clever-but-gross.  I'm giving r+ but I want a second opinion from someone who knows more about build system stuff:  paging khuey!

::: configure.in
@@ +9392,5 @@
> +/* Force-include alloc_hooks.h for hunspell, so that we don't need to modify it
> + * directly. */
> +#if defined(HUNSPELL_STATIC)
> +#include "alloc_hooks.h"
> +#endif

This introduces coupling between configure.in and extensions/spellcheck/hunspell/src/Makefile.in which is ugly but I can live with it if you:

- Add to this comment something like "HUNSPELL_STATIC is defined in extensions/spellcheck/hunspell/src/Makefile.in unless --enable-system-hunspell is defined".

- Rename alloc_hooks.h to hunspell_alloc_hooks.h.

- Add a comment in extensions/spellcheck/hunspell/src/Makefile.in pointing out that HUNSPELL_STATIC is referred to in configure.in.

::: extensions/spellcheck/hunspell/src/alloc_hooks.h
@@ +39,5 @@
> + * reporting to hunspell without modifying its code, in order to ease future
> + * upgrades.
> + *
> + * This file is force-included through mozilla-config.h which is generated
> + * during the configure step.

Can you mention here that operator new/new[] isn't covered by this file, but we live with that because those allocators don't account for much of the memory used by Hunspell.

@@ +42,5 @@
> + * This file is force-included through mozilla-config.h which is generated
> + * during the configure step.
> + */
> +
> +// Prevent the standard macros to be redefined

Nit: "from being redefined"

::: extensions/spellcheck/hunspell/src/mozHunspell.cpp
@@ +94,5 @@
>  
> +// Memory reporting stuff
> +static PRInt64 gHunspellAllocatedSize = 0;
> +
> +void hunspell_report_memory_allocation(void* ptr) {

Nit: Wouldn't HunspellReportMemoryAllocation be more standard style?  Likewise for the other functions.

@@ +109,5 @@
> +    "explicit/spellcheck",
> +    KIND_HEAP,
> +    UNITS_BYTES,
> +    hunspell_get_current_allocated_size,
> +    "Memory used by the Hunspell spellchecking engine.  This number accounts "

Nit "spell checking" seems to be the preferred usage, so I'd use "explicit/spell-check" for the path and "spell checking" in the description.

@@ +110,5 @@
> +    KIND_HEAP,
> +    UNITS_BYTES,
> +    hunspell_get_current_allocated_size,
> +    "Memory used by the Hunspell spellchecking engine.  This number accounts "
> +    "for the memory in use by Hunspell's internal data structures."

Would "Most of this space is taken up by the dictionary." be a better 2nd sentence?

@@ +135,5 @@
>  
>  mozHunspell::~mozHunspell()
>  {
>    mPersonalDictionary = nsnull;
>    delete mHunspell;

You should unregister the reporter here.  That'll require adding a nsCOMPtr<nsIMemoryReporter> member to mozHunspell.

(Hmm, this is beyond the scope of your patch, but I see that the observer created by mozHunspell::Init().  Is that a leak, or do observer objects get auto-released by the observer service when they're destroyed?)
Attachment #552114 - Flags: review?(nnethercote)
Attachment #552114 - Flags: review?(khuey)
Attachment #552114 - Flags: review+
Comment on attachment 552114 [details] [diff] [review]
Patch (v1)

Yeah, this is a little nasty, but I don't want to build infrastructure for force including headers for this one case.  If we need to do this again in the future we'll do it right.
Attachment #552114 - Flags: review?(khuey) → review+
(In reply to Nicholas Nethercote [:njn] from comment #13)
> @@ +110,5 @@
> > +    KIND_HEAP,
> > +    UNITS_BYTES,
> > +    hunspell_get_current_allocated_size,
> > +    "Memory used by the Hunspell spellchecking engine.  This number accounts "
> > +    "for the memory in use by Hunspell's internal data structures."
> 
> Would "Most of this space is taken up by the dictionary." be a better 2nd
> sentence?

Actually, I don't think so.  It's less precise, so I'd rather keep the existing description.

> (Hmm, this is beyond the scope of your patch, but I see that the observer
> created by mozHunspell::Init().  Is that a leak, or do observer objects get
> auto-released by the observer service when they're destroyed?)

The last parameter passed to AddObserver is true, which means that the observer is going to hold a weak reference to the object, which makes this not be a leak.
http://hg.mozilla.org/mozilla-central/rev/44f89f14c84a
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Is the mHunspellReporter leak intentional?  NS_UnregisterMemoryReporter does not free the reporter.
(In reply to Matthew Gregan (:kinetik) from comment #18)
> Is the mHunspellReporter leak intentional?  NS_UnregisterMemoryReporter does
> not free the reporter.

Good question.  If mHunspellReporter was a nsCOMPtr(nsIMemoryReporter) it would be freed because the refcount would drop to zero when unregistered.  But it's not.  I think it should be.  Ehsan?
Huh?  There's no leak here.  Memory reporters are reference counted and will be destroyed when their refcount drops to 0 (which, in this case, occurs when NS_UnregisterMemoryReporter removes it from the com array.
    4.66 +  mHunspellReporter = new NS_MEMORY_REPORTER_NAME(Hunspell);
...
    5.44 +  nsIMemoryReporter* mHunspellReporter;

mHunspellReporter is not deleted in the destructor and is owned by a raw pointer.
No, it's not owned by that pointer.

XPCOM objects start with a refcount of 0, not 1.
And if this were leaking, tinderbox would be permaorange.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #22)
> XPCOM objects start with a refcount of 0, not 1.

Good point, I forgot about that.

The asymmetric nature of this code (allocate and register, but only unregister) doesn't make it super easy to understand ownership at a glance.
Should it be a nsCOMPtr<nsIMemoryReporter>, though?  I thought raw pointers to XPCOM objects were a bad idea in general, even if in this case it happens to work.
(In reply to Nicholas Nethercote [:njn] (on vacation Sep 12--Oct 17) from comment #25)
> Should it be a nsCOMPtr<nsIMemoryReporter>, though?  I thought raw pointers
> to XPCOM objects were a bad idea in general, even if in this case it happens
> to work.

Why?  Global nsCOMPtrs are a bad idea since they hold a reference to the object till the end of time (well, the end of application runtime).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: