Closed Bug 386154 Opened 13 years ago Closed 12 years ago

effective TLD service uses too much memory

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9alpha8

People

(Reporter: dbaron, Assigned: dwitte)

References

Details

(Keywords: memory-footprint)

Attachments

(3 files, 2 obsolete files)

As shown by the leaks from bug 366797, the effective TLD service uses *way* too much memory.

In particular, it uses:
 * 826200 bytes of hash table storage, allocated at the stack:

 malloc (/builds/trunk/mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:934)
 PL_DHashAllocTable (/builds/trunk/obj/firefox-debugopt/xpcom/build/pldhash.c:91)
 PL_DHashTableInit (/builds/trunk/obj/firefox-debugopt/xpcom/build/pldhash.c:244)
 nsTHashtable<nsBaseHashtableET<nsCStringHashKey, SubdomainNode*> >::Init(unsigned int) (/builds/trunk/obj/firefox-debugopt/netwerk/dns/src/../../../dist/include/xpcom/nsTHashtable.h:342)
 nsBaseHashtable<nsCStringHashKey, SubdomainNode*, SubdomainNode*>::Init(unsigned int) (/builds/trunk/obj/firefox-debugopt/netwerk/dns/src/../../../dist/include/xpcom/nsBaseHashtable.h:99)
 FindMatchingChildNode (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:430)
 AddEffectiveTLDEntry(nsCString&) (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:465)
 LoadOneEffectiveTLDFile(nsCOMPtr<nsIFile>&) (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:546)
 LoadEffectiveTLDFiles (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:575)
 nsEffectiveTLDService::Init() (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:189)
 nsEffectiveTLDServiceConstructor (/builds/trunk/mozilla/netwerk/build/nsNetModule.cpp:116)
 nsGenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (/builds/trunk/obj/firefox-debugopt/xpcom/build/nsGenericFactory.cpp:84)
 nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) (/builds/trunk/mozilla/xpcom/components/nsComponentManager.cpp:1715)
 nsComponentManagerImpl::GetService(nsID const&, nsID const&, void**) (/builds/trunk/mozilla/xpcom/components/nsComponentManager.cpp:1926)



 * 103000 bytes of SubdomainNode objects, allocated at the stack:

 operator new(unsigned int) (/usr/lib/libstdc++.so.6)
 FindMatchingChildNode (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:426)
 AddEffectiveTLDEntry(nsCString&) (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:465)
 LoadOneEffectiveTLDFile(nsCOMPtr<nsIFile>&) (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:546)
 LoadEffectiveTLDFiles (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:575)
 nsEffectiveTLDService::Init() (/builds/trunk/mozilla/netwerk/dns/src/nsEffectiveTLDService.cpp:189)
 nsEffectiveTLDServiceConstructor (/builds/trunk/mozilla/netwerk/build/nsNetModule.cpp:116)
 nsGenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (/builds/trunk/obj/firefox-debugopt/xpcom/build/nsGenericFactory.cpp:84)
 nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) (/builds/trunk/mozilla/xpcom/components/nsComponentManager.cpp:1715)
 nsComponentManagerImpl::GetService(nsID const&, nsID const&, void**) (/builds/trunk/mozilla/xpcom/components/nsComponentManager.cpp:1926)
Flags: blocking1.9?
The storage used is incredibly inefficient.  The biggest problem is allocating a hashtable for the children of childless entries, but there are a number of other obvious improvements.
Blocks: 366797
Blocks: 367373, 385299
this sounds like fun, i'll look into this and bug 386155.
Assignee: nobody → dwitte
Blocks: 386155
Target Milestone: --- → mozilla1.9beta1
Flags: blocking1.9? → blocking1.9+
Attached patch patch v1 (obsolete) — Splinter Review
this reworks eTLD to use a single flat hash for storing all the domains, rather than the present (overcomplicated) tree+hash approach. it also uses a custom hash entryclass with arenas to store the domain strings, which will avoid malloc overhead when adding entries. (this data will be static for duration of app). in order to find the eTLD for a given domain, we perform a hash lookup at each domain level (e.g. for foo.co.nz, we look up foo.co.nz, co.nz, and nz) and see if there's an entry at each point. once we find a match, we return immediately.

in writing this patch, i reviewed the present semantics (http://wiki.mozilla.org/Gecko:Effective_TLD_Service) and found them to be overcomplex and the implementation buggy. basically, since we control the input file to the eTLD service, we want the simplest implementation that gets the job done. (if we need more, we can add it later.) after reviewing the actual data file (http://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat), i've come up with the following new semantics:

there are three ways to specify an entry (same as before).
a) a basic entry:                 co.jp
b) a wildcard entry:      *.hokkaido.jp
c) an exception:      !pref.hokkaido.jp

a basic entry simply returns that as the eTLD; a wildcard allows the subdomain to be considered an eTLD (e.g. foo.hokkaido.jp in the above example); an exception can be used to negate the wildcard for a specific case (thus pref.hokkaido.jp is not an eTLD). the previous impl allowed for multiple wildcards (e.g. *.*.foo) but in practice this is too simple a rule to be of any use. at each domain level, we do the lookup and then apply the rules in order of precedence (there may be multiple rules at each level). a more specific rule always trumps a less specific one, be it an exception, a wildcard, or a normal rule. this is the main difference to the old code. at a given level, we might have all three entries, so the precedence for a "bar.com" lookup is:

1) the wildcard, *.bar.com. this will be looked up in the hash as "bar.com", but is actually a more specific rule by one level, so must come first. (this gives it the least precedence of that more specific level).
2) the exception, !bar.com. the only time an exception is useful is when paired with a wildcard, so this automatically implies "com" and thus we return that.
3) the normal rule, bar.com. note that "*.bar.com" should also imply "bar.com" (and that the current impl is wrong in not doing so). when the "*.bar.com" entry is added to the hash, a normal rule is also added to take care of this case.

once the patch is finalized and lands, i'll update the wiki documentation to reflect the new code.
Attachment #272612 - Flags: review?(cbiesinger)
as a side note, i removed the singleton handling with this patch. if you want me to add in a GetSingleton() method to enforce it again, i'm happy to.
Did anybody else (non-Mozilla) care about the eTLD file format?  See, e.g., bug 331510 comment 2 and later, and perhaps parts of bug 342314.
Comment on attachment 272612 [details] [diff] [review]
patch v1

actually, this is slightly wrong.
Attachment #272612 - Flags: review?(cbiesinger)
Attached patch patch v1.1 (obsolete) — Splinter Review
this switches the precedence of 2) and 3) above. if we have a "bar.com" rule and a "!bar.com" rule for whatever reason, we want to use "bar.com" instead of returning the implied "com" since the former is more specific.
Attachment #272612 - Attachment is obsolete: true
Attachment #272615 - Flags: review?(cbiesinger)
(In reply to comment #5)
> Did anybody else (non-Mozilla) care about the eTLD file format?

i doubt it, but regardless this patch doesn't change the format (the current file works just fine).
Comment on attachment 272615 [details] [diff] [review]
patch v1.1

>Index: netwerk/dns/src/nsEffectiveTLDService.cpp
>Index: netwerk/dns/src/nsEffectiveTLDService.h

>+// hash entry class
>+class nsDomainEntry : public PLDHashEntryHdr
>+{
>+public:

>+  nsDomainEntry(const nsDomainEntry& toCopy)
>+  {
>+    // if we end up here, things will break. nsTHashtable shouldn't
>+    // allow this, since we set ALLOW_MEMMOVE to true.
>+    NS_NOTREACHED("nsDomainEntry copy constructor is forbidden!");
>+  }

Move this into the private: section and don't provide an implementation -- with no copying constructor, any attempt to copy will be caught and flagged as a compile-time error.
(In reply to comment #9)
> Move this into the private: section and don't provide an implementation -- with
> no copying constructor, any attempt to copy will be caught and flagged as a
> compile-time error.

that doesn't work with nsTHashtable, since it relies on the enum |ALLOW_MEMMOVE| to decide whether to call the copy constructor at runtime.
No point in using packed bools on the stack (AddEffectiveTLDEntry)

Hm, I wonder why this code doesn't use NS_APP_RES_DIR to find the res dir...

Is there a particular reason why you removed the NAMED_LITERAL_STRING for commentMarker? (It'd avoid a little bit of overhead, no need to construct it in every iteration)

Why do you now truncate lines at whitespace, when the code didn't do that before?
Attachment #272615 - Flags: review?(cbiesinger) → review+
Attached patch patch v1.2Splinter Review
fixed nits, except for the one about whitespace - the old code also chomped comments and whitespace, this version is just more obvious about it. ;)

i'm going to check this in before freeze since this lays the groundwork for lots of other stuff (interface work, consumers etc).
Attachment #272615 - Attachment is obsolete: true
checked in to trunk for 1.9a7 freeze. i guess we'll need to wait for a consumer of this code before we can get allocation & leak stats from tinderbox.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Attached patch orange fix 1Splinter Review
checked this in to fix the case where there's a wildcard rule that applies to toplevel (e.g. *.uk). r+sr=biesi (after the fact).
Attachment #274012 - Flags: superreview+
Attachment #274012 - Flags: review+
Attached patch orange fix 2Splinter Review
... and this one is a followup to ensure we always check what we're about to return for sanity (also r+sr=biesi).
Attachment #274013 - Flags: superreview+
Attachment #274013 - Flags: review+
You need to log in before you can comment on or make changes to this bug.