Closed Bug 704848 Opened 13 years ago Closed 12 years ago

reduce space required by nsEffectiveTLDService with more preprocessing

Categories

(Core :: Networking, defect)

Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla18

People

(Reporter: froydnj, Unassigned)

Attachments

(1 file, 4 obsolete files)

The space required by nsEffectiveTLDService.cpp:gEntries can be reduced significantly with a little more work in prepare_tlds.py.  Patch and explanation coming up.
Attached patch patch (obsolete) — Splinter Review
The pointers in ETLDEntry create two problems:

1. They (indirectly) introduce padding at the end of ETLDEntry, wasting space;
2. They require runtime relocations, increasing startup time and footprint.

This patch addresses both of these issues by turning the pointers into indices into a private string table.  The runtime cost is minimal: an extra arithmetic instruction on x86, probably two arithmetic instructions on ARM, with GCC.  I haven't checked what MSVC does to the code, but I would hope it does something similar.  After doing this:

1. ETLDEntry shrinks from 8 bytes on 32-bit platforms to 4 bytes.  (The savings are even better on 64-bit systems.)
2. We no longer need runtime relocations (there are no pointers in the data), so we save on the space required by them and the startup cost needed to process them.  I realize elfhack helps reduce the space of the relocations, but the best work is work that you don't do, right?

All told, we win by ~30-40K by doing this (16K of table size, plus 32K of relocation entries on x86/ARM, shrunk by whatever elfhack does to them).  I assume something similar happens on Windows.

I realize this is sort of an odd thing to do, but looking at data that needed relocations, nsEffectiveTLDService.cpp:gEntries was the biggest offender that was easily modifiable (e.g. not vtables, not table-driven QueryInterface bits).
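The pointer-to-index transformation can be sketched roughly as follows. The entry names, offsets, and table contents below are made up for illustration; the real string table is generated by prepare_tlds.py:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical flat string table: all eTLD names concatenated, each
// terminated by '\0'.  No pointers in the data, so no relocations.
static const char kStringTable[] =
  "com\0"    // starts at offset 0
  "co.uk\0"  // starts at offset 4
  "ck\0";    // starts at offset 10

// Index-based entry: a 16-bit offset plus two flags packs into 4 bytes
// on 32-bit platforms, versus 8 bytes for a pointer-based entry.
struct ETLDEntryIdx {
  uint16_t strtab_index;  // offset of the name within kStringTable
  bool exception;
  bool wild;
};

static const ETLDEntryIdx kEntries[] = {
  { 0, false, false },   // "com"
  { 4, false, false },   // "co.uk"
  { 10, false, true },   // wildcard-style entry for "ck"
};

// Recovering the name costs one extra add instead of loading a
// relocated pointer.
static const char* EntryName(const ETLDEntryIdx& e) {
  return &kStringTable[e.strtab_index];
}
```

Because kStringTable and kEntries contain no pointers, the linker can place both in read-only data with no runtime relocation entries.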
Assignee: nobody → nfroyd
Attachment #576520 - Flags: review?(jduell.mcbugs)
Actually, this doesn't quite build right; other files that include nsEffectiveTLDService.h don't know where to find the now-required .inc file generated by the build process.  This is handled correctly for netwerk/dns/, but not for other netwerk/ subdirectories.  Will have a think about how to handle this; suggestions welcome.

Going to change review? to feedback?.
Attached patch patch, v2 (obsolete) — Splinter Review
New patch, this time appropriately modifying the build system so everything builds.  It also includes preprocessor magic that reduces the required Python modifications and eliminates warnings from GCC (and possibly other compilers) about embedded NULs in strings.  Also gone are the two-level arrays and the extra indexing they required; any compiler that can't generate good code now is not worth using.  Static asserts that our indices into the string table do not overflow are also provided, at a small cost: 1 byte for a dummy function that is entirely unused at runtime.

The placement of the actual data in nsDomainEntry is a little weird, but seeing as how putting it anywhere else has problems, it is the best solution.  I looked at the generated code and it's ideal.
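The overflow check mentioned above can be sketched with the classic negative-array-size trick; the constant and names here are hypothetical, not the patch's actual generated code:

```cpp
#include <cstdint>

// Hypothetical total size of the generated string table, in bytes.
static const uint32_t kStringTableSize = 60000;

// Compile-time check: the typedef gets a negative array size, and thus
// fails to compile, if 16-bit indices into the table would overflow.
typedef char AssertStrtabIndexFits[(kStringTableSize <= UINT16_MAX) ? 1 : -1];

// Runtime-testable version of the same predicate.
static bool StrtabIndexFits(uint32_t tableSize) {
  return tableSize <= UINT16_MAX;
}
```

Pre-C++11 code had to resort to tricks like this; with C++11 and later, a plain static_assert does the same job with no dummy declarations.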

Actually asking for review this time.  Benefits: 40-50k smaller binary, nearly 2% fewer relocations to help improve startup.
Attachment #576520 - Attachment is obsolete: true
Attachment #576520 - Flags: feedback?(jduell.mcbugs)
Attachment #582382 - Flags: review?(jduell.mcbugs)
Attached patch patch v3 (obsolete) — Splinter Review
Now with less idiocy on my part.  We don't need a separate index array because we're storing the indices directly in the ETLDEntry struct.
Attachment #582382 - Attachment is obsolete: true
Attachment #582382 - Flags: review?(jduell.mcbugs)
Attachment #582445 - Flags: review?(jduell.mcbugs)
Should mention that if the giant struct method is unpalatable, doing it the way parser/html/nsHtml5NamedCharactersInclude.h does it is also a possibility.  (We're generating the #include, so adding lengths and splitting things into spaces is no problem.)
As discussed with Nathan on IRC, this patch looks good, but I think we can do even better using gperf (it will also let us skip initializing the ~5K element hash table at startup).  I'm going to give that a go.
Assignee: nfroyd → nobody
Component: Networking → Nanojit
QA Contact: networking → nanojit
Comment on attachment 582445 [details] [diff] [review]
patch v3

Clearing review--if gperf approach doesn't work, we'll revisit.
Attachment #582445 - Flags: review?(jduell.mcbugs)
Component: Nanojit → Networking
QA Contact: nanojit → networking
FWIW, gperf's hashtable, even with -P (optimize for fewer relocations in shared libraries), produces a struct with:

{ char *name; bool b1; bool b2; }

which is the same as what we have today.

The generated code spews warnings with GCC (not even with -Wall, just -O2) and features dodgy casts like (int)(long)&chararray[index].  Hacking gperf might be an option (would be useful in other places)...
-P gives us an "int" offset (not a char *), which is still not what we want (we want an unsigned 16-bit value).  I'm planning to either postprocess that with a Python script or add a flag to gperf.

> dodgy casts like (int)(long)&chararray[index]

what do you recommend?  Casting to a uintptr_t, then to PRUint16?
Component: Networking → Nanojit
Doh, I skimmed the documentation for -P and didn't notice that you should change the type of the 'name' field.  We still have the dodgy casts, but defining:

struct dnsentry { PRUint16 offset; bool isException; bool isWild; };

works just fine and eliminates the warning spew.

The generated hashtable is ~300K; it can be cut down to ~220K by using an appropriately high -m option.  It's not clear that trading a couple hundred K of on-disk space to save several tens of K of runtime memory and some computation is worth it.
Nathan,

Thanks for looking into this.  If you've got momentum and want to write a patch that uses gperf, grab this from me and go for it.  Otherwise I should get to it pretty soon.
Go for it. :)
Assignee: nobody → jduell.mcbugs
FWIW, DMD reports the ETLD hashtable as:

==1512== Unreported: 1 block(s) in record 14 of 13721
==1512==  135,168 bytes (131,076 requested / 4,092 slop)

(this is a debug build, hence the slop)

So the table we have today takes up ~128K at runtime; I don't think trading that for a gperf hashtable that's twice the size on-disk would be worthwhile.
Component: Nanojit → Networking
Comment on attachment 582445 [details] [diff] [review]
patch v3

Hey Jason, it's been six months with no movement on the gperf front.  I think this patch is worthwhile to go in as-is, so setting r?jduell on it.
Attachment #582445 - Flags: review?(jduell.mcbugs)
Sounds like a good plan--I'll review, but may not happen until after B2G fork.
Assignee: jduell.mcbugs → nobody
Comment on attachment 582445 [details] [diff] [review]
patch v3

Is this review something we can delegate to Steve Workman or Patrick McManus?
I can find the cycles to keep reviewing this.  Sorry for the delay.

So I unbitrotted the patch (mainly PRType -> stdint).  Alas, it's now barfing during compilation with a warning (which, thankfully, we treat as an error--yay!) that int constants are being truncated to fit into the uint16 'strtab_index' field.

The culprit?  effective_tld_names.dat has grown from 71K (including comments) to 98K since this patch was submitted, and the approach that stores an index into a concatenated array of domain names will no longer work with a 16-bit index.   (I'm going to email Gerv to see if we can try to contain the growth rate of the ETLD database, which is getting scary).

Going forward there are probably a bunch of ways to proceed that will still save us memory storage.  Off the top of my head: we could generate two separate arrays, and pick which is indexed based on some simple hash (first character of domain name, etc).
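One way the two-array idea sketched above could look, with made-up table contents and a made-up split point; the real generator would choose the split so each table stays within 16-bit range:

```cpp
#include <cstring>

// Two hypothetical string tables, split so that each stays small
// enough for 16-bit offsets; contents are invented for illustration.
static const char kTableAtoM[] = "ac\0biz\0com\0";  // names starting a-m
static const char kTableNtoZ[] = "net\0org\0zw\0";  // names starting n-z

// Pick the table from the first character of the name, then index it
// with an ordinary 16-bit offset as in the original patch.
static const char* LookupName(char firstChar, unsigned short offset) {
  const char* table = (firstChar <= 'm') ? kTableAtoM : kTableNtoZ;
  return &table[offset];
}
```

The entries would carry one extra implicit bit of information (which table they index), recovered for free from the name being looked up.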
Attachment #582445 - Attachment is obsolete: true
Attachment #582445 - Flags: review?(jduell.mcbugs)
Attachment #659968 - Flags: feedback+
(In reply to Jason Duell (:jduell) from comment #18)
> I can find the cycles to keep reviewing this.  Sorry for the delay.

Thanks for the review.

> Going forward there are probably a bunch of ways to proceed that will still
> save us memory storage.  Off the top of my head: we could generate two
> separate arrays, and pick which is indexed based on some simple hash (first
> character of domain name, etc).

I think the easiest way is to use bitfields:

struct ETLDEntry {
  uint32_t strtab_index : 30;
  uint32_t exception : 1;
  uint32_t wild : 1;
};

30 bits should be enough for anybody!
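A quick sketch of the bitfield layout from the comment above: all three fields share one 32-bit word, so the entry stays at 4 bytes while the usable index range grows far past the ~98K that broke the 16-bit scheme (exact bitfield packing is implementation-defined, but mainstream compilers pack these into one word):

```cpp
#include <cstdint>

// The layout proposed above: 30 bits of string-table offset plus two
// one-bit flags, all packed into a single 32-bit word.
struct ETLDEntry {
  uint32_t strtab_index : 30;
  uint32_t exception : 1;
  uint32_t wild : 1;
};

// 30 bits can address a string table of up to 2^30 - 1 bytes (~1 GiB).
static const uint32_t kMaxStrtabIndex = (1u << 30) - 1;
```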
Here's an updated patch that compiles on my Linux box (I think previous versions were failing to define the `strings' structure, leading to link errors; not sure how I missed that...).
Attachment #659968 - Attachment is obsolete: true
Attachment #660035 - Flags: review?(jduell.mcbugs)
To answer jduell's question: the change is mostly due to an update from the .jp registry, which instituted a large set of fairly specific regional 3rd level domains, of which there are over 1600 :-| This sort of thing is a fairly unusual occurrence.

We don't have a specific plan to limit the size of the list because it's merely a reflection of reality. The only way to reduce its size, in its raw form, would be to make it less accurate. Of course, one could do cunning things to reduce its size in processed/compiled form, as we are doing here...

Gerv
Comment on attachment 660035 [details] [diff] [review]
patch v5, v4 + bitfields

Review of attachment 660035 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  Please run through try before landing.

Thanks for taking this on and being patient with reviews :)

::: netwerk/dns/prepare_tlds.py
@@ +109,2 @@
>  
> +  for (i, etld) in enumerate(getEffectiveTLDs(sys.argv[1])):

So we don't use "i": any reason to use enumerate then?
Attachment #660035 - Flags: review?(jduell.mcbugs) → review+
(In reply to Jason Duell (:jduell) from comment #22)
> Thanks for taking this on and being patient with reviews :)

Thanks for the review!

> ::: netwerk/dns/prepare_tlds.py
> @@ +109,2 @@
> >  
> > +  for (i, etld) in enumerate(getEffectiveTLDs(sys.argv[1])):
> 
> So we don't use "i": any reason to use enumerate then?

Hm, no.  Fixed in the push:

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7b4f8be9a4d

The try run was a little odd:

https://tbpl.mozilla.org/?tree=Try&rev=64db83b424a8

There was one (heretofore unknown?) intermittent orange (browser/base/content/test/browser_bug435235.js), but I'm fairly certain that it was unrelated to the specific changes made here.

If the sheriffs and/or the networking folks feel differently, I'm happy to back this out.
https://hg.mozilla.org/mozilla-central/rev/e7b4f8be9a4d
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18