Closed Bug 704848 Opened 13 years ago Closed 12 years ago

reduce space required by nsEffectiveTLDService with more preprocessing

Categories

(Core :: Networking, defect)

Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla18

People

(Reporter: froydnj, Unassigned)

Attachments

(1 file, 4 obsolete files)

The space required by nsEffectiveTLDService.cpp:gEntries can be reduced significantly with a little more work in prepare_tlds.py.  Patch and explanation coming up.
Attached patch patch (obsolete) — Splinter Review
The pointers in ETLDEntry create two problems:

1. They (indirectly) introduce padding at the end of ETLDEntry, wasting space;
2. They require runtime relocations, increasing startup time and footprint.

This patch addresses both of these issues by turning the pointers into indices into a private string table.  The runtime cost is minimal: an extra arithmetic instruction on x86, probably two arithmetic instructions on ARM, with GCC.  I haven't checked what MSVC does to the code, but I would hope it does something similar.  After doing this:

1. ETLDEntry shrinks from 8 bytes on 32-bit platforms to 4 bytes.  (The savings are even better on 64-bit systems.)
2. We no longer need runtime relocations (there are no pointers in the data), so we save on the space required by them and the startup cost needed to process them.  I realize elfhack helps reduce the space of the relocations, but the best work is work that you don't do, right?

All told, we win by ~30-40K by doing this (16K of table size, plus 32K of relocation entries on x86/ARM, shrunk by whatever elfhack does to them).  I assume something similar happens on Windows.

I realize this is sort of an odd thing to do, but looking at data that needed relocations, nsEffectiveTLDService.cpp:gEntries was the biggest offender that was easily modifiable (e.g. not vtables, not table-driven QueryInterface bits).
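The pointer-to-index transformation can be sketched roughly as follows. The entry names, offsets, and table contents below are made up for illustration; the real string table is generated by prepare_tlds.py:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical flat string table: all eTLD names concatenated, each
// terminated by '\0'.  No pointers in the data, so no relocations.
static const char kStringTable[] =
  "com\0"    // starts at offset 0
  "co.uk\0"  // starts at offset 4
  "ck\0";    // starts at offset 10

// Index-based entry: a 16-bit offset plus two flags packs into 4 bytes
// on 32-bit platforms, versus 8 bytes for a pointer-based entry.
struct ETLDEntryIdx {
  uint16_t strtab_index;  // offset of the name within kStringTable
  bool exception;
  bool wild;
};

static const ETLDEntryIdx kEntries[] = {
  { 0, false, false },   // "com"
  { 4, false, false },   // "co.uk"
  { 10, false, true },   // wildcard-style entry for "ck"
};

// Recovering the name costs one extra add instead of loading a
// relocated pointer.
static const char* EntryName(const ETLDEntryIdx& e) {
  return &kStringTable[e.strtab_index];
}
```

Because kStringTable and kEntries contain no pointers, the linker can place both in read-only data with no runtime relocation entries.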
Assignee: nobody → nfroyd
Attachment #576520 - Flags: review?(jduell.mcbugs)
Actually, this doesn't quite build right; other files that include nsEffectiveTLDService.h don't know where to find the now-required .inc file generated by the build process.  This is handled correctly for netwerk/dns/, but not for other netwerk/ subdirectories.  Will have a think about how to handle this; suggestions welcome.

Going to change review? to feedback?.
Attached patch patch, v2 (obsolete) — Splinter Review
New patch, this time appropriately modifying the build system so everything builds.  It also includes preprocessor magic that reduces the required Python modifications and eliminates warnings from GCC (and possibly other compilers) about embedded NULs in strings.  Also gone are the two-level arrays and the extra indexing they required; any compiler that can't generate good code now is not worth using.  Static asserts that our indices into the string table do not overflow are also provided, at a small cost: 1 byte for a dummy function that is entirely unused at runtime.

The placement of the actual data in nsDomainEntry is a little weird, but seeing as how putting it anywhere else has problems, it is the best solution.  I looked at the generated code and it's ideal.
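The overflow check mentioned above can be sketched with the classic negative-array-size trick; the constant and names here are hypothetical, not the patch's actual generated code:

```cpp
#include <cstdint>

// Hypothetical total size of the generated string table, in bytes.
static const uint32_t kStringTableSize = 60000;

// Compile-time check: the typedef gets a negative array size, and thus
// fails to compile, if 16-bit indices into the table would overflow.
typedef char AssertStrtabIndexFits[(kStringTableSize <= UINT16_MAX) ? 1 : -1];

// Runtime-testable version of the same predicate.
static bool StrtabIndexFits(uint32_t tableSize) {
  return tableSize <= UINT16_MAX;
}
```

Pre-C++11 code had to resort to tricks like this; with C++11 and later, a plain static_assert does the same job with no dummy declarations.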

Actually asking for review this time.  Benefits: 40-50k smaller binary, nearly 2% fewer relocations to help improve startup.
Attachment #576520 - Attachment is obsolete: true
Attachment #576520 - Flags: feedback?(jduell.mcbugs)
Attachment #582382 - Flags: review?(jduell.mcbugs)
Attached patch patch v3 (obsolete) — Splinter Review
Now with less idiocy on my part.  We don't need a separate index array because we're storing the indices directly in the ETLDEntry struct.
Attachment #582382 - Attachment is obsolete: true
Attachment #582382 - Flags: review?(jduell.mcbugs)
Attachment #582445 - Flags: review?(jduell.mcbugs)
Should mention that if the giant struct method is unpalatable, doing it the way parser/html/nsHtml5NamedCharactersInclude.h does it is also a possibility.  (We're generating the #include, so adding lengths and splitting things into spaces is no problem.)
As discussed with Nathan on IRC, this patch looks good, but I think we can do even better using gperf (it will also let us skip initializing the ~5K element hash table at startup).  I'm going to give that a go.
Assignee: nfroyd → nobody
Component: Networking → Nanojit
QA Contact: networking → nanojit
Comment on attachment 582445 [details] [diff] [review]
patch v3

Clearing review--if gperf approach doesn't work, we'll revisit.
Attachment #582445 - Flags: review?(jduell.mcbugs)
Component: Nanojit → Networking
QA Contact: nanojit → networking
FWIW, gperf's hashtable, even with -P (optimize for fewer relocations in shared libraries), produces a struct with:

{ char *name; bool b1; bool b2; }

which is the same as what we have today.

The generated code spews warnings with GCC (not even with -Wall, just -O2) and features dodgy casts like (int)(long)&chararray[index].  Hacking gperf might be an option (would be useful in other places)...
-P gives us an "int" offset (not a char *), which is still not what we want (we want an unsigned 16-bit value).  I'm planning to either postprocess that with a Python script or add a flag to gperf.

> dodgy casts like (int)(long)&chararray[index]

what do you recommend?  Casting to a uintptr_t, then to PRUint16?
Component: Networking → Nanojit
Doh, I skimmed the documentation for -P and didn't notice that you should change the type of the 'name' field.  We still have the dodgy casts, but defining:

struct dnsentry { PRUint16 offset; bool isException; bool isWild; };

works just fine and eliminates the warning spew.

The generated hashtable is ~300K; it can be cut down to ~220K by using an appropriately high -m option.  It's not clear that trading a couple hundred K of on-disk space to save several tens of K of runtime memory and some computation is worth it.
Nathan,

Thanks for looking into this.  If you've got momentum and want to write a patch that uses gperf, grab this from me and go for it.  Otherwise I should get to it pretty soon.
Go for it. :)
Assignee: nobody → jduell.mcbugs
FWIW, DMD reports the ETLD hashtable as:

==1512== Unreported: 1 block(s) in record 14 of 13721
==1512==  135,168 bytes (131,076 requested / 4,092 slop)

(this is a debug build, hence the slop)

So the table we have today takes up ~128K at runtime; I don't think trading that for a gperf hashtable that's twice the size on-disk would be worthwhile.
Component: Nanojit → Networking
Comment on attachment 582445 [details] [diff] [review]
patch v3

Hey Jason, it's been six months with no movement on the gperf front.  I think this patch is worthwhile to go in as-is, so setting r?jduell on it.
Attachment #582445 - Flags: review?(jduell.mcbugs)
Sounds like a good plan--I'll review, but may not happen until after B2G fork.
Assignee: jduell.mcbugs → nobody
Comment on attachment 582445 [details] [diff] [review]
patch v3

Is this review something we can delegate to Steve Workman or Patrick McManus?
I can find the cycles to keep reviewing this.  Sorry for the delay.

So I unbitrotted the patch (mainly PRType -> stdint).  Alas, it's now barfing during compilation with a warning (which, thankfully, we treat as an error--yay!) that int constants are being truncated to fit into the uint16 'strtab_index' field.

The culprit?  effective_tld_names.dat has grown from 71K (including comments) to 98K since this patch was submitted, and the approach that stores an index into a concatenated array of domain names will no longer work with a 16-bit index.   (I'm going to email Gerv to see if we can try to contain the growth rate of the ETLD database, which is getting scary).

Going forward there are probably a bunch of ways to proceed that will still save us memory storage.  Off the top of my head: we could generate two separate arrays, and pick which is indexed based on some simple hash (first character of domain name, etc).
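One way the two-array idea sketched above could look, with made-up table contents and a made-up split point; the real generator would choose the split so each table stays within 16-bit range:

```cpp
#include <cstring>

// Two hypothetical string tables, split so that each stays small
// enough for 16-bit offsets; contents are invented for illustration.
static const char kTableAtoM[] = "ac\0biz\0com\0";  // names starting a-m
static const char kTableNtoZ[] = "net\0org\0zw\0";  // names starting n-z

// Pick the table from the first character of the name, then index it
// with an ordinary 16-bit offset as in the original patch.
static const char* LookupName(char firstChar, unsigned short offset) {
  const char* table = (firstChar <= 'm') ? kTableAtoM : kTableNtoZ;
  return &table[offset];
}
```

The entries would carry one extra implicit bit of information (which table they index), recovered for free from the name being looked up.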
Attachment #582445 - Attachment is obsolete: true
Attachment #582445 - Flags: review?(jduell.mcbugs)
Attachment #659968 - Flags: feedback+
(In reply to Jason Duell (:jduell) from comment #18)
> I can find the cycles to keep reviewing this.  Sorry for the delay.

Thanks for the review.

> Going forward there are probably a bunch of ways to proceed that will still
> save us memory storage.  Off the top of my head: we could generate two
> separate arrays, and pick which is indexed based on some simple hash (first
> character of domain name, etc).

I think the easiest way is to use bitfields:

struct ETLDEntry {
  uint32_t strtab_index : 30;
  uint32_t exception : 1;
  uint32_t wild : 1;
};

30 bits should be enough for anybody!
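A quick sketch of the bitfield layout from the comment above: all three fields share one 32-bit word, so the entry stays at 4 bytes while the usable index range grows far past the ~98K that broke the 16-bit scheme (exact bitfield packing is implementation-defined, but mainstream compilers pack these into one word):

```cpp
#include <cstdint>

// The layout proposed above: 30 bits of string-table offset plus two
// one-bit flags, all packed into a single 32-bit word.
struct ETLDEntry {
  uint32_t strtab_index : 30;
  uint32_t exception : 1;
  uint32_t wild : 1;
};

// 30 bits can address a string table of up to 2^30 - 1 bytes (~1 GiB).
static const uint32_t kMaxStrtabIndex = (1u << 30) - 1;
```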
Here's an updated patch that compiles on my Linux box (I think previous versions were failing to define the `strings' structure, leading to link errors; not sure how I missed that...).
Attachment #659968 - Attachment is obsolete: true
Attachment #660035 - Flags: review?(jduell.mcbugs)
To answer jduell's question: the change is mostly due to an update from the .jp registry, which instituted a large set of fairly specific regional 3rd level domains, of which there are over 1600 :-| This sort of thing is a fairly unusual occurrence.

We don't have a specific plan to limit the size of the list because it's merely a reflection of reality. The only way to reduce its size, in its raw form, would be to make it less accurate. Of course, one could do cunning things to reduce its size in processed/compiled form, as we are doing here...

Gerv
Comment on attachment 660035 [details] [diff] [review]
patch v5, v4 + bitfields

Review of attachment 660035 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  Please run through try before landing.

Thanks for taking this on and being patient with reviews :)

::: netwerk/dns/prepare_tlds.py
@@ +109,2 @@
>  
> +  for (i, etld) in enumerate(getEffectiveTLDs(sys.argv[1])):

So we don't use "i": any reason to use enumerate then?
Attachment #660035 - Flags: review?(jduell.mcbugs) → review+
(In reply to Jason Duell (:jduell) from comment #22)
> Thanks for taking this on and being patient with reviews :)

Thanks for the review!

> ::: netwerk/dns/prepare_tlds.py
> @@ +109,2 @@
> >  
> > +  for (i, etld) in enumerate(getEffectiveTLDs(sys.argv[1])):
> 
> So we don't use "i": any reason to use enumerate then?

Hm, no.  Fixed in the push:

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7b4f8be9a4d

The try run was a little odd:

https://tbpl.mozilla.org/?tree=Try&rev=64db83b424a8

There was one (heretofore unknown?) intermittent orange (browser/base/content/test/browser_bug435235.js), but I'm fairly certain that it was unrelated to the specific changes made here.

If the sheriffs and/or the networking folks feel differently, I'm happy to back this out.
https://hg.mozilla.org/mozilla-central/rev/e7b4f8be9a4d
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18