Last Comment Bug 711563 - reduce space and relocations required by quickstubs tables
: reduce space and relocations required by quickstubs tables
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla12
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-16 11:49 PST by Nathan Froyd [:froydnj]
Modified: 2012-01-16 19:38 PST (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
uint16_t indices for hash table chains (893 bytes, patch)
2011-12-16 11:52 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
use flatter property and function tables (9.28 KB, patch)
2011-12-16 11:57 PST, Nathan Froyd [:froydnj]
jorendorff: review+
Details | Diff | Review
part 1 - uint16_t indices for hash table chains, v2 (1.65 KB, patch)
2011-12-16 12:17 PST, Nathan Froyd [:froydnj]
jorendorff: review+
Details | Diff | Review
part 2 -use flatter property and function tables, v2 (8.91 KB, patch)
2011-12-29 13:02 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
use flatter property and function names (10.35 KB, patch)
2011-12-29 13:40 PST, Nathan Froyd [:froydnj]
jorendorff: review+
Details | Diff | Review
part 4 - compact function and traceable specs (2.12 KB, patch)
2011-12-29 13:42 PST, Nathan Froyd [:froydnj]
jorendorff: review+
Details | Diff | Review
part 3 - use flatter property and function names (10.20 KB, patch)
2012-01-12 09:41 PST, Nathan Froyd [:froydnj]
jorendorff: review+
Details | Diff | Review
part 4 - compact function specs (1.26 KB, patch)
2012-01-12 09:42 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review

Description Nathan Froyd [:froydnj] 2011-12-16 11:49:15 PST
Quickstubs's tables take up more space than required due to:

1) size_t for hash table entry chaining;
2) 0-terminated property and function tables;
3) pointers to said tables, requiring relocations; and
4) pointers to strings in said tables, requiring relocations.

Patches for the first three points to be attached shortly.  A patch for the 4th should not be that hard, but I don't have it ready yet.
Comment 1 Nathan Froyd [:froydnj] 2011-12-16 11:52:26 PST
Created attachment 582347 [details] [diff] [review]
uint16_t indices for hash table chains

Our xpc_qsHashEntry table only has ~200ish entries; we don't need a full size_t to index all of them.  Use a uint16_t instead, which saves a size_t for each entry.  Not that big of a savings, admittedly, but still.
Comment 2 Nathan Froyd [:froydnj] 2011-12-16 11:57:04 PST
Created attachment 582349 [details] [diff] [review]
use flatter property and function tables

xpc_qsHashEntry contains a pointer to its properties and functions.  This is all well and good, except that the pointer requires relocations at startup.  (And pointers are big on 64-bit platforms, too.)  The individual arrays also have an unnecessary nsnull entry at the end, which we don't need.

Instead of per-entry properties and functions, just produce one big array of properties, one big array of functions, and record indices and lengths in the xpc_qsHashEntrys.  No more relocations, no more nsnull entries.

16 bits of index and count should be enough for anybody!  We have less than a thousand properties and functions each, so more than two orders of magnitude to grow here.
Comment 3 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-12-16 12:00:42 PST
That first patch needs a PR_STATIC_ASSERT about table size _somewhere_.
Comment 4 Nathan Froyd [:froydnj] 2011-12-16 12:17:29 PST
Created attachment 582355 [details] [diff] [review]
part 1 - uint16_t indices for hash table chains, v2

Now with PR_STATIC_ASSERTs (thanks, bz!).

Not using ArrayLength due to bizarre template errors, fwiw.
Comment 5 Jason Orendorff [:jorendorff] 2011-12-28 16:26:33 PST
Comment on attachment 582349 [details] [diff] [review]
use flatter property and function tables

Instead of 'mydict.has_key(key)', write 'key in mydict'.

Don't use global state like iface_propspecs and iface_funcspecs. Here are two alternatives:

  - create them in one of the functions and pass them around everywhere

  - instead of using dictionaries, just stick the propspecs and funcspecs on
    each interface object. You can do it unconditionally at the end of
    writeStubsForInterface:

        iface.propspecs = propspecs
        iface.funcspecs = funcspecs

    There is no need to declare the new properties.

    Then writeSpecs would look like this:

>+def writeSpecs(f, elementType, varname, spec_type, spec_indices, interfaces):
>+    index = 0
>+    f.write("static const %s %s[] = {\n" % (elementType, varname))
>+    for iface in interfaces:
>+        specs = getattr(iface, spec_type)
>+        if specs:
>+            spec_indices[iface.name] = index
>+            f.write("    // %s (index %d)\n" % (iface.name,index))
>+            for s in specs:
>+                f.write("    %s,\n" % s)
>+            index += len(specs)
>+    f.write("};\n\n")

and for spec_type the caller would the string 'propspecs' or 'funcspecs'.

How much of a win is this at startup?
Comment 6 Nathan Froyd [:froydnj] 2011-12-29 13:02:56 PST
Created attachment 584824 [details] [diff] [review]
part 2 -use flatter property and function tables, v2

Changes as requested by review.  Carrying over r+.
Comment 7 Nathan Froyd [:froydnj] 2011-12-29 13:40:47 PST
Created attachment 584834 [details] [diff] [review]
use flatter property and function names

Each const char * in our propspecs and funcspecs takes up sizeof(str) sizeof(const char *) + sizeof(relocation).  This patch cuts out the relocations required by producing a string table containing all the names of properties and functions and storing indexes into that table in place of pointers.  About 1k relocations are trimmed by doing this, or ~8k of data on x86 and ARM.

A clever implementation would have used offsetof instead of computing the indices in Python.

xpc_qsFunctionSpecs can be further reduced by better packing after this change; that patch is coming up.
Comment 8 Nathan Froyd [:froydnj] 2011-12-29 13:42:30 PST
Created attachment 584835 [details] [diff] [review]
part 4 - compact function and traceable specs

'arity' in xpc_qs{Function,Traceable}Spec doesn't need a full 32-bit word; by downsizing it, we can rearrange those structures to take up less space.
Comment 9 Nathan Froyd [:froydnj] 2011-12-29 13:44:41 PST
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> How much of a win is this at startup?

I haven't tried to measure this.  Overall savings are about 16K of data (.data.rel.ro and .rel.dyn; .rodata gets a smidge bigger due to the hash table entries not requiring relocations) on x86 Linux.
Comment 10 Jason Orendorff [:jorendorff] 2011-12-30 06:54:09 PST
(In reply to Nathan Froyd (:froydnj) from comment #9)
> (In reply to Jason Orendorff [:jorendorff] from comment #5)
> > How much of a win is this at startup?
> 
> I haven't tried to measure this.

OK, how much do we usually expect to win per relocation removed? I don't need actual cachegrind numbers, any back-of-envelope number will do.
Comment 11 Nathan Froyd [:froydnj] 2011-12-30 07:13:58 PST
(In reply to Jason Orendorff [:jorendorff] from comment #10)
> OK, how much do we usually expect to win per relocation removed? I don't
> need actual cachegrind numbers, any back-of-envelope number will do.

Mike Hommey says normal relocation processing on Android (non-elfhacked) takes ~270ms.  libxul has ~240k relocations and though our other shared libraries have relocations, let's say there is only libxul.  We eliminate ~1.2k relocs with this patch, so the startup wins would be on the order of 1ms, plus slightly less disk I/O.

Depressing, huh? :)
Comment 12 Jason Orendorff [:jorendorff] 2011-12-30 07:34:49 PST
Comment on attachment 584834 [details] [diff] [review]
use flatter property and function names

In xpconnect/src/XPCQuickStubs.h:
> struct xpc_qsTraceableSpec {
>-    const char *name;
>+    uint16_t name_index;
>     JSNative native;
>     uintN arity;
> };

Please just delete this struct. Also delete writeTraceableStub and MAX_TRACEABLE_NATIVE_ARGS from qsgen.py. All three are dead code.

>+class StringTable:
>+    def __init__(self):
>+        self.current_index = 0;
>+        self.table = {}
>+        self.reverse_table = {}

reverse_table is unnecessary.

>+    def writeDefinition(self, f, name):
>+        entries = self.reverse_table.items()
>+        entries.sort(lambda x, y: cmp(x[0], y[0]))

In this case, that turns out to be the default sort order, so you could just do:

        entries.sort()

However since we're getting rid of reverse_table, you have to swap the pairs somehow, perhaps like this:
        entries = [(offset, str) for (str, offset) in self.table]
        entries.sort()
or like this:
        entries = sorted((offset, str) for (str, offset) in self.table)

By the way -- don't use cmp-functions when sorting in Python. Use key-functions instead. Your code ends up being shorter. (Key-sorting is also faster by a constant factor, not that it matters here.) For example, your original code would look like this:
        entries = self.reverse_table.items()
        entries.sort(key=lambda x: x[0])

More on key functions:
  http://wiki.python.org/moin/HowTo/Sorting#Key_Functions

>+        f.write("union strtab {\n")
>+        f.write("    struct f {\n")
>+        for e in entries:
>+            index = e[0]
>+            string = e[1]
>+            f.write("        char field%d[%d];\n" % (index, self.c_strlen(string)))
>+        f.write("    } f;\n")
>+        f.write("    char table[1];\n")
>+        f.write("};\n\n")
>+
>+        f.write("static const union strtab %s = { {\n" % name)
>+        for e in entries:
>+            f.write('    "%s",\n' % e[1])
>+        f.write("} };\n\n")

You can simplify this to:

        f.write("static const char %s[] =\n" % name)
        for pair in entries[:-1]:
            f.write('    /* %5d */ "%s\0"\n' % pair)
        f.write('    /* %5d */ "%s";\n' % entries[-1])

(This code intentionally does not write any commas. C/C++ glues together adjacent string literal tokens.)
Comment 13 Jason Orendorff [:jorendorff] 2011-12-30 07:38:07 PST
Comment on attachment 584835 [details] [diff] [review]
part 4 - compact function and traceable specs

r=me with the Traceable stuff removed.
Comment 14 Nathan Froyd [:froydnj] 2011-12-30 08:11:44 PST
(In reply to Jason Orendorff [:jorendorff] from comment #12)
> You can simplify this to:
> 
>         f.write("static const char %s[] =\n" % name)
>         for pair in entries[:-1]:
>             f.write('    /* %5d */ "%s\0"\n' % pair)
>         f.write('    /* %5d */ "%s";\n' % entries[-1])
> 
> (This code intentionally does not write any commas. C/C++ glues together
> adjacent string literal tokens.)

This approach has two problems:

1) The standard does not require compilers to support arbitrary length string constants.  The required lower bound on the maximum is somewhat small (~500 characters, IIRC).  I don't know of any compilers we support that go quite that low, but some versions of MSVC have limits of 2k; more recent versions have a limit of 65k.  (MSVC enforces its limits post-concatenation, fwiw.)

2) GCC will grouse at you for null characters in string literals.  This warning is on by default and I haven't found any way to turn it off.

Avoiding both of these issues is why I went with the union way.  It is more robust, if more complicated.
Comment 15 Nathan Froyd [:froydnj] 2012-01-12 09:41:10 PST
Created attachment 588080 [details] [diff] [review]
part 3 - use flatter property and function names

I had an epiphany this morning: we can just write the string table out as a big char[], there's no need to do all the games with unions and structs.

Requesting re-review as this is not a cosmetic change.
Comment 16 Nathan Froyd [:froydnj] 2012-01-12 09:42:04 PST
Created attachment 588081 [details] [diff] [review]
part 4 - compact function specs

Traceable stuff nuked in part 3, as requested.  Carrying over r+.
Comment 17 Jason Orendorff [:jorendorff] 2012-01-12 14:25:07 PST
(In reply to Nathan Froyd (:froydnj) from comment #11)
> (In reply to Jason Orendorff [:jorendorff] from comment #10)
> > OK, how much do we usually expect to win per relocation removed? I don't
> > need actual cachegrind numbers, any back-of-envelope number will do.
> 
> Mike Hommey says normal relocation processing on Android (non-elfhacked)
> takes ~270ms.  libxul has ~240k relocations and though our other shared
> libraries have relocations, let's say there is only libxul.  We eliminate
> ~1.2k relocs with this patch, so the startup wins would be on the order of
> 1ms, plus slightly less disk I/O.
> 
> Depressing, huh? :)

Hey, you shave off enough milliseconds and eventually you're snappy. Thanks for finding out for me.
Comment 18 Jason Orendorff [:jorendorff] 2012-01-12 14:27:18 PST
(In reply to Nathan Froyd (:froydnj) from comment #14)
> (In reply to Jason Orendorff [:jorendorff] from comment #12)
> > You can simplify this to:
> > 
> >         f.write("static const char %s[] =\n" % name)
> >         for pair in entries[:-1]:
> >             f.write('    /* %5d */ "%s\0"\n' % pair)
> >         f.write('    /* %5d */ "%s";\n' % entries[-1])
> 
> 2) GCC will grouse at you for null characters in string literals.  This
> warning is on by default and I haven't found any way to turn it off.

I wonder if this is because I wrote \0 when I should have written \\0.

In any case the char array is fine.
Comment 19 Jason Orendorff [:jorendorff] 2012-01-12 14:27:32 PST
Comment on attachment 588080 [details] [diff] [review]
part 3 - use flatter property and function names

Looks good.
Comment 20 Dão Gottwald [:dao] 2012-01-14 02:28:06 PST
Has this passed try?
Comment 21 Mozilla RelEng Bot 2012-01-15 21:45:45 PST
Try run for ff93001d627c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ff93001d627c
Results (out of 276 total builds):
    exception: 2
    success: 245
    warnings: 26
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/philringnalda@gmail.com-ff93001d627c
Comment 22 Phil Ringnalda (:philor) 2012-01-15 22:18:00 PST
Behold, the twisted and mutilated visage of a successful try run.

Note You need to log in before you can comment on or make changes to this bug.