Closed Bug 711563 Opened 13 years ago Closed 12 years ago

reduce space and relocations required by quickstubs tables

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12

People

(Reporter: froydnj, Assigned: froydnj)

Details

Attachments

(4 files, 4 obsolete files)

Quickstubs's tables take up more space than required due to:

1) size_t for hash table entry chaining;
2) 0-terminated property and function tables;
3) pointers to said tables, requiring relocations; and
4) pointers to strings in said tables, requiring relocations.

Patches for the first three points to be attached shortly.  A patch for the 4th should not be that hard, but I don't have it ready yet.
Our xpc_qsHashEntry table only has ~200ish entries; we don't need a full size_t to index all of them.  Use a uint16_t instead, which saves a size_t for each entry.  Not that big of a savings, admittedly, but still.
Assignee: nobody → nfroyd
Status: NEW → ASSIGNED
Attachment #582347 - Flags: review?(jorendorff)
xpc_qsHashEntry contains a pointer to its properties and functions.  This is all well and good, except that the pointer requires relocations at startup.  (And pointers are big on 64-bit platforms, too.)  The individual arrays also have an unnecessary nsnull entry at the end, which we don't need.

Instead of per-entry properties and functions, just produce one big array of properties, one big array of functions, and record indices and lengths in the xpc_qsHashEntrys.  No more relocations, no more nsnull entries.

16 bits of index and count should be enough for anybody!  We have less than a thousand properties and functions each, so more than two orders of magnitude to grow here.
Attachment #582349 - Flags: review?(jorendorff)
That first patch needs a PR_STATIC_ASSERT about table size _somewhere_.
Now with PR_STATIC_ASSERTs (thanks, bz!).

Not using ArrayLength due to bizarre template errors, fwiw.
Attachment #582347 - Attachment is obsolete: true
Attachment #582347 - Flags: review?(jorendorff)
Attachment #582355 - Flags: review?(jorendorff)
Attachment #582355 - Flags: review?(jorendorff) → review+
Comment on attachment 582349 [details] [diff] [review]
use flatter property and function tables

Instead of 'mydict.has_key(key)', write 'key in mydict'.

Don't use global state like iface_propspecs and iface_funcspecs. Here are two alternatives:

  - create them in one of the functions and pass them around everywhere

  - instead of using dictionaries, just stick the propspecs and funcspecs on
    each interface object. You can do it unconditionally at the end of
    writeStubsForInterface:

        iface.propspecs = propspecs
        iface.funcspecs = funcspecs

    There is no need to declare the new properties.

    Then writeSpecs would look like this:

>+def writeSpecs(f, elementType, varname, spec_type, spec_indices, interfaces):
>+    index = 0
>+    f.write("static const %s %s[] = {\n" % (elementType, varname))
>+    for iface in interfaces:
>+        specs = getattr(iface, spec_type)
>+        if specs:
>+            spec_indices[iface.name] = index
>+            f.write("    // %s (index %d)\n" % (iface.name,index))
>+            for s in specs:
>+                f.write("    %s,\n" % s)
>+            index += len(specs)
>+    f.write("};\n\n")

and for spec_type the caller would the string 'propspecs' or 'funcspecs'.

How much of a win is this at startup?
Attachment #582349 - Flags: review?(jorendorff) → review+
Changes as requested by review.  Carrying over r+.
Attachment #582349 - Attachment is obsolete: true
Attachment #584824 - Flags: review+
Each const char * in our propspecs and funcspecs takes up sizeof(str) sizeof(const char *) + sizeof(relocation).  This patch cuts out the relocations required by producing a string table containing all the names of properties and functions and storing indexes into that table in place of pointers.  About 1k relocations are trimmed by doing this, or ~8k of data on x86 and ARM.

A clever implementation would have used offsetof instead of computing the indices in Python.

xpc_qsFunctionSpecs can be further reduced by better packing after this change; that patch is coming up.
Attachment #584834 - Flags: review?(jorendorff)
'arity' in xpc_qs{Function,Traceable}Spec doesn't need a full 32-bit word; by downsizing it, we can rearrange those structures to take up less space.
Attachment #584835 - Flags: review?(jorendorff)
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> How much of a win is this at startup?

I haven't tried to measure this.  Overall savings are about 16K of data (.data.rel.ro and .rel.dyn; .rodata gets a smidge bigger due to the hash table entries not requiring relocations) on x86 Linux.
(In reply to Nathan Froyd (:froydnj) from comment #9)
> (In reply to Jason Orendorff [:jorendorff] from comment #5)
> > How much of a win is this at startup?
> 
> I haven't tried to measure this.

OK, how much do we usually expect to win per relocation removed? I don't need actual cachegrind numbers, any back-of-envelope number will do.
(In reply to Jason Orendorff [:jorendorff] from comment #10)
> OK, how much do we usually expect to win per relocation removed? I don't
> need actual cachegrind numbers, any back-of-envelope number will do.

Mike Hommey says normal relocation processing on Android (non-elfhacked) takes ~270ms.  libxul has ~240k relocations and though our other shared libraries have relocations, let's say there is only libxul.  We eliminate ~1.2k relocs with this patch, so the startup wins would be on the order of 1ms, plus slightly less disk I/O.

Depressing, huh? :)
Comment on attachment 584834 [details] [diff] [review]
use flatter property and function names

In xpconnect/src/XPCQuickStubs.h:
> struct xpc_qsTraceableSpec {
>-    const char *name;
>+    uint16_t name_index;
>     JSNative native;
>     uintN arity;
> };

Please just delete this struct. Also delete writeTraceableStub and MAX_TRACEABLE_NATIVE_ARGS from qsgen.py. All three are dead code.

>+class StringTable:
>+    def __init__(self):
>+        self.current_index = 0;
>+        self.table = {}
>+        self.reverse_table = {}

reverse_table is unnecessary.

>+    def writeDefinition(self, f, name):
>+        entries = self.reverse_table.items()
>+        entries.sort(lambda x, y: cmp(x[0], y[0]))

In this case, that turns out to be the default sort order, so you could just do:

        entries.sort()

However since we're getting rid of reverse_table, you have to swap the pairs somehow, perhaps like this:
        entries = [(offset, str) for (str, offset) in self.table]
        entries.sort()
or like this:
        entries = sorted((offset, str) for (str, offset) in self.table)

By the way -- don't use cmp-functions when sorting in Python. Use key-functions instead. Your code ends up being shorter. (Key-sorting is also faster by a constant factor, not that it matters here.) For example, your original code would look like this:
        entries = self.reverse_table.items()
        entries.sort(key=lambda x: x[0])

More on key functions:
  http://wiki.python.org/moin/HowTo/Sorting#Key_Functions

>+        f.write("union strtab {\n")
>+        f.write("    struct f {\n")
>+        for e in entries:
>+            index = e[0]
>+            string = e[1]
>+            f.write("        char field%d[%d];\n" % (index, self.c_strlen(string)))
>+        f.write("    } f;\n")
>+        f.write("    char table[1];\n")
>+        f.write("};\n\n")
>+
>+        f.write("static const union strtab %s = { {\n" % name)
>+        for e in entries:
>+            f.write('    "%s",\n' % e[1])
>+        f.write("} };\n\n")

You can simplify this to:

        f.write("static const char %s[] =\n" % name)
        for pair in entries[:-1]:
            f.write('    /* %5d */ "%s\0"\n' % pair)
        f.write('    /* %5d */ "%s";\n' % entries[-1])

(This code intentionally does not write any commas. C/C++ glues together adjacent string literal tokens.)
Attachment #584834 - Flags: review?(jorendorff) → review+
Comment on attachment 584835 [details] [diff] [review]
part 4 - compact function and traceable specs

r=me with the Traceable stuff removed.
Attachment #584835 - Flags: review?(jorendorff) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #12)
> You can simplify this to:
> 
>         f.write("static const char %s[] =\n" % name)
>         for pair in entries[:-1]:
>             f.write('    /* %5d */ "%s\0"\n' % pair)
>         f.write('    /* %5d */ "%s";\n' % entries[-1])
> 
> (This code intentionally does not write any commas. C/C++ glues together
> adjacent string literal tokens.)

This approach has two problems:

1) The standard does not require compilers to support arbitrary length string constants.  The required lower bound on the maximum is somewhat small (~500 characters, IIRC).  I don't know of any compilers we support that go quite that low, but some versions of MSVC have limits of 2k; more recent versions have a limit of 65k.  (MSVC enforces its limits post-concatenation, fwiw.)

2) GCC will grouse at you for null characters in string literals.  This warning is on by default and I haven't found any way to turn it off.

Avoiding both of these issues is why I went with the union way.  It is more robust, if more complicated.
I had an epiphany this morning: we can just write the string table out as a big char[], there's no need to do all the games with unions and structs.

Requesting re-review as this is not a cosmetic change.
Attachment #584834 - Attachment is obsolete: true
Attachment #588080 - Flags: review?(jorendorff)
Traceable stuff nuked in part 3, as requested.  Carrying over r+.
Attachment #584835 - Attachment is obsolete: true
Attachment #588081 - Flags: review+
Attachment #582355 - Attachment description: uint16_t indices for hash table chains, v2 → part 1 - uint16_t indices for hash table chains, v2
Attachment #584824 - Attachment description: use flatter property and function tables, v2 → part 2 -use flatter property and function tables, v2
(In reply to Nathan Froyd (:froydnj) from comment #11)
> (In reply to Jason Orendorff [:jorendorff] from comment #10)
> > OK, how much do we usually expect to win per relocation removed? I don't
> > need actual cachegrind numbers, any back-of-envelope number will do.
> 
> Mike Hommey says normal relocation processing on Android (non-elfhacked)
> takes ~270ms.  libxul has ~240k relocations and though our other shared
> libraries have relocations, let's say there is only libxul.  We eliminate
> ~1.2k relocs with this patch, so the startup wins would be on the order of
> 1ms, plus slightly less disk I/O.
> 
> Depressing, huh? :)

Hey, you shave off enough milliseconds and eventually you're snappy. Thanks for finding out for me.
(In reply to Nathan Froyd (:froydnj) from comment #14)
> (In reply to Jason Orendorff [:jorendorff] from comment #12)
> > You can simplify this to:
> > 
> >         f.write("static const char %s[] =\n" % name)
> >         for pair in entries[:-1]:
> >             f.write('    /* %5d */ "%s\0"\n' % pair)
> >         f.write('    /* %5d */ "%s";\n' % entries[-1])
> 
> 2) GCC will grouse at you for null characters in string literals.  This
> warning is on by default and I haven't found any way to turn it off.

I wonder if this is because I wrote \0 when I should have written \\0.

In any case the char array is fine.
Comment on attachment 588080 [details] [diff] [review]
part 3 - use flatter property and function names

Looks good.
Attachment #588080 - Flags: review?(jorendorff) → review+
Keywords: checkin-needed
Has this passed try?
Try run for ff93001d627c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ff93001d627c
Results (out of 276 total builds):
    exception: 2
    success: 245
    warnings: 26
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/philringnalda@gmail.com-ff93001d627c
Behold, the twisted and mutilated visage of a successful try run.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: