Closed Bug 1136896 Opened 9 years ago Closed 9 years ago

Speed up fill() and dedent()

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I was looking into why codegen for all bindings takes ~15 seconds, and ~5 seconds of that is under fill()/dedent(), mostly doing regexpy bits and whatnot.

In practice, we end up calling these a bunch of times on the same strings and redoing a lot of work.  Memoizing a bit cuts the number of calls to textwrap.dedent from 86000-some to 400-some, and saves 3-3.5 seconds of codegen time for me.
Jason, you're probably most familiar with this code, so do you mind taking a look?  If you'd rather I found another reviewer, let me know.
Attachment #8569415 - Flags: review?(jorendorff)
Comment on attachment 8569415 [details] [diff] [review]
Speed up fill() and dedent() by memoizing some of the work they currently end up doing on each call

Review of attachment 8569415 [details] [diff] [review]:
-----------------------------------------------------------------

I would do this a little differently, because I think it leaves the code reading better to separate the caching code from functionality. But feel free to take or leave this suggestion.

import functools

def memoize(fn):
    """
    Decorator to memoize a function of one argument.
    The cache simply grows without bound.
    """
    cache = {}
    @functools.wraps(f)
    def wrapper(arg):
        r = cache.get(arg)
        if r is None:
            r = cache[arg] = fn(arg)
        return r
    return wrapper


@memoize
def dedent(s):
    ...body and docstring of dedent unchanged...


@memoize
def compile_fill_template(template_str):
    """
    Helper function for fill(). Given the template string passed to fill(),
    do the reusable part of template processing and return a pair (t, argsToIndent)
    that can be used every time fill() is called with that template argument.
    """
    t = dedent(template)
    assert t.endswith("\n") or "\n" not in t

    argsToIndent = []
    def replace(match):
        ...

    t = re.sub(fill_multiline_substitution_re, replace, t)
    return string.Template(t), argsToIndent


def fill(template, **args):
    """
    ...long docstring...
    """
    t, argsToIndent = compile_fill_template(template)
    for name, modified_name, depth in argsToIndent:
        args[modified_name] = indent(args[name], depth)
    return t.substitute(args)

::: dom/bindings/Codegen.py
@@ +172,4 @@
>      def replace(match):
>          """
>          Replaces a line like '  $*{xyz}\n' with '${xyz_n}',
>          where n is the indent depth, and add a corresponding entry to args.

"to argModList", not args

@@ +174,5 @@
>          Replaces a line like '  $*{xyz}\n' with '${xyz_n}',
>          where n is the indent depth, and add a corresponding entry to args.
> +
> +        Note that this needs to close over argModList, so it has to be
> +        defined inside fill().

BTW, defining a replace() function that closes over local variables and gets passed to re.sub() is a very common idiom in Python. It wouldn't typically merit a comment; you just put the def right next to the re.sub() call that uses it, and the reader will catch on.  But in this file (maintained by C++ hackers) it doesn't hurt to explain a bit more.

@@ +197,5 @@
> +        # Call dedent_internal here, since we don't really need to memoize the
> +        # intermediate state.
> +        t = dedent_internal(template)
> +        assert t.endswith("\n") or "\n" not in t
> +        argModList = list()

Change `list()` to `[]`.

@@ +210,5 @@
>          indented_value = indent(args[name], depth)
>          if modified_name in args:
>              assert args[modified_name] == indented_value
>          else:
>              args[modified_name] = indented_value

Pre-existing code, but I don't think the if-statement and the assert are useful anymore.

Now that we have separate "compile" and "run time" phases, we can do better. At compile time, eliminate duplicates:

    entry = (name, modified_name, depth)
    if entry not in argModList:
        argModList.append(entry)

Then at run time, you can just

    for name, modified_name, depth in argsToIndent:
        args[modified_name] = indent(args[name], depth)
Attachment #8569415 - Flags: review?(jorendorff) → review+
https://hg.mozilla.org/mozilla-central/rev/58f17236b187
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: