Speed up fill() and dedent()

RESOLVED FIXED in Firefox 39

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bzbarsky, Assigned: bzbarsky)

Tracking

(Blocks: 1 bug)

Trunk
mozilla39
x86
Mac OS X
Points:
---

Firefox Tracking Flags

(firefox39 fixed)

Details

Attachments

(1 attachment)

I was looking into why codegen for all bindings takes ~15 seconds, and ~5 seconds of that is under fill()/dedent(), mostly doing regexpy bits and whatnot.

In practice, we end up calling these a bunch of times on the same strings and redoing a lot of work.  Memoizing a bit cuts the number of calls to textwrap.dedent from 86000-some to 400-some, and saves 3-3.5 seconds of codegen time for me.
Created attachment 8569415 [details] [diff] [review]
Speed up fill() and dedent() by memoizing some of the work they currently end up doing on each call

Jason, you're probably most familiar with this code, so do you mind taking a look?  If you'd rather I found another reviewer, let me know.
Attachment #8569415 - Flags: review?(jorendorff)
Comment on attachment 8569415 [details] [diff] [review]
Speed up fill() and dedent() by memoizing some of the work they currently end up doing on each call

Review of attachment 8569415 [details] [diff] [review]:
-----------------------------------------------------------------

I would do this a little differently, because I think it leaves the code reading better to separate the caching code from functionality. But feel free to take or leave this suggestion.

import functools

def memoize(fn):
    """
    Decorator to memoize a function of one argument.
    The cache simply grows without bound.
    """
    cache = {}
    @functools.wraps(f)
    def wrapper(arg):
        r = cache.get(arg)
        if r is None:
            r = cache[arg] = fn(arg)
        return r
    return wrapper


@memoize
def dedent(s):
    ...body and docstring of dedent unchanged...


@memoize
def compile_fill_template(template_str):
    """
    Helper function for fill(). Given the template string passed to fill(),
    do the reusable part of template processing and return a pair (t, argsToIndent)
    that can be used every time fill() is called with that template argument.
    """
    t = dedent(template)
    assert t.endswith("\n") or "\n" not in t

    argsToIndent = []
    def replace(match):
        ...

    t = re.sub(fill_multiline_substitution_re, replace, t)
    return string.Template(t), argsToIndent


def fill(template, **args):
    """
    ...long docstring...
    """
    t, argsToIndent = compile_fill_template(template)
    for name, modified_name, depth in argsToIndent:
        args[modified_name] = indent(args[name], depth)
    return t.substitute(args)

::: dom/bindings/Codegen.py
@@ +172,4 @@
>      def replace(match):
>          """
>          Replaces a line like '  $*{xyz}\n' with '${xyz_n}',
>          where n is the indent depth, and add a corresponding entry to args.

"to argModList", not args

@@ +174,5 @@
>          Replaces a line like '  $*{xyz}\n' with '${xyz_n}',
>          where n is the indent depth, and add a corresponding entry to args.
> +
> +        Note that this needs to close over argModList, so it has to be
> +        defined inside fill().

BTW, defining a replace() function that closes over local variables and gets passed to re.sub() is a very common idiom in Python. It wouldn't typically merit a comment; you just put the def right next to the re.sub() call that uses it, and the reader will catch on.  But in this file (maintained by C++ hackers) it doesn't hurt to explain a bit more.

@@ +197,5 @@
> +        # Call dedent_internal here, since we don't really need to memoize the
> +        # intermediate state.
> +        t = dedent_internal(template)
> +        assert t.endswith("\n") or "\n" not in t
> +        argModList = list()

Change `list()` to `[]`.

@@ +210,5 @@
>          indented_value = indent(args[name], depth)
>          if modified_name in args:
>              assert args[modified_name] == indented_value
>          else:
>              args[modified_name] = indented_value

Pre-existing code, but I don't think the if-statement and the assert are useful anymore.

Now that we have separate "compile" and "run time" phases, we can do better. At compile time, eliminate duplicates:

    entry = (name, modified_name, depth)
    if entry not in argModList:
        argModList.append(entry)

Then at run time, you can just

    for name, modified_name, depth in argsToIndent:
        args[modified_name] = indent(args[name], depth)
Attachment #8569415 - Flags: review?(jorendorff) → review+
https://hg.mozilla.org/mozilla-central/rev/58f17236b187
Status: NEW → RESOLVED
Last Resolved: 4 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.