Last Comment Bug 654448 - pyxpt is slow during make package
: pyxpt is slow during make package
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Build Config (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla15
Assigned To: Ted Mielczarek [:ted.mielczarek]
:
:
Mentors:
Depends on:
Blocks: 643817 707569
  Show dependency treegraph
 
Reported: 2011-05-03 08:15 PDT by Mike Hommey [:glandium]
Modified: 2012-05-04 04:02 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Remove sanity check after each merge, it's already done when writing output (921 bytes, patch)
2011-05-03 08:15 PDT, Mike Hommey [:glandium]
mh+mozilla: review-
Details | Diff | Splinter Review
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in] (6.37 KB, patch)
2011-05-16 05:05 PDT, Ted Mielczarek [:ted.mielczarek]
mh+mozilla: review+
mounir: checkin+
Details | Diff | Splinter Review
refactor pyxpt to accept file-like objects for Typelib.{read,write} (14.31 KB, patch)
2012-05-02 13:09 PDT, Ted Mielczarek [:ted.mielczarek]
khuey: review+
Details | Diff | Splinter Review
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster) (38.32 KB, patch)
2012-05-02 13:18 PDT, Ted Mielczarek [:ted.mielczarek]
khuey: review+
Details | Diff | Splinter Review

Description Mike Hommey [:glandium] 2011-05-03 08:15:07 PDT
Created attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

It takes 25s to merge all firefox .xpts on my i7-870.

It gets down to 14s with the attached patch (r=ted over irc)
Comment 2 Ted Mielczarek [:ted.mielczarek] 2011-05-03 08:44:09 PDT
Yeah, I can think of two optimizations that wouldn't be too hard:
1) Keep the interface list sorted at all times, so we can binary search it instead of scanning it. Might be nice to subclass list as sortedlist or something, so that "i in t.interfaces" continues to work, it's just faster. You could probably use the heapq methods internally, they just work on lists anyway.
2) In Typelib.merge, merge both interface lists together and sort the entire thing (if you implemented #1, and we required Python 2.6, you could use heapq.merge: http://docs.python.org/release/2.6.6/library/heapq.html#heapq.merge to do a sorted merge), then check for consecutive duplicates instead of searching the entire array.

That combo would probably speed things up a bit.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2011-05-03 10:39:02 PDT
Actually, heapq is probably more trouble than it's worth, maybe just use the bisect methods.
Comment 4 Mike Hommey [:glandium] 2011-05-11 23:28:00 PDT
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

http://hg.mozilla.org/mozilla-central/rev/2a0fbd6eedbd
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/eb719a0e146a
Comment 5 Mike Hommey [:glandium] 2011-05-12 00:07:39 PDT
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

Backed out, it breaks the test suite (though, I thought I had run it with the patch applied)
Comment 6 Ted Mielczarek [:ted.mielczarek] 2011-05-16 05:05:49 PDT
Created attachment 532612 [details] [diff] [review]
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]

I fiddled the patch a bit, this passes the unit tests (and I added a new test to explicitly test xpt.xpt_link).
Comment 7 Mike Hommey [:glandium] 2011-05-16 05:50:21 PDT
Comment on attachment 532612 [details] [diff] [review]
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]

http://hg.mozilla.org/mozilla-central/rev/732d20702e9a
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/68188da41f81
Comment 8 Ted Mielczarek [:ted.mielczarek] 2012-05-02 13:09:15 PDT
Created attachment 620445 [details] [diff] [review]
refactor pyxpt to accept file-like objects for Typelib.{read,write}

Some preliminary refactoring. Not sure if I actually needed this in the end, but it's nicer.
Comment 9 Ted Mielczarek [:ted.mielczarek] 2012-05-02 13:18:48 PDT
Created attachment 620449 [details] [diff] [review]
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster)

tl;dr: this patch makes `xpt.py link` almost 5x faster on my machine.

This patch does a couple of things that sped up xpt_link by quite a bit:
1) Remove Typelib.merge, put all the logic in xpt_link: this way we can batch a bunch of the logic instead of doing it over and over when merging a big pile of XPT files (like we do during every `make package`)
2) In xpt_link, simply collect all interfaces from the input typelibs into one big list, and then sort out duplicates after the fact. I used logic similar to what the old xpt_link.c did here: sort the interface list by name then check them pairwise.
3) Instead of iterating over all the replacements and then iterating over all the interfaces to find things to fixup, just iterate over all the interfaces, and look them up in a dict to see if they've been replaced. Key to this was changing the hash function for Interface objects so that two interfaces with the same name+iid would hash to the same thing (since every typelib has a definition of nsISupports, but they create unique objects when read in).

Without this patch `xpt.py link` takes about 16.5 seconds on my machine. With it it takes about 3.5s.
Comment 10 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-02 14:50:12 PDT
Comment on attachment 620445 [details] [diff] [review]
refactor pyxpt to accept file-like objects for Typelib.{read,write}

Review of attachment 620445 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to make it so that these methods only take file objects (and not file names) but I won't stop you from landing it as is.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1060,5 @@
>          """
> +        filename = ""
> +        data = None
> +        expected_size = None
> +        if isinstance(input_file, basestring):

Yuck.  Do we really need this?

@@ +1180,4 @@
>  
>          """
>          self._sanityCheck()
> +        if isinstance(output_file, basestring):

Yuck again.
Comment 11 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-02 15:19:26 PDT
Comment on attachment 620449 [details] [diff] [review]
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster)

Review of attachment 620449 [details] [diff] [review]:
-----------------------------------------------------------------

Yay sorting.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1260,5 @@
>      """
> +    def read_input(i):
> +        if isinstance(i, Typelib):
> +            return i
> +        return Typelib.read(i)

More yuck.

@@ +1307,5 @@
> +        if i.resolved != j.resolved:
> +            # prefer resolved interfaces over unresolved
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond

If appropriate, I'd like to stick an

assert i.iid == Interface.UNRESOLVED_IID

in here.  I'm kinda of paranoid about duplicate information not being kept in sync ;-)

@@ +1309,5 @@
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond
> +            else:
> +                # replace j with i

and an

assert j.iid == Interface.UNRESOLVED_IID

in here.

@@ +1314,5 @@
> +                return Result.KeepFirst
> +        elif i.iid != j.iid:
> +            # Prefer unresolved interfaces with valid IIDs
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i

I'd like an

assert not j.resolved

@@ +1317,5 @@
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i
> +                return Result.KeepFirst
> +            elif i.iid == Interface.UNRESOLVED_IID:
> +                # keep j

and an

assert not i.resolved

@@ +1328,5 @@
> +                                (i.name, i.iid, i.xpt_filename, \
> +                                 j.iid, j.xpt_filename)
> +        raise DataError, "No idea what happened here: %s:%s (%s), %s:%s (%s)" % \
> +            (i.name, i.iid, i.xpt_filename, j.name, j.iid, j.xpt_filename)
> +    # Compare interfaces pairwise to find duplicates that should be merged.

Put a newline between the end of def compare and the remaining code.

Note You need to log in before you can comment on or make changes to this bug.