Closed Bug 654448 Opened 13 years ago Closed 12 years ago

pyxpt is slow during make package

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla15

People

(Reporter: glandium, Assigned: ted)

References

Details

Attachments

(3 files, 1 obsolete file)

It takes 25s to merge all firefox .xpts on my i7-870.

It gets down to 14s with the attached patch (r=ted over irc)
Attachment #529721 - Flags: review+
Assignee: nobody → mh+mozilla
Yeah, I can think of two optimizations that wouldn't be too hard:
1) Keep the interface list sorted at all times, so we can binary search it instead of scanning it. Might be nice to subclass list as sortedlist or something, so that "i in t.interfaces" continues to work, it's just faster. You could probably use the heapq methods internally, they just work on lists anyway.
2) In Typelib.merge, merge both interface lists together and sort the entire thing (if you implemented #1, and we required Python 2.6, you could use heapq.merge: http://docs.python.org/release/2.6.6/library/heapq.html#heapq.merge to do a sorted merge), then check for consecutive duplicates instead of searching the entire array.

That combo would probably speed things up a bit.
Actually, heapq is probably more trouble than it's worth, maybe just use the bisect methods.
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

http://hg.mozilla.org/mozilla-central/rev/2a0fbd6eedbd
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/eb719a0e146a
Attachment #529721 - Attachment description: Remove sanity check after each merge, it's already done when writing output → Remove sanity check after each merge, it's already done when writing output [checked-in]
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

Backed out, it breaks the test suite (though, I thought I had run it with the patch applied)
Attachment #529721 - Attachment description: Remove sanity check after each merge, it's already done when writing output [checked-in] → Remove sanity check after each merge, it's already done when writing output
Attachment #529721 - Attachment is obsolete: true
Attachment #529721 - Flags: review+ → review-
I fiddled the patch a bit, this passes the unit tests (and I added a new test to explicitly test xpt.xpt_link).
Attachment #532612 - Flags: review?(mh+mozilla)
Assignee: mh+mozilla → ted.mielczarek
Attachment #532612 - Flags: review?(mh+mozilla) → review+
Comment on attachment 532612 [details] [diff] [review]
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]

http://hg.mozilla.org/mozilla-central/rev/732d20702e9a
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/68188da41f81
Attachment #532612 - Attachment description: don't call Typelib._sanitycheck after every merge in xpt_link → don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]
Attachment #532612 - Flags: checkin+
Blocks: 707569
Some preliminary refactoring. Not sure if I actually needed this in the end, but it's nicer.
tl;dr: this patch makes `xpt.py link` almost 5x faster on my machine.

This patch does a couple of things that sped up xpt_link by quite a bit:
1) Remove Typelib.merge, put all the logic in xpt_link: this way we can batch a bunch of the logic instead of doing it over and over when merging a big pile of XPT files (like we do during every `make package`)
2) In xpt_link, simply collect all interfaces from the input typelibs into one big list, and then sort out duplicates after the fact. I used logic similar to what the old xpt_link.c did here: sort the interface list by name then check them pairwise.
3) Instead of iterating over all the replacements and then iterating over all the interfaces to find things to fixup, just iterate over all the interfaces, and look them up in a dict to see if they've been replaced. Key to this was changing the hash function for Interface objects so that two interfaces with the same name+iid would hash to the same thing (since every typelib has a definition of nsISupports, but they create unique objects when read in).

Without this patch `xpt.py link` takes about 16.5 seconds on my machine. With it it takes about 3.5s.
Attachment #620445 - Flags: review?(khuey)
Attachment #620449 - Flags: review?(khuey)
Comment on attachment 620445 [details] [diff] [review]
refactor pyxpt to accept file-like objects for Typelib.{read,write}

Review of attachment 620445 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to make it so that these methods only take file objects (and not file names) but I won't stop you from landing it as is.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1060,5 @@
>          """
> +        filename = ""
> +        data = None
> +        expected_size = None
> +        if isinstance(input_file, basestring):

Yuck.  Do we really need this?

@@ +1180,4 @@
>  
>          """
>          self._sanityCheck()
> +        if isinstance(output_file, basestring):

Yuck again.
Attachment #620445 - Flags: review?(khuey) → review+
Comment on attachment 620449 [details] [diff] [review]
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster)

Review of attachment 620449 [details] [diff] [review]:
-----------------------------------------------------------------

Yay sorting.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1260,5 @@
>      """
> +    def read_input(i):
> +        if isinstance(i, Typelib):
> +            return i
> +        return Typelib.read(i)

More yuck.

@@ +1307,5 @@
> +        if i.resolved != j.resolved:
> +            # prefer resolved interfaces over unresolved
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond

If appropriate, I'd like to stick an

assert i.iid == Interface.UNRESOLVED_IID

in here.  I'm kinda of paranoid about duplicate information not being kept in sync ;-)

@@ +1309,5 @@
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond
> +            else:
> +                # replace j with i

and an

assert j.iid == Interface.UNRESOLVED_IID

in here.

@@ +1314,5 @@
> +                return Result.KeepFirst
> +        elif i.iid != j.iid:
> +            # Prefer unresolved interfaces with valid IIDs
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i

I'd like an

assert not j.resolved

@@ +1317,5 @@
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i
> +                return Result.KeepFirst
> +            elif i.iid == Interface.UNRESOLVED_IID:
> +                # keep j

and an

assert not i.resolved

@@ +1328,5 @@
> +                                (i.name, i.iid, i.xpt_filename, \
> +                                 j.iid, j.xpt_filename)
> +        raise DataError, "No idea what happened here: %s:%s (%s), %s:%s (%s)" % \
> +            (i.name, i.iid, i.xpt_filename, j.name, j.iid, j.xpt_filename)
> +    # Compare interfaces pairwise to find duplicates that should be merged.

Put a newline between the end of def compare and the remaining code.
Attachment #620449 - Flags: review?(khuey) → review+
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: