pyxpt is slow during make package

RESOLVED FIXED in mozilla15

Status

()

Core
Build Config
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: glandium, Assigned: ted)

Tracking

Trunk
mozilla15
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

Created attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

It takes 25s to merge all firefox .xpts on my i7-870.

It gets down to 14s with the attached patch (r=ted over irc)
Attachment #529721 - Flags: review+
(Assignee)

Updated

6 years ago
Assignee: nobody → mh+mozilla
Wins would be expected from improving these:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/file/tip/xpt.py#l1173
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/file/tip/xpt.py#l1222
(Assignee)

Comment 2

6 years ago
Yeah, I can think of two optimizations that wouldn't be too hard:
1) Keep the interface list sorted at all times, so we can binary search it instead of scanning it. Might be nice to subclass list as sortedlist or something, so that "i in t.interfaces" continues to work, it's just faster. You could probably use the heapq methods internally, they just work on lists anyway.
2) In Typelib.merge, merge both interface lists together and sort the entire thing (if you implemented #1, and we required Python 2.6, you could use heapq.merge: http://docs.python.org/release/2.6.6/library/heapq.html#heapq.merge to do a sorted merge), then check for consecutive duplicates instead of searching the entire array.

That combo would probably speed things up a bit.
(Assignee)

Comment 3

6 years ago
Actually, heapq is probably more trouble than it's worth, maybe just use the bisect methods.
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

http://hg.mozilla.org/mozilla-central/rev/2a0fbd6eedbd
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/eb719a0e146a
Attachment #529721 - Attachment description: Remove sanity check after each merge, it's already done when writing output → Remove sanity check after each merge, it's already done when writing output [checked-in]
Comment on attachment 529721 [details] [diff] [review]
Remove sanity check after each merge, it's already done when writing output

Backed out, it breaks the test suite (though, I thought I had run it with the patch applied)
Attachment #529721 - Attachment description: Remove sanity check after each merge, it's already done when writing output [checked-in] → Remove sanity check after each merge, it's already done when writing output
Attachment #529721 - Attachment is obsolete: true
Attachment #529721 - Flags: review+ → review-
(Assignee)

Comment 6

6 years ago
Created attachment 532612 [details] [diff] [review]
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]

I fiddled the patch a bit, this passes the unit tests (and I added a new test to explicitly test xpt.xpt_link).
Attachment #532612 - Flags: review?(mh+mozilla)
(Assignee)

Updated

6 years ago
Assignee: mh+mozilla → ted.mielczarek
Attachment #532612 - Flags: review?(mh+mozilla) → review+
Comment on attachment 532612 [details] [diff] [review]
don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]

http://hg.mozilla.org/mozilla-central/rev/732d20702e9a
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pyxpt/rev/68188da41f81
Attachment #532612 - Attachment description: don't call Typelib._sanitycheck after every merge in xpt_link → don't call Typelib._sanitycheck after every merge in xpt_link [checked-in]
Attachment #532612 - Flags: checkin+
(Assignee)

Updated

5 years ago
Blocks: 707569
(Assignee)

Comment 8

5 years ago
Created attachment 620445 [details] [diff] [review]
refactor pyxpt to accept file-like objects for Typelib.{read,write}

Some preliminary refactoring. Not sure if I actually needed this in the end, but it's nicer.
(Assignee)

Comment 9

5 years ago
Created attachment 620449 [details] [diff] [review]
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster)

tl;dr: this patch makes `xpt.py link` almost 5x faster on my machine.

This patch does a couple of things that sped up xpt_link by quite a bit:
1) Remove Typelib.merge, put all the logic in xpt_link: this way we can batch a bunch of the logic instead of doing it over and over when merging a big pile of XPT files (like we do during every `make package`)
2) In xpt_link, simply collect all interfaces from the input typelibs into one big list, and then sort out duplicates after the fact. I used logic similar to what the old xpt_link.c did here: sort the interface list by name then check them pairwise.
3) Instead of iterating over all the replacements and then iterating over all the interfaces to find things to fixup, just iterate over all the interfaces, and look them up in a dict to see if they've been replaced. Key to this was changing the hash function for Interface objects so that two interfaces with the same name+iid would hash to the same thing (since every typelib has a definition of nsISupports, but they create unique objects when read in).

Without this patch `xpt.py link` takes about 16.5 seconds on my machine. With it it takes about 3.5s.
(Assignee)

Updated

5 years ago
Attachment #620445 - Flags: review?(khuey)
(Assignee)

Updated

5 years ago
Attachment #620449 - Flags: review?(khuey)
Comment on attachment 620445 [details] [diff] [review]
refactor pyxpt to accept file-like objects for Typelib.{read,write}

Review of attachment 620445 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to make it so that these methods only take file objects (and not file names) but I won't stop you from landing it as is.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1060,5 @@
>          """
> +        filename = ""
> +        data = None
> +        expected_size = None
> +        if isinstance(input_file, basestring):

Yuck.  Do we really need this?

@@ +1180,4 @@
>  
>          """
>          self._sanityCheck()
> +        if isinstance(output_file, basestring):

Yuck again.
Attachment #620445 - Flags: review?(khuey) → review+
Comment on attachment 620449 [details] [diff] [review]
remove xpt.py's Typelib.merge, move all the logic into xpt_link (also make it a lot faster)

Review of attachment 620449 [details] [diff] [review]:
-----------------------------------------------------------------

Yay sorting.

::: xpcom/typelib/xpt/tools/xpt.py
@@ +1260,5 @@
>      """
> +    def read_input(i):
> +        if isinstance(i, Typelib):
> +            return i
> +        return Typelib.read(i)

More yuck.

@@ +1307,5 @@
> +        if i.resolved != j.resolved:
> +            # prefer resolved interfaces over unresolved
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond

If appropriate, I'd like to stick an

assert i.iid == Interface.UNRESOLVED_IID

in here.  I'm kinda of paranoid about duplicate information not being kept in sync ;-)

@@ +1309,5 @@
> +            if j.resolved:
> +                # keep j
> +                return Result.KeepSecond
> +            else:
> +                # replace j with i

and an

assert j.iid == Interface.UNRESOLVED_IID

in here.

@@ +1314,5 @@
> +                return Result.KeepFirst
> +        elif i.iid != j.iid:
> +            # Prefer unresolved interfaces with valid IIDs
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i

I'd like an

assert not j.resolved

@@ +1317,5 @@
> +            if j.iid == Interface.UNRESOLVED_IID:
> +                # replace j with i
> +                return Result.KeepFirst
> +            elif i.iid == Interface.UNRESOLVED_IID:
> +                # keep j

and an

assert not i.resolved

@@ +1328,5 @@
> +                                (i.name, i.iid, i.xpt_filename, \
> +                                 j.iid, j.xpt_filename)
> +        raise DataError, "No idea what happened here: %s:%s (%s), %s:%s (%s)" % \
> +            (i.name, i.iid, i.xpt_filename, j.name, j.iid, j.xpt_filename)
> +    # Compare interfaces pairwise to find duplicates that should be merged.

Put a newline between the end of def compare and the remaining code.
Attachment #620449 - Flags: review?(khuey) → review+
(Assignee)

Comment 12

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/db0c14b8d2d8
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb5b1d8212fa
Target Milestone: --- → mozilla15
https://hg.mozilla.org/mozilla-central/rev/cb5b1d8212fa
https://hg.mozilla.org/mozilla-central/rev/db0c14b8d2d8
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.