hgpoller._parse_changes too slow?

RESOLVED FIXED

Status

Release Engineering
General
P3
normal
RESOLVED FIXED
9 years ago
4 years ago

People

(Reporter: catlee, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [automation])

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
I spent a little time tonight profiling a local copy of production-master's configs to try and figure out why production-master is so slow.

One thing that showed up pretty high in the profile was code related to _parse_changes.  In my test, which was to start up buildbot, let it run for 2 minutes, and then shut it down, _parse_changes accounted for 20 seconds of the total (120 second) CPU time.

Here's an alternate version that brought the time down to 3 seconds.

    import xml.etree.cElementTree as ElementTree

    def _parse_changes(query):
        etree = ElementTree.fromstring(query)
        xmlns = re.sub("({.*}).*", "\\1", etree.tag)

        changes = []
        for entry in etree.findall(xmlns+'entry'):
            d = {}
            for n in entry.getiterator():
                tag = n.tag.replace(xmlns, "")
                if tag == "title":
                    d['title'] = n.text
                    d['changeset'] = n.text.split(" ")[1]
                elif tag == "updated":
                    d['updated'] = parse_date_string(n.text)
                elif tag == "author":
                    d['author'] = n[0].text
                elif tag == "link":
                    d['link'] = n.get('href')
                elif tag == "{http://www.w3.org/1999/xhtml}ul" and n.get('class') == "filelist":
                    d['files'] = [c.text for c in n.findall('{http://www.w3.org/1999/xhtml}li')]
            changes.insert(0, d)
        return changes

Comment 1

9 years ago
ElementTree is new in python 2.5, is that good enough these days?

If it is, we could probably use reversed(changes) and just append to the list. That might be a tad faster still.

Not sure if findall() could use a path even (and what the path would be, great docs say nothing about that).

I recall that we wanted to do something with the changeset in the title at some point, Ted?
I think people wanted the title to be more useful, so they could use this feed in a feed reader. I think bug 450006 is the closest thing to this that was filed.

Updated

9 years ago
Component: Release Engineering → Release Engineering: Future
Priority: -- → P3

Comment 3

8 years ago
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering

Comment 4

8 years ago
catlee: will this still be an issue in 0.8.0?
Whiteboard: [automation]
(Assignee)

Comment 5

8 years ago
(In reply to comment #4)
> catlee: will this still be an issue in 0.8.0?

yes...but since the masters will be split up, the load on the master running the poller won't affect the masters doing builds, so less of a concern.

P5 / WONTFIX
(Assignee)

Updated

8 years ago
Assignee: nobody → catlee
(Assignee)

Comment 6

8 years ago
Any reason we can't use the json push log for this?  It doesn't have as much data, but do we need anything more than the revision string and author?
(Assignee)

Comment 7

8 years ago
Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop:

existing code: 408s and lots of swapping (peaked around 3GB memory usage)
ElementTree:   7.8s
json:         0.01s

Comment 8

8 years ago
Yeah, but that's not too surprising, given that the json hook doesn't show the changeset details like files etc.
(Assignee)

Comment 9

8 years ago
(In reply to comment #8)
> Yeah, but that's not too surprising, given that the json hook doesn't show the
> changeset details like files etc.

I'm wondering if we actually need to be handling that information?
I don't see why you would. All you should need is the changeset and the identity of the pusher.

Comment 11

8 years ago
The l10n schedulers need the files in changes, at least.
(Assignee)

Comment 12

8 years ago
(In reply to comment #11)
> The l10n schedulers need the files in changes, at least.

Ok, and you still use our buildbotcustom's HgPoller?  I can create an alternate poller that works with json-pushes in that case.
We could do some optimizations based on the list of files that we can't do if we stop getting the file list.  I would like to start doing this for mobile builds so that we don't build mobile if the change is completely unrelated to mobile.

(In reply to comment #7)
> Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop:
> 
> existing code: 408s and lots of swapping (peaked around 3GB memory usage)
> ElementTree:   7.8s
> json:         0.01s

What about using the ElementTree instead of json parsing?  It is still a ton faster than the existing code and aiui, gives us the same information.

Alternately, could we add the missing information to the json version of pushlog?

Comment 14

8 years ago
I'd like to keep the default version of json-pushes to be what it is right now, this is mostly a speed concern for the use case I have on the l10n dashboard. Here, I just get the pushes, and if I see a push, update a local clone. That local clone then serves the information on files etc.

I guess that the ElementTree version should be a good step forward as is.

Comment 15

8 years ago
FYI, Arpad filed bug 584836 to get an option for full json-ess to the pushlog web apis to be used by tbpl. Synergies arise.
Depends on: 584836
(Assignee)

Comment 16

7 years ago
Created attachment 465795 [details] [diff] [review]
Faster parse_changes using cElementTree
Attachment #465795 - Flags: review?(bhearsum)
Comment on attachment 465795 [details] [diff] [review]
Faster parse_changes using cElementTree

Yay, faster and more readable!
Attachment #465795 - Flags: review?(bhearsum) → review+
(Assignee)

Updated

7 years ago
Blocks: 586793
(Assignee)

Comment 18

7 years ago
Comment on attachment 465795 [details] [diff] [review]
Faster parse_changes using cElementTree

changeset:   881:4c226a8f9804
Attachment #465795 - Flags: checked-in+
(Assignee)

Comment 19

7 years ago
Looks like it's working
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.