Closed Bug 498055 Opened 15 years ago Closed 14 years ago

hgpoller._parse_changes too slow?

Categories

(Release Engineering :: General, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

References

Details

(Whiteboard: [automation])

Attachments

(1 file)

I spent a little time tonight profiling a local copy of production-master's configs to try and figure out why production-master is so slow.

One thing that showed up pretty high in the profile was code related to _parse_changes.  In my test, which was to start up buildbot, let it run for 2 minutes, and then shut it down, _parse_changes accounted for 20 seconds of the total (120 second) CPU time.

Here's an alternate version that brought the time down to 3 seconds.

    import xml.etree.cElementTree as ElementTree

    def _parse_changes(query):
        etree = ElementTree.fromstring(query)
        xmlns = re.sub("({.*}).*", "\\1", etree.tag)

        changes = []
        for entry in etree.findall(xmlns+'entry'):
            d = {}
            for n in entry.getiterator():
                tag = n.tag.replace(xmlns, "")
                if tag == "title":
                    d['title'] = n.text
                    d['changeset'] = n.text.split(" ")[1]
                elif tag == "updated":
                    d['updated'] = parse_date_string(n.text)
                elif tag == "author":
                    d['author'] = n[0].text
                elif tag == "link":
                    d['link'] = n.get('href')
                elif tag == "{http://www.w3.org/1999/xhtml}ul" and n.get('class') == "filelist":
                    d['files'] = [c.text for c in n.findall('{http://www.w3.org/1999/xhtml}li')]
            changes.insert(0, d)
        return changes
ElementTree is new in python 2.5, is that good enough these days?

If it is, we could probably use reversed(changes) and just append to the list. That might be a tad faster still.

Not sure if findall() could use a path even (and what the path would be, great docs say nothing about that).

I recall that we wanted to do something with the changeset in the title at some point, Ted?
I think people wanted the title to be more useful, so they could use this feed in a feed reader. I think bug 450006 is the closest thing to this that was filed.
Component: Release Engineering → Release Engineering: Future
Priority: -- → P3
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
catlee: will this still be an issue in 0.8.0?
Whiteboard: [automation]
(In reply to comment #4)
> catlee: will this still be an issue in 0.8.0?

yes...but since the masters will be split up, the load on the master running the poller won't affect the masters doing builds, so less of a concern.

P5 / WONTFIX
Assignee: nobody → catlee
Any reason we can't use the json push log for this?  It doesn't have as much data, but do we need anything more than the revision string and author?
Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop:

existing code: 408s and lots of swapping (peaked around 3GB memory usage)
ElementTree:   7.8s
json:         0.01s
Yeah, but that's not too surprising, given that the json hook doesn't show the changeset details like files etc.
(In reply to comment #8)
> Yeah, but that's not too surprising, given that the json hook doesn't show the
> changeset details like files etc.

I'm wondering if we actually need to be handling that information?
I don't see why you would. All you should need is the changeset and the identity of the pusher.
The l10n schedulers need the files in changes, at least.
(In reply to comment #11)
> The l10n schedulers need the files in changes, at least.

Ok, and you still use our buildbotcustom's HgPoller?  I can create an alternate poller that works with json-pushes in that case.
We could do some optimizations based on the list of files that we can't do if we stop getting the file list.  I would like to start doing this for mobile builds so that we don't build mobile if the change is completely unrelated to mobile.

(In reply to comment #7)
> Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop:
> 
> existing code: 408s and lots of swapping (peaked around 3GB memory usage)
> ElementTree:   7.8s
> json:         0.01s

What about using the ElementTree instead of json parsing?  It is still a ton faster than the existing code and aiui, gives us the same information.

Alternately, could we add the missing information to the json version of pushlog?
I'd like to keep the default version of json-pushes to be what it is right now, this is mostly a speed concern for the use case I have on the l10n dashboard. Here, I just get the pushes, and if I see a push, update a local clone. That local clone then serves the information on files etc.

I guess that the ElementTree version should be a good step forward as is.
FYI, Arpad filed bug 584836 to get an option for full json-ess to the pushlog web apis to be used by tbpl. Synergies arise.
Depends on: 584836
Attachment #465795 - Flags: review?(bhearsum)
Comment on attachment 465795 [details] [diff] [review]
Faster parse_changes using cElementTree

Yay, faster and more readable!
Attachment #465795 - Flags: review?(bhearsum) → review+
Blocks: 586793
Comment on attachment 465795 [details] [diff] [review]
Faster parse_changes using cElementTree

changeset:   881:4c226a8f9804
Attachment #465795 - Flags: checked-in+
Looks like it's working
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: