Closed Bug 498055 Opened 16 years ago Closed 15 years ago

hgpoller._parse_changes too slow?

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

References

Details

(Whiteboard: [automation])

Attachments

(1 file)

Faster parse_changes using cElementTree 15 years ago Chris AtLee [:catlee] 3.11 KB, patch	bhearsum : review+ catlee : checked-in+	Details \| Diff \| Splinter Review

Chris AtLee [:catlee]

Assignee

Description

•

16 years ago

I spent a little time tonight profiling a local copy of production-master's configs to try and figure out why production-master is so slow. One thing that showed up pretty high in the profile was code related to _parse_changes. In my test, which was to start up buildbot, let it run for 2 minutes, and then shut it down, _parse_changes accounted for 20 seconds of the total (120 second) CPU time. Here's an alternate version that brought the time down to 3 seconds. import xml.etree.cElementTree as ElementTree def _parse_changes(query): etree = ElementTree.fromstring(query) xmlns = re.sub("({.*}).*", "\\1", etree.tag) changes = [] for entry in etree.findall(xmlns+'entry'): d = {} for n in entry.getiterator(): tag = n.tag.replace(xmlns, "") if tag == "title": d['title'] = n.text d['changeset'] = n.text.split(" ")[1] elif tag == "updated": d['updated'] = parse_date_string(n.text) elif tag == "author": d['author'] = n[0].text elif tag == "link": d['link'] = n.get('href') elif tag == "{http://www.w3.org/1999/xhtml}ul" and n.get('class') == "filelist": d['files'] = [c.text for c in n.findall('{http://www.w3.org/1999/xhtml}li')] changes.insert(0, d) return changes

Axel Hecht [:Pike]

Comment 1

•

16 years ago

ElementTree is new in python 2.5, is that good enough these days? If it is, we could probably use reversed(changes) and just append to the list. That might be a tad faster still. Not sure if findall() could use a path even (and what the path would be, great docs say nothing about that). I recall that we wanted to do something with the changeset in the title at some point, Ted?

(not currently active) Ted Mielczarek

Comment 2

•

16 years ago

I think people wanted the title to be more useful, so they could use this feed in a feed reader. I think bug 450006 is the closest thing to this that was filed.

Chris Cooper [:coop] (he/him)

Updated

•

16 years ago

Component: Release Engineering → Release Engineering: Future

Priority: -- → P3

Chris Cooper [:coop] (he/him)

Comment 3

•

15 years ago

Mass move of bugs from Release Engineering:Future -> Release Engineering. See http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.

Component: Release Engineering: Future → Release Engineering

Chris Cooper [:coop] (he/him)

Comment 4

•

15 years ago

catlee: will this still be an issue in 0.8.0?

Whiteboard: [automation]

Chris AtLee [:catlee]

Assignee

Comment 5

•

15 years ago

(In reply to comment #4) > catlee: will this still be an issue in 0.8.0? yes...but since the masters will be split up, the load on the master running the poller won't affect the masters doing builds, so less of a concern. P5 / WONTFIX

Chris AtLee [:catlee]

Assignee

Updated

•

15 years ago

Assignee: nobody → catlee

Chris AtLee [:catlee]

Assignee

Comment 6

•

15 years ago

Any reason we can't use the json push log for this? It doesn't have as much data, but do we need anything more than the revision string and author?

Chris AtLee [:catlee]

Assignee

Comment 7

•

15 years ago

Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop: existing code: 408s and lots of swapping (peaked around 3GB memory usage) ElementTree: 7.8s json: 0.01s

Axel Hecht [:Pike]

Comment 8

•

15 years ago

Yeah, but that's not too surprising, given that the json hook doesn't show the changeset details like files etc.

Chris AtLee [:catlee]

Assignee

Comment 9

•

15 years ago

(In reply to comment #8) > Yeah, but that's not too surprising, given that the json hook doesn't show the > changeset details like files etc. I'm wondering if we actually need to be handling that information?

(not currently active) Ted Mielczarek

Comment 10

•

15 years ago

I don't see why you would. All you should need is the changeset and the identity of the pusher.

Axel Hecht [:Pike]

Comment 11

•

15 years ago

The l10n schedulers need the files in changes, at least.

Chris AtLee [:catlee]

Assignee

Comment 12

•

15 years ago

(In reply to comment #11) > The l10n schedulers need the files in changes, at least. Ok, and you still use our buildbotcustom's HgPoller? I can create an alternate poller that works with json-pushes in that case.

John Ford [:jhford] CET/CEST Berlin Time

Comment 13

•

15 years ago

We could do some optimizations based on the list of files that we can't do if we stop getting the file list. I would like to start doing this for mobile builds so that we don't build mobile if the change is completely unrelated to mobile. (In reply to comment #7) > Parsing mozilla-2.0 pushlog (a bit of an extreme case) on my laptop: > > existing code: 408s and lots of swapping (peaked around 3GB memory usage) > ElementTree: 7.8s > json: 0.01s What about using the ElementTree instead of json parsing? It is still a ton faster than the existing code and aiui, gives us the same information. Alternately, could we add the missing information to the json version of pushlog?

Axel Hecht [:Pike]

Comment 14

•

15 years ago

I'd like to keep the default version of json-pushes to be what it is right now, this is mostly a speed concern for the use case I have on the l10n dashboard. Here, I just get the pushes, and if I see a push, update a local clone. That local clone then serves the information on files etc. I guess that the ElementTree version should be a good step forward as is.

Axel Hecht [:Pike]

Comment 15

•

15 years ago

FYI, Arpad filed bug 584836 to get an option for full json-ess to the pushlog web apis to be used by tbpl. Synergies arise.

Arpad Borsos [:Swatinem]

Updated

•

15 years ago

Depends on: 584836

Chris AtLee [:catlee]

Assignee

Comment 16

•

15 years ago

Attached patch Faster parse_changes using cElementTree — Details — Splinter Review

Attachment #465795 - Flags: review?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Comment 17

•

15 years ago

Comment on attachment 465795 [details] [diff] [review] Faster parse_changes using cElementTree Yay, faster and more readable!

Attachment #465795 - Flags: review?(bhearsum) → review+

Chris AtLee [:catlee]

Assignee

Updated

•

15 years ago

Blocks: 586793

Chris AtLee [:catlee]

Assignee

Comment 18

•

15 years ago

Comment on attachment 465795 [details] [diff] [review] Faster parse_changes using cElementTree changeset: 881:4c226a8f9804

Attachment #465795 - Flags: checked-in+

Chris AtLee [:catlee]

Assignee

Comment 19

•

15 years ago

Looks like it's working

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

You need to log in before you can comment on or make changes to this bug.