Open Bug 453851 Opened 16 years ago Updated 4 years ago

[silme] performance optimizations for l20n python parser

Categories

(Localization Infrastructure and Tools :: Silme, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: zbraniecki, Unassigned)

Details

Attachments

(4 files)

I'll put here patches for l20n parser optimization for potential later use
before those, python parser is near to 100% similar to JS parser. Most of the optimizations should be cross-lingual
This change modifies the way parser operates on the incoming string. Instead of using an internal pointer and operating on untouched string, we unshift() chars from the beginning of the string whenever needed shortening it, but most importantly, reducing the time needed to get the string for any operation. Cost: - new string is being created after each char read Win: - no pointer - no reference reading - no passing of the substring - string gets shorter with each operation Time win: gandalf:silme zbraniecki$ time ./scripts/test.py real 0m3.976s user 0m3.859s sys 0m0.037s gandalf:silme zbraniecki$ cp parser-new.py lib/silme/playground/l20n/parser.py gandalf:silme zbraniecki$ time ./scripts/test.py real 0m3.746s user 0m3.646s sys 0m0.040s 6% faster
Attached file test lol file
Attached file test.py script
test.py script
Attachment #337091 - Attachment mime type: application/octet-stream → text/plain
Attachment #337089 - Attachment is patch: true
Attachment #337089 - Attachment mime type: application/octet-stream → text/plain
the next patch moves most re.compile out of the loop. Result: old: gandalf:silme zbraniecki$ time ./scripts/test.py real 0m3.768s user 0m3.635s sys 0m0.041s new: gandalf:silme zbraniecki$ time ./scripts/test.py real 0m2.784s user 0m2.732s sys 0m0.030s
Comment on attachment 337089 [details] [diff] [review] Remove self.pointer r- on this one, we should keep the single global string and the pointer, and instead of match = pattern.match(self.content[self.pointer:]) do match = pattern.match(self.content, elf.pointer) Then we don't have any string allocs at all. The string allocs should scale worse for bigger lol files, too.
Attachment #337089 - Flags: review-
Comment on attachment 337152 [details] [diff] [review] re.compile out of loop Hrm. I like the idea to get the pattern compiles out of the loop, but I'm not much of a friend to put those in a hash. You could just do a dummy class and self.patterns.entry, or, self.entryPattern.
(In reply to comment #6) > (From update of attachment 337089 [details] [diff] [review]) > r- on this one, we should keep the single global string and the pointer It slows down the whole parsing (not mentioning it makes the code less readable) >, and > instead of > match = pattern.match(self.content[self.pointer:]) > do > match = pattern.match(self.content, elf.pointer) It doesn't work. I'd love to use it, but it will not work with '^' in regexp. :/ > Then we don't have any string allocs at all. > > The string allocs should scale worse for bigger lol files, too. I also thought about it. I may do another try with the pointer on the current code, since I also would prefer to avoid allocation.
(In reply to comment #7) > (From update of attachment 337152 [details] [diff] [review]) > Hrm. I like the idea to get the pattern compiles out of the loop, but I'm not > much of a friend to put those in a hash. You could just do a dummy class and > self.patterns.entry, or, self.entryPattern. Sure. No emotional feelings about this change. Why don't you like hash here?
match and '^' are double trouble, if you use pattern.match, you can just drop the '^'. That'd only be interesting if you'd use pattern.search
... and I dislike hashes because: - they're hard to type, ['foo'] vs. .foo - I prefer stricter typed setup
Blocks: 458441
it's not going to block the release of silme 0.5 atm.
No longer blocks: 458441
Moving Silme bugs to its component. [Mass change filter: silme-move]
Component: Infrastructure → Silme
Product: Mozilla Localizations → Localization Infrastructure and Tools
Assignee: gandalf → nobody
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: