Open
Bug 453851
Opened 16 years ago
Updated 4 years ago
[silme] performance optimizations for l20n python parser
Categories
(Localization Infrastructure and Tools :: Silme, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: zbraniecki, Unassigned)
Details
Attachments
(4 files)
16.60 KB,
patch
|
Pike
:
review-
|
Details | Diff | Splinter Review |
420 bytes,
text/plain
|
Details | |
490 bytes,
text/plain
|
Details | |
6.43 KB,
patch
|
Details | Diff | Splinter Review |
I'll put here patches for l20n parser optimization for potential later use
Reporter | ||
Comment 1•16 years ago
|
||
before those, python parser is near to 100% similar to JS parser. Most of the optimizations should be cross-lingual
Reporter | ||
Comment 2•16 years ago
|
||
This change modifies the way parser operates on the incoming string.
Instead of using an internal pointer and operating on untouched string, we unshift() chars from the beginning of the string whenever needed shortening it, but most importantly, reducing the time needed to get the string for any operation.
Cost:
- new string is being created after each char read
Win:
- no pointer
- no reference reading
- no passing of the substring
- string gets shorter with each operation
Time win:
gandalf:silme zbraniecki$ time ./scripts/test.py
real 0m3.976s
user 0m3.859s
sys 0m0.037s
gandalf:silme zbraniecki$ cp parser-new.py lib/silme/playground/l20n/parser.py
gandalf:silme zbraniecki$ time ./scripts/test.py
real 0m3.746s
user 0m3.646s
sys 0m0.040s
6% faster
Reporter | ||
Comment 3•16 years ago
|
||
Reporter | ||
Comment 4•16 years ago
|
||
test.py script
Reporter | ||
Updated•16 years ago
|
Attachment #337091 -
Attachment mime type: application/octet-stream → text/plain
Reporter | ||
Updated•16 years ago
|
Attachment #337089 -
Attachment is patch: true
Attachment #337089 -
Attachment mime type: application/octet-stream → text/plain
Reporter | ||
Comment 5•16 years ago
|
||
the next patch moves most re.compile out of the loop.
Result:
old:
gandalf:silme zbraniecki$ time ./scripts/test.py
real 0m3.768s
user 0m3.635s
sys 0m0.041s
new:
gandalf:silme zbraniecki$ time ./scripts/test.py
real 0m2.784s
user 0m2.732s
sys 0m0.030s
Comment 6•16 years ago
|
||
Comment on attachment 337089 [details] [diff] [review]
Remove self.pointer
r- on this one, we should keep the single global string and the pointer, and instead of
match = pattern.match(self.content[self.pointer:])
do
match = pattern.match(self.content, elf.pointer)
Then we don't have any string allocs at all.
The string allocs should scale worse for bigger lol files, too.
Attachment #337089 -
Flags: review-
Comment 7•16 years ago
|
||
Comment on attachment 337152 [details] [diff] [review]
re.compile out of loop
Hrm. I like the idea to get the pattern compiles out of the loop, but I'm not much of a friend to put those in a hash. You could just do a dummy class and self.patterns.entry, or, self.entryPattern.
Reporter | ||
Comment 8•16 years ago
|
||
(In reply to comment #6)
> (From update of attachment 337089 [details] [diff] [review])
> r- on this one, we should keep the single global string and the pointer
It slows down the whole parsing (not mentioning it makes the code less readable)
>, and
> instead of
> match = pattern.match(self.content[self.pointer:])
> do
> match = pattern.match(self.content, elf.pointer)
It doesn't work. I'd love to use it, but it will not work with '^' in regexp. :/
> Then we don't have any string allocs at all.
>
> The string allocs should scale worse for bigger lol files, too.
I also thought about it. I may do another try with the pointer on the current code, since I also would prefer to avoid allocation.
Reporter | ||
Comment 9•16 years ago
|
||
(In reply to comment #7)
> (From update of attachment 337152 [details] [diff] [review])
> Hrm. I like the idea to get the pattern compiles out of the loop, but I'm not
> much of a friend to put those in a hash. You could just do a dummy class and
> self.patterns.entry, or, self.entryPattern.
Sure. No emotional feelings about this change. Why don't you like hash here?
Comment 10•16 years ago
|
||
match and '^' are double trouble, if you use pattern.match, you can just drop the '^'. That'd only be interesting if you'd use pattern.search
Comment 11•16 years ago
|
||
... and I dislike hashes because:
- they're hard to type, ['foo'] vs. .foo
- I prefer stricter typed setup
Reporter | ||
Comment 12•16 years ago
|
||
it's not going to block the release of silme 0.5 atm.
No longer blocks: 458441
Comment 13•7 years ago
|
||
Moving Silme bugs to its component.
[Mass change filter: silme-move]
Component: Infrastructure → Silme
Product: Mozilla Localizations → Localization Infrastructure and Tools
Reporter | ||
Updated•4 years ago
|
Assignee: gandalf → nobody
You need to log in
before you can comment on or make changes to this bug.
Description
•