Closed Bug 464104 Opened 16 years ago Closed 16 years ago

[silme] L10nObjectDiff produces wrong diffs

Categories

(Mozilla Localizations :: Infrastructure, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adriank, Unassigned)

References

Details

Attachments

(1 file, 4 obsolete files)

Attached file test case (obsolete) —
When I'm trying to use L10nObjects and not EntityLists in compare-locales I'm experiencing strange behavior:

missingEntities: 337
missingEntitiesInMissingFiles: 9
missingFiles: 1
obsoleteEntities: 312

but it should be:

missingEntities: 102
missingEntitiesInMissingFiles: 9
missingFiles: 1
obsoleteEntities: 67

Somehow sometimes it adds the same entities as added AND removed at the same time, even if they were not changed.

After some testing I've found out, that the problem seems to be caused by the SequenceMatcher used in lines 9: http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/file/de21c3a3ded4/lib/silme/diff/l10nobject.py

I did a test case for this problem...
Attached file browser_en-US.dtd (obsolete) —
Attached file browser_pl.dtd (obsolete) —
after looking at google it looks like we are not the first to experience strange behavior with the SequenceMatcher...
Ok, it was a tricky game, because it was made of several bugs which coincidentally were exposed by this testcase.

I started fixing them.

1) Fix http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/837d264c5d7a

We did not collect 'replaced' by default. (also, fixed applyDiff for modified)

2) http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/a07f6d11b65d

We collected all evidences od entity from l10nObject into entity list which made it possible that getEntities() will return the same entity twice (removed from one place, added to another).
In the future we should recognize it and mark as 'moved' flag (we already had it in silme stage1).

3) I'm working on, l10nObjectDiff does catch unmodified entities while entityListDiff does not. This should be consistent.

After the third patch this bug should be fixed. I'll try to have it by today. Two previous ones are already checked in and should give you no regressions.
Ok. This bug should be fixed. It's a complex fix, but will be cleaned up with bug 458445.

Patches:
1) http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/837d264c5d7a
2) http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/a07f6d11b65d
3) http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/3487b2cf2110

The last one required a lot of debugging bug gave me a nice insight in how complex the whole idea of morphing L10nObject into EntityList is. That's good - it proves that it was a good idea to split those two objects, they have very different logics and its hard to switch from one to another from inside. I want to make sure its easy from outside of API.

Adrian, please confirm that it didn't break anything for you and I'll start working on PEP08/documentation/cleanups for the release :)
Also. It would be great to get an idea on performance hit you register from this.

As I said, it's to be expected (its because in the bottle neck in each parser I register one other element of the source dict), I just hope its not huge.
Attached file test case v2 (obsolete) —
The bug seems to be partially fixed. For the remaining problems I have written a new, longer test along with a few examples.

While testing the last file (messenger.properties), Silme crashes -> maybe a new bug?
Attachment #347366 - Attachment is obsolete: true
Attachment #347367 - Attachment is obsolete: true
Attachment #347368 - Attachment is obsolete: true
http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/af429423f16b

This solves the latest testcase.

1) I double check for a complex cases with multiply entities with the same ID in one L10nObject. It is legal, but during dump to Entities it caused problems
2) I correctly mark entities as 'removed' and 'added' instead of 'replaced' while dumping to EntityList.
marking as fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Attachment #350121 - Attachment description: test case v2 → test case (v3) for the remaining problems
Attachment #349106 - Attachment is obsolete: true
unfortunately, problem is not fully solved. See the v3 test case...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
sorry, but the new testcase is showing 8 identical examples of duplicated entities.

In the future, when you're preparing a testcase please, ensure that you are indeed showing unique, distinguishable errors with it.

I don't find any bug with this, closing back. Please, make double check that you indeed faces a bug before reopening.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: