Closed Bug 1199670 Opened 10 years ago Closed 8 years ago

[compare-locales] add support for l20n to compare-locales

Categories

(Localization Infrastructure and Tools :: compare-locales, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Pike, Assigned: stas)

References

(Depends on 1 open bug)

Details

User Story

Support l20n/fluent in compare-locales.

The fileformat support is coming from https://github.com/projectfluent/python-fluent.

We'll need l10n-merge to strip known-bad entities, without adding en-US for them. Adding fallback chain is part of the runtime architecture, to properly hook up plural forms etc.

The metrics to report should be on fluent messages, Axel thinks.

TBD: metrics on complex strings. Word counts, once we add them to compare-locales.

Attachments

(2 files, 2 obsolete files)

As we're preparing to roll out l20n support, add that to compare-locales. The parser is going to be the one from https://github.com/l20n/python-l20n, with a wrapper to make compare-locales like it. And then the l20n magic happens in checks again, where we take a deep dive into the structure checks. The checks itself should increase over time, let's start simple, with just values and attributes. gandalf, I didn't do resolvable yet, but it shouldn't be too hard to add. Also, the tests I wrote for this stuff shame the tests that the other formats got so far :-)
This patch looks rather innocent. I stared at my code for half a day, and then realized that I wouldn't need much more than this. Open questions: l10n-merge: I can do l10n-merge or I can not. I opted for doing it, as a build-time optimization, so that in general, we'd be more performant. That shouldn't have an impact on being able to do runtime fallbacks if the build isn't compat with the localization. resolvable entities, yeah. What's up with those? Also figured that can be a follow-up patch, probably not going to be needed in our immediate future? I haven't looked at the impact of my hacks on the elmo diff code. Probably should do that before we go too far. Thus, I'm starting with a feedback request first.
Assignee: nobody → l10n
Status: NEW → ASSIGNED
Attachment #8654132 - Flags: feedback?(gandalf)
Also, context from our call earlier this week, the checks are inspired by https://github.com/zbraniecki/compare-locales.js/blob/master/lib/mozilla/apps/gaia/malformed.js
Comment on attachment 8654132 [details] [diff] [review] hook up l20n to compare-locales, with tests Review of attachment 8654132 [details] [diff] [review]: ----------------------------------------------------------------- ::: compare_locales/compare.py @@ +384,5 @@ > > def merge(self, ref_entities, ref_map, ref_file, l10n_file, missing, > skips, p): > + if hasattr(p, 'needsMerge') and not p.needsMerge: > + return what's that?
Attachment #8654132 - Flags: feedback?(stas)
Attachment #8654132 - Flags: feedback?(gandalf)
Attachment #8654132 - Flags: feedback+
Comment on attachment 8654132 [details] [diff] [review] hook up l20n to compare-locales, with tests Review of attachment 8654132 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Axel Hecht [:Pike] from comment #1) > > resolvable entities, yeah. What's up with those? Also figured that can be a > follow-up patch, probably not going to be needed in our immediate future? The isResolvable check serves the purpose of enforcing the social contract of L20n. If the source entity is expected to rsolve to a single string (it's a simple string, or a hash with a default value or an index) then the translation should follow the same convention. This is so that we can allow something like the following: English: <itemsAvailable "{{ $val }} available"> French: <itemsAvailable[plural($val)] { one: "{{ $val }} disponible", many: "{{ $val }} disponibles", }> Given the English string above, all of the following Franch translations are valid in the sense of the isResolvable check: <itemsAvailable "{{ $val }} disp."> <itemsAvailable[plural($val)] { one: "{{ $val }} disponible", many: "{{ $val }} disponibles", }> <itemsAvailable { one: "{{ $val }} disponible", *many: "{{ $val }} disponibles", }> They might not be correct, but at least they won't break the app which expects itemsAvailable to evaluate to a single string. ::: compare_locales/checks.py @@ +420,5 @@ > + l10n_entry = l10nEnt.entry > + # verify that we're having the same set of attributes > + # just the names matter > + ref_attrs = set((attr.id.name for attr in ref_entry.attrs)) > + l10n_attrs = set((attr.id.name for attr in l10n_entry.attrs)) I think we should filter private attributes first. It looks like we're not doing it in compare-locales.js right now, which is a bug. In fact, it looks like we're not even parsing _id as private identifiers; let's fix this in a follow up across all parsers. @@ +421,5 @@ > + # verify that we're having the same set of attributes > + # just the names matter > + ref_attrs = set((attr.id.name for attr in ref_entry.attrs)) > + l10n_attrs = set((attr.id.name for attr in l10n_entry.attrs)) > + obsolete_attrs = l10n_attrs - ref_attrs Do I miss python when I see things like this. @@ +423,5 @@ > + ref_attrs = set((attr.id.name for attr in ref_entry.attrs)) > + l10n_attrs = set((attr.id.name for attr in l10n_entry.attrs)) > + obsolete_attrs = l10n_attrs - ref_attrs > + for name in obsolete_attrs: > + attr = [_a for _a in l10n_entry.attrs if _a.id.name == name][0] I'm catching up to speed with the python l20n parser. I didn't realize entry.attrs was a list. What's more idiomatic in python: an array comprehension or a filter? ::: compare_locales/tests/test_merge.py @@ +106,5 @@ > + maxDiff = None # we got big dictionaries to compare > + > + def reference(content=None): > + if content is None: > + content = """<foo 'fooVal'> Is this default content really helpful/needed? Perhaps the testcases would be more readable if the reference content was always required in the decorator (really cool idea, btw!).
Attachment #8654132 - Flags: feedback?(stas) → feedback+
(In reply to Staś Małolepszy :stas from comment #4) > I think we should filter private attributes first. It looks like we're not > doing it in compare-locales.js right now, which is a bug. In fact, it looks > like we're not even parsing _id as private identifiers; let's fix this in a > follow up across all parsers. I don't think it's a bug. We just don't support private attributes/entities yet in 3.x branch - same way as many other l20n features. We just need to add them one after another :)
Catch-up: we decided that to achieve compat between source/build and runtime fallback logic, all "interface" imcompats between reference and l10n should be warnings. I've updated the patch to implement that. https://github.com/Pike/compare-locales/compare/l20n-support is my branch. I followed up with a few experiments on how to write the tests. It starts with leaving the decorator for en-US. Then I add a decorator for localized content. Then I move the decorators into methods. Then I make en-US content not fall back to default content. I found doing l10n and reference the same way to be nicely consistent. Then I started to wonder if that much decorator content was actually good, and thus started the experiment to move the content inside the test. Given that the test and the docstring (if I had some) would describe what the test does, having the content that we use for that come after it does resonate with me. I'm still not happy on how readable the code is, vs the test content. gandalf, stas, do you guys have comments? (pushed to github for easier multi-patch diffs etc)
Looks good to me! The only thing that made me think twice is if just checking attribute names without verifying if both have resolvable values is enough. But I guess it's a minor thing that can be decided upon later.
I'm wondering, is there a use-case for unresolvable entities? Reminds me of the conversation that I had with stas, and IIRC, our verdict was that all of those should be macros. Even more so, I don't think unresolvable makes any sense for attributes? In the light of that, should those be parsing errors rather than checks?
I'm in favor of removing the resolvability test for values and attributes and requiring that entities have an index or a default value on the parser level.
Sounds good to me. Filed bug 1222055 and will provide a patch today.
User Story: (updated)
Assignee: l10n → nobody
Status: ASSIGNED → NEW
I'd like to work on this bug this week. Is the patch submitted in attachment 8872363 [details] the most up-to-date implementation?
... so does gandalf, I've rebased my patch on top of the current tip, and talked through the places where it's known to be broken. You guys'll warsaw that out?
:stas, feel free to take this and we can slay it together when we co-work. I'll focus on getting all the other patches for fluent+multilocale repack in shape.
Attachment #8654132 - Attachment is obsolete: true
Attachment #8872363 - Attachment is obsolete: true
Pike, attachment 8874942 [details] is a WIP and still requires more. I need to fix the remaining tests and introduce the can_merge/can_strip logic. In this request I'd like to get your feedback on changes to the EntityBase class and how FluentEntity extends it. Thanks!
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150314 ::: compare_locales/checks.py:452 (Diff revision 1) > + for i, attr in enumerate(l10n_entry.attributes)) > + > + missing_attr_names = sorted(ref_attr_names - l10n_attr_names, > + key=lambda k: ref_pos[k]) > + for attr_name in missing_attr_names: > + yield ('warning', l10n_entry.span.start, Missing attributes are reported at the begininng of the message. ::: compare_locales/compare.py (Diff revision 1) > - # overload this if needed > - pass > - > - def doChanged(self, file, ref_entity, l10n_entity): > - # overload this if needed > - pass I removed those because ContentComparer doesn't seem to be subclassed anywhere. ::: compare_locales/parser.py:600 (Diff revision 1) > + self.key_span = (entry.id.span.start, entry.id.span.end) > + > + if (entry.value is not None): > + self.val_span = (entry.value.span.start, entry.value.span.end) > + else: > + self.val_span = (0, 0) What should we do when messages don't have any value? ::: compare_locales/parser.py:632 (Diff revision 1) > + self.entry.traverse(count_words) > + > + return self.words_ > + > + def __eq__(self, other): > + return False It's a WIP :) ::: compare_locales/parser.py:641 (Diff revision 1) > + pos = self.entry.span.start > + return self.ctx.lines(pos)[0] > + > + def value_position(self, pos=None): > + if pos is None: > + pos = self.entry.span.start I don't know what value_position should do in FluentEntity. There might not be any value for that matter. For now things just work if value_position has the same implementation as position.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150352 I think the refactors are OK so far, I've noted some issues below. ::: compare_locales/checks.py:439 (Diff revision 1) > + if ref_entry.value and not l10n_entry.value: > + yield ('warning', l10n_entry.span.start, > + 'Missing value', 'fluent') > + if not ref_entry.value and l10n_entry.value: > + yield ('warning', l10n_entry.value.span.start, > + 'Obsolete value', 'fluent') There's the ^ operator for XOR. Also, I guess this should check for is None and not just bool? That is, make a difference between an empty string and no value? Also, these should be reported as error? ::: compare_locales/checks.py:444 (Diff revision 1) > + 'Obsolete value', 'fluent') > + > + # verify that we're having the same set of attributes > + ref_attr_names = set((attr.id.name for attr in ref_entry.attributes)) > + ref_pos = dict((attr.id.name, i) > + for i, attr in enumerate(ref_entry.attributes)) I don't think you need the _names sets at all? ::: compare_locales/checks.py:452 (Diff revision 1) > + for i, attr in enumerate(l10n_entry.attributes)) > + > + missing_attr_names = sorted(ref_attr_names - l10n_attr_names, > + key=lambda k: ref_pos[k]) > + for attr_name in missing_attr_names: > + yield ('warning', l10n_entry.span.start, I think we should report missing attriubtes as error? ::: compare_locales/compare.py (Diff revision 1) > - # overload this if needed > - pass > - > - def doChanged(self, file, ref_entity, l10n_entity): > - # overload this if needed > - pass I'm happy to crash land this removal as a separate patch. But that's part of our move to 2.0, and a separate landing. Users of compare-locales could subclass this, and use it. They're not used outside of compare-locales by code that I wrote, though. Thus, happy to land this, but not as a ride-along. ::: compare_locales/parser.py:109 (Diff revision 1) > > + re_br = re.compile('<br\s*/?>', re.U) > + re_sgml = re.compile('</?\w+.*?>', re.U | re.M) > + > + @property > + def words(self): I'd prefer for this to be a method rather than a property, just because it's potentially computationally intensive. Also, the name should be word_count, not words. It doesn't return the words, but the number of words in the entity. ::: compare_locales/parser.py:597 (Diff revision 1) > + self.ctx = ctx > + self.span = (start, end) > + > + self.key_span = (entry.id.span.start, entry.id.span.end) > + > + if (entry.value is not None): No braces, this is python, not js. ::: compare_locales/parser.py:605 (Diff revision 1) > + self.val_span = (0, 0) > + > + self.entry = entry > + > + def pp(self, value): > + return value I suspect this should do whitespace normalization. ::: compare_locales/parser.py:609 (Diff revision 1) > + def pp(self, value): > + return value > + > + @property > + def all(self): > + return self.ctx.contents[self.span[0]:self.span[1]] This is what the baseclass does? ::: compare_locales/parser.py:613 (Diff revision 1) > + def all(self): > + return self.ctx.contents[self.span[0]:self.span[1]] > + > + @property > + def key(self): > + return self.entry.id.name This can probably go away as you set key_span? ::: compare_locales/parser.py:641 (Diff revision 1) > + pos = self.entry.span.start > + return self.ctx.lines(pos)[0] > + > + def value_position(self, pos=None): > + if pos is None: > + pos = self.entry.span.start This depends on the return value of checker.check. You should do here what makes the positions returned from checker for l20n resolve to the right positions in the file.
Attachment #8874942 - Flags: review?(l10n)
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150352 Thanks, Axel, this is very helpful! > There's the ^ operator for XOR. Also, I guess this should check for is None and not just bool? That is, make a difference between an empty string and no value? > > Also, these should be reported as error? The XOR operation won't allow us to report "missing" and "obsolete" separately, right? > I don't think you need the _names sets at all? I do. I need them for the set difference. We can't compare sets of Attribute objects because they're always strictly different in the identity sense. > I think we should report missing attriubtes as error? OK. Should we error in case of obsolete attributes as well? > I'm happy to crash land this removal as a separate patch. But that's part of our move to 2.0, and a separate landing. Users of compare-locales could subclass this, and use it. > > They're not used outside of compare-locales by code that I wrote, though. > > Thus, happy to land this, but not as a ride-along. Sure, I'll revert this. > I'd prefer for this to be a method rather than a property, just because it's potentially computationally intensive. > > Also, the name should be word_count, not words. It doesn't return the words, but the number of words in the entity. If it's a method, I guess count_words() will be appropriate? > No braces, this is python, not js. :) > This is what the baseclass does? Yup, good catch. > This can probably go away as you set key_span? Yeah, I was wondering what your take on this was. Should we stick to using spans here, or make use of the data stored in the AST. I think using spans will help consistency a bit. > This depends on the return value of checker.check. > > You should do here what makes the positions returned from checker for l20n resolve to the right positions in the file. That's what I'm doing right now. Is it okay that position and value_position are the same thing? From what I see, position() is only used with junk entities.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150352 > The XOR operation won't allow us to report "missing" and "obsolete" separately, right? Right, sorry. > I do. I need them for the set difference. We can't compare sets of Attribute objects because they're always strictly different in the identity sense. My thought was to just iterate over the dicts. Something like sorted((name for name in ref_pos if name not in l10n_pos), key=...) > OK. Should we error in case of obsolete attributes as well? I think so. Seems safer? > If it's a method, I guess count_words() will be appropriate? Yes.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150352 > My thought was to just iterate over the dicts. Something like > > sorted((name for name in ref_pos if name not in l10n_pos), key=...) I'm not sure if I see the value of this. The current code looks pretty straight-forward to me; no tricks :) > I suspect this should do whitespace normalization. Any pointers on how to do this? I thought maybe serialize_entry(self.entry) but that also serializes the identifier.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review150888 ::: compare_locales/parser.py:606 (Diff revisions 1 - 2) > def __init__(self, ctx, entry): > start = entry.span.start > end = entry.span.end > > self.ctx = ctx > self.span = (start, end) The FluentEntity class currently doesn't set pre_ws_span, def_span nor post_span. Is this okay? I'm also not sure what there should be if it's not okay to leave them out. ::: compare_locales/parser.py:643 (Diff revisions 1 - 2) > + return None if isinstance(node, ftl.Span) else node > > + def __eq__(self, other): > + own_entry = self.entry.traverse(self.remove_spans) > + other_entry = other.entry.traverse(self.remove_spans) > + return own_entry.to_json() == other_entry.to_json() Is this enough in Python to compare to dicts? Do I need to do anything special to ensure the order of keys is the same?
Note: before this can land, https://github.com/projectfluent/python-fluent/pull/13 needs to be merged and a new version of python-fluent must be published and added to compare-locales' setup.py as a dependency.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review151304 Getting closer, a few top-level comments: I'd like to see behavior renamed to parser_capabilities. Just semantically a better match for CAN_DO. Also, raised inline, you support MERGE without SKIP? I'd like to see a test with Parser errors, too. Finally starting to get a grip on what you do with positions, and I'm somewhat scared how we're implicitly treating two of the three position types as one, aka, file-relative and value-relative are the same code path, and but work completely different. I don't have a good suggestion ad-hoc, though. Surely needs more comments, at least. I'd add them to both locations in the parser, and to the checker code path, so that we don't accidently do something there that conflicts. ::: compare_locales/parser.py:196 (Diff revisions 1 - 2) > > def __repr__(self): > return self.key > > + def count_words(self): > + return len(self.val.split()) Do we actually need this? AFAICT, it'd be a bug if we called this method. ::: compare_locales/parser.py:647 (Diff revisions 1 - 2) > + other_entry = other.entry.traverse(self.remove_spans) > + return own_entry.to_json() == other_entry.to_json() > + > + # Positions yielded by FluentChecker.check are absolute offsets from the > + # beginning of the file. This is different from the base Checker behavior > + # which yields offsets from the beginning of the current entity. Actually, I don't think this comment is correct. The existing checkers return positions within the value, not within the Entity. I.e. <ENTITY foo "&"> would error in position 0. ::: compare_locales/parser.py:24 (Diff revision 2) > +# Don't perform any merging > +CAN_NONE = 0 > +# Remove broken entities > +CAN_SKIP = 1 > +# Add missing and broken entities from the reference > +CAN_MERGE = 2 This supports CAN_MERGE without CAN_SKIP. I think we didn't want to support that. ::: compare_locales/tests/test_merge.py:366 (Diff revision 2) > + } > + ) > + > + # validate merge results > + mergepath = mozpath.join(self.tmp, "merge", "l10n.ftl") > + self.assert_(os.path.exists(mergepath)) The underlying idea with l10n-merge is that if you don't modify a file, it's not created. Which sounds to me like this shouldn't create a merge file? ::: compare_locales/tests/test_merge.py:430 (Diff revision 2) > + self.assert_(os.path.exists(mergepath)) > + > + p = getParser(mergepath) > + p.readFile(mergepath) > + merged_entities, merged_map = p.parse() > + self.assertEqual(sorted(merged_map.keys()), ['eff']) Update this to the new style in this test? self.assertEqual(map(lambda e: e.key, merged_entities), ["eff"]) The way these tests work changed when we redid the ordering in compare-locales just now. ::: compare_locales/tests/test_merge.py:484 (Diff revision 2) > + self.assert_(os.path.exists(mergepath)) > + > + p = getParser(mergepath) > + p.readFile(mergepath) > + _, merged_map = p.parse() > + self.assertEqual(sorted(merged_map.keys()), []) Here, too. (Yes, even more useless, but consistency is good)
Attachment #8874942 - Flags: review?(l10n) → review-
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review151304 > Do we actually need this? AFAICT, it'd be a bug if we called this method. It's a bug that was exposed by moving count_words to a method of EntityBase. I filed bug 1371342. > This supports CAN_MERGE without CAN_SKIP. I think we didn't want to support that. I'd prefer to implement the exact logic in ContentComparer and keep bit flags here. > The underlying idea with l10n-merge is that if you don't modify a file, it's not created. > > Which sounds to me like this shouldn't create a merge file? I think I'm starting to understand when to merge and when not to merge.
(In reply to Staś Małolepszy :stas from comment #29) > Review request updated; see interdiff: > https://reviewboard.mozilla.org/r/146314/diff/4-5/ I rebased the patch, fixed my flake8 errors and added fluent to install_requires in setup.py in order to use https://pypi.python.org/pypi/fluent/0.4.0.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review153504 We did talk about this patch on the phone, putting the result in the bug, too. I had one more about testing that nothing breaks around stray comments and sections, as the parser only returns Entities. I added the expanded syntax for cross-channel and richer merging of l10n files. Not sure if we'd want x-channel to use the the wrapper parser of python-fluent directly. ... Too much think ::: compare_locales/compare.py:527 (Diff revision 5) > changed += 1 > - changed_w += self.countWords(refent.val) > + changed_w += refent.count_words() > # run checks: > if checker: > for tp, pos, msg, cat in checker.check(refent, l10nent): > # compute real src position, if first line, Let's move all of this into entity, and the col/line code might even just be in DTDEntity, 'cause that's data coming from the XML checks. ::: compare_locales/parser.py:617 (Diff revision 5) > + self.val_span = (0, 0) > + > + self.entry = entry > + > + def pp(self, value): > + # XXX Normalize whitespace? Can you add tests for syntax-sugar whitespace differences, and replace this comment with what's right? ::: compare_locales/parser.py:639 (Diff revision 5) > + > + @staticmethod > + def remove_spans(node): > + return None if isinstance(node, ftl.Span) else node > + > + def __eq__(self, other): Let's expose an API on BaseNode in python_fluent that allows us to compare ASTs deeply without copying the data structure, and with early returns on inequality. That way, we can avoid recreating both sides on the AST twice here. ::: compare_locales/tests/test_merge.py:333 (Diff revision 5) > u"column 0 for bar"}] > } > }) > > > +class TestFluent(unittest.TestCase): One more I forgot. I did wonder if the compare-locales wrapper on the fluent parser should return more than just Entitys. It'd be good to have tests for empty sections and stray comments, and possibly a test for an error in an entity in a section.
Attachment #8874942 - Flags: review?(l10n) → review-
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review153504 > One more I forgot. I did wonder if the compare-locales wrapper on the fluent parser should return more than just Entitys. > > It'd be good to have tests for empty sections and stray comments, and possibly a test for an error in an entity in a section. We'll do this in a follow-up.
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146312/#review157662 There was a flake8 nit, and the removal of the parser in test_ftl should go from the second to the first patch. Stas has fixes for both. I'd still need a version of fluent on pypi so that tests pass, so that needs to happen before we can land, and also before I can test this patch locally. Also, running the tests just from the command line puts them into .eggs, can you add that to both gitignore and hgignore? ::: setup.py:48 (Diff revision 6) > packages=['compare_locales', 'compare_locales.tests'], > package_data={ > 'compare_locales.tests': ['data/*.properties', 'data/*.dtd'] > }, > install_requires=[ > + 'fluent', Should we specify a minimum version of python-fluent here that actually has .equals in the version that we need it?
Comment on attachment 8880384 [details] Remove Parser.postProcessValue in favor of Entity subclasses. https://reviewboard.mozilla.org/r/151722/#review157664
Attachment #8880384 - Flags: review?(l10n) → review+
Comment on attachment 8874942 [details] Bug 1199670 - Add Fluent support to compare-locales. https://reviewboard.mozilla.org/r/146314/#review157666 r=me with the nits I had on the squashed diff.
Attachment #8874942 - Flags: review?(l10n) → review+
\o/
Assignee: nobody → stas
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: