Closed Bug 1027684 Opened 10 years ago Closed 9 years ago

Implement a subset of l20n format that matches current properties' data model

Categories

(Firefox OS Graveyard :: Gaia::L10n, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

()

Details

Attachments

(2 files, 3 obsolete files)

As the first step toward transition to L20n file format in Gaia, we want to design and implement a small subset of http://l20n.github.io/spec/grammar.html to match the current data model of Gaia's properties'.

In general it will support:
 - entities with strings or hash values
 - attributes
 - expanders
 - identifiers (entity references)
 - variables (developer variable references)
 - indexes
 - globals (@cldr.plural)
 - call expressions (plural call references)
I made some progress over last two days. I now have a branch that has:

 - wrapped up properties parser in a class
 - added l20n parser that parses to the same AST
 - added l20n serializer that serializes that AST to l20n file format
 - gaia-convert tool that loads properties and writes l20n

It's still pretty early work. I'd like to fork the parser to have one that preserves white spaces and comments, and test performance of l20n parser vs. properties parser.
Priority: -- → P2
No longer blocks: 1022859
Depends on: 1022859
With landing of bug 1022859 the patch in this bug became significantly simpler. I updated the branch and will continue working on it.
Before we can move forward here we need to introduce compare-locales support for the format.

Axel, do you want us to use bug 816258 for that, or file a new one?
Flags: needinfo?(l10n)
I think we need new code for compare-locales and .l20n. Just a parser won't do.
Flags: needinfo?(l10n)
Depends on: 1037052
Blocks: 982124
I updated the patch to handle overlay type.

I tested apps/settings/locales/settings.en-US.properties by converting it to l20n and then producing JSON AST for both - prop and l20n files. The AST's are identical.

I believe this code is ready.
Attached patch patch (obsolete) — Splinter Review
This patch adds a version of L20n Parser and Serializer that matches current's master Properties parser.

It also adds ./tools/convert-gaia script that one can use to convert .properties file into .l20n.

A cycle I used to test it:

./tools/parse.js ~/projects/gaia/apps/settings/locales/settings.en-US.properties > ./prop-ast.json
./tools/convert-gaia.js ~/projects/gaia/apps/settings/locales/settings.en-US.properties > ./settings.en-US.l20n
./tools/parser.js ./settings.en-US.l20n > ./prop-l20n.json
diff -uN ./prop-ast.json ./prop-l20n.json

The two jsons have the exact same size and the only difference is in the identifiers having "-" replaced with "_".

Stas, I'm not sure how you'd like to review this. It seems fairly big unfortunately :( Let me know if I can help you make it easier!
Attachment #8603203 - Flags: review?(stas)
Comment on attachment 8603203 [details] [diff] [review]
patch

Review of attachment 8603203 [details] [diff] [review]:
-----------------------------------------------------------------

I'm still reviewing the AST and the parser; here's the first round of feedback.

::: src/lib/format/l20n/parser.js
@@ +18,5 @@
> +    index: /@cldr\.plural\(\$?(\w+)\)/g,
> +    placeables: /\{\{\s*([^\s]*?)\s*\}\}/,
> +  },
> +
> +  parse: function (ctx, string) {

ctx is not used in the parse method.  Should we add error emitting like we do in PropertiesParser?  You don't seem to be using any try…catches in this parser; is this on purpose?  Sounds like this might break a bit too often.

::: tools/parse.js
@@ +40,2 @@
>    try {
> +    switch (type) {

Maybe keep the try limited to a single line?  So, move the switch outside of it and assign the proper parsing function to a variable that you use inside of the try…catch.

@@ +62,4 @@
>  }
>  
>  if (program.args.length) {
> +  var type = program.args[0].substr(program.args[0].lastIndexOf('.')+1);

nit: put spaces around the + sign.
Comment on attachment 8603203 [details] [diff] [review]
patch

Review of attachment 8603203 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for this round; I'd like to see a new patch with updated error emitting, comments parsing, new behavior of unescapes and source strings of complex strings.
Attachment #8603203 - Flags: review?(stas) → review+
Attached patch patch v2 (obsolete) — Splinter Review
Updated patch
Attachment #8603203 - Attachment is obsolete: true
Attachment #8605129 - Flags: review?(stas)
Comment on attachment 8605129 [details] [diff] [review]
patch v2

Review of attachment 8605129 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to r- this based on my understanding of the Monday meeting:

 - we shouldn't support \n at all for now (throw L10nErrror),
 - we should support valid expressions in placeables,
 - in the 'simple' mode, we still want to parse the complex string to check for syntax errors.

Are these correct?

::: src/lib/format/l20n/parser.js
@@ +107,5 @@
> +          return '\n';
> +        case 'r':
> +          return '\r';
> +        case 't':
> +          return '\t';

My understanding after the Monday meeting was that we'd throw on \n, \t and \r, and only support \\, \{{, \' and \".

@@ +131,5 @@
> +    outer:
> +    while (opcharPos !== -1) {
> +      var backtrack = opcharPos - 1;
> +      while (this._source.charCodeAt(backtrack) === 92) {
> +        if (this._source.charCodeAt(backtrack - 1) === 92) {

Please put a comment explaining what 92 is.  Or assign 92 to a const defined at the top of the file?

@@ +135,5 @@
> +        if (this._source.charCodeAt(backtrack - 1) === 92) {
> +          backtrack -= 2;
> +        } else {
> +          opcharPos = this._source.indexOf(opchar, opcharPos + 1);
> +          continue outer;

Also, will this backtracking approach work for nested placeables?  E.g. " Hello {{ "world" }}!"  Would it be helpful to use two counters, stringCount and placeableCount and increment them when a new respective control character is encountered?  You can then stop parsing the string when both counters are back to zero.

@@ +150,5 @@
> +
> +    this._index = opcharPos + 1;
> +
> +    if (buf.indexOf('\\') !== -1) {
> +      buf = this.unescapeString(buf, opchar);

I think this is too early to call unescapeString.  In the simple mode, we want bug to stay un-unescaped.

@@ +158,5 @@
> +      overlay = true;
> +    }
> +
> +    if (!this.simpleMode && buf.indexOf('{{') !== -1) {
> +      return [this.parseString(buf), overlay];

I'd still call parseString in the simple mode to detect parsing errors (but return buf, like you already do).

@@ +189,5 @@
> +      return null;
> +    }
> +
> +    if (index) {
> +      return {'$v': val, '$x': index};

Does this mean that we're using dollar-prefixed keys everywhere in the AST now?  (If so, That's cool.)

@@ +222,5 @@
> +
> +    var match = reId.exec(this._source);
> +
> +    this._index = reId.lastIndex;
> +

Nit: remove blank lines?

@@ +339,5 @@
> +
> +    this.getWS();
> +
> +    this._index++;
> +

Nit: remove blank lines?
Attachment #8605129 - Flags: review?(stas) → review-
(In reply to Staś Małolepszy :stas from comment #11)
> Comment on attachment 8605129 [details] [diff] [review]
> patch v2
> 
> Review of attachment 8605129 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm going to r- this based on my understanding of the Monday meeting:
> 
>  - we shouldn't support \n at all for now (throw L10nErrror),
>  - we should support valid expressions in placeables,
>  - in the 'simple' mode, we still want to parse the complex string to check
> for syntax errors.
> 
> Are these correct?

Those are correct for the v3-parser. "v2-parser" is a compatibility mode parser. I'm only matching the current spec and it's only intended to enable people to tinker with l20n file format until we are ready to land v3 API with v3 parser.

The overall goal is to get a parser that covers everything that properties parser does, and nothing more.

I do not intend to enable full expression syntax, nor full index support. I do not "parse" complex strings in this mode neither in simple nor in complex mode.

> My understanding after the Monday meeting was that we'd throw on \n, \t and
> \r, and only support \\, \{{, \' and \".

Oh, ok. I interpreted it as "we will want to support \n, \r, \t at some point."

> Also, will this backtracking approach work for nested placeables?  E.g. "
> Hello {{ "world" }}!"  Would it be helpful to use two counters, stringCount
> and placeableCount and increment them when a new respective control
> character is encountered?  You can then stop parsing the string when both
> counters are back to zero.

No, as I explained, I do not cover anything else than PropertiesParser does in this parser. No nested placeables.

> I think this is too early to call unescapeString.  In the simple mode, we
> want bug to stay un-unescaped.

Good catch

> I'd still call parseString in the simple mode to detect parsing errors (but
> return buf, like you already do).

Ok, but this parseString doesn't really do too much parsing. The only error it can throw is "Too much placeables".

> @@ +189,5 @@
> > +      return null;
> > +    }
> > +
> > +    if (index) {
> > +      return {'$v': val, '$x': index};
> 
> Does this mean that we're using dollar-prefixed keys everywhere in the AST
> now?  (If so, That's cool.)

No, Just for entities. Like we do in PropertiesParser.


Bottom line is. If you don't like the simplified approach of having transitional parser landed now, that matches PropertiesParser simplicity and scope, then I don't believe it's worth taking v3 parser and bending it to the current AST needs.

I really don't want to spend too much time on this parser and I do not intend to have it for any production ever. It's supposed to let people play with l20n until we land v3 API which I hope will happen this month or early next month.
Flags: needinfo?(stas)
(In reply to Staś Małolepszy :stas from comment #8)
 > ::: tools/parse.js
> @@ +40,2 @@
> >    try {
> > +    switch (type) {
> 
> Maybe keep the try limited to a single line?  So, move the switch outside of
> it and assign the proper parsing function to a variable that you use inside
> of the try…catch.

What is the value of that change? We will need to have a try catch code for each case, instead of having one. It sounds like a worse code flow to me.
(In reply to Staś Małolepszy :stas from comment #11)
> ::: src/lib/format/l20n/parser.js
> @@ +107,5 @@
> > +          return '\n';
> > +        case 'r':
> > +          return '\r';
> > +        case 't':
> > +          return '\t';
> 
> My understanding after the Monday meeting was that we'd throw on \n, \t and
> \r, and only support \\, \{{, \' and \".

I removed support for those three from the patch.

One note is that I don't support unescaping \' if the string is marked by " and \" if the string uses '.
I believe it's a good choice in line with "let's unescape as little as possible".
Attached patch patch v3 (obsolete) — Splinter Review
I updated the patch to address your comments.

I believe at this point is a matter of how we want to introduce l20n. If you want to have a full expression parser, placeables parsing, complex string parsing etc. I believe it makes very little sense to invest in getting it working for v2 API, and I'd say let's wait a month or two until we have v3.

My preference is to land it, and let people play with l20n file format. This parser allows for that and any l20n that will work with this parser will work with the full parser later.
Attachment #8605129 - Attachment is obsolete: true
Attachment #8605598 - Flags: review?(stas)
Attached patch patch v4Splinter Review
It's incredible how much I hate bugzilla's patch review flow. Sorry for spam stas, I updated the unescape function to handle unicode unescaping properly.
Attachment #8605598 - Attachment is obsolete: true
Attachment #8605598 - Flags: review?(stas)
Attachment #8605603 - Flags: review?(stas)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #12)
> The overall goal is to get a parser that covers everything that properties
> parser does, and nothing more.
> 
> I do not intend to enable full expression syntax, nor full index support. I
> do not "parse" complex strings in this mode neither in simple nor in complex
> mode.

OK, thanks for the explanation, I think this is the part which wasn't clear to me.

> Ok, but this parseString doesn't really do too much parsing. The only error
> it can throw is "Too much placeables".

OK, let's keep this as is for now.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #13)

> > Maybe keep the try limited to a single line?  So, move the switch outside of
> > it and assign the proper parsing function to a variable that you use inside
> > of the try…catch.
> 
> What is the value of that change? We will need to have a try catch code for
> each case, instead of having one. It sounds like a worse code flow to me.

You should be able to get away with only one try...catch like so:

  var parse;
  switch(type):
    case 'properties':
      parse = PropertiesParser.parse.bind(..);
      break;
    ..
  try {
    parse(..);
  } catch { .. }

The value of having only one statement inside of the try...catch is that it's easier to know where the error is coming from.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #15)
> My preference is to land it, and let people play with l20n file format. This
> parser allows for that and any l20n that will work with this parser will
> work with the full parser later.

OK, I see where you're coming from.  I'm slightly afraid of the confusion this might cause:  we'll end up with 3 versions of supported L20n syntax (Gaia 2, Gaia 3 and L20n 1.0.x).  I already thought that the parser you're writing for Gaia 3 was scoped down and limited compared to what we did in L20n 1.0.x.

How do you want to communicate these differences?
Flags: needinfo?(stas)
Comment on attachment 8605603 [details] [diff] [review]
patch v4

Review of attachment 8605603 [details] [diff] [review]:
-----------------------------------------------------------------

::: src/lib/format/l20n/parser.js
@@ +156,5 @@
> +    if (!this.simpleMode && buf.indexOf('{{') !== -1) {
> +      return [this.parseString(buf), overlay];
> +    }
> +
> +    return [buf, overlay];

Should we somehow indicate that this is the 'simple' return value?  We can add this in a follow up when we add complex string parsing to the resolver, too.
Attachment #8605603 - Flags: review?(stas) → review+
(In reply to Staś Małolepszy :stas from comment #18)

> Should we somehow indicate that this is the 'simple' return value?  We can
> add this in a follow up when we add complex string parsing to the resolver,
> too.

My thinking is that for now, no. I added it so that you can experiment with double-parsing performance/memory impact. I don't expect us to run into a scenario where AST may or may not have complex strings as simple.
(In reply to Staś Małolepszy :stas from comment #17)
> OK, I see where you're coming from.  I'm slightly afraid of the confusion
> this might cause:  we'll end up with 3 versions of supported L20n syntax
> (Gaia 2, Gaia 3 and L20n 1.0.x).  I already thought that the parser you're
> writing for Gaia 3 was scoped down and limited compared to what we did in
> L20n 1.0.x.
> 
> How do you want to communicate these differences?

1.0.x is not interesting for Gaia,

v2 is not interesting for anyone except of Gaia

v3 is the ultimate weapon of choice and the future of the world ;)

I want to introduce v2 *now* so that work that people are putting (like the commit hook introduced today?) is geared toward l20n, not properties. I want them to familiarize themselves with l20n file format and use it in their experimentation knowing that .properties are going away.

When we land v3, l20n should work the same.

I'm really worried that we're missing the window of opportunity to introduce l20n and I don't want to wait for API v3 which doesn't seem to be ready yet.

Do you agree with this approach?
Flags: needinfo?(stas)
I think a bit differently about 1.0.x vs. 3, but I agree about v2. Go ahead :) 

Also, tests please?
Flags: needinfo?(stas)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: