Implement a subset of l20n format that matches current properties' data model

RESOLVED FIXED

Status

Firefox OS
Gaia::L10n
P2
normal
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: gandalf, Assigned: gandalf)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Description

3 years ago
As the first step toward transition to L20n file format in Gaia, we want to design and implement a small subset of http://l20n.github.io/spec/grammar.html to match the current data model of Gaia's properties'.

In general it will support:
 - entities with strings or hash values
 - attributes
 - expanders
 - identifiers (entity references)
 - variables (developer variable references)
 - indexes
 - globals (@cldr.plural)
 - call expressions (plural call references)
(Assignee)

Comment 1

3 years ago
I made some progress over last two days. I now have a branch that has:

 - wrapped up properties parser in a class
 - added l20n parser that parses to the same AST
 - added l20n serializer that serializes that AST to l20n file format
 - gaia-convert tool that loads properties and writes l20n

It's still pretty early work. I'd like to fork the parser to have one that preserves white spaces and comments, and test performance of l20n parser vs. properties parser.
Priority: -- → P2
(Assignee)

Updated

3 years ago
No longer blocks: 1022859
(Assignee)

Updated

3 years ago
Depends on: 1022859
(Assignee)

Comment 2

3 years ago
With landing of bug 1022859 the patch in this bug became significantly simpler. I updated the branch and will continue working on it.
(Assignee)

Comment 3

3 years ago
Before we can move forward here we need to introduce compare-locales support for the format.

Axel, do you want us to use bug 816258 for that, or file a new one?
Flags: needinfo?(l10n)
Blocks: 1025018

Comment 4

3 years ago
I think we need new code for compare-locales and .l20n. Just a parser won't do.
Flags: needinfo?(l10n)
(Assignee)

Updated

3 years ago
Depends on: 1037052
Blocks: 1113480
Blocks: 982124
(Assignee)

Comment 5

2 years ago
I updated the patch to handle overlay type.

I tested apps/settings/locales/settings.en-US.properties by converting it to l20n and then producing JSON AST for both - prop and l20n files. The AST's are identical.

I believe this code is ready.
(Assignee)

Comment 6

2 years ago
Created attachment 8603203 [details] [diff] [review]
patch

This patch adds a version of L20n Parser and Serializer that matches current's master Properties parser.

It also adds ./tools/convert-gaia script that one can use to convert .properties file into .l20n.

A cycle I used to test it:

./tools/parse.js ~/projects/gaia/apps/settings/locales/settings.en-US.properties > ./prop-ast.json
./tools/convert-gaia.js ~/projects/gaia/apps/settings/locales/settings.en-US.properties > ./settings.en-US.l20n
./tools/parser.js ./settings.en-US.l20n > ./prop-l20n.json
diff -uN ./prop-ast.json ./prop-l20n.json

The two jsons have the exact same size and the only difference is in the identifiers having "-" replaced with "_".

Stas, I'm not sure how you'd like to review this. It seems fairly big unfortunately :( Let me know if I can help you make it easier!
Attachment #8603203 - Flags: review?(stas)

Comment 7

2 years ago
Created attachment 8603462 [details] [review]
[gaia] zbraniecki:1027684-add-subset-of-l20n-format > mozilla-b2g:master
Comment on attachment 8603203 [details] [diff] [review]
patch

Review of attachment 8603203 [details] [diff] [review]:
-----------------------------------------------------------------

I'm still reviewing the AST and the parser; here's the first round of feedback.

::: src/lib/format/l20n/parser.js
@@ +18,5 @@
> +    index: /@cldr\.plural\(\$?(\w+)\)/g,
> +    placeables: /\{\{\s*([^\s]*?)\s*\}\}/,
> +  },
> +
> +  parse: function (ctx, string) {

ctx is not used in the parse method.  Should we add error emitting like we do in PropertiesParser?  You don't seem to be using any try…catches in this parser; is this on purpose?  Sounds like this might break a bit too often.

::: tools/parse.js
@@ +40,2 @@
>    try {
> +    switch (type) {

Maybe keep the try limited to a single line?  So, move the switch outside of it and assign the proper parsing function to a variable that you use inside of the try…catch.

@@ +62,4 @@
>  }
>  
>  if (program.args.length) {
> +  var type = program.args[0].substr(program.args[0].lastIndexOf('.')+1);

nit: put spaces around the + sign.
Comment on attachment 8603203 [details] [diff] [review]
patch

Review of attachment 8603203 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for this round; I'd like to see a new patch with updated error emitting, comments parsing, new behavior of unescapes and source strings of complex strings.
Attachment #8603203 - Flags: review?(stas) → review+
(Assignee)

Comment 10

2 years ago
Created attachment 8605129 [details] [diff] [review]
patch v2

Updated patch
Attachment #8603203 - Attachment is obsolete: true
Attachment #8605129 - Flags: review?(stas)
Comment on attachment 8605129 [details] [diff] [review]
patch v2

Review of attachment 8605129 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to r- this based on my understanding of the Monday meeting:

 - we shouldn't support \n at all for now (throw L10nErrror),
 - we should support valid expressions in placeables,
 - in the 'simple' mode, we still want to parse the complex string to check for syntax errors.

Are these correct?

::: src/lib/format/l20n/parser.js
@@ +107,5 @@
> +          return '\n';
> +        case 'r':
> +          return '\r';
> +        case 't':
> +          return '\t';

My understanding after the Monday meeting was that we'd throw on \n, \t and \r, and only support \\, \{{, \' and \".

@@ +131,5 @@
> +    outer:
> +    while (opcharPos !== -1) {
> +      var backtrack = opcharPos - 1;
> +      while (this._source.charCodeAt(backtrack) === 92) {
> +        if (this._source.charCodeAt(backtrack - 1) === 92) {

Please put a comment explaining what 92 is.  Or assign 92 to a const defined at the top of the file?

@@ +135,5 @@
> +        if (this._source.charCodeAt(backtrack - 1) === 92) {
> +          backtrack -= 2;
> +        } else {
> +          opcharPos = this._source.indexOf(opchar, opcharPos + 1);
> +          continue outer;

Also, will this backtracking approach work for nested placeables?  E.g. " Hello {{ "world" }}!"  Would it be helpful to use two counters, stringCount and placeableCount and increment them when a new respective control character is encountered?  You can then stop parsing the string when both counters are back to zero.

@@ +150,5 @@
> +
> +    this._index = opcharPos + 1;
> +
> +    if (buf.indexOf('\\') !== -1) {
> +      buf = this.unescapeString(buf, opchar);

I think this is too early to call unescapeString.  In the simple mode, we want bug to stay un-unescaped.

@@ +158,5 @@
> +      overlay = true;
> +    }
> +
> +    if (!this.simpleMode && buf.indexOf('{{') !== -1) {
> +      return [this.parseString(buf), overlay];

I'd still call parseString in the simple mode to detect parsing errors (but return buf, like you already do).

@@ +189,5 @@
> +      return null;
> +    }
> +
> +    if (index) {
> +      return {'$v': val, '$x': index};

Does this mean that we're using dollar-prefixed keys everywhere in the AST now?  (If so, That's cool.)

@@ +222,5 @@
> +
> +    var match = reId.exec(this._source);
> +
> +    this._index = reId.lastIndex;
> +

Nit: remove blank lines?

@@ +339,5 @@
> +
> +    this.getWS();
> +
> +    this._index++;
> +

Nit: remove blank lines?
Attachment #8605129 - Flags: review?(stas) → review-
(Assignee)

Comment 12

2 years ago
(In reply to Staś Małolepszy :stas from comment #11)
> Comment on attachment 8605129 [details] [diff] [review]
> patch v2
> 
> Review of attachment 8605129 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm going to r- this based on my understanding of the Monday meeting:
> 
>  - we shouldn't support \n at all for now (throw L10nErrror),
>  - we should support valid expressions in placeables,
>  - in the 'simple' mode, we still want to parse the complex string to check
> for syntax errors.
> 
> Are these correct?

Those are correct for the v3-parser. "v2-parser" is a compatibility mode parser. I'm only matching the current spec and it's only intended to enable people to tinker with l20n file format until we are ready to land v3 API with v3 parser.

The overall goal is to get a parser that covers everything that properties parser does, and nothing more.

I do not intend to enable full expression syntax, nor full index support. I do not "parse" complex strings in this mode neither in simple nor in complex mode.

> My understanding after the Monday meeting was that we'd throw on \n, \t and
> \r, and only support \\, \{{, \' and \".

Oh, ok. I interpreted it as "we will want to support \n, \r, \t at some point."

> Also, will this backtracking approach work for nested placeables?  E.g. "
> Hello {{ "world" }}!"  Would it be helpful to use two counters, stringCount
> and placeableCount and increment them when a new respective control
> character is encountered?  You can then stop parsing the string when both
> counters are back to zero.

No, as I explained, I do not cover anything else than PropertiesParser does in this parser. No nested placeables.

> I think this is too early to call unescapeString.  In the simple mode, we
> want bug to stay un-unescaped.

Good catch

> I'd still call parseString in the simple mode to detect parsing errors (but
> return buf, like you already do).

Ok, but this parseString doesn't really do too much parsing. The only error it can throw is "Too much placeables".

> @@ +189,5 @@
> > +      return null;
> > +    }
> > +
> > +    if (index) {
> > +      return {'$v': val, '$x': index};
> 
> Does this mean that we're using dollar-prefixed keys everywhere in the AST
> now?  (If so, That's cool.)

No, Just for entities. Like we do in PropertiesParser.


Bottom line is. If you don't like the simplified approach of having transitional parser landed now, that matches PropertiesParser simplicity and scope, then I don't believe it's worth taking v3 parser and bending it to the current AST needs.

I really don't want to spend too much time on this parser and I do not intend to have it for any production ever. It's supposed to let people play with l20n until we land v3 API which I hope will happen this month or early next month.
Flags: needinfo?(stas)
(Assignee)

Comment 13

2 years ago
(In reply to Staś Małolepszy :stas from comment #8)
 > ::: tools/parse.js
> @@ +40,2 @@
> >    try {
> > +    switch (type) {
> 
> Maybe keep the try limited to a single line?  So, move the switch outside of
> it and assign the proper parsing function to a variable that you use inside
> of the try…catch.

What is the value of that change? We will need to have a try catch code for each case, instead of having one. It sounds like a worse code flow to me.
(Assignee)

Comment 14

2 years ago
(In reply to Staś Małolepszy :stas from comment #11)
> ::: src/lib/format/l20n/parser.js
> @@ +107,5 @@
> > +          return '\n';
> > +        case 'r':
> > +          return '\r';
> > +        case 't':
> > +          return '\t';
> 
> My understanding after the Monday meeting was that we'd throw on \n, \t and
> \r, and only support \\, \{{, \' and \".

I removed support for those three from the patch.

One note is that I don't support unescaping \' if the string is marked by " and \" if the string uses '.
I believe it's a good choice in line with "let's unescape as little as possible".
(Assignee)

Comment 15

2 years ago
Created attachment 8605598 [details] [diff] [review]
patch v3

I updated the patch to address your comments.

I believe at this point is a matter of how we want to introduce l20n. If you want to have a full expression parser, placeables parsing, complex string parsing etc. I believe it makes very little sense to invest in getting it working for v2 API, and I'd say let's wait a month or two until we have v3.

My preference is to land it, and let people play with l20n file format. This parser allows for that and any l20n that will work with this parser will work with the full parser later.
Attachment #8605129 - Attachment is obsolete: true
Attachment #8605598 - Flags: review?(stas)
(Assignee)

Comment 16

2 years ago
Created attachment 8605603 [details] [diff] [review]
patch v4

It's incredible how much I hate bugzilla's patch review flow. Sorry for spam stas, I updated the unescape function to handle unicode unescaping properly.
Attachment #8605598 - Attachment is obsolete: true
Attachment #8605598 - Flags: review?(stas)
Attachment #8605603 - Flags: review?(stas)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #12)
> The overall goal is to get a parser that covers everything that properties
> parser does, and nothing more.
> 
> I do not intend to enable full expression syntax, nor full index support. I
> do not "parse" complex strings in this mode neither in simple nor in complex
> mode.

OK, thanks for the explanation, I think this is the part which wasn't clear to me.

> Ok, but this parseString doesn't really do too much parsing. The only error
> it can throw is "Too much placeables".

OK, let's keep this as is for now.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #13)

> > Maybe keep the try limited to a single line?  So, move the switch outside of
> > it and assign the proper parsing function to a variable that you use inside
> > of the try…catch.
> 
> What is the value of that change? We will need to have a try catch code for
> each case, instead of having one. It sounds like a worse code flow to me.

You should be able to get away with only one try...catch like so:

  var parse;
  switch(type):
    case 'properties':
      parse = PropertiesParser.parse.bind(..);
      break;
    ..
  try {
    parse(..);
  } catch { .. }

The value of having only one statement inside of the try...catch is that it's easier to know where the error is coming from.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #15)
> My preference is to land it, and let people play with l20n file format. This
> parser allows for that and any l20n that will work with this parser will
> work with the full parser later.

OK, I see where you're coming from.  I'm slightly afraid of the confusion this might cause:  we'll end up with 3 versions of supported L20n syntax (Gaia 2, Gaia 3 and L20n 1.0.x).  I already thought that the parser you're writing for Gaia 3 was scoped down and limited compared to what we did in L20n 1.0.x.

How do you want to communicate these differences?
Flags: needinfo?(stas)
Comment on attachment 8605603 [details] [diff] [review]
patch v4

Review of attachment 8605603 [details] [diff] [review]:
-----------------------------------------------------------------

::: src/lib/format/l20n/parser.js
@@ +156,5 @@
> +    if (!this.simpleMode && buf.indexOf('{{') !== -1) {
> +      return [this.parseString(buf), overlay];
> +    }
> +
> +    return [buf, overlay];

Should we somehow indicate that this is the 'simple' return value?  We can add this in a follow up when we add complex string parsing to the resolver, too.
Attachment #8605603 - Flags: review?(stas) → review+
(Assignee)

Comment 19

2 years ago
(In reply to Staś Małolepszy :stas from comment #18)

> Should we somehow indicate that this is the 'simple' return value?  We can
> add this in a follow up when we add complex string parsing to the resolver,
> too.

My thinking is that for now, no. I added it so that you can experiment with double-parsing performance/memory impact. I don't expect us to run into a scenario where AST may or may not have complex strings as simple.
(Assignee)

Comment 20

2 years ago
(In reply to Staś Małolepszy :stas from comment #17)
> OK, I see where you're coming from.  I'm slightly afraid of the confusion
> this might cause:  we'll end up with 3 versions of supported L20n syntax
> (Gaia 2, Gaia 3 and L20n 1.0.x).  I already thought that the parser you're
> writing for Gaia 3 was scoped down and limited compared to what we did in
> L20n 1.0.x.
> 
> How do you want to communicate these differences?

1.0.x is not interesting for Gaia,

v2 is not interesting for anyone except of Gaia

v3 is the ultimate weapon of choice and the future of the world ;)

I want to introduce v2 *now* so that work that people are putting (like the commit hook introduced today?) is geared toward l20n, not properties. I want them to familiarize themselves with l20n file format and use it in their experimentation knowing that .properties are going away.

When we land v3, l20n should work the same.

I'm really worried that we're missing the window of opportunity to introduce l20n and I don't want to wait for API v3 which doesn't seem to be ready yet.

Do you agree with this approach?
Flags: needinfo?(stas)
I think a bit differently about 1.0.x vs. 3, but I agree about v2. Go ahead :) 

Also, tests please?
Flags: needinfo?(stas)
(Assignee)

Comment 22

2 years ago
Commit: https://github.com/mozilla-b2g/gaia/commit/f657d5dd4fec6dcc6d2424b818717e08cd282126
Merge: https://github.com/mozilla-b2g/gaia/commit/ad1ef4b324faf0e87a1cdd22d285004094cd5eb0

L20n.js: https://github.com/l20n/l20n.js/commit/ce225c412e900cf60f454bfbafd40eeece360f43
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.