Add a test that will look for misuse of single-quotes, double-quotes, and "..." (not ellipsis) in dtd/properties files

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: jaws, Assigned: jaws)

Tracking

(Blocks 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 affected)

Details

Attachments

(1 attachment)

Bug 1151449 and bug 1259847 fixed a number of strings that use single-quotes when smart quotes/curly quotes would have been preferred.

We should have a script that runs on commit/check-in that looks at DTD/properties file changes and rejects the commit if curly quotes or three dots ("...") are used.
This should also check for straight double-quotes (U+0022 " instead of U+201C “ + U+211D ”), although apparently we don't have any right now.
(Actually, we do!)
Assignee: nobody → jaws
Status: NEW → ASSIGNED
This is only scanning .properties files for now. I have to think of a way to scan through DTD files.
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/1-2/
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/2-3/
Summary: Create a lint tool that will look for misuse of single-quotes and "..." (not ellipsis) in dtd/properties files → Add a test that will look for misuse of single-quotes, double-quotes, and "..." (not ellipsis) in dtd/properties files
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/3-4/
See https://bugzilla.mozilla.org/show_bug.cgi?id=1151449#c12 for some background discussion referring to this bug.
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/4-5/
Fly-by comments on the tests:

Does the directory iterator dive into the object dir? I'm working on a test perpendicular to this one (bug 1255407), and going through my files is the long pole in terms of perf.

Also, I'd make the error message list the good one to use in the message, so people can copy and paste.

Do we care about the bad quotes and ellipsis in comments? As right now, it catches those, too.
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

https://reviewboard.mozilla.org/r/42513/#review39201

The overall test and DevTools changes look good to me.
Attachment #8735009 - Flags: review?(jryans) → review+
(In reply to Axel Hecht [:Pike] from comment #10)
> Fly-by comments on the tests:
> 
> Does the directory iterator dive into the object dir? I'm working on a test
> perpendicular to this one (bug 1255407), and going through my files is the
> long pole in terms of perf.

The test uses generateURIsFromDirTree from parsingTestHelpers.jsm to get the list. It's missing some of the files such as dom.properties but those can be added in a follow-up.

> Also, I'd make the error message list the good one to use in the message, so
> people can copy and paste.

I've added this in my latest patch.
 
> Do we care about the bad quotes and ellipsis in comments? As right now, it
> catches those, too.

This test doesn't look at comments and I don't think we should concern ourselves with those.
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/5-6/
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/6-7/
https://reviewboard.mozilla.org/r/42513/#review39335

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:111
(Diff revision 7)
> +  testForError(spec, entity, str, /"/, "double-quote", "\u201c\u201d");
> +  testForError(spec, entity, str, /\.\.\./, "ellipsis", "\u2026");
> +}
> +
> +add_task(function* checkAllTheProperties() {
> +  let appDir = Services.dirsvc.get("XCurProcD", Ci.nsIFile);

I wonder if the missing of dom.properties would be resolved if this was "GreD" instead of XCurProcD?

If the missing dom.properties comes from just iterating over browser/omni.ja and not omni.ja.
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/7-8/
(In reply to Axel Hecht [:Pike] from comment #15)
> https://reviewboard.mozilla.org/r/42513/#review39335
> 
> :::
> browser/base/content/test/general/browser_misused_characters_in_strings.js:
> 111
> (Diff revision 7)
> > +  testForError(spec, entity, str, /"/, "double-quote", "\u201c\u201d");
> > +  testForError(spec, entity, str, /\.\.\./, "ellipsis", "\u2026");
> > +}
> > +
> > +add_task(function* checkAllTheProperties() {
> > +  let appDir = Services.dirsvc.get("XCurProcD", Ci.nsIFile);
> 
> I wonder if the missing of dom.properties would be resolved if this was
> "GreD" instead of XCurProcD?
> 
> If the missing dom.properties comes from just iterating over browser/omni.ja
> and not omni.ja.

That might be the reason. There is certainly some other files that show up with GreD compared to XCurProcD, though I'd like to change that in a follow-up bug once this lands since 1) the patch here is already huge, and 2) XCurProcD matches the two parsing tests (browser_parsable_css.js and browser_parsable_script.js).
Comment on attachment 8735009 [details]
MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans

https://reviewboard.mozilla.org/r/42513/#review39685

Nice work! r+ with a few small things.

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:7
(Diff revision 8)
> + * http://creativecommons.org/publicdomain/zero/1.0/ */
> +
> +/* This list allows pre-existing or 'unfixable' issues to remain, while we
> + * detect newly occurring issues in shipping files. It is a list of objects
> + * specifying conditions under which an error should be ignored. */
> +const kWhitelist = [{

I was wondering how many exceptions we'd need. :-) [And I also wonder if there's a rare string or two used in some context where it has to be ASCII, but I'm probably being paranoid.]

Ideally these would be per-string annotations in the original .dtd/.property files, just to keep the metadata all in one place, but I think this is fine since there are not very many.

In fact...

All but one of these exceptions are of the form:

 <span|label class|id='foo'/?>
 
What do you think about just having testForErrors() strip this out from the string being tested?

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:11
(Diff revision 8)
> + * specifying conditions under which an error should be ignored. */
> +const kWhitelist = [{
> +    file: "gclicommands.properties",
> +    entity: "cmdSetdirManual2",
> +    type: "double-quote"
> +  }, {

So... This is actually a bug! \o/

This string is attempting to link to a page with <a href="...">. But open up the Developer Toolbar, and type "help cmd setdir": You get a popup window showing this string literally, it's not a link.

I think we should remove the exception, and have the string just list the URL without any attempt at HTML markup.

(And I just noticed bug 1260607, so that's 2-for-1 here.)

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:96
(Diff revision 8)
> +    };
> +    xhr.send(null);
> +  });
> +}
> +
> +function testForError(spec, entity, str, pattern, type, expected) {

nit: spec is a little confusing, make just "file" or "filename"?

even smaller nit: Instead of "entity", "key" or "stringName"?

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:104
(Diff revision 8)
> +    ok(false, `${spec} with key=${entity} has a misused ${type}. Expected ${expected} instead.`);
> +  }
> +}
> +
> +function testForErrors(spec, entity, str) {
> +  testForError(spec, entity, str, /\w'\w/, "single-quote", "\u2018\u2019");

Hmm, it would be nice to call out "apostrophe" here specifically, (as the |type|), and suggest just \u2019 as the expected character. But fear this starts to go down a needless regex rathole.

I'd suggest using the |type| only for the whitelist, and make the second string be the descriptive text.

ok(false, `${spec} with key=${entity} is malformed. ${help}`);

The first testForErrors() would have:

"Single-quoted strings should use Unicode \u2018foo\u2019 instead of 'foo', and words with apostrophes should use foo\u2019 isntead of foo's."

Similarly for double-quote.

And "Strings with an ellipsis should use the Unicode \u2026 character instead of three periods."

::: browser/base/content/test/general/browser_misused_characters_in_strings.js:134
(Diff revision 8)
> +  let uris = yield generateURIsFromDirTree(appDir, [".dtd"]);
> +  ok(uris.length, `Found ${uris.length} .dtd files to scan for misused characters`);
> +
> +  for (let uri of uris) {
> +    let rawContents = yield fetchFile(uri.spec);
> +    let entities = rawContents.match(/ENTITY\s+([\w\.]*)\s+["'](.*)["']/g);

You could make this a little stricter, forcing it to start at the beginning of a line... /^<!ENTITY.../

Although this would only filter out edgecasy stuff (like commenting out an entity, or a perhaps a carefully constructed string talking about how to format an entity), so I don't really care. :)
Attachment #8735009 - Flags: review?(dolske) → review+
Margaret: Is mobile interested in this? If so you'll probably want a followup to fix /mobile strings and move this test to Toolkit.
Flags: needinfo?(margaret.leibovic)
Actually, one more comment. Especially with single-quotes, I find it at little difficult to differentiate the left- and right- forms at a glance. Bonus points for adding a check to make sure they're properly used in pairs (ditto for double-quotes). The obvious complications are dealing with a mix of single-quotes and apostrophes, nested quotes, etc. I don't remember if this is possible with a regex, but I bet it's a solved problem we could grab from the shelf!
PS: if this test was in python, you could use the in-tree version of compare-locales, and the parsers it's coming with. Which might come in handy once we add a new file format for l1(2)0n, as we don't need to write parsers twice then.
(In reply to Justin Dolske [:Dolske] from comment #20)
> Margaret: Is mobile interested in this? If so you'll probably want a
> followup to fix /mobile strings and move this test to Toolkit.

If this is something the UX team wants on desktop, I imagine it would apply to Android as well. I don't recall seeing us encounter this issue in any bugs, but I'm sure we have the same problems with our strings.

Also, if we're doing this in shared toolkit code, I imagine we'd want to be consistent in our /mobile front-end code as well.
Flags: needinfo?(margaret.leibovic)
https://hg.mozilla.org/integration/fx-team/rev/a877338b4740315e31d8843e4c6b34122f93e473
Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r=dolske,jryans
(In reply to Justin Dolske [:Dolske] from comment #19)
> All but one of these exceptions are of the form:
> 
>  <span|label class|id='foo'/?>
>  
> What do you think about just having testForErrors() strip this out from the
> string being tested?

I went back and forth on this and decided not to add this to the test because I didn't want something to slip past that wasn't explicitly overridden. These types of strings don't get added often, so it shouldn't be too much work to update the override list if need be, whereas if these were filtered out we could get some incorrect usages in the tree without knowing it.
 
> browser/base/content/test/general/browser_misused_characters_in_strings.js:11
> > + * specifying conditions under which an error should be ignored. */
> > +const kWhitelist = [{
> > +    file: "gclicommands.properties",
> > +    entity: "cmdSetdirManual2",
> > +    type: "double-quote"
> > +  }, {
> 
> This string is attempting to link to a page with <a href="...">. But open up
> the Developer Toolbar, and type "help cmd setdir": You get a popup window
> showing this string literally, it's not a link.
> 
> I think we should remove the exception, and have the string just list the
> URL without any attempt at HTML markup.

Fixed.

> :::
> browser/base/content/test/general/browser_misused_characters_in_strings.js:96
> (Diff revision 8)
> > +    };
> > +    xhr.send(null);
> > +  });
> > +}
> > +
> > +function testForError(spec, entity, str, pattern, type, expected) {
> 
> nit: spec is a little confusing, make just "file" or "filename"?
> 
> even smaller nit: Instead of "entity", "key" or "stringName"?
> 

Fixed.

> :::
> browser/base/content/test/general/browser_misused_characters_in_strings.js:
> 104
> (Diff revision 8)
> > +    ok(false, `${spec} with key=${entity} has a misused ${type}. Expected ${expected} instead.`);
> > +  }
> > +}
> > +
> > +function testForErrors(spec, entity, str) {
> > +  testForError(spec, entity, str, /\w'\w/, "single-quote", "\u2018\u2019");
> 
> Hmm, it would be nice to call out "apostrophe" here specifically, (as the
> |type|), and suggest just \u2019 as the expected character. But fear this
> starts to go down a needless regex rathole.

Added.

> I'd suggest using the |type| only for the whitelist, and make the second
> string be the descriptive text.
> 
> ok(false, `${spec} with key=${entity} is malformed. ${help}`);
> 
> The first testForErrors() would have:
> 
> "Single-quoted strings should use Unicode \u2018foo\u2019 instead of 'foo',
> and words with apostrophes should use foo\u2019 isntead of foo's."
> 
> Similarly for double-quote.
> 
> And "Strings with an ellipsis should use the Unicode \u2026 character
> instead of three periods."

Good idea, I've made this change.

> :::
> browser/base/content/test/general/browser_misused_characters_in_strings.js:
> 134
> (Diff revision 8)
> > +  let uris = yield generateURIsFromDirTree(appDir, [".dtd"]);
> > +  ok(uris.length, `Found ${uris.length} .dtd files to scan for misused characters`);
> > +
> > +  for (let uri of uris) {
> > +    let rawContents = yield fetchFile(uri.spec);
> > +    let entities = rawContents.match(/ENTITY\s+([\w\.]*)\s+["'](.*)["']/g);
> 
> You could make this a little stricter, forcing it to start at the beginning
> of a line... /^<!ENTITY.../
> 
> Although this would only filter out edgecasy stuff (like commenting out an
> entity, or a perhaps a carefully constructed string talking about how to
> format an entity), so I don't really care. :)

I left this unchanged.

(In reply to Justin Dolske [:Dolske] from comment #21)
> Actually, one more comment. Especially with single-quotes, I find it at
> little difficult to differentiate the left- and right- forms at a glance.
> Bonus points for adding a check to make sure they're properly used in pairs
> (ditto for double-quotes). The obvious complications are dealing with a mix
> of single-quotes and apostrophes, nested quotes, etc. I don't remember if
> this is possible with a regex, but I bet it's a solved problem we could grab
> from the shelf!

I started to implement this and it got hairy pretty fast. Strings that included two sets of quotes (‘%1$S’ does not match ‘%2$S’) were flagged as erroneous because the two inner quote characters matched as being in the wrong order. A similar issue happened for strings that had a single-quote pair and an apostrophe.

16 strings were found to have these issues, and I verified that each one was a false positive. I removed the extra regexs to look for mis-ordered quotes because it was exceedingly difficult to figure it out and wasn't adding much value. I did however add a case that makes sure we are using the correct single quote character for the apostrophe, which is likely to be the more common mistake since it doesn't have another character to "match" up with and act symmetrically towards.
(In reply to Pulsebot from comment #27)
> Backout:
> https://hg.mozilla.org/integration/fx-team/rev/8007d5be3f65

(In reply to Pulsebot from comment #28)
> Backout:
> https://hg.mozilla.org/integration/fx-team/rev/48b57a7cf747

Note, comment #28 is the backout of the backout. The first backout wasn't necessary.
I'm starting to get really confused about this bug. 

There is a string change, for example, that is not part of this patch (cmdSetdirManual3 in gclicommands.properties), and is popping in and out with backouts.
https://hg.mozilla.org/mozilla-central/rev/a877338b4740#l39.164
https://hg.mozilla.org/mozilla-central/log/default/devtools/shared/locales/en-US/gclicommands.properties
That string change was requested by comment #19.

The backouts were not necessary and cancel each other out. cmdSetdirManual3 that is in the hg repo is the correct string.

The previous string, cmdSetdirManual2, was incorrect in that it contained HTML elements that were displayed as raw HTML instead of rendered as proper HTML elements.
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #31)
> That string change was requested by comment #19.
> 
> The backouts were not necessary and cancel each other out. cmdSetdirManual3
> that is in the hg repo is the correct string.

Thanks, I think I'm starting to understand now. I assumed this bug wasn't in mozilla-central, but it actually landed with 48b57a7cf747 (the fact that this changeset is called 'Backout the backout ...' still confuses the heck out of me).
Note: I think this hasn't been marked as fixed due to all the backout commits, although it has now landed:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=55d557f4d73ee58664bdf2fa85aaab555224722e


FTR, I've ported the string changes to the Loop repo:

https://github.com/mozilla/loop/commit/4807808211e1b6fd271a2cc8ca8418b0c8c3f121
It appears this test doesn't work for strings that contain newline characters in their definitions. I noticed this after a try push for bug 1240594 (which removes aboutCertError.dtd).

Specifically, that bug moves the certerror.introPara string to netError.dtd and splits it into 3 lines. I changed the whitelist to reflect that this string is now in netError.dtd, but I ended up with a failure because it never got removed from the whitelist.

Keeping the entire string in one line eliminates this failure. This seems like a bug in the test to me, because afaik string definitions in DTDs can span multiple lines.
Flags: needinfo?(jaws)
Blocks: 1262378
(In reply to Mark Banner (:standard8) from comment #33)
> Note: I think this hasn't been marked as fixed due to all the backout
> commits, although it has now landed:

Thanks for the ping.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Depends on: 1262648
Flags: needinfo?(jaws)
Depends on: 1268159
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #24)
> https://hg.mozilla.org/integration/fx-team/rev/
> a877338b4740315e31d8843e4c6b34122f93e473
> Bug 1259859 - Add a test that looks for misused characters in user-facing
> strings. r=dolske,jryans

This made changes to browser/locales/en-US/pdfviewer/viewer.properties that never got upstreamed. Brendan, can you please do so?
Flags: needinfo?(bdahl)
Fixed in https://github.com/mozilla/pdf.js/pull/7563
Flags: needinfo?(bdahl)
You need to log in before you can comment on or make changes to this bug.