1259859 - Add a test that will look for misuse of single-quotes, double-quotes, and "..." (not ellipsis) in dtd/properties files

Jared Wein [:jaws] (please needinfo? me)

Assignee

Description

•

9 years ago

Bug 1151449 and bug 1259847 fixed a number of strings that use single-quotes when smart quotes/curly quotes would have been preferred. We should have a script that runs on commit/check-in that looks at DTD/properties file changes and rejects the commit if curly quotes or three dots ("...") are used.

Justin Dolske [:Dolske]

Comment 1

•

9 years ago

This should also check for straight double-quotes (U+0022 " instead of U+201C “ + U+211D ”), although apparently we don't have any right now.

Justin Dolske [:Dolske]

Comment 2

•

9 years ago

(Actually, we do!)

Jared Wein [:jaws] (please needinfo? me)

Assignee

Updated

•

9 years ago

Assignee: nobody → jaws

Status: NEW → ASSIGNED

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 3

•

9 years ago

Attached file MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans — Details

Review commit: https://reviewboard.mozilla.org/r/42513/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/42513/

Attachment #8735009 - Flags: review?(jryans)

Attachment #8735009 - Flags: review?(dolske)

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 4

•

9 years ago

This is only scanning .properties files for now. I have to think of a way to scan through DTD files.

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 5

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/1-2/

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 6

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/2-3/

Jared Wein [:jaws] (please needinfo? me)

Assignee

Updated

•

9 years ago

Summary: Create a lint tool that will look for misuse of single-quotes and "..." (not ellipsis) in dtd/properties files → Add a test that will look for misuse of single-quotes, double-quotes, and "..." (not ellipsis) in dtd/properties files

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 7

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/3-4/

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 8

•

9 years ago

See https://bugzilla.mozilla.org/show_bug.cgi?id=1151449#c12 for some background discussion referring to this bug.

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 9

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/4-5/

Axel Hecht [:Pike]

Comment 10

•

9 years ago

Fly-by comments on the tests: Does the directory iterator dive into the object dir? I'm working on a test perpendicular to this one (bug 1255407), and going through my files is the long pole in terms of perf. Also, I'd make the error message list the good one to use in the message, so people can copy and paste. Do we care about the bad quotes and ellipsis in comments? As right now, it catches those, too.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Comment 11

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans https://reviewboard.mozilla.org/r/42513/#review39201 The overall test and DevTools changes look good to me.

Attachment #8735009 - Flags: review?(jryans) → review+

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 12

•

9 years ago

(In reply to Axel Hecht [:Pike] from comment #10) > Fly-by comments on the tests: > > Does the directory iterator dive into the object dir? I'm working on a test > perpendicular to this one (bug 1255407), and going through my files is the > long pole in terms of perf. The test uses generateURIsFromDirTree from parsingTestHelpers.jsm to get the list. It's missing some of the files such as dom.properties but those can be added in a follow-up. > Also, I'd make the error message list the good one to use in the message, so > people can copy and paste. I've added this in my latest patch. > Do we care about the bad quotes and ellipsis in comments? As right now, it > catches those, too. This test doesn't look at comments and I don't think we should concern ourselves with those.

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 13

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/5-6/

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 14

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/6-7/

Axel Hecht [:Pike]

Comment 15

•

9 years ago

https://reviewboard.mozilla.org/r/42513/#review39335 ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:111 (Diff revision 7) > + testForError(spec, entity, str, /"/, "double-quote", "\u201c\u201d"); > + testForError(spec, entity, str, /\.\.\./, "ellipsis", "\u2026"); > +} > + > +add_task(function* checkAllTheProperties() { > + let appDir = Services.dirsvc.get("XCurProcD", Ci.nsIFile); I wonder if the missing of dom.properties would be resolved if this was "GreD" instead of XCurProcD? If the missing dom.properties comes from just iterating over browser/omni.ja and not omni.ja.

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 16

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans Review request updated; see interdiff: https://reviewboard.mozilla.org/r/42513/diff/7-8/

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 17

•

9 years ago

(In reply to Axel Hecht [:Pike] from comment #15) > https://reviewboard.mozilla.org/r/42513/#review39335 > > ::: > browser/base/content/test/general/browser_misused_characters_in_strings.js: > 111 > (Diff revision 7) > > + testForError(spec, entity, str, /"/, "double-quote", "\u201c\u201d"); > > + testForError(spec, entity, str, /\.\.\./, "ellipsis", "\u2026"); > > +} > > + > > +add_task(function* checkAllTheProperties() { > > + let appDir = Services.dirsvc.get("XCurProcD", Ci.nsIFile); > > I wonder if the missing of dom.properties would be resolved if this was > "GreD" instead of XCurProcD? > > If the missing dom.properties comes from just iterating over browser/omni.ja > and not omni.ja. That might be the reason. There is certainly some other files that show up with GreD compared to XCurProcD, though I'd like to change that in a follow-up bug once this lands since 1) the patch here is already huge, and 2) XCurProcD matches the two parsing tests (browser_parsable_css.js and browser_parsable_script.js).

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 18

•

9 years ago

All green on tryserver, https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e7e7001254f

Justin Dolske [:Dolske]

Comment 19

•

9 years ago

Comment on attachment 8735009 [details] MozReview Request: Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r?dolske,jryans https://reviewboard.mozilla.org/r/42513/#review39685 Nice work! r+ with a few small things. ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:7 (Diff revision 8) > + * http://creativecommons.org/publicdomain/zero/1.0/ */ > + > +/* This list allows pre-existing or 'unfixable' issues to remain, while we > + * detect newly occurring issues in shipping files. It is a list of objects > + * specifying conditions under which an error should be ignored. */ > +const kWhitelist = [{ I was wondering how many exceptions we'd need. :-) [And I also wonder if there's a rare string or two used in some context where it has to be ASCII, but I'm probably being paranoid.] Ideally these would be per-string annotations in the original .dtd/.property files, just to keep the metadata all in one place, but I think this is fine since there are not very many. In fact... All but one of these exceptions are of the form: <span|label class|id='foo'/?> What do you think about just having testForErrors() strip this out from the string being tested? ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:11 (Diff revision 8) > + * specifying conditions under which an error should be ignored. */ > +const kWhitelist = [{ > + file: "gclicommands.properties", > + entity: "cmdSetdirManual2", > + type: "double-quote" > + }, { So... This is actually a bug! \o/ This string is attempting to link to a page with <a href="...">. But open up the Developer Toolbar, and type "help cmd setdir": You get a popup window showing this string literally, it's not a link. I think we should remove the exception, and have the string just list the URL without any attempt at HTML markup. (And I just noticed bug 1260607, so that's 2-for-1 here.) ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:96 (Diff revision 8) > + }; > + xhr.send(null); > + }); > +} > + > +function testForError(spec, entity, str, pattern, type, expected) { nit: spec is a little confusing, make just "file" or "filename"? even smaller nit: Instead of "entity", "key" or "stringName"? ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:104 (Diff revision 8) > + ok(false, `${spec} with key=${entity} has a misused ${type}. Expected ${expected} instead.`); > + } > +} > + > +function testForErrors(spec, entity, str) { > + testForError(spec, entity, str, /\w'\w/, "single-quote", "\u2018\u2019"); Hmm, it would be nice to call out "apostrophe" here specifically, (as the |type|), and suggest just \u2019 as the expected character. But fear this starts to go down a needless regex rathole. I'd suggest using the |type| only for the whitelist, and make the second string be the descriptive text. ok(false, `${spec} with key=${entity} is malformed. ${help}`); The first testForErrors() would have: "Single-quoted strings should use Unicode \u2018foo\u2019 instead of 'foo', and words with apostrophes should use foo\u2019 isntead of foo's." Similarly for double-quote. And "Strings with an ellipsis should use the Unicode \u2026 character instead of three periods." ::: browser/base/content/test/general/browser_misused_characters_in_strings.js:134 (Diff revision 8) > + let uris = yield generateURIsFromDirTree(appDir, [".dtd"]); > + ok(uris.length, `Found ${uris.length} .dtd files to scan for misused characters`); > + > + for (let uri of uris) { > + let rawContents = yield fetchFile(uri.spec); > + let entities = rawContents.match(/ENTITY\s+([\w\.]*)\s+["'](.*)["']/g); You could make this a little stricter, forcing it to start at the beginning of a line... /^<!ENTITY.../ Although this would only filter out edgecasy stuff (like commenting out an entity, or a perhaps a carefully constructed string talking about how to format an entity), so I don't really care. :)

Attachment #8735009 - Flags: review?(dolske) → review+

Justin Dolske [:Dolske]

Comment 20

•

9 years ago

Margaret: Is mobile interested in this? If so you'll probably want a followup to fix /mobile strings and move this test to Toolkit.

Flags: needinfo?(margaret.leibovic)

Justin Dolske [:Dolske]

Comment 21

•

9 years ago

Actually, one more comment. Especially with single-quotes, I find it at little difficult to differentiate the left- and right- forms at a glance. Bonus points for adding a check to make sure they're properly used in pairs (ditto for double-quotes). The obvious complications are dealing with a mix of single-quotes and apostrophes, nested quotes, etc. I don't remember if this is possible with a regex, but I bet it's a solved problem we could grab from the shelf!

Axel Hecht [:Pike]

Comment 22

•

9 years ago

PS: if this test was in python, you could use the in-tree version of compare-locales, and the parsers it's coming with. Which might come in handy once we add a new file format for l1(2)0n, as we don't need to write parsers twice then.

:Margaret Leibovic

Comment 23

•

9 years ago

(In reply to Justin Dolske [:Dolske] from comment #20) > Margaret: Is mobile interested in this? If so you'll probably want a > followup to fix /mobile strings and move this test to Toolkit. If this is something the UX team wants on desktop, I imagine it would apply to Android as well. I don't recall seeing us encounter this issue in any bugs, but I'm sure we have the same problems with our strings. Also, if we're doing this in shared toolkit code, I imagine we'd want to be consistent in our /mobile front-end code as well.

Flags: needinfo?(margaret.leibovic)

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 24

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/a877338b4740315e31d8843e4c6b34122f93e473 Bug 1259859 - Add a test that looks for misused characters in user-facing strings. r=dolske,jryans

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 25

•

9 years ago

(In reply to Justin Dolske [:Dolske] from comment #19) > All but one of these exceptions are of the form: > > <span|label class|id='foo'/?> > > What do you think about just having testForErrors() strip this out from the > string being tested? I went back and forth on this and decided not to add this to the test because I didn't want something to slip past that wasn't explicitly overridden. These types of strings don't get added often, so it shouldn't be too much work to update the override list if need be, whereas if these were filtered out we could get some incorrect usages in the tree without knowing it. > browser/base/content/test/general/browser_misused_characters_in_strings.js:11 > > + * specifying conditions under which an error should be ignored. */ > > +const kWhitelist = [{ > > + file: "gclicommands.properties", > > + entity: "cmdSetdirManual2", > > + type: "double-quote" > > + }, { > > This string is attempting to link to a page with <a href="...">. But open up > the Developer Toolbar, and type "help cmd setdir": You get a popup window > showing this string literally, it's not a link. > > I think we should remove the exception, and have the string just list the > URL without any attempt at HTML markup. Fixed. > ::: > browser/base/content/test/general/browser_misused_characters_in_strings.js:96 > (Diff revision 8) > > + }; > > + xhr.send(null); > > + }); > > +} > > + > > +function testForError(spec, entity, str, pattern, type, expected) { > > nit: spec is a little confusing, make just "file" or "filename"? > > even smaller nit: Instead of "entity", "key" or "stringName"? > Fixed. > ::: > browser/base/content/test/general/browser_misused_characters_in_strings.js: > 104 > (Diff revision 8) > > + ok(false, `${spec} with key=${entity} has a misused ${type}. Expected ${expected} instead.`); > > + } > > +} > > + > > +function testForErrors(spec, entity, str) { > > + testForError(spec, entity, str, /\w'\w/, "single-quote", "\u2018\u2019"); > > Hmm, it would be nice to call out "apostrophe" here specifically, (as the > |type|), and suggest just \u2019 as the expected character. But fear this > starts to go down a needless regex rathole. Added. > I'd suggest using the |type| only for the whitelist, and make the second > string be the descriptive text. > > ok(false, `${spec} with key=${entity} is malformed. ${help}`); > > The first testForErrors() would have: > > "Single-quoted strings should use Unicode \u2018foo\u2019 instead of 'foo', > and words with apostrophes should use foo\u2019 isntead of foo's." > > Similarly for double-quote. > > And "Strings with an ellipsis should use the Unicode \u2026 character > instead of three periods." Good idea, I've made this change. > ::: > browser/base/content/test/general/browser_misused_characters_in_strings.js: > 134 > (Diff revision 8) > > + let uris = yield generateURIsFromDirTree(appDir, [".dtd"]); > > + ok(uris.length, `Found ${uris.length} .dtd files to scan for misused characters`); > > + > > + for (let uri of uris) { > > + let rawContents = yield fetchFile(uri.spec); > > + let entities = rawContents.match(/ENTITY\s+([\w\.]*)\s+["'](.*)["']/g); > > You could make this a little stricter, forcing it to start at the beginning > of a line... /^<!ENTITY.../ > > Although this would only filter out edgecasy stuff (like commenting out an > entity, or a perhaps a carefully constructed string talking about how to > format an entity), so I don't really care. :) I left this unchanged. (In reply to Justin Dolske [:Dolske] from comment #21) > Actually, one more comment. Especially with single-quotes, I find it at > little difficult to differentiate the left- and right- forms at a glance. > Bonus points for adding a check to make sure they're properly used in pairs > (ditto for double-quotes). The obvious complications are dealing with a mix > of single-quotes and apostrophes, nested quotes, etc. I don't remember if > this is possible with a regex, but I bet it's a solved problem we could grab > from the shelf! I started to implement this and it got hairy pretty fast. Strings that included two sets of quotes (‘%1$S’ does not match ‘%2$S’) were flagged as erroneous because the two inner quote characters matched as being in the wrong order. A similar issue happened for strings that had a single-quote pair and an apostrophe. 16 strings were found to have these issues, and I verified that each one was a false positive. I removed the extra regexs to look for mis-ordered quotes because it was exceedingly difficult to figure it out and wasn't adding much value. I did however add a case that makes sure we are using the correct single quote character for the apostrophe, which is likely to be the more common mistake since it doesn't have another character to "match" up with and act symmetrically towards.

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 26

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/c7bde7c9d5e56565fd3a6a013d712f648480b535 Bug 1259859 - followup, fix some quotes that snuck in. r=me

Pulsebot

Comment 27

•

9 years ago

Backout: https://hg.mozilla.org/integration/fx-team/rev/8007d5be3f65

Pulsebot

Comment 28

•

9 years ago

Backout: https://hg.mozilla.org/integration/fx-team/rev/48b57a7cf747

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 29

•

9 years ago

(In reply to Pulsebot from comment #27) > Backout: > https://hg.mozilla.org/integration/fx-team/rev/8007d5be3f65 (In reply to Pulsebot from comment #28) > Backout: > https://hg.mozilla.org/integration/fx-team/rev/48b57a7cf747 Note, comment #28 is the backout of the backout. The first backout wasn't necessary.

Francesco Lodolo [:flod]

Comment 30

•

9 years ago

I'm starting to get really confused about this bug. There is a string change, for example, that is not part of this patch (cmdSetdirManual3 in gclicommands.properties), and is popping in and out with backouts. https://hg.mozilla.org/mozilla-central/rev/a877338b4740#l39.164 https://hg.mozilla.org/mozilla-central/log/default/devtools/shared/locales/en-US/gclicommands.properties

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 31

•

9 years ago

That string change was requested by comment #19. The backouts were not necessary and cancel each other out. cmdSetdirManual3 that is in the hg repo is the correct string. The previous string, cmdSetdirManual2, was incorrect in that it contained HTML elements that were displayed as raw HTML instead of rendered as proper HTML elements.

Francesco Lodolo [:flod]

Comment 32

•

9 years ago

(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #31) > That string change was requested by comment #19. > > The backouts were not necessary and cancel each other out. cmdSetdirManual3 > that is in the hg repo is the correct string. Thanks, I think I'm starting to understand now. I assumed this bug wasn't in mozilla-central, but it actually landed with 48b57a7cf747 (the fact that this changeset is called 'Backout the backout ...' still confuses the heck out of me).

Mark Banner (:standard8)

Comment 33

•

9 years ago

Note: I think this hasn't been marked as fixed due to all the backout commits, although it has now landed: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=55d557f4d73ee58664bdf2fa85aaab555224722e FTR, I've ported the string changes to the Loop repo: https://github.com/mozilla/loop/commit/4807808211e1b6fd271a2cc8ca8418b0c8c3f121

(inactive) Nihanth Subramanya [:nhnt11]

Comment 34

•

9 years ago

It appears this test doesn't work for strings that contain newline characters in their definitions. I noticed this after a try push for bug 1240594 (which removes aboutCertError.dtd). Specifically, that bug moves the certerror.introPara string to netError.dtd and splits it into 3 lines. I changed the whitelist to reflect that this string is now in netError.dtd, but I ended up with a failure because it never got removed from the whitelist. Keeping the entire string in one line eliminates this failure. This seems like a bug in the test to me, because afaik string definitions in DTDs can span multiple lines.

Flags: needinfo?(jaws)

:Gijs (out for now; he/him)

Updated

•

9 years ago

Blocks: 1262378

Jared Wein [:jaws] (please needinfo? me)

Assignee

Comment 35

•

9 years ago

(In reply to Mark Banner (:standard8) from comment #33) > Note: I think this hasn't been marked as fixed due to all the backout > commits, although it has now landed: Thanks for the ping.

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

(inactive) Nihanth Subramanya [:nhnt11]

Updated

•

9 years ago

Depends on: 1262648

Jared Wein [:jaws] (please needinfo? me)

Assignee

Updated

•

9 years ago

Flags: needinfo?(jaws)

:Gijs (out for now; he/him)

Updated

•

9 years ago

Depends on: 1268159

Ryan VanderMeulen [:RyanVM]

Comment 36

•

8 years ago

(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #24) > https://hg.mozilla.org/integration/fx-team/rev/ > a877338b4740315e31d8843e4c6b34122f93e473 > Bug 1259859 - Add a test that looks for misused characters in user-facing > strings. r=dolske,jryans This made changes to browser/locales/en-US/pdfviewer/viewer.properties that never got upstreamed. Brendan, can you please do so?

Flags: needinfo?(bdahl)

Brendan Dahl [:bdahl]

Comment 37

•

8 years ago

Fixed in https://github.com/mozilla/pdf.js/pull/7563

Flags: needinfo?(bdahl)