Closed Bug 1262648 Opened 8 years ago Closed 8 years ago

browser_misused_characters_in_strings.js doesn't work correctly with string definitions spanning multiple lines

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 48
Tracking Status
firefox48 --- fixed

People

(Reporter: nhnt11, Assigned: nhnt11)

References

Details

Attachments

(1 file)

From bug 1259859 comment 34:

It appears this test doesn't work for strings that contain newline characters in their definitions. I noticed this after a try push for bug 1240594 (which removes aboutCertError.dtd).

Specifically, that bug moves the certerror.introPara string to netError.dtd and splits it into 3 lines. I changed the whitelist to reflect that this string is now in netError.dtd, but I ended up with a failure because it never got removed from the whitelist.

Keeping the entire string in one line eliminates this failure. This seems like a bug in the test to me, because afaik string definitions in DTDs can span multiple lines.
So, my apologies for the crazy regex I ended up with.

I started out just changing a "." to "[\s\S]" so that newlines are matched, but that led to greedy vs lazy behavior to correctly match within the same entity, and then some other issue, and another one.... and I ended up just writing a proper comprehensive regex to match full ENTITY tags.

You can see and play with my test cases here: http://www.regexpal.com/?fam=94585

If you think this is overkill (which I would agree with) I'll see if I can start over and come up with something simple that works with our codebase (i.e. making assumptions about code style).
Don't make assumptions, l10n is fuzz testing those.

https://dxr.mozilla.org/mozilla-central/source/python/compare-locales/compare_locales/parser.py#215 has a decent parser, we have a few years of experience with that one.
https://reviewboard.mozilla.org/r/44721/#review41505

Found a much more straightforward, obvious regex.
Comment on attachment 8738859 [details]
MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws

Found a better regex
Attachment #8738859 - Flags: review?(jaws) → review-
Comment on attachment 8738859 [details]
MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44721/diff/1-2/
Attachment #8738859 - Attachment description: MozReview Request: Bug 1262648 - Make ENTITY matching regex more comprehensive in browser_misused_characters_in_strings.js. r=jaws → MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws
Attachment #8738859 - Flags: review- → review?(jaws)
New regex is adapted from the file Axel linked to. It doesn't work when there are escaped quotes within the string value (e.g. <!ENTITY some.key "<span class=\"some value\">">), but since it seems we've been using that parser for a while now I think the precedent and simplification of the regex justify the lack of support for that edge case. Also I could not find any such example in our current code base [1].

[1] https://dxr.mozilla.org/mozilla-central/search?q=regexp%3A<!ENTITY\s%2B[\w\.]%2B\s%2B("[^"]*\\"[^"]*"|'[^']*\\'[^']*')\s*>
Comment on attachment 8738859 [details]
MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws

https://reviewboard.mozilla.org/r/44721/#review41597

Please add a dummy DTD file that is opened by the test and confirms that we will support multiline strings even if we aren't shipping any.

The rest of the patch looks fine.
Attachment #8738859 - Flags: review?(jaws)
Comment on attachment 8738859 [details]
MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws

https://reviewboard.mozilla.org/r/44721/#review42413

r=me if you add a new DTD that is opened by the test to confirm that multiline strings are interpreted correctly.
Attachment #8738859 - Flags: review+
Comment on attachment 8738859 [details]
MozReview Request: Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44721/diff/2-3/
r=me on the updated patch
https://hg.mozilla.org/integration/fx-team/rev/5ad0f8d2d7dffcf57ff4d935f5dfb4781683091e
Bug 1262648 - Make ENTITY matching regexp allow multiline string values in browser_misused_characters_in_strings.js. r=jaws
https://hg.mozilla.org/mozilla-central/rev/5ad0f8d2d7df
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
You need to log in before you can comment on or make changes to this bug.