KumaScript changed escaping of HTML entities

VERIFIED FIXED

Status

developer.mozilla.org
KumaScript
VERIFIED FIXED
a year ago
a year ago

People

(Reporter: fscholz, Unassigned)

Details

(Whiteboard: [specification][type:bug])

(Reporter)

Description

a year ago
What did you do?
================
Went to https://developer.mozilla.org/en-US/docs/Web/JavaScript


What happened?
==============
Saw "Expressions &amp; operators" (with the entity shown literally) in the sidebar navigation.

The string is output by KumaScript, in the JsSidebar.ejs macro:
  
<%=text['Operators']%>

What should have happened?
==========================
"Expressions & operators" in the sidebar.

A workaround would be to now output it like this: <%-text['Operators']%>
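
To illustrate the difference between the two tags, here is a minimal sketch. It assumes the stored macro string already contains the entity, and `escapeHtml` is a hypothetical helper that only approximates what an escaping tag does; it is not the EJS implementation.

```javascript
// Hypothetical helper approximating the escaping applied by "<%=".
// This is an illustration only, not the code EJS actually runs.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Assumed form of the stored string: it already contains an HTML entity.
const stored = 'Expressions &amp; operators';

// Escaped output ("<%=" style): the "&" of the entity is escaped a second
// time, so the browser displays the literal text "&amp;".
const escapedOutput = escapeHtml(stored);

// Raw output ("<%-" style): the string is emitted untouched,
// so the browser displays a plain "&".
const rawOutput = stored;

console.log(escapedOutput); // Expressions &amp;amp; operators
console.log(rawOutput);     // Expressions &amp; operators
```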

Is there anything else we should know?
======================================
The JS pages get a lot of traffic, so this might affect quite a lot of pages, especially their localizations.

I suspect this is due to our module updates. A quick search on the ejs repo suggests it might be this change: https://github.com/tj/ejs/pull/165

Questions that I have:

Is this new escaping behavior for HTML entities with "<%=" by design and should we use <%- when we need HTML entities in the output?

Or, do we want to fix the new "<%=" so that the old behavior is restored?

It is a good idea to be explicit about whether you are adding text that should be escaped or text that contains vetted HTML. This helps avoid XSS vectors. "<%-" should be the exception, and should set off a reviewer's warning bells.

I think JsSidebar template should be updated:

https://github.com/mozilla/kumascript/blob/master/macros/JsSidebar.ejs

The strings should use plain ampersands, in the English and other translations:

'Operators': 'Expressions & operators',

And the EJS should continue to use the "escape and render" version, '<%=':

<li data-default-state="<%=state('Operators')%>"><a href="/<%=locale%>/docs/Web/JavaScript/Reference/Operators"><%=text['Operators']%></a>
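
A rough sketch of why this combination is the safer one: the strings stay plain text and escaping happens once, at render time. `escapeHtml` and `renderLink` are hypothetical names used for illustration and only approximate what the template does; the hostile string is invented to show the XSS case.

```javascript
// Hypothetical stand-in for the escaping that "<%=" applies (not EJS's own code).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Strings are kept as plain text, with a plain ampersand.
const text = { Operators: 'Expressions & operators' };

// Hypothetical render function: every interpolated value is escaped once.
function renderLink(locale, label) {
  return '<a href="/' + escapeHtml(locale) +
    '/docs/Web/JavaScript/Reference/Operators">' +
    escapeHtml(label) + '</a>';
}

// The plain "&" becomes "&amp;" in the markup, which browsers show as "&".
console.log(renderLink('en-US', text.Operators));
// <a href="/en-US/docs/Web/JavaScript/Reference/Operators">Expressions &amp; operators</a>

// An invented hostile translation is rendered as inert text, not markup.
console.log(renderLink('en-US', '<script>alert(1)</script>'));
// <a href="/en-US/docs/Web/JavaScript/Reference/Operators">&lt;script&gt;alert(1)&lt;/script&gt;</a>
```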

'&nbsp;' can be converted to the equivalent hexadecimal escape in a JS string:

'&nbsp;' -> '\xa0' or '\xA0'

This site has a table of HTML hexadecimal codes and the equivalent HTML entities:

http://on-the-matrix.com/webtools/HtmlEntityReferences.aspx

So the copyright sign © is '&copy;' or '&#x00A9;' in HTML, and '\xA9' or '\u00A9' in a JS string.

Unicode code points like € are '&euro;' and '&#x20AC;' in HTML. The four-digit JS escape is '\u20AC'. It may be better to standardize on the Unicode escapes ('\u00A9' and '\u20AC'), since the shorter '\xA9' form is not allowed in JSON.

Of course, for things like €, you should be able to just use the character in the string.
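
A quick sketch to check these equivalences in plain JavaScript. The variable names are only for illustration.

```javascript
// The two-digit hex escape and the four-digit Unicode escape
// denote the same character in a JS string.
const nbspMatches = '\xA0' === '\u00A0';
const copyrightMatches = '\xA9' === '\u00A9' && '\u00A9' === '©';
const euroMatches = '\u20AC' === '€';

console.log(nbspMatches, copyrightMatches, euroMatches); // true true true

// JSON accepts the \u form inside a string...
const parsedCopyright = JSON.parse('"\\u00A9"');
console.log(parsedCopyright === '©'); // true

// ...but rejects the \x form, which is not part of the JSON grammar.
let hexRejectedByJson = false;
try {
  JSON.parse('"\\xA9"');
} catch (err) {
  hexRejectedByJson = err instanceof SyntaxError;
}
console.log(hexRejectedByJson); // true
```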
PR 96 merged, deployed to staging and production. After a force refresh, the sidebar is back to normal.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Reporter)

Comment 5

a year ago
Thanks for the quick review and deployment!
Status: RESOLVED → VERIFIED
See Also: → bug 1335006