Improve nsEscapeHTML() and nsEscapeHTML2()

RESOLVED FIXED in Firefox 57

Status

()

enhancement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

54 Branch
mozilla57
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Assignee

Description

2 years ago
These functions return malloc'd |char*|/|char16_t*| buffers which are, in most cases, immediately Adopt()ed into nsStrings. We should make them operate on nsAString/nsACString.

Also, they over-allocate terribly! All they do is escape '<', '>', '&', '"', and '\''. The worst case is '"' which expands to "&quot;", which is six chars. So they just assume the worst and allocate 6x more chars for the output than what the input has!
Assignee

Comment 1

2 years ago
I did some ad hoc profiling. Starting the browser and looking at a couple of pages, nsEscapeHTML() was called a grand total of 8 times, all for "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.
(In reply to Nicholas Nethercote [:njn] from comment #1)
> I did some ad hoc profiling. Starting the browser and looking at a couple of
> pages, nsEscapeHTML() was called a grand total of 8 times, all for
> "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.

The best I can tell nsEscapeHTML2 is just used for rendering indexes (think foo.bar.com/~erahm/).
Assignee

Comment 3

2 years ago
The existing functions work with C strings but almost all the call sites use
Mozilla strings.

The replacement function has the following properties.

- It works with Mozilla strings, which makes it much simpler and also improves
  the call sites.

- It appends to the destination string because that's what a lot of the call
  sites need. For those that don't, we can just append to an empty string.

- It is declared outside the |extern "C"| section because there is no need for
  it to be in that section.

Note: there is no 16-bit variant of nsAppendEscapedHTML(). This is because
there are only two places that need 16-bit variants, both rarely executed,
and so converting to and from 8-bit is good enough.

The patch also adds some testing of the new function, renaming
TestEscapeURL.cpp as TestEscape.cpp in the process, because that file is now
testing other kinds of escaping.
Attachment #8898603 - Flags: review?(erahm)
Assignee

Updated

2 years ago
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Assignee

Updated

2 years ago
Attachment #8898603 - Attachment is obsolete: true
Attachment #8898603 - Flags: review?(erahm)
Assignee

Comment 5

2 years ago
Comments about functions should go before the declaration, not after!
Attachment #8898629 - Flags: review?(erahm)
Comment on attachment 8898626 [details] [diff] [review]
Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function

Review of attachment 8898626 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this is much better.

::: xpcom/io/nsEscape.cpp
@@ +214,3 @@
>  {
> +  // Preparation: aDst's length will increase by at least aSrc's length.
> +  aDst.SetCapacity(aDst.Length() + aSrc.Length());

Nice, no more 6X!

@@ +217,4 @@
>  
> +  nsACString::const_iterator cur, end;
> +  aSrc.BeginReading(cur);
> +  aSrc.EndReading(end);

This could just be:

> for (auto cur = aSrc.BeginReading(); cur != aSrc.EndReading(); cur++)

or slightly more concise (and avoiding our odd iterator class):

> auto cur = aSrc.BeginReading();
> auto end = aSrc.EndReading();

It's fine as-is of course.
Attachment #8898626 - Flags: review?(erahm) → review+

Updated

2 years ago
Attachment #8898629 - Flags: review?(erahm) → review+
Comment on attachment 8898626 [details] [diff] [review]
Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function

Review of attachment 8898626 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/io/nsEscape.cpp
@@ +214,3 @@
>  {
> +  // Preparation: aDst's length will increase by at least aSrc's length.
> +  aDst.SetCapacity(aDst.Length() + aSrc.Length());

The result of this addition should be checked for overflow.  Although, hm, this function isn't fallible...
Assignee

Comment 8

2 years ago
> The result of this addition should be checked for overflow.  Although, hm,
> this function isn't fallible...

Turns out it can't overflow, and it wouldn't matter if it did. I've added this comment:

>  // Preparation: aDst's length will increase by at least aSrc's length.
>  // This should never overflow because kMaxCapacity is less than half of the
>  // maximum value of nsACString::kMaxCapacity. But even if that changed,
>  // Length() is unsigned, so any overflow will just cause this sum to be
>  // smaller than it should be, in which case this call will be a no-op.
>  aDst.SetCapacity(aDst.Length() + aSrc.Length());
(In reply to Nicholas Nethercote [:njn] from comment #8)
> >  // Preparation: aDst's length will increase by at least aSrc's length.
> >  // This should never overflow because kMaxCapacity is less than half of the
> >  // maximum value of nsACString::kMaxCapacity. But even if that changed,

Huh, interesting.  What is the non-qualified kMaxCapacity in the above, where is that coming from?  Do you know offhand if we have a static assert for the relationship cited here?

> >  // Length() is unsigned, so any overflow will just cause this sum to be
> >  // smaller than it should be, in which case this call will be a no-op.
> >  aDst.SetCapacity(aDst.Length() + aSrc.Length());

Smaller capacities don't look like a no-op to me, particularly:

https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#683
https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#708

We'll overwrite old data if we shrink.  But perhaps this is not a problem because of the capacity arguments cited above.
Assignee

Comment 10

2 years ago
Eh, I'll just add overflow checking and skip the SetCapacity() call on failure.

Comment 12

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a9fb936a3c82
https://hg.mozilla.org/mozilla-central/rev/1f1322219d9f
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57

Comment 13

2 years ago
That busted us without any warning :-(

Comment 14

2 years ago
Just a string question:

I converted this:
  char *escapedBody = nsEscapeHTML(*body);
  if (escapedBody)
  {
    PR_Free(*body);
    *body = escapedBody;
    bodyLen = strlen(*body);
  }

to this:
  nsCString escapedBody;
  nsAppendEscapedHTML(nsDependentCString(*body), escapedBody);
  if (!escapedBody.IsEmpty())
  {
    PR_Free(*body);
    *body = strdup(escapedBody.get());
    bodyLen = strlen(*body);
  }

The 'strdup()' is really unfortunate, but I guess there is no opposite of Adopt(), so something like:
  *body = escapedBody.get();
  escapedBody.ForgetBuffer();
right?
Flags: needinfo?(erahm)
(In reply to Jorg K (GMT+2) from comment #14)
> Just a string question:
> 
> I converted this:
>   char *escapedBody = nsEscapeHTML(*body);
>   if (escapedBody)
>   {
>     PR_Free(*body);
>     *body = escapedBody;
>     bodyLen = strlen(*body);
>   }
> 
> to this:
>   nsCString escapedBody;
>   nsAppendEscapedHTML(nsDependentCString(*body), escapedBody);
>   if (!escapedBody.IsEmpty())
>   {
>     PR_Free(*body);
>     *body = strdup(escapedBody.get());
>     bodyLen = strlen(*body);
>   }
> 
> The 'strdup()' is really unfortunate, but I guess there is no opposite of
> Adopt(), so something like:
>   *body = escapedBody.get();
>   escapedBody.ForgetBuffer();
> right?

Correct, we don't have a `ForgetBuffer` (the buffer itself includes the ref counting bits so it's not as easy as just forgetting it). You can use `ToNewCString` [1] instead of the `strdup` and `nsCString::Length` instead of `strlen` at least:

>     *body = ToNewCString(escapedBody);
>     bodyLen = escapedBody.Length();

[1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/xpcom/string/nsReadableUtils.h#87
Flags: needinfo?(erahm)

Comment 16

2 years ago
Thanks Eric, I was reading the string guide https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings on my mobile device while afk and came to the same conclusions, but you answered before I could cancel the NI.

That page even says: "In other words, there is no way to "grab" the internal char * from a String, i.e. have the string "forget" about it and hand off ownership to other code. Sorry."
Assignee

Comment 17

2 years ago
Apologies for the breakage. For no good reason I utterly failed to check this change against comm-central. I will try to do better in the future.

Comment 18

2 years ago
No problem. There were about 20 call sites and the new function allowed us to toss a lot of clumsy code.
You need to log in before you can comment on or make changes to this bug.