Closed Bug 1488192 Opened 6 years ago Closed 6 years ago

Return the input when no characters were decoded/encoded in decodeURI/encodeURI

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox63 --- wontfix
firefox64 --- fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(3 files)

It looks like {de,en}codeURI[Component] are quite often called on websites even though no characters need to be decoded or encoded, respectively.

For example, I got the following results when applying the attached patch and then browsing some news sites and Alexa 50 sites:

(gdb) p Decode_IdenticalTransferCount
$1 = 3592393
(gdb) p Decode_NonIdenticalTransferCountLatin1
$2 = 177340
(gdb) p Decode_NonIdenticalTransferCountUTF16
$3 = 809
(gdb) p Encode_IdenticalTransferCount
$4 = 445947
(gdb) p Encode_NonIdenticalTransferCountLatin1
$5 = 38846
(gdb) p Encode_NonIdenticalTransferCountUTF16
$6 = 2335

with
- Decode_IdenticalTransferCount: decodeURI and decodeURIComponent called and no characters needed to be decoded.
- Decode_NonIdenticalTransferCountLatin1: decodeURI and decodeURIComponent called, some characters were decoded, input was Latin-1.
- Decode_NonIdenticalTransferCountUTF16: decodeURI and decodeURIComponent called, some characters were decoded, input was UTF-16.
- Encode_IdenticalTransferCount: encodeURI and encodeURIComponent called and no characters needed to be encoded.
- Encode_NonIdenticalTransferCountLatin1: encodeURI and encodeURIComponent called, some characters were encoded, input was Latin-1.
- Encode_NonIdenticalTransferCountUTF16: encodeURI and encodeURIComponent called, some characters were encoded, input was UTF-16.
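To put the counters in proportion, here is a small sketch that computes the share of calls whose output was identical to the input. The helper name IdenticalFraction is made up for illustration and is not part of the patch; the numbers are the gdb values quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): fraction of calls in which the
// result was identical to the input, given the three counters for one
// direction (decode or encode).
double IdenticalFraction(std::uint64_t identical, std::uint64_t nonIdenticalLatin1,
                         std::uint64_t nonIdenticalUTF16) {
    const std::uint64_t total = identical + nonIdenticalLatin1 + nonIdenticalUTF16;
    assert(total > 0);  // at least one call must have been recorded
    return static_cast<double>(identical) / static_cast<double>(total);
}

// Decode: IdenticalFraction(3592393, 177340, 809) is roughly 0.953
// Encode: IdenticalFraction(445947, 38846, 2335)  is roughly 0.915
```

So in this browsing session roughly 95% of the decode calls and 92% of the encode calls could have returned the input string unchanged.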
More detailed results for a couple of sites
Modifies Encode(...) and Decode(...) to append string ranges to the StringBuilder instead of single characters and leave the StringBuilder empty when no characters were decoded/encoded, in which case the callers can return the input string unchanged.
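
The idea can be sketched in standalone C++ as follows. This is only an illustration under simplifying assumptions, not the SpiderMonkey code: it uses std::string instead of the engine's string builder, works on bytes rather than Latin-1/UTF-16 code units, and the names IsUnescaped and EncodeBytes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical predicate: bytes that pass through unescaped. Loosely modelled
// on the unreserved set of encodeURIComponent; an assumption for this sketch.
static bool IsUnescaped(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
           std::string_view("-_.!~*'()").find(static_cast<char>(c)) != std::string_view::npos;
}

// Returns std::nullopt when no byte had to be escaped, so the caller can
// return the original input without allocating a new string.
std::optional<std::string> EncodeBytes(std::string_view input) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;        // stays empty until the first escaped byte
    std::size_t start = 0;  // start of the pending unescaped range

    for (std::size_t i = 0; i < input.size(); i++) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (IsUnescaped(c))
            continue;

        // Append the whole unescaped range [start, i) at once instead of
        // appending character by character.
        out.append(input.substr(start, i - start));

        // Appended with an explicit length, so no null terminator is needed.
        const char hexBuf[3] = {'%', kHex[c >> 4], kHex[c & 0xf]};
        out.append(hexBuf, sizeof(hexBuf));
        start = i + 1;
    }

    if (out.empty())
        return std::nullopt;  // nothing was escaped: caller reuses the input

    assert(start <= input.size());
    out.append(input.substr(start));  // trailing unescaped range
    return out;
}
```

A caller would use the returned string when one is present and fall back to the input otherwise, e.g. EncodeBytes("a b") yields "a%20b", while EncodeBytes("abc-123") yields no value and the input is returned as-is.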


Drive-by changes:
- Remove the unnecessary null-character terminator in |hexBuf| in the Encode() function.
- Modify [1] to call StringBuffer::append(Latin1Char) instead of StringBuffer::append(char16_t), because the former should be slightly faster (unless the compiler has already figured out that the input is definitely a Latin-1 character, because |B < 128| is true).
- Correct the OOM handling in DebugState::debugDisplayURL() to check for |cx->isThrowingOutOfMemory()|. Also add an assertion for "over-recursed" exceptions, which probably don't happen when calling |EncodeURI|; if they do happen (and the assertion fails), we should change the code to handle over-recursed errors the same way as OOM errors.

[1] https://searchfox.org/mozilla-central/rev/721842eed881c7fcdccb9ec0fe79e4e6d4e46604/js/src/builtin/String.cpp#3890,3895-3896
Attachment #9005999 - Flags: review?(jdemooij)
Comment on attachment 9005999 [details] [diff] [review]
bug-1488192.patch

Review of attachment 9005999 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, great find. That eliminates a lot of string allocations.

::: js/src/builtin/String.cpp
@@ +3755,3 @@
>  {
> +    if (!sb.empty()) {
> +        str = sb.finishString();

I was wondering about the empty-input-string case, but I see finishString returns cx->names().empty if length == 0, so that will be optimized correctly :)
Attachment #9005999 - Flags: review?(jdemooij) → review+
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3b29fcce16f
Return input if no characters were modified in decode/encodeURI. r=jandem
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c3b29fcce16f
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64