Closed Bug 1026680 Opened 10 years ago Closed 10 years ago

Latin1 strings: convert escape/unescape, URI encoding/decoding functions

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(5 files)

Part 1 - decodeURI* 10 years ago Jan de Mooij [:jandem] 6.96 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Part 2 - encode* 10 years ago Jan de Mooij [:jandem] 5.50 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Part 3 - unescape 10 years ago Jan de Mooij [:jandem] 5.68 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Part 4 - Cleanup escape 10 years ago Jan de Mooij [:jandem] 3.98 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Part 5 - Convert escape 10 years ago Jan de Mooij [:jandem] 5.32 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review

Jan de Mooij [:jandem]

Assignee

Description

•

10 years ago

      No description provided.

Jan de Mooij [:jandem]

Assignee

Comment 1

•

10 years ago

Attached patch Part 1 - decodeURI* — Details — Splinter Review

I moved the algorithm into a separate function, changed the return type to an enum and eliminated the gotos. After that it was pretty straight-forward.

Attachment #8441584 - Flags: review?(n.nethercote)

Jan de Mooij [:jandem]

Assignee

Comment 2

•

10 years ago

Attached patch Part 2 - encode* — Details — Splinter Review

Attachment #8441587 - Flags: review?(n.nethercote)

Jan de Mooij [:jandem]

Assignee

Comment 3

•

10 years ago

Attached patch Part 3 - unescape — Details — Splinter Review

A bit simpler than the others.

I changed infallibleAppend(c) to append(c) to avoid an assert in infallibleAppend(c). infallibleAppend(jschar) asserts when the buffer is latin1, because inflation is fallible.

Attachment #8441591 - Flags: review?(n.nethercote)

Jan de Mooij [:jandem]

Assignee

Comment 4

•

10 years ago

Attached patch Part 4 - Cleanup escape — Details — Splinter Review

Attachment #8441592 - Flags: review?(n.nethercote)

Jan de Mooij [:jandem]

Assignee

Comment 5

•

10 years ago

Attached patch Part 5 - Convert escape — Details — Splinter Review

Currently it always returns a TwoByte string, but the characters are really Latin1. Unfortunately there's no easy way to check EnableLatin1Strings here and do the right thing in that case.

I'll keep this bug open and get back to this once everything is enabled by default.

Attachment #8441602 - Flags: review?(n.nethercote)

Jeff Walden [:Waldo]

Comment 6

•

10 years ago

Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +252,5 @@
>      int k = 0;
>      bool building = false;
>  
> +    /* Step 5. */
> +    while (k != length) {

k < length, just in case something screwy happens and we overflow, so possibly we end up in code paths we can easily comprehend?  Just a general principle to always use < rather than != for this sort of thing.  (And I'm told clang, at least, can optimize better in that case, because it can sometimes know wraparound isn't possible.)

@@ -310,5 @@
>              if (!sb.reserve(length))                \
>                  return false;                       \
>              sb.infallibleAppend(chars, k);          \
> -        }                                           \
> -    JS_END_MACRO

While I don't much like the JS_{BEGIN,END}_MACRO nonsense, I don't think you should get rid of the equivalent thing.  Without this, this code doesn't do what it looks like:

    if (...)
        ENSURE_BUILDING;
    else
       ...;

Having explicit do { and while (false) seems much better to me.

@@ +279,3 @@
>  
>          /* Step 10-13. */
>          if (Unhex4(&chars[k + 2], &c)) {

Range::subrange or Range::slice seems valuable for this, to preserve in-bounds assertions.  I can r+ a patch for that.  :-)

@@ -332,5 @@
>          }
>  
>        step_18:
> -        if (building)
> -            sb.infallibleAppend(c);

This was wrong even before this patch, right?  How did we not fail any assertions?

@@ +313,5 @@
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +
> +    /* Step 1. */
> +    JSLinearString *str = ArgToRootedString(cx, args, 0);

Surprising this doesn't require Rooted.  Or does it?

@@ +327,5 @@
> +     * NB: use signed integers for length/index to allow simple length
> +     * comparisons without unsigned-underflow hazards.
> +     */
> +    JS_STATIC_ASSERT(JSString::MAX_LENGTH <= INT_MAX);
> +    int length = str->length();

SafeCast<int>(str->length()), please.

@@ +331,5 @@
> +    int length = str->length();
> +
> +    if (str->hasLatin1Chars()) {
> +        AutoCheckCannotGC nogc;
> +        if (!Unescape(sb, str->latin1Chars(nogc), length))

Range(str->latin1Chars(nogc), length)?

Nicholas Nethercote [inactive]

Updated

•

10 years ago

Attachment #8441584 - Flags: review?(n.nethercote) → review+

Nicholas Nethercote [inactive]

Updated

•

10 years ago

Attachment #8441587 - Flags: review?(n.nethercote) → review+

Nicholas Nethercote [inactive]

Comment 7

•

10 years ago

Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

I'll cancel this r?, and let Waldo do the review of the updated patch.

Attachment #8441591 - Flags: review?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 8

•

10 years ago

Comment on attachment 8441592 [details] [diff] [review]
Part 4 - Cleanup escape

Review of attachment 8441592 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +151,3 @@
>           */
> +        static_assert(JSString::MAX_LENGTH < UINT32_MAX / 6,
> +                      "newlength must not overflow");

What happens if you have a string of length MAX_LENGTH, and then some of its chars need to be escaped? Will the js_NewString() below fail?

Attachment #8441592 - Flags: review?(n.nethercote) → review+

Nicholas Nethercote [inactive]

Updated

•

10 years ago

Attachment #8441602 - Flags: review?(n.nethercote) → review+

Nicholas Nethercote [inactive]

Comment 9

•

10 years ago

Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, I'm happy to say r=me with Waldo's nits addressed.

Attachment #8441591 - Flags: review+

Jan de Mooij [:jandem]

Assignee

Comment 10

•

10 years ago

(In reply to Nicholas Nethercote [:njn] from comment #8)
> What happens if you have a string of length MAX_LENGTH, and then some of its
> chars need to be escaped? Will the js_NewString() below fail?

Yes, js_NewString -> JSFlatString::new_ -> validateLength, and there we do:

    if (MOZ_UNLIKELY(length > JSString::MAX_LENGTH)) {
        js_ReportAllocationOverflow(maybecx);
        return false;
    }

Jan de Mooij [:jandem]

Assignee

Comment 11

•

10 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #6)
> k < length, just in case something screwy happens and we overflow, so
> possibly we end up in code paths we can easily comprehend?  Just a general
> principle to always use < rather than != for this sort of thing.  (And I'm
> told clang, at least, can optimize better in that case, because it can
> sometimes know wraparound isn't possible.)

OK, k < length is what I'd write in my own code or tell people to write, but with the signed-length and index thing I was being overly cautious. But yeah, k better be <= length or we'll have a buffer overflow in this code. Changed.

> @@ -310,5 @@
> Having explicit do { and while (false) seems much better to me.

Good point, done.

> Range::subrange or Range::slice seems valuable for this, to preserve
> in-bounds assertions.  I can r+ a patch for that.  :-)

These methods don't exist so if you don't mind I'll use Range/RangedPtr in this code and file a followup bug to add slice/subrange and use them. There's a *ton* of code I (still) have to change, and I'm really trying to leave the code in a better state than it was before (just look at the other patches in this bug), but I also don't want to get too distracted fixing pre-existing issues :)

> This was wrong even before this patch, right?  How did we not fail any
> assertions?

It wasn't wrong before this patch, see also comment 3. Basically it's okay to infallibleAppend(jschar) if you call ensureTwoByteChars first, and we do that (only) for TwoByte strings. Now Latin1 strings can also be used here and we don't call ensureTwoByteChars in this case, so infallibleAppend(jschar) will assert.

There are different ways to fix this, but as this is not perf-sensitive it seemed best to just use append(jschar), which will always do the right thing: try to append to a Latin1 buffer and if out of range it will do (fallible) Latin1 -> TwoByte inflation.

(The code also does sb.reserve(input-length) and output-length is always <= input-length.)

> Surprising this doesn't require Rooted.  Or does it?

I think it's fine, the string still isn't used after potential GC-calls. But I'll change it to a RootedLinearString; this is not the code where an unnecessary root matters :)

> SafeCast<int>(str->length()), please.

Done.

> Range(str->latin1Chars(nogc), length)?

OK, will change it to use str->latin1Range(nogc). Maybe we can have some good first bugs to convert existing code to Range/RangedPtr?

Jan de Mooij [:jandem]

Assignee

Comment 12

•

10 years ago

Pushed with comments addressed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/68cec11659b9
https://hg.mozilla.org/integration/mozilla-inbound/rev/aea4ea200313
https://hg.mozilla.org/integration/mozilla-inbound/rev/aff142aa038d
https://hg.mozilla.org/integration/mozilla-inbound/rev/f08470bb573a
https://hg.mozilla.org/integration/mozilla-inbound/rev/45dd001f865a

Leaving this bug open to make str_escape return Latin1 strings once Latin1 strings are enabled (as mentioned in comment 5).

Keywords: leave-open

Carsten Book [:Tomcat]

Comment 13

•

10 years ago

sorry had to back this out for test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=41959773&tree=Mozilla-Inbound on windows debug builds

Jan de Mooij [:jandem]

Assignee

Comment 14

•

10 years ago

Relanded, Windows debug bustage was unrelated (bug 1027149):

https://hg.mozilla.org/integration/mozilla-inbound/rev/3343738708a5
https://hg.mozilla.org/integration/mozilla-inbound/rev/e794548b0169
https://hg.mozilla.org/integration/mozilla-inbound/rev/f65a13922049
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd9bc8054cdd
https://hg.mozilla.org/integration/mozilla-inbound/rev/0784505c202a

Ed Morley [:emorley]

Comment 15

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/3343738708a5
https://hg.mozilla.org/mozilla-central/rev/e794548b0169
https://hg.mozilla.org/mozilla-central/rev/f65a13922049
https://hg.mozilla.org/mozilla-central/rev/fd9bc8054cdd
https://hg.mozilla.org/mozilla-central/rev/0784505c202a

Jan de Mooij [:jandem]

Assignee

Comment 16

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #5)
> Created attachment 8441602 [details] [diff] [review]
> Part 5 - Convert escape
> 
> Currently it always returns a TwoByte string, but the characters are really
> Latin1. Unfortunately there's no easy way to check EnableLatin1Strings here
> and do the right thing in that case.
> 
> I'll keep this bug open and get back to this once everything is enabled by
> default.

See bug 1041469 part 5.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Keywords: leave-open

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

You need to log in before you can comment on or make changes to this bug.