Closed Bug 1026680 Opened 10 years ago Closed 10 years ago

Latin1 strings: convert escape/unescape, URI encoding/decoding functions

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(5 files)

      No description provided.
I moved the algorithm into a separate function, changed the return type to an enum and eliminated the gotos. After that it was pretty straight-forward.
Attachment #8441584 - Flags: review?(n.nethercote)
Attached patch Part 2 - encode*Splinter Review
Attachment #8441587 - Flags: review?(n.nethercote)
A bit simpler than the others.

I changed infallibleAppend(c) to append(c) to avoid an assert in infallibleAppend(c). infallibleAppend(jschar) asserts when the buffer is latin1, because inflation is fallible.
Attachment #8441591 - Flags: review?(n.nethercote)
Attachment #8441592 - Flags: review?(n.nethercote)
Currently it always returns a TwoByte string, but the characters are really Latin1. Unfortunately there's no easy way to check EnableLatin1Strings here and do the right thing in that case.

I'll keep this bug open and get back to this once everything is enabled by default.
Attachment #8441602 - Flags: review?(n.nethercote)
Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +252,5 @@
>      int k = 0;
>      bool building = false;
>  
> +    /* Step 5. */
> +    while (k != length) {

k < length, just in case something screwy happens and we overflow, so possibly we end up in code paths we can easily comprehend?  Just a general principle to always use < rather than != for this sort of thing.  (And I'm told clang, at least, can optimize better in that case, because it can sometimes know wraparound isn't possible.)

@@ -310,5 @@
>              if (!sb.reserve(length))                \
>                  return false;                       \
>              sb.infallibleAppend(chars, k);          \
> -        }                                           \
> -    JS_END_MACRO

While I don't much like the JS_{BEGIN,END}_MACRO nonsense, I don't think you should get rid of the equivalent thing.  Without this, this code doesn't do what it looks like:

    if (...)
        ENSURE_BUILDING;
    else
       ...;

Having explicit do { and while (false) seems much better to me.

@@ +279,3 @@
>  
>          /* Step 10-13. */
>          if (Unhex4(&chars[k + 2], &c)) {

Range::subrange or Range::slice seems valuable for this, to preserve in-bounds assertions.  I can r+ a patch for that.  :-)

@@ -332,5 @@
>          }
>  
>        step_18:
> -        if (building)
> -            sb.infallibleAppend(c);

This was wrong even before this patch, right?  How did we not fail any assertions?

@@ +313,5 @@
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +
> +    /* Step 1. */
> +    JSLinearString *str = ArgToRootedString(cx, args, 0);

Surprising this doesn't require Rooted.  Or does it?

@@ +327,5 @@
> +     * NB: use signed integers for length/index to allow simple length
> +     * comparisons without unsigned-underflow hazards.
> +     */
> +    JS_STATIC_ASSERT(JSString::MAX_LENGTH <= INT_MAX);
> +    int length = str->length();

SafeCast<int>(str->length()), please.

@@ +331,5 @@
> +    int length = str->length();
> +
> +    if (str->hasLatin1Chars()) {
> +        AutoCheckCannotGC nogc;
> +        if (!Unescape(sb, str->latin1Chars(nogc), length))

Range(str->latin1Chars(nogc), length)?
Attachment #8441584 - Flags: review?(n.nethercote) → review+
Attachment #8441587 - Flags: review?(n.nethercote) → review+
Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

I'll cancel this r?, and let Waldo do the review of the updated patch.
Attachment #8441591 - Flags: review?(n.nethercote)
Comment on attachment 8441592 [details] [diff] [review]
Part 4 - Cleanup escape

Review of attachment 8441592 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +151,3 @@
>           */
> +        static_assert(JSString::MAX_LENGTH < UINT32_MAX / 6,
> +                      "newlength must not overflow");

What happens if you have a string of length MAX_LENGTH, and then some of its chars need to be escaped? Will the js_NewString() below fail?
Attachment #8441592 - Flags: review?(n.nethercote) → review+
Attachment #8441602 - Flags: review?(n.nethercote) → review+
Comment on attachment 8441591 [details] [diff] [review]
Part 3 - unescape

Review of attachment 8441591 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, I'm happy to say r=me with Waldo's nits addressed.
Attachment #8441591 - Flags: review+
(In reply to Nicholas Nethercote [:njn] from comment #8)
> What happens if you have a string of length MAX_LENGTH, and then some of its
> chars need to be escaped? Will the js_NewString() below fail?

Yes, js_NewString -> JSFlatString::new_ -> validateLength, and there we do:

    if (MOZ_UNLIKELY(length > JSString::MAX_LENGTH)) {
        js_ReportAllocationOverflow(maybecx);
        return false;
    }
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #6)
> k < length, just in case something screwy happens and we overflow, so
> possibly we end up in code paths we can easily comprehend?  Just a general
> principle to always use < rather than != for this sort of thing.  (And I'm
> told clang, at least, can optimize better in that case, because it can
> sometimes know wraparound isn't possible.)

OK, k < length is what I'd write in my own code or tell people to write, but with the signed-length and index thing I was being overly cautious. But yeah, k better be <= length or we'll have a buffer overflow in this code. Changed.

> @@ -310,5 @@
> Having explicit do { and while (false) seems much better to me.

Good point, done.

> Range::subrange or Range::slice seems valuable for this, to preserve
> in-bounds assertions.  I can r+ a patch for that.  :-)

These methods don't exist so if you don't mind I'll use Range/RangedPtr in this code and file a followup bug to add slice/subrange and use them. There's a *ton* of code I (still) have to change, and I'm really trying to leave the code in a better state than it was before (just look at the other patches in this bug), but I also don't want to get too distracted fixing pre-existing issues :)

> This was wrong even before this patch, right?  How did we not fail any
> assertions?

It wasn't wrong before this patch, see also comment 3. Basically it's okay to infallibleAppend(jschar) if you call ensureTwoByteChars first, and we do that (only) for TwoByte strings. Now Latin1 strings can also be used here and we don't call ensureTwoByteChars in this case, so infallibleAppend(jschar) will assert.

There are different ways to fix this, but as this is not perf-sensitive it seemed best to just use append(jschar), which will always do the right thing: try to append to a Latin1 buffer and if out of range it will do (fallible) Latin1 -> TwoByte inflation.

(The code also does sb.reserve(input-length) and output-length is always <= input-length.)

> Surprising this doesn't require Rooted.  Or does it?

I think it's fine, the string still isn't used after potential GC-calls. But I'll change it to a RootedLinearString; this is not the code where an unnecessary root matters :)

> SafeCast<int>(str->length()), please.

Done.

> Range(str->latin1Chars(nogc), length)?

OK, will change it to use str->latin1Range(nogc). Maybe we can have some good first bugs to convert existing code to Range/RangedPtr?
sorry had to back this out for test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=41959773&tree=Mozilla-Inbound on windows debug builds
(In reply to Jan de Mooij [:jandem] from comment #5)
> Created attachment 8441602 [details] [diff] [review]
> Part 5 - Convert escape
> 
> Currently it always returns a TwoByte string, but the characters are really
> Latin1. Unfortunately there's no easy way to check EnableLatin1Strings here
> and do the right thing in that case.
> 
> I'll keep this bug open and get back to this once everything is enabled by
> default.

See bug 1041469 part 5.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: