Wifi hotspot: Can't handle non-ASCII chars in SSID

RESOLVED FIXED

Status

defect
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: chucklee, Assigned: chucklee)

Tracking

unspecified
All
Gonk (Firefox OS)
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 3 obsolete attachments)

When using non-ASCII chars in SSID of Hotspot, SSID parameter in sent management frame is not converted to UTF-8(while Android and iPhone does).
It seems being truncated.
Posted patch WIP - Encode string to UTF-8 (obsolete) — Splinter Review
Change |JS_EncodeString()| from encode string into latin1 to UTF-8, this is required while user sets non-ASCII chars as hotspot SSID.

Changing the function shouldn't affect other function since UTF-8 is compatible with ASCII.

Please give me some feedback/suggestion if this is a proper way to fix the bug.
Attachment #708435 - Flags: feedback?(terrence)
Attachment #708435 - Flags: feedback?(mrbkap)
Attachment #708435 - Flags: feedback?(jwalden+bmo)
Comment on attachment 708435 [details] [diff] [review]
WIP - Encode string to UTF-8

Review of attachment 708435 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Chuck Lee [:chucklee] from comment #1)
> Change |JS_EncodeString()| from encode string into latin1 to UTF-8, this is
> required while user sets non-ASCII chars as hotspot SSID.
> 
> Changing the function shouldn't affect other function since UTF-8 is
> compatible with ASCII.

Real life isn't so simple, unfortunately.

> Please give me some feedback/suggestion if this is a proper way to fix the
> bug.

I think you're on the right track with everything here except for changing the operation of JS_EncodeString. It would actually help quite a bit if you could explain why the string is flowing through JS_EncodeString at all: everything the browser interacts with should be using the UCS-2 interfaces. Things that need a JS String as UTF-8 typically get the UCS-2 chars and use the browser APIs to convert UCS-2 to UTF-8.

Note: we /do/ want to implement these interfaces and I'd love to take this patch with the nits fixed, but we should really understand why this is a problem at all first.

::: js/src/jsapi.cpp
@@ +6183,4 @@
>      if (!linear)
>          return NULL;
>  
> +    return TwoByteCharsToUtf8CharsZ(cx, linear->range()).c_str();

It is not really acceptable to change the API like this. What you need to do is make a new API entry point to call: JS_EncodeStringToUTF8.

::: js/src/vm/CharacterEncoding.cpp
@@ +26,5 @@
>      return Latin1CharsZ(latin1, len);
>  }
>  
> +UTF8CharsZ
> +JS::TwoByteCharsToUtf8CharsZ(JSContext *cx, TwoByteChars tbchars)

Please include New in the name -- TwoByteCharsToNewUtf8CharsZ -- to indicate that this method returns malloced data.

@@ +63,5 @@
> +    }
> +    utf8[len] = '\0';
> +
> +    return UTF8CharsZ(utf8, len);
> +}

Instead of rolling a new one, please move the implementations of GetDeflatedUTF8StringLength and DeflateStringToUTF8Buffer in js/src/shell/js.cpp here and use them to implement this method.
Attachment #708435 - Flags: feedback?(terrence) → feedback+
(In reply to Terrence Cole [:terrence] from comment #3)
> Comment on attachment 708435 [details] [diff] [review]
> WIP - Encode string to UTF-8
> 
> Review of attachment 708435 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to Chuck Lee [:chucklee] from comment #1)
> > Change |JS_EncodeString()| from encode string into latin1 to UTF-8, this is
> > required while user sets non-ASCII chars as hotspot SSID.
> > 
> > Changing the function shouldn't affect other function since UTF-8 is
> > compatible with ASCII.
> 
> Real life isn't so simple, unfortunately.
> 
> > Please give me some feedback/suggestion if this is a proper way to fix the
> > bug.
> 
> I think you're on the right track with everything here except for changing
> the operation of JS_EncodeString. It would actually help quite a bit if you
> could explain why the string is flowing through JS_EncodeString at all:
> everything the browser interacts with should be using the UCS-2 interfaces.
> Things that need a JS String as UTF-8 typically get the UCS-2 chars and use
> the browser APIs to convert UCS-2 to UTF-8.
> 
> Note: we /do/ want to implement these interfaces and I'd love to take this
> patch with the nits fixed, but we should really understand why this is a
> problem at all first.
> 

Hi Terrence,

Thanks for the detailed comments.

This bug is about sending a UCS-2 string out of browser, so it needs to be transformed.
The string flow starts at [1], where gecko encode a command string by |JSAutoByteString::encode()| [2], this is where |JS_EncodeString()| [3] is called. Then the encoded command is sent to Netd.
If there are UCS-2 chars in the command string, they are stripped in |LossyTwoByteCharsToNewLatin1CharsZ()|. That's the root cause of this bug.

Originally, I thought I can't tell when to call |TwoByteCharsToUtf8CharsZ()| or |LossyTwoByteCharsToNewLatin1CharsZ()|, and there might be no compatibility issue. So I change |LossyTwoByteCharsToNewLatin1CharsZ()| to |TwoByteCharsToUtf8CharsZ()| for minimum change.

After your comment, my new idea is add JS_EncodeStringToUTF8() as you mentioned, and add |JSAutoByteString::encodeUtf8()| to provide such encoding function.Just like the |JSAutoByteString::encodeUtf8| -> |JS_EncodeString()| pair.

Is this a reasonable solution to you?

[1] https://mxr.mozilla.org/mozilla-central/source/dom/system/gonk/SystemWorkerManager.cpp#204
[2] https://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.h#5559
[3] https://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#6176
> @@ +63,5 @@
> > +    }
> > +    utf8[len] = '\0';
> > +
> > +    return UTF8CharsZ(utf8, len);
> > +}
> 
> Instead of rolling a new one, please move the implementations of
> GetDeflatedUTF8StringLength and DeflateStringToUTF8Buffer in
> js/src/shell/js.cpp here and use them to implement this method.

These two are static functions, do you mean copy-paste them here, or make them non-static?
(In reply to Chuck Lee [:chucklee] from comment #4)
> (In reply to Terrence Cole [:terrence] from comment #3)
> Hi Terrence,
> 
> Thanks for the detailed comments.
> 
> This bug is about sending a UCS-2 string out of browser, so it needs to be
> transformed.
> The string flow starts at [1], where gecko encode a command string by
> |JSAutoByteString::encode()| [2], this is where |JS_EncodeString()| [3] is
> called. Then the encoded command is sent to Netd.
> If there are UCS-2 chars in the command string, they are stripped in
> |LossyTwoByteCharsToNewLatin1CharsZ()|. That's the root cause of this bug.

Ah, of course. Thank you for the explanation! JSAutoByteString is our easiest to use character interface -- it also provides the least useful characters as output and no indication that it is going to do so. In the future we'd like to kill off JSAutoByteString and replace it with something less error-prone.

> Originally, I thought I can't tell when to call |TwoByteCharsToUtf8CharsZ()|
> or |LossyTwoByteCharsToNewLatin1CharsZ()|, and there might be no
> compatibility issue. So I change |LossyTwoByteCharsToNewLatin1CharsZ()| to
> |TwoByteCharsToUtf8CharsZ()| for minimum change.
> 
> After your comment, my new idea is add JS_EncodeStringToUTF8() as you
> mentioned, and add |JSAutoByteString::encodeUtf8()| to provide such encoding
> function.Just like the |JSAutoByteString::encodeUtf8| -> |JS_EncodeString()|
> pair.
> 
> Is this a reasonable solution to you?

Yes, that sounds like an excellent interim solution. In addition, please rename |JSAutoByteString::encode()| to |JSAutoByteString::encodeLatin1()| and make it return a Latin1CharsZ. Then make |encode()| call |encodeLatin1()|. Hopefully, this will make similar misuse less likely to happen in the future.

(In reply to Chuck Lee [:chucklee] from comment #5)
> > @@ +63,5 @@
> > > +    }
> > > +    utf8[len] = '\0';
> > > +
> > > +    return UTF8CharsZ(utf8, len);
> > > +}
> > 
> > Instead of rolling a new one, please move the implementations of
> > GetDeflatedUTF8StringLength and DeflateStringToUTF8Buffer in
> > js/src/shell/js.cpp here and use them to implement this method.
> 
> These two are static functions, do you mean copy-paste them here, or make
> them non-static?

To clarify, I'd like to move them to static functions inside of CharacterEncoding.cpp that only TwoByteCharsToNewUTF8 calls. You would then need to remove them and JSStringToUTF8 from shell/js.cpp and replace all of the calls to JSStringToUTF8 with the new JS_EncodeStringToUTF8. These functions will have basically the same interface, so this should be a trivial conversion.

Please let me know if you have any other questions!
Comment on attachment 708435 [details] [diff] [review]
WIP - Encode string to UTF-8

Review of attachment 708435 [details] [diff] [review]:
-----------------------------------------------------------------

Terrence seems to have this covered pretty well.  :-)  I'd only add that you should add documentation comments in the header next to any new API you add.  And please specifically call out how you handle invalid inputs, like mispaired surrogates in TwoByteChars inputs.  U+FFFD is fine in the absence of any direction (in the requirements of the caller) that such inputs are an error, but you should be clear that that's what you're doing.
Attachment #708435 - Flags: feedback?(jwalden+bmo)
Comment on attachment 708435 [details] [diff] [review]
WIP - Encode string to UTF-8

Clearing this feedback request since it looks like Terrence and Waldo got to it first.
Attachment #708435 - Flags: feedback?(mrbkap)
1. Move implementation of string-to-UTF8 encoding from JSStringToUTF8() to new created TwoByteCharsToNewUtf8CharsZ() for public use.
2. Provide JS_EncodeStringToUTF8() to encode string into UTF-8 byte array.
Attachment #708435 - Attachment is obsolete: true
Attachment #716432 - Flags: review?(terrence)
Address the "rename |JSAutoByteString::encode()| to |JSAutoByteString::encodeLatin1()|" in comment 6.
Attachment #716438 - Flags: review?(terrence)
Comment on attachment 716432 [details] [diff] [review]
0001. Support encode string to UTF-8 byte array.

Review of attachment 716432 [details] [diff] [review]:
-----------------------------------------------------------------

Two minor style nits, otherwise perfect.

::: js/public/CharacterEncoding.h
@@ +74,5 @@
>      {
>          JS_ASSERT(aBytes[aLength] == '\0');
>      }
> +
> +    UTF8CharsZ(unsigned char *bytes, size_t length)

Please use aBytes and aLength to match the file's style.

@@ +145,5 @@
>  extern Latin1CharsZ
>  LossyTwoByteCharsToNewLatin1CharsZ(JSContext *cx, TwoByteChars tbchars);
>  
> +extern UTF8CharsZ
> +TwoByteCharsToNewUtf8CharsZ(JSContext *cx, TwoByteChars tbchars);

Capitalize UTF8 in the method to match the type name.
Attachment #716432 - Flags: review?(terrence) → review+
Attachment #716434 - Flags: review?(terrence) → review+
Comment on attachment 716438 [details] [diff] [review]
0003. Rename JSAutoByteString.encode() to encodeLatin1().

Review of attachment 716438 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome work! Thank you for taking the time to do this.
Attachment #716438 - Flags: review?(terrence) → review+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)
> 
> Should this have tests?

Follow up Bug 835719 is fired to add test cases, thank you for reminding.
^ ...that's this bug.  :-)
It's Bug 844687, Sorry for wrong copy & paste, and thanks for pointing that out.
Depends on: 849456
Duplicate of this bug: 917218

Updated

6 years ago
Duplicate of this bug: 917218

Updated

5 years ago
See Also: → 1065208
You need to log in before you can comment on or make changes to this bug.