mbslen implementation seems brittle; should be simplified

RESOLVED FIXED in Firefox 57

Status

()

defect
RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

unspecified
mozilla57
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

(Whiteboard: [WebRTC] [blocking-webrtc-])

Attachments

(2 attachments, 1 obsolete attachment)

Follow-up for bug 838319:

The STUN protocol exhibits a strange design that limits the length of UTF-8-encoded values in terms of the number of characters. (See e.g. https://tools.ietf.org/html/rfc5389#section-15.7 . Interestingly, the spec notes that the having 128 as the maximum number of characters means that the string can be up to 763 bytes long, which suggests that the spec is written with the assumption that UTF-8 sequences outside the 21-bit Unicode range might be used. It's also rather curious to make the number of characters a power of two as opposed to making the number of bytes a power of two. Well, the RFC is what it is.)

The code for checking for this is full of #ifdefs and uses a library function whose behavior is sensitive to the activity C library locale. (http://mxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/util/mbslen.c#53) This seems brittle and over complex.

I think the mbslen function should be rewritten so that it doesn't call to external libraries but counts the number of characters in a UTF-8 string itself. If it's safe to assume that UTF-8 string is a valid UTF-8, the task is very easy: loop over the string until NUL and count all non-continuation bytes--that is, count all bytes whose bit pattern is not 10xxxxxx.
I didn't try even compiling this, let alone check whether the input can really be assumed to be valid, because WebRTC doesn't compile for me right now due to bug 860222.
Note (per IRC) that we should not assume the input to be valid, and that the STUN spec requires these be UTF-8 strings of a specified length or less (so "fits in the buffer" could violate the spec as the buffer is sized for the worst-case UTF-8 string (per comment 0).

Perhaps make the "validate UTF8 string/length" an (optional) external to this library and in our use delegate that to our existing UTF8 handling code in C++.
Whiteboard: [WebRTC] [blocking-webrtc-]
backlog: --- → parking-lot
Still hoping to remove the setlocale() call here.

(In reply to Randell Jesup [:jesup] from comment #2)
> Note (per IRC) that we should not assume the input to be valid

I take it that the function should return R_BAD_DATA in that case.

> Perhaps make the "validate UTF8 string/length" an (optional) external to
> this library and in our use delegate that to our existing UTF8 handling code
> in C++.

How about the following?

If some C preprocessor define indicating that the code is being built as part of m-c is set, validate UTF-8 using encoding_rs. If validation fails, return R_BAD_DATA. If validation succeeds, count the leads bytes per the currently-attached patch.

What would the appropriate preprocessor define to check be?
Flags: needinfo?(rjesup)
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> What would the appropriate preprocessor define to check be?

MOZILLA_VERSION probably.
Flags: needinfo?(rjesup)
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181426

Forwarding review to bwc, who might want to check with ekr about how to embed mozilla-specific code in nICEr.  "normally" this would be done by some sort of API point to install a callback for an alternate to the inline code, but perhaps it's ok to have in there ifdefed.  (and it's simpler...)
Attachment #8904429 - Flags: review?(rjesup)
Attachment #8904429 - Flags: review?(docfaraday)
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

So, given that we call this function in only one place, and just let failures slide when they happen (https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/stun_codec.c#228), I think it would make sense to write a count_utf8_chars_no_validation function and put it in stun_codec.c, use that instead of mbslen, and either leave mbslen.c alone or remove it from the build.
Attachment #8904429 - Flags: review?(docfaraday) → review-
Attachment #8904429 - Flags: review?(docfaraday)
Attachment #8904895 - Flags: review?(docfaraday)
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

Created a function that counts under the assumption of UTF-8 validity. Removed mbslen.c/h, because dead-code files are confusing and the m-c import of nICEr already omits some upstream files. Created a corresponding PR for the upstream repo.
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181762
Attachment #8904429 - Flags: review+
Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

::: media/mtransport/third_party/nICEr/IMPORT_FILES:71
(Diff revision 1)
>                  ./src/util/cb_args.c
>                  ./src/util/cb_args.h
>                  ./src/util/ice_util.c
>                  ./src/util/ice_util.h
> -                ./src/util/mbslen.c
> -                ./src/util/mbslen.h
> +                #./src/util/mbslen.c
> +                #./src/util/mbslen.h

Let's actually remove these.

::: media/mtransport/third_party/nICEr/nicer.gyp:122
(Diff revision 1)
> -                "./src/util/mbslen.c",
> -                "./src/util/mbslen.h",
> +                #"./src/util/mbslen.c",
> +                #"./src/util/mbslen.h",

Same.
Attachment #8904895 - Flags: review?(docfaraday) → review+
Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

> Let's actually remove these.

Removed

> Same.

Removed. Thanks.
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f6cfb32a16e
Implement UTF-8 code unit counting without setlocale(). r=bwc
https://hg.mozilla.org/integration/autoland/rev/6a66de2d1901
clean-up - Remove mbslen.c/h from the tree. r=bwc
https://hg.mozilla.org/mozilla-central/rev/7f6cfb32a16e
https://hg.mozilla.org/mozilla-central/rev/6a66de2d1901
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.