mbslen implementation seems brittle; should be simplified

RESOLVED FIXED in Firefox 57

Status

()

Core
WebRTC: Networking
RESOLVED FIXED
5 years ago
8 months ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

unspecified
mozilla57
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

(Whiteboard: [WebRTC] [blocking-webrtc-])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Follow-up for bug 838319:

The STUN protocol exhibits a strange design that limits the length of UTF-8-encoded values in terms of the number of characters. (See e.g. https://tools.ietf.org/html/rfc5389#section-15.7 . Interestingly, the spec notes that the having 128 as the maximum number of characters means that the string can be up to 763 bytes long, which suggests that the spec is written with the assumption that UTF-8 sequences outside the 21-bit Unicode range might be used. It's also rather curious to make the number of characters a power of two as opposed to making the number of bytes a power of two. Well, the RFC is what it is.)

The code for checking for this is full of #ifdefs and uses a library function whose behavior is sensitive to the activity C library locale. (http://mxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/util/mbslen.c#53) This seems brittle and over complex.

I think the mbslen function should be rewritten so that it doesn't call to external libraries but counts the number of characters in a UTF-8 string itself. If it's safe to assume that UTF-8 string is a valid UTF-8, the task is very easy: loop over the string until NUL and count all non-continuation bytes--that is, count all bytes whose bit pattern is not 10xxxxxx.
(Assignee)

Comment 1

5 years ago
Created attachment 736258 [details] [diff] [review]
Totally untested drive-by patch suggestion

I didn't try even compiling this, let alone check whether the input can really be assumed to be valid, because WebRTC doesn't compile for me right now due to bug 860222.
Note (per IRC) that we should not assume the input to be valid, and that the STUN spec requires these be UTF-8 strings of a specified length or less (so "fits in the buffer" could violate the spec as the buffer is sized for the worst-case UTF-8 string (per comment 0).

Perhaps make the "validate UTF8 string/length" an (optional) external to this library and in our use delegate that to our existing UTF8 handling code in C++.
Whiteboard: [WebRTC] [blocking-webrtc-]

Updated

3 years ago
backlog: --- → parking-lot
(Assignee)

Comment 3

8 months ago
Still hoping to remove the setlocale() call here.

(In reply to Randell Jesup [:jesup] from comment #2)
> Note (per IRC) that we should not assume the input to be valid

I take it that the function should return R_BAD_DATA in that case.

> Perhaps make the "validate UTF8 string/length" an (optional) external to
> this library and in our use delegate that to our existing UTF8 handling code
> in C++.

How about the following?

If some C preprocessor define indicating that the code is being built as part of m-c is set, validate UTF-8 using encoding_rs. If validation fails, return R_BAD_DATA. If validation succeeds, count the leads bytes per the currently-attached patch.

What would the appropriate preprocessor define to check be?
Flags: needinfo?(rjesup)
Comment hidden (mozreview-request)
(Assignee)

Updated

8 months ago
Attachment #736258 - Attachment is obsolete: true
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 7

8 months ago
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> What would the appropriate preprocessor define to check be?

MOZILLA_VERSION probably.
Flags: needinfo?(rjesup)
(Assignee)

Updated

8 months ago
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

8 months ago
Attachment #8904429 - Flags: review?(rjesup)

Comment 10

8 months ago
mozreview-review
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181426

Forwarding review to bwc, who might want to check with ekr about how to embed mozilla-specific code in nICEr.  "normally" this would be done by some sort of API point to install a callback for an alternate to the inline code, but perhaps it's ok to have in there ifdefed.  (and it's simpler...)
Attachment #8904429 - Flags: review?(rjesup)

Updated

8 months ago
Attachment #8904429 - Flags: review?(docfaraday)

Comment 11

8 months ago
mozreview-review
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

So, given that we call this function in only one place, and just let failures slide when they happen (https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/stun_codec.c#228), I think it would make sense to write a count_utf8_chars_no_validation function and put it in stun_codec.c, use that instead of mbslen, and either leave mbslen.c alone or remove it from the build.
Attachment #8904429 - Flags: review?(docfaraday) → review-
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

8 months ago
Attachment #8904429 - Flags: review?(docfaraday)
Attachment #8904895 - Flags: review?(docfaraday)
(Assignee)

Comment 15

8 months ago
mozreview-review-reply
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

Created a function that counts under the assumption of UTF-8 validity. Removed mbslen.c/h, because dead-code files are confusing and the m-c import of nICEr already omits some upstream files. Created a corresponding PR for the upstream repo.

Comment 16

8 months ago
mozreview-review
Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181762
Attachment #8904429 - Flags: review+

Comment 17

8 months ago
mozreview-review
Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

::: media/mtransport/third_party/nICEr/IMPORT_FILES:71
(Diff revision 1)
>                  ./src/util/cb_args.c
>                  ./src/util/cb_args.h
>                  ./src/util/ice_util.c
>                  ./src/util/ice_util.h
> -                ./src/util/mbslen.c
> -                ./src/util/mbslen.h
> +                #./src/util/mbslen.c
> +                #./src/util/mbslen.h

Let's actually remove these.

::: media/mtransport/third_party/nICEr/nicer.gyp:122
(Diff revision 1)
> -                "./src/util/mbslen.c",
> -                "./src/util/mbslen.h",
> +                #"./src/util/mbslen.c",
> +                #"./src/util/mbslen.h",

Same.
Attachment #8904895 - Flags: review?(docfaraday) → review+
Comment hidden (mozreview-request)
(Assignee)

Comment 19

8 months ago
mozreview-review-reply
Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

> Let's actually remove these.

Removed

> Same.

Removed. Thanks.

Comment 20

8 months ago
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f6cfb32a16e
Implement UTF-8 code unit counting without setlocale(). r=bwc
https://hg.mozilla.org/integration/autoland/rev/6a66de2d1901
clean-up - Remove mbslen.c/h from the tree. r=bwc

Comment 21

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7f6cfb32a16e
https://hg.mozilla.org/mozilla-central/rev/6a66de2d1901
Status: ASSIGNED → RESOLVED
Last Resolved: 8 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.