Closed Bug 860727 Opened 11 years ago Closed 7 years ago

mbslen implementation seems brittle; should be simplified

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox57

---

fixed

Blocking Flags:

backlog

parking-lot

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

(Whiteboard: [WebRTC] [blocking-webrtc-])

Attachments

(2 files, 1 obsolete file)

Totally untested drive-by patch suggestion 11 years ago Henri Sivonen (:hsivonen) 4.14 KB, patch		Details \| Diff \| Splinter Review
Bug 860727 - Implement UTF-8 code unit counting without setlocale(). 7 years ago Henri Sivonen (:hsivonen) 59 bytes, text/x-review-board-request	bwc : review+	Details
Bug 860727 clean-up - Remove mbslen.c/h from the tree. 7 years ago Henri Sivonen (:hsivonen) 59 bytes, text/x-review-board-request	bwc : review+	Details

Henri Sivonen (:hsivonen)

Assignee

Description

•

11 years ago

Follow-up for bug 838319:

The STUN protocol exhibits a strange design that limits the length of UTF-8-encoded values in terms of the number of characters. (See e.g. https://tools.ietf.org/html/rfc5389#section-15.7 . Interestingly, the spec notes that the having 128 as the maximum number of characters means that the string can be up to 763 bytes long, which suggests that the spec is written with the assumption that UTF-8 sequences outside the 21-bit Unicode range might be used. It's also rather curious to make the number of characters a power of two as opposed to making the number of bytes a power of two. Well, the RFC is what it is.)

The code for checking for this is full of #ifdefs and uses a library function whose behavior is sensitive to the activity C library locale. (http://mxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/util/mbslen.c#53) This seems brittle and over complex.

I think the mbslen function should be rewritten so that it doesn't call to external libraries but counts the number of characters in a UTF-8 string itself. If it's safe to assume that UTF-8 string is a valid UTF-8, the task is very easy: loop over the string until NUL and count all non-continuation bytes--that is, count all bytes whose bit pattern is not 10xxxxxx.

Henri Sivonen (:hsivonen)

Assignee

Comment 1

•

11 years ago

Attached patch Totally untested drive-by patch suggestion (obsolete) — Details — Splinter Review

I didn't try even compiling this, let alone check whether the input can really be assumed to be valid, because WebRTC doesn't compile for me right now due to bug 860222.

Randell Jesup [:jesup] (needinfo me)

Comment 2

•

11 years ago

Note (per IRC) that we should not assume the input to be valid, and that the STUN spec requires these be UTF-8 strings of a specified length or less (so "fits in the buffer" could violate the spec as the buffer is sized for the worst-case UTF-8 string (per comment 0).

Perhaps make the "validate UTF8 string/length" an (optional) external to this library and in our use delegate that to our existing UTF8 handling code in C++.

Whiteboard: [WebRTC] [blocking-webrtc-]

Randell Jesup [:jesup] (needinfo me)

Updated

•

9 years ago

backlog: --- → parking-lot

Henri Sivonen (:hsivonen)

Assignee

Comment 3

•

7 years ago

Still hoping to remove the setlocale() call here.

(In reply to Randell Jesup [:jesup] from comment #2)
> Note (per IRC) that we should not assume the input to be valid

I take it that the function should return R_BAD_DATA in that case.

> Perhaps make the "validate UTF8 string/length" an (optional) external to
> this library and in our use delegate that to our existing UTF8 handling code
> in C++.

How about the following?

If some C preprocessor define indicating that the code is being built as part of m-c is set, validate UTF-8 using encoding_rs. If validation fails, return R_BAD_DATA. If validation succeeds, count the leads bytes per the currently-attached patch.

What would the appropriate preprocessor define to check be?

Flags: needinfo?(rjesup)

Comment hidden (mozreview-request)

setlocale() is not thread-safe, so when mbslen.c is being compiled as part of
Gecko, implement mbslen() without calls to setlocale() and the
locale-dependent parts of the C standard library.

Review commit: https://reviewboard.mozilla.org/r/176266/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/176266/

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Attachment #736258 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176266/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176266/diff/2-3/

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #3)
> What would the appropriate preprocessor define to check be?

MOZILLA_VERSION probably.

Flags: needinfo?(rjesup)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Comment hidden (mozreview-request)

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176266/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176266/diff/4-5/

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Attachment #8904429 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

7 years ago

mozreview-review

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181426

Forwarding review to bwc, who might want to check with ekr about how to embed mozilla-specific code in nICEr.  "normally" this would be done by some sort of API point to install a callback for an alternate to the inline code, but perhaps it's ok to have in there ifdefed.  (and it's simpler...)

Attachment #8904429 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Updated

•

7 years ago

Attachment #8904429 - Flags: review?(docfaraday)

Byron Campen [:bwc]

Comment 11

•

7 years ago

mozreview-review

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

So, given that we call this function in only one place, and just let failures slide when they happen (https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/stun_codec.c#228), I think it would make sense to write a count_utf8_chars_no_validation function and put it in stun_codec.c, use that instead of mbslen, and either leave mbslen.c alone or remove it from the build.

Attachment #8904429 - Flags: review?(docfaraday) → review-

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/176676/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/176676/

Comment hidden (mozreview-request)

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176266/diff/5-6/

Henri Sivonen (:hsivonen)

Assignee

Comment 14

•

7 years ago

Upstream PR: https://github.com/resiprocate/nICEr/pull/1

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Attachment #8904429 - Flags: review?(docfaraday)

Attachment #8904895 - Flags: review?(docfaraday)

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

7 years ago

mozreview-review-reply

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181446

Created a function that counts under the assumption of UTF-8 validity. Removed mbslen.c/h, because dead-code files are confusing and the m-c import of nICEr already omits some upstream files. Created a corresponding PR for the upstream repo.

Byron Campen [:bwc]

Comment 16

•

7 years ago

mozreview-review

Comment on attachment 8904429 [details]
Bug 860727 - Implement UTF-8 code unit counting without setlocale().

https://reviewboard.mozilla.org/r/176266/#review181762

Attachment #8904429 - Flags: review+

Byron Campen [:bwc]

Comment 17

•

7 years ago

mozreview-review

Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

::: media/mtransport/third_party/nICEr/IMPORT_FILES:71
(Diff revision 1)
>                  ./src/util/cb_args.c
>                  ./src/util/cb_args.h
>                  ./src/util/ice_util.c
>                  ./src/util/ice_util.h
> -                ./src/util/mbslen.c
> -                ./src/util/mbslen.h
> +                #./src/util/mbslen.c
> +                #./src/util/mbslen.h

Let's actually remove these.

::: media/mtransport/third_party/nICEr/nicer.gyp:122
(Diff revision 1)
> -                "./src/util/mbslen.c",
> -                "./src/util/mbslen.h",
> +                #"./src/util/mbslen.c",
> +                #"./src/util/mbslen.h",

Same.

Attachment #8904895 - Flags: review?(docfaraday) → review+

Comment hidden (mozreview-request)

Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/176676/diff/1-2/

Henri Sivonen (:hsivonen)

Assignee

Comment 19

•

7 years ago

mozreview-review-reply

Comment on attachment 8904895 [details]
Bug 860727 clean-up - Remove mbslen.c/h from the tree.

https://reviewboard.mozilla.org/r/176676/#review181768

> Let's actually remove these.

Removed

> Same.

Removed. Thanks.

Pulsebot

Comment 20

•

7 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f6cfb32a16e
Implement UTF-8 code unit counting without setlocale(). r=bwc
https://hg.mozilla.org/integration/autoland/rev/6a66de2d1901
clean-up - Remove mbslen.c/h from the tree. r=bwc

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 21

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7f6cfb32a16e
https://hg.mozilla.org/mozilla-central/rev/6a66de2d1901

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

You need to log in before you can comment on or make changes to this bug.