Open Bug 1515831 Opened 5 years ago Updated 2 years ago

Does CurlUtils.escapeStringPosix handle strings containing code points greater than U+00FF correctly?

Categories

(DevTools :: Netmonitor, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: Waldo, Unassigned)

References

(Blocks 1 open bug)

Details

browser_net_curl-utils.js has a test for CurlUtils.escapeStringPosix that -- prior to bug 1492937 -- passed in the string

  "æ ø ü ß ö é"

expecting a result like

  "$'\\xc3\\xa6 \\xc3\\xb8 \\xc3\\xbc \\xc3\\x9f \\xc3\\xb6 \\xc3\\xa9'"

where each non-ASCII code point (all of them here lie in [U+0080, U+00FF]) was converted to a sequence of hex escapes for the code point's UTF-8 encoding.

However, this only *happened* to work because browser_net_curl-utils.js was loaded not as UTF-8, but as Latin-1.  So the nice string with various code points in [U+0080, U+00FF] was *really* a string whose elements were the UTF-8 code units of those code points.  And when bug 1492937 changed things so this script got loaded as UTF-8, the script *also* had to be changed to replace that nice string of actual non-ASCII code points with the awful and stupid

  "\xc3\xa6 \xc3\xb8 \xc3\xbc \xc3\x9f \xc3\xb6 \xc3\xa9"

so it would continue to contain as elements the same UTF-8 code unit values.
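
To make the Latin-1 accident concrete, here is a minimal sketch (not taken from the test or from CurlUtils; the helper name utf8BytesAsLatin1 is made up for illustration) that encodes the "nice" string as UTF-8 and then treats each byte as one string element, which is what loading the file as Latin-1 effectively did:

```javascript
// Hypothetical helper: encode `str` as UTF-8, then reinterpret every byte
// as a single code unit, mimicking a UTF-8 file being decoded as Latin-1.
function utf8BytesAsLatin1(str) {
  const bytes = new TextEncoder().encode(str);
  return String.fromCharCode(...bytes);
}

const asWritten = "æ ø ü ß ö é";
const asLoaded = utf8BytesAsLatin1(asWritten);

// The misdecoded string is exactly the escaped literal the test now uses.
console.log(asLoaded === "\xc3\xa6 \xc3\xb8 \xc3\xbc \xc3\x9f \xc3\xb6 \xc3\xa9"); // true
console.log(asWritten.length, asLoaded.length); // 11 17
```

So the test has only ever fed escapeStringPosix a string of byte-valued elements, never a string whose elements are the actual code points.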

It isn't clear that escapeStringPosix does what it's intended to do, and what the author of this test *probably* thought it was doing.  Sometimes it outputs \u-style escapes; other times it outputs \x-style escapes.  The \x-escaping is applied to non-ASCII *bytes* as bytes even when they are part of a code point's UTF-8 encoding, but then the bigger-than-byte code points are escaped differently.  And none of this handles surrogate pairs -- "🚴" spanning two UTF-16 code units is almost certainly not escaped in a sensible manner.
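
For reference, a small sketch of why surrogate pairs are a concern.  It only shows the three views of "🚴" that an escaper could be working from; utf8HexEscapes is a hypothetical helper, not the CurlUtils code, and whether byte-wise \x escapes are what curl and ANSI-C quoting actually want is exactly the open question here:

```javascript
// Hypothetical helper: render every UTF-8 byte of `str` as a \xNN escape.
function utf8HexEscapes(str) {
  return Array.from(new TextEncoder().encode(str), (b) =>
    "\\x" + b.toString(16).padStart(2, "0")
  ).join("");
}

const cyclist = "\u{1F6B4}"; // "🚴"

// View 1: UTF-16 code units, which is what charCodeAt-based code sees.
const codeUnits = Array.from({ length: cyclist.length }, (_, i) =>
  cyclist.charCodeAt(i).toString(16)
);
console.log(cyclist.length); // 2
console.log(codeUnits.join(" ")); // d83d deb4

// View 2: the single code point.
console.log(cyclist.codePointAt(0).toString(16)); // 1f6b4

// View 3: the four UTF-8 bytes.
console.log(utf8HexEscapes(cyclist)); // \xf0\x9f\x9a\xb4
```

Code that escapes one UTF-16 code unit at a time would see d83d and deb4 separately, and neither is a valid code point on its own.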

It's possible the current code is right, but I doubt it.  But someone who actually understands curl, the arguments it takes, and how ANSI-C quoting interacts with both of those should make the call.
Severity: normal → S3