1837063 - Support Latin1 strings in DOMString

Reporter

Description

•

2 years ago

Currently, when Latin1 JS strings are passed to APIs that take DOMStrings, they are converted to UTF-16 strings in CopyLinearStringChars. It would be nice if we could avoid this.
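To make the cost concrete, here is a minimal sketch of what widening a Latin1 string involves. It is not the actual CopyLinearStringChars code: `InflateLatin1` is a hypothetical helper, and std::string / std::u16string stand in for the real JS and XPCOM string types.

```cpp
#include <string>

// Hypothetical helper, not Gecko code. Widens each Latin1 byte to one UTF-16
// code unit. Latin1 covers U+0000..U+00FF, so the numeric value is unchanged,
// but the result needs a new buffer twice the size of the input.
std::u16string InflateLatin1(const std::string& latin1) {
  std::u16string out;
  out.reserve(latin1.size());
  for (unsigned char c : latin1) {
    out.push_back(static_cast<char16_t>(c));
  }
  return out;
}
```

Every Latin1 string that crosses the bindings boundary pays for this allocation and copy, even when the callee never needs 16-bit code units.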

The particular motivation for this was round-tripping strings through <input>.value in Speedometer. Currently we always get a UTF-16 string back out, even when we only put Latin1 strings in. This in turn causes us to create UTF-16 JSON via json_stringify.
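As an illustration of why the round trip loses information, the check below shows what it would take to recognise that a UTF-16 string could have stayed Latin1. `FitsInLatin1` is a hypothetical helper written for this comment, not an existing Gecko or SpiderMonkey function.

```cpp
#include <string>

// Hypothetical helper: returns true when every UTF-16 code unit is <= 0xFF,
// i.e. the string could be stored with one byte per code unit.
bool FitsInLatin1(const std::u16string& s) {
  for (char16_t c : s) {
    if (c > 0xFF) {
      return false;
    }
  }
  return true;
}
```

Recovering the narrow form this way requires a full scan of the string, which is why preserving the original width across the API boundary is more attractive than re-deriving it afterwards.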

Chrome and Safari avoid this because their equivalent of DOMString supports Latin1: "A String can represent Unicode code points with either LChars or UChars, which use 8 bits and 16 bits per code unit respectively."

Nika Layzell [:nika] (ni? for response)

Comment 1

•

2 years ago

There was a discussion around this on Matrix yesterday. I'll summarize some of my early thoughts below about possible implementation strategies and the interactions with other parts of our DOM bindings and string infrastructure:

In Gecko, when WebIDL currently converts a DOMString to C++, it is passed as an nsAString. Changing this type to support 1-byte Latin1 strings is likely impractical because the type is used so widely in the codebase, and the logic using it often relies heavily on the internal representation being a char16_t[]. Any move to a new WTF::String-like type would likely require us to introduce it as a new option and then migrate existing code over to using this new type under the hood, supporting nsAString as a fallback option.

We do already have some custom types which we use within bindings for extra string optimizations: DOMString, which is used as a stack-bound outparameter in some places which benefit from it (example), and FakeString, which is unsafe and used exclusively in bindings to take advantage of the reduced flexibility in how bindings construct string arguments.

Neither of these types would be suitable for use outside of their current use cases. The FakeString type is effectively just nsString, but with a non-initializing constructor and a destructor that skips the edge cases around uniquely owned strings, for performance in generated code. The DOMString type is large, and is clearly only suitable as a stack-bound outparameter type due to its handling of things like borrowed string buffers and atoms.

The easiest change to make for this bug is probably to the DOMString type, which already has extra states and complexity for optimizing passing strings from C++ to JS. We could consider adding cases for Latin1-encoded nsCStrings or their nsStringBuffers as another fast path in the DOMString type. This wouldn't help with a WebIDL DOMString being passed into C++, however, so it might give us little benefit, especially without support for storing and passing around a value like this. We might still get a minor benefit from specific callers like nsAttrValue, which already use more complex storage.

The naive approach would be to add this new type (I'll call it HybridString to avoid confusion with the existing DOMString type, though we probably want a better name) and change the WebIDL bindings to generate it instead of nsAString. We'd then introduce an implicit HybridString -> nsAString& conversion to allow calling the original methods. This would be quite inefficient (and have thread-safety issues), though, as the string width conversion would add re-allocations which aren't necessary in the current code.

Instead, the bindings code will need to be aware of the target string format when doing the conversion, so that the copy is done only once, and we'll need to be careful in code which we migrate to HybridString not to perform unnecessary conversions to nsAString in hot codepaths. This could be done either with some C++ template magic (which may make DOM conversion error handling very difficult, depending on how it is implemented), or by having a new Gecko-only WebIDL type (similar to our UTF8String type, which is USVString in the spec) which uses the new (or old) C++ type under the hood.
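A rough sketch of the shape such a type could take is below. Everything in it is an assumption made for illustration: only the name HybridString comes from the comment above, and the members, the methods, and the use of std::string / std::u16string in place of nsCString / nsString are invented. The point it tries to show is that the stored width follows the source string, and that widening is an explicit, visible call rather than an implicit conversion.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical type; not an existing Gecko class.
class HybridString {
 public:
  // Keep the storage width of whatever the caller hands in.
  explicit HybridString(std::string latin1)
      : mIsLatin1(true), mNarrow(std::move(latin1)) {}
  explicit HybridString(std::u16string utf16)
      : mIsLatin1(false), mWide(std::move(utf16)) {}

  bool IsLatin1() const { return mIsLatin1; }

  std::size_t Length() const {
    return mIsLatin1 ? mNarrow.size() : mWide.size();
  }

  // Explicit conversion that returns a new UTF-16 string. Callers that can
  // consume Latin1 directly should check IsLatin1() and avoid calling this.
  std::u16string ToUTF16() const {
    if (!mIsLatin1) {
      return mWide;
    }
    std::u16string out;
    out.reserve(mNarrow.size());
    for (unsigned char c : mNarrow) {
      out.push_back(static_cast<char16_t>(c));
    }
    return out;
  }

 private:
  bool mIsLatin1;
  std::string mNarrow;   // used when mIsLatin1 is true
  std::u16string mWide;  // used when mIsLatin1 is false
};
```

Making the conversion explicit is one way to keep accidental widening out of hot codepaths; a real implementation would also need to share buffers with the JS engine instead of copying into owned storage as this sketch does.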

If we do end up adding a type like this, one of the places which might want to use it is around nsAtom, as many atoms are exclusively latin1 and used very frequently. This may end up being a very large change, however, due to how frequently atoms are used.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

2 years ago

Blocks: speedometer3

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Whiteboard: [sp3]

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-609

Mayank Bansal

Updated

•

1 year ago

Updated

•

1 month ago

Blocks: sp3-high

Emilio Cobos Álvarez [:emilio]

Updated

•

1 month ago

Depends on: 2025271, 2023628

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

1 month ago

Updated

•

1 month ago

Whiteboard: [sp3] → [perf-prio]

Categories

(Core :: DOM: Bindings (WebIDL), enhancement)

People

(Reporter: jrmuizel, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [perf-prio])

Security

(public)
