Closed Bug 1944365 Opened 17 days ago Closed 4 days ago

Consider using a single `mSpec` buffer in `nsSimpleURI`

Categories

(Core :: Networking, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
137 Branch
Tracking Status
firefox137 --- fixed

People

(Reporter: nika, Assigned: nika)

Details

(Whiteboard: [necko-triaged])

Attachments

(4 files)

The current architecture of the nsSimpleURI type dates back to 1999 when the type was originally added (https://searchfox.org/mozilla-central/rev/1a019fce8f2a339f6884258d6d197bbe35103a61/netwerk/base/src/nsSimpleURI.h#81-82). The URI stores separate string members for the scheme, path, and, more recently, query and ref. This allows fetching the individual components relatively cheaply, but requires building a new string buffer every time GetSpec is called.

This is in some contrast to nsStandardURL, which has (also for a long time) stored a single canonical mSpec string buffer, and managed the different components with indexes into that string buffer. This allows calling GetSpec to be cheaper, as the string buffer can be shared with the caller to avoid copies.

For most simple URIs, this isn't a problem, but this type is also used for (most, see footnote) data: URIs, which can be very large and therefore expensive to copy. The nsDataChannel implementation has some special behaviour, optimized around the current behaviour, but other callers frequently do not (e.g. bug 1941816).

This bug is to consider aligning nsSimpleURI's implementation more with nsStandardURL to improve the performance of GetSpec() on large data URIs. This could lead to new copies being required for methods like GetFilePath() or (sometimes) GetPathQueryRef(), but I expect this is less common to do unintentionally.


footnote: As some data: URIs appear to use DefaultURI (https://searchfox.org/mozilla-central/rev/a965e3c683ecc035dee1de72bd33a8d91b1203ed/netwerk/protocol/data/nsDataHandler.cpp#67-75), this will not help them. While DefaultURI uses a single spec buffer, all getters require a string copy, as the underlying buffer is stored in rust code using non-nsCString buffers).

Severity: -- → N/A
Flags: needinfo?(valentin.gosu)

Currently writing a pascal-format string to a nsIBinaryOutputStream
requires it to be null-terminated to call the WriteStringZ method.
This adds a helper for C++ code which has the same effect, but accepts a
potentially non-null-terminated const nsACString&.

This overload removes the need for an extra copy when concatenating
multiple nsTSubstrings together to pass to a fallible Replace
function.

This slightly reduces the quality of the allocation failure error
reporting, as the infallible variant may report the wrong size if the
failed allocation is the tuple-flattening allocation in the case where
the tuple is dependent on the mutating string buffer. I expect this is
an uncommon situation, and felt that the extra code complexity from
duplicating the codepaths was not worth the OOM failure size accuracy.

This patch adjusts nsSimpleURI to be stored as a single mSpec member,
with cached indices of the various separators alongside. This improves
the performance of calling GetSpec() on these URIs, as the caller will
be able to share the underlying string buffer.

This moves nsSimpleURI's implementation to be a bit more in-line with
nsStandardURL, hopefully reducing performance surprises, especially
around large data URIs.

nsDataChannel is updated for the new perf characteristics in part 4.

Previously nsDataChannel would try to avoid unnecessary copies during
nsIURI getters by e.g. cloning the URI without the ref component, and
calling methods which were more likely to not require a string copy.

This pattern is no longer optimal after the changes in part 1, and will
instead cause extra string copies. This patch adjusts the code to
reduce back to the original number of copies by reading the full spec
string.

Unfortunately, the nsIURI interface does not provide a way to access a
dependent reference to a URI component, so some additional re-parsing of
the URI is required to avoid this copy. This could be optimized further
if such methods were introduced.

As before these changes, the final input stream cannot share the buffer
completely (even if no unescaping or base64-decoding is needed, due to
mime type metadata, etc.). I expect the code to perform the same number
(or fewer) string copies after this change than it did before.

Currently reviewing patches. Thanks Nika!

Flags: needinfo?(valentin.gosu)
Priority: -- → P2
Whiteboard: [necko-triaged]
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dadbc32b06db Part 1: Add a helper for arbitrary writing nsACString to nsIBinaryOutputStream, r=xpcom-reviewers,mccr8 https://hg.mozilla.org/integration/autoland/rev/344a4e35d3e4 Part 2: Add a fallible version of the substring tuple replace overload, r=xpcom-reviewers,mccr8 https://hg.mozilla.org/integration/autoland/rev/015322a112ce Part 3: Store nsSimpleURI spec in contiguous memory, r=necko-reviewers,valentin https://hg.mozilla.org/integration/autoland/rev/49f00c6b66c6 Part 4: Align nsDataChannel parsing with new nsSimpleURI implementation, r=necko-reviewers,valentin https://hg.mozilla.org/integration/autoland/rev/0dbc9a02e956 apply code formatting via Lando
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: