Consider using a single `mSpec` buffer in `nsSimpleURI`
Categories
(Core :: Networking, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox137 | --- | fixed |
People
(Reporter: nika, Assigned: nika)
Details
(Whiteboard: [necko-triaged])
Attachments
(4 files)
The current architecture of the nsSimpleURI
type dates back to 1999 when the type was originally added (https://searchfox.org/mozilla-central/rev/1a019fce8f2a339f6884258d6d197bbe35103a61/netwerk/base/src/nsSimpleURI.h#81-82). The URI stores separate string members for the scheme, path, and, more recently, query and ref. This allows fetching the individual components relatively cheaply, but requires building a new string buffer every time GetSpec
is called.
This is in some contrast to nsStandardURL
, which has (also for a long time) stored a single canonical mSpec
string buffer, and managed the different components with indexes into that string buffer. This allows calling GetSpec
to be cheaper, as the string buffer can be shared with the caller to avoid copies.
For most simple URIs, this isn't a problem, but this type is also used for (most, see footnote) data:
URIs, which can be very large and therefore expensive to copy. The nsDataChannel
implementation has some special behaviour, optimized around the current behaviour, but other callers frequently do not (e.g. bug 1941816).
This bug is to consider aligning nsSimpleURI
's implementation more with nsStandardURL
to improve the performance of GetSpec()
on large data URIs. This could lead to new copies being required for methods like GetFilePath()
or (sometimes) GetPathQueryRef()
, but I expect this is less common to do unintentionally.
footnote: As some data: URIs appear to use DefaultURI
(https://searchfox.org/mozilla-central/rev/a965e3c683ecc035dee1de72bd33a8d91b1203ed/netwerk/protocol/data/nsDataHandler.cpp#67-75), this will not help them. While DefaultURI
uses a single spec
buffer, all getters require a string copy, as the underlying buffer is stored in rust code using non-nsCString
buffers).
Assignee | ||
Comment 1•15 days ago
|
||
Currently writing a pascal-format string to a nsIBinaryOutputStream
requires it to be null-terminated to call the WriteStringZ
method.
This adds a helper for C++ code which has the same effect, but accepts a
potentially non-null-terminated const nsACString&
.
Assignee | ||
Comment 2•15 days ago
|
||
This overload removes the need for an extra copy when concatenating
multiple nsTSubstring
s together to pass to a fallible Replace
function.
This slightly reduces the quality of the allocation failure error
reporting, as the infallible variant may report the wrong size if the
failed allocation is the tuple-flattening allocation in the case where
the tuple is dependent on the mutating string buffer. I expect this is
an uncommon situation, and felt that the extra code complexity from
duplicating the codepaths was not worth the OOM failure size accuracy.
Assignee | ||
Comment 3•15 days ago
|
||
This patch adjusts nsSimpleURI to be stored as a single mSpec
member,
with cached indices of the various separators alongside. This improves
the performance of calling GetSpec()
on these URIs, as the caller will
be able to share the underlying string buffer.
This moves nsSimpleURI's implementation to be a bit more in-line with
nsStandardURL, hopefully reducing performance surprises, especially
around large data URIs.
nsDataChannel is updated for the new perf characteristics in part 4.
Assignee | ||
Comment 4•15 days ago
|
||
Previously nsDataChannel would try to avoid unnecessary copies during
nsIURI getters by e.g. cloning the URI without the ref
component, and
calling methods which were more likely to not require a string copy.
This pattern is no longer optimal after the changes in part 1, and will
instead cause extra string copies. This patch adjusts the code to
reduce back to the original number of copies by reading the full spec
string.
Unfortunately, the nsIURI
interface does not provide a way to access a
dependent reference to a URI component, so some additional re-parsing of
the URI is required to avoid this copy. This could be optimized further
if such methods were introduced.
As before these changes, the final input stream cannot share the buffer
completely (even if no unescaping or base64-decoding is needed, due to
mime type metadata, etc.). I expect the code to perform the same number
(or fewer) string copies after this change than it did before.
Updated•8 days ago
|
Comment 7•4 days ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dadbc32b06db
https://hg.mozilla.org/mozilla-central/rev/344a4e35d3e4
https://hg.mozilla.org/mozilla-central/rev/015322a112ce
https://hg.mozilla.org/mozilla-central/rev/49f00c6b66c6
https://hg.mozilla.org/mozilla-central/rev/0dbc9a02e956
Description
•