ProtocolParser does lots of unnecessary string copying

RESOLVED FIXED in mozilla36

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

unspecified
mozilla36
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

I did some cumulative heap profiling on a new profile, in which I started the
browser and waited for the safebrowsing update to kick in. Heap allocations are
dominated by inefficient string handling done by ProtocolParser.

Consider the following two records.

> Cumulative {
>   ~11,663 blocks in heap block record 1 of 12,785
>   ~158,174,072 bytes (~138,397,176 requested / ~19,776,896 slop)
>   Individual block sizes: 77,824; 73,728 x 2; 69,632 x 2; 65,536 x 2; 61,440 x 47; 57,344 x 72; 53,248 x 106; 49,152 x 74; 45,056 x 114; 40,960 x 68; 36,864 x 101; 32,768 x 79; 28,672 x 233; 24,576 x 378; 20,480 x 1,324; 16,384 x 1,905; 12,288 x 1,887; 8,192 x 1,975; 4,096 x 1,541; ~4,093 x 1,752
>   15.21% of the heap (15.21% cumulative)
>   Allocated at {
>     #01: nsStringBuffer::Alloc(unsigned long) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsSubstring.cpp:219)
>     #02: nsACString_internal::MutatePrep(unsigned int, char**, unsigned int*) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:133)
>     #03: nsACString_internal::ReplacePrepInternal(unsigned int, unsigned int, unsigned int, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:170)
>     #04: nsACString_internal::ReplacePrep(unsigned int, unsigned int, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.h:969)
>     #05: nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:316)
>     #06: nsACString_internal::Assign(char const*, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:294)
>     #07: nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:313)
>     #08: nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:408)
>     #09: nsACString_internal::Assign(nsACString_internal const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:364)
>     #10: nsCString::operator=(nsACString_internal const&) (/home/njn/moz/mi5/co64dmd/toolkit/crashreporter/../../dist/include/nsTString.h:99)
>     #11: mozilla::safebrowsing::ProtocolParser::NextLine(nsACString_internal&) (/home/njn/moz/mi5/co64dmd/toolkit/components/url-classifier/../../../../toolkit/components/url-classifier/ProtocolParser.cpp:600)
>     #12: mozilla::safebrowsing::ProtocolParser::ProcessControl(bool*) (/home/njn/moz/mi5/co64dmd/toolkit/components/url-classifier/../../../../toolkit/components/url-classifier/ProtocolParser.cpp:127)
>   }
> }

> Cumulative {
>   ~11,605 blocks in heap block record 2 of 12,785
>   ~157,588,455 bytes (~137,831,837 requested / ~19,756,618 slop)
>   Individual block sizes: 77,824; 73,728 x 2; 69,632 x 2; 65,536 x 2; 61,440 x 48; 57,344 x 71; 53,248 x 106; 49,152 x 73; 45,056 x 114; 40,960 x 68; 36,864 x 101; 32,768 x 80; 28,672 x 230; 24,576 x 374; 20,480 x 1,312; 16,384 x 1,909; 12,288 x 1,883; 8,192 x 1,975; 4,096 x 1,539; ~4,093 x 1,715
>   15.15% of the heap (30.36% cumulative)
>   Allocated at {
>     #01: nsStringBuffer::Alloc(unsigned long) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsSubstring.cpp:219)
>     #02: nsACString_internal::MutatePrep(unsigned int, char**, unsigned int*) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:133)
>     #03: nsACString_internal::ReplacePrepInternal(unsigned int, unsigned int, unsigned int, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:170)
>     #04: nsACString_internal::ReplacePrep(unsigned int, unsigned int, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.h:969)
>     #05: nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:316)
>     #06: nsACString_internal::Assign(char const*, unsigned int) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:294)
>     #07: nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:313)
>     #08: nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:408)
>     #09: nsACString_internal::Assign(nsACString_internal const&) (/home/njn/moz/mi5/co64dmd/xpcom/string/../../../xpcom/string/nsTSubstring.cpp:364)
>     #10: nsCString::operator=(nsACString_internal const&) (/home/njn/moz/mi5/co64dmd/toolkit/crashreporter/../../dist/include/nsTString.h:99)
>     #11: mozilla::safebrowsing::ProtocolParser::ProcessChunk(bool*) (/home/njn/moz/mi5/co64dmd/toolkit/components/url-classifier/../../../../toolkit/components/url-classifier/ProtocolParser.cpp:287)
>     #12: mozilla::safebrowsing::ProtocolParser::AppendStream(nsACString_internal const&) (/home/njn/moz/mi5/co64dmd/toolkit/components/url-classifier/../../../../toolkit/components/url-classifier/ProtocolParser.cpp:107)
>   }
> }

That's over 30% of all heap allocations done for start-up.

The two cases are similar:

> chunk.Assign(Substring(mPending, 0, mChunkState.length)); 
> mPending = Substring(mPending, mChunkState.length);       // line 287

> line.Assign(Substring(mPending, 0, newline));
> mPending = Substring(mPending, newline + 1);              // line 600

In each case a large string is split into two parts. The intention is probably
that this is a cheap operation, but it's not; in both cases the second line
involves a copy of the chars. (The first line probably involves a copy, too,
but that's not showing up high in the profile because the size of the data
is much smaller.)

In principle, avoiding these copies is easy. But I'm not familiar enough with
Gecko's dependent strings implementation to yet know if there's a simple fix
for this, or if it will require more substantial changes.
gcp, asking you to review as module owner; Neil, asking you to review as an
expert nitpicker on Mozilla string code :) I've never used Rebind() before so
I'd appreciate a careful look.

This avoids the copying, and reduces the cumulative allocations done during
startup and safebrowsing initialization from ~1 GiB to ~700 MiB.

At least, that's the results I got during the few rounds of testing in which I
could reliably get a big safebrowsing update just by starting the browser with
a new profile. At some point that stopped happening, not sure why, and now I'm
getting much smaller updates that do less parsing.
Attachment #8524213 - Flags: review?(neil)
Attachment #8524213 - Flags: review?(gpascutto)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Here's an example sequence that shows how many chars from the tail we copy each
time, prior to my patch being applied:

> 3998, 3959, 3946, 3920, 3907, 3868, 3855, 3829, 3816, 3777, 3764, 3738, 3725,
> 3686, 3673, 3647, 3634, 3595, 3582, 3556, 3543, 3504, 3491, 3465, 3452, 3413,
> 3400, 3374, 3361, 3322, 3309, 3283, 3270, 3231, 3218, 3192, 3179, 3140, 3127,
> 3101, 3088, 3062, 3049, 3010, 2997, 2971, 2958, 2919, 2906, 2867, 2854, 2828,
> 2815, 2776, 2763, 2737, 2724, 2685, 2672, 2646, 2633, 2594, 2581, 2555, 2542,
> 2503, 2490, 2464, 2451, 2386, 2373, 2347, 2334, 2295, 2282, 2256, 2243, 2204,
> 2191, 2165, 2152, 2113, 2100, 2074, 2061, 2022, 2009, 1983, 1970, 1931, 1918,
> 1892, 1879, 1840, 1827, 1801, 1788, 1749, 1736, 1710, 1697, 1658, 1645, 1619,
> 1606, 1567, 1554, 1528, 1515, 1476, 1463, 1437, 1424, 1385, 1372, 1333, 1320,
> 1281, 1268, 1229, 1216, 1190, 1177, 1138, 1125, 1099, 1086, 1047, 1034, 1008,
> 995, 956, 943, 904, 891, 865, 852, 813, 800, 761, 748, 709, 696, 670, 657,
> 618, 605, 579, 566, 527, 514, 488, 475, 436, 423, 397, 384, 345, 332, 306,
> 293, 254, 241, 215, 202, 163, 150, 111, 98, 59, 46, 20, 7

That's 347,232 bytes of unnecessary copying and allocation for a 4 KiB chunk.

(This is from a run where the full safebrowsing data was downloaded.)
Comment on attachment 8524213 [details] [diff] [review]
Avoid lots of unnecessary string copying in ProtocolParser

Review of attachment 8524213 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ProtocolParser.h
@@ +100,5 @@
>  
>    nsCOMPtr<nsICryptoHash> mCryptoHash;
>  
>    nsresult mUpdateStatus;
> +  nsDependentCString mPending;

From digging our string classes, this puts the nsDependentString interface on a CString backing, which effectively only means Rebind is added via destruction+reconstruction.

I'm putting that here because if you look at https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Glue_classes/nsDependentCString
it deceptively says "This class does not own its data", yet we do expect mPending to own its data.
Attachment #8524213 - Flags: review?(gpascutto) → review+
Comment on attachment 8524213 [details] [diff] [review]
Avoid lots of unnecessary string copying in ProtocolParser

And the above reason is exactly why I believe this patch is incorrect, because as soon as we rebind, we drop our ownership of the data that was appended in AppendStream.

A faster alternative to mPending.Assign(Substring(mPending, N)); is mPending.Cut(0, N); since it doesn't allocate. (Alternatively it might be possible to improve the definition of IsDependentOn to optimise this case.)

From a brief skim of the code I think an even more efficient approach night be to keep a current index, something like this:

bool
ProtocolParser::NextLine(nsACString& aLine)
{
  int32_t newline = mPending.FindChar('\n', mCurPos);
  if (newline == kNotFound) {
    return false;
  }
  line.Assign(Substring(mPending, mCurPos, newline - mCurPos));
  mCurPos = newline + 1;
  return true;
}

nsresult
ProtocolParser::ProcessChunk(bool* aDone)
{
  ...
  if (mPending.Length() - mCurPos < mChunkState.length) {
    *aDone = true;
    return NS_OK;
  }

  nsDependentSubstring chunk(mPending, mCurPos, mChunkState.length);
  mCurPos += mChunkState.length;
  ...
}

Then at the end of AppendStream you could do something like
  mPending.Cut(0, mCurPos);
  mCurPos = 0;
  return NS_OK;
Attachment #8524213 - Flags: review?(neil) → review-
I went with Cut() because it's a smaller change but still avoids the badness. I
confirmed with my profiler that the allocations aren't happening with this new
patch.
Attachment #8524912 - Flags: review?(neil)
Attachment #8524213 - Attachment is obsolete: true
Attachment #8524912 - Flags: review?(neil) → review+
(In reply to Nicholas Nethercote from comment #5)
> I went with Cut() because it's a smaller change but still avoids the
> badness. I confirmed with my profiler that the allocations aren't
> happening with this new patch.

(Consider resummarising the bug to just be about the allocations.)
https://hg.mozilla.org/mozilla-central/rev/845a77a12099
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.