Closed Bug 514173 Opened 10 years ago Closed 6 years ago

Make literal string buffers shareable

Categories

(Core :: String, defect, P3)

x86
All
defect

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: benjamin, Assigned: neil)

Details

(Keywords: helpwanted, Whiteboard: [mentor=bsmedberg][lang=c++][pretty big task])

Attachments

(3 files, 7 obsolete files)

Currently when we assign a literal string to another string, we allocate and copy a new buffer. Neil did some stats which show that if we share the literal string buffer we could avoid a fair number of allocations/copies.

This is fairly simple to implement with a string flag: it does encode an assumption, however, that we won't unload XPCOM component DLLs, because the literal string buffers can outlive a DLL. I think this is ok: we don't unload now, and I don't think we have serious plans to do so ever (and we're moving extensions away from binary components).
Attached patch WIP patch (obsolete) — Splinter Review
This doesn't include my extra logging code.

Note that of course this only helps you if you set MOZILLA_INTERNAL_API=1.
Oh, and some stats generated by browsing a few of my favourite sites:
394233 String buffer allocations
 50421 String buffer reallocations
394233 String buffer frees
287985 String buffer shares
 24955 Adopted strings
 24955 Frees of adopted strings
  4332 Strings modified after being copied from a literal
  4593 Void strings copied
 26731 Other zero-length strings copied
103507 Literal strings copied
  6152 Adopted strings copied
133049 Other null-terminated strings copied
113129 Nonterminated strings copied
I'm not going to get to this, but I'd really like somebody to help with it! Note that this is not a small project, so you should at least first have done a bit of Mozilla coding before.
Assignee: benjamin → nobody
Keywords: helpwanted
OS: Windows NT → All
Priority: -- → P3
Whiteboard: [mentor=bsmedberg][lang=c++][pretty big task]
Is that bug still valid?
Yes, why wouldn't it be?
Attached patch Prototype patchSplinter Review
I forgot to include the addition of a get() method to nsLiteralC?String, but besides that (and the fact that a lot of code assumes nsDependentString == nsLiteralString), this should be more or less the direction that makes sense to me.

Thoughts?
Attachment #792008 - Flags: feedback?(benjamin)
Comment on attachment 792008 [details] [diff] [review]
Prototype patch

>+    SetDataFlags(str.IsLiteral() ? F_LITERAL : F_NONE);
Why not str.mFlags & F_LITERAL ? (Similarly for the other case.)

>+        SetDataFlags(str.mFlags);
This doesn't work for auto strings, because str.mFlags includes high bits which SetDataFlags doesn't allow.
Comment on attachment 792008 [details] [diff] [review]
Prototype patch

I'm ok with this, but it should get a real review from dbaron (and Neil's comments are correct and should be addressed).
Attachment #792008 - Flags: feedback?(benjamin) → feedback+
(In reply to neil@parkwaycc.co.uk from comment #7)
> Comment on attachment 792008 [details] [diff] [review]
> Prototype patch
> 
> >+    SetDataFlags(str.IsLiteral() ? F_LITERAL : F_NONE);
> Why not str.mFlags & F_LITERAL ? (Similarly for the other case.)

I think I had compiler errors when I tried that.

> >+        SetDataFlags(str.mFlags);
> This doesn't work for auto strings, because str.mFlags includes high bits
> which SetDataFlags doesn't allow.

That explains why this patch causes lots of code to crash :-)
Or you could have built debug and hit the appropriate assertion almost immediately...
Comment on attachment 792008 [details] [diff] [review]
Prototype patch

>     size_type strLength = str.Length();
> 
>     if (startPos > strLength)
>       startPos = strLength;
> 
>     mData = const_cast<char_type*>(str.Data()) + startPos;
>     mLength = XPCOM_MIN(length, strLength - startPos);
> 
>-    SetDataFlags(F_NONE);
>+    SetDataFlags(str.IsLiteral() ? F_LITERAL : F_NONE);
Must be for the same reason the code here uses Length() and Data(), which is why I added a Flags() accessor in my WIP patch.
Comment on attachment 792008 [details] [diff] [review]
Prototype patch

>+      template<size_t N>
>+      void AssignLiteral( const char (&str)[N] )
>+      {
>+        Assign(self_type(const_cast<char *>(str), N, F_TERMINATED | F_LITERAL));
N - 1
Comment on attachment 792008 [details] [diff] [review]
Prototype patch

>+    SetDataFlags(str.IsLiteral() ? F_LITERAL : F_NONE);
So, this is a more likely reason why you crash: you allow a literal substring, which I didn't (you just get a regular dependent substring with my patch). The problem here is that the string isn't terminated, so that when you want to assign it, you need to copy it, which you don't. So you end up with a string whose .get() returns the original literal rather then the substring.
Attached patch WIP Patch (obsolete) — Splinter Review
This is all the bits from attachment 792008 [details] [diff] [review] that I could make work, plus all the bits from attachment 398137 [details] [diff] [review] that I could make work with them, so I got a little stuck with UTF16 literal strings (I'm using a templated constructor; if I just stick it on nsDependentString then the existing constructor won't stop you from writing NS_LITERAL_STRING(wstring)).
Attachment #398137 - Attachment is obsolete: true
Comment on attachment 794640 [details] [diff] [review]
WIP Patch

>+    mFlags = F_TERMINATED | F_LITERAL;
Oops. This should use SetDataFlags of course.
Comment on attachment 794640 [details] [diff] [review]
WIP Patch

>+        AssignLiteral(str.mData, str.mFlags);
... and this should be str.mLength - oops.
Attached patch WIP Patch (obsolete) — Splinter Review
This makes nsLiteralString its own type in all cases.
Attachment #794640 - Attachment is obsolete: true
Attached patch WIP Patch (obsolete) — Splinter Review
It turns out that we have a couple of multiline strings in the tree. Oops!
Attachment #795627 - Attachment is obsolete: true
Attached patch WIP Patch (obsolete) — Splinter Review
Bug 904985 simplified this. Switching PRUnichar to char16_t should help even more (I had to add a helper trait to work around that).
Attachment #796824 - Attachment is obsolete: true
Attached patch char_type version (obsolete) — Splinter Review
This version assumes that literal strings use char_type so it currently only compiles on Windows and depends on bug 895047 on other platforms.
Attached patch WIP Patch (obsolete) — Splinter Review
This seems to be working quite well. (Well, so did the previous ones, but it turns out that compilers prefer const char* overloads to const char(&)[N]...)

I am running in to a problem on Android though:
../../../xpcom/io/nsLocalFileUnix.cpp: In member function 'virtual nsresult nsLocalFile::Launch()':
../../../xpcom/io/nsLocalFileUnix.cpp:1810:93: error: conversion from 'const nsLiteralCString' to non-scalar type 'nsAutoCString' requested
nsAutoCString has a constructor for nsACString which is an ancestor of nsLiteralCString so I don't quite see why this doesn't compile.
Attachment #806980 - Attachment is obsolete: true
Attachment #815616 - Attachment is obsolete: true
(The code used to use nsDependentCString which conveniently was the same type as NS_LITERAL_CSTRING which meant that the compiler just did construction optimisation. Obviously now that nsLiteralCString is a different type this doesn't work either.)
My Try job seems to have finished with 9 random oranges:
https://tbpl.mozilla.org/?tree=Try&rev=9d0eda979e0e

Here's an Android-only job with a workaround for the compilation error:
https://tbpl.mozilla.org/?tree=Try&rev=34ee3bbfb1cd
Attached patch Working patchSplinter Review
* nsLiteral(C)String is now a distinct type

This is because there's no easy way to add a literal constructor to an existing string class as they often get constructed from const character buffers.

Previously nsLiteralCString was a typedef for nsDependentCString and nsLiteralString was a typedef that depended on whether a 16-bit string literal was available. This has allowed people to abuse the type of literal strings:

IndexedDBParent::CheckPermissionInternal - switched to const nsACString& paramater as that's our generic string parameter type
LogScope/LogFunc/LogMessage - switched to const char*, keeping XPCOM strings around is just a waste of effort
nsDownloadManager::GetDownloadFromDB - used a named literal string to name a literal string (seriously guys?)
nsLocalFileUnix::Launch - rewrote the way the string was constructed to compile (no idea why it can't construct an nsAutoString from an nsAString&)

* ParamTraits - added traits for the new string classes

* XPCStringConvert::ReadableToJSVal - copied jcranmer's code that avoids copying string literals into the JS engine

* Simplified the literal string macros

* Added a literal flag to nsTSubstring

* Removed the non-templated versions of the templated methods

* Added a char_16t version of AssignLiteral that shares

i.e. nsAString& aResult; aResult.AssignLiteral(MOZ_UTF16("UTF-16"));

* nsACString version of AssignLiteral now shares instead of copying

(nsAString version that takes const char (&)[N] still exists)

* Added ReplaceLiteral, InsertLiteral

* Added char_16t version of AppendLiteral

(shares if the string was empty)

* Made strings flagged as literal share on copy

Note that although substrings can point to a literal, there is no benefit to flagging them, except possibly in the case of StringTail.
Attachment #819047 - Attachment is obsolete: true
Attachment #819243 - Flags: review?(ehsan)
Attachment #819243 - Flags: feedback?(benjamin)
Comment on attachment 819243 [details] [diff] [review]
Working patch

Review of attachment 819243 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/downloads/nsDownloadManager.cpp
@@ +1070,5 @@
>  {
>    MOZ_ASSERT(!FindDownload(aGUID),
>               "If it is a current download, you should not call this method!");
>  
> +  NS_NAMED_LITERAL_CSTRING(query,

Why not just say nsLiteralCString query("foo") instead of using the macro?
(In reply to Joshua Cranmer from comment #25)
> (From update of attachment 819243 [details] [diff] [review])
> > +  NS_NAMED_LITERAL_CSTRING(query,
> 
> Why not just say nsLiteralCString query("foo") instead of using the macro?

In case it becomes possible to make literal strings static in future.
Attachment #819243 - Flags: feedback?(benjamin) → feedback?(dbaron)
Comment on attachment 819243 [details] [diff] [review]
Working patch

Review of attachment 819243 [details] [diff] [review]:
-----------------------------------------------------------------

Overall things are sane with this patch, but I want dbaron to review the string changes, and also Bobby to review the xpconnect change.  r=me on the rest, and f=me on everything.

::: js/xpconnect/src/XPCString.cpp
@@ +95,5 @@
> +                                             length, &sLiteralFinalizer);
> +        if (str) {
> +            return JS::StringValue(str);
> +        }
> +    }

Bobby needs to vouch on this.

::: xpcom/string/public/nsAString.h
@@ -21,5 @@
> -#ifndef NS_DISABLE_LITERAL_TEMPLATE
> -#  if (defined(__SUNPRO_CC) && (__SUNPRO_CC < 0x560)) || (defined(__HP_aCC) && (__HP_aCC <= 012100))
> -#    define NS_DISABLE_LITERAL_TEMPLATE
> -#  endif
> -#endif /* !NS_DISABLE_LITERAL_TEMPLATE */

I assume that people will complain if they still use any of these ancient compilers.  ;-)
Attachment #819243 - Flags: review?(ehsan)
Attachment #819243 - Flags: review?(dbaron)
Attachment #819243 - Flags: review?(bobbyholley+bmo)
Attachment #819243 - Flags: review+
Attachment #819243 - Flags: feedback?(dbaron)
Attachment #819243 - Flags: feedback?(benjamin)
Comment on attachment 819243 [details] [diff] [review]
Working patch

Bad bad bugzilla
Attachment #819243 - Flags: feedback?(benjamin)
Comment on attachment 819243 [details] [diff] [review]
Working patch

Review of attachment 819243 [details] [diff] [review]:
-----------------------------------------------------------------

r=bholley assuming these strings never go away.
Attachment #819243 - Flags: review?(bobbyholley+bmo) → review+
Comment on attachment 819243 [details] [diff] [review]
Working patch

Review of attachment 819243 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCString.cpp
@@ +92,5 @@
> +    if (readable.IsLiteral()) {
> +        JSString *str = JS_NewExternalString(cx,
> +                                             static_cast<const jschar*>(readable.BeginReading()),
> +                                             length, &sLiteralFinalizer);
> +        if (str) {

It doesn't seem like we should continue on OOM.

::: xpcom/string/public/nsLiteralString.h
@@ +22,5 @@
> +  // declare nsLiteralCString
> +#include "string-template-def-char.h"
> +#include "nsTLiteralString.h"
> +#include "string-template-undef.h"
> + 

trailing ws
(In reply to Ms2ger from comment #30)
> (From update of attachment 819243 [details] [diff] [review])
> > +        if (str) {
> It doesn't seem like we should continue on OOM.
Sorry, that part was cargo culted from jcranmer's patch.

As it happens, the code there has bitrotted, so it needs to be updated anyway. How does this version appeal to you?

if (!str)
    return false;
vp.setString(str);
return true;
Marking owner=ehsan because that's what it appears is going on, to clear this off of bugsahoy.
Assignee: nobody → ehsan
Wrong ehsan ;-)
Assignee: ehsan → neil
Comment on attachment 819243 [details] [diff] [review]
Working patch

Canceling what I believe is an obsolete request, since my understanding is that the request on the newer patch replaces this one.
Attachment #819243 - Flags: review?(dbaron)
Comment on attachment 8338356 [details] [diff] [review]
Updated for bitrot and drive-by-review comment

Could you make the comment in nsTLiteralString.h explain not only
that it doesn't own the buffer, but also that it assumes the buffer
lives forever?

I'm a bit queasy about the change to nsTDependentString making it
seem like it's ok for str to not have F_TERMINATED.  Could you add
an assertion to the Rebind method that str.Flags() & F_TERMINATED,
since otherwise the caller is violating the static-typing aspect
of the string type hierarchy?  (Then again... there are a million
other ways to violate it... but at least I'd rather not add to the
problem.)


>+    else if (str.mFlags & F_LITERAL)
>+      {
>+        AssignLiteral(str.mData, str.mLength);
>+        return true;
>+      }

Please assert that str.mFlags & F_TERMINATED inside the condition here.

r=dbaron, and sorry for the long delay
Attachment #8338356 - Flags: review?(dbaron) → review+
https://hg.mozilla.org/mozilla-central/rev/9db7cf4cc385
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
This is awesome stuff!

Would you mind filing a followup bug on having a way to create the JSString in the XPConnect here without needing a finalizer?  Needing a finalizer just to do nothing is pretty silly, esp. where generational GC is concerned.
You need to log in before you can comment on or make changes to this bug.