Closed Bug 1347737 Opened 3 years ago Closed 3 years ago

nsHtml5HtmlAttributes::clear spends a bunch of time deallocating

Categories

(Core :: DOM: HTML Parser, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: bzbarsky, Assigned: hsivonen)

References

Details

(Whiteboard: [qf:p1])

Attachments

(2 files)

An innerHTML setter profile shows about 10% of the time being spent under nsHtml5HtmlAttributes::clear, mostly deleting nsStrings.

Is there a reason we're storing the values here as nsString*, not nsString?  Seems like it would reduce the amount of heap traffic if we did the latter...
Flags: needinfo?(hsivonen)
Reminds me of bug 559327.
Yeah, I'm seeing other smaller alloc/free stuff too (e.g. free bits from nsHtml5TreeBuilder::endTag).  But this one is particularly large and maybe more amenable than some of the others to fixing easily...
Priority: -- → P1
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #0)
> Is there a reason we're storing the values here as nsString*, not nsString? 

The structure of the code (for translation) requires strings to be return values rather than out params and the string type needs to be able to hold a nullptr value. Way back when I wrote the code, this problem was identified (see bug 487949), and nsString* vs. nsStringBuffer* was discussed (I'm failing to locate the discussion). I was told to use nsString*. My recollection is that returning nsString (by value) was not on the table, probably because of the requirement for the type to take a nullptr value.
Flags: needinfo?(hsivonen)
The semantics of strings in the parser fit the semantics of nsStringBuffer*, but IIRC, reviewers at the time considered bare nsStringBuffer* more terrible than nsString*.
nsStringBuffer has the fundamental problem of not knowing its own length, unless you very carefully allocate it with just the right capacity.

Do we need to actually be able to hold nullptr, or a "null value"?  That is, could we use a void (in the SetIsVoid()) string to represent the "null" case?

Failing all else, we could come up with a "stringbuffer and length" struct that can be easily returned by value, I would think...

What are the ownership semantics here?  Do the things that accept these nsString* return values take ownership of them or just borrow the reference, or a mix?

If I were to try changing this, is it just a matter of editing the C++ core, or also the translation layer?
Flags: needinfo?(hsivonen)
> editing the C++ core

s/core/code/
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #5)
> nsStringBuffer has the fundamental problem of not knowing its own length,
> unless you very carefully allocate it with just the right capacity.
> 
> Do we need to actually be able to hold nullptr, or a "null value"?  That is,
> could we use a void (in the SetIsVoid()) string to represent the "null" case?

I'm pretty sure the requirements are:
 1) If foo is a variable of the string type, foo = nullptr needs to set foo to the null value.
 2) If foo is a variable of the string type, foo as part of a boolean expressions should be falsy.

I'm pretty sure, but not 100% sure, that foo == nullptr and foo != nullptr don't need to work. 

> Failing all else, we could come up with a "stringbuffer and length" struct
> that can be easily returned by value, I would think...

Makes sense. There's already jArray.h for handling Java arrays.

> What are the ownership semantics here?  Do the things that accept these
> nsString* return values take ownership of them or just borrow the reference,
> or a mix?

1) There is at most one owning reference.
2) If passed to something else that holds onto the string, the previous owner must do previousReference = nullptr.
3) If the owner wants to dispose of a string, it must pass the string to nsHtml5Portability::releaseString() and then assign nullptr to the the variable that held the reference.

> If I were to try changing this, is it just a matter of editing the C++ core,
> or also the translation layer?

The mechanics would be:
 1) Introduce a .h for the new type.
 2) Change https://hg.mozilla.org/projects/htmlparser/file/d9bdf700af24/translator-src/nu/validator/htmlparser/cpptranslate/CppTypes.java#l196 to return the name of the new type instead of "nsString*".
 3) Change the include lists at https://hg.mozilla.org/projects/htmlparser/file/d9bdf700af24/translator-src/nu/validator/htmlparser/cpptranslate/CppTypes.java#l99 to name the new .h (without the .h extension) where it now says "nsString".
 4) Re-run the translator.
 5) Edit every method in https://searchfox.org/mozilla-central/source/parser/html/nsHtml5Portability.cpp that takes or returns nsString* to take or return the new type instead and adjust the method bodies accordingly.
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> I'm pretty sure, but not 100% sure, that foo == nullptr and foo != nullptr
> don't need to work. 

Whenever Java says `foo == null`, the translator should output `!foo`, whenever Java says `foo != null`, the translator should output `!!foo`.
Thanks.  Henri, is this something you might have time for, or do we need to find someone else to do comment 7?
Flags: needinfo?(hsivonen)
I'll make time for this.
Assignee: nobody → hsivonen
Flags: needinfo?(hsivonen)
Comment on attachment 8849994 [details] [diff] [review]
Translator changes

After these changes, a bunch of stuff in nsHtml5Portability should be either declared inline or be inlined by the translator. I suggest leaving that to a follow-up.
Attachment #8849994 - Flags: review?(wchen)
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122754/#review125966

::: parser/html/nsHtml5MetaScannerCppSupplement.h:28
(Diff revision 2)
>  {
>    // This code needs to stay in sync with
>    // nsHtml5StreamParser::internalEncodingDeclaration. Unfortunately, the
>    // trickery with member fields here leads to some copy-paste reuse. :-(
>    nsAutoCString label;
> -  CopyUTF16toUTF8(*charset, label);
> +  nsString charset16;

nsAutoString

::: parser/html/nsHtml5Portability.cpp:65
(Diff revision 2)
>  }
>  
>  jArray<char16_t,int32_t>
> -nsHtml5Portability::newCharArrayFromString(nsString* string)
> +nsHtml5Portability::newCharArrayFromString(nsHtml5String string)
>  {
> -  int32_t len = string->Length();
> +	MOZ_RELEASE_ASSERT(string);

nit: tab -> spaces

::: parser/html/nsHtml5SpeculativeLoad.h:50
(Diff revision 2)
>  
> -    inline void InitMetaCSP(const nsAString& aCSP) {
> +    inline void InitMetaCSP(nsHtml5String aCSP) {
>        NS_PRECONDITION(mOpCode == eSpeculativeLoadUninitialized,
>                        "Trying to reinitialize a speculative load!");
>        mOpCode = eSpeculativeLoadCSP;
> +      nsString csp;

nsAutoString

::: parser/html/nsHtml5StreamParser.cpp:1297
(Diff revision 2)
>    NS_ASSERTION(IsParserThread(), "Wrong thread!");
>    if (mCharsetSource >= kCharsetFromMetaTag) { // this threshold corresponds to "confident" in the HTML5 spec
>      return false;
>    }
>  
> +  nsString newEncoding16;

nsAutoString

::: parser/html/nsHtml5String.cpp:8
(Diff revision 2)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsHtml5String.h"
> +
> +nsHtml5String::nsHtml5String(const nsAString& aString)
> + : nsHtml5String(nsStringBuffer::FromString(aString), aString.Length() ? aString.Length() : UINT32_MAX)

This looks like it will construct a null string from an empty string. Also, this seems like it's not going to work if we try to construct from a non-shared string. I think we should add a debug assertion that makes sure that aString is 0 length or that it has the shared flag [1] to catch any instances where we pass the wrong kind of string into the constructor.

[1] https://dxr.mozilla.org/mozilla-central/rev/9577ddeaafd85554c2a855f385a87472a089d5c0/xpcom/string/nsSubstring.cpp#261

::: parser/html/nsHtml5String.cpp:15
(Diff revision 2)
> +}
> +
> +nsHtml5String::nsHtml5String(nsStringBuffer* aBuffer, uint32_t aLength)
> + : mBuffer(aBuffer)
> + , mLength(aLength)
> +{

I think it would be good to add an assertion here to make sure that if mLength == 0, then mBuffer == nullptr. That way we are extra sure that we only have one representation of an empty string. If somehow we get an empty string with a non-null buffer and 0 length, the Equals method could fail.

::: parser/html/nsHtml5String.cpp:53
(Diff revision 2)
> +      return false;
> +    }
> +    // this string is empty
> +    return !(*aLowerCaseLiteral);
> +  }
> +  const char* litPtr = aLowerCaseLiteral;

It looks like we can reuse existing comparison code here and in other places where we do comparisons https://dxr.mozilla.org/mozilla-central/rev/e03e0c60462c775c7558a1dc9d5cf2076c3cd1f9/xpcom/string/nsCharTraits.h#268

::: parser/html/nsHtml5TreeBuilderCppSupplement.h:323
(Diff revision 2)
>        if (nsHtml5Atoms::html == aName) {
> -        nsString* url = aAttributes->getValue(nsHtml5AttributeName::ATTR_MANIFEST);
> +        nsHtml5String url = aAttributes->getValue(nsHtml5AttributeName::ATTR_MANIFEST);
>          nsHtml5TreeOperation* treeOp = mOpQueue.AppendElement();
>          NS_ASSERTION(treeOp, "Tree op allocation failed.");
>          if (url) {
> -          treeOp->Init(eTreeOpProcessOfflineManifest, *url);
> +          nsString urlString;

nsAutoString

::: parser/html/nsHtml5TreeBuilderCppSupplement.h:747
(Diff revision 2)
>  
>  void
> -nsHtml5TreeBuilder::appendDoctypeToDocument(nsIAtom* aName, nsString* aPublicId, nsString* aSystemId)
> +nsHtml5TreeBuilder::appendDoctypeToDocument(nsIAtom* aName, nsHtml5String aPublicId, nsHtml5String aSystemId)
>  {
>    NS_PRECONDITION(aName, "Null name");
> -
> +  nsString publicId;

nsAutoString

::: parser/html/nsHtml5TreeOperation.cpp:322
(Diff revision 2)
>        Reget(aAttributes->getLocalNameNoBoundsCheck(i));
>      int32_t nsuri = aAttributes->getURINoBoundsCheck(i);
>      if (!node->HasAttr(nsuri, localName)) {
>        // prefix doesn't need regetting. it is always null or a static atom
>        // local name is never null
> +      nsString value;

nsAutoString

::: parser/html/nsHtml5TreeOperation.cpp:423
(Diff revision 2)
>      nsCOMPtr<nsIAtom> localName =
>        Reget(aAttributes->getLocalNameNoBoundsCheck(i));
>      nsCOMPtr<nsIAtom> prefix = aAttributes->getPrefixNoBoundsCheck(i);
>      int32_t nsuri = aAttributes->getURINoBoundsCheck(i);
>  
> +    nsString value;

nsAutoString
Attachment #8849997 - Flags: review?(wchen)
Attachment #8849994 - Flags: review?(wchen) → review+
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122754/#review125966

> nsAutoString

In the places where I use an `nsString` on the stack instead of following the "`nsAutoString` on stack" rule, the point is to have a string type that takes a reference to an `nsStringBuffer`. That is, we already know we have an `nsStringBuffer`. AFAICT, in this case using the `Auto` variant either has no effect other than allocating useless space on the stack or has the harm of copying the buffer. (I didn't check which, but it seems clear that a string that just refers to an `nsStringBuffer` is what's appropriate in these cases.)
The last try run showed seemingly unrelated orange. Let's see if a rebase fixes it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a79b9aab4dec78cae28e3e895eeaa7b05545c3fb
Weird how this seems to make https://searchfox.org/mozilla-central/source/netwerk/base/nsSocketTransport2.cpp#1285 intermittents non-intermittent.
(In reply to Henri Sivonen (:hsivonen) from comment #25)
> Weird how this seems to make
> https://searchfox.org/mozilla-central/source/netwerk/base/nsSocketTransport2.
> cpp#1285 intermittents non-intermittent.

Maybe I accidentally started preloading all <link href>s or something.
(In reply to Henri Sivonen (:hsivonen) from comment #26)
> Maybe I accidentally started preloading all <link href>s or something.

Turns out I accidentally inverted the preload criteria.

https://hg.mozilla.org/try/pushloghtml?changeset=02ec972c8f83cecadfeb72570bd43037c4bd2dcd
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122754/#review125966

> nsAutoString

Intentionally not auto per previous comment.

> nsAutoString

Intentionally not auto per previous comment.

> This looks like it will construct a null string from an empty string. Also, this seems like it's not going to work if we try to construct from a non-shared string. I think we should add a debug assertion that makes sure that aString is 0 length or that it has the shared flag [1] to catch any instances where we pass the wrong kind of string into the constructor.
> 
> [1] https://dxr.mozilla.org/mozilla-central/rev/9577ddeaafd85554c2a855f385a87472a089d5c0/xpcom/string/nsSubstring.cpp#261

Fixed. And moved to allocating nsStringBuffer manually, because it was too easy to get it wrong otherwise. As a bonus, the new code avoids reallocation in attribute storage.

> I think it would be good to add an assertion here to make sure that if mLength == 0, then mBuffer == nullptr. That way we are extra sure that we only have one representation of an empty string. If somehow we get an empty string with a non-null buffer and 0 length, the Equals method could fail.

Added.

> It looks like we can reuse existing comparison code here and in other places where we do comparisons https://dxr.mozilla.org/mozilla-central/rev/e03e0c60462c775c7558a1dc9d5cf2076c3cd1f9/xpcom/string/nsCharTraits.h#268

Reused code.

> nsAutoString

Intentionally not auto per previous comment.

> nsAutoString

Intentionally not auto per previous comment.

> nsAutoString

Intentionally not auto per previous comment.

> nsAutoString

Intentionally not auto per previous comment.
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122752/#review129300

The remaining try failures seem unrelated.
I'm failing to figure out how to re-request review on MozReview after the reviewer has unset the review flag earlier. MozReview keeps remembering that. On another bug, I was able to switch field values back and forth between MozReview views, but that doesn't work here. Anyway, I do intend to be re-requesting review now.
Whiteboard: [qf]
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122754/#review129668

::: parser/html/nsHtml5String.h:48
(Diff revision 3)
> +  }
> +
> +  /**
> +   * Constructor from nsString. Do not use with subclasses of nsString.
> +   */
> +  explicit nsHtml5String(const nsString& aString);

Where is this defined?

::: parser/html/nsHtml5String.cpp:166
(Diff revision 3)
> +        "Out of memory so badly that couldn't even allocate placeholder.");
> +    }
> +    char16_t* data = reinterpret_cast<char16_t*>(buffer->Data());
> +    data[0] = 0xFFFD;
> +    data[1] = 0;
> +    return nsHtml5String(nullptr, 1);

This would trigger the assertion in the constructor, did you mean to pass the buffer to the constructor here?
Whiteboard: [qf] → [qf:p1]
Comment on attachment 8849997 [details]
Bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. .

https://reviewboard.mozilla.org/r/122754/#review130516
Attachment #8849997 - Flags: review?(wchen) → review+
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9a4a83630f4c
Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. r=wchen.
https://hg.mozilla.org/projects/htmlparser/rev/c7b8e40549d5e7eac39b611ad6c7513d4056943e
Mozilla bug 1347737 - Introduce a new non-heap-allocated type for holding nsStringBuffer* in the HTML parser. r=wchen.
Thanks.
https://hg.mozilla.org/mozilla-central/rev/9a4a83630f4c
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Did you happen to check if this had the expected kind of positive impact on whatever benchmark you were profiling?
Flags: needinfo?(bzbarsky)
This bug is blocking the benchmark bug, and there https://bugzilla.mozilla.org/show_bug.cgi?id=1347525#c5
Flags: needinfo?(bzbarsky)
Indeed.  This helped performance some, as expected.
Duplicate of this bug: 497818
You need to log in before you can comment on or make changes to this bug.