Closed Bug 1856854 Opened 2 years ago Closed 1 year ago

27.97 - 5.09% perf_reftest_singletons id-getter-2.html / perf_reftest_singletons id-getter-3.html + 2 more (Linux, OSX, Windows) regression on Wed September 27 2023

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr115 --- unaffected
firefox118 --- unaffected
firefox119 --- unaffected
firefox120 --- fixed
firefox121 --- wontfix

People

(Reporter: afinder, Assigned: emilio)

References

(Regression)

Details

(4 keywords)

Attachments

(1 obsolete file)

Perfherder has detected a talos performance regression from push 84e0feee4ceb4f753e7ac5883ab6601c668d7da3. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
28% perf_reftest_singletons id-getter-2.html windows10-64-shippable-qr e10s fission stylo webrender 594.89 -> 761.27
26% perf_reftest_singletons id-getter-2.html macosx1015-64-shippable-qr e10s fission stylo webrender 656.11 -> 823.51
12% perf_reftest_singletons id-getter-2.html linux1804-64-shippable-qr e10s fission stylo webrender 896.50 -> 1,000.35
5% perf_reftest_singletons id-getter-3.html windows10-64-shippable-qr e10s fission stylo webrender 318.66 -> 334.88

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
2% tp5o_webext linux1804-64-shippable-qr e10s fission stylo webrender-sw 280.35 -> 273.99

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(emilio)

Set release status flags based on info from the regressing bug 1854446

Flags: needinfo?(emilio)
Severity: -- → S2
Flags: needinfo?(emilio)
See Also: → 1447951

So ok, I did a bit more digging. The new setup was pretty much the one we had before bug 1447951. Bug 1447951 has a bit of measurements of memory improvements that the old setup had. I don't think that accounted for all the copies this avoids tho (when passing strings from the html parser and atomizing them, etc), or at least I don't see that seen anywhere.

In particular, bug 1447951 comment 0 mentions how many atoms are shared at a particular point in time, but not about whether we avoided copies (temporary or not) by handing out references to those atoms or by saving the copy while atomizing.

The performance regression seems to me like mostly about cache misses. Or at least I can't find another explanation for it. comment 3 helps removing a bit of code but that's not enough to recover the regression. My hypothesis is that the reason why this becomes slower and other microbenchmarks that use larger strings not so, is because NewMaybeExternalString actually looks at the character buffer (twice) in order to create an inline string in this case. That comes from bug 1038099.

In any case, I don't think this benchmark is super-realistic in the sense that we're always passing the same atom that's hot in cache before the patch. So my patch would end up paying for two cache misses instead of one.

So the question is, what do we want to do? I see two options:

Back out bug 1854446

...going to the previous setup.

It presumably would give us some of the memory improvements from bug 1447951 (though we saw no AWSY regression). Some cons:

  • Atomizing strings from the HTML parser etc now require a string copy.
  • Atomizing UTF-8 strings from e.g. the CSS parser requires two copies instead of one.
  • Turning an atom into an nsAString requires a string copy.

We could file follow-ups to investigate other potential changes to the string representation to address those, but my understanding is that making nsAString understand nsAtom* more deeply didn't seem palatable to either Nika, nor Nathan at the time.

That said, the useless string copies that bug 1854446 was about have been removed with bug 1855074, so for atoms that are small (<= 16 code units), there's no big performance difference between the new setup and backing out bug 1854446.

Take this regression

...and simplifications like comment 3. Note that bug 1854446 was also an improvement in something that was not a micro-benchmark.

I think the new set up is a bit nicer because:

  • It avoids all the copies mentioned above.

  • It makes the performance characteristics of atoms more predictable: Before bug 1854446, if you use DOMString and SetKnownLiveAtom, you can get fast large atoms passing to JS. If you use nsAString, and nsAtom::ToString which is a lot more common, then you were out of luck, but that now just works.

  • I have other potential ideas to be able to share nsDependentAtomStrings with JS more easily, but in general the less bespoke string representations we have, the better.

I think I lean towards (2), because it makes the performance characteristics of atoms more predictable. But I'm ok with (1) and going back to the drawing board, if you don't agree, of course :)

Flags: needinfo?(smaug)
Flags: needinfo?(nika)
Flags: needinfo?(mstange.moz)
Flags: needinfo?(emilio)

But usually atomizing string from HTML parser doesn't need a copy. Normally there should be already an atom around.

It makes the performance characteristics of atoms more predictable
Well, now that atoms don't have the data inline, there is unpredictable cost accessing the data because of the indirect access.

I'd still rather investigate making nsAtoms nsStringBuffers. We don't even know what the overhead there is.
(and back out the changes for now)

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #5)

Well, now that atoms don't have the data inline, there is unpredictable cost accessing the data because of the indirect access.

Data has always been out of line for static atoms tho, right?

But usually atomizing string from HTML parser doesn't need a copy. Normally there should be already an atom around.

How so? We atomize a bunch of stuff both from HTML and CSS parsing (e.g., all different ids, classnames, and names in a document). That seems not-insignificant to me.

I'd still rather investigate making nsAtoms nsStringBuffers. We don't even know what the overhead there is.

It's a bit unclear to me how that would look... Can you confirm that would mean moving length / hash (lazily probably, or atom-only) / ascii-lowercase-bit (lazily too, probably, or atom-only) to nsStringBuffer, is that right?

If so, I guess the length at least would be duplicated with nsTStringRepr, and we'd need to keep that somehow in sync, which seems a bit unfortunate / error-prone...

If you confirm that's what you are thinking of, and Nika agrees it's worth exploring, I'm happy to give it a poke...

(and back out the changes for now)

Not sure I agree it's worth backing out, but okay.

Flags: needinfo?(smaug)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #6)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #5)

Well, now that atoms don't have the data inline, there is unpredictable cost accessing the data because of the indirect access.

Data has always been out of line for static atoms tho, right?

Well, we're talking about dynamic atoms here :)

But usually atomizing string from HTML parser doesn't need a copy. Normally there should be already an atom around.

How so? We atomize a bunch of stuff both from HTML and CSS parsing (e.g., all different ids, classnames, and names in a document). That seems not-insignificant to me.

Sure, but often we can reuse an atom which is already in mrucache, when using main thread only atomization.

I'd still rather investigate making nsAtoms nsStringBuffers. We don't even know what the overhead there is.

It's a bit unclear to me how that would look... Can you confirm that would mean moving length / hash (lazily probably, or atom-only) / ascii-lowercase-bit (lazily too, probably, or atom-only) to nsStringBuffer, is that right?

StringBuffer has already size, that could be use as length for atoms. But yes.

If so, I guess the length at least would be duplicated with nsTStringRepr, and we'd need to keep that somehow in sync, which seems a bit unfortunate / error-prone...

I don't quite understand what this means. An atom StringBuffer could be readonly.

Flags: needinfo?(smaug)

What was the final decision? Should we back out or not? I'm asking because the soft freeze for 120 is tomorrow, if we don't back out by then we'll have to uplift it.

Flags: needinfo?(emilio)

No decision yet, but we could back out from beta without much issue. Also this is a very targetted micro-benchmark, so it's not a huge deal IMO.

Flags: needinfo?(emilio)

Set release status flags based on info from the regressing bug 1854446

do we need to backout bug 1854446 out of beta? This week is the last week of early beta.

Flags: needinfo?(emilio)

Yeah, let's back out of beta for now, can you do that?

Flags: needinfo?(emilio) → needinfo?(smaug)

Err

Flags: needinfo?(smaug) → needinfo?(dsmith)

Yes i can do that! should we also back this out from central fx121 ( i can request that) or will the fix be ready in time for mergeday?

Flags: needinfo?(dsmith)

120 fixed by backout of bug 1854446

I'm hoping to have a decision on this made by then.

Only a week left before Fx121 goes to RC. Can we please try to make a decision soon? FWIW, it looks like macOS is the only platform where we're still seeing consistently higher numbers at this point - it looks like bug 1857237 clawed back a lot of the regression already on Linux & Windows.

Flags: needinfo?(emilio)

Yeah it makes sense that bug 1857237 would address this regression, effectively.

I think given that, I'd rather take this simplification and call it a day. The less weird string representations we have, the better, IMO. But I'm also fine backing out the regressor in central if you want, Olli.

Flags: needinfo?(emilio) → needinfo?(smaug)

If the regression has been fixed in some manner, no need to back out, IMO.

Flags: needinfo?(smaug)

It has been partially fixed on Linux and Windows, and it is still not on Mac.
Closing as WONTFIX given recent comments.

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(nika)
Flags: needinfo?(mstange.moz)
Resolution: --- → WONTFIX
Blocks: 1870608

Comment on attachment 9356911 [details]
Bug 1856854 - Remove DOMString atom state now that atoms hold an nsStringBuffer. r=smaug

Revision D190218 was moved to bug 1870608. Setting attachment 9356911 [details] to obsolete.

Attachment #9356911 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: