The PHC check has too much overhead on arm64
Categories
(Core :: Memory Allocator, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox129 | --- | fixed |
People
(Reporter: mstange, Assigned: pbone)
References
(Blocks 1 open bug)
Details
Attachments
(14 files, 3 obsolete files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Profile: https://share.firefox.dev/4aOcGCP
https://speedometer-preview.netlify.app/?suite=TodoMVC-Lit-Complex-DOM&iterationCount=100 is spending too much time in MaybePageAlloc
, specifically in this check:
int32_t newDelay = GAtomic::DecrementDelay();
if (newDelay != 0) {
return nullptr;
}
I think the atomic should be revisited. In particular, I don't believe this sentence:
// It's critical that sAllocDelay has ReleaseAcquire semantics, because that
// guarantees that exactly one thread will see sAllocDelay have the value 0.
// (Relaxed semantics wouldn't guarantee that.)
Relaxed atomics are still atomic.
Assignee | ||
Comment 1•9 months ago
|
||
Thanks Markus.
I think we could use thread local variables here rather than atomics.
This bug is almost a duplicate of 1833807, but I'm going to make it "depends on" because once we do that bug we should re-test arm64.
Assignee | ||
Comment 2•7 months ago
|
||
Working on this now.
Assignee | ||
Comment 3•7 months ago
|
||
Currently we have 3 structures containing mutable data seperated by their
lock-safe behaviour (locked, atomic and TLS). I'd prefer to associate
structures by what they mean. This patch moves the tlsIsDisabled
variable, the only one in GTls
into GMutable
and removes the GTls
class.
Assignee | ||
Comment 4•7 months ago
|
||
Assignee | ||
Comment 5•7 months ago
|
||
Assignee | ||
Comment 6•7 months ago
|
||
This also means that we can remove the logic that checks if it has wrapped
if PHC was disabled on the current thread since that won't happen due to
other threads. The patch also adds code that has to initialise the delay
on each thread.
Assignee | ||
Comment 7•7 months ago
|
||
Testing the patches first on x86_64 (Ryzen 4 Threadripper) I got the following results for the subtest above (https://speedometer-preview.netlify.app/?suite=TodoMVC-Lit-Complex-DOM&iterationCount=100)
Central: 28.1 +- 0.32
Tip: 28.5 +- 0.36
Disabled: 27.8 +- 0.32
Maybe smaller numbers are better in which case my patch makes this slower?
But looking at the profiles, the overhead of this code goes from 2.2% to 1.5%, so either my patch makes this better or it hits a bottleneck elsewhere.
Central: https://share.firefox.dev/3U4JY9l
Tip: https://share.firefox.dev/4aDsQ0X
I'll run tests on my Mac Mini M1 soon.
Assignee | ||
Comment 8•7 months ago
|
||
On my Mac M1
Central: 38.7 +- 0.40
Tip: 39.2 +- 0.55
PHC disabled: 39.8 +- 0.46
So my patch helps.
Profiles:
Central: https://share.firefox.dev/49n79kA
Tip: https://share.firefox.dev/3vCs45w
it seems to drop from 1.2% to 0.6% here (you have to find pthread_getspecific and pthread_setspecific)
I also noticed that in my "central" profile the branch outside DecrementDelay()
showed up with a moderate number of samples, but in the new profile it had zero. So this has improved the cost of MaybePageAlloc()
, not only DecrementDelay()
On ARM the atomic used for Now()
didn't show up at all. it is Relaxed.
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Assignee | ||
Comment 9•6 months ago
|
||
Assignee | ||
Comment 10•6 months ago
|
||
Assignee | ||
Comment 11•6 months ago
|
||
Assignee | ||
Comment 12•6 months ago
|
||
Assignee | ||
Comment 13•6 months ago
|
||
Assignee | ||
Comment 14•5 months ago
|
||
Assignee | ||
Comment 15•5 months ago
|
||
Assignee | ||
Comment 16•5 months ago
|
||
Rename this structure to PHCRegion and move its instance into PHC.
Assignee | ||
Comment 17•5 months ago
|
||
Assignee | ||
Comment 18•5 months ago
|
||
Updated•5 months ago
|
Updated•5 months ago
|
Assignee | ||
Comment 19•5 months ago
|
||
This patch uses the wrong approach
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Updated•4 months ago
|
Assignee | ||
Comment 21•4 months ago
|
||
Updated•4 months ago
|
Assignee | ||
Comment 22•4 months ago
|
||
Comment 23•4 months ago
|
||
Comment 24•4 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1f2e2a375881
https://hg.mozilla.org/mozilla-central/rev/ecae8e56c5cc
https://hg.mozilla.org/mozilla-central/rev/d1f91fdc7c68
https://hg.mozilla.org/mozilla-central/rev/6248406a1453
https://hg.mozilla.org/mozilla-central/rev/2092fb5bf23b
https://hg.mozilla.org/mozilla-central/rev/da3d695e2d65
https://hg.mozilla.org/mozilla-central/rev/6c5254f3df9f
https://hg.mozilla.org/mozilla-central/rev/d6539e603aa5
https://hg.mozilla.org/mozilla-central/rev/41762a317aae
https://hg.mozilla.org/mozilla-central/rev/6ace59776780
https://hg.mozilla.org/mozilla-central/rev/dc512f63eaa6
https://hg.mozilla.org/mozilla-central/rev/2a81039219c9
https://hg.mozilla.org/mozilla-central/rev/fa1129aa4279
https://hg.mozilla.org/mozilla-central/rev/f72b2832f50f
Updated•4 months ago
|
Description
•