Closed Bug 1609478 Opened 5 years ago Closed 2 years ago

Avoid doing large memsets to poison values (or to 0, etc) under the memory lock

Tracking

()

Status:

RESOLVED FIXED

Milestone:

113 Branch

Project Flags:

Performance Impact

medium

Tracking Flags:

Tracking

Status

firefox113

---

fixed

People

(Reporter: jesup, Assigned: pbone)

References

(Blocks 3 open bugs)

Details

(Keywords: perf:responsiveness, Whiteboard: [sp3])

Attachments

(1 file, 1 obsolete file)

Bug 1609478: unlock around (large) memory poisoning r=glandium 5 years ago Randell Jesup [:jesup] (needinfo me) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1609478 - Poison memory we want to free before we take the arena lock r=glandium 2 years ago Paul Bone [:pbone] 48 bytes, text/x-phabricator-request		Details \| Review

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

5 years ago

•

Edited

We've seen profiles where the MainThread is blocked for 10's of ms on the memory lock, because JS Helper's GC-freeing code is poisoning large blocks of memory (see https://faraday.basschouten.com/mozilla/jsmemcpysmall.html)

At least when freeing large blocks of memory, the overhead cost of unlocking around the memset is in the noise. For small blocks it's probably not needed much and might be counter-productive (especially for very small), though if we can move anything out of the lock in general, that's always good.

See for example: https://perfht.ml/2u1IJ2h
(from https://docs.google.com/spreadsheets/d/1v0w5OZy3MQxRVQj-557wqdV7FzehzIz0AaqRHMaoPFE/edit#gid=75257179)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 1

•

5 years ago

Attached file Bug 1609478: unlock around (large) memory poisoning r=glandium (obsolete) — Details

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

5 years ago

Whiteboard: [qf] → [qf:p1:responsiveness]

Mike Hommey [:glandium]

Comment 2

•

5 years ago

Do you have a link to a profile? The one in comment 0 doesn't work.

Now, without a profile, I'll say this: the claim that "the MainThread is blocked for 10's of ms on the memory lock" is dubious. I can believe that's what it could look like in a sampled profile if many large allocations are freed in a succession on the main thread because of the likeliness to get a sample during memset.

But considering a large allocation is at most 1036288 bytes, spending > 10ms on a memset for that amount of memory sounds ridiculous. That would be less than 100MB/s. SHA1 is much more complex than memset and can do > 500MB/s.

I'm not saying we shouldn't take the patch (but I want to carefully think about the side effects), but I'm saying the premise seems bogus, and the profiles should probably be looked at more deeply.

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

3 years ago

Assignee: rjesup → nobody

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → rjesup

Status: NEW → ASSIGNED

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Keywords: perf:responsiveness

Whiteboard: [qf:p1:responsiveness]

Markus Stange [:mstange]

Comment 5

•

3 years ago

I'm going to reduce the priority on this bug until we have better proof of this being a problem. Speaking from my perspective, I remember seeing this issue show up in profiles every now and then but not very frequently.

We can increase the priority again when we've found more profiles that show this problem.

Performance Impact: P1 → P2

Markus Stange [:mstange]

Comment 6

•

3 years ago

(In reply to Mike Hommey [:glandium] from comment #2)

Do you have a link to a profile? The one in comment 0 doesn't work.

It does work for me, fwiw. Here's the thread that contains the memset: https://share.firefox.dev/3c5uejI
On that thread you can see that the memset is interleaved with calls to arena_t::Purge, so it probably is indeed multiple allocations.

That would be less than 100MB/s

The memory could be paged out, or compressed, too. For memory which hasn't been touched in a long time, it seems odd to pay a cost during free() to page it all back into the cache.

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Paul Bone [:pbone]

Assignee

Updated

•

2 years ago

Comment 7

•

2 years ago

FWIW, Chrome's PartitionAlloc doesn't seem to poison on free:
https://source.chromium.org/chromium/chromium/src/+/main:base/allocator/partition_allocator/partition_root.h;l=1331;drc=af8d5d2a6d2f816c1bf96481faae6349b92ae70d

i.e. they only unconditionally poison in debug builds and have some disabled support for poisoning 1% of allocations:
https://source.chromium.org/chromium/chromium/src/+/main:base/allocator/partition_allocator/partition_alloc_config.h;l=150;drc=af8d5d2a6d2f816c1bf96481faae6349b92ae70d

Gian-Carlo Pascutto [:gcp]

Comment 8

•

2 years ago

Chrome's PartitionAlloc doesn't seem to poison on free

We found security bugs in macOS betas with our poisoning allocator, which makes me suspect we're one of the only big packages to do that ;-)

Paul Bone [:pbone]

Assignee

Updated

•

2 years ago

Blocks: 1809610

OS: Unspecified → All

Priority: -- → P1

Hardware: Unspecified → All

Paul Bone [:pbone]

Assignee

Comment 9

•

2 years ago

Attached file Bug 1609478 - Poison memory we want to free before we take the arena lock r=glandium — Details

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Whiteboard: [sp3]

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-135

Paul Bone [:pbone]

Assignee

Comment 10

•

2 years ago

Performance results

When I test on my workstation I get a speedometer score of 91.7 +- 2.7, before it was 89.7 +- 2.2. Speedometer is rather variable, but that's an improvment of ~2%. However when we use try:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=058ad9782751284e6af9b4a9103db1d6003b5594&newProject=try&newRevision=f228b80a83a528a838f46683ddd3edb726a4ee39&framework=13&page=1&showOnlyConfident=1&showOnlyImportant=1

This doesn't improve speedometer, or does by a negligible amount. On my workstation if I limit Firefox to a smaller number of cores or disable turbo boost the performance improvement goes away. I believe that this is because this patch reduces time spent with the arena lock held, the main thread is now less likely to block and so without this change when it became blocked it was an opportunity for the CPU to slow its clockspeed or the OS to reschedule the thread onto a different CPU core, both are now less likely with the patch applied. This change in performance may be more likely to show up on laptops and desktops rather than the Xeons we use for performance testing.

In the link to the tests above there are many load and visual metrics tests that do see an improvement though, and others that see a regression (it is after all a scheduling change).

Phabricator Automation

Updated

•

2 years ago

Attachment #9121099 - Attachment is obsolete: true

Pulsebot

Comment 11

•

2 years ago

Pushed by pbone@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9264c1bf87fa Poison memory we want to free before we take the arena lock r=glandium

Cristian Tuns

Comment 12

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9264c1bf87fa

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox113: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 113 Branch

Nicolas Chevobbe [:nchevobbe]

Comment 13

•

2 years ago

Looks like it brought improvement to devtools tests:

== Change summary for alert #37704 (as of Sat, 18 Mar 2023 00:29:04 GMT) ==

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
13%	reload-inspector:content-process memory	linux1804-64-tsan-qr		109,501,098.67 -> 95,255,210.67
5%	damp server.protocoljs.DAMP	windows10-64-shippable-qr	e10s fission stylo webrender	1,353.39 -> 1,290.51
5%	damp server.protocoljs.DAMP	windows10-64-shippable-qr	e10s fission stylo webrender-sw	1,355.33 -> 1,294.07

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37704

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

Regressions: 1825837

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Assignee: rjesup → pbone

Mayank Bansal

Updated

•

2 years ago

Blocks: 1811985

Mayank Bansal

Updated

•

2 years ago

Blocks: 1811784

You need to log in before you can comment on or make changes to this bug.