Painful slow performance for DMD builds on OS X

RESOLVED FIXED in Firefox 56

Status

()

Core
DMD
RESOLVED FIXED
2 months ago
a month ago

People

(Reporter: whimboo, Assigned: njn)

Tracking

({perf})

54 Branch
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [MemShrink])

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

2 months ago
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12.5; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170608030205 CSet: 7efda263a842e60cd0cc00b3c4a7058c65590702

I followed the instructions to build Firefox with DMD enabled. In my case I used mozilla-beta to create an opt build with official branding.

When starting the build via `mach run --dmd` it takes more than 5min until Firefox has been started. Afterward it's not usable at all, like tab switching takes 5s, and when using the keyboard the letters appear 1-2s later.
I can reproduce this on m-c as well. Just sampling in lldb by periodically breaking it looks like we're spending all our time unwinding. I wonder if Apple updated libunwind in recent MacOS releases?

Looking at a snapshot of the code I can see where they have some potentially terrible perf (linked list for cache lookups) and maybe a deadlock where they acquire a read lock when adding to the cache and the do a malloc with the lock held which could re-enter our DMD malloc hooks.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #1)
> I can reproduce this on m-c as well. Just sampling in lldb by periodically
> breaking it looks like we're spending all our time unwinding. I wonder if
> Apple updated libunwind in recent MacOS releases?
> 
> Looking at a snapshot of the code I can see where they have some potentially
> terrible perf (linked list for cache lookups) and maybe a deadlock where
> they acquire a read lock when adding to the cache and the do a malloc with
> the lock held which could re-enter our DMD malloc hooks.

We should avoid the deadlock with our intercept blocker [1], so that probably just leaves horrible perf.

[1] http://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/memory/replace/dmd/DMD.cpp#1170
Whiteboard: [MemShrink]
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #1)
> I can reproduce this on m-c as well. Just sampling in lldb by periodically
> breaking it looks like we're spending all our time unwinding. I wonder if
> Apple updated libunwind in recent MacOS releases?

FWIW, capturing stacks in the Rust `backtrace` module is essentially unusable on Mac OS X, for this reason -- see https://github.com/brson/error-chain/issues/129#issuecomment-281319312 for a not-very authoritative example of this.
(Assignee)

Comment 4

2 months ago
Created attachment 8883177 [details] [diff] [review]
Use FramePointerStackWalk() in DMD on Mac

This avoids MozStackWalk(), which has become unusably slow on Mac due to
changes in libunwind, and gets us back to decent speed.

The code for getting the frame pointer and stack end was copied from the Gecko
Profiler, which also uses FramePointerStackWalk() on Mac.
Attachment #8883177 - Flags: review?(erahm)
(Assignee)

Updated

2 months ago
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment on attachment 8883177 [details] [diff] [review]
Use FramePointerStackWalk() in DMD on Mac

Review of attachment 8883177 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for picking this up! A few notes below.

::: memory/replace/dmd/DMD.cpp
@@ +752,5 @@
>    // So let's just release it on all platforms.
>    StackTrace tmp;
>    {
>      AutoUnlockState unlock;
> +#ifdef XP_MACOSX

Don't we want this on win 32 as well?

@@ +755,5 @@
>      AutoUnlockState unlock;
> +#ifdef XP_MACOSX
> +    void* fp;
> +    asm (
> +        // Dereference %rbp to get previous %rbp

If we stick with OSX only we can use __builtin_frame_address rather than inline assembly which I'd prefer.

@@ +761,5 @@
> +        :
> +        "=r"(fp)
> +    );
> +    void* stackEnd = pthread_get_stackaddr_np(pthread_self());
> +    bool ok = FramePointerStackWalk(StackWalkCallback, /* skipFrames = */ 0,

Presumably we still want to skip 2. If not can you add a comment explaining the difference?
Attachment #8883177 - Flags: review?(erahm) → feedback+
(Assignee)

Comment 6

2 months ago
> Don't we want this on win 32 as well?

Possibly. I'll try it as a follow-up.
 
> If we stick with OSX only we can use __builtin_frame_address rather than
> inline assembly which I'd prefer.
>
> Presumably we still want to skip 2. If not can you add a comment explaining
> the difference?

__builtin_frame_address() gives a different result, I think it's one frame higher, i.e. the asm code effectively skips a frame. 
With the asm plus skipFrame=0 the first function in each stack is the replace_* function, which I think matches what happens on Linux.

As for switching to __builtin_frame_address, I'd prefer to stick with the asm because I like having this code basically the same as the code in the Gecko Profiler. (I've added a comment about the similarity.)
(Assignee)

Comment 7

2 months ago
Created attachment 8883842 [details] [diff] [review]
Use FramePointerStackWalk() in DMD on Mac
Attachment #8883842 - Flags: review?(erahm)
(Assignee)

Updated

2 months ago
Attachment #8883177 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Created attachment 8883842 [details] [diff] [review]
> Use FramePointerStackWalk() in DMD on Mac

Patch looks the same as the old one, feel free to carry forward a provisional r+ and land an update with the comments added.
Flags: needinfo?(n.nethercote)
(Assignee)

Comment 9

a month ago
Created attachment 8884073 [details] [diff] [review]
Use FramePointerStackWalk() in DMD on Mac

Apologies for the patch management fail. This one has the comments.
Attachment #8884073 - Flags: review?(erahm)
(Assignee)

Updated

a month ago
Attachment #8883842 - Attachment is obsolete: true
Attachment #8883842 - Flags: review?(erahm)
(Assignee)

Updated

a month ago
Flags: needinfo?(n.nethercote)
Attachment #8884073 - Flags: review?(erahm) → review+
(Assignee)

Comment 10

a month ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ec4756b0d4dceeb63f748b1e2080ae4d1e50c00
Bug 1371397 - Use FramePointerStackWalk() in DMD on Mac. r=erahm.

Comment 11

a month ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2ec4756b0d4d
Status: ASSIGNED → RESOLVED
Last Resolved: a month ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.