Closed Bug 1138876 Opened 10 years ago Closed 24 days ago

Backtracking allocator: 30% performance degradation on parallel Mandelbrot benchmark

Categories

(Core :: JavaScript Engine: JIT, defect, P5)


RESOLVED INCOMPLETE

People

(Reporter: lth, Unassigned)

Details

Attachments

(3 files)

This needs further investigation, but it appears that the frame rate for the sab+atomics Mandelbrot demos is about 30% lower on current m-i tip than it was about two weeks ago, on two different systems (AMD FX4100 quad on Ubuntu 14.10 and a 4x2 Core i7 on Mac OS X 10.10). Of course this could be anything. On the sab+atomics side, the main thing that has happened recently is that the futex code was rewritten to use JS interrupts rather than special hooks in the DOM.
Confirmed with this setup:

- MBP "late 2013" 4x2 2.6GHz Core i7, 16GB RAM
- parlib-simple/demo/mandelbrot-animation2 (github.com/lars-t-hansen)
- parlib-simple on the "issue26" branch, which adds some assertions and a bug fix
- e10s disabled
- developer console not shown
- build with --enable-debug-symbols only

mozilla-inbound from 3 March:
  4 workers  27.3 fps
  6 workers  34.2 fps
  8 workers  36.1 fps

mozilla-inbound from 20 February (d9a929677d0a) with patch applied for blocking-on-workers-only:
  4 workers  27.2 fps
  6 workers  37.4 fps
  8 workers  45.6 fps

That changeset is the one immediately before the current futex code was landed. The patch I applied to that changeset will be attached to this bug.
I will attach two more patches, which respectively remove the new futex implementation from current m-i and apply the old futex implementation. With the old futex implementation on the new code, the slowdown is still there:

mozilla-inbound from 4 March, with the old futex implementation:
  4 workers  27.1 fps
  6 workers  34.1 fps
  8 workers  36.0 fps

Ergo the futex implementation is not to blame; something else has caused this slowdown.

* It could be a code generation issue: the slowdown only appears at higher utilization levels of this 4x2 system.
* It could be some sort of throttling / nicing of worker threads when there are more workers than cores (for example).
* It could be something to do with graphics, since this program copies a lot of data into a byte array for display.

(I will attempt to bisect.)
Initial regression window (m-c to m-i merge points):

Known bad:  230668:900075e013be  Feb 25
Known good: 230479:be9b4a3b01ab  Feb 24
Bisection implicates the backtracking register allocator:

changeset:   230540:acc238be19a5
user:        Brian Hackett <bhackett1024@gmail.com>
date:        Tue Feb 24 15:59:37 2015 -0600
summary:     Bug 826741 - Use the backtracking register allocator by default, r=jandem.
Re-enabling LSRA on tip brings the performance back up to 45 fps with 8 workers, so this is definitely an effect of the backtracking allocator. Whether it's spilling, awkward pressure on functional units, or something else I don't know (yet). One factoid about the kernel of this demo: translating it to asm.js did not improve its performance, because Ion already did a stellar job generating code for it; it's monotyped and does no allocation. So it could just be very sensitive to minor perturbations.
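For reference, the kernel is essentially a classic escape-time loop. A minimal sketch of that shape (names, the iteration budget, and the output format here are illustrative, not copied from the demo):

// Sketch of a monotyped, allocation-free Mandelbrot kernel of the kind
// described above; details are assumptions, not taken from parlib-simple.
function mandelSlice(mem, width, height, ybase, ylimit, xmin, xmax, ymin, ymax) {
  var MAXIT = 200;                                   // illustrative budget
  for (var Py = ybase; Py < ylimit; Py++) {
    var y0 = ymin + (Py / height) * (ymax - ymin);
    for (var Px = 0; Px < width; Px++) {
      var x0 = xmin + (Px / width) * (xmax - xmin);
      var x = 0, y = 0, it = 0;
      while (x*x + y*y < 4 && it < MAXIT) {
        var xt = x*x - y*y + x0;
        y = 2*x*y + y0;
        x = xt;
        it++;
      }
      // Everything above is int32 or double and nothing is allocated;
      // the iteration count goes straight into a (shared) Int32Array.
      mem[Py * width + Px] = it;
    }
  }
}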
Component: JavaScript Engine → JavaScript Engine: JIT
Summary: Apparent 30% performance degradation in parallel performance relative to ca mid-February → Backtracking allocator: 30% performance degradation on parallel Mandelbrot benchmark
On my 4-core AMD system, enabling or disabling the backtracking allocator has no effect on performance, just like the backtracking allocator had no negative effects on performance when running with only 4 workers on the 4-core (hyperthreaded) i7.
(I'll try to port the program to the JS shell, with luck it'll repro.)
Another observation: the simpler mandelbrot-animation program is not similarly sensitive to the register allocator (though it runs at the speed of the slow case for mandelbrot-animation2, 35 fps). The difference between the two programs is chiefly in how work is partitioned. The simpler program computes strips of the output (one strip per worker, assigned deterministically) before displaying it. The more complicated program computes subgrids of the output (quite a lot of them, 4 times the number of workers along each dimension, pulled from a queue as work is completed) and also overlaps the computation of the next image with the display of the previous one. The two programs probably have very different memory access and ownership patterns. (Not able to reproduce the phenomenon in the shell so far.)
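A rough sketch of the two partitioning strategies, to make the difference concrete (illustrative only; the real demos live in parlib-simple and the second one uses a shared-memory work queue rather than a plain array):

// mandelbrot-animation: static strips, one per worker, assigned up front.
function stripFor(workerId, numWorkers, height) {
  var rows = Math.ceil(height / numWorkers);
  return { ybase: workerId * rows,
           ylimit: Math.min((workerId + 1) * rows, height) };
}

// mandelbrot-animation2: many small subgrids (4 * numWorkers per dimension)
// pushed onto a queue; workers pull items as they finish, so the mapping of
// subgrid to worker is nondeterministic.
function makeSubgridQueue(numWorkers, width, height) {
  var divisions = 4 * numWorkers;
  var tileW = Math.ceil(width / divisions);
  var tileH = Math.ceil(height / divisions);
  var queue = [];
  for (var ty = 0; ty < divisions; ty++)
    for (var tx = 0; tx < divisions; tx++)
      queue.push({ xbase: tx * tileW, xlimit: Math.min((tx + 1) * tileW, width),
                   ybase: ty * tileH, ylimit: Math.min((ty + 1) * tileH, height) });
  return queue;
}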
FWIW, still running at the slower rate on current mozilla-inbound.
I just ran across this again in a similar program and found an interesting correlation: the number of parameters to the function may be leading to some poor register allocation choices. I have two variables, centerY and centerX, which were initially global. Their values are constant, and they are used in the function but are not hot (they are only read before the nested loops). When I add these variables to the function's parameter list and pass them in the call, performance drops by 30%. In this case the number of parameters increased from four to six. AMD64, Linux, FF 46.0a2. (I have not had the chance to dig into this further.)
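A sketch of the shape of the change (the kernel bodies are hypothetical; only the signature change and the variable names come from the observation above):

// Before: centerX/centerY are constant globals, read once before the loops.
var centerX = -0.5, centerY = 0.0;   // hypothetical values
function renderBefore(mem, width, height, magnification) {
  var xmin = centerX - 1.5 / magnification;
  var ymin = centerY - 1.0 / magnification;
  // ... nested loops over the pixels, not touching centerX/centerY ...
}

// After: the same values arrive as parameters 5 and 6; performance dropped
// about 30% even though they are still only read before the nested loops.
function renderAfter(mem, width, height, magnification, centerX, centerY) {
  var xmin = centerX - 1.5 / magnification;
  var ymin = centerY - 1.0 / magnification;
  // ... identical nested loops ...
}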
I should mention that in the latter case, there's no shared memory or atomics - it's standard JS.
Needs benchmark update and retest.
Priority: -- → P5
The working benchmark code is https://github.com/lars-t-hansen/parlib-simple, in demo/mandelbrot-animation2/mandelbrot.html. Pass the number of workers as a URL parameter, ?workers=n; the default is 4. I had hoped the fix for bug 1205073 might have fixed this too, but it has not. If anything, this program is slower than before (locally built Nightly, JSGC_DISABLE_POISONING=1); I get 34 fps with 8 workers.
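For reference, a minimal sketch of how a page can pick up that parameter (the demo's actual parsing code may differ):

// Read ?workers=n from the page URL, defaulting to 4 workers.
var params = new URLSearchParams(window.location.search);
var numWorkers = parseInt(params.get("workers"), 10) || 4;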
Assignee: lhansen → nobody
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 24 days ago
Resolution: --- → INCOMPLETE
