Migrate encoding_rs from packed_simd to core::simd
Categories: Core :: Internationalization, task
People: Reporter: hsivonen, Assigned: hsivonen
Attachments (1 file): 60.91 KB, text/plain
Once core_simd becomes available as core::simd behind a feature gate and doesn't regress performance, migrate to it.
Currently, I have a port to core_simd but it appears to severely regress performance on 32-bit ARM (Exynos 5). (Not sure yet why. The usual suspect is what any()/all() compile to.)
Also, on x86_64 (Haswell), the pattern of what gets faster and what gets slower suggests that the prediction for the branch after the boolean reduction in the "are these 16 UTF-16 code units all ASCII?" check flips to favor "they aren't all ASCII" and disfavor "they are all ASCII".
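For context, here is a minimal sketch of the kind of check being discussed; it is not the actual encoding_rs code, the function name and chunk layout are illustrative, and it assumes the current nightly std::simd API:

#![feature(portable_simd)]

use std::simd::prelude::*;

// Illustrative only: true if all 16 UTF-16 code units (two 8-lane vectors)
// are ASCII, i.e. below 0x80. The boolean reduction (all()) is the operation
// whose codegen and branch behavior are discussed in this bug.
#[inline(always)]
fn utf16_chunk_is_ascii(a: u16x8, b: u16x8) -> bool {
    (a | b).simd_lt(u16x8::splat(0x80)).all()
}

fn main() {
    let ascii = u16x8::splat(b'a' as u16);
    let non_ascii = u16x8::splat(0x00E4); // 'ä'
    assert!(utf16_chunk_is_ascii(ascii, ascii));
    assert!(!utf16_chunk_is_ascii(ascii, non_ascii));
}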
I do have a 32-bit ARM device (rpi2). Can you please share what the performance regression is all about and how to test for it? Thanks.
Comment 2•4 years ago (Assignee)
Here's how these numbers are derived on Exynos 5 (Samsung Chromebook 2 running Crouton):
1. rustup default nightly-2021-07-01
2. Clone encoding_rs, safe_encoding_rs_mem, encoding_bench, and stdsimd so that they are siblings inside a common parent directory.
3. Ensure that the current branch of encoding_rs is master.
4. With encoding_bench as the working directory, run cargo bench --target thumbv7neon-unknown-linux-gnueabihf --features 'simd-accel self mem' > packed_simd.txt (I actually ran this four times and merged the results, picking the minimum time for each benchmark, since noise can only make a benchmark run slower, using https://github.com/hsivonen/cargo-benchcmp/commits/faster ).
5. Change encoding_rs to the branch core_simd.
6. Run step 4 again into core_simd.txt.
7. Run cargo benchcmp --threshold 4 packed_simd.txt core_simd.txt.
Comment 3•4 years ago (Assignee)
The next step in actually investigating this would be to create a minimal non-inline function that calls all() on m8x16 with packed_simd and on mask8x16 with core_simd (with opt_level=3 and opt_level=2) and to check the generated instructions.
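A sketch of what such a probe function might look like on the core_simd side, assuming the current nightly std::simd API (the packed_simd counterpart would be the analogous function taking m8x16):

#![feature(portable_simd)]

use std::simd::prelude::*;

// Deliberately not inlined so that the codegen of the boolean reduction can
// be inspected in isolation, e.g. by emitting asm at opt-level=3 and
// opt-level=2 and diffing the output.
#[inline(never)]
pub fn mask_all(m: mask8x16) -> bool {
    m.all()
}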
Comment 4•4 years ago (Assignee)
On aarch64 (M1), the results suggest there could be the same kind of branch prediction flip as on x86_64.
Additionally, decoding UTF-16BE to UTF-16 becomes slower and UTF-16LE to UTF-16 becomes faster. While it's somewhat interesting that same-endian gets better and opposite-endian gets worse when migrating from packed_simd to core_simd, neither is real-world-relevant enough to bother investigating.
Comment 5•4 years ago (Assignee)
Except on aarch64, the presumed branch prediction flip regresses the decode side while the encode side becomes faster.
Comment 6•4 years ago (Assignee)
As expected, the codegen for boolean reductions on thumbv7neon is bad.
Comment 7•4 years ago
Have you checked what these things look like in Firefox shippable builds (which have both PGO and LTO enabled)?
Comment 8•4 years ago (Assignee)
(In reply to Mike Hommey [:glandium] from comment #7)
> Have you checked what these things look like in Firefox shippable builds (which have both PGO and LTO enabled)?
No. I expect those to potentially flip branch predictions on x86_64 and aarch64. However, the 32-bit ARM codegen is so bad that there's no way PGO or LTO would make it OK.
Comment 9•3 months ago (Assignee)
This was fixed in bug 1882209.