Open Bug 1396490 Opened 7 years ago Updated 2 years ago

Investigate impact of nsStringBuffer vs. encoding_rs alignment mismatch and decide on action (if any)

Categories

(Core :: XPCOM, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: hsivonen, Unassigned)

Details

By code inspection, nsStringBuffer reserves eight bytes of bookkeeping header from malloc-returned buffer. Since malloc is supposed to return pointers that are aligned even for 128-bit types, this suggests that  the start pointer returned by nsCString/nsString is never aligned to a 16-byte boundary.

encoding_rs uses SSE2 on x86/x86_64 and NEON on Aarch64 for accelerating ASCII operations. These operations benefit from 16-byte alignment, but when the pointers are unaligned, encoding_rs prefers to use unaligned SIMD loads and stores instead of trying to move the pointers to alignment using the ALU. The reason for this is two-fold: First, when the input is mixed ASCII and non-ASCII, moving the pointers to alignment would reduce the probability of being able to do an ASCII-only SIMD read before the next non-ASCII run. Second, when a long buffer is all-ASCII, chances are that it's being held in a buffer that comes directly from malloc and, therefore, is optimally aligned to begin with.

Using XPCOM strings instead of e.g. Rust Vecs makes the second part of the design rationale not hold.

We should measure the impact and decide what to do.

I expect the problem to be quite notable on old Intel microarchitectures, e.g. on Core 2 Duo CPUs but negligible or non-existent on recent microarchitectures. ARM's manual suggests there's a penalty on unaligned writes but unclear if that matters.

Possible responses: 

1) Do nothing. Pros: Simple. Cons: Doesn't solve the problem if there is one.

2) Change encoding_rs  to move pointers to alignment at the start of a call into encoding_rs or at the start of a stream. Pros:  Complexity is hidden in encoding_rs. Cons:  unnecessary branches inflicted on non-XPCOM string usage.  Since eight bytes amounts to a different number of code units in UTF-16 compared to the other encodings, when decoding to UTF-16 or encoding from UTF-16 there isn't a small number of code units to process that would align both the source and the destination. (Probably a problem on old Intel; less so on Aarch64 where aligning destination should be enough.)

3) Change Gecko callers to move pointers to alignment before a call into encoding_rs. Pros:  XPCOM issues not spilling over to general-purpose Rust code. Cons: More Gecko-specific wrapper code that probably should even me architecture-aware. Since eight bytes amounts to a different number of code units in UTF-16 compared to the other encodings, when decoding to UTF-16 or encoding from UTF-16 there isn't a small number of code units to process that would align both the source and the destination. (Probably a problem on old Intel; less so on Aarch64 where aligning destination should be enough.)

4) Deliberately bloat the nsStringBuffer header to 16 bytes. Pros: Good alignment. Cons: 8 bytes of wasted allocation per nsStringBuffer.

5) Change assumed-impactful specific uses not to use XPCOM strings for buffering. E.g. use something else (maybe the upcoming fallible Vec) for buffering CSS and JavaScript resources. Pros: No need for adding alignment branches. Cons: Doesn't fix the long tail.
Priority: -- → P3
Component: String → XPCOM
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.