Closed Bug 585978 Opened 14 years ago Closed 13 years ago

Add SSE implementation of is-ascii check in nsTextFragment

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Keywords: perf)

Attachments

(2 files, 9 obsolete files)

Attached patch Patch v1 (obsolete) — Splinter Review
Split off from bug 582858.

Vectorize the following loop in nsTextFragment.cpp:

PRBool need2 = PR_FALSE;
while (ucp < uend) {
  PRUnichar ch = *ucp++;
  if (ch >= 256) {
    need2 = PR_TRUE;
    break;
  }
}
Keywords: perf
Attachment #464448 - Flags: review?(vladimir)
Depends on: 585708, 585818
Benchmark results for the attached patch, testcase v1.1, and based atop b7836c3a63db:

Trunk: 220ms (first run), 155ms (later runs)
Patched: 180ms (first run), 122ms (later runs)

This is surprisingly only 30% faster, but the assembly it's generating looks OK
to me.
Attached file Benchmark v1.1
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #464448 - Attachment is patch: true
Attachment #464448 - Attachment mime type: application/octet-stream → text/plain
We should also probably vectorize LossyConvertEncoding<PRUnichar, char> (in nsUTF8Utils.h).  There's a comment in nsTextFragment that says it's "likely to be carefully tuned", but in fact it seems pretty naive.
Attached patch Patch v1.1 (obsolete) — Splinter Review
There are apparently two copies of this loop in the code.  This one replaces them both.
Attachment #464448 - Attachment is obsolete: true
Attachment #464510 - Flags: review?(vladimir)
Attachment #464448 - Flags: review?(vladimir)
Attached patch Patch v1.2 (obsolete) — Splinter Review
Build changes: Now based on patch v2 in bug 585818 (and patch v1 in bug 585708).  Sorry for the churn.
Attachment #464510 - Attachment is obsolete: true
Attachment #464556 - Flags: review?(vladimir)
Attachment #464510 - Flags: review?(vladimir)
Attached patch Patch v1.3 (obsolete) — Splinter Review
One more time...
Attachment #464556 - Attachment is obsolete: true
Attachment #464565 - Flags: review?(vladimir)
Attachment #464556 - Flags: review?(vladimir)
Wow, this is actually very much not right.  Fix soon.
Attachment #464565 - Flags: review?(vladimir)
Attached patch Patch v1.4 (obsolete) — Splinter Review
This one should be much more right.
Attachment #464565 - Attachment is obsolete: true
Attachment #465398 - Flags: review?(vladimir)
Summary: Vectorize is-ascii check in nsTextFragment → Add SSE implementation of is-ascii check in nsTextFragment
Comment on attachment 465398 [details] [diff] [review]
Patch v1.4

Vlad says he doesn't think he's the right person to review here; untagging him.
Attachment #465398 - Flags: review?(vladimir) → review?
I have no experience with the integer part of SSE, so I am very hesitant to call myself reviewer. I can just comment on this for now:

+static inline int
+is_zero (__m128i x)
+{
+  return
+    _mm_movemask_epi8(_mm_cmpeq_epi8 (x, _mm_setzero_si128 ())) == 0xffff;
+}

This function should return a bool, it seems :-)
Attached patch Patch v1.5 (obsolete) — Splinter Review
Updating Makefile logic so it works properly on Linux 32.
Attachment #465398 - Attachment is obsolete: true
Attachment #468482 - Flags: review?(jones.chris.g)
Attachment #465398 - Flags: review?
Attached patch Patch v1.6 (obsolete) — Splinter Review
(In reply to comment 10)
> This function should return a bool, it seems :-)

Fixed, thanks.
Attachment #468482 - Attachment is obsolete: true
Attachment #468489 - Flags: review?(jones.chris.g)
Attachment #468482 - Flags: review?(jones.chris.g)
Attached patch Patch v1.7 (rebased v1.6 to tip) (obsolete) — Splinter Review
Attachment #468489 - Attachment is obsolete: true
Attachment #469542 - Flags: review?(jones.chris.g)
Attachment #468489 - Flags: review?(jones.chris.g)
Hi Justin, I'm pretty swamped with fennec b1 work and won't be able to do this review until probably the end of next week; it's going to need a very careful read.  If you need the review sooner than that, please request from someone else.  Sorry!
That's all right -- I'm still waiting on a review on bug 585708 anyway.  :)
Blocks: 586698
Comment on attachment 469542 [details] [diff] [review]
Patch v1.7 (rebased v1.6 to tip)

> +ifneq (,$(filter x86 x86_64,$(CPU_ARCH)))

This test is yet again different from the one in bug 586698. Is there a reason why?

>  #include "nsTraceRefcnt.h"
> +
>  class nsString;

Whitespace-only change. Arguably a good one, but since it's the only modification in this file, perhaps inadvertent.

> +  while (str < end && (NS_PTR_TO_UINT32(str) & alignMask)) {
> ...
> +  while (end > str && (NS_PTR_TO_UINT32(str) & 0xf)) {

Same comment as in bug 586698 about pre-computing these lengths to avoid the per-character double-test.

> +    __m128i in = _mm_load_si128((__m128i *) str);

This uses a C-style cast, but everywhere else you convert pointers you use a reinterpret_cast<>. These should be consistent (especially since you're discarding the const qualifier, for no real reason that I can see).
(In reply to comment #16)
> Comment on attachment 469542 [details] [diff] [review]
> Patch v1.7 (rebased v1.6 to tip)
> 
> > +ifneq (,$(filter x86 x86_64,$(CPU_ARCH)))
> 
> This test is yet again different from the one in bug 586698. Is there a reason
> why?

It should test on INTEL_ARCHITECTURE; I just hadn't modified the patch to reflect bug 585818.

> >  #include "nsTraceRefcnt.h"
> > +
> >  class nsString;
> 
> Whitespace-only change. Arguably a good one, but since it's the only
> modification in this file, perhaps inadvertent.

Indeed it's inadvertent.  Thanks.
Attached patch Patch v2 (obsolete) — Splinter Review
After rewriting all the code to use array offsets (and incorporating Tim's suggestion about precomputing the loop bounds), the unvectorized code runs the benchmark in this bug just as fast as the vectorized code!

I suspect this might not be the case on a 32-bit machine, where in the unvectorized case, we have to do twice as many operations as I get to do on my 64-bit machine.

I also added a simple testcase.
Attachment #469542 - Attachment is obsolete: true
Attachment #492892 - Flags: review?(tterribe)
Attachment #469542 - Flags: review?(jones.chris.g)
Attached patch Patch v2.1Splinter Review
Forgot a whitespace fix.
Attachment #492892 - Attachment is obsolete: true
Attachment #492893 - Flags: review?(tterribe)
Attachment #492892 - Flags: review?(tterribe)
Attachment #464450 - Attachment description: Tesetcase v1.1 → Benchmark v1.1
Comment on attachment 492893 [details] [diff] [review]
Patch v2.1

r+=me
Attachment #492893 - Flags: review?(tterribe) → review+
Comment on attachment 492893 [details] [diff] [review]
Patch v2.1

I think this needs a review from someone in content, too.  jst?
Attachment #492893 - Flags: review?(jst)
Attachment #492893 - Flags: review?(jst) → review+
Comment on attachment 492893 [details] [diff] [review]
Patch v2.1

Requesting a2.0.

This plus bug 586698 yields a significant perf improvement when creating text nodes.  It's a self-contained change which shouldn't have strange interactions with other code, although of course it's string processing, so mistakes would be fatal.
Attachment #492893 - Flags: approval2.0?
Attachment #492893 - Flags: approval2.0? → approval2.0+
What's the status here? It doesn't look like this landed and the approval is nearly 4 weeks old.
It's blocked on bug 585708, which depends on bug 619178 and needs to land together with bug 616778 and bug 616782.  The latter three bugs still need review, and two of them are blockers.
Although this is now (finally) unblocked, I think it's too late to check this
in for 4.0, so I'm clearing the a2.0 flag.  Feel free to renom if you disagree!
Attachment #492893 - Flags: approval2.0+
Whiteboard: [check-in-after-branch]
Blocks: post2.0
No longer blocks: post2.0
Depends on: post2.0
I tried to land this on cedar, but the patch doesn't apply cleanly any more.  Please submit a new version of the patch, or land it on cedar yourself.
What's the timeline for landing on cedar?
(In reply to comment #27)
> What's the timeline for landing on cedar?

Land when it's ready.  :)
Whiteboard: [check-in-after-branch] → [check-in-after-branch][not-ready]
Fixed in Cedar: http://hg.mozilla.org/projects/cedar/rev/712063da5959
Whiteboard: [check-in-after-branch][not-ready] → [fixed-in-cedar]
http://hg.mozilla.org/mozilla-central/rev/712063da5959
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Depends on: 818190
Blocks: 901529
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: