Closed
Bug 585978
Opened 14 years ago
Closed 14 years ago
Add SSE implementation of is-ascii check in nsTextFragment
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla5
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
(Keywords: perf)
Attachments
(2 files, 9 obsolete files)
Split off from bug 582858. Vectorize the following loop in nsTextFragment.cpp: PRBool need2 = PR_FALSE; while (ucp < uend) { PRUnichar ch = *ucp++; if (ch >= 256) { need2 = PR_TRUE; break; } }
Assignee | ||
Updated•14 years ago
|
Attachment #464448 -
Flags: review?(vladimir)
Assignee | ||
Updated•14 years ago
|
Assignee | ||
Comment 1•14 years ago
|
||
Benchmark results for the attached patch, testcase v1.1, and based atop b7836c3a63db: Trunk: 220ms (first run), 155ms (later runs) Patched: 180ms (first run), 122ms (later runs) This is surprisingly only 30% faster, but the assembly it's generating looks OK to me.
Assignee | ||
Comment 2•14 years ago
|
||
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Assignee | ||
Updated•14 years ago
|
Attachment #464448 -
Attachment is patch: true
Attachment #464448 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 3•14 years ago
|
||
We should also probably vectorize LossyConvertEncoding<PRUnichar, char> (in nsUTF8Utils.h). There's a comment in nsTextFragment that says it's "likely to be carefully tuned", but in fact it seems pretty naive.
Assignee | ||
Comment 4•14 years ago
|
||
There are apparently two copies of this loop in the code. This one replaces them both.
Attachment #464448 -
Attachment is obsolete: true
Attachment #464510 -
Flags: review?(vladimir)
Attachment #464448 -
Flags: review?(vladimir)
Assignee | ||
Comment 5•14 years ago
|
||
Build changes: Now based on patch v2 in bug 585818 (and patch v1 in bug 585708). Sorry for the churn.
Attachment #464510 -
Attachment is obsolete: true
Attachment #464556 -
Flags: review?(vladimir)
Attachment #464510 -
Flags: review?(vladimir)
Assignee | ||
Comment 6•14 years ago
|
||
One more time...
Attachment #464556 -
Attachment is obsolete: true
Attachment #464565 -
Flags: review?(vladimir)
Attachment #464556 -
Flags: review?(vladimir)
Assignee | ||
Comment 7•14 years ago
|
||
Wow, this is actually very much not right. Fix soon.
Assignee | ||
Updated•14 years ago
|
Attachment #464565 -
Flags: review?(vladimir)
Assignee | ||
Comment 8•14 years ago
|
||
This one should be much more right.
Attachment #464565 -
Attachment is obsolete: true
Attachment #465398 -
Flags: review?(vladimir)
Assignee | ||
Updated•14 years ago
|
Summary: Vectorize is-ascii check in nsTextFragment → Add SSE implementation of is-ascii check in nsTextFragment
Assignee | ||
Comment 9•14 years ago
|
||
Comment on attachment 465398 [details] [diff] [review] Patch v1.4 Vlad says he doesn't think he's the right person to review here; untagging him.
Attachment #465398 -
Flags: review?(vladimir) → review?
Comment 10•14 years ago
|
||
I have no experience with the integer part of SSE, so I am very hesitant to call myself reviewer. I can just comment on this for now: +static inline int +is_zero (__m128i x) +{ + return + _mm_movemask_epi8(_mm_cmpeq_epi8 (x, _mm_setzero_si128 ())) == 0xffff; +} This function should return a bool, it seems :-)
Assignee | ||
Comment 11•14 years ago
|
||
Updating Makefile logic so it works properly on Linux 32.
Attachment #465398 -
Attachment is obsolete: true
Attachment #468482 -
Flags: review?(jones.chris.g)
Attachment #465398 -
Flags: review?
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment 10) > This function should return a bool, it seems :-) Fixed, thanks.
Attachment #468482 -
Attachment is obsolete: true
Attachment #468489 -
Flags: review?(jones.chris.g)
Attachment #468482 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 13•14 years ago
|
||
Attachment #468489 -
Attachment is obsolete: true
Attachment #469542 -
Flags: review?(jones.chris.g)
Attachment #468489 -
Flags: review?(jones.chris.g)
Hi Justin, I'm pretty swamped with fennec b1 work and won't be able to do this review until probably the end of next week; it's going to need a very careful read. If you need the review sooner than that, please request from someone else. Sorry!
Assignee | ||
Comment 15•14 years ago
|
||
That's all right -- I'm still waiting on a review on bug 585708 anyway. :)
Comment 16•14 years ago
|
||
Comment on attachment 469542 [details] [diff] [review] Patch v1.7 (rebased v1.6 to tip) > +ifneq (,$(filter x86 x86_64,$(CPU_ARCH))) This test is yet again different from the one in bug 586698. Is there a reason why? > #include "nsTraceRefcnt.h" > + > class nsString; Whitespace-only change. Arguably a good one, but since it's the only modification in this file, perhaps inadvertent. > + while (str < end && (NS_PTR_TO_UINT32(str) & alignMask)) { > ... > + while (end > str && (NS_PTR_TO_UINT32(str) & 0xf)) { Same comment as in bug 586698 about pre-computing these lengths to avoid the per-character double-test. > + __m128i in = _mm_load_si128((__m128i *) str); This uses a C-style cast, but everywhere else you convert pointers you use a reinterpret_cast<>. These should be consistent (especially since you're discarding the const qualifier, for no real reason that I can see).
Assignee | ||
Comment 17•14 years ago
|
||
(In reply to comment #16) > Comment on attachment 469542 [details] [diff] [review] > Patch v1.7 (rebased v1.6 to tip) > > > +ifneq (,$(filter x86 x86_64,$(CPU_ARCH))) > > This test is yet again different from the one in bug 586698. Is there a reason > why? It should test on INTEL_ARCHITECTURE; I just hadn't modified the patch to reflect bug 585818. > > #include "nsTraceRefcnt.h" > > + > > class nsString; > > Whitespace-only change. Arguably a good one, but since it's the only > modification in this file, perhaps inadvertent. Indeed it's inadvertent. Thanks.
Assignee | ||
Comment 18•14 years ago
|
||
After rewriting all the code to use array offsets (and incorporating Tim's suggestion about precomputing the loop bounds), the unvectorized code runs the benchmark in this bug just as fast as the vectorized code! I suspect this might not be the case on a 32-bit machine, where in the unvectorized case, we have to do twice as many operations as I get to do on my 64-bit machine. I also added a simple testcase.
Attachment #469542 -
Attachment is obsolete: true
Attachment #492892 -
Flags: review?(tterribe)
Attachment #469542 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 19•14 years ago
|
||
Forgot a whitespace fix.
Attachment #492892 -
Attachment is obsolete: true
Attachment #492893 -
Flags: review?(tterribe)
Attachment #492892 -
Flags: review?(tterribe)
Assignee | ||
Updated•14 years ago
|
Attachment #464450 -
Attachment description: Tesetcase v1.1 → Benchmark v1.1
Comment 20•14 years ago
|
||
Comment on attachment 492893 [details] [diff] [review] Patch v2.1 r+=me
Attachment #492893 -
Flags: review?(tterribe) → review+
Assignee | ||
Comment 21•14 years ago
|
||
Comment on attachment 492893 [details] [diff] [review] Patch v2.1 I think this needs a review from someone in content, too. jst?
Attachment #492893 -
Flags: review?(jst)
Updated•14 years ago
|
Attachment #492893 -
Flags: review?(jst) → review+
Assignee | ||
Comment 22•14 years ago
|
||
Comment on attachment 492893 [details] [diff] [review] Patch v2.1 Requesting a2.0. This plus bug 586698 yields a significant perf improvement when creating text nodes. It's a self-contained change which shouldn't have strange interactions with other code, although of course it's string processing, so mistakes would be fatal.
Attachment #492893 -
Flags: approval2.0?
Updated•14 years ago
|
Attachment #492893 -
Flags: approval2.0? → approval2.0+
Comment 23•14 years ago
|
||
What's the status here? It doesn't look like this landed and the approval is nearly 4 weeks old.
Assignee | ||
Comment 24•14 years ago
|
||
It's blocked on bug 585708, which depends on bug 619178 and needs to land together with bug 616778 and bug 616782. The latter three bugs still need review, and two of them are blockers.
Assignee | ||
Comment 25•14 years ago
|
||
Although this is now (finally) unblocked, I think it's too late to check this in for 4.0, so I'm clearing the a2.0 flag. Feel free to renom if you disagree!
Assignee | ||
Updated•14 years ago
|
Attachment #492893 -
Flags: approval2.0+
Assignee | ||
Updated•14 years ago
|
Whiteboard: [check-in-after-branch]
Updated•14 years ago
|
Comment 26•14 years ago
|
||
I tried to land this on cedar, but the patch doesn't apply cleanly any more. Please submit a new version of the patch, or land it on cedar yourself.
Assignee | ||
Comment 27•14 years ago
|
||
What's the timeline for landing on cedar?
Comment 28•14 years ago
|
||
(In reply to comment #27) > What's the timeline for landing on cedar? Land when it's ready. :)
Updated•14 years ago
|
Whiteboard: [check-in-after-branch] → [check-in-after-branch][not-ready]
Assignee | ||
Comment 29•14 years ago
|
||
Fixed in Cedar: http://hg.mozilla.org/projects/cedar/rev/712063da5959
Whiteboard: [check-in-after-branch][not-ready] → [fixed-in-cedar]
Comment 30•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/712063da5959
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•