Closed Bug 577883 Opened 14 years ago Closed 14 years ago

TM: inline unit string comparison

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: gal)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

      No description provided.
Depends on: 577882
sayrer and I both found this issue separately over the last 2 years, time to finally fix it. I will try to find the original bug(s) and dup it/them. Tired and want to upload this.
Attached patch patch (obsolete) — Splinter Review
Measured it again. This is closer to a 30ms/7% speedup with the other patch actually (on sunspider).
Assignee: general → gal
Attachment #456712 - Flags: review?(nnethercote)
Comment on attachment 456712 [details] [diff] [review]
patch

>+template <typename T>
>+static inline bool CompareChars(const jschar **s1p, const jschar **s2p, size_t *np)
>+{
>+    const size_t N = sizeof(T) / sizeof(jschar);
>+    while (*np >= N) {
>+        if (*reinterpret_cast<const T *>(*s1p) != *reinterpret_cast<const T *>(*s2p))
>+            return false;
>+        (*np) -= N;
>+        (*s1p) += N;
>+        (*s2p) += N;
>+    }
>+    return true;
>+}

CompareChars is an awful name for a function returning a bool -- what does a true result mean?
"CharsAreEqual" or something similar would be better.


>+    inline bool equals(JSString *other) {
>+        /* Fast case: pointer equality could be a quick win. */
>+        if (this == other)
>+            return true;
>+
>+        size_t n = length();
>+        if (n != other->length())
>+            return false;
>+
>+        if (n == 0)
>+            return true;
>+
>+        const jschar *s1 = chars();
>+        const jschar *s2 = other->chars();
>+        if (n == 1)
>+            return *s1 == *s2;
>+
>+        if (sizeof(jsuword) == 8 && !js::CompareChars<jsuword>(&s1, &s2, &n))
>+            return false;
>+        if (!js::CompareChars<uint32>(&s1, &s2, &n))
>+            return false;
>+        JS_ASSERT(n == 0 || n == 1);
>+        return n == 0 || *s1 == *s2;
>+    }

Barf.  So CompareChars is effectively unrolling the loop?  On 64-bits you
first try comparing 8 bytes at a time, and then fall back to 4 bytes?  I
guess you're assuming the compiler will eliminate the "sizeof(jsuword)..."
test on 32-bit machines as dead code?  What happens if the size of the
string is not a multiple of 8 or 4 -- won't you be comparing values past the
end of the string?

In general, there's way too much going on here.  Comments would help.  I also think the template class confuses things.  Just making the two instances explicit would probably be no longer but much easier to understand.

I only skimmed the rest of the patch, but I think the same problems occur as
in the patch for bug 577882 -- LIR_uc2ui used instead of LIR_us2ui and you're
assuming the string is flat.
Attachment #456712 - Flags: review?(nnethercote) → review-
Is the unit string bit worth doing in JM?
Back to the patch:  you added inlining, plus you special-cased 1-length strings in equals(), plus you did the loop unrolling in equals().  Any idea the perf benefits of these individually?

Also, would using memcmp() instead of unrolling the loop yourself be smarter?  Presumably the system memcmp() is optimised already.
> Presumably the system memcmp() is optimised already.

We discovered this to not necessarily be the case on some operating systems when experimenting with indexOf.  See 551539 for details.
(In reply to comment #6)
> > Presumably the system memcmp() is optimised already.
> 
> We discovered this to not necessarily be the case on some operating systems
> when experimenting with indexOf.  See 551539 for details.

That's the kind of thing that I would *love* to see in a comment!
The comments added with the code in bug 551539 mention the issue, yes.  Though note that memcmp was faster than manual loop on Windows/Mac for long enough strings, too.
Summary: TM: inline unit string comparison and speed up js_StringEquals → TM: inline unit string comparison
Attached patch patchSplinter Review
Attachment #456712 - Attachment is obsolete: true
Cachegrind results of this patch plus the one in bug 584499:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|   111.202   111.487 (0.997x) |    20.797    20.809 (0.999x) | 3d-cube
|    56.990    56.948 (1.001x) |    22.313    22.313 (------) | 3d-morph
|   147.820   146.788 (1.007x) |    21.982    21.971 (1.001x) | 3d-raytrace
|    77.548    77.419 (1.002x) |    17.401    17.400 (------) | access-binary-
|   183.728   183.440 (1.002x) |   103.707   103.704 (------) | access-fannkuc
|    41.565    41.403 (1.004x) |    17.553    17.551 (------) | access-nbody
|    60.014    59.983 (1.001x) |    27.013    27.012 (------) | access-nsieve
|    15.688    15.681 (------) |     2.862     2.862 (------) | bitops-3bit-bi
|    43.303    43.285 (------) |    30.280    30.280 (------) | bitops-bits-in
|    22.773    22.770 (------) |    10.219    10.219 (------) | bitops-bitwise
|    71.624    71.610 (------) |    39.453    39.453 (------) | bitops-nsieve-
|    42.397    42.309 (1.002x) |    26.655    26.654 (------) | controlflow-re
|    47.115    47.002 (1.002x) |     5.541     5.660 (0.979x) | crypto-md5
|    31.336    31.286 (1.002x) |     7.030     7.109 (0.989x) | crypto-sha1
|   127.956   121.973 (1.049x) |    10.461    12.076 (0.866x) | date-format-to
|    99.144    98.905 (1.002x) |    11.118    11.116 (------) | date-format-xp
|    52.617    52.594 (------) |    27.614    27.614 (------) | math-cordic
|    38.416    38.406 (------) |     6.252     6.252 (------) | math-partial-s
|    28.590    28.474 (1.004x) |    10.286    10.285 (------) | math-spectral-
|   100.938   100.914 (------) |    75.045    75.045 (------) | regexp-dna
|    44.077    40.705 (1.083x) |     8.098    10.397 (0.779x) | string-base64
|   123.212   123.302 (0.999x) |    24.544    24.543 (------) | string-fasta
|   272.611   269.834 (1.010x) |     6.840     6.671 (1.025x) | string-tagclou
|   251.422   251.321 (------) |     5.637     5.635 (------) | string-unpack-
|    64.493    63.695 (1.013x) |     7.401     7.521 (0.984x) | string-validat
-------
|  2156.590  2141.546 (1.007x) |   546.114   550.163 (0.993x) | all

Basically it's a good improvement for date-format-tofte, little difference for anything else.
Comment on attachment 463431 [details] [diff] [review]
patch

>+            if (!l_str->isRope() && !r_str->isRope() && l_str->length() == 1 && r_str->length() == 1) {
>+                VMSideExit *exit = snapshot(BRANCH_EXIT);
>+                guard(true, lir->ins2(LIR_eqp, getStringLength(l_ins), INS_CONSTWORD(1)), exit);
>+                guard(true, lir->ins2(LIR_eqp, getStringLength(r_ins), INS_CONSTWORD(1)), exit);

Factor out INS_CONSTWORD(1) so CseFilter doesn't have to?


>+JS_REQUIRES_STACK LIns*
>+TraceRecorder::getStringLength(LIns* str_ins)
>+{
>+    return lir->ins2ImmI(LIR_rshup,
>+                         lir->insLoad(LIR_ldp, str_ins, offsetof(JSString, mLengthAndFlags),
>+                                      ACCSET_OTHER, LOAD_CONST),
>+                         JSString::FLAGS_LENGTH_SHIFT);
>+}
>+
>+JS_REQUIRES_STACK LIns*
>+TraceRecorder::getStringChars(LIns* str_ins)
>+{
>+    return lir->insLoad(LIR_ldp, str_ins, offsetof(JSString, mChars), ACCSET_OTHER, LOAD_CONST);
>+}

I like these.  I'll like them even more if you use addName() to give the
values names like "strLen" and "strChars".  You could do likewise in
getStringLength() with mLengthAndFlags.

Nice patch!
Attachment #463431 - Flags: review+
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/c15ed7c71e27
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 585309
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: