Adjust the string matcher to use wmemchr

RESOLVED FIXED in mozilla32

Status

()

defect
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: h4writer, Assigned: eternalsushant)

Tracking

unspecified
mozilla32
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [good first bug][mentor=h4writer][lang=c++])

Attachments

(1 attachment, 10 obsolete attachments)

Reporter

Description

5 years ago
MemChrMatch in js/src/jsstr.cpp uses memchr to find the first occurence of a 8bit char in a string. Our strings are actually 16bit chars, so there is some patching to get memchr working on 16bit strings.

Now there exists wmemchr. This works on 16bit or 32bit chars. For the places where wmemchr works on 16bit chars we could use wmemchr! As far as I know wmemchr would work on window machines. On unix machines it uses 32bit chars.

This bug is to test this:
1) Test if the 16bit wmemchr improves performance over the hacked up memchr where available.
2) Create a patch to actually implement this for these platforms.
Reporter

Comment 1

5 years ago
The testcase in bug 965712 can be used to test performance of our string matches.
Whiteboard: [good first bug][mentor=h4writer][lang=c++]
Assignee

Comment 2

5 years ago
Hey, I would like to work on this bug. As this would be my first bug I would like additional guidance. Can I contact you over IRC?
Reporter

Comment 3

5 years ago
(In reply to Sushant Dinesh from comment #2)
> Hey, I would like to work on this bug. As this would be my first bug I would
> like additional guidance. Can I contact you over IRC?

Sure you can contact me over IRC. I'm in #jsapi and #ionmonkey. My irc handle is h4writer.

First of all I need to tell you that the memchr patch (part 2 in bug 965712) hasn't landed, since it was a regression on Mac. (see bug 965712 comment 15). It would be possible to import the glibc version directly (see bug 965712 comment 19). My very limited testing just showed no improvement going hat way.

So there are some things you can do (from easy to difficult):

1) Just ignore memchr and try wmemchr for windows/mac and see how it performs against Unrolledmatch. This is what this bug describes.

2) Try to improve the UnrolledMatch with tricks of memchr/wmemchr. We only unrolled the loop, but we can go further. Looking at memchr source, it iterates over "long ints (4 byte at least)" instead of "8bit chars". We could add this trick to UnrolledMatch too.
(e.g. another source http://code.metager.de/source/xref/gnu/glibc/string/memchr.c) 

3) Have a look at the memchr patch. Maybe I overlooked something? the glibc version uses knowledge about the distribution of chars. Now since we hacked it to run on 16bit chars 8bit at a time, this distribution will be off. So that could be a reason. This would be for experts.
Reporter

Updated

5 years ago
Assignee: nobody → eternalsushant
Reporter

Comment 5

5 years ago
(In reply to Sushant Dinesh from comment #4)
> Created attachment 8426894 [details] [diff] [review]
> part1 - Try using wmemchar instead of unrolledloop

Sorry for the delay in testing and thanks for trying this.
So wmemchar would only work on Windows and I get the following numbers:

Before:
70 ms
138 ms
220 ms

After:
63 ms
121 ms
197 ms

So that is a 11% improvement.

Now you are busy with idea 2, right. Is it working? No problems?
(This week I will be hard to reach on IRC. So feel free to comment in the bug or send me a mail.)
Assignee

Comment 6

5 years ago
(In reply to Hannes Verschore [:h4writer] from comment #5)
Hey, Thanks for that :) Good to know there is an improvement.
I was busy over the weekend so I couldn't do much. However I feel that I must be able to complete the idea 2 by tomorrow (If I don't run into any troubles :P) or max day after :)
Assignee

Comment 7

5 years ago
Hey,
I want to know how I can debug this effectively. For some reason I think its returning -1 instead of match. How do I go about tracing the flow of the program?
Reporter

Comment 8

5 years ago
Sushant found the following:

> On windows: Wmemchr           > Unrolled match == memchr hack
> On mac:     Unrolled matcher >> Memchr hack
> On linux:   Memchr hack       > Unrolled matcher > Implemented memchr16

* idea 1 gave a 11% improvement
* idea 2 didn't give us a speed improvement

Also we might have 8bit chars soonish (bug 998392).

So we came up with the following plan of attack.
We would make 3 first char matchers:

1) FirstCharMatcher_16bit
2) FirstCharMatcher_8bit
3) FirstCharMatcher_unrolled<size>

=> FirstCharMatcher_16bit (default)
-> uses wmemchr where size_of(wchar_t) == size_of(jschar)
-> uses FirstCharMatcher_unrolled<jschar *> for clang
-> uses FirstCharMatcher_8bit using the idea in bug 965712 part 2 for other cases

=> FirstCharMatcher_8bit
-> uses FirstCharMatcher_unrolled<char *> for clang
-> uses memchr in all other cases

=> FirstCharMatcher_unrolled<size>
-> is the current UnrolledMatcher code

So this gives us the faster implementation without adding too much complexity and we will have already a 8bit matcher present for comparing two 8bit strings
Assignee

Comment 9

5 years ago
Hey, I finished coding up the improved matcher in the way we discussed. I was waiting around in IRC to discuss what to do next. Do let me know when you're online/free :P
Reporter

Comment 10

5 years ago
Awesome! Today was a holiday and tomorrow I won't be online. I will be Monday. Now you can already add the patch online. If you find how. Maybe I will be able to have a look Sunday. Also if you still know how to run the jit-test. You can always do that. I'll be online Monday for sure. So I'll answer your questions than ;). Thanks
Assignee

Comment 12

5 years ago
Attachment #8432457 - Attachment is obsolete: true
Attachment #8432457 - Flags: review?(hv1989)
Attachment #8432505 - Flags: review?(hv1989)
Assignee

Comment 13

5 years ago
Attachment #8432505 - Attachment is obsolete: true
Attachment #8432505 - Flags: review?(hv1989)
Attachment #8432523 - Flags: review?(hv1989)
Reporter

Updated

5 years ago
Attachment #8426894 - Attachment is obsolete: true
Reporter

Comment 14

5 years ago
Comment on attachment 8432523 [details] [diff] [review]
Improved the matcher. Added support for future 8bit chars.

Review of attachment 8432523 [details] [diff] [review]:
-----------------------------------------------------------------

Looks already quite good. Good job.
Don't be discouraged by the looks of a lot of remarks. Most are just style nits.
After iterating some style nits a few times, I stopped marking them, but the style nits apply to the whole patch.
After making the changes, you can add the new patch again and flag me for review again.
Thanks

::: js/src/jsstr.cpp
@@ +1062,5 @@
>      }
>  };
>  
> +template <typename TextChar,typename PatChar>
> +    static const TextChar*

Style nit: no spaces before "static"

@@ +1063,5 @@
>  };
>  
> +template <typename TextChar,typename PatChar>
> +    static const TextChar*
> +UnrolledMatch(const TextChar *text,uint32_t textlen,const PatChar *pat,uint32_t patlen,uint32_t index)

Can you rename this to: FirstCharMatcherUnrolled

Style nit: We like everything to breath a bit. So after function declaration and after the , there should be a space.
Function(Foo test,Foo2 test)
Function (Foo test, Foo2 test)

@@ +1068,5 @@
>  {
>      JS_ASSERT(patlen > 0 && textlen > 0);
> +    const TextChar *textend = text + textlen - (patlen - 1);
> +    const PatChar p0 = *pat;
> +    const TextChar *t = text+index;

Style nits:
1) Can you add spaces nexto the +. (Let the operators breath ;))
2) Can you also add a newline after this line (personal taste);

@@ +1080,5 @@
> +	case 2: if (*t == p0) goto match;
> +	case 1: if (*t == p0) goto match;
> +    }
> +    while ((t-text) <= (textend-text)) {
> +	if (t[0] == p0) { t += 0;goto match;}

Style nits: After ";" there should be a space. Same applies for the following lines.

@@ +1096,5 @@
> +    }
> +    return nullptr;
> +}
> +
> +#if WCHAR_MAX > 65536 && !defined(__clang__)

Can this now get removed, since it is now used in Matcher?

@@ +1097,5 @@
> +    return nullptr;
> +}
> +
> +#if WCHAR_MAX > 65536 && !defined(__clang__)
> +    static const char*

Style nit: space

@@ +1098,5 @@
> +}
> +
> +#if WCHAR_MAX > 65536 && !defined(__clang__)
> +    static const char*
> +FirstCharMatcher_8bit(const char *text,uint32_t textlen,const char *pat,uint32_t patlen,uint32_t index)

Can you rename it to "FirstCharMatcher8it". My fault, I shouldn't have suggested to put a "_" in it.
Style nit: add spaces

@@ +1101,5 @@
> +    static const char*
> +FirstCharMatcher_8bit(const char *text,uint32_t textlen,const char *pat,uint32_t patlen,uint32_t index)
> +{
> +#if defined(__clang__)
> +    return UnrolledMatch<char,char>(text,textlen,pat,patlen);

Style nit: add spaces

@@ +1103,5 @@
> +{
> +#if defined(__clang__)
> +    return UnrolledMatch<char,char>(text,textlen,pat,patlen);
> +#else
> +    return reinterpret_cast<const char*>(memchr(text+index,pat[0],textlen-index+1));

Style nit: add spaces (after , and before and after + -

@@ +1109,5 @@
> +}
> +#endif
> +
> +    static const jschar *
> +FirstCharMatcher_16bit(const jschar *text,uint32_t textlen,const jschar *pat,uint32_t patlen,uint32_t index)

Can you rename it to "FirstCharMatcher16bit". My fault, I shouldn't have suggested to put a "_" in it.

@@ +1112,5 @@
> +    static const jschar *
> +FirstCharMatcher_16bit(const jschar *text,uint32_t textlen,const jschar *pat,uint32_t patlen,uint32_t index)
> +{
> +    //NEED TO VERIFY IF THIS WORKS. 65536 = 2^16 MAX VALUE OF 16bit INT.
> +#if WCHAR_MAX <= 65536

Can you do:
WCHAR_MAX == INT16_MAX
and remove the comment?

@@ +1122,5 @@
> +#elif defined(__clang__)
> +    /* Performance under memchr is horrible in clang. Hence it is best to use UnrolledMatcher in this case */
> +    return UnrolledMatch<jschar,jschar>(text,textlen,pat,patlen,index);
> +#else
> +    /* For linux the best performance is obtained by slightly hacking memchr.

Style nit: multiline comments go like:

/*
 * something
 * something
 */

@@ +1126,5 @@
> +    /* For linux the best performance is obtained by slightly hacking memchr.
> +     * memchr works only on 8bit char but jschar is 16bit.
> +     * So we treat jschar in blocks of 8bit and use memchr.
> +     */
> +    const char *text8 = (const char *)text;

Style nit: (const char *) text;

@@ +1131,5 @@
> +    const char *pat8 = (const char *)pat;
> +    uint32_t n = (textlen-patlen) * 2;
> +    uint32_t i = index*2;
> +    while(i <= n)
> +    {

Style nit: we put the { on the same line as the while. Also a space between while and (
while (i <= n) {

@@ +1137,5 @@
> +	if(pos8 == nullptr)
> +	    return nullptr;
> +	i = static_cast<uint32_t>(pos8-text8);
> +	if(i%2 != 0)
> +	{

Style nit: we put the { on the same line as the if. Also a space between if and (
Attachment #8432523 - Flags: review?(hv1989)
Assignee

Comment 15

5 years ago
Attachment #8432523 - Attachment is obsolete: true
Attachment #8432644 - Flags: review?(hv1989)
Reporter

Comment 16

5 years ago
Comment on attachment 8432644 [details] [diff] [review]
Improved the matcher. Added support for future 8bit chars.

Review of attachment 8432644 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Though I would like to have another look before r+ it, because of the changes requested to FirstCharMatcherUnrolled and the removal of some of the arguments of the FirstCharMatchers.

::: js/src/jsstr.cpp
@@ +1063,5 @@
>  };
>  
> +template <typename TextChar, typename PatChar>
> +static const TextChar*
> +FirstCharMatcherUnrolled (const TextChar *text, uint32_t textlen, const PatChar *pat, uint32_t patlen, uint32_t index)

All "FirstCharMatcherXXX" don't need the index parameter. See my comment in Matcher;

@@ +1071,5 @@
> +    const PatChar p0 = *pat;
> +    const TextChar *t = text + index;
> +
> +    switch ((textend - t) & 7) {
> +	case 0: if (*t == p0) goto match;

1) Something went wrong here during copying. This should be "*t++" instead of "*t". Also for all next lines in the switch.
2) Another improvement possible here is that we can remove the label match here. Just replace it with "return t" in the switch and in the while, change the "t += X; goto match;" to "return t + X;" (I think that should be possible).

@@ +1080,5 @@
> +	case 3: if (*t == p0) goto match;
> +	case 2: if (*t == p0) goto match;
> +	case 1: if (*t == p0) goto match;
> +    }
> +    while ((t - text) <= (textend - text)) {

With the index removed, this can be again:
 while (t != textend) {

@@ +1104,5 @@
> +#if  defined(__clang__)
> +    return FirstCharMatcherUnrolled<char, char>(text, textlen, pat, patlen);
> +#else
> +    return reinterpret_cast<const char*>(memchr(text + index, pat[0], textlen - index + 1));
> +#endif 

Style nit: spaces at the end

@@ +1111,5 @@
> +static const jschar *
> +FirstCharMatcher16bit (const jschar *text, uint32_t textlen, const jschar *pat, uint32_t patlen, uint32_t index)
> +{
> +#if WCHAR_MAX == INT16_MAX
> +    /* Wmemchr works the best. 

Style nit: spaces at the end

@@ +1113,5 @@
> +{
> +#if WCHAR_MAX == INT16_MAX
> +    /* Wmemchr works the best. 
> +     * But only possible to use this when when size of jschar = size of wchar_t.
> +     */

I think I didn't explain it good. Comments are done as following:

One line, max 80 chars
/* foooooooooooooooo */

multiple lines. So the first line is empty (again cutoff at 80 character):
/*
 * ffooooooooo fofofooooooo
 * fooooooooo
 */

@@ +1120,5 @@
> +    const wchar_t *wpat = (const wchar_t*) pat;
> +    return (jschar *) (wmemchr(wtext + index, wpat[0], n - index + 1));
> +#elif defined(__clang__)
> +    /* Performance under memchr is horrible in clang. 
> +     * Hence it is best to use UnrolledMatcher in this case 

Style nit: spaces at the end

@@ +1141,5 @@
> +	if (i%2 != 0) {
> +	    i++;
> +	    continue;
> +	}
> +	if (pat8[1]==text8[i + 1])

Style nit: put spaces around the ==

@@ +1143,5 @@
> +	    continue;
> +	}
> +	if (pat8[1]==text8[i + 1])
> +	    return (text + (i/2));
> +	i+=2;

Style nit: Put spaces around the +=

@@ +1146,5 @@
> +	    return (text + (i/2));
> +	i+=2;
> +    }
> +    return nullptr;
> +#endif 

Style nit: spaces at the end

@@ +1157,2 @@
>      const typename InnerMatch::Extent extent = InnerMatch::computeExtent(pat, patlen);
> +    uint32_t i=0;

Style nit: Put spaces around the =

@@ +1163,5 @@
> +	    pos = (TextChar *) FirstCharMatcher16bit(text, textlen, pat, patlen, i);
> +	else if (sizeof(TextChar) == 1 && sizeof(PatChar) == 1)
> +	    pos = (TextChar *) FirstCharMatcher8bit((char *) text, textlen, (char *) pat, patlen, i);
> +	else
> +	    pos = (TextChar *) FirstCharMatcherUnrolled<TextChar, PatChar>(text, textlen, pat, patlen, i);

1) For all FirstCharMatcher we don't need an extra parameter i. You can just do:
- FirstCharMatcherUnrolled<TextChar, PatChar>(text+i, textlen-i, pat, patlen);
2) Secondly we can remove the patlen parameter. It isn't used. Bonus if you only give pat[0] as parameter to the FirstCharMatcherXXX

@@ +1231,3 @@
>                          :
>  #endif
> +                          Matcher<ManualCmp,jschar,jschar>(text, textlen, pat, patlen);

Here also spaces after the ","
Attachment #8432644 - Flags: review?(hv1989)
Assignee

Comment 17

5 years ago
Posted patch Improved StringMatcher (obsolete) — Splinter Review
Attachment #8432644 - Attachment is obsolete: true
Attachment #8433270 - Flags: review?(hv1989)
Assignee

Comment 18

5 years ago
Posted patch Improved StringMatcher (obsolete) — Splinter Review
Attachment #8433270 - Attachment is obsolete: true
Attachment #8433270 - Flags: review?(hv1989)
Attachment #8433280 - Flags: review?(hv1989)
Reporter

Comment 19

5 years ago
Comment on attachment 8433280 [details] [diff] [review]
Improved StringMatcher

Review of attachment 8433280 [details] [diff] [review]:
-----------------------------------------------------------------

1) For being more clear, let n be "text - patlen + 1". That way we always iterate till "i < n" instead of "i < n+1"
2) Some extra comments

I'll give already r+. Can you upload a new patch with the changes.
Afterwards I'll push it to the tryserver, so we have a full round of tests on all platforms for this.

::: js/src/jsstr.cpp
@@ +1078,5 @@
> +	case 3: if (*t++ == pat) return t - 1;
> +	case 2: if (*t++ == pat) return t - 1;
> +	case 1: if (*t++ == pat) return t - 1;
> +    }
> +    while (textend - t > 0) {

This can be "textend != t" again.

@@ +1098,5 @@
> +{
> +#if  defined(__clang__)
> +    return FirstCharMatcherUnrolled<char, char>(text, n, pat);
> +#else
> +    return reinterpret_cast<const char*>(memchr(text, pat, n + 1));

use n, instead of n + 1

@@ +1111,5 @@
> +     * Wmemchr works the best.
> +     * But only possible to use this when,
> +     * size of jschar = size of wchar_t.
> +     */
> +    const wchar_t *wtext = (const wchar_t*) text;

Style nit: (const wchar_t *)

@@ +1113,5 @@
> +     * size of jschar = size of wchar_t.
> +     */
> +    const wchar_t *wtext = (const wchar_t*) text;
> +    const wchar_t wpat = (const wchar_t) pat;
> +    return (jschar *) (wmemchr(wtext, wpat, n + 1));

use n, instead of n + 1

@@ +1129,5 @@
> +     */
> +    const char *text8 = (const char *) text;
> +    char pat8[2];
> +    pat8[1] = pat & 255<<8;
> +    pat8[0] = pat & 255;

These 3 lines are the same as:
const char *pat8 = (const char *) pat;

@@ +1133,5 @@
> +    pat8[0] = pat & 255;
> +    n = n * 2;
> +    uint32_t i = 0;
> +    while (i <= n) {
> +	const char *pos8 = FirstCharMatcher8bit(text8 + i, n, pat8[0]);

Add comment:
/* Find the first 8 bits of the 16bit first character. */

@@ +1136,5 @@
> +    while (i <= n) {
> +	const char *pos8 = FirstCharMatcher8bit(text8 + i, n, pat8[0]);
> +	if (pos8 == nullptr)
> +	    return nullptr;
> +	uint32_t i = static_cast<uint32_t>(pos8 - text8);

Add comment:
/* Incorrect match if it matches the last 8 bits of a 16bit chars. */

@@ +1140,5 @@
> +	uint32_t i = static_cast<uint32_t>(pos8 - text8);
> +	if (i % 2 != 0) {
> +	    i++;
> +	    continue;
> +	}

Add comment:
/* Test if the last 8 bits equals the last 8 bits of the 16bit char. */

@@ +1157,3 @@
>      const typename InnerMatch::Extent extent = InnerMatch::computeExtent(pat, patlen);
> +    uint32_t i = 0;
> +    uint32_t n = textlen-patlen;

This should be:
uint32_t n = textlen - patlen + 1;

@@ +1157,4 @@
>      const typename InnerMatch::Extent extent = InnerMatch::computeExtent(pat, patlen);
> +    uint32_t i = 0;
> +    uint32_t n = textlen-patlen;
> +    while (i < n + 1) {

i < n
Attachment #8433280 - Flags: review?(hv1989) → review+
Assignee

Comment 20

5 years ago
Hey,
const char *pat8 = (const char *) pat;
It did not let me do that. That is why I had to resort to that technique actually.
And textend != t gives segmentation fault and hence fails some jit-tests. That's why I changed it to what it is now.
const char *pat8 = (const char *)&pat;
? Or the more C++-like
const char *pat8 = reinterpret_cast<const char *>(&pat);

Would that need a check for big-endian?
Reporter

Comment 22

5 years ago
(In reply to Sushant Dinesh from comment #20)
> const char *pat8 = (const char *) pat;
Could you try what Emanuel suggested

> And textend != t gives segmentation fault and hence fails some jit-tests.
> That's why I changed it to what it is now.
Well that is because of n and n+1 being used interchangeble. If you change everything like I suggested (wrt n), it should not segment fault at all.

Something extra I just noticed is that you are using "tabs" instead of spaces. We use "4 spaces" instead of a tab. To make sure the intendation looks the same on all computers. Can you also replace all tabs with spaces?
Assignee

Comment 23

5 years ago
Attachment #8433280 - Attachment is obsolete: true
Attachment #8434137 - Flags: review?(hv1989)
Assignee

Comment 24

5 years ago
Attachment #8434137 - Attachment is obsolete: true
Attachment #8434137 - Flags: review?(hv1989)
Attachment #8434139 - Flags: review?(hv1989)
Assignee

Comment 25

5 years ago
Attachment #8434139 - Attachment is obsolete: true
Attachment #8434139 - Flags: review?(hv1989)
Attachment #8434146 - Flags: review?(hv1989)
Reporter

Comment 26

5 years ago
I pushed this to tryserver:
https://tbpl.mozilla.org/?tree=Try&rev=e6d858f5ffad

Should take some decent time before the results fly in. But this is based on https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=37d09834dd1e. After try server is done, you can take the "Windows XP Opt" builds and "Linux Opt" builds and run them locally and test the testcase and report the improvement?
Reporter

Comment 27

5 years ago
Looking at the tryserver we fail on Linux and Windows.

So can you create some sort of unit tests, testing all corner cases on Linux?
It might just be good to do this for future reference, anyway.

So creating a .js file using the following:
"test".match(/test/);
(but with different find/search strings and testing that the output is correct with assertEq() ).

And also hopefully find the case that is failing on tryserver.
Assignee

Comment 28

5 years ago
Attachment #8434146 - Attachment is obsolete: true
Attachment #8434146 - Flags: review?(hv1989)
Attachment #8435742 - Flags: review?(hv1989)
Reporter

Comment 30

5 years ago
Comment on attachment 8435742 [details] [diff] [review]
Improved the matcher. Added support for future 8bit chars.

Review of attachment 8435742 [details] [diff] [review]:
-----------------------------------------------------------------

Sushant reported a 23% increase for linux. But didn't see a windows improvement. I tried it myself on windows and saw a 5%-8% increase. Mac should have no change.

@Sushant: normally we add checkin-needed to the whiteboard (since you have no push rights yet) and somebody will get this in the tree. For this case I will do it a bit different, since the patch has bitrotten already (the changes for 16/8 bit chars already landed). I'll unbitrot this myself and land it. So this should get into the tree soonish. So you can now focus on the next patch you already started.

Thanks for your work!
Attachment #8435742 - Flags: review?(hv1989) → review+
Assignee

Comment 32

5 years ago
(In reply to Hannes Verschore [:h4writer] from comment #31)
> Landed as:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/14563aa75f9c

Cool. Thanks a lot :)
https://hg.mozilla.org/mozilla-central/rev/14563aa75f9c
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.