Closed Bug 1581052 Opened 3 months ago Closed 2 months ago

[BinAST] Make getHuffmanLookup faster

Categories

(Core :: JavaScript Engine, task, P2)

Tracking

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: Yoric, Assigned: a.munoz3327)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

Currently, on my benchmarks, getHuffmanLookup shows up as taking 6.5% of parsing time. For a tiny method that only performs a few arithmetic operations, that's too much :)

So what can we do about it?

  • Look at readByte and either make it faster or call it only once?
  • Somehow make the per-byte bit inversion faster?
  • Somehow get rid of the per-byte bit inversion?
Priority: -- → P2

I see two ways to make getHuffmanLookup faster:

  1. we know that we're only ever going to read at most 4 bytes, so we could specialize the code for 1, 2, 3 or 4 bytes and unroll the loop;
  2. we don't actually need to always have MAX_CODE_BIT_LENGTH bits available; we just need the number of bits necessary for the next lookup, which is quite often very small, so we could rewrite the method as getHuffmanLookup(uint8_t bitsNeeded)
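As a rough sketch of idea 1 (all names here are illustrative, not the actual SpiderMonkey code): since at most 4 bytes are ever read, the read loop can be unrolled into a fixed switch with fall-through cases, something like:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: read up to 4 bytes with the loop unrolled.
// `data`/`len`/`pos` stand in for the real stream state.
inline uint64_t readUpTo4Bytes(const uint8_t* data, size_t len, size_t pos) {
  size_t n = len - pos < 4 ? len - pos : 4;  // bytes actually available
  uint64_t result = 0;
  switch (n) {  // unrolled: each case adds one byte, then falls through
    case 4: result |= uint64_t(data[pos + 3]);       [[fallthrough]];
    case 3: result |= uint64_t(data[pos + 2]) << 8;  [[fallthrough]];
    case 2: result |= uint64_t(data[pos + 1]) << 16; [[fallthrough]];
    case 1: result |= uint64_t(data[pos]) << 24;
  }
  return result;  // first byte lands in the most significant position
}
```

The compiler then sees a fixed sequence of ORs and shifts instead of a data-dependent loop.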

Hi David, I'd like to work on this. I don't know how this is typically benchmarked though. I'm familiar with profilers and have started playing with the Firefox profiler, but wanted to check if you use something else to benchmark this.

We'll be happy to have help, but this is clearly not an easy bug. If you're ok with this, I can send you the benchmarking data.

Flags: needinfo?(a.munoz3327)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #3)

> We'll be happy to have help, but this is clearly not an easy bug. If you're ok with this, I can send you the benchmarking data.

That's fine with me. Can you send the benchmark data?

Flags: needinfo?(a.munoz3327) → needinfo?(dteller)

Ok, let's do this :)

Sharing the benchmark data through Firefox Send as this is a 700 MB+ patch (almost entirely data), not meant to be landed into Firefox: https://send.firefox.com/download/3ebd6d0d98776f50/#FtmQsHCYWXjn1cecS7zvEw .

This is very rough around the edges as I only ever intended to run these benchmarks myself.

You'll need to configure SpiderMonkey with options --enable-optimize --disable-debug for benchmarking purposes.

To run the benchmarks, go to js/src and run ./OPT/dist/bin/jsapi-tests testBinASTReaderContextRealJSSamples (replace OPT with the name of your build directory). The benchmarks can take a while to run, in particular if you're running them from a profiler. In the profile, we are only interested in code that is executed inside BinASTParserPerTokenizer::parse().

On my machine, the worst offenders are currently summarized at https://bugzilla.mozilla.org/show_bug.cgi?id=1577764#c5 .

If you're still with me, I'll be happy to chat about possible optimizations :)

Flags: needinfo?(dteller)

Thanks for sending the benchmarking stuff, I really appreciate all the help.

I started looking at making the per-byte bit inversion in getHuffmanLookup faster by masking and shifting bits in parallel as described here: http://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel. I'm attaching a patch of what I tried in case that isn't clear. The bit inversion itself takes about half as many operations as the original way, but the speed increase is pretty small and probably negligible. I don't think there's a way to do the bit inversion with fewer operations without using lookup tables, and even those probably wouldn't be much faster.
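The masking-and-shifting technique from the bithacks page boils down to this (a standalone sketch, not the exact patch): three passes swap adjacent bits, then 2-bit pairs, then nibbles, and stopping before the byte-level swap reverses the bit order inside each byte while keeping the bytes themselves in place.

```cpp
#include <cstdint>

// Reverse the bit order within each byte of a 32-bit word, in parallel.
inline uint32_t reverseBitsPerByte(uint32_t v) {
  v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);  // swap odd/even bits
  v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);  // swap bit pairs
  v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);  // swap nibbles
  return v;  // a fourth pass would swap bytes; we deliberately stop here
}
```

This handles 4 bytes per call in a constant number of operations, versus a per-bit loop for each byte.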

From your suggestions in comment #1 I think specializing the code for the number of bytes and unrolling the loop sounds pretty straightforward, but I don't quite understand the second idea. What do you mean by "we just need the number of bits necessary for the next lookup"? I'm still going through the file you pointed out and BinASTParser.cpp to get some context, but just wanted to make sure I'm looking in the right places.

Flags: needinfo?(dteller)

Doing the per-byte bit inversion on the lower 4 bytes of this->bits in place
after reading data is noticeably faster.

(In reply to Ayrton Muñoz from comment #8)

> Created attachment 9094094 [details]
> Bug 1581052 - [BinAST] Make getHuffmanLookup faster r?Yoric
>
> Doing the per-byte bit inversion on the lower 4 bytes of this->bits in place
> after reading data is noticeably faster.

Ah, interesting. Does this pass tests?

For testing, I suggest having a second build with --enable-debug --disable-optimize in a different build directory.

Then, after rebuilding,

  • ./DEBUG/dist/bin/jsapi-tests testBinASTReaderContextECMAScript2 for the quick tests; or
  • ./DEBUG/dist/bin/jsapi-tests testBinASTReaderContextRealJSSamples to use the ~700 MB samples in the patch I sent you for testing.

> From your suggestions in comment #1 I think specializing the code for the number of bytes and unrolling the loop sounds pretty straightforward, but I don't quite understand the second idea. What do you mean by "we just need the number of bits necessary for the next lookup"? I'm still going through the file you pointed out and BinASTParser.cpp to get some context, but just wanted to make sure I'm looking in the right places.

To do this, the idea is

1/ to add a method maxBitLength() to classes HuffmanTableImplementation*, just to return maxBitLength (you can of course just make the field public for experimentation);
2/ rewrite getHuffmanLookup(BinASTTokenReaderContext& owner) into getHuffmanLookup(BinASTTokenReaderContext& owner, uint8_t requiredBitLength);
3/ replace the MAX_PREFIX_BIT_LENGTH in getHuffmanLookup() with requiredBitLength;
4/ fix all the call sites of getHuffmanLookup() to pass table.maxBitLength() as argument requiredBitLength.

On the upside, you shouldn't need to fix BinASTParser.cpp :)
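As a toy model of steps 1-4 (all names illustrative, and simplified down from the real declarations), the shape of the change is: the table exposes its maximum code length, and the lookup only refills the bit buffer up to that many bits instead of always topping up to the fixed maximum.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the reader state; assumes enough data remains.
struct BitBuffer {
  uint64_t bits = 0;
  uint8_t bitLength = 0;
  const uint8_t* data;
  size_t len;
  size_t pos = 0;

  // Steps 2/3: take requiredBitLength instead of MAX_PREFIX_BIT_LENGTH.
  uint64_t getHuffmanLookup(uint8_t requiredBitLength) {
    while (bitLength < requiredBitLength && pos < len) {
      bits = (bits << 8) | data[pos++];  // per-byte inversion elided here
      bitLength += 8;
    }
    return bits >> (bitLength - requiredBitLength);  // top bits only
  }
};

struct HuffmanTable {
  uint8_t maxBitLength_;
  // Step 1: expose the table's maximum code length.
  uint8_t maxBitLength() const { return maxBitLength_; }
};
```

Step 4 is then mechanical: each call site passes table.maxBitLength() as requiredBitLength.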

Flags: needinfo?(dteller) → needinfo?(a.munoz3327)

So after taking a closer look at the test results I realized my previous patch was actually failing a lot of tests (I thought it was running suspiciously fast...). I fixed the problem by reading the data into newBits while shifting this->bits to make room for the data. Then newBits is inverted in place outside the loop and added to this->bits. This version passed all the quick tests and my profiles show it spends 15% less time in getHuffmanLookup. Some of the tests in your benchmarking data failed, but I get the same thing when I run it with an unmodified build so I'm guessing it's ok.
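In a simplified model (illustrative names and buffer layout, not the actual BinASTTokenReaderContext code), the fix looks like this: raw bytes accumulate in newBits while the main buffer is shifted to make room, and the per-byte inversion happens once, outside the loop.

```cpp
#include <cstddef>
#include <cstdint>

struct Reader {
  uint64_t bits = 0;
  uint8_t bitLength = 0;

  void refill(const uint8_t* data, size_t len, size_t& pos, uint8_t needed) {
    uint32_t newBits = 0;
    while (bitLength < needed && pos < len) {
      newBits = (newBits << 8) | data[pos++];  // raw bytes, not yet inverted
      bits <<= 8;                              // make room in the main buffer
      bitLength += 8;
    }
    // Single parallel pass: reverse the bit order within each byte of newBits.
    newBits = ((newBits >> 1) & 0x55555555u) | ((newBits & 0x55555555u) << 1);
    newBits = ((newBits >> 2) & 0x33333333u) | ((newBits & 0x33333333u) << 2);
    newBits = ((newBits >> 4) & 0x0F0F0F0Fu) | ((newBits & 0x0F0F0F0Fu) << 4);
    bits |= newBits;  // add the inverted bytes to the main buffer
  }
};
```

This gives the same buffer contents as inverting each byte inside the loop, but pays the inversion cost only once per refill.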

I also modified getHuffmanLookup() to take the number of bits required like you suggested. It's passing all tests, but with both this and my other change it's only spending 10% less time in getHuffmanLookup than the unmodified build. I only recently got to this point so I think I need to take a closer look to figure out why we're getting this unexpected result.

Flags: needinfo?(a.munoz3327)

Yeah, you should use the testBinASTReaderContextECMAScript2 suite for testing. The other one makes use of features that have not been implemented yet, so it's not a good test.

Now, if your change works better without my suggestion, that's ok with me :) Mine was just a hunch; if the hunch proves wrong, don't use it!

Here's a new version that only reads data once to avoid the while loop. It spends 26% less time in getHuffmanLookup than the unmodified code and passes all tests on my machine :)

However, I don't think it would work on big-endian machines because I'm casting a uint8_t array to uint64_t *. It'd be pretty straightforward to modify for big endian by adding something like

#if defined(MOZ_BIG_ENDIAN)
  // call the original readBuf() here
#elif defined(MOZ_LITTLE_ENDIAN)
  // call my new readBufReverse() here
#endif

but I don't know if these are always defined or if this is something we'd like to avoid which is why I'm attaching it here. The patch has updated comments and uses an extended example to make things clear, but lmk if something doesn't make sense.
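A portable alternative (just a sketch, and not what the patch currently does) would be to assemble the value with shifts instead of casting the pointer; that behaves identically on any host, so no endianness branch is needed at all. The little-endian interpretation below is only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a uint64_t from 8 bytes, interpreting them as little-endian,
// independent of the host's byte order.
inline uint64_t loadLittleEndian64(const uint8_t* p) {
  uint64_t v = 0;
  for (size_t i = 0; i < 8; i++) {
    v |= uint64_t(p[i]) << (8 * i);  // byte i lands in bits [8i, 8i+8)
  }
  return v;
}
```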

Flags: needinfo?(dteller)
Flags: needinfo?(dteller) → needinfo?(a.munoz3327)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #13)

Would any of the code in https://searchfox.org/mozilla-central/source/mfbt/EndianUtils.h be helpful?

Thanks! that's exactly what I was looking for

Flags: needinfo?(a.munoz3327)
Flags: needinfo?(a.munoz3327)
Attachment #9094094 - Attachment description: Bug 1581052 - [BinAST] Make getHuffmanLookup faster r?Yoric → Bug 1581052 - [BinAST] Make getHuffmanLookup faster by inverting bits after reading data r?Yoric

Reading data with only one call to readBuf() to avoid the while loop reduces
time spent in getHuffmanLookup() by 26%. Also added readBufReverse() to
return the data in reverse order if MOZ_LITTLE_ENDIAN is set.

Depends on D46559

Just split the patch into two like you suggested and I'm currently working on revisions to the new patch.

Flags: needinfo?(a.munoz3327)

Thanks :)

Ping me when you're ready for me to review! In the meantime, I'm assigning the bug to you :)

Assignee: nobody → a.munoz3327
Flags: needinfo?(a.munoz3327)

It's ready for review

Flags: needinfo?(a.munoz3327) → needinfo?(dteller)
Flags: needinfo?(dteller)

So, tests seem to pass :)
If you're ready, we can land the code and see if it sticks!

Flags: needinfo?(a.munoz3327)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #20)

> So, tests seem to pass :)
> If you're ready, we can land the code and see if it sticks!

yep I think it's ready to land :)

Flags: needinfo?(a.munoz3327)

"We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpbt18xX js/src/frontend/BinASTTokenReaderContext.cpp Hunk #2 FAILED at 1276. 1 out of 2 hunks FAILED -- saving rejects to file js/src/frontend/BinASTTokenReaderContext.cpp.rej abort: patch command failed: exited with status 256"

Could you pull the latest mozilla-central and rebase your code?

Flags: needinfo?(a.munoz3327)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #22)

> "We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpbt18xX js/src/frontend/BinASTTokenReaderContext.cpp Hunk #2 FAILED at 1276. 1 out of 2 hunks FAILED -- saving rejects to file js/src/frontend/BinASTTokenReaderContext.cpp.rej abort: patch command failed: exited with status 256"
>
> Could you pull the latest mozilla-central and rebase your code?

done

Flags: needinfo?(a.munoz3327)
Pushed by dteller@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b33ee74d78c3
[BinAST] Make getHuffmanLookup faster by inverting bits after reading data r=Yoric
https://hg.mozilla.org/integration/autoland/rev/28a5e5744f1f
[BinAST] Make getHuffmanLookup faster by calling readBuf once r=Yoric
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Well, this seems to have worked \o/

Thanks for the patch, Ayrton. If you're interested, I can suggest further work.

Flags: needinfo?(a.munoz3327)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #26)

> Well, this seems to have worked \o/
>
> Thanks for the patch, Ayrton. If you're interested, I can suggest further work.

Yeah I'm definitely interested in finding something else to work on. I've been reading up on the BinAST proposal but I could use some pointers on what I could work on.

Also I just saw someone suggested removing the type-punning in my last patch so I'm submitting a revised version. I ran the tests and performance is about the same as before.

Status: RESOLVED → REOPENED
Flags: needinfo?(a.munoz3327) → needinfo?(dteller)
Resolution: FIXED → ---
Pushed by dteller@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7c6cde9fc234
[BinAST] removed type-punning in getHuffmanLookup r=tcampbell,Yoric
Status: REOPENED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED

(In reply to Ayrton Muñoz from comment #27)

> (In reply to David Teller [:Yoric] (please use "needinfo") from comment #26)
>
> > Well, this seems to have worked \o/
> >
> > Thanks for the patch, Ayrton. If you're interested, I can suggest further work.
>
> Yeah I'm definitely interested in finding something else to work on. I've been reading up on the BinAST proposal but I could use some pointers on what I could work on.
>
> Also I just saw someone suggested removing the type-punning in my last patch so I'm submitting a revised version. I ran the tests and performance is about the same as before.

Thanks, by the way :)

Flags: needinfo?(dteller)
Regressions: 1588761

marauder, that regression sounds extremely unlikely, given that the affected code is preffed off.

Flags: needinfo?(marian.raiciof)

Thanks David! Marked it as invalid.

Flags: needinfo?(marian.raiciof)