Closed Bug 578916 Opened 15 years ago Closed 14 years ago

JM: Cache results of Math.sin()

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: sstangl, Assigned: luke)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

Aggregate argument counts for Math.sin() in SunSpider-0.9.1. 15 years ago Sean Stangl [:sstangl] 2.16 KB, text/plain		Details
WIP 14 years ago Luke Wagner [:luke] 2.57 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Luke Wagner [:luke] 14.64 KB, patch		Details \| Diff \| Splinter Review
add missing math-exp cases 14 years ago Luke Wagner [:luke] 15.61 KB, patch		Details \| Diff \| Splinter Review
with lazy cache allocation 14 years ago Luke Wagner [:luke] 16.82 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review

Sean Stangl [:sstangl]

Reporter

Description

•

15 years ago

Attached file Aggregate argument counts for Math.sin() in SunSpider-0.9.1. — Details

During the discussion of bug 578912, gal suggested caching the results of Math.sin() and possibly other fast natives. Dvander requested a bug per fastnative being analyzed. There may be many like this bug, but this one is sin(). Argument counts for Math.sin() are attached. It looks like 216,000 [120*1800] values are meaningful duplicates, while 16,000 are mostly unique.

Sean Stangl [:sstangl]

Reporter

Comment 1

•

15 years ago

SunSpider-0.9.1 contains three tests that call Math.sin(). They naturally have vastly different behavior: 3d-cube has the following arguments to Math.sin() [this is the full output]: 204 0.087266 204 0.052360 204 0.017453 3d-morph has 120 different possible arguments; each argument is repeated 1800 times. Math.sin() is called 120 times between each argument seeing its next repeat; the arguments are thus perfectly cyclic. math-partial-sums contains unique arguments that cannot be cached.

David Mandelin [:dmandelin]

Comment 2

•

15 years ago

JSC did this. I think they got a 1 ms speedup or so.

Brendan Eich [:brendan]

Comment 3

•

15 years ago

This bug is funny. I think Bigfoot (sayrer's benchmark suite) and JSMeter will not repeat the funny, though. /be

David Mandelin [:dmandelin]

Comment 4

•

15 years ago

I dug a little deeper and found we are giving up about 7 ms on 3d-morph due to this. So I guess it matters after all. It's still funny, though. :-D JM wanted SS win: 7 ms

Blocks: JaegerSpeed

Nicholas Nethercote [inactive]

Comment 5

•

15 years ago

What would be done for 3d-morph? A cycle of size 120 is hard to cache simply.

David Mandelin [:dmandelin]

Comment 6

•

15 years ago

The main part of the JSC win might not be due to caching. Maybe our native calls are too slow or something. We should check on the data again after the current work on calls is done to see if there is anything left to do.

Nicholas Nethercote [inactive]

Comment 7

•

15 years ago

I thought native calls didn't suffer the same overheads as scripted calls?

David Mandelin [:dmandelin]

Comment 8

•

15 years ago

(In reply to comment #7) > I thought native calls didn't suffer the same overheads as scripted calls? I'm not sure, but I do believe they are at least better currently in JM. Pretty much all I know about this is what's in comment 2 and comment 4: I know that JSC got about a 1 ms win from doing the caching part, and that when I commented out the call to sin on 3d-morph, then we gained about 7 ms relative speed (i.e., commenting it out in the test gave us 7+X ms gain, and JSC just X ms gain). I don't know if that's still true, and if so, why. If you can figure out what's really going on here, that would definitely help.

Luke Wagner [:luke]

Assignee

Comment 9

•

14 years ago

This optimization doesn't seem completely fake, I can imagine real codes that call sin/cos/etc for the same value. I'll take a stab.

Assignee: general → lw

Luke Wagner [:luke]

Assignee

Comment 10

•

14 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

This just does the simple thing and does caching in math_sin (like JSC). That alone makes 3d-morph 61% faster (7ms). It could go even faster by pushing the cache hit to the IC. Also, it seems like The Right Thing to do this for all the trig functions. With a 4KB table each, that could get wasteful, so I was thinking about using just 1 table and including the trig function in the key...

Luke Wagner [:luke]

Assignee

Comment 11

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

With this patch, 3d-morph goes from 16.9ms to 9ms with -j -m. For reference, jsc gets 12ms. (-m gives 10.5ms. Part of the speedup may be that our hash function gives a 99% hit rate on 3d-morph compared to 89% for jsc) This patch puts the cache in the thread-data (to avoid race conditions) and caches sin,cos,tan,log,asin,acos,atan. Instead of giving each function its own cache, the function is used in the cache key so all share one. Since (without the function key) each cache is 65KB (4K entries * 2 * sizeof(double)), this should save a lot of space. (I suspect this is why jsc only caches sin.) Also relevant: while this doesn't measurably slow down any other Math-using benchmarks, if you make a micro-bench that pounds on sin/cos/etc (without ever using the same value), there is an 8% slowdown, due to the hashing/cache-touching.

Attachment #480819 - Attachment is obsolete: true

Attachment #481599 - Flags: review?(jwalden+bmo)

Boris Zbarsky [:bzbarsky]

Comment 12

•

14 years ago

math_exp_tn needs to hit the cache too, I assume? Also, this looks like doing { sin(x); cos(x) } in a loop will always miss, right? Probably fine, but just checking.

Luke Wagner [:luke]

Assignee

Comment 13

•

14 years ago

Attached patch add missing math-exp cases (obsolete) — Details — Splinter Review

(In reply to comment #12) > math_exp_tn needs to hit the cache too, I assume? Thanks, yes. > Also, this looks like doing { sin(x); cos(x) } in a loop will always miss, > right? Probably fine, but just checking. Good point! We could throw some bits of the function-ptr into the hash...

Luke Wagner [:luke]

Assignee

Updated

•

14 years ago

Attachment #481599 - Attachment is obsolete: true

Attachment #481599 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Updated

•

14 years ago

Attachment #481603 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Updated

•

14 years ago

Attachment #481603 - Attachment is patch: true

Attachment #481603 - Attachment mime type: application/octet-stream → text/plain

Nicholas Nethercote [inactive]

Comment 14

•

14 years ago

(In reply to comment #11) > > This patch puts the cache in the thread-data (to avoid race conditions) and > caches sin,cos,tan,log,asin,acos,atan. Instead of giving each function its own > cache, the function is used in the cache key so all share one. Since (without > the function key) each cache is 65KB (4K entries * 2 * sizeof(double)), this > should save a lot of space. (I suspect this is why jsc only caches sin.) So each thread is now 65KB bigger? That seems like a lot.

Luke Wagner [:luke]

Assignee

Comment 15

•

14 years ago

(In reply to comment #14) > So each thread is now 65KB bigger? That seems like a lot. Smaller caches with the variety of simple/fast hashes I tried get a lot lower hit rate on 3d-morph (60% for 1/2 size, 25% for 1/4). JSC apparently found the same. Feel free to experiment, though, I'm certainly not expert on hash functions. Note that its not each thread, but each thread that runs JS. IIUC (which is a big if) thats main thread + a small number of worker threads.

Nicholas Nethercote [inactive]

Comment 16

•

14 years ago

(In reply to comment #15) > > Smaller caches with the variety of simple/fast hashes I tried get a lot lower > hit rate on 3d-morph (60% for 1/2 size, 25% for 1/4). JSC apparently found the > same. Feel free to experiment, though, I'm certainly not expert on hash > functions. Sure, I can totally believe this is an appropriate hash table size for the purpose of this bug. The broader question is whether this bug is worthwhile, ie. is it worth bloating each JS thread by 65K for an 8ms Sunspider win, when the optimization in question is of questionable general utility? If someone could point to real-world program that would benefit from this I'd be happier.

Mike Shaver (:shaver emeritus)

Comment 17

•

14 years ago

I would not be surprised to discover that a bunch of the 3d-type demos hit the cache; IIRC the JSC guys found hits in the wild when they did it, and we see more mathy stuff these days. Any cache hits in Kraken-land, or on chrome experiments? Most scripts don't use trig, so if we made the cache be initialized on first trig-use, they wouldn't pay for it anyway.

Nicholas Nethercote [inactive]

Comment 18

•

14 years ago

> Most scripts don't use trig, so if we made the cache be initialized on first > trig-use, they wouldn't pay for it anyway. ++shaver

Luke Wagner [:luke]

Assignee

Comment 19

•

14 years ago

(In reply to comment #17) > Most scripts don't use trig, so if we made the cache be initialized on first > trig-use, they wouldn't pay for it anyway. I was going to do that, but it adds a failure mode to the math_sin traceable native. I suppose it would be rather easy, though, to have TraceRecorder create them when recording calls to such traceable natives.

Luke Wagner [:luke]

Assignee

Comment 20

•

14 years ago

Attached patch with lazy cache allocation — Details — Splinter Review

Attachment #481603 - Attachment is obsolete: true

Attachment #481733 - Flags: review?(jwalden+bmo)

Attachment #481603 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 21

•

14 years ago

Comment on attachment 481733 [details] [diff] [review] with lazy cache allocation >+class MathCache >+{ >+ static const unsigned SizeLog2 = 12; >+ static const unsigned Size = 1 << SizeLog2; >+ struct Entry { double in; UnaryFunType f; double out; }; >+ Entry table[Size]; >+ >+ public: >+ MathCache(); >+ >+ uintN hash(double x) { >+ union { double d; struct { uint32 one, two; } s; } u = { x }; >+ uint32 hash32 = u.s.one ^ u.s.two; >+ uint16 hash16 = (uint16)(hash32 ^ (hash32 >> 16)); >+ return (hash16 & (Size - 1)) ^ (hash16 >> (16 - SizeLog2)); >+ } >+ >+ /* >+ * N.B. lookup uses double-equality. This is only safe if hash() maps +0 >+ * and -0 to different table entries, which is asserted in MathCache(). >+ */ >+ double lookup(UnaryFunType f, double x) { >+ uintN index = hash(x); >+ Entry &e = table[index]; >+ if (e.in == x && e.f == f) >+ return e.out; >+ e.in = x; >+ e.f = f; >+ return (e.out = f(x)); >+ } >+}; *This* is what all our caches should look like. I note in passing that NaN inputs will never result in cache hits, but I can't believe we care. r=me if you write some tests for a variety of interesting values (infinities, zeroes, NaN, a few values between 0 and 1) and all these Math functions to check that the result of the first call to a function and the result of the second call to the function are equal, and that a first call, followed by traced successive calls, all return the same value.

Attachment #481733 - Flags: review?(jwalden+bmo) → review+

Luke Wagner [:luke]

Assignee

Comment 22

•

14 years ago

Thanks Waldo! math-trace-tests.js covers this all pretty well (it found the +0/-0 bug mentioned in the patch comments). http://hg.mozilla.org/tracemonkey/rev/5b09b3845178

Whiteboard: fixed-in-tracemonkey

Sean Stangl [:sstangl]

Reporter

Comment 23

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/5b09b3845178

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Robert Sayre

Comment 24

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/5b09b3845178

Grant Galitz

Comment 25

•

14 years ago

This optimization seems to help waveform audio being generated for mozWriteAudio.

Luke Wagner [:luke]

Assignee

Comment 26

•

14 years ago

Vindication!

Grant Galitz

Comment 27

•

14 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=598655 definitely might have somewhat depended on this bug.

Shu-yu Guo [:shu]

Updated

•

12 years ago