cache DST offsets to improve SunSpider score

RESOLVED FIXED in mozilla1.9.3a5

Status

()

Core
JavaScript Engine
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: Robert Sayre, Assigned: Waldo)

Tracking

unspecified
mozilla1.9.3a5
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(6 attachments, 1 obsolete attachment)

(Reporter)

Description

8 years ago
Everyone else is doing it, so I guess we have to as well. Here's what I got by caching the result of PRMJ_DSTOffset in DaylightSavingTA, from jsdate.cpp:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.67x as fast     61.8ms +/- 2.8%   36.9ms +/- 3.3%

=============================================================================

  date:           1.67x as fast     61.8ms +/- 2.8%   36.9ms +/- 3.3%     
    format-xparb: 1.67x as fast     61.8ms +/- 2.8%   36.9ms +/- 3.3%
How do we plan to invalidate the cached value over time?
We could do it on each GC, and everything would probably work out OK.
(In reply to comment #2)
> We could do it on each GC, and everything would probably work out OK.

That's not good news for my nuclear missile silo. Could end up skipping over my launch date!

Comment 4

8 years ago
Exactly. Be careful what you do here. jsc has some code that uses a notification from osx. It clears the cache when DST changes.
I would be happy to discuss license terms for our Spidermonkey DST Manager Enterprise Edition product, which features FIPS and DOE/NNSA-Q certification, and is suitable for a wide variety of mission- and life-critical applications.
(Reporter)

Comment 6

8 years ago
google says the WM_TIMECHANGE event will get this progressively more correct as Windows versions advance.

Comment 7

8 years ago
splendid. lets do it. love the speedup. what an awesome benchmark.
(Assignee)

Updated

8 years ago
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
(Reporter)

Comment 8

8 years ago
(In reply to comment #4)
> Exactly. Be careful what you do here. jsc has some code that uses a
> notification from osx. It clears the cache when DST changes.

I didn't find that code. It looks like they just clear their cache on every invocation.
(Assignee)

Comment 9

8 years ago
I have a patch (several patches, actually) for this, doesn't seem to move the needle at all at the moment.  Whether this is because my caching strategy doesn't anticipate SunSpider's needs or whether it's some other reason (including a broken implementation) remains to be seen.  I'll post the first couple deckchair-rearranging patches in a second; the actual win is to be resumed tomorrow morning after my eyes un-dilate a bit...
(Assignee)

Comment 10

8 years ago
Created attachment 447442 [details] [diff] [review]
Patch 1, v1: reorganize DST offset calculation into a class capable of caching results

This is just rearrangement and a constant-rename or two, plus a very slight dose of LL_* macro removal because the other is just fugly and hard to read.
Attachment #447442 - Flags: review?(sayrer)
(Assignee)

Comment 11

8 years ago
Created attachment 447443 [details] [diff] [review]
Patch 2, v1: Regularize and make obvious units of arguments and return values

The current code converts milliseconds into microseconds for PRMJ, which then un-converts in the opposite direction.  Brilliant!  Be clear about what's what at all times, and use the natural dimensionality at the entry point to the cache.  Inside it we then use the natural dimensionality for calculating the DST offset, but that's private and thus not a problem.
Attachment #447443 - Flags: review?(sayrer)
(Assignee)

Comment 12

8 years ago
Created attachment 447662 [details] [diff] [review]
Patch 3, incomplete: Cache offsets for previously-seen, nearly-the-same dates

Results with this are as follows:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.180x as fast    94.8ms +/- 0.6%   80.4ms +/- 0.7%     significant

=============================================================================

  date:           1.180x as fast    94.8ms +/- 0.6%   80.4ms +/- 0.7%     significant
    format-tofte: 1.031x as fast    54.1ms +/- 1.0%   52.4ms +/- 0.9%     significant
    format-xparb: 1.46x as fast     40.8ms +/- 0.9%   28.0ms +/- 1.1%     significant

The xparb win isn't as much as in the first comment because 500 of 4000ish calls are cache misses now.  Maybe that can be improved still, although claims about the test in the office suggest the only way to do so is to double down on this strategy by, say, expanding the cache range by more than a month at a time (which eventually runs into its own problems over over-zealous or unsound expansion).  We'll see; more investigating to do.

The JS test suite passes, but that's unlikely to exercise this caching logic in interesting ways, so I need to write some very targeted tests to address this by checking for cache consistency with reality.  There are also a couple other tasks to be completed: changing some comments, maybe adding a couple more; benchmarking the browser proper; seeing what sort of DST caching stats are on actual websites and not just SunSpider, Benchmarketia; and as noted, further investigating the tests' MOs.  To be continued tomorrow...
(Reporter)

Updated

8 years ago
Attachment #447442 - Flags: review?(sayrer) → review+
(Reporter)

Updated

8 years ago
Attachment #447443 - Flags: review?(sayrer) → review+
(Assignee)

Comment 13

8 years ago
Patches 1 and 2 pushed to TM, with a followup bustage fix:

http://hg.mozilla.org/tracemonkey/rev/8e8837abeab5
http://hg.mozilla.org/tracemonkey/rev/33b1321c6b8f
http://hg.mozilla.org/tracemonkey/rev/aa32a6921eaa

Further work yet remains here, of course.
(Assignee)

Comment 14

8 years ago
Created attachment 447895 [details]
Stats from running with this a little

I built a browser with this and surfed around for a little bit (some Gmail reading, some Reader reading, some random browsing, some searching on Bing) to see what sort of caching results I got with it -- the result is the attached file, with a little simplification.  Most contexts don't even attempt to get the DST offset (65 of 73 contexts), so I removed them entirely.  I also cut out all other lines noting 0 instances for brevity.

In summary: most contexts created (65 of 73) never even attempted to get the DST offset.  Of the remainder, most pages only attempted a handful of times (1, 7, 4, 1, 5).  Most of these hit the cache slightly less to more often than not -- some success, but not enough loss to matter.  The 1s never hit, but that's hardly a surprise.

The remaining contexts had these stats:

hit:
  in range: 19
misses:
  decrease range start:               3
  increase, offset change, expand:    5
  increase, offset change, new range: 7
  decrease, offset change, expand:    35
  decrease, offset change, new range: 8
  large increase:                     5
  large decrease:                     6
total: 88

hit:
  in range: 20
misses:
  increase range end:                 1
  decrease range start:               3
  increase, offset change, expand:    5
  increase, offset change, new range: 7
  decrease, offset change, expand:    35
  decrease, offset change, new range: 8
  large increase:                     7
  large decrease:                     6
total: 92

hit:
  in range: 118
misses:
  increase range end:                 1
  decrease range start:               1
  large increase:                     1
total: 121

The last one obviously benefits quite a bit.  The first two, which quite possibly are from the same site performing the same action (I'm hooking in right before context destruction and don't know who's doing what in these), hit a little over 20% of the time -- not great, possibly a speedup.

It's important to note that not all misses are alike.  Either a wide miss or a narrow miss at an end of the range that's not within thirty days of a DST change results in one costly offset computation -- exactly what already happens.  So the cost of those misses is basically a few integer arithmetic operations, pretty small.  (These are "{in,de}crease range end" and "large {in,de}crease" in the logging.)

It's the misses where we compute two costly offsets that are more worrisome.  Those happen when a narrow miss occurs but thirty days from that end of the range crosses a DST change.  At that point we only know that the offset changed, but since we don't know *where* it changed we have to re-query one more time to get the offset for the originally requested time.  Then, since it must equal the offset for the range or for the 30-days-off date, we can start our new range.

So how do we do on the two-lookup count?  55 of 88 and 55 of 92 lookups miss, and miss semi-expensively -- but the miss-expensively case is partially balanced, and overall it's pretty rare.  Other browsing sessions I ran through on various other sites (Digg, Facebook, Twitter, &c.) broadly reflected the overall verdict: most places never compute a single DST offset, some of the ones that do hit the cache nearly every time, some compute so few DST offsets that for them the overall effect is a wash but is already negligible, and in almost every case caching DST offset computations -- at least at the current date -- only very rarely results in doubled computation.

So, overall, the cache is either of negligible value or is hit often enough that it does demonstrate wins -- not a huge amount, but enough to say the optimization is basically worthwhile independent of its being in a benchmark.  It's way way way off the beaten path of what the real web does, to be sure, but it has motivation beyond its applicability to a benchmark.
(Assignee)

Comment 15

8 years ago
Created attachment 448156 [details] [diff] [review]
Patch 3, nearly done: Cache offsets for previously-seen, nearly-the-same dates

This updates the comments and includes some testing, but it may still need a little work.  (The test might need to be split up, for example.)  Nevertheless it's nearly done.
(Assignee)

Comment 16

8 years ago
Created attachment 448611 [details] [diff] [review]
Patch 3, v1: Cache offsets for previously-seen, nearly-the-same dates, test cache behavior

Okay, this should be good to go.  I bumped the cache-testing depth up by one to catch any weird hysteretic bugs that three iterations wouldn't catch, dropping the number of timestamps tested a bit to compensate for overall test-execution time.  I also intentionally flipped various conditions in the algorithm to test that the tests would fail from the resulting bugs, so I think there's little danger of the test not being correct and thereby missing a bug.
Attachment #448156 - Attachment is obsolete: true
Attachment #448611 - Flags: review?(sayrer)
(Reporter)

Comment 17

8 years ago
Comment on attachment 448611 [details] [diff] [review]
Patch 3, v1: Cache offsets for previously-seen, nearly-the-same dates, test cache behavior

The implementation of purge() seems fairly brittle. I would probably set a mustFillCache flag or something. But up to you.
Attachment #448611 - Flags: review?(sayrer) → review+
(Assignee)

Comment 18

8 years ago
http://hg.mozilla.org/tracemonkey/rev/edbf0ecd1d1a

Will work on a blog post on this for the benefit of the web-developing world at large, regarding the nature of our perf-winning efforts...
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: fixed-in-tracemonkey
Target Milestone: --- → mozilla1.9.3a5
(Reporter)

Comment 19

8 years ago
(In reply to comment #18)
> http://hg.mozilla.org/tracemonkey/rev/edbf0ecd1d1a
> 
> Will work on a blog post on this for the benefit of the web-developing world at
> large, regarding the nature of our perf-winning efforts...

Let's save it for when we are winning.

We have an unrelated problem... the time tests are kind of slow. Can we add them to the Slow set?
(Assignee)

Comment 20

8 years ago
The point wasn't to say we were fast, just to detail the sort of optimizations involved in making SunSpider speedier.  But fair enough, there's no particular reason it need or should happen now rather than at some later, as a retrospective.

As for slow, looking at this now; run in isolation the tests are all fine for me, but run with -j4 they start to be flaky.  I'm trying to figure out, at least on my local system, where the crunch point is to see how much time I need to gain back to make this work, which will help with determining what possible solutions make more sense and what make less.
(Assignee)

Comment 21

8 years ago
Hm, so -j2 they take 1:40ish each.  -j4 they take 3:40ish each.  Clearly some sort of parallel-performance cliff for this handful of tests.  Still thinking...

Comment 22

8 years ago
Can we add --no-slow-tests or --slow-tests. This is totally killing the usefulness of our test suite for quick sanity checking right before checkin.
This is a no-brainer -- Waldo, you were rightly in favor of kicking the O(n^2) measuring tests out of the default list. This is as bad or worse, in my recent experience.

/be
(Assignee)

Comment 24

8 years ago
Split up the test into eight rather than four.  Run with -j4, any time for any individual part that I can get is a good minute under the timeout limit.

For an interesting data point, these tests when split into eight, run with -j4, take from 1:05 to 1:29 to run.  Running one of those splits manually, in parallel with nothing, takes 8.89s.  So *something* about running in parallel means you get a 10x slowdown.  Fun times.
(Reporter)

Comment 25

8 years ago
http://hg.mozilla.org/mozilla-central/rev/edbf0ecd1d1a
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 26

8 years ago
Created attachment 450854 [details] [diff] [review]
android-build

changeset http://hg.mozilla.org/mozilla-central/rev/edbf0ecd1d1a made the jscntxt.cpp broken on build for Android platform. Changing the added macro __STDC_LIMIT_MACROS before all sys includes can solve this.
Attachment #450854 - Flags: review?

Comment 27

8 years ago
Comment on attachment 450854 [details] [diff] [review]
android-build

Harry, can you land the patch?
Attachment #450854 - Flags: review? → review+

Comment 28

8 years ago
And(In reply to comment #27)
> (From update of attachment 450854 [details] [diff] [review])
> Harry, can you land the patch?

I don't have committer access to the m-c repo.

Updated

8 years ago
Keywords: checkin-needed

Comment 29

8 years ago
Re-opened so we don't forget to land the 1-liner.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Followup bug is much better for auditing and bisecting and such. Sayrer already merged into m-c and resolved FIXED. Please file a new bug and re-resolve this one.

/be

Comment 31

8 years ago
Bug 571751 will do the rest.
(In reply to comment #31)
> Bug 571751 will do the rest.

Thanks, Harry.

/be
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Resolution: --- → FIXED

Updated

8 years ago
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.