If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

JM: interpret (some?) scripts a few times before compiling them

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: njn, Assigned: billm)

Tracking

({footprint})

unspecified
footprint
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(fennec2.0+)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(5 attachments, 7 obsolete attachments)

(Reporter)

Description

7 years ago
The basic idea here is that we should not compile all scripts (functions, more or less) on their first execution in order to save memory.

Confusingly, four bugs have been created on this topic.

- Bug 631581 has some space measurements and analysis which shows that lots (eg. ~50%) of scripts are only called once, and ~10% of scripts contain loops that iterate at least once.

- Bug 631706 proposes compiling loopless scripts only after they've been called at least once.  This requires a single 1-bit counter per function.  This is the simpler of two proposals and looks to reduce JM's memory usage by ~2x and the browser's by ~3--10%.  I call this the "one-counter approach".

- Bug 631714 proposes compiling scripts only after they've been called at least M times, or contain a loop that has iterated at least N times.  M and N are likely to be small, eg. 16.  This looks to reduce JM's memory usage by ~5-10x and the browser's by ~5--20% (or more).  I call this the "two-counter approach".  Note that the one-counter approach is just a specialization of the two-counter approach with M=1 and N=0.

- Bug 631740 has an implementation of the two-counter and some Sunspider measurements with different values of M and N.  It suggests that Sunspider performance isn't regressed unless large values of M and N are used (eg. M=4096 or N=512).

Both approaches should make normal browsing a shade faster as well.

This bug has been created to subsume the above four bugs and house all future discussion/patches/etc.  This makes sense because it's a single idea we're contemplating.

We are considering this for Firefox 4.0.  Even though it's very late, it's potentially a very big win.  Risks include:

- Introducing new correctness bugs, though it's not very complicated.

- Regressing performance in some cases, perhaps due to disturbing trace tuning, though Sunspider at least seems ok, and small values of M and N seem like they should only disturb programs that are very sensitive to tuning.

I'm arguing strongly that we should push to get the two-counter approach in Firefox 4.0, because the browser uses a lot of memory, the JS engine is responsible for a lot of that, and this change helps a great deal.
(Reporter)

Updated

7 years ago
Duplicate of this bug: 631581
(Reporter)

Updated

7 years ago
Duplicate of this bug: 631714
(Reporter)

Updated

7 years ago
Duplicate of this bug: 631706
(Reporter)

Updated

7 years ago
Duplicate of this bug: 631740
(Reporter)

Updated

7 years ago
blocking2.0: --- → ?
Keywords: footprint
I hope we can fix this bug for the last beta, but right now, I wouldn't recommend delaying the release for an estimated 10% footprint reduction. I would suggest doing it in a dot release if we can't make the main one. We should keep discussing it, though, as more data and patches come in.
blocking2.0: ? → .x
tracking-fennec: --- → ?

Comment 6

7 years ago
in reply to
https://bugzilla.mozilla.org/show_bug.cgi?id=631733#c9
>Nicholas Nethercote wrote
>Bug 631951 will greatly reduce JaegerMonkey's memory usage, if it makes it into
>Firefox 4.0 (I sure hope it does).
Thanks for this Nicholas, I think this really needs to be sorted out before the 4 release as inevitably people will compare the memory usage to Chrome, Opera, IE9 and it's not going to look good currently

I presume your saw my test with the Askmen page where I tracked down jaegermonkey as the culprit for doubling the memory usage between beta 6 and 7 of one tab on this askmen survey
http://www.askmen.com/specials/2010_great_male_survey/
https://bugzilla.mozilla.org/show_bug.cgi?id=631733#c1

FF4 Beta 6 added 43MB clicking a tab on that page, Beta 7 added 91MB, just tested other browsers, IE9 added 25MB, Iron(Chrome)8 added 15MB to load that tab, Opera 11 had no change at all....

Comment 7

7 years ago
Beta 7 introduced JaegerMonkey, so that is no surprise. JaegerMonkey uses extra RAM to store its compiled code. This bug is about reducing that memory use.

Comment 8

7 years ago
(In reply to comment #7)
> Beta 7 introduced JaegerMonkey, so that is no surprise. JaegerMonkey uses extra
> RAM to store its compiled code. This bug is about reducing that memory use.
Sure I realised that which is why I used as it as an example, I just posted to show how now it looks even worse compared to other browsers in memory usage.
(Reporter)

Comment 9

7 years ago
orangezilla: the askmen one sounds like it's leaking.  I opened bug 632234 to track cases like that.  This bug is not for leak-related problems.

Comment 10

7 years ago
(In reply to comment #9)
> orangezilla: the askmen one sounds like it's leaking.  I opened bug 632234 to
> track cases like that.  This bug is not for leak-related problems.
Currently on 30th and onwards builds, I'm not sure about before that, as I often see similar 90-100MB jumps loading pages or tabs on multiple websites.
(Reporter)

Comment 11

7 years ago
I did some measurements.

TL;DR
------
The one-counter approach with a threshold of 16 looks like a good strategy.
We should try it and measure it.

My measurement process
----------------------
- 64-bit opt build (with symbols, ie. compiled with -g) on Ubuntu 10.04

- Create a profile with the following pages open:

  nytimes.com
  theage.com.au
  www.techcrunch.com
  www.cad-comic.com/cad/
  aec.gov.au
  about:memory

- Open a fresh browser instance, hit "restore session", which opens a new
  window.

- Close the old window containing about:home.

- Wait until all pages have finished loading (as judged by when the spinning
  icons stop) then wait another five seconds just to be sure.

- Then refresh about:memory and read the js/mjit-code value.

I did this with cdleary's patch, for a bunch of JS_CALLS_BEFORE_COMPILE and
JS_BACKEDGES_BEFORE_COMPILE values (which I'll abbreviate to 'C' and 'B').

The results
-----------
Here are the js/mjit-code values (in MB (not MiB)) for the different
combinations:

      B=0       1       2       4       8       16      32      64      1000
----------------------------------------------------------------------------
C= 0 |  52.0    X       X       X       X       X       X       X       X
   1 |  36.7    36.8                                                    36.6
   2 |  31.0            31.4
   4 |  28.8                    28.8
   8 |  26.8                            26.8
  16 |  11.9                                    12.6
  32 |  7.5                                             7.2
  64 |  6.4                                                     6.2
1000 |  1.9                                                             2.0

First, consider the leftmost column, where C=N and B=0.  This is the case
where we delay compiling a script until it's been called N times, with the
exception that we compile as soon as a back-edge is hit.  (If we can assume
that all functions that contain a loop execute at least one iteration of
that loop, which is probably almost true, this is the same as the
one-counter count-only-calls-to-loopless-scripts approach.)  The amount of
memory usage goes down clearly as C increases.  Importantly, C=1 only gives
a 1.4x reduction, so a 1-bit counter isn't good enough.  C=16 gives a 4.4x
reduction.

Now, consider the (C=0,B=N) row.  It's marked with Xs because it doesn't
make sense, except for the (C=0,B=0) case -- there's no point counting loop
iterations if we compile a function as soon as it's called.

Next, consider the (C=N,B=N) diagonal.  The results are almost the same as
the leftmost (C=N,B=0) column.  (If everything was deterministic, you'd
expect the diagonal values to always be less than or equal to the leftmost
column;  this isn't true due to noise in the measurements.)  This indicates
that counting loop iterations does very little to reduce memory use.  This
makes a certain amount of sense:  there must not be many scripts that are
both (a) called only a few times and (b) contain loops with a small number
of iterations.  I tried (C=1,B=1000) as an additional check, for little
difference.

From a speed point of view, we probably want C+B to be as small as possible
-- that minimizes the risk that we get some kind of unexpected time
blow-out.

Effect on total browser usage
-----------------------------
Now, (C=16,B=0) looks like a sweet-spot.  It reduces mjit memory usage by
4.4x, or 40MB.  What does that do to total browser usage?  Well, it depends
how you measure it.  For the above workload, about:memory reported "memory
in use" values (which are more important than "memory mapped" values) in the
range 227--316MB.  Yes, the variation was that high.  (In fact, it seemed to
generally increase as I kept restarting Firefox, which was worrying.)  But
about:memory only measures the heap (ie. malloc/new/new[]).  js/mjit-code
itself measures a mixture of (1) heap space (for JITScripts and IC data)
which is included in about:memory's numbers, and (2) mmap'd executable code.
So it's hard to compare "memory in use" with js/mjit-code.

If I look at 'top' or /proc/pid/status, I get total virtual memory figures
around 900MB and RSS figures around 350MB (again with lots of variation).
So that's about a 5% improvement on the VM total figure, but that includes
heaps of stuff that'll never be touched like thread stacks and big chunks of
code and data sections.  My previous measurements with Massif lead me to
believe that halving that VM total number gives roughly something like the
actual memory touched by the browser.  In which case the improvement is
roughly 10%.  And this workload isn't particularly JS heavy;  opening 20
techcrunch.com tabs would be much heavier.

So it's probably easier to just focus on the mjit numbers:  we know that
they can be big, so let's reduce them as much as possible while minimizing
performance hits on things like Sunspider, and also while minimizing risk.

What now?
---------
I think that implementing the simpler one-counter approach and measuring it
for a range of C values is the next step.  I believe dvander has already
made progress there.
That data looks like memory usage is simply independent of B for that test corpus, right?
(Reporter)

Comment 13

7 years ago
(In reply to comment #12)
> That data looks like memory usage is simply independent of B for that test
> corpus, right?

Yes.  That's why I'm suggesting we implement just the C counter and measure that.
Created attachment 510499 [details] [diff] [review]
one-counter attempt

Great data, Nick. Today I hacked up some really dumb version of the one-counter approach, as a fallback in case we can't get something fancier into the tree in time. So I tried your workload & steps with C=0, 1, and 16, and with Bill's patch in the about:compartments bug to track cumulative memory.

C=0:
js/mjit-code              46,237,592
js/mjit-code-cumulative   47,421,256

C=1:
js/mjit-code              31,895,272
js/mjit-code-cumulative   31,895,480

C=16:
js/mjit-code              27,979,880
js/mjit-code-cumulative   28,333,672

These results aren't as good as yours with Chris's patch, so I'll try that tomorrow to see how it differs. My attempt here was to get absolutely no SunSpider loss with -m -j -p, and Bill said that with the two-counter patch, they only tested -m. However, with this one, C=0 and C=16 make no difference with SunSpider, so we can probably try cranking it up more.

Anyway, the only weird special case is that if (interpMode != JSINTERP_NORMAL), i.e. profiling or recording a trace, we have to method JIT aggressively or else we lose about 1-2% on SunSpider. I'll investigate more tomorrow as I'm running out of time today.
(Reporter)

Comment 15

7 years ago
I tried dvander's patch and got even worse results:

  C=0:    56.3 MB
  C=1:    45.7 MB
  C=16:   42.6 MB
  C=1000: 42.1 MB

That C=1000 number made me really suspicious.  So I tried removing the |!script->hasLoops| part of the test in CanMethodJIT() and got these numbers:

  C=16:   13.0 MB
  C=1000:  1.6 MB

Much closer to comment 11.  I can see two possibilities:

(1) Most of the functions that are called a small number of times contain a loop that executes at most once.  This doesn't seem that likely.

(2) There's a problem with the computation of script->hasLoop.

Maybe there's another possibility I haven't thought of?
(Reporter)

Comment 16

7 years ago
Probably more likely is there's a difference between cdleary's patch and dvander's patch that I haven't accounted for.
(Reporter)

Comment 17

7 years ago
I worked out a possible cause.  dvander's patch is conservative in one sense:  if a script is compiled, all nested scripts within it are also compiled.  So if you have a loop at the top-level of a file, the top-level script will be compiled, and so will every function within it.

(Although, I don't understand how cdleary's patch avoids that problem...)
(Reporter)

Comment 18

7 years ago
Hmm, my analysis in comment 11 assumes cdleary's patch is correct.  But it looks like js_MonitorBranch() is only called if JS_TRACER isn't defined.  That might explain why the B parameter didn't seem to do anything.

Also, another likely difference between the two patches:  cdleary's patch changes RunScript() so that a script always is interpreted once.
(Reporter)

Comment 19

7 years ago
dvander said he gets a 20% SS regression with cdleary's patch with (C=1,B=0).  So I think that patch is too rough to be considered further, and the data in comment 11 is suspect.
(Reporter)

Comment 20

7 years ago
FWIW, here are Sunspider numbers with dvander's patch applied with a threshold of 16:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    68.354    68.244 (1.002x) |    44.763    44.760 (------) | 3d-cube
|    40.237    39.372 (1.022x) |    25.189    23.932 (1.053x) | 3d-morph
|    63.760    63.691 (1.001x) |    37.220    37.218 (------) | 3d-raytrace
|    25.287    26.026 (0.972x) |    11.101    11.064 (1.003x) | access-binary-
|    91.818    91.799 (------) |    85.809    85.809 (------) | access-fannkuc
|    29.094    29.004 (1.003x) |    16.514    16.513 (------) | access-nbody
|    36.156    36.136 (1.001x) |    29.335    29.334 (------) | access-nsieve
|     7.378     7.366 (1.002x) |     3.253     3.253 (------) | bitops-3bit-bi
|    36.514    36.501 (------) |    31.742    31.742 (------) | bitops-bits-in
|    15.820    15.756 (1.004x) |    12.018    12.015 (------) | bitops-bitwise
|    38.001    37.981 (1.001x) |    32.968    32.967 (------) | bitops-nsieve-
|    16.961    16.920 (1.002x) |    12.874    12.870 (------) | controlflow-re
|    23.147    23.073 (1.003x) |    11.792    11.790 (------) | crypto-md5
|    19.218    19.140 (1.004x) |     8.495     8.493 (------) | crypto-sha1
|    68.855    71.106 (0.968x) |    21.930    21.443 (1.023x) | date-format-to
|    51.923    73.529 (0.706x) |     9.617     9.184 (1.047x) | date-format-xp
|    41.606    41.481 (1.003x) |    27.330    27.327 (------) | math-cordic
|    22.768    22.719 (1.002x) |     6.308     6.306 (------) | math-partial-s
|    31.517    31.479 (1.001x) |    26.589    26.588 (------) | math-spectral-
|    47.093    47.317 (0.995x) |    34.563    34.563 (------) | regexp-dna
|    25.947    25.643 (1.012x) |     9.254     9.240 (1.002x) | string-base64
|    60.810    60.734 (1.001x) |    32.705    32.703 (------) | string-fasta
|    95.105    94.880 (1.002x) |    17.269    17.264 (------) | string-tagclou
|    98.241    98.578 (0.997x) |    12.626    12.625 (------) | string-unpack-
|    38.630    38.565 (1.002x) |     8.977     8.975 (------) | string-validat
-------
|  1094.255  1117.052 (0.980x) |   570.252   567.989 (1.004x) | all

date-format-xparb is greatly affected.  An interesting thing is that part of the extra time is in the interpreter, but part of it is in trace compilation.  Without dvander's patch, the trace JIT doesn't get invoked at all on date-format-xparb;  with the patch, it does get invoked quite a lot.  It looks like an example of trace tuning perturbation.
(Reporter)

Comment 21

7 years ago
Kraken numbers, not much change there:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|  3499.890  3499.606 (------) |  3331.079  3331.056 (------) | ai-astar
|  1811.866  1804.391 (1.004x) |  1324.528  1324.436 (------) | audio-beat-det
|  1327.256  1318.438 (1.007x) |  1164.311  1162.656 (1.001x) | audio-dft
|  1717.309  1709.697 (1.004x) |  1249.322  1249.240 (------) | audio-fft
|  2348.611  2339.535 (1.004x) |  1624.658  1624.575 (------) | audio-oscillat
|  5586.936  5586.894 (------) |  4754.496  4754.495 (------) | imaging-gaussi
|  1951.339  1957.331 (0.997x) |   719.356   721.186 (0.997x) | imaging-darkro
|  4356.213  4354.941 (------) |  3531.506  3531.476 (------) | imaging-desatu
|   640.816   640.328 (1.001x) |    10.150    10.146 (------) | json-parse-fin
|   445.112   432.845 (1.028x) |     5.246     5.111 (1.027x) | json-stringify
|  1131.289  1130.140 (1.001x) |   727.716   727.701 (------) | stanford-crypt
|   665.764   664.899 (1.001x) |   369.794   369.778 (------) | stanford-crypt
|  1164.924  1160.898 (1.003x) |   597.261   597.220 (------) | stanford-crypt
|   497.284   496.624 (1.001x) |   200.364   200.350 (------) | stanford-crypt
-------
| 27144.618 27096.572 (1.002x) | 19609.791 19609.431 (------) | all
tracking-fennec: ? → 2.0next+
If there is a shot at saving >20MB and not regressing SS noticably, Mobile would be very interested in this bug for Fennec 4.0 (blocking-fennec:2.0+)

We are getting killed (literally) from excessive memory use on Android.
Marked blocking fennec
tracking-fennec: 2.0next+ → 2.0+
As I mentioned in bug 631706, Type Inference will need something like this as well: see bug 621077. Of course, TI is post-Firefox 4 anyway.

Updated

7 years ago
tracking-fennec: 2.0+ → ---
tracking-fennec: --- → 2.0+
Created attachment 510940 [details] [diff] [review]
papered over profiler problem

Today Bill found that the reason this patch regresses v8, and his and Chris's patch regressed SunSpider, is because of the profiler. The profiler aborts when re-entering the interpreter. The recorder does too, but it decomposes complex ops (like CALL with fun_call) into imacros. The profiler doesn't do this.

However, we're missing a profiler-abort in EnterMethodJIT. Since we never interpret anything, we never abort the profiler. Bill says this is a bug, but it's not clear how to easily fix it. And our patches to not JIT everything started aborting the profiler, and preventing loops from tracing.

To make this patch not totally regress v8, I've narrowed down where the profiler would abort in the interpreter.

At present, Bill says he has a WIP patch that tunes the profiler a little more and uses backedge counting to lazily method JIT. That approach seems like a promising alternative. I'll have some more data on delaying nested call compilation in a few minutes.
Attachment #510499 - Attachment is obsolete: true
Created attachment 510951 [details] [diff] [review]
delay call IC compilation

Applies on top of the previous patch and delays call IC compilation using the same heuristics. Setting the delay to 16 regressed SunSpider by 10% -  at 1, "only" a 6-7ms (3-4%) regression.

At 1, Nick's browser test case got 4MB better from the non-IC-delay patch (27MB from 31MB). At 16, it's 5MB better. (22MB from 27MB).

I'm looking forward to seeing what Bill has since backedge counting should be more accurate and flexible. It makes more sense to compile a call graph from inside a loop, rather than functions that contain loops. This patch was intended to be the simplest, least regression-prone thing possible, but it's clear these heuristics are tricky no matter what.

Anecdotally: I currently have 13 tabs open on my laptop, using 1.1GB of memory. The JIT code accounts for 27MB of this. On my desktop, with three tabs, the JIT is using 6MB of memory out of 700MB. On Dave Mandelin's computer, I think he said he had five tabs open, it was using 400MB, 7MB of which was JIT.

Completely eliminating the JIT wouldn't make a dent in those numbers. We should make sure any memory savings will really be high impact before taking any sort of non-negligible hit on hard-won benchmarks.
(In reply to comment #26)
> Completely eliminating the JIT wouldn't make a dent in those numbers. We should
> make sure any memory savings will really be high impact before taking any sort
> of non-negligible hit on hard-won benchmarks.

+1. I think we should recap what level of awesome improvement we're looking for here and if we're at all on track to hit that.
(Reporter)

Comment 28

7 years ago
(In reply to comment #26)
> 
> Anecdotally: I currently have 13 tabs open on my laptop, using 1.1GB of memory.
> The JIT code accounts for 27MB of this. On my desktop, with three tabs, the JIT
> is using 6MB of memory out of 700MB. On Dave Mandelin's computer, I think he
> said he had five tabs open, it was using 400MB, 7MB of which was JIT.

What measure are you using for the total -- something from about:memory?  Or something like total VM usage as reported by 'top' or a similar program?  You have to be careful with memory measurements, throwing around numbers without being clear what they are will lead to confusion.

> Completely eliminating the JIT wouldn't make a dent in those numbers. We should
> make sure any memory savings will really be high impact before taking any sort
> of non-negligible hit on hard-won benchmarks.

Because JM throws away old code periodically, these numbers don't mean much if we don't have an idea of how recently all those tabs are loaded.

Peak memory usage is an important measure.  My Massif measurements have repeatedly shown JM code to be among the top handful sources of memory use.  Go read bug 615199 or my old blog posts again if you need reminding.

I can see this patch is looking more complicated than it did at first, and as a result it might not make 4.0.  That doesn't mean it isn't worthwhile.  cdleary asks what kind of improvement we're looking for here -- re-read comment 11.  A 4.4x relative reduction, saving ~40MB for that simple workload, would be excellent.  As I said, that would give a ~5% reduction in total memory use and a ~10% reduction in terms of memory actually touched by the browser.  I consider a 5% to 10% reduction as huge;  how many reductions of that size did we get while reducing our Sunspider time down from ~800ms to ~300ms?

With Sunspider we had clear agreement that the goal was worthwhile, and chasing all those 1% wins (as well as the occasional 5% win) was worth the effort.  It's obvious that the value of memory reduction is less clear in everyone's minds.

Look back at your anecdotal stats.  You're all averaging something close to 100MB per tab.  I think that's unreasonable.  Our memory usage is way higher than in 3.6.  Firefox already has a reputation for being a memory hog.  People are going to notice, especially those on mobile.

Comment 29

7 years ago
(In reply to comment #26)
> Completely eliminating the JIT wouldn't make a dent in those numbers. We should
> make sure any memory savings will really be high impact before taking any sort
> of non-negligible hit on hard-won benchmarks.

Shouldn't lazy compilation actually win us some performance?

According to bug 631581 comment 4 and bug 615199 comment 20 we should actually see speed improvements *and* memory improvements. If we don't then something else is eating up our gains, i.e. there is a hidden performance regression affecting the scripts that should be faster if interpreted instead of being compiled. Either that or those estimates made oversimplifying assumptions.
The 8472, currently used benchmarks don't have any scripts that run few enough iterations to be worth interpreting.  So on benchmarks, lazy compilation is pure lose; the only question is how much.

Comment 31

7 years ago
that may be true for benchmarks, but real webpages have lots of run-once/run-rarely scripts (think of all the ads loaded with javascript, social network widgets, google analytics, registering event handlers with JS libraries etc. etc.) that happens right when the page is loaded.

As far as i can see those benchmarks are very heavy on loops and light on closures, while bug 631581 states that when tested with real webpages that most scripts (50%) are loop-less and an even larger percentage does not run loops often.

In other words this could be a real performance gain, even if it does not show up in synthetic benchmarks. And i don't think that a real optimization should be dropped because it doesn't show up in bad benchmarks.


As i mentioned in another bug lazy compilation could get us performance boosts even in those benchmarks if compiling happens on a separate thread (assuming it's a multi-core system). The JS thread itself can still run in interpreted mode while it's waiting for the compiling to finish. But that's something for the future.
> that may be true for benchmarks,

Yes, and the comment you were responding to was specifically about benchmarks.  We all agree the benchmarks are crap.  Too bad people trust them, so regressing them significantly is bad.  Note that this is different from "improving other stuff while not regressing the benchmark", which we all agree is a good thing.

Comment 33

7 years ago
(In reply to comment #28)
> Because JM throws away old code periodically, these numbers don't mean much if
> we don't have an idea of how recently all those tabs are loaded.
> 
> Peak memory usage is an important measure.

I thought the first part of http://blog.mozilla.com/jseward/2011/01/27/profiling-the-browsers-virtual-memory-behaviour made a good point on the extent of the utility of 'peak memory usage'.  With that understanding, it seems like peak memory usage shouldn't be a primary motivation post-bug 617656.  (Obviously, its not worthless either, as even short-term high-memory usage leads to cache-misses and swapping; but then lets have that be the subject of measurement.)

Nevertheless, my informal browser-time-spent-in-the-JS-engine measurements also indicate that lazy-mjitting would decrease latency on all but the most JS-intensive pages, so I really would like to see lazy-mjitting landed.
As long as we're shipping 32-bit builds peak memory (or more precisely address space, which is even worse) usage matters, because we will OOM and crash as soon as we go over 2GB at most (usually closer to 1.5GB, due to fragmentation) on Windows....

Comment 35

7 years ago
(In reply to comment #34)
> As long as we're shipping 32-bit builds peak memory (or more precisely address
> space, which is even worse) usage matters, because we will OOM and crash as
> soon as we go over 2GB at most (usually closer to 1.5GB, due to fragmentation)
> on Windows....

Yes, sorry, I was unclear; I was implicitly referring to *mjit* peak memory usage which I am assuming bug 617656 fixes.  The rest of the browser definitely matters.
> (In reply to comment #29)
> Shouldn't lazy compilation actually win us some performance?

> (In reply to comment #33)
> I really would like to see lazy-mjitting landed.

The performance win from lazy JITing on non-JS intensive pages is not perceivable, as far as I understand - we're talking 10-30ms during page load. We want lazy-mjitting, I never said it wasn't worthwhile. It definitely is and it'll be even more important as JM evolves. I'm saying we should be very careful about throwing away benchmark points in 4.0, and I'm concerned with 100MB/tab in a long-running session and very little of it being JIT Code.

Comment 37

7 years ago
(In reply to comment #35)
On second thought, though, mjit peak usage could matter if the browser was at an already-high usage level and then the mjit does a burst of allocations.  So decreasing peak memory usage would help there.  Of course, what you really want for that problem is not a % less peak memory usage but for the malloc OOM handler to purge mjit code and other caches.  I've seen this proposed several times before.
(In reply to comment #28)
> With Sunspider we had clear agreement that the goal was worthwhile, and chasing
> all those 1% wins (as well as the occasional 5% win) was worth the effort. 
> It's obvious that the value of memory reduction is less clear in everyone's
> minds.

Yes. We need a better understanding of these tradeoffs. As a gut response, I would say that, right now, a 10% reduction in total memory would be worth a 1% regression in SunSpider performance. But not 5%.

For my personal browser usage, browser memory reduction has no value: AFAICT, I never experience any problems from it. That is probably a part of why memory reduction is less appreciated than maybe it deserves. Presumably netbooks, mobile devices, and such could really benefit. Maybe we should get some low-memory devices, try some stuff, and see what gets painful.

> Look back at your anecdotal stats.  You're all averaging something close to
> 100MB per tab.

That's not right. In the example of my browser above, I had 10-15 tabs open (I think), not just 5. At the moment, I have 18 tabs open, and about:memory shows 200MB allocated with malloc, 300MB working set, and 7MB jitcode. A little while ago, it was 17MB jitcode; the aging algorithm must have reduced it. On this machine I have 8GB and currently 5.5GB free.

> I think that's unreasonable.  Our memory usage is way higher than in 3.6.  

That's expected due to gfx and jit work, and the memory increase is not that out of proportion with the change in the price of memory over that period. Also, Fx4's performance is so much better in every practical aspect that I would not even consider using 3.6. We should reduce memory, but 3.6 is not the right point of comparison.

> Firefox already has a reputation for being a memory hog.  People
> are going to notice, especially those on mobile.

If I read the gist of your comment is right, you are trying to persuade people that reducing memory usage is important. And you want to try to move ahead on this patch, but if it doesn't make the Fx4 release because it gets too hard, that's OK, we can do it later. That's my position, anyway. I also think that if we do miss the release on this, we should get it in a service release soon after.

There is another consideration, though: if other parts of the browser are also really close to getting some substantial reductions, it could be that we are able to reduce total memory usage more. And if we could cut total usage by, say, 30% from our current place, then we might want to hold the release a little bit for that. I don't have that big picture, though.
(In reply to comment #38)

> For my personal browser usage, browser memory reduction has no value: AFAICT, I
> never experience any problems from it. That is probably a part of why memory
> reduction is less appreciated than maybe it deserves. Presumably netbooks,
> mobile devices, and such could really benefit. Maybe we should get some
> low-memory devices, try some stuff, and see what gets painful.

Please grab some Android devices and use Fennec. We can OOM on the Nexus S fairly easily.

I understand that it's not a sure thing that fixing this bug will help our situation on mobile, but I'd like to test it a bit. Do we have consensus on which patch/approach we should be considering? We can try to get some numbers from running on Android.
An easy test to figure out what the maximum gain is for mobile is just to run with mjit off and see what it does to about:memory or your OOM cases.  if you still crash, it's not enough. if you don't, we should probably let you tune the threshold somehow, since you have some sunspider lead to burn.
(In reply to comment #39)
> (In reply to comment #38)
> 
> > For my personal browser usage, browser memory reduction has no value: AFAICT, I
> > never experience any problems from it. That is probably a part of why memory
> > reduction is less appreciated than maybe it deserves. Presumably netbooks,
> > mobile devices, and such could really benefit. Maybe we should get some
> > low-memory devices, try some stuff, and see what gets painful.
> 
> Please grab some Android devices and use Fennec. We can OOM on the Nexus S
> fairly easily.
> 
> I understand that it's not a sure thing that fixing this bug will help our
> situation on mobile, but I'd like to test it a bit. Do we have consensus on
> which patch/approach we should be considering? We can try to get some numbers
> from running on Android.

I think Fennec almost certainly should use a pretty aggressive approach here. Even if it doesn't fix your OOM cases it seems like pure win. We'll let you know when we have a patch worth trying. In the meantime, shaver's suggestion should give you a pretty good estimate of the magnitude of the savings.
(Assignee)

Comment 42

7 years ago
Created attachment 511211 [details] [diff] [review]
unified backedge/call counting patch

This patch incorporates code from Chris's patch and David's patch. It works with the tracer and the profiler. It counts backedges on a per-loop basis, and it also counts calls. It compiles a function when it has been called enough or when any of its loops have been traversed enough. However, it still has the problem with profiler aborts that David mentioned above. I haven't yet decided how to fix this.

I'll post performance data next.
(Assignee)

Comment 43

7 years ago
Created attachment 511214 [details]
performance data for patch in previous comment

This file shows the performance and memory data for 3 different configurations. In all three configurations, compilation happens if we traverse a loop once. It also happens if we call a function at least N times, where N is 10, 100, and infinity.

I measured performance on our usual benchmarks, as well as some that I have gathered over the course of the profiler work.

For memory usage, I opened three tabs: Gmail, Zimbra, and TechCrunch. Then I used the patch in bug 625305 to measure cumulative mjit memory usage. This counts how many bytes have been allocated for mjit code, and ignores any releases. The goal is to ignore blips caused by Brian's patch that occasionally throws out mjit code during GC.

In the N=10 config, the SunSpider score is basically the same. Memory goes from 92 MB to 48 MB.

For N=100, SunSpider regresses by 0.5% and memory goes down to 17 MB.

For N=infinity, SunSpider regresses by 1.2% and memory goes down to 8 MB.

Other benchmarks don't seem to be affected too much. The biggest V8 change is 0.5%.

The biggest SunSpider regression is in date-format-tofte. As far as I can tell, it's caused by the profiler abort problem that David mentioned above. If I could somehow fix this, we might actually be saving time on SS even in the N=infinity case. I'll look into this. Right now, I want to get the patch out so we can start worrying over how risky it is.
(Assignee)

Updated

7 years ago
Attachment #511214 - Attachment is patch: false
Awesome results! I would definitely take an 0.5% regression for that.
(Reporter)

Comment 45

7 years ago
Great stuff.  Some comments/questions:

- Why did you choose to store the backedge table in the compartment?  I would have though the JSScript was the obvious spot.  Having said that, adding a vector to every JSScript makes them significantly bigger and there are lots of JSScripts -- was that why?

- In backedgeCount() you return 0 if the lookup fails.  Will it ever fail?

- There isn't a single comment in the entire patch!  One or two wouldn't hurt, eg. on CALLS_BEFORE_COMPILE and BACKEDGES_BEFORE_COMPILE.

- This:
  
    #if (defined(JS_TRACER) && defined(JS_METHODJIT)) || defined(JS_METHODJIT)

  can be simplified to this:

    #if defined(JS_METHODJIT)

I'll do some measurements of my own.
(Reporter)

Comment 46

7 years ago
Oh, you'll need to handle OOM better here:

    backEdgeTable = js_new<BackEdgeTable>();
    JS_ASSERT(backEdgeTable);
    if (!backEdgeTable->init(32))
        JS_ASSERT(0);
(Assignee)

Comment 47

7 years ago
Yes, the hash table is in the compartment to save memory. backedgeCount() will return 0 if you compiled because of calls, in which case you may not have hit any back edges.

The patch is still kinda preliminary. I'm mainly looking for feedback on the approach.
(Reporter)

Comment 48

7 years ago
(In reply to comment #47)
> Yes, the hash table is in the compartment to save memory. backedgeCount() will
> return 0 if you compiled because of calls, in which case you may not have hit
> any back edges.

Fair enough.  Both of those explanations would make good comments :)
 
> The patch is still kinda preliminary. I'm mainly looking for feedback on the
> approach.

I'm not an expert on all this interpreter/JM interaction, but it looks pretty clean.
I think this approach is better than counting calls alone, and compiling after only one or two backedges should have a very low chance at regressing.

On my x86 VM, using Nick's test workload, I use 34.2M of cumulative JIT code memory before this patch. With the patch, with C=10,B=1, I get 20.4M. With C=10,B=10, I get 14.9M. So something in this workload might be pretty loopy.
blocking2.0: .x → ---
Err, that second one should read B=100.
(Reporter)

Comment 51

7 years ago
(In reply to comment #33)
> 
> I thought the first part of
> http://blog.mozilla.com/jseward/2011/01/27/profiling-the-browsers-virtual-memory-behaviour
> made a good point on the extent of the utility of 'peak memory usage'.  With
> that understanding, it seems like peak memory usage shouldn't be a primary
> motivation post-bug 617656.  (Obviously, its not worthless either, as even
> short-term high-memory usage leads to cache-misses and swapping; but then lets
> have that be the subject of measurement.)

When I do page-level Massif runs on Linux64, I see that typically around half the taken space is due to (a) dlopen'd code and data segments, and (b) thread stacks.  Lots of that space will never even be touched.

So the very first distinction should be between "pages that are mapped but never touched" and "pages that are mapped and touched".  The former case can be almost completely ignored (except for the address space exhaustion issue on 32-bit) and it could be maybe 1/4 or 1/3 of total VM space, at a guess.  The latter case -- which includes every byte written by JM -- is where things get interesting.

Ideally you'd then focus on a smaller set of hotter pages.  Working out what those hot pages are is difficult.  Julian's VM-sim tool is a great first attempt at measuring that, and it's telling us that there are basically two things causing page faults:  GC, and *IC purging*.  Fiddling with the order that ICs are purged may help with that;  creating fewer ICs will *definitely* help with that.
(Reporter)

Comment 52

7 years ago
Also, JITted code space tends to hurt more than normal code or data, because you write it and then execute it, which tends to stuff up the caches.  Code that's patched is even worse.
(Reporter)

Comment 53

7 years ago
I repeated my experiments from comment 11.  My results are much less positive:

      B=0      1      2      4      8      16     32     64     1000   inf
----------------------------------------------------------------------------
C= 0 |  53.1   X      X      X      X      X      X      X      X      X
   1 |  52.7   53.2          53.4          53.0          52.9          52.7
   2 |  44.3          43.0          40.6                               37.4
   4 |                       37.5                        32.1          31.8
   8 |                              32.8   31.9   29.5                 28.6
  16 |  39.5   39.7                 32.5   22.0   19.7   19.0          18.0
  32 |                              32.0   20.9   13.9                 9.4
  64 |                              32.0                 10.6          8.0
1000 |  39.3   39.5                                             2.2    2.2 
 inf |  39.3   39.2   37.1   35.8   31.8   19.8   11.7   8.5    2.1    2.0

Even going to (C=inf,B=1) only drops code size from 53.1 to 39.3 MB.  Around (C=16,B=16) I start getting good results.

I talked to billm on IRC about this, he's getting much better results than me on the same workload.  Something weird is going on.  I tried a fresh profile and both opt and debug builds, and got the same results with all variations.
(Reporter)

Comment 54

7 years ago
The fact that I get no change with (C=1,B=1) is suspicious.  It suggests that there are close to zero functions that are called only once, which contradicts both intuition and dmandelin's earlier measurements.
(Reporter)

Comment 55

7 years ago
Created attachment 511293 [details] [diff] [review]
fix off-by-one errors in unified patch

There are off-by-one errors in both count checks:

    script->incCallCount() < CALLS_BEFORE_COMPILE

    cx->compartment->incBackEdgeCount(pc) < BACKEDGES_BEFORE_COMPILE

because the counts are incremented before comparing, the comparisons should be <= not <.

This means that setting either threshold to 1 is the same as setting it to 0.  That explains why my (C=1,B=1) results have been identical to (C=0,B=0), which is progress of a sort.
(Reporter)

Comment 56

7 years ago
(In reply to comment #55)
> Created attachment 511293 [details] [diff] [review]
> fix off-by-one errors in unified patch

This patch also removes some of billm's extra accounting code.
(Reporter)

Comment 57

7 years ago
Adjusting the results in comment 53 to account for the off-by-one error:

      B=0      1      3      7      15     31     63     999    inf    
---------------------------------------------------------------------
C= 0 |  53.1   X      X      X      X      X      X      X      X      
   1 |  44.3   43.0          40.6                               37.4   
   3 |                37.5                        32.1          31.8   
   7 |                       32.8   31.9   29.5                 28.6   
  15 |  39.5                 32.5   22.0   19.7   19.0          18.0   
  31 |                       32.0   20.9   13.9                 9.4    
  63 |                       32.0                 10.6          8.0    
 999 |  39.3                                             2.2    2.2    
 inf |  39.3   37.1   35.8   31.8   19.8   11.7   8.5    2.1    2.0  

That looks more reasonable, though the code size drops more slowly than I would expect/hope as B and C increase.  For example, if we assume that all scripts are the same size (quite an assumption, but bear with me):

- Going from (C=0,B=0) to (C=1,B=0) only dropped us from 53.1 to 44.3.  That indicates that only 83% of scripts are executed more than once.  dmandelin measured that number to be only 50%.

- Going from (C=0,B=0) to (C=inf,B=0) only dropped us from 53.1 to 39.3.  That indicates that 74% of scripts have at least one taken back-edge.  dmandelin measured that number to be only 47%.

Now, dmandelin was measuring script counts, not code size, and he had a different workload.  Still, I'm a bit surprised.
(Reporter)

Comment 58

7 years ago
I experimented with http://www.valgrind.org/njn/test.html (if you want to try it, download a local copy, it's big and the server is slow).  The script it uses contains many (over 40,000) functions, stored in an array.  It loops over the array and calls each function once.

Now, with (C=1,B=1) I thought that none of the 40,000 functions would be compiled, but they all are.  With (C=1,B=20000) half of them are.  So it appears that once a loop gets compiled, all functions called from within it are also compiled.  Is this intentional?  It might be related to ICs (see comment 26).  It might also explain why the numbers drop off slower than dmandelin's measurements suggested -- some functions are compiled even though they don't hit the C threshold because they are called within a compiled loop.
(Reporter)

Comment 59

7 years ago
(In reply to comment #58)
> It might be related to ICs (see comment 26).

I tried applying dvander's patch from comment 26 on top of billm's patch.  (I'm not even sure if that makes sense, but I thought I'd give it a try :)

It fixes the all-functions-called-in-a-compiled-loop-are-compiled problem -- with (C=1,B=0) nothing in my test.html example is compiled, as I originally expected.

Furthermore, on my 5 tab workload, the numbers were a lot better:

               old      new
               ---      ---
(C=0,B=0)      53.1     55.0
(C=1,B=0)      44.3     34.8
(C=1,B=1)      43.0     33.4
(C=15,B=0)     39.5     25.0       
(C=15,B=15)    22.0     14.6
(C=inf,B=0)    39.3     18.4
(C=inf,B=15)   19.8      6.9

I'm not sure why the (C=0,B=0) result went up, but the others results are good, and much more in line with dmandelin's measurements.

I also tried Sunspider:

(C=0,B=0)       266ms
(C=15,B=0)      270ms
(C=15,B=15)     261ms
(C=1000,B=1000) 322ms
(C=inf,B=inf)   1275ms

The margin of error was sizeable (eg. 5%) but the reasonable values of B and C didn't seem to hurt.
(Reporter)

Comment 60

7 years ago
(Oh, and FWIW, in techcrunch.com accounts for about 75% of the code in my 5 site workload!)
(Assignee)

Updated

7 years ago
Attachment #511214 - Attachment is obsolete: true
(Assignee)

Comment 61

7 years ago
Created attachment 511569 [details] [diff] [review]
new patch

I don't know what happened with the last patch I posted. I'm not longer able to reproduce the memory numbers I was getting on techcrunch. Now I'm getting numbers more in line with Nick.

This patch combines the previous patch with David's don't-always-jit-for-call-ICs patch. It includes a workaround for a problem that happens when BACKEDGES > 1.

I'm getting pretty good performance and memory results with this at (15,15). However, I'll hold off posting anything until Nick can verify that it actually saves memory.
(Assignee)

Comment 62

7 years ago
Created attachment 511571 [details] [diff] [review]
new new patch

Oops. This one's better.
Attachment #511569 - Attachment is obsolete: true
(Assignee)

Comment 63

7 years ago
Created attachment 511585 [details]
perf data for calls=15, backedges=15

This is the performance data for the patch above. Surprisingly, the patch improves all the benchmarks, at least in the aggregate. I tried increasing the call threshold to infinity, and SunSpider performance got much worse, mostly due to controlflow-recursive (not surprisingly).
(Assignee)

Updated

7 years ago
Attachment #511585 - Attachment is patch: false
(Reporter)

Comment 64

7 years ago
new new stats for the new new patch:

      B=0      1      2      4      8      16     32     64     inf
----------------------------------------------------------------------------
C= 0 |  54.7   X      X      X      X      X      X      X      X      
   1 |  33.0   31.6          30.1          29.4                 27.9
   2 |  29.1   27.6   26.0                                             
   4 |  28.2   25.4          22.6                                  
   8 |  24.6   22.6   21.9          16.1   16.1   14.3             
  16 |  23.1   21.6          17.9   14.8   12.5   12.9   9.5           
  32 |  21.4   20.0   18.5          13.2   10.7   8.7                  
  64 |  22.3   18.7                        9.5           5.6           
 inf |  17.5                                                    0.3

I reckon (C=16,B=16) still looks like something of a sweet-spot.  Or we could do (C=2,B=0) if we want to sneak it into 4.0 :)
(Reporter)

Comment 65

7 years ago
Created attachment 511613 [details]
cg_diff output for v8-raytrace

I did some SS and V8bench runs, too:

               SS       V8bench
(C=0,B=0)      267ms    3360
(C=1,B=1)      255ms    3671
(C=8,B=8)      268ms    2970
(C=16,B=16)    255ms    3181
(C=32,B=32)    264ms    3136
(C=64,B=64)    285ms    2365

Margin of error for SS was around 5%.

At this point I don't trust the score-based version of V8bench much at all.  So I did some Cachegrind runs on the time-based version in Sunspider, along with Sunspider itself.

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    68.383    68.942 (0.992x) |    44.772    44.716 (1.001x) | 3d-cube
|    40.462    40.502 (0.999x) |    25.405    25.401 (------) | 3d-morph
|    63.799    62.534 (1.020x) |    37.219    36.921 (1.008x) | 3d-raytrace
|    25.301    25.413 (0.996x) |    11.100    11.095 (------) | access-binary-
|    91.820    91.821 (------) |    85.809    85.805 (------) | access-fannkuc
|    29.080    29.017 (1.002x) |    16.512    16.478 (1.002x) | access-nbody
|    36.150    36.094 (1.002x) |    29.335    29.333 (------) | access-nsieve
|     7.380     7.401 (0.997x) |     3.253     3.251 (1.001x) | bitops-3bit-bi
|    36.515    36.537 (0.999x) |    31.742    31.741 (------) | bitops-bits-in
|    15.822    15.838 (0.999x) |    12.018    12.017 (------) | bitops-bitwise
|    37.988    37.941 (1.001x) |    32.967    32.966 (------) | bitops-nsieve-
|    16.958    16.932 (1.002x) |    12.874    12.870 (------) | controlflow-re
|    23.165    23.835 (0.972x) |    11.792    11.731 (1.005x) | crypto-md5
|    19.246    19.298 (0.997x) |     8.493     8.478 (1.002x) | crypto-sha1
|    68.826    69.146 (0.995x) |    21.937    21.894 (1.002x) | date-format-to
|    51.999    50.445 (1.031x) |     9.677     9.581 (1.010x) | date-format-xp
|    41.593    41.513 (1.002x) |    27.329    27.323 (------) | math-cordic
|    22.742    22.801 (0.997x) |     6.304     6.291 (1.002x) | math-partial-s
|    31.517    31.573 (0.998x) |    26.588    26.578 (------) | math-spectral-
|    47.118    46.916 (1.004x) |    34.563    34.555 (------) | regexp-dna
|    25.937    26.075 (0.995x) |     9.252     9.235 (1.002x) | string-base64
|    60.829    60.814 (------) |    32.704    32.735 (0.999x) | string-fasta
|    95.109    94.667 (1.005x) |    17.258    17.075 (1.011x) | string-tagclou
|    98.966    99.128 (0.998x) |    12.628    12.605 (1.002x) | string-unpack-
|    38.649    38.643 (------) |     8.977     8.966 (1.001x) | string-validat
-------
|  1095.367  1093.839 (1.001x) |   570.521   569.653 (1.002x) | all

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|  1197.162  1202.137 (0.996x) |  1035.788  1035.707 (------) | v8-crypto
|  1145.612  1146.711 (0.999x) |   842.931   842.762 (------) | v8-deltablue
|   894.296   878.813 (1.018x) |   453.731   453.343 (1.001x) | v8-earley-boye
|   503.383   467.835 (1.076x) |   237.959   235.706 (1.010x) | v8-raytrace
|   877.875   887.177 (0.990x) |   371.943   371.146 (1.002x) | v8-regexp
|  1143.424  1144.157 (0.999x) |  1115.515  1115.470 (------) | v8-richards
|   429.687   430.731 (0.998x) |   101.069   101.232 (0.998x) | v8-splay
-------
|  6191.442  6157.563 (1.006x) |  4158.939  4155.369 (1.001x) | all

v8-raytrace is a big win.  I've attached the cg_diff output, looks like there are some inefficient GetProp stub calls happening when the method JIT is on.
(Reporter)

Comment 66

7 years ago
(Oh, the Cachegrind runs were with (C=16,B=16).)
(Assignee)

Comment 67

7 years ago
Created attachment 511838 [details] [diff] [review]
cleaned-up patch

This is a cleaned-up version of the previous patch. The environment variables have been replaced with hard-coded constants. There's a new about:config option, javascript.options.methodjit_always. This makes it easy to turn the mjit tuning on or off without rebuilding. I also added the code that we need to handle exceptions and OOMs.

It's running on tryserver now.
Attachment #510951 - Attachment is obsolete: true
Attachment #511211 - Attachment is obsolete: true
Attachment #511293 - Attachment is obsolete: true
Attachment #511571 - Attachment is obsolete: true
Attachment #511838 - Flags: review?(dvander)
Comment on attachment 511838 [details] [diff] [review]
cleaned-up patch


>+    if (BackEdgeMap::Ptr p = backEdgeTable.lookup(pc))
>+        return p->value;
>+    else
>+        return 0;
>+}

You can drop the "else" here (no else-after-return).

>+    if (BackEdgeMap::AddPtr p = backEdgeTable.lookupForAdd(pc)) {
>+        p->value++;
>+        return p->value;
>+    } else {
>+        backEdgeTable.add(p, pc, 1);
>+        return 1;
>+    }
>+}

Same.
Attachment #511838 - Flags: review?(dvander) → review+
(Assignee)

Comment 69

7 years ago
http://hg.mozilla.org/tracemonkey/rev/f569d49576bb

I left the second conditional as-is because it uses p, which is scoped to the if statement.
Whiteboard: fixed-in-tracemonkey
(In reply to comment #69)
> http://hg.mozilla.org/tracemonkey/rev/f569d49576bb
> 
> I left the second conditional as-is because it uses p, which is scoped to the
> if statement.

Just use NULL not p in the else clause and lose the else.

/be
(Assignee)

Comment 71

7 years ago
(In reply to comment #70)
> Just use NULL not p in the else clause and lose the else.
> 
> /be

Strangely, I don't think that's correct. We can get back an AddPtr with valid, useful data and still follow the else branch via a magic C++ AddPtr-to-bool conversion.

If you're truly a virulent else-hater, I could lift out the AddPtr definition. But this seems uglier to me. Also, I've seen the same pattern used elsewhere.
Oh, I'm an else-after-return hater, but new hate target: falsy non-null pointer-like magic C++ objects! Luke, is this really righteous?

/be

Comment 73

7 years ago
Righteous?  I dunno.  Recall that, even on a miss, we want to convey the miss information to 'add' so the add is more efficient.  So that implies an API of the form:

 HashTable::Hit_or_miss_info info = ht.lookupForAdd(...);
 if (... somehow test success from info ...)
    ...
 else
    ht.add(info, ...)

on the other hand, forgetting the add-on-miss case for a second, its certainly nice to be able to write:

  if (HashTable::Ptr p = ht.lookup(...))
    ... p->value

which wants a 'truthy' Ptr type.  Putting the two together you get a truthy Hit_or_miss_info type (= HashTable::AddPtr).  The verbose alternative would be:

  MissInfo missInfo;
  if (HashTable::Ptr p = ht.lookupForAdd(..., &missInfo))
    ...
  else
    ht.add(missInfo, ...)

which I rather dislike.  A less verbose alternative would be for AddPtr (which derives Ptr) to hide the boolean conversion and force use of the (equivalent) found() member:

  HashTable::AddPtr p = ht.lookupForAdd(....);
  if (p.found())
    ...
  else
    ht.add(p, ....)

In comment 71 billm said this was ugly, but I don't know, I could dig it.  I guess its a matter of how terse you like it.  Truthy AddPtr does seem to encourage breaking the no-else-after-return rule...  I've over-analyzed it too much to have a strong opinion atm.
Sure, you want two results, a pointer for add and a not-found result. But the bool conversion operator gets the latter from the former (via a load and null test), right? Hiding that just makes for mischief. EIBTI.

JS forbids Object toBoolean resulting in anything but true, and C and C++ for plain old pointer types want only null to be falsy, so the magic conversion rubs me the wrong way.

/be

Comment 75

7 years ago
(In reply to comment #74)
Filed bug 633690 for further discussion.
(Reporter)

Comment 76

7 years ago
This patch broke the --jitflags option to jit_tests.py.  For example, if I use --jitflags=j,mj, I get "-j -m -j" passed to both invocations of each test.  It's something to do with the 'extend' call;  unfortunately my non-existent Python is preventing me from getting further than that.
(Reporter)

Comment 77

7 years ago
Created attachment 512095 [details] [diff] [review]
patch to fix jit_tests.py breakage

(In reply to comment #76)
> my non-existent
> Python is preventing me from getting further than that.

Turns out this line:

        t.jitflags = self.jitflags

in Test.copy is just copying a reference to jitflags rather than cloning it.  That's the problem.  Attached patch fixes it.
Attachment #512095 - Flags: review?(wmccloskey)
(Assignee)

Updated

7 years ago
Attachment #512095 - Flags: review?(wmccloskey) → review+

Updated

7 years ago
Depends on: 633929

Comment 78

7 years ago
I'm not sure how the counters are stored, but if reducing the number of bits for counters is important, try using half-a-bit. :) [Probabilistic Counter Updates for Predictor Hysteresis and Stratification]

The basic idea is that you can use a somewhat-random (doesn't have to be security-grade random) source to conditionally set a bit (or increment). So instead of having 4 bits to count up to 15, set a 1-bit counter 1/16 of the time.
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/f569d49576bb
http://hg.mozilla.org/mozilla-central/rev/1ec6fc4dbab8
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Comment 80

7 years ago
Now that we're not method-jitting everything immediately, have we made the appropriate changes to all the JM tests written when JM jitting everything? According to njn, we only added '|jit-test| mjitalways' flags to a handful of the tests (http://hg.mozilla.org/mozilla-central/rev/f569d49576bb). It would be very bad if our test coverage just dropped dramatically.

(We had similar issues when we changed the tracer's parameters, if I recall correctly.)
(Reporter)

Comment 81

7 years ago
Bug 635155 is open to get full test coverage back again.
Depends on: 640116
(Reporter)

Updated

7 years ago
Assignee: nnethercote → wmccloskey
You need to log in before you can comment on or make changes to this bug.