Large amount of heap allocation from js::PropertyTable::init

RESOLVED FIXED (defect)
People

(Reporter: jseward, Assigned: njn)

Tracking

Trunk
x86_64
Linux

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: [cib-memory], fixed-in-tracemonkey)

Attachments

(1 attachment, 1 obsolete attachment)

For a run of M-C of 29 Oct, x86_64-Linux, startup with 5 tabs with
fairly complex pages, a bit of browsing, the heap maxes out at
72,395,085 bytes in 411,549 blocks.

When ranking heap allocation stacks by maximum liveness,
js::PropertyTable::init comes 2nd from top, beaten only by
gfxImageSurface::gfxImageSurface.  At its maximum liveness it holds
6,537,344 bytes in 17,209 blocks, or around 9% of the total heap
volume.  This is not quick turnover stuff, either: on average these
blocks stayed alive for 1/3 of the entire process lifetime.

Also, on average only 44% of the bytes in the blocks are written
to.  Normally that might be a sign of inefficient utilisation of
the blocks, but the blocks are calloc'd here, so perhaps the 
zeroes written by calloc are important?

raw data:

======== SUMMARY STATISTICS ========

guest_insns:  75,213,429,144

max_live:     72,395,085 in 411,549 blocks

tot_alloc:    2,035,850,111 in 6,282,282 blocks

insns per allocated byte: 36

[...]

-------------------- 2 of 5000 --------------------
max-live:    6,537,344 in 17,209 blocks
tot-alloc:   21,226,752 in 42,241 blocks (avg size 502.51)
deaths:      42,241, at avg age 24,888,177,107 (33.09% of prog lifetime)
acc-ratios:  1.91 rd, 0.44 wr  (40,695,808 b-read, 9,417,656 b-written)
   at 0x4C268DC: calloc (/home/sewardj/VgTRUNK/trunk/coregrind
                 /m_replacemalloc/vg_replace_malloc.c:467)
   by 0x65CBE89: js::PropertyTable::init(js::Shape*, JSContext*)
                 (js/src/jsutil.h:205)
   by 0x65CDBA7: JSObject::addPropertyInternal(JSContext*, long,
                 int (*)(JSContext*, JSObject*, long, js::Value*),
                 int (*)(JSContext*, JSObject*, long, js::Value*),
                 unsigned int, unsigned int, unsigned int, int,
                 js::Shape**) (js/src/jsscope.cpp:808)
   by 0x65CDFAE: JSObject::putProperty(JSContext*, long, int (*)
                 (JSContext*, JSObject*, long, js::Value*),
                 int (*)(JSContext*, JSObject*, long, js::Value*),
                 unsigned int, unsigned int, unsigned int, int)
                 (js/src/jsscope.cpp:853)
   by 0x657623F: js_DefineNativeProperty(JSContext*, JSObject*, long,
                 js::Value const&, int (*)(JSContext*, JSObject*, long,
                 js::Value*), int (*)(JSContext*, JSObject*, long,
                 js::Value*), unsigned int, unsigned int, int,
                 JSProperty**, unsigned int) (js/src/jsobj.cpp:4421)
Hashtables don't want to get too full, so 44% utilization by post-calloc writes is not bad.

The question is whether we need these hashtables for faster lookup, or whether we are making them prematurely, without enough evidence that property searches are suffering from linear costs compounding quadratically. We don't try to be fancy: PropertyTable::init hashes if there are more than PropertyTable::HASH_THRESHOLD (currently 6) entries.

The average capacity seems high at 6.5e6/17e3 or ~95 pointers (32 bits) per table but it's hard to say without knowing more about the distribution.

Julian, are you game to try setting HASH_THRESHOLD to 12 or larger and see what happens?

/be
(In reply to comment #1)
 
> The average capacity seems high at 6.5e6/17e3 or ~95 pointers (32 bits) per
> table

This is a 64-bit build, so ~47 pointers per table.

I notice that the avg block size at max residency is, as you say,
6.5e6/17e3 = 382 bytes, but averaged over all blocks from this stack,
it's 502 bytes.  Dunno if that's relevant.


> setting HASH_THRESHOLD to 12 or larger and see what happens?

Done; but getting accurately repeatable results from a browser run is
pretty hopeless.  I'll measure standalone jsshell a bit later.  

FWIW (probably not much) here's results from a somewhat longer run,
with HASH_THRESHOLD = 12.  The avg block size increased to 650 (from
502), peak residency from this stack is 6421kB as compared to 6537kB
before.  This may or may not be a good sign, given that the peak
residency for the process increased from 72.4MB to 92.6MB.



======== SUMMARY STATISTICS ========

guest_insns:  133,954,464,422

max_live:     92,637,734 in 313,694 blocks

tot_alloc:    2,307,069,659 in 12,741,356 blocks

insns per allocated byte: 58
[...]
-------------------- 3 of 5000 --------------------
max-live:    6,421,888 in 12,588 blocks
tot-alloc:   22,797,440 in 35,064 blocks (avg size 650.16)
deaths:      35,064, at avg age 41,620,376,594 (31.07% of prog lifetime)
acc-ratios:  1.51 rd, 0.44 wr  (34,465,592 b-read, 10,124,552 b-written)
   at 0x4C268DC: calloc (/home/sewardj/VgTRUNK/trunk/coregrind/m_replace
   by 0x647CD1E: js::PropertyTable::init(js::Shape*, JSContext*) (js/src
   by 0x647CE71: js::Shape::maybeHash(JSContext*) (js/src/jsscope.cpp:19
   by 0x647EB27: JSObject::addPropertyInternal(JSContext*, long, int (*)
   by 0x647ECCD: JSObject::putProperty(JSContext*, long, int (*)(JSConte
I should clarify, perhaps .. I'm reporting this primarily because
I don't remember seeing so much space usage in the js engine when
doing heap profiling earlier this year.  Do we have reliable 
space-use numbers for it, over time?  I'm also seeing 
js::mjit::Compiler::finishThisUp hanging on to multiple megabytes
of stuff.  I'll file that separately.
blocking2.0: --- → ?
blocking2.0: ? → betaN+
FWIW, in the run I discuss here: http://blog.mozilla.com/nnethercote/2010/12/09/memory-profiling-firefox-with-massif/, js::PropertyTable::init accounts for 2.04% (26,472,576 bytes) of the total memory mapped by the process (not just the heap).
Nick, how much of that is actually used?
I don't know, I used Massif and it doesn't tell me that, but Julian has data about that in comment 0.
I have a patch to move the property tree into compartments, but it's a bit invasive and massive. It makes the property tree much shorter-lived, so that might help.
I think I've found a pathology here.  Fixing it will be harder than just tweaking a hashing threshold, but the potential improvement is correspondingly higher.

Basically, creating a new shape is linear (in both time and space) in the number of properties.  So if we add numerous properties one at a time, we get quadratic behaviour.

Consider this delightful program:

 var x = {};
 x.x01 = 1;
 x.x02 = 1;
 x.x03 = 1;
 x.x04 = 1;
 x.x05 = 1;
 x.x06 = 1;
 x.x07 = 1;
 x.x08 = 1;
 x.x09 = 1;
 x.x10 = 1;
 x.x11 = 1;
 x.x12 = 1;

HASH_THRESHOLD is 6, so once we get to x.x06, we call PropertyTable::init()
for every property add.  So we create a hash table for 6 elements, and then
one for 7 elements, then one for 8, etc:

 init(this=0x9fc5e70):  entryCount= 6 -> 16 ( 64 bytes)
 init(this=0x9fc5ed0):  entryCount= 7 -> 16 ( 64 bytes)
 init(this=0x9fc5f30):  entryCount= 8 -> 16 ( 64 bytes)
 init(this=0x9fc6018):  entryCount= 9 -> 32 (128 bytes)
 init(this=0x9fc60b8):  entryCount=10 -> 32 (128 bytes)
 init(this=0x9fc6158):  entryCount=11 -> 32 (128 bytes)
 init(this=0x9fc61f8):  entryCount=12 -> 32 (128 bytes)
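The sizes in that trace follow a simple rule, reconstructed here from the numbers above (this is an illustrative re-derivation, not the actual PropertyTable code): the table is sized to the smallest power of two at least twice entryCount, so a freshly built table is between 1/4 and 1/2 full. A sketch of the rule, and of the total allocation when a fresh table is built on every add:

```cpp
#include <cstdint>

// Illustrative re-derivation of the sizing visible in the log above (not the
// real PropertyTable code): capacity = 2^ceil(log2(2 * entryCount)), so the
// initial load factor lands in (0.25, 0.5].
static uint32_t ceilingLog2(uint32_t n) {
    uint32_t log2 = 0;
    while ((1u << log2) < n)
        log2++;
    return log2;
}

uint32_t tableCapacity(uint32_t entryCount) {
    return 1u << ceilingLog2(2 * entryCount);
}

// Total slots allocated if a fresh table is built on every property add from
// `threshold` entries up to n entries -- the quadratic part of the cost.
uint32_t slotsIfRebuiltPerAdd(uint32_t n, uint32_t threshold) {
    uint32_t total = 0;
    for (uint32_t e = threshold; e <= n; e++)
        total += tableCapacity(e);
    return total;
}
```

For the 12-property example this gives capacities 16, 16, 16, 32, 32, 32, 32 -- matching the trace -- for 176 slots allocated in total, where a single table built once at the end would need only 32.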

You might think "what a silly program" but looking at real programs, both
SunSpider/V8 in the shell and real-world browser usage, I'm seeing these
ascending runs *all the time*, often sequences 10, 20, 30, 40, 50 or
more long.

It turns out that if you have a large object literal, the properties get
added one at a time, just like in the above example!  In which case we're
building all these Shapes that will never be used, and allocating quadratic
(relative to the number of added properties) amounts of memory.  And don't
forget that we're not only allocating these PropertyTables but hashing
all those properties into them in the same quadratic fashion.  AFAICT it's
due to js_DefineNativeProperty() being called over and over from jsemit.cpp.  

There *must* be a way to batch up these property adds.  Is this related to
bug 577359?

And I wonder if there are cases other than object literals where this add-properties-one-at-a-time behaviour occurs -- I'm seeing these ascending runs so frequently that I suspect there must be other cases.
Slow arrays would be another case.

Note that the quadratic growth you see only happens until we get to 64 properties, as far as I can see.  After that we give up on the property tree, switch to dictionary mode, and start passing the table along from previous prop to next when it's added.  Of course this might mean we have a table per _object_ with that many properties, if I understand how dictionary mode works right...  I wonder how the two effects compare on real-world content.
(In reply to comment #9)
> 
> Note that the quadratic growth you see only happens until we get to 64
> properties, as far as I can see.

That makes sense, I didn't see runs go any higher than 64.  Still, in the worst case we're allocating/hashing roughly 25x more than we need to.
btw, v8-regexp is a useful test case for this.

(In reply to comment #8)
> I think I've found a pathology here.

Yes.  It looks ungood, and you can see these sequences of increasingly
large allocations in (eg) v8-regexp.  Fixing this obviously will be a
big win.

That aside though, the hash table utilisation is pretty bad.  DHAT
consistently gives an occupancy ratio of 44%-46% of the calloc'd
space.  I tried to figure out why that was, and eventually realised
it's an overestimate, an artefact of how DHAT does its accounting.

In fact, both analytically and by measurement, the utilisation is 
a pitiful 3/8ths (37.5%).  Instrumenting PropertyTable::init gives
a value of 37.1% for v8-regexp.  By analysis, a block containing
N pointers, where N is a power of 2, is allocated for any entryCount
in the range 1/4 N to 1/2 N, giving minimum and maximum utilisations
of 0.25 and 0.5.  If the entryCount values are uniformly distributed
then we'll get an average utilisation halfway between, viz, 37.5%.
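That 3/8ths figure can be sanity-checked numerically. Assuming the same power-of-two sizing rule described above (capacity = smallest power of two at least twice the entry count -- a reconstruction, not the actual engine code), averaging entryCount/capacity over a uniform spread of entryCounts comes out just above 0.375:

```cpp
#include <cstdint>

// Reconstructed sizing rule: smallest power of two that is at least twice
// the entry count, so utilisation sits in (0.25, 0.5] at creation.
static uint32_t capacityFor(uint32_t entryCount) {
    uint32_t cap = 1;
    while (cap < 2 * entryCount)
        cap *= 2;
    return cap;
}

// Mean utilisation (entryCount / capacity) for entryCounts uniformly
// distributed over [lo, hi].
double averageUtilisation(uint32_t lo, uint32_t hi) {
    double sum = 0.0;
    for (uint32_t e = lo; e <= hi; e++)
        sum += double(e) / capacityFor(e);
    return sum / double(hi - lo + 1);
}
```

averageUtilisation(7, 1024) lands very close to 0.375, agreeing with both the analysis and the 37.1% measured for v8-regexp.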

This wouldn't matter much, except there are a lot of these blocks.
Is there a way we can jack up the occupancy without totalling
performance?
You can see in jsscope.h that the maximum utilization is 75% -- see PropertyTable::needsToGrow().  I don't know how often PropertyTables grow, but given the pathology I found I suspect many of them are never looked at again.

So if we allow the tables to be fuller to start with we could probably reduce memory usage by close to 2x.  But avoiding the pathology would be better;  if we didn't allocate all those unnecessary PropertyTables, it may be the case that the proportion of PropertyTables that actually grow increases greatly.
I tried changing things so that the initial load factor was in the range 0.375--0.75 instead of 0.25--0.5.  SunSpider results (below) show that it causes a non-negligible (1--2%) instruction increase in some cases.  I'm now going to look at whether we can create the tables in a lazier fashion.

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    69.463    69.505 (0.999x) |    44.798    44.798 (------) | 3d-cube
|    39.727    39.753 (0.999x) |    24.630    24.630 (------) | 3d-morph
|    64.272    64.357 (0.999x) |    37.176    37.176 (------) | 3d-raytrace
|    24.238    24.263 (0.999x) |    11.010    11.010 (------) | access-binary-
|    90.753    90.776 (------) |    85.777    85.777 (------) | access-fannkuc
|    28.105    28.133 (0.999x) |    16.279    16.279 (------) | access-nbody
|    36.203    36.225 (0.999x) |    29.338    29.338 (------) | access-nsieve
|     7.419     7.442 (0.997x) |     3.255     3.255 (------) | bitops-3bit-bi
|    36.619    36.642 (0.999x) |    31.745    31.745 (------) | bitops-bits-in
|    15.854    15.876 (0.999x) |    12.018    12.018 (------) | bitops-bitwise
|    38.039    38.063 (0.999x) |    32.971    32.971 (------) | bitops-nsieve-
|    16.984    17.007 (0.999x) |    12.875    12.875 (------) | controlflow-re
|    23.308    23.347 (0.998x) |    11.835    11.835 (------) | crypto-md5
|    19.440    19.462 (0.999x) |     8.544     8.544 (------) | crypto-sha1
|    66.585    66.688 (0.998x) |    21.867    21.867 (------) | date-format-to
|    62.255    63.338 (0.983x) |     9.943     9.943 (------) | date-format-xp
|    43.480    43.504 (0.999x) |    29.516    29.516 (------) | math-cordic
|    22.846    22.876 (0.999x) |     6.310     6.310 (------) | math-partial-s
|    31.575    31.601 (0.999x) |    26.567    26.567 (------) | math-spectral-
|    46.582    46.549 (1.001x) |    34.576    34.576 (------) | regexp-dna
|    28.498    28.534 (0.999x) |     9.272     9.272 (------) | string-base64
|    63.620    63.653 (0.999x) |    32.121    32.121 (------) | string-fasta
|   100.406   100.447 (------) |    17.214    17.214 (------) | string-tagclou
|    94.825    95.910 (0.989x) |    12.828    12.848 (0.998x) | string-unpack-
|    43.624    43.651 (0.999x) |     8.576     8.576 (------) | string-validat
-------
|  1114.731  1117.615 (0.997x) |   571.052   571.072 (------) | all
Assignee: general → nnethercote
Status: NEW → ASSIGNED
Turns out I was misunderstanding what shapes are and how they work, so my comments above about unused shapes aren't relevant.

So I measured how many times each PropertyTable is searched, changed and grown.  Here's the histogram for the 20 cad-comics workload:

( 1) 158796 (73.3%, 73.3%): searched = 2, changed = 0, grown = 0
( 2)  22968 (10.6%, 83.9%): searched = 3, changed = 0, grown = 0
( 3)   4831 ( 2.2%, 86.1%): searched = 6, changed = 0, grown = 0
( 4)   4618 ( 2.1%, 88.3%): searched = 4, changed = 0, grown = 0
( 5)   3428 ( 1.6%, 89.8%): searched = 0, changed = 0, grown = 0
( 6)   3219 ( 1.5%, 91.3%): searched = 5, changed = 0, grown = 0
( 7)   2790 ( 1.3%, 92.6%): searched = 1, changed = 0, grown = 0
( 8)   2756 ( 1.3%, 93.9%): searched = 8, changed = 0, grown = 0
( 9)   1239 ( 0.6%, 94.5%): searched = 7, changed = 0, grown = 0
(10)    769 ( 0.4%, 94.8%): searched = 21, changed = 0, grown = 0

The tables are often searched, but rarely changed or grown.  Small numbers of PropertyTables are searched 100s or 1000s of times, but the clear majority are searched twice, and most of the rest are searched a handful of times.  I also measured SS and V8; the numbers are similar, but the small counts account for an even higher proportion (eg. 80--85% for 2 searches).

(Nb: I excluded the calls to search() that occur within init() when first creating the PropertyTable.)

So I guess the next step is to work out why 2 searches is so common, and if that case can be optimized somehow.
Just to point out the obvious:  the vast majority of tables have a handful of searches.  So we allocate space for a hash table, hash all the properties into it, and then do a small number of searches.  Just sticking with linear search would be much better in these cases.  But linear search will kill us for the small number of cases where we do thousands of searches.

So if we can determine ahead of time if a PropertyTable will have a small number of searches, we can avoid the hash-ification, saving time and space.  I'll keep digging, but any suggestions would be welcome.
Maybe we could count the number of searches, and hashify if it exceeds some threshold...
(In reply to comment #16)
> Maybe we could count the number of searches, and hashify if it exceeds some
> threshold...

Yes, this was what I was getting at on IRC:

brendan: njn: if we hash on lookup (infallibly) we can defer table construction till needed for non-dictionary objects
 . . .
brendan: njn: if we instead hashed only when lookup found linear search facing the threshold (or even going past it enough -- IOW detect quadratic growth), then we'd hash a lot less

/be
Whiteboard: [cib-memory]
This patch avoids creating PropertyTables most of the time.

The old logic was to create a PropertyTable when calling maybeHash() and the
number of entries was 6 or more.  The new logic is to create a PropertyTable
once it's been searched 7 times or more (and also it's always done for
dictionaries).  I tried a few other values, 7 was the best in terms of
minimizing instruction counts for SS by a small margin.

I tried using both criteria together initially, but counting the number of
entries is linear, and putting that count inside search(), which is called a
lot, slowed things down a bit.  This new heuristic works well:  most Shapes
are searched 7 or fewer times.  Shapes that have few entries and are
searched a lot are now hashified;  this is ok because with few entries
there's little difference (in both space and time) between a hash search and
a linear search.  And 90+% of searches end up taking place in hashed Shapes.

Most of the maybeHash() calls have been removed accordingly;  I kept the ones
where the shape is in dictionary mode.  maybeHash() is renamed hashify(),
since there's no "maybe" about it now (unless there's an OOM).
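In outline, the new heuristic looks something like the following sketch (the names and structure here are illustrative, not the actual SpiderMonkey classes from the patch):

```cpp
#include <cstdint>

// Sketch of the hashify-on-demand heuristic described above; ShapeSketch and
// its members are hypothetical stand-ins for the real Shape/PropertyTable.
struct ShapeSketch {
    static const uint32_t HASHIFY_SEARCHES = 7; // threshold chosen empirically
    uint32_t numSearches;
    bool hasTable;
    bool inDictionaryMode;

    ShapeSketch(bool dict = false)
      : numSearches(0), hasTable(false), inDictionaryMode(dict) {}

    // Stands in for PropertyTable::init; infallible here for simplicity.
    void hashify() { hasTable = true; }

    void search() {
        if (hasTable)
            return;                 // hash-table lookup path
        numSearches++;
        // Dictionaries always get a table; otherwise only build one once the
        // linear search has been paid for HASHIFY_SEARCHES times.
        if (inDictionaryMode || numSearches >= HASHIFY_SEARCHES)
            hashify();
        // otherwise: walk the shape lineage linearly
    }
};
```

With the earlier histogram in mind, the large majority of shapes that are searched at most a handful of times never allocate a table at all, while the hot shapes hashify after the seventh search.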

The amount of space allocated in PropertyTable::init() has reduced by a
factor of 22.2x for an entire Sunspider run (1,405,568 bytes down to 63,424)
and 12.5x for a 20 tab cad-comics browser session (100,744,704 bytes down to
8,036,480).

We also save some time by not allocating/hashing;  Sunspider instruction
counts are down 0.5%, V8 comes out unchanged;  timing changes in both cases
are in the noise.

The bulk of the patch is boring: JSContext pointers need to be passed to
Shape::search(), which required threading them through in lots of places.
Note particularly that I used the cached copies present in Parser and
FrameState in a couple of places, and I added a cached copy to 
CompExprTransplanter.

Shape has an extra uint32 field;  it only needs a few bits and so I could cram
it in somewhere else if necessary.
Attachment #496751 - Flags: review?(brendan)
(In reply to comment #18)
> The amount of space allocated in PropertyTable::init() has reduced by a
> factor of 22.2x for an entire Sunspider run (1,405,568 bytes down to 63,424)
> and 12.5x for a 20 tab cad-comics browser session (100,744,704 bytes down to
> 8,036,480).

Acceptable.
Comment on attachment 496751 [details] [diff] [review]
patch (against TM 58561:4aeb551dd44d)

>-js_LexicalLookup(JSTreeContext *tc, JSAtom *atom, jsint *slotp, JSStmtInfo *stmt)
>+js_LexicalLookup(JSContext *cx, JSTreeContext *tc, JSAtom *atom, jsint *slotp, JSStmtInfo *stmt)

TreeContext::prs::context is a JSContext * already, if that saves you some churn.
Oh, I wanted to ask: does staying with linear search more often help at all in terms of cache misses?

Also: if we don't want to whine about OOM failure, we can go JSObject * -> runtime (via the compartment's arena, I think) with various bitmath tricks, to avoid threading those params everywhere.

Awesome work.
> Also: if we don't want to whine about OOM failure, we can go JSObject * ->
> runtime (via the compartment's arena, I think) with various bitmath tricks, to
> avoid threading those params everywhere.

http://mxr.mozilla.org/mozilla-central/source/js/src/jsgc.h#535 shows the trick to get the runtime.
Yeah, obj->compartment()->runtime, it's very fast.
Thanks for the suggestions!  It's Friday night in Melbourne so I won't get back to this until Monday, I'll try them then.
(In reply to comment #18)
> This patch avoids creating PropertyTables most of the time.

That's excellent.  The numbers I get, from a 20 tab cad-comic run,
are: M-C, before, w/ all space fixes except this: heap levels off at
around 186.3MB.  With this fix (which applies w/o problems to m-c)
it levels off at 167.7MB (90.0% of previous).

Of that 167.7MB, the breakdown for (obviously) js-related stuff is

27.0 js::mjit::finishThisUp  with which we've partially
     grappled (bug 611400, bug 616367, although the latter
     remains inconclusive)

24.0 js::PropertyTree::newShape -- afaik uninvestigated, maybe related
     to this?

15.4 JSScript::NewScript -- also uninvestigated.

Very roughly speaking, everything else in the browser combined takes
about the same amount again.

Note of course this doesn't include the jitted code or other directly
mmap'd stuff.
I had a patch that moved hashifying to the lookup/search code but I avoided adding a cx parameter by tracking memory pressure otherwise. The idea of using the compartment to get at the runtime wasn't implemented then. Either way, I think we do not want to add cx params all over.

Nick, can you redo to avoid cx over-parameterization? I'll review quickly. Thanks again for taking this one and doing a great (and scientismically quantitative, as usual) job with it!

/be
(In reply to comment #9)
> Slow arrays would be another case.
> 
> Note that the quadratic growth you see only happens until we get to 64
> properties, as far a I can see.  After that we give up on the property tree,
> switch to dictionary mode, and start passing the table along from previous prop
> to next when it's added.  Of course this might mean we have a table per
> _object_ with that many properties, if I understand how dictionary mode works
> right...  I wonder how the two effects compare on real-world content.

Just to follow up: the dictionary-mode case does indeed avoid accumulating (shared immutable, and mostly useless in practice without njn's patch) hash tables, instead mutating and passing along the one true table for the object.

This is not a problem in practice. The table is not strictly necessary here, as it is in JSC (where, to compute the next slot number, a dictionary structure's hash table's entry count is consulted, with no fallback). It is of course important not to leave O(n^2) lookup growth traps around, but that would be a bug to have instead of the space craziness here.

This was unfinished scope-ectomy work, I should have dealt with it sooner but we're all happy to have Nick on the case.

/be
> leave O(n^2) lookup growth traps around, but that would be a bug to have

Meaning runtime growth traps, of course.

/be
One note if we did have to pass cx all over (we don't and should not): since it is a parameter of infallible nativeLookup, etc., it should go last, not first. The signature of fallible methods and functions is (bool or pointer return type)(JSContext *cx, ...), so we move cx to the end for infallible cx-bearing methods.

If cx is the only parameter then it's not clear from the signature, and comments are the only help.

/be
This patch uses the JSRuntime to allocate the PropertyTable entries in PropertyTable::init() instead of the JSContext.  This allows all of the JSContext arguments to be removed.

> The amount of space allocated in PropertyTable::init() has reduced by a
> factor of 22.2x for an entire Sunspider run (1,405,568 bytes down to 63,424)
> and 12.5x for a 20 tab cad-comics browser session (100,744,704 bytes down to
> 8,036,480).

Some more numbers: for the 20 tab case, the amount of memory allocated by init() in use at the point of peak memory consumption dropped from 28,648,192 bytes down to 5,038,592, a reduction of 5.7x.
Attachment #496751 - Attachment is obsolete: true
Attachment #497205 - Flags: review?(brendan)
Attachment #496751 - Flags: review?(brendan)
(In reply to comment #25)
>
> 24.0 js::PropertyTree::newShape -- afaik uninvestigated, maybe related
>      to this?
> 
> 15.4 JSScript::NewScript -- also uninvestigated.

I looked briefly at both of these, I couldn't see any obvious ways to shrink them.
Comment on attachment 497205 [details] [diff] [review]
patch v2 (against TM 58569:a546d76bea6b)

>-    int sizeLog2;
>+    uint32 sizeLog2 = JS_CeilingLog2(2 * entryCount);

Please leave the int alone -- it's faster and shorter in general. There's no need to overconstrain local types with bitsizes, and indeed JS_CeilingLog2 returns an int (under an ugly old typedef renaming, but no matter).

Great work, thanks again for doing this.

/be
Attachment #497205 - Flags: review?(brendan) → review+
(In reply to comment #32)
> Comment on attachment 497205 [details] [diff] [review]
> patch v2 (against TM 58569:a546d76bea6b)
> 
> >-    int sizeLog2;
> >+    uint32 sizeLog2 = JS_CeilingLog2(2 * entryCount);
> 
> Please leave the int alone -- it's faster and shorter in general. There's no
> need to overconstrain local types with bitsizes, and indeed JS_CeilingLog2
> returns an int (under an ugly old typedef renaming, but no matter).

That was necessary to avoid GCC complaining about signed vs. unsigned comparisons.
Couldn't you just use |unsigned int|?
Or uintN, which seems to be the SM idiom.  Sure, I'll do that.  Brendan:  is int really faster than uint32?
http://hg.mozilla.org/tracemonkey/rev/0343557b0c7a
Whiteboard: [cib-memory] → [cib-memory], fixed-in-tracemonkey
Backed out due to a test failure:
http://hg.mozilla.org/tracemonkey/rev/5bab73449e39

The problem is that this code:

  function f([y],a,b,c,d,e,f,g,h,a){}

should give a syntax error:

  a.js:1: SyntaxError: duplicate argument is mixed with destructuring pattern:
  a.js:1: function f([y],a,b,c,d,e,f,g,h,a){}
  a.js:1: ...............................^

but with the patch it doesn't.
A simpler reproducer:

  function f([y],a,a){}

The problem strikes here, jsparse.cpp:2819:

    if (fun->lookupLocal(context, atom, NULL) != JSLOCAL_NONE) {
        duplicatedArg = atom;
        if (destructuringArg)
            goto report_dup_and_destructuring;
    }

lookupLocal() calls Shape::search().  Without my patch, the search done for
the second 'a' is linear, and hits.  With my patch, the search is a hash
table search and misses.  Don't know yet why.
In the failing hash table search, there is only a single element in the hash table.  hashify() is called in two places outside of PropertyTable::search().  If I guard those two calls with a test to make sure the entryCount is greater than 6 the problem goes away... but that's just papering over it.  I wonder if there's something funny that causes problems if there's only one element in the hash table?
I bet it has something to do with this:

    /*
     * The destructuring formal parameter parser adds a null atom, which we
     * encode as an INT id. The parser adds such locals after adding vars for
     * the destructured-to parameter bindings -- those must be vars to avoid
     * aliasing arguments[i] for any i -- so we must switch u.i.names to a
     * dictionary list to cope with insertion "in the middle" of an index-named
     * shape for the object or array argument.
     */
Hmm, bug 600067 comment 1 sounds rather relevant.
(In reply to comment #41)
> Hmm, bug 600067 comment 1 sounds rather relevant.

Though that was a case where duplicate entries were seen in different order depending on whether the search was linear or hash.  This case we're getting different hit/miss results depending on the data structure.
That code is really weird. I ran into it before while playing with the prop cache. It finds a point to insert args. If we can solve this differently that would be awesome (i.e. collect them and then insert in-order).
Depends on: 619003
(In reply to comment #35)
> Or uintN, which seems to be the SM idiom.  Sure, I'll do that.  Brendan:  is
> int really faster than uint32?

It depends on the architecture. There's no need on 64-bit targets to chop to 32 bits unsigned when the natural-width int machine register will do. This used to matter with RISCs. I don't know if it's an issue with current targets, but it is an overspecification -- and an overlong eyesore. JS_CeilingLog2 really does return signed int, so uintN isn't right either.

/be
Bug 619003 is for the latent destructuring/duplicate args bug.
Whiteboard: [cib-memory], fixed-in-tracemonkey → [cib-memory]
A summary for blocker drivers:  this patch is complete and ready to land, but it exposed a latent bug which caused test failures, and so cannot land until bug 619003 lands.
Note to self:  the last paragraph of the big comment in jsscope.h needs to be updated to describe the new conditions under which the hash-ification occurs.
Blocks: 617236
Bug 619003 has landed, is this patch ready to land?
I'm about to land it.  I only just returned to work from the Xmas/New Year break :)
http://hg.mozilla.org/tracemonkey/rev/28d1f9e77362
Whiteboard: [cib-memory] → [cib-memory], fixed-in-tracemonkey