Closed Bug 157334 Opened 22 years ago Closed 14 years ago

IE6 3x faster than Mozilla, concatenating two local strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED

People

(Reporter: pschwartau, Unassigned)

References

Details

(Keywords: perf, testcase, Whiteboard: [For best timing results, save and run testcases locally])

Attachments

(8 files, 12 obsolete files)

6.88 KB, text/html
Details
3.40 KB, text/plain
Details
769 bytes, patch
Details | Diff | Splinter Review
2.98 KB, text/plain
Details
153.93 KB, patch
Details | Diff | Splinter Review
126.25 KB, patch
Details | Diff | Splinter Review
3.60 KB, patch
Details | Diff | Splinter Review
3.42 KB, patch
Details | Diff | Splinter Review
This bug arose from the JS Engine performance tracking bug 117611.

It was noted that Mozilla under-performs on Peter Nilsson's string
concatenation test at http://www.formula-one.nu/mozilla/jsTestcases/,
which is structured schematically as follows: 


                  function test()
                  {
                    var str1 = "a very long string";
                    var str2 = "a very long string";
                    var str;

                    for (var i = 0; i <= UBOUND; i++)
                      str = str1 + str2;
                  }
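For reference, the schematic above can be fleshed out into a self-contained harness like the following (my own sketch: the string contents, the UBOUND value, and the output call are placeholders, not the original testcase):

```javascript
// Standalone sketch of the benchmark. UBOUND is global here, matching the
// schematic above; the strings and iteration count are placeholders.
var UBOUND = 10000;

function test() {
  var str1 = "a very long string used for the concatenation benchmark ";
  var str2 = "another very long string used for the same benchmark run ";
  var str;

  for (var i = 0; i <= UBOUND; i++)
    str = str1 + str2;

  return str;
}

// `print` exists in the JS shell; fall back to console.log elsewhere.
var log = typeof print === "function" ? print : console.log;

var start = new Date();
var result = test();
var elapsed = new Date() - start;
log("time (ms): " + elapsed);
```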


I will attach this testcase below in an HTML format that can be run
in NN4.x, Mozilla, and in IE6. I will also attach a standalone JS
version that I run via the load() function in the optimized JS shell.

The key thing to note is that all the variables are declared locally
in the function. This should avoid some DOM overhead, since the DOM
does security checks on all global variable access.

Here are results I got by trying the test in all three browsers
and in the JS shell. The Mozilla build date I used was 2002-07-01.

OS=WinNT4.0(SP6); 500MHz CPU; 128M RAM

UBOUND = 10,000 
Average time after 10 runs (in ms):

IE6  =  314
JS   =  715 
Moz  = 1045
NN4  = 1853


Note: for comparison, here are timing results on an empty loop:

UBOUND = 100,000: 
Average time after 10 runs (in ms):

IE6 =  151
JS  =   97
Moz =  167
NN4 = 2171


We see from above that IE6 is more than 3x faster than Mozilla on the
string concatenation test, and more than 2x faster than the JS shell -
Attached file HTML testcase (poorly scoped) (obsolete) —
Attached file JS shell testcase (poorly scoped) (obsolete) —
Blocks: 117611
Keywords: perf
I'm seeing even worse results.  I did 10 iterations and got:

IE6 =  40.1
Moz = 490.4

I then ran it out to 70 iterations and got:

IE6 =  40.02
Moz = 500.6

Evaluations in IE ranged from 30-50 ms.
Evaluations in Moz ranged from 440-620 ms.

On Win2k (SP2), 1200 MHz CPU, 512M RAM with Mozilla 0704.
Moz 2002071208: Average time (in ms) = 1048.3
Netscape 4.79: Average time (in ms) = 318.6
IE 6:           Average time (in ms) = 58.2
Jerry: can you give us the OS, CPU, and RAM of your PC? 
I'm surprised that NN4.79 is faster than Mozilla for you on this.

What results do you get if you save the testcase locally and run it
off your disk?  That's the best way (or were you doing that already?).

Also, do you have anything set in Mozilla's Edit > Preferences ?
Thanks -
Whiteboard: [For best timing results, save and run testcases locally]
Local files:
Mozilla: Average time (in ms) = 481.47
NN4:     Average time (in ms) = 411.80
IE6:     Average time (in ms) = 58.61

Windows XP; Athlon 800, 512MB PC-133

I have some prefs to keep windows from popping up, but that's the only prefs I
have set w/regards to JS.
Phil, Jerry: please try the test in xpcshell and report results.

/be
Trying the above JS shell testcase in my xpcshell.
OS=WinNT4.0(SP6); 500MHz CPU; 128M RAM; Mozilla 20020701xx

JS shell: Average time (in ms) = 710
xpcshell: Average time (in ms) = 660
Mozilla:  Average time (in ms) = 1018

Most contributors will not have an optimized JS shell handy, 
but Mozilla automatically comes with the xpcshell.exe. 

STEPS TO TEST IN XPCSHELL:
1. Save the JS shell testcase locally as, say, Concat_Two_Strings.js
2. Save it in the same directory that your Mozilla binary is in.
3. Open a console window; for example MS-DOS or CygWin
4. cd to your Mozilla directory
5. At the command prompt, type  ./xpcshell.exe -f Concat_Two_Strings.js
6. (RESULTS WILL PRINT OUT)
7. Hit the up arrow and <Enter> to try again, etc.
8. End your xpcshell session by typing the command quit()
XPCShell:

Average time (in ms) = 260.4
Doing 100 runs:

xpcshell = 204.3

Evaluations in xpcshell ranged from 190-211 ms.
> The key thing to note is that all the variables are declared locally
> in the function. This should avoid some DOM overhead, since the DOM
> does security checks on all global variable access.


True, but in my adaptation of Peter Nilsson's test, I mistakenly
used a global variable |UBOUND| in the for-statement of the loop:


                  function test()
                  {
                    var str1 = "a very long string";
                    var str2 = "a very long string";
                    var str;

                    for (var i = 0; i <= UBOUND; i++)
                      str = str1 + str2;
                  }



I will attach corrections below, and leave the two testcases
above in case anyone wants to compare how much DOM overhead
is incurred by a global |UBOUND|.
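For comparison, the corrected shape with |UBOUND| declared locally (again a sketch, with placeholder strings and a placeholder bound) is:

```javascript
// Corrected version: UBOUND is a local variable, so the loop condition
// never touches a global (and thus never incurs the DOM's security check
// on global variable access when run in the browser).
function test() {
  var UBOUND = 10000;
  var str1 = "a very long string";
  var str2 = "a very long string";
  var str;

  for (var i = 0; i <= UBOUND; i++)
    str = str1 + str2;

  return str;
}
```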
Attachment #91234 - Attachment description: HTML testcase → HTML testcase (poorly scoped)
Attachment #91234 - Attachment is obsolete: true
Attachment #91235 - Attachment description: JS shell testcase → JS shell testcase (poorly scoped)
Attachment #91235 - Attachment is obsolete: true
It was noted in bug 117611 comment 63 that Mozilla appears to have
slowed down on the browser version of the string concatenation test,
when measured from the 1.0 release to today's current trunk.

Brendan asked for pure JS engine testing on this, since the browser
incorporates other modules which could be causing the slowdown.

I did this using the pure JS shell testcase in Comment #13 above.
I compared today's current trunk vs. the CVS tag MOZILLA_1_0_RELEASE.

As timeless notes in bug 117611 comment 65, to build via Makefile.ref
it is necessary to copy over such subdirectories as js/src/editline/
and js/src/config/, since they didn't get the MOZILLA_1_0_RELEASE tag.

Here are my results on WinNT4.0(SP6), 500MHz CPU, 128M RAM.
All times in ms:
                       
                   MOZILLA_1_0_RELEASE   Current trunk (2003-02-17)
UBOUND = 10000            690                 681                
UBOUND = 100000           6917                6837

These show a nominal JS Engine performance improvement on this test.
So it looks like any observed slowdown on this is browser-related -
OS: Windows NT → All
Hardware: PC → All
Do we have any chance of finding out where in the browser this slowdown is 
based on?
Keywords: testcase
Athlon XP 1800+

Firefox 0.9:
Average time (in ms) = 896.93
Excluding first run  = 891.78

IE 6:

Average time (in ms) = 31.06
Excluding first run  = 29.92

Is there any chance to find a reason?
Uh, that's way off, and I wonder whether you'd find a similar difference if you
got rid of the string concatenation and just ran a simple loop test.

Cc'ing jst, peterv, nallen.

Obviously this could be profiled.  jprof or another such sampler, or quantify or
another such instrumenter.  Cc'ing bz in case he's back and interested.

/be
Assignee: khanson → general
Component: JavaScript Engine → DOM: Level 0
There is also a big difference between Linux and Windows here.
Fx on Linux gave results of ~140 ms (4x slower than IE), while Fx on Windows gave
me results between 600 and 800 ms (20x slower).
I tried poking into this by looking at the difference in performance between
browser and xpcshell.  I had some trouble profiling because the browser seems to
have two different performance modes.  mode 1 was the baseline (280 ms), mode 2
took ~20% less time (220 ms), xpcshell took ~40% less time (160 ms), and IE took
~80% less time (60 ms).  Much of the test case time was spent in GC so I focused
on that.

When doing concat, there's a memory allocation failure path:

js_ConcatStrings:182
js_NewString:2439
js_AllocGCThing:529
js_GC

which is the last ditch allocator to try again.  xpcshell ran GC through that
path quite a bit taking relatively short GCs.  mode 2 ran GC through that path
an order of magnitude less frequently taking fewer, longer GCs spending a lot of
time in js_MarkGCThing.  mode 1 almost never ran GC through that path.  It was
instead taking the JS_MaybeGC on the dom branch callback and running many, long GCs.

I can't duplicate the extremely bad performance numbers.  This needs to be
profiled on one of the machines that sees 25x+ slowdown compared to IE.  But I
wouldn't be surprised if the % time in GC was huge.
Oh, these last ditch GCs are coming from
http://lxr.mozilla.org/mozilla/search?string=gGCSize.  Nallen, can you try
setting that constant to 0xffffffff and rerunning?

Also, benchmarking the js shell, built from js/src/Makefile.ref on Unixes with a
few directories pulled (config, editline) in addition to what Mozilla builds
pull, gives the baseline for JS perf.  XPCShell has thread-safety enabled,
although its costs have been optimized away in most cases.

/be
I see no point, given the DOM's use of JS_GC and JS_MaybeGC, in having any
last-ditch GC limit for XPConnect-based apps that we host.  Other apps lacking
our DOM code, but using SpiderMonkey and XPConnect, may care; they will have to
make themselves heard, or suit themselves with better GC-calling from their own
code.

/be
Putting this on aviary1.0 radar.

/be
Flags: blocking-aviary1.0+
I spun a build with unlimited gGCSize.  FF dropped down to 170 ms on the test
case, competitive with where xpcshell was.  xpcshell is toast though.  It
dropped from 160 ms to 130 ms but it can't keep that up very long since it never
runs a GC ever.
Is there any chance to get Win build with this patch ? I don't have any
environment to build it by myself.
I wouldn't recommend running with this patch.  It speeds up the testcase, but
bumping the GC limit to a larger yet still reasonable value would as well.  When
things go wrong with no limiter, they can go really wrong.  Multiple lines of
defense are a good thing here.
Flags: blocking1.8a4?
Flags: blocking1.7.x?
Flags: blocking1.8a4? → blocking1.8a4-
Taking -- I have some JS-engine-focused inspiration based on some recent work
done for hardware-optimized Java VMs.

/be
Assignee: general → brendan
Component: DOM: Level 0 → JavaScript Engine
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta
Attached file design notes (obsolete) —
"Stack" here refers to a virtual interpreter LIFO storage structure, not even
the interpreter's stack, and not a machine (thread or process) stack.

/be
Attached file design notes, v2 (obsolete) —
Attachment #159677 - Attachment is obsolete: true
Attached file design notes, v3
Attachment #159693 - Attachment is obsolete: true
Flags: blocking1.7.x? → blocking1.7.x-
The bad behavior for synthetic benchmarks that overallocate memory is no reason
for blocking aviary1.0.  I'll fix this soon enough on the trunk, we'll get it
into a shipping release soon enough.

/be
Status: NEW → ASSIGNED
Flags: blocking-aviary1.0+ → blocking-aviary1.0-
Target Milestone: mozilla1.8beta1 → mozilla1.9alpha
Blocks: 203448
Are we there yet? (soon enough, that is)

;-)
Tested with Firefox 20050621

FF -> Average time (Ubound = 100000): 2650
IE6 -> Average time (Ubound = 100000): 311

So, nope, no improvement in almost 3 years.
Well... On my Athlon 1.8, Gentoo with Gecko/20050605 Firefox/1.0+ it's 

(100 tries, default UBOUND)

Average time (in ms) = 198

Attachment 96729 [details]. 

Opera 8.01:

Average time (in ms) = 256

Konq 3.4:

Average time (in ms) = 158.62
One of you commenters want to chip in more than complaints?  The numbers for
other browsers are welcome, but it would be good to know the machine
configuration(s) used for these tests.

What's more, try varying UBOUND from small to large and plot the results (attach
csv or other data files, something we can chart with spreadsheets).  That would
be a help.

My past comments reflected interest, but I didn't have the time.  This bug is
now targeted for 1.9alpha, which is right.  No more complaining -- supply
signal, not noise.

/be
If I were to answer this the way most people are answered by owners when they
make a request in Bugzilla, I would say, "Hey, it's open source. If you want the
data, you are welcome to get it yourself."
(In reply to comment #35)
> If I were to answer this the way most people are answered by owners when they
> make a request in Bugzilla, I would say, "Hey, it's open source. If you want the
> data, you are welcome to get it yourself."

Of course, if Brendan were to answer in that fashion, he would say: "Hey, it's
open source. If you want better performance, you are welcome to fix this bug
yourself."
Jerry, take it somewhere else or I'll stop playing along.  The whole point here
is cooperation, not whining or grumbling, even (*especially*) if you have a
point.  No one owes you anything, and you don't owe me more data -- but it would
help.  Are you here to help?  I am, but I'm kinda busy and I bet others could
get the data faster/better and maybe even analyze it well.

/be
Flags: testcase?
Checking in regress-157334-01.js;
/cvsroot/mozilla/js/tests/js1_5/String/regress-157334-01.js,v  <-- 
regress-157334-01.js
initial revision: 1.1
Flags: testcase? → testcase+
Blocks: 164421
(In reply to comment #38)
> Checking in regress-157334-01.js;
> /cvsroot/mozilla/js/tests/js1_5/String/regress-157334-01.js,v  <-- 
> regress-157334-01.js
> initial revision: 1.1
> 

Is that trunk, or branch? I know this is a trunk bug, but the fix was going to be done "soon" in Sept 2004, with a "shipping release" soon thereafter. Just wondering which branch this patch landed on, in case I missed something. I guess it is too late for 2.0 at this point, but one never knows. Maybe it can be slipped in if something else forces another RC, since this is a pre-existing open bug and the fix improves performance by 3x. Thanks.
(In reply to comment #39)

That check-in is for a test, not a fix. This bug is still open, as you can see from its status, ASSIGNED.
QA Contact: pschwartau → general
Attached patch backup (obsolete) — Splinter Review
Work in progress: eliminate GCF_LOCK as a stored flag; move GCF_MUTABLE into the JSString length member.

/be
Target Milestone: mozilla1.9alpha1 → mozilla1.9 M8
Attached patch backup, v2 (obsolete) — Splinter Review
Attachment #275539 - Attachment is obsolete: true
Comment on attachment 275567 [details] [diff] [review]
backup, v2

>Index: js/src/jsgc.h
> /* GC thing type indexes. */
> #define GCX_OBJECT              0               /* JSObject */
> #define GCX_STRING              1               /* JSString */
> #define GCX_DOUBLE              2               /* jsdouble */
>-#define GCX_MUTABLE_STRING      3               /* JSString that's mutable --
>-                                                   single-threaded only! */
>+#define GCX_SCRIPT              3               /* JSScript */
> #define GCX_FUNCTION            4               /* JSFunction */
...
> 
> /* GC flag definitions, must fit in 8 bits (type index goes in the low bits). */
> #define GCF_TYPEMASK    JS_BITMASK(GCX_NTYPES_LOG2)
> #define GCF_MARK        JS_BIT(GCX_NTYPES_LOG2)
> #define GCF_FINAL       JS_BIT(GCX_NTYPES_LOG2 + 1)
> #define GCF_SYSTEM      JS_BIT(GCX_NTYPES_LOG2 + 2)
>-#define GCF_LOCKSHIFT   (GCX_NTYPES_LOG2 + 3)   /* lock bit shift */
>-#define GCF_LOCK        JS_BIT(GCF_LOCKSHIFT)   /* lock request bit in API */
> 
>-/* Pseudo-flag that modifies GCX_STRING to make GCX_MUTABLE_STRING. */
>-#define GCF_MUTABLE     2
>+/* Request flag bits passed to js_NewGCThing but not stored in memory. */

I am curious what is wrong with GCF_LOCK, given that the storage for the flag is available in any case? For strings it saves an entry per hash table, and the patch from bug 386265 comment 14 assumes this efficiency.
I was just backing up patches, also trying to free a flag bit for a purpose I've satisfied otherwise. So, thanks for the comment -- you are right, no need to lose the lock flag, which saves us a double hashtable entry for primitives.

/be
Attached patch backup, v3 (obsolete) — Splinter Review
I restored GCF_LOCK (my baby from 1996) and removed all the mutable-string farbling, so this should not conflict with the patch for bug 391290. I'll review that bug's patch today.

/be
Attachment #275567 - Attachment is obsolete: true
Igor, I'd appreciate your comments and early review. This is working for simple tests, with very promising results (especially in terms of process memory size staying small); I'll pound on it tonight.

The escape barrier does not compact the local heap when a function returns, instead it hopes for good LIFO allocation patterns and return values. I'll instrument that, too, and clean up the ad-hoc instrumentation.

Also TBD: JS_NewContextLocalHeap -- this patch just bloats every context with a local heap.

/be
Attachment #275818 - Attachment is obsolete: true
Attachment #276584 - Flags: review?(igor)
Oops, must remember to upload the right patch.

/be
Attachment #276584 - Attachment is obsolete: true
Attachment #276586 - Flags: review?(igor)
Attachment #276584 - Flags: review?(igor)
In the patch I think a fallible write barrier can be avoided if escaped things are just marked as such. Then, when the local GC detects that the number of escaped things reaches a particular threshold, the whole arena is transferred to the global list.
Transferring the arena on overflow trap to the global heap is a good idea. I did not think of doing that also if there are too many escapees. Will try tomorrow.

/be
Attachment #276586 - Attachment is obsolete: true
Attachment #276606 - Flags: review?(igor)
Attachment #276586 - Flags: review?(igor)
(In reply to comment #49)
> Created an attachment (id=276606) [details]
> important bugfix, plus tracing optimization
> 
> Transferring the arena on overflow trap to the global heap is a good idea. I
> did not think of doing that also if there are too many escapees.

Escape from a frame being popped is not the issue, though -- it's the heap trap that occurs when a local thing (string or number) is stored in a (global, of course) object or root. In such a case, we can try to keep the thing local, but we won't be able to reduce lastLimit to at or below its offset from the first page in the arena.

This will result in more local GCs and freelist-based local GC-thing allocations. It will increase the cost of local GC relative to allocation. That can hurt, at least for the real-nbody.js benchmark (before I added escape trap handling, i.e. fp->lastLimit and js_EscapeBarrier), a lot (order 10x slowdown, I'll post numbers tomorrow).

I'll try it and measure, but I'm worried about pathological performance.

/be
(In reply to comment #50)
> Escape from a frame being popped is not the issue, though -- it's the heap trap
> that occurs when a local thing (string or number) is stored in a (global, of
> course) object or root. In such a case, we can try to keep the thing local, but
> we won't be able to reduce lastLimit to at or below its offset from the first
> page in the arena.

But why use lastLimit and not a flag for local things that became global? If js_GetGCThingFlags is slow for that, then what about using CPU pages for GC arenas, so that accessing the flag field would be simple bit manipulation without any branches? MMgc has page-allocation code that could be used for that.

With the flag, the only consequence of a thing's escape would be a decreased amount of local storage until the local GC decides to transfer the whole arena to the global GC.
  
> This will result in more local GCs and freelist-based local GC-thing
> allocations. It will increase the cost of local GC relative to allocation. That
> can hurt, at least for the real-nbody.js benchmark (before I added escape trap
> handling, i.e. fp->lastLimit and js_EscapeBarrier), a lot (order 10x slowdown,
> I'll post numbers tomorrow).
> 
> I'll try it and measure, but I'm worried about pathological performance.
> 
> /be
> 

(In reply to comment #51)
> (In reply to comment #50)
> > Escape from a frame being popped is not the issue, though -- it's the heap trap
> > that occurs when a local thing (string or number) is stored in a (global, of
> > course) object or root. In such a case, we can try to keep the thing local, but
> > we won't be able to reduce lastLimit to at or below its offset from the first
> > page in the arena.
> 
> But why use lastLimit and not a flag for local things that became global?

That's why I was trying to liberate GCF_LOCK (comment 43 and comment 45). But I would prefer to minimize change so as to get this performance and memory win into 1.9. We can move to mmap based page allocation in a later patch, if there is time.

[Since I wrote that, hours ago] Thanks for filing this now as bug 392263.

> With the flag the only consequence of thing's escape would be decreased amount
> of local storage until local GC decides to transfer the whole arena to the
> global GC.

What if we don't transfer the whole local heap arena, so that some local things are referenced by global things? We will need to unify the global and local mark phases, or the local GC for this local heap's context will sweep the now-globally-referenced, locally-unreferenced thing.

The way local GC works, we don't want global things pointing to local things, or we may end up putting local things on a global or per-thread freelist. The case we hope is much more typical than transferring the whole arena to the global GC is to "pop" LIFO-allocated things and empty the arena, leaving it ready for more local, mostly-LIFO allocations.

/be
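The LIFO allocation pattern being discussed can be caricatured as a bump-pointer heap. This is purely a toy model in JS for illustration; the actual patch manages GC arenas in C, and the names here are made up:

```javascript
// Toy "local heap": allocation is a pointer increment, and popping back to a
// saved mark (as when a function returns) frees everything allocated since,
// with no marking or sweeping needed for the purely-local, LIFO case.
function LocalHeap(size) {
  this.cells = new Array(size);
  this.top = 0;
}

LocalHeap.prototype.alloc = function (value) {
  if (this.top === this.cells.length)
    throw new Error("local heap overflow: fall back to the global heap");
  this.cells[this.top] = value;
  return this.top++;
};

// Save the current allocation point on function entry...
LocalHeap.prototype.mark = function () { return this.top; };

// ...and restore it on function return, "freeing" all temporaries at once.
LocalHeap.prototype.popTo = function (mark) { this.top = mark; };
```

Escapes are what break this picture: a value stored into a global object must survive the pop, which is why the real patch needs an escape barrier at all.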
Question about dependent strings.

When a local dependent string becomes global, the patch AFAICS does not make global its base. In this case what prevents the base from being collected via local GC?

  
(In reply to comment #53)
> Question about dependent strings.
> 
> When a local dependent string becomes global, the patch AFAICS does not make
> global its base. In this case what prevents the base from being collected via
> local GC?

That's a bug, thanks for pointing it out (I had it in the back of my mind, chewing a hole in my brain ;-) -- new patch shortly.

/be
Attached patch dependent string fixes (obsolete) — Splinter Review
Attachment #276606 - Attachment is obsolete: true
Attachment #276850 - Flags: review?(igor)
Attachment #276606 - Flags: review?(igor)
(In reply to comment #55)
> Created an attachment (id=276850) [details]
> dependent string fixes

Is the idea here to allow dependent strings to exist only in the local heap? But then s += string, where s is a global variable, would not be able to benefit from the patch, unless I miss something.
The patch currently does not deal with a GC_GLOBAL_WRITE that fails. How would that be addressed?
(In reply to comment #56)
> (In reply to comment #55)
> > Created an attachment (id=276850) [details] [details]
> > dependent string fixes
> 
> Is the idea here is to allow for dependent strings to exists only in the local
> heap? But then s += string where s is a global variable would not be able to
> benefit from the patch unless I miss something.

Any global write of a local value must global-ize the local. The dependent string case could be handled by extending the tracer to keep track of multiple things to move, and that could be worthwhile. But I would like to instrument more and keep the code simple, and see whether this dependent local string written to global case is indeed a hard case.

Also, the result of + is an independent string, not a dependent string. The left operand (s in your example) would be a global dependent string whose base is the global independent string produced by JSOP_ADD, even though string (the right hand side) is local. See the JOF_GLOBALWRITE testing in jsinterp.c.

With more static analysis, we could do better, but that is for ActionMonkey and for later.

(In reply to comment #57)
> The patch currently does not deal with GC_GLOBAL_WRITE that fails. How that
> would be addressed?

About as badly as the js_MakeStringImmutable failing in js_SetSlotThreadSafe, which you filed as bug 363059 :-P. If there's an OOM error under js_LocalGC called from the write barrier, the value written to the global-heap-contained slot will be null. To fix this I would need to make the *OBJ_SET_* macros fallible.

It could be done, but I'm trying to get benchmark results ignoring this little issue, so I can compare with an alternative patch along the lines you suggested of allowing local things to become reachable from globals, flagging them so they do not get collected by a local GC, and merging global and local mark phases.

/be 
(In reply to comment #58)
> (In reply to comment #57)
> > The patch currently does not deal with GC_GLOBAL_WRITE that fails. How that
> > would be addressed?
> 
> About as badly as the js_MakeStringImmutable failing in js_SetSlotThreadSafe,
> which you filed as bug 363059 :-P.

I do not like the idea of infecting more code with that bug.

> If there's an OOM error under js_LocalGC
> called from the write barrier, the value written to the global-heap-contained
> slot will be null. To fix this I would need to make the *OBJ_SET_* macros
> fallible.

An alternative recovery is to make the local arena global on errors. Then js_LocalGC  never fails.

> 
> It could be done, but I'm trying to get benchmark results ignoring this little
> issue,

What if fixing the issue (which I do not see as a small one) would affect the benchmarks enough to alter the conclusion?
(In reply to comment #59)
> I do not like the idea of infecting more code with that bug.

Agreed, and so I won't commit this patch or any that has the null-on-OOM-from-localGC bug. My review request was more for early eyes on code, not to get the patch landed. I'm still debugging it, in fact.

> > If there's an OOM error under js_LocalGC
> > called from the write barrier, the value written to the global-heap-contained
> > slot will be null. To fix this I would need to make the *OBJ_SET_* macros
> > fallible.
> 
> An alternative recovery is to make the local arena global on errors. Then
> js_LocalGC  never fails.

Not bad for a solution that should not affect benchmarks.

> > It could be done, but I'm trying to get benchmark results ignoring this little
> > issue,
> 
> What if fixing the issue (which I do not see as small one) would affect the
> benchmarks to alter the conclusion?

I think you just solved that problem ;-).

/be
This is merged up to top of trunk as of now.

/be
Attachment #276850 - Attachment is obsolete: true
Attachment #276850 - Flags: review?(igor)
This patch improves performance and memory use for the string concatenation benchmark in this bug, and the double-allocating benchmark in bug 161110. So does the last "plan A" patch (attachment 277024 [details] [diff] [review]), which uses moving GC lightly to preserve LIFO local heap management as much as possible.

But for the real-nbody.js test, which jresig hosts and which originated (AFAIK) in the Debian language shootout, this patch makes things a lot slower. That's because it overcollects compared to the stack-like case in the last patch, which can finalize temporaries without first having to mark, when popping a frame.

I'll continue with "plan A", working to remove fallibility in the write barrier.

/be
(In reply to comment #62)

> But for the real-nbody.js test, which jresig hosts and which originated (AFAIK)
> in the Debian language shootout, this patch makes things a lot slower. That's
> because it overcollects compared to the stack-like case in the last patch,
> which can finalize temporaries without first having to mark, when popping a
> frame.

I suspect that what slowed math-intensive benchmarks with the patch is not "overcollecting" but rather "overscanning". That is, the patch makes LocalGCSweep more expensive, since it now has to scan over escaped things.

This means that a copying collector for doubles would benefit SM even more than just a local heap, since a copying GC never scans the garbage, which calculations with double values produce a lot of. It is too bad that, due to API compatibility, this is not possible for the whole heap and can only be done for a local copy that embeddings do not see.

Still, it should be possible to reduce the scanning overhead within the current API by marking doubles in groups. Then the scanning code can put the whole group on the free list with a single check instead of scanning all group members. With bug 392263 implemented, such grouping should be straightforward to support.
To Brendan:

The comments in this bug mention the real-nbody.js benchmark a few times. Where can I find it? I would like to try it out with the patch from bug 400902.
(In reply to comment #64)
> To Brendan:
> 
> The comments in the bug mention few times real-nbody.js benchmark.

See http://ejohn.org/apps/speed/ under Real-World / N-Body. IIRC this test came from the "great language shootout" benchmark-suite at

http://shootout.alioth.debian.org/

/be
Assignee: brendan → igor
Status: ASSIGNED → NEW
http://ejohn.org/blog/javascript-engine-speeds/ links to various JS benchmarks, including the nbody one.
I am not working on the bug right now.
Assignee: igor → general
> Re: comment 10 -- I commented on that blog. The code shown there has a couple
> of bugs, and we would not want to round up to powers of two past a certain
> point. We often go linear at a threshold: 2, 4, 8, ... 1024, 2048, 3072, 4096,
> ... although of course you can measure O(n^2) compounding of the linear cost
> function if you try hard enough.

re: http://www.yafla.com/dforbes/String_Concatenation_and_Immutable_Strings_Speeding_Spidermonkey#a412 (in case this isn't already discussed herein)

Fixing the bugs in his implementation and switching to linear growth at some small threshold seems reasonable, and maybe this is worth developing just to give it a real try against real-world data.  I still think that even with linear growth at a small threshold, there is a non-trivial footprint cost associated with this idea.
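The growth policy quoted above (doubling up to a threshold, then fixed linear steps) can be sketched as follows. The threshold value and function name are illustrative, not SpiderMonkey's actual code:

```javascript
// Round a requested capacity up: doubling below LINEAR_THRESHOLD, then
// fixed linear steps above it, avoiding exponential waste on huge strings.
// Produces the sequence 2, 4, 8, ..., 1024, 2048, 3072, 4096, ...
var LINEAR_THRESHOLD = 1024; // illustrative; a real engine would tune this

function roundUpCapacity(request) {
  if (request <= LINEAR_THRESHOLD) {
    var cap = 2;
    while (cap < request)
      cap *= 2; // doubling region: 2, 4, 8, ..., 1024
    return cap;
  }
  // linear region: next multiple of LINEAR_THRESHOLD (2048, 3072, ...)
  return Math.ceil(request / LINEAR_THRESHOLD) * LINEAR_THRESHOLD;
}
```

The point of the linear region is that growing a multi-megabyte buffer by a fixed step wastes at most one step's worth of memory, whereas doubling can waste nearly half the allocation.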
Did the patch from comment #21 ever get any traction?
(In reply to comment #68)
> > Re: comment 10 -- I commented on that blog. The code shown there has a couple
> > of bugs, and we would not want to round up to powers of two past a certain
> > point. We often go linear at a threshold: 2, 4, 8, ... 1024, 2048, 3072, 4096,
> > ... although of course you can measure O(n^2) compounding of the linear cost
> > function if you try hard enough.
> 
> re:
> http://www.yafla.com/dforbes/String_Concatenation_and_Immutable_Strings_Speeding_Spidermonkey#a412
> (in case this isn't already discussed herein)
> 
> Fixing the bugs in his implementation and switching to linear-growth at some
> small threshold seem reasonably, and maybe this is worth developing just to
> give it a real try against real-world data.  I still think even with small
> linear growth, at a small threshold, there is a non-trivial footprint cost
> associated with this idea.
> 

How about you work up a patch - we land it, let it go through Talos and a couple of days of nightlies, and then back it out?

Any chance of getting this in, or should we move this to TM?

I can reproduce it in the latest Fx3 nightly - the average is 67, compared to 16 in IE (HTML testcase).
Attached patch hack (obsolete) — Splinter Review
Assignee: general → crowder
Status: NEW → ASSIGNED
Attached patch tiny cleanupSplinter Review
diff -b next.
Attachment #308464 - Attachment is obsolete: true
The way the old else clause was written did not agree with house style, so I cleaned it up, but the majority of the changes are whitespace.
The spacing still looks wrong, and not just the spacing -- how about a plain (space-sensitive, I mean, but still cvs diff -pu8) diff?

More important, and I've said this before, exponentiation explodes, so we should switch to linear growth at some threshold. Doubling is wrong at large size.

/be
Brendan:  You don't like the diff from comment #73, as a diff?

This needs more testing and it's buggy, too (makes the assumption that power-of-2 alloc happened on creation of the first string, which is - I think - false), but worse it doesn't seem to yield much of a bonus in perf-tests, if anything.  I never sought review on this patch for that reason.
With jemalloc we definitely don't want to second-guess realloc's underlying size classification. This patch is not the way, so let's put energy into other avenues.

/be
Yep, agreed, which is why I abandoned work on it after trying the easy hack.  I think perhaps we should WONTFIX this ancient bug (the summary is sort of bogus anyway).
We should not WONTFIX this bug without re-running relevant benchmarks from it, noting the performance, and adjusting the summary if the bug should stay open. Just because one possible patch is invalid does not mean the bug should be marked WONTFIX -- that would be a bug that's a real malfunction by some measure, but that we believe does not matter, or is worth the costs for other reasons.

/be
Throwing this one back in the pool.
Assignee: crowder → general
Status: ASSIGNED → NEW
Priority: P1 → --
Target Milestone: mozilla1.9alpha8 → ---
(We get clobbered by IE9 on this: we're bimodal at either 70 or 250ms, and they are sub-1ms.  Seems like they might be doing some analysis!)
A constant propagation that works across branches/joins would pick this up.  Bug 608346 does this, but doesn't constant fold string concatenation (easy to fix).  How does the comparison look when the strings are not constant?
If I make the function look like

var str1='qwertyuiop...';

function test()
{
  var str2='qwertyu...';
  var str;
                                        
  for(var i=0; i<=UBOUND; i++)
    str = str1 + str2;

  return str;
}

IE9 looks like 0-2 ms.  If I make it return str[40], it goes to about 2-4ms.

If I make the inner loop be
  str = str1[i % str1.length] + str2;
in addition to the other changes, IE9 is about 5-9ms, and we cut down to 30 -- because we're copying half as much string data, if I had to guess.

We may have to accept that they just have fast strings. :-)
Our problem here is that even though we make a rope when doing str1 + str2, we also have to allocate a ballast to hold the flattened string on every such concatenation, which is 4KB for these strings.  If I comment out the code allocating this ballast my time on a microbenchmark adding these qwerty strings goes from 40ms to .5ms.  Blocking on bug 608776, which should fix this.
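The rope idea referred to above can be illustrated with a toy model (this is not SpiderMonkey's actual JSString layout, just the concept): concatenation merely records the two halves, and the single real copy is deferred until someone asks for the flat characters.

```javascript
// Toy rope: O(1) concatenation; flattening is deferred until the
// characters are actually needed. Illustrative only.
function Rope(left, right) {
  this.left = left;   // plain string or another Rope
  this.right = right;
}

Rope.prototype.flatten = function () {
  var l = this.left instanceof Rope ? this.left.flatten() : this.left;
  var r = this.right instanceof Rope ? this.right.flatten() : this.right;
  return l + r; // the one real copy happens here, not at concat time
};

function concat(a, b) {
  return new Rope(a, b); // no character copying at concat time
}
```

The "ballast" problem described above is the loss of exactly this benefit: eagerly reserving a flattened buffer on every concatenation pays the copy-sized cost even when flattening never happens.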
Depends on: 608776
This has been resolved via bug 609440, which made JS_GetStringChars() fallible, removing the requirement of maintaining a buffer alongside the rope.

The testcase from Comment 13 currently reports 3.5ms average time on tracemonkey linux-x64-opt. Closing.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Hi, I'm on Windows XP.
Setting UBound = 100000, running 30 times

Using latest nightly (Gecko/20110113 Firefox/4.0b10pre) I get
Average time (in ms) = 6.36

Using IE8 I get
Average time (in ms) = 92.26

And using Chrome 8.0.552.237 I get
Average time (in ms) = 2.0

So, 3x slower than Chrome, something to investigate?
Sorry to post on a closed bug.
3x slower than chrome we should investigate in a new bug...
No longer blocks: 625615