Closed Bug 604704 (opened 14 years ago, closed 6 years ago)

Optimize the exact tracing methods

Product/Component: Tamarin Graveyard :: Garbage Collection (mmGC)
Type: defect
Priority: Not set
Severity: normal
Tracking: not tracked
Status: RESOLVED WONTFIX
Target Milestone: Future
Reporter: lhansen
Assignee: Unassigned
Attachments: 2 files, 1 obsolete file

Once the exact tracing infrastructure has landed in Tamarin, we need to worry about performance.  There are many optimization ideas in the roadmap (https://acrobat.com/#d=UCWhgWVzJeNr7Ph3M7y1Fw); we should pursue them.

In particular, we should look into fast scanning of slot arrays.
Depends on: 617943
Attached patch WIP: Specialized string tracers (obsolete)
Warm-up exercise.

Strings are bounded-depth data structures (a string is either a leaf, possibly with atomic data hanging off it, or it is dependent on a master that is a leaf).  Thus we can avoid pushing and popping work items when marking a string value, and we can specialize TraceLocation for String pointers (yay C++).  The attached code is a preliminary attempt to do that.
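The idea can be sketched in C++ under a heavily simplified, hypothetical object model (`GCObject`, `CharBuffer`, `Other`, `Marker`, `MarkLeafString`, and the `marked` flag are illustrative, not Tamarin's actual classes, and this is not the attached patch). The general `TraceLocation` marks an object and pushes a work item; the explicit specialization for `String` exploits the bounded depth and marks the string, its leaf master, and the atomic buffer immediately:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified object model (not Tamarin's real classes).
struct GCObject {
    bool marked = false;
};

// Atomic (pointer-free) character data hanging off a leaf string.
struct CharBuffer : GCObject {};

// A string is either a leaf (owns `buffer`) or dependent (has `master`,
// and the master is always a leaf). The reachable graph is therefore at
// most dependent -> master -> buffer: bounded depth.
struct String : GCObject {
    CharBuffer* buffer = nullptr;
    String* master = nullptr;
};

// Some other traced type, scanned later via the general mark stack.
struct Other : GCObject {};

struct Marker {
    std::vector<GCObject*> markStack;  // pending work items
    int pushes = 0;                    // counts mark-stack pushes

    // General case: mark the object and queue it for later scanning.
    template <typename T>
    void TraceLocation(T** loc) {
        T* obj = *loc;
        if (obj && !obj->marked) {
            obj->marked = true;
            markStack.push_back(obj);
            ++pushes;
        }
    }
};

// Marks a leaf string and its atomic buffer in place; nothing is queued.
inline void MarkLeafString(String* s) {
    if (s->marked) return;
    s->marked = true;
    if (s->buffer) s->buffer->marked = true;
}

// Specialization for String pointers: because the depth is bounded,
// mark everything reachable right away instead of pushing a work item.
template <>
inline void Marker::TraceLocation<String>(String** loc) {
    String* s = *loc;
    if (!s || s->marked) return;
    if (s->master) {
        s->marked = true;           // dependent string (no own buffer in this model)
        MarkLeafString(s->master);  // exactly one hop: the master is a leaf
    } else {
        MarkLeafString(s);
    }
}
```

Tracing a dependent string through `TraceLocation` marks the dependent, its master, and the master's buffer without touching `markStack`; tracing any other type still goes through the push/pop path.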

A simple test shows some performance wins too (Mac Pro, Release build with GCC 4.0, five iterations; further testing required):

                                      avm          avm2
test                          best    avg   best    avg  %dBst  %dAvg
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/js/
  crypto                       541  540.2    548  547.2    1.3    1.3 + 
  deltablue                    382  381.2    383  382.2    0.3    0.3 + 
  earley-boyer                1254 1247.8   1293   1289    3.1    3.3 + 
  raytrace                     814  812.4    853  851.6    4.8    4.8 + 
  regexp                       103  102.2    116    116   12.6   13.5 ++
  richards                     323    323    317    317   -1.9   -1.9 - 
  splay                        925    844    974  965.8    5.3   14.4   
Dir: v8.5/optimized/
  crypto                      4523 4514.6   4679 4665.8    3.4    3.3 + 
  deltablue                   4061 4042.4   4068 4056.2    0.2    0.3   
  earley-boyer                1262   1253   1287 1277.8    2.0    2.0 + 
  raytrace                    9383   9364   9715 9685.8    3.5    3.4 + 
  regexp                       103  102.2    117  116.2   13.6   13.7 ++
  richards                    4578 4575.6   4602 4599.6    0.5    0.5 + 
  splay                       6610   6582   6888 6870.2    4.2    4.4 + 
Dir: v8.5/typed/
  crypto                      3516 3510.6   3563   3557    1.3    1.3 + 
  deltablue                   4122 4116.8   4183   4176    1.5    1.4 + 
  earley-boyer                1261 1254.6   1291 1286.6    2.4    2.6 + 
  raytrace                    9392 9360.4   9690   9673    3.2    3.3 + 
  regexp                       102    102    117  116.6   14.7   14.3 ++
  richards                    4572   4572   4615   4607    0.9    0.8 + 
  splay                       1147 1143.6   1208 1207.2    5.3    5.6 ++
Dir: v8.5/untyped/
  crypto                       600  599.4    602  600.6    0.3    0.2 + 
  deltablue                   2041 2038.6   2090 2085.2    2.4    2.3 + 
  earley-boyer                1253 1249.4   1284 1281.8    2.5    2.6 + 
  raytrace                    3894   3886   3959 3953.4    1.7    1.7 + 
  regexp                       102    102    117  116.2   14.7   13.9 ++
  richards                     527  526.2    504  500.8   -4.4   -4.8 - 
  splay                       1088 1085.6   1143 1141.2    5.1    5.1 ++

Not sure what the slowdown on Richards means.
Apples-to-apples now, same setup:

                                      avm          avm2
test                          best    avg   best    avg  %dBst  %dAvg
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/js/
  crypto                       546  545.6    548  547.4    0.4    0.3 + 
  deltablue                    383  381.4    383  382.8      0    0.4   
  earley-boyer                1292 1287.6   1293   1285    0.1   -0.2   
  raytrace                     868    866    853  852.6   -1.7   -1.5 - 
  regexp                       117  116.2    117  116.2      0      0   
  richards                     321    321    318    318   -0.9   -0.9   
  splay                        984  972.6    980    973   -0.4    0.0   
Dir: v8.5/optimized/
  crypto                      4688   4680   4697 4688.6    0.2    0.2 + 
  deltablue                   4096 4077.8   4089 4075.8   -0.2   -0.0   
  earley-boyer                1280 1272.4   1283 1278.8    0.2    0.5   
  raytrace                    9764   9743   9690 9680.8   -0.8   -0.6 - 
  regexp                       117  116.6    116    116   -0.9   -0.5 - 
  richards                    4602 4592.4   4615 4609.4    0.3    0.4 + 
  splay                       7007 6988.4   6851 6846.4   -2.2   -2.0 - 
Dir: v8.5/typed/
  crypto                      3592 3589.2   3572 3569.8   -0.6   -0.5 - 
  deltablue                   4183 4176.2   4175 4166.6   -0.2   -0.2   
  earley-boyer                1278 1274.6   1287 1281.8    0.7    0.6 + 
  raytrace                    9745   9729   9681 9673.4   -0.7   -0.6 - 
  regexp                       117  116.8    117  116.4      0   -0.3   
  richards                    4602 4594.8   4608 4605.6    0.1    0.2   
  splay                       1219 1216.4   1213 1211.4   -0.5   -0.4 - 
Dir: v8.5/untyped/
  crypto                       602  601.4    604  602.8    0.3    0.2 + 
  deltablue                   2088 2081.4   2084 2079.6   -0.2   -0.1   
  earley-boyer                1273 1267.6   1283 1275.6    0.8    0.6   
  raytrace                    3959 3948.8   3959   3951      0    0.1   
  regexp                       117  116.2    117  116.2      0      0   
  richards                     496  494.4    503    502    1.4    1.5 + 
  splay                       1153 1151.4   1143 1140.8   -0.9   -0.9 -
This is a faster bit-scan loop for slot tracers.  Here are some numbers, but the big win from this change should show up on Flex apps, not on microbenchmarks:

                                      avm          avm2
test                          best    avg   best    avg  %dBst  %dAvg
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/js/
  crypto                       547  546.6    549  547.8    0.4    0.2 + 
  deltablue                    382  381.8    376  375.6   -1.6   -1.6 - 
  earley-boyer                1292 1286.6   1294   1292    0.2    0.4   
  raytrace                     868  867.6    860  859.4   -0.9   -0.9 - 
  regexp                       117  116.4    117  116.8      0    0.3   
  richards                     322  321.4    320    320   -0.6   -0.4 - 
  splay                        984  977.6    989    976    0.5   -0.2   
Dir: v8.5/optimized/
  crypto                      4699 4691.6   4695 4686.6   -0.1   -0.1   
  deltablue                   4075 4065.2   4108 4086.6    0.8    0.5 + 
  earley-boyer                1274 1260.4   1270 1265.4   -0.3    0.4   
  raytrace                    9764 9747.2   9745 9708.6   -0.2   -0.4   
  regexp                       117  116.4    117  116.2      0   -0.2   
  richards                    4602 4593.6   4602 4591.2      0   -0.1   
  splay                       6968 6945.8   7015 6986.2    0.7    0.6 + 
Dir: v8.5/typed/
  crypto                      3600   3592   3583   3580   -0.5   -0.3 - 
  deltablue                   4190 4185.8   4183 4174.8   -0.2   -0.3   
  earley-boyer                1269 1265.2   1279 1275.4    0.8    0.8 + 
  raytrace                    9755 9745.4   9764   9745    0.1   -0.0   
  regexp                       117  116.4    117  116.6      0    0.2   
  richards                    4602 4592.4   4602 4594.8      0    0.1   
  splay                       1215 1214.6   1225 1222.6    0.8    0.7 + 
Dir: v8.5/untyped/
  crypto                       603  600.6    599  597.6   -0.7   -0.5 - 
  deltablue                   2088 2084.8   2086 2081.2   -0.1   -0.2   
  earley-boyer                1269 1260.8   1272 1263.6    0.2    0.2   
  raytrace                    3943 3942.4   3955 3947.4    0.3    0.1 + 
  regexp                       116  115.8    117  116.4    0.9    0.5 + 
  richards                     498  492.2    493  491.2   -1.0   -0.2   
  splay                       1153   1152   1162 1159.2    0.8    0.6 +
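The patch itself is not reproduced here. As a minimal sketch, assuming a slot layout in which a per-type bitmap has bit i set when slot i may hold a GC pointer (the names `TraceSlots`, `pointerMap`, and `LowestSetBit` are hypothetical, not Tamarin's), a bit-scan loop uses count-trailing-zeros to jump straight from one set bit to the next instead of testing every bit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the lowest set bit; `word` must be nonzero.
// Uses the GCC/Clang builtin where available, else a portable loop.
inline unsigned LowestSetBit(std::uint32_t word) {
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_ctz(word));
#else
    unsigned i = 0;
    while (!(word & 1u)) { word >>= 1; ++i; }
    return i;
#endif
}

// Hypothetical slot tracer. `slots` is an object's slot array and
// `pointerMap` holds one 32-bit word per 32 slots. Words with no
// pointer slots cost one test; otherwise each iteration visits one slot.
template <typename Visit>
void TraceSlots(void* const* slots, const std::uint32_t* pointerMap,
                std::size_t numSlots, Visit visit) {
    std::size_t numWords = (numSlots + 31) / 32;
    for (std::size_t w = 0; w < numWords; ++w) {
        std::uint32_t bits = pointerMap[w];
        while (bits != 0) {
            std::size_t i = w * 32 + LowestSetBit(bits);
            if (i >= numSlots) return;  // ignore stray bits past the end
            visit(slots[i]);
            bits &= bits - 1;           // clear the lowest set bit
        }
    }
}
```

The win over a per-bit loop grows with sparse pointer maps and wide objects, which fits the expectation that Flex apps, not the microbenchmarks above, would benefit.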
Flags: flashplayer-bug-
This is somewhat more sophisticated but makes no real difference on any of our existing benchmarks.  A benchmark that ought to show a difference would have a very large database of mixed dependent/indirect strings (probably a Vector.<String>) and would run the allocator enough to cause a lot of GC in that database.  That said, the high bit for GC performance may be elsewhere right now, and these performance tweaks should perhaps be back-burnered until we've tackled other problems.
Attachment #502756 - Attachment is obsolete: true
Priority: P3 → --
Target Milestone: Q3 11 - Serrano → Future
Depends on: 650102
Another idea that probably wouldn't make a difference today is to exploit the new Leaf types to avoid testing ContainsPointers; i.e., TraceLocation overrides would call TraceLeafPointer, which would just set the mark bit.
Actually, if the only !kContainsPointers allocations were Leaf objects, we could also remove the test from TracePointer, because ContainsPointers would always be true for anything traced through it.
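A minimal sketch of both ideas, assuming a hypothetical one-byte object header with a mark bit and a contains-pointers bit (the header layout, `MarkStats`, `TracePointerAssumingNonLeaf`, and these signatures of `TracePointer`/`TraceLeafPointer` are illustrative, not mmGC's actual API). Pushing a work item is modeled by incrementing a counter:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-object header bits (not mmGC's real layout).
enum : std::uint8_t { kMark = 1, kContainsPointers = 2 };

struct Header { std::uint8_t bits = 0; };

struct MarkStats { int pushed = 0; };  // stands in for the mark stack

// General path: must test ContainsPointers to decide whether the object
// needs to be scanned later.
inline void TracePointer(Header* h, MarkStats& stats) {
    if (!h || (h->bits & kMark)) return;
    h->bits |= kMark;
    if (h->bits & kContainsPointers)
        ++stats.pushed;  // queue for scanning
}

// Leaf path: the static type guarantees no pointers, so just set the mark bit.
inline void TraceLeafPointer(Header* h) {
    if (h) h->bits |= kMark;
}

// If every !kContainsPointers allocation were a Leaf object traced only via
// TraceLeafPointer, the general path could drop the test entirely.
inline void TracePointerAssumingNonLeaf(Header* h, MarkStats& stats) {
    if (!h || (h->bits & kMark)) return;
    h->bits |= kMark;
    ++stats.pushed;  // always contains pointers under that assumption
}
```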
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Flags: flashplayer-qrb+
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.