Open Bug 1384840 Opened 7 years ago Updated 7 days ago

BacktrackingAllocator::tryAllocateRegister() is slow in SplayTree::lookup()

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

defect

Tracking

()

Tracking Status
firefox57 --- wontfix

People

(Reporter: ting, Unassigned)

References

(Blocks 3 open bugs)

Details

(Keywords: perf)

I found this when profile Speedometer Ember test, BacktrackingAllocator::tryAllocateRegister() is slow in this line [1]:

  if (!rAlias.allocations.contains(range, &existing))

for SplayTree::lookup() where T is LiveRange*. On VTune, accessing node->item [2], v1->from(), v1->to() [3], and node->left [4] aren't cheap.

Are there any data structure other than SplayTree that have both better memory locality and fast accessing?

[1] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/jit/BacktrackingAllocator.cpp#1407
[2] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/ds/SplayTree.h#178
[3] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/jit/BacktrackingAllocator.h#356,358
[4] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/ds/SplayTree.h#182
I think it is caused by the implementation of SplayTree rather than SplayTree itself.

A simple approach is to allocate these nodes and referenced objects (e.g. LiveInterval) by a local allocator so that they can be accessed at the nearby address. IMO, all the objects created inside the RA process and have local live range are better to be packed together.
I don't know much about the underlying binary search tree storage here, so perhaps the suggestion here doesn't make a ton of sense, but binary search in general on a tree which is stored sorted normally is pretty cache hostile, as the algorithm forces you to make huge jumps in the underlying array offsets.  Over in bug 1366241 comment 7, for another case where we needed to do binary search over a tree, I suggested switching the tree to be sorted in level order, which is a smart trick to make the first few steps of the binary search algorithm access elements in the beginning of the physical array in the memory in order to increase the cache friendliness of lookups, and we got some good results.

I'm not sure how much this idea is possible to make work with splay trees.  Of course SplayTree right now works on top of a LifoAlloc, so the order of the nodes in the underlying array is totally random, so all of this is moot until that part can change!
Yes, I agree with you. It's hard to evaluate the impact of this approach. I guess the number of nodes and objects won't be large that's why I think local allocator will be a potential approach. Maybe we could collect information about accessing diversity and make some experiments based on the methods mentioned above.

Level order is good for static lookup, which is, queries occur after all elements are inserted. In this case, we keep inserting elements. To do that, I think we need some tricks but it will bring side effects.

If there's anything I missed, please let me know, thanks:)
Yeah I was guessing this data structure is continually updated since it was using a splay tree...  I think everything you said above makes sense to me in the abstract, but I don't know anything about this code, so please don't trust much of what I say about the code here.  :-)  I mostly wanted to point to our experience in the other bug if it's useful here (and even if it wasn't I figured it was a nice trick!)
See Also: → 1385165
Keywords: perf
Priority: -- → P3
Severity: normal → S3
Blocks: sm-regalloc
You need to log in before you can comment on or make changes to this bug.