Optimize computing a column number as count of code points by caching column numbers (and whether each chunk might contain anything multi-unit) and counting forward from them
Categories
(Core :: JavaScript Engine, enhancement, P1)
Tracking
()
People
(Reporter: Waldo, Assigned: Waldo)
Details
Attachments
(5 files, 1 obsolete file)
Bug 1504947 made column numbers be counts of code points (at least til its final step was reverted in bug 1549758). Because code points generally aren't all the same length, determining column numbers thus requires some sort of O(n)
scan -- O(n**2)
when it happens repeatedly. Computations on the same line can avoid redoing work by counting forward from a prior offset. But not all lookups are successive, and non-successive cases hit issues: bug 1545015, bug 1545034, bug 1546313 at minimum.
I've implemented a solution: for lines that are "long enough", lazily compute a vector of the column number at almost-regular intervals, then you can count only from the nearest column. --compileonly --file
with bug 1545034's testcase indicates this is effective.
There are a couple different axes of design space/implementation choice here.
First, chunk size.
128 could be any number that will fit the lengthiest code point (2/4 for UTF-8/16). A smaller number reduces worst-case scanning (which can still be quadratic, just within a chunk) but costs more memory and has more overhead to populate the data structures. I bet most non-minified lines are under 100ch, so 128 seems most reasonable. (We could data-gather to support that; I don't think the effort is justified.)
Second, the vector elements.
Regular chunking may split code points, so you must round chunk beginnings down to the start of a code point. This can be done every time a true boundary need be computed, or you could save the exact offset. The memory cost of exact offset saving, when retracting to a code point boundary should be a few predictable instructions, IMO rules out offset-saving. (I have a patch for this, tho, if we think different.)
A better thing to save in vector elements, IMO, is a bit that conservatively indicates whether a chunk is guaranteed to contain only single-unit code points. Then the scanning cost is only paid by chunks that definitely contain multi-unit code points, or by an instantaneously-at-the-end chunk where we just can't know yet. That one byte per chunk seems a much better cost to pay than four bytes per chunk for retracted offsets.
Anyway -- I'll post a patch where vector elements are column numbers only, a patch that adds in the guaranteed-single-unit bit too, and a patch that also stores the offset (that we probably don't want). I also have a nice big ol' testcase -- written specifically for that 128-unit chunk size, to trigger behaviors and edge cases within it -- that seems to indicate things are okay. (And it'd be easy enough to see how the test runs with various values for ColumnChunkLength
besides 128, for scattershot testing -- at some point I tried it with value 7
for a way-too-low approach and it worked fine.)
Assignee | ||
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Depends on D31300
Assignee | ||
Comment 3•6 years ago
|
||
...longest code point encoding - 1) + ColumnChunkLength - 1 units must be observed when computing a column number. r=arai
Depends on D31301
Assignee | ||
Comment 4•6 years ago
|
||
...and avoid iterating at all. r=arai
Depends on D31302
Assignee | ||
Comment 5•6 years ago
|
||
Depends on D31303
Assignee | ||
Comment 6•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 7•6 years ago
|
||
We're somewhat freezy right now, so I'm going to land most of what's here, but I'm not going to land -- yet -- the final patch that actually begins to invoke any of this new code or long-line optimization mechanisms. The work here is hidden behind #if JS_COLUMN_DIMENSION_IS_CODE_POINTS()
, so landing it won't pose risk but will get it out of my tree so I don't have rebase/conflict worries. If memory serves, the branch is next week, and I'll land that final bit after that point.
Here's a try run of all the patches except that last effective one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=93a9c23c511287cf21a01406788c6543009e1bf0
And here's a try-run that includes that last effective patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3516ed710e2938ff6a687d91ca20853f3725103a
Comment 9•6 years ago
|
||
bugherder |
Comment 10•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment 11•6 years ago
|
||
bugherder |
Updated•6 years ago
|
Description
•