2007152 - Speed up computeColumnOffsetForUTF8 for ASCII strings

While looking at parser profiles for multi-inspector-code-load in JS3, I noticed that UTF8 parsing spends ~5% of its time in computeColumnOffset, whereas UTF16 parsing spends ~0.1%. Most of the time is spent here counting the length of a range in UTF16 code units. However, in the ASCII case, this is trivial. It would be nice if we could detect this case and avoid counting.

A few options:

- If we know for some reason that the input is ASCII before parsing, we could set a flag to skip counting here. (One example of where we might know this is for eval on a Latin1 string; I am experimenting with a patch that quickly checks whether a Latin1 string is valid ASCII and then parses it as UTF8.)
- Alternatively, we could consider a quick initial scan of the UTF8 string to see if it is valid ASCII. I imagine this would succeed for many UTF8 inputs in the real world.
- We might also be able to set a flag dynamically when we see the first non-ASCII character on a line (here-ish?), and compute columns the cheap way until that flag is set. This is the most flexible, but also adds extra code in a hot path.
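
To make the first two options concrete, here is a rough standalone sketch, not the actual SpiderMonkey code. All names (`IsAsciiSketch`, `Utf16LengthOfUtf8Sketch`, `ColumnOffsetSketch`, `sourceIsAscii`) are hypothetical, and it assumes the UTF8 input is well-formed. The idea is that for pure ASCII the column offset in UTF16 code units equals the byte offset, so the counting loop can be skipped entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if every byte in [data, data + len) is below 0x80.
inline bool IsAsciiSketch(const unsigned char* data, size_t len) {
  size_t i = 0;
  // Test eight bytes at a time by checking the high bit of each byte.
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    if (word & 0x8080808080808080ULL) {
      return false;
    }
  }
  // Handle the remaining tail bytes one at a time.
  for (; i < len; i++) {
    if (data[i] & 0x80) {
      return false;
    }
  }
  return true;
}

// Slow path: count UTF16 code units in a UTF8 range.
// Assumes well-formed UTF8; no validation is attempted.
inline size_t Utf16LengthOfUtf8Sketch(const unsigned char* data, size_t len) {
  size_t units = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned char b = data[i];
    if ((b & 0xC0) == 0x80) {
      continue;  // Continuation byte: already counted with its lead byte.
    }
    // A four-byte sequence encodes a code point that needs a surrogate pair.
    units += (b >= 0xF0) ? 2 : 1;
  }
  return units;
}

// Column offset, in UTF16 code units, of the byte at lineStart + byteOffset.
// sourceIsAscii is the hypothetical flag, set once before parsing.
inline size_t ColumnOffsetSketch(const unsigned char* lineStart,
                                 size_t byteOffset, bool sourceIsAscii) {
  if (sourceIsAscii) {
    // Debug-only sanity check that the flag was set correctly.
    assert(IsAsciiSketch(lineStart, byteOffset));
    return byteOffset;  // One byte per code unit: nothing to count.
  }
  return Utf16LengthOfUtf8Sketch(lineStart, byteOffset);
}
```

The flag would be computed once, either from prior knowledge (the Latin1 eval case) or from a single `IsAsciiSketch` pass over the whole source. The third option would instead clear a similar flag lazily when the tokenizer first encounters a non-ASCII byte, which avoids the up-front scan at the cost of an extra check in the hot path.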

Categories

(Core :: JavaScript Engine, task, P2)

People

(Reporter: iain, Unassigned)

References

(Blocks 2 open bugs)
