Closed Bug 608915 Opened 9 years ago Closed 6 years ago

nsString::AppendFloat is insanely slow

Categories

(Core :: String, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: bzbarsky, Assigned: froydnj)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(4 files, 5 obsolete files)

First we call PR_dtoa and then we ... fix up what it returns?

Can we not do that somehow?

Better yet, can we make use of the fast v8 dtoa that the js engine now has?  The relevant use case for me (nsCSSValue::ToString) is calling the float signature of AppendFloat, so I wonder how important that "round to 6 digits" behavior is there and whether we can preserve it while still using the fast v8 code.

For what it's worth, if I don't worry about that issue and just hack things in a bit, I do see a noticeable speedup on the testcase in bug 608880, though it still ends up falling back to ToFloat for some cases... I guess the random numbers don't help.
With bug 608914 applied, AppendFloat uses its own 'Modified_cnvtf':

Modified_cnvtf allocates a temporary buffer the size of the buffer passed in as argument, which we know is limited to 40 characters (as it is only used with nsTSubstring.cpp itself), so one could use a stack buffer instead.

This would be a fairly simple patch, that improves the AppendFloat case a lot.

Second step would be to select a dtoa for the special case of AppendFloat (mode 2, prec 6 or 15), which doesn't need any BigInt stuff, which is essentially the FastDtoa of Chromium:
http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/v8/src/fast-dtoa.h&q=dtoa&exact_package=chromium&d=3&l=74
Keywords: perf
OS: Mac OS X → All
Hardware: x86 → All
> With bug 608914 applied, AppendFloat uses its own 'Modified_cnvtf':

It already did.  I just moved the code around.  What I don't understand is why we need that at all.  What's Modified_cnvtf doing, exactly?

I'm not sure what you're saying in the last paragraph of comment 1.  Are you suggesting that we can make PR_dtoa fall into a fast mode, or that we should use the chromium impl?  The latter is what I tried in comment 0; it certainly does help, but then the Modified_cvntf thing still costs a ton of time.
Depends on: 614188
(In reply to Boris Zbarsky (:bz) from comment #2)
> It already did.  I just moved the code around.  What I don't understand is
> why we need that at all.  What's Modified_cnvtf doing, exactly?

Just so somebody else doesn't waste time on this: PR_dtoa essentially formats the number into a string of digits, sans convenient things like decimal points, leading and trailing zeros, etc.  Modified_cvntf is going through and fixing everything up to act like you'd expect.
This patch on top of the patches in bug 614188 is good for 5-10% speedup on the original benchmark mentioned in bug 608880.  It's probably worth splitting this into two functions, DoAppendFloat and DoAppendDouble to really avoid overhead.
Overhead is probably compiled away, but removing the code clutter will also help. Instead of DoAppendFloat & DoAppendDouble, one could also direct provide implementation for the two AppendFloat signatures or use templating.
I assume there's no way to have the doubleconversion code write directly to PRUnichars?
(In reply to Boris Zbarsky (:bz) from comment #6)
> I assume there's no way to have the doubleconversion code write directly to
> PRUnichars?

Not without significant modifications, no.
(In reply to Nathan Froyd (:froydnj) from comment #7)
> (In reply to Boris Zbarsky (:bz) from comment #6)
> > I assume there's no way to have the doubleconversion code write directly to
> > PRUnichars?
> 
> Not without significant modifications, no.

Actually, I take this back.  We could tweak the StringBuilder class double-conversion uses to write to a PRUnichar buffer under the hood, even though the external interface is char*'s.  That would be a couple lines of change (though I don't know how ugly it would be to have PRUnichar in MFBT...).  But we couldn't do PRUnichar and char strings simultaneously without major conversion.
> (though I don't know how ugly it would be to have PRUnichar in MFBT...)

FWIW, my hash patches in bug 729940 fake PRUnichar as uint16_t on *nix and wchar_t on Windows.
Nathan, is there a reason this never got landed?
Flags: needinfo?(nfroyd)
(In reply to Boris Zbarsky [:bz] from comment #10)
> Nathan, is there a reason this never got landed?

I think I was clueless about how to handle the PRUnicher-in-MFBT situation and didn't ask for help, so it fell off the radar.  Justin's patches in bug 729940 provide a way forward and I think we even have some sort of wide-char stuff in MFBT nowadays, so the mismatch may be somewhat easier to solve nowadays.  I'll take a second look at this.
Flags: needinfo?(nfroyd)
jcranmer also has a plan to switch stuff to char16_t, I think (bug 904985, etc.).
Awesome, thanks!  Let me know if you need a hand.
Assignee: nobody → nfroyd
Updated patch that applies on the current tree.  Testing on try:

https://tbpl.mozilla.org/?tree=Try&rev=2ff234b88aba

bz for the code, gps for the build bits
Attachment #604435 - Attachment is obsolete: true
Attachment #809310 - Flags: review?(gps)
Attachment #809310 - Flags: review?(bzbarsky)
Comment on attachment 809310 [details] [diff] [review]
use MFBT's double-conversion code in nsString::AppendFloat

Review of attachment 809310 [details] [diff] [review]:
-----------------------------------------------------------------

I assume you can move this line before commit.

::: xpcom/string/src/Makefile.in
@@ +25,5 @@
>  endif
>  
>  DEFINES		+= -DIMPL_LIBXUL
> +
> +LOCAL_INCLUDES += -I$(topsrcdir)/mfbt/double-conversion

LOCAL_INCLUDES should now be defined in moz.build files.
Attachment #809310 - Flags: review?(gps) → review+
Comment on attachment 809310 [details] [diff] [review]
use MFBT's double-conversion code in nsString::AppendFloat

Review of attachment 809310 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, I thought bug 880254 had landed. You don't have to worry about this if you don't want to. It will get sorted out eventually.
(In reply to Gregory Szorc [:gps] from comment #16)
> Oh, I thought bug 880254 had landed. You don't have to worry about this if
> you don't want to. It will get sorted out eventually.

I thought it had too!  But a grep through the tree convinced me otherwise. :)
(In reply to Nathan Froyd (:froydnj) from comment #14)
> Updated patch that applies on the current tree.  Testing on try:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=2ff234b88aba

So as might be expected, this is producing a lot of failures because numbers are formatted slightly differently than tests expect.  The typical problem is that the new formatting algorithm produces more fractional digits than the tests expect: e.g. the tests expect -0.707107 and we now produce -0.70710677.  This may be due to AppendFloat(float) converting to double and then formatting the results...
Comment on attachment 809310 [details] [diff] [review]
use MFBT's double-conversion code in nsString::AppendFloat

Does ToShortestSingle() not do the right thing?  Check that you're taking that codepath?

Should we be using ToPrecision() instead?  Might be slower.  :(

r=me modulo that bit
Attachment #809310 - Flags: review?(bzbarsky) → review+
(In reply to Gregory Szorc [:gps] from comment #16)
> Comment on attachment 809310 [details] [diff] [review]
> use MFBT's double-conversion code in nsString::AppendFloat
> 
> Review of attachment 809310 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Oh, I thought bug 880254 had landed. You don't have to worry about this if
> you don't want to. It will get sorted out eventually.

It's supported in moz.build, I just haven't taken the time to move everything. Adding new uses in moz.build rather than Makefile.in is appreciated :)
(In reply to Boris Zbarsky [:bz] from comment #19)
> Comment on attachment 809310 [details] [diff] [review]
> use MFBT's double-conversion code in nsString::AppendFloat
> 
> Does ToShortestSingle() not do the right thing?  Check that you're taking
> that codepath?
> 
> Should we be using ToPrecision() instead?  Might be slower.  :(

Splitting things into separate functions for float and double doesn't make a difference: https://tbpl.mozilla.org/?tree=Try&rev=c10431fe4a94

I can try ToPrecision instead.

Do the specs provide any guidance here on how to format numbers?  Or do they just assume things will be formatted as the JS engine does?  Arguably having the AppendFloat() code use the same code as the JS engine makes things more consistent.  But it looks like there's ~150 tests to be modified if that's the case...
> Do the specs provide any guidance here on how to format numbers?  

In which cases?

For CSS the current spec draft (still in flux) says:

  A base-ten number using digits 0-9 (U+0030 to U+0039) in the shortest form possible,
  using "." to separate decimals (if any), preceded by "-" (U+002D) if it is negative. 

and then:

  JavaScript's ToString algorithm cannot be used since it can serialize numbers using an
  exponent, which would not round-trip in CSS. 

(which is a preexisting bug we already have anyway).
ToPrecision() is not what we want to use, since that will round large numbers (e.g. 1234567 will become 1234570) and it produces things like 1.000000 for 1, which many tests complain about.

I'm not sure if the double-conversion code has a mode that (natively) does what we want.  Perhaps a closer look at what Modified_cvntf is doing is warranted...
Hmm.  My gut feeling is that changing from -0.707107 to -0.70710677 is not necesarily a problem (except insofar as fixing up the tests is time-consuming).  David, thoughts?
Flags: needinfo?(dbaron)
Blocks: 920659
(In reply to Boris Zbarsky [:bz] from comment #24)
> Hmm.  My gut feeling is that changing from -0.707107 to -0.70710677 is not
> necesarily a problem (except insofar as fixing up the tests is
> time-consuming).

I guess what I think depends on whether these values parse to the same float value or not.  (If they do parse to the same thing, it seems bad in that it's reporting extra accuracy that's not truly present.)  Even then, I'm not sure how much I should care.
Flags: needinfo?(dbaron)
Nathan, is this ready to land? We need this for Peacekeeper (bug 920659).
Flags: needinfo?(nfroyd)
(In reply to Jan de Mooij [:jandem] from comment #26)
> Nathan, is this ready to land? We need this for Peacekeeper (bug 920659).

No, it's not.  It needs tests fixed up to accommodate formatting changes and those changes reviewed.

My old try runs seem to have gone away, so I repushed this patch to Try.  I'll have a go at fixing up the tests today.
Flags: needinfo?(nfroyd)
I took a closer look at this today after re-examining a try run using ToShortest.

The current behavior is that PR_dtoa is getting called with a printing mode that works kind of like ToPrecision, but is able to suppress trailing zeros.  You can see the modes for PR_dtoa here:

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prdtoa.c#2726

We use mode 2.  This also means that we also round 15.99996 to 16 (when printing single precision).  This rounding and eliminate-trailing-zeros behavior means that our floating point values for CSS come out much nicer than they would otherwise.  Additionally, we are already getting the rounding behavior that comment 23 points out and nobody seems to be complaining much.

double-conversion does expose the raw number-to-digits+sign+decimal-point-position primitive, so it's possible we can get away with using that instead to preserve the current behavior and eliminate the rounding peculiarities.  Still trying to play around with that.
OK, so instead of trying to fixup the raw digit string returned by DoubleToAscii,
I opted to just remove the trailing zeros from the result of ToPrecision.  I don't
know whether this is actually faster that PR_dtoa now, but I do know all the tests
pass.  Going to do some performance testing with peacekeeper momentarily.

This is complicated enough that it needs re-review.  I'm not sure if the gymnastics
to find a place closer to the decimal point to start strchr from, rather than just
starting from the beginning of the string, are at all worthwhile.

The build parts haven't changed, so gps' r+ on the build still applies (will move
LOCAL_INCLUDES into moz.build before commit).
Attachment #809310 - Attachment is obsolete: true
Attachment #826762 - Flags: review?(bzbarsky)
(In reply to Nathan Froyd (:froydnj) from comment #29)
> OK, so instead of trying to fixup the raw digit string returned by
> DoubleToAscii,
> I opted to just remove the trailing zeros from the result of ToPrecision.  I
> don't
> know whether this is actually faster that PR_dtoa now, but I do know all the
> tests
> pass.  Going to do some performance testing with peacekeeper momentarily.

This change actually gives me a small regression (~0.75%) on the futuremark.com version of peacekeeper.  Jan, can you test this change yourself to see if you see the same?
Flags: needinfo?(jdemooij)
> This change actually gives me a small regression (~0.75%) on the futuremark.com version
> of peacekeeper.

For what it's worth, it still gives a significant win for me on the first testcase in bug 608880 (order of 30% improvement or so).
Third time's the charm.  This version is significantly uglier, because we also
have to remove trailing zeros from cases with exponential notation.  I also don't
like that those checks mean that we need to slow down the general case.

I guess if we wanted to, we could hack the double-conversion code's ToPrecision
to tell use whether it had used exponential notation....

I see a small speedup on my machine for the bug 608880 testcase, nothing like
what bz claims he's seeing.  That may or may not have anything to do with my
testing things over a remote X connection.  Still interested in numbers from other
people.

This version is green on try.
Attachment #826762 - Attachment is obsolete: true
Attachment #826762 - Flags: review?(bzbarsky)
Attachment #826926 - Flags: review?(bzbarsky)
Bonus patch: let's change ToPrecision to tell us when it constructs an exponential
number, so we can separate out the exponential-with-trailing-zeros case from the
trailing-zeros case.  Appears to win ~1% on the actual peacekeeper benchmark and
the benchmark from bug 608880.  Other performance testers appreciated.
Comment on attachment 826926 [details] [diff] [review]
use MFBT's double-conversion code in nsString::AppendFloat

>+static int
>+FormatWithoutTrailingZeros(char (& buf)[40], double aDouble,
>+                           int precision)

Document that this returns the length of the resulting string in buf?

>+  if (length > precision) {

Please document why this is a reasonable thing to test.  I guess something about how if length == precision that means we have no decimal point?  I assume that's true even if we went exponential, in that for 500000000 we'll get 5.000000e8 or something here?

>+#endif

Leave the /* CharT_is_PRUnichar */ bit here, please.

r=me with those nits
Attachment #826926 - Flags: review?(bzbarsky) → review+
(In reply to Nathan Froyd (:froydnj) from comment #30)
> This change actually gives me a small regression (~0.75%) on the
> futuremark.com version of peacekeeper.  Jan, can you test this change
> yourself to see if you see the same?

Hm weird that you're seeing different numbers than bz. Which platform are you on? Do you have Try builds I can download?
Flags: needinfo?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #35)
> (In reply to Nathan Froyd (:froydnj) from comment #30)
> > This change actually gives me a small regression (~0.75%) on the
> > futuremark.com version of peacekeeper.  Jan, can you test this change
> > yourself to see if you see the same?
> 
> Hm weird that you're seeing different numbers than bz. Which platform are
> you on? Do you have Try builds I can download?

I'm on x86-64 Linux.  I do not have try builds for download; I was just building things locally.

I can confirm that I see speedups on the test in bug 608880 of roughly 30%, just like bz.  I'm going to assume that's good enough.
Renumbered because detecting the exponential case in ToPrecision is actually
worthwhile.  Addressing bz's review comments in part 3.  Carrying over r+.
Attachment #826926 - Attachment is obsolete: true
Attachment #828752 - Flags: review+
Telling callers about this case, like nsString::AppendFloat, costs virtually nothing.
And if callers are doing further manipulation on the returned value, again like
nsString::AppendFloat, this flag saves them time.  I agree it's a little gross, but
benchmarks > all.
Attachment #828753 - Flags: review?(jwalden+bmo)
This patch just rewrites things to accomodate the ToPrecision changes.  Makes the
formatting a little nicer, addresses review comments from part 1.  Will probably wind
up folding this into part 1, so you can almost think of it as an interdiff.

Wins another couple of percent over the initial patch in part 1.
Attachment #827412 - Attachment is obsolete: true
Attachment #828754 - Flags: review?(bzbarsky)
Comment on attachment 828754 [details] [diff] [review]
part 3 - use the exponential notation flag set by ToPrecision in nsString::AppendFloat

>+    for ( ; exponent != decimalPoint; --exponent) {
>+      if (*exponent == 'e') {

Given that we know exponential_notation is true, can we not just decreas exponent until we hit the 'e'?  That is, we should never hit decimalPoint before 'e', right?

>+    char* exponentZeros = exponent - 1;

This is maybe better named zerosBeforeExponent; otherwise it sounds like it's zeros in the exponent, which is not what it's about.

>+    // worry about copying the trailing NUL pointer.

NUL character, right?

>+    length -= (end - ((exponentZeros + exponentSize) + 1));

Why not:

  length -= exponent - (exponentZeros + 1);

?  That's how much you slid things over by, no?

The expression you have is equivalent, since (end - exponentSize) == exponent, but took me a bit to figure out...

r=me
Attachment #828754 - Flags: review?(bzbarsky) → review+
One last run through try revealed errors like (all in flexbox testing):

12992 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_flexbox_layout.html | computed value of 'width' should match expected - got 9e+6px, expected 9000000px

https://tbpl.mozilla.org/php/getParsedLog.php?id=30284340&tree=Try

I guess we could either:

- print numbers at a precision of '7' (I assume these are single precision floats)
- just "fix" the test cases.

bz, which are you happier rs+'ing?
Flags: needinfo?(bzbarsky)
7 is probably better.  Per spec, 9e+6px isn't even a valid width value.  ;)
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #42)
> 7 is probably better.  Per spec, 9e+6px isn't even a valid width value.  ;)

:)  We'll see if a precision of 7 still works...floating point may come out and bite us.

Another alternative would be to not use EcmaScriptConverter and see if varying some of the parameters gives us the numbers we want.
Turns out we need to be slightly more lenient about padding zeros than JS.  This
is good because it solves our "M000000 gets printed as M.00000e+6" problem and
because we will delete any extraneous zeros after the decimal point anyway.
Attachment #829034 - Flags: review+
Comment on attachment 829034 [details] [diff] [review]
use our own converter (to be folded into other patches)

Sounds great!
Attachment #828753 - Flags: review?(jwalden+bmo) → review+
You need to log in before you can comment on or make changes to this bug.