Closed Bug 919319 Opened 11 years ago Closed 11 years ago

Simplify number parsing

Categories

(Core :: SVG, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: longsonr, Assigned: longsonr)

References

Details

Attachments

(1 file, 3 obsolete files)

Attached patch code (obsolete) — Splinter Review
      No description provided.
OS: Windows Vista → All
Hardware: x86 → All
Comment on attachment 808306 [details] [diff] [review]
code

At the moment we convert the string to UTF-8 and then parse it. It's surely more efficient to parse the string into a number without doing that.

I've punted on nsSMILParserUtils.cpp and nsSVGDataParser.cpp they'll need a follow up.

The number parser implementation is based on that in nsCSSScanner.cpp

The only functional change is lengths, of all the types we allowed whitespace here which is not correct per the specification. I'll add some tests for length parsing before landing.
Attachment #808306 - Attachment is patch: true
Attachment #808306 - Flags: review?(dholbert)
Assignee: nobody → longsonr
Comment on attachment 808306 [details] [diff] [review]
code

Haven't gotten through all of this yet, but here's what I've got so far: 

>diff --git a/content/svg/content/src/SVGContentUtils.cpp b/content/svg/content/src/SVGContentUtils.cpp
>+/**
>+ * True if 'ch' is a decimal digit.
>+ */
>+static inline bool
>+IsDigit(PRUnichar ch)

nit: s/ch/aCh/

>+SVGContentUtils::ParseNumberWithUnits(const nsAString &aString, 
>+                                      double *aValue,
>+                                      nsAString &aUnits)

Maybe this should take a reference instead of a pointer for 'aValue', since we don't want to allow nullptr?

Alternately, if we take a pointer, we should at least assert that the provided pointer is non-null at the beginning of the function.

Also, nit: bump the ampersands and stars to the left of the space, to make them part of the type. :)

>+  // Absolute value of the integer part of the mantissa.
>+  double intPart = 0;

s/0/0.0/

(same for "double fracPart" and "double divisor")

>+    if (gotE) {
>+      iter = expIter;
>+      do {
>+        if (!IsDigit(*iter)) {
>+          break;
>+        }
>+        exponent = 10 * exponent + DecimalDigitValue(*iter);
>+        ++iter;
>+      } while (iter != end);

The first time we hit this loop, we already know that *iter is a digit, so the IsDigit() check is unnecessary.

Maybe move that IsDigit check into the "while" condition, like you do up higher when you're parsing the fractional component? (both for efficiency and consistency)   Unless we can't for some reason, but it looks to me like we can.

>+SVGContentUtils::ParseNumber(const nsAString &aString, 
>+                             double *aValue)

As with ParseNumberWithUnits: maybe "aValue" should be a reference? and ideally make the & and * type-hugging.

>+bool
>+SVGContentUtils::ParseInteger(const nsAString &aString,
>+                              int32_t *aValue)
>+{

(and here as well)

>+  mozilla::RangedPtr<const PRUnichar> iter(aString.Data(), aString.Length());
>+  const mozilla::RangedPtr<const PRUnichar> end(aString.Data() + aString.Length(),
>+                                                aString.Data(), aString.Length());
>+
>+  if (iter == end) {
>+    return false;
>+  }

Up above in ParseNumberWithUnits(), your patch checks for empty-string with "aString.IsEmpty()"; here, we do it with "iter == end". Either is fine, but we should be consistent. (So, probably replace this one with the IsEmpty() check? Or the other way around if you prefer.)

>+  do {
>+      if (!IsDigit(*iter)) {
>+        return false;
>+      }
>+      value = 10 * value + DecimalDigitValue(*iter);
>+      ++iter;
>+  } while (iter != end);

Reduce indentation there -- it's 2-spaces too much, inside the loop.

Also: this doesn't check whether we're getting overflowing INT_MAX... That'd probably be worth doing, particularly that old code sort of did this (it used strtol, which clamps its result to LONG_MAX).

>diff --git a/content/svg/content/src/SVGContentUtils.h b/content/svg/content/src/SVGContentUtils.h
>+  /**
>+   * Parse a number of the form 
>+   * number ::= integer ([Ee] integer)? | [+-]? [0-9]* "." [0-9]+ ([Ee] integer)?
>+   * anything after the number is returned as the units
>+   */

Maybe add colon after "form". (This applies to the other functions as well.)

Also, capitalize "Anything" & add a period at the end of the comment.

>+  /**
>+   * Parse an integer of the form 
>+   * integer ::= [+-]? [0-9]+
>+   */
>+  static bool
>+  ParseInteger(const nsAString& aString, int32_t *aValue);

Whatever integer-overflow behavior we end up with for this function (see above), we should document it.

>+++ b/content/svg/content/src/nsSVGViewBox.cpp
>-    vals[i] = float(PR_strtod(token, &end));
>-    if (*end != '\0' || !NS_finite(vals[i])) {
>-      return NS_ERROR_DOM_SYNTAX_ERR; // parse error
>+    if (!SVGContentUtils::ParseNumber(tokenizer.nextToken(), &vals[i])) {
>+      return NS_ERROR_DOM_SYNTAX_ERR;
>     }

If the user-specified value is too large to store in a float, we previously would reject it with a DOM syntax error. Now it looks like we won't (and we'll internally be working with a bogus float-infinity value, which is probably bad).  We probably should preserve the old NS_finite() check and return a DOM syntax error if it fails.
> >+      value = 10 * value + DecimalDigitValue(*iter);
> >+      ++iter;
> >+  } while (iter != end);
[...]
> Also: this doesn't check whether we're getting overflowing INT_MAX... That'd
> probably be worth doing, particularly that old code sort of did this (it
> used strtol, which clamps its result to LONG_MAX).

s/probably worth doing/definitely worth doing/
As Jesse pointed out to me yesterday (for an unrelated issue), signed-integer overflow behavior is technically undefined, so the compiler is allowed to do anything if we overflow 'value'. So: we should make sure not to do that.

Simplest way would probably be to do the multiplication and addition separately, and check if value >= INT32_MAX/10 before doing the multiplication, and whether value >= INT32_MAX - [the just-parsed decimal digit] before doing the addition.  I think that should keep us safe.
Comment on attachment 808306 [details] [diff] [review]
code

(The NS_finite() issue applies to most of the call-sites, too.  In a lot of them, you've changed a local "GetValueFromString" method from taking a float to taking a double -- probably worth reverting that, so that GetValueFromString can do the NS_finite() check for you.)

>+++ b/content/svg/content/src/SVGLength.cpp
> bool
>-SVGLength::SetValueFromString(const nsAString &aValue)
>+SVGLength::SetValueFromString(const nsAString &aValueAsString)
> {
[...]
>+  if (!SVGContentUtils::ParseNumberWithUnits(aValueAsString, &value, units)) {
>+    return false;
>+  }
>+  uint16_t unitType = GetUnitTypeForString(units);
>+  if (IsValidUnitType(unitType)) {
>+    mValue = float(value);
>+    mUnit = uint8_t(unitType);
>+    return true;
>   }
>   return false;
> }

Nit: It'd probably make more sense to invert that last "if" check, so that the early-returns are all for error-handling. Then the logic would be structured like so:

  if (!something_that_should_succeed) {
    return false;
  }
  if (!something_else_that_should_succeed) {
    return false;
  }
  // [ ... work with the results ...]
  return true;

>+++ b/content/svg/content/src/SVGNumberList.cpp
>   nsAutoCString str;  // outside loop to minimize memory churn
> 
>   while (tokenizer.hasMoreTokens()) {
>-    CopyUTF16toUTF8(tokenizer.nextToken(), str); // NS_ConvertUTF16toUTF8
>-    const char *token = str.get();
[...]
>+    double num;
>+    if (!SVGContentUtils::ParseNumber(tokenizer.nextToken(), &num)) {

This leaves "nsAutoCString str" with no more usages -- drop that variable.

>diff --git a/content/svg/content/src/SVGPointList.cpp b/content/svg/content/src/SVGPointList.cpp
>+    double x;
>+    nsAutoString units;
>+    if (!SVGContentUtils::ParseNumberWithUnits(tokenizer.nextToken(), &x, units)) {
>       rv = NS_ERROR_DOM_SYNTAX_ERR;
>       break;
>     }
[...]
>+    } else {
>+      // It's possible for the token to be 10-30 which has
>+      // no separator but needs to be parsed as 10, -30
>+      if (units[0] != '-' && !SVGContentUtils::ParseNumber(units, &y)) {

I think you want "||" instead of "&&" there.  You want to invoke ParseNumber() when 'units' starts with "-", but right now you're invoking it when units does *not* start with "-".

ALSO:  The use of the "units" naming here (where "-30" is treated as "units") is a bit off-putting.  I'd rather we rename things a bit to drop the "units" nomenclature from ParseNumberWithUnits(), since we have no idea whether the extra string is actually units (and some callsites, e.g. this one, depend on it *not* being units).

Maybe name it just ParseNumber(), which takes a "nsAString& aRemainingStr" outparam (instead of "aUnits"), and then name the local variable "units" here something like "remainingStr", too?

>--- a/content/svg/content/src/nsSVGBoolean.cpp
>+++ b/content/svg/content/src/nsSVGBoolean.cpp
>-static nsresult
>+static bool
> GetValueFromString(const nsAString &aValueAsString,
>                    bool *aValue)
> {
>   if (aValueAsString.EqualsLiteral("true")) {
>     *aValue = true;
>-    return NS_OK;
>+    return true;
>   }

The changes to this file might want to be in a separate patch, since they're not number-parsing related, but I suppose they're fine here too. :)

>diff --git a/content/svg/content/src/nsSVGNumber2.cpp b/content/svg/content/src/nsSVGNumber2.cpp
>+static bool
> GetValueFromString(const nsAString &aValueAsString,
>                    bool aPercentagesAllowed,
>-                   float *aValue)
>+                   double *aValue)
> {
>-  NS_ConvertUTF16toUTF8 value(aValueAsString);
>-  const char *str = value.get();
> 
[...]
>+  nsAutoString units;

This function ends up with an empty line before that first line of code. Drop the blank line.
Attachment #808306 - Flags: review?(dholbert) → feedback+
Attached patch bounds.txt (obsolete) — Splinter Review
Attachment #808306 - Attachment is obsolete: true
Attachment #811558 - Flags: review?(dholbert)
Attachment #811558 - Flags: review?(dholbert)
Attached patch bounds.txt (obsolete) — Splinter Review
One thing to note:

It may seem like I've changed things functionally by removing the whitespace before/after code from SVGLength but that's not actually the case. SVGLength is a component of SVGLengthList and the parser for SVGLengthList (nsCharSeparatedTokenizer) strips whitespace from the tokens before handing them to the SVGLength parser so the whitespace allowing code was superfluous.
Attachment #811558 - Attachment is obsolete: true
Attachment #811656 - Flags: review?(dholbert)
Attached patch bounds.txtSplinter Review
Attachment #811656 - Attachment is obsolete: true
Attachment #811656 - Flags: review?(dholbert)
Attachment #811657 - Flags: review?(dholbert)
Comment on attachment 811657 [details] [diff] [review]
bounds.txt

Looks great! Final review notes -- mostly just things I missed the first time around, and all in SVGContentUtils.cpp:

>--- a/content/svg/content/src/SVGContentUtils.cpp
>+/**
>+ * Assuming that 'aCh' is a decimal digit, return its numeric value.
>+ */
>+static inline uint32_t
>+DecimalDigitValue(PRUnichar aCh)
>+{
>+  return aCh - '0';
>+}

Probably worth asserting IsDigit(aCh) at the beginning of this function, as a sanity check.

>+  if (!gotDot) {
>+    if (!IsDigit(*iter)) {
>+      return false;
>+    }
>+    do {
>+        intPart = floatType(10) * intPart + DecimalDigitValue(*iter);
>+        ++iter;
>+    } while (iter != end && IsDigit(*iter));

Fix excessive indentation in the body that do/while loop.

>+template<class floatType>
>+bool
>+SVGContentUtils::ParseNumber(const nsAString& aString, 
>+                             floatType& aValue)
>+{
>+  nsAutoString aLeftOver;
>+
>+  if (!ParseNumber(aString, aValue, aLeftOver)) {

s/aLeftOver/leftOver/ (it's a local variable)

>+bool
>+SVGContentUtils::ParseInteger(const nsAString& aString,
>+                              int32_t& aValue)
>+{
[...]
>+  int64_t value = 0;

Observation: So, int64_t will get around the 32-bit-max wraparound issue, but it'll make the math a bit slower on 32-bit platforms. But this function probably isn't enough of a bottleneck to matter, I guess, so this should be fine. (and the marginal perf cost is probably worth the code-simplicity win that a 64-bit int gets us.)

>+  if (value <= std::numeric_limits<int32_t>::max() &&
>+      value >= std::numeric_limits<int32_t>::min()) {
>+
>+    aValue = int32_t(value);
>+    return true;
>+  }
>+  return false;

So, it's worth noting that this is a behavior-change - I *think* previously we'd clamp huge numbers to LONG_MAX (via strtol), whereas now we'll treat them as invalid input.

Assuming I'm not misunderstanding on that point -- could you include a unit test (or probably a check in some existing test) with a large number, to exercise the new behavior and ensure that it's changing in the way that we expect?
Attachment #811657 - Flags: review?(dholbert) → review+
I've changed my mind and I think clamping integers is probably better. Webidl clamps so it would be consistent if the parser did the same.
Sounds good to me. (Still worth including a test for that behavior, for at least one int-valued attribute. :))
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/f550869d86bc
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Blocks: 938939
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: