Closed Bug 433533 Opened 12 years ago Closed 11 years ago

Attributes stored using enum value don't keep the 'caseness'

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: smaug, Assigned: smaug)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 9 obsolete files)

<input type="HIDDEN"> becomes <input type="hidden">
(ACID3, test 54)
Similar to bug 254337. See my comments there; we probably want a single fix that will fix all attribute storage types.
Right. I started to write a patch for this during a train trip - it is not yet 
quite ready.
Attached patch patch (obsolete) — Splinter Review
Fixes also bug 254337.

Moves integer type storing to the misc-container and restricts
ParseAtomArray so that whitespaces are kept properly.
CSSStyleRule handling doesn't need to be changed
(http://www.whatwg.org/specs/web-apps/current-work/#the-style)
and SVG's attribute handling is a bit different - if some changes should
be done there, that is a different bug.

Gets ACID3 from 77 to 79.
Assignee: nobody → Olli.Pettay
Status: NEW → ASSIGNED
Which part of the spec says we're allowed to change the "style" content attribute?

Should we consider making mStringValue be an nsStringBuffer* or an nsTextFragment and storing it as 8-bit text when possible?

Are you sure there is no performance impact from using MiscContainer for integer types?
(In reply to comment #4)
> Which part of the spec says we're allowed to change the "style" content
> attribute?
I interpret the following to mean that style attribute can be changed:
"The style DOM attribute must return a CSSStyleDeclaration whose value represents the declarations specified in the attribute, if present. Mutating the CSSStyleDeclaration object must create a style attribute on the element (if there isn't one already) and then change its value to be a value representing the serialized form of the CSSStyleDeclaration object."

> Should we consider making mStringValue be an nsStringBuffer* or an
> nsTextFragment and storing it as 8-bit text when possible?
Is it really worth it?

> Are you sure there is no performance impact from using MiscContainer for
> integer types?
No, I'm not sure, but how else could we store the original value.
Store in misc only if string version of integer value isn't the same
as the original value? Perhaps, but that slows down SetAttr because
integer-to-string conversion and some comparisons are needed.

Need to run the patch in TryServer (does that work with hg already?) to
see if there are any significant memusage changes.
Attached patch patch v1.1 (obsolete) — Splinter Review
Make it more clear what Equals(const nsAttrValue& aOther) does.

I run the first patch and "a no change" patch in TryServer - no significant changes in memusage or performance.
Attachment #324256 - Attachment is obsolete: true
(In reply to comment #4)
> Are you sure there is no performance impact from using MiscContainer for
> integer types?
The patch changes |(value & ~a_mask) / a_multiplier| to |a_ptr->value|.
Not sure how much slower an indirect referene could be.

Jonas, you implemented nsAttrValue, you may have some comments here?
 

(In reply to comment #5)
> (In reply to comment #4)
> > Which part of the spec says we're allowed to change the "style" content
> > attribute?
> I interpret the following to mean that style attribute can be changed:
> "The style DOM attribute must return a CSSStyleDeclaration whose value
> represents the declarations specified in the attribute, if present. Mutating
> the CSSStyleDeclaration object must create a style attribute on the element
> (if there isn't one already) and then change its value to be a value
> representing the serialized form of the CSSStyleDeclaration object."

That's clear that if the 'style' DOM attribute is modified then the content attribute changes. But if the 'style' DOM attribute is not modified then I don't think we have the right to change the string value.

> > Should we consider making mStringValue be an nsStringBuffer* or an
> > nsTextFragment and storing it as 8-bit text when possible?
> Is it really worth it?

Attribute values are almost always ASCII. But I don't know how significant these extra strings are in terms of overall memory usage.

> > Are you sure there is no performance impact from using MiscContainer for
> > integer types?
> No, I'm not sure, but how else could we store the original value.
> Store in misc only if string version of integer value isn't the same
> as the original value?

Yes.

> Perhaps, but that slows down SetAttr because
> integer-to-string conversion and some comparisons are needed.

Not necessarily. We could have an extended ToInteger function that returns whether it saw a round-trippable string. (Pretty easy --- say strings that start with 1-9 and are all digits are round-trippable).

(In reply to comment #7)
> (In reply to comment #4)
> > Are you sure there is no performance impact from using MiscContainer for
> > integer types?
> The patch changes |(value & ~a_mask) / a_multiplier| to |a_ptr->value|.
> Not sure how much slower an indirect referene could be.

I'm more worried about space and the cost of memory management for the strings.

It's good that it doesn't affect Tp/Ts but I think we should also consider space requirements for huge documents. If you add some instrumentation to nsAttrValue and load the HTML5 spec, for example, you should be able to measure fairly accurately how many extra strings are required and how much space they're using (and how much space could be saved if we were more aggressive about not storing extra strings).
Yes, my main concern is memory usage. Specifically pages that have lots of <td align="CENTER"> would get lots more memory used with this patch.

Having lots and lots of elements with the same set of presentational attributes used to happen enough that we have a good chunk of code in the tree to deal with it (see all the code relating to mapped attributes). I would assume that that still is the case today.

So if those attributes have the "wrong" casing that could result in lots of strings being stored.

One possible remedy would be to store an atom rather than a string. That way we'll at least share the string among all elements. But even with that I'd be somewhat worried.
I did some testing with this patch and the result with HTML5 document is that
<12120 more bytes were consumed. Of those ~3420 used by chrome.
Note, it was counted so that every atom in misc reserves more memory -
no atom reusing. To me that doesn't look too bad.
0 Percent attributes were created, 12 Integer attribute, 2 Enum, 388 AtomArray, 0 Color (stored as integer), total 60399 attributes.
Of those chrome created ~46 AtomArrays and ~5173 attributes.

When loading form-heavy pages, like bugzilla, especially enum attrs are used
pretty much. As an example loading bug 335998 creates
90 integer attrs, 290 enums, and 20 colors (stored as integer), total 11071 attributes (~half of those in chrome).

http://java.sun.com/javase/6/docs/api/, which loads 3 pretty huge pages, uses some Percent, Integer, Enum and Color attributes, but comparing to total
attribute count 32061, not too many.

http://lxr.mozilla.org/seamonkey/source/layout/base/nsCSSFrameConstructor.cpp
uses just about 1k more memory, although it creates ~83000 attributes.

I didn't check exactly how many of attributes were just temporary, not really 
stored in an element. But testing how many attributes are in those pages,
seems like ~50% are temporary. That corresponds pretty exactly how
setting an attribute works - first create an nsAttrValue in stack, then 
swap the values with (possibly new) heap allocated object.

The patch doesn't still handle CSS or SVG parts. If this ever gets checked in,
those could be handled in different bugs - if needed.
Attachment #324282 - Attachment is obsolete: true
I do think we should do what roc suggested and only store the string value for integers when stringifying the integer doesn't produce the correct result. It should be trivial to add code that detects if we're stripping stuff in the int parsing functions.

Similarly, we should do the same for atoms, atom arrays and enums since i bet in the absolutely most cases the stored value roundtripps correctly. For enums we could even use a bit to indicated if roundtripping is correct when uppercasing the enum value.
In what cases would atoms not roundtrip correctly?
class="  foo  " is currently stored as a single atom value with the atom containing "foo".
(In reply to comment #12)
> It
> should be trivial to add code that detects if we're stripping stuff in the int
> parsing functions.
Yes it should be easy, especially if String API is changed. But that is not 
something I'd want to do for 1.9.1 (if this bug could ever be fixed for 1.9.1).
Ofc copy-pasting ToInteger impl and modifying it a bit is one option.

> Similarly, we should do the same for atoms, atom arrays and enums since i bet
> in the absolutely most cases the stored value roundtripps correctly.
Right. Just not set atom or string at all when stringifying the value
creates the right value. Or if we want to optimize more, create misc only
when atom or string is needed...though that may make the code pretty ugly and
doesn't have the advantage the patch has that integer values are stored as 
32bits, not (32 - something) bits like currently.

How far can we go with optimizations, or when does the over-optimization start?
> Ofc copy-pasting ToInteger impl and modifying it a bit is one option.

Another option is to just write the simplest possible integer conversion code here, which only handles [1-9][0-9]* and doesn't use a fallback string, and if that conversion fails then call the general ToInteger method and use a fallback string.

I think optimizations that detect when a stored value roundtrips correctly and avoid the fallback string in those cases are reasonable, assuming they're simple and efficient. I actually think integers, atoms and enums should be enough ... atom arrays (multiple classes?) seem hardly ever used so just always use a fallback string. Extra stuff like detecting uppercase variants or whitespace variations and encoding those in a special way seems like overkill to me.
Feel free to reimplement ToInteger. The implementation that exists is retarded and super inefficient. It's trivial to throw together one manually.
I do actually think uppercasing is pretty common, but we can leave it for now and see if the original patch increases the memory numbers when it lands.
Btw, color attribute handling is still something fix - there we strip white space,
but without that NS_ColorNameToRGB doesn't work, I think.
Attachment #324482 - Attachment is obsolete: true
Attached patch v3.1 (obsolete) — Splinter Review
Make StringToInteger handle cases like -0, -0%, "string > maxint", "string < minint"  and add testcases.
Attachment #324637 - Attachment is obsolete: true
Attached patch v3.2, just a small optimization (obsolete) — Splinter Review
What about this patch?
Attachment #324770 - Attachment is obsolete: true
Attachment #324803 - Flags: review?(jonas)
Attached patch with diff -u (obsolete) — Splinter Review
Attachment #324803 - Attachment is obsolete: true
Attachment #324804 - Flags: review?(jonas)
Attachment #324803 - Flags: review?(jonas)
Flags: wanted1.9.1+
Blocks: 444377
Blocks: 444379
Blocks: 444380
Please, Jonas, can you review attachment 324804 [details] [diff] [review]?

Thanks.
Component: DOM: HTML → DOM: Core & HTML
I talked to Jonas at the summit and he promised to accept the approach with an additional optimization for the case where an attribute is set to the upper-case version of an enum value.
Hmm, so would have to check that the enum-string value is lowercase and
the value to be set is all-upper-case, and set yet another bit somewhere to
indicate that the return value should be uppercased...
I can sure do that, but sounds a bit ugly.
And support for <video> added yet another attr type, |float|. Have to fix that too
Attached patch v12 (obsolete) — Splinter Review
handles uppercase enumattrs and floats. Added few tests and passes mochitest.
(mochitest did catch one problem when I first added support for uppercase enumattrs)
Attachment #324804 - Attachment is obsolete: true
Attachment #332771 - Flags: superreview?(jonas)
Attachment #332771 - Flags: review?(jonas)
Attachment #324804 - Flags: review?(jonas)
It seems like the patch always creates a misc struct when storing integer type values (int, enum, percent, etc), even if integer strictly serializes to the correct value, i.e. even for something like size="1". I really think we want to avoid this as allocating the misc structs is somewhat expensive.

Also, in TypeAndValueEquals you unfortunately need to ensure that the string values are the same, not just the integer values. Otherwise in a situation like:

<font size="1">
<font size=" 1 ">

we will end up sharing the nsAttrValue and one of the two will serialize wrong.
Attached patch v13 (obsolete) — Splinter Review
More strict TypeAndValueEquals and store integer values in mBits whenever possible:
  - strict parsing
  - NS_ATTRVALUE_INTEGERTYPE_MINVALUE <= x <= NS_ATTRVALUE_INTEGERTYPE_MAXVALUE)

Added some tests too.

(nsAttrValue is getting pretty ugly. Probably pretty over-optimized for memory usage.)
Attachment #332771 - Attachment is obsolete: true
Attachment #337333 - Flags: superreview?(jonas)
Attachment #337333 - Flags: review?(jonas)
Attachment #332771 - Flags: superreview?(jonas)
Attachment #332771 - Flags: review?(jonas)
Comment on attachment 337333 [details] [diff] [review]
v13

General comments:

Why not give color and float values the same treatment as int/enum/percent?

Please give MiscContainer::mBits have a more descriptive name. Such as mStringvalue or mStringBits?

It doesn't look like you are dealing with values like:
<div class=" foo ">

In that case we currently just store 



>-nsAttrValue::SetTo(PRInt16 aInt)
>+nsAttrValue::SetTo(PRInt16 aInt, const nsAString& aStringValue)

It doesn't look like this new signature is needed. Everywhere where it is called the integer serializes to exactly the stringvalue.

>@@ -320,131 +337,136 @@ nsAttrValue::ToString(nsAString& aResult
>     case eColor:
>     {
>       nscolor v;
>       GetColorValue(v);
>       NS_RGBToHex(v, aResult);
>-
>-      break;

Don't you need to treat colors the same as integers etc and store a string value if the parsed value doesn't serialize to the exact right string?

Also, would it be simpler to implement nsAttrValue::ToString like this?:

nsAttrValue::ToString(...)
{

  if (BaseType() == eOtherBase &&
      GetMiscContainer()->mBits) {
    <get string or atom string>
    return;
  }
  <old ToString implementation with enum serialization taught about uppercase enums>
}

Right now everything is spread out over two switch statements which is a bit messy to follow.


>@@ -510,16 +532,28 @@ nsAttrValue::HashValue() const

I think it'd be worth adding MiscContainer::mBits to the hash if it's pointing to an atom. That should cover most cases for when we have mBits and avoid unnecessary Equals calls, without slowing down the hash function.

> PRBool
>-nsAttrValue::Equals(const nsAttrValue& aOther) const
>+nsAttrValue::TypeAndValueEquals(const nsAttrValue& aOther) const

This name change doesn't seem needed any more now that it's an exact compare.


>@@ -596,30 +653,37 @@ nsAttrValue::Equals(const nsAttrValue& a
>+  if (needsStringComparison) {
>+    nsAutoString otherValue;
>+    aOther.ToString(otherValue);
>+    return Equals(otherValue, eCaseMatters);
>+  }

I think it'd be worth implementing this a bit faster. This is exercised quite heavily during HTML parsing when there are a lot of mapped attributes. Something like this should be fast:

if (otherCont->mBits & 1 == 1 && thisCont->mBits & 1 == 1)
  return otherCont->mBits == thisCont->mBits;
return otherCont->mBits & 1 == 0 && thisCont->mBits & 1 == 0 &&
       <compare as strings>


> nsAttrValue::Equals(const nsAString& aValue,
>                     nsCaseTreatment aCaseSensitive) const

I think you need to fix this function to pay attention the new string value. Not really sure what sorts of bugs we could get otherwise, but it might be as bad a security implications. I.e. having nsIContent::AttrValueIs return false for a given value but then returning that value from GetAttr could be bad.

>@@ -762,21 +828,23 @@ nsAttrValue::ParseAtomArray(const nsAStr
>@@ -804,32 +872,72 @@ nsAttrValue::ParseAtomArray(const nsAStr
>       Reset();
>       return;
>     }
> 
>     // skip whitespace
>     while (iter != end && nsContentUtils::IsHTMLWhitespace(*iter)) {
>       ++iter;
>     }
>-  } while (iter != end);
>+  }
> 
>+  SetMiscAtomOrString(&aValue);
>   return;

You could possibly skip this call if there is only exactly one space between each atom. Probably doesn't make a big difference since the common case is to just have exactly one class name. But also doesn't seem like a lot of code to check for.

Don't care much either way. Also something we could do later.

>@@ -838,27 +946,34 @@ nsAttrValue::ParseEnumValue(const nsAStr
>   while (aTable->tag) {
>     if (aCaseSensitive ? aValue.EqualsASCII(aTable->tag) :
>                          aValue.LowerCaseEqualsASCII(aTable->tag)) {
> 
>       // Find index of EnumTable
>       PRInt16 index = sEnumTableArray->IndexOf(aTable);
>       if (index < 0) {
>         index = sEnumTableArray->Length();
>-        NS_ASSERTION(index <= NS_ATTRVALUE_ENUMTABLEINDEX_MAXVALUE,
>-                     "too many enum tables");

Why remove this?


Reviewed to ParseSpecialIntValue
(In reply to comment #30)
> Why not give color and float values the same treatment as int/enum/percent?
Because in this patch I didn't want to start to optimize even more than we currently do.

> 
> Please give MiscContainer::mBits have a more descriptive name. Such as
> mStringvalue or mStringBits?
Makes sense.

> It doesn't look like you are dealing with values like:
> <div class=" foo ">
How so? There is !hasSpace check.

> It doesn't look like this new signature is needed. Everywhere where it is
> called the integer serializes to exactly the stringvalue.
True.

> Don't you need to treat colors the same as integers etc and store a string
> value if the parsed value doesn't serialize to the exact right string?
There is the followup bug 444377. 

> Also, would it be simpler to implement nsAttrValue::ToString like this?:
Not sure, because with the current approach there isn't need
to have two stringification's of integer, enum and percent.

> You could possibly skip this call if there is only exactly one space between
> each atom. Probably doesn't make a big difference since the common case is to
> just have exactly one class name. But also doesn't seem like a lot of code to
> check for.
Ugh, even more micro-optimizations. Won't do unless it shows up in some
profile.
(In reply to comment #31)
> (In reply to comment #30)
> > Why not give color and float values the same treatment as int/enum/percent?
> Because in this patch I didn't want to start to optimize even more than we
> currently do.

Ok, makes sense.

> > It doesn't look like you are dealing with values like:
> > <div class=" foo ">
> How so? There is !hasSpace check.

Yeah, meant to remove that comment.

> > Also, would it be simpler to implement nsAttrValue::ToString like this?:
> Not sure, because with the current approach there isn't need
> to have two stringification's of integer, enum and percent.

That wouldn't be the case then either way, I don't think. Basically you are already doing what i've proposed, just inside the default statement of another switch that just handles some of the types.
Comment on attachment 337333 [details] [diff] [review]
v13

>+nsAttrValue::SetMiscAtomOrString(const nsAString* aValue)
>+{
>+  NS_ASSERTION(GetMiscContainer(), "Must have MiscContainer!");
>+  NS_ASSERTION(!GetMiscContainer()->mBits, "Trying to re-set atom or string!");
>+  if (aValue) {
>+    PRUint32 len = aValue->Length();
>+    if (len) {

Could len ever be 0 here? Seems strange that we would have parsed an empty string into any valid value. Would an assertion be enough?

>+nsAttrValue::SetFloatValue(float aValue, const nsAString* aStringValue)

Same here as for SetIntValueAndType, will anyone ever really want to set a specific float value but a different string representation? Other than ParseFloat.

Looks good otherwise. Would like to see another patch to check out nsAttrValue::Equals.
(In reply to comment #33)
> >+nsAttrValue::SetFloatValue(float aValue, const nsAString* aStringValue)
> 
> Same here as for SetIntValueAndType, will anyone ever really want to set a
> specific float value but a different string representation? Other than
> ParseFloat.
SetIntValueAndType is used in many place, but sure I can remove SetFloatValue
> > nsAttrValue::Equals(const nsAString& aValue,
> >                     nsCaseTreatment aCaseSensitive) const
> 
> I think you need to fix this function to pay attention the new string value.
> Not really sure what sorts of bugs we could get otherwise, but it might be as
> bad a security implications. I.e. having nsIContent::AttrValueIs return false
> for a given value but then returning that value from GetAttr could be bad.
This is already using ToString().Equals as a fallback.
Attached patch v14Splinter Review
I hope I didn't miss anything.
Found still one problem - class attribute which contains > 0 spaces.
Added a testcase for that and fixed the code.
Attachment #338205 - Flags: superreview?(jonas)
Attachment #338205 - Flags: review?(jonas)
Attachment #337333 - Attachment is obsolete: true
Attachment #337333 - Flags: superreview?(jonas)
Attachment #337333 - Flags: review?(jonas)
Comment on attachment 338205 [details] [diff] [review]
v14

>@@ -105,18 +107,16 @@ public:
>   enum ValueType {
>     eString =       0x00, //   00
>                           //   01  this value indicates an 'misc' struct
>     eAtom =         0x02, //   10
>     eInteger =      0x03, // 0011
>     eColor =        0x07, // 0111
>     eEnum =         0x0B, // 1011  This should eventually die
>     ePercent =      0x0F, // 1111
>-    // Values below here won't matter, they'll be stored in the 'misc' struct
>-    // anyway
>     eCSSStyleRule = 0x10,
>     eAtomArray =    0x11 

Why remove the comment, it's still correct no? Maybe say "they'll always be stored" to make it more clear.


> nsAttrValue::ToString(nsAString& aResult) const
> {
>+  MiscContainer* cont = nsnull;
>+  if (BaseType() == eOtherBase) {
>+    cont = GetMiscContainer();
>+    void* ptr =
>+      reinterpret_cast<void*>(cont->mStringBits & NS_ATTRVALUE_POINTERVALUE_MASK);
>+    if (ptr) {
>+      if (static_cast<ValueBaseType>(cont->mStringBits & NS_ATTRVALUE_BASETYPE_MASK) ==
>+          eStringBase) {

Technically this static_cast isn't needed. Same thing in a few other places. Feel free to keep if you prefer it though.

(in nsAttrValue::ToString)
>     case ePercent:
>     {
>       nsAutoString intStr;
>-      intStr.AppendInt(GetIntInternal());
>+      intStr.AppendInt(cont
>+        ? GetMiscContainer()->mPercent
>+        : GetIntInternal());
>       aResult = intStr + NS_LITERAL_STRING("%");

Change GetMiscContainer() to cont?

In nsAttrValue::Equals
>+      nsAutoString otherValue;
>+      aOther.ToString(otherValue);
>+      return Equals(otherValue, eCaseMatters);

This still seems like a somewhat wasteful way to get to what is essentially just a loop over the two buffers. Use

nsCheapString(static_cast<nsStringBuffer*>(thisCont->mStringBits)).Equals(
  nsCheapString(static_cast<nsStringBuffer*>(otherCont->mStringBits)))

?

Looks great otherwise!
Attachment #338205 - Flags: superreview?(jonas)
Attachment #338205 - Flags: superreview+
Attachment #338205 - Flags: review?(jonas)
Attachment #338205 - Flags: review+
Attached patch checked inSplinter Review
So far nothing dramatic in perf/mem numbers, but will continue looking at those.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Well this seems like a duh! type answer if it resolves to an enum then case can NOT matter.
To make it clear, what is required for this to be a bug is "This specification required 'blah' to be case sensitive" and the Mozilla specification reduces it to an enum which is obviously case insensitive.  So, without the reference to where there is a specification that says it should be case sensitive there is no bug here.
You need to log in before you can comment on or make changes to this bug.