Closed
Bug 869788
Opened 10 years ago
Closed 6 years ago
DOMTokenList stringifier should trim/compress white space and only emit unique values
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: heycam, Assigned: ayg)
References
Details
(Keywords: dev-doc-complete)
Attachments
(3 files, 7 obsolete files)
I think this test should pass, according to the DOM Standard.
![]() |
||
Comment 1•10 years ago
|
||
Yep. Want to fix? ;)
Reporter | ||
Comment 3•10 years ago
|
||
Attachment #749105 -
Flags: review?(bzbarsky)
![]() |
||
Comment 4•10 years ago
|
||
Hmm. I'm worried about the nsAttrValue changes in terms of performance, and also in terms of something depending on our current behavior....
Reporter | ||
Comment 5•10 years ago
|
||
I think these changes are all supported by the spec. IE will normalise white space when you modify the class list through the DOMTokenList, but it won't remove duplicate entries. Chrome seems to match our current behaviour, from a couple of quick tests. Is your worry big enough to file a bug on the spec?
![]() |
||
Comment 6•10 years ago
|
||
I'm not sure. Would need to see some real-life performance numbers....
![]() |
||
Comment 7•10 years ago
|
||
Comment on attachment 749105 [details] [diff] [review] patch Olli, can you take this off my hands?
Attachment #749105 -
Flags: review?(bzbarsky) → review?(bugs)
Comment 8•10 years ago
|
||
Comment on attachment 749105 [details] [diff] [review] patch This seems to slow down class attribute setting few %, around ~5%, but depends of course on the case. Maybe GetAtomArrayString could use a bloom filter on stack, and if a value is found twice or more, actually create a set and stringify it and perhaps update also the atom array to not contain duplicates. (GetAtomArrayString should be called something else then.) Atoms have pre-calculated hashcode, so BloomFilter handling should be fast. How does that sound to you?
Attachment #749105 -
Flags: review?(bugs) → review-
Reporter | ||
Comment 9•10 years ago
|
||
Sounds good, I can do that.
Comment 10•8 years ago
|
||
FYI, we're planning to make this change in WebKit: https://bugs.webkit.org/show_bug.cgi?id=148589
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(cam)
Assignee | ||
Comment 11•8 years ago
|
||
Wouldn't it be better to only compute the duplicate removal when classList is actually accessed? This should remove the performance problems completely without any optimization, I'd think, unless we're worried about someone calling .classList on an element with 67,000 classes.
Flags: needinfo?(bugs)
Assignee | ||
Comment 12•8 years ago
|
||
Here's a different approach. It will have no perf impact unless you access .classList. In that case perf isn't great, but probably it's acceptable in real life? I could change the O(N^2) duplicate check in Parse() to use a hashtable if you think it would speed things up in real-world cases. I put the code in nsDOMTokenList instead of nsAttrValue because this really doesn't affect behavior when accessed not via nsDOMTokenList. But I don't understand how all this fits together, so very possibly this is just wrong. https://treeherder.mozilla.org/#/jobs?repo=try&revision=833592b68654
Assignee: cam → ayg
Status: NEW → ASSIGNED
Flags: needinfo?(bugs)
Attachment #8671366 -
Flags: review?(bugs)
Assignee | ||
Comment 13•8 years ago
|
||
(Failing tests are all either unexpected passes, or microdata tests that have been deleted upstream.)
Comment 14•8 years ago
|
||
microdata tests shouldn't still start to fail, right? (we will probably remove microdata API in couple of months, but not right now)
Assignee | ||
Comment 15•8 years ago
|
||
AFAICT, the unexpected fails in the microdata tests were wrong. But I can't tell for sure, because there's no spec anymore. :) Anyway, they've been removed upstream: https://github.com/w3c/web-platform-tests/commit/e955cd447aa99fd000f211971c24ced87b04fe5b So they're going away with the next wpt sync, even if we still support the feature for now.
Comment 16•8 years ago
|
||
Comment on attachment 8671366 [details] [diff] [review] Different approach > nsDOMTokenList::IndexedGetter(uint32_t aIndex, bool& aFound, nsAString& aResult) > { >- const nsAttrValue* attr = GetParsedAttr(); >- >- if (attr && aIndex < static_cast<uint32_t>(attr->GetAtomCount())) { >+ nsAutoString attr; >+ mElement->GetAttr(kNameSpaceID_None, mAttrAtom, attr); >+ nsTArray<nsString> list; >+ Parse(attr, list); >+ if (aIndex < static_cast<uint32_t>(list.Length())) { > aFound = true; >- attr->AtomAt(aIndex)->ToString(aResult); >+ aResult = list[aIndex]; > } else { > aFound = false; > } > } So you're making indexed getter _parse_ the already parsed attribute value again each time it is accessed. Don't understand why. Why don't you just rely on nsAttrValue* ? I assume this change would show up rather badly in some microbenchmarks. >@@ -128,40 +178,27 @@ nsDOMTokenList::AddInternal(const nsAttrValue* aAttr, > } > > nsAutoString resultStr; > > if (aAttr) { > aAttr->ToString(resultStr); > } > >- bool oneWasAdded = false; >- nsAutoTArray<nsString, 10> addedClasses; >+ nsAutoTArray<nsString, 10> list; >+ Parse(resultStr, list); > > for (uint32_t i = 0, l = aTokens.Length(); i < l; ++i) { >- const nsString& aToken = aTokens[i]; >- >- if ((aAttr && aAttr->Contains(aToken)) || >- addedClasses.Contains(aToken)) { >- continue; >+ if (!list.Contains(aTokens[i])) { >+ list.AppendElement(aTokens[i]); > } >- >- if (oneWasAdded || >- (!resultStr.IsEmpty() && >- !nsContentUtils::IsHTMLWhitespace(resultStr.Last()))) { >- resultStr.Append(' '); >- resultStr.Append(aToken); >- } else { >- resultStr.Append(aToken); >- } >- >- oneWasAdded = true; >- addedClasses.AppendElement(aToken); > } > >+ Serialize(list, resultStr); >+ Also here. Why all this parsing and serializing. > mElement->SetAttr(kNameSpaceID_None, mAttrAtom, resultStr, true); ... which SetAttr will then parse again so that it can store things in AtomArray > nsDOMTokenList::RemoveInternal(const nsAttrValue* aAttr, > const nsTArray<nsString>& aTokens) has similar issue Sorry, but this looks like a way to make TokenList behave really slowly, and I would be surprised if there isn't some silly benchmark doing microbenchmarking on this interface too.
Attachment #8671366 -
Flags: review?(bugs) → review-
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(cam)
Assignee | ||
Comment 17•8 years ago
|
||
I separated the BloomFilter code into a different patch to preserve attribution better, since this patch is still almost entirely Cameron's code.
Attachment #749105 -
Attachment is obsolete: true
Attachment #8671366 -
Attachment is obsolete: true
Attachment #8672386 -
Flags: review?(bugs)
Assignee | ||
Comment 18•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=af39cffe8c3d I don't understand comment 8 -- why should this slow down attribute setting? Parsing is only needed when classList is accessed, so why should this code run when the class attribute is set? Surely we should defer it? 99.9% of the time that a class is set, we don't need to parse.
Attachment #8672387 -
Flags: review?(bugs)
Assignee | ||
Comment 19•8 years ago
|
||
The first version was missing a somewhat important line. :) https://treeherder.mozilla.org/#/jobs?repo=try&revision=131c142d212d
Attachment #8672387 -
Attachment is obsolete: true
Attachment #8672387 -
Flags: review?(bugs)
Attachment #8672397 -
Flags: review?(bugs)
Comment 21•8 years ago
|
||
(In reply to :Aryeh Gregor (working until October 13) from comment #18) > I don't understand comment 8 -- why should this slow down attribute setting? There is that Hashtable creation at least.
Assignee | ||
Comment 22•8 years ago
|
||
Why do we have to do any parsing when the class attribute is set? Why not wait until the first access of .classList?
Comment 23•8 years ago
|
||
Well, the atom array is mostly needed by CSSRuleProcessor. Sure, we could make nsIContent::GetClasses() or AttrValue::GetAtomCount() to parse the value first time it is needed, but that isn't what we've done so far with AttrValues.
Comment 24•8 years ago
|
||
Comment on attachment 8672386 [details] [diff] [review] Part 1 -- Cameron's patch updated to apply to current code hard to review this when this has tons of unrelated changes (all those test removals).
Assignee | ||
Comment 25•8 years ago
|
||
Comment 26•8 years ago
|
||
Comment on attachment 8672386 [details] [diff] [review] Part 1 -- Cameron's patch updated to apply to current code Waiting for some new patch.
Attachment #8672386 -
Flags: review?(bugs)
Updated•8 years ago
|
Attachment #8672397 -
Flags: review?(bugs)
Comment 27•8 years ago
|
||
So the eDiscardDuplicates approach takes care of only classList. What about iframe.sandox = "some tokens" and other similar attributes? SettableTokenList.value uses the same 'ordered set parser' what element's 'classes' use. (But then, I don't see anything guaranteeing that iframe.setAttribute("sandbox", "new new token token") leads to unique tokens in the list. element.setAttribute("class", "class1 class1"); is defined to skip duplicates. )
Updated•8 years ago
|
Keywords: dev-doc-needed
Comment 28•8 years ago
|
||
Skip duplicates for unique tokens in the list (for other attribute than class) was covered by solving this bug in DOM spec: https://www.w3.org/Bugs/Public/show_bug.cgi?id=20660 And stringifier for DOMTokenList was reverted to previouse requirements so Firefox has it correct now (like other browsers): https://github.com/whatwg/dom/issues/105
Assignee | ||
Comment 29•7 years ago
|
||
Okay, how about this? If you don't like the bloom filter or hash table usage on every assignment, I could try to do those more lazily. Probably will need test adjustments for the try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c2d7d7f1083
Attachment #8672386 -
Attachment is obsolete: true
Attachment #8672397 -
Attachment is obsolete: true
Attachment #8672716 -
Attachment is obsolete: true
Attachment #8739989 -
Flags: review?(bugs)
Assignee | ||
Comment 30•7 years ago
|
||
Comment on attachment 8739989 [details] [diff] [review] New patch Not quite finished. Will re-post for review when I get a green try run.
Attachment #8739989 -
Attachment is obsolete: true
Attachment #8739989 -
Flags: review?(bugs)
Assignee | ||
Comment 31•7 years ago
|
||
Finished try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=235cdc528e0a The failures there should be fixed in the new try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6cb5447ee18a494579659fce37dc10caa5d9ec5 I wrote more tests that I'll submit upstream to wpt, that test all the DOMTokenList attributes in the DOM spec (not just classList). I based it off dom/base/test/test_classList.html, with XUL and mutation events removed.
Attachment #8740436 -
Flags: review?(bugs)
Comment 32•7 years ago
|
||
Ok, so https://github.com/whatwg/dom/issues/105 makes it still a bit unclear what we want here.
Comment 33•7 years ago
|
||
Comment on attachment 8740436 [details] [diff] [review] New patch >+ BloomFilter<8, nsIAtom> filter; >+ if (aDuplicateHandling == eDiscardDuplicates) { >+ filter.add(classAtom); >+ } >+ >+ nsDataHashtable<nsPtrHashKey<nsIAtom>, bool> tokens; Have you checked whether constructing and deconstructing a hashtable shows up in performance profiles? >+ enum DuplicateHandling { eAllowDuplicates, eDiscardDuplicates }; >+ void ParseAtomArray(const nsAString& aValue, >+ DuplicateHandling aDuplicateHandling = eDiscardDuplicates); Could you remove the default value. It is hard to say whether this patch might have some odd side effects since I don't know in which all cases it is being called. autocomplete seems to be one such special case, and there eAllowDuplicates is being used. +void +nsDOMTokenList::Reserialize(const nsAttrValue* aAttr) +{ + nsAutoString result; + aAttr->GetAtomArrayString(result); + mElement->SetAttr(kNameSpaceID_None, mAttrAtom, result, true); +} So we don't ever call this method. > nsDOMTokenList::RemoveInternal(const nsAttrValue* aAttr, > const nsTArray<nsString>& aTokens) ... >- mElement->SetAttr(kNameSpaceID_None, mAttrAtom, output, true); >+ nsAutoString result; >+ aAttr->GetAtomArrayStringWithSomeRemoved(result, aTokens); >+ mElement->SetAttr(kNameSpaceID_None, mAttrAtom, result, true); This is a bit odd. GetAtomArrayStringWithSomeRemoved serializes the attribute (with some parts removed) and then SetAttr parses the same string. But ok, I guess this simplifies mutation handling quite a bit. Perhaps add a comment why we do it this way. >@@ -134,17 +142,17 @@ nsDOMTokenList::AddInternal(const nsAttrValue* aAttr, > { > if (!mElement) { > return; > } > > nsAutoString resultStr; > > if (aAttr) { >- aAttr->ToString(resultStr); >+ aAttr->GetAtomArrayString(resultStr); > } So given that we're using GetAtomArrayString now here, why do we still need !nsContentUtils::IsHTMLWhitespace(resultStr.Last() check few lines below? I think I'd like to see a new patch with the default value for aDuplicateHandling removed. Just to make sure we aren't relying on the old behavior somewhere. Also, it is not clear whether we'll land this. Need to ask what MS and Blink folks think. I think I need to still re-read the spec, and figure out what https://github.com/whatwg/dom/issues/105#issuecomment-206327267 refers to. (It is unclear now that what is in the spec vs. what different browsers have implemented.)
Attachment #8740436 -
Flags: review?(bugs) → review-
Comment 34•6 years ago
|
||
The DOM Standard changed per https://github.com/whatwg/dom/issues/105. Follow-up work is bug 1347998.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
Assignee | ||
Comment 36•6 years ago
|
||
It looks like the spec has been finalized (for now! :) ). The conclusion seems to be that everything normalizes whitespace and removes duplicates, *except* stringification/.value, which returns the attribute value as-is. So the patches need a bit of fixup but should be good.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment hidden (mozreview-request) |
![]() |
||
Comment 38•6 years ago
|
||
Just as a heads-up, this bug is going to need a serious performance investigation before it can land. That means some measurements on microbenchmarks (and possibly resulting optimization), measurements on actual sites, etc.
Assignee | ||
Comment 39•6 years ago
|
||
Olli, do you still want to review this? Whoever's going to review this: please say exactly what performance tests you want me to run and I'll run them and see what I can do. There are some obvious optimizations to do if the results aren't good enough.
Flags: needinfo?(bugs)
![]() |
||
Comment 40•6 years ago
|
||
Just to put the request from comment 39 in perspective, I suspect that coming up with (not running!) the right set of performance tests here is a pretty nontrivial task. I'd estimate it to take an hour or two of thought for someone who is already intimately familiar with the patch.
Assignee | ||
Comment 41•6 years ago
|
||
Hmm. Well, I think we do want to make this change, so I guess it will have to wait until someone has time to come up with the right performance figures.
Comment 42•6 years ago
|
||
Think about worst case scenarios and write testcases and profile? ParseAtomArray is very hot when dealing with class attribute for example. Call setAttribute("class", <varying list of values>); in a loop? Setting class attribute is very common operation and it already shows up when profiling innerHTML
Flags: needinfo?(bugs)
Comment 43•6 years ago
|
||
And to profile the code, addon from https://perf-html.io/ gives often good enough numbers.
Assignee | ||
Comment 44•6 years ago
|
||
Would it be easier if we initialized the token list lazily only when a .classList method is called? Then attribute setting would just have to null the token list, which would be reinitialized when next needed. Would that mean we wouldn't have to do performance testing, since .classList methods are probably not hot paths? We should still initialize the token list efficiently, of course, but if it slows down the first call to DOMTokenList.add/remove/toggle/replace/length somewhat it shouldn't be a big deal, right?
Flags: needinfo?(bugs)
![]() |
||
Comment 45•6 years ago
|
||
We need to have _a_ token list (in the sense of array of nsIAtom) for CSS matching on class to work right. We could leave that list as it is right now and perform the first normalization on actual DOMTokenList access, though.
Assignee | ||
Comment 47•6 years ago
|
||
Should I add a dirty flag that tracks whether the token list needs normalization, to save repeated normalization if DOMTokenList methods are called repeatedly? If so, where should I put it?
Flags: needinfo?(bzbarsky)
![]() |
||
Comment 48•6 years ago
|
||
I haven't thought about this sufficiently to have an offhand answer to that question, sorry.
Flags: needinfo?(bzbarsky)
Comment 49•6 years ago
|
||
Yes, there should be a flag somewhere. Like, set the flag whenever an attribute is set or something. Though, I'm not sure if we have currently any spare bits in nsINode. We may, or I do recall there being a bug about removing some exiting flag. (filed by bholley I think )
Comment 50•6 years ago
|
||
Bug 1338723 is what I was thinking.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 52•6 years ago
|
||
Notes for reviewing the patch: 1) It's based on bug 1354066. If this bug gets reviewed faster than that one, you could take over review for that one too if you want. 2) There are some spec changes in the pipe to this stuff, so it will probably need tweaking soon: https://github.com/whatwg/dom/issues/442 https://github.com/whatwg/dom/issues/443 https://github.com/whatwg/dom/pull/444 https://github.com/whatwg/infra/pull/126 But the changes should be minor, so it still makes sense to review this.
Assignee | ||
Updated•6 years ago
|
Attachment #8854853 -
Flags: review?(bugs)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 55•6 years ago
|
||
smaug, do you want to review this, or should someone else? I'm around for about another week. Note comment 52.
Flags: needinfo?(bugs)
Assignee | ||
Updated•6 years ago
|
Attachment #8854853 -
Flags: review?(bugs)
Comment 56•6 years ago
|
||
Have you done any performance measurements in the worst case scenarios? How much does RemoveDuplicates(); slow down indexed getter for example when class attribute has several values?
Flags: needinfo?(bugs)
Comment 57•6 years ago
|
||
Comment on attachment 8854853 [details] Bug 869788 - Normalize DOMTokenList for whitespace/dupes; Reviewing once some performance test has been done. Like, does this slow down indexed getters significantly.
Attachment #8854853 -
Flags: review?(bugs)
Assignee | ||
Comment 58•6 years ago
|
||
data:text/html,<!DOCTYPE html> <span class="a b c d e f g h i j k l m n o p q r s t u v w x y z"></span> <span class="a a a a a a a a a a a a a a a a a a a a a a a a a a"></span> <span class="a a a a a a a a a a a a a a a a a a a a a a a a a b"></span> <script> var spans = document.querySelectorAll("span"); var results = []; for (var i = 0; i < 3; i++) { var start = window.performance.now(); var list = spans[i].classList; for (var j = 0; j < 1000000; j++) { list[7]; } results.push(window.performance.now() - start); } document.body.textContent = results.join(", "); </script> I get figures of around 530-580 without the patch. With the patch, the first is around 570-630, and the other two are around 720-760. This is with 26 classes and is a super-microbenchmark, though. If I change the inner loop to be spans[i].classList[7] instead of storing in a local variable, it already eliminates almost all the difference.
Flags: needinfo?(bugs)
Comment 59•6 years ago
|
||
mozreview-review |
Comment on attachment 8854853 [details] Bug 869788 - Normalize DOMTokenList for whitespace/dupes; https://reviewboard.mozilla.org/r/126806/#review135528 ::: dom/base/nsDOMTokenList.cpp:58 (Diff revision 3) > } > return mElement->GetAttrInfo(kNameSpaceID_None, mAttrAtom).mValue; > } > > +void > +nsDOMTokenList::RemoveDuplicates() Shouldn't this take nsAttrValue* as a param, since all the callers seem to have such anyhow. ::: dom/base/nsDOMTokenList.cpp:115 (Diff revision 3) > nsDOMTokenList::IndexedGetter(uint32_t aIndex, bool& aFound, nsAString& aResult) > { > + RemoveDuplicates(); > + > const nsAttrValue* attr = GetParsedAttr(); > Want to add the check here that RemoveDuplicates is called only if aIndex < atom count
Attachment #8854853 -
Flags: review-
Comment hidden (mozreview-request) |
Comment 61•6 years ago
|
||
mozreview-review |
Comment on attachment 8854853 [details] Bug 869788 - Normalize DOMTokenList for whitespace/dupes; https://reviewboard.mozilla.org/r/126806/#review135840 ::: dom/base/nsDOMTokenList.cpp:194 (Diff revision 4) > - aAttr->ToString(resultStr); > + RemoveDuplicates(aAttr); > + for (uint32_t i = 0; i < aAttr->GetAtomCount(); i++) { > + if (i != 0) { > + resultStr.AppendLiteral(" "); > + } > + resultStr.Append(nsDependentAtomString(aAttr->AtomAt(i))); I assume there is wpt test for this behavior, that <div class="a a a"> div.classList.add("a"); changes the attribute value to "a" ::: dom/base/nsDOMTokenList.cpp:359 (Diff revision 4) > - WhitespaceTokenizer tokenizer(attribute); > > bool sawIt = false; > - while (tokenizer.hasMoreTokens()) { > - auto currentToken = tokenizer.nextToken(); > - if (currentToken.Equals(aToken) || currentToken.Equals(aNewToken)) { > + nsAutoString resultStr; > + for (uint32_t i = 0; i < aAttr->GetAtomCount(); i++) { > + if (aToken == nsDependentAtomString(aAttr->AtomAt(i)) || aAttr->AtomAt(i)->Equals(aToken) would be a bit faster I think
Attachment #8854853 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 62•6 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #61) > I assume there is wpt test for this behavior, that > > <div class="a a a"> > div.classList.add("a"); > changes the attribute value to "a" There are a couple like that: https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.28 https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.40 https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.40
Flags: needinfo?(bugs)
Comment hidden (mozreview-request) |
Comment 64•6 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s c0bb078e6bce -d e05f84ea2a33: rebasing 391456:c0bb078e6bce "Bug 869788 - Normalize DOMTokenList for whitespace/dupes; r=smaug" (tip) local [dest] changed testing/web-platform/meta/dom/lists/DOMTokenList-iteration.html.ini which other [source] deleted use (c)hanged version, (d)elete, or leave (u)nresolved? u unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment hidden (mozreview-request) |
Comment 66•6 years ago
|
||
Pushed by ayg@aryeh.name: https://hg.mozilla.org/integration/autoland/rev/e8f01e428501 Normalize DOMTokenList for whitespace/dupes; r=smaug
Comment 67•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e8f01e428501
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 68•6 years ago
|
||
I've added a section to the docs explaining this behaviour: https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList#Trimming_of_whitespace_and_removal_of_duplicates and added a note to the Fx55 release notes: https://developer.mozilla.org/en-US/Firefox/Releases/55#DOM_HTML_DOM As a bonus, I've also documented all of the members of DOMTokenList, as many of the pages were missing, and others needed updates. Let me know if this looks OK!
Keywords: dev-doc-needed → dev-doc-complete
You need to log in
before you can comment on or make changes to this bug.
Description
•