Closed Bug 1392185 Opened 7 years ago Closed 6 years ago

Remove special handling of atoms in the HTML5 parser

Tracking

Status: RESOLVED FIXED

Milestone: mozilla65

Tracking Flags:

            Tracking   Status
firefox57   ---        affected
firefox65   ---        fixed

People

(Reporter: n.nethercote, Assigned: emilio)

Attachments

(3 files, 6 obsolete files)

Remove nsHtml5Atom and associated machinery (7 years ago, Nicholas Nethercote [inactive], 45.42 KB, patch; hsivonen: feedback+)
Remove special handling of atoms in the HTML5 parser (7 years ago, Nicholas Nethercote [inactive], 69.51 KB, patch)
Remove HTML5 atoms (7 years ago, Nicholas Nethercote [inactive], 11.67 KB, patch)
Remove special handling of atoms in the HTML5 parser (7 years ago, Nicholas Nethercote [inactive], 69.58 KB, patch)
Translator changes. (6 years ago, Emilio Cobos Álvarez (:emilio), 8.74 KB, patch; hsivonen: review+)
Bug 1392185 - cd parser/html && make sync && make translate (6 years ago, Emilio Cobos Álvarez (:emilio), 47 bytes, text/x-phabricator-request)
Bug 1392185 - Remove an unused function. (6 years ago, Emilio Cobos Álvarez (:emilio), 47 bytes, text/x-phabricator-request)
Bug 1392185 - Run the translator with the changes. (6 years ago, Emilio Cobos Álvarez (:emilio), 47 bytes, text/x-phabricator-request)
Bug 1392185 - Remove dynamic HTML5 atoms. (6 years ago, Emilio Cobos Álvarez (:emilio), 47 bytes, text/x-phabricator-request)

Nicholas Nethercote [inactive]

Reporter

Description

•

7 years ago

The HTML5 parser handles atoms in a complicated way. It uses the main atom API for static atoms, but maintains parser-local tables for dynamic atoms. This was done to avoid the cost of locking when looking up dynamic atoms, and also so that dynamic atoms can be handled off the main thread. This requires having a separate dynamic atom type (nsHtml5Atom) and migrating those atoms to vanilla dynamic atoms at certain points during parsing. However, since this machinery was put into place, the main atom table has become threadsafe (bug 1275755) and a cache has been added (bug 1352874) to the parser that removes the need for most of the full table lookups. So the parser-local machinery can now be removed.
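
For readers unfamiliar with the design being proposed, here is a minimal sketch of a small parser-scoped cache sitting in front of a lock-protected global table. It is not the Gecko code: `GlobalAtomTable`, `ParserAtomCache`, the `Atom` alias, and the slot count are all hypothetical stand-ins, and `std::shared_ptr` replaces Gecko's own refcounting.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an atom: an interned, immutable string.
using Atom = std::shared_ptr<const std::string>;

// Hypothetical stand-in for the thread-safe main atom table.
class GlobalAtomTable {
 public:
  Atom Atomize(const std::string& key) {
    std::lock_guard<std::mutex> guard(mLock);
    ++mLockedLookups;
    auto it = mAtoms.find(key);
    if (it == mAtoms.end()) {
      it = mAtoms.emplace(key, std::make_shared<const std::string>(key)).first;
    }
    return it->second;
  }

  // Number of lookups that had to take the lock.
  std::size_t LockedLookups() const {
    std::lock_guard<std::mutex> guard(mLock);
    return mLockedLookups;
  }

 private:
  mutable std::mutex mLock;
  std::unordered_map<std::string, Atom> mAtoms;
  std::size_t mLockedLookups = 0;
};

// Parser-scoped, direct-mapped cache of recently used atoms. Each instance
// is assumed to be used by one thread at a time, so it needs no lock itself.
class ParserAtomCache {
 public:
  explicit ParserAtomCache(GlobalAtomTable& table) : mTable(table) {}

  Atom GetAtom(const std::string& key) {
    std::size_t index = std::hash<std::string>{}(key) % kSlots;
    const Atom& cached = mRecentlyUsed[index];
    if (cached && *cached == key) {
      return cached;  // Hit: no lock taken.
    }
    Atom atom = mTable.Atomize(key);  // Miss: one locked lookup.
    assert(atom);
    mRecentlyUsed[index] = atom;
    return atom;
  }

 private:
  static constexpr std::size_t kSlots = 31;  // Arbitrary size for the sketch.
  GlobalAtomTable& mTable;
  std::array<Atom, kSlots> mRecentlyUsed;
};
```

With a high hit rate, repeated lookups of common names such as element and attribute names stay in `ParserAtomCache` and the lock in `GlobalAtomTable` is taken only on misses.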

Nicholas Nethercote [inactive]

Reporter

Comment 1

•

7 years ago

Attached patch: Remove nsHtml5Atom and associated machinery (obsolete)

Henri: this is my first attempt at this. There are a couple of "njn:" comments in places where I have questions. I also have updated the generated C++ files by hand but have not updated the Java files; I will do that if this looks like it's in a state suitable for landing.

Attachment #8899360 - Flags: feedback?(hsivonen)

Nicholas Nethercote [inactive]

Reporter

Updated

•

7 years ago

Assignee: nobody → n.nethercote

Status: NEW → ASSIGNED

Henri Sivonen (:hsivonen) (not reading bugmail until 2025-02-06)

Comment 2

•

7 years ago

Comment on attachment 8899360
Remove nsHtml5Atom and associated machinery

Review of attachment 8899360:
-----------------------------------------------------------------

In general, this looks good if we want to do this. Not too long ago, I asked if I should go ahead and do this now that the atoms are thread-safe, but someone, IIRC froydnj, advised me not to, mostly because of locking issues. The comments here explain that the recent atom cache addresses the locking issue. Still, requesting feedback from froydnj also, just in case.

I note that this adds a bunch of atomic refcount changes to the atoms compared to the previous state. Has it been confirmed that this patch doesn't cause a measurable perf regression?

(For the Java to C++ translation, this will add the slight new complication that the types that translate to atoms now need to translate to different types in the argument list (nsIAtom*) vs. elsewhere (nsCOMPtr<nsIAtom>), so the Java patch will be a bit more than just changing the type names.)

::: parser/html/nsHtml5AtomTable.cpp
@@ +31,5 @@
> }
>
> + nsCOMPtr<nsIAtom> atom = NS_Atomize(aKey);
> + mRecentlyUsedParserAtoms[index] = atom;
> + return atom;

Now that nsHtml5AtomTable only serves the caching purpose, is it intentional to keep a separate parser-scoped cache for parsing that happens on the main thread (document.write() and innerHTML) as this patch does (i.e. the parser doesn't trash the general-purpose cache), or should the parser instead use the general-purpose main-thread atom cache?

::: parser/html/nsHtml5AtomTable.h
@@ +29,5 @@
> * accessed from the main thread. An instance of nsHtml5AtomTable that belongs
> * to an nsHtml5StreamParser is accessed both from the main thread and from the
> * thread that executes the runnables of the nsHtml5StreamParser instance.
> * However, the threads never access the nsHtml5AtomTable instance concurrently
> * in the nsHtml5StreamParser case.

Is the above paragraph still true now that there is no longer a need to clone the atoms when the post-document.write() state is cloned to the parser thread? I.e. aren't the instances now exclusively either accessed on the main thread or on the parser thread?

::: parser/html/nsHtml5AttributeEntry.h
@@ +61,3 @@
> {
> // Copy the memory
> nsHtml5AttributeEntry clone(*this);

AFAICT since the C++ default copy constructor is defined as a member-wise copy, the above does the right thing in terms of incrementing the atom refcounts now that the atom members are smart pointers. However, this makes the comment wrong: it's no longer just a copy of the memory, because the refcounts are affected.

@@ +68,1 @@
> clone.mValue = this->mValue.Clone();

This is needed to manually increment the refcount of the attribute value, which is, unfortunately, manually managed.

::: parser/html/nsHtml5TreeOperation.h
@@ +528,5 @@
> // decide how many operands it dequeues after it.
> eHtml5TreeOperation mOpCode;
> union {
> nsIContent** node;
> + // njn: should this be nsCOMPtr<nsIAtom>? the union makes it difficult...

It seems to me that this needs manual refcounting, since the union makes it infeasible to rely on the destructor of a smart pointer.
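
The last point, that a union member cannot rely on a smart pointer's destructor, can be illustrated with a rough sketch. `RefCountedAtom`, `TreeOperation`, `OperandKind`, and `InitWithAtom` are hypothetical names for illustration only and do not match the real nsHtml5TreeOperation; the refcount is also deliberately non-atomic to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical intrusively refcounted object standing in for an atom.
class RefCountedAtom {
 public:
  void AddRef() { ++mRefCnt; }
  void Release() {
    assert(mRefCnt > 0);
    if (--mRefCnt == 0) {
      delete this;
    }
  }
  int RefCount() const { return mRefCnt; }

 private:
  ~RefCountedAtom() = default;  // Only Release() may destroy the object.
  int mRefCnt = 0;
};

enum class OperandKind { None, Node, Atom };

// Hypothetical tree operation whose operand lives in a tagged union.
class TreeOperation {
 public:
  TreeOperation() : mKind(OperandKind::None) { mOperand.node = nullptr; }
  TreeOperation(const TreeOperation&) = delete;
  TreeOperation& operator=(const TreeOperation&) = delete;

  ~TreeOperation() {
    // The compiler does not know which union member is active, so it cannot
    // run a smart pointer destructor for us. Release by hand, based on the tag.
    if (mKind == OperandKind::Atom) {
      mOperand.atom->Release();
    }
  }

  void InitWithAtom(RefCountedAtom* atom) {
    assert(mKind == OperandKind::None && atom);
    atom->AddRef();  // Manual strong reference, paired with the destructor.
    mOperand.atom = atom;
    mKind = OperandKind::Atom;
  }

 private:
  OperandKind mKind;
  union {
    void** node;           // Stand-in for a non-owning node handle.
    RefCountedAtom* atom;  // Owning, but only by convention.
  } mOperand;
};
```

The cost of this approach is that every path that stores an atom into the union must pair an `AddRef()` with exactly one `Release()`, which is the kind of bookkeeping a smart pointer would otherwise do.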

Attachment #8899360 - Flags: feedback?(nfroyd)

Attachment #8899360 - Flags: feedback?(hsivonen)

Attachment #8899360 - Flags: feedback+

Nathan Froyd [:froydnj]

Comment 3

•

7 years ago

Comment on attachment 8899360
Remove nsHtml5Atom and associated machinery

Review of attachment 8899360:
-----------------------------------------------------------------

I don't have significant comments on the patch as-is.

I don't specifically recall the conversation Henri cites (bugzilla comments, or possibly San Francisco?), but the locking that this patch will require is a concern. The cache hit rate (bug 1352874 comment 1) looks to be shockingly high, so perhaps we'll only be locking in a small number of cases anyway, which would lessen the impact. It's also not clear to me how contended the atom table's lock is while the HTML parser is doing its work; my working assumption is that locks are usually not a problem if they are not contended. I don't have much beyond that idle speculation.

Henri's question about atomic refcounting is also something worth looking into; I guess that's only a concern for atoms that we don't pre-declare (i.e. dynamic atoms), though? How prevalent are those kinds of atoms in the HTML parser?
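
To make the static versus dynamic distinction concrete, here is a hedged sketch of why atomic refcounting cost would fall only on dynamic atoms, assuming static atoms live for the whole process and therefore skip counting entirely. `CountingAtom` and its methods are hypothetical and do not reproduce Gecko's atom classes; the sketch also never frees the object at zero, to keep it short.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical atom that either is static (never counted, never freed) or
// dynamic (atomically refcounted because any thread may hold a reference).
class CountingAtom {
 public:
  explicit CountingAtom(bool isStatic) : mIsStatic(isStatic), mRefCnt(0) {}

  std::uint32_t AddRef() {
    if (mIsStatic) {
      return 2;  // Placeholder count: no atomic operation is performed.
    }
    return mRefCnt.fetch_add(1, std::memory_order_relaxed) + 1;
  }

  std::uint32_t Release() {
    if (mIsStatic) {
      return 2;  // Placeholder count: no atomic operation is performed.
    }
    assert(mRefCnt.load() > 0);
    // A real implementation would destroy the atom when this reaches zero.
    return mRefCnt.fetch_sub(1, std::memory_order_acq_rel) - 1;
  }

  // Current count for dynamic atoms; always 0 for static atoms.
  std::uint32_t DynamicRefCount() const { return mRefCnt.load(); }

 private:
  const bool mIsStatic;
  std::atomic<std::uint32_t> mRefCnt;
};
```

Under that assumption, a parser that mostly sees pre-declared names pays for atomic increments only on the uncommon, non-pre-declared ones.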

Attachment #8899360 - Flags: feedback?(nfroyd)

Nicholas Nethercote [inactive]

Reporter

Comment 4

•

7 years ago

I haven't done any serious perf comparisons yet. But I have counted the number of cache misses, and it's low -- after starting the browser, running speedometer, and opening a few other tabs the total number of misses was a few hundred.

Henri Sivonen (:hsivonen) (not reading bugmail until 2025-02-06)

•

7 years ago

Depends on: 1411893

Bobby Holley (:bholley)

Comment 7

•

7 years ago

Note that bug 1440824 should significantly improve the locking situation.

Nicholas Nethercote [inactive]

Reporter

•

7 years ago

Attachment #8899360 - Attachment is obsolete: true

Nicholas Nethercote [inactive]

Reporter

Comment 10

•

7 years ago

As mentioned in comment 8, I'm not planning on working on this bug any more. But the attached patches go a long way to solving things if someone else wants to take over.

Assignee: n.nethercote → nobody

Status: ASSIGNED → NEW

Nicholas Nethercote [inactive]

Reporter

Comment 11

•

7 years ago

Attached patch: Remove special handling of atoms in the HTML5 parser (obsolete)

Updated version that fixes the commit message a bit.

Nicholas Nethercote [inactive]

Reporter

•

6 years ago

Attached patch: Translator changes.

We're going to refcount atoms normally, so we need to hold strong references to them in some places. This modifies the translator to output a different type for fields and nested types to accomplish this.
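
A minimal sketch of the split this implies, with a raw pointer in argument lists and a strong reference in fields. `FieldAtom`, `StrongRef`, and `AttributeEntry` are hypothetical names invented for the sketch; the real translator output uses Gecko's own pointer types, which are not reproduced here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical intrusively refcounted atom.
class FieldAtom {
 public:
  void AddRef() { ++mRefCnt; }
  void Release() {
    assert(mRefCnt > 0);
    if (--mRefCnt == 0) {
      delete this;
    }
  }
  int RefCount() const { return mRefCnt; }

 private:
  ~FieldAtom() = default;  // Only Release() may destroy the object.
  int mRefCnt = 0;
};

// Hypothetical minimal strong-reference holder for intrusive refcounts.
template <typename T>
class StrongRef {
 public:
  StrongRef() : mPtr(nullptr) {}
  StrongRef(T* ptr) : mPtr(ptr) {
    if (mPtr) mPtr->AddRef();
  }
  StrongRef(const StrongRef& other) : mPtr(other.mPtr) {
    if (mPtr) mPtr->AddRef();
  }
  StrongRef& operator=(const StrongRef& other) {
    StrongRef tmp(other);  // Copy-and-swap keeps self-assignment safe.
    std::swap(mPtr, tmp.mPtr);
    return *this;
  }
  ~StrongRef() {
    if (mPtr) mPtr->Release();
  }
  T* get() const { return mPtr; }

 private:
  T* mPtr;
};

class AttributeEntry {
 public:
  // Argument: raw pointer. The caller keeps the atom alive during the call.
  explicit AttributeEntry(FieldAtom* name) : mName(name) {}

  FieldAtom* Name() const { return mName.get(); }

 private:
  // Field: strong reference, so the entry itself keeps the atom alive. The
  // implicit copy constructor copies this member and bumps the refcount.
  StrongRef<FieldAtom> mName;
};
```

Because the field is a strong reference, a member-wise copy of `AttributeEntry` is no longer a plain memory copy: it also increments the atom's refcount.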

Attachment #8954624 - Attachment is obsolete: true

Attachment #8954626 - Attachment is obsolete: true

Attachment #9022916 - Flags: review?(hsivonen)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 14

•

6 years ago

Attached file: Bug 1392185 - cd parser/html && make sync && make translate (obsolete)

This doesn't have my translator changes applied, so it's only useful for making the follow-up reviews easier, since make translate undoes the clang formatting.

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 15

•

6 years ago

Attached file: Bug 1392185 - Remove an unused function.

Drive-by cleanup while I audited atom usage from the parser.

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 16

•

6 years ago

Attached file: Bug 1392185 - Run the translator with the changes. (obsolete)

This isn't enough to remove the HTML5 atoms, but it's the automatic part of it, so probably should be separately reviewed.

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 17

•

6 years ago

Attached file: Bug 1392185 - Remove dynamic HTML5 atoms.

This is a rebase of the original patch in the bug, plus manual refcounting in some places and some cleanup. Try hasn't finished yet but I expect only minor fixes to be needed, if any.

Co-authored-by: Nicholas Nethercote <nnethercote@mozilla.com>

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

6 years ago

Blocks: 1499170

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 18

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd9edafd6df61729551801db5d0ad5eecbb16bbf

Henri Sivonen (:hsivonen) (not reading bugmail until 2025-02-06)

Updated

•

6 years ago

Attachment #9022916 - Flags: review?(hsivonen) → review+

Emilio Cobos Álvarez (:emilio)

Assignee

•

6 years ago

\o/

Stefan Hindli [:stefan_hindli]

Comment 21

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/48b23717fe95
https://hg.mozilla.org/mozilla-central/rev/e9267d39ec81

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox65: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla65

Phabricator Automation

Updated

•

6 years ago

Attachment #9022920 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (not reading bugmail until 2025-02-06)

Comment 22

•

6 years ago

Pushed the htmlparser repo patch with edited commit message:
https://hg.mozilla.org/projects/htmlparser/rev/c50d5979f5bb

Thanks for fixing this!

Phabricator Automation

Updated

•

6 years ago

Attachment #9022917 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (not reading bugmail until 2025-02-06)

Updated

•

6 years ago

Depends on: 1512613
