Closed Bug 1392185 Opened 5 years ago Closed 4 years ago

Remove special handling of atoms in the HTML5 parser


(Core :: DOM: HTML Parser, defect, P3)




Tracking Status
firefox57 --- affected
firefox65 --- fixed


(Reporter: n.nethercote, Assigned: emilio)




(3 files, 6 obsolete files)

The HTML5 parser handles atoms in a complicated way. It uses the the main atom
API for static atoms, but maintains parser-local tables for dynamic atoms. This
was done to avoid the cost of locking when looking up dynamic atoms, and also
so that dynamic atoms can be handled off the main thread. This requires having
a separate dynamic atom type (nsHtml5Atom) and migrating those atoms to vanilla
dynamic atoms at certain points during parsing.

However, since this machinery was put into place, the main atom table has
become threadsafe (bug 1275755) and a cache has been added (bug 1352874) to the
parser that removes the need for most of the full table lookups. So it can now be removed.
Henri: this is my first attempt at this. There are a couple of "njn:" comments
in places where I have questions. I also have updated the generated C++ files
by hand but have not updated the Java files; I will do that if this looks like
it's in a state suitable for landing.
Attachment #8899360 - Flags: feedback?(hsivonen)
Assignee: nobody → n.nethercote
Comment on attachment 8899360 [details] [diff] [review]
Remove nsHtml5Atom and associated machinery

Review of attachment 8899360 [details] [diff] [review]:

In general, this looks good if we want to do this. Not too long ago, I asked if I should go ahead and do this now that the atoms are thread-safe, but someone, IIRC froydnj, advised me not to, mostly because of locking issues. The comments here explain that the recent atom cache addresses the locking issue. Still, requesting feedback from froydnj also, just in case.

I note that this adds a bunch of atomic refcount changes to the atoms compared to the previous state. Has it been confirmed that this patch doesn't cause a measurable perf regression?

(For the Java to C++ translation, this will add the slight new complication that the types that translate to atoms now need to translate to different types (nsIAtom*) in the argument list vs. elsewhere (nsCOMPtr<nsIAtom>), so the Java patch will be a bit more than just changing the type names.)

::: parser/html/nsHtml5AtomTable.cpp
@@ +31,5 @@
>    }
> +  nsCOMPtr<nsIAtom> atom = NS_Atomize(aKey);
> +  mRecentlyUsedParserAtoms[index] = atom;
> +  return atom;

Now nsHtml5AtomTable only serves the caching purpose, is it intentional to keep a separate parser-scoped cache for parsing that happens on the main thread (document.write() and innerHTML) as this patch does (i.e. the parser doesn't trash the general-purpose cache) or should the parser instead use the general-purpose main-thread atom cache?

::: parser/html/nsHtml5AtomTable.h
@@ +29,5 @@
>   * accessed from the main thread. An instance of nsHtml5AtomTable that belongs 
>   * to an nsHtml5StreamParser is accessed both from the main thread and from the 
>   * thread that executes the runnables of the nsHtml5StreamParser instance. 
>   * However, the threads never access the nsHtml5AtomTable instance concurrently 
>   * in the nsHtml5StreamParser case.

Is the above paragraph still true now that there is no longer a need to clone the atoms when the post-document.write() state is cloned to the parser thread? I.e. aren't the instances now exclusively either accessed on the main thread or on the parser thread?

::: parser/html/nsHtml5AttributeEntry.h
@@ +61,3 @@
>    {
>      // Copy the memory
>      nsHtml5AttributeEntry clone(*this);

AFAICT since the C++ default copy constructor is defined as a member-wise copy, the above does the right thing in terms of incrementing the atom refcounts now that the atom members are smart pointers. However, this makes the comment wrong: It's no longer just a copy of the memory but the refcounts are affected.

@@ +68,1 @@
>      clone.mValue = this->mValue.Clone();

This is needed to manually increment the refcount of the attribute value, which is, unfortunately, manually managed.

::: parser/html/nsHtml5TreeOperation.h
@@ +528,5 @@
>      // decide how many operands it dequeues after it.
>      eHtml5TreeOperation mOpCode;
>      union {
>        nsIContent**                    node;
> +      // njn: should this be nsCOMPtr<nsIAtom>? the union makes it difficult...

It seems to me that this need manual refcounting, since the union makes it infeasible to rely on the destructor of a smart pointer.
Attachment #8899360 - Flags: feedback?(nfroyd)
Attachment #8899360 - Flags: feedback?(hsivonen)
Attachment #8899360 - Flags: feedback+
Comment on attachment 8899360 [details] [diff] [review]
Remove nsHtml5Atom and associated machinery

Review of attachment 8899360 [details] [diff] [review]:

I don't have significant comments on the patch as-is.  I don't specifically recall the conversation Henri cites (bugzilla comments, or possibly San Francisco?), but the locking that this patch will require is a concern.  The cache hit rate (bug 1352874 comment 1) looks to be shockingly high, so perhaps we'll only be locking in a small amount of cases anyway, which would lessen the impact.  It's also not clear to me how contended the atom table's lock is while the HTML parser is doing its work; my working assumption is that locks are usually not a problem if they are not contended.

I don't have much beyond that idle speculation.  Henri's question about atomic refcounting is also something worth looking into; I guess that's only a concern for atoms that we don't pre-declare (i.e. dynamic atoms), though?  How prevalent are those kinds of atoms in the HTML parser?
Attachment #8899360 - Flags: feedback?(nfroyd)
I haven't done any serious perf comparisons yet. But I have counted the number of cache misses, and it's low -- after starting the browser, running speedometer, and opening a few other tabs the total number of misses was a few hundred.
(In reply to Nathan Froyd [:froydnj] from comment #3)
> Henri's question about
> atomic refcounting is also something worth looking into; I guess that's only
> a concern for atoms that we don't pre-declare (i.e. dynamic atoms), though? 
> How prevalent are those kinds of atoms in the HTML parser?

Only dynamic atoms would perform atomic refcount operations. However, this patch also adds a bunch a virtual calls (that do nothing) on static atoms.

Standard element and attribute names are static atoms. The names of custom elements as well as data-* attributes are dynamic atoms, so newer Web apps are likely to exercise dynamic atomization more that old pages.

Additionally, bug 1390506 may be of interest in terms of a possible future change to off-the-main-thread atomization. It would atomize class names (when there is no whitespace in the class attribute) off the main thread. Bug 1375701 added such in-parser atomization for innerHTML using the general-purpose main-thread atom cache.
Priority: -- → P3
Depends on: 1401873
Note: I'm going to remove nsHtml5Atom in bug 1401873, but that will leave nsAtoms with a HTML5 variant, which is much the same thing.
Summary: Remove nsHtml5Atom and associated machinery → Remove HTML5 atoms and associated machinery
Summary: Remove HTML5 atoms and associated machinery → Remove special handling of atoms in the HTML5 parser
Depends on: 1411893
Note that bug 1440824 should significantly improve the locking situation.
Blocks: 1257126
No longer blocks: 529808
Here is an updated version of the earlier patch. It applies against
mozilla-inbound revision 81c0cc6fe4c4. It's mostly good; there is one "njn:"
comment indicating a place where some manual refcounting needs to be added for
an `nsAtom*` within a union.

I originally worked on this because I thought it was a prerequisite for
removing the static atom table. But it's clear now that is incorrect. I don't
have the time right now to make the necessary modifications to the htmlparser
repository to produce the generated C++ files appropriately. But if anybody
wants to do that, this patch is an excellent starting point.
Attached patch Remove HTML5 atoms (obsolete) — Splinter Review
And once the parsing stuff is cleaned up, we have no need for HTML5 atoms. This
patch hasn't been tested, but it should be close to what is needed.
Attachment #8899360 - Attachment is obsolete: true
As mentioned in comment 8, I'm not planning on working on this bug any more. But the attached patches go a long way to solving things if someone else wants to take over.
Assignee: n.nethercote → nobody
Updated version that fixes the commit message a bit.
Attachment #8954585 - Attachment is obsolete: true
Assignee: nobody → n.nethercote
Assignee: n.nethercote → nobody
I gave this a shot.
Assignee: nobody → emilio
We're going to refcount atoms normally, so we need to hold strong references to them in some places.

This modifies the translator to output a different type for fields and nested types to accomplish this.
Attachment #8954624 - Attachment is obsolete: true
Attachment #8954626 - Attachment is obsolete: true
Attachment #9022916 - Flags: review?(hsivonen)
This doesn't have my translator changes applied, so it's just useful to make
the followup reviews easier, since make translate undoes the clang formatting.
Drive-by cleanup while I audited atom usage from the parser.
This isn't enough to remove the HTML5 atoms, but it's the automatic part of it,
so probably should be separately reviewed.
This is a rebase + manual refcounting on some places, + cleanup of the original
patch in the bug.

Try hasn't finished yet but I expect only minor fixes to be needed, if any.

Co-authored-by: Nicholas Nethercote <>
Blocks: 1499170
Depends on: 1466449
Pushed by
Remove dead code from nsHtml5ArrayCopy.h. r=hsivonen
Remove dynamic HTML5 atoms. r=njn,hsivonen
Blocks: 1506133
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Attachment #9022920 - Attachment is obsolete: true
Pushed the htmlparser repo patch with edited commit message:

Thanks for fixing this!
Attachment #9022917 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.