[HTML5] Introduce a scoped nsIAtomService for the HTML5 parser

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 7 obsolete attachments)

The off-the-main-thread stuff needs a way to manage dynamic atoms that don't live in the main atom table.
Blocks: 482918
Blocks: 515338
Attachment #398649 - Attachment description: WIP → Add a read-only hash for static atoms, a method for regetting an atom and a scoped table
Attachment #398649 - Flags: review?(benjamin)
Comment on attachment 398649 [details] [diff] [review]
Add a read-only hash for static atoms, a method for regetting an atom and a scoped table

I don't see any reason for nsHtml5AtomTable to implement nsIAtomService. Since it is only used within the html5 parser, just use a C++ class and avoid XPCOM interfaces altogether (it can be more efficient, too!)

In addition, it seems like using nsTHashtable would be better, so you can avoid a double lookup on Get/Put: you can just call PutEntry a single time.

The Reget function is a bit unsatisfactory: if you have an AtomImpl (the canonical XPCOM version), why not just have it return itself in all cases? And I don't think you need to pass in an nsIAtomService.
Attachment #398649 - Flags: review?(benjamin) → review-
(In reply to comment #2)
> (From update of attachment 398649 [details] [diff] [review])
> I don't see any reason for nsHtml5AtomTable to implement nsIAtomService. Since
> it is only used within the html5 parser, just use a C++ class and avoid XPCOM
> interfaces altogether (it can be more efficient, too!)

OK.

> In addition, it seems like using nsTHashtable would be better, so you can avoid
> a double lookup on Get/Put: you can just call PutEntry a single time.

OK. It would be nice if nsClassHashtable had a hook for a creator function for this use case.

Is the use of nsInterfaceHashtable for the table of static atoms OK, though?

> The Reget function is a bit unsatisfactory: if you have an AtomImpl (the
> canonical XPCOM version), why not just have it return itself in all cases? And
> I don't think you need to pass in an nsIAtomService.

For the initial design, I assumed that the main-thread part of the parser would use the global atom table, so there'd have been a reason for migrating global atoms to a parser thread. However, if I make the main-thread part of the parser use a private atom table, the global atoms don't need to migrate off the main thread and only the private atoms need to turn into global ones.

Thanks!
> Is the use of nsInterfaceHashtable for the table of static atoms OK, though?

It's ok, but it does an extra addref which is really unnecessary: probably better would be

nsDataHashtable<nsStringHashKey, nsIAtom*>

> For the initial design, I assumed that the main-thread part of the parser would
> use the global atom table, so there'd have been a reason for migrating global
> atoms to a parser thread. However, if I make the main-thread part of the parser
> use a private atom table, the global atoms don't need to migrate off the main
> thread and only the private atoms need to turn into global ones.

Hrm, I don't know how the duties are being divided up between threads. If you know that you're going to end up using a private atom table, it sounds like you should start using one "immediately", unless there are synchronization issues.
This patch has a method IsStaticAtom() to enable the implementation of the previous "Reget()" type of code that uses static atoms without actually regetting them outside the atoms themselves.
Attachment #398649 - Attachment is obsolete: true
Attachment #401440 - Flags: review?(benjamin)
Comment on attachment 401440 [details] [diff] [review]
Address previous review comments

>diff --git a/parser/html/nsHtml5Atom.cpp b/parser/html/nsHtml5Atom.cpp

>+nsHtml5Atom::nsHtml5Atom(const nsAString* aString)
>+  : mData(*aString)

The standard signature is const nsAString&, please use that.

>+NS_IMETHODIMP
>+nsHtml5Atom::GetUTF8String(const char **aReturn)
>+{
>+  return NS_ERROR_NOT_IMPLEMENTED;

add a NS_NOTREACHED here too please

>diff --git a/parser/html/nsHtml5AtomTable.h b/parser/html/nsHtml5AtomTable.h

>+class nsHtml5AtomEntry : public nsStringHashKey {

nit, opening-class-brace should go on the next line, per the standard coding style.

>+  public:
>+   nsHtml5AtomEntry(KeyTypePointer aStr);
>+   nsHtml5AtomEntry(const nsHtml5AtomEntry& aOther);
>+   ~nsHtml5AtomEntry();
>+   inline nsHtml5Atom* GetAtom() {
>+     return mAtom;
>+   }
>+  private:
>+   nsAutoPtr<nsHtml5Atom> mAtom;
>+};

What is the lifetime/scope of a nsHtml5AtomTable? Is there one per-document which is destroyed when the document is finished parsing? I'm trying to make sure that it is not destroyed before references to its nsIAtoms are finished, and also that it doesn't collect and retain lots of atoms beyond their natural lifetime.

>diff --git a/xpcom/ds/nsAtomTable.cpp b/xpcom/ds/nsAtomTable.cpp

>+// this is a hash of the static atoms for other atom tables
>+// need to optimize this to use less key memory
>+
>+static nsDataHashtable<nsStringHashKey, nsIAtom*>* gStaticAtomTable = 0;

This needs to be a separate hashtable because it is accessed from multiple threads, unlike the main table. How are you sure that modifications (NS_RegisterStaticAtoms) don't occur while the table is being read?
Posted patch Addressing review comments (obsolete) — Splinter Review
(In reply to comment #6)
> (From update of attachment 401440 [details] [diff] [review])
> >diff --git a/parser/html/nsHtml5Atom.cpp b/parser/html/nsHtml5Atom.cpp
> 
> >+nsHtml5Atom::nsHtml5Atom(const nsAString* aString)
> >+  : mData(*aString)
> 
> The standard signature is const nsAString&, please use that.

It doesn't compile that way:
no matching function for call to ‘nsHtml5Atom::nsHtml5Atom(const nsAString_internal*&)’

I tried using KeyType instead of KeyTypePointer in the entry constructor, but that didn't compile, either. What do I need to change in addition to changing the argument above to const nsAString&?

> >+NS_IMETHODIMP
> >+nsHtml5Atom::GetUTF8String(const char **aReturn)
> >+{
> >+  return NS_ERROR_NOT_IMPLEMENTED;
> 
> add a NS_NOTREACHED here too please

Added.

> >diff --git a/parser/html/nsHtml5AtomTable.h b/parser/html/nsHtml5AtomTable.h
> 
> >+class nsHtml5AtomEntry : public nsStringHashKey {
> 
> nit, opening-class-brace should go on the next line, per the standard coding
> style.

Fixed.

> >+  public:
> >+   nsHtml5AtomEntry(KeyTypePointer aStr);
> >+   nsHtml5AtomEntry(const nsHtml5AtomEntry& aOther);
> >+   ~nsHtml5AtomEntry();
> >+   inline nsHtml5Atom* GetAtom() {
> >+     return mAtom;
> >+   }
> >+  private:
> >+   nsAutoPtr<nsHtml5Atom> mAtom;
> >+};
> 
> What is the lifetime/scope of a nsHtml5AtomTable? Is there one per-document
> which is destroyed when the document is finished parsing?

The lifetime is the duration of a parse. The scope is one document and parser core (soon thread). That is, in a 100% scripted case (innerHTML and document.open + document.write) there's one nsHtml5AtomTable per document being parsed. When there's a network stream involved, there's two.

> I'm trying to make
> sure that it is not destroyed before references to its nsIAtoms are finished,
> and also that it doesn't collect and retain lots of atoms beyond their natural
> lifetime.

In the non-fragment case, the table should get destroyed with the parser. In the fragment case the parser is recycled, so the table needs to be destroyed manually and re-created.

> >diff --git a/xpcom/ds/nsAtomTable.cpp b/xpcom/ds/nsAtomTable.cpp
> 
> >+// this is a hash of the static atoms for other atom tables
> >+// need to optimize this to use less key memory
> >+
> >+static nsDataHashtable<nsStringHashKey, nsIAtom*>* gStaticAtomTable = 0;
> 
> This needs to be a separate hashtable because it is accessed from multiple
> threads, unlike the main table. How are you sure that modifications
> (NS_RegisterStaticAtoms) don't occur while the table is being read?

I am assuming that NS_RegisterStaticAtoms is only called during app startup and it's all done before the HTML5 parser is even instantiated. Is this assumption correct?

Thanks.
Attachment #401440 - Attachment is obsolete: true
Attachment #402606 - Flags: review?(benjamin)
Attachment #401440 - Flags: review?(benjamin)
> > The standard signature is const nsAString&, please use that.
> 
> It doesn't compile that way:
> no matching function for call to ‘nsHtml5Atom::nsHtml5Atom(const
> nsAString_internal*&)’

Well, you do have to change to passing it a reference instead of a pointer. Probably adding * would be sufficient: mAtom(new nsHtml5Atom(*aStr))

> > This needs to be a separate hashtable because it is accessed from multiple
> > threads, unlike the main table. How are you sure that modifications
> > (NS_RegisterStaticAtoms) don't occur while the table is being read?
> 
> I am assuming that NS_RegisterStaticAtoms is only called during app startup and
> it's all done before the HTML5 parser is even instantiated. Is this assumption
> correct?

I don't think it's guaranteed to be correct, no. At least nsAccessNode::InitXPAccessibility is not called until some external application (a screen reader) asks for an a11y view, which may be well after we've started parsing. RDFContentSinkImpl::RDFContentSinkImpl does also: I'm not sure when we initialize RDF right now, but we certainly trying to remove it from the startup path.

> 
> Thanks.
(In reply to comment #8)
> > > The standard signature is const nsAString&, please use that.
> > 
> > It doesn't compile that way:
> > no matching function for call to ‘nsHtml5Atom::nsHtml5Atom(const
> > nsAString_internal*&)’
> 
> Well, you do have to change to passing it a reference instead of a pointer.
> Probably adding * would be sufficient: mAtom(new nsHtml5Atom(*aStr))

OK. Thanks. (Somehow I was thinking the signature came from the template and not from my code lines above.)

> > > This needs to be a separate hashtable because it is accessed from multiple
> > > threads, unlike the main table. How are you sure that modifications
> > > (NS_RegisterStaticAtoms) don't occur while the table is being read?
> > 
> > I am assuming that NS_RegisterStaticAtoms is only called during app startup and
> > it's all done before the HTML5 parser is even instantiated. Is this assumption
> > correct?
> 
> I don't think it's guaranteed to be correct, no. At least
> nsAccessNode::InitXPAccessibility is not called until some external application
> (a screen reader) asks for an a11y view, which may be well after we've started
> parsing. RDFContentSinkImpl::RDFContentSinkImpl does also: I'm not sure when we
> initialize RDF right now, but we certainly trying to remove it from the startup
> path.

That's a pretty serious problem.

Would it be OK to seal the table of static atoms so that it no longer accepts new atoms after the layout module has initialized itself? (By setting a boolean that NS_RegisterStaticAtoms would check.) This would probably be good enough for the HTML5 parser but could be confusing if anyone else wants to reuse this facility later for something accessibility or RDF-related stuff.

BTW, I'm not quite sure if I understood the promotion of atoms to permanent when there are duplicates. If nsHtml5Atoms::foo is a duplicate of nsGkAtoms::foo, will both be nsStaticAtomWrapper or will one be a promoted nsAtomImpl? From one data point, it seems that nsHtml5Atoms::td is an nsStaticAtomWrapper even when it's a dupe of nsGkAtoms::td.
> Would it be OK to seal the table of static atoms so that it no longer accepts
> new atoms after the layout module has initialized itself? (By setting a boolean
> that NS_RegisterStaticAtoms would check.) This would probably be good enough
> for the HTML5 parser but could be confusing if anyone else wants to reuse this
> facility later for something accessibility or RDF-related stuff.

What would you do with the existing callers? I don't think we can just make NS_RegisterStaticAtoms fail without changing those callers. We could perhaps register all static atoms with C++ static initializers... we'd have to make sure that it was ok to do so before XPCOM was started, but that sounds like the best solution so far.

> BTW, I'm not quite sure if I understood the promotion of atoms to permanent
> when there are duplicates. If nsHtml5Atoms::foo is a duplicate of
> nsGkAtoms::foo, will both be nsStaticAtomWrapper or will one be a promoted
> nsAtomImpl? From one data point, it seems that nsHtml5Atoms::td is an

Promotion only ever happens nsAtomImpl->PermanentAtomImpl. This is for the case where you NS_NewAtom("foo") first, and then *later* call NS_RegisterStaticAtoms. For two static atom tables like nsGkAtoms::foo and ngHtml5Atoms::td you'd just end up with one nsStaticAtomWrapper.
(In reply to comment #10)
> > Would it be OK to seal the table of static atoms so that it no longer accepts
> > new atoms after the layout module has initialized itself? (By setting a boolean
> > that NS_RegisterStaticAtoms would check.) This would probably be good enough
> > for the HTML5 parser but could be confusing if anyone else wants to reuse this
> > facility later for something accessibility or RDF-related stuff.
> 
> What would you do with the existing callers? I don't think we can just make
> NS_RegisterStaticAtoms fail without changing those callers.

I mean NS_RegisterStaticAtoms would skip putting the late static atoms in the static atom hash so the hash wouldn't contain all static atoms out there when some static atoms are added lazily.

> > BTW, I'm not quite sure if I understood the promotion of atoms to permanent
> > when there are duplicates. If nsHtml5Atoms::foo is a duplicate of
> > nsGkAtoms::foo, will both be nsStaticAtomWrapper or will one be a promoted
> > nsAtomImpl? From one data point, it seems that nsHtml5Atoms::td is an
> 
> Promotion only ever happens nsAtomImpl->PermanentAtomImpl. This is for the case
> where you NS_NewAtom("foo") first, and then *later* call
> NS_RegisterStaticAtoms.

How do I know no one has done that on any atom values that are also in the nsHtml5Atoms atom table?

> For two static atom tables like nsGkAtoms::foo and
> ngHtml5Atoms::td you'd just end up with one nsStaticAtomWrapper.

OK.
> I mean NS_RegisterStaticAtoms would skip putting the late static atoms in the
> static atom hash so the hash wouldn't contain all static atoms out there when
> some static atoms are added lazily.

Oh. Yes, that's fine.

> How do I know no one has done that on any atom values that are also in the
> nsHtml5Atoms atom table?

We only promote non-permanent atoms that are in the XPCOM atom table. Since non-permanent XPCOM atoms are never in the nsHtml5Atoms table, I don't think this is a problem at all.
New patch addressing previous comments.
Attachment #402606 - Attachment is obsolete: true
Attachment #403472 - Flags: review?(benjamin)
Attachment #402606 - Flags: review?(benjamin)
Attachment #403472 - Attachment is obsolete: true
Attachment #403472 - Flags: review?(benjamin)
Comment on attachment 403472 [details] [diff] [review]
Make static atom table sealable and make html5 atom table clearable

Oops. Sorry. This patch is wrong.
Attachment #403780 - Flags: review?(benjamin)
Attachment #403780 - Flags: review?(benjamin) → review+
Comment on attachment 403780 [details] [diff] [review]
Make the static table sealed and make the dynamic table clearable

>diff --git a/parser/html/nsHtml5Atom.h b/parser/html/nsHtml5Atom.h

>+class nsHtml5Atom : public nsIAtom {

nit, bracing is still wrong

>diff --git a/parser/html/nsHtml5AtomTable.cpp b/parser/html/nsHtml5AtomTable.cpp

>+class nsHtml5AtomTable {

nit again
This could really use a test though, especially of the sealing bit.
Flags: in-testsuite?
Thanks. I'll fix the brace nits.

(In reply to comment #17)
> This could really use a test though, especially of the sealing bit.

What testing mechanism should I use?

Also, do I need an sr? If so, from whom?
Since the static atom stuff is only available from binary code, your test will need to be binary. Copy xpcom/tests/TestTimers.cpp which uses TestHarness.h and it should be fairly straightforward.

According to the rules you do need an sr. Probably dbaron or peterv.
Posted patch Address nits, add test (obsolete) — Splinter Review
Fixed brace placement. Added tests. Re-requesting r for the test.
Attachment #403780 - Attachment is obsolete: true
Attachment #404790 - Flags: superreview?(dbaron)
Attachment #404790 - Flags: review?(benjamin)
Attachment #404790 - Flags: review?(benjamin) → review+
Is the nsHtml5AtomTable intended to live on a single thread, and you make one for each thread on which it is used?  Or is there only one such thread?

It seems like its GetAtom method should assert that the atom table is sealed, and also make assertions about its threading invariants.

(It seems like it might be cleaner to clone the table of static atoms at the time of creation of the nsHtml5AtomTable rather than deal with sealing, although it's probably not worth it.)
(In reply to comment #21)
> Is the nsHtml5AtomTable intended to live on a single thread, and you make one
> for each thread on which it is used?  Or is there only one such thread?

An instance of nsHtml5AtomTable is constructed and destructed on the main thread (currently but this isn't an essential feature of the class itself). During normal operation, each nsHtml5AtomTable gets atom lookups only from the thread that logically owns the table. However, when a speculation fails, the main thread will do lookups in the off-the-main-thread instance of nsHtml5AtomTable. However, this is done while the the main thread holds a mutex on the class whose field holds the nsHtml5AtomTable so that the mutex prevents the parser thread from doing pretty much anything on that parser instance while the main thread performs the post-failed speculation state synchronization. Thus, nsHtml5AtomTable can't simply assert that all lookups happen on the same thread.

> It seems like its GetAtom method should assert that the atom table is sealed,
> and also make assertions about its threading invariants.

Asserting that the static table is sealed seems feasible. However, I think nsHtml5AtomTable can't assert about threading without a debug-only mutex, because it's the responsibility of its owner to prevent concurrent access. Should I add a debug-only mutex for detecting concurrent access attempts?

> (It seems like it might be cleaner to clone the table of static atoms at the
> time of creation of the nsHtml5AtomTable rather than deal with sealing,
> although it's probably not worth it.)

Cloning the table of static atoms would have a perf cost which would defeat the point of trying to keep static atoms cheap in terms of execution time.
Made test cases not break opt builds. Added assertions.
Attachment #404790 - Attachment is obsolete: true
Attachment #406402 - Flags: superreview?(dbaron)
Attachment #404790 - Flags: superreview?(dbaron)
Attachment #406402 - Attachment is patch: true
Attachment #406402 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 406402 [details] [diff] [review]
Fix test cases, address dbaron's comments

Not sure if I'm supposed to re-request r on this one. If not, sorry for the spam.
Attachment #406402 - Flags: review?(benjamin)
Status: NEW → ASSIGNED
Attachment #406402 - Flags: review?(benjamin) → review+
Comment on attachment 406402 [details] [diff] [review]
Fix test cases, address dbaron's comments

You need to rev the IID of nsIAtom.

(Though I wonder if it would be better to make an nsIAtomPrivate interface
that's derived from nsIAtom so you don't have to.)

Maybe at least put your new method at the end of the interface?

And I'd think you can still use 'boolean' in the IDL rather than 'PRBool'.

In MoreTestingAtoms.cpp:

+  NS_RegisterStaticAtoms(MoreTestingAtoms_info, NS_ARRAY_LENGTH(MoreTestingAtoms_info));

please wrap at the , so it stays under 80 characters.

nsHtml5Atom.cpp:

+NS_IMPL_QUERY_INTERFACE1(nsHtml5Atom, nsIAtom)

Why waste all the code on a QueryInterface that will assert if it is ever
called with a valid interface?  Maybe just make it look like AddRef and
Release (with a return NS_ERROR_UNEXPECTED)?

Could you document in nsHtml5AtomTable how many atom tables you expect
there to be, and what distinguishes them?

+nsIAtom*
+nsHtml5AtomTable::GetAtom(const nsAString& aKey)
+{
+#ifdef DEBUG
+  nsCOMPtr<nsIThread> currentThread;
+  NS_GetCurrentThread(getter_AddRefs(currentThread));
+  NS_ASSERTION(mPermittedLookupThread == currentThread, "Wrong thread!");
+#endif

You should brace the contents of the ifdef debug so you don't accidentally
write code that depends on those variables and have it compile in a debug
build, i.e.:
#ifdef DEBUG
  {
    nsCOMPtr<nsIThread> currentThread;
    NS_GetCurrentThread(getter_AddRefs(currentThread));
    NS_ASSERTION(mPermittedLookupThread == currentThread, "Wrong thread!");
  }
#endif

Also, in general, in the include guards in your headers, you should end
the names of the macros with only a single underscore, since all names
with two consecutive underscores are reserved to the implementation.
(See C++, [lib.global.names].)

+nsHtml5AtomEntry::nsHtml5AtomEntry(const nsHtml5AtomEntry& aOther)
+  : nsStringHashKey(aOther)
+  , mAtom(nsnull)
+{
+  NS_NOTREACHED("nsHtml5AtomTable is broken and tried to copy an entry");
+}

Rather than doing this, you can simply declare the copy-constructor as
private, and not implement it.  Then any attempt to use it will generate
a linker error.  (And you may want to do the same for operator=.)  For a
few examples already in the tree, see
http://mxr.mozilla.org/mozilla-central/search?string=not%20supported%20and

+class nsHtml5AtomEntry : public nsStringHashKey
+{
+  public:
+   nsHtml5AtomEntry(KeyTypePointer aStr);

Using a single space for a level of indentation is very odd (and
inconsistent with the code a dozen lines below, though that code is
itself inconsistent about whether things under "public:" should be
further indented).

+  if(!gStaticAtomTable->Get(aUTF16String, &atom)) {

space between "if" and "(".

+nsHtml5Atom::IsStaticAtom()

I think all three implementations of IsStaticAtom should be declared
NS_IMETHODIMP_(PRBool) rather than just PRBool, for consistency with
NS_DECL_NSIATOM.

+NS_IMETHODIMP
+nsHtml5Atom::EqualsUTF8(const nsACString& aString, PRBool *aReturn)
+{
+  *aReturn = CompareUTF8toUTF16(nsDependentCString(aString), mData) == 0;

You shouldn't need this "nsDependentCString("...")"

TestingAtoms.h and MoreTestingAtoms.h have comments that are clearly
copied from nsGkAtoms.h without revision.  Feel free to delete them.


Was there a reason you chose to have these atoms store the data in
UTF-16 and convert for the methods that deal with UTF-8, whereas other
atoms do the reverse?


Where is it documented:
 * what threads the atoms are allowed to be used on?
 * in what contexts you have to use the HTML5 atom and in what contexts
   you have to convert it to a normal atom?  (And how do you convert?)

The whole idea of having two different nsIAtoms for the same string,
especially ones that are both allowed to be used on the same thread,
scares me, given the purpose of atoms and how we currently use them.
This patch rewrites a whole bunch of existing invariants and yet doesn't
have a single source code comment in it explaining anything that isn't
obvious by looking at the code right below the comment.  It needs to
document what rules it expects to be followed.

I've reread bits of the threads at:
http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/3a3785a83571bfeb/
http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/3646baf3f137b61c/
and based on that I can accept that this is useful.  But I still don't
understand what the rules for using these new classes are.  (And when
I'm asked to review something I don't understand, I tend to think I'm
just stupid, and I'll understand it when I come back to it in a few
days.)  But if I don't understand them, I think other people dealing
with this code won't either.  So I think some documentation here would
be quite helpful.
Attachment #406402 - Flags: superreview?(dbaron) → superreview-
If you post a revised patch with comments added, I'll get to it much more quickly.
(In reply to comment #25)
> (From update of attachment 406402 [details] [diff] [review])

Thank you.

> You need to rev the IID of nsIAtom.

Fixed.

> (Though I wonder if it would be better to make an nsIAtomPrivate interface
> that's derived from nsIAtom so you don't have to.)

Do you mean that static vs. parser-private atom would be checked using QI rather than a virtual method that returns boolean?

> Maybe at least put your new method at the end of the interface?

Fixed.

> And I'd think you can still use 'boolean' in the IDL rather than 'PRBool'.

Fixed.

> In MoreTestingAtoms.cpp:
> 
> +  NS_RegisterStaticAtoms(MoreTestingAtoms_info,
> NS_ARRAY_LENGTH(MoreTestingAtoms_info));
> 
> please wrap at the , so it stays under 80 characters.

Fixed.

> nsHtml5Atom.cpp:
> 
> +NS_IMPL_QUERY_INTERFACE1(nsHtml5Atom, nsIAtom)
> 
> Why waste all the code on a QueryInterface that will assert if it is ever
> called with a valid interface?  

I wasn't aware that using the boilerplate macro wasn't always mandatory.

> Maybe just make it look like AddRef and
> Release (with a return NS_ERROR_UNEXPECTED)?

Fixed.

> Could you document in nsHtml5AtomTable how many atom tables you expect
> there to be, and what distinguishes them?

Documented.

> +nsIAtom*
> +nsHtml5AtomTable::GetAtom(const nsAString& aKey)
> +{
> +#ifdef DEBUG
> +  nsCOMPtr<nsIThread> currentThread;
> +  NS_GetCurrentThread(getter_AddRefs(currentThread));
> +  NS_ASSERTION(mPermittedLookupThread == currentThread, "Wrong thread!");
> +#endif
> 
> You should brace the contents of the ifdef debug so you don't accidentally
> write code that depends on those variables and have it compile in a debug
> build, i.e.:
> #ifdef DEBUG
>   {
>     nsCOMPtr<nsIThread> currentThread;
>     NS_GetCurrentThread(getter_AddRefs(currentThread));
>     NS_ASSERTION(mPermittedLookupThread == currentThread, "Wrong thread!");
>   }
> #endif

Fixed.

> Also, in general, in the include guards in your headers, you should end
> the names of the macros with only a single underscore, since all names
> with two consecutive underscores are reserved to the implementation.
> (See C++, [lib.global.names].)

Fixed in the files introduced by this patch. This pattern is all over the codebase (e.g. in nsISupportsImpl.h), so I had copied it when I tried to mimic the Mozilla C++ style. Filed bug 528863 about the rest of the HTML5 parser.

> +nsHtml5AtomEntry::nsHtml5AtomEntry(const nsHtml5AtomEntry& aOther)
> +  : nsStringHashKey(aOther)
> +  , mAtom(nsnull)
> +{
> +  NS_NOTREACHED("nsHtml5AtomTable is broken and tried to copy an entry");
> +}
> 
> Rather than doing this, you can simply declare the copy-constructor as
> private, and not implement it.  Then any attempt to use it will generate
> a linker error.  (And you may want to do the same for operator=.)  For a
> few examples already in the tree, see
> http://mxr.mozilla.org/mozilla-central/search?string=not%20supported%20and

I tried doing this (private unimplemented copy constructor and private unimplemented operator=) but got:

/Users/Shared/Projects/mozilla-html5/parser/html/nsHtml5AtomTable.h: In static member function ‘static void nsTHashtable<EntryType>::s_CopyEntry(PLDHashTable*, const PLDHashEntryHdr*, PLDHashEntryHdr*) [with EntryType = nsHtml5AtomEntry]’:
../../dist/include/nsTHashtable.h:345:   instantiated from ‘PRBool nsTHashtable<EntryType>::Init(PRUint32) [with EntryType = nsHtml5AtomEntry]’
/Users/Shared/Projects/mozilla-html5/parser/html/nsHtml5AtomTable.h:108:   instantiated from here
/Users/Shared/Projects/mozilla-html5/parser/html/nsHtml5AtomTable.h:61: error: ‘nsHtml5AtomEntry::nsHtml5AtomEntry(const nsHtml5AtomEntry&)’ is private
../../dist/include/nsTHashtable.h:387: error: within this context

> +class nsHtml5AtomEntry : public nsStringHashKey
> +{
> +  public:
> +   nsHtml5AtomEntry(KeyTypePointer aStr);
> 
> Using a single space for a level of indentation is very odd (and
> inconsistent with the code a dozen lines below, though that code is
> itself inconsistent about whether things under "public:" should be
> further indented).

Fixed.

> +  if(!gStaticAtomTable->Get(aUTF16String, &atom)) {
> 
> space between "if" and "(".

Fixed.

> +nsHtml5Atom::IsStaticAtom()
> 
> I think all three implementations of IsStaticAtom should be declared
> NS_IMETHODIMP_(PRBool) rather than just PRBool, for consistency with
> NS_DECL_NSIATOM.

Fixed.

> +NS_IMETHODIMP
> +nsHtml5Atom::EqualsUTF8(const nsACString& aString, PRBool *aReturn)
> +{
> +  *aReturn = CompareUTF8toUTF16(nsDependentCString(aString), mData) == 0;
> 
> You shouldn't need this "nsDependentCString("...")"

Removed. (Though maybe the whole method should just become return NS_ERROR_NOT_IMPLEMENTED;. I implemented the method only for completeness.)

> TestingAtoms.h and MoreTestingAtoms.h have comments that are clearly
> copied from nsGkAtoms.h without revision.  Feel free to delete them.

Deleted.

> Was there a reason you chose to have these atoms store the data in
> UTF-16 and convert for the methods that deal with UTF-8, whereas other
> atoms do the reverse?

The atoms are only ever created from PRUnichar buffers and converted to PRUnichar buffers and compared with PRUnichar buffers inside the HTML5 parser. The UTF-8 variants of the methods are useless. I implemented them only for future completeness. Shall I remove the implementations of the UTF-8 variants?

> Where is it documented:
>  * what threads the atoms are allowed to be used on?
>  * in what contexts you have to use the HTML5 atom and in what contexts
>    you have to convert it to a normal atom?  (And how do you convert?)

This is now documented in nsHtml5AtomTable.h.

> The whole idea of having two different nsIAtoms for the same string,
> especially ones that are both allowed to be used on the same thread,
> scares me, given the purpose of atoms and how we currently use them.

The function served by the global static atom table is reducing scariness.

It would be possible to make the off-the-main-thread HTML5 parser work merely with the IsStaticAtom() method addition on the XPCOM side (or without any additions on the XPCOM side if nsHtml5Atom were distinguishable from static atoms via QI using a marker interface, but that QI would probably be more expensive to run for every atom than a virtual method that returns a boolean), but doing so would require depending on even more unobvious characteristics of the HTML5 parser internals (basically introducing more fine-grained zones of atomicity that one would have to know not to cross: element names, attribute names and doctype names). See http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/3a3785a83571bfeb/f07962a28b9dfae6 .

> This patch rewrites a whole bunch of existing invariants and yet doesn't
> have a single source code comment in it explaining anything that isn't
> obvious by looking at the code right below the comment.  It needs to
> document what rules it expects to be followed.

My apologies for not documenting the assumptions of the code properly initially. The rules are now documented.

(Obeying the rules so that these atoms don't escape from the parser is dealt with in subsequent patches, so there are no invariant changes to the rest of the app.)
Attachment #406402 - Attachment is obsolete: true
Attachment #412558 - Flags: superreview?(dbaron)
Comment on attachment 412558 [details] [diff] [review]
Address superreview comments

(In reply to comment #27)
> Do you mean that static vs. parser-private atom would be checked using QI
> rather than a virtual method that returns boolean?

No, I meant that we'd know all our atoms implement nsIAtomPrivate in addition to nsIAtom, but we'd just leave the nsIAtom interface unchanged.  Not sure if it matters, though.

> The UTF-8 variants of the methods are useless. I implemented them only for
> future completeness. Shall I remove the implementations of the UTF-8 variants?

It might be better to make them assert if they're unused and actually using them would hurt performance.



Please rewrap the added comments at a width of less than 80 characters, and sr=dbaron.
Attachment #412558 - Flags: superreview?(dbaron) → superreview+
(In reply to comment #28)
> > The UTF-8 variants of the methods are useless. I implemented them only for
> > future completeness. Shall I remove the implementations of the UTF-8 variants?
> 
> It might be better to make them assert if they're unused and actually using
> them would hurt performance.

OK. I made them assert and return failure.

> Please rewrap the added comments at a width of less than 80 characters, and
> sr=dbaron.

Whoa. Looks like the supposed 80-character line in TextWrangler was completely bogus. Rewrapped.

Thank you.

Pushed as
http://hg.mozilla.org/mozilla-central/rev/bec299562b22
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to comment #27)
> It would be possible to make the off-the-main-thread HTML5 parser work merely
> with the IsStaticAtom() method addition on the XPCOM side (or without any
> additions on the XPCOM side if nsHtml5Atom were distinguishable from static
> atoms via QI using a marker interface, but that QI would probably be more
> expensive to run for every atom than a virtual method that returns a boolean),
> but doing so would require depending on even more unobvious characteristics of
> the HTML5 parser internals (basically introducing more fine-grained zones of
> atomicity that one would have to know not to cross: element names, attribute
> names and doctype names).

Code now in bug 529808.
You need to log in before you can comment on or make changes to this bug.