487949 - (html5-parsing-land) [HTML5] Land HTML5 parser on m-c preffed off

Assignee

•

16 years ago

Attached patch Patch as of rev 2770407be167 as if bug 468708 had already landed part 1 (obsolete) — Details — Splinter Review

Attaching part one of a huge patch to have something to set review flags on. This patch doesn't contain the Java code, since its location in the tree is undecided.

Attachment #373077 - Flags: superreview?(jonas)

Attachment #373077 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 4

•

16 years ago

Attached patch Patch as of rev 2770407be167 as if bug 468708 had already landed part 2 (obsolete) — Details — Splinter Review

Bugzilla didn't allow all this in one attachment...

Attachment #373078 - Flags: superreview?(jonas)

Attachment #373078 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 5

•

16 years ago

Note to self: Patch generated with hg diff -r 2199410bc4bb:tip

Priority: -- → P1

Ben Newman (:bnewman) (:benjamn)

Comment 6

•

16 years ago

Applied all three patches (including the one for bug 468708), and, after a little tweaking, it works! At first glance, the 468708 patch seems to have removed a couple of method prototypes that the big patch assumes:

diff --git a/content/html/content/src/nsGenericHTMLElement.h b/content/html/content/src/nsGenericHTMLElement.h
--- a/content/html/content/src/nsGenericHTMLElement.h
+++ b/content/html/content/src/nsGenericHTMLElement.h
@@ -98,23 +98,27 @@ public:
   nsresult DOMQueryInterface(nsIDOMHTMLElement *aElement, REFNSIID aIID,
                              void **aInstancePtr);

   // From nsGenericElement
   nsresult CopyInnerTo(nsGenericElement* aDest) const;

   // Implementation for nsIDOMNode
   NS_METHOD GetNodeName(nsAString& aNodeName);
+  NS_METHOD GetLocalName(nsAString& aLocalName);

   // Implementation for nsIDOMElement
   NS_METHOD SetAttribute(const nsAString& aName,
                          const nsAString& aValue);
   NS_METHOD GetTagName(nsAString& aTagName);
   NS_METHOD GetElementsByTagName(const nsAString& aTagname,
                                  nsIDOMNodeList** aReturn);
+  NS_METHOD GetElementsByTagNameNS(const nsAString& aNamespaceURI,
+                                   const nsAString& aLocalName,
+                                   nsIDOMNodeList** aReturn);

   // nsIDOMHTMLElement methods. Note that these are non-virtual
   // methods, implementations are expected to forward calls to these
   // methods.
   nsresult GetId(nsAString& aId);
   nsresult SetId(const nsAString& aId);
   nsresult GetTitle(nsAString& aTitle);
   nsresult SetTitle(const nsAString& aTitle);
diff --git a/content/html/document/src/nsHTMLDocument.h b/content/html/document/src/nsHTMLDocument.h
--- a/content/html/document/src/nsHTMLDocument.h
+++ b/content/html/document/src/nsHTMLDocument.h
@@ -179,16 +179,23 @@ public:
   virtual PRInt32 GetNumFormsSynchronous();
   virtual void TearingDownEditor(nsIEditor *aEditor);
   virtual void SetIsXHTML(PRBool aXHTML) { mIsRegularHTML = !aXHTML; }
   PRBool IsXHTML()
   {
     return !mIsRegularHTML;
   }

+#ifdef DEBUG
+  virtual nsresult CreateElem(nsIAtom *aName, nsIAtom *aPrefix,
+                              PRInt32 aNamespaceID,
+                              PRBool aDocumentDefaultType,
+                              nsIContent** aResult);
+#endif
+
   nsresult ChangeContentEditableCount(nsIContent *aElement, PRInt32 aChange);

   virtual EditingState GetEditingState()
   {
     return mEditingState;
   }

   virtual void DisableCookieAccess()

Ben Newman (:bnewman) (:benjamn)

•

16 years ago

Attachment #373077 - Attachment is obsolete: true

Attachment #373077 - Flags: superreview?(jonas)

Attachment #373077 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

16 years ago

Attachment #373078 - Attachment is obsolete: true

Attachment #373078 - Flags: superreview?(jonas)

Attachment #373078 - Flags: review?(mrbkap)

Reed Loden [:reed]

•

16 years ago

Attached patch The HTML5 parsing diff as if the DOM namespace patch had already landed (obsolete) — Details — Splinter Review

Attachment #374272 - Flags: superreview?(jonas)

Attachment #374272 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 14

•

16 years ago

Attached patch The HTML5 parsing diff as if the DOM namespace patch had already landed (obsolete) — Details — Splinter Review

Now with more care for elements that need special treatment when inserted to content tree by the parser.

Attachment #374272 - Attachment is obsolete: true

Attachment #375271 - Flags: superreview?(jonas)

Attachment #375271 - Flags: review?(mrbkap)

Attachment #374272 - Flags: superreview?(jonas)

Attachment #374272 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

16 years ago

Attached patch HTML5 parsing patch, now with fix for bug 483158 (obsolete) — Details — Splinter Review

New patch with major tokenizer loop refactoring for bug 483158. This patch is against trunk. I deliberately haven't bothered syncing this with the latest iteration of bug 468708, since it seems more useful to land it first and then just sync the HTML5 repo with trunk.

Attachment #375271 - Attachment is obsolete: true

Attachment #381762 - Flags: superreview?(jonas)

Attachment #381762 - Flags: review?(mrbkap)

Attachment #375271 - Flags: superreview?(jonas)

Attachment #375271 - Flags: review?(mrbkap)

Ben Newman (:bnewman) (:benjamn)

Comment 16

•

16 years ago

(In reply to comment #15)
> This patch is against trunk. I deliberately haven't bothered syncing this with
> the latest iteration of bug 468708, since it seems more useful to land it first
> and then just sync the HTML5 repo with trunk.

Wholly agreed.

Brendan Eich [:brendan]

Comment 17

•

16 years ago

Seems like too many new and delete calls, also PRUnichar read/write arrays instead of const arrays. Fixable in the translator? We really need to avoid going to the heap for what should be LIFO and static storage. Anyone have a clue about code footprint and pageload performance? /be
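For readers following along, the distinction raised here can be sketched roughly as below. This is not code from the patch: `Unichar` (standing in for `PRUnichar`), `kScriptName`, `MakeNameOnHeap` and `NameEquals` are all hypothetical names chosen for illustration. The idea is that a `const` table with static storage needs no allocation at all, while a writable heap copy of the same data costs a `new[]` and a later `delete[]`.

```cpp
#include <cstddef>
#include <memory>

// Stand-in for PRUnichar (a 16-bit code unit); an assumption of this sketch.
using Unichar = char16_t;

// Read-only table with static storage duration. It needs no allocation and
// can typically be placed in a read-only data segment by the toolchain.
static const Unichar kScriptName[] = { u's', u'c', u'r', u'i', u'p', u't' };
static const std::size_t kScriptNameLength =
    sizeof(kScriptName) / sizeof(kScriptName[0]);

// The pattern being criticized: the same data copied into a writable heap
// buffer, which costs a new[] here and a delete[] when the buffer dies.
std::unique_ptr<Unichar[]> MakeNameOnHeap() {
  std::unique_ptr<Unichar[]> buf(new Unichar[kScriptNameLength]);
  for (std::size_t i = 0; i < kScriptNameLength; ++i) {
    buf[i] = kScriptName[i];
  }
  return buf;
}

// Code that only reads the name can compare against the const table directly.
bool NameEquals(const Unichar* aOther, std::size_t aLength) {
  if (aLength != kScriptNameLength) {
    return false;
  }
  for (std::size_t i = 0; i < aLength; ++i) {
    if (aOther[i] != kScriptName[i]) {
      return false;
    }
  }
  return true;
}
```

Whether the translator can emit the first form everywhere is exactly the open question in this comment; the sketch only shows what the two forms look like side by side.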

Brendan Eich [:brendan]

•

16 years ago

Hmm. Or maybe one could live dangerously and have the taint bit out of sync in the copies whose taint bit is never read anyway and still pass the whole stack node by value.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 23

•

16 years ago

We really should get rid of table taint. If we can show that it simplifies the parser design and that it doesn't appear to break any pages, I don't see why we couldn't get rid of it. I'd actually suggest that we remove taint asap so that we can get testing on if that causes any problems.

Henri Sivonen (:hsivonen)

Assignee

Comment 24

•

16 years ago

> The stack nodes can't be heap-allocated, because the call stack goes away and
> the tree builder stack stays.
>
> Attribute values once they get into the attribute holder can't be
> heap-allocated for the same reason.

Argh. s/heap-allocated/stack-allocated/ in both paragraphs.
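The lifetime argument can be illustrated with a small sketch. Everything here is hypothetical (`StackNode`, `TreeBuilderSketch` and `ProcessStartTag` are simplified stand-ins, not the parser's real classes); the point is only that a node pushed during one call must still exist after that call returns, so it cannot live in the caller's stack frame.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a tree builder stack node.
struct StackNode {
  std::string name;
};

// Hypothetical tree builder: its element stack outlives any single call into
// the parser, so the nodes are owned by the builder on the heap rather than
// living in some caller's stack frame.
class TreeBuilderSketch {
 public:
  void Push(std::string aName) {
    mStack.push_back(std::make_unique<StackNode>(StackNode{std::move(aName)}));
  }
  void Pop() { mStack.pop_back(); }
  const StackNode* Current() const {
    return mStack.empty() ? nullptr : mStack.back().get();
  }
  std::size_t Depth() const { return mStack.size(); }

 private:
  std::vector<std::unique_ptr<StackNode>> mStack;
};

// Simulates one parse step. The locals of this function die on return, but
// the node pushed onto the builder's stack stays alive.
void ProcessStartTag(TreeBuilderSketch& aBuilder, const std::string& aName) {
  aBuilder.Push(aName);
}
```

After two calls such as `ProcessStartTag(builder, "html")` and `ProcessStartTag(builder, "body")`, both nodes are still reachable through `builder`, which would not hold if they had been locals of `ProcessStartTag`.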

Robert O'Callahan (:roc) (email my personal email if necessary)

•

16 years ago

Attached patch HTML5 parsing patch, now diffed after the namespace landing (obsolete) — Details — Splinter Review

Attachment #381762 - Attachment is obsolete: true

Attachment #382713 - Flags: superreview?(jonas)

Attachment #382713 - Flags: review?(mrbkap)

Attachment #381762 - Flags: superreview?(jonas)

Attachment #381762 - Flags: review?(mrbkap)

Brendan Eich [:brendan]

•

16 years ago

I'm only seeing small differences between the perf of Firefox 3.5b99, HTML5 builds with HTML5 active, and HTML5 builds with HTML5 inactive. However, on tryserver, builds with no HTML5 parser at all are green but even HTML5 builds that have the HTML5 parser inactive are insanely slow when running the old parser. This seems like a blocking issue. (I feel terrible for not trying this configuration on the tryserver before. I had only tried HTML5 on and the same point on trunk without any HTML5 code in the build at all.)

There's something about the latent presence of the HTML5 parser that slows things down on the tryserver even when the HTML5 parser isn't run (only its pseudo-static memory is initialized at app startup). The most reasonable explanation I can come up with is that the different set of static atoms makes the atom table slow somehow. But if that were the explanation, I should be seeing the same slowdown locally, which I don't. (I'm in the process of verifying with printfs that the latent HTML5 parser really doesn't get run and doesn't get its statics initialized more than once.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 35

•

16 years ago

Have you tried running the tryserver-generated build locally?

Henri Sivonen (:hsivonen)

Assignee

Comment 36

•

16 years ago

(In reply to comment #34)
> However, on tryserver, builds with no HTML5 parser at all are green but even
> HTML5 builds that have the HTML5 parser inactive are insanely slow when running
> the old parser.

Fortunately, this was a misdiagnosis on my part. The last orange Tp run had HTML5 parsing enabled. Tp is green with latent HTML5 parsing. I'll now keep the HTML5 repo in the preffed-off state in order to simulate the situation of having the parser landed on trunk preffed off.

However, with the latent HTML5 parser, mochitest times out. Now investigating that instead. I have verified that the HTML5 parser is indeed latent on the tryserver in those builds and only participates in pseudo-static initialization and shutdown as well as in the global atom table load factor.

(In reply to comment #35)
> Have you tried running the tryserver-generated build locally?

Yes, I have. (Both HTML5 enabled and latent.) The numbers are similar to locally-built builds. Since my dev machine has more RAM than tryserver talos, I'll try running on a RAM-limited Linux VM image next.

Henri Sivonen (:hsivonen)

Assignee

Comment 37

•

16 years ago

Attached patch Hopefully a landable patch (except still doesn't have .java sources) (obsolete) — Details — Splinter Review

Attachment #382713 - Attachment is obsolete: true

Attachment #383652 - Flags: superreview?(jonas)

Attachment #383652 - Flags: review?(mrbkap)

Attachment #382713 - Flags: superreview?(jonas)

Attachment #382713 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 38

•

16 years ago

Attached patch Interdiff of the previous and current patch — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

•

16 years ago

Attached patch Fix merge issue, cache pref, fix static inline void and cycle collection (obsolete) — Details — Splinter Review

Attachment #383652 - Attachment is obsolete: true

Attachment #384287 - Flags: superreview?(jonas)

Attachment #384287 - Flags: review?(mrbkap)

Attachment #383652 - Flags: superreview?(jonas)

Attachment #383652 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 51

•

16 years ago

Attached patch Interdiff of the previous and current patch — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Comment 52

•

16 years ago

The remaining test failures with the current patch are:
* One nsComponentManagerImpl leaks on Windows
* mochitest-chrome times out on Linux

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 53

•

16 years ago

Is the nsComponentManagerImpl leak the leak that's causing cycle collector slowness and ultimately talos timing out?

Henri Sivonen (:hsivonen)

Assignee

Comment 54

•

16 years ago

The nsComponentManagerImpl leak is with HTML5 parser preffed off, so it should be unrelated to the cycle collector code in the HTML5 parser. I merged with a known-green point from the trunk after the last comment and now mochitest-plain timed out (with the old parser). Any advice on how to troubleshoot?

Boris Zbarsky [:bzbarsky]

Comment 55

•

16 years ago

> The nsComponentManagerImpl leak is with HTML5 parser preffed off,

Is this one of the known random oranges? Did it appear on all try machines for you, or just some?

> * mochitest-chrome times out on Linux

Similar questions...

Ben Newman (:bnewman) (:benjamn)

Comment 56

•

16 years ago

(In reply to comment #50)
> Created an attachment (id=384287) [details]
> Fix merge issue, cache pref, fix static inline void and cycle collection

I'm guessing you made the parser a cycle collection participant to help track down the leak. Is that a permanent design choice? I ask because the cycle collector is main thread only, which means cycle collection could complicate off-main thread parsing.

Henri Sivonen (:hsivonen)

Assignee

•

16 years ago

The Mac orange was random, too. Now it appears that none of the orange is caused by this patch.

Brendan Eich [:brendan]

•

16 years ago

Attached patch Address review comments, fix crash with app cache, fix a test case (obsolete) — Details — Splinter Review

Attachment #384287 - Attachment is obsolete: true

Attachment #384830 - Flags: superreview?(jonas)

Attachment #384830 - Flags: review?(mrbkap)

Attachment #384287 - Flags: superreview?(jonas)

Attachment #384287 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 65

•

16 years ago

Attached patch Interdiff of the previous and current patch — Details — Splinter Review

Blake Kaplan (:mrbkap) (inactive)

Comment 66

•

16 years ago

(I'll respond since these comments cover most of mine and because sicking is probably already asleep).

(In reply to comment #63)
> > Can you turn these into nsAutoPtr/nsAutoArrayPtrs instead?
>
> That seems like an overkill, because these objects are only ever held from one
> place. If we want to avoid manual new/delete here, mTokenizer and mTreeBuilder
> could, in principle, be allocated as parts of the nsHtml5Parser object instead
> of hitting new/delete and having them behind a pointer.

If you can get away with that, I think that would be preferable. At the very least, I agree with sicking here -- especially with the Reset API on nsIParser, the potential for accidental leaks seems high. FWIW, in hand-written C++ code, the prevailing style for the past few years (as long as I've been working on Gecko) has been to eschew manual refcounting in favor of nsRefPtr/nsAutoPtr with the assumption that compilers will generate decent code (and that humans forget to do stuff like that way more easily than compilers!).

> Copied from nsParser, but fixed. :-)

rickg = single entrance single exit. :-)

> Fixed here. Should I also remove parentheses in cases like NS_ASSERTION((foo ==
> bar), "blah");?

Yeah. In cases like this, it's up to the macro to do proper parenthesizing. Also, the parentheses will be distracting in the console if the asserts do hit.

> > Can this really happen during a document.write?? Or rather, should it really be
> > honored then?
>
> I don't see why it wouldn't be honored per spec. Opera honors it. IE8, Gecko's
> old parser and WebKit don't. Maybe the spec should change here?

For what it's worth, we have code in the tree specifically to not honor this. I haven't done the CVS archeology yet to figure out why, though.

> It's only for handling <noscript> elements. I'm not sure if the code can be
> reached if scripting is off. I assumed that an extension running as chrome
> could reach it even if the document has scripting turned off, so I put it in
> just in case.

I'd expect this to run copy/pasting HTML into a noscript document, but I haven't tested.

> It's code I copied from nsParser to use as a starting point. Removed.

Blame me for not nuking it earlier (I did finally get rid of it in nsParser.cpp recently).
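As a rough illustration of the two ownership options discussed in the first exchange above, here is a minimal sketch. It uses `std::unique_ptr` as a stand-in for `nsAutoPtr`, and the class names (`Tokenizer`, `TreeBuilder`, `ParserWithOwningPointers`, `ParserWithValueMembers`) are hypothetical simplifications rather than the real `nsHtml5Parser` layout.

```cpp
#include <memory>

// Hypothetical stand-ins for the tokenizer and tree builder objects.
struct Tokenizer {
  int state = 0;
  void Reset() { state = 0; }
};
struct TreeBuilder {
  int depth = 0;
  void Reset() { depth = 0; }
};

// Option 1: owning smart pointers. std::unique_ptr plays the role that
// nsAutoPtr plays in Gecko: the objects are freed automatically when the
// parser is destroyed or when a pointer is reassigned in Reset().
class ParserWithOwningPointers {
 public:
  ParserWithOwningPointers()
      : mTokenizer(new Tokenizer()), mTreeBuilder(new TreeBuilder()) {}
  void Reset() {
    // Replacing the pointee deletes the old object, so a forgotten delete
    // cannot turn into a leak.
    mTokenizer.reset(new Tokenizer());
    mTreeBuilder.reset(new TreeBuilder());
  }
  Tokenizer& GetTokenizer() { return *mTokenizer; }

 private:
  std::unique_ptr<Tokenizer> mTokenizer;
  std::unique_ptr<TreeBuilder> mTreeBuilder;
};

// Option 2: members by value. There is no new/delete at all; the sub-objects
// share the parser's own allocation and are reset in place.
class ParserWithValueMembers {
 public:
  void Reset() {
    mTokenizer.Reset();
    mTreeBuilder.Reset();
  }
  Tokenizer& GetTokenizer() { return mTokenizer; }

 private:
  Tokenizer mTokenizer;
  TreeBuilder mTreeBuilder;
};
```

Both variants make a `Reset()` call safe by construction; the second additionally avoids the heap, which is the variant preferred in the reply above if the real class layout allows it.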

Henri Sivonen (:hsivonen)

Assignee

Comment 67

•

16 years ago

Attached patch Patch with .java files omitted — Details — Splinter Review

Attachment #384830 - Attachment is obsolete: true

Attachment #385389 - Flags: superreview?(jonas)

Attachment #385389 - Flags: review?(mrbkap)

Attachment #384830 - Flags: superreview?(jonas)

Attachment #384830 - Flags: review?(mrbkap)

Henri Sivonen (:hsivonen)

Assignee

Comment 80

•

16 years ago

Attached patch Interdiff from 2009-06-24 to 2009-06-26 — Details — Splinter Review

Pushed to try at rev bb03b52176c3. sicking, over to you. I have no unpushed local changes.

Attachment #384864 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 81

•

16 years ago

Pushed as rev 168dd0eb7283. Woohoo! Thank you for the reviews and for all the other help!

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Henri Sivonen (:hsivonen)

Assignee

Comment 82

•

16 years ago

Attached patch Contingency patch for making the HTML5 run no code — Details — Splinter Review

Since it's impractical to back out a push of this size if things go wrong on the tinderboxen, I'm attaching a patch that keeps the parser in the build but makes it run no code (not even its static initialization which is otherwise run even when the parser is preffed off).
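The difference between "preffed off" and "runs no code" can be sketched as follows. This is an illustration under stated assumptions, not the contingency patch itself: `ParserConfig`, `Html5Statics`, `StartupInit` and `ChooseParser` are hypothetical names, and the real pref machinery is not modeled.

```cpp
// Which parser handles a document in this sketch.
enum class ParserKind { Old, Html5 };

// Hypothetical configuration, not Gecko's real pref machinery.
struct ParserConfig {
  bool html5PrefEnabled;  // cached value of the enabling pref
  bool html5RunsNoCode;   // contingency switch described in the comment above
};

// Stand-in for the parser's static tables.
struct Html5Statics {
  bool initialized = false;
  void Init() { initialized = true; }
};

// Called once at application startup. Without the contingency switch, the
// statics are initialized even when the parser is preffed off.
void StartupInit(const ParserConfig& aConfig, Html5Statics& aStatics) {
  if (aConfig.html5RunsNoCode) {
    return;  // the parser stays in the build but executes nothing
  }
  aStatics.Init();
}

// Picks a parser for a new document. The new parser is only eligible when
// the pref is on and its statics were actually initialized.
ParserKind ChooseParser(const ParserConfig& aConfig,
                        const Html5Statics& aStatics) {
  if (aConfig.html5PrefEnabled && aStatics.initialized) {
    return ParserKind::Html5;
  }
  return ParserKind::Old;
}
```

In this model, a preffed-off build still runs `Init()` at startup, while the contingency configuration skips it and always falls back to the old parser.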

Ginn Chen

Comment 83

•

16 years ago

"SEC" is defined as 1 in /usr/include/sys/time.h on Solaris. Can we rename that?

Error message:
"nsHtml5ElementName.h", line 153: Error: Identifier expected instead of "1".
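A minimal sketch of why this fails and how a rename sidesteps it is given below. The `#define` simulates the Solaris header so the example builds anywhere; `ElementNameSketch` and `ELT_SEC` are hypothetical names for illustration and are not necessarily what the eventual fix uses.

```cpp
// Simulates the Solaris header for this sketch. On Solaris the definition
// comes from /usr/include/sys/time.h.
#ifndef SEC
#define SEC 1
#endif

struct ElementNameSketch {
  // A member named SEC would not compile here: the preprocessor would turn
  // `static const int SEC = 42;` into `static const int 1 = 42;`, which is
  // the "Identifier expected instead of "1"" error quoted above.
  // Giving the identifier a prefix (ELT_SEC is a hypothetical name) keeps it
  // out of the macro's way.
  static const int ELT_SEC = 42;
};

// The macro is still visible and still expands to 1.
int SecondsMacroValue() { return SEC; }

// The renamed constant is unaffected by the macro.
int RenamedConstant() { return ElementNameSketch::ELT_SEC; }
```

Renaming is generally more robust than an `#undef SEC` in the parser headers, since the latter would depend on include order.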

Henri Sivonen (:hsivonen)

Assignee

Comment 84

•

16 years ago

Yes, it will be renamed.

Henri Sivonen (:hsivonen)

Assignee

Comment 85

•

Updated

•

16 years ago

Depends on: 501345

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Alias: html5-parsing-land

Martijn Wargers (dead)

Comment 89

•

16 years ago

It turns out that the fix for bug 501113 has fixed these crashes.

Martijn Wargers (dead)

Updated

•

16 years ago

Depends on: 502091

Martijn Wargers (dead)

Updated

•

16 years ago

Depends on: 502103

Henri Sivonen (:hsivonen)

Assignee

Updated

•

16 years ago

No longer depends on: 502103

Ben Newman (:bnewman) (:benjamn)

Comment 90

•

16 years ago

I've been tasked with devising/proposing a plan to incorporate the java source files into mozilla-central. As a start, I've posted a patch that adds some new export scripts to the validator.nu htmlparser repository: http://bugzilla.validator.nu/show_bug.cgi?id=592 The high bit is that we need to include, at minimum, some sort of standalone version of the Java-to-C++ translator so that mozilla developers can hack on the java sources and easily regenerate the corresponding C++ files. Henri, please let me know if there's anything in the patch you don't like, and feel free to modify it as you see fit. After all, it is your repo.

-fullmetaljacket-

Updated

•

16 years ago

Depends on: 502984

Henri Sivonen (:hsivonen)

Assignee

Updated

•

16 years ago

No longer depends on: 502984

Ben Newman (:bnewman) (:benjamn)

Comment 91

•

16 years ago

Attached patch Patch to create Makefiles with sync/libs/translate/clean targets (obsolete) — Details — Splinter Review

I had a realization today. All the HTML5 parsing-related java files live in Subversion repositories, and one nice thing about Subversion is that it allows you to check out an arbitrary subdirectory of a repository. So, instead of exporting the java source files from some external svn working directory into the mozilla-central source tree, it makes a lot more sense to pull just the necessary directories directly from the appropriate repository.

This patch by itself does not contain the java files (it would have been almost 2MB if it did), but if you read parser/html/java/README.txt you'll see that it's easy to fetch them (make sync), build the translator (make), and perform the translation (make translate). The 'sync' target fetches not only the necessary java sources but also the appropriate license files.

This does introduce a Subversion dependency for those who want to hack on the HTML5 parser, but only for them. Building mozilla-central is still possible without Subversion, since these Makefiles are independent of the existing build system. At some point we may decide to eliminate the Subversion dependency by committing the java sources to mozilla-central, but I'll leave that for a future code review.

Attachment #387588 - Flags: superreview?(jst)

Attachment #387588 - Flags: review?(jst)

Ben Newman (:bnewman) (:benjamn)

Updated

•

16 years ago

Attachment #387588 - Flags: review?(jst) → review?(hsivonen)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 92

•

16 years ago

Doesn't this mean that if that subversion repo were to go away, we'd be violating the GPL and LGPL?

Reed Loden [:reed]

Updated

•

16 years ago

Attachment #387588 - Flags: review-

Reed Loden [:reed]

Comment 93

•

16 years ago

Comment on attachment 387588
Patch to create Makefiles with sync/libs/translate/clean targets

Moved my translator check-in patch to a more appropriate bug: https://bugzilla.mozilla.org/attachment.cgi?id=388568&action=edit

Attachment #387588 - Attachment is obsolete: true

Attachment #387588 - Flags: superreview?(jst)

Attachment #387588 - Flags: review?(hsivonen)

(mostly gone) XtC4UaLL [:xtc4uall]

Updated

•

16 years ago

Depends on: 507119

IU

Updated

•

16 years ago

Depends on: 507872

u88484

Updated

•

16 years ago

Depends on: 512310

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

16 years ago

status1.9.2: --- → beta1-fixed

Flags: wanted1.9.2?

•

6 years ago

Regressions: CVE-2019-11744

Patch as of rev 2770407be167 as if bug 468708 had already landed part 1: Henri Sivonen (:hsivonen), 16 years ago, 1021.92 KB, patch
Patch as of rev 2770407be167 as if bug 468708 had already landed part 2: Henri Sivonen (:hsivonen), 16 years ago, 545.07 KB, patch
The HTML5 parsing diff as if the DOM namespace patch had already landed: Henri Sivonen (:hsivonen), 16 years ago, 1.53 MB, patch
The HTML5 parsing diff as if the DOM namespace patch had already landed: Henri Sivonen (:hsivonen), 16 years ago, 1.53 MB, patch
HTML5 parsing patch, now with fix for bug 483158: Henri Sivonen (:hsivonen), 16 years ago, 1.62 MB, patch
HTML5 parsing patch, now diffed after the namespace landing: Henri Sivonen (:hsivonen), 16 years ago, 1.57 MB, patch
Hopefully a landable patch (except still doesn't have .java sources): Henri Sivonen (:hsivonen), 16 years ago, 1.58 MB, patch
Interdiff of the previous and current patch: Henri Sivonen (:hsivonen), 16 years ago, 18.66 KB, patch
Fix merge issue, cache pref, fix static inline void and cycle collection: Henri Sivonen (:hsivonen), 16 years ago, 1.58 MB, patch
Interdiff of the previous and current patch: Henri Sivonen (:hsivonen), 16 years ago, 20.21 KB, patch
Address review comments, fix crash with app cache, fix a test case: Henri Sivonen (:hsivonen), 16 years ago, 1.59 MB, patch
Interdiff of the previous and current patch: Henri Sivonen (:hsivonen), 16 years ago, 8.42 KB, patch
Interdiff for making static jArray init safe, fix assert parens, include .java: Henri Sivonen (:hsivonen), 16 years ago, 777.25 KB, patch
Patch with .java files omitted: Henri Sivonen (:hsivonen), 16 years ago, 1.43 MB, patch
Interdiff from 2009-06-24 to 2009-06-26: Henri Sivonen (:hsivonen), 16 years ago, 1.30 MB, patch
Contingency patch for making the HTML5 run no code: Henri Sivonen (:hsivonen), 16 years ago, 1.56 KB, patch
Patch to create Makefiles with sync/libs/translate/clean targets: Ben Newman (:bnewman) (:benjamn), 16 years ago, 12.62 KB, patch, reed: review-