Closed
Bug 207377
Opened 21 years ago
Closed 21 years ago
txMozillaXMLOutput::endHTMLElement needs improvement
Categories
(Core :: XSLT, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: emrick, Assigned: emrick)
Details
(Keywords: perf)
Attachments
(2 files, 5 obsolete files)
289.38 KB,
text/plain
|
Details | |
14.72 KB,
patch
|
axel
:
review+
peterv
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.3) Gecko/20030313 Build Identifier: Mozilla/5.0 Gecko/20030313 Testcase measurement shows potential for overall 8% improvement from txMozillaXMLOutput::endHTMLElement http://lxr.mozilla.org/seamonkey/source/extensions/transformiix/source/xslt/txMozillaXMLOutput.cpp#529 specificly, in the: if (atom == txHTMLAtoms::table && !aXHTML) { ... } code block. Testcase is NOT marked as xhtml, yet renders correctly if above is changed to: if (0) { ... } to no-op the tbody insertion code block. Some means of short-cutting this check to do tbody insertion processing is needed. The means to do that is not exactly clear to me right now, but I'm studying it, and by the bugzilla, hoping to get ideas from others about it. Reproducible: Always Steps to Reproduce:
Comment 1•21 years ago
|
||
there's also a crasher bug very close to the lines you want to touch: bug 202765.
Assignee | ||
Comment 2•21 years ago
|
||
Interesting data. The best performance result is obtained by following the XHTML path, either by hacking the code, or by putting the right XHTML incantation into the xsl:stylesheet attributtes. The document does NOT have well-formed tbody's. In fact, it does not use tbody at all anyplace. So, curious is this... it appears tbody and /tbody are redundant anyway... table and /table appear to 'wrap' table-tbody elements just fine without the need for tbody and /tbody............ yes? no? maybe? Good but not best performance result is obtained by editing the document and adding the tbody (just after each table) and /tbody (just before each /table). Having the tbody and /tbody already there avoids the wrapChildren path. It would seem to me, if it is the case that tbody REALLY needs to be inserted, then the place to do this would be at compile-time, not execute time. Yes/No ? It does not look to hard to do there. I'm going to play with what happens if <table blah blah> <tbody> <tbody> ... </tbody> </tbody> </table> if its not bad, then could just unconditionally add a tbody after each table and not need to 'lookahead' to see if one is already there.... Other ideas or comments on this?
Comment 3•21 years ago
|
||
1) Mozilla expects the tbody to be there for HTML, see bug 30378. 2) XHTML output is always going to be faster, less fixup to do. 3) When you say the tbody is redundant, are you sure? Does the C in attachment 65021 [details] have a colored background if you skip the tbody insertion code? 4) Let's not make the output even worse by unconditionally adding a tbody. 5) It is not possible to do this at compile time afaik. My advice is to change the stylesheet to output XHTML and add tbody's in the stylesheet. We could make this code smarter, but it is going to be nasty because of nested tables, you'll at least need a stack. Olivier: bug 202765 is totally unrelated to this one, please don't mix them.
Assignee | ||
Comment 4•21 years ago
|
||
> 1) Mozilla expects the tbody to be there for HTML, see bug 30378. Much ado about nothing. The code does whats right regardless of what 30378 says. If and only if there are multiple table sections (multiple tbody's or tbody with tfoot and thead) then, of course, tbody tags required. Otherwise optional. >2) XHTML output is always going to be faster, less fixup to do. That will be sweet then... when everybody uses it. Unfortunatly, neither you or I will live that long. >3) When you say the tbody is redundant, are you sure? Does the C in attachment 65021 have a colored background if you skip the tbody insertion code? If I add the xmlns="http://www.w3.org/1999/xhtml" incantation to thet stylesheet then ix goes through XHTML path, <tbodys> are not added... and it still looks same to me. Do you get a different result? >4) Let's not make the output even worse by unconditionally adding a tbody. I'm not sure I understand your point here????? >5) It is not possible to do this at compile time afaik. Why do you think so? >My advice is to change the stylesheet to output XHTML and add tbody's in the >stylesheet. Change every stylesheet in the world ??? Talk about steady work ! Nobody uses tbody's unless they need multiple table sections. >We could make this code smarter, but it is going to be nasty because >of nested tables, you'll at least need a stack. I am not afraid of a stack. :-)
Comment 5•21 years ago
|
||
I guess we could discuss removing this code if it doesn't regress bug 53944. I'd personally be in favor of keeping the code, in HTML tbody is an inferred element (and I'd rather be consistent with what our parser does for regular HTML, it avoids nasty surprises for users). And don't make output nasty with two levels of tbody if the user has been so good to provide one himself. In short, I'm more likely to review a patch that improves this code than one that removes it.
Assignee: peterv → emrick
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 6•21 years ago
|
||
And there's no way to do this at compile time because you don't have all the cases where a stylesheet will produce a table in the output. The simplest example is a copying stylesheet that you feed a document with a table in it.
Assignee | ||
Comment 7•21 years ago
|
||
I've been working with varients of http://lxr.mozilla.org/seamonkey/source/layout/html/tests/table/viewer_tests/test4.html and sadness prevails..... and some surprise.... there is no means to consistently insert a <tbody> in a visually invisable way.... and yes I'm saying that impacts the rendering of documents processed by xslt and the base gkparser. (IE produces the intended invisible result, Moz does not). (scratches head icon)
Comment 8•21 years ago
|
||
Sam, how do you get an 8% number anyway? That pretty much puzzles me. That's an if(), so at most one branch prediction miss in any real-world testcase. With all the bad perf problems we have 8% for an if() come as quite a suprise to me.
First of all, we definitely have to keep the same logic as exists now and let the outputhandler insert tbody-elements if they are "missing". In theory this can sometimes be done at compiletime, but not always, see comment 6. There are two things we can do however: 1. I suspect the content-code ends up doing way too much work when we move the <table>-children from the <table> to the <tbody>, this is because we use DOM-methods instead of nsIContent-methods. This is a problem in general with the output handlers. I have a patch that is half-way done that should improve things. 2. Instead of putting the elements into the <table> and then move them to the <tbody> we could instead hold off putting childnodes into a <table> until we know for sure if a <tbody> needs to be inserted or not. That should make the insertion of the <tbody> next to free. IMHO we should first do 1 since that should be done anyway. After that we should investigate if we're still paying a penilty for the moving and if so do 2.
oh, an alternative to doing 2 in my previous comment is to always insert the <tbody> and then remove it if we end up inserting another table-section-element. That should make us do the 'optimal' think in almost all realworld cases.
Assignee | ||
Comment 11•21 years ago
|
||
>First of all, we definitely have to keep the same logic as exists >now and let the outputhandler insert tbody-elements... Yea, I said that(clumsy way, I guess) with comment #7 - to maintain same visual result as base Mozilla, then XSLT must insert tbody's too, and also, XSLT cannot insert additional 'redundant' tbody's either. I think this is a base bug in tbody-attributes handling, but that's another issue. For now, pertinent point is, XSLT must do tbody's to produce same visual results as base Mozilla). I think compile-time tbody insertion is worth considering more. Even if it is not 100%, it would 'fix-up' many cases... and then the currnet execute-time code would 'catch the rest'. >"doing way too much work when we move the <table>-children from the <table> to the <tbody>" Exactly so. And because code does not detect the need to do this 'move' until the end (endElement). Better would be to detect the condition on the first element not appropriate to live under tbody. That is, as such: if after a <table>, then first <tr>, before <tbody> would signal need to insert <tbody>.... or something like that. One attribute of today's code... the intent this tbody insertion business is to fix-up the most simple kind of tables, tables with just one table secion, those tables where tbody is by-spec an optional keyword. However, the logic here in XSLT and in mozilla base actually does more, as it will do same fix-up to malformed multi-section tables. Simple tables w/o tbody are by spec okay. Complex tables with malformed sections are by spec not allowed, but the code today attempts to do same sort of tbody insertion fix-up. Don't know if anybody thinks this is a problem. (I hope not, solution space we're talking about in this bug has same attribute).
Assignee | ||
Comment 12•21 years ago
|
||
Only after instrumenting complex tables, have I come to appreciate the power allowed by xsl logic and flow control statements embedded into html table building code.... The result is such -- the potential for COMPILE SIDE optimization of tables seems limited to a specific but important case -- the case where <TABLE> StartElement is immediantly followed by a <TR> StartElement. When this is the case, then insert between these two a <TBODY> StartElement, This would just catch and fix-up the most simple cases. The more complex cases must be, can only be, detected and fixed-up at execution time. Nevertheless, I think it's useful to consider adding the compile-side fixup, to handle the most ssimple and most common cases. Looks to me, the place to do it is in txStylesheetCompiler::startElement or txStylesheetCompiler::startElementInternal. Means to do this would be, each step save current startElement atom as previous atom. If previous atom == "table" and current atom == "tr" then do special code to insert tbody into compiled code stream. Some care would need be taken in this code to do minimal conversion in compares and assignments, That is, table and tr tags are a small percent of any given input tag stream, and overhead to detect the sequence <table><tr> must be minimal to make this a good optimization. Comments?
Comment 13•21 years ago
|
||
- HTML fixup is module only code - it would take more than a trout to make me wanna have TX_EXE in stylesheet compilation - the very common case of default-html-output-method cannot be caught during compilation I would still like to see what's actually taking the time. On how to fix this in startElement, the parser code that does this is using http://lxr.mozilla.org/seamonkey/source/htmlparser/src/COtherElements.h#819 which adds a tbody if you insert a tr or th directly into a table. We could mockup that, but then we had to have a stack telling us if the current element is inserted. Not too bad, but I'd prefer to see what's really going on inside the childmoving, first.
Assignee | ||
Comment 14•21 years ago
|
||
per Axel's request, a trace of wrapChildren portion of endHTMLElement processing
Comment 15•21 years ago
|
||
aaaaahhhh. We should definitly do sicking-1. This stuff does security checks and modification notifications and all. No wonder it's slow as molasses.
Assignee | ||
Comment 16•21 years ago
|
||
Okay..... Jonas convinced me it is NOT possible to do a fix at compile-time with a testcase that I cannot fugure out how to handle. Next try. At execute-time, SIMPLE code to handle the most common case of simple tables at startElement time (rather than endElement time). The fastest way to do wrapChildren is to NOT do it at all! This code provides that for the simple table cases. The endElement is still needed for cases that this code does not catch. But this code will catch a lot of tables and do them most quickly. This is NOT to say that Jonas's 'solution 1' is not needed. That is a bigger fish. This is a small fish that is good bang for the buck. Not even big enough to do a patch file. Here's the concept and code: 1) In txMozillaXMLOutput.h ...... In txMozillaXMLOutput class private data, add: + PRBool mPreviousStartHTMLWasTable, and in Constructor init this bool to PR_FALSE 2) In txHTMLAtomList.h add a element for the 'tr' tag 3) Then, in txMozillaXMLOutput.cpp, just a bit of added code: void txMozillaXMLOutput::startHTMLElement(nsIDOMElement* aElement) { nsCOMPtr<nsIAtom> atom; nsCOMPtr<nsIContent> content = do_QueryInterface(aElement); content->GetTag(*getter_AddRefs(atom)); + // Is this flag necessary? If actually good XHTML, not needed! + PRBool aXHTML = PR_FALSE; // Is this needed to reject bad XHTML? + + // Do insert of <tbody> case where <tr> immediantly follows <table> + if (mPreviousStartHTMLWasTable == PR_TRUE && + atom == txHTMLAtoms::tr && !aXHTML) { + nsresult rv; + nsCOMPtr<nsIDOMElement> elem; + rv = mDocument->CreateElement(NS_LITERAL_STRING("tbody"), getter_AddRefs(elem)); + NS_ASSERTION(NS_SUCCEEDED(rv), "Can't create tbody element"); + nsCOMPtr<nsIDOMNode> resultNode; + rv = mCurrentNode->AppendChild(elem, getter_AddRefs(resultNode)); + NS_ASSERTION(NS_SUCCEEDED(rv), "Can't append tbody elem node"); + } + + if (atom == txHTMLAtoms::table && !aXHTML) + mPreviousStartHTMLWasTable = PR_TRUE; + else mPreviousStartHTMLWasTable = PR_FALSE; mDontAddCurrent = (atom == txHTMLAtoms::script); ... Please?
That won't work since we need to add the <tr>s inside the inserted <tbody>, not after it. But the idea is definitly right, we should default to inserting the <tbody> instead of defaulting to not insert it. See comment 10.
Comment 18•21 years ago
|
||
Ew, that's going to get messy in EndElement, we'll have to walk up to the parent twice if we inserted the tbody ourselves.
Assignee | ||
Comment 19•21 years ago
|
||
>That won't work since we need to add the <tr>s >inside the inserted <tbody>, not after it. Jonas, not sure that I understand you. What this code does is, if and only if, the html tag sequence is this: <table> <tr> then it changes the sequence to this: <table) <tbody> <tr> such that, for example, if the tbody already appears in the html source, then this will NOT add another. (startElement code flow wise, the new code inserts the tbody if needed, then the code below this does the <tr> insert that was the reason this code was entered. >Ew, that's going to get messy in EndElement, we'll have >to walk up to the parent twice if we inserted the tbody ourselves. There are no changes required to endElement. I have tested this code and it works.
No, you're actually creating a DOM like <table> <tbody></tbody> <tr>...</tr> <tr>...</tr> ... </table> whereas what we want is <table> <tbody> <tr>...</tr> <tr>...</tr> </tbody> </table> To do this you need to set mCurrentNode to the newly inserted <tbody>. And then modify endElement to walk up two nodes when appropriate. An alternative to walking up two steps is to keep a stack of the current-node at all levels, which is what the contentsinks do. This is probably something we should do since that is a requirement to supporting document.write, and it also protects us from modifications of the document during the transformation.
Assignee | ||
Comment 21•21 years ago
|
||
Proposed patch, following suggested implementation. Jonas was already so kind as to do first level review. Additional review requested. Thanks. Sam
Peterv: Does it matter what we do for tables like: <table> <tr>...</tr> <tbody> <tr>...</tr> <tr>...</tr> </tbody> <tr>...</tr> </table> ? The current code doesn't do anything if any child is a tablesection. The HTML-parser creates a table with 3 <tbody>s by wrapping a <tbody> around the first and the last <tr>.
Assignee | ||
Comment 23•21 years ago
|
||
>Peterv: Does it matter what we do for tables like:
Well, "what is right to do" for malformed html is not clear to me. Tbody's are
ONLY optional in a 1-section table. (Basicly, backward compatibility to simple
html 3.2 tables).
Whenever multiple sections are present, (html 4.0, including thead, tfoot, or
multiple tbody's) then tbody is not optional.
There's a more well-formed and more common case that neither current code nor
proposed code handles. The </tbody> tags are optional in html 4.0 even in the
cases where the tbody tag is requeired. With a bit more work, the code here
could handle that case too, I think.
Assignee | ||
Comment 24•21 years ago
|
||
Jonas points out by email that my mPreviousHTMLTableAddedTbody[32] 'fixed size stack' is insufficient and this needs to use txStack instead. I agree (and admit I was being lazy). Assuming we're on the right track here, I'll modify patch to use txStack. Other option is to eliminate the stack altogether. Do this as such: 1) Detect and 'fix up' all instances of "found /TABLE when was expecting /TBODY". 2) Since TBODY we insert always follow TABLE, no risk of adding an /TBODY in the wrong place. 3) This would also fix-up the most simple cases of missing /TBODY in html source. What do y'all think? Would this be better or worse?
Assignee | ||
Comment 25•21 years ago
|
||
this one insures only the specific end-tbody matching our inserted start-tbody tag is ever manipulated. The added compare, using TX_StringEqualsAtom, is rather performance expensive. Is there a cheaper way to compare aName equals literal string "TABLE" here? Sam
Assignee | ||
Updated•21 years ago
|
Attachment #125773 -
Attachment is obsolete: true
Comment 26•21 years ago
|
||
Comment on attachment 125844 [details] [diff] [review] patch varient, insures no other end-tbody is manipulated We shouldn't add more code that uses the name arg to endElement. That's supposed to die. I'm still not sure about the good way to do this. I wonder if we should make a few odd table constructs and check what the html parser does to them. For example, <table> <tfoot><!-- stuff --> </tfoot> <tr/> </table> and stuff like that
Assignee | ||
Comment 27•21 years ago
|
||
Okay, I've created a patch varient that does not rely on a stack. This one adds, same as before, adds <tbody> if <tr> or <th> immediantly follows <table>. In endElement, it handles case where name was </table>, but element was actually </tbody> and the parent element is indeed the expected </table>. By this means no stack is required. Very specific start tag insertion rule and likewise specific end tag insertion rule make it such that no state is required. I'd hoped that this code would also allow/support fix-up of missing 'optional' end-tbody tags in user html shource. Turns out that is a moot point because expat refuses to parse such source. (its probably a bug that expat does this, but not my worry right now). Regarding <thead> and <tfoot>, the proposed patch simply ignores such cases and passes them on to current code to potentially do tbody child creation and element wrappering. The intent and function of proposed patch is to handle the most simple forms of tables in most performance-efficient manner. (based on my opinion that simple html 3.2-flavor tables are by far cost common form of tables). This patch should, ideally at least, make no change to comples table processing. Ideally, it should make not change 'malformed table' processing either. ( However, that's a lot to ask... as in, current code does not handle malformed cases either, so I'm not sure I understand if and why there is a requirement that this patch needs to handle junk input better than current code.)
This one will probably deal with tables containing mixed <tbody>s and <tr>s. It even does what the html-parser does and wrapps groups of <tr>s (turned out that was the easiest way to do it).
Attachment #125844 -
Attachment is obsolete: true
sorry about that guys, this one actually works
Attachment #127831 -
Attachment is obsolete: true
sorry, forgot to run a final cvs-diff before attaching. The only change between this one and attachment 127850 [details] [diff] [review] is that this one doesn't set mTableState when popping from the mTableStateStack in startHTMLElement in the |if (atom == tbody)| block
Attachment #127850 -
Attachment is obsolete: true
Comment 31•21 years ago
|
||
Comment on attachment 127886 [details] [diff] [review] new version that *really* works r=axel@pike.org with a comment in txMozillaXMLOutput.h that ADDED_TBODY only signals that our fixup routine added the tbody, not tbodies added thru the stylesheet. That should make the trap I almost ran into harmless.
Attachment #127886 -
Flags: review+
I just noticed a nice sideeffect from this patch. For tables like: <table> <caption>...</caption> <tr>...</tr> <tr>...</tr> </table> we ended up creating a DOM like <table> <tbody> <caption>...</caption> <tr>...</tr> <tr>...</tr> </tbody> </table> However with my patch only the 'tr's are wrapped inside the tbody. However that made me realize that we should not only move out tbodies from the our created tbody, all non-tr elements should make us step one step up the tree and place that element outside the tbody. New patch comming up for that
This also fixes the problem mentioned in comment 32. Basically I just rearranged the |if|s in txMozillaXMLOutput::startHTMLElement so that we break out of an added <tbody> if we try to insert anything other then <tr>s in it.
Attachment #127886 -
Attachment is obsolete: true
Attachment #128874 -
Flags: superreview?(peter)
Attachment #128874 -
Flags: review?(axel)
Comment 34•21 years ago
|
||
Comment on attachment 128874 [details] [diff] [review] This is gold >Index: source/xslt/txMozillaXMLOutput.cpp >=================================================================== >@@ -234,26 +236,29 @@ void txMozillaXMLOutput::endElement(cons > #ifdef DEBUG >- nsAutoString nodeName; >- mCurrentNode->GetNodeName(nodeName); >- NS_ASSERTION(nodeName.Equals(aName, nsCaseInsensitiveStringComparator()), >- "Unbalanced startElement and endElement calls!"); >+ if (mTableState != ADDED_TBODY) { >+ nsAutoString nodeName; >+ mCurrentNode->GetNodeName(nodeName); >+ NS_ASSERTION(nodeName.Equals(aName, >+ nsCaseInsensitiveStringComparator()), >+ "Unbalanced startElement and endElement calls!"); >+ } Do we want to check the parent's nodename for ADDED_TBODY? >@@ -488,21 +500,60 @@ void txMozillaXMLOutput::closePrevious(P > NS_ASSERTION(NS_SUCCEEDED(rv), "Can't append text node"); > > mText.Truncate(); > } > } > >-void txMozillaXMLOutput::startHTMLElement(nsIDOMElement* aElement) >+void txMozillaXMLOutput::startHTMLElement(nsIDOMElement* aElement, PRBool aXHTML) > { >+ nsresult rv = NS_OK; > nsCOMPtr<nsIAtom> atom; > nsCOMPtr<nsIContent> content = do_QueryInterface(aElement); > content->GetTag(getter_AddRefs(atom)); > > mDontAddCurrent = (atom == txHTMLAtoms::script); > >- if (atom == txHTMLAtoms::head) { >+ if ((atom != txHTMLAtoms::tr || aXHTML) && >+ NS_PTR_TO_INT32(mTableStateStack.peek()) == ADDED_TBODY) { Line this up. >+ nsCOMPtr<nsIDOMNode> parent; >+ mCurrentNode->GetParentNode(getter_AddRefs(parent)); >+ mCurrentNode = parent; mCurrentNode.swap(parent)? >+ mTableStateStack.pop(); >+ } >+ >+ if (atom == txHTMLAtoms::table && !aXHTML) { >+ mTableState = TABLE; >+ } >+ else if (atom == txHTMLAtoms::tr && !aXHTML && >+ NS_PTR_TO_INT32(mTableStateStack.peek()) == TABLE) { >+ nsCOMPtr<nsIDOMElement> elem; >+ if (mDocumentIsHTML) { >+ rv = mDocument->CreateElement(NS_LITERAL_STRING("tbody"), >+ getter_AddRefs(elem)); >+ } >+ else { >+ rv = mDocument->CreateElementNS(NS_LITERAL_STRING(kXHTMLNameSpaceURI), >+ NS_LITERAL_STRING("tbody"), >+ getter_AddRefs(elem)); >+ } >+ if (NS_FAILED(rv)) { >+ return; >+ } >+ nsCOMPtr<nsIDOMNode> dummy; >+ rv = mCurrentNode->AppendChild(elem, getter_AddRefs(dummy)); >+ if (NS_FAILED(rv)) { >+ return; >+ } >+ nsresult rv = mTableStateStack.push(NS_INT32_TO_PTR(ADDED_TBODY)); >+ if (NS_FAILED(rv)) { >+ return; >+ } >+ mCurrentNode = elem; mCurrentNode.swap(elem)? >+void txMozillaXMLOutput::endHTMLElement(nsIDOMElement* aElement) >+ if (mTableState == ADDED_TBODY) { >+ NS_ASSERTION(atom == txHTMLAtoms::tbody, "Added tbody in nontable"); I don't really understand this assertion's message. >+ nsCOMPtr<nsIDOMNode> parent; >+ mCurrentNode->GetParentNode(getter_AddRefs(parent)); >+ mCurrentNode = parent; mCurrentNode.swap(parent)? >Index: source/xslt/txMozillaXMLOutput.h >=================================================================== >+ ADDED_TBODY // the current node is an inserted tbody not comming from >+ // the stylesheet coming
Attachment #128874 -
Flags: superreview?(peterv) → superreview+
Comment 35•21 years ago
|
||
Comment on attachment 128874 [details] [diff] [review] This is gold r=me. > Do we want to check the parent's nodename for ADDED_TBODY? no. that stuff is supposed to die. IMHO. NS_ASSERTION(atom == txHTMLAtoms::tbody, "This should have been the added tbody"); ? And I like swap().
Attachment #128874 -
Flags: review?(axel) → review+
fix checked in
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•