Last Comment Bug 18333 - (incrementalxml) XML Content Sink should be incremental
(incrementalxml)
: XML Content Sink should be incremental
Status: RESOLVED FIXED
[Hixie-P2] Long XML documents take a ...
: perf, testcase, xhtml
Product: Core
Classification: Components
Component: XML (show other bugs)
: Trunk
: All All
: P3 major with 28 votes (vote)
: mozilla1.9alpha2
Assigned To: Henri Sivonen (:hsivonen)
: Ashish Bhatt
Mentors:
http://www.w3.org/TR/1999/PR-xslt-199...
: 162404 200532 229773 410584 (view as bug list)
Depends on: 368909 387221 442094 445688 448585 36145 36146 344281 368913 368917 369011 369167 369402 369950 370210 370230 372323 373536 374037 374345 374420 375173 380896 383951 401613 401946 419716 419756 425498 437106 449219 459122
Blocks: 359487 363688 84582 155752 232004 276037 309721 312858 343870 355213 361892 369015 369935
  Show dependency treegraph
 
Reported: 1999-11-09 10:43 PST by Dan Rosen
Modified: 2014-04-26 03:06 PDT (History)
69 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Make the XML content sink incremental (104.63 KB, patch)
2006-07-07 05:33 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v2 (96.35 KB, patch)
2006-07-24 16:12 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
230512: Make the XML content sink incremental, v3 (96.36 KB, patch)
2006-07-25 12:04 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v4 (95.79 KB, patch)
2006-07-31 04:19 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v5 (98.54 KB, patch)
2006-08-01 13:21 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v6 (101.07 KB, patch)
2006-08-21 03:36 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v7 (109.19 KB, patch)
2006-08-22 05:18 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v8 (108.95 KB, patch)
2006-10-17 03:43 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v9 (114.97 KB, patch)
2006-11-29 05:32 PST, Peter Van der Beken [:peterv]
no flags Details | Diff | Splinter Review
Make the XML content sink incremental, v10 (111.51 KB, patch)
2006-12-13 07:32 PST, Henri Sivonen (:hsivonen)
peterv: review+
Details | Diff | Splinter Review
Make the XML content sink incremental, v10.0.1 (144.89 KB, patch)
2007-01-18 17:56 PST, Jonas Sicking (:sicking) No longer reading bugmail consistently
jonas: superreview+
Details | Diff | Splinter Review
Make the XML content sink incremental, v11 (145.48 KB, patch)
2007-01-30 13:18 PST, Jonas Sicking (:sicking) No longer reading bugmail consistently
no flags Details | Diff | Splinter Review

Description Dan Rosen 1999-11-09 10:43:30 PST
Large XML documents, such as the W3C's XSLT spec, take an incredibly long time
to load into view source. The browser freezes/blocks (is "not responding"
according to Windows) while it processes, and finally unlocks after the entire
source of the document is ready for display.

This may be a problem on other platforms as well. I noticed the problem on
build 1999110809 for Win32.
Comment 1 Nisheeth Ranjan 1999-11-09 19:20:59 PST
Hopefully, this will get fixed with the reflow coalescing work I am planning to
do.  I'm setting the milestone for this bug to M12 for now...
Comment 2 Nisheeth Ranjan 1999-12-01 15:45:59 PST
Move non-PDT+ bugs to M13...
Comment 3 Dan Rosen 1999-12-08 09:48:59 PST
A little more data for you, Nisheeth: in Solaris, the browser does not freeze
(as it does or did in Windows), and operations still may be done on that window
and others; but the source does not display until *much* later, even on small
documents. I'm upping the severity a little, for good measure...
Comment 4 Nisheeth Ranjan 1999-12-22 16:21:59 PST
Moving non block/inline bugs to M14...
Comment 5 Nisheeth Ranjan 2000-01-25 23:07:44 PST
Not a beta blocker.  Moving out to M15...
Comment 6 Nisheeth Ranjan 2000-04-04 14:36:12 PDT
Moving bugs out by one milestone...
Comment 7 rickg 2000-04-07 15:08:24 PDT
non-essential for m16
Comment 8 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2000-04-07 19:28:44 PDT
I've done some jprof profiling on view source, and this doesn't seem specific to
XML documents.  It does seem strongly related (N? N^2? log N?) to the length of
the document.  I see the same problem on REC-CSS1 (which is a little shorter
than PR-xslt).   When my computer is on, you can see the two jprof profiling
runs at:
http://dbaron.student.harvard.edu/u/david/mozperf/view-source-CSS1.html
http://dbaron.student.harvard.edu/u/david/mozperf/view-source-PR-xslt.html
They were done using JPROF_FLAGS="JP_START JP_DEFER JP_PERIOD=0.010", and I
started and stopped using SIGPROF and SIGUSR1.

It looks like there are a bunch of things that should probably be a lot faster. 
The biggest bottlenecks are in text measurement, which takes over a quarter of
the time.  In particular, nsLineLayout::FindNextText is either very slow or
called way too much (I think both).

It seems the only reason text measurement needs to be done on view-source is
because of tabs.  I wonder if it could be optimized any better for these cases?

There are lots of other things that look slow and possible to fix in this
profile.  I might try to fix some of the easy (small, too, unfortunately) ones.
Comment 9 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2000-04-07 20:06:44 PDT
A few things of interest, in the order I find them (I'm looking through the
second profile above, so that's where the numbers are coming from):
 * 9% of the time is spent in AtomImpl::AddRef and AtomImpl::Release.  Do these
really need to be threadsafe?
 * Inline reflow, text measurement, etc., seem to take unusually long (although
perhaps this is normal).  Is this a problem with the way view-source is
structuring the data or a problem with those algorithms?  Much of the time is
spent dealing with text runs and word breaking.
 * nsXMLContentSink::GetElementFactor was using about 3% of the time.  Much of
that time is in nsCOMPtr code.  It might be better to call GetService directly
rather than use NS_WITH_SERVICE.  I'm not sure...
 * A considerable amount of time is also spent in style resolution.  Could this
be optimized?  In particular, some of the generated content /
pseudo-style-context code looks like it's taking lots of time.  What about the
document (or the view-source stylesheet) causes this?  Could the view-source be
done some way other than generated content?  Can the generated content code be
made faster?
 * The XML content sink (everything within VCiewSourceHTML::WriteTag) is taking
up 10.8% of the time.  Much of this seems to be due to the nsCOMPtr problem
mentioned above, though.  However, it seems to me it would be much faster to do
the document creation through the DOM rather than through the content sink
(which spends lots of time dealing with tag names as strings, etc.).  Is this a
possibility?

Also, a few things to note about the profiles:  the interval between hits is
10ms, so any statistically significant number (which depends on how accurate you
want to be) of hits means about that number of hits times 10ms was spent in that
function.  Also, there's a little bit of the profile devoted to my moving the
mouse to the menu and picking view source from the menu.  It's 370ms (37 hits)
of the profile (why so much?).  But ignore it...
Comment 10 Nisheeth Ranjan 2000-04-08 10:55:54 PDT
Adding, Troy, Steve and Rick Gessner to the cc list.

Updating the summary to reflect the real problem.  The problem is not limited to 
view-source.  The XML content sink is non-incremental, so, all XML documents 
start laying out only after the entire content model has been constructed.  
That is why all long XML documents take a long time to display.

David Baron has done a jprof run on viewsource of large documents.  Please take 
a look at his findings on this bug report and add your comments.  Thanks (to you 
and David Baron for doing such a great analysis).
Comment 11 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2000-04-08 11:12:15 PDT
I think these findings may be somewhat specific to view-source, since huge XML
documents with very shallow tree structures where almost everything is a child
or grandchild of the root element, and the root element is the only block-level
element are probably quite rare.  There's also lots of generated content, but
that's probably not so unusual.  And, the layout issues probably aren't specific
to XML, but probably are specific to the tree structure (and the size of the
block).

I wonder if splitting preformatted elements up by-line before processing them
(or something like that) would improve the layout problems?  It seems (although
I'm not sure) that some of the layout algorithms showing up here are O(N^2) on
the size of the inline content of a block (and run on each block), and the
entire viewsource document is one block.  But perhaps they should be changed
instead (if possible).
Comment 12 troy 2000-04-08 19:21:15 PDT
David, on what platform were you doing the profiling? Windows is better now and 
I have more changes in my tree that improve text rendering speed even more. If 
it's preformatted text, though, then my changes will help some but that code 
path has known performance problems

Unix and Mac haven't implemented the new rendering context functions to measure 
text in chunks so they won't see much performance benefit yet
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2000-04-08 19:28:01 PDT
I was profiling on Linux.  Still, the text measurement stuff isn't the only
problem shown in the profile...
Comment 14 Nisheeth Ranjan 2000-04-13 14:15:15 PDT
Move back into M16...
Comment 15 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2000-04-18 06:25:37 PDT
Adding dependencies to text measurement bugs for Mac and Unix.  There are other
specific bugs that need to be filed too.
Comment 16 Nisheeth Ranjan 2000-05-03 17:11:39 PDT
Moving XML/HTML content sink factoring related bugs out to M17...
Comment 17 Nisheeth Ranjan 2000-06-23 15:42:34 PDT
We need to decide whether the work to make the XML content sink more incremental 
is going to happen in the beta 3 timeframe.  For now, putting on the beta 3 
radar...
Comment 18 Peter Trudelle 2000-08-08 15:49:29 PDT
would be nice if it were faster, but it is better than before, and is no longer 
a critical problem. nsbeta3-, ->future
Comment 19 Hixie (not reading bugmail) 2001-04-26 02:10:06 PDT
Ok, this is a real blocker for transitioning to XML in the future. Compare our
performance on these two tests:

   http://www.damowmow.com/mozilla/bugs/18333/xml  a test served as text/xml
   http://www.damowmow.com/mozilla/bugs/18333/html same, served as text/html

Nominating for various things.
Comment 20 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-10-07 10:52:08 PDT
*** Bug 162404 has been marked as a duplicate of this bug. ***
Comment 21 Greg K. 2002-10-07 11:44:41 PDT
Setting Platform=All per comment 20.
Comment 22 Heikki Toivonen (remove -bugzilla when emailing directly) 2003-04-03 12:41:01 PST
*** Bug 200532 has been marked as a duplicate of this bug. ***
Comment 23 David A. Madore 2003-09-03 06:31:23 PDT
OK, four years after it's been filed (note that it still has "mozilla1.0" in the
keywords...), could someone give us an update on this bug?  Is someone still
working on it or should it be WONTFIX?

What I'd mostly like to know is, how difficult fixing this would be?  Can anyone
fill me up on this?  Is there some basic limitation that would be immensely hard
to work around (such as eXpat - which I believe Mozilla uses for XML - being
inadequate for incremental parsing)?  Or would it be a "lengthy but
straightforward" correction?

I'm afraid the absence of incremental rendering of XHTML by Mozilla is going to
be either hugely detrimental to the adoption of XHTML, or hugely detrimental to
Mozilla.  I'm told that Opera does incremental XHTML rendering.
Comment 24 Heikki Toivonen (remove -bugzilla when emailing directly) 2003-09-03 09:58:52 PDT
Nobody is working on this afaik, but this should not be wontfixed. We need this.

Expat shouldn't be a problem. If nothing else, it can now be stopped and resumed
at will.
Comment 25 Boris Zbarsky [:bz] 2003-10-08 00:22:35 PDT
See bug 220930 for some comments on what needs doing here (basically,
StartLayout() needs to happen earlier and ContentAppended notifications need to
be delivered, as well as interrupting the sink every so often to give layout a
chance to reflow and paint).
Comment 26 Christian :Biesinger (don't email me, ping me on IRC) 2003-12-31 08:44:27 PST
*** Bug 229773 has been marked as a duplicate of this bug. ***
Comment 27 Henri Sivonen (:hsivonen) 2006-03-07 12:50:41 PST
I am scheduled to work on this in June. Assigning to self since no one else seems to be working on it.
Comment 28 Henri Sivonen (:hsivonen) 2006-06-10 12:50:49 PDT
Some plans:
http://hsivonen.iki.fi/kesakoodi/content-sink/

Feedback appreciated.
Comment 29 Henri Sivonen (:hsivonen) 2006-07-06 06:42:39 PDT
Test builds for OS X and Ubuntu are available:
http://hsivonen.iki.fi/kesakoodi/builds/
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-07-06 07:41:43 PDT
Test builds using what code?  Any chance you could attach the patch somewhere?  Or is it a branch in our CVS repository?
Comment 31 Mikko Rantalainen 2006-07-07 04:07:36 PDT
The builds and the patch seem to be at http://hsivonen.iki.fi/test/bug18333/builds/. Perhaps the patch should be added as attachment to this bug?

I tested the i686 build on Linux Mandriva 2005 and couldn't find any major issues during normal browsing. At least simple XHTML files such as the one in comment 19 seem to render incrementally. The build seems to be at least as stable as any random trunk build from mozilla.org. (I don't like the text zoom overriding the 100% text zoom. It should be an additional toggled feature. Now after pressing CTRL+0 a couple of times I have no idea what is the real zoom percentage.)

I think that stuff that goes through XSLT doesn't (yet?) render incrementally. An example of this would be http://www.w3.org/TR/1999/PR-xslt-19991008.xml

An example of XML+XSTL to HTML not rendering incrementally can be found here (an old technology demo in Finnish, ignore the content): http://www.cc.jyu.fi/~mira/demos/xml/bookstore/bookstore.php
This example outputs the whole file and sleeps 5 seconds before closing the connection (browser has no way to know if the script is going to output something else, it could render what it has so far, though). All the linked resources (CSS and XSLT files) are provided immediately to the browser.
Comment 32 Henri Sivonen (:hsivonen) 2006-07-07 05:33:12 PDT
Created attachment 228417 [details] [diff] [review]
Make the XML content sink incremental

The attached patch contains my changes relevant to this bug as of now.

The full diff for yesterday's test builds is at
http://hsivonen.iki.fi/test/bug18333/builds/firefox-2006-07-06-hsivonen.diff

The attached patch contains fixes to interrupting Expat that weren't in yesterday's builds.

XSLT is not supposed to be incremental. This means that XML pretty-printing isn't incremental, either. Making XSLT incremental is *way* beyond the scope of this bug.
Comment 33 Henri Sivonen (:hsivonen) 2006-07-07 14:28:42 PDT
Made a new batch of test builds with the changes from the patch I attached earlier.

http://hsivonen.iki.fi/test/bug18333/builds/
Comment 34 Henri Sivonen (:hsivonen) 2006-07-24 16:12:51 PDT
Created attachment 230512 [details] [diff] [review]
Make the XML content sink incremental, v2

Cleaned up the patch. Fixed crashes with innerHTML.

Remaining issues:
Exposes bug 344281. Does this bug need to wait for that one?
Scripts aren't stopped if a fatal XML error occurs.

Requesting r= anyway to discover possible other problems while researching the above issues.
Comment 35 Jonathan Stanley 2006-07-25 11:40:47 PDT
I've tried the v2 patch on trunk CVS as of about 20060725t1700 (UTC), though whilst it patches cleanly, the build fails with:

Building deps for /cygdrive/c/Firefox/mozilla/content/html/document/src/nsHTMLCo
ntentSink.cpp
/cygdrive/c/Firefox/mozilla/build/cygwin-wrapper cl -FonsHTMLContentSink.obj -c
 -DMOZILLA_INTERNAL_API -D_IMPL_NS_GFX -D_IMPL_NS_MSG_BASE -D_IMPL_NS_WIDGET  -D
OSTYPE=\"WINNT5.1\" -DOSARCH=\"WINNT\" -DBUILD_ID=0000000000 -D_IMPL_NS_LAYOUT
-I../../../../dist/include/xpcom -I../../../../dist/include/string -I../../../..
/dist/include/gfx -I../../../../dist/include/layout -I../../../../dist/include/w
idget -I../../../../dist/include/dom -I../../../../dist/include/necko -I../../..
/../dist/include/htmlparser -I../../../../dist/include/locale -I../../../../dist
/include/view -I../../../../dist/include/js -I../../../../dist/include/intl -I..
/../../../dist/include/webshell -I../../../../dist/include/docshell -I../../../.
./dist/include/caps -I../../../../dist/include/util -I../../../../dist/include/u
conv -I../../../../dist/include/pref -I../../../../dist/include/uriloader -I../.
./../../dist/include/rdf -I../../../../dist/include/chardet -I../../../../dist/i
nclude/nkcache -I../../../../dist/include/lwbrk -I../../../../dist/include/imgli
b2 -I../../../../dist/include/xpconnect -I../../../../dist/include/unicharutil -
I../../../../dist/include/commandhandler -I../../../../dist/include/composer -I.
./../../../dist/include/editor -I../../../../dist/include/txtsvc -I../../../../d
ist/include/plugin -I../../../../dist/include   -I../../../../dist/include/conte
nt -I../../../../dist/include/nspr    -I../../../../dist/sdk/include -I/cygdrive
/c/Firefox/mozilla/content/html/document/src/../../../base/src -I/cygdrive/c/Fir
efox/mozilla/content/html/document/src/../../../events/src -I/cygdrive/c/Firefox
/mozilla/content/html/document/src/../../content/src -I/cygdrive/c/Firefox/mozil
la/content/html/document/src/../../../../layout/style -I/cygdrive/c/Firefox/mozi
lla/content/html/document/src/../../../../dom/src/base        -GR- -TP -nologo -
Zc:wchar_t- -W3 -Gy -FdnsHTMLContentSink.pdb  -DNDEBUG -DTRIMMED -O1 -MD
    -D_CRT_SECURE_NO_DEPRECATE=1 -D_CRT_NONSTDC_NO_DEPRECATE=1 -DWINVER=0x500 -D
_WIN32_WINNT=0x500 -DX_DISPLAY_MISSING=1 -DMOZILLA_VERSION=\"1.9a1\" -DMOZILLA_V
ERSION_U=1.9a1 -DHAVE_SNPRINTF=1 -D_WINDOWS=1 -D_WIN32=1 -DWIN32=1 -DXP_WIN=1 -D
XP_WIN32=1 -DHW_THREADS=1 -DSTDC_HEADERS=1 -DWIN32_LEAN_AND_MEAN=1 -DNO_X11=1 -D
_X86_=1 -DD_INO=d_ino -DMOZ_EMBEDDING_LEVEL_DEFAULT=1 -DMOZ_EMBEDDING_LEVEL_BASI
C=1 -DMOZ_EMBEDDING_LEVEL_MINIMAL=1 -DMOZ_PHOENIX=1 -DMOZ_BUILD_APP=browser -DMO
Z_XUL_APP=1 -DMOZ_DEFAULT_TOOLKIT=\"cairo-windows\" -DMOZ_THEBES=1 -DMOZ_CAIRO_G
FX=1 -DMOZ_DISTRIBUTION_ID=\"org.mozilla\" -DOJI=1 -DIBMBIDI=1 -DMOZ_VIEW_SOURCE
=1 -DMOZ_XPINSTALL=1 -DMOZ_JSLOADER=1 -DNS_PRINTING=1 -DNS_PRINT_PREVIEW=1 -DMOZ
_NO_XPCOM_OBSOLETE=1 -DMOZ_XTF=1 -DMOZ_ENABLE_CANVAS=1 -DMOZ_SVG=1 -DMOZ_SVG_FOR
EIGNOBJECT=1 -DMOZ_UPDATE_CHANNEL=default -DMOZ_PLACES=1 -DMOZ_FEEDS=1 -DMOZ_STO
RAGE=1 -DMOZ_SAFE_BROWSING=1 -DMOZ_URL_CLASSIFIER=1 -DMOZ_LOGGING=1 -DMOZ_USER_D
IR=\"Mozilla\" -DHAVE_UINT64_T=1 -DMOZ_XUL=1 -DMOZ_PROFILELOCKING=1 -DMOZ_MORKRE
ADER=1 -DMOZ_DLL_SUFFIX=\".dll\" -DJS_THREADSAFE=1 -DMOZILLA_LOCALE_VERSION=\"1.
9a1\" -DMOZILLA_REGION_VERSION=\"1.9a1\" -DMOZILLA_SKIN_VERSION=\"1.8\"  -D_MOZI
LLA_CONFIG_H_ -DMOZILLA_CLIENT /cygdrive/c/Firefox/mozilla/content/html/document
/src/nsHTMLContentSink.cpp
nsHTMLContentSink.cpp
c:\firefox\mozilla\content\html\document\src\nshtmlcontentsink.cpp(2928) : error
 C4716: 'HTMLContentSink::WillProcessTokens' : must return a value
make[7]: *** [nsHTMLContentSink.obj] Error 2
make[7]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static/content/h
tml/document/src'
make[6]: *** [libs] Error 2
make[6]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static/content/h
tml/document'
make[5]: *** [libs] Error 2
make[5]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static/content/h
tml'
make[4]: *** [libs] Error 2
make[4]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static/content'
make[3]: *** [libs_tier_9] Error 2
make[3]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static'
make[2]: *** [tier_9] Error 2
make[2]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static'
make[1]: *** [default] Error 2
make[1]: Leaving directory `/cygdrive/c/Firefox/mozilla/firefox-static'
make: *** [build] Error 2
Comment 36 Henri Sivonen (:hsivonen) 2006-07-25 12:04:32 PDT
Created attachment 230613 [details] [diff] [review]
230512: Make the XML content sink incremental, v3

Good catch. I wonder why that didn't cause a compile error on OS X and Linux. Fixed in patch v3.
Comment 37 Jonathan Stanley 2006-07-25 22:48:11 PDT
(In reply to comment #36)
> Created an attachment (id=230613) [edit]
> 230512: Make the XML content sink incremental, v3
> 
> Good catch. I wonder why that didn't cause a compile error on OS X and Linux.
> Fixed in patch v3.
> 

The v3 patched cleanly and compiled successfully (I am using VS C++ 2005 Express) with CVS as of 2006-07-25 12:00 PDT. Shall dog-food it for a while though initial impressions are it does fixes bug 309721 for me, which depended on this.
Comment 38 Boris Zbarsky [:bz] 2006-07-26 07:38:02 PDT
There needs to be an impl of FlushPendingNotifications in the XML sink, like there is in the HTML one, no?

Also, why did nsHTMLDocument end up with extra methods?  Esp. when it doesn't actually usefully override them?
Comment 39 Henri Sivonen (:hsivonen) 2006-07-31 04:19:15 PDT
Created attachment 231387 [details] [diff] [review]
Make the XML content sink incremental, v4

> There needs to be an impl of FlushPendingNotifications in the XML sink, like
> there is in the HTML one, no?

Thanks. Added.

> Also, why did nsHTMLDocument end up with extra methods?  Esp. when it doesn't
> actually usefully override them?

Those were debugging hooks that I forgot to take out. Taken out.
Comment 40 Boris Zbarsky [:bz] 2006-07-31 07:44:51 PDT
Henri, I'm not going to be able to do a full review of this in a reasonable timeframe...  You really do want someone more familiar with the XML sink for starters (like peterv or sicking).  I'd love to do the sr (check over how the changes interact with the rest of Gecko, etc, without checking every single line of code).

On which note, as I pointed out in the newsgroup at some point, you do need the DummyParserRequest to handle correctly stopping parsing when STOP_NETWORK happens.  I don't see that happening in this patch.

Also, adding the FlushPendingNotifications impl itself is not enough.  It needs to be called too; the relevant code in nsHTMLDocument::FlushPendingNotifications needs to be pushed up to nsDocument, probably.  I suggest looking up the blame for that code and trying out those testcases.  ;)
Comment 41 Henri Sivonen (:hsivonen) 2006-08-01 01:52:27 PDT
(In reply to comment #40)
> On which note, as I pointed out in the newsgroup at some point, you do need the
> DummyParserRequest to handle correctly stopping parsing when STOP_NETWORK
> happens.  I don't see that happening in this patch.

Ouch. I specifically undid my DummyParserRequest refactoring since it seemed to do more harm than good. How do I test for the STOP_NETWORK case? (What does it even mean?)

> Also, adding the FlushPendingNotifications impl itself is not enough.  It needs
> to be called too; the relevant code in
> nsHTMLDocument::FlushPendingNotifications needs to be pushed up to nsDocument,
> probably. 

Actually, I did check that it gets called. nsHTMLDocument is used on the XML side. But I guess there are cases (possibly XBL) that I overlooked. I'll look into it.
Comment 42 Jesse Ruderman 2006-08-01 03:56:31 PDT
>Remaining issues:
>...
>Scripts aren't stopped if a fatal XML error occurs.

I think that issue also exists without this patch.
Comment 43 Boris Zbarsky [:bz] 2006-08-01 07:35:30 PDT
> How do I test for the STOP_NETWORK case?

Find a caller of nsDocShell::Stop() with STOP_NETWORK passed in and see whether it actually stops the parser?

Typical testcase would look something like this, probably:

1)  An XML file being sent by a server which has a <script> in it.
2)  The <script> loads slowly (sleep() calls in the CGI serving up the <script> 
    or something) so that by the time the <script> data is in all the data for
    the main document has been received.
3)  The data for the main document is long enough that we'll interrupt the
    parser and return to the main event loop multiple times (enough to give a
    chance for a user to interact with the page meaningfully).  Probably a few
    tens of kilobytes of something pretty simple (no tables) would do.
4)  The user clicks a link on the page pointing to another http:// URI.
5)  This new URI load is slow (sleep() calls in the CGI again; give this at
    least 15-20 seconds).

The test is whether anything gets parsed and added to the visible document after the user clicks the link (but before the new URI starts to load). Current HTML behavior should be "no", if I understand it correctly.

> Actually, I did check that it gets called. nsHTMLDocument is used on the XML
> side.

Only for the XHTML case.  Try loading the same exact file as text/xml (and adjust the scripts testing things as needed to not assume the HTML DOM) and see what things look like.
Comment 44 Henri Sivonen (:hsivonen) 2006-08-01 13:21:54 PDT
Created attachment 231639 [details] [diff] [review]
Make the XML content sink incremental, v5

Hoisted FlushPendingNotifications.
Comment 45 Jesse Ruderman 2006-08-04 04:25:07 PDT
This will probably cause ugliness for MathML during incremental rendering.  Gecko's MathML code currently shows the text "invalid-markup" in the content area if certain elements (e.g. <mroot>) have too few or too many children.
Comment 46 Boris Zbarsky [:bz] 2006-08-04 08:13:45 PDT
The HTML sink handles this by having "monolithic containers".  May be worth doing that here...  Alternately, maybe we could change mathml to only show that after onload or something... or not at all.  rbs, what do you think?
Comment 47 Henri Sivonen (:hsivonen) 2006-08-07 13:56:56 PDT
I've already tried making the math element monolithic. It does not fix the problem.
Comment 48 Boris Zbarsky [:bz] 2006-08-07 17:24:05 PDT
OK.  Seriously, talk to rbs (over email if need be) about the mathml issues...
Comment 49 rbs 2006-08-08 19:00:53 PDT
Per your comment on in bug 344281 I went to
    http://hsivonen.iki.fi/test/bug18333/builds/
and grabbed
    firefox-2006-07-07-hsivonen.diff

I applied the diff in my tree (today's tip of the trunk) to see for myself what was going on. There were a couple of problems that I had to resolve along the way:
- nsITextContent is no more on the trunk.
- missing NS_IMETHOD WillTokenize() somewhere.

After fixing the compile errors, I cannot get the build to start. Both with a firefox build and a suite build, I am getting these never-ending assertions:

###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
chrome://global/content/bindings/dialog.xml ##### mBeginLoadTime set to current
time
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
chrome://global/content/bindings/dialog.xml ##### WillProcessTokensImpl: mDelayT
imerStart set
chrome://global/content/bindings/dialog.xml ##### no presshell
chrome://global/content/bindings/dialog.xml ##### no presshell
chrome://global/content/bindings/dialog.xml ##### no presshell
chrome://global/content/bindings/dialog.xml ##### no presshell
chrome://global/content/bindings/dialog.xml ##### no presshell
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file d
:\mozilla\tip\objs\suite\dist\include\xpcom\nsCOMPtr.h, line 594
Comment 50 rbs 2006-08-17 18:03:35 PDT
Some encouraging comments:

I was able to get pass the assertions by setting the following env variables which stops the zillion dialogs from poping up (and thus requesting zillion clicks):

XPCOM_DEBUG_BREAK=warn

With that, I can browse, but because of the numerous printfs introduced, it is way too slow to use. I could only quickly confirm that the reported MathML mtable issues are not there anymore now that bug 344281 has been fixed. (Also, the diff in nsFont.cpp is not needed anymore.)

IMO, you should resubmit an updated patch that addresses the assertions for the benefit of the reviewers.
Comment 51 Henri Sivonen (:hsivonen) 2006-08-21 03:36:38 PDT
Created attachment 234758 [details] [diff] [review]
Make the XML content sink incremental, v6

Fixed the use of XPCOMification macros thereby silencing the assertions. I'll look into the STOP_NETWORK case next.
Comment 52 Henri Sivonen (:hsivonen) 2006-08-22 02:39:31 PDT
Currently the old DOM is appended to until after the user has clicked a link but the new page has only returned HTTP headers but not the resource itself.
Comment 53 Henri Sivonen (:hsivonen) 2006-08-22 05:18:20 PDT
Created attachment 234924 [details] [diff] [review]
Make the XML content sink incremental, v7

Since I haven't heard back on getting rid of DummyParserRequest altogether, this patch uses the DummyParserRequest on the XML side and fixes the STOP_NETWORK case form comment 43.

Stopping scripts doesn't happen without this patch, either, so that should probably be spun into a separate bug.

I believe this patch now addresses all the issues raised so far that are in the scope of this bug.
Comment 54 Peter Van der Beken [:peterv] 2006-10-12 02:05:17 PDT
Comment on attachment 234924 [details] [diff] [review]
Make the XML content sink incremental, v7

I wonder if it would make sense to have a class between nsContentSink and ns[XML|HTML]ContentSink that implements DidBuildModel, WillBuildModel, etc

> Index: mozilla/parser/htmlparser/public/nsIContentSink.h
> ===================================================================

> @@ -64,6 +65,17 @@
>    NS_DECLARE_STATIC_IID_ACCESSOR(NS_ICONTENT_SINK_IID)
>  
>    /**
> +   * This method gets called before the nsParser calls tokenize. 
> +   * This is needed because the XML side actually builds 
> +   * the content model as part of the tokenization and 
> +   * not on BuildModel(). The XML side can use this call 
> +   * to do stuff that the HTML side does in WillBuildModel().

You mean BuildModel in that last sentence, right?
Couldn't we hoist WillProcessTokens to nsIContentSink and rename it instead?

> +   *
> +   * @update 2006-06-28 hsivonen
> +   */     
> +  NS_IMETHOD WillTokenize(void)=0;

> Index: mozilla/content/base/src/nsContentSink.cpp
> ===================================================================

> +nsresult
> +DummyParserRequest::Create(nsIRequest** aResult, nsContentSink* aSink)
> +{
> +  *aResult = new DummyParserRequest(aSink);
> +  if (!*aResult) {
> +    return NS_ERROR_OUT_OF_MEMORY;
> +  }
> +
> +  NS_ADDREF(*aResult);
> +
> +  return NS_OK;
> +}

There's no need for this function, just do |new DummyParserRequest(sink)| wherever you call this function.

> +
> +
> +DummyParserRequest::DummyParserRequest(nsContentSink* aSink)
> +{
> +  mSink = aSink;

Initializer list.

> +DummyParserRequest::Cancel(nsresult status)
> +{
> +  // Cancel parser
> +  nsresult rv = NS_OK;
> +  if ((mSink) && (mSink->mParser)) {

Drop the braces around mSink.

> +    mSink->mParser->CancelParsingEvents();
> +  }
> +  return rv;

No need for rv, just return NS_OK

> +NS_DEFINE_CID(kAppShellCID, NS_APPSHELL_CID);

Drop this, just use NS_APPSHELL_CID directly.

> +  
> +    mNotifyOnTimer =

Whitespace

> +  if (nsContentUtils::GetBoolPref("content.interrupt.parsing", PR_TRUE)) {
> +    mCanInterruptParser = PR_TRUE;

  mCanInterruptParser =
    nsContentUtils::GetBoolPref("content.interrupt.parsing", PR_TRUE)

> +PRBool
> +nsContentSink::IsTimeToNotify()
> +{
> +  if (!mNotifyOnTimer || !mLayoutStarted || !mBackoffCount ||
> +      mInMonolithicContainer) {
> +    return PR_FALSE;
> +  }
> +
> +  PRTime now = PR_Now();
> +  PRInt64 interval, diff;

You could use nsInt64 like in WillInterruptImpl.

> +
> +  LL_I2L(interval, GetNotificationInterval());
> +  LL_SUB(diff, now, mLastNotificationTime);
> +
> +  if (LL_CMP(diff, >, interval)) {

> +nsContentSink::DidProcessATokenImpl()

> +  if (mCanInterruptParser) {

return early if !mCanInterruptParser

> +    if ((!mDynamicLowerValue) &&
> +      (mLastSampledUserEventTime == eventTime)) {

No need for those braces.

> +      } else {
> +        if (mDynamicLowerValue) {

    }
    else if {

> +nsContentSink::WillBuildModelImpl() {

Move the brace down.

> +  if (mCanInterruptParser) {

> +  }

> +  mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());

Could put this inside the mCanInterruptParser block?

> Index: mozilla/content/base/src/nsContentSink.h
> ===================================================================

> +static PRLogModuleInfo* gContentSinkLogModuleInfo;

I think you should mark it extern here and move this to nsContentSink.cpp.

> +#define SINK_TRACE(_bit, _args)                       \
> +  PR_BEGIN_MACRO                                      \
> +    if (SINK_LOG_TEST(gContentSinkLogModuleInfo, _bit)) {    \
> +      PR_LogPrint _args;                              \
> +    }                                                 \

Please line up the \

Shouldn't you remove the SINK_TRACE and SINK_TRACE_NODE defines from nsHTMLContentSink?

> +#undef SINK_NO_INCREMENTAL

Why undef it? We don't want to allow people to define it?

> +class DummyParserRequest;

No need for this.

>  class nsContentSink : public nsICSSLoaderObserver,

> +  void NotifyAppend(nsIContent* aContent,
> +                    PRUint32 aStartIndex);

No need to put this on two lines.

> +  // these are de facto abstract, but making them non-abstract to 
> +  // make it look like this class could be instantiated on its own.

Why?

> +  virtual nsresult FlushTags(PRBool aNotify) { return NS_OK; }
> +
> +  virtual void TryToScrollToRef() { return; }

> Index: mozilla/content/html/document/src/nsHTMLContentSink.cpp
> ===================================================================

> +#ifdef NS_DEBUG
> +  PRInt32 mFlags;
> +#endif

Do we still need this?

> @@ -1124,7 +901,7 @@
>  
>      case eHTMLTag_frameset:
>        if (!mSink->mFrameset &&
> -          mSink->mFlags & NS_SINK_FLAG_FRAMES_ENABLED) {
> +          mSink->mFramesEnabled) {

Rewrap.

> +// Forked into nsXMLContentSink. Please keep in sync.

> +// forked to nsXMLContentSink. Please keep in sync.

It'd be good if we could avoid this. At least file a follow up bug to unfork.

> @@ -2754,7 +2321,7 @@

> +  if (done && (mFramesEnabled)) {

Drop the inner braces.

> +HTMLContentSink::FlushTags(PRBool aNotify) {
> +  if (mCurrentContext) {
> +    return mCurrentContext->FlushTags(aNotify);
> +  }	else {
> +    return NS_OK;	
> +  }

  return mCurrentContext ? mCurrentContext->FlushTags(aNotify) : NS_OK;

> Index: mozilla/content/xml/document/src/nsXMLContentSink.cpp
> ===================================================================

> @@ -154,7 +157,13 @@
>  }
>  
>  nsXMLContentSink::~nsXMLContentSink()
> -{
> +{ 

Don't add trailing whitespace.

> +NS_IMPL_ISUPPORTS_INHERITED7(nsXMLContentSink,
>                               nsContentSink,
>                               nsIContentSink,
>                               nsIXMLContentSink,
>                               nsIExpatSink,
> +                             nsITimerCallback,
> +                             nsIDocumentObserver,
> +                             nsIMutationObserver,
>                               nsITransformObserver)
> +                             

Don't add trailing whitespace.

> +PRBool
> +nsXMLContentSink::CanStillPrettyPrint()
> +{
> +  return !(!mPrettyPrintXML || (mPrettyPrintHasFactoredElements &&
> +                           !mPrettyPrintHasSpecialRoot));

I think this is equivalent to

  return mPrettyPrintXML &&
         (!mPrettyPrintHasFactoredElements || mPrettyPrintHasSpecialRoot);

which is a bit easier to read.

>  nsXMLContentSink::PushContent(nsIContent *aContent)
>  {

> +  StackNode sn;
> +  sn.mContent = aContent;
> +  sn.mNumFlushed = 0;
> +  NS_ADDREF(aContent);
> +  mContentStack.AppendElement(sn);

You should do:

  StackNode *sn = mContentStack.AppendElement();
  ...

>  nsXMLContentSink::PopContent()
>  {  

> +  NS_RELEASE(mContentStack[count - 1].mContent);
> +  // Don't addref again. Keep the refcount from the stack 

This comment doesn't make sense.

> +  return (result == NS_OK) ? DidProcessATokenImpl() : result;

Use NS_SUCCEEDED instead of == NS_OK (others like that in this file)

> @@ -1312,6 +1403,10 @@
>      mXSLTProcessor->CancelLoads();
>      mXSLTProcessor = nsnull;
>    }
> +  
> +  // release the nodes on stack
> +  while(PopContent() > 0);

If you'd make the stacknodes hold an nsCOMPtr you could just call Clear here.

> +    // XXX the newline normalization here is bogus -- hsivonen

I don't see why.

> +nsXMLContentSink::FlushTags(PRBool aNotify)

> +    // responsibility to handle flushing interactions between contexts (see
> +    // HTMLContentSink::BeginContext).

This comment needs updating.

> +    while (stackPos < stackLen) {

Make this |for (stackPos = 0; stackPos < stackLen; ++stackPos)|

> +nsXMLContentSink::DidAddContent()
> +{
> +  if(IsTimeToNotify()) {

Space after if.

> +    FlushTags(PR_TRUE);	
> +  }
> +}

I think you should just replace the callers with this and remove the function. Or at least inline it.

> Index: mozilla/content/xml/document/src/nsXMLContentSink.h
> ===================================================================

> +struct StackNode {
> +  nsIContent* mContent;

I'd try to use an nsCOMPtr here.

> Index: mozilla/parser/htmlparser/src/nsExpatDriver.cpp
> ===================================================================

> @@ -536,6 +544,8 @@
>          mInternalState = mSink->HandleCharacterData(newline, 1);
>        }
>      }
> +    // XXX the code above is bogus -- hsivonen
> +    MaybeStopParser();

Huh?
And this should be in the else block.

> -  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK || !aBuffer,
> +  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK || mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED || !aBuffer,

Long line.

> +    if (status == XML_STATUS_SUSPENDED && mInternalState != NS_ERROR_HTMLPARSER_BLOCK) {
> +      mInternalState = NS_ERROR_HTMLPARSER_INTERRUPTED;

This seems wrong, why isn't mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED already?

> +    } else
> +    if (status == XML_STATUS_ERROR) {

else on its own line, if on the same line as the else.

> -         (mInternalState == NS_ERROR_HTMLPARSER_BLOCK && mExpatBuffered > 0)) {
> +         ((mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) && mExpatBuffered > 0)) {

Long line.

> -    PRBool blocked = mInternalState == NS_ERROR_HTMLPARSER_BLOCK;
> +    PRBool blocked = (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED);

Long line.

> -    if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK) {
> +    if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) {

Long line.

>        PR_LOG(gExpatDriverLog, PR_LOG_DEBUG,
>               ("Blocked parser (probably for loading linked stylesheets or "
>                "scripts)."));

Change the logging so it says 'Blocked or interrupted ...'

> @@ -1237,7 +1249,7 @@
>                            nsITokenObserver* anObserver,
>                            nsIContentSink* aSink)
>  {
> -  return mInternalState;
> +  return mInternalState;  

Don't add trailing spaces.

> +void
> +nsExpatDriver::MaybeStopParser()
> +{
> +  if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) {

Long line.

Maybe you should add an inline function that returns |mInternalState != NS_ERROR_HTMLPARSER_BLOCK || mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED| and us that wherever you do that check instead.
Comment 55 Jeff Walden [:Waldo] (remove +bmo to email) 2006-10-12 10:48:11 PDT
(In reply to comment #54)
> > +  LL_I2L(interval, GetNotificationInterval());
> > +  LL_SUB(diff, now, mLastNotificationTime);
> > +
> > +  if (LL_CMP(diff, >, interval)) {

NSPR no longer supports platforms without a 64-bit integer type, so the LL_ macros are unnecessary and just make the code less readable.  There was a post to n.p.m.xpcom about this (I think it was before the newsgroup transition) by darin, and I think biesi could also give further info on this/get you the relevant post if you need it.
Comment 56 :Gavin Sharp [email: gavin@gavinsharp.com] 2006-10-12 12:09:31 PDT
(In reply to comment #55)
> relevant post if you need it.

http://groups.google.com/group/mozilla.dev.tech.nspr/browse_frm/thread/3a737d5e9708db02/d48490f4c0781b92
Comment 57 Henri Sivonen (:hsivonen) 2006-10-17 03:43:24 PDT
Created attachment 242485 [details] [diff] [review]
Make the XML content sink incremental, v8

Thank you for the review comments. I refreshed the patch to apply to current trunk.

(In reply to comment #54)
> (From update of attachment 234924 [details] [diff] [review] [edit])
> I wonder if it would make sense to have a class between nsContentSink and
> ns[XML|HTML]ContentSink that implements DidBuildModel, WillBuildModel, etc

nsContentSink already is the common superclass for ns[XML|HTML]ContentSink without any sibling subclasses as far as I can tell.
See: http://hsivonen.iki.fi/kesakoodi/content-sink-diagram/

> > Index: mozilla/parser/htmlparser/public/nsIContentSink.h
> > ===================================================================
> 
> > @@ -64,6 +65,17 @@
> >    NS_DECLARE_STATIC_IID_ACCESSOR(NS_ICONTENT_SINK_IID)
> >  
> >    /**
> > +   * This method gets called before the nsParser calls tokenize. 
> > +   * This is needed because the XML side actually builds 
> > +   * the content model as part of the tokenization and 
> > +   * not on BuildModel(). The XML side can use this call 
> > +   * to do stuff that the HTML side does in WillBuildModel().
> 
> You mean BuildModel in that last sentence, right?

Actually, the HTML side does its thing in WillProcessTokens.

> Couldn't we hoist WillProcessTokens to nsIContentSink and rename it instead?

I am not sure. I don't know what moving the call to WillProcessTokens would couse on the HTML side. The nsIContentSink and nsIParser design assumes a separate tokenizer, but expat isn't exposed that way, so the XML and HTML sides need to do stuff at different points. I added stuff the the XML side, but I tried to avoid refactoring the HTML side when I was able to.

> > +   *
> > +   * @update 2006-06-28 hsivonen
> > +   */     
> > +  NS_IMETHOD WillTokenize(void)=0;
> 
> > Index: mozilla/content/base/src/nsContentSink.cpp
> > ===================================================================
> 
> > +nsresult
> > +DummyParserRequest::Create(nsIRequest** aResult, nsContentSink* aSink)
> > +{
> > +  *aResult = new DummyParserRequest(aSink);
> > +  if (!*aResult) {
> > +    return NS_ERROR_OUT_OF_MEMORY;
> > +  }
> > +
> > +  NS_ADDREF(*aResult);
> > +
> > +  return NS_OK;
> > +}
> 
> There's no need for this function, just do |new DummyParserRequest(sink)|
> wherever you call this function.

This and various other stylistic issues are not introduced by me but by whoever wrote the code on the HTML side. I merely moved existing code. Fixing anyway.

I hope I didn't break refcounting.

> > +
> > +
> > +DummyParserRequest::DummyParserRequest(nsContentSink* aSink)
> > +{
> > +  mSink = aSink;
> 
> Initializer list.

OK.

> > +DummyParserRequest::Cancel(nsresult status)
> > +{
> > +  // Cancel parser
> > +  nsresult rv = NS_OK;
> > +  if ((mSink) && (mSink->mParser)) {
> 
> Drop the braces around mSink.

OK.

> > +    mSink->mParser->CancelParsingEvents();
> > +  }
> > +  return rv;
> 
> No need for rv, just return NS_OK

OK.

> > +NS_DEFINE_CID(kAppShellCID, NS_APPSHELL_CID);
> 
> Drop this, just use NS_APPSHELL_CID directly.

It didn't compile that way.

> > +  
> > +    mNotifyOnTimer =
> 
> Whitespace

OK.

> > +  if (nsContentUtils::GetBoolPref("content.interrupt.parsing", PR_TRUE)) {
> > +    mCanInterruptParser = PR_TRUE;
> 
>   mCanInterruptParser =
>     nsContentUtils::GetBoolPref("content.interrupt.parsing", PR_TRUE)

OK.

> > +PRBool
> > +nsContentSink::IsTimeToNotify()
> > +{
> > +  if (!mNotifyOnTimer || !mLayoutStarted || !mBackoffCount ||
> > +      mInMonolithicContainer) {
> > +    return PR_FALSE;
> > +  }
> > +
> > +  PRTime now = PR_Now();
> > +  PRInt64 interval, diff;
> 
> You could use nsInt64 like in WillInterruptImpl.

But subsequent comments indicated that plain PRInt64 should be preferred.

> > +
> > +  LL_I2L(interval, GetNotificationInterval());
> > +  LL_SUB(diff, now, mLastNotificationTime);
> > +
> > +  if (LL_CMP(diff, >, interval)) {
> 
> > +nsContentSink::DidProcessATokenImpl()
> 
> > +  if (mCanInterruptParser) {
> 
> return early if !mCanInterruptParser

OK.

> > +    if ((!mDynamicLowerValue) &&
> > +      (mLastSampledUserEventTime == eventTime)) {
> 
> No need for those braces.

OK.

> > +      } else {
> > +        if (mDynamicLowerValue) {
> 
>     }
>     else if {

OK.

> > +nsContentSink::WillBuildModelImpl() {
> 
> Move the brace down.

OK.

> > +  if (mCanInterruptParser) {
> 
> > +  }
> 
> > +  mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());
> 
> Could put this inside the mCanInterruptParser block?

I am not sure--just moved existing code.

> > Index: mozilla/content/base/src/nsContentSink.h
> > ===================================================================
> 
> > +static PRLogModuleInfo* gContentSinkLogModuleInfo;
> 
> I think you should mark it extern here and move this to nsContentSink.cpp.

OK.

> > +#define SINK_TRACE(_bit, _args)                       \
> > +  PR_BEGIN_MACRO                                      \
> > +    if (SINK_LOG_TEST(gContentSinkLogModuleInfo, _bit)) {    \
> > +      PR_LogPrint _args;                              \
> > +    }                                                 \
> 
> Please line up the \

OK.

> Shouldn't you remove the SINK_TRACE and SINK_TRACE_NODE defines from
> nsHTMLContentSink?

I don't know. I tried to avoid touching them as much as possible.

> > +#undef SINK_NO_INCREMENTAL
> 
> Why undef it? We don't want to allow people to define it?

I don't know. That bit wasn't created by me--just moved by me.

> > +class DummyParserRequest;
> 
> No need for this.

OK.

> >  class nsContentSink : public nsICSSLoaderObserver,
> 
> > +  void NotifyAppend(nsIContent* aContent,
> > +                    PRUint32 aStartIndex);
> 
> No need to put this on two lines.

OK.

> > +  // these are de facto abstract, but making them non-abstract to 
> > +  // make it look like this class could be instantiated on its own.
> 
> Why?

To make it compile. Can't remember the details.

> > +  virtual nsresult FlushTags(PRBool aNotify) { return NS_OK; }
> > +
> > +  virtual void TryToScrollToRef() { return; }
> 
> > Index: mozilla/content/html/document/src/nsHTMLContentSink.cpp
> > ===================================================================
> 
> > +#ifdef NS_DEBUG
> > +  PRInt32 mFlags;
> > +#endif
> 
> Do we still need this?

Yes, unless all the debug code is changed.

> > @@ -1124,7 +901,7 @@
> >  
> >      case eHTMLTag_frameset:
> >        if (!mSink->mFrameset &&
> > -          mSink->mFlags & NS_SINK_FLAG_FRAMES_ENABLED) {
> > +          mSink->mFramesEnabled) {
> 
> Rewrap.

OK.

> > +// Forked into nsXMLContentSink. Please keep in sync.
> 
> > +// forked to nsXMLContentSink. Please keep in sync.
> 
> It'd be good if we could avoid this. At least file a follow up bug to unfork.

Can't avoid it without a rewrite on the HTML side that gets rid of the parser contexts.

> > @@ -2754,7 +2321,7 @@
> 
> > +  if (done && (mFramesEnabled)) {
> 
> Drop the inner braces.

OK.

> > +HTMLContentSink::FlushTags(PRBool aNotify) {
> > +  if (mCurrentContext) {
> > +    return mCurrentContext->FlushTags(aNotify);
> > +  }	else {
> > +    return NS_OK;	
> > +  }
> 
>   return mCurrentContext ? mCurrentContext->FlushTags(aNotify) : NS_OK;

OK.

> > Index: mozilla/content/xml/document/src/nsXMLContentSink.cpp
> > ===================================================================
> 
> > @@ -154,7 +157,13 @@
> >  }
> >  
> >  nsXMLContentSink::~nsXMLContentSink()
> > -{
> > +{ 
> 
> Don't add trailing whitespace.

OK.

> > +NS_IMPL_ISUPPORTS_INHERITED7(nsXMLContentSink,
> >                               nsContentSink,
> >                               nsIContentSink,
> >                               nsIXMLContentSink,
> >                               nsIExpatSink,
> > +                             nsITimerCallback,
> > +                             nsIDocumentObserver,
> > +                             nsIMutationObserver,
> >                               nsITransformObserver)
> > +                             
> 
> Don't add trailing whitespace.

OK.

> > +PRBool
> > +nsXMLContentSink::CanStillPrettyPrint()
> > +{
> > +  return !(!mPrettyPrintXML || (mPrettyPrintHasFactoredElements &&
> > +                           !mPrettyPrintHasSpecialRoot));
> 
> I think this is equivalent to
> 
>   return mPrettyPrintXML &&
>          (!mPrettyPrintHasFactoredElements || mPrettyPrintHasSpecialRoot);
> 
> which is a bit easier to read.

OK.

> >  nsXMLContentSink::PushContent(nsIContent *aContent)
> >  {
> 
> > +  StackNode sn;
> > +  sn.mContent = aContent;
> > +  sn.mNumFlushed = 0;
> > +  NS_ADDREF(aContent);
> > +  mContentStack.AppendElement(sn);
> 
> You should do:
> 
>   StackNode *sn = mContentStack.AppendElement();
>   ...

OK.

> >  nsXMLContentSink::PopContent()
> >  {  
> 
> > +  NS_RELEASE(mContentStack[count - 1].mContent);
> > +  // Don't addref again. Keep the refcount from the stack 
> 
> This comment doesn't make sense.

OK.

> > +  return (result == NS_OK) ? DidProcessATokenImpl() : result;
> 
> Use NS_SUCCEEDED instead of == NS_OK (others like that in this file)

OK.

> > @@ -1312,6 +1403,10 @@
> >      mXSLTProcessor->CancelLoads();
> >      mXSLTProcessor = nsnull;
> >    }
> > +  
> > +  // release the nodes on stack
> > +  while(PopContent() > 0);
> 
> If you'd make the stacknodes hold an nsCOMPtr you could just call Clear here.

I don't understand nsCOMPtr well enough to understand that comment.

> > +    // XXX the newline normalization here is bogus -- hsivonen
> 
> I don't see why.

It is done by expat. Doing it twice is just wrong. (XML has a spec peculiarity here that allows CR in the DOM if escaped in source, but Gecko is non-compliant due to normalizing twice.)

> > +nsXMLContentSink::FlushTags(PRBool aNotify)
> 
> > +    // responsibility to handle flushing interactions between contexts (see
> > +    // HTMLContentSink::BeginContext).
> 
> This comment needs updating.

OK.

> > +    while (stackPos < stackLen) {
> 
> Make this |for (stackPos = 0; stackPos < stackLen; ++stackPos)|

OK.

> > +nsXMLContentSink::DidAddContent()
> > +{
> > +  if(IsTimeToNotify()) {
> 
> Space after if.

OK.

> > +    FlushTags(PR_TRUE);	
> > +  }
> > +}
> 
> I think you should just replace the callers with this and remove the function.
> Or at least inline it.

Inlined.

> > Index: mozilla/content/xml/document/src/nsXMLContentSink.h
> > ===================================================================
> 
> > +struct StackNode {
> > +  nsIContent* mContent;
> 
> I'd try to use an nsCOMPtr here.

I don't understand nsCOMPtr well enough to see how it would simplify things here.

> > Index: mozilla/parser/htmlparser/src/nsExpatDriver.cpp
> > ===================================================================
> 
> > @@ -536,6 +544,8 @@
> >          mInternalState = mSink->HandleCharacterData(newline, 1);
> >        }
> >      }
> > +    // XXX the code above is bogus -- hsivonen
> > +    MaybeStopParser();
> 
> Huh?

It tampers with line ends again.

> And this should be in the else block.

It is in the else block.

> > -  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK || !aBuffer,
> > +  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK || mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED || !aBuffer,
> 
> Long line.
> 
> > +    if (status == XML_STATUS_SUSPENDED && mInternalState != NS_ERROR_HTMLPARSER_BLOCK) {
> > +      mInternalState = NS_ERROR_HTMLPARSER_INTERRUPTED;
> 
> This seems wrong, why isn't mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED
> already?

IIRC, expat could end up in a suspended state on its own in a way that I did not fully understand, but this fixed the symptoms.

> > +    } else
> > +    if (status == XML_STATUS_ERROR) {
> 
> else on its own line, if on the same line as the else.
> 
> > -         (mInternalState == NS_ERROR_HTMLPARSER_BLOCK && mExpatBuffered > 0)) {
> > +         ((mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) && mExpatBuffered > 0)) {
> 
> Long line.
> 
> > -    PRBool blocked = mInternalState == NS_ERROR_HTMLPARSER_BLOCK;
> > +    PRBool blocked = (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED);
> 
> Long line.
> 
> > -    if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK) {
> > +    if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) {
> 
> Long line.
> 
> >        PR_LOG(gExpatDriverLog, PR_LOG_DEBUG,
> >               ("Blocked parser (probably for loading linked stylesheets or "
> >                "scripts)."));
> 
> Change the logging so it says 'Blocked or interrupted ...'
> 
> > @@ -1237,7 +1249,7 @@
> >                            nsITokenObserver* anObserver,
> >                            nsIContentSink* aSink)
> >  {
> > -  return mInternalState;
> > +  return mInternalState;  
> 
> Don't add trailing spaces.
> 
> > +void
> > +nsExpatDriver::MaybeStopParser()
> > +{
> > +  if (mInternalState == NS_ERROR_HTMLPARSER_BLOCK || mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED) {
> 
> Long line.
> 
> Maybe you should add an inline function that returns |mInternalState !=
> NS_ERROR_HTMLPARSER_BLOCK || mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED|
> and us that wherever you do that check instead.

Made an inline method.
Comment 58 Boris Zbarsky [:bz] 2006-11-06 10:41:14 PST
So... I think we should block 1.9 on this.  This fixes some regressions from various 1.9 work that are really hard to fix in any other way, and it'd really simplify some other things that I think we should try to do for 1.9.

Peter, do you think you could review this in the near future?

Henri, what does your schedule look like?  Are you able to update the patch to review comments as needed?
Comment 59 Henri Sivonen (:hsivonen) 2006-11-10 02:54:50 PST
I have an obligation to work on another project. I think I can find time to fix smallish things, but if review turns up something big, it may be a problem. But I would really like to get this checked in.
Comment 60 Peter Van der Beken [:peterv] 2006-11-29 05:32:03 PST
Created attachment 246923 [details] [diff] [review]
Make the XML content sink incremental, v9

- updated to trunk
- whitespace fixes
- unified SINK_TRACE macro between nsContentSink and HTMLContentSink (added logmodule argument to the macro)
- s/HTMLContentSink/nsContentSink/ in nsContentSink log messages
- made nsContentSink::FlushTags pure virtual
- removed else after return in nsXMLContentSink::GetCurrentContent()
- made StackNode hold an nsCOMPtr<nsIContent>, made PopContent not return anything
- moved inlined functions' implementation into header

(In reply to comment #57)
> > You could use nsInt64 like in WillInterruptImpl.
> 
> But subsequent comments indicated that plain PRInt64 should be preferred.

So then you should use PRInt64 instead of nsInt64 in WillInterruptImpl.

> > > +  if (mCanInterruptParser) {
> > 
> > > +  }
> > 
> > > +  mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());
> > 
> > Could put this inside the mCanInterruptParser block?
> 
> I am not sure--just moved existing code.

You didn't just move, that's why I asked:

-  if (mFlags & NS_SINK_FLAG_CAN_INTERRUPT_PARSER) {
...
-    mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());

> > > Index: mozilla/content/html/document/src/nsHTMLContentSink.cpp
> > > ===================================================================
> > 
> > > +#ifdef NS_DEBUG
> > > +  PRInt32 mFlags;
> > > +#endif
> > 
> > Do we still need this?
> 
> Yes, unless all the debug code is changed.

I don't understand, I don't see it being set or used anywhere in nsHTMLContentSink.

> > > +// Forked into nsXMLContentSink. Please keep in sync.
> > 
> > > +// forked to nsXMLContentSink. Please keep in sync.
> > 
> > It'd be good if we could avoid this. At least file a follow up bug to unfork.
> 
> Can't avoid it without a rewrite on the HTML side that gets rid of the parser
> contexts.

So file a bug on it.

> > > +    // XXX the newline normalization here is bogus -- hsivonen
> > 
> > I don't see why.
> 
> It is done by expat. Doing it twice is just wrong. (XML has a spec peculiarity
> here that allows CR in the DOM if escaped in source, but Gecko is non-compliant
> due to normalizing twice.)

Well, then you need to file a bug, explain it in the comment and at a reference to the bug in the comment instead of adding cryptic comments.

> > > Index: mozilla/parser/htmlparser/src/nsExpatDriver.cpp
> > > ===================================================================
> > 
> > > @@ -536,6 +544,8 @@
> > >          mInternalState = mSink->HandleCharacterData(newline, 1);
> > >        }
> > >      }
> > > +    // XXX the code above is bogus -- hsivonen
> > > +    MaybeStopParser();
> > 
> > Huh?
> 
> It tampers with line ends again.

Same here.

> > > +    if (status == XML_STATUS_SUSPENDED && mInternalState != NS_ERROR_HTMLPARSER_BLOCK) {
> > > +      mInternalState = NS_ERROR_HTMLPARSER_INTERRUPTED;
> > 
> > This seems wrong, why isn't mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED
> > already?
> 
> IIRC, expat could end up in a suspended state on its own in a way that I did
> not fully understand, but this fixed the symptoms.

We need to figure this out instead of wallpapering over it.
Comment 61 Henri Sivonen (:hsivonen) 2006-12-13 07:32:16 PST
Created attachment 248515 [details] [diff] [review]
Make the XML content sink incremental, v10

> > > You could use nsInt64 like in WillInterruptImpl.
> > 
> > But subsequent comments indicated that plain PRInt64 should be preferred.
> 
> So then you should use PRInt64 instead of nsInt64 in WillInterruptImpl.

Done.

> > > > +  if (mCanInterruptParser) {
> > > 
> > > > +  }
> > > 
> > > > +  mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());
> > > 
> > > Could put this inside the mCanInterruptParser block?
> > 
> > I am not sure--just moved existing code.
> 
> You didn't just move, that's why I asked:
> 
> -  if (mFlags & NS_SINK_FLAG_CAN_INTERRUPT_PARSER) {
> ...
> -    mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());

I can't remember why I have done it that way. The way I left it seems harmless and I doubt that I would have changed it without a reason.

> > > > Index: mozilla/content/html/document/src/nsHTMLContentSink.cpp
> > > > ===================================================================
> > > 
> > > > +#ifdef NS_DEBUG
> > > > +  PRInt32 mFlags;
> > > > +#endif
> > > 
> > > Do we still need this?
> > 
> > Yes, unless all the debug code is changed.
> 
> I don't understand, I don't see it being set or used anywhere in
> nsHTMLContentSink.

Removed.

> > > > +// Forked into nsXMLContentSink. Please keep in sync.
> > > 
> > > > +// forked to nsXMLContentSink. Please keep in sync.
> > > 
> > > It'd be good if we could avoid this. At least file a follow up bug to unfork.
> > 
> > Can't avoid it without a rewrite on the HTML side that gets rid of the parser
> > contexts.
> 
> So file a bug on it.

Bug 363688 filed.

> > > > +    // XXX the newline normalization here is bogus -- hsivonen
> > > 
> > > I don't see why.
> > 
> > It is done by expat. Doing it twice is just wrong. (XML has a spec peculiarity
> > here that allows CR in the DOM if escaped in source, but Gecko is non-compliant
> > due to normalizing twice.)
> 
> Well, then you need to file a bug, explain it in the comment and at a reference
> to the bug in the comment instead of adding cryptic comments.

Bug 343870 already filed. Changed comment.

> > > > Index: mozilla/parser/htmlparser/src/nsExpatDriver.cpp
> > > > ===================================================================
> > > 
> > > > @@ -536,6 +544,8 @@
> > > >          mInternalState = mSink->HandleCharacterData(newline, 1);
> > > >        }
> > > >      }
> > > > +    // XXX the code above is bogus -- hsivonen
> > > > +    MaybeStopParser();
> > > 
> > > Huh?
> > 
> > It tampers with line ends again.
> 
> Same here.

Changed comment.

> > > > +    if (status == XML_STATUS_SUSPENDED && mInternalState != NS_ERROR_HTMLPARSER_BLOCK) {
> > > > +      mInternalState = NS_ERROR_HTMLPARSER_INTERRUPTED;
> > > 
> > > This seems wrong, why isn't mInternalState == NS_ERROR_HTMLPARSER_INTERRUPTED
> > > already?
> > 
> > IIRC, expat could end up in a suspended state on its own in a way that I did
> > not fully understand, but this fixed the symptoms.
> 
> We need to figure this out instead of wallpapering over it.

It turns out that the only way to make status == XML_STATUS_SUSPENDED hold is to first call XML_StopParser from the app. The app only calls XML_StopParser from nsExpatDriver.cpp. Looking at the unpatched code and the patched code, it appears that after I came up with the "wallpaper" fix, I also made the real fix and forgot to remover the "wallpaper" fix.

Changed the quoted lines to an assertion.
Comment 62 Peter Van der Beken [:peterv] 2007-01-04 03:38:28 PST
Comment on attachment 248515 [details] [diff] [review]
Make the XML content sink incremental, v10

>Index: mozilla/content/base/src/nsContentSink.cpp
>===================================================================

>+nsContentSink::WillProcessTokensImpl(void)
>+{
>+  if (mCanInterruptParser) {
>+    mDelayTimerStart = PR_IntervalToMicroseconds(PR_IntervalNow());
>+  }
>+
>+  return NS_OK;
>+}
>+
>+NS_IMETHODIMP
>+nsContentSink::WillBuildModelImpl()
>+{
>+  if (mCanInterruptParser) {
>+    nsresult rv = AddDummyParserRequest();
>+    if (NS_FAILED(rv)) {
>+      NS_ERROR("Adding dummy parser request failed");
>+
>+      // Don't return the error result, just reset flag which
>+      // indicates that it can interrupt parsing. If
>+      // AddDummyParserRequests fails it should not affect
>+      // WillBuildModel.
>+      mCanInterruptParser = PR_FALSE;
>+    }
>+  }
>+
>+  mBeginLoadTime = PR_IntervalToMicroseconds(PR_IntervalNow());

It'd still make more sense to me if this was inside the if.
I think sicking mentioned he was going to sr this.
Comment 63 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-18 14:24:35 PST
Comment on attachment 248515 [details] [diff] [review]
Make the XML content sink incremental, v10

Just a couple of minor nits that I don't want to loose (gotta restart browser)

>+nsContentSink::Notify(nsITimer *timer)
...
>+  mNotificationTimer = 0;

= nsnull;

>+nsContentSink::IsTimeToNotify()
>+{
>+  if (!mNotifyOnTimer || !mLayoutStarted || !mBackoffCount ||
>+      mInMonolithicContainer) {
>+    return PR_FALSE;
>+  }
>+
>+  PRTime now = PR_Now();
>+  PRInt64 interval, diff;
>+
>+  LL_I2L(interval, GetNotificationInterval());
>+  LL_SUB(diff, now, mLastNotificationTime);
>+
>+  if (LL_CMP(diff, >, interval)) {

You don't need to use the LL macros any more. Same thing elsewhere.
Comment 64 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-18 17:56:01 PST
Created attachment 251995 [details] [diff] [review]
Make the XML content sink incremental, v10.0.1

This is the same as previous patch but diffed with -u8P which makes the patch a lot easier to read. I also synced with tip (though didn't try to build yet)
Comment 65 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-18 17:57:06 PST
>+nsContentSink::DidBuildModelImpl2(void)

I really don't like this name as it isn't descriptive at all what the function does. How about DropParserAndPerfHint?
Comment 66 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-22 14:47:26 PST
Comment on attachment 251995 [details] [diff] [review]
Make the XML content sink incremental, v10.0.1

>+nsContentSink::AddDummyParserRequest(void)
>+{
>+  nsresult rv = NS_OK;
>+
>+  NS_ASSERTION(!mDummyParserRequest, "Already have a dummy parser request");
>+
>+  mDummyParserRequest = new DummyParserRequest(this);
>+  if (!mDummyParserRequest) {
>+    return NS_ERROR_OUT_OF_MEMORY;
>+  }
>+
>+  nsCOMPtr<nsILoadGroup> loadGroup;
>+  if (mDocument) {
>+    loadGroup = mDocument->GetDocumentLoadGroup();
>+  }
>+
>+  if (loadGroup) {
>+    rv = mDummyParserRequest->SetLoadGroup(loadGroup);

This line does nothing since DummyParserRequest::SetLoadGroup is a no-op. Maybe you should implement Get/SetLoadGroup?

>Index: content/html/document/src/nsHTMLContentSink.cpp

>+// Forked into nsXMLContentSink. Please keep in sync.
> /**
>  * Flush all elements that have been seen so far such that

Please make this comment bigger and harder to miss.

(sorry for the fragmented comments)
Comment 67 Henri Sivonen (:hsivonen) 2007-01-25 08:27:21 PST
> How about DropParserAndPerfHint?

That would be fine.

> Maybe you should implement Get/SetLoadGroup?

I gather that we want to get rid of DummyParserRequest anyway eventually (but I am not competent to remove it from the code base). It fulfills its purpose (comment #40) the way it is now. (I don't know why it works, but it does. I'm just moving around someone else's code at that point.)

> Please make this comment bigger and harder to miss.

OK.

Should I make a new patch now or wait for more comments?
Comment 68 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-25 11:21:01 PST
Wait for more comments, I'm having to do this one in pieces right now unfortunately.
Comment 69 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-25 14:50:24 PST
Comment on attachment 251995 [details] [diff] [review]
Make the XML content sink incremental, v10.0.1

>Index: content/xml/document/src/nsXMLContentSink.h

>-  virtual void FlushPendingNotifications(mozFlushType aType) { }
>+  void FlushPendingNotifications(mozFlushType aType);

Please keep the virtual. It'll be there implicitly anyway, but it's nice to have it explicit.

>Index: parser/htmlparser/src/nsExpatDriver.cpp

>@@ -413,49 +414,51 @@ nsExpatDriver::HandleStartElement(const 
>   PRUint32 attrArrayLength;
>   for (attrArrayLength = XML_GetSpecifiedAttributeCount(mExpatParser);
>        aAtts[attrArrayLength];
>        attrArrayLength += 2) {
>     // Just looping till we find out what the length is
>   }
> 
>   if (mSink) {
>-    mSink->HandleStartElement(aValue, aAtts,
>+    mInternalState = mSink->HandleStartElement(aValue, aAtts,
>                               attrArrayLength,
>                               XML_GetIdAttributeIndex(mExpatParser),
>                               XML_GetCurrentLineNumber(mExpatParser));

Fix indentation.


> nsExpatDriver::ParseBuffer(const PRUnichar *aBuffer,
>                            PRUint32 aLength,
>                            PRBool aIsFinal,
>                            PRUint32 *aConsumed)
> {
>   NS_ASSERTION((aBuffer && aLength != 0) || (!aBuffer && aLength == 0), "?");
>   NS_ASSERTION(mInternalState != NS_OK || aIsFinal || aBuffer,
>                "Useless call, we won't call Expat");
>-  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK || !aBuffer,
>+  NS_PRECONDITION(mInternalState != NS_ERROR_HTMLPARSER_BLOCK ||
>+                  mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED ||
>+                  !aBuffer,

This looks wrong, shouldn't this be

NS_PRECONDITION((mInternalState != NS_ERROR_HTMLPARSER_BLOCK &&
                 mInternalState != NS_ERROR_HTMLPARSER_INTERRUPTED) ||
                !aBuffer,

Or better yet

NS_PRECONDITION(!BlockedOrInterrupted() || !aBuffer,
Comment 70 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-25 16:52:33 PST
Comment on attachment 251995 [details] [diff] [review]
Make the XML content sink incremental, v10.0.1

You should make UpdateChildCounts return an nsresult to allow it to indicate out-of-memory, i.e. when mContentStack.AppendElement returns null. There is no caller that cares about the current return value anyway afaict.

>Index: content/xml/document/src/nsXMLContentSink.cpp
>+nsXMLContentSink::UpdateChildCounts()
>+{
>+  // Start from the top of the stack (growing upwards) and see if any
>+  // new content has been appended. If so, we recognize that reflows
>+  // have been generated for it and we should make sure that no
>+  // further reflows occur.  Note that we have to include stackPos == 0
>+  // to properly notify on kids of <html>.
>+  PRInt32 stackLen = mContentStack.Length();
>+  PRInt32 stackPos = stackLen - 1;
>+  while (stackPos >= 0) {
>+    StackNode & node = mContentStack[stackPos];
>+    node.mNumFlushed = node.mContent->GetChildCount();
>+
>+    stackPos--;
>+  }
>+  mNotifyLevel = stackLen;

Shouldn't this be |stackLen - 1|

With those things fixed along with peters comment (in particular comment 62) fixed. sr=sicking

Sweet work!
Comment 71 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-29 14:16:53 PST
Henri, are you going to be able to work on this? Or should I fix the comments and check in?
Comment 72 Henri Sivonen (:hsivonen) 2007-01-29 15:42:07 PST
I was going to fix the remaining issues soonish, but I can't check in anyway, so I'd really appreciate it if you could make the changes and check in.
Comment 73 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-30 11:36:08 PST
Would be great if you could comment on if any of my comments seem wrong though. Especially the last one in comment 69 and comment 70
Comment 74 Henri Sivonen (:hsivonen) 2007-01-30 12:33:56 PST
Yes, it appears that it should be |stackLen - 1|. 

NS_PRECONDITION(!BlockedOrInterrupted() || !aBuffer, looks right, too.

I really cannot remember what the deal with the issue of comment 62 is, but it is probably harmless to do it either way.
Comment 75 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-30 13:18:00 PST
Created attachment 253387 [details] [diff] [review]
Make the XML content sink incremental, v11

This is the latest version with all comments addressed.
Comment 76 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-01-30 14:18:05 PST
Checked in. Leaving the honor of marking this fixed to Henri.
Comment 77 Mikko Rantalainen 2007-01-31 07:31:06 PST
Verified fixed in Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a2pre) Gecko/20070131 Minefield/3.0a2pre

XSLT is not yet incremental. See comment 31 and comment 32. I guess the right choice is to mark this as fixed and create a new bug for that issue.
Comment 78 Henri Sivonen (:hsivonen) 2007-02-01 06:31:42 PST
Sicking: Thank you!

Mikko: XSLT is a very different issue.

Marking FIXED. Yay!
Comment 79 Boris Zbarsky [:bz] 2007-02-01 09:35:17 PST
Is it me, or did this somehow improve Tdhml?  ;)

Henri: thanks for doing this!

Mikko: Making XSLT incremental is kinda hard, because it's non-incremental by design of the spec -- any part of the XML document can affect what the transformed document will look like...
Comment 80 Gábor Stefanik 2007-02-03 08:43:58 PST
Is this for Linux only? I still don't get incremental XML rendering with the latest Windows nightly!
Comment 81 Jesse Ruderman 2007-02-03 14:17:42 PST
It works well for me on Windows XP and Mac, testing with
http://www.hixie.ch/tests/evil/page-loading/incremental/001.cgi?mime=text%2Fxml&delay=1&repeats=10
Comment 82 Laurens Holst 2007-02-04 05:59:49 PST
As for XSLT, Boris, it’s incremental as long as you don’t do anything that needs to ‘look back’ (e.g. count() or sorting, etc). This template would for example be incremental:

<xsl:template match="body">
   <h1>Headings in this document</h1>
   <ul>
      <xsl:apply-templates select="h1|h2|h3|h4|h5|h6" />
   </ul>
</xsl:template>

<xsl:template match="h1|h2|h3|h4|h5|h6">
   <li><xsl:value-of select="." /></li>
</xsl:template>

The output tree is always constructed serially from front to back (unlike CSS which for certain functionality requires reflows), that is really suitable for incremental processing, and these kind of matches don’t need to look ahead in the document either, they match on the fly.

So if you’d make the functions evaluate the document ‘lazily’, that is, place a marker at the location up till where the document has been loaded, and if they encounter a marker they wait for more content to come in until they can complete their function, that should work for a lot of stylesheets, provided that they are created with incremental processing in mind or coincidentally don’t use any non-incremental functionality.

Looking at how the language is constructed, I do think that incremental processing was one of the things they kept in mind when designing it. Cocoon (XML web server) also supports incremental processing of XSLT I think, it saves memory and time.

So I kindly disagree with your "it's non-incremental by
design of the spec" statement :).


~Grauw
Comment 83 Boris Zbarsky [:bz] 2007-02-04 09:51:14 PST
> The output tree is always constructed serially from front to back 

Oh, the output tree _construction_ could be made incremental.  But you can't begin to construct the output tree until you have received all the data from the server, because XPath (which is what "match" uses) allows you do create queries that depend on arbitrary relationships between elements in the document.  For a server this is not the problem, of course, since getting the original data is not the bottleneck -- it's already got the original data.

I suppose something could be done in cases where all the expressions happen to be very very simple (like your case).  If that's a common usage scenario, feel free to file bugs!
Comment 84 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-02-04 20:10:22 PST
Yes, Please do NOT talk about XSLT in this bug. It is long enough as it is and has nothing to do with XSLT.

I would suggest filing a new bug or starting a discussion in the mozilla.dev.tech.xslt newsgroup.
Comment 85 Frank Wein [:mcsmurf] 2008-01-03 15:26:16 PST
*** Bug 410584 has been marked as a duplicate of this bug. ***
Comment 86 Vlad Alexander 2008-03-24 12:47:52 PDT
Is it possible that a fix for this bug is causing bugs 422962 and Bug 419716?
Comment 87 Andrew Cook 2009-11-05 23:40:15 PST
On |Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5| the testcase from comment 81 doesn't work (i.e. do what this bug supposedly fixed) for me.

Simple XHTML, no XSLT or other nonsense, but it is not rendered nor is its source viewable until it's loaded completely.

Did something regress, or something?
Comment 88 Gábor Stefanik 2009-11-06 04:09:48 PST
The test is broken - http://www.hixie.ch/tests/evil/page-loading/incremental/001.cgi?mime=text%2Fhtml&delay=1&repeats=10 also displays in one block (even though this is text/html).

Note You need to log in before you can comment on or make changes to this bug.