Closed
Bug 543458
Opened 15 years ago
Closed 15 years ago
[HTML5][Patch] Make the HTML5 parser not regress tp4
Categories
(Core :: DOM: HTML Parser, defect, P1)
Tracking
()
RESOLVED
FIXED
People
(Reporter: hsivonen, Assigned: hsivonen)
References
Details
(Keywords: perf)
Attachments
(2 files, 4 obsolete files)
85.53 KB,
patch
|
mozilla+ben
:
review+
|
Details | Diff | Splinter Review |
12.76 KB,
patch
|
Details | Diff | Splinter Review |
Currently, the trunk tp4 score is 499.41 on Linux, 652.76 on WinNT 6.0, 378.67 on WinNT 5.1 and 652.39 on Mac.
The scores for the HTML5 parser are 520.22 on Linux and 463.66 on WinNT 5.1. So about 4% worse on Linux and 22% worse on WinNT 5.1.
This needs to be fixed.
Assignee | ||
Updated•15 years ago
|
Priority: -- → P1
Assignee | ||
Comment 1•15 years ago
|
||
Looks like the XP numbers fluctuate. Numbers for runs that have more constrained changes, so the differences should really be the parser or randomness:
Old HTML5 Change
Linux 499.46 520.22 + 4%
Mac 631.97 736.44 +17%
XP 451.29 463.66 + 3%
Vista 770.96 677.52 -12%
So Linux is 4% slower and XP is 3% slower, which looks like a gap that can be closed. Oddly, the HTML5 parser makes things 12% faster on Vista and 17% slower on Mac. The figure for Mac looks troubling.
Assignee | ||
Comment 2•15 years ago
|
||
So the Vista numbers were too good to be true. Here an new numbers:
Old HTML5 Change
XP 437.91 461.61 5%
Vista 678.35 742.69 9%
Mac 634.57 722.96 14%
Linux 500.29 522.45 4%
Assignee | ||
Comment 3•15 years ago
|
||
CPU% counters are collected on Windows, but they go both ways...
Old XP:
* tp4: 437.91 (details)
* tp4_%cpu: 63.83 (details)
* tp4_pbytes: 73.8MB (details)
* tp4_memset: 78.0MB (details)
* tp4_shutdown: 1691.0 (details)
Old Vista
* tp4: 678.35 (details)
* tp4_%cpu: 48.26 (details)
* tp4_pbytes: 87.0MB (details)
* tp4_memset: 72.4MB (details)
* tp4_shutdown: 16509.0 (details)
HTML5 XP
* tp4: 461.61 (details)
* tp4_%cpu: 57.81 (details)
* tp4_pbytes: 75.9MB (details)
* tp4_memset: 80.4MB (details)
* tp4_shutdown: 1715.0 (details)
HTML5 Vista
* tp4: 742.69 (details)
* tp4_%cpu: 49.42 (details)
* tp4_pbytes: 84.9MB (details)
* tp4_memset: 73.8MB (details)
* tp4_shutdown: 13421.0 (details)
Assignee | ||
Comment 4•15 years ago
|
||
I did an analysis of the per-page numbers. Observations:
* The HTML5 parser has some ridiculously bad outlier runs, especially on Vista.
* Looking at *medians*, there are 8 pages that are faster with the HTML5 parser on all four platforms, 18 that are slower with the HTML5 parser on all four platforms and 74 pages that are faster *or* slower depending on the platform.
* Mac is overall slower even in medians.
Assignee | ||
Comment 5•15 years ago
|
||
Profiled tp4 with Shark:
* The parser thread takes 0.9% of the time.
* _handleWindowNeedsDisplay takes notably more time 11.3% vs. 6.9%
* ProcessGeckoEvents takes a little more time 7.3% vs. 6.8%
* nsPreloadURIs::Run takes less time than the HTML5 parser's speculative runnables.
* PresShell::ReflowEvent and DispatchContentLoadedEvents take more time with the HTML5 parser enabled.
* InputStreamReadyEvent does more stuff in with the HTML5 parser. Particularly for imgrequests.
Assignee | ||
Comment 6•15 years ago
|
||
Main thread also serving as the parser thread:
Linux
* tp4: 515.93 (details)
* tp4_pbytes: 138.5MB (details)
* tp4_rss: 45.9MB (details)
* tp4_shutdown: 889.0 (details)
XP
* tp4: 454.03 (details)
* tp4_%cpu: 55.71 (details)
* tp4_pbytes: 75.5MB (details)
* tp4_memset: 81.2MB (details)
* tp4_shutdown: 2087.0 (details)
Vista
* tp4: 753.73 (details)
* tp4_%cpu: 49.72 (details)
* tp4_pbytes: 85.5MB (details)
* tp4_memset: 74.7MB (details)
* tp4_shutdown: 12051.0 (details)
Mac
* tp4: 797.48 (details)
* tp4_pbytes: 460.5MB (details)
* tp4_rss: 132.6MB (details)
* tp4_shutdown: 1234.0 (details)
Assignee | ||
Comment 7•15 years ago
|
||
Setting responsiveness timers to large values:
Linux
* tp4: 519.27 (details)
* tp4_pbytes: 136.3MB (details)
* tp4_rss: 45.8MB (details)
* tp4_shutdown: 956.0 (details)
Mac
* tp4: 722.51 (details)
* tp4_pbytes: 720.1MB (details)
* tp4_rss: 134.7MB (details)
* tp4_shutdown: 1105.0 (details)
XP
* tp4: 455.56 (details)
* tp4_%cpu: 57.0 (details)
* tp4_pbytes: 76.9MB (details)
* tp4_memset: 81.8MB (details)
* tp4_shutdown: 1573.0 (details)
Vista
* tp4: 883.42 (details)
* tp4_%cpu: 44.1 (details)
* tp4_pbytes: 85.5MB (details)
* tp4_memset: 77.4MB (details)
* tp4_shutdown: 20914.0 (details)
Assignee | ||
Comment 8•15 years ago
|
||
Avoiding flushing append notifications form controls
Linux
* tp4: 521.01 (details)
* tp4_pbytes: 134.4MB (details)
* tp4_rss: 44.7MB (details)
* tp4_shutdown: 834.0 (details)
XP
* tp4: 459.08 (details)
* tp4_%cpu: 59.22 (details)
* tp4_pbytes: 78.2MB (details)
* tp4_memset: 82.7MB (details)
* tp4_shutdown: 2620.0 (details)
Vista
* tp4: 781.15 (details)
* tp4_%cpu: 47.0 (details)
* tp4_pbytes: 85.2MB (details)
* tp4_memset: 74.6MB (details)
* tp4_shutdown: 9894.0 (details)
Mac
* tp4: 721.6 (details)
* tp4_pbytes: 710.8MB (details)
* tp4_rss: 135.2MB (details)
* tp4_shutdown: 871.0 (details)
Assignee | ||
Comment 9•15 years ago
|
||
No metascan
Linux
* tp4: 505.73 (details)
* tp4_pbytes: 136.0MB (details)
* tp4_rss: 46.1MB (details)
* tp4_shutdown: 929.0 (details)
XP
* tp4: 461.44 (details)
* tp4_%cpu: 55.36 (details)
* tp4_pbytes: 75.5MB (details)
* tp4_memset: 82.1MB (details)
* tp4_shutdown: 1572.0 (details)
Vista
* tp4: 803.3 (details)
* tp4_%cpu: 48.31 (details)
* tp4_pbytes: 83.1MB (details)
* tp4_memset: 77.7MB (details)
* tp4_shutdown: 19504.0 (details)
Mac
* tp4: 706.59 (details)
* tp4_pbytes: 694.3MB (details)
* tp4_rss: 134.8MB (details)
* tp4_shutdown: 1061.0 (details)
Assignee | ||
Comment 10•15 years ago
|
||
With the patch from bug 538087 applied:
Linux
* tp4: 490.74 (details)
* tp4_pbytes: 136.6MB (details)
* tp4_rss: 46.3MB (details)
* tp4_shutdown: 1037.0 (details)
XP
* tp4: 432.29 (details)
* tp4_%cpu: 54.83 (details)
* tp4_pbytes: 76.5MB (details)
* tp4_memset: 81.2MB (details)
* tp4_shutdown: 1939.0 (details)
Vista
* tp4: 816.84 (details)
* tp4_%cpu: 46.47 (details)
* tp4_pbytes: 88.5MB (details)
* tp4_memset: 82.6MB (details)
* tp4_shutdown: 8564.0 (details)
Mac
* tp4: 688.91 (details)
* tp4_pbytes: 712.4MB (details)
* tp4_rss: 136.2MB (details)
* tp4_shutdown: 963.0 (details)
Looks really good on Linux and XP and weird on Vista.
Depends on: 538087
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Assignee | ||
Comment 11•15 years ago
|
||
Notes:
* The code still posts a runnable after each </script> albeit a different runnable. This seems to be sufficient to make the parser never slip to doom because one of the runnables returns early. It isn't clear to me what would be strictly necessary to avoid letting the parser slip to doom (returning to the event loop before the parser is done but without anything ever calling to the parser again).
* The current value for the deadline pref in all.js is deliberately over the top. It needs tuning.
* As previously discussed, varying the deadline depending on the recentness of a user event is left to a follow-up patch.
Assignee | ||
Comment 12•15 years ago
|
||
Comment on attachment 429116 [details] [diff] [review]
Instead of returning to the event loop after a </script>, loop within the treeop executor
Please pretend that the deadline pref is set to 750 instead of 4000.
This patch unregresses tp4 on Vista and Linux, but not on Mac. (XP not available on the tryserver.)
I'm running out of ideas on the Mac. I'll reprofile in Shark.
Attachment #429116 -
Flags: review?(bnewman)
Assignee | ||
Comment 13•15 years ago
|
||
Comment on attachment 429116 [details] [diff] [review]
Instead of returning to the event loop after a </script>, loop within the treeop executor
I'll try something else still.
Attachment #429116 -
Flags: review?(bnewman)
Assignee | ||
Comment 14•15 years ago
|
||
Known problem: The "deflection count" of 200 is tuned for number of tokens with the old parser. Now it's used for the number of tree ops. Setting it to 100 or so might be better.
Attachment #429116 -
Attachment is obsolete: true
Assignee | ||
Comment 15•15 years ago
|
||
When speculative loads were runnables, looping within the tree op executor made it possible for the loop to execute tree ops before their corresponding speculative load runnables had run. This caused badness in test_bug503481b.html.
To make sure that speculative loads always start before the corresponding real tree ops run (even if the time distance of the two were so narrow that the speculations aren't really useful anymore), I put speculative loads into an explicit queue that always gets flushed before the op queue.
The code for the new speculative load queue is modeled on the structure of the tree op stage.
Attachment #429524 -
Attachment is obsolete: true
Attachment #430027 -
Flags: review?(bnewman)
Assignee | ||
Comment 16•15 years ago
|
||
Oops. I had forgotten to remove a bogus line about deflections.
Attachment #430027 -
Attachment is obsolete: true
Attachment #430028 -
Flags: review?(bnewman)
Attachment #430027 -
Flags: review?(bnewman)
Assignee | ||
Comment 17•15 years ago
|
||
Tryserver results:
Vista:
* tp4: 661.19 (details)
* tp4_%cpu: 50.96 (details)
* tp4_pbytes: 95.7MB (details)
* tp4_memset: 91.2MB (details)
* tp4_shutdown: 6548.0 (details)
XP:
* tp4: 423.46 (details)
* tp4_%cpu: 59.71 (details)
* tp4_pbytes: 77.8MB (details)
* tp4_memset: 80.8MB (details)
* tp4_shutdown: 1566.0 (details)
Mac:
* tp4: 687.56 (details)
* tp4_pbytes: 698.2MB (details)
* tp4_rss: 134.9MB (details)
* tp4_shutdown: 854.0 (details)
Linux:
* tp4: 471.42 (details)
* tp4_pbytes: 141.1MB (details)
* tp4_rss: 45.7MB (details)
* tp4_shutdown: 902.0 (details)
Yay. Now better than the old parser on all platforms.
Summary: [HTML5] Make the HTML5 parser not regress tp4 → [HTML5][Patch] Make the HTML5 parser not regress tp4
Assignee | ||
Updated•15 years ago
|
Attachment #430028 -
Flags: review?(bnewman) → review?(jonas)
Still trying to understand how the speculative loading works. Moving away from runnables scares me. What will cause us to get in to RunFlushLoop so that we call FlushLoads while we're otherwise stalled waiting for a slow external script waiting to load?
Does FlushLoads/mLoadStage/mLoadQueue etc only deal with speculative loads? If so, please name them FlushSpeculativeLoads/mSpeculativeLoadStage etc. And please add comments for these member variables to describe how they are used.
The flow is also somewhat confusingly named. With one FlushLoads function that moves ops from "queue" to "stage", and another FlushLoads function that moves them from "stage" to "queue". Simply using better names here would likely help.
Is there a reason to speculatively load manifests? First of all they are very rare today, and when they do appear, they generally should appear before any <script>s, thus before we'll ever block, right?
Ok, I think I'm understanding how the speculation is supposed to happen now. Sorry to be arguing about names, but I find that good naming tends to make it a lot easier to understand new code piecewise without having to fit it all in my head before understanding any of it.
So first to confirm that I've actually understood things:
The idea is that we have two sets of queues, the 'op' queue and the 'load' queue. Where the 'load' queue is where we stick all the requests for speculative loads.
In order to prevent first loading a resource, and then speculatively loading it (which would hit the network twice, and cause test_bug503481b.html to fail), we always transfer things in the 'load' queue to the main thread before we transfer things in the 'op' queue to the main thread. And we always process the entire 'load' queue before we transfer anything in the 'op' queue.
Is this correct?
It seems like there is one race condition which could cause us do the real load before the speculative load still. Is there a risk that writing to 'load' queue and the 'op' queue happens while the main thread is in RunFlushLoop after flushing the 'load' queue, but before flushing the 'op' queue?
I suspect the simplest thing would be to keep a hash in the script loader of all loaded scripts and make sure that we never speculatively load something that we've done a real load for. Alternatively, make sure that any time we do a real load, make sure to go through the 'load' queue and purge any pending speculative events for the same uri.
Assignee | ||
Comment 20•15 years ago
|
||
(In reply to comment #18)
> Still trying to understand how the speculative loading works. Moving away from
> runnables scares me. What will cause us to get in to RunFlushLoop so that we
> call FlushLoads while we're otherwise stalled waiting for a slow external
> script waiting to load?
There's also a dedicated runnable for calling FlushLoads only. That runnable will call FlushLoads before RunFlushLoop does when "ops" are actually stalled.
> Does FlushLoads/mLoadStage/mLoadQueue etc only deal with speculative loads? If
> so, please name them FlushSpeculativeLoads/mSpeculativeLoadStage etc. And
> please add comments for these member variables to describe how they are used.
OK. Lately, I've been erring on the side of shorter names to avoid the over 80 characters-wide lines that I get with verbose names.
> The flow is also somewhat confusingly named. With one FlushLoads function that
> moves ops from "queue" to "stage", and another FlushLoads function that moves
> them from "stage" to "queue". Simply using better names here would likely help.
In the case of both "ops" and "loads", there are three queues: one on the main thread, one on the parser thread and one in a staging area in between.
> Is there a reason to speculatively load manifests? First of all they are very
> rare today, and when they do appear, they generally should appear before any
> <script>s, thus before we'll ever block, right?
"Speculative" manifest loads aren't truly speculative--if a manifest gets loaded, we are committed to it. There can never be a <script> before the manifest, so the situation of having to undo a manifest due to document.write() never arises. The reason why a parser thread-discovered manifest gets loaded via "loads" as opposed to "ops" is that the manifest must get processed before any actually speculative loads such as scripts. Thus, manifests seen by the parser thread have to maintain the queue order relative to true speculative loads. See bug 541079.
(Aside: Fixing bug 543062 will have to involve bypassing the "ops" for manifests in document.open()ed docs, too.)
(In reply to comment #19)
> Ok, I think I'm understanding how the speculation is supposed to happen now.
> Sorry to be arguing about names, but I find that good naming tends to make it a
> lot easier to understand new code piecewise without having to fit it all in my
> head before understanding any of it.
>
> So first to confirm that I've actually understood things:
>
> The idea is that we have two sets of queues, the 'op' queue and the 'load'
> queue. Where the 'load' queue is where we stick all the requests for
> speculative loads.
>
> In order to prevent first loading a resource, and then speculatively loading it
> (which would hit the network twice, and cause test_bug503481b.html to fail), we
> always transfer things in the 'load' queue to the main thread before we
> transfer things in the 'op' queue to the main thread. And we always process the
> entire 'load' queue before we transfer anything in the 'op' queue.
>
> Is this correct?
Correct.
> It seems like there is one race condition which could cause us do the real load
> before the speculative load still. Is there a risk that writing to 'load' queue
> and the 'op' queue happens while the main thread is in RunFlushLoop after
> flushing the 'load' queue, but before flushing the 'op' queue?
Looks like there is in the case where the parser thread is not parsing ahead after a </script> but is transferring data to the main thread through the op stage. Oops...
> I suspect the simplest thing would be to keep a hash in the script loader of
> all loaded scripts and make sure that we never speculatively load something
> that we've done a real load for. Alternatively, make sure that any time we do a
> real load, make sure to go through the 'load' queue and purge any pending
> speculative events for the same uri.
It seems to me a simpler fix would be unifying the load stage and the op stage into one object with one mutex and making the main thread read a single method for the case where ops are read from stage.
Assignee | ||
Comment 21•15 years ago
|
||
One more thing:
> In the case of both "ops" and "loads", there are three queues: one on the main
> thread, one on the parser thread and one in a staging area in between.
For "ops" the staging area in only used when the parser thread is transferring data to the main thread without setting it aside in a speculation. When a speculation has been set aside or when processing document.write()-created ops, there's no point in using the staging area, so it's not used in those cases.
I would fairly strongly err on the side of descriptive names and fewer abstractions in order to make it more understandable what the code is doing by simply looking at it. So for example I'd also prefer to see the nsHtml5SpeculativeLoadStage class go away and have the treebuilder hold a strong reference to the nsHtml5TreeOpExecutor directly and have a lock living there which you explicitly grab. No need to go through a level of abstraction IMHO.
(In reply to comment #20)
> In the case of both "ops" and "loads", there are three queues: one on the main
> thread, one on the parser thread and one in a staging area in between.
I think more descriptive names, as well as comments where the members are declared, would help with understanding this a lot.
> "Speculative" manifest loads aren't truly speculative--if a manifest gets
> loaded, we are committed to it. There can never be a <script> before the
> manifest, so the situation of having to undo a manifest due to document.write()
> never arises. The reason why a parser thread-discovered manifest gets loaded
> via "loads" as opposed to "ops" is that the manifest must get processed before
> any actually speculative loads such as scripts. Thus, manifests seen by the
> parser thread have to maintain the queue order relative to true speculative
> loads. See bug 541079.
Please document this thoroughly given that it's somewhat unexpected usage of the speculative queues.
> It seems to me a simpler fix would be unifying the load stage and the op stage
> into one object with one mutex and making the main thread read a single method
> for the case where ops are read from stage.
That would be ok too. Though eventually we'll probably want to use lock-less queues to transfer these objects from the parser thread to the main thread. But we can worry about that later.
Assignee | ||
Comment 23•15 years ago
|
||
(In reply to comment #22)
> I would fairly strongly err on the side of descriptive names and fewer
> abstractions in order to make it more understandable what the code is doing by
> simply looking at it.
OK.
> So for example I'd also prefer to see the
> nsHtml5SpeculativeLoadStage class go away and have the treebuilder hold a
> strong reference to the nsHtml5TreeOpExecutor directly and have a lock living
> there which you explicitly grab. No need to go through a level of abstraction
> IMHO.
Surely for the purposes of this review, nsHtml5SpeculativeLoadStage can stay since this patch doesn't introduce it and its existence has been previously reviewed? The purpose of nsHtml5SpeculativeLoadStage is to avoid per-tree op locking on one hand and on the other hand make it obvious which members are protected by the mutex. Furthermore, since nsHtml5TreeOpExecutor is itself a lockless tree op sink, there needs to be another object that is the lock-protected tree op sink.
> (In reply to comment #20)
> > "Speculative" manifest loads aren't truly speculative--if a manifest gets
> > loaded, we are committed to it. There can never be a <script> before the
> > manifest, so the situation of having to undo a manifest due to document.write()
> > never arises. The reason why a parser thread-discovered manifest gets loaded
> > via "loads" as opposed to "ops" is that the manifest must get processed before
> > any actually speculative loads such as scripts. Thus, manifests seen by the
> > parser thread have to maintain the queue order relative to true speculative
> > loads. See bug 541079.
>
> Please document this thoroughly given that it's somewhat unexpected usage of
> the speculative queues.
OK.
> > It seems to me a simpler fix would be unifying the load stage and the op stage
> > into one object with one mutex and making the main thread read a single method
> > for the case where ops are read from stage.
>
> That would be ok too.
I'll fix it that way.
Assignee | ||
Updated•15 years ago
|
Attachment #430028 -
Attachment is obsolete: true
Attachment #430028 -
Flags: review?(jonas)
Assignee | ||
Comment 24•15 years ago
|
||
Attachment #432779 -
Flags: review?(bnewman)
Comment 25•15 years ago
|
||
(In reply to comment #24)
> Created an attachment (id=432779) [details]
> Address comments from sicking
My usual technique for getting a reliable interdiff (apply one patch, use patch -R to reverse it, then use patch to apply the other patch, then hg diff) doesn't seem to be producing correct results, probably due to bit-rot.
Would it be possible for you to post an interdiff, Henri?
Assignee | ||
Comment 26•15 years ago
|
||
Here's the interdiff.
Comment 27•15 years ago
|
||
Comment on attachment 432779 [details] [diff] [review]
Address comments from sicking
These changes look sound to me, and appear to address the points raised in the discussion above.
My only suggestion is that you stick to the same style between MoveOpsAndSpeculativeLoadsTo, MoveSpeculativeLoadsFrom, and MoveSpeculativeLoadsTo with respect to early return vs. else clause:
diff --git a/parser/html/nsHtml5TreeOpStage.cpp b/parser/html/nsHtml5TreeOpStage.cpp
--- a/parser/html/nsHtml5TreeOpStage.cpp
+++ b/parser/html/nsHtml5TreeOpStage.cpp
@@ -79,22 +79,22 @@ nsHtml5TreeOpStage::MoveSpeculativeLoads
{
mozilla::MutexAutoLock autoLock(mMutex);
if (mSpeculativeLoadQueue.IsEmpty()) {
mSpeculativeLoadQueue.SwapElements(aSpeculativeLoadQueue);
- return;
+ } else {
+ mSpeculativeLoadQueue.MoveElementsFrom(aSpeculativeLoadQueue);
}
- mSpeculativeLoadQueue.MoveElementsFrom(aSpeculativeLoadQueue);
}
void
nsHtml5TreeOpStage::MoveSpeculativeLoadsTo(nsTArray<nsHtml5SpeculativeLoad>& aSpeculativeLoadQueue)
{
mozilla::MutexAutoLock autoLock(mMutex);
if (aSpeculativeLoadQueue.IsEmpty()) {
mSpeculativeLoadQueue.SwapElements(aSpeculativeLoadQueue);
- return;
+ } else {
+ aSpeculativeLoadQueue.MoveElementsFrom(mSpeculativeLoadQueue);
}
- aSpeculativeLoadQueue.MoveElementsFrom(mSpeculativeLoadQueue);
}
#ifdef DEBUG
void
Attachment #432779 -
Flags: review?(bnewman) → review+
Assignee | ||
Comment 28•15 years ago
|
||
(In reply to comment #27)
> (From update of attachment 432779 [details] [diff] [review])
> These changes look sound to me, and appear to address the points raised in the
> discussion above.
>
> My only suggestion is that you stick to the same style between
> MoveOpsAndSpeculativeLoadsTo, MoveSpeculativeLoadsFrom, and
> MoveSpeculativeLoadsTo with respect to early return vs. else clause:
Thanks!
Pushed with 'else' instead of early 'return' in nsHtml5TreeOpStage.cpp:
http://hg.mozilla.org/mozilla-central/rev/61fb9e7374eb
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•