Closed Bug 481566 Opened 11 years ago Closed 11 years ago

Content sink needs to be more responsive

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 1.0b1+ ---

People

(Reporter: pavlov, Assigned: sicking)

References

(Depends on 1 open bug)

Details

(Keywords: fixed1.9.1, mobile)

Attachments

(6 files, 12 obsolete files)

637 bytes, patch
mfinkle
: review+
Details | Diff | Splinter Review
59.04 KB, text/plain
Details
58.94 KB, text/plain
Details
14.70 KB, text/plain
Details
27.49 KB, patch
sicking
: review+
sicking
: superreview+
Details | Diff | Splinter Review
27.12 KB, patch
Details | Diff | Splinter Review
Attached patch content sink changes (obsolete) — Splinter Review
This is basically spun off of bug 461031.

We never landed https://bugzilla.mozilla.org/attachment.cgi?id=345387, and I think we need to.  I've been using a slightly more hack&slash version in the last couple of Fennec alphas with good results.

We need to get this landed in some form...
tracking-fennec: --- → 1.0b1+
Flags: blocking1.9.1+
Priority: -- → P2
Attachment #365594 - Flags: review?(jonas)
Comment on attachment 365594 [details] [diff] [review]
Fennec preferences

>diff --git a/app/mobile.js b/app/mobile.js

> /* session history */
>-pref("browser.sessionhistory.max_total_viewers", 0);
>+pref("browser.sessionhistory.max_total_viewers", 1);

Skip this for now?
Attachment #365594 - Flags: review?(jonas) → review+
Just for my own sanity, this is what the current code does:


if (!dynlower && noNewEvents && ++mDeflectedCount < 200) {
  return;
}
mDeflectedCount = 0;

getCurrentTime();

if (secsSinceLoadStart > 2) {
  if (secsSinceLastEvent < .75) {
    if (!dynlower) {
      favorPerf(false);
    }
    dynlower = true;
  }
  else {
    if (dynlower) {
      favorPerf(true);
    }
    dynlower = false;
  }
}

if (dynlower && secsSinceParseStart > .003 ||
    !dynlower && secsSinceParseStart > .360) {
  return goToMainEventLoop;
}

return;
This is what I plan to replace it with.

onParseStart() {
 currTime = getCurrentTime();
 ensureInteractive(secsSinceLoadStart > 2 && secsSinceLastEvent < .75);

 mDeflectedCount = dynlower ? 10 : 200;

 mEndTime = currTime + (dynlower ? .003 : .360);
}

onToken() {
 if (!mCanInterruptParser || (--mDeflectedCount > 0 && noNewEvents)) {
  return;
 }

 mDeflectedCount = dynlower ? 10 : 200;

 if (getCurrentTime() > mEndTime) {
  return goToMainEventLoop;
 }
}



The !mCanInterruptParser is actually in the current code as well, missed that.
Assignee: nobody → jonas
Attached patch Patch (obsolete) — Splinter Review
So this cleans up the code and adds a more sensible set of prefs.

For now this patch should not really affect behavior on trunk, but we should definitely play around with the prefs and see if we can get better behavior. I'll document the preferences separately.
Attachment #365592 - Attachment is obsolete: true
Attachment #368625 - Flags: superreview?(mrbkap)
Attachment #368625 - Flags: review?(mrbkap)
Attachment #365592 - Flags: review?(jonas)
Attached patch Slightly better patch (obsolete) — Splinter Review
Attachment #368625 - Attachment is obsolete: true
Attachment #368628 - Flags: superreview?(mrbkap)
Attachment #368628 - Flags: review?(mrbkap)
Attachment #368625 - Flags: superreview?(mrbkap)
Attachment #368625 - Flags: review?(mrbkap)
Attached patch Slightly better patch (obsolete) — Splinter Review
Attachment #368628 - Attachment is obsolete: true
Attachment #368628 - Flags: superreview?(mrbkap)
Attachment #368628 - Flags: review?(mrbkap)
Attached patch With prefs being read correctly (obsolete) — Splinter Review
Attachment #368630 - Attachment is obsolete: true
Attachment #368636 - Flags: superreview?(mrbkap)
Attachment #368636 - Flags: review?(mrbkap)
So the following prefs control when we return to the main event loop. The default values, which should match what trunk is doing now, is in parenthesis.


content.sink.interactive_deflect_count (0)
content.sink.perf_deflect_count (200)

This is how many tokens we deflect before checking calling PR_IntervalNow() to see if we need to return to the event loop. These just exist to avoid calling PR_IntervalNow() for every token.

The first one is the deflect count we'll use in interactive mode, the second in perf mode.

content.sink.interactive_parse_time (3000)
content.sink.perf_parse_time (360000)

How much time we spend parsing before we return to the event loop. First is in interactive mode, second in perf mode.

content.sink.pending_event_mode (1)

Controls if we check for pending events, and what we do if we see a pending event. Note that checking for pending events is only implemented on windows and os/2 so far.

0 = don't check for pending events.
1 = check for pending events. If there is a pending event, ignore the deflect count and always check to see if it's time to return to the main event loop
2 = check for pending events and if there are any immediately return to the main event loop.

content.sink.event_probe_rate (1)

How often to check for pending events. Only there to avoid spending time checking for pending events too often. (strangly enough we do this every token on trunk right now, even though I would have assumed it's a fairly expensive operation on windows)


content.sink.interactive_time (750000)

How long after a user event we should say in interactive mode.

content.sink.initial_perf_time (2000000)

How long after starting a page load we should ignore user events and always stay in perf mode. 

content.sink.enable_perf_mode (0)

Controls if we switch between perf and interactive mode at all.
0 = switch between perf and interactive mode based on the two above preferences
1 = Always stay in interactive mode
2 = Always stay in perf mode


The initial patch in this bug basically gave us the following perf settings

content.sink.interactive_deflect_count (10) // doesn't matter 
content.sink.perf_deflect_count (10)
content.sink.interactive_parse_time (50000) // doesn't matter 
content.sink.perf_parse_time (50000)
content.sink.pending_event_mode (0)
content.sink.event_probe_rate (1) // doesn't matter 
content.sink.interactive_time (0) // doesn't matter 
content.sink.initial_perf_time (0) // doesn't matter 
content.sink.enable_perf_mode (2)

This meant that we always stayed in perf mode and never probed for pending events. But we did have a relatively low deflect count, and we parsed for 50ms before returning to the main event loop.

It would be interesting to see that setting those prefs still yield the same behavior.


I would also try something like:
content.sink.interactive_deflect_count (10) 
content.sink.perf_deflect_count (10)
content.sink.interactive_parse_time (50000)
content.sink.perf_parse_time (50000)
content.sink.pending_event_mode (0)
content.sink.event_probe_rate (1)
content.sink.interactive_time (0)
content.sink.initial_perf_time (0)
content.sink.enable_perf_mode (1)

In other words, be in interactive mode rather than perf mode since that prioritizes user events while loading.
Attachment #368636 - Flags: superreview?(mrbkap)
Attachment #368636 - Flags: superreview+
Attachment #368636 - Flags: review?(mrbkap)
Attachment #368636 - Flags: review+
Comment on attachment 368636 [details] [diff] [review]
With prefs being read correctly

> NS_IMETHODIMP
> nsContentSink::ScriptEvaluated(nsresult aResult,
>                                nsIScriptElement *aElement,
>                                PRBool aIsInline)
> {
>+  mDeflectedCount = mPerfDeflectCount;

Please comment on why you do this.

> nsContentSink::DidProcessATokenImpl()
>+
> 

Nit: extra new line here.

>+    if (mDynamicLowerValue != newDynLower) {

Rename mDynamicLowerValue to something that makes sense.

r+sr=mrbkap with those comments addressed.
Whiteboard: [needs updated patch]
Attached patch bustage fix (obsolete) — Splinter Review
The windows nsWindow.cpp change wasn't landed, and both the windows and os/2 parts of the patch were wrong (assigning to *aHasPending rather than aHasPending).

I landed this as a bustage fix:
http://hg.mozilla.org/mozilla-central/rev/ec2bec209571
Hmm.  So the event thing doesn't help Fennec, since it's Windows-only, right?

For interruptible reflow, we decided to interrupt on process-wide events; is there a reason for the sink here to only interrupt on events in the current window?  This means that events on a page won't interrupt sinks in subframes or in background tabs, for example.
Thanks Gavin!

bz: the goal of this patch was to install better prefs for the current code we have, so that fennec can use what we have and just change prefs rather than having to comment out big hunks of code.

I do agree that using process wide checks makes more sense. Is that easily doable without loosing perf? Even on windows?
Ah, ok.  I thought changing the interactivity thing was one of the goals.

For the rest, I misread the code; the Windows code is in fact checking the global queue (GetQueueStatus); the only difference there from what interruptible reflow is doing is that it's not bothering with masking of HIWORD.  That's equivalent, since the low word always has a subset of the bits in the high word set.  I got confused with the OS/2 code looking so much like Windows... ;)
I keep crashing in DidProcessATokenImpl() because mParser is NULL. Seems to be often associated with opening a popup window. Attached is the backtrace from gdb.
Attached file backtrace of crash from comment #15 (obsolete) —
In fact, all the unit test machines were orange with crashes in crashtest, mochitest, mochitest-chrome, and mochitest-a11y with that stack.

The 10 failing reftests I can't explain, but also seem to be due to this patch.

I've backed the patch out for now.
The updated patch should change the comments in nsIWidget.

Though I'm going to add HasPendingEvent alongside the existing GetLastInputEventTime in bug 67752, so depending on landing order we'll need to merge one way or another.

Also, can HasPendingEvent() just return a PRBool instead of using an out param?  Would be really nice!
Same patch as previous, applies cleanly on top of incremental reflow patch
(In reply to comment #15)
> I keep crashing in DidProcessATokenImpl() because mParser is NULL. Seems to be
> often associated with opening a popup window. Attached is the backtrace from
> gdb.

What site(s) are you most frequently seeing this on?
I have the crash fixed locally, just needed a simple null-check.

Still tracking down the layout regressions though:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239782535.1239789923.15087.gz&fulltext=1
Attached patch Still causes layout regressions (obsolete) — Splinter Review
Attachment #371008 - Attachment is obsolete: true
Attachment #371022 - Attachment is obsolete: true
Attachment #372925 - Attachment is obsolete: true
This doesn't apply any more, specifically:

@@ -407,6 +404,8 @@
                                nsIScriptElement *aElement,
                                PRBool aIsInline)
 {
+  mDeflectedCount = mPerfDeflectCount;
+
   if (mParser) {
     mParser->ScriptDidExecute();
   }
Comment on attachment 375741 [details] [diff] [review]
Up to tip and some changes from roc. Still has layout problems

This is the wrong patch.
Attachment #375741 - Attachment is obsolete: true
Attached file Frame dump with patch
So this framedump shows what looks like a broken frame tree. On line 507, in both frame dumps, there is a line frame. The line, without my patch, contains:

line 0x2bc8b2c: count=1 state=block,clean,prevmarginclean,not impacted,not wrapped,before:nobr,after:nobr[0x4108] {900,13500,37440,6300} <

With my patch the [0x4108] is changed to [0x4148]. Which after talking with dbaron and roc seems like a state we should not be able to be in after we're done reflowing.

So I need help from layout folks here. Any ideas for how to debug this?
The layout changes in the above patch was a suggestion from roc to fix the problem, but didn't seem to make a difference.
So that's the mInvalidateTextRuns flag, right?  That's set to true in all sorts of dynamic-append and dynamic-insert cases in nsBlockFrame, but only cleared for non-block lines.  The line here (the one containing the table) is a block line.  So I can definitely see us ending up with that flag set for a block line depending on when notifications happen, and once it gets there it'll stick around on that line.  I wouldn't think that would cause observable layout differences...
Attached file Frame dump diff
Ah, ok. So discounting that difference, the differences seem to appear around line 512-513.

Before:
BCTableCell(td)(0)@0x2bd2b08 next=0x2bd2ce4 {0,0,3540,1140} [content=0x42ed4ce0] [overflow=-180,-120,3780,1380] [sc=0x2bd2760]<
  Block(td)(0)@0x2bd2bb4 {360,180,3000,600} [state=00d00000] sc=0x2bd2b64(i=0,b=1) pst=:-moz-cell-content<

With my patch:
BCTableCell(td)(0)@0x2b83908 next=0x2b83ae4 {0,0,3540,1260} [content=0x404b1ad0] [overflow=-180,-120,3780,1500] [sc=0x2b83560]<
  Block(td)(0)@0x2b839b4 {360,240,3000,600} [state=00d00000] sc=0x2b83964(i=0,b=1) pst=:-moz-cell-content<



The BCTableCell(td) in my patched build is somewhat taller, and the Block(td) inside it has a somewhat higher top coordinate.

Attaching a frametree diff where all pointers have been replaced by $number$ in order to make the diff readable.
What's the HTML those frame dumps are for?  It looks like that entire first row is just 2px taller in one build than the other, but it's hard to tell from that dump why it has the height it has (about 20px) at all, given that all the blocks I see there are about 10px tall.
This is border-collapse-table-cell.html. Though most files in that directory seems to fail the same way.
Make that several, not most. Seems like several files have very similar markup.
Hmm.  Is the issue that we're currently not notifying within <tr> (due to the monolithic container stuff) on those testcases and with this patch we suddenly do?  Bernd, are there known bc dynamic change bugs that could cause this layout difference?  Can we reproduce the issue without this patch by building the table via DOM calls?
I will look into this over the weekend.
Awesome, thanks a ton!
Attached patch Should workSplinter Review
This one should work. Turns out that parser->CanInterrupt lies during WillParse :(
Attachment #368636 - Attachment is obsolete: true
Attachment #375747 - Attachment is obsolete: true
Attachment #377799 - Flags: superreview+
Attachment #377799 - Flags: review+
You presumably want to take the printfs out, right?
It would also help if the patch would apply to trunk, it does not due to bz checkin for bug 67752.
the widget code seems to have some of the changes already
Attached patch forced apply (obsolete) — Splinter Review
I tried to apply attachment 377799 [details] [diff] [review] and did some hacking changes in widget due to the above described conflict. 

* I could not reproduce the blocking reftest failures in my XP Debug build *

It might be worth to fix the patch and send it over try server
At least its difficult for me to judge which bug might be the cause if I can not reproduce it.
Attachment #377836 - Attachment is obsolete: true
(In reply to comment #42)
> I tried to apply attachment 377799 [details] [diff] [review] and did some hacking changes in widget due
> to the above described conflict. 
> 
> * I could not reproduce the blocking reftest failures in my XP Debug build *

Bernd, I think you'll need an older version of Jonas' patch to trigger the layout bug, i.e. attachment 375747 [details] [diff] [review], and you'd probably be best off using a copy of the source from before bz's interruptible reflow patches landed (hg up -r 47e935bd7f3b or earlier). Jonas' last patch makes us not trigger the layout bug, but the word on the street is that the bug is still there.
Jonas (or bz, or mrbkap), do we still need the widget changes here, even after bz landed his interruptible reflow changes? If so, we'll need and updated patch, it's not obvious to me how to merge your changes with bz's. If we don't need your changes in the widget code, I've got your patch tweaked to compile with current trunk (no widget changes at all), and I could land that if that's really all we want. Please let me know...
We have two options.  We can either make this patch use my widget changes or have this patch add a new function it uses just like it does now.  On Windows, there's no difference between the two.  On Linux and Mac the latter approach would mean not getting any content sink interrupts on user events (as now).

The former approach has the benefit of trying to interrupt the sink on mac and linux too, and the drawback that those "check for user input" calls might be more expensive than on Windows, possibly.
I am currently 

at changeset:   27990:edf759b1ba4a
user:        Dave Townsend <dtownsend@oxymoronical.com>
date:        Tue May 05 12:45:53 2009 +0100
summary:     Bug 486195: getItemLocation on WinRegInstallLocation should return
a clone of the item location. r=robstrong

and applied attachment 377799 [details] [diff] [review]

nevertheless this is WFM under winxp

Are the failures on a MAC? if so bug 477504 comes into mind. Otherwise as I said I would recommend to push this over the try server.
Looks like it stuck. I'll file a separate bug on the layout issue.
Whiteboard: [needs updated patch]
Component: Content → DOM
QA Contact: content → general
For mobile, it caused a performance improvement, almost 1000ms:

http://graphs.mozilla.org/#show=3006905,3006921,3006936
(In reply to comment #48)
> This appears to have caused a performance regression:
> http://graphs-new.mozilla.org/graph.html#tests=[{%22machine%22:51,%22test%22:2,%22branch%22:1},{%22machine%22:52,%22test%22:2,%22branch%22:1},{%22machine%22:53,%22test%22:2,%22branch%22:1}]&sel=1242662700,1242835500

The Tp3 regression seems to have been caused by something else? The graphs show a blip that then returned to normal. Should reload that link again later in the day and see if they've levelled out.
(In reply to comment #50)
> The Tp3 regression seems to have been caused by something else? The graphs show
> a blip that then returned to normal. Should reload that link again later in the
> day and see if they've levelled out.

Sorry, I replied in the newsgroup but not here. I checked in a fix to the Tp3 regression.
Te fix sicking mentions in comment 51 was http://hg.mozilla.org/mozilla-central/rev/9731a54fbb99 : just disabling interrupt on user events on non-Windows platforms.
Soooo ... RESO FIXED?
Aye, FIXED.  On to 1.9.1!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
So Jonas has this mostly ported to 1.9.1, but is unfortunately out for a few hours, will attempt to get the patch into the bug later today.
FWIW, this bug's checkin added 4 new unsigned/signed comparison build-warnings:

> nsContentSink.cpp: In member function ‘nsresult nsContentSink::DidProcessATokenImpl()’:
> nsContentSink.cpp:1545: warning: comparison between signed and unsigned integer expressions
> nsContentSink.cpp:1552: warning: comparison between signed and unsigned integer expressions
> nsContentSink.cpp: In member function ‘nsresult nsContentSink::WillParseImpl()’:
> nsContentSink.cpp:1692: warning: comparison between signed and unsigned integer expressions
> nsContentSink.cpp:1693: warning: comparison between signed and unsigned integer expressions

Links to guilty lines, in the source:
> http://hg.mozilla.org/mozilla-central/annotate/11a3e4008446/content/base/src/nsContentSink.cpp#l1545
> http://hg.mozilla.org/mozilla-central/annotate/11a3e4008446/content/base/src/nsContentSink.cpp#l1552
> http://hg.mozilla.org/mozilla-central/annotate/11a3e4008446/content/base/src/nsContentSink.cpp#l1692
> http://hg.mozilla.org/mozilla-central/annotate/11a3e4008446/content/base/src/nsContentSink.cpp#l1693
Attached patch Branch patchSplinter Review
Branch patch. Includes porting of Boris' implementation of HasPendingInputEvents. Was quite simple to port once I went for the simple solution of just adding nsIWidget_1_9_1_BRANCH. However since this requires an extra QI and thus might be a slight perf hit, I set the content.sink.event_probe_rate pref to 3 instead of the default 1. This should more than offset the perf hit.
sicking: Any warning fixes in there? rs=me on the patch to fix them on m-c.

/be
Fixed in 1.9.1: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a7eb03446bed

I'll check in a fix for the warnings this weekend
Keywords: fixed1.9.1
No longer depends on: 502168
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.