Last Comment Bug 650189 - Recent performance regression in loading huge tinderbox logs
: Recent performance regression in loading huge tinderbox logs
Status: RESOLVED FIXED
: dogfood, perf, regression
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Simon Montagu :smontagu
:
Mentors:
http://tinderbox.mozilla.org/showlog....
Depends on:
Blocks: 263359
  Show dependency treegraph
 
Reported: 2011-04-14 19:40 PDT by :Ehsan Akhgari
Modified: 2013-12-27 14:20 PST (History)
23 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
w-i-p (9.19 KB, patch)
2011-04-20 12:15 PDT, Simon Montagu :smontagu
no flags Details | Diff | Splinter Review

Description :Ehsan Akhgari 2011-04-14 19:40:49 PDT
See the URL for example (warning: it might kill your browser).

I've recently (the past couple of weeks maybe) am seeing a huge regression in loading these pages.  I leave Firefox for like 5 minutes and then kill it, because it's absolutely unresponsive.

I don't have an --enable-profiling opt build to profile this with right now, but shark on our (useless for profiling) nightlies shows me some nsTextFrame stuff.

This is the worst regression I've ever seen.  Can somebody profile please?
Comment 1 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-14 19:47:50 PDT
Simon is the most likely culprit :-)
Comment 2 Josh Matthews [:jdm] (away until 9/3) 2011-04-14 19:56:11 PDT
I was loading merrily tinderbox logs on a nightly no older than a couple of days.
Comment 3 Boris Zbarsky [:bz] 2011-04-14 20:00:47 PDT
> but shark on our (useless for profiling) nightlies

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-6.0a1.en-US.mac-shark.dmg tends to have a useful for profiling nightly.

I'll take a look if someone doesn't beat me to it, but I suspect regression range may be more useful than profile.
Comment 4 Alice0775 White 2011-04-14 20:25:57 PDT
Please double check the following range.

Regression window(m-c hourly):
Works:
http://hg.mozilla.org/mozilla-central/rev/d114ab94a859
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411100945
Broken:
http://hg.mozilla.org/mozilla-central/rev/f65b79eeabd4
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411101509
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d114ab94a859&tochange=f65b79eeabd4

Regression window(cedar hourly):
works:
http://hg.mozilla.org/projects/cedar/rev/db98c4f46347
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110410231038
Broken:
http://hg.mozilla.org/projects/cedar/rev/2ad43389d244
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110411010443
Pushlog:
http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244
Comment 5 Justin Wood (:Callek) (Away until Aug 29) 2011-04-14 20:32:09 PDT
(In reply to comment #4)
> Regression window(cedar hourly):
> Pushlog:
> http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244

Which is smontagu, as roc suspected. thanks alice
Comment 6 :Ehsan Akhgari 2011-04-14 21:21:36 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > Regression window(cedar hourly):
> > Pushlog:
> > http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244
> 
> Which is smontagu, as roc suspected. thanks alice

Yeah, makes sense to me.

We should investigate to see if this problem can show up in other web pages, and back out bug 263359 from aurora.
Comment 7 Simon Montagu :smontagu 2011-04-14 21:43:42 PDT
Very strange, because all my testing showed a huge improvement in loading tinderbox logs from bug 263359. Why is this tinderbox log different from all other tinderbox logs?
Comment 8 Simon Montagu :smontagu 2011-04-14 23:01:19 PDT
Ah, I was forgetting that bug 646359 got split out from bug 263359. Testing with the patch from there applied makes the test URL load speedily for me (a local copy of it, that is).

I'm still surprised that bug 263359 made things significantly worse, though.
Comment 9 Simon Montagu :smontagu 2011-04-14 23:12:54 PDT
On second thoughts, testing with the local copy doesn't prove anything because it gets decoded as Latin1
Comment 10 Simon Montagu :smontagu 2011-04-14 23:44:55 PDT
Bug 646359 doesn't help with this.
Comment 11 Simon Montagu :smontagu 2011-04-15 00:26:57 PDT
So this is basically the same old problem with incremental bidi resolution in long paragraphs that we have talked about in the past, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=566066#c2.

Bug 263359, which was supposed to help with that, was a performance win with smaller files, but it seems that there is a critical mass above which doing bidi resolution on each line separately takes longer than doing it on the whole block.

FTR, when I saw performance improvements I was testing with tinderbox logs around 10MB, 85K lines; the log in the URL is just under 32MB, 146K lines.
Comment 12 :Ehsan Akhgari 2011-04-15 13:53:00 PDT
Does that mean that we can't effectively get good perf in both cases (long lines and large blocks)?
Comment 13 Simon Montagu :smontagu 2011-04-16 13:45:00 PDT
I'm sure there are other things we can do. One idea that I want to investigate is a NS_LINE_NEEDS_BIDI_RESOLUTION flag for preformatted elements, similar to the NS_BLOCK_NEEDS_BIDI_RESOLUTION that we already use.
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-17 17:07:00 PDT
Why is doing bidi resolution on each line separately a *huge* regression from doing it on the whole block? That sounds surprising to me. Are we sure that's all that's going on here?
Comment 15 Sheila Mooney 2011-04-19 15:35:13 PDT
Why are we nominating this for Aurora? What are the risks?
Comment 16 JP Rosevear [:jpr] 2011-04-19 15:40:13 PDT
Will track, we need to get to a solid root cause analysis and find out if backout or something else is a good strategy.
Comment 17 :Ehsan Akhgari 2011-04-20 11:06:37 PDT
(In reply to comment #15)
> Why are we nominating this for Aurora? What are the risks?

We know that the patches landed in bug 263359 (which is on Aurora) have caused huge performance regressions for at least Tinderbox log pages.  We don't know whether this regression can also be observed on other (regular) web pages or not.  We need to track this on Aurora, to make sure that we back out bug 263359 if we need to.
Comment 18 Simon Montagu :smontagu 2011-04-20 12:15:14 PDT
Created attachment 527349 [details] [diff] [review]
w-i-p

It's cool when working on one bug provides illumination for another one...

At least part of this was caused by moving the initialization of the nsBlockInFlowLineIterator from Resolve() into ResolveParagraph() in http://hg.mozilla.org/mozilla-central/rev/ad5009233ae9. 

That can't be the whole story though, because it doesn't fit the regression range, and wasn't checked in to Aurora. This patch also regresses bug 650189 :(
Comment 19 Daniel Cater 2011-04-20 12:27:33 PDT
(In reply to comment #18)
> This patch also regresses bug 650189 :(

This is bug 650189.
Comment 20 Simon Montagu :smontagu 2011-04-20 12:39:19 PDT
The patch regress *bug 645193*
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-20 16:43:39 PDT
I think we should do a backout to fix this, both on Aurora and mozilla-central.
Comment 22 Simon Montagu :smontagu 2011-04-20 22:05:14 PDT
Can someone double check whether the regression exists in Aurora before backing out there? I'm still seeing improvement rather than regression in Aurora compared to FF4, but my network connection speed is very irregular and I don't know how reliable my data are.
Comment 23 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-20 22:14:07 PDT
It should be possible to simulate a testcase with a script loop that document.write()s chunks of text and then calls document.body.offsetHeight to flush reflow periodically.
Comment 24 :Ehsan Akhgari 2011-04-21 09:58:24 PDT
Using the link in the URL field, I can reproduce the exact regression on Aurora as well.  Things *might* be different under slow connections, in which case roc's suggestion in comment 23 might help, but I think that we have enough information now to assert that this regression exists both on mozilla-central and Aurora.
Comment 25 JP Rosevear [:jpr] 2011-04-26 15:01:18 PDT
Since there is no proposed fix, please backout as per #21 as soon as possible.
Comment 26 :Ehsan Akhgari 2011-04-26 15:42:06 PDT
(In reply to comment #25)
> Since there is no proposed fix, please backout as per #21 as soon as possible.

Simon, can you please take care of this?  I'm not sure which patches to back out exactly.
Comment 28 Simon Montagu :smontagu 2011-04-27 01:56:30 PDT
http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b

Ehsan, can you confirm that the backout fixes the regression?
Comment 29 :Ehsan Akhgari 2011-04-28 11:05:32 PDT
(In reply to comment #28)
> http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b
> 
> Ehsan, can you confirm that the backout fixes the regression?

It does, yes.  Thanks!

Note You need to log in before you can comment on or make changes to this bug.