The default bug view has changed. See this FAQ.

Recent performance regression in loading huge tinderbox logs

RESOLVED FIXED

Status

()

Core
Layout: Text
RESOLVED FIXED
6 years ago
3 years ago

People

(Reporter: Ehsan, Assigned: smontagu)

Tracking

({dogfood, perf, regression})

Trunk
x86
Mac OS X
dogfood, perf, regression
Points:
---

Firefox Tracking Flags

(firefox5+ fixed)

Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
See the URL for example (warning: it might kill your browser).

I've recently (the past couple of weeks maybe) am seeing a huge regression in loading these pages.  I leave Firefox for like 5 minutes and then kill it, because it's absolutely unresponsive.

I don't have an --enable-profiling opt build to profile this with right now, but shark on our (useless for profiling) nightlies shows me some nsTextFrame stuff.

This is the worst regression I've ever seen.  Can somebody profile please?
Simon is the most likely culprit :-)
Assignee: nobody → smontagu

Comment 2

6 years ago
I was loading merrily tinderbox logs on a nightly no older than a couple of days.
> but shark on our (useless for profiling) nightlies

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-6.0a1.en-US.mac-shark.dmg tends to have a useful for profiling nightly.

I'll take a look if someone doesn't beat me to it, but I suspect regression range may be more useful than profile.

Comment 4

6 years ago
Please double check the following range.

Regression window(m-c hourly):
Works:
http://hg.mozilla.org/mozilla-central/rev/d114ab94a859
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411100945
Broken:
http://hg.mozilla.org/mozilla-central/rev/f65b79eeabd4
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411101509
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d114ab94a859&tochange=f65b79eeabd4

Regression window(cedar hourly):
works:
http://hg.mozilla.org/projects/cedar/rev/db98c4f46347
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110410231038
Broken:
http://hg.mozilla.org/projects/cedar/rev/2ad43389d244
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110411010443
Pushlog:
http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244
(In reply to comment #4)
> Regression window(cedar hourly):
> Pushlog:
> http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244

Which is smontagu, as roc suspected. thanks alice
(Reporter)

Comment 6

6 years ago
(In reply to comment #5)
> (In reply to comment #4)
> > Regression window(cedar hourly):
> > Pushlog:
> > http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244
> 
> Which is smontagu, as roc suspected. thanks alice

Yeah, makes sense to me.

We should investigate to see if this problem can show up in other web pages, and back out bug 263359 from aurora.
tracking-firefox5: --- → ?
Blocks: 263359
(Assignee)

Comment 7

6 years ago
Very strange, because all my testing showed a huge improvement in loading tinderbox logs from bug 263359. Why is this tinderbox log different from all other tinderbox logs?
(Assignee)

Comment 8

6 years ago
Ah, I was forgetting that bug 646359 got split out from bug 263359. Testing with the patch from there applied makes the test URL load speedily for me (a local copy of it, that is).

I'm still surprised that bug 263359 made things significantly worse, though.
Depends on: 646359
(Assignee)

Comment 9

6 years ago
On second thoughts, testing with the local copy doesn't prove anything because it gets decoded as Latin1
(Assignee)

Comment 10

6 years ago
Bug 646359 doesn't help with this.
No longer depends on: 646359
(Assignee)

Comment 11

6 years ago
So this is basically the same old problem with incremental bidi resolution in long paragraphs that we have talked about in the past, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=566066#c2.

Bug 263359, which was supposed to help with that, was a performance win with smaller files, but it seems that there is a critical mass above which doing bidi resolution on each line separately takes longer than doing it on the whole block.

FTR, when I saw performance improvements I was testing with tinderbox logs around 10MB, 85K lines; the log in the URL is just under 32MB, 146K lines.
(Reporter)

Comment 12

6 years ago
Does that mean that we can't effectively get good perf in both cases (long lines and large blocks)?
(Assignee)

Comment 13

6 years ago
I'm sure there are other things we can do. One idea that I want to investigate is a NS_LINE_NEEDS_BIDI_RESOLUTION flag for preformatted elements, similar to the NS_BLOCK_NEEDS_BIDI_RESOLUTION that we already use.
Why is doing bidi resolution on each line separately a *huge* regression from doing it on the whole block? That sounds surprising to me. Are we sure that's all that's going on here?

Comment 15

6 years ago
Why are we nominating this for Aurora? What are the risks?

Updated

6 years ago
tracking-firefox5: ? → +

Comment 16

6 years ago
Will track, we need to get to a solid root cause analysis and find out if backout or something else is a good strategy.
(Reporter)

Comment 17

6 years ago
(In reply to comment #15)
> Why are we nominating this for Aurora? What are the risks?

We know that the patches landed in bug 263359 (which is on Aurora) have caused huge performance regressions for at least Tinderbox log pages.  We don't know whether this regression can also be observed on other (regular) web pages or not.  We need to track this on Aurora, to make sure that we back out bug 263359 if we need to.
(Assignee)

Comment 18

6 years ago
Created attachment 527349 [details] [diff] [review]
w-i-p

It's cool when working on one bug provides illumination for another one...

At least part of this was caused by moving the initialization of the nsBlockInFlowLineIterator from Resolve() into ResolveParagraph() in http://hg.mozilla.org/mozilla-central/rev/ad5009233ae9. 

That can't be the whole story though, because it doesn't fit the regression range, and wasn't checked in to Aurora. This patch also regresses bug 650189 :(

Comment 19

6 years ago
(In reply to comment #18)
> This patch also regresses bug 650189 :(

This is bug 650189.
(Assignee)

Comment 20

6 years ago
The patch regress *bug 645193*
I think we should do a backout to fix this, both on Aurora and mozilla-central.
(Assignee)

Comment 22

6 years ago
Can someone double check whether the regression exists in Aurora before backing out there? I'm still seeing improvement rather than regression in Aurora compared to FF4, but my network connection speed is very irregular and I don't know how reliable my data are.
It should be possible to simulate a testcase with a script loop that document.write()s chunks of text and then calls document.body.offsetHeight to flush reflow periodically.
(Reporter)

Comment 24

6 years ago
Using the link in the URL field, I can reproduce the exact regression on Aurora as well.  Things *might* be different under slow connections, in which case roc's suggestion in comment 23 might help, but I think that we have enough information now to assert that this regression exists both on mozilla-central and Aurora.

Comment 25

6 years ago
Since there is no proposed fix, please backout as per #21 as soon as possible.
(Reporter)

Comment 26

6 years ago
(In reply to comment #25)
> Since there is no proposed fix, please backout as per #21 as soon as possible.

Simon, can you please take care of this?  I'm not sure which patches to back out exactly.
(Assignee)

Comment 27

6 years ago
http://hg.mozilla.org/mozilla-aurora/rev/13470a83e372
http://hg.mozilla.org/mozilla-aurora/rev/a307c807cfab (bustage fix)
(Assignee)

Comment 28

6 years ago
http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b

Ehsan, can you confirm that the backout fixes the regression?
(Reporter)

Comment 29

6 years ago
(In reply to comment #28)
> http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b
> 
> Ehsan, can you confirm that the backout fixes the regression?

It does, yes.  Thanks!
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
status-firefox5: --- → fixed
You need to log in before you can comment on or make changes to this bug.