Closed Bug 367116 Opened 18 years ago Closed 2 years ago

Hang loading 20MB text/plain page

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: tech, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: hang, perf, Whiteboard: qa-not-actionable)

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.9) Gecko/20061211 SeaMonkey/1.0.7
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.9) Gecko/20061211 SeaMonkey/1.0.7

While looking for information on the web, I found this page, and SeaMonkey suddenly became unresponsive: the sidebar doesn't react, nor does clicking on a tab, and I had to hold the right mouse button on the tab for about a minute before the context menu appeared and I could close it.
It reproduces every time I try to load the page. During all that, CPU usage stays between 80 and 100% and the entire system becomes unresponsive!

Reproducible: Always

Steps to Reproduce:
1.start seamonkey
2.open http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox
3.wait
Actual Results:  
The system seems to hang for a very long time, so long that I usually have to close the tab or SeaMonkey itself to regain control of my machine.

Expected Results:  
Since this page is pure text, it should load very quickly even though it is very long, as it does with the same SeaMonkey release on Windows XP.

I'm running Seamonkey 1.0.7 under Linux Mandriva 2006
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Version: unspecified → 1.0 Branch
How do you know it's the same problem?  Have you done a profile?  Or have some other indication that it's the same issue?

Reopening and marking dependent pending answer.
Status: RESOLVED → UNCONFIRMED
Depends on: 216418
Resolution: DUPLICATE → ---
In particular, the page in question is text/plain.  So there's no HTML parsing going on, and none of the deep nesting happening in bug 216418.

So whatever is going on here is a totally different beastie.
No longer depends on: 216418
Blocks: 384260
So I profiled this on trunk, with the textframe changes checked in today.  The document in question is, of course, about 20MB of text.  I killed my browser when it had reached about 700MB of RAM, because it was about to start seriously swapping -- I only have a bit more than that on this machine.  It might be worth looking into _why_ we ended up using that much RAM.

Unfortunately, I cannot attach the file to the bug, because even bz2-compressed it's 3+ MB.  I'll have it in case someone needs it, though.

The profile (which I basically ran for about 10 incremental relayouts as the page loaded, a ways into the load) shows us spending about 75% of the time in reflow and 25% firing the text run cache expiration timer.  Pretty much all of reflow is spent under nsTextFrame::Reflow.

Under nsTextFrame::Reflow, gfxTextRunWordCache::MakeTextRun is what takes up about 66% of the time.  This is split as:

 52036 17917   158123 gfxTextRunWordCache::MakeTextRun
                62759 gfxTextRunWordCache::LookupWord
                25355 gfxPangoFontGroup::MakeTextRun
                13528 gfxTextRun::SetSpaceGlyph
                11922 gfxTextRun::gfxTextRun
                 9592 IsWordBoundary
                 6375 memcpy

Under FrameTextRunCache::NotifyExpired we have:

52033 18981    83654 gfxTextRunWordCache::RemoveTextRun
               48441 gfxTextRunWordCache::RemoveWord
               12042 IsWordBoundary
               3280 IsBoundarySpace

I'll attach the profile.
Assignee: general → nobody
Status: UNCONFIRMED → NEW
Component: General → Layout: Fonts and Text
Ever confirmed: true
Product: Mozilla Application Suite → Core
QA Contact: general → layout.fonts-and-text
Version: 1.0 Branch → Trunk
Attached file Zipped-up profile (obsolete) —
Oh, looking bottom-up, the top functions by time spent _in_ them (not under them) are:

Total hit count: 327046
Count %Total  Function Name
32188   9.8     gfxTextRunWordCache::CacheHashEntry::KeyEquals(gfxTextRunWordCache::CacheHashKey const*) const
23349   7.1     IsWordBoundary(unsigned short)
18981   5.8     gfxTextRunWordCache::RemoveTextRun(gfxTextRun*)
17917   5.5     gfxTextRunWordCache::MakeTextRun(unsigned short const*, unsigned int, gfxFontGroup*, gfxTextRunFactory::Parameters const*, unsigned int, int*)
17080   5.2     gfxTextRun::CopyGlyphDataFrom(gfxTextRun*, unsigned int, unsigned int, unsigned int, int)
15674   4.8     SearchTable
13842   4.2     IsBoundarySpace(unsigned short)
11945   3.7     gfxTextRun::gfxTextRun(gfxTextRunFactory::Parameters const*, void const*, unsigned int, gfxFontGroup*, unsigned int)
10008   3.1     gfxTextRun::SetSpaceGlyph(gfxFont*, gfxContext*, unsigned int)
Blocks: 319143
(In reply to comment #1)
> 
> *** This bug has been marked as a duplicate of bug 216418 ***
> 
Remember, I'm talking about the Linux version... I don't know how you can compare Mozilla to IE under Linux ;)
I hadn't checked this bug for a few SeaMonkey releases. With the 1.1.1 I'm using now, loading the indicated page doesn't hang anymore; while the page is loading, I'm able to work in other apps running on my box. I don't know whether it's due to SeaMonkey upgrades or to Mandriva upgrades, but I don't have this problem anymore :)

Oh, and I did my testing on Linux, of course.
If it's ASCII text then I expect we should be using about 6 bytes per character plus the cost of the textrun word cache, which is one 12-byte hash entry per distinct white-space delimited word in the text, divided by the hash load factor of course. Let's say average word length 5 characters, load factor 0.5, that's about 5 more bytes per character. Total 11 bytes per character, 220MB for your 20MB file. I don't know why we'd bloat out way past that. Maybe there's a leak. That's one thing I need to look at.

We could easily size-limit the textrun word cache to a certain number of entries. This would reduce the overhead back to 6 bytes per character, 120MB.
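
For concreteness, that estimate spelled out (taking the 12-byte entry, 0.5 load factor and ~5-character average word above as given):

\[
\underbrace{6}_{\text{textrun}} \;+\; \underbrace{\frac{12 / 0.5}{5}}_{\text{word cache}\,\approx\,4.8} \;\approx\; 11\ \text{bytes/char}, \qquad 11 \times 20\,\mathrm{MB} \approx 220\,\mathrm{MB}
\]

and with a size-limited cache just \(6 \times 20\,\mathrm{MB} \approx 120\,\mathrm{MB}\) -- so the 700MB+ observed in comment 4 is well beyond what these data structures alone should account for.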

The other issue we may be hitting here is that all the text in the file likely ends up in one big textrun. Every time we append content to the file we have to rebuild the textrun from scratch. That means scanning the frame tree back to the start of the block looking for safe places to start, then starting over and rebuilding the entire textrun. Hence quadratic behaviour for plain text files. This should definitely be fixed. A couple of options spring to mind:
-- stop textruns at preformatted newline boundaries. Kind of painful since we construct textruns before reflow has happened, and at that time, all the preformatted text is in one frame and hasn't been broken into lines yet. So we'd have multiple textruns for the same frame which we don't really support. We could perhaps alter frame construction so that it creates one frame per line from the beginning. Maybe we could get away with just creating a textrun for the first line and then creating the rest on demand. Still dodgy.
-- support efficient appending to a textrun in CharacterDataChanged and when textrun construction discovers extra frames that should be added to an existing textrun. This might be the best way to handle giant text files with collapsing whitespace, but it would be very difficult to avoid the scanning process used by textrun construction having to scan back to the start of the lines and then forwards again.
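
To spell out the quadratic behaviour described above: if the k-th append rebuilds the textrun over everything appended so far, the total work over N appended chunks is roughly

\[
\sum_{k=1}^{N} c\,k \;=\; \frac{c\,N(N+1)}{2} \;=\; O(N^2),
\]

which is why each individual rebuild being linear doesn't save us on a 20MB incremental load.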

Once we get bogged down and very slow, things get even worse because textrun words time out of the cache constantly so our cache locality drops (although there still are cache hits because the textrun contains many occurrences of the same word).
(In reply to comment #2)
> How do you know it's the same problem?  Have you done a profile?  Or have some
> other indication that it's the same issue?

I based it on seeing <mailto: and some other constructs.  But you're quite right, I jumped the gun.


(In reply to comment #7)
> Remember I'm talking about Linux version... I don't know how you can compare
> Mozilla to IE under Linux ;)

The problem is OS=ALL, so on Windows a comparison against IE is useful.

roc comment #9:
> (crazy good analysis)
>
> Once we get bogged down and very slow, things get even worse because textrun
> words time out of the cache constantly so our cache locality drops (although
> there still are cache hits because the textrun contains many occurrences of the
> same word).

would cache locality tanking explain why it eventually beats the crap out of windows' performance?

also, see bug 319143 comment 15 and comment 14, which illustrate how quickly the performance deteriorates (the file is 150MB)

> -- stop textruns at preformatted newline boundaries.

I think this is what I'll do because I think it's relatively simple. It will also reduce memory usage because textruns for earlier lines in the file will just expire and won't need to be recreated as we add lines to the page.
As far as memory usage goes...  We're presumably creating one textframe per line, right?  Possibly more depending on where the 4096-byte splits of the content nodes fall.  An nsTextFrameThebes in an opt build is 60 bytes.  The file has 486789 lines, which gives about 30MB for the textframe objects.

For each line we have an nsLineBox, which is 40 bytes.  That's another 20MB or so.

An nsTextNode plus nsTextFragment is 44 bytes, not counting the actual text.  One of those every 2048 bytes in the file, which doesn't seem like very much.
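
Writing that out (taking the sizes and the 486789-line count at face value):

\[
486789 \times 60\,\mathrm{B} \approx 29\,\mathrm{MB}\ (\text{nsTextFrame}), \qquad
486789 \times 40\,\mathrm{B} \approx 19\,\mathrm{MB}\ (\text{nsLineBox}), \qquad
\frac{20\,\mathrm{MB}}{2048} \times 44\,\mathrm{B} \approx 0.45\,\mathrm{MB}\ (\text{text nodes + fragments})
\]

That's only on the order of 50MB, nowhere near the hundreds of MB observed, which is the puzzle.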

massif reports that about half the allocations are coming from PLArenas in its graph...   Unfortunately, the text output from massif seems to be a huge lie, so I can't tell much about what's going on there.  :(

So I'm also somewhat at a loss as to where the memory usage is coming from.  I wish massif reported the truth....
I can't recall now whether I was watching the VM or RSS number.  If we're effectively doing realloc() a bunch of times (creating giant text runs) with increasing sizes every time, we could be running into an interesting fragmentation issue...  The new textrun won't fit in the old hole, so we give it a new spot, then we allocate some new nodes and textframes, then realloc the textrun, and it doesn't fit in any existing hole, so we brk() more heap, etc...
For really large allocations, I thought glibc would just use mmap. (And munmap when freeing.)
But it's certainly possible that fragmentation is biting us. Per-line textruns would help with that.
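A minimal standalone sketch (not Gecko code) of how to check that on a glibc system, assuming the default M_MMAP_THRESHOLD of 128 KiB -- a textrun-sized buffer should then show up in the mmap totals printed by malloc_stats() rather than in the brk()'d main arena:

#include <malloc.h>   // malloc_stats() is a glibc extension
#include <cstdlib>

int main() {
  // ~20MB, well above glibc's default 128 KiB mmap threshold, so this
  // should be satisfied with mmap() and munmap()'d again on free().
  void* big = std::malloc(20 * 1024 * 1024);
  malloc_stats();   // prints per-arena and mmap totals to stderr
  std::free(big);
  return 0;
}

That shape (most bytes in the mmap totals, little in the arenas) matches the malloc_stats() output quoted later in this bug.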
I'd apparently run massif on an old-textframe build.  In a new-textframe build, 70% of the memory is allocated in gfxTextRun::gfxTextRun.  It's split 50-20 between the CompressedGlyph array and the PRUnichar array...  The char array does seem to be allocated some, but "not much".  I still wish the percentage stuff worked better.  :(  This is all for about 130MB total allocation, though.  I should let it run longer.

And leak logging says we don't leak textruns, fwiw.
OK, I just looked at the memory usage again.  When "top" says:

VIRT: 623m  RSS: 444m

malloc_stats() says:

Arena 0:
system bytes     =   90439680
in use bytes     =   90363336
Arena 1:
system bytes     =     135168
in use bytes     =       2256
Total (incl. mmap):
system bytes     =  485224448
in use bytes     =  485015192
max mmap regions =         69
max mmap bytes   =  448593920
$3 = 9852032

So you're right.  The large allocations are mmap'ed, and there's not much fragmentation.

Is it possible that we have more than one giant textrun live at a time, by any chance?
You might accumulate old ones in the cache, but they'd be flushed after 30 seconds.
Hmm.  We're doing layouts more often than that over here, I think, when I see the memory growth...
Stopping preformatted textruns at newlines happened in bug 386122.  I'll reprofile this once I pull in that patch.

It might be worth filing a separate bug on what happens when there's a large chunk of text that is _not_ preformatted....
Depends on: 386122
With that patch (preformatted newlines), the memory usage goes from 28m resident and 99m virtual before I load the page to 430m resident and 572m virtual after I load it.  But at least it loads in fairly finite time now, and the memory never grew over that final number, as far as I can tell.

Now we're spending about 2% of total time expiring textruns and 91% of total time under reflow.  nsTextFrame::Reflow takes up about 31% of total time on this testcase.  The breakdown there is:

132024 22672   458227 nsTextFrame::Reflow
               272914 gfxTextRun::BreakAndMeasureText
               154243 nsTextFrame::EnsureTextRun

and minor stuff.  A fair amount of the former seems to be dealing with the literal tabs in that file.  In particular, PropertyProvider::GetTabWidths is about 13% of the profile.

Looked at bottom-up, we have:


Total hit count: 1462009
Count %Total  Function Name
265948   18.2     nsBlockFrame::ReflowDirtyLines
236718   16.2     nsBlockFrame::ComputeCombinedArea
65660   4.5     nsRect::UnionRect
38717   2.6     PropertyProvider::GetSpacingInternal

Those are times _in_ those functions, not under.  UnionRect is largely called from nsBlockFrame::ComputeCombinedArea, but partially from nsBlockFrame::Reflow.

In general, it looks like this page is long enough to hit the O(N^2) behavior implicit in the repeated line traversals ReflowDirtyLines and ComputeCombinedArea do as we add content to the block...  I wonder whether we can optimize where we start somehow.
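
One possible shape for that optimization, purely as an illustration and not actual nsBlockFrame code (the Line/Block types and the firstMaybeDirty cursor here are hypothetical): keep a cursor to the first line that can still be dirty, so an incremental reflow after an append starts there instead of re-walking every line from the top.

#include <cstddef>
#include <vector>

// Toy stand-ins for nsLineBox / nsBlockFrame, just to illustrate the idea.
struct Line {
  bool dirty;
  int  height;
};

struct Block {
  std::vector<Line> lines;
  std::size_t firstMaybeDirty = 0;  // no line before this index is dirty

  void AppendLine(int height) {
    // Appending content only dirties lines at the end (in this toy model),
    // so the cursor stays valid.
    lines.push_back({true, height});
  }

  // Analogue of ReflowDirtyLines: instead of scanning all N lines on every
  // incremental reflow (O(N) per append, O(N^2) over the whole load), start
  // at the cursor and only touch the tail that can actually be dirty.
  void ReflowDirtyLines() {
    for (std::size_t i = firstMaybeDirty; i < lines.size(); ++i) {
      if (lines[i].dirty) {
        lines[i].dirty = false;  // stand-in for actually reflowing the line
      }
    }
    firstMaybeDirty = lines.size();  // everything is clean again
  }
};

ComputeCombinedArea could similarly keep the union of the already-clean lines cached and only fold in the lines from the cursor onward.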
Attachment #268202 - Attachment is obsolete: true
Attachment #271157 - Attachment is patch: false
Attachment #271157 - Attachment mime type: text/plain → application/zip
roc, if the file were not preformatted (or plaintext) we'd still end up with the giant textruns, right?  Do you want a separate bug to track that, or just not worry about it?
That'd be quite a bit harder to fix. And I think such testcases are quite a lot rarer. So I don't plan to worry about it.
Severity: normal → critical
Keywords: hang
Summary: seamonkey completly hanged and using all cpu ressources while loading this page → Hang loading 20MB text/plain page
See also bug 390051, hang with 6MB binary file incorrectly sent as text/plain.
Keywords: perf
Profile of 
https://tbpl.mozilla.org/php/getParsedLog.php?id=22019202&tree=Mozilla-Central&full=1

In a mozilla-inbound full-opt/no-debug build with --enable-jprof, JP_DEFER, JP_PERIOD=0.002, on Linux x64 (Fedora 17, fast Xeon).  Started the profile before loading the page and stopped it after the page was loaded (kill -PROF xxxx; load page; kill -USR1 xxxx)
That's showing mostly time in textrun construction.

Are we constructing lots of textruns or one huge textrun?
Blocks: 890457
Blocks: 823898
Whiteboard: qa-not-actionable

(In reply to tech from comment #0)

> 2.open
> http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox

Unfortunately this site seems to be down now. However, archive.org does have some snapshots, including one from about a week before this bug was filed:
https://web.archive.org/web/20070108091906/http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox
...so it should pretty-faithfully capture the testcase that was used here.

Attachment #9290566 - Attachment description: gzipped version of "fastcgi-developers-2007-01-08.mbox" from 2007-01-08 arhcive.org snapshot → gzipped version of "fastcgi-developers-2007-01-08.mbox" from 2007-01-08 archive.org snapshot
Attachment #9290566 - Attachment filename: fastcgi-developers-2007-01-08.mbox.gz → fastcgi-developers-2007-01-08.mbox.txt.gz

In current Firefox Nightly on my Lenovo ThinkPad, we spend about 5 seconds reflowing the page (during which time the tab is blank) and then we finish rendering and we're done. Profile: https://share.firefox.dev/3ptF5aN

This doesn't seem particularly bad, especially given that e10s means the frontend and most/all background tabs should remain responsive.

Also, RE memory usage (which I imagine is why the original reporter's system came to a halt due to swapping): bz mentioned > 700MB RAM usage in comment 4 (possibly much more, he just killed the browser at that point), but my profile shows our content process peaking at about 250-260MB RAM. So we seem to have improved that metric here as well.

So: I suspect we can consider this bug essentially fixed by a combination of e10s and various optimizations/improvements over the years.

Status: NEW → RESOLVED
Closed: 17 years ago → 2 years ago
Resolution: --- → WORKSFORME