Closed Bug 367116 Opened 18 years ago Closed 2 years ago

Hang loading 20MB text/plain page

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: tech, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: hang, perf, Whiteboard: qa-not-actionable)

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.9) Gecko/20061211 SeaMonkey/1.0.7
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.9) Gecko/20061211 SeaMonkey/1.0.7

While looking for information on the web, I found this page, and SeaMonkey suddenly became unresponsive: the sidebar doesn't react, nor does clicking on a tab, and I had to hold the right mouse button on the tab for about a minute before the context menu appeared and I could close it.
It reproduces every time I try to load the page. During all that, CPU usage stays between 80 and 100% and the entire system becomes unresponsive!

Reproducible: Always

Steps to Reproduce:
1.start seamonkey
2.open http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox
3.wait
Actual Results:  
The system seems to hang for a very long time, so long that I usually have to close the tab or SeaMonkey itself to regain control of my machine.

Expected Results:  
Since this page is pure text, it should load very quickly even though it is very long, as it does with the same SeaMonkey release on Windows XP.

I'm running Seamonkey 1.0.7 under Linux Mandriva 2006
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Version: unspecified → 1.0 Branch
How do you know it's the same problem?  Have you done a profile?  Or have some other indication that it's the same issue?

Reopening and marking dependent pending answer.
Status: RESOLVED → UNCONFIRMED
Depends on: 216418
Resolution: DUPLICATE → ---
In particular, the page in question is text/plain.  So there's no HTML parsing going on, and none of the deep nesting happening in bug 216418.

So whatever is going on here is a totally different beastie.
No longer depends on: 216418
Blocks: 384260
So I profiled this on trunk, with the textframe changes checked in today.  The document in question is, of course, about 20MB of text.  I killed my browser when it had reached about 700MB of RAM, because it was about to start seriously swapping -- I only have a bit more than that on this machine.  It might be worth looking into _why_ we ended up using that much RAM.

Unfortunately, I cannot attach the file to the bug, because even bz2-compressed it's 3+ MB.  I'll have it in case someone needs it, though.

The profile (which I basically ran for about 10 incremental relayouts as the page loaded, a ways into the load) shows us spending about 75% of the time in reflow and 25% firing the text run cache expiration timer.  Pretty much all of reflow is spent under nsTextFrame::Reflow.

Under nsTextFrame::Reflow, gfxTextRunWordCache::MakeTextRun is what takes up about 66% of the time.  This is split as:

 52036 17917   158123 gfxTextRunWordCache::MakeTextRun
                62759 gfxTextRunWordCache::LookupWord
                25355 gfxPangoFontGroup::MakeTextRun
                13528 gfxTextRun::SetSpaceGlyph
                11922 gfxTextRun::gfxTextRun
                 9592 IsWordBoundary
                 6375 memcpy

Under FrameTextRunCache::NotifyExpired we have:

52033 18981    83654 gfxTextRunWordCache::RemoveTextRun
               48441 gfxTextRunWordCache::RemoveWord
               12042 IsWordBoundary
               3280 IsBoundarySpace

I'll attach the profile.
Assignee: general → nobody
Status: UNCONFIRMED → NEW
Component: General → Layout: Fonts and Text
Ever confirmed: true
Product: Mozilla Application Suite → Core
QA Contact: general → layout.fonts-and-text
Version: 1.0 Branch → Trunk
Attached file Zipped-up profile (obsolete) —
Oh, looking bottom-up, the top functions by time spent _in_ them (not under them) are:

Total hit count: 327046
Count %Total  Function Name
32188   9.8     gfxTextRunWordCache::CacheHashEntry::KeyEquals(gfxTextRunWordCache::CacheHashKey const*) const
23349   7.1     IsWordBoundary(unsigned short)
18981   5.8     gfxTextRunWordCache::RemoveTextRun(gfxTextRun*)
17917   5.5     gfxTextRunWordCache::MakeTextRun(unsigned short const*, unsigned int, gfxFontGroup*, gfxTextRunFactory::Parameters const*, unsigned int, int*)
17080   5.2     gfxTextRun::CopyGlyphDataFrom(gfxTextRun*, unsigned int, unsigned int, unsigned int, int)
15674   4.8     SearchTable
13842   4.2     IsBoundarySpace(unsigned short)
11945   3.7     gfxTextRun::gfxTextRun(gfxTextRunFactory::Parameters const*, void const*, unsigned int, gfxFontGroup*, unsigned int)
10008   3.1     gfxTextRun::SetSpaceGlyph(gfxFont*, gfxContext*, unsigned int)
Blocks: 319143
(In reply to comment #1)
> 
> *** This bug has been marked as a duplicate of bug 216418 ***
> 
Remember, I'm talking about the Linux version... I don't know how you can compare Mozilla to IE under Linux ;)
I hadn't checked this bug for a few SeaMonkey releases. With the 1.1.1 I'm using now, loading the indicated page doesn't hang anymore; while the page is loading, I'm able to work in other apps running on my box. I don't know whether it's due to SeaMonkey upgrades or to Mandriva upgrades, but I don't have this problem anymore :)

Oh, and I did my testing on Linux, of course.
If it's ASCII text then I expect we should be using about 6 bytes per character plus the cost of the textrun word cache, which is one 12-byte hash entry per distinct white-space delimited word in the text, divided by the hash load factor of course. Let's say average word length 5 characters, load factor 0.5, that's about 5 more bytes per character. Total 11 bytes per character, 220MB for your 20MB file. I don't know why we'd bloat out way past that. Maybe there's a leak. That's one thing I need to look at.

We could easily size-limit the textrun word cache to a certain number of entries. This would reduce the overhead back to 6 bytes per character, 120MB.
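
For concreteness, that estimate spelled out (taking the 12-byte entry, 0.5 load factor and ~5-character average word above as given):

\[
\underbrace{6}_{\text{textrun}} \;+\; \underbrace{\frac{12 / 0.5}{5}}_{\text{word cache}\,\approx\,4.8} \;\approx\; 11\ \text{bytes/char}, \qquad 11 \times 20\,\mathrm{MB} \approx 220\,\mathrm{MB}
\]

and with a size-limited cache just \(6 \times 20\,\mathrm{MB} \approx 120\,\mathrm{MB}\) -- so the 700MB+ observed in comment 4 is well beyond what these data structures alone should account for.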

The other issue we may be hitting here is that all the text in the file likely ends up in one big textrun. Every time we append content to the file we have to rebuild the textrun from scratch. That means scanning the frame tree back to the start of the block looking for safe places to start, then starting over and rebuilding the entire textrun. Hence quadratic behaviour for plain text files. This should definitely be fixed. A couple of options spring to mind:
-- stop textruns at preformatted newline boundaries. Kind of painful since we construct textruns before reflow has happened, and at that time, all the preformatted text is in one frame and hasn't been broken into lines yet. So we'd have multiple textruns for the same frame which we don't really support. We could perhaps alter frame construction so that it creates one frame per line from the beginning. Maybe we could get away with just creating a textrun for the first line and then creating the rest on demand. Still dodgy.
-- support efficient appending to a textrun in CharacterDataChanged and when textrun construction discovers extra frames that should be added to an existing textrun. This might be the best way to handle giant text files with collapsing whitespace, but it would be very difficult to avoid the scanning process used by textrun construction having to scan back to the start of the lines and then forwards again.
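
To spell out the quadratic behaviour described above: if the k-th append rebuilds the textrun over everything appended so far, the total work over N appended chunks is roughly

\[
\sum_{k=1}^{N} c\,k \;=\; \frac{c\,N(N+1)}{2} \;=\; O(N^2),
\]

which is why each individual rebuild being linear doesn't save us on a 20MB incremental load.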

Once we get bogged down and very slow, things get even worse because textrun words time out of the cache constantly so our cache locality drops (although there still are cache hits because the textrun contains many occurrences of the same word).
(In reply to comment #2)
> How do you know it's the same problem?  Have you done a profile?  Or have some
> other indication that it's the same issue?

I based it on seeing <mailto: and some other constructs.  But you're quite right, I jumped the gun.


(In reply to comment #7)
> Remember I'm talking about Linux version... I don't know how you can compare
> Mozilla to IE under Linux ;)

The problem is OS=ALL, so on Windows a comparison against IE is useful.

roc comment #9:
> (crazy good analysis)
>
> Once we get bogged down and very slow, things get even worse because textrun
> words time out of the cache constantly so our cache locality drops (although
> there still are cache hits because the textrun contains many occurrences of the
> same word).

would cache locality tanking explain why it eventually beats the crap out of windows' performance?

also, see bug 319143 comment 15 and comment 14, which illustrate how quickly the performance deteriorates (the file is 150MB)

> -- stop textruns at preformatted newline boundaries.

I think this is what I'll do because I think it's relatively simple. It will also reduce memory usage because textruns for earlier lines in the file will just expire and won't need to be recreated as we add lines to the page.
As far as memory usage goes...  We're presumably creating one textframe per line, right?  Possibly more depending on where the 4096-byte splits of the content nodes fall.  An nsTextFrameThebes in an opt build is 60 bytes.  The file has 486789 lines, which gives about 30MB for the textframe objects.

For each line we have an nsLineBox, which is 40 bytes.  That's another 20MB or so.

An nsTextNode plus nsTextFragment is 44 bytes, not counting the actual text.  One of those every 2048 bytes in the file, which doesn't seem like very much.
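
Writing that out (taking the sizes and the 486789-line count at face value):

\[
486789 \times 60\,\mathrm{B} \approx 29\,\mathrm{MB}\ (\text{nsTextFrame}), \qquad
486789 \times 40\,\mathrm{B} \approx 19\,\mathrm{MB}\ (\text{nsLineBox}), \qquad
\frac{20\,\mathrm{MB}}{2048} \times 44\,\mathrm{B} \approx 0.45\,\mathrm{MB}\ (\text{text nodes + fragments})
\]

That's only on the order of 50MB, nowhere near the hundreds of MB observed, which is the puzzle.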

massif reports that about half the allocations are coming from PLArenas in its graph...   Unfortunately, the text output from massif seems to be a huge lie, so I can't tell much about what's going on there.  :(

So I'm also somewhat at a loss as to where the memory usage is coming from.  I wish massif reported the truth....
I can't recall now whether I was watching the VM or RSS number.  If we're effectively doing realloc() a bunch of times (creating giant text runs) with increasing sizes every time, we could be running into an interesting fragmentation issue...  The new textrun won't fit in the old hole, so we give it a new spot, then we allocate some new nodes and textframes, then realloc the textrun, and it doesn't fit in any existing hole, so we brk() more heap, etc...
For really large allocations, I thought glibc would just use mmap. (And munmap when freeing.)
But it's certainly possible that fragmentation is biting us. Per-line textruns would help with that.
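A minimal standalone sketch (not Gecko code) of how to check that on a glibc system, assuming the default M_MMAP_THRESHOLD of 128 KiB -- a textrun-sized buffer should then show up in the mmap totals printed by malloc_stats() rather than in the brk()'d main arena:

#include <malloc.h>   // malloc_stats() is a glibc extension
#include <cstdlib>

int main() {
  // ~20MB, well above glibc's default 128 KiB mmap threshold, so this
  // should be satisfied with mmap() and munmap()'d again on free().
  void* big = std::malloc(20 * 1024 * 1024);
  malloc_stats();   // prints per-arena and mmap totals to stderr
  std::free(big);
  return 0;
}

That shape (most bytes in the mmap totals, little in the arenas) matches the malloc_stats() output quoted later in this bug.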
I'd apparently run massif on an old-textframe build.  In a new-textframe build, 70% of the memory is allocated in gfxTextRun::gfxTextRun.  It's split 50-20 between the CompressedGlyph array and the PRUnichar array...  The char array does seem to be allocated some, but "not much".  I still wish the percentage stuff worked better.  :(  This is all for about 130MB total allocation, though.  I should let it run longer.

And leak logging says we don't leak textruns, fwiw.
OK, I just looked at the memory usage again.  When "top" says:

VIRT: 623m  RSS: 444m

malloc_stats() says:

Arena 0:
system bytes     =   90439680
in use bytes     =   90363336
Arena 1:
system bytes     =     135168
in use bytes     =       2256
Total (incl. mmap):
system bytes     =  485224448
in use bytes     =  485015192
max mmap regions =         69
max mmap bytes   =  448593920
$3 = 9852032

So you're right.  The large allocations are mmap'ed, and there's not much fragmentation.

Is it possible that we have more than one giant textrun live at a time, by any chance?
You might accumulate old ones in the cache, but they'd be flushed after 30 seconds.
Hmm.  We're doing layouts more often than that over here, I think, when I see the memory growth...
Stopping preformatted textruns at newlines happened in bug 386122.  I'll reprofile this once I pull in that patch.

It might be worth filing a separate bug on what happens when there's a large chunk of text that is _not_ preformatted....
Depends on: 386122
With that patch (preformatted newlines), the memory usage goes from 28m resident and 99m virtual before I load the page to 430m resident and 572m virtual after I load it.  But at least it loads in fairly finite time now, and the memory never grew over that final number, as far as I can tell.

Now we're spending about 2% of total time expiring textruns and 91% of total time under reflow.  nsTextFrame::Reflow takes up about 31% of total time on this testcase.  The breakdown there is:

132024 22672   458227 nsTextFrame::Reflow
               272914 gfxTextRun::BreakAndMeasureText
               154243 nsTextFrame::EnsureTextRun

and minor stuff.  A fair amount of the former seems to be dealing with the literal tabs in that file.  In particular, PropertyProvider::GetTabWidths is about 13% of the profile.

Looked at bottom-up, we have:


Total hit count: 1462009
Count %Total  Function Name
265948   18.2     nsBlockFrame::ReflowDirtyLines
236718   16.2     nsBlockFrame::ComputeCombinedArea
65660   4.5     nsRect::UnionRect
38717   2.6     PropertyProvider::GetSpacingInternal

Those are times _in_ those functions, not under.  UnionRect is largely called from nsBlockFrame::ComputeCombinedArea, but partially from nsBlockFrame::Reflow.

In general, it looks like this page is long enough to hit the O(N^2) behavior implicit in the repeated line traversals ReflowDirtyLines and ComputeCombinedArea do as we add content to the block...  I wonder whether we can optimize where we start somehow.
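
One possible shape for that optimization, purely as an illustration and not actual nsBlockFrame code (the Line/Block types and the firstMaybeDirty cursor here are hypothetical): keep a cursor to the first line that can still be dirty, so an incremental reflow after an append starts there instead of re-walking every line from the top.

#include <cstddef>
#include <vector>

// Toy stand-ins for nsLineBox / nsBlockFrame, just to illustrate the idea.
struct Line {
  bool dirty;
  int  height;
};

struct Block {
  std::vector<Line> lines;
  std::size_t firstMaybeDirty = 0;  // no line before this index is dirty

  void AppendLine(int height) {
    // Appending content only dirties lines at the end (in this toy model),
    // so the cursor stays valid.
    lines.push_back({true, height});
  }

  // Analogue of ReflowDirtyLines: instead of scanning all N lines on every
  // incremental reflow (O(N) per append, O(N^2) over the whole load), start
  // at the cursor and only touch the tail that can actually be dirty.
  void ReflowDirtyLines() {
    for (std::size_t i = firstMaybeDirty; i < lines.size(); ++i) {
      if (lines[i].dirty) {
        lines[i].dirty = false;  // stand-in for actually reflowing the line
      }
    }
    firstMaybeDirty = lines.size();  // everything is clean again
  }
};

ComputeCombinedArea could similarly keep the union of the already-clean lines cached and only fold in the lines from the cursor onward.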
Attachment #268202 - Attachment is obsolete: true
Attachment #271157 - Attachment is patch: false
Attachment #271157 - Attachment mime type: text/plain → application/zip
roc, if the file were not preformatted (or plaintext) we'd still end up with the giant textruns, right?  Do you want a separate bug to track that, or just not worry about it?
That'd be quite a bit harder to fix. And I think such testcases are quite a lot rarer. So I don't plan to worry about it.
Severity: normal → critical
Keywords: hang
Summary: seamonkey completly hanged and using all cpu ressources while loading this page → Hang loading 20MB text/plain page
See also bug 390051, hang with 6MB binary file incorrectly sent as text/plain.
Keywords: perf
Profile of 
https://tbpl.mozilla.org/php/getParsedLog.php?id=22019202&tree=Mozilla-Central&full=1

In a mozilla-inbound full-opt/no-debug build with --enable-jprof, JP_DEFER, JP_PERIOD=0.002, on Linux x64 (Fedora 17, fast Xeon).  Started the profile before loading the page and stopped it after the page was loaded (kill -PROF xxxx; load page; kill -USR1 xxxx)
That's showing mostly time in textrun construction.

Are we constructing lots of textruns or one huge textrun?
Blocks: 890457
Blocks: 823898
Whiteboard: qa-not-actionable

(In reply to tech from comment #0)

> 2.open
> http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox

Unfortunately this site seems to be down now. However, archive.org does have some snapshots, including one from about a week before this bug was filed:
https://web.archive.org/web/20070108091906/http://www.fastcgi.com/archives/fastcgi-developers.mbox/fastcgi-developers.mbox
...so it should pretty-faithfully capture the testcase that was used here.

Attachment #9290566 - Attachment description: gzipped version of "fastcgi-developers-2007-01-08.mbox" from 2007-01-08 arhcive.org snapshot → gzipped version of "fastcgi-developers-2007-01-08.mbox" from 2007-01-08 archive.org snapshot
Attachment #9290566 - Attachment filename: fastcgi-developers-2007-01-08.mbox.gz → fastcgi-developers-2007-01-08.mbox.txt.gz

In current Firefox Nightly on my Lenovo ThinkPad, we spend about 5 seconds reflowing the page (during which time the tab is blank) and then we finish rendering and we're done. Profile: https://share.firefox.dev/3ptF5aN

This doesn't seem particularly bad, especially given that e10s means the frontend and most/all background tabs should remain responsive.

Also, RE memory usage (which I imagine is why the original reporter's system came to a halt due to swapping): bz mentioned > 700MB RAM usage in comment 4 (possibly much more, he just killed the browser at that point), but my profile shows our content process peaking at about 250-260MB RAM. So we seem to have improved that metric here as well.

So: I suspect we can consider this bug essentially fixed by a combination of e10s and various optimizations/improvements over the years.

Status: NEW → RESOLVED
Closed: 17 years ago → 2 years ago
Resolution: --- → WORKSFORME