Closed Bug 441669 Opened 16 years ago Closed 14 years ago

imagemap with large number of <area> statements (30000) is VERY slow to load and appears to hang.

Categories

(Core :: Layout, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: yep498, Assigned: bzbarsky)

References

Details

(Keywords: perf, testcase)

Attachments

(6 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0

Given an image with 31545 area shape definitions, CPU spikes to 100% and takes 4:20 to load the image. FF2 loads same image in :10. HTML was generated by ploticus. Many of the shapes are 1 pixel wide. i.e. coords="373
,753,373,753"

Reproducible: Always

Steps to Reproduce:
1.Load page with large image and many "area shape" definitions in html.
2.
3.
Actual Results:  
CPU spiked to 100%, took 4:20 to load image.

Expected Results:  
Expected page to load in same time or faster than FF2. Expected page to load in 10 seconds or less.

This is a serious limitation in usability for me. I produce Network up/down status pages that require a large number of areas to be defined in order to pop-up information about the selected portion of the graph. With CPU spiked and no response for more than 4 minutes, the program appears hung. I tried loading a page with 44348 areas and ended up killing Firefox instead of continuing to wait.
Could you perhaps attach a testcase that shows the issue?
This test html/png imagemap contains 19008 <area shape ...> statements. The name of this file is "test-data-30day.html".
This file contains a test case with 32047 <area shape ...> imagemap statements. The name of the file is "test-data-60day.html".
This test case contains an imagemap html with 44793 area shape statements. This html was compressed with bzip2 in order to shrink it enough to post here. Use bunzip2 to uncompress the html after extracting the archive.
I think I see a regression here, current trunk builds tend to take like appr. 40000ms, while branch builds seem to take appr. 15000ms.
Not sure though, trunk seems to vary with results considerably.
Ria, would you perhaps willing to find out if there is a clear regression range for this bug?
For what it's worth, resize is affected by this as well. After the page loads, resizing triggers the CPU spike and it may take 60 seconds or so before the page redraw is complete.
I can reproduce this behaviour on the latest trunk.

If you remove the newlines after each <area> tag, it takes substantially less time to load. It still feels quite laggy though.
Random stack trace while the CPU is pegged at 100%:

ntkrnlpa.exe!KiUnexpectedInterrupt+0x121
ntkrnlpa.exe!ZwYieldExecution+0x1c8e
hal.dll+0x2ef2
xul.dll!gfxSkipCharsIterator::SetOffsets+0x398
xul.dll!gfxSkipCharsIterator::SetOffsets+0x160
xul.dll!gfxWindowsFontGroup::MakeTextRun+0x5177
xul.dll!gfxASurface::GetDefaultContextFlags+0x93c1c
xul.dll!gfxSkipCharsIterator::SetOffsets+0x36ce
xul.dll!gfxTextRun::gfxTextRun+0xc21
xul.dll!gfxWindowsFontGroup::MakeTextRun+0xb95
xul.dll!gfxFontUtils::GetPrefsFontList+0x10bc
xul.dll!gfxFontUtils::GetPrefsFontList+0xdc0

All the samples I captured look something like this.

Gecko/20080917034815 with TraceMonkey enabled
Was about to post the same bug -- I'm also graphing events that need area maps. 

This renders in an instant on chrome and safari.
Mozilla/5.0 (Windows; Windows NT 5.1; en-US; rv:2.0b2pre) Gecko/20100712 Minefield/4.0b2pre

My numbers (locally):

2.0 trunk     30 sec
Safari       0.4 sec
Opera 10.6   0.3 sec
What's the right component for this bug?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf, testcase
Summary: imagemap with large number of <area shape ..> statements (30000) is VERY slow to load and appears to hang. → imagemap with large number of <area> statements (30000) is VERY slow to load and appears to hang.
layout
Component: General → Layout
Product: Firefox → Core
QA Contact: general → layout
OK.  So general info on the profile:

54% in nsTextFrame::EnsureTextRun (called from TrimTrailingWhiteSpace).  This is almost all of the reflow time.

38% appending undisplayed nodes.  See bug 560616.

The frame tree looks like a block for the <map> which contains a bajillion (well, 18652, which is one less than the number of <area>s) textnodes each containing "\n".  I'm not quite sure how many calls to EnsureTextRun we end up making or what takes so long inside there.  The per-line profile in shark claims 70% of the function's time is on this line:

  2022       if (flow->mStartFrame->GetContent() == mContent) {

and in particular on this instruction:

  mov  ecx, dword ptr [ebp+12]

20% is on the for loop header on line 2020 (half mov instruction, half add instruction).  10% is the lea for reading &userData->mMappedFlows[i].

L2 cache miss profile, unsurprisingly, claims lots of L2 misses on all of those, especially on line 2022.
Depends on: 560616
Oh, and the point is that in that loop userData->mMappedFlowCount is 18652.  And we call EnsureTextRun on each one of those textframes, as far as I can see, running the entire loop each time.  TEXT_IS_SIMPLE_FLOW is not set, of course.

It looks like the point of the loop is finding the flow that maps mContent, right?  and it searches forward, then backward... but in this case the flow we want is always one back from the one we last looked at, so the forward search is a waste of time.
OK, I have a fix for the textrun part of this.  I'm not seeing the multi-tens-of-seconds times on my machine (bigger L2 cache?), but the time on the last testcase in this bug does go down from 2.7 seconds to 1.1 seconds or so, as expected from the profile.
Attached patch Like so (obsolete) — Splinter Review
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #457927 - Flags: review?(roc)
(In reply to comment #18)
> I'm not seeing the multi-tens-of-seconds times on my machine

BTW, Fx3.6 is 5-10 seconds faster than current trunk on the machine I tested with (Pentium 4, 1800 MHz, 256 MB RAM)
Looks good, but the code would be simpler if the loop to find the right flow was separated into a helper function.
> BTW, Fx3.6 is 5-10 seconds faster than current trunk

I was doing all the testing+profiling on trunk.  That's what the 2.7s number was for.  3.6 is at 2.6s or so.

> simpler if the loop to find the right flow was separated into a helper function.

Yeah, fair.  I'll do that.
That said, the P4 has a tiny L2 cache (order of 256KB, iirc).  The processor I'm on has a 6MB L2 cache.  So it's possible that you really do get the multi-tens-of-seconds behavior just due to a slew of cache misses...
Attached patch Like thisSplinter Review
Attachment #457927 - Attachment is obsolete: true
Attachment #458043 - Flags: review?(roc)
Attachment #457927 - Flags: review?(roc)
Pushed http://hg.mozilla.org/mozilla-central/rev/659b3ff05e37

j.j., can you test in tomorrow's nightlies?

If I add a local fix for bug 560616, the time drops from 1.2s to 0.2 seconds or so for me.  This is comparable to the Opera/Chrome times.

Marking fixed (though bug 560616 is still outstanding and also needs fixing).
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Depends on: 580167
Mozilla/5.0 (Windows; Windows NT 5.1; rv:2.0b2pre) Gecko/20100720 Minefield/4.0b2pre

 < 4 seconds
Alright.  So yeah, the tiny L2 cache was killing you.

This will get faster still as I fix the remaining bugs in the "depends on" list.
I'm calling this fixed.  With bug 560616 fixed, we're 3-5x faster than we were last week, and 50x faster than Chrome or Safari on the "reduced testcase, unzipped" testcase.
> we're 3-5x faster than we were last week, and 50x faster than Chrome or Safari

... but 2-3x slower than Opera 12.18 :-) .  Chromium/Blink has an open issue for a regression of this since 2011.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: