Closed Bug 566066 Opened 14 years ago Closed 3 years ago

loading tinderbox log causes massive memory and cpu usage

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: vlad, Unassigned)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(3 files, 1 obsolete file)

Loading this tinderbox log: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1273782808.1273795801.23338.gz&fulltext=1 Causes not only significant cpu usage, but also causes us to consume close to 1GB memory at the end. It takes about 15 minutes to load on my fast core i7 laptop (no swapping). I'm grabbing an xperf profile now (original one identified nsTextFrame::GetLastInFlow as the culprit for basically 100% of CPU, but I couldn't get a full stack). Also, unfortunately, my about:memory instrumentation isn't fully capturing whatever's causing the memory growth: Win32 Private Bytes - ~500MB xpcom ns*String strings - 60MB JS GC chunks - 23MB JS "Other" - 14MB - JS Strings - 5MB (part of "Other") Storage - 10.5MB PresShell Arenas - 25MB This is with tinderboxpushlog for MozillaTry loaded plus the above log. Upon closing tinderboxpushlog, not much changes (I can still use the browser while this is loading, it's just very slow), though now that I reloaded about:memory I see that xpcom strings grew to 120MB.
Attached file profile data
Here's the xperf output -- note that the total percentage of firefox.exe is 25.04, so 24.65 is 98.44% of firefox's CPU usage, etc. Basically, we're looking at this: xul.dll!nsBlockReflowContext::ReflowBlock, 88582.652769, 24.24, 88625, |- xul.dll!nsBlockFrame::Reflow, 63906.820667, 17.49, 63937, |- xul.dll!nsBlockFrame::ResolveBidi, 24673.831611, 6.75, 24686, 72% in Reflow, 28% in ResolveBidi; deeper trace under Reflow is in the log, a large chunk of which is in EnsureTextRun. Not sure yet where the memory is going, will trace that next.
Some details on memory usage: A lot of the excessive memory usage here could easily be due to the windows' heap fragmentation suckiness. I can't use jemalloc and use xperf's heap tracking, which is unfortunate, but I see an outstanding allocation size ending up at around 150MB (which is entirely reasonable for loading a 44MB text file -- smaller would be better for sure, but..), but a heap size of around 400MB. Some large allocations -- Strings, obviously. The data is served without a content-length, so we're constantly growing this string looks like. The actual data size is around 44MB. Might be nice to trim for large buffers like this, if we don't already. None of this is surprising, but interesting to see: -- this is stack AAA below: |- xul.dll!nsTimerImpl::Fire | xul.dll!nsRefreshDriver::Notify | xul.dll!PresShell::FlushPendingNotifications | xul.dll!PresShell::ProcessReflowCommands | xul.dll!PresShell::DoReflow | xul.dll!ViewportFrame::Reflow | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsHTMLScrollFrame::Reflow | xul.dll!nsHTMLScrollFrame::ReflowContents | xul.dll!nsHTMLScrollFrame::ReflowScrolledFrame | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsCanvasFrame::Reflow | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsBlockFrame::Reflow | xul.dll!nsBlockFrame::ReflowDirtyLines | xul.dll!nsBlockFrame::ReflowLine | xul.dll!nsBlockFrame::ReflowBlockFrame | xul.dll!nsBlockReflowContext::ReflowBlock | xul.dll!nsBlockFrame::Reflow | xul.dll!nsBlockFrame::ReflowDirtyLines | xul.dll!nsBlockFrame::ReflowLine | xul.dll!nsBlockFrame::ReflowBlockFrame | xul.dll!nsBlockReflowContext::ReflowBlock | xul.dll!nsBlockFrame::ResolveBidi | xul.dll!nsBidiPresUtils::Resolve | xul.dll!nsBidiPresUtils::CreateBlockBuffer | xul.dll!nsTextFragment::AppendTo | xul.dll!AppendASCIItoUTF16 | xul.dll!SetLengthForWriting | xul.dll!nsAString_internal::SetLength | xul.dll!nsAString_internal::SetCapacity | xul.dll!nsAString_internal::MutatePrep | xul.dll!nsStringBuffer::Realloc | msvcr100_clr0400.dll!realloc -- this is stack BBB below: |- xul.dll!nsBaseAppShell::OnProcessNextEvent | xul.dll!nsBaseAppShell::DoProcessNextNativeEvent | xul.dll!nsAppShell::ProcessNextNativeEvent | user32.dll!DispatchMessageW | user32.dll!DispatchMessageWorker | user32.dll!UserCallWinProcCheckWow | user32.dll!InternalCallWinProc | xul.dll!nsWindow::WindowProc | xul.dll!nsWindow::DispatchPendingEvents | xul.dll!NS_ProcessPendingEvents_P | xul.dll!nsThread::ProcessNextEvent | xul.dll!nsTimerImpl::Fire | xul.dll!nsRefreshDriver::Notify | xul.dll!PresShell::FlushPendingNotifications | xul.dll!PresShell::ProcessReflowCommands | xul.dll!PresShell::DoReflow | xul.dll!ViewportFrame::Reflow | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsHTMLScrollFrame::Reflow | xul.dll!nsHTMLScrollFrame::ReflowContents | xul.dll!nsHTMLScrollFrame::ReflowScrolledFrame | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsCanvasFrame::Reflow | xul.dll!nsContainerFrame::ReflowChild | xul.dll!nsBlockFrame::Reflow | xul.dll!nsBlockFrame::ReflowDirtyLines | xul.dll!nsBlockFrame::ReflowLine | xul.dll!nsBlockFrame::ReflowBlockFrame | xul.dll!nsBlockReflowContext::ReflowBlock | xul.dll!nsBlockFrame::Reflow | xul.dll!nsBlockFrame::ReflowDirtyLines | xul.dll!nsBlockFrame::ReflowLine | xul.dll!nsBlockFrame::ReflowBlockFrame | xul.dll!nsBlockReflowContext::ReflowBlock | xul.dll!nsBlockFrame::ResolveBidi | xul.dll!nsBidiPresUtils::Resolve | xul.dll!nsBidiPresUtils::CreateBlockBuffer | xul.dll!nsTextFragment::AppendTo | xul.dll!AppendASCIItoUTF16 | xul.dll!SetLengthForWriting | xul.dll!nsAString_internal::SetLength | xul.dll!nsAString_internal::SetCapacity | xul.dll!nsAString_internal::MutatePrep | xul.dll!nsStringBuffer::Realloc | msvcr100_clr0400.dll!realloc Size, Count, AllocTime, Stack, FreeTime, Address Size: 59686922 59686922, 1, 7.820784098, AAA, 9.013569240, 0x0000000024350020 59686922, 1, 9.157039579, BBB, 10.122591624, 0x0000000021a50020 59686922, 1, 10.234637649, AAA, 121.435413600, 0x0000000023d40020 Size: 29843466 29843466, 1, 7.779839645, AAA, 7.839154211, 0x00000000226d0020 29843466, 1, 9.114623145, BBB, 9.175517473, 0x000000001fdd0020 29843466, 1, 10.194868393, AAA, 10.253481229, 0x00000000220c0020 Size: 21691653 21691653, 1, 9.232790360, BBB, 10.124212573, 0x0000000015a70020 21691653, 1, 9.517800605, BBB, 10.125852383, 0x000000001fdd0020 Size: 21593349 21593349, 1, 7.893348412, AAA, 9.015185185, 0x0000000015a70020 21593349, 1, 8.174921598, AAA, 9.016846166, 0x00000000226d0020 Size: 14921738 14921738, 1, 7.760541845, AAA, 7.788865532, 0x0000000016190020 14921738, 1, 9.094473491, BBB, 9.125476305, 0x0000000016190020 14921738, 1, 10.174848846, AAA, 10.204053642, 0x0000000021280020 Size: 7460874 7460874, 1, 7.751001304, AAA, 7.765253799, 0x0000000015a70020 7460874, 1, 9.084343234, BBB, 9.099302464, 0x0000000015a70020 7460874, 1, 10.164789417, AAA, 10.179455713, 0x0000000016f20020 Size: 3730442 3730442, 1, 7.746263560, AAA, 7.753304738, 0x0000000005dc0020 3730442, 1, 9.079440739, BBB, 9.086742901, 0x0000000005dc0020 3730442, 1, 10.160103639, AAA, 10.167023177, 0x0000000005dc0020 Size: 1865226 1865226, 1, 7.743863509, AAA, 7.747241673, 0x0000000005bf0020 1865226, 1, 9.076934062, BBB, 9.080588222, 0x0000000005bf0020 1865226, 1, 10.157599271, AAA, 10.161034404, 0x0000000005bf0020 Size: 1048584 1048584, 1, 7.738576467, AAA, 8.974104850, 0x0000000003e00020 1048584, 1, 9.071683588, BBB, 10.088017411, 0x0000000003e00020 1048584, 1, 10.149917181, AAA, 11.174884260, 0x0000000003e00020 Sorting this by AllocTime: Size, Count, AllocTime, Stack, FreeTime, Address 1048584, 1, 7.738576467, AAA, 8.974104850, 0x0000000003e00020 1865226, 1, 7.743863509, AAA, 7.747241673, 0x0000000005bf0020 3730442, 1, 7.746263560, AAA, 7.753304738, 0x0000000005dc0020 7460874, 1, 7.751001304, AAA, 7.765253799, 0x0000000015a70020 14921738, 1, 7.760541845, AAA, 7.788865532, 0x0000000016190020 29843466, 1, 7.779839645, AAA, 7.839154211, 0x00000000226d0020 59686922, 1, 7.820784098, AAA, 9.013569240, 0x0000000024350020 21593349, 1, 7.893348412, AAA, 9.015185185, 0x0000000015a70020 21593349, 1, 8.174921598, AAA, 9.016846166, 0x00000000226d0020 1048584, 1, 9.071683588, BBB, 10.088017411, 0x0000000003e00020 1865226, 1, 9.076934062, BBB, 9.080588222, 0x0000000005bf0020 3730442, 1, 9.079440739, BBB, 9.086742901, 0x0000000005dc0020 7460874, 1, 9.084343234, BBB, 9.099302464, 0x0000000015a70020 14921738, 1, 9.094473491, BBB, 9.125476305, 0x0000000016190020 29843466, 1, 9.114623145, BBB, 9.175517473, 0x000000001fdd0020 59686922, 1, 9.157039579, BBB, 10.122591624, 0x0000000021a50020 21691653, 1, 9.232790360, BBB, 10.124212573, 0x0000000015a70020 21691653, 1, 9.517800605, BBB, 10.125852383, 0x000000001fdd0020 1048584, 1, 10.149917181, AAA, 11.174884260, 0x0000000003e00020 1865226, 1, 10.157599271, AAA, 10.161034404, 0x0000000005bf0020 3730442, 1, 10.160103639, AAA, 10.167023177, 0x0000000005dc0020 7460874, 1, 10.164789417, AAA, 10.179455713, 0x0000000016f20020 14921738, 1, 10.174848846, AAA, 10.204053642, 0x0000000021280020 29843466, 1, 10.194868393, AAA, 10.253481229 0x00000000220c0020 59686922, 1, 10.234637649, AAA, 121.435413600, 0x0000000023d40020 The final allocation is the string that lives on. Also, two 21,789,957 allocations also live on -- the final size is around 155MB (from tracked allocations, there might be more outside of my range, also issues due to fragmentation etc.), lokos like in Bidi: ntdll.dll!_RtlUserThreadStart ntdll.dll!__RtlUserThreadStart kernel32.dll!BaseThreadInitThunk firefox.exe!__tmainCRTStartup firefox.exe!wmain firefox.exe!NS_internal_main xul.dll!XRE_main xul.dll!nsAppStartup::Run xul.dll!nsAppShell::Run xul.dll!nsBaseAppShell::Run xul.dll!MessageLoop::Run xul.dll!MessageLoop::RunInternal xul.dll!mozilla::ipc::MessagePump::Run xul.dll!NS_ProcessNextEvent_P xul.dll!nsThread::ProcessNextEvent xul.dll!nsTimerImpl::Fire xul.dll!nsRefreshDriver::Notify xul.dll!PresShell::FlushPendingNotifications xul.dll!PresShell::ProcessReflowCommands xul.dll!PresShell::DoReflow xul.dll!ViewportFrame::Reflow xul.dll!nsContainerFrame::ReflowChild xul.dll!nsHTMLScrollFrame::Reflow xul.dll!nsHTMLScrollFrame::ReflowContents xul.dll!nsHTMLScrollFrame::ReflowScrolledFrame xul.dll!nsContainerFrame::ReflowChild xul.dll!nsCanvasFrame::Reflow xul.dll!nsContainerFrame::ReflowChild xul.dll!nsBlockFrame::Reflow xul.dll!nsBlockFrame::ReflowDirtyLines xul.dll!nsBlockFrame::ReflowLine xul.dll!nsBlockFrame::ReflowBlockFrame xul.dll!nsBlockReflowContext::ReflowBlock xul.dll!nsBlockFrame::Reflow xul.dll!nsBlockFrame::ReflowDirtyLines xul.dll!nsBlockFrame::ReflowLine xul.dll!nsBlockFrame::ReflowBlockFrame xul.dll!nsBlockReflowContext::ReflowBlock xul.dll!nsBlockFrame::ResolveBidi xul.dll!nsBidiPresUtils::Resolve xul.dll!nsBidi::SetPara xul.dll!nsBidi::GetMemory msvcr100_clr0400.dll!realloc So out of our 150MB total allocated size, we have 59+21*2 = 91MB in strings in these two places. Ouch. We then have around 60k 96-byte TextRun structure allocations, and slightly fewer 24-byte allocations from BuildTextRunsScanner::BuildTextRunForFrames, and then a bunch of various-sized allocations for textruns in their AllocateStorage function.
I can't comment on whether we need to stick all the text in the block into a single buffer; let's assume we do for now (though it might be nice to avoid). Given that premise, one obvious issue here is that truncating a string to 0 length always releases its buffer. I could have sworn we had a bug on this.. So in CreateBlockBuffer we free the existing buffer up front and then immediately reallocate a big buffer. Ideally what we would do is keep the buffer around and keep reusing it, then deallocate off a timer or something. The SetPara/GetMemory stack is nsBidi not releasing its various buffers. It looks like it just holds on to them for app lifetime, but they're effecitively singletons (so using Realloc to resize if they're too small, but never shrinking them). Fundamentally, the fact that we keep redoing bidi processing on the whole thing over and over is the real issue.
Hmm, I'll run a profile and let it collect data for the entire load process so that we can examine what's going on perf-wise in each time chunk. I'm not quite sure what bidi's doing -- is the time I saw that was being spent underneath Reflow dealing with textruns and whatnot caused by whatever work ResolveBidi's doing?
I would think part of the plan in comment 2 would involve not copying all the text into a single buffer (since we would no longer need to operate on all the text at once).
> is the time I saw that was being spent underneath Reflow dealing with textruns > and whatnot caused by whatever work ResolveBidi's doing? Chances are, yes.
Out of about 130MB that gets used by presshell arenas, 100MB of it is inside nsBidi (which is per prescontext/presshell). Pretty sure this memory isn't used at all after the initial resolution is done. However, the process is up to 500MB at this point, so I'm still trying to account for the others. Need to instrument a few more things.
er, "that gets used by presshell arenas" -> "that gets used by presshell/prescontext, counting arenas and bidi memory for now"
any more instrumentation in play here now? 100MB is a spicy meatball.
Blocks: 390341
No longer blocks: 390341
So I still see this, when loading a log through my fetchlog script (iso8859-1, gzip-encoded passthrough) -- e.g. http://vlad.off.net/misc/fetchlog.php?log=http%3A%2F%2Fftp.mozilla.org%2Fpub%2Fmozilla.org%2Ffirefox%2Ftinderbox-builds%2Fcedar-win32%2F1286598284%2Fcedar_win7_test-mochitests-1-build0.txt.gz But I don't think it's bidi related, and may be extension related. If I do it in a fresh browser/clean profile, it uses up about 250-300MB while loading then drops down to about 100MB per log file, and takes about 3-4 seconds of frozen browser to load. If I do it in my normal browser with a dozen or so tabs open, my browser is usually frozen for 30-40 seconds, working set spikes by about 350-450MB per log file, and if I open too many of them (e.g. 4) my browser crashes due to out of memory.
Is this a dupe/related of bug 216418?
The patches that landed in bug 631035 should address at least some of the memory issues here. It'd be worth re-testing to see if this can be resolved as a dup of that bug. (Which perhaps ought to have been duped to this one, but that's beside the point.)
> Is this a dupe/related of bug 216418? No. Jonathan, none of the profiles done for this bug have showed the issue from bug 631035; it's possible that issue was introduced after the profiles were done. What will fix this bug is not doing bidi reordering on the multi-megabyte "paragraph", or making said reordering more efficient....
(In reply to comment #15) > What will fix this bug is not doing bidi reordering on the multi-megabyte > "paragraph", or making said reordering more efficient.... Setting dependency on bug 263359, which has reviewed patches targetted for post-FF4, and will make us do bidi reordering line-by-line on preformatted text, which should help with this. I'm not sure to what extent this bug overlaps with bug 477495 and bug 330350, which were already marked as dependencies. I thought there was also a bug about implementing the plan linked to in comment 2, but I can't find it.
Depends on: 263359
Depends on: 646359
Attached a jprof of the 32MB testcase from bug 650189. (took 6:40 to load on an Athlon x2). 90% is in GetLastInFlow, and I think all of those are actually from nsTextFrame::GetInFlowContentLength() (inlined). On a huge text flow, it may walk the remainder of the list for many (all?) nodes, which is at least O(N log N), and may be closer to O(N^2)
Attached patch Patch (obsolete) — Splinter Review
Fixed by caching the inflow length on the content node, as roc suggested on IRC and email. Testing this patch with jprof on the testcase from bug 650189, for me GetLastInFlow goes down from 59% of the profile to "0%" (2 hits out of 5898). Wallclock time goes down from 44 seconds to 14 seconds on average.
Assignee: nobody → smontagu
Attachment #533579 - Flags: review?(roc)
Comment on attachment 533579 [details] [diff] [review] Patch Review of attachment 533579 [details] [diff] [review]: ----------------------------------------------------------------- I'm a bit worried about the overhead of these node properties. Node property access isn't cheap like frame properties are :-(. Can we guard the flowlength deletions in Init and CharacterDataChanged on FRAME_IS_BIDI or at least whether bidi is enabled for the presshell? ::: layout/generic/nsTextFrameThebes.cpp @@ +633,5 @@ > + > + GetOffsets(start, end); > + if (flowLength && flowLength->mStartOffset <= mContentOffset && > + flowLength->mEndFlowOffset > mContentOffset && > + flowLength->mEndFlowOffset >= end) { Surely end >= mContentOffset, so checking mEndFlowOffset > mContentOffset is redundant? But it seems to me that if flowLength->mStartOffset <= mContentOffset and flowLength->mEndFlowOffset > mContentOffset then the in-flow content length for this frame *must* be equal to mEndFlowOffset. And we should assert that that is >= end (otherwise our frame crosses a fixed-continuation boundary, which would be disastrous). @@ +647,5 @@ > + } > + > + if (!flowLength) { > + flowLength = new FlowLengthProperty; > + if (flowLength) { null check not needed
Attached patch Patch v.2Splinter Review
(In reply to comment #19) > I'm a bit worried about the overhead of these node properties. Node property > access isn't cheap like frame properties are :-(. Can we guard the > flowlength deletions in Init and CharacterDataChanged on FRAME_IS_BIDI or at > least whether bidi is enabled for the presshell? I think testing for bidi-enabled is safer, since FRAME_IS_BIDI could change after CharacterDataChanged. > Surely end >= mContentOffset, so checking mEndFlowOffset > mContentOffset is > redundant? > > But it seems to me that if flowLength->mStartOffset <= mContentOffset and > flowLength->mEndFlowOffset > mContentOffset then the in-flow content length > for this frame *must* be equal to mEndFlowOffset. And we should assert that > that is >= end (otherwise our frame crosses a fixed-continuation boundary, > which would be disastrous). Yes, I think you're right here. In fact that is almost what I first wrote, except that I made the mistake of testing for flowLength->mEndFlowOffset >= mContentOffset, and when that wasn't right I tried to fix it by using end instead of mContentOffset. Then that caused an assertion in one of Jesse's evil tests on tryserver so I put in the extra check for > mContentOffset. Sigh.
Attachment #533579 - Attachment is obsolete: true
Attachment #533674 - Flags: review?(roc)
Attachment #533579 - Flags: review?(roc)
Comment on attachment 533674 [details] [diff] [review] Patch v.2 Review of attachment 533674 [details] [diff] [review]: -----------------------------------------------------------------
Attachment #533674 - Flags: review?(roc) → review+
http://hg.mozilla.org/mozilla-central/rev/107bbdaf84c0 This only addresses the "cpu usage" part of the bug, not the "massive memory", so leaving open for now, but bug 624798 will help with the memory usage.
Depends on: 624798
Depends on: 660416
Component: General → Layout: Text

Testcase URL no longer works

Flags: needinfo?(jfkthame)

The bug assignee didn't login in Bugzilla in the last 7 months.
:jfkthame, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: smontagu → nobody
Flags: needinfo?(jfkthame)

Given that:

(a) a patch landed here to address CPU usage (comment 22),
(b) bug 624798 should have helped with memory usage (also see comment 22),
(c) much of the code will have changed over the past decade, and
(d) the testcase is no longer available

...I think we should close this (as "incomplete", given that we didn't apparently verify how much the above patches helped, and can no longer make meaningful comparisons).

I'm sure there are still cases where CPU and/or memory usage is significantly higher than we'd like, and it would be good to investigate them further, but let's do that by filing new bugs with relevant testcases; this specific bug is no longer usefully actionable.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jfkthame)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: