Last Comment Bug 337883 - This testcases causes nsBlockFrame::FindLineFor to look at a freed frame
: This testcases causes nsBlockFrame::FindLineFor to look at a freed frame
Status: RESOLVED FIXED
[sg:critical?]
: testcase, verified1.8.0.7, verified1.8.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: Other Linux
: -- critical (vote)
: ---
Assigned To: Mats Palmgren (:mats)
:
Mentors:
Depends on: 348688
Blocks: randomstyles
  Show dependency treegraph
 
Reported: 2006-05-13 20:13 PDT by Jesse Ruderman
Modified: 2007-12-23 16:55 PST (History)
10 users (show)
mtschrep: blocking1.8.1+
dveditz: blocking1.8.0.7+
jruderman: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (394 bytes, text/html)
2006-05-13 20:15 PDT, Jesse Ruderman
no flags Details
valgrind output for the invalid read (7.91 KB, text/plain)
2006-05-13 20:16 PDT, Jesse Ruderman
no flags Details
testcase that triggers the bug in a debug build (430 bytes, text/html)
2006-05-19 11:15 PDT, Jesse Ruderman
no flags Details
valgrind output for the invalid read in a debug build (2.67 KB, text/plain)
2006-05-19 11:17 PDT, Jesse Ruderman
no flags Details
Frame dump (23.00 KB, text/html)
2006-05-21 08:20 PDT, Mats Palmgren (:mats)
no flags Details
Patch rev. 1 (4.56 KB, patch)
2006-05-21 08:29 PDT, Mats Palmgren (:mats)
roc: review+
roc: superreview+
dveditz: approval1.8.0.7+
mtschrep: approval1.8.1+
Details | Diff | Splinter Review

Description Jesse Ruderman 2006-05-13 20:13:10 PDT
Steps to reproduce:

1. Run a Firefox build with DEBUG_TRACEMALLOC_FRAMEARENA defined under Valgrind.  (Additional setup that probably isn't relevant: Pear has Valgrind 3.1.1, and the Firefox build has patches from bug 287116 and bug 184614 to fix other errors Valgrind detects.)

2. Load the testcase.

Result: Valgrind complains about an invalid read on a deleted frame.  Simplified output below, will attach more complete output.

Invalid read of size 4
  at nsBlockFrame::FindLineFor(nsIFrame*) 
  by nsBlockFrame::PrepareChildIncrementalReflow(nsBlockReflowState&) 
  by nsBlockFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned&)

Address 0x7ED3380 is 48 bytes inside a block of size 52 free'd
  at free (vg_replace_malloc.c:235)
  by PR_Free   
  by FrameArena::FreeFrame(unsigned, void*) 
  by PresShell::FreeFrame(unsigned, void*) 
  by nsFrame::Destroy() 
  by nsSplittableFrame::Destroy()
Comment 1 Jesse Ruderman 2006-05-13 20:15:09 PDT
Created attachment 221938 [details]
testcase
Comment 2 Jesse Ruderman 2006-05-13 20:16:48 PDT
Created attachment 221939 [details]
valgrind output for the invalid read
Comment 3 Boris Zbarsky [:bz] (TPAC) 2006-05-14 15:50:06 PDT
So is the frame passed to FindLineFor dead?  Or something else?  Line numbers would really help here.
Comment 4 Jesse Ruderman 2006-05-17 19:19:38 PDT
I can't reproduce with a debug build (built from the same source as the opt build with which I can reproduce it).  I think that's going to make it difficult to get a stack trace with line numbers.

One thing I tried was adding an assertion to nsBlockFrame::FindLineFor that attempts to check whether it has been given a deleted frame.  My assertion complains if aFrame->GetParent() & 0ffff0000 == 0xdddd0000.  I was able to reproduce a bug (this bug?) with a different testcase earlier this week using that trick, but now I can't.  That testcase involved height and clear.

Any ideas for what else I can try?  Would it help if you had an account on the Linux QA machines I use through VNC?
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-05-17 19:49:12 PDT
You could try compiling an opt build with -g to add symbols without changing the code.
Comment 6 Jesse Ruderman 2006-05-17 20:22:28 PDT
The opt build already has symbols -- that's why the stacks from Valgrind are readable.
Comment 7 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-05-17 21:35:39 PDT
You should be able to get line numbers compiled into your opt build too.
Comment 8 Graydon Hoare :graydon 2006-05-18 12:01:00 PDT
(In reply to comment #6)
> The opt build already has symbols -- that's why the stacks from Valgrind are
> readable.

There are a variety of types of symbols which might or might not be present for each piece of the program compiled. Valgrind, like gdb, works with what it can get; even my stripped /usr/lib/mozilla-firefox/firefox-bin image has 44283 symbols (exported symbols, GOT/PLT relocs, etc). Your valgrind stack has some frames with source line numbers, and some without. The -g compiler flag will add .debug_foo sections to the currently compiling obj file, which will greatly increase the resolution of things like Valgrind. But it's not an all-or-nothing affair. You might need to do a clean rebuild with your opt flags set to '-O2 -g'.

To check, run 'readelf -S foo.so' to get a section list for foo.so. If there's a section called .debug_line in there, *some* of the objects contributing to the .so have debug line records. Unless the SIZE column in the output shows lots of data (in mozilla's case, probably megabytes) you might not have rebuilt every object in the .so with -g.
Comment 9 Jesse Ruderman 2006-05-19 11:15:46 PDT
Created attachment 222649 [details]
testcase that triggers the bug in a debug build

Not sure why this approach worked, but it did ;)
Comment 10 Jesse Ruderman 2006-05-19 11:17:22 PDT
Created attachment 222650 [details]
valgrind output for the invalid read in a debug build
Comment 11 Jesse Ruderman 2006-05-19 23:50:18 PDT
So I think it's fc->mPlaceholder that's dead, not aFrame.  I still can't figure out how to reproduce on Mac OS X, though :/
Comment 12 Mats Palmgren (:mats) 2006-05-21 08:20:49 PDT
Created attachment 222784 [details]
Frame dump

The problem seem to occur when removing a placeholder in
nsBlockFrame::RemoveFrame that is also in the float cache.
Comment 13 Mats Palmgren (:mats) 2006-05-21 08:29:40 PDT
Created attachment 222785 [details] [diff] [review]
Patch rev. 1

This fixes the two testcases. I also ran it over a set of ~220 URLs and
got this statistics: there were 2217 calls to RemovePlaceholderDescendantsOf
and only 8 of those had a float cache item (exactly 1 in all cases).
For that item we peeked at 2.125 frames on average (including aFrame) and got
no match (i.e. none of the pages triggered the bug).
(so I don't think the suggested fix will have any performance impact)
Comment 14 Mats Palmgren (:mats) 2006-05-21 10:37:17 PDT
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

The NS_ABORT_IF_FALSE in RemovePlaceholderDescendantsOf
is a mistake, I'll remove that...
Comment 15 Mats Palmgren (:mats) 2006-05-24 20:33:21 PDT
I got this error on the "Linux prometheus" Tinderbox:
nsLineBox.cpp:518: error: invalid conversion from `nsIFrame*' to `long int'

So I did this change:
-      if (NS_UNLIKELY(frame)) {
+      if (NS_UNLIKELY(frame != nsnull)) {

I wonder why this should be necessary though...
(it didn't break any of the other builds so what's special with "prometheus"?)
Comment 16 Mats Palmgren (:mats) 2006-05-24 20:34:17 PDT
Checked in on trunk 2006-05-24 18:48 PDT

-> FIXED
Comment 17 Boris Zbarsky [:bz] (TPAC) 2006-05-27 10:37:14 PDT
Compiler version?
Comment 18 Mats Palmgren (:mats) 2006-05-27 11:59:15 PDT
prometheus tinderbox build log says:

uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux
Compiler is -- gcc (gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

The compile error I got (comment 15) seems like a bug to me, I would expect
that if "if(x);" compiles then so should "if(NS_UNLIKELY(x));" for any given x.

Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?
Comment 19 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2006-05-27 13:32:32 PDT
(In reply to comment #18)
> uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16
> 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux

> Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?

Not with that uname output.

We should probably just fix the NS_LIKELY and NS_UNLIKELY macros to cast in some way (but only in the case where they're actually interesting).
Comment 20 Mats Palmgren (:mats) 2006-06-03 09:18:51 PDT
(filed bug 340244 on improving NS_LIKELY/NS_UNLIKELY)
Comment 21 Daniel Veditz [:dveditz] 2006-07-07 01:46:14 PDT
Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build, is this a trunk-only bug?
Comment 22 Mats Palmgren (:mats) 2006-07-07 02:53:35 PDT
(In reply to comment #21)
> Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build,
> is this a trunk-only bug?

Note that isn't a crasher. I tried one of my pre-patches that has trace
printf's and I can see that, 1) the testcase do trigger the code that the
patch adds. 2) without the patch we do find a placeholder that is a
descendant of a deleted frame in nsBlockFrame::FindLineFor.

I don't have a Valgrind build for the 1.8 branch, but I would say the
printf's strongly suggests the invalid read also occurs on branch (and
that the patch would fix it).
Comment 23 Mats Palmgren (:mats) 2006-07-28 12:02:02 PDT
Checked in to MOZILLA_1_8_BRANCH at 2006-07-28 11:20 PDT.
Comment 24 Daniel Veditz [:dveditz] 2006-08-09 14:37:43 PDT
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

approved for 1.8.0 branch, a=dveditz for drivers
Comment 25 Mats Palmgren (:mats) 2006-08-10 03:53:33 PDT
Checked in to MOZILLA_1_8_0_BRANCH at 2006-08-10 03:18 PDT.
Comment 26 Mats Palmgren (:mats) 2006-08-17 07:38:02 PDT
The real fix for this bug is in bug 348688 (which backed out this patch).
Real fix will be on branches soon...
Comment 27 Bob Clary [:bc:] 2006-08-22 00:24:08 PDT
https://bugzilla.mozilla.org/attachment.cgi?id=222649
ff2b2 debug windows, linux no crash
ff2b2 windows, linux, macppc no crash
Comment 28 Jay Patel [:jay] 2006-08-24 16:03:13 PDT
v.fixed on both branches with 8/24 nightly builds, no crash with testcase:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7pre) Gecko/20060824 Firefox/1.5.0.7pre
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1b2) Gecko/20060824 BonEcho/2.0b2
Comment 29 Jesse Ruderman 2007-12-23 16:55:24 PST
Crashtests checked in.  I think sayrer runs through the crashtests with valgrind enabled frequently enough ;)

Note You need to log in before you can comment on or make changes to this bug.