Closed Bug 337883 Opened 18 years ago Closed 18 years ago

This testcases causes nsBlockFrame::FindLineFor to look at a freed frame

Categories

(Core :: Layout, defect)

Other
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: testcase, verified1.8.0.7, verified1.8.1, Whiteboard: [sg:critical?])

Attachments

(4 files, 2 obsolete files)

Steps to reproduce:

1. Run a Firefox build with DEBUG_TRACEMALLOC_FRAMEARENA defined under Valgrind.  (Additional setup that probably isn't relevant: Pear has Valgrind 3.1.1, and the Firefox build has patches from bug 287116 and bug 184614 to fix other errors Valgrind detects.)

2. Load the testcase.

Result: Valgrind complains about an invalid read on a deleted frame.  Simplified output below, will attach more complete output.

Invalid read of size 4
  at nsBlockFrame::FindLineFor(nsIFrame*) 
  by nsBlockFrame::PrepareChildIncrementalReflow(nsBlockReflowState&) 
  by nsBlockFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned&)

Address 0x7ED3380 is 48 bytes inside a block of size 52 free'd
  at free (vg_replace_malloc.c:235)
  by PR_Free   
  by FrameArena::FreeFrame(unsigned, void*) 
  by PresShell::FreeFrame(unsigned, void*) 
  by nsFrame::Destroy() 
  by nsSplittableFrame::Destroy()
Attached file testcase (obsolete) —
Attached file valgrind output for the invalid read (obsolete) —
So is the frame passed to FindLineFor dead?  Or something else?  Line numbers would really help here.
I can't reproduce with a debug build (built from the same source as the opt build with which I can reproduce it).  I think that's going to make it difficult to get a stack trace with line numbers.

One thing I tried was adding an assertion to nsBlockFrame::FindLineFor that attempts to check whether it has been given a deleted frame.  My assertion complains if aFrame->GetParent() & 0ffff0000 == 0xdddd0000.  I was able to reproduce a bug (this bug?) with a different testcase earlier this week using that trick, but now I can't.  That testcase involved height and clear.

Any ideas for what else I can try?  Would it help if you had an account on the Linux QA machines I use through VNC?
You could try compiling an opt build with -g to add symbols without changing the code.
The opt build already has symbols -- that's why the stacks from Valgrind are readable.
You should be able to get line numbers compiled into your opt build too.
(In reply to comment #6)
> The opt build already has symbols -- that's why the stacks from Valgrind are
> readable.

There are a variety of types of symbols which might or might not be present for each piece of the program compiled. Valgrind, like gdb, works with what it can get; even my stripped /usr/lib/mozilla-firefox/firefox-bin image has 44283 symbols (exported symbols, GOT/PLT relocs, etc). Your valgrind stack has some frames with source line numbers, and some without. The -g compiler flag will add .debug_foo sections to the currently compiling obj file, which will greatly increase the resolution of things like Valgrind. But it's not an all-or-nothing affair. You might need to do a clean rebuild with your opt flags set to '-O2 -g'.

To check, run 'readelf -S foo.so' to get a section list for foo.so. If there's a section called .debug_line in there, *some* of the objects contributing to the .so have debug line records. Unless the SIZE column in the output shows lots of data (in mozilla's case, probably megabytes) you might not have rebuilt every object in the .so with -g.
Not sure why this approach worked, but it did ;)
Attachment #221938 - Attachment is obsolete: true
Attachment #221939 - Attachment is obsolete: true
So I think it's fc->mPlaceholder that's dead, not aFrame.  I still can't figure out how to reproduce on Mac OS X, though :/
Attached file Frame dump
The problem seem to occur when removing a placeholder in
nsBlockFrame::RemoveFrame that is also in the float cache.
Attached patch Patch rev. 1Splinter Review
This fixes the two testcases. I also ran it over a set of ~220 URLs and
got this statistics: there were 2217 calls to RemovePlaceholderDescendantsOf
and only 8 of those had a float cache item (exactly 1 in all cases).
For that item we peeked at 2.125 frames on average (including aFrame) and got
no match (i.e. none of the pages triggered the bug).
(so I don't think the suggested fix will have any performance impact)
Attachment #222785 - Flags: review?(roc)
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

The NS_ABORT_IF_FALSE in RemovePlaceholderDescendantsOf
is a mistake, I'll remove that...
Assignee: nobody → mats.palmgren
I got this error on the "Linux prometheus" Tinderbox:
nsLineBox.cpp:518: error: invalid conversion from `nsIFrame*' to `long int'

So I did this change:
-      if (NS_UNLIKELY(frame)) {
+      if (NS_UNLIKELY(frame != nsnull)) {

I wonder why this should be necessary though...
(it didn't break any of the other builds so what's special with "prometheus"?)
Checked in on trunk 2006-05-24 18:48 PDT

-> FIXED
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Compiler version?
prometheus tinderbox build log says:

uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux
Compiler is -- gcc (gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

The compile error I got (comment 15) seems like a bug to me, I would expect
that if "if(x);" compiles then so should "if(NS_UNLIKELY(x));" for any given x.

Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?
(In reply to comment #18)
> uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16
> 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux

> Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?

Not with that uname output.

We should probably just fix the NS_LIKELY and NS_UNLIKELY macros to cast in some way (but only in the case where they're actually interesting).
(filed bug 340244 on improving NS_LIKELY/NS_UNLIKELY)
Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build, is this a trunk-only bug?
(In reply to comment #21)
> Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build,
> is this a trunk-only bug?

Note that isn't a crasher. I tried one of my pre-patches that has trace
printf's and I can see that, 1) the testcase do trigger the code that the
patch adds. 2) without the patch we do find a placeholder that is a
descendant of a deleted frame in nsBlockFrame::FindLineFor.

I don't have a Valgrind build for the 1.8 branch, but I would say the
printf's strongly suggests the invalid read also occurs on branch (and
that the patch would fix it).
Flags: blocking1.8.1?
Flags: blocking1.8.0.6?
Whiteboard: [sg:critical?]
Attachment #222785 - Flags: approval1.8.1?
Attachment #222785 - Flags: approval1.8.0.6?
Flags: blocking1.8.1? → blocking1.8.1+
Attachment #222785 - Flags: approval1.8.1? → approval1.8.1+
Flags: blocking1.8.0.6? → blocking1.8.0.6+
Checked in to MOZILLA_1_8_BRANCH at 2006-07-28 11:20 PDT.
Keywords: fixed1.8.1
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #222785 - Flags: approval1.8.0.7? → approval1.8.0.7+
Checked in to MOZILLA_1_8_0_BRANCH at 2006-08-10 03:18 PDT.
Keywords: fixed1.8.0.7
Depends on: 348688
The real fix for this bug is in bug 348688 (which backed out this patch).
Real fix will be on branches soon...
https://bugzilla.mozilla.org/attachment.cgi?id=222649
ff2b2 debug windows, linux no crash
ff2b2 windows, linux, macppc no crash
v.fixed on both branches with 8/24 nightly builds, no crash with testcase:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7pre) Gecko/20060824 Firefox/1.5.0.7pre
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1b2) Gecko/20060824 BonEcho/2.0b2
Group: security
Flags: in-testsuite?
Crashtests checked in.  I think sayrer runs through the crashtests with valgrind enabled frequently enough ;)
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: