This testcases causes nsBlockFrame::FindLineFor to look at a freed frame

RESOLVED FIXED

Status

()

Core
Layout
--
critical
RESOLVED FIXED
11 years ago
10 years ago

People

(Reporter: Jesse Ruderman, Assigned: Mats Palmgren (vacation - back in August))

Tracking

(Blocks: 1 bug, {testcase, verified1.8.0.7, verified1.8.1})

Trunk
Other
Linux
testcase, verified1.8.0.7, verified1.8.1
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.8.1 +
blocking1.8.0.7 +
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:critical?])

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

11 years ago
Steps to reproduce:

1. Run a Firefox build with DEBUG_TRACEMALLOC_FRAMEARENA defined under Valgrind.  (Additional setup that probably isn't relevant: Pear has Valgrind 3.1.1, and the Firefox build has patches from bug 287116 and bug 184614 to fix other errors Valgrind detects.)

2. Load the testcase.

Result: Valgrind complains about an invalid read on a deleted frame.  Simplified output below, will attach more complete output.

Invalid read of size 4
  at nsBlockFrame::FindLineFor(nsIFrame*) 
  by nsBlockFrame::PrepareChildIncrementalReflow(nsBlockReflowState&) 
  by nsBlockFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned&)

Address 0x7ED3380 is 48 bytes inside a block of size 52 free'd
  at free (vg_replace_malloc.c:235)
  by PR_Free   
  by FrameArena::FreeFrame(unsigned, void*) 
  by PresShell::FreeFrame(unsigned, void*) 
  by nsFrame::Destroy() 
  by nsSplittableFrame::Destroy()
(Reporter)

Comment 1

11 years ago
Created attachment 221938 [details]
testcase
(Reporter)

Comment 2

11 years ago
Created attachment 221939 [details]
valgrind output for the invalid read

Comment 3

11 years ago
So is the frame passed to FindLineFor dead?  Or something else?  Line numbers would really help here.
(Reporter)

Comment 4

11 years ago
I can't reproduce with a debug build (built from the same source as the opt build with which I can reproduce it).  I think that's going to make it difficult to get a stack trace with line numbers.

One thing I tried was adding an assertion to nsBlockFrame::FindLineFor that attempts to check whether it has been given a deleted frame.  My assertion complains if aFrame->GetParent() & 0ffff0000 == 0xdddd0000.  I was able to reproduce a bug (this bug?) with a different testcase earlier this week using that trick, but now I can't.  That testcase involved height and clear.

Any ideas for what else I can try?  Would it help if you had an account on the Linux QA machines I use through VNC?
You could try compiling an opt build with -g to add symbols without changing the code.
(Reporter)

Comment 6

11 years ago
The opt build already has symbols -- that's why the stacks from Valgrind are readable.
You should be able to get line numbers compiled into your opt build too.
(In reply to comment #6)
> The opt build already has symbols -- that's why the stacks from Valgrind are
> readable.

There are a variety of types of symbols which might or might not be present for each piece of the program compiled. Valgrind, like gdb, works with what it can get; even my stripped /usr/lib/mozilla-firefox/firefox-bin image has 44283 symbols (exported symbols, GOT/PLT relocs, etc). Your valgrind stack has some frames with source line numbers, and some without. The -g compiler flag will add .debug_foo sections to the currently compiling obj file, which will greatly increase the resolution of things like Valgrind. But it's not an all-or-nothing affair. You might need to do a clean rebuild with your opt flags set to '-O2 -g'.

To check, run 'readelf -S foo.so' to get a section list for foo.so. If there's a section called .debug_line in there, *some* of the objects contributing to the .so have debug line records. Unless the SIZE column in the output shows lots of data (in mozilla's case, probably megabytes) you might not have rebuilt every object in the .so with -g.
(Reporter)

Comment 9

11 years ago
Created attachment 222649 [details]
testcase that triggers the bug in a debug build

Not sure why this approach worked, but it did ;)
Attachment #221938 - Attachment is obsolete: true
(Reporter)

Comment 10

11 years ago
Created attachment 222650 [details]
valgrind output for the invalid read in a debug build
Attachment #221939 - Attachment is obsolete: true
(Reporter)

Comment 11

11 years ago
So I think it's fc->mPlaceholder that's dead, not aFrame.  I still can't figure out how to reproduce on Mac OS X, though :/
Created attachment 222784 [details]
Frame dump

The problem seem to occur when removing a placeholder in
nsBlockFrame::RemoveFrame that is also in the float cache.
Created attachment 222785 [details] [diff] [review]
Patch rev. 1

This fixes the two testcases. I also ran it over a set of ~220 URLs and
got this statistics: there were 2217 calls to RemovePlaceholderDescendantsOf
and only 8 of those had a float cache item (exactly 1 in all cases).
For that item we peeked at 2.125 frames on average (including aFrame) and got
no match (i.e. none of the pages triggered the bug).
(so I don't think the suggested fix will have any performance impact)
Attachment #222785 - Flags: review?(roc)
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

The NS_ABORT_IF_FALSE in RemovePlaceholderDescendantsOf
is a mistake, I'll remove that...
Attachment #222785 - Flags: superreview+
Attachment #222785 - Flags: review?(roc)
Attachment #222785 - Flags: review+
Assignee: nobody → mats.palmgren
I got this error on the "Linux prometheus" Tinderbox:
nsLineBox.cpp:518: error: invalid conversion from `nsIFrame*' to `long int'

So I did this change:
-      if (NS_UNLIKELY(frame)) {
+      if (NS_UNLIKELY(frame != nsnull)) {

I wonder why this should be necessary though...
(it didn't break any of the other builds so what's special with "prometheus"?)
Checked in on trunk 2006-05-24 18:48 PDT

-> FIXED
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED

Comment 17

11 years ago
Compiler version?
prometheus tinderbox build log says:

uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux
Compiler is -- gcc (gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

The compile error I got (comment 15) seems like a bug to me, I would expect
that if "if(x);" compiles then so should "if(NS_UNLIKELY(x));" for any given x.

Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?
(In reply to comment #18)
> uname: Linux prometheus.mozilla.org 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16
> 18:43:06 EDT 2005 i686 i686 i386 GNU/Linux

> Is sizeof(long)==32 and sizeof(void*)==64 on this box by any chance?

Not with that uname output.

We should probably just fix the NS_LIKELY and NS_UNLIKELY macros to cast in some way (but only in the case where they're actually interesting).
(filed bug 340244 on improving NS_LIKELY/NS_UNLIKELY)
Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build, is this a trunk-only bug?
(In reply to comment #21)
> Neither testcase seems to crash ff1.5.0.4 or a debug Bon Echo build,
> is this a trunk-only bug?

Note that isn't a crasher. I tried one of my pre-patches that has trace
printf's and I can see that, 1) the testcase do trigger the code that the
patch adds. 2) without the patch we do find a placeholder that is a
descendant of a deleted frame in nsBlockFrame::FindLineFor.

I don't have a Valgrind build for the 1.8 branch, but I would say the
printf's strongly suggests the invalid read also occurs on branch (and
that the patch would fix it).
Flags: blocking1.8.1?
Flags: blocking1.8.0.6?
Whiteboard: [sg:critical?]
Attachment #222785 - Flags: approval1.8.1?
Attachment #222785 - Flags: approval1.8.0.6?

Updated

11 years ago
Flags: blocking1.8.1? → blocking1.8.1+

Updated

11 years ago
Attachment #222785 - Flags: approval1.8.1? → approval1.8.1+
Flags: blocking1.8.0.6? → blocking1.8.0.6+
Checked in to MOZILLA_1_8_BRANCH at 2006-07-28 11:20 PDT.
Keywords: fixed1.8.1
Comment on attachment 222785 [details] [diff] [review]
Patch rev. 1

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #222785 - Flags: approval1.8.0.7? → approval1.8.0.7+
Checked in to MOZILLA_1_8_0_BRANCH at 2006-08-10 03:18 PDT.
Keywords: fixed1.8.0.7

Updated

11 years ago
Depends on: 348688
The real fix for this bug is in bug 348688 (which backed out this patch).
Real fix will be on branches soon...

Comment 27

11 years ago
https://bugzilla.mozilla.org/attachment.cgi?id=222649
ff2b2 debug windows, linux no crash
ff2b2 windows, linux, macppc no crash
Keywords: fixed1.8.1 → verified1.8.1

Comment 28

11 years ago
v.fixed on both branches with 8/24 nightly builds, no crash with testcase:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7pre) Gecko/20060824 Firefox/1.5.0.7pre
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1b2) Gecko/20060824 BonEcho/2.0b2

Updated

11 years ago
Keywords: fixed1.8.0.7 → verified1.8.0.7
Group: security
Flags: in-testsuite?
(Reporter)

Comment 29

10 years ago
Crashtests checked in.  I think sayrer runs through the crashtests with valgrind enabled frequently enough ;)
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.