Closed Bug 877097 Opened 11 years ago Closed 11 years ago

crash in mozilla::layout::ScrollbarActivity::UpdateOpacity

Categories

(Core :: Layout, defect)

23 Branch
x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla25
Tracking Status
firefox21 --- unaffected
firefox22 --- unaffected
firefox23 --- disabled
firefox24 + verified
firefox25 --- verified
firefox-esr17 --- unaffected

People

(Reporter: scoobidiver, Assigned: spohl)

References

Details

(Keywords: crash, regression, Whiteboard: [lion-scrollbars+])

Crash Data

Attachments

(3 files, 3 obsolete files)

It first showed up in 23.0a1/20130503 but is discontinuous across builds.
It doesn't affect Mac OS X 10.6.
It's likely a regression from bug 636564 (frame 1 of the stack trace).

It seems related to bookmarking a page.

Signature 	mozilla::layout::ScrollbarActivity::UpdateOpacity(mozilla::TimeStamp) More Reports Search
UUID	8c66760e-bdb9-461a-8f79-97f3c2130526
Date Processed	2013-05-26 04:02:28
Uptime	599
Last Crash	11.8 hours before submission
Install Age	11.7 hours since version was first installed.
Install Time	2013-05-25 16:20:40
Product	Firefox
Version	24.0a1
Build ID	20130525031005
Release Channel	nightly
OS	Mac OS X
OS Version	10.8.3 12D78
Build Architecture	amd64
Build Architecture Info	family 6 model 23 stepping 10
Crash Reason	EXC_BAD_ACCESS / 0x0000000d
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863GL Layers? GL Context? GL Context+ GL Layers+ 
Processor Notes 	sp-processor02_phx1_mozilla_com_13115:2012; exploitability tool: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x 863

Frame 	Module 	Signature 	Source
0 	XUL 	mozilla::layout::ScrollbarActivity::UpdateOpacity 	ScrollbarActivity.cpp:385
1 	XUL 	_ZThn8_N7mozilla6layout17ScrollbarActivity11WillRefreshENS_9TimeStampE 	ScrollbarActivity.cpp:107
2 	XUL 	nsRefreshDriver::Tick 	nsRefreshDriver.cpp:925
3 	libsystem_c.dylib 	libsystem_c.dylib@0x19886 	
4 	libsystem_c.dylib 	libsystem_c.dylib@0x18c13 	
5 	libnss3.dylib 	pt_PostNotifies 	
6 	libmozglue.dylib 	arena_malloc 	jemalloc.c:1722
7 	libsystem_c.dylib 	libsystem_c.dylib@0x2d1b3 	
8 	libsystem_c.dylib 	libsystem_c.dylib@0x2dc07 	
9 	libmozalloc.dylib 	moz_xmalloc 	mozalloc.cpp:54
10 	XUL 	nsTArray_base<nsTArrayInfallibleAllocator>::EnsureCapacity 	nsTArray.h:192
11 	XUL 	mozilla::RefreshDriverTimer::Tick 	nsRefreshDriver.cpp:168
12 	XUL 	XUL@0x39bc50 	
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
This may be the intermittent crashes that can occur due to bug 868498. They can be triggered by simply removing a USB mouse.
Whiteboard: [lion-scrollbars+]
After another look at the crash dump (and assuming that the line number is correct in the call stack) it looks like mScrollableFrame is NULL in ScrollbarActivity::GetScrollbarContent. This should never happen...
WillRefresh
  UpdateOpacity
    SetOpacityOnElement
      nsIDOMCSSStyleDeclaration::SetProperty
http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/style/nsDOMCSSDeclaration.cpp#l168
        nsDOMCSSDeclaration::ParsePropertyValue
http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/style/nsDOMCSSDeclaration.cpp#l249

I think the mozAutoDocConditionalContentUpdateBatch implies the frame
can be destroyed and thus the ScrollbarActivity object.

Possible fix: add a strong ref on 'this' before the call to UpdateOpacity
to keep it alive and then check some flag, !mIsActive perhaps, to see
if Destroy was called.  Check it after the first SetOpacityOnElement
call and after the UpdateOpacity call in WillRefresh and return early.
For the record, the crashing line ScrollbarActivity.cpp:385 is:

382 nsIContent*
383 ScrollbarActivity::GetScrollbarContent(bool aVertical)
384 {
385  nsIFrame* box = mScrollableFrame->GetScrollbarBox(aVertical);
386  return box ? box->GetContent() : nullptr;
387 }
I can't reproduce the issue, but for those who experienced the crash, could you just check if this quick patch fixes it?
Attachment #757299 - Flags: feedback?(scoobidiver)
Attachment #757299 - Flags: feedback?(matspal)
Comment on attachment 757299 [details] [diff] [review]
Just checking for null at all stages of dereferencing

> if (this ...

Huh? I don't think there is ever a situation where checking if 'this'
is null is correct code.

Please read comment 3 about the need to keep the object alive.
Attachment #757299 - Flags: feedback?(scoobidiver)
Attachment #757299 - Flags: feedback?(matspal)
Attachment #757299 - Flags: feedback-
When calling a non-virtual method, this can be null (or any bogus value in fact). It's actually the 1st (and invisible) parameter passed to the method. Of course, calling a method on a null object is already a mistake.

I read comment 3 of course, but I don't know much about strong refs yet.

And my question remains: does this small patch fixes the crash? It would just give a clue, it's not intended to be a real fix.
Attached patch Patch (obsolete) β€” β€” Splinter Review
This patch incorporates the thoughts in comment 3 (thanks mats!) and is similar to what was done in patch 'part 13' for the overlay scrollbars (see bug 636564 comment 208).
Assignee: nobody → spohl.mozilla.bugs
Attachment #757299 - Attachment is obsolete: true
Attachment #760070 - Flags: review?(roc)
Status: NEW → ASSIGNED
Looks good to me, with s/'this' became null/'this' was destroyed/

I suspect UnsetOpacityOnElement has the same problem through its
call to nsDOMCSSDeclaration::RemoveProperty, so SetIsFading
needs to be fixed in the same way.
Comment on attachment 760070 [details] [diff] [review]
Patch

Review of attachment 760070 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/ScrollbarActivity.h
@@ +113,5 @@
>    void StopListeningForEventsOnScrollbar(nsIDOMEventTarget* aScrollbar);
>    void RegisterWithRefreshDriver();
>    void UnregisterFromRefreshDriver();
>  
> +  bool UpdateOpacity(TimeStamp aTime); // returns false if 'this' became null

"if 'this' was destroyed"
Attachment #760070 - Flags: review?(roc) → review+
Attached patch Patch β€” β€” Splinter Review
Fixed comment. Carrying over r+.
Attachment #760070 - Attachment is obsolete: true
Attachment #760147 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/122712f43da6
Blocks: 881022
(In reply to Mats Palmgren [:mats] from comment #9)
> I suspect UnsetOpacityOnElement has the same problem through its
> call to nsDOMCSSDeclaration::RemoveProperty, so SetIsFading
> needs to be fixed in the same way.

Filed bug 881022 for this.
https://hg.mozilla.org/mozilla-central/rev/122712f43da6
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Comment on attachment 760147 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 636564
User impact if declined: Crash
Testing completed (on m-c, etc.): Has been on m-c for a few days and no more crashes have been reported in crash stats since landing.
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #760147 - Flags: approval-mozilla-aurora?
Attachment #760147 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Requesting checkin for aurora.
Keywords: checkin-needed
https://hg.mozilla.org/releases/mozilla-aurora/rev/2d44faad87c6
Marc, can you try to verify this fixed in Firefox 23 Beta 1:
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/23.0b1-candidates/build1/
Keywords: verifyme
QA Contact: mschifer
There are about 10 post-fix crashes for Firefox 23 in Socorro and a few more for Firefox 24. I looked through some of these reports though and they seem to show another bug: https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
(In reply to Ioana Budnar, QA [:ioana] from comment #19)
> I looked through some of these reports though and they seem to show another bug
I disagree. Compare the comment and stack trace in a pre-fix crash (see bp-e51ffbea-2c17-4c38-be9b-739b72130609) and a post-fix crash (see bp-faae29c7-6cd2-42ad-bc5e-84f5d2130612). It's not fixed.
It might be related to bug 866402 though.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Scoobidiver from comment #20)
> (In reply to Ioana Budnar, QA [:ioana] from comment #19)
> > I looked through some of these reports though and they seem to show another bug
> I disagree. Compare the comment and stack trace in a pre-fix crash (see
> bp-e51ffbea-2c17-4c38-be9b-739b72130609) and a post-fix crash (see
> bp-faae29c7-6cd2-42ad-bc5e-84f5d2130612). It's not fixed.
> It might be related to bug 866402 though.

I compared the new crashes with the one in comment 0 and I don't see them as being the same bug. The stack traces there are pretty different. Am I wrong?
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Ioana Budnar, QA [:ioana] from comment #21)
> The stack traces there are pretty different. Am I wrong?
Yes, you are. XUL frames are the same and libsystem_c.dylib ones differ because it's not the same OS X version.
Flags: needinfo?(spohl.mozilla.bugs)
Depends on: 866402
Dropping verifyme since this is still an issue. Please re-add once a fix lands.
Keywords: verifyme
I've observed similar crashes while working on bug 868498 (note the 'crash' keyword in the bug) but don't have crash dumps to back it up. I'd like to land a patch for bug 868498 and see if this crash still reproduces. My observation has been that when connecting/disconnecting a mouse, or changing the system preferences for the scrollbar style, we could get crashes due to mScrollbarActivity being (or becoming) null in nsGfxScrollFrameInner. This could match the null-dereference that we're seeing here.
So, we're trying to do a do_QueryFrame on a mScrollableFrame that was set to 0x0000000d:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/d0b270f86f3e/layout/generic/ScrollbarActivity.cpp#l322

This gets past the null check in do_QueryFrame:
http://hg.mozilla.org/mozilla-central/annotate/834c8941ae24/layout/generic/nsQueryFrame.h#l86

And we try to dereference 0x0000000d here:
http://hg.mozilla.org/mozilla-central/annotate/834c8941ae24/layout/generic/nsQueryFrame.h#l89

There are scenarios where we destroy ScrollbarActivity objects and from what I can tell, we seem to make sure that refresh observers are removed when that happens. This crash seems to indicate that we still get a tick from a refresh observer that should have been removed and that we don't correctly null out mScrollableFrame. 0x0000000d looks a bit like an offset into a null object, like mScrollableFrame in ScrollbarActivity...

All this doesn't make a whole lot of sense to me right now.
It looks like there have been no more reports of crashes after build 20130716040307, which seems to coincide with the landing of bug 868498. Scoobidiver, could you confirm that I'm reading these stats right? If I don't, could you provide a link that displays those reports correctly? Thanks!
Flags: needinfo?(scoobidiver)
(In reply to Stephen Pohl [:spohl] from comment #26)
> It looks like there have been no more reports of crashes after build
> 20130716040307, which seems to coincide with the landing of bug 868498.
> Scoobidiver, could you confirm that I'm reading these stats right? If I
> don't, could you provide a link that displays those reports correctly?
It happened one build out of five in the worst case so you should wait the build from July 21 to confirm it's fixed. See https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox:25.0a1&query_search=signature&query_type=contains&reason_type=contains&range_value=28&range_unit=days&signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
Flags: needinfo?(scoobidiver)
There are crashes in 24.0a2/20130718 and 19 after the landing of bug 868498's patch.
Okay, looks like this is still an issue in aurora and nightly. Scoobidiver, could you confirm that we haven't had any more of these crashes in the latest beta build and update the tracking flags? Thanks!
There are no crashes in 23.0b7.
The latest crashes happened in 24.0a2/20130719 and 25.0a1/20130715.
Attached patch Patch part 2 β€” β€” Splinter Review
Looking through the initial patch and reading comment 3 again, I have a feeling that what can be destroyed by SetOpacityOnElement calls isn't actually mScrollableFrame, but rather what's returned by GetHorizontalScrollbar or GetVerticalScrollbar. This is a problem in ScrollbarActivity::GetRefreshDriver, because we may register a refresh observer and later be unable to unregister it, because the required frame was destroyed between registration and unregistration. This patch uses mScrollableFrame's nsPresContext to register refresh observers and *should* therefore avoid this problem.

If this proves to fix this crash we may be able to revert the initial patch.
Attachment #779333 - Flags: review?(roc)
(In reply to Stephen Pohl [:spohl] from comment #31)
> what can be destroyed by SetOpacityOnElement calls isn't
> actually mScrollableFrame, but rather what's returned by
> GetHorizontalScrollbar or GetVerticalScrollbar.

Get*Scrollbar returns anonymous content created by the scroll frame and
has the same lifetime.  Everything can be destroyed by SetOpacityOnElement;
you can't count on anything being alive unless you hold a strong ref on it.

> This is a problem in
> ScrollbarActivity::GetRefreshDriver, because we may register a refresh
> observer and later be unable to unregister it, because the required frame
> was destroyed between registration and unregistration.

How? I don't see how that can occur.  Destroying the scroll frame calls
ScrollbarActivity::Destroy which calls UnregisterFromRefreshDriver() which 
should succeed at that point, afaict.
http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/layout/generic/nsGfxScrollFrame.cpp#l2904
http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/layout/generic/ScrollbarActivity.cpp#l39

> If this proves to fix this crash we may be able to revert the initial patch.

I don't think "Patch part 2" makes any difference.  We still need the first
patch.
(In reply to Mats Palmgren (:mats) from comment #32)
> (In reply to Stephen Pohl [:spohl] from comment #31)
> > This is a problem in
> > ScrollbarActivity::GetRefreshDriver, because we may register a refresh
> > observer and later be unable to unregister it, because the required frame
> > was destroyed between registration and unregistration.
> 
> How? I don't see how that can occur.  Destroying the scroll frame calls
> ScrollbarActivity::Destroy which calls UnregisterFromRefreshDriver() which 
> should succeed at that point, afaict.

My (potentially flawed) thinking was that the call to GetRefreshDriver from UnregisterFromRefreshDriver may no longer succeed in the same way that it did when we called RegisterWithRefreshDriver and potentially return null, leaving the observer registered. Or, we may register the observer with the refresh driver from the horizontal scrollbar and then try to unregister it via the refresh driver obtained from the vertical scrollbar's nsPresContext (should the vertical scrollbar have been destroyed in the meantime). I was assuming that the nsPresContexts can be different and therefore the refresh drivers.
(In reply to Stephen Pohl [:spohl] from comment #33)
> My (potentially flawed) thinking was that the call to GetRefreshDriver from
> UnregisterFromRefreshDriver may no longer succeed in the same way that it
> did when we called RegisterWithRefreshDriver and potentially return null,
> leaving the observer registered.

GetScrollbarBox returns the content from mVScrollbarBox/mHScrollbarBox which
is anonymous content owned by, and has the same lifetime as, the scroll frame.
As far as I know, mVScrollbarBox/mHScrollbarBox does not change after they
are initially setup.  When hiding a scrollbar we set visibility:hidden
(and/or set width,height==0 ?), but we don't destroy the scrollbar frames
dynamically.

The current code in GetRefreshDriver() looks a bit weird, so "part 2"
is better but I don't think it makes a difference for the crash.

> Or, we may register the observer with the
> refresh driver from the horizontal scrollbar and then try to unregister it
> via the refresh driver obtained from the vertical scrollbar's nsPresContext

All these frames have the same nsPresContext.
Comment on attachment 779333 [details] [diff] [review]
Patch part 2

Looks good to me, r=mats.

But please leave the bug open as I don't think this will fix
the crash.
Attachment #779333 - Flags: review?(roc) → review+
Thanks Mats! Landed on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f750e5a76656

I'll keep this bug open and will keep digging for the cause of the crash.
Depends on: 895203
https://hg.mozilla.org/mozilla-central/rev/f750e5a76656
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [lion-scrollbars+] → [leave open][lion-scrollbars+]
I was able to confirm that forgetting to unregister from the refresh driver is causing very similar crashes, although I've been unable to get the exact same callstack (an attempt of dereferencing 0x0000000d in do_QueryFrame). This may also be because I'm running non-optimized builds. Just for reference, the top frames of my 'artifical' crashes look like this:
0   XUL 0x000000010156ae9d mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const + 173 (TimeStamp.h:262)
1   XUL 0x0000000102b7347c mozilla::layout::ScrollbarActivity::UpdateOpacity(mozilla::TimeStamp) + 60 (ScrollbarActivity.cpp:352)
2   XUL 0x0000000102b733fe mozilla::layout::ScrollbarActivity::WillRefresh(mozilla::TimeStamp) + 206 (ScrollbarActivity.cpp:123)
3   XUL 0x0000000102b7379f non-virtual thunk to mozilla::layout::ScrollbarActivity::WillRefresh(mozilla::TimeStamp) + 47
4   XUL 0x0000000102b32373 nsRefreshDriver::Tick(long long, mozilla::TimeStamp) + 563 (nsRefreshDriver.cpp:1103)
5   XUL 0x0000000102b3a697 mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, long long, mozilla::TimeStamp) + 87 (nsRefreshDriver.cpp:172)
[...]

In my artificial crashes, we crash because operator- doesn't accept null values for aOther:
http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#262

I'm guessing that in optimized builds we may be able to get beyond this step and crash later.

I've noticed that we have a couple of places where we don't continue to unregister from the refresh driver if the frame was destroyed, for example after calls to SetIsFading:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/ScrollbarActivity.cpp#59

However, my understanding is that if the frame was destroyed, ScrollbarActivity::Destroy() was also called and we unregistered from the refresh driver there. So moving the UnregisterFromRefreshDriver calls before method calls that could potentially destroy the frame wouldn't help with these crashes.

Mats, is it guaranteed that a call to nsRefreshDriver::RemoveRefreshObserver will remove the observer immediately? I.e. we're certain that there won't be any ticks for the observer anymore? I'm thinking of a queue where these ticks could have queued up and be missed during the removal of the observer for example. I'm pretty sure we're solid there, but I can't seem to find an instance where we're forgetting to unregister from the refresh driver and I'm running out of ideas...
Flags: needinfo?(matspal)
(In reply to Stephen Pohl [:spohl] from comment #38)
> However, my understanding is that if the frame was destroyed,
> ScrollbarActivity::Destroy() was also called and we unregistered from the
> refresh driver there.

Yes, that's my understanding too.

> Mats, is it guaranteed that a call to nsRefreshDriver::RemoveRefreshObserver
> will remove the observer immediately?

Yes, absolutely.  One should be able to delete the instance after
the call to RemoveRefreshObserver.

The interesting thing is what happens after the frame is destroyed.
It may be destroyed while you still have some ScrollbarActivity
calls on the stack.

Also, it's not just SetOpacityOnElement/UnsetOpacityOnElement that
can destroy the frame.  There's also SetBooleanAttribute which
calls [Un]SetAttr with aNotify==true.

Here's a call analysis of all three:

SetOpacityOnElement
  UpdateOpacity
    WillRefresh

UnsetOpacityOnElement
  SetIsFading
    ActivityStarted
      HandleEventForScrollbar
        HandleEvent
      ActivityOccurred
        HandleEvent
        nsGfxScrollFrameInner::SetCoordAttribute
        nsGfxScrollFrameInner::CurPosAttributeChanged
    BeginFade
      FadeBeginTimerFired
    EndFade
      WillRefresh

SetBooleanAttribute
  SetIsActive
    ActivityStarted
      ... as above
    EndFade
      ... as above
  HoveredScrollbar
    HandleEventForScrollbar
      ... as above
    SetIsActive
      ... as above

The calls rooted in WillRefresh and HandleEvent should be OK since
they hold a strong ref on the callee for the duration of the call.

FadeBeginTimerFired doesn't hold a strong ref:
http://hg.mozilla.org/mozilla-central/annotate/73b69c146ca6/layout/generic/ScrollbarActivity.h#l90
but both BeginFade and SetIsFading looks OK since they now check that
the frame is alive and return early if it isn't.  Still, we should
probably make it hold a strong ref to avoid any future mishaps when
adding/modifying this code.

The ActivityOccurred calls from nsGfxScrollFrameInner are more
interesting -- the caller use mScrollbarActivity which is a strong
ref but it's not guaranteed to live for the duration of the call
since it will be nulled out if the frame is destroyed (which will
likely delete the ScrollbarActivity object), so half-way through
ActivityOccurred() we may have deleted 'this' and the call to
ActivityStopped() would crash.

So to be on the safe side, the calls from the scroll frame code
should also hold a strong ref on the stack for the duration of the
call.

Still, that doesn't explain the 20130723030205 Nightly crash:
bp-e5f79cd2-a069-4e58-91e5-6946a2130724
It looks like a crash when reflowing a text control.  Maybe you could
stress test that path by pretending the preference has always/randomly
changed?
Flags: needinfo?(matspal)
(In reply to Ed Morley [:edmorley UTC+1] from comment #37)
> https://hg.mozilla.org/mozilla-central/rev/f750e5a76656
There have been no crashes since 25.0a1/20130724.
Comment on attachment 779333 [details] [diff] [review]
Patch part 2

Given comment 34 and 35, this patch might not fix the crash reported here. However, given the very low risk of the patch and since there haven't been any crashes in 25.0a1 since landing this patch, we might want to give it some more exposure in Aurora to see if it has any positive effect on the number of reported crashes. While waiting for the new numbers, I'll be working through the feedback in comment 39.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 636564
User impact if declined: Might prevent crashes
Testing completed (on m-c, etc.): Has been on m-c for 6 days
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #779333 - Flags: approval-mozilla-aurora?
Attachment #779333 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/5a18180195d7
So....is this fixed?
Target Milestone: mozilla24 → mozilla25
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> So....is this fixed?

I don't believe so, but the crash stats over the next few days should tell us. In the meantime I'm addressing the feedback from comment 39, which is a good thing to do anyway. I'd feel more comfortable closing this bug after that's done. If that's not the proper way to handle this bug, just let me know.
(In reply to Stephen Pohl [:spohl] from comment #44)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> > So....is this fixed?
> I don't believe so, but the crash stats over the next few days should tell
> us.
For me, it is because of seven consecutive builds without crashes.
Attached patch Patch part 3 (obsolete) β€” β€” Splinter Review
Let's see if I'm doing this right. Mats, are we confident that we're not going to destroy the ScrollbarActivity object when we keep a reference to it on the stack? I seem to remember from bug 636564 comment 208 that when the frame is being destroyed, the ScrollbarActivity object is destroyed regardless of remaining references. Given this, might it be better to check that mScrollableFrame is still alive between ScrollbarActivity::ActivityStarted and ScrollbarActivity::ActivityStopped in ScrollbarActivity::ActivityOccurred instead? Or were you suggesting that we should keep a strong reference to the scroll frame (rather than the ScrollbarActivity object) on the stack?

I'm still stress-testing the "switching of scrollbar styles during reflow" case. So far, no success in reproducing a crash.
Attachment #783910 - Flags: feedback?(matspal)
Attachment #783910 - Attachment is patch: true
Attachment #783910 - Attachment mime type: message/rfc822 → text/plain
(In reply to Stephen Pohl [:spohl] from comment #46)
> I seem to remember from bug 636564 comment 208 that when the
> frame is being destroyed, the ScrollbarActivity object is destroyed
> regardless of remaining references.

Just to follow up on this, looking at nsHTMLScrollFrame for example, the following seems to confirm my experience above:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#101
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#2937

mScrollbarActivity appears to be getting destroyed regardless of its refcount.
No longer depends on: 866402
(In reply to Stephen Pohl [:spohl] from comment #46)
> Let's see if I'm doing this right. Mats, are we confident that we're not
> going to destroy the ScrollbarActivity object when we keep a reference to it
> on the stack?

Yes, the ScrollbarActivity object won't be deleted before the last strong
ref is gone.  The Destroy() method might called earlier but that's OK.

> I seem to remember from bug 636564 comment 208 that when the
> frame is being destroyed, the ScrollbarActivity object is destroyed
> regardless of remaining references.

See below.

(In reply to Stephen Pohl [:spohl] from comment #47)
> mScrollbarActivity appears to be getting destroyed regardless of its
> refcount.

It seems we use different terminology -- by "destroyed" you mean that
ScrollbarActivity::Destroy() was called?  While I take "destroyed" to
mean "the objects destructor was called and its memory deallocated".
(I suggest we use Destroy'ed for the former and deleted for the latter
just to be clear.  People often use destroyed/deleted/deallocated
interchangebly to mean the latter though.)

A strong ref on the stack won't prevent someone from calling Destroy().
I don't see a problem with that, do you?
Comment on attachment 783910 [details] [diff] [review]
Patch part 3

Looks good.  A couple of nits:
Keep the reinterpret_cast instead of the C-style cast.
Use nsCOMPtr<ScrollbarActivity> instead of introducing
nsRefPtr<ScrollbarActivity> (saves some code footprint).
r=mats with that.
Attachment #783910 - Flags: feedback?(matspal) → review+
(In reply to Mats Palmgren (:mats) from comment #49)
> (In reply to Stephen Pohl [:spohl] from comment #46)
> > Let's see if I'm doing this right. Mats, are we confident that we're not
> > going to destroy the ScrollbarActivity object when we keep a reference to it
> > on the stack?
> 
> Yes, the ScrollbarActivity object won't be deleted before the last strong
> ref is gone.  The Destroy() method might called earlier but that's OK.

Ah, yes, I agree with "won't be deleted".

> It seems we use different terminology -- by "destroyed" you mean that
> ScrollbarActivity::Destroy() was called?

Right, that's what I meant.

>  While I take "destroyed" to
> mean "the objects destructor was called and its memory deallocated".
> (I suggest we use Destroy'ed for the former and deleted for the latter
> just to be clear.  People often use destroyed/deleted/deallocated
> interchangebly to mean the latter though.)

Agreed.

> A strong ref on the stack won't prevent someone from calling Destroy().
> I don't see a problem with that, do you?

I've just checked and agree with you. That shouldn't be a problem. I wonder why I ran into the crashes in bug 636564 comment 208, but the logs for the try run are no longer available. It seems like roc was suggesting the same thing in bug 636564 comment 200, holding a local reference to mScrollbarActivity. For some reason, it looks like mScrollbarActivity got deleted in that case (rather than just Destroy'ed).

(In reply to Mats Palmgren (:mats) from comment #50)
> Use nsCOMPtr<ScrollbarActivity> instead of introducing
> nsRefPtr<ScrollbarActivity> (saves some code footprint).

I'm confused. I thought nsRefPtr should be used for concrete classes (taken from [1]). Is this not/no longer the case, or just not here? I'm sure you're right (and roc suggested the same thing in bug 636564), I'm just trying to understand.

[1] https://developer.mozilla.org/en-US/docs/nsRefPtr
Flags: needinfo?(matspal)
(In reply to Stephen Pohl [:spohl] from comment #51)
> For some reason, it looks like mScrollbarActivity got
> deleted in that case (rather than just Destroy'ed).

A wild guess is that you used the mScrollbarActivity field of a frame
that had been deleted (and thus read garbage), but that the actual
ScrollbarActivity object was still alive.

> I'm confused. I thought nsRefPtr should be used for concrete classes

We're already using nsCOMPtr<ScrollbarActivity> so that's why
I suggested it; but you're right nsRefPtr<ScrollbarActivity>
is more appropriate in this case -- so change all to nsRefPtr.
Flags: needinfo?(matspal)
Attached patch Patch part 3 β€” β€” Splinter Review
Thanks, Mats! Addressed review feedback. Carrying over r+ and will be landing on inbound shortly.
Attachment #783910 - Attachment is obsolete: true
Attachment #784143 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/b16893e6cb28
Removing [leave open] keyword from the whiteboard since this hasn't crashed since landing patch part 2 (comment 45), and we've added all the additional safeguards that we (Mats, really) could think of in patch part 3 (comment 39).
Whiteboard: [leave open][lion-scrollbars+] → [lion-scrollbars+]
https://hg.mozilla.org/mozilla-central/rev/b16893e6cb28
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: