crash in mozilla::layout::ScrollbarActivity::UpdateOpacity

VERIFIED FIXED in Firefox 24

Status

()

--
critical
VERIFIED FIXED
6 years ago
5 years ago

People

(Reporter: scoobidiver, Assigned: spohl)

Tracking

({crash, regression})

23 Branch
mozilla25
x86_64
Mac OS X
crash, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox21 unaffected, firefox22 unaffected, firefox23 disabled, firefox24+ verified, firefox25 verified, firefox-esr17 unaffected)

Details

(Whiteboard: [lion-scrollbars+], crash signature)

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

6 years ago
It first showed up in 23.0a1/20130503 but is discontinuous across builds.
It doesn't affect Mac OS X 10.6.
It's likely a regression from bug 636564 (frame 1 of the stack trace).

It seems related to bookmarking a page.

Signature 	mozilla::layout::ScrollbarActivity::UpdateOpacity(mozilla::TimeStamp) More Reports Search
UUID	8c66760e-bdb9-461a-8f79-97f3c2130526
Date Processed	2013-05-26 04:02:28
Uptime	599
Last Crash	11.8 hours before submission
Install Age	11.7 hours since version was first installed.
Install Time	2013-05-25 16:20:40
Product	Firefox
Version	24.0a1
Build ID	20130525031005
Release Channel	nightly
OS	Mac OS X
OS Version	10.8.3 12D78
Build Architecture	amd64
Build Architecture Info	family 6 model 23 stepping 10
Crash Reason	EXC_BAD_ACCESS / 0x0000000d
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863GL Layers? GL Context? GL Context+ GL Layers+ 
Processor Notes 	sp-processor02_phx1_mozilla_com_13115:2012; exploitability tool: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x 863

Frame 	Module 	Signature 	Source
0 	XUL 	mozilla::layout::ScrollbarActivity::UpdateOpacity 	ScrollbarActivity.cpp:385
1 	XUL 	_ZThn8_N7mozilla6layout17ScrollbarActivity11WillRefreshENS_9TimeStampE 	ScrollbarActivity.cpp:107
2 	XUL 	nsRefreshDriver::Tick 	nsRefreshDriver.cpp:925
3 	libsystem_c.dylib 	libsystem_c.dylib@0x19886 	
4 	libsystem_c.dylib 	libsystem_c.dylib@0x18c13 	
5 	libnss3.dylib 	pt_PostNotifies 	
6 	libmozglue.dylib 	arena_malloc 	jemalloc.c:1722
7 	libsystem_c.dylib 	libsystem_c.dylib@0x2d1b3 	
8 	libsystem_c.dylib 	libsystem_c.dylib@0x2dc07 	
9 	libmozalloc.dylib 	moz_xmalloc 	mozalloc.cpp:54
10 	XUL 	nsTArray_base<nsTArrayInfallibleAllocator>::EnsureCapacity 	nsTArray.h:192
11 	XUL 	mozilla::RefreshDriverTimer::Tick 	nsRefreshDriver.cpp:168
12 	XUL 	XUL@0x39bc50 	
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
(Assignee)

Comment 1

6 years ago
This may be the intermittent crashes that can occur due to bug 868498. They can be triggered by simply removing a USB mouse.
(Assignee)

Updated

6 years ago
Whiteboard: [lion-scrollbars+]
(Assignee)

Comment 2

6 years ago
After another look at the crash dump (and assuming that the line number is correct in the call stack) it looks like mScrollableFrame is NULL in ScrollbarActivity::GetScrollbarContent. This should never happen...
WillRefresh
  UpdateOpacity
    SetOpacityOnElement
      nsIDOMCSSStyleDeclaration::SetProperty
http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/style/nsDOMCSSDeclaration.cpp#l168
        nsDOMCSSDeclaration::ParsePropertyValue
http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/style/nsDOMCSSDeclaration.cpp#l249

I think the mozAutoDocConditionalContentUpdateBatch implies the frame
can be destroyed and thus the ScrollbarActivity object.

Possible fix: add a strong ref on 'this' before the call to UpdateOpacity
to keep it alive and then check some flag, !mIsActive perhaps, to see
if Destroy was called.  Check it after the first SetOpacityOnElement
call and after the UpdateOpacity call in WillRefresh and return early.
For the record, the crashing line ScrollbarActivity.cpp:385 is:

382 nsIContent*
383 ScrollbarActivity::GetScrollbarContent(bool aVertical)
384 {
385  nsIFrame* box = mScrollableFrame->GetScrollbarBox(aVertical);
386  return box ? box->GetContent() : nullptr;
387 }

Comment 5

6 years ago
Created attachment 757299 [details] [diff] [review]
Just checking for null at all stages of dereferencing

I can't reproduce the issue, but for those who experienced the crash, could you just check if this quick patch fixes it?
Attachment #757299 - Flags: feedback?(scoobidiver)
Attachment #757299 - Flags: feedback?(matspal)
Comment on attachment 757299 [details] [diff] [review]
Just checking for null at all stages of dereferencing

> if (this ...

Huh? I don't think there is ever a situation where checking if 'this'
is null is correct code.

Please read comment 3 about the need to keep the object alive.
Attachment #757299 - Flags: feedback?(scoobidiver)
Attachment #757299 - Flags: feedback?(matspal)
Attachment #757299 - Flags: feedback-

Comment 7

6 years ago
When calling a non-virtual method, this can be null (or any bogus value in fact). It's actually the 1st (and invisible) parameter passed to the method. Of course, calling a method on a null object is already a mistake.

I read comment 3 of course, but I don't know much about strong refs yet.

And my question remains: does this small patch fixes the crash? It would just give a clue, it's not intended to be a real fix.
status-firefox21: --- → unaffected
status-firefox-esr17: --- → unaffected
tracking-firefox23: --- → ?
tracking-firefox24: --- → ?
(Assignee)

Comment 8

6 years ago
Created attachment 760070 [details] [diff] [review]
Patch

This patch incorporates the thoughts in comment 3 (thanks mats!) and is similar to what was done in patch 'part 13' for the overlay scrollbars (see bug 636564 comment 208).
Assignee: nobody → spohl.mozilla.bugs
Attachment #757299 - Attachment is obsolete: true
Attachment #760070 - Flags: review?(roc)
(Assignee)

Updated

6 years ago
Status: NEW → ASSIGNED
Looks good to me, with s/'this' became null/'this' was destroyed/

I suspect UnsetOpacityOnElement has the same problem through its
call to nsDOMCSSDeclaration::RemoveProperty, so SetIsFading
needs to be fixed in the same way.
Comment on attachment 760070 [details] [diff] [review]
Patch

Review of attachment 760070 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/ScrollbarActivity.h
@@ +113,5 @@
>    void StopListeningForEventsOnScrollbar(nsIDOMEventTarget* aScrollbar);
>    void RegisterWithRefreshDriver();
>    void UnregisterFromRefreshDriver();
>  
> +  bool UpdateOpacity(TimeStamp aTime); // returns false if 'this' became null

"if 'this' was destroyed"
Attachment #760070 - Flags: review?(roc) → review+
(Assignee)

Comment 11

6 years ago
Created attachment 760147 [details] [diff] [review]
Patch

Fixed comment. Carrying over r+.
Attachment #760070 - Attachment is obsolete: true
Attachment #760147 - Flags: review+
(Assignee)

Updated

6 years ago
Blocks: 881022
(Assignee)

Comment 13

6 years ago
(In reply to Mats Palmgren [:mats] from comment #9)
> I suspect UnsetOpacityOnElement has the same problem through its
> call to nsDOMCSSDeclaration::RemoveProperty, so SetIsFading
> needs to be fixed in the same way.

Filed bug 881022 for this.
https://hg.mozilla.org/mozilla-central/rev/122712f43da6
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
(Reporter)

Updated

6 years ago
status-firefox24: affected → fixed
tracking-firefox23: ? → +
tracking-firefox24: ? → +
(Assignee)

Comment 15

6 years ago
Comment on attachment 760147 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 636564
User impact if declined: Crash
Testing completed (on m-c, etc.): Has been on m-c for a few days and no more crashes have been reported in crash stats since landing.
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #760147 - Flags: approval-mozilla-aurora?
Attachment #760147 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 16

6 years ago
Requesting checkin for aurora.
Keywords: checkin-needed
https://hg.mozilla.org/releases/mozilla-aurora/rev/2d44faad87c6
status-firefox23: affected → fixed
Keywords: checkin-needed
Marc, can you try to verify this fixed in Firefox 23 Beta 1:
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/23.0b1-candidates/build1/
Keywords: verifyme
QA Contact: mschifer

Comment 19

6 years ago
There are about 10 post-fix crashes for Firefox 23 in Socorro and a few more for Firefox 24. I looked through some of these reports though and they seem to show another bug: https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
(Reporter)

Comment 20

6 years ago
(In reply to Ioana Budnar, QA [:ioana] from comment #19)
> I looked through some of these reports though and they seem to show another bug
I disagree. Compare the comment and stack trace in a pre-fix crash (see bp-e51ffbea-2c17-4c38-be9b-739b72130609) and a post-fix crash (see bp-faae29c7-6cd2-42ad-bc5e-84f5d2130612). It's not fixed.
It might be related to bug 866402 though.
Status: RESOLVED → REOPENED
status-firefox23: fixed → affected
status-firefox24: fixed → affected
Resolution: FIXED → ---

Comment 21

6 years ago
(In reply to Scoobidiver from comment #20)
> (In reply to Ioana Budnar, QA [:ioana] from comment #19)
> > I looked through some of these reports though and they seem to show another bug
> I disagree. Compare the comment and stack trace in a pre-fix crash (see
> bp-e51ffbea-2c17-4c38-be9b-739b72130609) and a post-fix crash (see
> bp-faae29c7-6cd2-42ad-bc5e-84f5d2130612). It's not fixed.
> It might be related to bug 866402 though.

I compared the new crashes with the one in comment 0 and I don't see them as being the same bug. The stack traces there are pretty different. Am I wrong?
Flags: needinfo?(spohl.mozilla.bugs)
(Reporter)

Comment 22

6 years ago
(In reply to Ioana Budnar, QA [:ioana] from comment #21)
> The stack traces there are pretty different. Am I wrong?
Yes, you are. XUL frames are the same and libsystem_c.dylib ones differ because it's not the same OS X version.
Flags: needinfo?(spohl.mozilla.bugs)
(Reporter)

Updated

6 years ago
Depends on: 866402
Dropping verifyme since this is still an issue. Please re-add once a fix lands.
Keywords: verifyme
(Assignee)

Comment 24

6 years ago
I've observed similar crashes while working on bug 868498 (note the 'crash' keyword in the bug) but don't have crash dumps to back it up. I'd like to land a patch for bug 868498 and see if this crash still reproduces. My observation has been that when connecting/disconnecting a mouse, or changing the system preferences for the scrollbar style, we could get crashes due to mScrollbarActivity being (or becoming) null in nsGfxScrollFrameInner. This could match the null-dereference that we're seeing here.
(Assignee)

Comment 25

6 years ago
So, we're trying to do a do_QueryFrame on a mScrollableFrame that was set to 0x0000000d:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/d0b270f86f3e/layout/generic/ScrollbarActivity.cpp#l322

This gets past the null check in do_QueryFrame:
http://hg.mozilla.org/mozilla-central/annotate/834c8941ae24/layout/generic/nsQueryFrame.h#l86

And we try to dereference 0x0000000d here:
http://hg.mozilla.org/mozilla-central/annotate/834c8941ae24/layout/generic/nsQueryFrame.h#l89

There are scenarios where we destroy ScrollbarActivity objects and from what I can tell, we seem to make sure that refresh observers are removed when that happens. This crash seems to indicate that we still get a tick from a refresh observer that should have been removed and that we don't correctly null out mScrollableFrame. 0x0000000d looks a bit like an offset into a null object, like mScrollableFrame in ScrollbarActivity...

All this doesn't make a whole lot of sense to me right now.
(Assignee)

Comment 26

6 years ago
It looks like there have been no more reports of crashes after build 20130716040307, which seems to coincide with the landing of bug 868498. Scoobidiver, could you confirm that I'm reading these stats right? If I don't, could you provide a link that displays those reports correctly? Thanks!
Flags: needinfo?(scoobidiver)
(Reporter)

Comment 27

6 years ago
(In reply to Stephen Pohl [:spohl] from comment #26)
> It looks like there have been no more reports of crashes after build
> 20130716040307, which seems to coincide with the landing of bug 868498.
> Scoobidiver, could you confirm that I'm reading these stats right? If I
> don't, could you provide a link that displays those reports correctly?
It happened one build out of five in the worst case so you should wait the build from July 21 to confirm it's fixed. See https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox:25.0a1&query_search=signature&query_type=contains&reason_type=contains&range_value=28&range_unit=days&signature=mozilla%3A%3Alayout%3A%3AScrollbarActivity%3A%3AUpdateOpacity%28mozilla%3A%3ATimeStamp%29
Flags: needinfo?(scoobidiver)
(Reporter)

Comment 28

6 years ago
There are crashes in 24.0a2/20130718 and 19 after the landing of bug 868498's patch.
(Assignee)

Comment 29

6 years ago
Okay, looks like this is still an issue in aurora and nightly. Scoobidiver, could you confirm that we haven't had any more of these crashes in the latest beta build and update the tracking flags? Thanks!
(Reporter)

Comment 30

6 years ago
There are no crashes in 23.0b7.
The latest crashes happened in 24.0a2/20130719 and 25.0a1/20130715.
status-firefox23: affected → unaffected
(Assignee)

Comment 31

6 years ago
Created attachment 779333 [details] [diff] [review]
Patch part 2

Looking through the initial patch and reading comment 3 again, I have a feeling that what can be destroyed by SetOpacityOnElement calls isn't actually mScrollableFrame, but rather what's returned by GetHorizontalScrollbar or GetVerticalScrollbar. This is a problem in ScrollbarActivity::GetRefreshDriver, because we may register a refresh observer and later be unable to unregister it, because the required frame was destroyed between registration and unregistration. This patch uses mScrollableFrame's nsPresContext to register refresh observers and *should* therefore avoid this problem.

If this proves to fix this crash we may be able to revert the initial patch.
Attachment #779333 - Flags: review?(roc)
(In reply to Stephen Pohl [:spohl] from comment #31)
> what can be destroyed by SetOpacityOnElement calls isn't
> actually mScrollableFrame, but rather what's returned by
> GetHorizontalScrollbar or GetVerticalScrollbar.

Get*Scrollbar returns anonymous content created by the scroll frame and
has the same lifetime.  Everything can be destroyed by SetOpacityOnElement;
you can't count on anything being alive unless you hold a strong ref on it.

> This is a problem in
> ScrollbarActivity::GetRefreshDriver, because we may register a refresh
> observer and later be unable to unregister it, because the required frame
> was destroyed between registration and unregistration.

How? I don't see how that can occur.  Destroying the scroll frame calls
ScrollbarActivity::Destroy which calls UnregisterFromRefreshDriver() which 
should succeed at that point, afaict.
http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/layout/generic/nsGfxScrollFrame.cpp#l2904
http://hg.mozilla.org/mozilla-central/annotate/308e3cf5ba75/layout/generic/ScrollbarActivity.cpp#l39

> If this proves to fix this crash we may be able to revert the initial patch.

I don't think "Patch part 2" makes any difference.  We still need the first
patch.
(Assignee)

Comment 33

6 years ago
(In reply to Mats Palmgren (:mats) from comment #32)
> (In reply to Stephen Pohl [:spohl] from comment #31)
> > This is a problem in
> > ScrollbarActivity::GetRefreshDriver, because we may register a refresh
> > observer and later be unable to unregister it, because the required frame
> > was destroyed between registration and unregistration.
> 
> How? I don't see how that can occur.  Destroying the scroll frame calls
> ScrollbarActivity::Destroy which calls UnregisterFromRefreshDriver() which 
> should succeed at that point, afaict.

My (potentially flawed) thinking was that the call to GetRefreshDriver from UnregisterFromRefreshDriver may no longer succeed in the same way that it did when we called RegisterWithRefreshDriver and potentially return null, leaving the observer registered. Or, we may register the observer with the refresh driver from the horizontal scrollbar and then try to unregister it via the refresh driver obtained from the vertical scrollbar's nsPresContext (should the vertical scrollbar have been destroyed in the meantime). I was assuming that the nsPresContexts can be different and therefore the refresh drivers.
(In reply to Stephen Pohl [:spohl] from comment #33)
> My (potentially flawed) thinking was that the call to GetRefreshDriver from
> UnregisterFromRefreshDriver may no longer succeed in the same way that it
> did when we called RegisterWithRefreshDriver and potentially return null,
> leaving the observer registered.

GetScrollbarBox returns the content from mVScrollbarBox/mHScrollbarBox which
is anonymous content owned by, and has the same lifetime as, the scroll frame.
As far as I know, mVScrollbarBox/mHScrollbarBox does not change after they
are initially setup.  When hiding a scrollbar we set visibility:hidden
(and/or set width,height==0 ?), but we don't destroy the scrollbar frames
dynamically.

The current code in GetRefreshDriver() looks a bit weird, so "part 2"
is better but I don't think it makes a difference for the crash.

> Or, we may register the observer with the
> refresh driver from the horizontal scrollbar and then try to unregister it
> via the refresh driver obtained from the vertical scrollbar's nsPresContext

All these frames have the same nsPresContext.
Comment on attachment 779333 [details] [diff] [review]
Patch part 2

Looks good to me, r=mats.

But please leave the bug open as I don't think this will fix
the crash.
Attachment #779333 - Flags: review?(roc) → review+
(Assignee)

Comment 36

6 years ago
Thanks Mats! Landed on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f750e5a76656

I'll keep this bug open and will keep digging for the cause of the crash.
(Assignee)

Updated

6 years ago
status-firefox23: unaffected → disabled
status-firefox25: --- → affected
tracking-firefox23: + → ---
(Reporter)

Updated

6 years ago
Depends on: 895203
https://hg.mozilla.org/mozilla-central/rev/f750e5a76656
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [lion-scrollbars+] → [leave open][lion-scrollbars+]
(Assignee)

Comment 38

6 years ago
I was able to confirm that forgetting to unregister from the refresh driver is causing very similar crashes, although I've been unable to get the exact same callstack (an attempt of dereferencing 0x0000000d in do_QueryFrame). This may also be because I'm running non-optimized builds. Just for reference, the top frames of my 'artifical' crashes look like this:
0   XUL 0x000000010156ae9d mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const + 173 (TimeStamp.h:262)
1   XUL 0x0000000102b7347c mozilla::layout::ScrollbarActivity::UpdateOpacity(mozilla::TimeStamp) + 60 (ScrollbarActivity.cpp:352)
2   XUL 0x0000000102b733fe mozilla::layout::ScrollbarActivity::WillRefresh(mozilla::TimeStamp) + 206 (ScrollbarActivity.cpp:123)
3   XUL 0x0000000102b7379f non-virtual thunk to mozilla::layout::ScrollbarActivity::WillRefresh(mozilla::TimeStamp) + 47
4   XUL 0x0000000102b32373 nsRefreshDriver::Tick(long long, mozilla::TimeStamp) + 563 (nsRefreshDriver.cpp:1103)
5   XUL 0x0000000102b3a697 mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, long long, mozilla::TimeStamp) + 87 (nsRefreshDriver.cpp:172)
[...]

In my artificial crashes, we crash because operator- doesn't accept null values for aOther:
http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#262

I'm guessing that in optimized builds we may be able to get beyond this step and crash later.

I've noticed that we have a couple of places where we don't continue to unregister from the refresh driver if the frame was destroyed, for example after calls to SetIsFading:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/ScrollbarActivity.cpp#59

However, my understanding is that if the frame was destroyed, ScrollbarActivity::Destroy() was also called and we unregistered from the refresh driver there. So moving the UnregisterFromRefreshDriver calls before method calls that could potentially destroy the frame wouldn't help with these crashes.

Mats, is it guaranteed that a call to nsRefreshDriver::RemoveRefreshObserver will remove the observer immediately? I.e. we're certain that there won't be any ticks for the observer anymore? I'm thinking of a queue where these ticks could have queued up and be missed during the removal of the observer for example. I'm pretty sure we're solid there, but I can't seem to find an instance where we're forgetting to unregister from the refresh driver and I'm running out of ideas...
Flags: needinfo?(matspal)
(In reply to Stephen Pohl [:spohl] from comment #38)
> However, my understanding is that if the frame was destroyed,
> ScrollbarActivity::Destroy() was also called and we unregistered from the
> refresh driver there.

Yes, that's my understanding too.

> Mats, is it guaranteed that a call to nsRefreshDriver::RemoveRefreshObserver
> will remove the observer immediately?

Yes, absolutely.  One should be able to delete the instance after
the call to RemoveRefreshObserver.

The interesting thing is what happens after the frame is destroyed.
It may be destroyed while you still have some ScrollbarActivity
calls on the stack.

Also, it's not just SetOpacityOnElement/UnsetOpacityOnElement that
can destroy the frame.  There's also SetBooleanAttribute which
calls [Un]SetAttr with aNotify==true.

Here's a call analysis of all three:

SetOpacityOnElement
  UpdateOpacity
    WillRefresh

UnsetOpacityOnElement
  SetIsFading
    ActivityStarted
      HandleEventForScrollbar
        HandleEvent
      ActivityOccurred
        HandleEvent
        nsGfxScrollFrameInner::SetCoordAttribute
        nsGfxScrollFrameInner::CurPosAttributeChanged
    BeginFade
      FadeBeginTimerFired
    EndFade
      WillRefresh

SetBooleanAttribute
  SetIsActive
    ActivityStarted
      ... as above
    EndFade
      ... as above
  HoveredScrollbar
    HandleEventForScrollbar
      ... as above
    SetIsActive
      ... as above

The calls rooted in WillRefresh and HandleEvent should be OK since
they hold a strong ref on the callee for the duration of the call.

FadeBeginTimerFired doesn't hold a strong ref:
http://hg.mozilla.org/mozilla-central/annotate/73b69c146ca6/layout/generic/ScrollbarActivity.h#l90
but both BeginFade and SetIsFading looks OK since they now check that
the frame is alive and return early if it isn't.  Still, we should
probably make it hold a strong ref to avoid any future mishaps when
adding/modifying this code.

The ActivityOccurred calls from nsGfxScrollFrameInner are more
interesting -- the caller use mScrollbarActivity which is a strong
ref but it's not guaranteed to live for the duration of the call
since it will be nulled out if the frame is destroyed (which will
likely delete the ScrollbarActivity object), so half-way through
ActivityOccurred() we may have deleted 'this' and the call to
ActivityStopped() would crash.

So to be on the safe side, the calls from the scroll frame code
should also hold a strong ref on the stack for the duration of the
call.

Still, that doesn't explain the 20130723030205 Nightly crash:
bp-e5f79cd2-a069-4e58-91e5-6946a2130724
It looks like a crash when reflowing a text control.  Maybe you could
stress test that path by pretending the preference has always/randomly
changed?
Flags: needinfo?(matspal)
(Reporter)

Comment 40

6 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #37)
> https://hg.mozilla.org/mozilla-central/rev/f750e5a76656
There have been no crashes since 25.0a1/20130724.
(Assignee)

Comment 41

6 years ago
Comment on attachment 779333 [details] [diff] [review]
Patch part 2

Given comment 34 and 35, this patch might not fix the crash reported here. However, given the very low risk of the patch and since there haven't been any crashes in 25.0a1 since landing this patch, we might want to give it some more exposure in Aurora to see if it has any positive effect on the number of reported crashes. While waiting for the new numbers, I'll be working through the feedback in comment 39.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 636564
User impact if declined: Might prevent crashes
Testing completed (on m-c, etc.): Has been on m-c for 6 days
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #779333 - Flags: approval-mozilla-aurora?
Attachment #779333 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
So....is this fixed?
status-firefox24: affected → fixed
status-firefox25: affected → fixed
Target Milestone: mozilla24 → mozilla25
(Assignee)

Comment 44

6 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> So....is this fixed?

I don't believe so, but the crash stats over the next few days should tell us. In the meantime I'm addressing the feedback from comment 39, which is a good thing to do anyway. I'd feel more comfortable closing this bug after that's done. If that's not the proper way to handle this bug, just let me know.
(Reporter)

Comment 45

6 years ago
(In reply to Stephen Pohl [:spohl] from comment #44)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> > So....is this fixed?
> I don't believe so, but the crash stats over the next few days should tell
> us.
For me, it is because of seven consecutive builds without crashes.
(Assignee)

Comment 46

5 years ago
Created attachment 783910 [details] [diff] [review]
Patch part 3

Let's see if I'm doing this right. Mats, are we confident that we're not going to destroy the ScrollbarActivity object when we keep a reference to it on the stack? I seem to remember from bug 636564 comment 208 that when the frame is being destroyed, the ScrollbarActivity object is destroyed regardless of remaining references. Given this, might it be better to check that mScrollableFrame is still alive between ScrollbarActivity::ActivityStarted and ScrollbarActivity::ActivityStopped in ScrollbarActivity::ActivityOccurred instead? Or were you suggesting that we should keep a strong reference to the scroll frame (rather than the ScrollbarActivity object) on the stack?

I'm still stress-testing the "switching of scrollbar styles during reflow" case. So far, no success in reproducing a crash.
Attachment #783910 - Flags: feedback?(matspal)
(Assignee)

Updated

5 years ago
Attachment #783910 - Attachment is patch: true
Attachment #783910 - Attachment mime type: message/rfc822 → text/plain
(Assignee)

Comment 47

5 years ago
(In reply to Stephen Pohl [:spohl] from comment #46)
> I seem to remember from bug 636564 comment 208 that when the
> frame is being destroyed, the ScrollbarActivity object is destroyed
> regardless of remaining references.

Just to follow up on this, looking at nsHTMLScrollFrame for example, the following seems to confirm my experience above:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#101
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#2937

mScrollbarActivity appears to be getting destroyed regardless of its refcount.
(Reporter)

Updated

5 years ago
No longer depends on: 866402
(Reporter)

Updated

5 years ago
Duplicate of this bug: 866402
(In reply to Stephen Pohl [:spohl] from comment #46)
> Let's see if I'm doing this right. Mats, are we confident that we're not
> going to destroy the ScrollbarActivity object when we keep a reference to it
> on the stack?

Yes, the ScrollbarActivity object won't be deleted before the last strong
ref is gone.  The Destroy() method might called earlier but that's OK.

> I seem to remember from bug 636564 comment 208 that when the
> frame is being destroyed, the ScrollbarActivity object is destroyed
> regardless of remaining references.

See below.

(In reply to Stephen Pohl [:spohl] from comment #47)
> mScrollbarActivity appears to be getting destroyed regardless of its
> refcount.

It seems we use different terminology -- by "destroyed" you mean that
ScrollbarActivity::Destroy() was called?  While I take "destroyed" to
mean "the objects destructor was called and its memory deallocated".
(I suggest we use Destroy'ed for the former and deleted for the latter
just to be clear.  People often use destroyed/deleted/deallocated
interchangebly to mean the latter though.)

A strong ref on the stack won't prevent someone from calling Destroy().
I don't see a problem with that, do you?
Comment on attachment 783910 [details] [diff] [review]
Patch part 3

Looks good.  A couple of nits:
Keep the reinterpret_cast instead of the C-style cast.
Use nsCOMPtr<ScrollbarActivity> instead of introducing
nsRefPtr<ScrollbarActivity> (saves some code footprint).
r=mats with that.
Attachment #783910 - Flags: feedback?(matspal) → review+
(Assignee)

Comment 51

5 years ago
(In reply to Mats Palmgren (:mats) from comment #49)
> (In reply to Stephen Pohl [:spohl] from comment #46)
> > Let's see if I'm doing this right. Mats, are we confident that we're not
> > going to destroy the ScrollbarActivity object when we keep a reference to it
> > on the stack?
> 
> Yes, the ScrollbarActivity object won't be deleted before the last strong
> ref is gone.  The Destroy() method might called earlier but that's OK.

Ah, yes, I agree with "won't be deleted".

> It seems we use different terminology -- by "destroyed" you mean that
> ScrollbarActivity::Destroy() was called?

Right, that's what I meant.

>  While I take "destroyed" to
> mean "the objects destructor was called and its memory deallocated".
> (I suggest we use Destroy'ed for the former and deleted for the latter
> just to be clear.  People often use destroyed/deleted/deallocated
> interchangebly to mean the latter though.)

Agreed.

> A strong ref on the stack won't prevent someone from calling Destroy().
> I don't see a problem with that, do you?

I've just checked and agree with you. That shouldn't be a problem. I wonder why I ran into the crashes in bug 636564 comment 208, but the logs for the try run are no longer available. It seems like roc was suggesting the same thing in bug 636564 comment 200, holding a local reference to mScrollbarActivity. For some reason, it looks like mScrollbarActivity got deleted in that case (rather than just Destroy'ed).

(In reply to Mats Palmgren (:mats) from comment #50)
> Use nsCOMPtr<ScrollbarActivity> instead of introducing
> nsRefPtr<ScrollbarActivity> (saves some code footprint).

I'm confused. I thought nsRefPtr should be used for concrete classes (taken from [1]). Is this not/no longer the case, or just not here? I'm sure you're right (and roc suggested the same thing in bug 636564), I'm just trying to understand.

[1] https://developer.mozilla.org/en-US/docs/nsRefPtr
Flags: needinfo?(matspal)
(In reply to Stephen Pohl [:spohl] from comment #51)
> For some reason, it looks like mScrollbarActivity got
> deleted in that case (rather than just Destroy'ed).

A wild guess is that you used the mScrollbarActivity field of a frame
that had been deleted (and thus read garbage), but that the actual
ScrollbarActivity object was still alive.

> I'm confused. I thought nsRefPtr should be used for concrete classes

We're already using nsCOMPtr<ScrollbarActivity> so that's why
I suggested it; but you're right nsRefPtr<ScrollbarActivity>
is more appropriate in this case -- so change all to nsRefPtr.
Flags: needinfo?(matspal)
(Assignee)

Comment 53

5 years ago
Created attachment 784143 [details] [diff] [review]
Patch part 3

Thanks, Mats! Addressed review feedback. Carrying over r+ and will be landing on inbound shortly.
Attachment #783910 - Attachment is obsolete: true
Attachment #784143 - Flags: review+
(Assignee)

Comment 55

5 years ago
Removing [leave open] keyword from the whiteboard since this hasn't crashed since landing patch part 2 (comment 45), and we've added all the additional safeguards that we (Mats, really) could think of in patch part 3 (comment 39).
Whiteboard: [leave open][lion-scrollbars+] → [lion-scrollbars+]
https://hg.mozilla.org/mozilla-central/rev/b16893e6cb28
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.