Crash in [@ mozilla::widget::TouchResampler::NotifyFrame]
Categories
(Core :: Panning and Zooming, defect, P2)
Tracking
()
People
(Reporter: RyanVM, Unassigned)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
Started in the 85-based Fenix & Focus releases.
Crash report: https://crash-stats.mozilla.org/report/index/bd0e1a9e-a29a-4238-b308-3ab920210401
MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(lastTouchTime)
Top 10 frames of crashing thread:
0 libxul.so mozilla::widget::TouchResampler::NotifyFrame widget/TouchResampler.cpp:92
1 libxul.so mozilla::widget::NPZCSupport::OnVsync widget/android/nsWindow.cpp:747
2 libxul.so void mozilla::jni::NativeStub<mozilla::java::AndroidVsync::NotifyVsync_t, mozilla::widget::AndroidVsyncSupport, mozilla::jni::Args<long> >::Wrap<&mozilla::widget::AndroidVsyncSupport widget/android/jni/Natives.h:1462
3 base.odex base.odex@0xae424
4 base.art] base.art]@0x1dbfec
5 boot-framework.art] boot-framework.art]@0x150fc
6 boot-framework.art] boot-framework.art]@0x929e4c
7 base.odex base.odex@0x264b48
8 base.art] base.art]@0x1dc03c
9 boot.art] boot.art]@0x414
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Had a look at this, and I don't see any obvious reason why this crash might occur:
- The value being asserted is a timestamp, which starts as null and is only modified in
CurrentTouches::UpdateFromEvent()
when processing aMULTITOUCH_MOVE
event, set to the event's timestamp; that in turn is called fromTouchResampler::ProcessEvent()
. - Due to this condition, the assertion is only reached if
mDeferredTouchMoveEvents
is nonempty. mDeferredTouchMoveEvents
is only populated here inTouchResampler::ProcessEvent()
, if aMULTITOUCH_MOVE
event is processed. Thus, if it's nonempty, a move event has been processed in the past and thus the timestamp should be set from the event. Assuming the event itself doesn't have a null timestamp, the assertion should pass.
I considered threading issues as well, but that part seems fine: ProcessEvent()
is called from HandleMotionEvent()
which should be called on the Java UI thread. The assertion is in TouchResampler::NotifyFrame()
which is called from OnVsync()
which is documented to also be called on the Java UI thread.
Markus, do you see anything I'm overlooking? If we can't track down the assertion's cause, should we revisit whether it needs to be a release assert? It seems we could fall back gracefully to e.g. calling FlushDeferredTouchMoveEventsUnresampled()
instead.
Comment 2•2 years ago
|
||
Thanks so much for taking a look!
(In reply to Botond Ballo [:botond] from comment #1)
I considered threading issues as well, but that part seems fine:
ProcessEvent()
is called fromHandleMotionEvent()
which should be called on the Java UI thread. The assertion is inTouchResampler::NotifyFrame()
which is called fromOnVsync()
which is documented to also be called on the Java UI thread.
I wonder if we have a use-after-free even with everything happening on the Java UI thread because of this code in AndroidVsync::NotifyVsync
:
// Do not keep the lock held while calling OnVsync.
nsTArray<Observer*> observers;
{
auto impl = mImpl.Lock();
observers.AppendElements(impl->mInputObservers);
observers.AppendElements(impl->mRenderObservers);
}
for (Observer* observer : observers) {
observer->OnVsync(timeStamp);
}
It doesn't look like I thought this code through when I wrote it. We are holding on to an array of raw pointers and expect all of them to still be valid when we call OnVsync on them. I'm not sure why I thought this was ok.
Specifically, if we have two observers, and if notifying the first observer causes code to run which deletes the second observer (and unregisters it), then we call OnVsync on the second observer when it's already gone.
There are currently only two possible observer types: NPZCSupport
(which registers itself as an input observer), and AndroidVsyncSource::Display
(which registers itself as a render observer). So this function will always call NPZCSupport::OnVsync
before AndroidVsyncSource::Display::OnVsync
.
I wonder if it's possible to have two NPZCSupport
instances registered at the same time? And then whether it's possible that NPZCSupport::OnVsync
on the first instance could cause the destruction of the second NPZCSupport
instance?
If all those things came together, we'd call NPZCSupport::OnVsync
on a destroyed NPZCSupport
object, which would call TouchResampler::NotifyFrame
on a destroyed TouchResampler
object, and then this release assertion might be hit.
Comment 3•2 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #2)
I wonder if it's possible to have two
NPZCSupport
instances registered at the same time? And then whether it's possible thatNPZCSupport::OnVsync
on the first instance could cause the destruction of the secondNPZCSupport
instance?
Good questions. I'm not really familiar enough with the GeckoView integration to answer them, but perhaps Agi might know.
Specifically:
- Can there be more than one
NPZCSupport
object alive at one time? (AndroidVsync
has a static instance, so this would imply more than one could be registered at the same time.) - Could the code inside NPZCSupport::OnVsync() (particularly places that call back into GeckoView, like the GeckoResult::Complete() calls in
FinishHandlingMotionEvent()
) trigger destruction of anotherNPZCSupport
instance? - Finally, what is a safe way
NPZCSupport
can register itself as an observer with another C++ component (AndroidVsync
), such that the pointer stored by that component either keeps theNPZCSupport
alive, or can check before invoking the observer whether it's still alive?
Comment 4•2 years ago
•
|
||
(In reply to Botond Ballo [:botond] from comment #3)
(In reply to Markus Stange [:mstange] from comment #2)
I wonder if it's possible to have two
NPZCSupport
instances registered at the same time? And then whether it's possible thatNPZCSupport::OnVsync
on the first instance could cause the destruction of the secondNPZCSupport
instance?Good questions. I'm not really familiar enough with the GeckoView integration to answer them, but perhaps Agi might know.
Specifically:
- Can there be more than one
NPZCSupport
object alive at one time? (AndroidVsync
has a static instance, so this would imply more than one could be registered at the same time.)
There should be one for each GeckoSession ( = each tab)
- Could the code inside NPZCSupport::OnVsync() (particularly places that call back into GeckoView, like the GeckoResult::Complete() calls in
FinishHandlingMotionEvent()
) trigger destruction of anotherNPZCSupport
instance?
I can't think of a way that would happen right now, but it is possible.
- Finally, what is a safe way
NPZCSupport
can register itself as an observer with another C++ component (AndroidVsync
), such that the pointer stored by that component either keeps theNPZCSupport
alive, or can check before invoking the observer whether it's still alive?
the Java GC owns NPZCSupport, so anywhere in native code we cannot assume that the pointer is valid. I think we would need to have the observers be some sort of WeakPtr (the easiest would be mozilla::jni::NativeWeakPtr
: https://searchfox.org/mozilla-central/rev/ad38c9d1f0a9036c4da5271a849f47342b68adc8/widget/android/nsWindow.h#87)
Comment 5•2 years ago
|
||
Thanks. Sounds like a potential fix worth trying would be to change the AndroidVsync
observer registered by NPZCSupport
from the NPZCSupport
object itself, to a wrapper which holds a NativeWeakPtr
to the NPZCSupport
, and null-checks it at invocation time.
Updated•2 years ago
|
Comment 6•1 year ago
|
||
So you want AndroidVsync::RegisterObserver()
and AndroidVsync::UnregisterObserver()
to take a NativeWeakPtr<Observer>
parameter (instead of Observer *
), and store them in nsTArray<NativeWeakPtr<Observer>>
arrays?
Comment 7•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #6)
So you want
AndroidVsync::RegisterObserver()
andAndroidVsync::UnregisterObserver()
to take aNativeWeakPtr<Observer>
parameter (instead ofObserver *
), and store them innsTArray<NativeWeakPtr<Observer>>
arrays?
This is an interesting alternative I hadn't considered. Unfortunately, I don't think we can do this, because I believe the only types that can be stored inside NativeWeakPtr<>
are ones that derive from an appropriate JNI-related base class. This is true of NPZCSupport but not of AndroidVsyncSource.
So, I think what we'd want to do instead is:
- Instead of having
NPZCSupport
implementAndroidVsync::Observer
itself, give it a nested classNPZCSupport::Observer
which implementsAndroidVsync::Observer
. - Have
NPZCSupport::Observer
store aNativeWeakPtr<NPZCSupport>
. - In
NPZCSupport::Observer::OnVsync()
, null-check theNativeWeakPtr
and only callNPZCSupport::OnVsync()
if the object is still alive.
Comment 8•1 year ago
•
|
||
(In reply to Botond Ballo [:botond] from comment #7)
- Instead of having
NPZCSupport
implementAndroidVsync::Observer
itself, give it a nested classNPZCSupport::Observer
which implementsAndroidVsync::Observer
.
Speaking of NPZCSupport::Observer
, it appears to be implemented like so:
class Observer {
public:
// Will be called on the Java UI thread.
virtual void OnVsync(const TimeStamp& aTimeStamp) = 0;
};
I strongly believe that all classes meant as interfaces (that is, base classes with virtual - especially pure virtual - member functions) should have a virtual destructor. This is supported by a variety of sources: Stroustrup's FAQ, the C++ Core Guidelines, the Sonar documentation, and so on.
Perhaps this is something worth considering for the future (of course not for this bug)?
Comment 9•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #7)
(In reply to Razvan Cojocaru from comment #6)
So you want
AndroidVsync::RegisterObserver()
andAndroidVsync::UnregisterObserver()
to take aNativeWeakPtr<Observer>
parameter (instead ofObserver *
), and store them innsTArray<NativeWeakPtr<Observer>>
arrays?This is an interesting alternative I hadn't considered. Unfortunately, I don't think we can do this, because I believe the only types that can be stored inside
NativeWeakPtr<>
are ones that derive from an appropriate JNI-related base class. This is true of NPZCSupport but not of AndroidVsyncSource.So, I think what we'd want to do instead is:
- Instead of having
NPZCSupport
implementAndroidVsync::Observer
itself, give it a nested classNPZCSupport::Observer
which implementsAndroidVsync::Observer
.
Who would manage the lifetime of the NPZCSupport::Observer
instance we'd want to pass instead of this
? It can't be a member of the current NPZCSupport
, since in that case the problem persists: it will disappear alongside the object it is a member of. Creating it on the heap with new
also doesn't appear to be a good solution at a first glance at the code, since AndroidVsync::UnregisterObserver() appears to simply remove the pointer from its raw-pointer arrays (this is why I proposed to store weak pointers instead of raw pointers in the observer arrays) - but doesn't delete
it.
Since you've got much more experience with the Firefox code, I'm sure I'm just missing something?
Comment 10•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #8)
Speaking of
NPZCSupport::Observer
, it appears to be implemented like so:class Observer { public: // Will be called on the Java UI thread. virtual void OnVsync(const TimeStamp& aTimeStamp) = 0; };
I strongly believe that all classes meant as interfaces (that is, base classes with virtual - especially pure virtual - member functions) should have a virtual destructor. This is supported by a variety of sources: Stroustrup's FAQ, the C++ Core Guidelines, the Sonar documentation, and so on.
Perhaps this is something worth considering for the future (of course not for this bug)?
Agreed, we should give Observer
a virtual destructor.
Procedurally, in cases like this where we notice a cleanup/improvement opportunity in code we're working with that's not strictly related to the bug being fixed, we'll sometimes do the cleanup/improvement in an additional patch on the same bug. This both keeps the patches clean (avoids mixing different changes in one patch) while also avoiding the overhead of filing a new bug.
Comment 11•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #9)
Who would manage the lifetime of the
NPZCSupport::Observer
instance we'd want to pass instead ofthis
? It can't be a member of the currentNPZCSupport
, since in that case the problem persists: it will disappear alongside the object it is a member of. Creating it on the heap withnew
also doesn't appear to be a good solution at a first glance at the code, since AndroidVsync::UnregisterObserver() appears to simply remove the pointer from its raw-pointer arrays (this is why I proposed to store weak pointers instead of raw pointers in the observer arrays) - but doesn'tdelete
it.Since you've got much more experience with the Firefox code, I'm sure I'm just missing something?
You're not missing anything, this is a very good question that I've overlooked.
Would it be too hacky to give Observer
a second virtual function named Dispose()
, which UnregisterObserver()
calls, and which in the case of NPZCSupport::Observer
calls delete this
(and then we can just create it using new
)?
The other option I can think of is to change AndroidVsync
to accept and store RefPtr<Observer>
instead (i.e. strong references). However, in the case where the Observer
is AndroidVsyncSource
, that stores a strong refrence to AndroidVsync
, so it seems we'd be creating a strong reference cycle.
I'm open to other suggestions as well.
Comment 12•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #10)
Agreed, we should give
Observer
a virtual destructor.Procedurally, in cases like this where we notice a cleanup/improvement opportunity in code we're working with that's not strictly related to the bug being fixed, we'll sometimes do the cleanup/improvement in an additional patch on the same bug. This both keeps the patches clean (avoids mixing different changes in one patch) while also avoiding the overhead of filing a new bug.
Great, I'll do the modification in the patch then - should be short and painless. :) Thanks!
(In reply to Botond Ballo [:botond] from comment #11)
Would it be too hacky to give
Observer
a second virtual function namedDispose()
, whichUnregisterObserver()
calls, and which in the case ofNPZCSupport::Observer
callsdelete this
(and then we can just create it usingnew
)?
Well it is (the simplified to the base essentials version of) COM+'s AddRef()
/Release()
so it's not all that hacky. The solution's bound to be hacky anyway if we want to be able to store both regular pointers to on-the-stack objects and pointers to heap-allocated objects. We could just use standard C++ smart pointers as well and pass dummy do-nothing custom deleters for stack objects, but that's just as hacky and arguably more complicated than the Dispose()
solution.
Just one last question: what's the proper way to create a NativeWeakPtr<NPZCSupport>
from this
(in a NPZCSupport
object)? Or is that not what we want here?
Comment 13•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #12)
Just one last question: what's the proper way to create a
NativeWeakPtr<NPZCSupport>
fromthis
(in aNPZCSupport
object)? Or is that not what we want here?
Perhaps something like this:
- Give
nsWindow
a methodGetNPZCSupportWeakPtr()
which returns a copy ofmNPZCSupport
- In
NPZCSupport
, do something like:
if (auto win = mWindow.Access()) {
nsWindow* gkWindow = window->GetNsWindow();
NativeWeakPtr<NPZCSupport> weakPtrToThis = gkWindow->GetNPZCSupportWeakPtr();
}
It's possible there is an easier way I'm not familiar with. We will ask for code review from someone on the GeckoView team and they can suggest improvements if needed.
Comment 14•1 year ago
|
||
Another option might be to have NPZCSupport::Observer
work more like NPZCSupport::InputEvent: store a java::PanZoomController::NativeProvider::GlobalRef
, and recover the NativeWeakPtr<NPZCSupport>
at invocation time like this.
Comment 15•1 year ago
|
||
Give Observer a virtual destructor while at it.
Updated•1 year ago
|
Comment 16•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #10)
Procedurally, in cases like this where we notice a cleanup/improvement opportunity in code we're working with that's not strictly related to the bug being fixed, we'll sometimes do the cleanup/improvement in an additional patch on the same bug. This both keeps the patches clean (avoids mixing different changes in one patch) while also avoiding the overhead of filing a new bug.
Gah, I've just re-read this and I've completely missed the part about the change being in a separate patch, and now the single patch has been accepted. I'm sorry, should have paid more attention. I can split them now if you want.
Comment 17•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #16)
(In reply to Botond Ballo [:botond] from comment #10)
Procedurally, in cases like this where we notice a cleanup/improvement opportunity in code we're working with that's not strictly related to the bug being fixed, we'll sometimes do the cleanup/improvement in an additional patch on the same bug. This both keeps the patches clean (avoids mixing different changes in one patch) while also avoiding the overhead of filing a new bug.
Gah, I've just re-read this and I've completely missed the part about the change being in a separate patch, and now the single patch has been accepted. I'm sorry, should have paid more attention. I can split them now if you want.
No worries, we're adding a new virtual function anyways so it seems fairly natural to add the virtual destructor in the same patch.
Comment 18•1 year ago
|
||
Pushed by bballo@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/81da50971081 Fix crash in [@ mozilla::widget::TouchResampler::NotifyFrame]. r=botond,geckoview-reviewers,m_kato
Comment 19•1 year ago
|
||
Adding leave-open keyword since we shouldn't close this bug until we see the crashes actually stop occuring.
Comment 20•1 year ago
|
||
bugherder |
Comment 21•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #19)
Adding leave-open keyword since we shouldn't close this bug until we see the crashes actually stop occuring.
Writing down mostly for my own reference:
- Our speculative fix landed in version 110, which will hit the release on Feb 14
- Once 110 has been out for a few weeks, if we see crash reports from 108 and 109 but no crash reports from 110, that's a good sign our fix worked
Comment 22•1 year ago
|
||
Unfortunately, we're still seeing crashes in 110, which suggests that our theory about the cause of the crash was incorrect.
I don't have any alternative theories at this time. If anyone else has suggestions, please feel free to share them in a comment. Meanwhile I'm going to leave the bug open.
Comment 23•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #22)
Unfortunately, we're still seeing crashes in 110, which suggests that our theory about the cause of the crash was incorrect.
I don't have any alternative theories at this time. If anyone else has suggestions, please feel free to share them in a comment. Meanwhile I'm going to leave the bug open.
Sorry to hear that. At least we've fixed a problem with the previous patch. :)
Updated•1 year ago
|
Comment 24•1 year ago
|
||
(In reply to Razvan Cojocaru from comment #23)
At least we've fixed a problem with the previous patch. :)
Indeed, thanks for your work on that!
Comment 25•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #24)
(In reply to Razvan Cojocaru from comment #23)
At least we've fixed a problem with the previous patch. :)
Indeed, thanks for your work on that!
Happy to help!
Comment 26•7 months ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:botond, maybe it's time to close this bug?
For more information, please visit BugBot documentation.
Comment 27•7 months ago
|
||
The crash is still happening occasionally, so the bug should remain open.
However, the leave-open
keyword can be removed (it only needs to be present at the time a patch lands, if we don't want the landing to close the bug).
Description
•