Acquire AsyncPanZoomController::mRecursiveMutex consistently before accessing Metrics()
Categories
(Core :: Panning and Zooming, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox68 | --- | fixed |
People
(Reporter: botond, Assigned: srujana121, Mentored)
References
(Blocks 1 open bug)
Details
(Whiteboard: [gfx-noted] [lang=c++])
Attachments
(1 file)
Comment 1•6 years ago
|
||
Reporter | ||
Comment 2•6 years ago
|
||
Reporter | ||
Comment 3•6 years ago
|
||
Comment 4•6 years ago
|
||
Reporter | ||
Comment 5•6 years ago
|
||
Comment 6•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 7•6 years ago
|
||
Reporter | ||
Comment 9•6 years ago
|
||
Comment 10•6 years ago
|
||
Reporter | ||
Comment 11•6 years ago
|
||
Comment 12•6 years ago
|
||
Reporter | ||
Comment 13•6 years ago
|
||
Comment 14•6 years ago
|
||
Reporter | ||
Comment 15•6 years ago
|
||
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Reporter | ||
Comment 18•6 years ago
|
||
Assignee | ||
Comment 20•6 years ago
|
||
Hi, is this bug active? I would like to work on this.
Reporter | ||
Comment 21•6 years ago
|
||
(In reply to Srujana Peddinti from comment #20)
Hi, is this bug active? I would like to work on this.
Hi! Yes, you're welcome to give this a try. Please read over the comments so far, it should give you an idea of how to get started. Let me know if you have any questions!
Assignee | ||
Comment 22•6 years ago
|
||
This is the error stack. I am not able to understand where to lock mRecursiveMutex (RecursiveMutexAutoLock lock(mRecursiveMutex)).
0:00.52 /home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox -no-remote -profile /home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/tmp/profile-default
++DOCSHELL 0x7f526ea6b000 == 1 [pid = 29424] [id = {1fac6e14-4a09-4445-9c1f-40d811acea65}]
++DOMWINDOW == 1 (0x7f5273829a60) [pid = 29424] [serial = 1] [outer = (nil)]
++DOMWINDOW == 2 (0x7f526eaa1000) [pid = 29424] [serial = 2] [outer = 0x7f5273829a60]
[29424, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, NS_ERROR_UNEXPECTED) failed with result 0x80004005: file /home/srujana/src5/mozilla-central/extensions/cookie/nsPermissionManager.cpp, line 1039
[29424, Main Thread] WARNING: Attempting to get a displayport from a content with no primary frame!: file /home/srujana/src5/mozilla-central/layout/base/nsLayoutUtils.cpp, line 762
++DOCSHELL 0x7f5269798800 == 2 [pid = 29424] [id = {dc55a44f-efab-4597-b494-28b2fbebc7a8}]
++DOMWINDOW == 3 (0x7f5269f8e3e0) [pid = 29424] [serial = 3] [outer = (nil)]
LoadPlugin() /usr/lib/flashplugin-installer/libflashplayer.so returned 7f5268143a90
Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[Child 29498, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /home/srujana/src5/mozilla-central/netwerk/base/nsIOService.cpp, line 941
++DOCSHELL 0x7ff9af58b800 == 1 [pid = 29498] [id = {f288df59-5120-4c50-9086-9fe16aaca89c}]
++DOMWINDOW == 1 (0x7ff9b1efc020) [pid = 29498] [serial = 1] [outer = (nil)]
++DOMWINDOW == 4 (0x7f52738f1800) [pid = 29424] [serial = 4] [outer = 0x7f5269f8e3e0]
Assertion failure: IsAcquired() && mOwningThread == PR_GetCurrentThread(), at /home/srujana/src5/mozilla-central/xpcom/threads/BlockingResourceBase.cpp:537
[Parent 29424, Main Thread] WARNING: '!parent', file /home/srujana/src5/mozilla-central/netwerk/ipc/NeckoParent.cpp, line 955
#01: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x25dd084]
#02: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42afd45]
#03: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42c40bc]
#04: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42aed71]
#05: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x430b9b2]
#06: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x430a53c]
#07: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42dc839]
#08: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42e46de]
#09: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42af118]
#10: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42c1f41]
#11: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x4439c2f]
#12: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x445329d]
#13: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x34f0bb4]
#14: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x32e3b7a]
#15: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3164564]
#16: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x31630aa]
#17: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x316392c]
#18: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3163e35]
#19: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3083a59]
#20: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3084206]
#21: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3084443]
#22: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30852d9]
#23: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x308389f]
#24: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3083815]
#25: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30837ca]
#26: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30a540b]
#27: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x308a73e]
#28: ???[/lib/x86_64-linux-gnu/libpthread.so.0 +0x76db]
#29: clone[/lib/x86_64-linux-gnu/libc.so.6 +0x12188f]
#30: ??? (???:???)
Program /home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox (pid = 29424) received signal 11.
Stack:
#01: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0xae742b1]
++DOMWINDOW == 2 (0x7ff9b1d23800) [pid = 29498] [serial = 2] [outer = 0x7ff9b1efc020]
#02: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0xb8e67d7]
#03: ???[/lib/x86_64-linux-gnu/libpthread.so.0 +0x12890]
#04: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x25dd094]
#05: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42afd45]
#06: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42c40bc]
#07: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42aed71]
#08: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x430b9b2]
#09: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x430a53c]
#10: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42dc839]
#11: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42e46de]
#12: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42af118]
#13: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x42c1f41]
#14: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x4439c2f]
#15: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x445329d]
#16: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x34f0bb4]
#17: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x32e3b7a]
#18: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3164564]
#19: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x31630aa]
#20: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x316392c]
#21: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3163e35]
#22: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3083a59]
#23: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3084206]
#24: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3084443]
#25: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30852d9]
#26: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x308389f]
#27: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x3083815]
#28: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30837ca]
#29: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x30a540b]
#30: ???[/home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so +0x308a73e]
#31: ???[/lib/x86_64-linux-gnu/libpthread.so.0 +0x76db]
#32: clone[/lib/x86_64-linux-gnu/libc.so.6 +0x12188f]
#33: ??? (???:???)
Sleeping for 300 seconds.
Type 'gdb /home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox 29424' to attach your debugger to this thread.
[Parent 29424, Main Thread] WARNING: '!parent', file /home/srujana/src5/mozilla-central/netwerk/ipc/NeckoParent.cpp, line 955
Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[Child 29528, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /home/srujana/src5/mozilla-central/netwerk/base/nsIOService.cpp, line 941
[Parent 29424, Main Thread] WARNING: Failed to get base domain!: file /home/srujana/src5/mozilla-central/ipc/glue/BackgroundUtils.cpp, line 328
[Parent 29424, Main Thread] WARNING: Attempt to Lock a texture that is being read by the compositor!: file /home/srujana/src5/mozilla-central/gfx/layers/client/TextureClient.cpp, line 501
Done sleeping...
[Child 29528, Chrome_ChildThread] WARNING: pipe error (25): Connection reset by peer: file /home/srujana/src5/mozilla-central/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
Exiting due to channel error.
[Child 29528, Chrome_ChildThread] WARNING: pipe error (24): Connection reset by peer: file /home/srujana/src5/mozilla-central/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Child 29528, Chrome_ChildThread] WARNING: pipe error (23): Connection reset by peer: file /home/srujana/src5/mozilla-central/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
Exiting due to channel error.
Reporter | ||
Comment 23•6 years ago
•
|
||
Your backtrace does not have symbols in it. You need a backtrace with symbols to be able to interpret it usefully.
Can you attach the debugger as instructed in the output:
Type 'gdb /home/srujana/src5/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox 29424' to attach your debugger to this thread.
and run the backtrace
command, and see if the backtrace from the debugger is any better?
Assignee | ||
Comment 24•6 years ago
|
||
I tried looking in the files in backtrace output. But I couldnt understand exactly whats happening. As I understand some function is accessing metrics without locking. But I cant seem to find the function which is calling metrics without calling lock.
Since this is my first mutex related bug, can you help me through it with this error? I hope I can similarly debug the errors in the later stage of this bug on my own.
Below is the log. Thank you in Advance.
mach run output : https://pastebin.com/W2hGWauu
gdb output : https://pastebin.com/aBZD37Mb
backtrace output : https://pastebin.com/fpWwH0GA
Reporter | ||
Comment 25•6 years ago
|
||
(In reply to Srujana Peddinti from comment #24)
backtrace output : https://pastebin.com/fpWwH0GA
This is better, as it now has symbols.
Unfortunately, it does not match the location of the failing assertion. From the mach run
output, the failing assertion is here:
Assertion failure: IsAcquired() && mOwningThread == PR_GetCurrentThread(), at /home/srujana/src5/mozilla-central/xpcom/threads/BlockingResourceBase.cpp:537
So the backtrace should have an entry at BlockingResourceBase.cpp:537, but it doesn't.
I suspect that's because we're looking at the backtrace for the wrong thread. Firefox is a multi-threaded application; when you attach gdb to it, gdb will select one of the threads (likely the main thread) to be active, and the backtrace
command will apply to that thread, but our assertion failure is in a different thread.
You can type info threads
in gdb to list all the threads, and thread <N>
to select a thread by number (after which backtrace
will now give the backtrace for the newly selected thread). The thread with the assertion failure is likely to be the one labelled Compositor
. (If not, feel free to post the info threads
output and I can make a better guess from there.)
Could you try getting a backtrace for the right thread? Things should start to make more sense once we have that.
Reporter | ||
Comment 26•6 years ago
|
||
(You can also run thread apply all backtrace
to get gdb to print backtraces for every thread. However, that's likely to produce a lot of output and it's probably easier to identify the right thread and get the backtrace just for that.)
Assignee | ||
Comment 27•6 years ago
|
||
I think only one thread is being started.
Immediately after attaching gdb, this is the output. https://pastebin.com/aBZD37Mb
As I understand, gdb itself is starting some threads. (The "[New LWP <no.>]" lines 17-52 )
When I do "info threads", the only threads are the LWP numbers in lines 17 to 52 and the original thread itself.
Did I miss something?
Reporter | ||
Comment 28•6 years ago
|
||
(In reply to Srujana Peddinti from comment #27)
Immediately after attaching gdb, this is the output. https://pastebin.com/aBZD37Mb
That seems normal.
As I understand, gdb itself is starting some threads. (The "[New LWP <no.>]" lines 17-52 )
I think that's more like gdb noticing the existing threads of the application when it attaches.
When I do "info threads", the only threads are the LWP numbers in lines 17 to 52 and the original thread itself.
So, there are no thread names / labels? Could you paste the output of info threads
?
Assignee | ||
Comment 29•6 years ago
|
||
info threads : https://pastebin.com/9Stb88qp
compositor backtrace : https://pastebin.com/nGZWGmeN
I read the code but cant figure out whats happening.
Reporter | ||
Comment 30•6 years ago
•
|
||
(In reply to Srujana Peddinti from comment #29)
compositor backtrace : https://pastebin.com/nGZWGmeN
Thanks, this is the backtrace we were looking for!
I read the code but cant figure out whats happening.
The assertion failures we're interested in are in calls to AsyncPanZoomController::Metrics()
. In the backtrace, we can see this function being called in frame #8.
The fact that the assertion is firing means that AsyncPanZoomController::mRecursiveMutex
is not being held during this call to Metrics()
. The mutex should be acquired by the caller of Metrics()
(or one of its callers, transitively).
In this case, the caller is the AsyncPanZoomController
constructor (frame #9). Presumably, the call to Metrics()
in question is this one. (The line number is slightly different in the stack trace, likely because the file has changed a bit since the version you built, but that and two lines below it are the only places where the constructor calls Metrics()
, so it must be it.)
Having identified the offending call to Metrics()
, the fix will vary case by case. It will typically involve adding a statement of the form RecursiveMutexAutoLock lock(mRecursiveMutex);
in the calling function, preceding the call to Metrics()
, though there are some nuances such as being careful to respect the lock order as mentioned in comment 3.
This particular case is a bit more challenging, because the call to Metrics()
occurs inside the constructor initializer list, rather than the constructor body. There is no place syntactically to create a RecursiveMutexAutoLock
object such that it covers the execution of the member initializers. We can't do it in the constructor's caller, either, because the mutex we need to lock is also a member of AsyncPanZoomController
, and so it will not have been created yet in the constructor's caller.
One possible solution here would be to move the initialization of the affected variable (in this case, mZoomConstraints
) into the body of the constructor, where we can then acquire the mutex.
However, taking a step back: there isn't actually a need to guard against concurrent access to fields while executing the constructor, since it's not possible for another thread to have access to the same AsyncPanZoomController
object and be calling a member function on it while it's still being constructed. (I mean, technically it's possible, if the constructor were to say schedule a task to run on another thread, passing it the object's address, and that task ended up running before the constructor completed. However, we can see by inspection that the AsyncPanZoomController
constructor does no such thing, and I think it's safe to assume it won't do such a thing in the future either, as it's an odd thing for a constructor to be doing.)
So, a simpler solution here is to just bypass the check, by replacing the call to Metrics()
with a more direct access to mScrollMetadata.GetMetrics()
(that's just the implementation of Metrics()
, minus the assertion check we're adding).
Having done that, you can continue looking for other places where the assertion is firing. Presumably some other places that require an actual fix will turn up.
Assignee | ||
Comment 31•6 years ago
|
||
'''
AutoApplyAsyncTestAttributes::AutoApplyAsyncTestAttributes(
AsyncPanZoomController* aApzc)
: mApzc(aApzc), mPrevFrameMetrics(aApzc->Metrics()) {
mApzc->ApplyAsyncTestAttributes();
}
'''
This is the function thats causing the problem now. What should I do? Should I change the metrics to mScrollMetadata.GetMetrics() or should I lock it before this gets called and unlock it afterwards?
Till now, wherever the errors are I was sure of no concurrent access, so I replaced Metrics() with mScrollMetadata.GetMetrics(), is it okay?
Reporter | ||
Comment 32•6 years ago
|
||
(In reply to Srujana Peddinti from comment #31)
'''
AutoApplyAsyncTestAttributes::AutoApplyAsyncTestAttributes(
AsyncPanZoomController* aApzc)
: mApzc(aApzc), mPrevFrameMetrics(aApzc->Metrics()) {
mApzc->ApplyAsyncTestAttributes();
}
'''This is the function thats causing the problem now. What should I do? Should I change the metrics to mScrollMetadata.GetMetrics() or should I lock it before this gets called and unlock it afterwards?
This is also a constructor, but it's not the AsyncPanZoomController
constructor, it's the constructor of a helper class that gets created by some AsyncPanZoomController
methods. We should add locking to those methods.
Till now, wherever the errors are I was sure of no concurrent access, so I replaced Metrics() with mScrollMetadata.GetMetrics(), is it okay?
Do you mean you've found other cases than the one discussed in comment 30 where you made this replacement? I'd have to see what they are to reason about them. Feel free to post a work-in-progress patch if you'd like feedback on what you've done so far.
Assignee | ||
Comment 33•6 years ago
|
||
I ll upload the feedback patch in couple of days. Now I think i got the hang of what to be done and where to look.
I just have one doubt.
As I understand "RecursiveMutexAutoLock lock(mRecursiveMutex)" locks the mRecursiveMutex. I cant find where this is being unlocked.
These are the occurrences of mRecursiveMutex :
I am not able to find similar unlock function anywhere.
Also what is mRecursiveMutex.Unlock() and mRecursiveMutex.Lock(). How is mRecursiveMutex.Lock() different from lock(mRecursiveMutex) ? Are the the same? So can I use mRecursiveMutex.Unlock() to unlock the mutex locked by lock(mRecursiveMutex)?
Assignee | ||
Comment 34•6 years ago
|
||
I can see that there is RecursiveMutexAutoUnlock in the file xpcom/threads/RecursiveMutex.h
But I cant find its usage anywhere, so I wanted to make sure.
Reporter | ||
Comment 35•6 years ago
|
||
Hint: look at the implementation of RecursiveMutexAutoLock
.
Assignee | ||
Comment 36•6 years ago
|
||
Yes, I did. Thats what confused me. because it is using Lock() in its constructor. Are they interchangeable?
I did not understand what this meant "MOZ_GUARD_OBJECT_NOTIFIER_INIT;" I did not understand the difference between the two.
Also as I understand RecursiveMutexAutoLock constructor locks the mutex and the destructor unlocks it. So should I do delete(lock) to unlock it? Will that be fine? Or should I explicitly use RecursiveMutexAutoUnLock(which I personally think is an overkill).
Reporter | ||
Comment 37•6 years ago
|
||
In C++, the destructor of a local variable is called automatically at the end of the scope in which the variable is declared (i.e. the end of the function, or the end of a smaller block like an if statement if the variable is declared inside one).
So, you do not need to do anything to explicitly unlock the mutex; just declare a RecursiveMutexAutoLock
variable as a local variable, and the unlocking will happen automatically at the end of the scope.
This pattern is called "RAII". You can read more about it here. (That article mentions std::lock_guard
. RecursiveMutexAutoLock
serves a very similar purpose.)
Assignee | ||
Comment 38•6 years ago
|
||
Hi, I have a problem that I am not able to solve.
There is this line AutoApplyAsyncTestAttributes testAttributeApplier(wrapper.GetApzc()) in gfx/layers/composite/AsyncCompositionManager.cpp thats calling the constructing which calls Metrics().
So I tried to add these lines above it.
mozilla::RecursiveMutex mRecursiveMutex("AsyncPanZoomController");
RecursiveMutexAutoLock lock(mRecursiveMutex);
But the error is still the same. As I understand I initialised a new recursivemutex that locks AsyncPanZoomController. And then locked it.
Is this not the right way or not the right mutex to lock?
(Sorry for the gap, I am having my semester exams)
Assignee | ||
Comment 39•6 years ago
|
||
As I understand, the issue is that mRecursiveMutex is a member in the class AsyncPanZoomController. This line is in the class AsyncCompositionManager.
AsyncPanZoomController is not a friend of AsyncCompositionManager. How do I access its member?
So instead, I tried creating another lock as above, but didnt help.
Assignee | ||
Comment 40•6 years ago
|
||
Is this the right lock ?
RecursiveMutexAutoLock lock(((wrapper.GetApzc())->mRecursiveMutex);
Reporter | ||
Comment 41•6 years ago
•
|
||
(In reply to Srujana Peddinti from comment #38)
So I tried to add these lines above it.
mozilla::RecursiveMutex mRecursiveMutex("AsyncPanZoomController");
RecursiveMutexAutoLock lock(mRecursiveMutex);
We don't want to create new mutexes here, just add guards (RecursiveMutexAutoLock
objects) which lock the existing AsyncPanZoomController::mRecursiveMutex
.
As I understand I initialised a new recursivemutex that locks AsyncPanZoomController.
Note that the string argument to the RecursiveMutex
constructor doesn't determine what the mutex locks. It's just a label that's used for display and debugging.
(In fact, "what the mutex locks" isn't expressed anywhere in the mutex's API. Rather, it's determined by how the mutex is used. So, we decide that we want AsyncPanZoomController::mRecursiveMutex
to guard certain member variables such as AsyncPanZoomController::mScrollMetadata
, and then we lock the mutex before entering any code that accesses those variables.)
Now, about the specific place you mentioned:
There is this line AutoApplyAsyncTestAttributes testAttributeApplier(wrapper.GetApzc()) in gfx/layers/composite/AsyncCompositionManager.cpp thats calling the constructing which calls Metrics().
It would indeed be tricky to acquire AsyncPanZoomController::mRecursiveMutex
from AsyncCompositionManager
. (Yes, wrapper.GetApzc()->mRecursiveMutex
is the right mutex, but we don't have access to it from this code because mRecursiveMutex
is a private member of AsyncPanZoomController
.)
However, usages of AutoApplyAsyncTestAttributes
in AsyncCompositionManager.cpp
have actually been removed on 2019-03-28 (in bug 1526489). You must have a version of the code checked out that's older than that. Please update to the latest mozilla-central, and this particular problem will be solved by the code in question having gone away :)
Reporter | ||
Comment 42•6 years ago
|
||
(In case you're curious about what we might have done if we had to keep usages of AutoApplyAsyncTestAttributes
in AsyncCompositionManager
: one idea might have been to make the RecursiveMutexAutoLock
a member of AutoApplyAsyncTestAttributes
. Members are created in order of declaration, so the mutex would be locked before the initializer of another member called Metrics()
. But again, this is no longer relevant given the new state of the code.)
Assignee | ||
Comment 43•6 years ago
|
||
Made all the calls to GetFrameMetrics() go through Metrics() in AsyncPanZoomController.cpp. Locked the mutex mRecursiveMutex before every call to Metrics(). Moved the Assertion from GetFrameMetrics() to Metrics(). GetFrameMetrics() now can be safely removed. It serves no other purpose than to call Matrics().
Assignee | ||
Comment 44•6 years ago
|
||
I uploaded the patch. I tested it by using nightly firefox for a while. There wasnt any crash.
Should I test this on treeherder? If yes, what tests should I run?
Reporter | ||
Comment 45•6 years ago
|
||
(In reply to Srujana Peddinti from comment #44)
I uploaded the patch. I tested it by using nightly firefox for a while. There wasnt any crash.
Great, thanks!
Should I test this on treeherder?
Yep. I pushed it for you: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1&revision=7ca272b3776050f9a5671bfc08d2168b5a1c8141
If yes, what tests should I run?
I usually run builds on all platforms, and tests on Linux and Android. This gives us pretty good coverage without putting heavy strain on our CI resources by running tests on all platforms.
Reporter | ||
Comment 46•6 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #45)
(In reply to Srujana Peddinti from comment #44)
Should I test this on treeherder?
Yep. I pushed it for you: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1&revision=7ca272b3776050f9a5671bfc08d2168b5a1c8141
We can see that the assertion is triggered by some of the tests.
You can get backtraces by clicking on a test job, and then clicking the log button ("Open the log viewer in a new window") in the bottom panel that opens up.
Assignee | ||
Comment 47•6 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #46)
We can see that the assertion is triggered by some of the tests.
You can get backtraces by clicking on a test job, and then clicking the log button ("Open the log viewer in a new window") in the bottom panel that opens up.
So there is this assert statement thats causing one of the crashes.
MOZ_ASSERT(apzc->Metrics().IsRootContent());
So ideally I should do this.
RecursiveMutexAutoLock lock(apzc->mRecursiveMutex);
But I cant because mRecursiveMutex is a protected member of the class.
What would you suggest me to do?
Reporter | ||
Comment 48•6 years ago
|
||
In a situation like this, we could add a public method AsyncPanZoomController::IsRootContent()
which creates a guard and returns Metrics().IsRootContent()
.
Reporter | ||
Comment 49•6 years ago
|
||
In fact, we already have one.
Reporter | ||
Comment 50•6 years ago
|
||
Here's a new Try push with the updated patch:
Assignee | ||
Comment 51•6 years ago
|
||
Hi. I solved the initial errors and ran all linux and android tests.
https://treeherder.mozilla.org/?fbclid=IwAR3l80vJ3qdOB2d-Tk--uk4TnUfLUpK3lYbTwLD9DCOk_SvHTpk2HEr6Pfo#/jobs?repo=try&revision=1bcbc0f8536cd57ede7ac3fef12dad7f253a7650&selectedJob=243127677
I got few more errors which werent in the original testing. Also I saw your treeherder link and fixed them.
want to confirm two things before I push it one last time to tree.
-
Should I do full testing for linux and android or should I run only the tests you have run?
https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp#594
This class in its constructor calls Metrics()
So as I understand everytime we create an object with this class, we should lock the mutex.
But here were are not.
https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp#4160
Is there a reason for that or should I change it?
I think this will be the last issue. And after this I dont think there will be any errors.
Reporter | ||
Comment 52•6 years ago
|
||
(In reply to Srujana Peddinti from comment #51)
- Should I do full testing for linux and android or should I run only the tests you have run?
My Try push should have run ran all Linux and Android tests, as I specified "-u all" in the Try syntax.
https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp#594
This class in its constructor calls Metrics()
So as I understand everytime we create an object with this class, we should lock the mutex.But here were are not.
https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp#4160Is there a reason for that or should I change it?
I think that's just an oversight, like other places we've fixed in this bug. GetCurrentPinchZoomScale()
should createa a RecursiveMutexAutoLock
.
Assignee | ||
Comment 53•6 years ago
|
||
Hi, I just wanted to confirm something before fixing the below issue.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243125893&repo=try&lineNumber=2372
Is this happening because in the function AsyncPanZoomController::AdvanceAnimations here, we are locking the mutex once. And again for the variable StateChangeNotificationBlocker blocker, in the destructor here , again the mutex is being locked. Naturally locking the mutex(unless its a semaphore) twice is not allowed and hence the error, I am guessing. is that true?
If yes, this is how I think we should solve it. We can unlock it specifically before we go out of scope. (so that we are unlocking before the destructor is called). Am I correct?
Reporter | ||
Comment 54•6 years ago
|
||
(In reply to Srujana Peddinti from comment #53)
Is this happening because in the function AsyncPanZoomController::AdvanceAnimations here, we are locking the mutex once. And again for the variable StateChangeNotificationBlocker blocker, in the destructor here , again the mutex is being locked. Naturally locking the mutex(unless its a semaphore) twice is not allowed and hence the error, I am guessing. is that true?
A regular mutex cannot be locked multiple times, but a RecursiveMutex
can (that's what the "recursive" in the name signifies).
However, that's not actually happening here. A local variable is destroyed at the end of the innermost block scope in which it is declared. So, the guard that's created here is destroyed here (in fact, we created that block scope specifically to ensure that the unlocking will happen where we want it to). The destructor of the notification blocker created here, however, runs here. As a result, there is no overlap between the first guard, and the one created by the notification blocker's destructor.
I suspect the problem is actually that the StateChangeNotificationBlocker
destructor calls DispatchStateChangeNotification()
, which accesses Metrics()
here. That is covered by neither this guard (which is destroyed here), nor this one (which is destroyed here).
We can solve this by creating another guard before the UpdateRootFrameMetrics()
call. (I believe the reason for not having a guard that covers the entire function is that we don't want the mutex locked during the NotifyAPZStateChange()
calls.)
Assignee | ||
Comment 55•6 years ago
|
||
I fixed all the errors. Tested it on treeherder.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ec5c29c667e5a3496b1de5799c7e91d9efe81f9
Let me know if there are any further changes that required to be made.
Assignee | ||
Comment 56•6 years ago
|
||
I have made the changes and tested them on treeherder for Android platforms. It passed the tests. I'm testing for all platforms now. Can you please check?
All platforms:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5461e58712586761f98a5250136d45f34178e003
Android:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c91f44894b7063244b282ef1c2d220c2d50a9c53
Tests:
gtests, mochitests , reftests .
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 57•6 years ago
|
||
Comment 58•6 years ago
|
||
bugherder |
Reporter | ||
Comment 59•6 years ago
|
||
Thank you for your work here, Srujana! I tweaked the bug description to better represent what the patch that landed does.
Reporter | ||
Comment 60•6 years ago
|
||
I filed bug 1549504 as a follow-up for potentially landing the assertion as well. Let me know if you're interested in working on that!
Description
•