Closed Bug 1217251 Opened 4 years ago Closed 4 years ago

Crash in APZ TaskThrottler in Fennec

Categories

(Core :: Panning and Zooming, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: rbarker, Assigned: botond)

References

Details

Attachments

(2 files)

Getting crash in fennec with C++APZ enabled. The TaskThrottler is trying to post a task to a nullptr returned by MessageLoop::current();

Stack trace:


Program received signal SIGSEGV, Segmentation fault.
0x57fd050a in mozilla::layers::TaskThrottler::PostTask (this=this@entry=0x605ab480, aLocation=..., aTask=<error reading variable: Cannot access memory at address 0x0>, aTimeStamp=...)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/TaskThrottler.cpp:53
53	      if (!MessageLoop::current()) { MOZ_CRASH(); }
(gdb) bt
#0  0x57fd050a in mozilla::layers::TaskThrottler::PostTask (this=this@entry=0x605ab480, aLocation=..., aTask=<error reading variable: Cannot access memory at address 0x0>, aTimeStamp=...)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/TaskThrottler.cpp:53
#1  0x57fd079e in mozilla::layers::AsyncPanZoomController::RequestContentRepaint (this=0x5caef600, this@entry=<error reading variable: Cannot access memory at address 0xbeaa6d74>, 
    aFrameMetrics=..., aFrameMetrics@entry=<error reading variable: Cannot access memory at address 0xbeaa6d74>, aThrottled=aThrottled@entry=true)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/AsyncPanZoomController.cpp:2732
#2  0x57fd0bcc in mozilla::layers::AsyncPanZoomController::RequestContentRepaint (this=<error reading variable: Cannot access memory at address 0xbeaa6d74>)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/AsyncPanZoomController.cpp:2698
#3  0x57fd0f46 in mozilla::layers::AsyncPanZoomController::ScheduleCompositeAndMaybeRepaint (this=<error reading variable: Cannot access memory at address 0xbeaa6d74>)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/AsyncPanZoomController.cpp:2631
#4  0x57fd0f46 in mozilla::layers::AsyncPanZoomController::ScheduleCompositeAndMaybeRepaint (this=<error reading variable: Cannot access memory at address 0xbeaa6d74>)
    at /Volumes/firefox/zoom/gfx/layers/apz/src/AsyncPanZoomController.cpp:2631
This is 100% reproducible in the Android 4.3 emulator.
Regression from bug 1213273. I'm assuming MessageLoop::current() being null happens during shutdown or some other edge case, and we can just omit posting the timeout task in that case.
Blocks: 1213273
(In reply to Botond Ballo [:botond] (at standards meeting Oct 19-24) from comment #2)
> Regression from bug 1213273. I'm assuming MessageLoop::current() being null
> happens during shutdown or some other edge case, and we can just omit
> posting the timeout task in that case.

Sorry, I should have given more info on reproducing. Unfortunately this doesn't just occur at shut down. This will happen in Fennec while scrolling a page up and down quickly. I came across this while examining a reftest (gfx/layers/apz/test/reftest/async-scrollbar-1-v.html) for an unrelated failure.
Oh, interesting. Then I guess on Fennec, some threads (that APZ code can run on) don't have a MessageLoop at all?
It would be helpful to know what thread this occurs on. Randall, if you're able to catch this in gdb again, could you get the name of the thread that it occurs on?
Flags: needinfo?(rbarker)
(In reply to Botond Ballo [:botond] (at standards meeting Oct 19-24) from comment #5)
> It would be helpful to know what thread this occurs on. Randall, if you're
> able to catch this in gdb again, could you get the name of the thread that
> it occurs on?

It is easy to reproduce. I did this to try and get the thread name:

      if (!MessageLoop::current()) {                                             
        RLOG("NO current MessageLoop!");                                         
        PRThread* t = nsThreadManager::get()->GetCurrentThread()->GetPRThread(); 
        RLOG("In thread: %s", (PR_GetThreadName(t) ? PR_GetThreadName(t) : "<NO THREAD NAME>"));
        return;                                                                  
      }

Unfortunately it printed out: "<NO THREAD NAME>"

Is there something else I can do?
Flags: needinfo?(rbarker) → needinfo?(botond)
I havea stack in 1217529, feel free to dupe on way or the other ;)
From the stack in bug 1217529, it looks like the call happens on the Java UI thread (since we call into APZ from JNI). I guess it makes sense for the Java UI thread not to have a MessageLoop.

I see APZThreadUtils::RunOnControllerThread() special-cases Android APZ to use a different mechanism (than MessageLoop) to post a Task. We could probably use this function to post the task in TaskThrottler. (It might change the task from sometimes running on the compositor thread and sometimes on the controller thread, to alwas running on the controller thread, but that should be fine.)
Flags: needinfo?(botond)
Assignee: nobody → botond
Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnControllerThread(). r=mstange
Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=mstange
The attached patches implement the approach described in comment 8 and should solve the problem.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7bb0ae36566e
I confirm that this fixes the crashes for me.
Duplicate of this bug: 1217529
Great, thanks! Posting for review.
Comment on attachment 8677686 [details]
MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats

Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnControllerThread(). r=mstange
Attachment #8677686 - Flags: review?(mstange)
Comment on attachment 8677687 [details]
MozReview Request: Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=kats

Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=mstange
Attachment #8677687 - Flags: review?(mstange)
(In reply to Botond Ballo [:botond] (at standards meeting Oct 19-24) from comment #8)
> (It might
> change the task from sometimes running on the compositor thread and
> sometimes on the controller thread, to alwas running on the controller
> thread, but that should be fine.)

I would like to avoid this if possible. We can instead add an APZThreadUtils::RunDelayedTaskOnCurrentThread() which uses MessageLoop::current() if it is non-null. If it is null then it falls back to what you're doing now, which is posting to the controller thread. We can even add stricter assertions that if MessageLoop::current() is non-null then we must be on Fennec.
Sounds reasonable. I actually considered doing that I just wasn't sure if it mattered enough to be worth it.
Comment on attachment 8677686 [details]
MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats

Clearing review flag until I make the requested change.
Attachment #8677686 - Flags: review?(mstange)
Attachment #8677687 - Flags: review?(mstange)
Comment on attachment 8677686 [details]
MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats

Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats
Attachment #8677686 - Attachment description: MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnControllerThread(). r=mstange → MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats
Attachment #8677686 - Flags: review?(bugmail.mozilla)
Comment on attachment 8677687 [details]
MozReview Request: Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=kats

Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=kats
Attachment #8677687 - Attachment description: MozReview Request: Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=mstange → MozReview Request: Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=kats
Attachment #8677687 - Flags: review?(bugmail.mozilla)
Comment on attachment 8677686 [details]
MozReview Request: Bug 1217251 - Add APZThreadUtils::RunDelayedTaskOnCurrentThread(). r=kats

https://reviewboard.mozilla.org/r/22997/#review20677

::: gfx/layers/apz/util/APZThreadUtils.cpp:92
(Diff revision 2)
> +  // Fennec does not have a MessageLoop::current().

Indent the body of the else block more.
Also, append "on the controller thread" to this comment.
Attachment #8677686 - Flags: review?(bugmail.mozilla) → review+
Attachment #8677687 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8677687 [details]
MozReview Request: Bug 1217251 - In TaskThrottler, dispatch the timeout task correctly from the Java UI thread on Android. r=kats

https://reviewboard.mozilla.org/r/22999/#review20679
https://reviewboard.mozilla.org/r/22997/#review20681

::: gfx/layers/apz/util/APZThreadUtils.cpp:96
(Diff revision 2)
> +  MOZ_ASSERT(false, "This non-Fennec platform should have a MessageLoop::current()");

Actually make this a release assert and remove the |delete| statement. If this ever happens I want to make sure we know about it. In particular I don't want the task throttler's task to get deleted and it to keep going because that seems like a UAF waiting to happen.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #24)
> Actually make this a release assert and remove the |delete| statement. If
> this ever happens I want to make sure we know about it. In particular I
> don't want the task throttler's task to get deleted and it to keep going
> because that seems like a UAF waiting to happen.

Ah, good point - it's the running of the task that nulls TaskThrottler::mTimeoutTask, so if we delete the task without running it first then we indeed get a dangling pointer. Thanks for catching this!
https://hg.mozilla.org/mozilla-central/rev/027c42d1dcc8
https://hg.mozilla.org/mozilla-central/rev/c6404f3d787c
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.