Closed
Bug 1257862
Opened 8 years ago
Closed 8 years ago
Scrolling via mousewheel and touchpad intermittently stops working
Categories
(Core :: Panning and Zooming, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla48
Tracking | Status | |
---|---|---|
firefox47 | --- | unaffected |
firefox48 | + | verified |
People
(Reporter: RyanVM, Assigned: kats)
References
Details
(Keywords: regression, Whiteboard: [gfx-noted])
Attachments
(2 files)
There have been a couple recent mentions on IRC of people encountering this on OSX recently. I've also seen the same on Win10 and restarting Fx is the only way to get it working again. Scrolling via arrow keys and scrollbars still works. This only broke for me yesterday, but happened twice throughout the course of the day. Unfortunately, I don't have reliable STR :(
Assignee | ||
Updated•8 years ago
|
status-firefox48:
--- → affected
Version: 47 Branch → 48 Branch
Assignee | ||
Updated•8 years ago
|
Comment 1•8 years ago
|
||
I'm seeing this too every few hours on OSX. Trying to get some STR.
Comment 2•8 years ago
|
||
Tracking since this sounds likely to be a recent regression.
Comment 3•8 years ago
|
||
Could it have been caused by bug 1247854? The timing looks similar
Comment 4•8 years ago
|
||
(In reply to :Felipe Gomes (needinfo me!) from comment #3) > Could it have been caused by bug 1247854? The timing looks similar That got uplifted to beta, so that's not very good... Markus?
Flags: needinfo?(mstange)
Assignee | ||
Comment 5•8 years ago
|
||
It's unlikely to have been caused by that. If mousewheel/trackpad stops working entirely it's likely that some global state is getting screwed up, where as the changes in bug 1247854 are per-page, so loading a new page or switching tabs would fix it.
Assignee | ||
Comment 6•8 years ago
|
||
That being said we definitely need to figure out what's going on here. Marking it as a APZ release blocker, although for the moment we've only had it reported on Nightly/48.
Updated•8 years ago
|
Flags: needinfo?(mstange)
Comment 7•8 years ago
|
||
I've found a page where I can reproduce this reliably. https://www.rightscon.org/program/ If you mouse over the schedule and scroll, it doesn't work, but if you're just left or right of the schedule, it starts working again.
Comment 8•8 years ago
|
||
(In reply to Mike Hoye [:mhoye] from comment #7) > but if you're > just left or right of the schedule, it starts working again. Unfortunately, this means that it's a different bug. In this bug, scrolling stops permanently for the whole window.
Comment 9•8 years ago
|
||
The next time I hit this (which seems to be about once a day), is there a place I should be setting a breakpoint with lldb / Xcode to give you some actionable data about what's going wrong?
Assignee | ||
Comment 10•8 years ago
|
||
Some useful places to set breakpoints (assuming this is OS X trackpad scrolling): [1] [2] [3] [4]. Depending on which of these breakpoints get hit (or not) we'll have a better idea of what's going on. If you run into this and botond is in the office he should be able to help you debug it. [1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/InputQueue.cpp?rev=fd968c710a39#49 [2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp?rev=4ba52b16f83a#1131 [3] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp?rev=4ba52b16f83a#2929 [4] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/util/APZCCallbackHelper.cpp?rev=6a6e000c7a42#100
Comment 11•8 years ago
|
||
I can reliably reproduce this on my main profile , on OS X, using this url: data:text/html,<input type=button onclick=window.print()>aaa</input> You need to click on the button and you can cancel the print afterwards. Manually triggering a print by Cmd+P doesn't seem to work. But I cannot reproduce it on a fresh profile I'm digging to see if I can get any more info
Comment 12•8 years ago
|
||
(In reply to Mike Hoye [:mhoye] from comment #7) > I've found a page where I can reproduce this reliably. > https://www.rightscon.org/program/ > > If you mouse over the schedule and scroll, it doesn't work, but if you're > just left or right of the schedule, it starts working again. I can reproduce this on Linux. Split it out into bug 1259135.
Comment 13•8 years ago
|
||
15:09.93 INFO: Narrowed nightly regression window from [2016-03-10, 2016-03-12] (2 days) to [2016-03-11, 2016-03-12] (1 days) (~0 steps left) 15:09.93 INFO: Got as far as we can go bisecting nightlies... 15:09.93 INFO: Last good revision: 102886e9ac63b3fa33a6f1b394aea054abce2dfd (2016-03-11) 15:09.93 INFO: First bad revision: 946ed22cad04431c75ab5093989dfedf1bae5a3e (2016-03-12) 15:09.93 INFO: Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=102886e9ac63b3fa33a6f1b394aea054abce2dfd&tochange=946ed22cad04431c75ab5093989dfedf1bae5a3e Continuing mozregression into inbound now
Comment 14•8 years ago
|
||
Hmm mozregression won't go into inbound 0:01.54 INFO: Getting mozilla-inbound builds between 102886e9ac63b3fa33a6f1b394aea054abce2dfd and 946ed22cad04431c75ab5093989dfedf1bae5a3e 0:07.28 WARNING: Skipping build 28bca8e29596: Unable to find build info using the taskcluster route 'gecko.v2.mozilla-inbound.revision.28bca8e29596cfa1c24b864925cb5cb5e20e35ab.firefox.macosx64-opt' 0:09.33 INFO: Oh noes, no (more) inbound revisions :( From the range above, bug 1242690 looks like a good candidate
Assignee | ||
Comment 15•8 years ago
|
||
Ah, I can repro using your STR if I enable apz.drag.enabled, but not if that pref is disabled (the default). Everybody else who's seen this issue: do you guys have apz.drag.enabled set to true?
Flags: needinfo?(ryanvm)
Flags: needinfo?(mconley)
Assignee | ||
Comment 18•8 years ago
|
||
Felipe, do you? If so that might be the difference between your main profile and the clean profile - and if so this issue you found is a separate issue (although may still have the same underlying root cause).
Assignee | ||
Comment 19•8 years ago
|
||
Actually I can repro this problem even without apz.drag.enabled. However it's racy. What's happening is that we schedule the main-thread timeout on the drag block in the InputQueue, and the timeout callback never gets called. I think the print dialog opening up somehow destroys the MessageLoop on which the callback was scheduled, and so it drops that task on the floor. As a result the drag block never moves out of the queue and just all future input events get stuck behind it.
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Keywords: regressionwindow-wanted
Comment 20•8 years ago
|
||
Are there other ways that race could happen? I don't seem to recall printing lately, certainly not leading up to hitting this bug.
Comment 21•8 years ago
|
||
I was printing just before I hit this bug.
Comment 22•8 years ago
|
||
I don't have apz.drag.enabled either I hit this once right when printing but also a couple of times when printing was not involved. So something else in my profile causes the race condition to always fail
Assignee | ||
Comment 23•8 years ago
|
||
So it seems like it's not the MessageLoop getting destroyed, but some code calling SetNestableTasksAllowed(false) on the MessageLoop. I'm still digging but presumably any sort of modal dialog (not just printing) might trigger this.
Reporter | ||
Comment 24•8 years ago
|
||
That could fit the times I've hit it. I seem to recall it was when I was doing tree management-type things that could involve modal dialogs for entering credentials.
Assignee | ||
Comment 25•8 years ago
|
||
Ok, so I think I've figured out what's going on. Summary: a) APZ code calls MessageLoop::PostDelayedTask() with a task scheduled for 300ms later. b) MessageLoop moves the task from the "incoming" queue to the "work" queue, and then to the "delayed work" queue, and calls ScheduleDelayedWork so that DoDelayedWork eventually runs c) Then MessageLoop::RunTask runs (for some other task, triggered by the print dialog). This sets nestable_tasks_allowed_ to false before Run()'ing the task d) This task goes into DoWorkRunnable::Run [1] a bunch of times. Each time through, this function saves the current value of NestableTasksAllowed(), forces it true, and then restores the old value. The old value is always false (because it was set to false in step c). e) At some point, between invocations of DoWorkRunnable::Run, MessageLoop::DoDelayedWork gets run (recall it was scheduled in step b). When this runs, nestable_tasks_allowed_ is false, and so it bails out after resetting *next_delayed_work_time [2] f) The RunTask call from step (c) unwinds, and sets nestable_tasks_allowed_ back to true g) Any future delayed tasks scheduled on the MessageLoop are pushed onto the back of the delayed work queue [3] and AFAICT DoDelayedWork never runs again, and the stuff in the delayed work queue is never processed. [1] http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessagePump.cpp?rev=6d0c889a1c0e#211 [2] http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc?rev=e8c7dfe727cd#471 [3] http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc?rev=e8c7dfe727cd#455
Assignee | ||
Comment 26•8 years ago
|
||
khuey, is the issue described in comment 25 a known issue? Do you have any suggestions on the best way to fix it (other than "don't use MessageLoop" :))?
Flags: needinfo?(khuey)
Assignee | ||
Comment 27•8 years ago
|
||
After discussion on IRC with khuey, he suggested this as a fix. It seems to work, by wrapping the invocation of MessageLoop::DoDelayedWork in step (e) with code that sets nestable_tasks_allowed_ to true, which allows the delayed work task to run. Flagging bsmedberg for review as well as per khuey's request (bsmedberg: see comment 25 for a summary of the problem).
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(khuey)
Attachment #8734139 -
Flags: review?(khuey)
Attachment #8734139 -
Flags: review?(benjamin)
Comment on attachment 8734139 [details] [diff] [review] Patch Review of attachment 8734139 [details] [diff] [review]: ----------------------------------------------------------------- r=me in the "this looks like a reasonable thing to do" sense, not in the "this will definitely work and will have no unintended side effects" sense. But we can try it.
Attachment #8734139 -
Flags: review?(khuey) → review+
Comment 29•8 years ago
|
||
Comment on attachment 8734139 [details] [diff] [review] Patch I'd really like Billm to think about this, but dvander might be good as well. I totally don't remember what nestable tasks should/shouldn't be for.
Attachment #8734139 -
Flags: review?(benjamin) → review?(dvander)
Comment 30•8 years ago
|
||
(attaching the testcase for posterity)
Updated•8 years ago
|
Comment on attachment 8734139 [details] [diff] [review] Patch Review of attachment 8734139 [details] [diff] [review]: ----------------------------------------------------------------- It looks like the approach here is to always allow nestable tasks for all of Gecko. I don't know how risky that is, or what could go wrong, but our current behavior does seem broken. We never reschedule the timer. The Chromium approach is to opt-in at each re-entrant task point, so "ScopedNestableTaskAllower" appears all over the place. My vote is to try it out and see what happens, and if anything breaks we might have to restrict where nestable tasks are allowed. (Or fix the scheduling bug.) Anyway, I'll r? Bill just to see if he has any thoughts.
Attachment #8734139 -
Flags: review?(wmccloskey)
Attachment #8734139 -
Flags: review?(dvander)
Attachment #8734139 -
Flags: review+
Assignee | ||
Comment 33•8 years ago
|
||
I'm going to land this so we can find out sooner if it's gonna break things. Leaving review flag for billm since he should still look at it.
Comment 38•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f56afd13cb91
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8734139 [details] [diff] [review] Patch Review of attachment 8734139 [details] [diff] [review]: ----------------------------------------------------------------- Trying to figure out how the Chrome developers wanted this to work, I keep coming to the conclusion that it doesn't work. However, the underlying Chrome code doesn't seem to have changed much that I can see. I think the best thing is to remove all the nestable_tasks_allowed_ code. I don't see any benefits to it. We're now not using it at all on our main threads, and I seriously doubt it's useful on non-main threads.
Attachment #8734139 -
Flags: review?(wmccloskey) → review+
Comment 41•8 years ago
|
||
Verified as fixed using latest Dev Edition 48.0a2.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•