Closed Bug 1497090 Opened 6 years ago Closed 4 months ago

While scrolling 60% of the main thread's time is spent blocking on synchronous calls

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: nical, Unassigned)

References

Details

On linux with smooth scrolling enabled and webrender there is quite a bit of input lag (much more than with smooth scrolling disabled).
A lot of main thread time time is spent blocking on the synchronous hit testing request, and on some X server request.

Here is a profile scrolling my bugzilla dashboard: https://perfht.ml/2PhEzsD

webrender hit testing: 43.4% 

> wr_api_hit_test
> mozilla::wr::WebRenderAPI::HitTest
> mozilla::layers::APZCTreeManager::GetAPZCAtPointWR
> mozilla::layers::APZCTreeManager::GetTargetAPZC
> mozilla::layers::APZCTreeManager::ReceiveInputEvent
> mozilla::layers::APZInputBridge::ReceiveInputEvent
> nsBaseWidget::DispatchInputEvent
> nsWindow::OnScrollEvent
> scroll_event_cb


x11_get_visual due to GetScreenCoords: 25.7%

> XReply
> XTranslateCoordinates
> gdk_x11_visual_get_xvisual
> gdk_window_get_origin
> nsWindow::WidgetToScreenOffset()
> mozilla::dom::Event::GetScreenCoords
> mozilla::EventStateManager::PreHandleEvent
> mozilla::PresShell::HandleEventInternal
> mozilla::PresShell::HandleEvent
> nsViewManager::DispatchEvent
> nsViewManager::DispatchEvent
> nsView::HandleEvent
> nsWindow::DispatchEvent
> nsBaseWidget::ProcessUntransformedAPZEvent
> nsBaseWidget::DispatchInputEvent
> nsWindow::OnScrollEvent
> scroll_event_cb

I don't have a Windows device handy to check how much it is affected (probably doesn't suffer from the silly X server roundtrips), but we might want to make it so that webrender hit testing calls don't need to block on the render backend (for example we could keep a copy of the latest hit test tree that is always accessible by any thread without querying the render backend).
Dupe of bug 1480964 (for windows)
From the profile it looks like every time APZ requests a hit-test, WR goes to do a frame build and rebuild the hit-tester, which is why it spends so much time waiting on the RB thread. It didn't use to do this, IIRC, so this is a regression.
Priority: -- → P3
The input lag has been an issue for me for a very long time (I disabled smooth scrolling because of that at least a year ago) I remember it was an issue on my laptop but not on Jeff's macbook, probably something baout the frequency of input events and/or his CPU being a lot faster.

I added some logging and during scrolling we build the frame as often as we request it with Transaction::generate_frame, so hit testing isn't causing extra frame building as far as I can tell (although in that profile we do build a lot of frames since we are scrolling continuously).
Depending on timing, hit-testing may trigger re-building the hit tester (if the hit-test request arrives between a scene build and a vsync) but I expect this to be much cheaper than frame building to not happen very often on that test case.
Ok. The hit-testing stuff was originally written with the assumption that the RB thread would be responsive always. If that assumption is no longer valid then we should revisit the architecture. But if we can't use WR hit-testing then the changes on the APZ side are going to be very complicated, and a big step backwards (IMO in the long run we want to do hit-testing using WR rather than having to send another pile of hit-testing data to the C++ APZ code and do it there).
I think that we should aim for frame building to be faster than what it is now for many reasons and maybe that would be enough for the purposes of hit-testing.
That said, another thing we could maybe do is create a read-only clone of the hit-tester each time we re-build it and make that clone accessible to any thread that want to do hit-testing without strong guarantees about the ordering of events (I am under the impression that this is the case for the main thread's hit testing requests but I might be wrong).
We would pay for an extra allocation/copy but if the hit-tester is sufficiently small it should be cheaper than blocking on the render backend thread.
I think that this shouldn't be too hard to implement (maybe an Arc<Mutex<HitTester>> would be enough).

All of this depends on my assumption that the hit testing request can race ahead of other messages but since those messages come from different sources on different threads I think we should be fine.

If that's not the case, then we can still look into:
 - compressing input callbacks: by that I mean instead of doing the work directly in the handler, we push the events in a queue that can and schedule the work in the main thread's event loop whenever we can. This way If we have 4 scroll events piled up we in the queue when we get to do the work we can simplify them into one. That wouldn't really fix the issue about waiting but would prevent work from accumulating which is what makes the lag so noticeable.
 - reorganize work that relies on hi-testing to be asynchronous so that the main thread's event loop can keep doing work while the hit test request is in flight (that sounds like more work than I'd like).
> Arc<Mutex<HitTester>>

correction: I meant Arc<Mutex<Arc<HitTester>>> for Arc<Atomic<Arc<HitTester>>> if that's a thing.
There are some ordering requirements on hit-testing, but it should be possible to make a clone onto another thread, I think. I'd have to think about it more to confirm. But let's focus on speeding up frame building first, and then we can revisit this issue if it's still a problem.
See Also: → 1580178
Depends on: 1580178
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.