Open Bug 1461336 Opened 2 years ago Updated 6 months ago
Hangs with Wasm Web
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.170 Safari/537.36 Steps to reproduce: Testing movements in a webgl game. Weird behavior with the mouse control Actual results: Since the version 60 of firefox, if an user run a webgl game below 60 fps because he have a low end computer or because the game use a lot of ressources, then he got some saccade/jerk movements on the mouse mouvements (weird behavior with the mouse control). Tested on 2 computer and some other game also tell their players to move on chrome because of this problem. Test this demo on Firefox : https://www.asr-games.net/testing/FirefoxPerformance/ Follow the instruction and see the weird behavior when you move foward and move the mouse in same time. Work perfectly on chrome. Unity forum link if needed: https://forum.unity.com/threads/firefox-performances.530825/ Expected results: We should not get the saccade/jerk movements on the mouse mouvements
You also reproduce this problem on the After the Flood demo here :https://playcanv.as/e/p/44MRmJRU/ Put the game on ultra in order to get 20/25 fps press foward/left key and rotate your mouse.
So this issue didn't happen on FF59? If so, trying to find regression range would be great. https://mozilla.github.io/mozregression/
tentatively adding regression keyword. Regression-range would be super useful.
Another useful thing would be to create a performance profile using Nightly (http://nightly.mozilla.org/) build of Firefox and https://perf-html.io/ addon.
(In reply to Olli Pettay [:smaug] from comment #2) > So this issue didn't happen on FF59? > If so, trying to find regression range would be great. > https://mozilla.github.io/mozregression/ I don't know if this issue was present before the version 60, i have see this problem after a performance test on a low computer just after the release of Firefox 60.
Are you running any addons? Try restarting in safe mode and see if it still happens. We have a known issue with addons that can cause small hangs like this in bug 1449033, and was introduced in Firefox 60.
Hmm, never mind. I see the same thing on a recent Nightly without any addons installed, so it isn't that.
Sorry, but do the "wontfix" term mean that you will not fix this issue for the version 60? All games are affected and all our players have to wait until the next release planned the 16 june? We do not have other choice to move our players on chrome at the moment.
I tried this a few days ago on Nightly and I was seeing hangs like you described, but I tried it again just now (this time with the profiler running) and I didn't see any issues. I also tried it just now in Firefox Beta and I couldn't reproduce it. Can you reproduce this issue on an updated version of Firefox Beta or Nightly? (I suppose it is possible this is a duplicate of bug bug 1449033.) If you can, please use the profiler addon at https://perf-html.io/ and capture a profile that includes one of the hangs, and upload it and link it here. Thanks.
The bug is still present on the beta 61.0b5 but seem to be fixed on Nightly 62.0a1 (2018-05-17). I don't know how to use your https://perf-html.io/ Be sure to press foward and left or right in same time when you make your test. You also need to be below 60 FPS. https://playcanv.as/e/p/44MRmJRU/ on Ultra should give you less than 60 fps or my demo if you follow my steps on my first post. Thank you.
Flags: needinfo?(r2roka) → needinfo?(continuation)
I don't see any problems in beta, even with the stress mode on. There are instructions for using the profiler to report a problem here: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
I don't know if i have doing it correctly but : Firefox Normal 60.0.1 (bug reproduced) : https://perfht.ml/2IBf1ne Firefox Beta 61.0b6 (bug reproduced) : https://perfht.ml/2KDgk5o Firefox Nightly 62.0a1 (2018-05-18) (not reproduced, seem to be fixed???) : https://perfht.ml/2LfuCdK Thank you
Thanks for the profiles. If I'm reading this correctly, in the two bad profiles, Firefox is spending a little over 40% of the time inside dom::consoleBinding::log(). In the "not reproduced" profile, it is spending about 20% of the time in this same method. RokaWar, what exactly is this demo logging? Do you still see the performance issues if you disable the logging? It looks like WASM is calling something called _JS_Log_ Dump. I saw a few dozen messages in the browser console, but not enough to justify spending so much time logging. Baku, maybe you could take a look into why logging is taking so much time? I don't know if this is unreasonable or not. It is also interesting that we're apparently twice as fast in Nightly. I don't know if that is expected or not. I'll move this to DOM for now, because that is what is showing up in the profile.
Component: Untriaged → DOM
Product: Firefox → Core
Summary: Jerky mouse movements in WebGL game → Hangs with WASM WebGL game
Summary: Hangs with WASM WebGL game → Hangs with Wasm WebGL game
The only major change to Console.cpp that landed in the last month is bug 1458466, which implemented a new method on Console.
Well, like my demo do some log (i have forgot to remove it for the release) and like i don't want you to focus on it because the problem is not related to the log, i have made a new test on the After the flood demo : https://playcanv.as/e/p/44MRmJRU/ The result is the same. Firefox normal 60.0.1 (bug reproduced): https://perfht.ml/2J21BDW Firefox Beta 61.0b7 (bug reproduced): https://perfht.ml/2IVMBYr Firefox Nightly 62.0a1 (bug not reproduced) : https://perfht.ml/2J2CCR1 Sorry about the log trouble from my demo, i will update it tomorrow without the log. Thank you.
Flags: needinfo?(r2roka) → needinfo?(continuation)
Alright. I guess Andrea can file a new bug for the console stuff if he thinks there's a reason for it. Well, I looked at your new profiles and I'm not sure what could be going wrong there. Mike, can you take a look and see if there's anything that jumps out at you in the profiles in comment 15? Thanks.
Flags: needinfo?(continuation) → needinfo?(mconley)
Nothing jumps out at me - we're spending most of our time either rasterizing, or running JS and doing GL commands. Perhaps Bas sees something?
Flags: needinfo?(mconley) → needinfo?(bas)
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos / reviews) from comment #17) > Nothing jumps out at me - we're spending most of our time either > rasterizing, or running JS and doing GL commands. Perhaps Bas sees something? Isn't it odd that in places like this: https://perf-html.io/public/b7050df4af0bb9e4e093d1eae1886b371df22b5c/calltree/?range=12.8214_13.2717&thread=6&threadOrder=0-2-3-4-6-1-5&v=3 we're drawing many frames during 1 event processing delay? It's almost as if we're queueing too many frames ahead. Matt changed something in the vsync driver, I suspect he may have fixed this by accident, could I be right, Matt?
Flags: needinfo?(bas) → needinfo?(matt.woodrow)
I'll just cancel the needinfo on Andrea under the assumption that the console stuff was a red herring.
(In reply to Bas Schouten (:bas.schouten) from comment #18) > (In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos / > reviews) from comment #17) > > Nothing jumps out at me - we're spending most of our time either > > rasterizing, or running JS and doing GL commands. Perhaps Bas sees something? > > Isn't it odd that in places like this: > https://perf-html.io/public/b7050df4af0bb9e4e093d1eae1886b371df22b5c/ > calltree/?range=12.8214_13.2717&thread=6&threadOrder=0-2-3-4-6-1-5&v=3 we're > drawing many frames during 1 event processing delay? It's almost as if we're > queueing too many frames ahead. > > Matt changed something in the vsync driver, I suspect he may have fixed this > by accident, could I be right, Matt? Yes, I think this is exactly right. Bug 1371668 changed the vsync timings, and landed into Nightly on 2018-05-16. Does the bug reproduce on a Nightly older than that? It looks like we're in the situation described at , where we're always painting in response to the DidComposite message. It looks like painting from vsync has manual rescheduling to prevent the IPDL message from being higher priority and blocking all other event processing . The DidComposite message doesn't have this, and each time we finish painting, the next one has been delivered, so we never get back to the normal queue.  https://bugzilla.mozilla.org/show_bug.cgi?id=1371668#c5  https://searchfox.org/mozilla-central/source/layout/base/nsRefreshDriver.cpp#561
Well, i have modified my testing method. You can test it here https://www.asr-games.net/testing/FirefoxPerformance/ Be sure to follow the instruction and wait after the fps drop in order to see the weird behavior with the mouse control. Tested on 4 computers and right now, it's impossible for anyone to play an first person shooter game using firefox with a slow computer (fps < 30). Profiler result : firefox 60.0.2 : https://perfht.ml/2JUkvKp <==== Completely weird mouse behavior. firefox 61.0b14 : https://perfht.ml/2ygJLcr <==== Completely weird mouse behavior. firefox 62.0a1 : https://perfht.ml/2yjVbvZ <==== Better result on Nightly but not smooth like in google chrome. If someone can take a look. Thank you.
You'll need somebody on the graphics team to take a look. I don't know anything about graphics. Maybe Bas has some time to do that.
Flags: needinfo?(continuation) → needinfo?(bas)
(In reply to Andrew McCreight [:mccr8] from comment #23) > You'll need somebody on the graphics team to take a look. I don't know > anything about graphics. Maybe Bas has some time to do that. The graphics part of this bug is fixed on nightly. In the profile on nightly it seems the vast majority of time is spent in scripts (30-60ms per frame), mainly inside WASM stuff.
Flags: needinfo?(bas) → needinfo?(continuation)
Ah, right. I'll move this over to the WebAssembly component and maybe somebody familiar with that can take a look at the profile.
FWIW, I was able to reproduce the unplayability described in comment 22 on ff<62 and the big improvement on ff62, which seems roughly the same as Chrome at the same framerate. Given that the point of the demo is to saturate the CPU with physics work, I don't even think there is a problem left to fix here and so this would be resolved duplicate of whatever fixed the mouse between 61 and 62.
I'm still able to reproduce this problem by playing this game https://warbrokers.io (in 16 players room in order to get performance loss). Windows machine. Nightly 63.0a1 ==> https://perfht.ml/2IvkL0T Bad to see that it's impossible to play an fps game by using firefox because of this problem. The loading time is so fast in firefox but the game play and fluidity is so much better in chrome. Something is wrong somewhere. All fps game are unplayable on all firefox version. It will be cool if someone can focus on it. We have lose the unity web player and the webgl is the only way for us to release our game. Thank you.
It looks like most of the time is being spent in wasm, so you should ask Luke if anything more can be done.
Flags: needinfo?(continuation) → needinfo?(luke)
To wit, clicking the perf report in comment 27, and clicking invert callstack, I see 17% of self time in ZwDelayExecution under GL/driver calls. That being said, it seems like CPU usage isn't the issue here so much as janky mouse behavior when there is high CPU usage. That's what I observed in comment 26 (where for the *same FPS*, FF mouselook behaved way worse). It felt like it was fixed to me, but maybe not completely...
Luke, you are right,the ZwDelayExecution consumption is not normal and something is wrong with it but we see it all the time on the "content" tab, not just only during the jerk mouse behavior. If you look at this profiler result : https://perfht.ml/2IPfMYN i have moved and reproduced the crazy mouse behavior bettween 9s and 13s. If you look on the main thread , you can see some spike during the mouse movements and it seem that these spike come from the mouse input event if i'm not wrong. If it helps to solve this big problem for us. Thank you
Sorry to bother you Andrew but it seem that Luke is not present at the moment. Can you ping someone else? Too much of our players are complaining and we have no other choice than redirect them on chrome. Thank you
In that profile, between 9 and 13 seconds, it looks like the main thread is spending most of its time inside some Windows function, converting unicode to ascii, which is waiting for a lock I think? Mike, could you take a look at this profile? I have no idea what is actually going wrong here by looking at the profile, if it is even possible to tell.
(In reply to Andrew McCreight [:mccr8] from comment #32) > In that profile, between 9 and 13 seconds, it looks like the main thread is > spending most of its time inside some Windows function, converting unicode > to ascii, which is waiting for a lock I think? > It's a little opaque, but what you're identifying is the idle state of the main thread in the parent process. So the good news is that, for 75% of the time from the 9s-13s window, the main thread in the parent process is waiting on the OS to give us events. While I'm certainly seeing activity in the parent process related to the mouse moving, none of it appears to be cause for concern - the parent does spend some time waiting for sync IPC messages (for APZ) to come back for, presumably, hit testing. But it doesn't wait long (the compositor thread seems pretty free). The real concern, to me, is in the main thread of the content process running the game. The game is clearly not running at 60fps. The ZwDelayExecution thing appears to be something that gets called when we call into D3D11. That takes about 20% of the time. The rest is a long tail of WASM stuff. Here's the call stack for the ZwDelayExecution thing: (root) RtlUserThreadStart BaseThreadInitThunk static int __scrt_common_main_seh() wmain content_process_main(mozilla::Bootstrap *,int,char * * const) XRE_InitChildProcess XRE_InitChildProcess(int,char * * const,XREChildData const *) MessageLoop::Run() MessageLoop::RunHandler() mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate *) XRE_RunAppShell() nsAppShell::Run() nsBaseAppShell::Run() MessageLoop::Run() MessageLoop::RunHandler() mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate *) mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) nsThread::ProcessNextEvent(bool,bool *) mozilla::ipc::MessageChannel::MessageTask::Run() mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask &) mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const &) mozilla::ipc::PBackgroundChild::OnMessageReceived(IPC::Message const &) PVsync::Msg_Notify mozilla::layout::PVsyncChild::OnMessageReceived(IPC::Message const &) mozilla::layout::VsyncChild::RecvNotify(mozilla::TimeStamp const &) mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync(mozilla::TimeStamp) mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp) mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp) mozilla::RefreshDriverTimer::Tick(__int64,mozilla::TimeStamp) mozilla::RefreshDriverTimer::TickRefreshDrivers(__int64,mozilla::TimeStamp,nsTArray<RefPtr<nsRefreshDriver> > &) mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver *,__int64,mozilla::TimeStamp) nsRefreshDriver::Tick nsRefreshDriver::Tick(__int64,mozilla::TimeStamp) nsViewManager::ProcessPendingUpdates() nsViewManager::ProcessPendingUpdatesForView(nsView *,bool) nsViewManager::ProcessPendingUpdatesPaint(nsIWidget *) PresShell::Paint http://warbrokers.io/game3d.php?nick=Guest84150&firefox=true mozilla::PresShell::Paint(nsView *,nsRegion const &,unsigned int) mozilla::layers::ClientLayerManager::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) ClientLayerManager::EndTransactionInternal mozilla::layers::ClientLayerManager::EndTransactionInternal(void (*)(mozilla::layers::PaintedLayer *,gfxContext *,mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const &,mozilla::layers::DrawRegionClip,mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const &,void *),void *,mozilla::layers::LayerManager::EndTransactionFlags) ClientCanvasLayer::RenderLayer mozilla::layers::ClientContainerLayer::RenderLayer() mozilla::layers::ClientCanvasLayer::RenderLayer() mozilla::layers::ShareableCanvasRenderer::UpdateCompositableClient() mozilla::WebGLContextUserData::PreTransactionCallback(void *) mozilla::WebGLContext::PresentScreenBuffer() mozilla::gl::GLScreenBuffer::Swap(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const &) mozilla::gl::GLContext::fFinish() gl::Finish() rx::Context11::finish(gl::Context const *) rx::Renderer11::finish() CContext::ID3D11DeviceContext1_GetData_<1>(ID3D11DeviceContext1 *,ID3D11Asynchronous *,void *,unsigned int,unsigned int) CAsynchronous<ID3D11Asynchronous>::GetData<1,0>(CContext *,CAsynchronous<ID3D11Asynchronous> *,void *,unsigned int,unsigned int) 0x7feeb449f79 0x7feeb449f2b 0x7feeb4379a9 ZwDelayExecution So I think I'm going to kick this over to Graphics again to see if that ZwDelayExecution is expected (it's ultimately costing us the majority of our rasterization time). Hey jrmuizel - I've got a juicy maybe-graphics-related profile here. See anything actionable?: https://perfht.ml/2KY08Q9
Flags: needinfo?(mconley) → needinfo?(jmuizelaar)
It looks to me like the ZwDelayExecution is just us waiting for the GPU to finish. This wait is happening on the same thread as the WASM execution and so our frame time is wasm time + gpu time. Chrome runs GL asynchronously and is likely doing a better job of overlapping the execution of the gpu time and the wasm time. jgilbert can confirm that we expect to be waiting on the GPU in the glFinish call and perhaps suggest a solution. In the long term sandboxing webgl will force us to be more asynchronous and so might fix this.
Flags: needinfo?(jmuizelaar) → needinfo?(jgilbert)
We need to glFinish there if we didn't get a KeyedMutex: https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/gfx/gl/SharedSurfaceANGLE.cpp#148 We really should be getting a KeyedMutex, but maybe that's not happening for some reason? We need better tests for this.
Ah indeed. That seems worth investigating. Are you able to reproduce the problem locally?
Assignee: nobody → jgilbert
Yes, but not in this bug. This bug seems to have been fixed as of Nightly 62, and I'm seeing that issue in Nightly 63.
Do you have any infos Jeff? I'm sorry, but this bug is a big problem for us and our games. Thank you.
(In reply to RokaWar from comment #38) > Do you have any infos Jeff? > I'm sorry, but this bug is a big problem for us and our games. Can you try again? Bug 1475010 has landed on Nightly a few days ago and it sounds like it might help this issue. Thanks.
I have tested it and it's more smooth(we do not have the big hangs anymore) but something is still wrong somewhere, the mouse hangs happen again but not hardly like before this bug fixe. In chrome, the mouse movements is smooth even at a low frame rate without jump or hangs. My profiler result (63.0a1 (2018-07-23) (64 bits)) : https://perfht.ml/2JM7SjC Thank you.
Flags: needinfo?(r2roka) → needinfo?(continuation)
I do see some 54ms input delays here, but I can't tell what is going wrong, if anything.
Hello, Can you please assign someone who can fixe this problem definetely ? It's impossible to play an fps game with firefox if we have a frame rate below 60 fps. Everyday, i receive message from my players who have an issue with their mouse with firefox. I tell them to move on chrome because it's smooth on it. Every firefox update, i'm hoping that this issue will be solved but it's everytime the same. Your compile time and load time is faster, but something is wrong when moving below 60 FPS. Tested today again on 64.0a1 : First game (https://warbrokers.io/signin.php) : https://perfht.ml/2D1N2Pe Between 0 ==> 25 secondes , i move my mouse (weird mouse jump happen) Between 25 ==> 31 secondes , i do not move my mouse Between 31 ==> 50 secondes , i move my mouse (weird mouse jump happen) Second game (https://warmerise.com/) : https://perfht.ml/2D2NFIj Between 0 ==> 21 secondes , i move my mouse (weird mouse jump happen) Between 21 ==> 24 secondes , i do not move my mouse Between 25 ==> 40 secondes , i move my mouse (weird mouse jump happen) Thank you
Olli, can you take a look at the profiles in comment 42?
Flags: needinfo?(continuation) → needinfo?(bugs)
I tried with and without input event queue (input_event_queue.supported) and couldn't see difference on a slow Windows machine. Locally on linux I see long GLContent::raw_fReadPixels calls, but then, this linux machine is too fast. On the slow Windows machine when looking at where the time is spent between refreshdriver ticks, it is in wasm and usually the time under some some WebGLContext::* call. And those WebGL calls seem to be locking.
tryserver is closed atm, but I'd like to see how this changes the behavior on a slower machine. I'd expect possibly worse behavior. Need to wait for try to reopen.
remote: View the pushlog for these changes here: remote: https://hg.mozilla.org/try/pushloghtml?changeset=5019f295e02f7b247d76e7514cb76cd506726211 remote: remote: Follow the progress of your build on Treeherder: remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5019f295e02f7b247d76e7514cb76cd506726211 remote: recorded changegroup in replication log in 0.017s
(In reply to Olli Pettay [:smaug] from comment #44) My Linux machine also spends >50% in Graphics of the time doing raw_fReadPixels(), which takes me to ~30fps which is where the problems appear. The problem seems to not be the poor frame rate, but rather that the input is erratic, not just the normal lag you'd expect with 30fps. It's like somehow we're mis-reporting the frame time or mouse position or something like that.
I wonder if Chrome actually doesn't report all the mousemoves. That has been the case earlier in some cases.
Are you able to instrument the game to get a log of mouse move events to determine if Chrome is doing different mousemove event delivery as Olli is suggesting?
Sorry, but i will not spend more time on this problem. I have reported it long time ago and we are at the same point. Now i redirect everybody on google chrome (like a lot of webmaster are doing now). It's more easy for me and my players get better game experience on a web browser. Thank you
Flags: needinfo?(r2roka) → needinfo?(luke)
Bummer. Olli is there an easy way to not report all mousemoves as you expect Chrome is doing and see if that fixes the mouse weirdness at low FPS?
Coincidentally, I'm seeing the same erratic mouse movement on an Epic engine build (that again spends all its time in raw_fReadPixels() on Linux with layers acceleration disabled), so this really seems like some mouse event issue.
QA Contact: ajones
(In reply to Luke Wagner [:luke] from comment #51) > Bummer. Olli is there an easy way to not report all mousemoves as you > expect Chrome is doing and see if that fixes the mouse weirdness at low FPS? I have no idea what that means. We coalesce mousemoves already. I have the gut feeling that Chrome is not reporting all the values, so we end up kind of causing more movement than Chrome. But haven't had time to measure.
Too late for a fix in 64. We could still take a patch for 66 or possibly 65.
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.