Open Bug 1897427 Opened 1 year ago Updated 18 days ago

fales.itch.io - The game is stuck in a loading state

Categories

(Web Compatibility :: Site Reports, defect, P3)

Tracking

(Webcompat Priority:P2, Performance Impact:medium, Webcompat Score:5)

Webcompat Priority P2
Performance Impact medium
Webcompat Score 5

People

(Reporter: denschub, Unassigned)

References

()

Details

(4 keywords)

User Story

platform:windows,mac,linux,android
impact:site-broken
configuration:general
affects:all

Attachments

(2 files)

Environment:
Operating system: Linux
Firefox version: Firefox 118.0

Steps to reproduce:
Press "Run Game", Chrom* loads correctly, Firefox stalls forever during load.
The demo is a Godot 4.1 project exported with GDExtension support (C++ to WASM via emscripten with threads and dynamic linking support).
Tested with emscripten 3.1.18

Actual Behavior:
Page not loading correctly

Notes:

  • Reproduces regardless of the status of ETP
  • Reproduces in Firefox Nightly, Firefox Release
  • Does not reproduce in Chrome

Created from https://github.com/webcompat/web-bugs/issues/127798

While loading that game, the Content Process ends up in 100% jank. Profile: https://share.firefox.dev/3KaJBVO

Something is going horribly wrong here, and I'm not sure what exactly. Loading that game completely freezes the content process. The profile reveals a very red and very grumpy jank bar, and it also claims that... there's 211 GB of memory allocations?!

The fast majority of runtime is spent in

SvgLoader::paint() [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7073782]
tvg::Picture::Impl::load() [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7053093]
tvg::Picture::Impl::update(tvg::RenderMethod&, tvg::RenderTransform const*, tvg::Array<void*>&, unsigned char, tvg::RenderUpdateFlag, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7053863]
tvg::PaintMethod<tvg::Picture::Impl>::update(tvg::RenderMethod&, tvg::RenderTransform const*, tvg::Array<void*>&, unsigned char, tvg::RenderUpdateFlag, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7053838]
tvg::Paint::Impl::update(tvg::RenderMethod&, tvg::RenderTransform const*, tvg::Array<void*>&, unsigned char, tvg::RenderUpdateFlag, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7039779]
tvg::Canvas::Impl::update(tvg::Paint*, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7100341]
tvg::Canvas::Impl::push(std::__2::unique_ptr<tvg::Paint, std::__2::default_delete<tvg::Paint>>) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7099987]
tvg::Canvas::push(std::__2::unique_ptr<tvg::Paint, std::__2::default_delete<tvg::Paint>>) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7099907]
ImageLoaderSVG::create_image_from_utf8_buffer(Ref<Image>, unsigned char const*, int, float, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7105573]
ImageLoaderSVG::create_image_from_utf8_buffer(Ref<Image>, Vector<unsigned char> const&, float, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7107366]
ImageLoaderSVG::create_image_from_string(Ref<Image>, String, float, bool, HashMap<Color, Color, HashMapHasherDefault, HashMapComparatorDefault<Color>, DefaultTypedAllocator<HashMapElement<Color, Color>>> const&) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:7107459]
generate_icon(int) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:20280318]
fill_default_theme(Ref<Theme>&, Ref<Font> const&, Ref<Font> const&, Ref<Font> const&, Ref<Font> const&, Ref<Texture2D>&, Ref<StyleBox>&, float) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:20169843]
make_default_theme(float, Ref<Font>, TextServer::SubpixelPositioning, TextServer::Hinting, TextServer::FontAntialiasing, bool, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:20282284]
ThemeDB::initialize_theme() [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:3155073]
Main::setup2() [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:3105620]
Main::setup(char const*, int, char**, bool) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:2998039]
godot_web_main(int, char**) [https://html.itch.zone/html/8801312/index.js line 4652 > WebAssembly.instantiate:2997438]
...

But most of it seems to be emscripten_futex_wait, and I don't see what it's waiting for. I'm throwing this over to the performance team, maybe y'all can have a look at what's going wrong here.

I'll note that according to this comment, this does/can affect Godot-builds for Firefox, so we should take this seriously, even if the specific report's URL isn't highly ranked.

Component: Site Reports → Performance
Product: Web Compatibility → Core
Webcompat Priority: --- → ?

This bug was moved into the Performance component.

:denschub, could you make sure the following information is on this bug?

  • ✅ For slowness or high CPU usage, capture a profile with http://profiler.firefox.com/, upload it and share the link here.
  • For memory usage issues, capture a memory dump from about:memory and attach it to this bug.
  • Troubleshooting information: Go to about:support, click "Copy raw data to clipboard", paste it into a file, save it, and attach the file here.

If the requested information is already in the bug, please confirm it is recent.

Thank you.

Flags: needinfo?(dschubert)
Flags: needinfo?(dschubert)
Attached file memory-report.json.gz
See Also: → 1898309

There is a very long Ion Compilation going on, see https://share.firefox.dev/44VkjEL. Originally I opened another bug for this, but I think it's likely related to this hang so I'll close it and keep the issue here.

Component: Performance → JavaScript: WebAssembly

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on browser: Causes noticeable jank
Impact on site: Renders site effectively unusable
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally

Performance Impact: --- → medium
See Also: 1898309

(In reply to Denis Palmeiro [:denispal] from comment #5)

There is a very long Ion Compilation going on, see https://share.firefox.dev/44VkjEL.

In that profile, "TaskController #0", there does seem to be a lot of time spent
in Ion's register allocator, and in particular in ::tryMergeReuseRegister, which
we might have had problems with before now (I don't remember exactly).

Can you get hold of the .wasm file involved? That would make it much
easier to investigate.

There are two .wasm files,

I don't know which one is relevant. Maybe Rémi can help?

Flags: needinfo?(rverschelde)

index.wasm is compiled without difficulty by both wasm-baseline and Ion.
index.side.wasm is ok on wasm-baseline, but causes Ion to take a very long
time. It has a huge function which produces 947115 MIR nodes, all in a single
block, and the register allocator falls into what I suspect is a O(N^2)-time
hole. It's one we've seen before; the patch below "fixes" it.

Unfortunately I suspect there is a second, unrelated perf problem somewhere.
Using release (126.0), with javascript.options.wasm_optimizingjit = false, I
see, first, a burst of network activity as index.side.wasm (61MB) is
retrieved, a short burst of computation -- I assume baseline compiling it --
but then the 8-segment spinner appears for several minutes. So something
else, not Ion, is also a problem.

diff --git a/js/src/jit/BacktrackingAllocator.cpp b/js/src/jit/BacktrackingAllocator.cpp
--- a/js/src/jit/BacktrackingAllocator.cpp
+++ b/js/src/jit/BacktrackingAllocator.cpp
@@ -1982,17 +1982,17 @@ bool BacktrackingAllocator::tryMergeReus
     // can thus be in the same group.
     return tryMergeBundles(def.firstBundle(), input.firstBundle());
   }
 
   // Avoid merging in very large live ranges as merging has non-linear
   // complexity.  The cutoff value is hard to gauge.  1M was chosen because it
   // is "large" and yet usefully caps compile time on AutoCad-for-the-web to
   // something reasonable on a 2017-era desktop system.
-  const uint32_t RANGE_SIZE_CUTOFF = 1000000;
+  const uint32_t RANGE_SIZE_CUTOFF = 1000000 / 2;
   if (inputRange->to() - inputRange->from() > RANGE_SIZE_CUTOFF) {
     def.setMustCopyInput();
     return true;
   }
 
   // The input is live afterwards, either in future instructions or in a
   // safepoint for the reusing instruction. This is impossible to satisfy
   // without copying the input.

Some details about the the demo:

  • As Dennis said, this is a build of Godot + Threads + Emscripten dynamic linking. Any other variant (no-threads, no-extension, etc) works.
  • This build type is composed as following:
    • The index.wasm , is the main module, it contains no code beside the emscripten runtime, some bindings for our JS support functions, and a main that calls an extern.
    • The index.side.wasm is the side module, it contains all the actual Godot code, and the godot_web_main extern that will be called by the main module.
    • A variable amount of .wasm libraries, that Godot will load with the emscripten dlopen emulation (in this demo: "libgdexample.javascript.template_debug.wasm32.wasm").
    • All dynamic libraries (including the index.side.wasm) are preloaded before calling the Godot main, using emscripten's Module["loadDynamicLibrary"] (called by the main engine JS file)
    • All thread builds have THREAD_POOL_SIZE=8, i.e. emscripten immediately starts 8 threads (so the engine can start threads synchronously). I think this is the burst Julian see (the 8 workers starting and instantiating the wasm).

I've also recently tried a minimal build disabling all Godot modules, but the issue was still present. I've also tried to recreate the same setup in a minimal C++ project, but couldn't reproduce it there :(.
Let me know if you want me to provide a custom godot build with most features disabled (might be easier to debug?). I can also provide it with debug symbols if you think that'd be useful (though that would be something like 400MiB).
In general, I seem unable to debug FF when this issue happens... If I start the devtools before the page loads, the page get stuck on a different error. If I try to open the devtools after the issue is happening, the devtools are completely unresponsive and the only option is closing the page.

(In reply to Fabio Alessandrelli from comment #10)

Thanks for the details. At the moment the important thing (IMO) is to not
change fales.itch.io/gdextension-demo in any way, since if you do, we will
lose our test case. Or if you change it, please make a copy available and let
us know where it is.

I will try to profile the browser running fales.itch.io/gdextension-demo by
the end of this week.

You're welcome, and no problem, I won't be touching the hosted demo.

By putting printfs in Ion's register allocator, I can see that index.side.wasm
(or at least the huge function in it) is compiled multiple times. Possibly 8
times. Maybe related to THREAD_POOL_SIZE=8 in comment 10 ? But this is
compilation, not instantiation.

I also tried to use the Firefox profiler to observe the compilation activity,
but didn't see any compilation activity there. I don't understand why not
(but I know little about how to use the profiler).

Using both Ion-only and baseline-only compilation, the app eventually winds up
getting stuck at the "spinner" display and never seems to progress beyond
that. With baseline-only, it reaches that point within a few seconds. With
Ion-only compilation and the change in comment 9, it takes a bit longer to get
there, but arrives at the same point.

The long delay in Ion compilation seems to me to be a secondary problem (and
we should land the comment 9 change IMO). The core problem is that the app
hangs after compilation and possibly instantiation.

More than that I do not know (nor understand).

Just reduce scope of this bug: there might be a performance problem in the Ion compiler, but that is not a blocker for this bug. It will not resolve the problem. The issues is still observable with javascript.options.wasm_optimizingjit=false. That points to problem with engine or difference in browser APIs behavior.

It is consistently hanging on a futex, seemingly deadlocked. This occurs specifically within the following:

Godot: make_default_theme
Godot: fill_default_theme
Godot: generate_icon
ThorVG: ImageLoaderSVG::create_image_from_string
ThorVG: SvgLoader::paint()
ThorVG: tvg::Task::done() (hangs on the condition variable wait)

My profile, with wasm baseline only: https://share.firefox.dev/3UNcIne

This happens consistently in Firefox regardless of which wasm compiler we use. Running in Firefox with a Chrome user agent changes the behavior of the page slightly (the game runs in the frame instead of a popup) but does not affect the hang at all.

It seems most likely to me that the problem is somewhere in the intersection of Godot, ThorVG, and Emscripten, with Godot and ThorVG the more likely culprits. On this exact wait, Emscripten helpfully logs a console error saying "Blocking on the main thread is very dangerous, see https://emscripten.org/docs/porting/pthreads.html#blocking-on-the-main-browser-thread". The referenced page helpfully describes the exact situation that seems to be occurring here, and suggests some remedies.

Overall my hunch is that Godot is misusing ThorVG, which itself uses a complicated (sorry, meticulously crafted) multithreaded task scheduler. Godot should probably ensure that application startup and theme generation do not block on the main thread, else I think these deadlocks are likely to happen for random reasons.

Webcompat Priority: ? → ---

Should we move this back to the WebCompat component and treat it as a site bug, or is there something we could do on our end to reduce the chance of this happening?

I wanted to add that as mentioned above I did a few extra tests, compiling a minimal Godot build disabling all third party libraries and modules (including ThorVG) and the issue still persists, so ThorVG seems more like a red herring than the culprit.

We had "similar" issues in the past when emscripten had broken thread_local support, my hunch is that there's some memory corruption happening and an atomic condition gets messed up.

As I said, I have no clue how to further debug this, since I can't even seem to be able to open the debug tools in Firefox when the deadlock is met.

Fabio, please link or attach said testcase (or more specifically the wasm file).

Flags: needinfo?(fabio.alessandrelli)

Verified this issue and still reproduces on Firefox Nightly

Attached are the requested files

Tested with:

Browser / Version: Firefox Nightly 128.0a1 (2024-05-27) (64-bit)
Operating System: Windows 10 PRO x64

Dennis, Ben, I've created a minimal godot build (no modules, no 3D), with a minimal example that asks you if you want to load the GDExtension or not. If you don't load it, it will run fine on Firefox, if you do, it will stall (both works on chrom* browsers).

This finally seems to produce a meaningful stack trace when pressing "stop script", pointing to a stalling dlopen (maybe an issue caused by different emscripten JS paths for different browsers?). Also, still no luck running the devtools debugger on my side which completely freezes :( .

Note that we do preload the library via emscripten's loadDynamicLibrary and (asynchronously) wait for them to be preloaded before attempting to start the engine (and then call dlopen with RTLD_NOW).

I've uploaded it here: https://fales.itch.io/gdextensionthreads-ff-bug-symbols
I had to strip the dwarf symbols from the side or itch would not let me publish it.
I've added the zip with full dwarf symbols as a download (so you can run it from localhost).
I'll see if I can have the version with full debug symbols also hosted on the Godot website.

Here's the stack trace:

_emscripten_get_now@http://127.0.0.1:8060/index.js:8071:29
futex_wait_main_browser_thread@:wasm-function[839]:0xa2949
emscripten_futex_wait@:wasm-function[838]:0xa283d
__timedwait_cp@:wasm-function[466]:0x94c13
__pthread_cond_timedwait@:wasm-function[1699]:0xbf119
pthread_cond_wait@:wasm-function[1711]:0xbf4ed
emscripten_proxy_sync_with_ctx@:wasm-function[1617]:0xbdf98
emscripten_proxy_sync@:wasm-function[1624]:0xbe242
_emscripten_proxy_dlsync@:wasm-function[754]:0xa18fb
Godot/createExportWrapper/Godot@http://127.0.0.1:8060/index.js:604:20
__emscripten_dlsync_threads@http://127.0.0.1:8060/index.js:7369:29
dlsync@:wasm-function[759]:0xa1a5b
load_library_done@:wasm-function[762]:0xa1c74
_dlopen@:wasm-function[757]:0xa1a07
dlopen@:wasm-function[756]:0xa1946
OS_Web::open_dynamic_library(String, void*&, bool, String*)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[1149]:0x20cceb
GDExtension::open_library(String const&, String const&)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[87848]:0x19c6ae9
GDExtensionResourceLoader::load_gdextension_resource(String const&, Ref<GDExtension>&)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[87870]:0x19c91dc
GDExtensionResourceLoader::load(String const&, String const&, Error*, bool, float*, ResourceFormatLoader::CacheMode)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[87876]:0x19c9851
ResourceLoader::_load(String const&, String const&, String const&, ResourceFormatLoader::CacheMode, Error*, bool, float*)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[76169]:0x175c82c
ResourceLoader::_thread_load_function(void*)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[76170]:0x175ccd5
ResourceLoader::_load_start(String const&, String const&, ResourceLoader::LoadThreadMode, ResourceFormatLoader::CacheMode)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[76173]:0x175dba3
ResourceLoader::load(String const&, String const&, ResourceFormatLoader::CacheMode, Error*)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[76179]:0x175ee3f
GDExtensionManager::load_extension(String const&)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[88218]:0x19e7ee1
GDExtensionManager::load_extensions()@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[88233]:0x19e8c7e
register_core_extensions()@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[70844]:0x1597799
Main::setup(char const*, int, char**, bool)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[1205]:0x2113c8
godot_web_main(int, char**)@http://127.0.0.1:8060/index.js line 4686 > WebAssembly.instantiate:wasm-function[1113]:0x20c0a4
__Z14godot_web_mainiPPc@http://127.0.0.1:8060/index.js:4951:47
main@:wasm-function[379]:0x8feb8
callMain@http://127.0.0.1:8060/index.js:52984:26
SafeEngine/start/Engine</</<@http://127.0.0.1:8060/index.js:53848:17
SafeEngine/start/Engine</<@http://127.0.0.1:8060/index.js:53843:14
Flags: needinfo?(fabio.alessandrelli)

It looks like we're still deadlocking on a emscripten_futex_wait, and I still see a "Blocking on the main thread is very dangerous" error, so what Ben wrote in comment 15 seems to still be true, just caused by something else.

Emscripten felt so strong about this that they're logging this as an error, not even as a warning, and it looks like you're running into exactly what they describe on their page. I'm of course no expert here and I hope that Ben can take another look to figure out if there is anything we can do inside Firefox here, but looking into resolving the angry Emscrpten error sounds like a reasonable thing for you to do. They probably know why they call it "very dangerous".

Indeed the reason why we don't get this error on Chrom* may simply be because their wasm compiler is faster and we don't run into a race condition.

About the "Blocking on the main thread is very dangerous", please, you have to understand that any application that is slightly more complex than a fractal generator, and uses threads, will have to call thread.join() at some point. I know "it's very dangerous", but it's also how normal application works.

Changing Godot to never block on main means redesigning and rewriting huge parts of the engine. So, I don't see that happening anytime soon.

In the end, we'll probably just drop the idea of running multithreaded on the web, since it clearly is not ready for general purpose applications yet.

(In reply to Fabio Alessandrelli from comment #22)

Indeed the reason why we don't get this error on Chrom* may simply be because their wasm compiler is faster and we don't run into a race condition.

Searching the internet, you find a lot of people who run into deadlocks with Emscripten on all kinds of browsers - including Chrome. I don't know why this seems to work most of the time in Chrome, but it's clearly a bit more complicated than "Chrome is faster". This specific test-case also deadlocks on Safari and Safari TP. It might also be interesting to ask this user on xitter who interacted with you before which browser they could reproduce this on. But this is clearly not a Firefox-only issue - it seems to me that it accidentally works in Chrome, and that's about it.

you have to understand that any application that is slightly more complex than a fractal generator, and uses threads, will have to call thread.join() at some point. I know "it's very dangerous", but it's also how normal application works.

Regardless of the fact that "block the main UI thread for an undefined amount if time" is absolutely not "how normal applications work", nobody here ever questioned the need to join threads at some point. All that happened here was people telling you "hey you're blocking the main thread, maybe try not doing that".

Not even the Emscripten docs tell you that you can never join a thread to the main thread. What they do, however, is asking you to investigate using alternatives like pthread_tryjoin_np, which would allow you to join without blocking. They also mention the PROXY_TO_PTHREAD flag, which moves the entire main() function into a thread where you can do whatever you want to without even worrying about blocking the main thread. It looks like Godot even supports that, but it's not the default. That PR looks like it was a lot of work, and I can imagine that supporting this flag in all cases is even more work, but making sure that this option is the default and making sure the engine properly supports it seems like a much better approach than just hoping that browsers fix your issues for you.

There is a limit to how much we can do in the browser - and especially when it comes to complicated timing issues, it's frequently near-impossible to 100% match the timing characteristics of another browser. The solution always is to make sure your app doesn't rely on those timing characteristics.

In the end, we'll probably just drop the idea of running multithreaded on the web, since it clearly is not ready for general purpose applications yet.

To be honest, I don't see any future coming up where it's okay to block the main thread without consequences. JS in the browser fundamentally doesn't work that way. A lot of stuff, especially DOM interactions, need to happen in a single-thread environment. So if you want the web to be a supported target, I don't think just deferring this problem to a later point is helping.

I see, well, thank you for your reply, I think the issue can be closed then, I guess the bug is in Chrome that for some reason works.

I've given up on working on the web platform, it's an endless pit of despair and I decided to protect my mental health.

I hope the new Godot Web platform maintainer will manage to make the proxy main to thread PR work.
Right now, it's broken due to all sort of re-entry race conditions when handling callbacks for input/io/etc because they must happen on the main thread.

Julian, do you want to use this bug to land what you suggested in comment 9, or do you want to spin off a different bug and have this one moved to the Site Reports component?

Flags: needinfo?(rverschelde) → needinfo?(jseward)

(In reply to Dennis Schubert [:denschub][OOO, back June 3rd] from comment #25)

Julian, do you want to use this bug to land what you suggested in comment 9 [..]

I've moved that to a new bug 1900316, so it's independent of this one.

Flags: needinfo?(jseward)
Severity: -- → S4
User Story: (updated)
Component: JavaScript: WebAssembly → Site Reports
Priority: -- → P3
Product: Core → Web Compatibility
See Also: → 1900316
Webcompat Priority: --- → P2
Webcompat Score: --- → 6
Webcompat Score: 6 → 5

Profile: https://share.firefox.dev/4nsikRn
Should this bug be closed?

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: