[GTK_USE_PORTAL] File open/save dialogs get progressively slower
Categories
(Core :: Widget: Gtk, defect, P2)
Tracking
()
People
(Reporter: doctorow, Unassigned)
References
(Blocks 3 open bugs)
Details
Attachments
(1 obsolete file)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0
Steps to reproduce:
Through the day, as I use webforms or Save As... to save or upload files to my browser, dismissing the file dialog (by clicking "Save" or "Open") gets slower and slower. The only fix is to reboot. This problem has manifested across several 64 bit Thinkpad systems running different versions of Ubuntu, including LTS, OEM and current stable versions.
Actual results:
Clicking "Open" or "Save" in a file dialog box leaves the box drawn on the screen for a long pause that increases over time until the browser is in an unusable state. While the box is drawn on the screen, I can still access the browser controls (for example, switching tabs with keyboard shortcuts, mousing to text entry fields and clicking on them and beginning to type, select-all, etc).
Expected results:
Clicking "Open" or "Save" in a file dialog box should dismiss the box instantaenously.
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Not sure where to put this without more info. Maybe File Handling - but maybe Core::Networking is more likely to be the root of the problem
Comment 2•5 years ago
|
||
Gijs, what further info from the reporter might help here?
Comment 3•5 years ago
|
||
So I'm pretty confused - I know bits about how our file dialog opening and frontend code work, but I don't think they're likely to be the issue here. My understanding is that the open/save boxes we use on Linux are ultimately created via calls into gtk, and (pretty much?) all our gtk integration runs on the main thread. Normally, while those dialogs are up (ie prior to accepting/canceling the dialog), you can't interact with the main browser UI (Cory, would be helpful if you can confirm that that's also what you're seeing!). So what doesn't make an awful lot of sense is if they stay on the screen, while the browser (whose UI also runs on the main thread) remains responsive.
So what I suspect is some kind of graphics/painting problem, but I don't really know what's responsible for "not painting" the dialog anymore, when (AIUI) it's not us ourselves who are directly responsible for the dialog in the first place.
The other thing that I notice in the report is that you explicitly say the dialog spends a long time going away when clicking Open/Save - at that point, does clicking cancel (or the close button if there is one in your windowing system) work quickly still, or is that also slow?
Also, can I ask what window manager / desktop environment you're using? I assume this is the same between the different machines you've seen this on?
Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?
Karl, do you have any ideas about what might cause this issue?
(Based on the above, tentatively moving over to our widget/gtk integration component, though it's certainly possible it ought to be in graphics or I'm otherwise misdiagnosing...)
Reporter | ||
Comment 4•5 years ago
|
||
Hey there!
Yes, what I think is going on is that after I click Save/Open, the dialog ceases to be "live" but is not erased (maybe there's an animation that's stalling?). So the rest of the page is live and can be interacted with, but it's obscured by the dialog box that hasn't been erased.
A couple further, Ffox-specific notes:
-
This does not occur with Ffox's File -> Open dialog, only with file dialogs spawned by form elements
-
This does not occur with File dialogs in other programs on my system, only with Ffox form elements that open or save files
This leads me to think that Firefox's form elements is NOT accessing the same GTK UI widgets as the rest of the OS
Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?
I don't know how I'd measure this?
Comment 5•5 years ago
|
||
(In reply to Cory Doctorow from comment #4)
Hey there!
Yes, what I think is going on is that after I click Save/Open, the dialog ceases to be "live" but is not erased (maybe there's an animation that's stalling?). So the rest of the page is live and can be interacted with, but it's obscured by the dialog box that hasn't been erased.
OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?
- This does not occur with Ffox's File -> Open dialog, only with file dialogs spawned by form elements
Oh, interesting. But both the File -> Open dialog and the browse button for a form element in a webpage (e.g. the one in the content area if you load data:text/html,<input type=file>
from the address bar) make the same dialog appear, right? That is, beyond potentially opening with the initial selection in different folders, the dialog looks/behaves the same? Or do you see very different dialogs between the two cases?
(In theory, it'd be possible for the in-content dialogs you use to be opened by Adobe Flash if the site uses that plugin and you've enabled it, but that seems unlikely...)
- This does not occur with File dialogs in other programs on my system, only with Ffox form elements that open or save files
This leads me to think that Firefox's form elements is NOT accessing the same GTK UI widgets as the rest of the OS
Right, there will definitely be some kind of difference, just not sure what exactly it is and why it'd cause this problem...
Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?
I don't know how I'd measure this?
The summary and comment #0 talk about things being "slower", and typically that's caused by something keeping the CPU busy. However, it sounds like here the CPU might not be busy, we just somehow don't clear the dialog off the screen, so I was just trying to confirm if you see a thread/process using lots of CPU when the problem happens (e.g. by using top
or a gnome/kde/... graphical frontend for it). Based on you saying you can interact with the page "behind" the dialog when this happens, I think the suspicion of a painting problem (rather than a CPU/hang problem) is correct.
Reporter | ||
Comment 6•5 years ago
|
||
OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?
Canceling does not induce the delay, only open/save (again, maybe it's a different animation?).
Oh, interesting. But both the File -> Open dialog and the browse button for a form element in a webpage (e.g. the one in the content area if you load data:text/html,<input type=file> from the address bar) make the same dialog appear, right? That is, beyond potentially opening with the initial selection in different folders, the dialog looks/behaves the same? Or do you see very different dialogs between the two cases?
They don't appear different, nope.
(In theory, it'd be possible for the in-content dialogs you use to be opened by Adobe Flash if the site uses that plugin and you've enabled it, but that seems unlikely...)
No, these are native HTML form elements.
The summary and comment #0 talk about things being "slower", and typically that's caused by something keeping the CPU busy. However, it sounds like here the CPU might not be busy, we just somehow don't clear the dialog off the screen, so I was just trying to confirm if you see a thread/process using lots of CPU when the problem happens (e.g. by using top or a gnome/kde/... graphical frontend for it). Based on you saying you can interact with the page "behind" the dialog when this happens, I think the suspicion of a painting problem (rather than a CPU/hang problem) is correct.
I'll keep top running and let you know if I see any CPU spikes.
Comment 7•5 years ago
|
||
(In reply to Cory Doctorow from comment #6)
OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?
Canceling does not induce the delay, only open/save (again, maybe it's a different animation?).
Are there visible animations in these dialogs when they disappear in normal use? And could you list what window manager / desktop environment you use, to see if people can reproduce?
Reporter | ||
Comment 8•5 years ago
|
||
Hey there! No, I don't see an animation; but I know that in Gnome it's standard advice that you can speed things up by turning off animations at the OS levels; these animations are minor things like changing the alpha channels as elements are drawn and erased and are largely below the threshold of my ability to notice them (which is good, b/c I turned them all off!), but they're super compute-intensive and turning them off does indeed make things a LOT faster.
Comment 9•5 years ago
|
||
It's hard to imagine a painting problem in Gecko causing this. If the dialog has gone, then a compositing window manager would draw the last contents received from the parent window. But I don't have any other good suspicions re why the window would still be visible, but no longer modal.
One difference with from-content dialogs is IPC.
One difference between Open/Save and Cancel might be disk I/O.
Is the Firefox build directly from Mozilla or provided by a distribution?
Reporter | ||
Comment 10•5 years ago
|
||
I get it from the standard Ubuntu PPA.
Comment 11•5 years ago
|
||
If the slowdowns are only on save dialogs, it probably isn't memory leak related, so I'm clearing those flags.
Reporter | ||
Comment 12•5 years ago
|
||
Just to be clear, it's Save AND Open dialogs.
Comment 13•3 years ago
|
||
The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
I was able to reproduce this today and get a profile from a single image save: https://share.firefox.dev/3PBUyE7
The images I used for testing were from the Nightowl cdrom collection (http://www.retroarchive.org/cdrom/). I saved over 500 images into a single directory; by the end of the test the browser was very slow at saving files (multiple seconds to even open the dialog).
Comment 15•2 years ago
|
||
It looks like Gtk3 dialog bug. I've seen similar before, usually related to file sorting or so.
I know it's difficult to reproduce, but do you see this bug in other Gtk3 based applications?
Like gedit (but gedit can't save imaged). Gimp may work but it may not support Gtk3 yet afaik (https://www.phoronix.com/news/GIMP-GTK3-Finished).
Comment 16•2 years ago
|
||
Also the stack looks a bit weird:
fun_37d490
fun_1c8040
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_r
fun_1c8740
fun_1cc300
g_task_return_now
complete_in_idle_cb
g_main_dispatch
g_main_context_dispatch
g_main_context_iterate
g_main_context_iteration
nsAppShell::ProcessNextNativeEvent(bool)
nsBaseAppShell::DoProcessNextNativeEvent(bool)
nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool)
{virtual override thunk({offset(-8)}, nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool))}
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunInternal()
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()
XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)
XREMain::XRE_main
i checked gtk3 sources on Fedora but I don't see any gtk3 calls to fun_*() which comes from libgtk-3.so.0 on the stack. Is that an Ubuntu specific modification or so?
Comment 17•2 years ago
|
||
Well, it's also possible that the profile is broken and we see wrong profile names. Can you please try to use local debugging to get the backtrace?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
(make sure you have installed debug info packages for gtk3 & firefox).
Thanks.
Comment 18•2 years ago
|
||
(In reply to Martin Stránský [:stransky] (ni? me) from comment #15)
It looks like Gtk3 dialog bug. I've seen similar before, usually related to file sorting or so.
I know it's difficult to reproduce, but do you see this bug in other Gtk3 based applications?
Like gedit (but gedit can't save imaged). Gimp may work but it may not support Gtk3 yet afaik (https://www.phoronix.com/news/GIMP-GTK3-Finished).
The bug is not difficult at all to reproduce in firefox (just time consuming). Here's what I did:
- Visit http://annex.retroarchive.org/cdrom/nightowl-008/035A/index.html
- Open all GIFs in a new tab with right-click
- For each tab, right-click on the image and click save as
- Repeat with a different nightowl cdrom until saving feels slow
Now that I think about it, opening the images in a new tab is probably not necessary. I'm guessing it would have been enough to right-click on the link instead, but I have not tried that.
Off the top of my head, I don't know another gtk3 application that would be an apples-to-apples comparison. The ideal would be a gtk-based web browser. Epiphany crashes (tried both the Ubuntu package and the flatpak), and netsurf-gtk doesn't have a "Save As" option in its context menu. Surf also crashes, and Galeon is too old.
(In reply to Martin Stránský [:stransky] (ni? me) from comment #16)
Also the stack looks a bit weird:
...
i checked gtk3 sources on Fedora but I don't see any gtk3 calls to fun_*() which comes from libgtk-3.so.0 on the stack. Is that an Ubuntu specific modification or so?
Would this be related to not having the debug packages installed? Or does the profiler know how to use debuginfod?
(In reply to Martin Stránský [:stransky] (ni? me) from comment #17)
Well, it's also possible that the profile is broken and we see wrong profile names. Can you please try to use local debugging to get the backtrace?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
(make sure you have installed debug info packages for gtk3 & firefox).
Thanks.
If I get some free time again I can try to do this, otherwise hopefully the instructions above are sufficient for someone else to repro. The process takes about 20-30 minutes to manually save 500 images, but maybe a useful profile can be obtained with fewer images.
Comment 19•2 years ago
|
||
I was not able to get anything useful out of gdb, because I don't have the debug symbols available for the firefox snap, and I do not know how to install them. I do not see any functions with the prefix fun_
in libgtk-3.so.0
, so I do not know where those functions are coming from or what they represent.
I did do another experiment today:
- Start with 2200 images in a single directory
- With a fresh firefox instance, save 100 images to that directory using right-click and save as
- Touch a single text file in the same directory as the images and firefox has a CPU spike
This seems to me to indicate that the file save dialogs are not getting closed properly after saving the images and that they are still watching the directory for changes.
Next, I touched 5 text files with the profiler running. The profile looks similar to the previous one: https://share.firefox.dev/3Nz8f3Y
Next, I moved 13 images to a subdirectory. This was pretty fast: https://share.firefox.dev/44cIXQb
I then moved all the GIF images to a subdirectory. This took longer but was still fast: https://share.firefox.dev/3Nz8f3Y
I finally moved all the GIF images back into the parent directory they came from. This caused a CPU spike > 500%, then CPU came down to 100% and firefox was completely unresponsive. I waited for 10 minutes before giving up and killing the process.
Comment 20•2 years ago
|
||
Some more sleuthing: using the functions that are invoked we can make an educated guess about which real function each of the fun_*
functions is.
Looking at the profile where I touched 5 text files: we can see that fun_1c9f30
is called from _g_cclosure_marshal_VOID__OBJECT_OBJECT_ENUMv
. AFAICT this function is only referenced one place in all of glib/gtk - g_file_monitor_class_init
(https://github.com/GNOME/glib/blob/ec754e5404be345fe4b576b65acccf88c72476d3/gio/gfilemonitor.c#L137).
GFileMonitor
is used a few places in gtk. The most likely candidate is GtkFileSystemModel
, as this is used by GtkFileChooserWidget
.
One of the functions called by fun_1c9f30
is gtk_tree_path_new_from_indices
. This is called indirectly from _gtk_file_system_model_update_file
via thaw_updates
; _gtk_file_system_model_update_file
is called indirectly from gtk_file_system_model_monitor_change
. This is the signal handler for the file monitor "changed" event.
It seems then that even though the dialog itself is not visible and not modal, the file system model is still receiving events from a live file system monitor.
Comment 21•2 years ago
|
||
For fun_1c8740
, we can see that it calls the following functions (among others):
msort_r
(which is reallyg_sort_with_data
)g_signal_emit
g_hash_table_resize
gtk_tree_path_new
g_hash_table_remove_all
g_qsort_with_data
is called from a few places, but the most likely is gtk_file_system_model_sort
. This function also calls g_hash_table_remove_all
, gtk_tree_path_new
, and gtk_tree_path_free
. I don't see where it calls g_hash_table_resize
but it could be called indirectly.
For fun_1cbe80
, we can see that it calls g_file_equal
and g_hash_table_insert
. g_file_equal
is the equality comparison for model->file_lookup
. So fun_1cbe80
is something that calls node_get_for_file
.
The difference between the test where I touched text files and the test where I saved an image from the browser is that when saving the image from the browser, the filter is matched (because the new file is an image). Perhaps this is why fun_1c9f30
is only a small percentage of the time in the profile were an image was saved -- if the new file is a text file, there is less work to be done than when it is an image (when it is an image, the list must be resorted).
Comment 22•2 years ago
|
||
Since my last experiment of moving all 2000+ files out of the download directory and back in again caused the browser to hang, I decided to try it again with a smaller number of files.
In this experiment, I downloaded 11 images using the same process as before (open in a new tab, right click, save image as). Then I moved 140 images into a subdirectory. Next I started the profiler and moved all 140 images back into the download directory.
The browser was completely unresponsive for a full minute.
Unfortunately I don't think the profile adds much information; it looks very similar to previous experiments: https://share.firefox.dev/3CZA14N
One thing the experiment does show is that the calls to fun_1cc300
, fun_1cbe80
, and fun_1c8740
are not due to Save As (because I did not click Save As during the profile run, unlike the very first profile). This supports the idea that these functions are called because the file model's list has to be re-sorted due to a change in the filesystem.
Comment 23•2 years ago
|
||
That's weird and looks like something from snap environment.
Can you try to test plain Mozilla binaries?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries
It's unfortunate that you're missing debug info packages, I don't know if they're available for snap or even for any Firefox packages provided in Ubuntu.
Comment 24•2 years ago
|
||
Using the vanilla binaries, I was initially unable to reproduce the bug. Here's a profile where I had already saved 10 images, then moved 140 images into the download directory: https://share.firefox.dev/3NXwfPr
I noticed that the Save As dialog was different with the vanilla binary. It turns out that Ubuntu sets the environment variable GTK_USE_PORTAL=1
(at least under KDE). Trying the same experiment with that variable set did trigger the bug: https://share.firefox.dev/3r4jE44
Some of the fun_*
functions are still there, so I did try getting a backtrace in gdb. I have debuginfod enabled for gdb, and I have the glib dbysym package installed, but installing the gtk dbgsym package gives me an error. The resulting backtrace is incomplete:
#0 0x00007ffff65eb4ea in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#1 0x00007ffff5b8a5c6 in msort_with_tmp (p=0x7fffffffbf60, b=0x7fffca8abbe0, n=2428) at ../../../glib/gqsort.c:152
#2 0x00007ffff5b8d975 in msort_r (b=<optimized out>, n=<optimized out>, s=72, cmp=0x7ffff65eb470, arg=0x7fffffffbfd0) at ../../../glib/gqsort.c:237
#3 0x00007ffff65f1554 in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#4 0x00007ffff65f1fc0 in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#5 0x00007ffff65f2007 in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#6 0x00007ffff5dfce39 in g_task_return_now (task=0x7fffc2c26a00) at ../../../gio/gtask.c:1230
#7 0x00007ffff5dfce7d in complete_in_idle_cb (task=0x7fffc2c26a00) at ../../../gio/gtask.c:1244
#8 0x00007ffff5b84c44 in g_main_dispatch (context=0x7ffff77ee2f0) at ../../../glib/gmain.c:3419
#9 g_main_context_dispatch (context=0x7ffff77ee2f0) at ../../../glib/gmain.c:4137
#10 0x00007ffff5bda258 in g_main_context_iterate.constprop.0 (context=context@entry=0x7ffff77ee2f0, block=block@entry=0, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#11 0x00007ffff5b823e3 in g_main_context_iteration (context=0x7ffff77ee2f0, may_block=0) at ../../../glib/gmain.c:4278
#12 0x00007fffefdf6912 in () at /home/cout/Downloads/firefox/libxul.so
#13 0x00007fffefe500fe in () at /home/cout/Downloads/firefox/libxul.so
#14 0x00007ffff0a3815d in () at /home/cout/Downloads/firefox/libxul.so
#15 0x00007ffff1178336 in () at /home/cout/Downloads/firefox/libxul.so
#16 0x00007fffeebe8d31 in () at /home/cout/Downloads/firefox/libxul.so
#17 0x00007fffeec69df2 in () at /home/cout/Downloads/firefox/libxul.so
#18 0x00007fffeec6a6cb in () at /home/cout/Downloads/firefox/libxul.so
#19 0x00007fffeec6aac1 in () at /home/cout/Downloads/firefox/libxul.so
#20 0x00005555555ce5c6 in _start ()
Comment 25•2 years ago
|
||
Thanks for testing. GTK_USE_PORTAL means you use file dialog provided by KDE environment via. portal, i.e. that's used by snap/flatpak to bypass sandbox. You should report this bug to Ubuntu/KDE (but Gnome may be affected too).
Updated•2 years ago
|
Updated•2 years ago
|
Updated•11 months ago
|
Description
•