Open Bug 1581169 Opened 5 years ago Updated 3 months ago

[GTK_USE_PORTAL] File open/save dialogs get progressively slower

Categories

(Core :: Widget: Gtk, defect, P2)

69 Branch
defect

Tracking

()

Tracking Status
firefox69 --- affected
firefox70 --- ?
firefox71 --- ?

People

(Reporter: doctorow, Unassigned)

References

(Blocks 3 open bugs)

Details

Attachments

(1 obsolete file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0

Steps to reproduce:

Through the day, as I use webforms or Save As... to save or upload files to my browser, dismissing the file dialog (by clicking "Save" or "Open") gets slower and slower. The only fix is to reboot. This problem has manifested across several 64 bit Thinkpad systems running different versions of Ubuntu, including LTS, OEM and current stable versions.

Actual results:

Clicking "Open" or "Save" in a file dialog box leaves the box drawn on the screen for a long pause that increases over time until the browser is in an unusable state. While the box is drawn on the screen, I can still access the browser controls (for example, switching tabs with keyboard shortcuts, mousing to text entry fields and clicking on them and beginning to type, select-all, etc).

Expected results:

Clicking "Open" or "Save" in a file dialog box should dismiss the box instantaenously.

Keywords: memory-leak
Whiteboard: [MemShrink]

Not sure where to put this without more info. Maybe File Handling - but maybe Core::Networking is more likely to be the root of the problem

Component: Untriaged → File Handling

Gijs, what further info from the reporter might help here?

Flags: needinfo?(gijskruitbosch+bugs)

So I'm pretty confused - I know bits about how our file dialog opening and frontend code work, but I don't think they're likely to be the issue here. My understanding is that the open/save boxes we use on Linux are ultimately created via calls into gtk, and (pretty much?) all our gtk integration runs on the main thread. Normally, while those dialogs are up (ie prior to accepting/canceling the dialog), you can't interact with the main browser UI (Cory, would be helpful if you can confirm that that's also what you're seeing!). So what doesn't make an awful lot of sense is if they stay on the screen, while the browser (whose UI also runs on the main thread) remains responsive.

So what I suspect is some kind of graphics/painting problem, but I don't really know what's responsible for "not painting" the dialog anymore, when (AIUI) it's not us ourselves who are directly responsible for the dialog in the first place.

The other thing that I notice in the report is that you explicitly say the dialog spends a long time going away when clicking Open/Save - at that point, does clicking cancel (or the close button if there is one in your windowing system) work quickly still, or is that also slow?
Also, can I ask what window manager / desktop environment you're using? I assume this is the same between the different machines you've seen this on?
Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?

Karl, do you have any ideas about what might cause this issue?

(Based on the above, tentatively moving over to our widget/gtk integration component, though it's certainly possible it ought to be in graphics or I'm otherwise misdiagnosing...)

Component: File Handling → Widget: Gtk
Flags: needinfo?(karlt)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(doctorow)
Product: Firefox → Core

Hey there!

Yes, what I think is going on is that after I click Save/Open, the dialog ceases to be "live" but is not erased (maybe there's an animation that's stalling?). So the rest of the page is live and can be interacted with, but it's obscured by the dialog box that hasn't been erased.

A couple further, Ffox-specific notes:

  • This does not occur with Ffox's File -> Open dialog, only with file dialogs spawned by form elements

  • This does not occur with File dialogs in other programs on my system, only with Ffox form elements that open or save files

This leads me to think that Firefox's form elements is NOT accessing the same GTK UI widgets as the rest of the OS

Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?

I don't know how I'd measure this?

Flags: needinfo?(doctorow)

(In reply to Cory Doctorow from comment #4)

Hey there!

Yes, what I think is going on is that after I click Save/Open, the dialog ceases to be "live" but is not erased (maybe there's an animation that's stalling?). So the rest of the page is live and can be interacted with, but it's obscured by the dialog box that hasn't been erased.

OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?

  • This does not occur with Ffox's File -> Open dialog, only with file dialogs spawned by form elements

Oh, interesting. But both the File -> Open dialog and the browse button for a form element in a webpage (e.g. the one in the content area if you load data:text/html,<input type=file> from the address bar) make the same dialog appear, right? That is, beyond potentially opening with the initial selection in different folders, the dialog looks/behaves the same? Or do you see very different dialogs between the two cases?

(In theory, it'd be possible for the in-content dialogs you use to be opened by Adobe Flash if the site uses that plugin and you've enabled it, but that seems unlikely...)

  • This does not occur with File dialogs in other programs on my system, only with Ffox form elements that open or save files

This leads me to think that Firefox's form elements is NOT accessing the same GTK UI widgets as the rest of the OS

Right, there will definitely be some kind of difference, just not sure what exactly it is and why it'd cause this problem...

Is the CPU busy (and/or progressively busier as your day goes on) while the dialog stays up, or not?

I don't know how I'd measure this?

The summary and comment #0 talk about things being "slower", and typically that's caused by something keeping the CPU busy. However, it sounds like here the CPU might not be busy, we just somehow don't clear the dialog off the screen, so I was just trying to confirm if you see a thread/process using lots of CPU when the problem happens (e.g. by using top or a gnome/kde/... graphical frontend for it). Based on you saying you can interact with the page "behind" the dialog when this happens, I think the suspicion of a painting problem (rather than a CPU/hang problem) is correct.

Flags: needinfo?(doctorow)

OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?

Canceling does not induce the delay, only open/save (again, maybe it's a different animation?).

Oh, interesting. But both the File -> Open dialog and the browse button for a form element in a webpage (e.g. the one in the content area if you load data:text/html,<input type=file> from the address bar) make the same dialog appear, right? That is, beyond potentially opening with the initial selection in different folders, the dialog looks/behaves the same? Or do you see very different dialogs between the two cases?

They don't appear different, nope.

(In theory, it'd be possible for the in-content dialogs you use to be opened by Adobe Flash if the site uses that plugin and you've enabled it, but that seems unlikely...)

No, these are native HTML form elements.

The summary and comment #0 talk about things being "slower", and typically that's caused by something keeping the CPU busy. However, it sounds like here the CPU might not be busy, we just somehow don't clear the dialog off the screen, so I was just trying to confirm if you see a thread/process using lots of CPU when the problem happens (e.g. by using top or a gnome/kde/... graphical frontend for it). Based on you saying you can interact with the page "behind" the dialog when this happens, I think the suspicion of a painting problem (rather than a CPU/hang problem) is correct.

I'll keep top running and let you know if I see any CPU spikes.

Flags: needinfo?(doctorow)

(In reply to Cory Doctorow from comment #6)

OK. Just to confirm, one of my many (sorry!) questions in comment #3 was whether canceling / closing the dialog was also slow (ie whether this is triggered specifically by accepting via open/save, or whether it happens for any attempt to close the dialog) - can you check that?

Canceling does not induce the delay, only open/save (again, maybe it's a different animation?).

Are there visible animations in these dialogs when they disappear in normal use? And could you list what window manager / desktop environment you use, to see if people can reproduce?

Flags: needinfo?(doctorow)

Hey there! No, I don't see an animation; but I know that in Gnome it's standard advice that you can speed things up by turning off animations at the OS levels; these animations are minor things like changing the alpha channels as elements are drawn and erased and are largely below the threshold of my ability to notice them (which is good, b/c I turned them all off!), but they're super compute-intensive and turning them off does indeed make things a LOT faster.

Flags: needinfo?(doctorow)

It's hard to imagine a painting problem in Gecko causing this. If the dialog has gone, then a compositing window manager would draw the last contents received from the parent window. But I don't have any other good suspicions re why the window would still be visible, but no longer modal.

One difference with from-content dialogs is IPC.

One difference between Open/Save and Cancel might be disk I/O.

Is the Firefox build directly from Mozilla or provided by a distribution?

Flags: needinfo?(karlt)

I get it from the standard Ubuntu PPA.

If the slowdowns are only on save dialogs, it probably isn't memory leak related, so I'm clearing those flags.

Keywords: memory-leak
Whiteboard: [MemShrink]

Just to be clear, it's Save AND Open dialogs.

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Severity: normal → S3

I was able to reproduce this today and get a profile from a single image save: https://share.firefox.dev/3PBUyE7

The images I used for testing were from the Nightowl cdrom collection (http://www.retroarchive.org/cdrom/). I saved over 500 images into a single directory; by the end of the test the browser was very slow at saving files (multiple seconds to even open the dialog).

It looks like Gtk3 dialog bug. I've seen similar before, usually related to file sorting or so.
I know it's difficult to reproduce, but do you see this bug in other Gtk3 based applications?
Like gedit (but gedit can't save imaged). Gimp may work but it may not support Gtk3 yet afaik (https://www.phoronix.com/news/GIMP-GTK3-Finished).

Priority: -- → P2

Also the stack looks a bit weird:

fun_37d490
fun_1c8040
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_with_tmp
msort_r
fun_1c8740
fun_1cc300
g_task_return_now
complete_in_idle_cb
g_main_dispatch
g_main_context_dispatch
g_main_context_iterate
g_main_context_iteration
nsAppShell::ProcessNextNativeEvent(bool)
nsBaseAppShell::DoProcessNextNativeEvent(bool)
nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool)
{virtual override thunk({offset(-8)}, nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool))}
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunInternal()
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()
XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)
XREMain::XRE_main

i checked gtk3 sources on Fedora but I don't see any gtk3 calls to fun_*() which comes from libgtk-3.so.0 on the stack. Is that an Ubuntu specific modification or so?

Well, it's also possible that the profile is broken and we see wrong profile names. Can you please try to use local debugging to get the backtrace?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
(make sure you have installed debug info packages for gtk3 & firefox).
Thanks.

Flags: needinfo?(curlypaul924)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #15)

It looks like Gtk3 dialog bug. I've seen similar before, usually related to file sorting or so.
I know it's difficult to reproduce, but do you see this bug in other Gtk3 based applications?
Like gedit (but gedit can't save imaged). Gimp may work but it may not support Gtk3 yet afaik (https://www.phoronix.com/news/GIMP-GTK3-Finished).

The bug is not difficult at all to reproduce in firefox (just time consuming). Here's what I did:

Now that I think about it, opening the images in a new tab is probably not necessary. I'm guessing it would have been enough to right-click on the link instead, but I have not tried that.

Off the top of my head, I don't know another gtk3 application that would be an apples-to-apples comparison. The ideal would be a gtk-based web browser. Epiphany crashes (tried both the Ubuntu package and the flatpak), and netsurf-gtk doesn't have a "Save As" option in its context menu. Surf also crashes, and Galeon is too old.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #16)

Also the stack looks a bit weird:
...
i checked gtk3 sources on Fedora but I don't see any gtk3 calls to fun_*() which comes from libgtk-3.so.0 on the stack. Is that an Ubuntu specific modification or so?

Would this be related to not having the debug packages installed? Or does the profiler know how to use debuginfod?

(In reply to Martin Stránský [:stransky] (ni? me) from comment #17)

Well, it's also possible that the profile is broken and we see wrong profile names. Can you please try to use local debugging to get the backtrace?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
(make sure you have installed debug info packages for gtk3 & firefox).
Thanks.

If I get some free time again I can try to do this, otherwise hopefully the instructions above are sufficient for someone else to repro. The process takes about 20-30 minutes to manually save 500 images, but maybe a useful profile can be obtained with fewer images.

I was not able to get anything useful out of gdb, because I don't have the debug symbols available for the firefox snap, and I do not know how to install them. I do not see any functions with the prefix fun_ in libgtk-3.so.0, so I do not know where those functions are coming from or what they represent.

I did do another experiment today:

  • Start with 2200 images in a single directory
  • With a fresh firefox instance, save 100 images to that directory using right-click and save as
  • Touch a single text file in the same directory as the images and firefox has a CPU spike

This seems to me to indicate that the file save dialogs are not getting closed properly after saving the images and that they are still watching the directory for changes.

Next, I touched 5 text files with the profiler running. The profile looks similar to the previous one: https://share.firefox.dev/3Nz8f3Y
Next, I moved 13 images to a subdirectory. This was pretty fast: https://share.firefox.dev/44cIXQb
I then moved all the GIF images to a subdirectory. This took longer but was still fast: https://share.firefox.dev/3Nz8f3Y
I finally moved all the GIF images back into the parent directory they came from. This caused a CPU spike > 500%, then CPU came down to 100% and firefox was completely unresponsive. I waited for 10 minutes before giving up and killing the process.

Flags: needinfo?(curlypaul924)

Some more sleuthing: using the functions that are invoked we can make an educated guess about which real function each of the fun_* functions is.

Looking at the profile where I touched 5 text files: we can see that fun_1c9f30 is called from _g_cclosure_marshal_VOID__OBJECT_OBJECT_ENUMv. AFAICT this function is only referenced one place in all of glib/gtk - g_file_monitor_class_init (https://github.com/GNOME/glib/blob/ec754e5404be345fe4b576b65acccf88c72476d3/gio/gfilemonitor.c#L137).

GFileMonitor is used a few places in gtk. The most likely candidate is GtkFileSystemModel, as this is used by GtkFileChooserWidget.

One of the functions called by fun_1c9f30 is gtk_tree_path_new_from_indices. This is called indirectly from _gtk_file_system_model_update_file via thaw_updates; _gtk_file_system_model_update_file is called indirectly from gtk_file_system_model_monitor_change. This is the signal handler for the file monitor "changed" event.

It seems then that even though the dialog itself is not visible and not modal, the file system model is still receiving events from a live file system monitor.

For fun_1c8740, we can see that it calls the following functions (among others):

  • msort_r (which is really g_sort_with_data)
  • g_signal_emit
  • g_hash_table_resize
  • gtk_tree_path_new
  • g_hash_table_remove_all

g_qsort_with_data is called from a few places, but the most likely is gtk_file_system_model_sort. This function also calls g_hash_table_remove_all, gtk_tree_path_new, and gtk_tree_path_free. I don't see where it calls g_hash_table_resize but it could be called indirectly.

For fun_1cbe80, we can see that it calls g_file_equal and g_hash_table_insert. g_file_equal is the equality comparison for model->file_lookup. So fun_1cbe80 is something that calls node_get_for_file.

The difference between the test where I touched text files and the test where I saved an image from the browser is that when saving the image from the browser, the filter is matched (because the new file is an image). Perhaps this is why fun_1c9f30 is only a small percentage of the time in the profile were an image was saved -- if the new file is a text file, there is less work to be done than when it is an image (when it is an image, the list must be resorted).

Since my last experiment of moving all 2000+ files out of the download directory and back in again caused the browser to hang, I decided to try it again with a smaller number of files.

In this experiment, I downloaded 11 images using the same process as before (open in a new tab, right click, save image as). Then I moved 140 images into a subdirectory. Next I started the profiler and moved all 140 images back into the download directory.

The browser was completely unresponsive for a full minute.

Unfortunately I don't think the profile adds much information; it looks very similar to previous experiments: https://share.firefox.dev/3CZA14N

One thing the experiment does show is that the calls to fun_1cc300, fun_1cbe80, and fun_1c8740 are not due to Save As (because I did not click Save As during the profile run, unlike the very first profile). This supports the idea that these functions are called because the file model's list has to be re-sorted due to a change in the filesystem.

That's weird and looks like something from snap environment.

Can you try to test plain Mozilla binaries?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries

It's unfortunate that you're missing debug info packages, I don't know if they're available for snap or even for any Firefox packages provided in Ubuntu.

Blocks: snap

Using the vanilla binaries, I was initially unable to reproduce the bug. Here's a profile where I had already saved 10 images, then moved 140 images into the download directory: https://share.firefox.dev/3NXwfPr

I noticed that the Save As dialog was different with the vanilla binary. It turns out that Ubuntu sets the environment variable GTK_USE_PORTAL=1 (at least under KDE). Trying the same experiment with that variable set did trigger the bug: https://share.firefox.dev/3r4jE44

Some of the fun_* functions are still there, so I did try getting a backtrace in gdb. I have debuginfod enabled for gdb, and I have the glib dbysym package installed, but installing the gtk dbgsym package gives me an error. The resulting backtrace is incomplete:

#0  0x00007ffff65eb4ea in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#1  0x00007ffff5b8a5c6 in msort_with_tmp (p=0x7fffffffbf60, b=0x7fffca8abbe0, n=2428) at ../../../glib/gqsort.c:152
#2  0x00007ffff5b8d975 in msort_r (b=<optimized out>, n=<optimized out>, s=72, cmp=0x7ffff65eb470, arg=0x7fffffffbfd0) at ../../../glib/gqsort.c:237
#3  0x00007ffff65f1554 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#4  0x00007ffff65f1fc0 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#5  0x00007ffff65f2007 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#6  0x00007ffff5dfce39 in g_task_return_now (task=0x7fffc2c26a00) at ../../../gio/gtask.c:1230
#7  0x00007ffff5dfce7d in complete_in_idle_cb (task=0x7fffc2c26a00) at ../../../gio/gtask.c:1244
#8  0x00007ffff5b84c44 in g_main_dispatch (context=0x7ffff77ee2f0) at ../../../glib/gmain.c:3419
#9  g_main_context_dispatch (context=0x7ffff77ee2f0) at ../../../glib/gmain.c:4137
#10 0x00007ffff5bda258 in g_main_context_iterate.constprop.0 (context=context@entry=0x7ffff77ee2f0, block=block@entry=0, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4213
#11 0x00007ffff5b823e3 in g_main_context_iteration (context=0x7ffff77ee2f0, may_block=0) at ../../../glib/gmain.c:4278
#12 0x00007fffefdf6912 in  () at /home/cout/Downloads/firefox/libxul.so
#13 0x00007fffefe500fe in  () at /home/cout/Downloads/firefox/libxul.so
#14 0x00007ffff0a3815d in  () at /home/cout/Downloads/firefox/libxul.so
#15 0x00007ffff1178336 in  () at /home/cout/Downloads/firefox/libxul.so
#16 0x00007fffeebe8d31 in  () at /home/cout/Downloads/firefox/libxul.so
#17 0x00007fffeec69df2 in  () at /home/cout/Downloads/firefox/libxul.so
#18 0x00007fffeec6a6cb in  () at /home/cout/Downloads/firefox/libxul.so
#19 0x00007fffeec6aac1 in  () at /home/cout/Downloads/firefox/libxul.so
#20 0x00005555555ce5c6 in _start ()

Thanks for testing. GTK_USE_PORTAL means you use file dialog provided by KDE environment via. portal, i.e. that's used by snap/flatpak to bypass sandbox. You should report this bug to Ubuntu/KDE (but Gnome may be affected too).

Summary: File open/save dialogs get progressively slower → [GTK_USE_PORTAL] File open/save dialogs get progressively slower
Attachment #9385954 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: