Closed Bug 1517101 Opened 8 months ago Closed 6 months ago

Download dialog box freezes Firefox

Categories

(Firefox :: File Handling, defect, P1)

64 Branch
x86_64
Linux
defect

Tracking


RESOLVED FIXED
Tracking Status
relnote-firefox --- 66+
firefox-esr60 --- unaffected
firefox64 --- wontfix
firefox65 + wontfix
firefox66 --- fixed

People

(Reporter: hugues.granger, Unassigned)

References

Details

(Keywords: regression, Whiteboard: [fixed by bug 1492326])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

When prompted whether a file should be opened or downloaded, the dialog box opens but the browser then becomes almost frozen or extremely slow.
I sometimes have to wait 3 or 4 minutes before being able to click OK. Meanwhile, the browser is not usable.

I am using Linux Mint 19.1; the issue first appeared with Firefox 64.
I tested different versions from here: https://ftp.mozilla.org/pub/firefox/releases/
The issue happens only with Firefox 64 and 65; 63 works fine.


Actual results:

Browser freezes


Expected results:

Let me click OK or Cancel
Side note: the issue also happens after completely erasing the profiles and/or in safe mode.
Component: Untriaged → File Handling
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
I can't reproduce this with Firefox 64 on Ubuntu 18.04 LTS.

Can you please use our mozregression tool to test different builds and find the change in Firefox that causes this?
- https://mozilla.github.io/mozregression/
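
For readers unfamiliar with the tool: mozregression narrows down a regression by bisecting over builds. The sketch below illustrates only the general idea, not the tool's actual implementation; the build labels and the `is_bad` check are hypothetical stand-ins for downloading a build and testing the download dialog by hand.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumes builds are ordered oldest to newest, builds[0] is good,
    builds[-1] is bad, and the behaviour flips exactly once in between.
    """
    lo, hi = 0, len(builds) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the change landed at or before mid
        else:
            lo = mid  # the change landed after mid
    return builds[hi]


# Hypothetical build labels; a real run would test actual nightly builds.
builds = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
regressed_at = "b5"
culprit = first_bad(builds, lambda b: builds.index(b) >= builds.index(regressed_at))
print(culprit)
```

Looking for the change that fixed a problem instead of the one that caused it is the same search with the good and bad roles swapped.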
Flags: needinfo?(hugues.granger)
Now I can't reproduce it with any nightly build.
Testing against different production versions, the results are not consistent: one time it worked fine with v63, another time it did not.

I'll keep trying to identify the root cause on my side; I'm not sure what to do with this ticket in the meantime.
Flags: needinfo?(hugues.granger)
Flags: needinfo?(hugues.granger)
Usually, this happens when a filesystem is slow or cannot be mounted.
For example, a cifs/smb or nfs remote file system.
(In reply to Sylvestre Ledru [:sylvestre] from comment #4)
> Usually, this happens when a filesystem is slow or cannot be mounted.
> For example, a cifs/smb or nfs remote file system.

I use some SMB mounts every day, but I do have the issue even when none of them is mounted.
Flags: needinfo?(hugues.granger)
I can reproduce this on Ubuntu 18.04.1 LTS, using FF 64.

I do not have any shares mounted.  My filesystem is a fast SSD. Hitting "File - Save Page As" works quickly, as expected.

But, 100% of the time, if I click on a download link, I get the download prompt with the "OK" button greyed out. All of FF becomes unresponsive and I either have to wait a few minutes, or hit the ESC key.
Do you see anything interesting if you run
strace -p <pid> -f -o foo.log ?

I get this 100% of the time. Once the dialog appears, everything is frozen for 20+ seconds (it varies). This affects everything from clicking the other radio button to choosing OK (which doesn't even get enabled on focus as it usually does).

Manjaro Linux x86-64

Duplicate of this bug: 1517959

Going to try mozregression tomorrow. Versions <= 63 and >= 66 work as expected, so I'm using the v66 nightly right now. Other dialogs like "View Page Info" are slow to load as well, and the slowness feels the same, so it seems related.

Output of mozregression --good 63 --bad 64 is this paste.

Duplicate of this bug: 1519526

Brian, could you take a look at whether your patch could be the cause of this bug?

Blocks: 1481949
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bgrinstead)
Keywords: regression

Firefox 66 isn't affected actually

(In reply to jpegxguy from comment #13)

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=cec74a3c6a5cf518f175ba9d7903e5dd66f25cc8&tochange=88315c6735dec7bbb70c5933e2b11e7343114f73

Those radio group changes are certainly suspicious >.>

OK, thanks for tracking this down.

(In reply to jpegxguy from comment #16)
> Firefox 66 isn't affected actually

So something (probably Bug 1481949) landed in 64 that caused this issue, but something else landed in 66 that fixed it (I think you mentioned 65 is affected as well). Would it be possible to use mozregression with 'bad' as 65 and 'good' as 66 to narrow down what fixed it, so we can see if it's something that could be uplifted?

Flags: needinfo?(bgrinstead) → needinfo?(jpegxguy)

At first I ran mozregression --good 66 --bad 65, but 66 is not an actual release so it didn't work. What did work was using the nightly build I'm running and v64.

Here is the output of mozregression --good 2019-01-11 --bad 64 --find-fix
pastebin here

Seems like something in https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1bb44497f6c0601c16de02b9595402ccc661b29a&tochange=0c7a54d4cc426d989d5759f962c8c0af737d5a5b fixed it

Flags: needinfo?(jpegxguy)

(In reply to jpegxguy from comment #18)
> At first I ran mozregression --good 66 --bad 65, but 66 is not an actual release so it didn't work. What did work was using the nightly build I'm running and v64.
>
> Here is the output of mozregression --good 2019-01-11 --bad 64 --find-fix
> pastebin here
>
> Seems like something in https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1bb44497f6c0601c16de02b9595402ccc661b29a&tochange=0c7a54d4cc426d989d5759f962c8c0af737d5a5b fixed it

Oh, it actually totally makes sense that Bug 1492326 would have fixed this. This made it so looking up whether the radiogroup implements nsIDOMXULSelectControlElement doesn't need to call into JS anymore.

I don't expect that would be an easy patch to uplift, though. So we may need to figure out something else, unless Neil thinks it could be.

Depends on: 1492326
Flags: needinfo?(enndeakin)

(In reply to Brian Grinstead [:bgrins] from comment #19)
> (In reply to jpegxguy from comment #18)
> > Seems like something in https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1bb44497f6c0601c16de02b9595402ccc661b29a&tochange=0c7a54d4cc426d989d5759f962c8c0af737d5a5b fixed it
>
> Oh, it actually totally makes sense that Bug 1492326 would have fixed this. This made it so looking up whether the radiogroup implements nsIDOMXULSelectControlElement doesn't need to call into JS anymore.

I'm not sure why this perf issue would only show up on Linux, though

At first glance, since both Bug 1492326 and the radiogroup bug have caused several regressions (including some issues which have not yet been fixed), I'd think that having neither patch in would be better.

Flags: needinfo?(enndeakin)

Is there a way to uplift what fixed the issue in 66 and bring it over to current release in a minor patch update?

(In reply to Mkll from comment #22)
> Is there a way to uplift what fixed the issue in 66 and bring it over to current release in a minor patch update?

I don't think so - it's too sweeping of a change and, as Neil noted in Comment 21, there's still at least one regression that blocks it that hasn't been fixed. I think the best case would be if we can get a profile from this and come up with a temporary JS fix that could be landed in 64/65. I don't know offhand how difficult that will be (without regressing other things like accessibility).

Would it be possible to generate a profile of the bad case using the addon at https://perf-html.io/?

Flags: needinfo?(jpegxguy)

Could both of the patches be held back in a minor release? It's a very long wait until v66 hits stable.
I will try the profiling when I have the time, though.

My understanding is we're not going to have another 64 dot release given the impending 65 release.

65 is affected as well though

(In reply to jpegxguy from comment #26)
> 65 is affected as well though

We have separate flags for each release, so setting status-firefox64 to wontfix just means we won't uplift a fix to 64. status-firefox65 is still set to affected, although I think we'll have to be creative to come up with a fix that could be uplifted, given Comment 23.

(In reply to Brian Grinstead [:bgrins] from comment #27)
> (In reply to jpegxguy from comment #26)
> > 65 is affected as well though
>
> We have separate flags for each release, so setting status-firefox64 to wontfix just means we won't uplift a fix to 64. status-firefox65 is still set to affected, although I think we'll have to be creative to come up with a fix that could be uplifted, given Comment 23.

Ah, I see.

Firefox 65.0b11, profile: https://perfht.ml/2FCmiUW

Flags: needinfo?(jpegxguy)
Duplicate of this bug: 1520443

Please note that all reports for this bug seem to be from Linux users

(In reply to jpegxguy from comment #29)
> Firefox 65.0b11, profile: https://perfht.ml/2FCmiUW

Thanks, that helps a lot. I see at https://perfht.ml/2RQ1ksl calls to getCustomInterfaceCallback taking ~2000ms (https://dxr.mozilla.org/mozilla-beta/source/toolkit/content/customElements.js#221), then creating the actual interface proxy taking ~33ms (https://dxr.mozilla.org/mozilla-beta/source/toolkit/content/customElements.js#236), and then calls into the proxy taking ~1100ms (https://dxr.mozilla.org/mozilla-beta/source/toolkit/content/customElements.js#246).

I guess somehow the Linux download window is triggering a boatload of calls to and from JS. Need to look into this further to see what's triggering them.

On an Ubuntu VM with 65, when I download a file like https://ftp.mozilla.org/pub/firefox/releases/0.10.1/Firefox%201.0PR.dmg.gz I'm not seeing any jank.

Does this happen only with certain file types? I'm wondering if it's related to the number of applications in the "Open with" list, or something like that.

Flags: qe-verify+
Flags: needinfo?(jpegxguy)

I'm seeing some strange results in https://perfht.ml/2W2Uejq. There's a call to nsViewManager::DispatchEvent, followed by about a billion calls to std::deque::_M_reallocate_map, followed by calls to LifecycleGetCustomInterfaceCallback (the thing that got replaced with a C++ call in Bug 1492326). I wonder if we are still having the calls to std::deque::_M_reallocate_map but the C++ Qi is just a lot faster, so it isn't noticeable.

I've also requested some help from QA to see if they can reproduce this on 65.

(In reply to Brian Grinstead [:bgrins] from comment #33)
> On an Ubuntu VM with 65, when I download a file like https://ftp.mozilla.org/pub/firefox/releases/0.10.1/Firefox%201.0PR.dmg.gz I'm not seeing any jank.
>
> Does this happen only with certain file types? I'm wondering if it's related to the number of applications in the "Open with" list, or something like that.

I tried that download, still had the same problem described here. Only this time I didn't force quit and waited instead. I got a funny message about an unresponsive script, it said "Script: chrome://global/content/customElements.js:184"
By the way, how do I attach an image to my reply?

(In reply to bad.and.ugly from comment #36)
> By the way, how do I attach an image to my reply?

Thanks, that would help. Click the "Attach File" link near the top of the page, which should take you to https://bugzilla.mozilla.org/attachment.cgi?bugid=1517101&action=enter.

(In reply to bad.and.ugly from comment #36)
> (In reply to Brian Grinstead [:bgrins] from comment #33)
> > On an Ubuntu VM with 65, when I download a file like https://ftp.mozilla.org/pub/firefox/releases/0.10.1/Firefox%201.0PR.dmg.gz I'm not seeing any jank.
> >
> > Does this happen only with certain file types? I'm wondering if it's related to the number of applications in the "Open with" list, or something like that.
>
> I tried that download, still had the same problem described here. Only this time I didn't force quit and waited instead. I got a funny message about an unresponsive script, it said "Script: chrome://global/content/customElements.js:184"

Also, how many items show up in the list of recommended applications for that file type?

No, it happens 100% of the time, with varying delays after the click. I forgot to mention it happens with other dialogs as well, like View Page Info. It's something in the way Firefox handles its own native dialogs.

Flags: needinfo?(jpegxguy)

I tried reproducing the issue on Firefox 64.0b3, 64.0b5, 65.0b5 and 65.0b11 under Ubuntu 18.04 (x64) and Linux Mint 19.1 (x64), but without success. I tried under normal circumstances, with clean profiles, and in safe mode; every time the browser behaved correctly, without any kind of freeze.

Flags: qe-verify+

I confirm this problem in Mint 19.1 (Firefox 64).

I will add one piece of information, though I don't know if it is related. If I go to

Preferences -> Connection Settings

and try to change the "Configure Proxy..." option, there is quite a delay when clicking the options.

Actually, there is a delay with any of the selection bullets (radio buttons) in Preferences. It seems to be the same issue,
related not to downloads, file systems, etc., but to the selection form.

One more piece of information: after changing some of the selection options that have this delay, the whole Firefox window becomes slow. Just try going back and forth between tabs after that. There is clearly some overhead caused by the selection box that is slowing everything down.

Yeah it's found throughout the Firefox UI

I still can't reproduce the problem, I tried all the steps mentioned above by leandro and still nothing, the browser behaves correctly.

I have this problem in a Samsung S51 Pro notebook (which has a Nvidia GeForce card and a SSD drive).

I have Mint 19.1 installed in a much slower machine (a small Nuc box), and I don't see the problem there.

I do not know if this is of any help.

(In reply to Catalin Sasca, QA [:csasca] from comment #44)
> I still can't reproduce the problem, I tried all the steps mentioned above by leandro and still nothing, the browser behaves correctly.

Seeing as it is a performance problem, maybe your specific machine doesn't mind the load

(In reply to jpegxguy from comment #46)
> (In reply to Catalin Sasca, QA [:csasca] from comment #44)
> > I still can't reproduce the problem, I tried all the steps mentioned above by leandro and still nothing, the browser behaves correctly.
>
> Seeing as it is a performance problem, maybe your specific machine doesn't mind the load

Is there anything that stands out with your system or hard drive that might help for reproducing? See in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1517101#c4 and https://bugzilla.mozilla.org/show_bug.cgi?id=1517101#c7

So, I booted from USB with the latest Manjaro Xfce preview image. Both 64.0.2 and 65.0b11: No issues.
I guess we know that my hardware isn't the problem.

Interesting. Running firefox with firejail solves this as well (or at least it's faster; visually I can't say it's slower than normal). Seems like firefox is looking for something that firejail prevents it from seeing?

If it's something filesystem related, this is what firefox sees in a firejail: https://firejail.wordpress.com/documentation-2/firefox-guide/#security

(In reply to Sylvestre Ledru [:sylvestre] from comment #7)
> Do you see anything interesting if you run
> strace -p <pid> -f -o foo.log ?

jpegxguy, could you try running this during the hang and post your log?

Flags: needinfo?(jpegxguy)

I think I get these messages

[pid 28336] madvise(0x7f2e0ab02000, 32768, MADV_DONTNEED) = 0

in the instant the problem is manifested.
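
To check whether madvise really dominates during the hang, rather than just standing out by eye, the captured log could be tallied per syscall. This is a minimal sketch, assuming each log line starts with an optional pid prefix followed by the syscall name; only the first sample line comes from this report, the other two are made up for illustration.

```python
import re
from collections import Counter

# Optional "[pid N] " or bare "N " prefix, then the syscall name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[pid 28336] madvise(0x7f2e0ab02000, 32768, MADV_DONTNEED) = 0",
    "[pid 28336] madvise(0x7f2e0ab0a000, 32768, MADV_DONTNEED) = 0",  # hypothetical
    "[pid 28340] futex(0x7f2e0c001000, FUTEX_WAKE_PRIVATE, 1) = 1",  # hypothetical
]
counts = count_syscalls(sample)
print(counts.most_common())
```

For a real log, pass the open file object instead of `sample`. Lines that do not start with a syscall name, such as resumed calls or signal notices, are simply skipped.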

Attachment #9038278 - Flags: feedback+

Moving this to fix-optional, as 65 ships in 1 week and this (existing) regression isn't going to be fixed by then.

Without a reproducible test case (or more than a couple of people able to reproduce), this also seems unlikely to be fixed in a 65.0.x dot release in the next few weeks (before 66 is on its way, which apparently fixes the issue entirely). But we'll see what happens.

I also get the madvise syscalls, precisely when the button press happens, whether that is clicking the RSS button in View Page Info or selecting to download instead of opening file

Flags: needinfo?(jpegxguy)

I am having the same problem on two different machines that are running Linux Mint 19 Cinnamon. I have included the system information for one device. Would the hardware information of both assist in reproducing this issue?

System: Host: MYHOST Kernel: 4.15.0-43-generic x86_64 bits: 64 gcc: 7.3.0
Desktop: Cinnamon 3.8.9 (Gtk 3.22.30-1ubuntu1) dm: lightdm Distro: Linux Mint 19 Tara

I have found a workaround. The single firejail option that mitigates this behavior is nodbus. Any idea why disabling D-Bus access fixes this problem?

This means that if you are affected by the problem you can run firefox like so:
firejail --noprofile --nodbus firefox
You can also edit the firefox.desktop file in /usr/share/applications. Of course, one can use firejail with its default firefox profile as it adds a nice layer of protection.
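
The .desktop edit mentioned above amounts to prefixing the launcher's Exec= command with the firejail invocation. Below is a rough sketch of that rewrite; the function name and the sample launcher contents are hypothetical, and it assumes the installed firejail accepts --nodbus. Working on a per-user copy of the launcher (for example under ~/.local/share/applications) rather than the system file is an assumption made here, not something stated in this report.

```python
def wrap_exec_with_firejail(desktop_text: str) -> str:
    """Prefix every Exec= command in a .desktop file with the firejail workaround."""
    prefix = "firejail --noprofile --nodbus "
    out = []
    for line in desktop_text.splitlines():
        # Skip lines that are already wrapped so the rewrite can be re-run safely.
        if line.startswith("Exec=") and not line.startswith("Exec=firejail "):
            line = "Exec=" + prefix + line[len("Exec="):]
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical launcher contents; a real firefox.desktop has more keys.
sample = "[Desktop Entry]\nName=Firefox\nExec=firefox %u\n"
patched = wrap_exec_with_firejail(sample)
print(patched)
```

As noted in the comment itself, this is a workaround with side effects, not a fix.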

jpegxguy, I cannot find information on launching firefox without d-bus access. It seems this option is unique to firejail and there is not a similar one in firefox. I don't believe editing the firefox.desktop file is a valid alternative.

(In reply to jgoulet1994 from comment #57)
> jpegxguy, I cannot find information on launching firefox without d-bus access. It seems this option is unique to firejail and there is not a similar one in firefox. I don't believe editing the firefox.desktop file is a valid alternative.

I can see why there isn't a firefox option for that. And yes, this is a workaround, and it has consequences. It's not a solution. I just wanted to share what I did so that I can still use Firefox instead of Nightly on a daily basis

(In reply to jpegxguy from comment #56)
> I have found a workaround. The single firejail option that mitigates this behavior is nodbus. Any idea why disabling D-Bus access fixes this problem?
>
> This means that if you are affected by the problem you can run firefox like so:
> firejail --noprofile --nodbus firefox
> You can also edit the firefox.desktop file in /usr/share/applications. Of course, one can use firejail with its default firefox profile as it adds a nice layer of protection.

I do not understand this suggestion. firejail was not installed on my computer, and after installing it, I get:

% firejail --noprofile --nodbus firefox
Error: invalid --nodbus command line option

Maybe the firejail your distribution has in its repositories comes without the nodbus option.

Duplicate of this bug: 1523320

Is there any update on the status of this bug?

At least Firefox Developer Edition is at v66, I use that now.

(In reply to jpegxguy from comment #63)
> At least Firefox Developer Edition is at v66, I use that now.

Yeah - I'm sorry this is happening and appreciate all the extra information folks have given here, but until we can get a reproducible test case there's not a lot that can be done for 65. In the meantime, I'd suggest either using Beta / Developer Edition or using the workaround in Comment 56 if that works locally.

To tell you the truth, I'll use Firefox Dev as my main; I like new features, and it's in my repos :)

(In reply to Brian Grinstead [:bgrins] from comment #34)
> I'm seeing some strange results in https://perfht.ml/2W2Uejq. There's a call to nsViewManager::DispatchEvent, followed by about a billion calls to std::deque::_M_reallocate_map, followed by calls to LifecycleGetCustomInterfaceCallback (the thing that got replaced with a C++ call in Bug 1492326). I wonder if we are still having the calls to std::deque::_M_reallocate_map but the C++ Qi is just a lot faster, so it isn't noticeable.

Neil, does this profile help figure out what's going on here? Are we confident there's nothing left to fix after bug 1492326?

Jan, any idea what the connection with dbus is here?

Going to mark P3 while we're still no closer to reliable STR for now.

Flags: needinfo?(jhorak)
Flags: needinfo?(enndeakin)
Priority: -- → P3

Hello,

As a user, do you know how we can help with the investigation?

Thanks

(In reply to akred from comment #67)
> Hello,
>
> As a user, do you know how we can help with the investigation?
>
> Thanks

It's still not clear why this is reproducing for some people and not others. It doesn't make a lot of sense to me that this is particularly bad for some users and doesn't happen at all for others; the effects described (hangs for minutes / lots of seconds) are too severe to be explained by purely performance differences between machines. So there must be some other difference; I just don't know what that is.

It's also as-yet unclear what the connection is between the dbus disabling and the bugs that regressed and fixed this issue. That is, the regressing/fixing bugs change the <radio> elements in the dialog that pops up asking about saving/opening files. The dbus code is presumably only used for checking what the default handler app is on your OS (in the context of that dialog, that is; we obviously might use dbus for other stuff in other places).

On builds where the issue is fixed (66 beta and 67 nightly), are there a lot of entries in the "open with" dropdown under the "open" option for files where this happens on broken builds, or something? Do all those entries show up when you download something with mimetype application/octet-stream as well, and in that case, are those downloads showing the same problem on affected/broken builds?

Flags: needinfo?(akred)
Duplicate of this bug: 1524540
Duplicate of this bug: 1524582

Putting it back on the release managers' radar because of the number of dups

Flags: needinfo?(ryanvm)

Changing the priority to P1 as the bug is tracked by a release manager for the current release.
See How Do You Triage for more information

Priority: P3 → P1

Hello,

I have just tested with the latest Firefox 65 (64-bit) with many types of files (mp3 / mp3 / txt, etc.) and all of them show the same issue.
For example, for a PDF, the "open with" list has only 2 entries: "PDF viewer" (the Linux Mint default) and the "other" option.
(If it helps, in this case, if I select "other", I get 6 entries, which are winebrowser / Document View / 2 entries of Image Magick / Master PDF editor.)

But if you say that the issue is fixed in Firefox 66, that's good news, no?

(In reply to :Gijs (he/him) from comment #68)
> (In reply to akred from comment #67)
> > Hello,
> >
> > As a user, do you know how we can help with the investigation?
> >
> > Thanks
>
> It's still not clear why this is reproducing for some people and not others. It doesn't make a lot of sense to me that this is particularly bad for some users and doesn't happen at all for others; the effects described (hangs for minutes / lots of seconds) are too severe to be explained by purely performance differences between machines. So there must be some other difference; I just don't know what that is.
>
> It's also as-yet unclear what the connection is between the dbus disabling and the bugs that regressed and fixed this issue. That is, the regressing/fixing bugs change the <radio> elements in the dialog that pops up asking about saving/opening files. The dbus code is presumably only used for checking what the default handler app is on your OS (in the context of that dialog, that is; we obviously might use dbus for other stuff in other places).
>
> On builds where the issue is fixed (66 beta and 67 nightly), are there a lot of entries in the "open with" dropdown under the "open" option for files where this happens on broken builds, or something? Do all those entries show up when you download something with mimetype application/octet-stream as well, and in that case, are those downloads showing the same problem on affected/broken builds?

Flags: needinfo?(akred)

(In reply to akred from comment #73)
> I have just tested with the latest Firefox 65 (64-bit) with many types of files (mp3 / mp3 / txt, etc.) and all of them show the same issue.
> For example, for a PDF, the "open with" list has only 2 entries: "PDF viewer" (the Linux Mint default) and the "other" option.
> (If it helps, in this case, if I select "other", I get 6 entries, which are winebrowser / Document View / 2 entries of Image Magick / Master PDF editor.)
>
> But if you say that the issue is fixed in Firefox 66, that's good news, no?

Could you please test locally with Beta (66) and confirm that the issue is fixed for you?

Hello Brian,

I have just tested with firefox 66 beta4, and the issue is fixed !!!
We just have to wait for the stable release ;)

Thank you !

(In reply to Brian Grinstead [:bgrins] from comment #74)
> (In reply to akred from comment #73)
> > I have just tested with the latest Firefox 65 (64-bit) with many types of files (mp3 / mp3 / txt, etc.) and all of them show the same issue.
> > For example, for a PDF, the "open with" list has only 2 entries: "PDF viewer" (the Linux Mint default) and the "other" option.
> > (If it helps, in this case, if I select "other", I get 6 entries, which are winebrowser / Document View / 2 entries of Image Magick / Master PDF editor.)
> >
> > But if you say that the issue is fixed in Firefox 66, that's good news, no?
>
> Could you please test locally with Beta (66) and confirm that the issue is fixed for you?

Duplicate of this bug: 1524859

The patches within bug 1492326 are too large to get uplifted to 65 and it is unclear what within that patchset fixed this. At this point in 65 we're unlikely to fix this bug, and since this bug is not present in 66 there is no remaining action to be taken for this bug.

Status: NEW → RESOLVED
Closed: 6 months ago
Flags: needinfo?(ryanvm)
Flags: needinfo?(jhorak)
Flags: needinfo?(enndeakin)
Resolution: --- → FIXED
Whiteboard: [fixed by bug 1492326]
Duplicate of this bug: 1528584
Duplicate of this bug: 1514692

I'm adding this as a known issue to the Fx65 relnotes. We should probably add a note for 66 as well that it's fixed.

relnote-firefox: --- → ?
Flags: needinfo?(lhenry)

Noted in 66 beta, and I'll be sure to bring that into release as well.

Flags: needinfo?(lhenry)

The release note at https://www.mozilla.org/en-US/firefox/66.0beta/releasenotes/ got mangled. It's displaying like this:

Fixed an performance issue some Linux users experienced with the Downloads panel ([bug 1517101][1]). [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1517101

Fixed, thanks!

Duplicate of this bug: 1536707