Open Bug 1508551 Opened 9 months ago Updated 5 months ago

back and forward buttons not working when dragging files into the window

Categories

(Core :: DOM: Content Processes, defect, P3)

defect

Tracking

()

Tracking Status
firefox-esr60 --- fix-optional
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- fix-optional

People

(Reporter: felix.bau, Unassigned)

References

Details

(Keywords: regression)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

opened some random page
(optional: went a page backwards)

Dragged some file onto it that Firefox can't open (default behavior is download):
like the Firefox installer or a wave file

I click abort (no download)


Actual results:

The forward and backward buttons aren't working anymore.

Workaround for now:
Press and hold the button you want to use to get the last back or forward pages and then select the one you want to go to

This bug exists in Release and Nightly
Version: 63 Branch → Trunk
Status: UNCONFIRMED → NEW
Ever confirmed: true
I'm guessing this is a problem at the DOM level with process switching, and not sandboxing itself, because of how it affects browser history (but the DOM people can send the bug back if I'm wrong).
Component: Security: Process Sandboxing → DOM: Content Processes
Too late to fix in 64. Marking this issue as fix-optional for 65; if you land a patch in nightly and think it's low-risk for beta, please request uplift.

I am currently investigating this bug as a way to get introduced to the code base and, perhaps, make my first contribution. I have found my way to the bottom of the rabbit hole of opening files with the external app helper (etc) and now trying to put the pieces together to figure out where the navigation failure occurs. I hope that I can have something soon!

Will

(In reply to Will Hawkins from comment #4)

I am currently investigating this bug as a way to get introduced to the code base and, perhaps, make my first contribution. I have found my way to the bottom of the rabbit hole of opening files with the external app helper (etc) and now trying to put the pieces together to figure out where the navigation failure occurs. I hope that I can have something soon!

Will

I think that I tracked it down. It appears that the problem is (ultimately) that OnNewURI is not getting called for these type of drag/drop operations. That appears to be the result of the fact that there is no content viewer associated with them (no call to CreateContentViewer). The call to CreateContentViewer() is the function that invokes the OnLoadingSite callback which, in turn, calls OnNewURI which, finally, adds the entry to the session history.

Now that I think that I know the root source of the problem, I will go about adding an appropriate fix. I hope to have something done within a few days!

Will

Flags: needinfo?(jld)

(In reply to Will Hawkins from comment #5)

(In reply to Will Hawkins from comment #4)

I am currently investigating this bug as a way to get introduced to the code base and, perhaps, make my first contribution. I have found my way to the bottom of the rabbit hole of opening files with the external app helper (etc) and now trying to put the pieces together to figure out where the navigation failure occurs. I hope that I can have something soon!

Will

I think that I tracked it down. It appears that the problem is (ultimately) that OnNewURI is not getting called for these type of drag/drop operations. That appears to be the result of the fact that there is no content viewer associated with them (no call to CreateContentViewer). The call to CreateContentViewer() is the function that invokes the OnLoadingSite callback which, in turn, calls OnNewURI which, finally, adds the entry to the session history.

Now that I think that I know the root source of the problem, I will go about adding an appropriate fix. I hope to have something done within a few days!

To solve this, I think that the right solution is to replace the current page with about:blank. To do that, I am calling CreateAboutBlankContentViewer() in nsIDocShell. The only problem is that CreateAboutBlankContentViewer() specifically keeps itself from going into the history. I have created another nsIDocShell method that creates an about:blank content viewer that gets added to the history.

Does that sound like a reasonable solution? I would love anyone's input, please!

Will

Bob, can you help? This looks like it's due to the file process switch. Will has come up with some ideas in comment 5 and comment 6. If those seem like the right solution, perhaps you or Nika can help suggest how to implement them? I'm a little bit surprised in that I would have expected this to effectively be using the same process switching code as other cases where we load a non-same-process URL in a tab (e.g. clicking the home button on nightly with "normal" http pages loaded, or manually typing about:addons or about:preferences in the url bar and loading that in a tab with http pages), and restore history into the tab using the session store helpers. Not sure if docshell is the right place to add logic for this.

Flags: needinfo?(bobowencode)

(In reply to :Gijs (he/him) from comment #7)

Bob, can you help? This looks like it's due to the file process switch. Will has come up with some ideas in comment 5 and comment 6. If those seem like the right solution, perhaps you or Nika can help suggest how to implement them? I'm a little bit surprised in that I would have expected this to effectively be using the same process switching code as other cases where we load a non-same-process URL in a tab (e.g. clicking the home button on nightly with "normal" http pages loaded, or manually typing about:addons or about:preferences in the url bar and loading that in a tab with http pages), and restore history into the tab using the session store helpers. Not sure if docshell is the right place to add logic for this.

Just wanted to be clear: I no longer think that my suggestions for changing the nsIDocShell interface are the best way to handle this. I was just brainstorming. I do, however, think that "fixing" this in the URI Loader for contented handled by an external helper is the way to go.

As I said earlier, I have made some progress on applying a fix at that location (specifically, after the DoContent() call on the external helper service object in DispatchContent() in nsURILoader.cpp). The problem that I still have is that I cannot manipulate the history properly. That said, you are all way smarter and more familiar with the code than I and I would absolutely defer to your judgment on the position of a fix.

Thanks for helping me with this fix and for education me along the way!

Will

(I don't know anything useful about the DOM side of this, but it looks like it's being routed to people who can help.)

Flags: needinfo?(jld)

This is similar to bug 1175267 (and bug 1399574).

It is an underlying issue with the way we handle these situations where we process switch but the new process/remote type can't handle the mime type and sends it back up to the parent.
The file content process just makes it more likely to happen.

For example, if you turn off the file content process (browser.tabs.remote.separateFileUriProcess) and drag a file from where the sandbox doesn't block read access (say from under c:\Program Files\) then that does fix it.
However, if you go to a parent rendered page (like about:robots) or a web extension's moz-extension:// page and drag the same file it will still process switch and give you the same problem.

As I said in bug 1399574 comment 3, I don't think it makes sense for the child to be deciding that an external/different handler is needed. It should probably be done up front correctly in the parent process.

Nika - I know you're changing quite a lot around this for fission, is that by chance going to address this problem as well?

Will - great to see you digging into the problems here. If we were to still allow the child to make this decision, then I don't think that adding a history entry for the blank page would be the way to go. I think we would want to reload the original URI and cause the process to switch back. That's a bit clunky and as I say I think the parent should do the right thing up front.
I suspect that this isn't going to be a good first bug to work on. If you haven't already I would take a look at https://wiki.mozilla.org/Good_first_bug or go straight to https://wiki.mozilla.org/BugsAhoy.
Given that you've already been able to dig through this including in docshell (where most people fear to tread), I'm sure you'll find something to pick up really quickly.

Flags: needinfo?(bobowencode) → needinfo?(nika)

(In reply to Bob Owen (:bobowen) from comment #10)

This is similar to bug 1175267 (and bug 1399574).

It is an underlying issue with the way we handle these situations where we process switch but the new process/remote type can't handle the mime type and sends it back up to the parent.
The file content process just makes it more likely to happen.

For example, if you turn off the file content process (browser.tabs.remote.separateFileUriProcess) and drag a file from where the sandbox doesn't block read access (say from under c:\Program Files\) then that does fix it.
However, if you go to a parent rendered page (like about:robots) or a web extension's moz-extension:// page and drag the same file it will still process switch and give you the same problem.

As I said in bug 1399574 comment 3, I don't think it makes sense for the child to be deciding that an external/different handler is needed. It should probably be done up front correctly in the parent process.

Nika - I know you're changing quite a lot around this for fission, is that by chance going to address this problem as well?

Will - great to see you digging into the problems here. If we were to still allow the child to make this decision, then I don't think that adding a history entry for the blank page would be the way to go. I think we would want to reload the original URI and cause the process to switch back. That's a bit clunky and as I say I think the parent should do the right thing up front.
I suspect that this isn't going to be a good first bug to work on. If you haven't already I would take a look at https://wiki.mozilla.org/Good_first_bug or go straight to https://wiki.mozilla.org/BugsAhoy.
Given that you've already been able to dig through this including in docshell (where most people fear to tread), I'm sure you'll find something to pick up really quickly.

Thank you for the feedback Mr. Bowen. I would really like to stick with fixing this bug -- I'm starting as an employee at Mozilla next week and I've just loved digging in to this problem.

I'm sad to say that I do seem to understand a) your description of the problem and b) how you would solve it! I am curious to learn all the ins/outs so that I can get started with my work at Mozilla starting on day one! If you're willing to talk to me more about the problem and a potential solution, I'd really appreciate it!

Will

(In reply to Will Hawkins from comment #11)

(In reply to Bob Owen (:bobowen) from comment #10)
...
Thank you for the feedback Mr. Bowen. I would really like to stick with fixing this bug -- I'm starting as an employee at Mozilla next week and I've just loved digging in to this problem.

I'm sad to say that I do seem to understand a) your description of the problem and b) how you would solve it! I am curious to learn all the ins/outs so that I can get started with my work at Mozilla starting on day one! If you're willing to talk to me more about the problem and a potential solution, I'd really appreciate it!

OK great, I think Nika or someone else from the DOM team will be best able to advise on the way forward here.

My memory of this is a bit sketchy now, but here's what I can remember without too much re-investigating:

  • The original page (being normal http(s)) is handled by a normal child content process.
  • The dragged in file comes into the frontend JavaScript as a file:// URI, so that decides that it needs a file content process to handle the request (file content processes have full read access to be able to load file:// URI pages). So, the tab switches to a file content process and sends the request down.
  • After exhausting all the options in nsDocumentOpenInfo::DispatchContent, the file content process ends up at [1], which in the child process ends up in nsExternalHelperAppService::DoContentContentProcessHelper at [2].
  • The PExternalHelperAppChild that is created their is an IPDL actor (https://developer.mozilla.org/en-US/docs/Mozilla/IPDL/Tutorial) that is then used to handle the request, but basically just forwards is back up to the parent's corresponding PExternalHelperAppParent actor, which actually handles the request and eventually in this case presents the download dialog.
  • As part of the process switch, the tab loses the original page, but never reloads anything and so is broken and blank.

As part of handling the external helper in the parent, we could reload the original URI in the tab.
I had something a bit like this as a fix for bug 1175267, but you get a nasty flicker and we do a process switch (probably starting a new file content process) for no good reason.
I think it's better to decide up front that we need to handle things in the parent and short circuit the whole thing.

Hope that makes sense. :-)

[1] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/uriloader/base/nsURILoader.cpp#608
[2] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/uriloader/exthandler/nsExternalHelperAppService.cpp#642

(In reply to Bob Owen (:bobowen) from comment #12)

(In reply to Will Hawkins from comment #11)

(In reply to Bob Owen (:bobowen) from comment #10)
...
Thank you for the feedback Mr. Bowen. I would really like to stick with fixing this bug -- I'm starting as an employee at Mozilla next week and I've just loved digging in to this problem.

I'm sad to say that I do seem to understand a) your description of the problem and b) how you would solve it! I am curious to learn all the ins/outs so that I can get started with my work at Mozilla starting on day one! If you're willing to talk to me more about the problem and a potential solution, I'd really appreciate it!

OK great, I think Nika or someone else from the DOM team will be best able to advise on the way forward here.

My memory of this is a bit sketchy now, but here's what I can remember without too much re-investigating:

This all makes perfect sense and is exactly what I've deduced after a very long investigation (I look forward to knowing all that information from "My memory" like you, Mr. Bowen!).

  • The original page (being normal http(s)) is handled by a normal child content process.
  • The dragged in file comes into the frontend JavaScript as a file:// URI, so that decides that it needs a file content process to handle the request (file content processes have full read access to be able to load file:// URI pages). So, the tab switches to a file content process and sends the request down.

Here's what I have as the call stack at this point:

browser.handleDroppedLink (in base/content/browser.js)
tabbrowser.loadTabs (in base/content/tabbrowser.js) (we have replace set to true, ie we are replacing the content in the current tab)
browser._loadURI (in base/content/browser.js)
LoadInOtherProcess (in base/content/browser.js)

  • After exhausting all the options in nsDocumentOpenInfo::DispatchContent, the file content process ends up at [1], which in the child process ends up in nsExternalHelperAppService::DoContentContentProcessHelper at [2].

Yes!

  • The PExternalHelperAppChild that is created their is an IPDL actor (https://developer.mozilla.org/en-US/docs/Mozilla/IPDL/Tutorial) that is then used to handle the request, but basically just forwards is back up to the parent's corresponding PExternalHelperAppParent actor, which actually handles the request and eventually in this case presents the download dialog.

This is exactly how I understand it, too!

  • As part of the process switch, the tab loses the original page, but never reloads anything and so is broken and blank.

This is where I lose the thread (no pun intended). I can get all the way to the "bottom" of the call stack where the external content is handled, but ...

As part of handling the external helper in the parent, we could reload the original URI in the tab.

I don't quite understand where in the control flow the tab is unloaded.

I had something a bit like this as a fix for bug 1175267, but you get a nasty flicker and we do a process switch (probably starting a new file content process) for no good reason.
I think it's better to decide up front that we need to handle things in the parent and short circuit the whole thing.

This seems like a good solution. I will wait for Nika to weigh in before trying to look into a "real" solution. I will continue to play around, but I'll wait for her expert guidance to do anything that might look like a production-ready solution.

Hope that makes sense. :-)

Thank you for taking the time to explain this. I am glad that I am on the same page as you -- it validates my all the grokking I have done in the past several weeks!

Will

[1] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/uriloader/base/nsURILoader.cpp#608
[2] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/uriloader/exthandler/nsExternalHelperAppService.cpp#642

Thanks for helping out & doing so much research! Seems like you understand the issue here quite well.

Sorry for the delay in getting back to you here. This is a pretty gnarly issue which touches a lot of old & complex code which is pretty actively being changed & needs some improvement, so it's going to be hard for me to give an awesome answer as to what the best solution is.

In this case, what I think is happening is an unhandled link drop is detected upon the content area, causing the browser to decide to perform a load. That load is given a file:// URI to the file to load, which the browser detects cannot be loaded in the current content process. Due to this, it causes a process swap.

Unfortunately, our process-changing load implementation is still quite naive (although we're actively working on it!). With most loads, the existing document is kept around while the load occurs, and can be returned to if the load fails or is determined to be a download. Unfortunately, when doing a process swap, Firefox throws out the entire existing "browser", and creates a new one for the target process. This means that if the load immediately after a process swap triggers a download or fails, there's no existing document to return to!

The final solution to fix this issue in the general case is probably beyond the scope of this bug, as it involves changing the way we load documents fairly fundamentally. However, we may be able to fix this specific issue in the shorter term. That being said, I'm not sure whether it's worth the effort right now, given the ongoing work to improve process swapping for Fission.

If we decide to implement a workaround, the specific place we'd need to add the hacky check would probably be in https://searchfox.org/mozilla-central/rev/5c8ea961d04767db723a0a15e3a8f7fbca154129/browser/base/content/browser.js#1135-1140. Namely, we'd have to detect that the URL is a file:// URL, check the file on the filesystem, check if i looks like it would be downloaded not loaded, and if it would, just perform the download rather than cause the reload. This is technically racy because filesystems, but would probably handle the issue in most cases.

Flags: needinfo?(nika)
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.