Open Bug 1481994 Opened 6 years ago Updated 2 years ago

URL Spoofing by delaying a navigation and using the onbeforeunload dialog

Categories

(Core :: DOM: Navigation, defect, P2)

defect

Tracking

()

People

(Reporter: luan.herrera, Unassigned, NeedInfo)

References

()

Details

(Keywords: csectype-spoof, sec-moderate, Whiteboard: [reporter-external] [client-bounty-form] [verif?][secdom:beforeunload])

Attachments

(2 files, 1 obsolete file)

Attached file server.zip
I noticed that a redirect to a website that takes a long time to load cancels the onbeforeunload dialog when the navigation is completed. Knowing this, it's possible to craft a server that will stall the navigation up until the user tries to navigate to another website. This triggers the onbeforeunload dialog, which will then be canceled by the redirect and will end up leaving the URL the user was trying to access in the address bar, thus spoofing it.

Reproduced on Windows 10 and Ubuntu 16.04:
Version -> 61.0.2  (64-bit)
Version -> 62.0b15 (64-bit)

Steps to reproduce:
1. Download server.zip.
2. Extract the files into a folder, install NodeJS and the ExpressJS package (if you don't have it installed yet).
3. Run the server (nodejs index.js).
4. Access http://localhost and then click anywhere in the page.
5. Try to access a different website using the address bar.

Actual results:  
The address bar displays the URL of the site the user was trying to access with content controlled by the attacker.

Expected results:  
The onbeforeunload dialog shouldn't have been dismissed by the delayed navigation and the URL in the address bar should have been reverted to the one of the current page.

Here is an unlisted video simulating the attack:
https://www.youtube.com/watch?v=f0r4LgRFVAQ
Flags: sec-bounty?
Hi, sorry for CC'ing you, I don't know if this is the right way to go about it but I got your email from an old security bug I reported and was wondering if you could help me find someone to triage this bug, given Wennie was assigned as the triager of this report but seems to be inactive since 31/07.

Thanks!
Flags: needinfo?(past)
Johann should know who to route this to.
Flags: needinfo?(past) → needinfo?(jhofmann)
Thank you for reporting this and sorry for the delayed response.

I can confirm your STR, this is quite a nice and simple URL bar spoof. Does this technique also enable the site to "trap" the user on the page by repeatedly cancelling the onbeforeunload dialog?

It seems to me like the issue we're trying to fix is onbeforeunload getting cancelled by redirects (Chrome doesn't allow this, for instance), so I'm moving this to the right component.
Group: firefox-core-security → core-security
Status: UNCONFIRMED → NEW
Component: Security → Document Navigation
Ever confirmed: true
Flags: needinfo?(jhofmann)
Product: Firefox → Core
Johann: Yes, I believe it would be possible to "trap" the user.

The user wouldn't be able to:
1. Close the tab or the browser.
2. Leave the website using the address bar.
3. Use another tab (we could use the onblur event to call a redirect to trigger the onbeforeunload dialog which would then steal the focus to the attacker's page).
qDot, can you help triage this further? Thanks.

(FWIW, I suspect we have dupes of this, but the STR here might be better than what we have in the dupes...)
Group: core-security → dom-core-security
Flags: needinfo?(kyle)
Yeah, I think this is something we can deal with in loading portions of DocShell (which I'm deep in right now anyways), though it's going to take some work to figure out where the event issues are happening on our end, and how we can unwind from that. Unfortunately I'm getting to this a bit late and am on PTO thru next Tuesday.

I do think we should definitely get this fixed, but since it requires a specially crafted server to perform the stall, I don't think it's a P0 kinda deal. P2 should be good. I'm going to ni? :smaug on this since it's event cancellation related in DocShell, but if he's too busy I'll try to take a look when I get back next Tuesday (guidance would be appreciated if you've got any though :smaug! :) ). Should probably keep it as a security bug until fixed though.

I'm not aware of any dupes against currently untriaged docshell bugs, but I really didn't look too hard and I've only been the component manager for about 48 hours so far, so I'm not real familiar with our backlog yet.
Flags: needinfo?(kyle) → needinfo?(bugs)
Priority: -- → P2
(I don't expect to have time for this before tomorrow)
Flags: needinfo?(bugs)
Ok, back from PTO and working on this now. I have the STR working, and am trying to trace why a change to the document would cause the prompt to abort, and why we aren't resetting correctly after that.
Assignee: nobody → kyle
Some quick cross browser behavior checking, which gets interesting:

Chrome: Loads URL but still shows loading status spinner after page comes up. On page unload via URL entry, brings up dialog. If dialog cancelled, stays on initial page, fixes URL in URL bar. If dialog accepted, goes to requested page correctly.

Edge: Loads URL, with no spinner shown so page looks completely loaded. On page unload via URL entry, brings up dialog. If dialog cancelled, goes to /redirect and shows spoofed page. If dialog accepted, goes to requested page correctly.

Safari: Cannot load initial page. Stays stuck in loading forever.

Firefox: Loads URL, with no spinner shown so page looks completely loaded. On page unload via URL entry, unload dialog flashes but is cancelled quickly by page content change. Rest of behavior follows from above.

Even if the content change is removed, and the dialog is shown but cancelled by the user, we still don't return the URL bar to its original state.
Doing more tracing to figure this out. The unload dialog is handled in Docshell and just returns NS_OK before we started the load if its cancelled (https://searchfox.org/mozilla-central/source/docshell/base/nsDocShell.cpp#10132). We get here via calling LoadURI (https://searchfox.org/mozilla-central/source/browser/base/content/urlbarBindings.xml#911).

So, it seems like this is a combination of a few bugs:

1. We show that the server has stopped loading even though the load is still happening
2. We have no way to signify that an unload has been cancelled above DocShell
3. Prompts can be cancelled via calls in onbeforeunload

I think bug 1 may be something in necko? Not sure what handles the "done loading" signals there and how that bubbles up to the GUI. 

To fix bug 2, we need to return an error from nsDocShell when a load is cancelled. There's logic in urlbarBindings.xml to revert the URL after that (https://searchfox.org/mozilla-central/source/browser/base/content/urlbarBindings.xml#992). This has, as far as I can tell, returned NS_OK since onbeforeunloadeevent was implemented in 2004, so I'm not sure how much depends on this or will break due to that. Going to run a try with it returning NS_ERROR_FAILURE and see what happens.

Fixing bug 3 is going to require XUL/widget knowledge, because I don't really have time to get into how nsPrompter works myself. I'll do some commit checking and see who might be worth pinging there.

We're aiming to fix a lot of this via fission work, but this is important enough that it'd be nice to have a stopgap until we have a clearer system in place. I think getting the DocShell error return fixed should get us a decent bit of the way there, and fixing 3 would be great too, though I'll let :johannh be the final judge of that. :)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #10)
> 3. Prompts can be cancelled via calls in onbeforeunload

> Fixing bug 3 is going to require XUL/widget knowledge, because I don't
> really have time to get into how nsPrompter works myself. I'll do some
> commit checking and see who might be worth pinging there.

I know a fair amount about onbeforeunload handling and nsPrompter, but I'm not clear what the sentence you've got for (3) means. can you clarify what the issue is?
Flags: needinfo?(kyle)
:gijs The PoC in this bug has a onbeforeunloadevent handler that looks like this (where "status" is the ID of an img tag):

onbeforeunload = function() {
	document.getElementById("status").src = "/status";
	return "";
}

The src set for the image tag is what causes the unload prompt to disappear. I'm not quite sure where that cancellation comes from though.
Flags: needinfo?(kyle) → needinfo?(gijskruitbosch+bugs)
If a user cancels an unload, revert the location in the URL bar.
Ran a try with the patch I just attached. Looks ok, pretty sure the browser_documentnavigation.js failures are another intermittent but will do a bit of extra checking there.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a921a12860d6ab7f871c35afc987347192a215f&selectedJob=197751597
Comment on attachment 9007024 [details]
Bug 1481994 - Revert URL in location bar when unload cancelled

Olli Pettay [:smaug] has approved the revision.
Attachment #9007024 - Flags: review+
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #12)
> :gijs The PoC in this bug has a onbeforeunloadevent handler that looks like
> this (where "status" is the ID of an img tag):
> 
> onbeforeunload = function() {
> 	document.getElementById("status").src = "/status";
> 	return "";
> }
> 
> The src set for the image tag is what causes the unload prompt to disappear.
> I'm not quite sure where that cancellation comes from though.

AFAICT the issue is that the test uses the `load` event to trigger a different load. The sequence of events ends up being:

1. load the original page
2. user tries to leave by going to mozilla.org / google.com / whatever
3. we fire the beforeunload event
4. the page triggers a load in the image
5. we finish firing beforeunload event and show the dialog to the user because of the return value of the beforeunload event handler
6. the `load` event fires, triggering another navigation in the onload handler that the page defines (to http://localhost/redirect)
7. we trigger a `pagehide` for the original page because of the new navigation
8. that closes the beforeunload dialog here: https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/toolkit/components/prompts/src/nsPrompter.js#384-400 (through the MM to here: https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/browser/modules/RemotePrompt.jsm#50-60 ).

IMO (and I haven't looked at the specs, so apply liberal amounts of salt) we go wrong at step 6. I think ideally we shouldn't allow new events (or timeouts or anything else) to fire while the beforeunload dialog is up. We should just queue up everything and fire it after the dialog goes away, or something.

We've previously added code that stopped dialogs, and I thought we actually had code that stopped navigation while the dialog was up ( https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/docshell/base/nsDocShell.cpp#650-659 ). It's possible we now only do this while the event is on the stack, or something, and not while the dialog is up (after the event is off the stack) and that's why this is broken? I'm not sure.

As an aside, we will need to be careful about how to address frames here. I *think* as soon as a beforeunload dialog is up anywhere in the DOM <iframe>/<frameset> tree, we should be disabling navigation everywhere in that same-type-docshell-tree (so both above and below in that tree). There may be a separate bug lurking there, and it may or may not be on file...

In the past, disabling all JS in that docshell tree while the dialog was up has been difficult because the tab-modal dialogs use a spinning event loop to work. I don't know if that's something we can do better at here.

Passing this back to qDot given the above...
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(kyle)
Ok, so I guess the question is, how do we want to take care of the current security issue? I think at least removing the spoof (which the patch has been approved for, just needs a test now) is a good start, because figuring out the rest of this is going to take a while, especially with the many changes to DocShell that are happening at the moment. I can get that test done, land close/this (and maybe follow up on other similar bugs), then break Comment 16 out into another non security bug and continue work?
Flags: needinfo?(kyle) → needinfo?(jhofmann)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #17)
> Ok, so I guess the question is, how do we want to take care of the current
> security issue? I think at least removing the spoof (which the patch has
> been approved for, just needs a test now) is a good start, because figuring
> out the rest of this is going to take a while, especially with the many
> changes to DocShell that are happening at the moment. I can get that test
> done, land close/this (and maybe follow up on other similar bugs), then
> break Comment 16 out into another non security bug and continue work?

Not sure if I should be the one making this call but I agree that breaking this spoof and following up with a deeper investigation in a different bug sounds like a good path forward.
Flags: needinfo?(jhofmann)
Got sidetracked on some fission stuff. Trying to get a test done for this then it should be ready to land. Hopefully in by tomorrow.
While trying to build a test for this (which is complicated since we need to stall the request), I'm seeing that my fix here no longer works either. We still return a failure, but for some reason it's not thrown through the XUL urlBar code, so the revert no longer happens. I have no idea why. Trying to figure things out now.
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #20)
> While trying to build a test for this (which is complicated since we need to
> stall the request), I'm seeing that my fix here no longer works either. We
> still return a failure, but for some reason it's not thrown through the XUL
> urlBar code, so the revert no longer happens. I have no idea why. Trying to
> figure things out now.

Anything we can help with?
Flags: needinfo?(kyle)
Sorry, got pulled off onto some fission work. Trying to get that landed now, going to get back to looking at this today hopefully, mostly trying to figure out why the prompt rejection error in DocShell (which would reset the URL bar) seems to fire after we've already redirected.
Flags: needinfo?(kyle)
Ok, finally getting back to this. 

Due to bouncing back and forth between bugs, I apparently got confused with the patch for this one and didn't realize that it only works in non-e10s instances, which isn't super helpful. In an e10s situation, we end up returning early from OpenTrustedLinkIn in _loadURL in the urlbarBindings.xml XUL, well before InternalLoad has a chance to fail on the prompt.

Now to figure out how to get this to work across e10s, because the fact that openTrustedLinkIn always succeeds is definitely an issue.
Ok, in the e10s case, our problem ends up being here:

https://searchfox.org/mozilla-central/source/toolkit/actors/WebNavigationChild.jsm#43

Nothing actually checks whether loadURI fails or throws here, and it's triggered by SendMessage(), so we're out of the execution context that had the URI revert on exception in it anyways. I've figured out ways to force a URI revert after this, so now it's just a matter of figuring out how to wire this up correctly to at least get the URL back to what it was, as this is called in instances that may not want a revert on failure.
Still trying to figure out the best path to take here.

The problem here is that our LoadURI call happens behind an IPC SendMessage call, and we lose the exception thrown from LoadURI due to that (see Comment 24). I'm trying to get this error propagated out to where browser.js could check for it, but that's proving difficult.

Currently, after the LoadURI failure happens, we still get an onLoad event call, which in turn calls browser.js's OnLocationChange handler. In that handler, URLBarSetURI is called with the correct URL (that of the server holding the stalled request), but since gBrowser.userTypedValue isn't null, we just accept what's in the URL bar currently (whatever the user typed) as what things are supposed to be. 

If I could figure out a way to get to gBrowser, I could probably set the userTypedValue to null when LoadURI throws an error that does not also display an error page (as happens in urlbarBindings.xml), but I'm not sure how to get there from WebNavigationChild.jsm. I've got a valid docShell in the WebNavigationChild, though honestly I'm not sure trying to worm my way through to gBrowser that was is a good idea anyways. Any ideas?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #25)
> Still trying to figure out the best path to take here.
> 
> The problem here is that our LoadURI call happens behind an IPC SendMessage
> call, and we lose the exception thrown from LoadURI due to that (see Comment
> 24). I'm trying to get this error propagated out to where browser.js could
> check for it, but that's proving difficult.
> 
> Currently, after the LoadURI failure happens, we still get an onLoad event
> call, which in turn calls browser.js's OnLocationChange handler. In that
> handler, URLBarSetURI is called with the correct URL (that of the server
> holding the stalled request), but since gBrowser.userTypedValue isn't null,
> we just accept what's in the URL bar currently (whatever the user typed) as
> what things are supposed to be. 
> 
> If I could figure out a way to get to gBrowser, I could probably set the
> userTypedValue to null when LoadURI throws an error that does not also
> display an error page (as happens in urlbarBindings.xml), but I'm not sure
> how to get there from WebNavigationChild.jsm. I've got a valid docShell in
> the WebNavigationChild, though honestly I'm not sure trying to worm my way
> through to gBrowser that was is a good idea anyways. Any ideas?

gBrowser is in the parent process, your docshell is in the child process. So you're going to need to IPC something, somehow, or determine something in the parent from IPC that's already there.

Adding IPC would be something like sending RemoteWebNavigation:LoadURIThrew from the loadURI implementation, and listen for it in browser.xml , and clear userTypedValue iff the last triggered load is the same as the one that threw (to avoid ordering issues if there are 2 loads in quick succession). You'd need some kind of counter/ID to track things.

However, generally, I'd expect the userTypedValue to be cleared by the successful load of the next page - https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/browser/base/content/tabbrowser.js#5010-5015 didStartLoadSinceLastUserTyping should be true here, because the load was started much later than the (user-triggered) load that triggered the beforeunload dialog.

Why aren't we hitting that code?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(kyle)
We never make it there because we never officially start the load. _startedLoadSinceLastUserTyping is set at https://searchfox.org/mozilla-central/source/browser/base/content/tabbrowser.js#4849, but tabbrowser.js's OnStateChange is never called with STATE_START in this instance, so we never get there. This is most likely because we would need to get farther than we do in nsDocShell::InternalLoad for that to happen. 

However gross it may be, I'm not really seeing a way around having to bounce IPC messages back and forth at the moment in order to relay the error back to the URL Bar. We could maybe add some sort of "load aborted" state to nsIWebProgressListener and cleared based on that, but it's hard to say where all that would need to go in the loading logic.
Flags: needinfo?(kyle)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #27)
> We never make it there because we never officially start the load.
> _startedLoadSinceLastUserTyping is set at
> https://searchfox.org/mozilla-central/source/browser/base/content/tabbrowser.
> js#4849, but tabbrowser.js's OnStateChange is never called with STATE_START
> in this instance, so we never get there. This is most likely because we
> would need to get farther than we do in nsDocShell::InternalLoad for that to
> happen. 

Ah, yes. Sorry, I forgot about that part, but now that you've said it I remember this is how things worked...

> However gross it may be, I'm not really seeing a way around having to bounce
> IPC messages back and forth at the moment in order to relay the error back
> to the URL Bar. We could maybe add some sort of "load aborted" state to
> nsIWebProgressListener and cleared based on that, but it's hard to say where
> all that would need to go in the loading logic.

Concur. I think we should be OK if we send a message back via the message manager, listen for that in browser.xml iff the browser's `isRemoteBrowser` prop is truthy, and as long as we can associate the exception with the URI (or some id) we attempted to load to avoid breaking quick subsequent loads if the content process is slow or something.
(In reply to :Gijs (he/him) from comment #28)
> (In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent
> commit message, automatic r-) from comment #27)
> > We never make it there because we never officially start the load.
> > _startedLoadSinceLastUserTyping is set at
> > https://searchfox.org/mozilla-central/source/browser/base/content/tabbrowser.
> > js#4849, but tabbrowser.js's OnStateChange is never called with STATE_START
> > in this instance, so we never get there. This is most likely because we
> > would need to get farther than we do in nsDocShell::InternalLoad for that to
> > happen. 
> 
> Ah, yes. Sorry, I forgot about that part, but now that you've said it I
> remember this is how things worked...

Wait, and then I re-read some more -- the *second* load is successful, though, right? That's kinda the point - that's the thing that gets the wrong URI in the address bar? Doesn't that send STATE_START ?
Uh. Hmm. You know, that's a good question, 'cause I wasn't seeing messages from that second load happen through the JS instrumentation (read: loads of dump statements) I added, which may possible be due to the stall introduced by the custom server we're using to repro this. I'll check back through this and see if I can figure out what's going on.
Did some more research to try and answer the question from Comment 29, here's a timeline of what's going on:

- We load the initial page (http://localhost:8000)
- During the call to onload, location.href is reset to another URL (http://localhost:8000/redirect)
- That URL stalls loading by not responding with data, which is why this requires server crafting to pull off. For reasons I'm not yet sure of, even though the response is stalled the UI makes the load look complete (no "X" cancel icon). Since we're already in the middle of the load though, this means we're after STATE_START, which is why the URL bar doesn't reset.
- Trying to navigate away calls onbeforeunload, which triggers an element in the page to load another URL that causes the /redirect request to complete, which is what updates the page, but not the URL bar.

The solution from Comment 28 should fix this specific case of the URL bar update, as we're now past the STATE_START check so we can't expect that logic to work. However, the end of the /redirect load causing the prompt to disappear also still seems bad. I'm not sure if we have a way to block the load from completing until the onbeforeunload prompt is cleared yet.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #31)
> Did some more research to try and answer the question from Comment 29,
> here's a timeline of what's going on:
> 
> - We load the initial page (http://localhost:8000)
> - During the call to onload, location.href is reset to another URL
> (http://localhost:8000/redirect)
> - That URL stalls loading by not responding with data, which is why this
> requires server crafting to pull off. For reasons I'm not yet sure of, even
> though the response is stalled the UI makes the load look complete (no "X"
> cancel icon). Since we're already in the middle of the load though, this
> means we're after STATE_START, which is why the URL bar doesn't reset.
> - Trying to navigate away calls onbeforeunload, which triggers an element in
> the page to load another URL that causes the /redirect request to complete,
> which is what updates the page, but not the URL bar.
> 
> The solution from Comment 28 should fix this specific case of the URL bar
> update, as we're now past the STATE_START check so we can't expect that
> logic to work. However, the end of the /redirect load causing the prompt to
> disappear also still seems bad. I'm not sure if we have a way to block the
> load from completing until the onbeforeunload prompt is cleared yet.

OK. So AIUI we have:

0) load attacker page
1) initial redirect triggered by the webpage, which stalls (but crucially, this attempt to navigate doesn't itself trigger beforeunload, presumably because it happens before the page is even loaded and/or before the beforeunload handler is attached?)
2) user tries to load some other page
3) load from (1) completes instead, dismissing the beforeunload dialog.

:qDot, is that right?

This is... messy. I don't know how we would stop the load from (1) based on the dialog - it's supposed to have been stopped before starting in the first place. But if you're asking: "can we detect we're in the middle of prompting for beforeunload, and wait for that or break the load", the answer is "yes". Another alternative would be that we somehow store the pending navigation and cancel it if a new one is started (ie the attempt from (2) should clobber (1) before even showing the beforeunload dialog).

I don't know enough about docshell to know what our options are. It looks to me like we're deliberately letting the unload happen ( https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/base/nsDocumentViewer.cpp#1268-1272 ) because otherwise users wouldn't be able to navigate the page while the dialog is up at all. I don't know if there's a sane way to distinguish these cases. Maybe :bz can help.
Flags: needinfo?(kyle)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bzbarsky)
What cases are we trying to distinguish?

When we say the load from (1) "completes", is that just it being canceled and therefore sending OnStopRequest?
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #33)
> What cases are we trying to distinguish?

Cases where the user navigates the page while the beforeunload dialog is up (which should just allow the unload and execute the user-requested navigation) vs. cases where the page navigates itself while the beforeunload dialog is up, esp. cases where that navigation started prior to the beforeunload event, but is completed once we're in the process of unloading for a user-generated request.

> When we say the load from (1) "completes", is that just it being canceled
> and therefore sending OnStopRequest?

I don't think so, because we end up with the contents of the target of that load in the browser window.

Specifically, the beforeunload handler in the page loaded at (0) (when you use the address bar to attempt to navigate) triggers the load of an image, which the server then uses to trigger completion of the stalled request (1) - this will all be happening while the beforeunload dialog is being put up, and I imagine then the page is allowed to unload (in order to show the response of (1) ) as a result of the code I linked in comment 32.

I'll let :qDot confirm.
Yeah that sounds correct to me.
Flags: needinfo?(kyle)
> Cases where the user navigates the page while the beforeunload dialog is up (which should just allow the unload and execute the user-requested navigation) vs. cases where the page navigates itself while the beforeunload dialog is up

Ah.

So in general, what we could do is that when a user navigation _starts_ while the beforeunload dialog is up we take down the dialog.  Then when the user navigation is ready, we just navigate away.  The problem is what happens if the page itself does a navigation that cancels the user one...

I don't have any great suggestions, short of flagging user loads somehow and treating them differently.  Most simply, we could check whether the triggering principal on the load is system, right?
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #36)
> I don't have any great suggestions, short of flagging user loads somehow and
> treating them differently.  Most simply, we could check whether the
> triggering principal on the load is system, right?

Yes, that should work, in theory. :-)
I'm not real clear on what next steps here are, as I'm still not exactly sure where the onbeforeunload dialog is taken down on the load being finished that's triggered during the event. My read on this is that we want to only take down the dialog at that point if it's a user triggered load (not page triggered), but I've gotten lost trying to find where to do that inside of the Prompt code.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Kyle Machulis [:qdot] [:kmachulis] (if a patch has no decent commit message, automatic r-) from comment #38)
> I'm not real clear on what next steps here are, as I'm still not exactly
> sure where the onbeforeunload dialog is taken down on the load being
> finished that's triggered during the event. My read on this is that we want
> to only take down the dialog at that point if it's a user triggered load
> (not page triggered), but I've gotten lost trying to find where to do that
> inside of the Prompt code.

I think my thinking was we should alter https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/base/nsDocumentViewer.cpp#1268-1272 to be told the triggering principal for the new load that's prompted the unload, and if mInPermitUnload or mInPermitUnloadPrompt are true, and the triggering principal isn't the system principal, we should set *aPermitUnload to false instead of true, before returning.

However... re-reading the comments, I'm not sure we actually hit that code at point (3)  in comment #32, that is, for the page-triggered load, did we pass the "can we unload" check way earlier, or do we only start unloading the original page when the request to `redirect` completes, and then return early from the referenced code because the beforeunload dialog is already showing because of the user-triggered load (which is what I suspect...)?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(kyle)
This bug seems somewhat related to bug 1505389 and bug 1418134. It looks like we've got a lot of places where loads can be cancelled while other things might be happening, and how/where those loads are cancelled may affect things different.

In particular, I'm curious about https://bugzilla.mozilla.org/show_bug.cgi?id=1505389#c2. Step 7 in https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigating-across-documents says 

"Cancel any preexisting but not yet mature attempt to navigate browsingContext"

In the context of this bug, when we try to navigate elsewhere, we're already in the middle of a load. Doesn't this mean we should just cancel the load we're in the middle of, before onbeforeunload is called? If we cancel that load before the "finish loading" trigger happens in onbeforeunload, we should continue on as normal with the unload prompt not being cancelled. 

(Let me know if I need to restate this, having a problem making it sound coherent :/ )
Flags: needinfo?(kyle) → needinfo?(bzbarsky)
> "Cancel any preexisting but not yet mature attempt to navigate browsingContext"

Sure.  That's step 7.  Beforeunload fires in step 5.

> Doesn't this mean we should just cancel the load we're in the middle of, before onbeforeunload is called?

I don't think so, because step 7 comes after step 5...
Flags: needinfo?(bzbarsky)
And just to be clear, the spec has no reentry while the beforeunload dialog is up.  While it's up, the event  loop is completely blocked in the spec, with no event delivery of any sort happening.

I talked to :nika today and came up with a possible solution involving "pausing" loadgroups during InternalLoad. It's going to take some testing and special casing for things like sync XHRs, but I'm hoping it's viable as a fix to this and other issues.

Suspend the docshell loadgroup while running the PermitUnload query
while loading a page. This will stop the permission dialog from being
dismissed.

Added a new patch for pausing the loadgroup while we query the user. This seems to both fix this problem and passes try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab37a1457a940537f2438ae3ab07fc3e65d9e356

I still need to integrate the load stalling test from this bug as an sjs test.

Attachment #9040220 - Attachment is obsolete: true
Assignee: kyle → nobody

Hi! Do you know who to route this to so a new owner can be found? Thanks!

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Luan Herrera from comment #46)

Hi! Do you know who to route this to so a new owner can be found? Thanks!

I'm not best placed for this, unfortunately. Nika?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(nika)

It looks like, looking at the spec, the correct behaviour here would be to cancel the existing fetch before we ever display the prompt to unload. That should theoretically prevent this issue from occurring, as we won't be able to cause the original load to finish while this load is ongoing.

I don't have a ton of extra time, but I'll try to find some time to look into this soon, and see how much doing that breaks.

Step 9 of https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigate:

  1. Cancel any preexisting but not yet mature attempt to navigate browsingContext, including canceling any instances of the fetch algorithm started by those attempts. If one of those attempts has already created and initialized a new Document object, abort that Document also. (Navigation attempts that have matured already have session history entries, and are therefore handled during the update the session history with the new page algorithm, later.)

  2. Prompt to unload the active document of browsingContext. If the user refused to allow the document to be unloaded, then return.

Assignee: nobody → nika
Flags: needinfo?(nika)

(In reply to :Nika Layzell (ni? for response) from comment #48)

It looks like, looking at the spec, the correct behaviour here would be to cancel the existing fetch before we ever display the prompt to unload. That should theoretically prevent this issue from occurring, as we won't be able to cause the original load to finish while this load is ongoing.

I'm confused. IIRC the additional fetch is happening after the unload dialog has shown up (cf. comment #41). That is, AIUI the attempt (by the user) to navigate starts, and as part of step 10 in that attempt to navigate, we fire an event that triggers an image fetch (which doesn't itself navigate so wouldn't be cancelled by step 9 as quoted below, AIUI - and anyway, happens after step 9), and then while still in step 10 (because tab-modal prompts don't stop events/timeouts/... firing inside that docshell tree) the load of the image trips an event listener which starts a second attempt to navigate, while still in step 10 (because the dialog is up so the original attempt to navigate has not moved forward while all this happened). See comment #41 and comment #42.

So I don't think doing what you suggest would help with this bug... Am I missing something?

Step 9 of https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigate:

  1. Cancel any preexisting but not yet mature attempt to navigate browsingContext, including canceling any instances of the fetch algorithm started by those attempts. If one of those attempts has already created and initialized a new Document object, abort that Document also. (Navigation attempts that have matured already have session history entries, and are therefore handled during the update the session history with the new page algorithm, later.)

  2. Prompt to unload the active document of browsingContext. If the user refused to allow the document to be unloaded, then return.

Flags: needinfo?(nika)

Nevermind, I misunderstood. Looking into other solutions.

Flags: needinfo?(nika)

Could we leverage whatever the JS debugger does when you're paused on a breakpoint to just stop scheduling of event/timeout/microtask runs in the given docshell tree while a tab-modal dialog is up? AIUI that would fix this...

(In reply to :Gijs (he/him) from comment #49)

I'm confused. IIRC the additional fetch is happening after the unload dialog has shown up (cf. comment #41). That is, AIUI the attempt (by the user) to navigate starts, and as part of step 10 in that attempt to navigate, we fire an event that triggers an image fetch (which doesn't itself navigate so wouldn't be cancelled by step 9 as quoted below, AIUI - and anyway, happens after step 9), and then while still in step 10 (because tab-modal prompts don't stop events/timeouts/... firing inside that docshell tree) the load of the image trips an event listener which starts a second attempt to navigate, while still in step 10 (because the dialog is up so the original attempt to navigate has not moved forward while all this happened). See comment #41 and comment #42.

Actually, I think the second attempt to navigate while the first attempt is still waiting for the prompt should be aborted by step 5:

  1. If the prompt to unload algorithm is being run for the active document of browsingContext, then return without affecting the prompt to unload algorithm.

A.K.A. the second load attempt would be ignored before even starting. I think the current checks for stuff like this (which appear to be occurring a bit late in our impl) are doing the opposite - allowing the unload if we're already within a permitUnload block. https://searchfox.org/mozilla-central/rev/a777ff11b6d700a698c61e5bd17e73b044304494/layout/base/nsDocumentViewer.cpp#1269-1273

I might still be missing something though - I haven't spent as long looking at this stuff as others. (ni? :bz for potential insight)

Flags: needinfo?(bzbarsky)

(In reply to :Nika Layzell (ni? for response) from comment #52)

(In reply to :Gijs (he/him) from comment #49)

I'm confused. IIRC the additional fetch is happening after the unload dialog has shown up (cf. comment #41). That is, AIUI the attempt (by the user) to navigate starts, and as part of step 10 in that attempt to navigate, we fire an event that triggers an image fetch (which doesn't itself navigate so wouldn't be cancelled by step 9 as quoted below, AIUI - and anyway, happens after step 9), and then while still in step 10 (because tab-modal prompts don't stop events/timeouts/... firing inside that docshell tree) the load of the image trips an event listener which starts a second attempt to navigate, while still in step 10 (because the dialog is up so the original attempt to navigate has not moved forward while all this happened). See comment #41 and comment #42.

Actually, I think the second attempt to navigate while the first attempt is still waiting for the prompt should be aborted by step 5:

  1. If the prompt to unload algorithm is being run for the active document of browsingContext, then return without affecting the prompt to unload algorithm.

A.K.A. the second load attempt would be ignored before even starting. I think the current checks for stuff like this (which appear to be occurring a bit late in our impl) are doing the opposite - allowing the unload if we're already within a permitUnload block. https://searchfox.org/mozilla-central/rev/a777ff11b6d700a698c61e5bd17e73b044304494/layout/base/nsDocumentViewer.cpp#1269-1273

I might still be missing something though - I haven't spent as long looking at this stuff as others. (ni? :bz for potential insight)

FWIW, this sounds right to me. The reason we allow the unload atm is to make it so that users can close tabs (or navigate them) when the dialog is up, ie when the dialog is simply "in the way", attempting to close / navigate the tab the second time should Just Work.

We might be able to keep that if we inspect the triggering principal of the request to navigate, or we might be able to come up with a different way to make the "click the close button twice" usability thing work irrespective of how docshell behaves.

allowing the unload if we're already within a permitUnload block

That check is for whether the beforeunload stuff should run. It's a reentry prevention.

The "prevent navigation during beforeunload" bit is at https://searchfox.org/mozilla-central/rev/e18057a9613ffda06dfd3640209ca234ed7dc37d/docshell/base/nsDocShell.cpp#3716-3717 and makes nsDocShell::IsNavigationAllowed return false if beforeunload is firing.

So I just took a look at the spec, and now I'm confused. Either the spec changed or I misread it in comment 41. Probably the former, because the step numbers are all different.

At this point the spec says to cancel the old navigation in step 9 and fire beforeunload in step 10. See https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigate. Then if the beforeunload allows the unload all other fetches are canceled in step 11. If we implemented this spec as written, it would solve this bug, right?

Flags: needinfo?(bzbarsky)

So it looks like the spec has said to do that for a while, so I misread it and it has changed. I am pretty sure we do not implement that as written, and it sounds like we should!

I haven't had the time to get around to this over the last few months, as I've been quite busy with other work, but I took a quick look at the logic today.

Currently, if an existing beforeunload is being handled for the active document, it returns a success value here. This logic appears to have been added in bug 864247. The obvious fix from the point-of-view of the standard would be to change this behaviour to instead reject the unload attempt.

As mentioned in comment 53, this may not be desirable for loads triggered by the UI, as it means that a beforeunload handler could prevent the user from navigating a tab using the browser UI. To handle that, we'd need to pass a triggering principal into nsIContentViewer::PermitUnload and check if it is the system principal. However, if we allow the nested load, we need to somehow prevent the outer load from occurring. I think we'll need some sort of mAllowedNestedUnload flag, which needs to be checked upon exiting from the permitunload event handler or the dialog, and automatically reject unloading if set. This should ensure that only one load is allowed to proceed past beforeunload at a time.

Even with those changes, though, I'm a bit worried about re-entrant loads due to subframes. After beforeunload is fired for the toplevel document, it is recursively fired in all subframes. However, neither mInPermitUnload nor mInPermitUnloadPrompt appear to be set on the parent nsDocumentViewer object while these nested events are firing. I worry something ~like this might be an issue:

  1. Navigation 1 is attempted, and beforeunload fired in subframe. Subframe is removed from document, and nested event loop is created using sync-xhr.
  2. Navigation 2 is started. No flag is set on the toplevel document, and subframe is no longer in tree, so the navigation begins.
  3. sync-xhr is resolved, and navigation 1's beforeunload is permitted to continue, leading to multiple concurrent navigations on the same docshell.

We may need to have a flag for the nsDocumentViewer which is set for the entire duration of PermitUnloadInternal, and check the mAllowedNestedUnload flag after firing the event on subframes as well.

Boris, could you take this since Nika is on leave?

Assignee: nika → nobody
Flags: needinfo?(bzbarsky)

Is this a high priority? This is a lot of investigatory and spec work to figure out what the behavior should be and define it, and I don't know that I can squeeze that in along with the other things I need to do right now.

Flags: needinfo?(bzbarsky)

Unfortunately this bug doesn't meet our criteria for spoofing bounties, both the user interaction required (narrow scope of attack) and lack of a lock icon indicating you've securely loaded the fake page.

Flags: sec-bounty? → sec-bounty-
Group: dom-core-security

NI for trying to repro it with the latest Fission architecture.

Flags: needinfo?(nkochar)

I'm not finding the time to get to this, so going to transfer the NI to Nika to try and see if this still is repros.

Flags: needinfo?(nkochar) → needinfo?(nika)
Whiteboard: [reporter-external] [client-bounty-form] [verif?] → [reporter-external] [client-bounty-form] [verif?][secdom:beforeunload]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: