Closed Bug 1485836 Opened 6 years ago Closed 6 years ago

first tab is broken and completely white on first boot with security.sandbox.content.level >= 3

Categories

(Core :: Security: Process Sandboxing, defect, P1)

60 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: richard, Assigned: richard)

Details

(Whiteboard: [tor 26381])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180607190419

Steps to reproduce:

So this affects the new Tor Browser alpha ( https://trac.torproject.org/projects/tor/ticket/26381 ).  When launching with security.sandbox.content.level >= 3 the first tab to open will either work successfully and display the about:tor page, the tab body will be completely white and will not navigate anywhere when urls are typed in the url bar, or the tab body will contain an error as described in the tor bug the tab still works for subsequent urls via the url bar.

I've done a bit of digging into the differences between successful launches and failing launches by shotgunning trace-logging everywhere that seems relevant (starting in the sandboxing code and working my way out).  

I know it's not a lot to go on but if y'all have any insight or debugging suggestions that would be great.
Component: Untriaged → Security: Process Sandboxing
Product: Firefox → Core
This seems to affect only Windows. Bob do you have an idea what could cause this? I.e. what is present in level 3 but not level 2 and could affect Tor Browser?

 FWIW: An other effect of this bug is that on some machines and can load the first page successfully and then start surfing as usual but after the third session the URLs are loaded in an endless loop and extensions are basically broken. One can get rid of that symptom by closing the session and adding `InvalidateCaches=1` in `compatibility.ini`. But the cycle starts again after three sessions. (https://trac.torproject.org/projects/tor/ticket/27261 has the details)

All this seems related to our Tor Launcher extension which is still an XPCOM one.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bobowencode)
The likely difference between level 2 and 3, that could cause issues, is the removal of the user's own SID from the access token.
The main effect of this to remove access to the home directory and therefore the default installation directory under Desktop.

Access to the installation directory is given back through a sandbox rule, although it looks like it is not always giving access to the omni.ja file, which might be what is causing this.
It does give access some of the time though, so I need to work out why this fails in the one case.

As I'm sure you're aware though debugging Tor browser on Windows is far from a pleasant experience. :-)
I'll carry on looking into this.
Hacking at the sandbox code today does point to the user token sandboxing logic being the 'culprit' for Tor Browser.  Installing to Program Files makes the issue go away (though interestingly enough, installing ESR into the user directory does not cause the issue to surface).  I'll let you know if I find something concrete here as I continue digging.
Sorry got side-tracked, but after much further tracing through assembly, I think I've worked out what's going on.

It definitely appeared to be the blocking of omni.ja that was causing the looped loading, if I gave Everyone read access then it stopped.

What I couldn't understand was that the first request from a child for the broker to access omni.ja was failing, but the next one working.

Tracing through these step by step I realised that the policies for the child processes were different.
The first policy was missing all of the ones added through AddCachedDirRule [1], for example the ones at [2].
Turning on the logging with the following env vars confirmed that the base dirs hadn't been set:
set MOZ_LOG=SandboxBroker:5,sync
set MOZ_LOG_FILE=%TEMP%\torsandbox.log

These are set in SandboxBroker::GeckoDependentInitialize, which is called at [3].
So, something in Tor Browser must be triggering the launch of a content process too early and we are not in a position to set up the policy correctly.
I'm slightly surprised it doesn't cause other issues.

[1] https://searchfox.org/mozilla-central/rev/43969de34f5d4b113133d090f024e5eed7a82af0/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#266
[2] https://searchfox.org/mozilla-central/rev/43969de34f5d4b113133d090f024e5eed7a82af0/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#493
[3] https://searchfox.org/mozilla-central/rev/43969de34f5d4b113133d090f024e5eed7a82af0/toolkit/xre/nsAppRunner.cpp#4662
Flags: needinfo?(bobowencode)
This confirms much of what I was seeing, in particular on successful runs the content processes are launched later than on failed runs.  I'll see if I can figure out why/how the launches are being triggered early in Tor Browser.
Ok, with your clues, I've figured it out!

Basically our tor-launcher extension creates a window once tor.exe has launched, initialized and signaled back to the tor-launcher extension.  However, this all can happen before XRE_mainRun completes and makes it to SandboxBroker::GeckoDependentInitialize (which is where the static paths are initialized).  As a result, these various paths are null, which causes SandboxBroker::AddCachedDirs to fail until XRE_mainRun has had a chance to catch up.

The issue goes away if I move the call to SandboxBroker::GeckoDependenInitialize to before nsXREDirProvider::DoStartup (which is where various services and extensions are initialized).  All the child content processes correctly go through the SandboxBroker::AddCachedDirs calls, about:tor loads and the tab works.

Another potential solution here would be to have the RunPerformAsyncLaunch thread spin until the cached dirs have been populated.

What do you think?
Flags: needinfo?(bobowencode)
(In reply to Richard Pospesel (Tor Browser Dev) from comment #6)
...
> The issue goes away if I move the call to
> SandboxBroker::GeckoDependenInitialize to before nsXREDirProvider::DoStartup
> (which is where various services and extensions are initialized).  All the
> child content processes correctly go through the
> SandboxBroker::AddCachedDirs calls, about:tor loads and the tab works.

I'm not an expert in all of the dependencies during start-up, but I think we probably shouldn't be using the directory service before DoStartup.
It would be good if something enforced that, but it probably doesn't because ... history.

It might work for Tor, but we couldn't do it for Firefox at the moment, because the content temp directory is set up in DoStartup.
I have concerns that it might break (and possibly already be breaking) in other subtle ways or circumstances though.

At some point I'd like to get rid of the content temp dir and most of the other rules in here, we would probably still need to give access to the bin dir, but we could get that easily by other means.
Anyway that doesn't solve the problem now.

> Another potential solution here would be to have the RunPerformAsyncLaunch
> thread spin until the cached dirs have been populated.

I suppose that would work, but I don't really like it.

Would it be possible for the tor-launcher extension to register with the observer service for something later on and then continue?
"final-ui-startup" would seem to be a good choice.
Flags: needinfo?(bobowencode)
Ah this would almost certainly work!  We're currently triggering on the 'profile-after-change' notification.  I'll try it out and let you know how it goes.
Happy to report that switching to the final-ui-startup for our init logic has fixed this issue for us!
(In reply to Richard Pospesel (Tor Browser Dev) from comment #10)
> Happy to report that switching to the final-ui-startup for our init logic
> has fixed this issue for us!

Excellent, I'll resolve this as invalid at the moment, given that the current solution is in tor-launcher.
Please re-open or file a new bug if you run into other problems.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
It would seem I was a bit too quick to declare victory here.  The final-ui-startup notification occurs too late for us, and other extension(s) and service(s) (the most obvious one being the updater service) which do network dependent work on the profile-after-change notification.  The skepticism surrounding early calling of GeckoDependentInitialize() seems to be warranted, as it would seem the base profile directory (NS_APP_USER_PROFILE_50_DIR) is not known yet (so the chrome and extensions directories are not white-listed correctly).

I've a patch up for Tor Browser which inserts the call in the DoStartup flow, after the NS_APP_USER_PROFILE_50_DIR directory has been set in the directory service ( https://gitweb.torproject.org/user/richard/tor-browser.git/commit/?h=bug_26381_v2&context=10&ignorews=0&dt=0 ) which seems to work, and all of the paths saved off in sandboxBroker.cpp have same values with this patch applied.

One thing to note here is that my earlier suggestion of having the RunPerformAsyncLaunch thread wait until the paths are initialized is also not going to work, since the tor-launcher actually blocks the parent process's main thread until the tor network settings window (hosted in a child content process) completes its work.  

I would be curious on what a correct fix would look like here, since any extension which triggers the creation of a child content process is going to potentially hit this race condition.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to Richard Pospesel (Tor Browser Dev) from comment #12)
... 
> I've a patch up for Tor Browser which inserts the call in the DoStartup
> flow, after the NS_APP_USER_PROFILE_50_DIR directory has been set in the
> directory service (
> https://gitweb.torproject.org/user/richard/tor-browser.git/commit/
> ?h=bug_26381_v2&context=10&ignorews=0&dt=0 ) which seems to work, and all of
> the paths saved off in sandboxBroker.cpp have same values with this patch
> applied.
...
> I would be curious on what a correct fix would look like here, since any
> extension which triggers the creation of a child content process is going to
> potentially hit this race condition.

Ah, when I first suggested a later event I did wonder if and how you prevented other network access.
Given those constraints I think that this is probably the best solution.

I'll add a needinfo to remind me to come back and get your patch landed, thanks.
As I say I'd like to remove the dependency on the directory service at some point, so we could move this back out at that point.
Flags: needinfo?(bobowencode)
This is because we need it to be called before extensions initialize, but after
NS_APP_USER_PROFILE_50_DIR is known.
Attachment #9030430 - Flags: review?(nfroyd)
Assignee: nobody → bobowencode
Status: REOPENED → ASSIGNED
Assignee: bobowencode → richard
Flags: needinfo?(bobowencode)
Priority: -- → P1
Attachment #9030430 - Flags: review?(nfroyd) → review+
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/11a1d58b955a
Move call to SandboxBroker::GeckoDependentInitialize into nsXREDirProvider::DoStartup. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/11a1d58b955a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Whiteboard: [tor 26381]
OS: Unspecified → Windows
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: