Closed Bug 1386404 Opened 2 years ago Closed 2 years ago

Stop allowing Linux content processes to access /tmp

Categories

(Core :: Security: Process Sandboxing, enhancement, P2)

Unspecified
Linux
enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: jld, Assigned: gcp)

References

(Blocks 1 open bug)

Details

(Whiteboard: sb+)

Attachments

(7 files, 2 obsolete files)

Right now we're giving content processes read/write access to all of /tmp.  There is probably private data exposed that way that shouldn't be, as well as opportunities to actively attack other applications on the system (including other content processes in the same browser instance).  We should restrict access to a process-private temporary directory.

This opens some questions about how to clean up these directories, including when the parent process doesn't shut down normally, but we might already be dealing with that on other platforms.  In principle we could create a per-process tmpfs that would be destroyed by the kernel when the process exits, but that would need either unprivileged user namespaces or a root-privileged launch agent (aside from sandboxing tools like bubblewrap, I think systemd can also do this).

Because we're brokering all the file access, if we find that there's some library that insists on /tmp and won't honor $TMPDIR, we can rewrite the paths if we need to, like what we already do for /proc/self.
Priority: -- → P2
Whiteboard: sb+
This apparently breaks mozregression for some of the sandbox issues we've been running into lately, because it puts everything in /tmp, so things that normally fail don't fail.

Also, we're allowing read access to anything in /usr/tmp and /var/tmp; that should be reevaluated too.
Duplicate of this bug: 1388922
Assignee: nobody → gpascutto
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190216/#review195510

::: ipc/glue/GeckoChildProcessHost.cpp:569
(Diff revision 2)
> +      mRestoreString.Append(origVal);
> +
> +      // Passing to PR_SetEnv is ok here if we keep the the storage alive
> -  // for the time we launch the sub-process.  It's copied to the new
> +      // for the time we launch the sub-process.  It's copied to the new
> -  // environment.
> -  PR_SetEnv(buffer.BeginReading());
> +      // environment by PR_DuplicateEnvironment()
> +      PR_SetEnv(mSetString.get());

I'll probably wind up removing this class in bug 1401786 — my work-in-progress has GeckoChildProcessHost hold the launch options (including env var changes) in an instance variable, so it will be able to use that instead of side-effecting the parent's environment.

However, I'd suggest landing this as-is: the code already exists, and I don't want this bug to be unnecessarily blocked on the launch changes.
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190214/#review195512

::: ipc/glue/GeckoChildProcessHost.cpp:610
(Diff revision 2)
> +  AutoSetAndRestoreEnvVarForChildProcess("NSPR_LOG_FILE", nsprLogName);
> +  AutoSetAndRestoreEnvVarForChildProcess("MOZ_LOG_FILE", nsprLogName);

Is the omission of the local variable name an intentional thing?

I see that AutoSetAndRestoreEnvVarForChildProcess() won't set or restore environment variables if the new value is NULL, but could we skip these when the original environment variables are not set? Seems unnecessary to create the object when we know we don't need to.

::: ipc/glue/GeckoChildProcessHost.cpp:611
(Diff revision 2)
> -      mRestoreOrigMozLogName.Append(origMozLogName);
> -    }
> -    SetChildLogName("MOZ_LOG_FILE=", origMozLogName, mozLogName);
>    }
> +  AutoSetAndRestoreEnvVarForChildProcess("NSPR_LOG_FILE", nsprLogName);
> +  AutoSetAndRestoreEnvVarForChildProcess("MOZ_LOG_FILE", nsprLogName);

s/nsprLogName/mozLogName/

::: ipc/glue/GeckoChildProcessHost.cpp:615
(Diff revision 2)
> +  AutoSetAndRestoreEnvVarForChildProcess("NSPR_LOG_FILE", nsprLogName);
> +  AutoSetAndRestoreEnvVarForChildProcess("MOZ_LOG_FILE", nsprLogName);
>  
>    // `RUST_LOG_CHILD` is meant for logging child processes only.
> -  if (childRustLog) {
> -    if (mRestoreOrigRustLog.IsEmpty()) {
> +  const char* childRustLog = PR_GetEnv("RUST_LOG_CHILD");
> +  AutoSetAndRestoreEnvVarForChildProcess("RUST_LOG", childRustLog);

Skip if childRustLog is NULL?
(In reply to Haik Aftandilian [:haik] from comment #7)
> Is the omission of the local variable name an intentional thing?

No, that's a bad bug actually.
 
> I see that AutoSetAndRestoreEnvVarForChildProcess() won't set or restore
> environment variables if the new value is NULL, but could we skip these when
> the original environment variables are not set? Seems unnecessary to create
> the object when we know we don't need to.

The working of this relies on RAII, so unfortunately that's not possible: a conditional check would create an extra scope. I know of no easy way around that without making this much more complicated :(

The extra object is on the stack so it really only consumes stack space, though.

> >    // `RUST_LOG_CHILD` is meant for logging child processes only.
> > -  if (childRustLog) {
> > -    if (mRestoreOrigRustLog.IsEmpty()) {
> > +  const char* childRustLog = PR_GetEnv("RUST_LOG_CHILD");
> > +  AutoSetAndRestoreEnvVarForChildProcess("RUST_LOG", childRustLog);
> 
> Skip if childRustLog is NULL?


See above, that won't work.
From discussing on IRC: Using UniquePtr and putting the "env" guards on the heap could work. 

There's also mfbt/Maybe.h which may solve this better, I'll give it a shot.
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190216/#review195972

r+ Reminder to add the browser_content_sandbox_utils.js change.

I like the use of Maybe<> and emplace(). Looks like a nice clean way to do this.
Attachment #8919366 - Flags: review?(haftandilian) → review+
Comment on attachment 8922414 [details]
Bug 1386404 - Intercept access to /tmp and rewrite to content process tempdir.

https://reviewboard.mozilla.org/r/193444/#review198684


C/C++ static analysis found 1 defect in this patch.

You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp`

If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: security/sandbox/linux/broker/SandboxBroker.cpp:531
(Diff revision 1)
> +    NS_LITERAL_CSTRING("/tmp"),
> +    NS_LITERAL_CSTRING("/var/tmp"),
> +    NS_LITERAL_CSTRING("/usr/tmp")
> +  };
> +
> +  for (size_t i = 0; i < ArrayLength(tempPrefixes); i++) {

Warning: Use range-based for loop instead [clang-tidy: modernize-loop-convert]
Comment on attachment 8922413 [details]
Bug 1386404 - Enable access to the entire chrome dir from content.

https://reviewboard.mozilla.org/r/193442/#review198784
Attachment #8922413 - Flags: review?(jld) → review+
Comment on attachment 8922415 [details]
Bug 1386404 - Only do the tmp remapping if needed.

https://reviewboard.mozilla.org/r/193446/#review198814
Attachment #8922415 - Flags: review?(jld) → review+
Comment on attachment 8922414 [details]
Bug 1386404 - Intercept access to /tmp and rewrite to content process tempdir.

https://reviewboard.mozilla.org/r/193444/#review198786

::: security/sandbox/linux/broker/SandboxBroker.cpp:525
(Diff revision 1)
>  
> +size_t
> +SandboxBroker::RemapTempDirs(char* aPath, size_t aBufSize, size_t aPathLen)
> +{
> +  nsAutoCString path(aPath);
> +  static const nsCString tempPrefixes[] = {

Does declaring static `nsCString`s like this add static constructors?  I think we're generally trying not to do that if possible.

It looks like changing the type to `nsLiteralCString` probably fixes that, if it turns out to be a problem.

::: security/sandbox/linux/broker/SandboxBroker.cpp:528
(Diff revision 1)
> +{
> +  nsAutoCString path(aPath);
> +  static const nsCString tempPrefixes[] = {
> +    NS_LITERAL_CSTRING("/tmp"),
> +    NS_LITERAL_CSTRING("/var/tmp"),
> +    NS_LITERAL_CSTRING("/usr/tmp")

Do we even need to allow `/usr/tmp` and `/var/tmp`?  Is anything using them besides that ridiculous PRNG code in NSS that tries to read the directories as files and use it as entropy?  That code looks like it won't complain if we just deny access.

Even if there is anything else in those directories, we were previously giving read-only access, so if that hypothetical thing were trying to read files placed there by something else, remapping it would break it anyway.

::: security/sandbox/linux/broker/SandboxBroker.cpp:545
(Diff revision 1)
> +      if (NS_SUCCEEDED(rv)) {
> +        nsAutoCString tmpPath;
> +        rv = tmpDir->GetNativePath(tmpPath);
> +        if (NS_SUCCEEDED(rv)) {
> +          tmpPath.Append(cutPath);
> +          base::strlcpy(aPath, tmpPath.get(), aBufSize);

This will silently truncate the path if it's too long for the buffer, but that probably shouldn't happen with well-behaved code.
Attachment #8922414 - Flags: review?(jld) → review+
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190216/#review199612


C/C++ static analysis found defects in this patch.
 - defects found by clang-tidy
 - defects found by clang-format

You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` and `./mach clang-format -p path/to/file.cpp`

If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


A full diff for the formatting issues found by clang-format is provided here: https://queue.taskcluster.net/v1/task/local instance/runs/0/artifacts/public/results/fda7508c-190216-5-clang-format.diff

You can use it in your repository with `hg import`


::: ipc/glue/GeckoChildProcessHost.h:182
(Diff revision 5)
>    // with launching the sub-process.
> -  void SetChildLogName(const char* varName, const char* origLogName,
> +  void GetChildLogName(const char* origLogName, nsACString &buffer);

Warning: Incorrect coding style [clang-format]
Replace by: 

  // with launching the sub-process.
  void GetChildLogName(const char *origLogName, nsACString &buffer);


::: ipc/glue/GeckoChildProcessHost.cpp:40
(Diff revision 5)
>  #include "mozilla/Telemetry.h"
> +#include "mozilla/Maybe.h"
>  #include "ProtocolUtils.h"

Warning: Incorrect coding style [clang-format]
Replace by: 

#include "mozilla/Telemetry.h"
#include "mozilla/ipc/BrowserProcessSubThread.h"


::: ipc/glue/GeckoChildProcessHost.cpp:437
(Diff revision 5)
>  
>  void
> -GeckoChildProcessHost::SetChildLogName(const char* varName, const char* origLogName,
> +GeckoChildProcessHost::GetChildLogName(const char* origLogName,
>                                         nsACString &buffer)
>  {
>    // We currently have no portable way to launch child with environment
>    // different than parent.  So temporarily change NSPR_LOG_FILE so child
>    // inherits value we want it to have. (NSPR only looks at NSPR_LOG_FILE at
>    // startup, so it's 'safe' to play with the parent's environment this way.)

Warning: Incorrect coding style [clang-format]
Replace by: 


void GeckoChildProcessHost::GetChildLogName(const char *origLogName,
                                            nsACString &buffer) {
// We currently have no portable way to launch child with environment
// different than parent.  So temporarily change NSPR_LOG_FILE so child
// inherits value we want it to have. (NSPR only looks at NSPR_LOG_FILE at
// startup, so it's 'safe' to play with the parent's environment this way.)


::: ipc/glue/GeckoChildProcessHost.cpp:475
(Diff revision 5)
> +public:
> +  AutoSetAndRestoreEnvVarForChildProcess(const char* envVar,
> +                                         const char* newVal) {
> +    const char* origVal = PR_GetEnv(envVar);

Warning: Incorrect coding style [clang-format]
Replace by: 

public:
  AutoSetAndRestoreEnvVarForChildProcess(const char *envVar,
                                         const char *newVal) {
    const char *origVal = PR_GetEnv(envVar);


::: ipc/glue/GeckoChildProcessHost.cpp:491
(Diff revision 5)
> +  // Delegate helper
> +  AutoSetAndRestoreEnvVarForChildProcess(const char* envVar,
> +                                         nsCString& newVal)
> +    : AutoSetAndRestoreEnvVarForChildProcess(envVar, newVal.get()) {}
> +  ~AutoSetAndRestoreEnvVarForChildProcess() {
> +    PR_SetEnv(mRestoreString.get());
> -}
> +  }

Warning: Incorrect coding style [clang-format]
Replace by: 

  // Delegate helper
  AutoSetAndRestoreEnvVarForChildProcess(const char *envVar, nsCString &newVal)
      : AutoSetAndRestoreEnvVarForChildProcess(envVar, newVal.get()) {}
  ~AutoSetAndRestoreEnvVarForChildProcess() { PR_SetEnv(mRestoreString.get()); }



::: ipc/glue/GeckoChildProcessHost.cpp:518
(Diff revision 5)
> +
> +  const char* origNSPRLogName = PR_GetEnv("NSPR_LOG_FILE");
> +  const char* origMozLogName = PR_GetEnv("MOZ_LOG_FILE");

Warning: Incorrect coding style [clang-format]
Replace by: 


  const char *origNSPRLogName = PR_GetEnv("NSPR_LOG_FILE");
  const char *origMozLogName = PR_GetEnv("MOZ_LOG_FILE");


::: ipc/glue/GeckoChildProcessHost.cpp:533
(Diff revision 5)
>    // `RUST_LOG_CHILD` is meant for logging child processes only.
> +  const char* childRustLog = PR_GetEnv("RUST_LOG_CHILD");

Warning: Incorrect coding style [clang-format]
Replace by: 

  // `RUST_LOG_CHILD` is meant for logging child processes only.
  const char *childRustLog = PR_GetEnv("RUST_LOG_CHILD");


::: security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:273
(Diff revision 5)
>    nsCOMPtr<nsIFile> homeDir;
> -  rv = GetSpecialSystemDirectory(Unix_HomeDirectory, getter_AddRefs(homeDir));
> +  nsresult rv = GetSpecialSystemDirectory(Unix_HomeDirectory,
> +                                          getter_AddRefs(homeDir));

Warning: Incorrect coding style [clang-format]
Replace by: 

  nsCOMPtr<nsIFile> homeDir;
  nsresult rv =
      GetSpecialSystemDirectory(Unix_HomeDirectory, getter_AddRefs(homeDir));


::: toolkit/xre/nsXREDirProvider.h:146
(Diff revision 5)
> -#if (defined(XP_WIN) || defined(XP_MACOSX)) && defined(MOZ_CONTENT_SANDBOX)
> +#if defined(MOZ_CONTENT_SANDBOX)
>    nsCOMPtr<nsIFile>      mContentTempDir;
>    nsCOMPtr<nsIFile>      mContentProcessSandboxTempDir;

Warning: Incorrect coding style [clang-format]
Replace by: 

#if defined(MOZ_CONTENT_SANDBOX)
  nsCOMPtr<nsIFile> mContentTempDir;
  nsCOMPtr<nsIFile> mContentProcessSandboxTempDir;
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c2c40e4d9815
Enable content-process specific tmpdir on Linux. r=haik
https://hg.mozilla.org/integration/autoland/rev/4600c2d575f9
Enable access to the entire chrome dir from content. r=jld
https://hg.mozilla.org/integration/autoland/rev/b136f90dc49f
Intercept access to /tmp and rewrite to content process tempdir. r=jld
https://hg.mozilla.org/integration/autoland/rev/36556e1a5ac7
Only do the tmp remapping if needed. r=jld
The fallout also affects MacOS. This stack is interesting:

application crashed [@ __findenv + 0x5a]
11:04:32     INFO - Thread 0 (crashed)
11:04:32     INFO -  0  libsystem_c.dylib!__findenv + 0x5a
11:04:32     INFO -     rax = 0x0000000000000000   rdx = 0x000000011d6df000
11:04:32     INFO -     rcx = 0x000000000000004c   rbx = 0x00000001072c401d
11:04:32     INFO -     rsi = 0x00007fff5f443134   rdi = 0x00000001072c401d
11:04:32     INFO -     rbp = 0x00007fff5f443120   rsp = 0x00007fff5f443108
11:04:32     INFO -      r8 = 0x00000000fffffff0    r9 = 0x0000000000000010
11:04:32     INFO -     r10 = 0x000000011d6df0e8   r11 = 0x000000010c617334
11:04:32     INFO -     r12 = 0x0000000100e88850   r13 = 0x0000000000000000
11:04:32     INFO -     r14 = 0x00000000fffffff0   r15 = 0x0000000000000058
11:04:32     INFO -     rip = 0x00007fff90e9caa3
11:04:32     INFO -     Found by: given as instruction pointer in context
11:04:32     INFO -  1  libsystem_c.dylib!getenv + 0x1d
11:04:32     INFO -     rbp = 0x00007fff5f443140   rsp = 0x00007fff5f443130
11:04:32     INFO -     rip = 0x00007fff90e9cb27
11:04:32     INFO -     Found by: previous frame's frame pointer
11:04:32     INFO -  2  libnss3.dylib!<name omitted> [prenv.c:36556e1a5ac7 : 68 + 0x8]
11:04:32     INFO -     rbp = 0x00007fff5f443160   rsp = 0x00007fff5f443150
11:04:32     INFO -     rip = 0x0000000100c3204b
11:04:32     INFO -     Found by: previous frame's frame pointer
11:04:32     INFO -  3  XUL!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:36556e1a5ac7 : 316 + 0xc]

So I guess it's PerformAsyncLaunch or so that must be busted.
Flags: needinfo?(gpascutto)
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190216/#review199704

::: ipc/glue/GeckoChildProcessHost.cpp:496
(Diff revision 5)
> +  // Delegate helper
> +  AutoSetAndRestoreEnvVarForChildProcess(const char* envVar,
> +                                         nsCString& newVal)
> +    : AutoSetAndRestoreEnvVarForChildProcess(envVar, newVal.get()) {}
> +  ~AutoSetAndRestoreEnvVarForChildProcess() {
> +    PR_SetEnv(mRestoreString.get());

This might be the problem, and I should have caught this when I looked at the patch earlier.  PR_SetEnv is like putenv() — the string passed becomes part of the environment and in general can't be safely freed.
Should just take a ptr to the original string (if any, another potential issue is that there's no null check).
(In reply to Gian-Carlo Pascutto [:gcp] from comment #32)
> Should just take a ptr to the original string (if any, another potential
> issue is that there's no null check).

I was going to suggest that, but there's a problem: you need a pointer to the entire "NAME=value" string in order to put it back without allocating.  You could try calling PR_GetEnv() and subtracting strlen("NAME=") but I don't know if that's guaranteed to work.

Freeing the temporary env var is also a problem, if another thread might read it and try to use it after it's freed, or if a library could read it and save the pointer.

I'm wondering if it might be better to do bug 1401786 first, so that we don't need to touch the parent process's environment for any of this.
I was wondering how this could ever have worked then, and I understand now: I forgot to remove the mRestoreOrigNSPRLogName etc vars from GeckoChildProcessHost, who presumably stays alive and hence avoids the issue. So that also suggests another fix here.
Comment on attachment 8924642 [details]
Bug 1386404 - Whitelist the prefix used by the XPCOM leak logs.

https://reviewboard.mozilla.org/r/195874/#review201104

r+ once you fix the compilation issue on Mac. GetDirectoryPath() is now defined after StartMacOSContentSandbox() where it's used on Mac.
Attachment #8924642 - Flags: review?(haftandilian) → review+
Of course, now the sandboxing tests fail because the leak log is written to the profile directory, so whitelisting that makes the profile readable from content (during tests). Sigh.
Comment on attachment 8924642 [details]
Bug 1386404 - Whitelist the prefix used by the XPCOM leak logs.

Interestingly, despite the patch being dropped and this one being a complete replacement, mozreview erroneously thinks it's the same and that you have already reviewed it.
Attachment #8924642 - Flags: review+ → review?(haftandilian)
Comment on attachment 8924642 [details]
Bug 1386404 - Whitelist the prefix used by the XPCOM leak logs.

https://reviewboard.mozilla.org/r/195874/#review201496

Hacky, but OK for DEBUG-only changes.
Attachment #8924642 - Flags: review?(haftandilian) → review+
Comment on attachment 8923827 [details]
Bug 1386404 - The environment has to live forever.

https://reviewboard.mozilla.org/r/194984/#review201526

I'm a little uneasy about the possibility of some other thread winding up with a pointer to one of the temporary environment variables, but we're already past the beta 58 merge so bug 1401786 will hopefully prevent that part from winding up in a release.
Attachment #8923827 - Flags: review?(jld) → review+
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d7f697bac6ef
Enable content-process specific tmpdir on Linux. r=haik
https://hg.mozilla.org/integration/autoland/rev/802a00ea50e7
Enable access to the entire chrome dir from content. r=jld
https://hg.mozilla.org/integration/autoland/rev/eac6eb517096
Intercept access to /tmp and rewrite to content process tempdir. r=jld
https://hg.mozilla.org/integration/autoland/rev/9eba087cf64a
Only do the tmp remapping if needed. r=jld
https://hg.mozilla.org/integration/autoland/rev/6224ffae752a
The environment has to live forever. r=jld
https://hg.mozilla.org/integration/autoland/rev/c80acdea24c1
Whitelist the prefix used by the XPCOM leak logs. r=haik
The macOS failures on inbound are because the TMPDIR interception code in GeckoChildProcessHost is triggering on macOS as well.

No idea what's up with the xpcshell test.
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/1aa6d3251c70
Backed out 6 changesets for XPCshell failures, at least on Linux. r=backout on a CLOSED TREE
Backed out for XPCshell failures, at least on Linux:

https://hg.mozilla.org/integration/autoland/rev/1aa6d3251c7039e8b0e8c94374ddde3f886b40eb

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=c80acdea24c1c7954c4560c05d4625776ac09134&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Push which ran more failing tests: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=973e547ce20c556c234201413716badcd41cb0b8&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
The test-verify failures on the first link might even occur without these patches, they only run the tests multiple times if they are modified.

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=142018241&repo=autoland

[task 2017-11-03T18:48:57.662Z] 18:48:57     INFO -  TEST-START | toolkit/forgetaboutsite/test/unit/test_removeDataFromDomain.js
[task 2017-11-03T18:48:58.709Z] 18:48:58  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/forgetaboutsite/test/unit/test_removeDataFromDomain.js | xpcshell return code: -11
[task 2017-11-03T18:48:58.711Z] 18:48:58     INFO -  TEST-INFO took 1045ms
[task 2017-11-03T18:48:58.712Z] 18:48:58     INFO -  >>>>>>>
[task 2017-11-03T18:48:58.713Z] 18:48:58     INFO -  PID 10513 | [10513, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2818
[task 2017-11-03T18:48:58.713Z] 18:48:58     INFO -  PID 10513 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2017-11-03T18:48:58.714Z] 18:48:58     INFO -  PID 10513 | JavaScript strict warning: resource://gre/modules/PushService.jsm, line 23: ReferenceError: reference to undefined property "getCryptoParams"
[task 2017-11-03T18:48:58.715Z] 18:48:58     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2017-11-03T18:48:58.715Z] 18:48:58     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2017-11-03T18:48:58.716Z] 18:48:58     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2017-11-03T18:48:58.716Z] 18:48:58     INFO -  running event loop
[task 2017-11-03T18:48:58.717Z] 18:48:58     INFO -  PID 10513 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2017-11-03T18:48:58.717Z] 18:48:58     INFO -  PID 10513 | [10513, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/dom/media/CubebUtils.cpp, line 372
[task 2017-11-03T18:48:58.718Z] 18:48:58     INFO -  "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2017-11-03T18:48:58.719Z] 18:48:58     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property "getCryptoParams"" {file: "resource://gre/modules/PushService.jsm" line: 23}]"
[task 2017-11-03T18:48:58.720Z] 18:48:58     INFO -  toolkit/forgetaboutsite/test/unit/test_removeDataFromDomain.js | Starting test_history_cleared_with_direct_match
[task 2017-11-03T18:48:58.720Z] 18:48:58     INFO -  (xpcshell/head.js) | test test_history_cleared_with_direct_match pending (2)
[task 2017-11-03T18:48:58.721Z] 18:48:58     INFO -  PID 10513 | [10513, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004002: file /builds/worker/workspace/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 182
[task 2017-11-03T18:48:58.722Z] 18:48:58     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2017-11-03T18:48:58.723Z] 18:48:58     INFO -  "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2017-11-03T18:48:58.724Z] 18:48:58     INFO -  TEST-PASS | toolkit/forgetaboutsite/test/unit/test_removeDataFromDomain.js | test_history_cleared_with_direct_match - [test_history_cleared_with_direct_match : 242] true == true
[task 2017-11-03T18:48:58.725Z] 18:48:58     INFO -  PID 10513 | JavaScript strict warning: resource://testing-common/PlacesTestUtils.jsm, line 85: ReferenceError: reference to undefined property "transition"
[task 2017-11-03T18:48:58.726Z] 18:48:58     INFO -  PID 10513 | JavaScript strict warning: resource://gre/modules/PlacesUtils.jsm, line 1018: ReferenceError: reference to undefined property "description"
[task 2017-11-03T18:48:58.727Z] 18:48:58     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property "transition"" {file: "resource://testing-common/PlacesTestUtils.jsm" line: 85}]"
[task 2017-11-03T18:48:58.727Z] 18:48:58     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property "description"" {file: "resource://gre/modules/PlacesUtils.jsm" line: 1018}]"
[task 2017-11-03T18:48:58.728Z] 18:48:58     INFO -  TEST-PASS | toolkit/forgetaboutsite/test/unit/test_removeDataFromDomain.js | test_history_cleared_with_direct_match - [test_history_cleared_with_direct_match : 244] true == true
[task 2017-11-03T18:48:58.729Z] 18:48:58     INFO -  PID 10513 | [10513, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2871
[task 2017-11-03T18:48:58.730Z] 18:48:58     INFO -  PID 10513 | LoadPlugin() /tmp/xpc-plugins-tsbDw8/libnpthirdtest.so returned e667b1c0
[task 2017-11-03T18:48:58.731Z] 18:48:58     INFO -  PID 10513 | LoadPlugin() /tmp/xpc-plugins-tsbDw8/libnpswftest.so returned e667b1e0
[task 2017-11-03T18:48:58.731Z] 18:48:58     INFO -  PID 10513 | LoadPlugin() /tmp/xpc-plugins-tsbDw8/libnpsecondtest.so returned e38ed2a0
[task 2017-11-03T18:48:58.732Z] 18:48:58     INFO -  PID 10513 | LoadPlugin() /tmp/xpc-plugins-tsbDw8/libnptest.so returned e38ed300
[task 2017-11-03T18:48:58.732Z] 18:48:58     INFO -  PID 10513 | [10513, Main Thread] WARNING: Couldn't get the user appdata directory, crash dumps will go in an unusual location: file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2895
[task 2017-11-03T18:48:58.733Z] 18:48:58     INFO -  PID 10513 | Assertion failure: NS_IsMainThread() (nsXPCWrappedJS::CallMethod called off main thread), at /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedJS.cpp:611
[task 2017-11-03T18:49:17.909Z] 18:49:17     INFO -  PID 10513 | #01: nsXPCWrappedJS::CallMethod [js/xpconnect/src/XPCWrappedJS.cpp:610]
[task 2017-11-03T18:49:17.911Z] 18:49:17     INFO -  PID 10513 | #02: PrepareAndDispatch [xpcom/reflect/xptcall/md/unix/xptcstubs_gcc_x86_unix.cpp:62]
[task 2017-11-03T18:49:17.912Z] 18:49:17     INFO -  PID 10513 | #03: FindProviderFile [xpcom/io/nsDirectoryService.cpp:220]
[task 2017-11-03T18:49:17.913Z] 18:49:17     INFO -  PID 10513 | #04: nsDirectoryService::Get [xpcom/io/nsDirectoryService.cpp:249]
[task 2017-11-03T18:49:17.914Z] 18:49:17     INFO -  PID 10513 | #05: mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunch [xpcom/io/nsDirectoryServiceUtils.h:28]
[task 2017-11-03T18:49:17.916Z] 18:49:17     INFO -  PID 10513 | #06: mozilla::ipc::GeckoChildProcessHost::RunPerformAsyncLaunch [ipc/glue/GeckoChildProcessHost.cpp:582]
[task 2017-11-03T18:49:17.918Z] 18:49:17     INFO -  PID 10513 | #07: mozilla::detail::RunnableMethodImpl<mozilla::ipc::GeckoChildProcessHost*, bool (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), false, (mozilla::RunnableKind)0u, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::Run [xpcom/threads/nsThreadUtils.h:1142]
[task 2017-11-03T18:49:17.919Z] 18:49:17     INFO -  PID 10513 | #08: MessageLoop::RunTask [xpcom/base/nsCOMPtr.h:631]
[task 2017-11-03T18:49:17.920Z] 18:49:17     INFO -  PID 10513 | #09: MessageLoop::DeferOrRunPendingTask [ipc/chromium/src/base/message_loop.cc:460]
[task 2017-11-03T18:49:17.921Z] 18:49:17     INFO -  PID 10513 | #10: MessageLoop::DoWork [ipc/chromium/src/base/message_loop.cc:535]
[task 2017-11-03T18:49:17.922Z] 18:49:17     INFO -  PID 10513 | #11: base::MessagePumpLibevent::Run [ipc/chromium/src/base/message_pump_libevent.cc:353]
[task 2017-11-03T18:49:17.924Z] 18:49:17     INFO -  PID 10513 | #12: MessageLoop::RunInternal [ipc/chromium/src/base/message_loop.cc:327]
[task 2017-11-03T18:49:17.925Z] 18:49:17     INFO -  PID 10513 | #13: MessageLoop::Run [ipc/chromium/src/base/message_loop.cc:298]
[task 2017-11-03T18:49:17.926Z] 18:49:17     INFO -  PID 10513 | #14: base::Thread::ThreadMain [ipc/chromium/src/base/thread.cc:184]
[task 2017-11-03T18:49:17.927Z] 18:49:17     INFO -  PID 10513 | #15: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:40]
[task 2017-11-03T18:49:17.928Z] 18:49:17     INFO -  PID 10513 | #16: libpthread.so.0 + 0x6295
[task 2017-11-03T18:49:17.929Z] 18:49:17     INFO -  PID 10513 | ExceptionHandler::GenerateDump cloned child 10540
[task 2017-11-03T18:49:17.930Z] 18:49:17     INFO -  PID 10513 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2017-11-03T18:49:17.931Z] 18:49:17     INFO -  PID 10513 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2017-11-03T18:49:17.932Z] 18:49:17     INFO -  PID 10513 | ERROR:audioipc_server: server poll error: Interrupted system call (os error 4)
[task 2017-11-03T18:49:17.933Z] 18:49:17     INFO -  <<<<<<<
Flags: needinfo?(gpascutto)
Comment on attachment 8923827 [details]
Bug 1386404 - The environment has to live forever.

https://reviewboard.mozilla.org/r/194984/#review201974

In hindsight, it really wasn't appropriate for me to r+ a patch with known memory safety issues just because I had a vague idea that they might not happen.

::: ipc/glue/GeckoChildProcessHost.cpp:508
(Diff revision 2)
> +      PR_SetEnv(mRestorePtr);
> +    }
>    }
>  private:
> +  // Needs to live for process launch.
>    nsAutoCString mSetString;

This string also needs to live forever; see bug 1297740 comment #6.
Attachment #8923827 - Flags: review+ → review-
Comment on attachment 8923827 [details]
Bug 1386404 - The environment has to live forever.

https://reviewboard.mozilla.org/r/194984/#review201978

::: ipc/glue/GeckoChildProcessHost.h:203
(Diff revision 2)
>    nsCString mRestoreOrigMozLogName;
>    nsCString mRestoreOrigRustLog;
> +  nsCString mRestoreTmpDir;
> +  nsCString mRestoreXdgCacheHome;
> +  nsCString mRestoreXdgCacheDir;
> +  nsCString mRestoreMesaCacheDir;

Also, I'm not convinced that these instance variables live long enough, and I'm not sure why I ever thought that would be the case.
(In reply to Jed Davis [:jld] (⏰UTC-7) from comment #62)
> >  private:
> > +  // Needs to live for process launch.
> >    nsAutoCString mSetString;
> 
> This string also needs to live forever; see bug 1297740 comment #6.

To clarify: it doesn't need to live forever for the observable code, because at destruction we PR_SetEnv to another string. But it's not guaranteed another thread might not be looking up TMPDIR at the same time, and end up with a pointer to it anyhow.
Flags: needinfo?(gpascutto)
Comment on attachment 8925596 [details]
Bug 1386404 - Get content tmp dir early, avoid replacement on macOS.

https://reviewboard.mozilla.org/r/196708/#review201992

r+ with some minor comments.

::: ipc/glue/GeckoChildProcessHost.h:195
(Diff revision 1)
>    //
>    // FIXME/cjones: this strongly indicates bad design.  Shame on us.
>    std::queue<IPC::Message> mQueue;
>  
> +  // Set this up before we're called from a different thread.
> +  nsCString mTmpDirName;

Should this and other Linux-specific member variables be #ifdef LINUX to avoid unused variable warnings on other platforms?

::: ipc/glue/GeckoChildProcessHost.cpp:281
(Diff revision 1)
>                            || !!PR_GetEnv("MOZ_SANDBOX_LOGGING");
>  #endif
> +#elif defined(XP_LINUX)
> +  // Get and remember the path to the per-content-process tmpdir
> +  if (ShouldHaveDirectoryService()) {
> +    nsCOMPtr<nsIFile> mContentTempDir;

mContentTempDir not a member variable, but has m prefix.

::: ipc/glue/GeckoChildProcessHost.cpp:561
(Diff revision 1)
>    Maybe<AutoSetAndRestoreEnvVarForChildProcess> tmpDir;
>    Maybe<AutoSetAndRestoreEnvVarForChildProcess> xdgCacheHome;
>    Maybe<AutoSetAndRestoreEnvVarForChildProcess> xdgCacheDir;
>    Maybe<AutoSetAndRestoreEnvVarForChildProcess> mesaCacheDir;

These could have a smaller scope by moving them into the if(!mTmpDirName.IsEmpty()) body.
Attachment #8925596 - Flags: review?(haftandilian) → review+
Attachment #8923827 - Attachment is obsolete: true
Attachment #8925596 - Attachment is obsolete: true
Blocks: 1297740
Comment on attachment 8919366 [details]
Bug 1386404 - Enable content-process specific tmpdir on Linux.

https://reviewboard.mozilla.org/r/190216/#review203102

I like this approach much better.  And I like how much “except Linux, which doesn't do this yet” is being deleted.
Attachment #8919366 - Flags: review?(jld) → review+
Depends on: 1405877
No longer blocks: 1297740
Depends on: 1297740
Comment on attachment 8941117 [details]
Bug 1386404 - Don't try to fetch the content process tmpdir if sandboxing is disabled.

https://reviewboard.mozilla.org/r/211386/#review217196
Attachment #8941117 - Flags: review?(jld) → review+
928 INFO TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_themes_icons.js | Uncaught exception - [Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.createUnique]" nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)" location: "JS frame :: resource://testing-common/ExtensionTestCommon.jsm :: generateZipFile :: line 271" data: no] [log…] 

https://dxr.mozilla.org/mozilla-central/rev/c38d22170d6f27e94c3c53093215d20255fab27a/toolkit/components/extensions/ExtensionTestCommon.jsm#270

This test writes to TmpD from content. So it's expected to fail. Question is, why isn't it failing on Windows/mac?
It passed on a retrigger. ¯\_(ツ)_/¯
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14f1fbe5263a
Enable content-process specific tmpdir on Linux. r=haik,jld
https://hg.mozilla.org/integration/autoland/rev/fbe717b9a664
Enable access to the entire chrome dir from content. r=jld
https://hg.mozilla.org/integration/autoland/rev/2c007d385ce4
Intercept access to /tmp and rewrite to content process tempdir. r=jld
https://hg.mozilla.org/integration/autoland/rev/b7ca6ae185f2
Only do the tmp remapping if needed. r=jld
https://hg.mozilla.org/integration/autoland/rev/8dca7ef74c4a
Whitelist the prefix used by the XPCOM leak logs. r=haik
https://hg.mozilla.org/integration/autoland/rev/be1441859e8b
Don't try to fetch the content process tmpdir if sandboxing is disabled. r=jld
A quick look shows that the times for each test have been drastically increased. While formerly it took about 5s for each of those it has been increased with this patch to 30s each.

Here without the patch:
[task 2018-01-10T07:55:41.868Z] 07:55:41     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue00a-altKey] 
[task 2018-01-10T07:55:41.877Z] 07:55:41     INFO - PID 850 | 1515570941872	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.XP6TLqbCH1JO"
[task 2018-01-10T07:56:09.352Z] 07:56:09     INFO - PID 850 | 1515570969349	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T07:56:10.708Z] 07:56:10     INFO - PID 850 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T07:56:11.708Z] 07:56:11     INFO - PID 850 | 1515570971702	Marionette	INFO	Listening on port 2828
[task 2018-01-10T07:56:12.605Z] 07:56:12     INFO - STDOUT: PASSED
[task 2018-01-10T07:56:12.606Z] 07:56:12     INFO - STDOUT:  [ 14%]
[task 2018-01-10T07:56:12.966Z] 07:56:12     INFO - PID 850 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2018-01-10T07:56:13.406Z] 07:56:13     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue052-altKey] 
[task 2018-01-10T07:56:13.411Z] 07:56:13     INFO - PID 850 | 1515570973406	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.VBpedvTgLmlv"
[task 2018-01-10T07:56:14.175Z] 07:56:14     INFO - PID 850 | 1515570974168	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T07:56:15.417Z] 07:56:15     INFO - PID 850 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T07:56:16.405Z] 07:56:16     INFO - PID 850 | 1515570976399	Marionette	INFO	Listening on port 2828
[task 2018-01-10T07:56:17.221Z] 07:56:17     INFO - STDOUT: PASSED
[task 2018-01-10T07:56:17.222Z] 07:56:17     INFO - STDOUT:  [ 28%]
[task 2018-01-10T07:56:17.513Z] 07:56:17     INFO - PID 850 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2018-01-10T07:56:17.921Z] 07:56:17     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue03d-metaKey] 
[task 2018-01-10T07:56:17.938Z] 07:56:17     INFO - PID 850 | 1515570977922	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.3sifLESyQ8Az"
[task 2018-01-10T07:56:18.653Z] 07:56:18     INFO - PID 850 | 1515570978650	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T07:56:19.949Z] 07:56:19     INFO - PID 850 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T07:56:20.887Z] 07:56:20     INFO - PID 850 | 1515570980883	Marionette	INFO	Listening on port 2828
[task 2018-01-10T07:56:21.774Z] 07:56:21     INFO - STDOUT: PASSED

And with this patch:

[task 2018-01-10T08:29:06.259Z] 08:29:06     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue00a-altKey] 
[task 2018-01-10T08:29:06.263Z] 08:29:06     INFO - PID 849 | 1515572946257	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.NLXdw4d9a338"
[task 2018-01-10T08:29:33.492Z] 08:29:33     INFO - PID 849 | 1515572973487	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T08:29:34.356Z] 08:29:34     INFO - PID 849 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T08:29:35.182Z] 08:29:35     INFO - PID 849 | 1515572975174	Marionette	INFO	Listening on port 2828
[task 2018-01-10T08:30:02.062Z] 08:30:02     INFO - STDOUT: PASSED
[task 2018-01-10T08:30:02.063Z] 08:30:02     INFO - STDOUT:  [ 14%]
[task 2018-01-10T08:30:02.265Z] 08:30:02     INFO - PID 849 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2018-01-10T08:30:02.669Z] 08:30:02     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue052-altKey] 
[task 2018-01-10T08:30:02.670Z] 08:30:02     INFO - PID 849 | 1515573002667	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.C5dzeBgsfEEe"
[task 2018-01-10T08:30:03.283Z] 08:30:03     INFO - PID 849 | 1515573003279	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T08:30:04.119Z] 08:30:04     INFO - PID 849 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T08:30:05.067Z] 08:30:05     INFO - PID 849 | 1515573005063	Marionette	INFO	Listening on port 2828
[task 2018-01-10T08:30:32.051Z] 08:30:32     INFO - STDOUT: PASSED
[task 2018-01-10T08:30:32.051Z] 08:30:32     INFO - STDOUT:  [ 28%]
[task 2018-01-10T08:30:32.255Z] 08:30:32     INFO - PID 849 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2018-01-10T08:30:32.662Z] 08:30:32     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[\ue03d-metaKey] 
[task 2018-01-10T08:30:32.666Z] 08:30:32     INFO - PID 849 | 1515573032663	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.2zbBz4BRWnRL"
[task 2018-01-10T08:30:33.289Z] 08:30:33     INFO - PID 849 | 1515573033283	Marionette	INFO	Enabled via --marionette
[task 2018-01-10T08:30:34.137Z] 08:30:34     INFO - PID 849 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-01-10T08:30:35.144Z] 08:30:35     INFO - PID 849 | 1515573035137	Marionette	INFO	Listening on port 2828
[task 2018-01-10T08:31:02.055Z] 08:31:02     INFO - STDOUT: PASSED

This starts the browser in-between each of the test. So maybe this is related to longer shutdown times?
Also an interesting measure for startup. This is the timing until the `-marionette` command line argument gets processed. The following outputs are for debug builds and the very first start of Firefox:

Without the patch (2s)

[task 2018-01-10T07:57:58.888Z] 07:57:58     INFO - PID 850 | 1515571078882	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.JnlJo0Uz7m8i"
[task 2018-01-10T07:58:00.641Z] 07:58:00     INFO - PID 850 | [863, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2878
[task 2018-01-10T07:58:00.759Z] 07:58:00     INFO - PID 850 | 1515571080751	Marionette	INFO	Enabled via --marionette

With the patch (29s)

[task 2018-01-10T08:06:09.005Z] 08:06:09     INFO - PID 855 | 1515571568991	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.gpBu9Q11UhFa"
[task 2018-01-10T08:06:11.105Z] 08:06:11     INFO - PID 855 | [868, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2878
[task 2018-01-10T08:06:38.716Z] 08:06:38     INFO - PID 855 | 1515571598707	Marionette	INFO	Enabled via --marionette
This also explains why Linux opt went green.
Flags: needinfo?(gpascutto)
There are also a couple of Marionette failures during that timespan the patch was on autoland:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mn%20linux&bugfiler&fromchange=b542c2c398bb
I eventually traced this down to remapping XDG_CACHE_HOME. It seems XDG_CACHE_DIR is not actually a thing, see e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=736448 and https://specifications.freedesktop.org/basedir-spec/latest/ar01s03.html, so I probably saw that in some buggy software.

If we remap XDG_CACHE for content, it causes things to either get very slow or hang, particularly for a few specific tests. What makes this strange is that by default we don't even allow writes in ~/.cache, which is the default for that variable.

My best guess is this: if XDG_CACHE is writable, and does not contain the expected data, something is constructing its cache in there. On my local system this might be fontconfig. We may be able to observe what it is on the try infra.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf97e4184c1f0039c8d7bbf9c3fe38d1b090d5a6
venv) root@967cb64fd29c:~/workspace/build# ls -l ~/.cache/
total 28
drwx------ 2 worker worker 4096 Jan 26 11:23 compizconfig-1
drwx------ 2 worker worker 4096 Jan 26 11:23 dconf
drwx------ 8 worker worker 4096 Jan 26 11:23 evolution
drwxr-xr-x 2 worker worker 4096 Jan 26 11:23 fontconfig
drwxr-xr-x 2 worker worker 4096 Jan 26 11:23 gstreamer-1.0
drwx------ 3 root   root   4096 Jan 26 12:09 mozilla
drwx------ 2 worker worker 4096 Jan 26 11:23 wallpaper


(venv) root@967cb64fd29c:~/workspace/build# ls ~/.cache/mozilla/firefox/
5vtpzbbk.marionette-test-profile-1516968577750-1516968578526

The per-content process tmpdir had nothing interesting.

Current theory: our caching code gets confused when its cache disappears from underneath it?
Maybe one of these is writing/reading from content? But then we'd see errors in the sandboxing logs...

./safebrowsing/test-block-simple.pset
...
./safebrowsing/block-flash-digest256.sbstore
./cache2
./cache2/ce_T151c2VyQ29udGV4dElkPTEs
./cache2/entries
./cache2/entries/63F52231E2FE67A5786DFB0D3F909098C426133E
...
./cache2/entries/8216FC49610DFC8565BF49DDE19D682A4D9DB83E
./cache2/doomed
./cache2/ce_T151c2VyQ29udGV4dElkPTEsYSw=
./startupCache
./startupCache/scriptCache.bin
./startupCache/scriptCache-child.bin
./startupCache/urlCache.bin
./OfflineCache
./OfflineCache/index.sqlite
./thumbnails
./activity-stream.topstories.json
./directoryLinks.json
rm -rf ~/.mozilla/

marionette --binary application/firefox/firefox -vv tests/marionette/te
sts/browser/components/migration/tests/marionette/test_refresh_firefox.py

TEST-START | marionette/tests/browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefr
esh.testReset
Application command: application/firefox/firefox -no-remote -marionette -profile /tmp/tmpYyKS7_.mozrunner
TEST-PASS | marionette/tests/browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefre
sh.testReset | took 15787ms
SUMMARY
-------
passed: 1
failed: 0
Blocks: 1434281
No longer blocks: 1434281
Depends on: 1434281
Comment on attachment 8947011 [details]
Bug 1386404 - Use the full tmpdir finding logic.

https://reviewboard.mozilla.org/r/216832/#review222838

::: security/sandbox/linux/broker/SandboxBroker.cpp:39
(Diff revision 1)
>  #include "sandbox/linux/system_headers/linux_syscalls.h"
>  
>  namespace mozilla {
>  
> +// Default/fallback temporary directory
> +static const nsLiteralCString tempDirPrefix(NS_LITERAL_CSTRING("/tmp"));

Nit: does this need the `NS_LITERAL_CSTRING` or can it just be initialized with `"/tmp"`?

::: security/sandbox/linux/broker/SandboxBroker.cpp:646
(Diff revision 1)
> +  // If we can't find it, we aren't bothered much: we will
> +  // always try /tmp anyway in the substitution code
> +  if (NS_FAILED(rv) || mTempPath.IsEmpty()) {
> +    SANDBOX_LOG_ERROR("Tempdir: /tmp");
> +  } else {
> +    SANDBOX_LOG_ERROR("Tempdir: %s", mTempPath.get());

These log messages should be verbose-only.
Attachment #8947011 - Flags: review?(jld) → review+
Duplicate of this bug: 1422570
See Also: → 1439057
You need to log in before you can comment on or make changes to this bug.