Closed Bug 1385928 Opened 3 years ago Closed 3 years ago

Mozregression launched nightly after 2017-07-30 don't load start page

Categories

(Core :: Security: Process Sandboxing, defect, P2, major)

defect

Tracking

()

VERIFIED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- verified
firefox57 --- verified

People

(Reporter: tcampbell, Assigned: bobowen)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: sb+)

Attachments

(2 files)

STR:
  - On Win10, use mozregression to start nightly 2017-07-30

Expected:
  - Welcome to nightly start page appears

Actual:
  - No page loaded
  - Hidden window appears in task bar
  - Window close buttons don't work (but hotkeys Ctrl+Shift+Q does work)

Regression Range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9ff81e376529981389074a985083d161e2f155d3&tochange=27ac1eee8d2952cfb25b32f71a62dd0dd7f9e05e

This seems like an interaction between sandbox updates and mozregression script. Starting a system nightly with --profile works fine for me.
Could changes in Bug 1369669 break the way mozregression starts up new profiles?
Flags: needinfo?(bobowencode)
Blocks: 1369669
Component: mozregression → Security: Process Sandboxing
Product: Testing → Core
Version: Version 3 → unspecified
Also I'm unable to navigate to a site. Mozregression-gui is unusable regarding recent regressions right now. I'll attach the output from the log view.
Severity: normal → major
Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: regression
Temporary work-around is to run mozregression with |--pref security.sandbox.content.level:0| option. This will turn down sandbox and allow using mozregression until issue can be fixed.
(In reply to Ted Campbell [:tcampbell] from comment #4)
> Temporary work-around is to run mozregression with |--pref
> security.sandbox.content.level:0| option. This will turn down sandbox and
> allow using mozregression until issue can be fixed.

Is this also possible with mozregression-gui?
(In reply to Arjen de Jager from comment #5)
> (In reply to Ted Campbell [:tcampbell] from comment #4)
> > Temporary work-around is to run mozregression with |--pref
> > security.sandbox.content.level:0| option. This will turn down sandbox and
> > allow using mozregression until issue can be fixed.
> 
> Is this also possible with mozregression-gui?

And yes, it is. I'm sorry for bothering you! I'm not very experienced with mozregression-gui and didn't experiment with it's options that much already.
(In reply to Ted Campbell [:tcampbell] from comment #1)
> Could changes in Bug 1369669 break the way mozregression starts up new
> profiles?

This just resolved any symlinks or junction points in the child executable path.
Was it definitely bug 1369669 not bug 1366697?
Maybe it was the strengthening of the sandbox that changed things, does mozregression rely on reading files in the content process?
Flags: needinfo?(bobowencode)
Doesn't look like Bug 1366697 is to blame. We should already have been at level 3 since these are nightly builds. As well, sandbox level 1 still has the issue.

Looking at console output, I see the following when sandbox is on:
 [Parent 8576] WARNING: Failed to launch tab subprocess: file z:/build/build/src/ipc/glue/GeckoChildProcessHost.cpp, line 604
Priority: -- → P1
Whiteboard: sb?
Depends on: 1388048
(In reply to Ted Campbell [:tcampbell] from comment #8)
> Doesn't look like Bug 1366697 is to blame. We should already have been at
> level 3 since these are nightly builds. As well, sandbox level 1 still has
> the issue.
> 
> Looking at console output, I see the following when sandbox is on:
>  [Parent 8576] WARNING: Failed to launch tab subprocess: file
> z:/build/build/src/ipc/glue/GeckoChildProcessHost.cpp, line 604

Yes it must be Bug 1369669 then.

I can't reproduce, but I notice that mozregression is installing firefox in the temp dir, so maybe the resolution of junction points and symlinks is failing in some circumstances.
I noticed that I missed checking for that, so I've filed bug 1388048 for fixing that and hopefully that will fix this.
Priority: P1 → P2
Whiteboard: sb? → sb+
Logs after Bug 1388048 landed.

 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\browser
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\browser
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\firefox.exe
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\firefox.exe
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmptahvdw.mozrunner
 0:02.75 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmptahvdw.mozrunner
 0:03.30 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: C:\Users\tcampbell\AppData\LocalLow\Mozilla\Temp-{6662030a-6ae2-4a45-8ff3-7f2858a2cbb2}
 0:03.30 INFO: [Main Thread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\LocalLow\Mozilla\Temp-{6662030a-6ae2-4a45-8ff3-7f2858a2cbb2}
 0:03.30 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\firefox.exe
 0:03.30 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Parent 7312] WARNING: Failed to launch tab subprocess: file z:/build/build/src/ipc/glue/GeckoChildProcessHost.cpp, line 608
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Parent 7312] WARNING: Failed to launch tab subprocess: file z:/build/build/src/ipc/glue/GeckoChildProcessHost.cpp, line 608
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: c:\users\tcampb~1\appdata\local\temp\tmpewukwq\firefox\firefox.exe
 0:03.48 INFO: [Gecko_IOThread]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\Users\tcampbell\AppData\Local\Temp\tmpewukwq\firefox\firefox.exe
 0:03.50 INFO: [Parent 7312] WARNING: Failed to launch tab subprocess: file z:/build/build/src/ipc/glue/GeckoChildProcessHost.cpp, line 608
Doesn't reproduce on one of my machines, but does on other. I'm now wondering if problem occurs only when login name is >8 chars and we have short paths.
Confirmed. If I change my temp directory to a path without 8.3names, I no longer get issues.
More specifically, a mix of 8.3name and a directory with >8 chars.

STR:
- Create C:\longpathname\longpathname\
- Put a firefox bundle in it
- In command prompt (boring old windows one), go to C:\longpa~1\longpathname\
- Try to start firefox from here. (Run firefox\firefox.exe)
Will, can we patch mozregression to expand the path returned by mkdtemp as a workaround? It looks like there can be issues on Windows if someone's username is more than 8 characters (or contains a space).
Flags: needinfo?(wlachance)
(In reply to Ted Campbell [:tcampbell] from comment #14)
> Will, can we patch mozregression to expand the path returned by mkdtemp as a
> workaround? It looks like there can be issues on Windows if someone's
> username is more than 8 characters (or contains a space).

Should be quite straightforward. Probably just need to use the technique here:

https://stackoverflow.com/questions/11420689/how-to-get-long-file-system-path-from-python-on-windows

Here:

https://github.com/mozilla/mozregression/blob/19cd5b376be731d6b92728716ac9f32a22365184/mozregression/main.py#L51

Or maybe define a utility function performing the transformation, and use it wherever mkdtemp is called (there are a few other spots). If you wanted to do up a PR against mozregression, that would be cool, otherwise I can do it when I get back from PTO/weekend next Monday.
Flags: needinfo?(wlachance)
(In reply to Ted Campbell [:tcampbell] from comment #13)
> More specifically, a mix of 8.3name and a directory with >8 chars.
> 
> STR:
> - Create C:\longpathname\longpathname\
> - Put a firefox bundle in it
> - In command prompt (boring old windows one), go to C:\longpa~1\longpathname\
> - Try to start firefox from here. (Run firefox\firefox.exe)

Ted, thanks for investigating this further.
It looks like we might need to fix this in general.

Although a workaround for mozregression in the short term would be good.
I'll look at this again when I get back.
By request of :ted, I'm going to look into the mozregression fix. Setting a NI as a reminder.
Flags: needinfo?(wlachance)
I put a fix in https://github.com/mozilla/mozregression/commit/2a76a90ced4095d557abbd61bab029557e8ae115
and posted a new versions of mozregression command line (the gui should be automatically updated shortly)

Not sure if there's anything left to do for this bug, so I'll just leave it open.
Flags: needinfo?(wlachance)
Thanks. That mozregression patch fixes the issue on my machine and lets me bisect again without turning off sandbox. We can leave this bug open until underlying sandbox issue is fixed.
(In reply to Ted Campbell [:tcampbell] from comment #13)
> More specifically, a mix of 8.3name and a directory with >8 chars.
> 
> STR:
> - Create C:\longpathname\longpathname\
> - Put a firefox bundle in it
> - In command prompt (boring old windows one), go to C:\longpa~1\longpathname\
> - Try to start firefox from here. (Run firefox\firefox.exe)

The problem is due to the way the sandbox looks for the base address for the executable.
It compares paths from ::QueryFullProcessImageNameW and ::GetMappedFileNameW (searching through the virtual memory mappings).

These always seem to be consistent on Win7 but on Win10 they don't in circumstances such as these.
(This is probably after creators update as we saw similar issues with firefox.exe if it had been moved after being run and mapped into memory.)

I'll see if I can come up with something.
Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
Chromium have obviously been hitting similar issues [1].
So, I think it makes sense for us to patch our version of GetProcessBaseAddress with this new one.

I might let it bed in for a few days first though. :-)

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=753140
Chromium patch seems to be sticking, here's a try push with there new version of the function:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a8e70452eafef8680ee9def26836dd95a5f9cb6
This should fix issues we have seen with running Firefox from short name paths or moved binaries.
Attachment #8901240 - Flags: review?(jmathies)
Attachment #8901240 - Flags: review?(jmathies) → review+
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d7578086308
Take new implementation of GetProcessBaseAddress from chromium commit f398005bc4ca0cc2dab2198faa99d4ee8f4da60d. r=jimm
https://hg.mozilla.org/mozilla-central/rev/8d7578086308
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please request Beta approval on this when you get a chance.
Flags: needinfo?(bobowencode)
Comment on attachment 8901240 [details] [diff] [review]
Take new implementation of GetProcessBaseAddress from chromium commit f398005bc4ca0cc2dab2198faa99d4ee8f4da60d

Approval Request Comment
[Feature/Bug causing the regression]:
Windows process sandboxing.

[User impact if declined]:
Users who are running Firefox using a short name path or after moving Firefox may continue to have an unusable version of Firefox.

[Is this code covered by automated tests?]:
This code is in the launching of sandboxed child processes, which happens in nearly all e10s tests.

[Has the fix been verified in Nightly?]:
Yes.
Also the SANDBOX_FAILED_LAUNCH telemetry in Nightly now has no failures for 43 (SBOX_ERROR_CANNOT_FIND_BASE_ADDRESS) since 26th Aug when this landed.

[Needs manual test from QE? If yes, steps to reproduce]: 
Yes, STR in comment 13 can be used on Windows 10 (probably needs creators update).

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
Slightly

[Why is the change risky/not risky?]:
This fix has landed in chromium but it doesn't look like they will uplift it. This is possibly because they have another existing fix that somewhat mitigates the issue.
However the new code is much simpler than the old way of getting the base address, so the risk seems fairly small.

[String changes made/needed]:
None
Flags: needinfo?(bobowencode)
Attachment #8901240 - Flags: approval-mozilla-beta?
Comment on attachment 8901240 [details] [diff] [review]
Take new implementation of GetProcessBaseAddress from chromium commit f398005bc4ca0cc2dab2198faa99d4ee8f4da60d

Fix for a new regression in 56, let's uplift this for beta 8.
Attachment #8901240 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
Blocks: 1368600
I managed to reproduce the bug on an old version of Nightly (2017-07-30) on Windows 10 x64 using the method from comment 13. The page was not loaded, I couldn't close any tabs and couldn't open new ones. The browser crashed. 

I retested everything using the same method on latest Nightly 57.0a1 and beta 56.0b12 and the bug is not reproducing anymore.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1400826
You need to log in before you can comment on or make changes to this bug.