Closed Bug 1421262 Opened 6 years ago Closed 6 years ago

Firefox renders garbage viewing PDFs or Google Docs with nVidia driver

Categories: Core :: Graphics, defect, P3
57 Branch
defect

Tracking: RESOLVED FIXED, mozilla59
Tracking Status:
firefox58 --- fixed
firefox59 --- fixed

People: Reporter: manuel.nuno.melo, Assigned: haik

Details: Whiteboard: [gfx-noted]

Attachments: 2 files

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171112125346

Steps to reproduce:

Open a PDF in Firefox, like http://www.pdf995.com/samples/pdf.pdf, or a Google Docs Spreadsheet (screenshot below taken on the 'Travel planner' spreadsheet template).

MacBook Pro (13-inch, Mid 2010) running macOS High Sierra 10.13.1. This has happened only since the Quantum update, and only when using NVIDIA's Web Driver (for an NVIDIA GeForce 320M 256 MB). Switching to the OS X Default Graphics Driver solves the issue, but I'd rather use my card vendor's driver.

Maybe related to https://bugzilla.mozilla.org/show_bug.cgi?id=1419264


Actual results:

Garbage is rendered instead of the PDF or spreadsheet contents. The garbage seems to be dynamic: page display updates seem to refresh it, and sometimes they actually show the desired content. When a PDF and a spreadsheet are open simultaneously, one sometimes shows garbage from the other's window.

The underlying document structure seems to be preserved (I can select text/cells in each document).


Expected results:

PDF/Spreadsheet documents should have been rendered properly.
Component: Untriaged → Graphics
Product: Firefox → Core
This seems like it could be the same underlying issue as bug 1422027, namely the web driver.

Since this only started happening with the Quantum update, would you be able to try using mozregression to determine which change started triggering this driver bug? You can follow the instructions here [1].

[1] http://mozilla.github.io/mozregression/
Flags: needinfo?(manuel.nuno.melo)
See Also: → 1422027
Whiteboard: [gfx-noted]
Ok, will try that and report back.
37:35.44 INFO: Narrowed inbound regression window from [ba8db0fb, db5209b5] (3 builds) to [ba8db0fb, 6b101438] (2 builds) (~1 steps left)
37:35.44 INFO: No more inbound revisions, bisection finished.
37:35.44 INFO: Last good revision: ba8db0fbc00605d6d097dde0b7e034297f55c1ec
37:35.44 INFO: First bad revision: 6b101438c684bff925471edbfe593e500bcb3a03
37:35.44 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ba8db0fbc00605d6d097dde0b7e034297f55c1ec&tochange=6b101438c684bff925471edbfe593e500bcb3a03
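The log above shows mozregression narrowing a known-good/known-bad window until only adjacent revisions remain. A minimal sketch of that bisection idea follows. It is not mozregression's actual implementation. The `is_bad` callback is a hypothetical stand-in for the good/bad verdict the tester gives after trying each build, and the revision labels are made up.

```python
from typing import Callable, Sequence, Tuple


def bisect_regression(revisions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (last_good, first_bad) for an ordered list of revisions.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that every revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the regression is at or before mid
        else:
            good = mid  # the regression is after mid
    return revisions[good], revisions[bad]


# Hypothetical example: ten revisions, and the regression landed in "r6".
revs = [f"r{i}" for i in range(10)]
print(bisect_regression(revs, lambda r: int(r[1:]) >= 6))  # ('r5', 'r6')
```

Each verdict halves the window, so a single wrong answer sends the search into the wrong half. That is why a misleading "good" result, such as stale render data hiding the bug, matters during bisection.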

A side note to other mozregression testers of this bug:
the bug seems to involve some reuse of render data (contents from one viewer contaminate the other). I was always testing with the same PDF, and the garbage sometimes seemed not to be there, apparently because it was reusing data from a previous test view. Skipping a couple of PDF pages clearly brings the problem back, because now the first pages start contaminating the later ones.

Hopefully this makes it easy to fix.
Flags: needinfo?(manuel.nuno.melo)
Went a bit further on my own. The breaking commit, associated with https://bugzilla.mozilla.org/show_bug.cgi?id=1332190, reduces the sandbox file-open scope. That bug mentions that this is overridable, and sure enough, setting security.sandbox.content.level to 2 fixes the problem in the current version.
Thanks!

Haik Aftandilian, would you have any idea what is going on, or whether there is anything we can do to fix this? This seems to be a bug with the NVIDIA web driver on OS X that only happens when security.sandbox.content.level is 3.
Blocks: 1332190
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(haftandilian)
Given that setting security.sandbox.content.level=2 works around the problem, this sounds like an incompatibility with our level 3 Mac sandbox rules and the Nvidia driver on this MacBook Pro.

I don't have access to that revision of MacBook Pro, but we might be able to figure out what's causing the problem using the OS X Console (/Applications/Utilities/Console.app). The steps are:

1) Turn on sandbox violation logging in Firefox. Open about:config in the browser and set security.sandbox.logging.enabled=true and then quit Firefox.

2) Before restarting the browser, open the Mac Console application (/Applications/Utilities/Console.app) and enter plugin-container in the Search field.

3) Click "Clear" in Console.

4) Then launch Firefox again using the same profile.

5) You should start to see some violation logging as soon as you launch Firefox. These are expected. For example, expect to see errors like "Sandbox: plugin-container(3901) deny(1) file-read-metadata /Users/<username>" and some errors related to Pasteboard.

But if you see anything in the Console that sounds in any way related to graphics or Nvidia, that might be related to the problem.

You can compare the Console warnings with/without the Nvidia driver. I find it works best to "clear" the console before starting Firefox each time. The Console app also allows you to select all and copy and then paste into a text editor.
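After the Console output from each run has been pasted into a text editor, the comparison can also be scripted. Below is a minimal sketch that assumes violation lines follow the format quoted in comment 8. The regular expression and function names are illustrative and are not part of any Firefox or macOS tool. Timestamps and PIDs are dropped because they change between runs.

```python
import re
from typing import Iterable, Set, Tuple

# Matches violation lines in the format quoted in this bug, e.g.
# "Sandbox: plugin-container(54580) deny(1) file-read-metadata /Library/..."
# "Sandbox: plugin-container(54580) deny sysctl-read hw.cachelinesize"
VIOLATION = re.compile(
    r"Sandbox: (?P<proc>[\w.-]+)\((?P<pid>\d+)\) deny(?:\(\d+\))? "
    r"(?P<op>\S+) (?P<target>.+?)\s*$"
)


def violations(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect (operation, target) pairs, ignoring timestamps and PIDs."""
    found = set()
    for line in lines:
        m = VIOLATION.search(line)
        if m:
            found.add((m.group("op"), m.group("target")))
    return found


def only_in_broken(broken: Iterable[str],
                   working: Iterable[str]) -> Set[Tuple[str, str]]:
    """Violations logged in the broken run but not in the working one."""
    return violations(broken) - violations(working)


broken_run = [
    "kernel  Sandbox: plugin-container(54580) deny(1) file-read-metadata "
    "/Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle",
    "sandboxd    Sandbox: plugin-container(54580) deny sysctl-read hw.cachelinesize",
]
working_run = [
    "sandboxd    Sandbox: plugin-container(54540) deny file-write-data "
    "/private/var/folders/gh/ln5zc4f920q_c5y4dxykh3xh0000gn/C/com.nvidia.OpenGL/"
    "570DF816-1294-3816-8AA5-1E85D34E12E1/8F577142C1A3E87B/"
    "B5A80C40-794A-3C12-9E5E-90BE32676F30.toc",
]
for op, target in sorted(only_in_broken(broken_run, working_run)):
    print(op, target)
```

Using set difference means repeated messages, such as the same denial logged twice, are counted only once.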
Flags: needinfo?(haftandilian)
manuel, any chance you could test using the Console app as described in comment 6? Thanks
Flags: needinfo?(manuel.nuno.melo)
I followed the instructions and may have got something:

I checked console output under sandbox levels 2 and 3, using the nVidia and the OS's driver. Interesting stuff only seemed to show up when loading a pdf for viewing.

=============================================================================================================================

With nVidia, under level 3 (rendering is broken), there were a couple of mentions of GL:

fault 04:24:01.536749 +0000   kernel  Sandbox: plugin-container(54580) deny(1) file-read-metadata /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle

and

error   04:25:12.431726 +0000   sandboxd    Sandbox: plugin-container(54580) deny sysctl-read hw.cachelinesize

-----------------------------------------------------------------------------------------------------------------------------

Under level 2 (rendering is fine), I got this one twice:

error   04:18:19.832784 +0000   sandboxd    Sandbox: plugin-container(54540) deny file-write-data /private/var/folders/gh/ln5zc4f920q_c5y4dxykh3xh0000gn/C/com.nvidia.OpenGL/570DF816-1294-3816-8AA5-1E85D34E12E1/8F577142C1A3E87B/B5A80C40-794A-3C12-9E5E-90BE32676F30.toc

=============================================================================================================================

When running without nVidia's driver (rendering is fine) I got, under level 3:

default	04:42:28.178928 +0000	kernel	Sandbox: plugin-container(500) deny(1) file-write-create /private/var/folders/gh/ln5zc4f920q_c5y4dxykh3xh0000gn/C/com.nvidia.OpenGL/8542743A-9A86-32E0-AF67-9F5159ABF99D/8F577142C1A3E87B/B5A80C40-794A-3C12-9E5E-90BE32676F30.toc

-----------------------------------------------------------------------------------------------------------------------------

and under level 2 (quite similar to above):

error   04:47:37.544767 +0000   sandboxd    Sandbox: plugin-container(575) deny file-write-create /private/var/folders/gh/ln5zc4f920q_c5y4dxykh3xh0000gn/C/com.nvidia.OpenGL/8542743A-9A86-32E0-AF67-9F5159ABF99D/8F577142C1A3E87B/B5A80C40-794A-3C12-9E5E-90BE32676F30.toc

=============================================================================================================================

The first two errors, which differ from the rest and only occur when rendering fails, seem like a good place to dig deeper. My bet is on the hw.cachelinesize one: the corruption I see, with reused data, hints at some sort of memory/cache misreferencing.
Flags: needinfo?(manuel.nuno.melo)
(In reply to manuel.nuno.melo from comment #8)
> I followed the instructions and may have got something:
> 
> I checked console output under sandbox levels 2 and 3, using the nVidia and
> the OS's driver. Interesting stuff only seemed to show up when loading a pdf
> for viewing.
> 
> =============================================================================================================================
> 
> With nVidia, under level 3 (rendering is broken) there were a couple of
> mentions to GL:
> 
> fault 04:24:01.536749 +0000   kernel  Sandbox: plugin-container(54580)
> deny(1) file-read-metadata /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle
> 
> and
> 
> error   04:25:12.431726 +0000   sandboxd    Sandbox: plugin-container(54580)
> deny sysctl-read hw.cachelinesize

Thanks! That's really helpful. I think these two are significant. Writing to the .toc files may also be important, but if graphics are rendering properly without allowing those writes, we'd prefer to not allow them. (Allowing writes is much more security-sensitive.) To know for sure, I'll try to speak with someone more knowledgeable about the Nvidia driver.

On a 2010 Mac Mini I had access to (that also has an Nvidia 320M) running 10.11, the bundle wasn't present. Is /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle present on your machine?

And could you test the build(s) below to see if they resolve the problem for you? If you do, I recommend testing in a new profile to avoid the possibility that Nightly damages one of your existing profiles.

This is a build of Nightly with a single change to allow content processes to access the hw.cachelinesize sysctl.

  hw.cachelinesize only
  https://queue.taskcluster.net/v1/task/ElOqxcseQU2e9gUm0j6eqg/runs/0/artifacts/public/build/target.dmg

And this build allows access to /Library/GPUBundles as well as hw.cachelinesize.

  GPUBundles + hw.cachelinesize
  https://queue.taskcluster.net/v1/task/XrYd-iupTJuEMLWxV7uxhw/runs/0/artifacts/public/build/target.dmg
Flags: needinfo?(manuel.nuno.melo)
Will test the nightlies and report.

As for /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle, it exists, with its creation and last-modified dates both on 7 Dec. The /Library/GPUBundles directory itself was created on 3 Oct, which suggests it is a recent addition to the OS (and would explain why it was not present on the 10.11 machine you tested).

A note if you also want to pursue the .toc write fails: they occur even when the nVidia driver's disabled. Why is FF writing to an nVidia-related temp dir when the driver is not in use?
Ok tested. Got positive results:

My hunch was wrong. The first nightly (access to hw.cachelinesize only) still had the issue. The second (access to /Library/GPUBundles + hw.cachelinesize) solved it.

I'll be glad to try a third version enabling access to /Library/GPUBundles only, if you'd like to narrow it down.

In the case of the first nightly I got (only) the console message complaining about access to /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle.

For the second nightly, which worked, I got the .toc access error, as with all other cases that work.
Flags: needinfo?(manuel.nuno.melo)
(In reply to manuel.nuno.melo from comment #10)
> Will test the nightlies and report.
> 
> As for /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle, it exists, with a
> creation == last modified date of 7 Dec. The /Library/GPUBundles dir itself
> has a creation of 3 Oct, meaning that it's a likely recent addition to the
> OS (therefore not present on the 10.11 you tested).

OK, good to know.

> A note if you also want to pursue the .toc write fails: they occur even when
> the nVidia driver's disabled. Why is FF writing to an nVidia-related temp
> dir when the driver is not in use?

I don't know, but it's most likely not something we have control over. More likely, an Apple library call we use as part of the graphics stack does this as a side effect. You might be able to get a stack trace for the write using the dtruss(1m) command. The stack trace should show us where the write is coming from. If you want to give that a try, first quit Firefox, then run this command in the terminal

  $ sudo dtruss -s -t open -n plugin-container

and then restart the browser and watch for a failed open of one of those .toc files along with a stack trace. Example: https://pastebin.mozilla.org/9075104

> My hunch was wrong. The first nightly (access to hw.cachelinesize only)
> still had the issue. The second (access to /Library/GPUBundles +
> hw.cachelinesize) solved it.

Cool :)

> I'll be glad to try a third version enabling access to /Library/GPUBundles
> only, if you'd like to narrow it down.
> 
> In the case of the first nightly I got (only) the console message
> complaining about access to
> /Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle.

I think it makes sense to allow hw.cachelinesize because cache line sizes are likely used to optimize memory buffer sizes / layout. And we're already allowing hw.cachelinesize_compat which provides access to the same information in a different way.
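To make "used to optimize memory buffer sizes / layout" concrete, here is a minimal sketch of two common uses of a cache line size. It assumes a power-of-two line size. The 64-byte default and the function names are hypothetical, and nothing in this bug says how the Nvidia driver actually uses the value.

```python
def round_up_to_line(size: int, line: int = 64) -> int:
    """Round size up to the next multiple of the cache line size.

    `line` would come from hw.cachelinesize on macOS; 64 is only an
    assumed default for illustration.
    """
    if line <= 0 or line & (line - 1):
        raise ValueError("cache line size must be a positive power of two")
    return (size + line - 1) & ~(line - 1)


def lines_touched(offset: int, length: int, line: int = 64) -> int:
    """Number of cache lines spanned by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // line
    last = (offset + length - 1) // line
    return last - first + 1


print(round_up_to_line(100))  # 128
print(lines_touched(60, 8))   # 2: an 8-byte value at offset 60 straddles two lines
```

Padding buffers with `round_up_to_line` keeps separate buffers from sharing a cache line. `lines_touched` shows the cost of a misaligned access.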

> For the second nightly, which worked, I got the .toc access error, as with
> all other cases that work.

OK. I should have mentioned that this is expected, because we block write access from content processes at security.sandbox.content.level=1 and higher.

I'll try to get more information on those .toc files and then get this fix landed on Nightly and Beta.
Flags: needinfo?(haftandilian)
The .toc files are likely to be part of Nvidia's shader cache and preventing those files from being saved to disk may affect performance on future runs. I haven't seen any instances of the sandbox blocking this on newer hardware with the default Apple-provided drivers. I'll file another bug to track this problem.

Moving forward with the fix to allow the cachelinesize sysctl and /Library/GPUBundles.
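The sketch below is a toy model of what the two new allowances cover. It assumes a directory rule matches the directory itself and everything beneath it, and a sysctl rule matches one exact name. This is hypothetical Python for illustration. It is not Firefox's sandbox rules or Apple's sandbox implementation.

```python
from pathlib import PurePosixPath
from typing import Iterable


def covered_by_subpath(path: str, allowed_dirs: Iterable[str]) -> bool:
    """True if `path` equals one of `allowed_dirs` or lies beneath it.

    Compares whole path components, so "/Library/GPUBundlesX" is not
    covered by "/Library/GPUBundles".
    """
    p = PurePosixPath(path)
    for d in allowed_dirs:
        base = PurePosixPath(d)
        if p == base or base in p.parents:
            return True
    return False


def covered_sysctl(name: str, allowed_names: Iterable[str]) -> bool:
    """Exact-name match: one sysctl name does not cover a similar one."""
    return name in set(allowed_names)


print(covered_by_subpath(
    "/Library/GPUBundles/GeForceTeslaGLDriverWeb.bundle",
    ["/Library/GPUBundles"]))                                      # True
print(covered_sysctl("hw.cachelinesize", ["hw.cachelinesize_compat"]))  # False
```

The second check illustrates, within this model, why the existing hw.cachelinesize_compat allowance did not cover the hw.cachelinesize lookup that was denied in comment 8.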

We're already allowing hw.cachelinesize_compat. Comments in the xnu kernel (found in the 10.13 xnu-4570.1.46 download) in bsd/kern/kern_mib.c explain what the _compat versions are for:

  ...
   * The variables named *_compat here are int-sized versions of variables
   * that are now exported as quads.  The int-sized versions are normally
   * looked up only by number, wheras the quad-sized versions should be
   * looked up by name.
   *
   * The *_compat nodes are *NOT* visible within the kernel.
  ...
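Per the xnu comment, a *_compat node returns an int-sized (4-byte) value, while the current node exports the same information as a quad (8 bytes). The sketch below shows that width difference using locally packed byte strings rather than a real sysctl call; on macOS, a real lookup would go through sysctlbyname(3). The 64-byte value and the little-endian layout are assumptions for illustration, and the function names are hypothetical.

```python
import struct


def decode_int(raw: bytes) -> int:
    """Decode an int-sized (32-bit) value, as a *_compat node returns."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    return struct.unpack("<i", raw)[0]  # "<" = little-endian (assumed)


def decode_quad(raw: bytes) -> int:
    """Decode a quad-sized (64-bit) value, as the current node returns."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack("<q", raw)[0]


# Simulated raw results for a 64-byte cache line.
compat_raw = struct.pack("<i", 64)
quad_raw = struct.pack("<q", 64)
print(len(compat_raw), decode_int(compat_raw))  # 4 64
print(len(quad_raw), decode_quad(quad_raw))     # 8 64
```

In this sketch, a size mismatch raises an error instead of being silently reinterpreted.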
Flags: needinfo?(haftandilian)
(In reply to Haik Aftandilian [:haik] from comment #13)
> I haven't seen any instances of the sandbox blocking this on newer hardware
> with the default Apple-provided drivers.

To clarify, I haven't seen any instances of this on any Mac hardware with the Apple-provided drivers. This is probably specific to the downloadable Nvidia "web" drivers.
Comment on attachment 8939603 [details]
Bug 1421262 - [Mac] Add access to hw.cachelinesize sysctl, /Library/GPUBundles to content sandbox rules.

https://reviewboard.mozilla.org/r/209912/#review215502

::: commit-message-4f09d:1
(Diff revision 1)
> +Bug 1421262 - Add access to hw.cachelinesize sysctl, /Library/GPUBundles. r?Alex_Gaynor

Commit message should probably mention sandbox and/or content process.
Attachment #8939603 - Flags: review?(agaynor) → review+
@manuel.nuno.melo, thanks for all your help debugging and running tests to track this problem down!
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/29245716751b
[Mac] Add access to hw.cachelinesize sysctl, /Library/GPUBundles to content sandbox rules. r=Alex_Gaynor
Assignee: nobody → haftandilian
See Also: → 1427827
https://hg.mozilla.org/mozilla-central/rev/29245716751b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment on attachment 8939603 [details]
Bug 1421262 - [Mac] Add access to hw.cachelinesize sysctl, /Library/GPUBundles to content sandbox rules.

Approval Request Comment
[Feature/Bug causing the regression]:
Enabling of content sandbox read-access filesystem restrictions in bug 1332190. This problem occurs with the downloadable (not the standard Apple-included) Nvidia graphics drivers.

[User impact if declined]:
Firefox is broken for users who want to use the downloadable Nvidia "web" drivers. Some pages render garbled and unreadable content.

[Is this code covered by automated tests?]:
Yes, the changed code is executed every time a content process is started, but automated tests only use the standard Nvidia graphics drivers, not the downloadable versions.

[Has the fix been verified in Nightly?]:
Yes

[Needs manual test from QE? If yes, steps to reproduce]: 
No

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
The patch changes the Mac sandbox rules in a minor way, making the sandbox slightly more permissive. Adding allowances is unlikely to cause regressions.

[String changes made/needed]:
None
Attachment #8939603 - Flags: approval-mozilla-beta?
Comment on attachment 8939603 [details]
Bug 1421262 - [Mac] Add access to hw.cachelinesize sysctl, /Library/GPUBundles to content sandbox rules.

Fixes a rendering issue when using NVIDIA's Web Driver. Beta58+.
Attachment #8939603 - Flags: approval-mozilla-beta? → approval-mozilla-beta+