Open Bug 1636555 Opened 4 years ago Updated 2 years ago

Proprietary Nvidia driver: Home Depot product-preview image crashes Xorg (or locks up my machine entirely), with ` Fatal IO error 11 (Resource temporarily unavailable) on X server`

Categories

(Core :: Graphics, defect, P3)

x86_64
Linux
defect

Tracking

()

Tracking Status
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- affected

People

(Reporter: dholbert, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

STR:

  1. Visit https://www.homedepot.com/p/Pratt-Retail-Specialties-72-in-W-x-80-in-L-Premium-Moving-Blanket-7007004/202518473
  2. Click the first thumbnail image on the left (with a silhouette/chart of a person standing next to the blanket held up against a wall)

ACTUAL RESULTS:
The image loads full-size in an in-content popup, and then I get dropped to my Ubuntu login screen, because Xorg has crashed.

Or worse: if I enable WebRender, then the display just locks up and becomes unresponsive until I forcibly power off my machine.

EXPECTED RESULTS:
No crash/lockup.

I'm using a Dell XPS 15-inch 9570 (2018) laptop, running Ubuntu 20.04, which was released a couple weeks ago. I'm not sure if this is a regression in Firefox and/or Ubuntu. But for what it's worth, Chrome does not have this issue, on the same machine/environment.

When I crash, four copies of this message go by in my gnome-terminal output:

Gdk-Message: 11:33:52.928: /home/dholbert/programs/firefox-nightly/firefox-bin: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.

followed by this message:

Gdk-Message: 11:33:52.932: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.
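
For context: on Linux, errno 11 is EAGAIN, and "Resource temporarily unavailable" is its standard message text. GDK prints this when its connection to the X server goes away, so these lines are a symptom of the server dying rather than the cause. A rough sketch for pulling the fields out of such lines when triaging terminal output follows; the function name `parse_gdk_fatal_io` and the regular expression are illustrative assumptions, not part of any Firefox or GDK tooling.

```python
import re

# Matches lines such as:
#   Gdk-Message: 11:33:52.932: firefox: Fatal IO error 11 (...) on X server :1.
GDK_FATAL_IO = re.compile(
    r"Gdk-Message: (?P<time>[\d:.]+): (?P<process>.+?): "
    r"Fatal IO error (?P<errno>\d+) \((?P<message>[^)]*)\) "
    r"on X server (?P<display>:\d+(?:\.\d+)?)"
)


def parse_gdk_fatal_io(line):
    """Return the fields of a GDK fatal-IO-error line, or None if no match."""
    match = GDK_FATAL_IO.search(line)
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "process": match.group("process"),
        "errno": int(match.group("errno")),
        "message": match.group("message"),
        "display": match.group("display"),
    }
```

Feeding it the two messages above gives errno 11 and display `:1` for both, with the process reported once as the full `firefox-bin` path and once as `firefox`.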
Summary: Home Depot product-preview image crashes Xorg (or locks up my machine entirely) → Home Depot product-preview image crashes Xorg (or locks up my machine entirely), with ` Fatal IO error 11 (Resource temporarily unavailable) on X server`
See Also: → 1129492

Do you see this in Firefox release?

Flags: needinfo?(dholbert)
Severity: -- → S2

Yes, I can reproduce this in Firefox release, version 76.0.1 (64-bit)

Flags: needinfo?(dholbert)

I can reproduce in Nightly 2018-12-01 as well (i.e. version 65.0a1 (2018-12-01) (64-bit))

I tried a somewhat older Nightly, 2018-06-01, but I can't complete the STR there: clicking the thumbnail image triggers a JS error (ReferenceError: event is not defined) and nothing happens (the in-content modal image slideshow doesn't appear) -- so they must be using some JS/DOM feature that we added between those two Nightlies. (Not that that feature is involved with the graphics error; it's just required to get to the spot that triggers the graphics error.)

So, bottom line: this doesn't seem to be a regression (not in Firefox, at least). There may be an Xorg/Ubuntu regression or behavior change involved, or possibly not; I don't have an older Ubuntu handy for testing.

Martin, could you take a look at this on

Flags: needinfo?(stransky)

(In reply to Daniel Holbert [:dholbert] from comment #4)

I tried a somewhat older Nightly, 2018-06-01, but I can't complete the STR there: clicking the thumbnail image triggers a JS error (ReferenceError: event is not defined) and nothing happens (the in-content modal image slideshow doesn't appear) -- so they must be using some JS/DOM feature that we added between those two Nightlies.

(That's probably https://bugzilla.mozilla.org/show_bug.cgi?id=1496288, or one of the variations where we had this enabled for a while before backing out and eventually shipping.)

www.homedepot.com gives me Access Denied.

But anyway, can you try to get a backtrace of the crash? Do you have anything in your journal log? ("journalctl -b -1" prints the previous boot's logs; plain "journalctl" prints all logs.) Xorg crashes are usually shown there.

Flags: needinfo?(stransky) → needinfo?(dholbert)
Priority: -- → P3
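
For anyone scripting the log collection: journalctl selects boots by relative offset, where `-b 0` is the current boot and `-b -1` is the one before it (a positive number counts from the oldest boot in the journal instead). Below is a minimal sketch that builds the command and runs it; the helper names `journalctl_command` and `read_journal` are made up for illustration, and only the standard `-b`, `-k` and `--no-pager` options are used.

```python
import subprocess


def journalctl_command(boot_offset=-1, kernel_only=False):
    """Build a journalctl argv list for a boot relative to the current one.

    boot_offset=0 is the current boot, -1 the previous boot, and so on.
    """
    if boot_offset > 0:
        raise ValueError("use 0 or a negative offset relative to the current boot")
    cmd = ["journalctl", "--no-pager", "-b", str(boot_offset)]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, which is where NVRM lines appear
    return cmd


def read_journal(boot_offset=-1, kernel_only=False):
    """Run journalctl and return its output as text (needs a systemd machine)."""
    result = subprocess.run(
        journalctl_command(boot_offset, kernel_only),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If only Xorg crashes (and not the whole machine), the session restarts without a reboot, so the interesting lines are in the current boot and `boot_offset=0` is the one to use.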

(In reply to Martin Stránský [:stransky] from comment #7)

www.homedepot.com gives me Access Denied.

Darn; they must have a geolocation-block of some sort.

But anyway, can you try to get a backtrace of the crash? Do you have anything in your journal log? ("journalctl -b -1" prints the previous boot's logs; plain "journalctl" prints all logs.) Xorg crashes are usually shown there.

Here's output from journalctl at the time of the crash:


Jun 01 09:33:19 coral kernel: NVRM: GPU at PCI:0000:01:00: GPU-b7b30932-0782-8fe5-4757-bd243f512429
Jun 01 09:33:19 coral kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=1245, Channel ID 00000018 intr 00004000
Jun 01 09:33:19 coral kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=1902, Channel ID 00000018 intr 00004000
Jun 01 09:33:19 coral kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=1902, Ch 00000019, intr 50000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_005f0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) Backtrace:
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x562383ca7dec]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7f986a70841f]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 2: /usr/lib/xorg/modules/drivers/modesetting_drv.so (?+0x0) [0x7f986955b050]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 3: /usr/lib/xorg/Xorg (xf86_crtc_show_cursor+0x31) [0x562383bbc731]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 4: /usr/lib/xorg/Xorg (xf86_show_cursors+0x28f) [0x562383bbca6f]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 5: /usr/lib/xorg/Xorg (xf86DestroyCursorInfoRec+0xa83) [0x562383bc8113]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 6: /usr/lib/xorg/Xorg (xf86DestroyCursorInfoRec+0xecd) [0x562383bc8a1d]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 7: /usr/lib/xorg/Xorg (RamDacHandleColormaps+0x765) [0x562383bc6a95]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) 8: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x46d856) [0x7f9869ee59ac]
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) Segmentation fault at address 0x0
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: Fatal server error:
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) Caught signal 11 (Segmentation fault). Server aborting
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: Please consult the The X.Org Foundation support
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]:          at http://wiki.x.org
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]:  for help.
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE)
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (EE) NVIDIA(0):     recover...
Jun 01 09:33:19 coral /usr/lib/gdm3/gdm-x-session[1902]: (II) NVIDIA(0): Error recovery was successful.
Flags: needinfo?(dholbert)
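
The NVRM "Xid" lines at the top of that log are the NVIDIA kernel driver's own error reports. As far as I can tell from NVIDIA's Xid documentation, 31 is a GPU memory page fault (consistent with the "MMU Fault" text above) and 32 is an invalid or corrupted push buffer stream. A small sketch for pulling the Xid number and pid out of journal lines follows; the function name `extract_xid_events` and the pattern are assumptions for illustration and only cover the line shape seen in this log.

```python
import re

# Matches e.g. "NVRM: Xid (PCI:0000:01:00): 32, pid=1245, Channel ID 00000018 ..."
XID_LINE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)


def extract_xid_events(lines):
    """Return a list of (xid, pid) tuples found in the given journal lines."""
    events = []
    for line in lines:
        match = XID_LINE.search(line)
        if match:
            events.append((int(match.group("xid")), int(match.group("pid"))))
    return events
```

Run over the log above, this yields Xid 32 for pid 1245 and Xid 32 and 31 for pid 1902. Pid 1902 is the gdm-x-session process whose Xorg backtrace follows.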

As hinted by the mention of cursors in the backtrace, this seems to be an issue with a custom cursor on the page.
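
To make that observation reproducible on other logs, here is a rough sketch that parses the Xorg "(EE) N: module (symbol+offset) [address]" frames and picks out the ones whose symbol mentions a cursor. The names `parse_frames` and `cursor_frames` are hypothetical, and the pattern is only fitted to the frame format in the log above. Keep in mind that in stripped binaries the printed symbol is just the nearest exported one, so frames with very large offsets may be misattributed.

```python
import re

# Matches e.g. "(EE) 3: /usr/lib/xorg/Xorg (xf86_crtc_show_cursor+0x31) [0x562383bbc731]"
FRAME = re.compile(
    r"\(EE\) (?P<index>\d+): (?P<module>\S+) "
    r"\((?P<symbol>[^+)]+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]"
)


def parse_frames(lines):
    """Return (index, module, symbol) tuples for each backtrace frame line."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            frames.append(
                (int(match.group("index")), match.group("module"), match.group("symbol"))
            )
    return frames


def cursor_frames(frames):
    """Keep only the frames whose symbol name mentions a cursor."""
    return [frame for frame in frames if "cursor" in frame[2].lower()]
```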

Here's a minimal testcase that triggers the issue for me.

The cursor is hosted at https://assets.homedepot-static.com/p/static/images/inlinePlayer/360_zoom_out.cur , but I've included it as a data URI in the attached testcase so that the testcase is entirely standalone.
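
For anyone who wants to build a similar standalone testcase from a different cursor file, here is a sketch of the general approach: base64-encode the .cur bytes into a data URI and reference it from a CSS `cursor` declaration with a keyword fallback. This is not the attached testcase itself. The markup, the `build_cursor_testcase` name and the `image/x-icon` MIME type are all assumptions. Be careful when opening the result on an affected setup, since it may take down Xorg.

```python
import base64


def build_cursor_testcase(cursor_bytes, mime_type="image/x-icon"):
    """Return a standalone HTML page that uses the given cursor via a data URI."""
    encoded = base64.b64encode(cursor_bytes).decode("ascii")
    data_uri = "data:" + mime_type + ";base64," + encoded
    return (
        "<!DOCTYPE html>\n"
        "<style>\n"
        "  #target { width: 200px; height: 200px; background: silver;\n"
        # A keyword fallback (here "auto") is required after url() cursors.
        '            cursor: url("' + data_uri + '"), auto; }\n'
        "</style>\n"
        '<div id="target">Hover here to show the custom cursor</div>\n'
    )


if __name__ == "__main__":
    # Hypothetical local copy of the cursor file; adjust the path as needed.
    with open("360_zoom_out.cur", "rb") as cursor_file:
        print(build_cursor_testcase(cursor_file.read()))
```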

Looking at the backtrace, it seems to be a mesa/nvidia driver issue. It would be better to report it at the Xorg bugzilla, but nouveau drivers are known to be buggy.
I wonder if anyone without nvidia drivers can reproduce it.

Does it crash with Nouveau as well (if you uninstall the proprietary Nvidia driver)?

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #11)

Does it crash with Nouveau as well (if you uninstall the proprietary Nvidia driver)?

I used the Ubuntu "Additional Drivers" tool (part of the Software & Updates dialog) to switch to Nouveau, and rebooted, and now unfortunately I can't seem to get to a graphical login screen - I just get a blinking white cursor at the last stage of the bootup process.

So, it seems the Nouveau drivers don't support my hardware/configuration.

OK -- after swapping to Nouveau, I was able to successfully boot to a graphical desktop by choosing "recovery mode" in GRUB and then choosing "resume" from the recovery prompt.

In this graphical session, I'm able to load the testcase and the original Home Depot page just fine.

So: indeed, this does seem to be an issue that is specific to the latest NVIDIA proprietary driver. So probably a driver bug.

(Though it's specific to Firefox running under that driver -- Chrome doesn't have any trouble. So perhaps it's a bug in some underlying API that we use (directly or indirectly) & that Chrome doesn't use, or something.)

(And for my own reference / anyone else who happens to run into the issue from comment 12 -- I was able to return to using the NVIDIA driver [which lets me boot successfully, but be subject to this bug] by running the following in a terminal: sudo ubuntu-drivers install)

S1 or S2 bugs need an assignee - could you find someone for this bug?

Flags: needinfo?(jbonisteel)
Flags: needinfo?(jbonisteel)
Severity: S2 → S3
Summary: Home Depot product-preview image crashes Xorg (or locks up my machine entirely), with ` Fatal IO error 11 (Resource temporarily unavailable) on X server` → Proprietary Nvidia driver: Home Depot product-preview image crashes Xorg (or locks up my machine entirely), with ` Fatal IO error 11 (Resource temporarily unavailable) on X server`