Closed Bug 1693011 Opened 3 years ago Closed 2 years ago

Firefox hangs during startup when both headless and Software Rendering are enabled (RenderCompositorSWGL failed mapping default framebuffer)

Categories

(Core :: Graphics: WebRender, defect)

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- disabled
firefox87 - disabled
firefox88 --- wontfix
firefox89 --- fix-optional

People

(Reporter: whimboo, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Original issue as reported:
https://github.com/puppeteer/puppeteer/pull/6861

When running headless Puppeteer tests with Firefox Nightly on Linux, Firefox hangs during startup for some tests. As a result of these failures, the Firefox tests are currently disabled in the Puppeteer CI.

I did some investigation and can reproduce the problem on my local Linux machine running Linux Mint 20.1 with the Cinnamon desktop. When running the tests in non-headless mode, everything works fine.

With the help of mozregression I narrowed the regression down to bug 1684170.

Here are the steps to reproduce:

  1. Clone the Puppeteer repository: https://github.com/puppeteer/puppeteer
  2. Install Puppeteer via PUPPETEER_PRODUCT=firefox npm install
  3. Modify the following line in page.spec.ts to describe.only
  4. Run the unit tests with Firefox via npm run funit
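
The steps above can be sketched as a small shell script. This is only an illustration and not part of the original report: the function names `repro_setup` and `repro_run`, the `run` helper, and the `DRY_RUN` switch are made up here, and step 3 remains a manual edit.

```shell
#!/bin/sh
# Sketch of the reproduction steps; all function names are hypothetical.

# Print each command; execute it unless DRY_RUN=1 is set.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Steps 1 and 2: clone Puppeteer and install it with Firefox as the product.
# $1 is the checkout directory (defaults to "puppeteer").
repro_setup() (
  # The function body is a subshell, so the cd does not leak to the caller.
  dir="${1:-puppeteer}"
  run git clone https://github.com/puppeteer/puppeteer "$dir" || exit 1
  run cd "$dir" || exit 1
  run env PUPPETEER_PRODUCT=firefox npm install
)

# Step 4: run the Firefox unit tests. Step 3 (switching to describe.only
# in page.spec.ts) has to be done by hand between the two functions.
repro_run() (
  dir="${1:-puppeteer}"
  run cd "$dir" || exit 1
  run npm run funit
)
```

With `DRY_RUN=1` set in the shell, both functions only print the commands they would run, which is a cheap way to check the sequence before touching the network.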

At some point Firefox no longer starts up, causing failures in the beforeEach hooks.

I don't have more details yet, but it looks like it may be a breakage involving software WebRender and headless mode.

CC'ing Andrew given that he landed that patch on bug 1684170.

[Tracking Requested - why for this release]: Requesting tracking for the 86 release because Puppeteer tests run in headless mode by default, and that could be completely broken for users.

I just tested with the Firefox 86.0 RC builds and everything looks fine. Also, as Pascal mentioned to me, Software WebRender will not be enabled for the 86 release. So we don't have to track this for 86, and it will most likely be a wontfix for that release.

Running the tests locally with a debug build the following output is visible:

[84504] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/extensions/cookie/nsPermissionManager.cpp, line 2606
++DOCSHELL 0x7f8e1f3aa000 == 1 [pid = 84504] [id = 1]
++DOMWINDOW == 1 (0x7f8e1f3aa800) [pid = 84504] [serial = 1] [outer = (nil)]
++DOMWINDOW == 2 (0x7f8e1f3ab800) [pid = 84504] [serial = 2] [outer = 0x7f8e1f3aa800]
++DOCSHELL 0x7f8e1a033800 == 2 [pid = 84504] [id = 2]
++DOMWINDOW == 3 (0x7f8e1a034000) [pid = 84504] [serial = 3] [outer = (nil)]
++DOMWINDOW == 4 (0x7f8e1a035000) [pid = 84504] [serial = 4] [outer = 0x7f8e1a034000]
++DOMWINDOW == 5 (0x7f8e192a8000) [pid = 84504] [serial = 5] [outer = 0x7f8e1f3aa800]
[84504] WARNING: Hardware Vsync support not yet implemented. Falling back to software timers: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/gfx/thebes/gfxPlatform.cpp, line 2240
[84504] WARNING: Failed to retarget HTML data delivery to the parser thread.: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/parser/html/nsHtml5StreamParser.cpp, line 967
++DOCSHELL 0x7f8e1137c800 == 3 [pid = 84504] [id = 3]
++DOMWINDOW == 6 (0x7f8e1a039800) [pid = 84504] [serial = 6] [outer = (nil)]
++DOCSHELL 0x7f8e2252b800 == 4 [pid = 84504] [id = 4]
++DOMWINDOW == 7 (0x7f8e10cccc00) [pid = 84504] [serial = 7] [outer = (nil)]
[84504] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 272
++DOCSHELL 0x7f8e0febe800 == 5 [pid = 84504] [id = 5]
++DOMWINDOW == 8 (0x7f8e0fe76c00) [pid = 84504] [serial = 8] [outer = (nil)]
[84504] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 272
[84504] WARNING: Couldn't create child process for iframe.: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 336
++DOMWINDOW == 9 (0x7f8e0fe81c00) [pid = 84504] [serial = 9] [outer = 0x7f8e0fe76c00]
++DOMWINDOW == 10 (0x7f8e0f98a800) [pid = 84504] [serial = 10] [outer = 0x7f8e1a039800]
++DOMWINDOW == 11 (0x7f8e0f977c00) [pid = 84504] [serial = 11] [outer = 0x7f8e10cccc00]
++DOMWINDOW == 12 (0x7f8e0f9ad000) [pid = 84504] [serial = 12] [outer = 0x7f8e0fe76c00]
++DOMWINDOW == 13 (0x7f8e0ded7c00) [pid = 84504] [serial = 13] [outer = 0x7f8e0fe76c00]

The curious part is that headless isn't taken into account and a browser window opens without the Remote Agent being enabled. Firefox is stuck somewhere during startup.

Note the above warning for:

Hardware Vsync support not yet implemented. Falling back to software timers

Actually, the above is not valid. Specifying HEADLESS=1 for Puppeteer doesn't enable headless mode; Firefox still runs in normal mode.

When actually running in headless mode, I can see hundreds of these lines:

[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer
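
To put a number on "hundreds", the captured Firefox output can be filtered for that message. A minimal sketch, assuming the output of a run was redirected to a file beforehand; the function name `count_swgl_errors` is made up for illustration:

```shell
#!/bin/sh
# Count the lines in a captured log that contain the SWGL error.
# $1 is the path to the log file (hypothetical, e.g. firefox.log).
count_swgl_errors() {
  # -F matches the message as a fixed string, -c prints the number of
  # matching lines. grep exits with status 1 when nothing matches, so
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -F -c 'RenderCompositorSWGL failed mapping default framebuffer' "$1" || true
}
```

For example, after capturing a run with `> firefox.log 2>&1`, calling `count_swgl_errors firefox.log` prints the number of matching lines.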

Summary: Firefox does not always startup for Puppeteer tests as run in headless mode → Firefox does not always startup for Puppeteer tests as run in headless mode (RenderCompositorSWGL failed mapping default framebuffer)
Flags: needinfo?(aosmond)

I can actually see a similar behavior when running firefox directly:

% firefox/firefox --screenshot screenshot.png http://mozilla.org
[85975] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-cen-l64-d-000000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 174

We should most likely file a specific software webrender/headless bug, but I will wait for Andrew's reply.

As Andrew mentioned on Element, we might want to force-disable WebRender via MOZ_WEBRENDER=0. That also works around the problem for now.
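
For a direct launch, the workaround can be sketched as a small wrapper. This is an illustration only: the function name `run_without_webrender` is made up, and it only helps on builds that still honour the MOZ_WEBRENDER variable.

```shell
#!/bin/sh
# Run the given command with WebRender force-disabled via the environment.
# The variable is set only for the launched process, not the calling shell.
run_without_webrender() {
  env MOZ_WEBRENDER=0 "$@"
}
```

Applied to the screenshot command from above, this would be `run_without_webrender firefox/firefox --screenshot screenshot.png http://mozilla.org`.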

Keeping the needinfo set so that the bug can be updated (summary and component) to better visualize what's broken on Linux.

I also filed bug 1693021 to disable WebRender for Puppeteer unit tests in Taskcluster.

See Also: → 1693021

Is this only an issue with sw-wr? AIUI that is not riding to release just yet?

Flags: needinfo?(hskupin)

I cannot say for sure whether it's just sw-wr, but I assume so. So far it should only be enabled on Nightly and early Beta. So yes, it's not enabled for release yet.

Flags: needinfo?(hskupin)

This is (software) webrender related. So moving to the appropriate component for triaging.

Severity: S1 → --
Component: Agent → Graphics: WebRender
Priority: P1 → --
Product: Remote Protocol → Core

As of Mar 5, 2021, headless Firefox Nightly (Mozilla Firefox 88.0a1) reliably crashes for me on Intel Mac 10.15; running with MOZ_WEBRENDER=0 helps.

Here are the repro steps.

$ # The repro steps use an HTML test asset from the Playwright repo. Check out a tag to keep the repro future-proof.
$ git clone https://github.com/microsoft/playwright && git -C playwright checkout v1.9.0
$
$ /Applications/Firefox\ Nightly.app/Contents/MacOS/firefox --headless file:///$PWD/playwright/test/assets/video.html

For the record: downstream Playwright bug - https://github.com/microsoft/playwright/issues/5721

Blocks: gfx-triage

The crashes referenced in the last comment all show the following output:

    [pid=4029][err] Unable to create basic Accelerated OpenGL renderer.
    [pid=4029][err] Core Image is now using the software OpenGL renderer. This will be slow.
    [pid=4029][out] Crash Annotation GraphicsCriticalError: |[0][GFX1-]: RenderCompositorSWGL 

Andrey, do you know which exact changeset caused it? If not, would you mind giving us the build IDs or changesets for the last working build and the first failing one? Please note that we build Nightly twice a day, so "March 5th" alone does not identify a single build. Thanks!

Andrew, shall we file a new bug for the crash on macOS?

Flags: needinfo?(aslushnikov)

If not, would you mind giving us the build IDs

Sure. I just updated nightly and tried it - it crashes as well. The build id is 20210308094833

Andrey, do you know which exact changeset caused it?

I don't know the exact changeset, unfortunately. Bisecting would take a while for me due to compilation time, but I have good and bad SHAs on the beta branch of https://github.com/mozilla/gecko-dev

Flags: needinfo?(aslushnikov)

(In reply to Andrey Lushnikov from comment #12)

Sure. I just updated nightly and tried it - it crashes as well. The build id is 20210308094833

I finally did your reproduction steps and can also get Firefox to crash. I filed bug 1697004 for this specific crash, which seems to be related to the video on that page.

Blocks: sw-wr-stability
No longer blocks: gfx-triage
Severity: -- → S4

Note that this is actually not only an issue with Puppeteer but a general problem with Software Rendering and headless mode.

Summary: Firefox does not always startup for Puppeteer tests as run in headless mode (RenderCompositorSWGL failed mapping default framebuffer) → Firefox hangs during startup when both headless and Software Rendering are enabled (RenderCompositorSWGL failed mapping default framebuffer)

With Firefox 93, it's no longer possible to disable WebRender with MOZ_WEBRENDER=0.

The crash, however, is still in place. Is it possible to restore the old functionality?

For the record: the functionality was removed here – https://bugzilla.mozilla.org/show_bug.cgi?id=1725388

Has Regression Range: --- → yes
Flags: needinfo?(aosmond)

For the time being, is there any workaround so that simply "./firefox -headless" works?

(In reply to mcccs from comment #17)

For the time being, is there any workaround so that simply "./firefox -headless" works?

That would be interesting indeed.
We are not able to run Firefox headless on Windows Server Core (without desktop experience installed)

./firefox -headless seems to work for me on Ubuntu 22.04. Is anyone else still having problems on Linux? If so what are your steps to reproduce?

Ralph, I'd suggest filing a separate bug for getting Firefox headless running on Windows Server Core, as that seems like it might be an unrelated issue.

Just for the record, the error message:

[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt

Yeah, I get that message as well. But headless still works.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)

Yeah, I get that message as well. But headless still works.

Ok. Thanks for the hint!
I'll collect some information regarding the crash from the event logs and will create a new issue.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)

./firefox -headless seems to work for me on Ubuntu 22.04. Is anyone else still having problems on Linux? If so what are your steps to reproduce?

Given that I filed this bug based on issues with Puppeteer, I checked the current state locally and can confirm that it works. As such, I created https://github.com/puppeteer/puppeteer/pull/9001 to remove the environment variable. Once all tests pass and the PR has been merged, I will close this bug.

The upstream PR has been merged. As such we can close this bug as WFM.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME