Closed Bug 718629 Opened 12 years ago Closed 12 years ago

intermittent waitpid failure causes OpenGL features to be blacklisted on X11

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13
Tracking Status
firefox12 --- fixed

People

(Reporter: bjacob, Assigned: bjacob)

References

Details

(Whiteboard: [qa-])

Attachments

(2 files, 3 obsolete files)

Here on Ubuntu 11.10 with the radeon driver, I sometimes get a spurious 'waitpid failure' reported in about:support. This causes GL features to be disabled. Needs investigating.

Also see bug 622127 comment 16, it's the same thing.
Attachment #589171 - Flags: review?(joe)
Attachment #589171 - Attachment is obsolete: true
Attachment #589171 - Flags: review?(joe)
Attachment #589178 - Flags: review?(joe)
Affects me also (reproducing my comments from Bug 622127)

The wiki page says that WebGL is enabled by default for Mesa >7.10.3
https://wiki.mozilla.org/Blocklisting/Blocked_Graphics_Drivers#On_X11

However, my Intel Ironlake card is blacklisted even though I'm running Mesa 7.11.3 (Firefox 9 stable channel, ArchLinux, Kernel 3.1.9)
The graphics section of about:support says this:

===================================================================
Adapter Description GLXtest process failed (waitpid failed): VENDOR
Tungsten Graphics, Inc
RENDERER
Mesa DRI Intel(R) Ironlake Desktop 
VERSION
2.1 Mesa 7.11.2
TFP
TRUE

WebGL Renderer Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop  -- 2.1 Mesa 7.11.2
GPU Accelerated Windows 0/1. Blocked for your graphics driver version. Try updating your graphics driver to version <Anything with EXT_texture_from_pixmap support> or newer.
===========================================================================

If I run glxinfo | grep texture_from_pixmap I get the following:
====================
GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap, 
GLX_EXT_texture_from_pixmap
=====================

So it appears that my driver and hardware do support the EXT_texture_from_pixmap extension, contrary to what the about:support message says.

I can get it working by force-enabling webgl and layers acceleration, but why am I having to do this? Is the Wiki page just wrong?  Is there a bug with Ironlake hardware that needs to be filed with the Mesa team?
Attachment #589178 - Flags: review?(joe) → review+
Ryan, we're still investigating this. I'll land my patch so it should be in Thursday's Nightly build (nightly.mozilla.org). If you could try it then, and paste here the about:support Graphics information from it, that would greatly help us understand this problem.

The texture_from_pixmap message will be fixed at the same time. The probe does recognize that your driver supports texture_from_pixmap, see "TFP TRUE".
http://hg.mozilla.org/integration/mozilla-inbound/rev/924d6091bec7
Assignee: nobody → bjacob
Target Milestone: --- → mozilla12
Thanks for the quick response, Jacob!  In preparation for Thursday I went ahead and tried tonight (1/17) nightly and the problem seems resolved already. 

About:support reads:
================
Adapter Description Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop 
Vendor ID Tungsten Graphics, Inc
Device ID Mesa DRI Intel(R) Ironlake Desktop 
Driver Version 2.1 Mesa 7.11.2
WebGL Renderer Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop  -- 2.1 Mesa 7.11.2
GPU Accelerated Windows 0
=================

If I enable layers acceleration in about:config, performance seems decent, too!

I'll test again on Thursday once your patch hits.
https://hg.mozilla.org/mozilla-central/rev/924d6091bec7
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Ryan S Kingsbury from comment #6)
> Thanks for the quick response, Jacob!  In preparation for Thursday I went
> ahead and tried tonight (1/17) nightly and the problem seems resolved
> already. 

Hm, OK, as I said in the title of this bug, it's intermittent. Which means that I don't think it's resolved, it's just not manifesting itself for you at the moment.

When the problem reappears, go to about:support again and check whether it mentions "waitpid". If it does, paste that info here: starting with tomorrow's Nightly, it will contain more useful information.

Keeping this bug open until it's actually fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
You were right, I'm having problems again even before updating the nightly build to today's version.

Tried the 1/19 nightly and I see the following:
========================
Adapter Description Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop 
Vendor ID Tungsten Graphics, Inc
Device ID Mesa DRI Intel(R) Ironlake Desktop 
Driver Version 2.1 Mesa 7.11.2
WebGL Renderer Blocked for your graphics driver version. Try updating your graphics driver to version <Anything with EXT_texture_from_pixmap support> or newer.
GPU Accelerated Windows 0
==========================
If I force enable webgl I get the following:
==================
Adapter Description Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop 
Vendor ID Tungsten Graphics, Inc
Device ID Mesa DRI Intel(R) Ironlake Desktop 
Driver Version 2.1 Mesa 7.11.2
WebGL Renderer Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ironlake Desktop  -- 2.1 Mesa 7.11.2
GPU Accelerated Windows 0
================

The GPU Accelerated windows count remains at 0 even when I have a webgl tab open (in this case webglearth.com)

Unfortunately I don't see anything about waitpid
Update: now I see the latter output (no error) even without webgl force-enabled.  I haven't changed anything, just starting and restarting the browser a few times.

I should note that I'm running these tests in safe mode, so all add-ons are disabled.
Another update (sorry to spam):

I tried various combinations of
webgl.force-enabled T/F
layers.acceleration.force-enabled T/F
safe and normal mode

The only 2 configurations that result in "GPU Accelerated Windows" going from 0 to 1 involve using layers.acceleration.force-enabled TRUE, and opening in non-safe mode. This configuration works whether or not webgl is force-enabled.

The rest of the graphics output in about:support remains the same, only the window count changes.

Oddly, when I force-enable layers acceleration, I get strange behavior on input events. For instance, while trying to type the text into this box the keyboard became VERY laggy. Also, hovering the mouse over different HTML elements causes some of them to flicker and disappear. I don't know if that's related or off-track.  The layers acceleration DOES definitely bring the GPU online though; my framerate on webgl earth went from about 3 to 20+ when the GPU accelerated windows count was 1.
If PR_CreateProcess is called before waitpid is called, then this would be the same issue as bug 678372.
Karl, I set a breakpoint at prinit.c:731 and it's never reached here, during a normal Firefox run.
Though, in the debug build which I debugged, I can't reproduce this problem. And on Nightly, I get:

GLXtest process failed (waitpid failed with errno=10 for pid 31025):
VENDOR
X.Org
RENDERER
Gallium 0.4 on AMD RV710
VERSION
2.1 Mesa 7.11
TFP
TRUE

This means a few things. First, errno=10 means "no child" which is in agreement with your theory about PR_WaitProcess reaping the glxtest process.

Second, the glxtest process was pid 31025 while this Firefox instance is pid 31023. So there was indeed a process in between at some point. I guess Nightly does stuff that my debug build doesn't do.

Third, even though the glxtest process was gone, the data was still successfully read from the pipe. So perhaps this just doesn't matter and we should just ignore the errno=10 case?
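To make the third point concrete, here is a minimal sketch, not the Firefox code, of a parent whose child has already been reaped by the time it calls waitpid. The helper name `probeAfterChildAlreadyReaped`, the `ProbeResult` struct, and the message written by the child are all hypothetical; the first `waitpid` call merely stands in for whatever else in the process reaps the child first.

```cpp
#include <cerrno>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// What the late waitpid call reported and what the pipe still held.
struct ProbeResult {
    int waitpidErrno;      // errno from the late waitpid call, 0 if it succeeded
    std::string pipeData;  // bytes read from the pipe after the child was reaped
};

// Hypothetical helper: fork a child that writes to a pipe and exits,
// let "someone else" reap it, then try waitpid and read the pipe anyway.
ProbeResult probeAfterChildAlreadyReaped() {
    ProbeResult result{0, ""};
    int fds[2];
    if (pipe(fds) != 0) {
        result.waitpidErrno = errno;
        return result;
    }

    pid_t pid = fork();
    if (pid < 0) {
        result.waitpidErrno = errno;
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    if (pid == 0) {
        // Child: stand-in for the glxtest process.
        close(fds[0]);
        const char msg[] = "VENDOR\nX.Org\n";
        ssize_t ignored = write(fds[1], msg, sizeof(msg) - 1);
        (void)ignored;
        _exit(0);
    }
    close(fds[1]);  // parent keeps only the read end

    // Stand-in for another part of the process reaping the child first.
    waitpid(pid, nullptr, 0);

    // The "real" waitpid now finds no such child and fails with ECHILD.
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        result.waitpidErrno = errno;
    }

    // The data written before the child exited is still sitting in the pipe.
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        result.pipeData.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);
    return result;
}
```

On Linux the late call should report ECHILD (errno 10) while the pipe contents remain readable, which matches what the Nightly output above suggests.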
Attached patch tolerate ECHILD (obsolete) — Splinter Review
A good reason to r- this would be if there were a good reason to fear that in case of ECHILD error, we could actually fail to get the data from the pipe. The premise of this patch is that ECHILD just means that the process was already reaped but we still get the data from the pipe.
Attachment #592597 - Flags: review?(karlt)
This is better: we only tolerate ECHILD if reading from the pipe succeeded. If reading from the pipe failed, we report explicitly about it in AppNotes (and remove a wishful comment about it being impossible) and we still consider ECHILD a waitpid failure, so we also still report about that in AppNotes.
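The rule described in this comment can be sketched as a small pure function. This is only an illustration of the decision logic as stated here, not the patch itself; `GlxTestVerdict` and `judgeGlxTest` are hypothetical names, and the two flags stand for "add a note to AppNotes".

```cpp
#include <cerrno>

// Which failures should be reported (e.g. in AppNotes).
struct GlxTestVerdict {
    bool reportPipeReadFailure;
    bool reportWaitpidFailure;
};

// Hypothetical helper: ECHILD is tolerated only when the pipe read succeeded.
GlxTestVerdict judgeGlxTest(bool readSucceeded, bool waitpidFailed, int waitpidErrno) {
    GlxTestVerdict verdict;
    // A failed read is always reported explicitly.
    verdict.reportPipeReadFailure = !readSucceeded;
    // ECHILD alone is harmless if we still got the data from the pipe.
    const bool toleratedEchild =
        waitpidFailed && waitpidErrno == ECHILD && readSucceeded;
    // Any other waitpid failure, or ECHILD combined with a failed read,
    // still counts as a waitpid failure.
    verdict.reportWaitpidFailure = waitpidFailed && !toleratedEchild;
    return verdict;
}
```

With this shape, ECHILD plus a successful read reports nothing, while ECHILD plus a failed read reports both problems.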
Attachment #592597 - Attachment is obsolete: true
Attachment #592597 - Flags: review?(karlt)
Attachment #592598 - Flags: review?(karlt)
Per IRC discussion.
Attachment #592598 - Attachment is obsolete: true
Attachment #592598 - Flags: review?(karlt)
Attachment #592894 - Flags: review?(karlt)
Comment on attachment 592894 [details] [diff] [review]
tolerate ECHILD, and only write to the pipe at the very end of glxtest

>+            } else {
>+                // Bug 718629
>+                // ECHILD happens when the glxtest process got reaped by a PR_WaitProcess
>+                // as per bug 227246. This shouldn't matter, as we still seem to get the data

The first sentence here isn't quite accurate.  This happens even without PR_WaitProcess being called.
It happens from WaitPidDaemonThread, which will be run after PR_CreateProcess is called.
I'd suggest "... got reaped after a PR_CreateProcess as per ..."

r+ with that touch-up.
Attachment #592894 - Flags: review?(karlt) → review+
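The attachment title also mentions writing to the pipe only at the very end of glxtest. The thread does not spell out the reasoning (it was discussed on IRC), so the following is just a sketch of the general pattern: accumulate the whole report in memory, then emit it in one go. The names `buildReport` and `writeAll` are hypothetical, and the report layout is an assumption modeled on the about:support output pasted in this bug.

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// Build the full report in memory first (layout assumed from about:support).
std::string buildReport(const std::string& vendor, const std::string& renderer,
                        const std::string& version, bool tfp) {
    std::string report;
    report += "VENDOR\n" + vendor + "\n";
    report += "RENDERER\n" + renderer + "\n";
    report += "VERSION\n" + version + "\n";
    report += "TFP\n";
    report += tfp ? "TRUE" : "FALSE";
    report += "\n";
    return report;
}

// Write the whole buffer to fd, retrying on EINTR and short writes.
bool writeAll(int fd, const std::string& data) {
    size_t done = 0;
    while (done < data.size()) {
        ssize_t n = write(fd, data.data() + done, data.size() - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, try again
            return false;                  // real error
        }
        done += static_cast<size_t>(n);
    }
    return true;
}
```

A plausible benefit of this pattern is that the parent sees either the complete report or nothing, never a partial one from a probe that died halfway through.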
Comment on attachment 592894 [details] [diff] [review]
tolerate ECHILD, and only write to the pipe at the very end of glxtest

[Approval Request Comment]
Regression caused by (bug #): don't know, but it must be 1 or 2 months old
User impact if declined: WebGL will be wrongly blacklisted half the time, intermittently, on Linux (desktop, not Android)
Testing completed (on m-c, etc.): just landed it on m-c.
Risk to taking this patch (and alternatives if risky): not risky. This will not cause us to wrongly whitelist a driver that should be blacklisted.
String changes made by this patch: None
Attachment #592894 - Flags: approval-mozilla-aurora?
Target Milestone: mozilla12 → mozilla13
https://hg.mozilla.org/mozilla-central/rev/d550ac61fc59
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 592894 [details] [diff] [review]
tolerate ECHILD, and only write to the pipe at the very end of glxtest

[Triage Comment]
Prevents WebGL from being blacklisted in certain circumstances. Approved for Aurora 12.
Attachment #592894 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Target Milestone: mozilla12 → mozilla13
I am still seeing this issue even with firefox version 11:

Adapter Description GLXtest process failed (waitpid failed): VENDOR
NVIDIA Corporation
RENDERER
GeForce GTX 460/PCIe/SSE2
VERSION
4.2.0 NVIDIA 295.20
TFP
TRUE

WebGL Renderer Blocked for your graphics card because of unresolved driver issues.
GPU Accelerated Windows 0. Blocked for your graphics driver version. Try updating your graphics driver to version <Anything with EXT_texture_from_pixmap support> or newer.

But I can't reproduce it if I start it as a different user with a clean profile (I haven't tried this with earlier versions of firefox, that is < 11). According to the wiki page this driver version should be whitelisted.
gk.bts1: this is strange; please file a new bug, as this has to be a different issue. To determine whether we should worry about this, we should check from crash-stats data what percentage of Firefox 11+ users get this 'waitpid failure'.
This bug has been fixed for version 12, but not version 11 unfortunately.
Thank you for the hint - I was subscribed to a duplicate of this bug and the original submitter says it has been fixed for them with version 11 and I naively believed it was included, but now after checking the code I can see it's not there yet.  Will retest with version 12.  Sorry for the noise and keep up the good work.
Whiteboard: [qa+]
Are there any reliable STR which could be used in order to verify this issue?
Not easy to QA: this never was 100% reproducible.
Whiteboard: [qa+] → [qa-]