Closed Bug 892567 Opened 11 years ago Closed 10 years ago

Image decoding corruption and linux kernel crash with hardware reset

Categories

(Core :: Graphics: ImageLib, defect)

22 Branch
x86
Linux
defect
Not set
critical

RESOLVED WORKSFORME

People

(Reporter: u474838, Unassigned)

Details

(Keywords: crash)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130618035212

Steps to reproduce:

Opened http://blather.michaelwlucas.com/wp-content/uploads/2013/07/ao2e-index.jpg and saw that, after successful decoding, the bottom half got corrupted. I hit reload twice, and that's when the kernel crashed and reset the machine. I filed a bug for this with the Linux distro it happened on at https://bugs.archlinux.org/task/36105. The corruption happens with Firefox 22 to 25. With 24 and 25 I was able to reliably make it crash, while 22 at least doesn't crash the kernel.

Software and hardware info:
* Core2Duo in Mac Mini 2,1 running in 32bit mode
* linux 3.10-1
* xf86-video-intel 2.21.11-1
* mesa 9.1.4-4
* Intel 945GM


Actual results:

Corrupted image and kernel crash with reset upon hitting reload button.


Expected results:

Image should be decoded and displayed correctly without resetting the machine.
Type about:crashes in the location bar and provide the crash ID.
Severity: normal → critical
Flags: needinfo?(carstenmattner)
Keywords: crash, stackwanted
On GitHub and YouTube I've also seen an avatar picture (GitHub) or a video thumbnail (YouTube) incorrectly reused in multiple places, or a corrupted picture displayed. These are normally small images.
(In reply to Scoobidiver from comment #1)
> Type about:crashes in the location bar and provide the crash ID.

about:crashes is empty, and I don't expect it to contain a crash, because the machine resets instantly when it happens. The general corrupted-image behavior may be an issue of using Firefox 22 or newer with intel-drm on Linux driving this chip; I haven't seen such behavior with Opera or Chromium. I have seen corruption in gtk widgets where it draws the rect of a button with a corrupted inner image in various applications.
Flags: needinfo?(carstenmattner)
(In reply to carstenmattner from comment #3)
> I have seen corruption in gtk widgets where it draws the
> rect of a button with a corrupted inner image in various applications.

but this crash and corrupted jpeg decoding only happens in Firefox.
Flags: needinfo?(carstenmattner)
Keywords: stackwanted
Same thing happens in Safe Mode. This is how the corruption in ao2e-index.jpg happens:

* first the image loads completely and is displayed correctly for a split second
* then after loading the bottom half the image is corrupted meaning it doesn't display the original/correct content
* this time around the bottom half actually started displaying three or more rectangles that clearly contained small snapshots of the whole browser window, in roughly the upper half of the corrupted part.

I can fetch a multi-hour 720p webm talk from YouTube and view it with various media players using Xv, X11, or OpenGL without picture corruption. I also see no corruption (except the occasional gtk widget) in Chromium, Opera, PDF viewers, etc. So I don't think it's the hardware.

I suspect this is a combination of a bug in the graphics driver and Firefox's threaded image decoding, or a generally different decoding algorithm compared to Opera and other image viewers.

Didn't try with Aurora or Nightly in Safe Mode because the corruption exists in 22 and I wanted to post this comment first but I'll try 25 next.
Flags: needinfo?(carstenmattner)
> * then after loading the bottom half the image is corrupted meaning it doesn't display the original/correct content

Sorry, that should be "then the bottom half of the image"
Component: Untriaged → ImageLib
Product: Firefox → Core
One other question - can you try turning off gfx.xrender.enabled and see if it still reproduces?
Wasn't able to crash the machine with 4 reloads of ao2e-index.jpg in Firefox 25 Safe Mode. The corruption is of course still there, but this time there were no little snapshots of the browser window (with content). This looks like memory corruption that happens consistently with Firefox's image decoding mechanism. Next I'll try 22 and 25 in normal mode.
(In reply to Joe Drew (:JOEDREW! \o/) from comment #8)
> One other question - can you try turning off gfx.xrender.enabled and see if
> it still reproduces?

Tried 25 in normal mode and saw corruption but no crash yet.

Then, with gfx.xrender.enabled set to false and Firefox restarted, the corruption was gone, BUT the fonts subjectively look like they're rendered in a less sharp (AA) way and are harder to read.

All with 25.
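
For anyone repeating this test, the pref can also be set without going through about:config. The following is only a minimal sketch: it assumes Firefox reads `user.js` from the profile directory at startup, and the helper name `disable_xrender` and the profile path passed to it are hypothetical.

```python
from pathlib import Path

# The pref discussed in this bug, in user.js syntax.
PREF_LINE = 'user_pref("gfx.xrender.enabled", false);'

def disable_xrender(profile_dir):
    """Append the pref to user.js in the given profile directory, once."""
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if PREF_LINE not in existing:
        # Keep any prefs already in the file and add ours on its own line.
        if existing and not existing.endswith("\n"):
            existing += "\n"
        user_js.write_text(existing + PREF_LINE + "\n")
    return user_js
```

Firefox has to be restarted afterwards for the change to take effect, as in the test above.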
Turning off XRender does change a lot of things, yeah.
Did another test with intel-drm's SNA acceleration disabled and gfx.xrender.enabled set to true. No corruption occurred as far as I can see. What could this mean? That the intel driver has a bug with SNA enabled, maybe specifically when driving this chip and not generally? That could be a regression, as I am sure that SNA was enabled and used on this machine without such bugs for a long time. Is this something Firefox should be concerned about, setting aside that I should probably file a bug at freedesktop.org?

From Xorg.log with SNA enabled:
[    73.948] (==) Depth 24 pixmap format is 32 bpp
[    73.948] (II) intel(0): SNA initialized with Alviso (gen3) backend
[    73.948] (==) intel(0): Backing store disabled
[    73.948] (==) intel(0): Silken mouse enabled
[    73.948] (II) intel(0): HW Cursor enabled
[    73.948] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[    73.948] (==) intel(0): DPMS enabled
[    73.948] (II) intel(0): [XvMC] i915_xvmc driver initialized.
[    73.948] (II) intel(0): [DRI2] Setup complete
[    73.948] (II) intel(0): [DRI2]   DRI driver: i915
[    73.948] (II) intel(0): direct rendering: DRI2 Enabled
[    73.948] (==) intel(0): hotplug detection: "enabled"
[    73.949] (--) RandR disabled
[    73.968] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[    73.968] (II) AIGLX: enabled GLX_INTEL_swap_event
[    73.968] (II) AIGLX: enabled GLX_ARB_create_context
[    73.968] (II) AIGLX: enabled GLX_ARB_create_context_profile
[    73.968] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[    73.968] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[    73.968] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[    73.969] (II) AIGLX: Loaded and initialized i915
[    73.969] (II) GLX: Initialized DRI2 GL provider for screen 0

From Xorg.log without SNA enabled:
[  4731.653] (==) Depth 24 pixmap format is 32 bpp
[  4731.653] (II) intel(0): [DRI2] Setup complete
[  4731.653] (II) intel(0): [DRI2]   DRI driver: i915
[  4731.653] (II) UXA(0): Driver registered support for the following operations:
[  4731.653] (II)         solid
[  4731.653] (II)         copy
[  4731.653] (II)         composite (RENDER acceleration)
[  4731.654] (II)         put_image
[  4731.654] (II)         get_image
[  4731.654] (==) intel(0): Backing store disabled
[  4731.654] (==) intel(0): Silken mouse enabled
[  4731.654] (II) intel(0): Initializing HW Cursor
[  4731.654] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[  4731.654] (==) intel(0): DPMS enabled
[  4731.654] (==) intel(0): Intel XvMC decoder disabled
[  4731.654] (II) intel(0): Set up textured video
[  4731.654] (II) intel(0): Set up overlay video
[  4731.654] (II) intel(0): direct rendering: DRI2 Enabled
[  4731.654] (==) intel(0): hotplug detection: "enabled"
[  4731.683] (--) RandR disabled
[  4731.703] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[  4731.703] (II) AIGLX: enabled GLX_INTEL_swap_event
[  4731.703] (II) AIGLX: enabled GLX_ARB_create_context
[  4731.703] (II) AIGLX: enabled GLX_ARB_create_context_profile
[  4731.703] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[  4731.703] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[  4731.703] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[  4731.703] (II) AIGLX: Loaded and initialized i915
[  4731.703] (II) GLX: Initialized DRI2 GL provider for screen 0
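
To tell the two configurations apart when comparing reports, the acceleration backend can be read off the log. This is a rough sketch that only looks for the two marker lines visible in the excerpts above ("SNA initialized" and the "UXA(0):" prefix); the function name `accel_backend` is made up for illustration, and other driver versions may log differently.

```python
import re

def accel_backend(xorg_log_text):
    """Return 'SNA', 'UXA', or None based on the intel driver's log lines."""
    for line in xorg_log_text.splitlines():
        # e.g. "(II) intel(0): SNA initialized with Alviso (gen3) backend"
        if re.search(r"intel\(\d+\): SNA initialized", line):
            return "SNA"
        # e.g. "(II) UXA(0): Driver registered support for ..."
        if re.search(r"\bUXA\(\d+\):", line):
            return "UXA"
    return None
```

Usage would be along the lines of `accel_backend(open(path).read())`, where `path` points at wherever the X server writes its log on the system in question.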
Yes, please file a xf86-video-intel or kernel bug.
Sounds like a kernel bug is likely, but I assume an xf86-video-intel bug will have the right people looking at it.
Per https://bugs.archlinux.org/task/36105#comment114630 I built a version of
Firefox 24.0 linking more system versions of libraries and the corrupted jpeg
doesn't seem to happen so far. I was using ftp.mozilla.org binaries and those
reproduce the corruption reliably.

mozconfig:
. $topsrcdir/browser/config/mozconfig
ac_add_options --enable-official-branding
ac_add_options --with-system-jpeg
ac_add_options --with-system-zlib
ac_add_options --with-system-bz2
ac_add_options --with-system-png
ac_add_options --with-system-libevent
ac_add_options --enable-system-sqlite
ac_add_options --enable-system-cairo
ac_add_options --enable-system-pixman
ac_add_options --disable-tests
ac_add_options --disable-crashreporter
ac_add_options --disable-updater
ac_add_options --disable-installer
mk_add_options PROFILE_GEN_SCRIPT='EXTRA_TEST_ARGS=10 $(MAKE) -C $(MOZ_OBJDIR) pgo-profile-run'

This is a non-PGO build due to memory limitations on this machine, but I copied
that PGO line from ArchLinux's mozconfig as found.

package versions:
cairo 1.12.16-1
pixman 0.30.2-1
libjpeg-turbo 1.3.0-2
libpng 1.6.5-1
zlib 1.2.8-1
bzip2 1.0.6-4
libevent 2.0.21-2
sqlite 3.8.0.2-1

This is how ArchLinux's Firefox package is built with the exception of
--enable-system-cairo:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/firefox
The custom Firefox build doesn't provoke the corruption but as suspected by the Intel devs there are corruption issues just hiding. I don't know if it's related but I've seen a Qt3 application draw text destined for its statusbar outside of the window and right into the X root or over other windows managed by the wm.
(In reply to carstenmattner from comment #16)
> The custom Firefox build doesn't provoke the corruption but as suspected by
> the Intel devs there are corruption issues just hiding. I don't know if it's
> related but I've seen a Qt3 application draw text destined for its statusbar
> outside of the window and right into the X root or over other windows
> managed by the wm.

I'll keep an eye out for this after enabling UXA.
Image decoding bug looks fixed in xf86-video-intel-2.99.906 but I still have to check the Qt3 out-of-window text drawing/corruption.
Haven't seen Qt3 out-of-window text drawing/corruption either. Both issues fixed in xf86-video-intel-2.99.906.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME