Bug 672787 - Aurora build crashes [@ _moz_pixman_image_composite32] at start-up (07/20)
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Graphics
Version: Trunk
Platform: ARM Android
Importance: -- blocker
Target Milestone: mozilla8
Assigned To: Jim Chen [:jchen] [:darchons]
Mentors:
Duplicates: 676368
Depends on:
Blocks:
Reported: 2011-07-20 06:57 PDT by Cristian Nicolae (:xti)
Modified: 2011-09-21 18:17 PDT
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
crash logcat (57.06 KB, text/plain)
2011-07-20 06:57 PDT, Cristian Nicolae (:xti)
no flags
Fix (1.92 KB, patch)
2011-09-15 13:37 PDT, Jim Chen [:jchen] [:darchons]
mark.finkle: review+
jst: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta-

Description Cristian Nicolae (:xti) 2011-07-20 06:57:37 PDT
Created attachment 547071
crash logcat

Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720
Firefox/7.0a2 Fennec/7.0a2
Device: Motorola Droid 2
OS: Android 2.2

Steps to reproduce:

Case 1:
If an outdated Aurora build is installed, update it from about:firefox. After the new build is installed, tap the Open button.

Case 2:
Go to http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-aurora-android/ and tap on fennec-7.0a2.multi.eabi-arm.apk. After the app is installed, tap on the Open button.

Expected result:
Aurora build opens normally.

Actual result:
The Aurora build crashes every time it is opened, and a Mozilla Crash dialog is displayed.

Note:
I cannot get the crash report from about:crashes because the Aurora build doesn't open at all.
Comment 1 Cristian Nicolae (:xti) 2011-07-20 07:07:08 PDT
I was able to get a crash report after I installed the build from 20110719 over it: https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720
Comment 2 Cristian Nicolae (:xti) 2011-07-20 07:20:02 PDT
This issue doesn't occur on:
Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110719
Firefox/7.0a2 Fennec/7.0a2
Build config: http://hg.mozilla.org/releases/mozilla-aurora/rev/4d2a4e9e9730

But occurs on:
Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720
Firefox/7.0a2 Fennec/7.0a2

A possible range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-07-19&enddate=2011-07-20+03%3A00
Comment 3 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 07:25:27 PDT
https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720
0 	libxul.so 	libxul.so@0x9df494 	
1 	libxul.so 	_moz_pixman_image_composite32 	gfx/cairo/libpixman/src/pixman.c:371
2 	libxul.so 	_clip_and_composite_boxes 	gfx/cairo/cairo/src/cairo-image-surface.c:3002
3 	libxul.so 	_cairo_image_surface_paint 	gfx/cairo/cairo/src/cairo-image-surface.c:3304
4 	libxul.so 	_cairo_surface_paint 	gfx/cairo/cairo/src/cairo-surface.c:2100
5 	libxul.so 	_cairo_gstate_paint 	gfx/cairo/cairo/src/cairo-gstate.c:1049
6 	libxul.so 	_moz_cairo_paint 	gfx/cairo/cairo/src/cairo.c:2238
7 	libxul.so 	_moz_cairo_paint_with_alpha 	gfx/cairo/cairo/src/cairo.c:2267
8 	libxul.so 	gfxContext::Paint 	gfx/thebes/gfxContext.cpp:772
9 	libxul.so 	gfxPlatform::OptimizeImage 	gfx/thebes/gfxPlatform.cpp:414
10 	libxul.so 	imgFrame::Optimize 	nsAutoPtr.h:954
11 	libxul.so 	mozilla::imagelib::RasterImage::DecodingComplete 	modules/libpr0n/src/RasterImage.cpp:1111
12 	libxul.so 	mozilla::imagelib::Decoder::PostDecodeDone 	nsCOMPtr.h:800
13 	libxul.so 	mozilla::imagelib::nsPNGDecoder::end_callback 	modules/libpr0n/decoders/nsPNGDecoder.cpp:863
14 	libxul.so 	MOZ_PNG_push_have_end 	modules/libimg/png/pngpread.c:1908
15 	libxul.so 	MOZ_PNG_push_read_chunk 	modules/libimg/png/pngpread.c:364
16 	libxul.so 	MOZ_PNG_proc_some_data 	modules/libimg/png/pngpread.c:65
17 	libxul.so 	MOZ_PNG_process_data 	modules/libimg/png/pngpread.c:39
18 	libxul.so 	mozilla::imagelib::nsPNGDecoder::WriteInternal 	modules/libpr0n/decoders/nsPNGDecoder.cpp:354
19 	libxul.so 	mozilla::imagelib::Decoder::Write 	modules/libpr0n/src/Decoder.cpp:104
20 	libxul.so 	mozilla::imagelib::RasterImage::WriteToDecoder 	modules/libpr0n/src/RasterImage.cpp:2277
Comment 4 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 07:27:37 PDT
Afaik, you need to look at the pushlog for Aurora, which is here:
http://hg.mozilla.org/releases/mozilla-aurora/
The problem is that I don't see anything that could trigger this crash, afaik.
Comment 5 Matt Brubeck (:mbrubeck) 2011-07-20 07:35:26 PDT
Today's Aurora nightly starts up fine (no crash) on my Xoom.
Comment 6 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 07:35:51 PDT
I guess this is basically related to/the same as bug 623161.
Comment 7 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 07:45:23 PDT
Btw, I can reproduce this crash on start-up, using the LG Optimus Black.
Comment 8 Doug Turner (:dougt) 2011-07-20 10:17:14 PDT
I don't crash on an N1.
Comment 9 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 12:36:46 PDT
I'm in the Mountain View office with the crashing Aurora browser on the phone. If someone wants to investigate, he can grab my phone (I'm in the QA area).
Comment 10 Marcia Knous [:marcia - use ni] 2011-07-20 14:31:19 PDT
I mentioned this to Naoki in case it is useful: there is a corresponding signature on the Firefox side with a fairly low crash volume: https://crash-stats.mozilla.com/report/list?signature=_moz_pixman_image_composite32
Comment 11 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 14:57:09 PDT
Ok, this looks more like a Cairo bug to me, then. Moving it to Core->Graphics.
Comment 12 Brad Lassey [:blassey] (use needinfo?) 2011-07-20 16:49:41 PDT
This crashes (info pulled from application.ini):
Version=7.0a2
BuildID=20110720042444
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=579cbf7a9add

This runs:
Version=7.0a2
BuildID=20110719042859
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=4d2a4e9e9730

So these are what landed in that span:

changeset:   72687:579cbf7a9add
user:        Simon Montagu <smontagu@smontagu.org>
date:        Mon Jul 11 06:40:51 2011 +0300
summary:     Don't resolve bidi paragraph in preformatted text until we really get to the end of the line. Bug 670226, r=roc, a=asa

changeset:   72686:433cd269be19
user:        Simon Montagu <smontagu@smontagu.org>
date:        Mon Jul 11 06:40:51 2011 +0300
summary:     Tests for bug 670226

changeset:   72685:ef4909389600
user:        Simon Montagu <smontagu@smontagu.org>
date:        Fri Jul 08 10:51:26 2011 +0300
summary:     Make sure that bidi continuation chains don't go beyond the end of the paragraph. Bug 668941, r=roc, a=asa

changeset:   72684:9a3234ac5c1c
user:        Myk Melez <myk@mozilla.org>
date:        Tue Jul 19 20:55:10 2011 -0700
summary:     update revision of Add-on SDK tests to latest tip; a=test-only

changeset:   72683:82f49f622e9d
user:        Luke Wagner <luke@mozilla.com>
date:        Mon Jul 18 17:37:19 2011 -0700
summary:     Bug 672026 - Ensure that there is an object principals finder during early startup (r=mrbkap,a=asa)
Comment 13 Naoki Hirata :nhirata (please use needinfo instead of cc) 2011-07-21 10:08:12 PDT
Talos regression in bug 672026
Comment 14 Naoki Hirata :nhirata (please use needinfo instead of cc) 2011-07-21 10:11:53 PDT
(In reply to comment #13)
> Talos regression in bug 672026

correction bug 654049
Comment 15 Armen Zambrano [:armenzg] - Engineering productivity 2011-07-21 10:17:36 PDT
I helped dougt trigger the following jobs (from http://build.mozilla.org/builds/running.html):
mozilla-aurora	9a3234ac5c1c 	Android mozilla-aurora build 	
mozilla-aurora	82f49f622e9d 	Android mozilla-aurora build 	
mozilla-aurora	ef4909389600 	Android mozilla-aurora build 	
mozilla-aurora	433cd269be19 	Android mozilla-aurora build 	
	        579cbf7a9add 	Android mozilla-aurora build
	        433cd269be19 	Android mozilla-aurora build

He would not have had the means to trigger those 3 csets that were on the same push.

The builds should show up in http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android/
Comment 16 Mike Hommey [:glandium] 2011-07-21 10:24:51 PDT
How exactly is bug 654049 involved in this?
Comment 17 Doug Turner (:dougt) 2011-07-21 10:58:01 PDT
It looks like the crash started here:

http://hg.mozilla.org/releases/mozilla-aurora/rev/82f49f622e9d
Comment 18 Jim Chen [:jchen] [:darchons] 2011-07-22 12:09:13 PDT
I did some debugging, and it seems like it's not really a JS bug, but rather some strange linker magic.

Functions from pixman_arm_neon_asm.o are supposed to be at least 4-byte aligned, which is the case before bug 672026:

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e57b8 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e79f8 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc3d8 l     F .text        00000000              .hidden pixman_composite_scanline_add_asm_neon
> 009eb17c l     F .text        00000000              .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6158 l     F .text        00000000              .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8948 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e16c8 l     F .text        00000000              .hidden pixman_composite_add_n_8_8_asm_neon
> 009df054 l     F .text        00000000              .hidden pixman_composite_src_n_0565_asm_neon

But after bug 672026, everything from pixman_arm_neon_asm.o is now offset by 2 bytes (address in the first column):

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e5882 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e7ac2 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc4a2 l     F .text        00000000              .hidden pixman_composite_scanline_add_asm_neon
> 009eb246 l     F .text        00000000              .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6222 l     F .text        00000000              .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8a12 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e1792 l     F .text        00000000              .hidden pixman_composite_add_n_8_8_asm_neon
> 009df11e l     F .text        00000000              .hidden pixman_composite_src_n_0565_asm_neon

Strange thing is this only happens to pixman_arm_neon_asm.o
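
The alignment check on the objdump output can be automated. Below is a minimal Python sketch, assuming the `objdump -t` line layout shown in this comment (address in the first column, symbol name in the last) with the "> " quoting removed; the function name `misaligned_symbols` and the three sample lines picked from the second listing are for illustration only.

```python
import re

# Sample lines copied from the post-regression objdump listing in this comment.
SAMPLE = """\
009e5882 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon
009dc4a2 l     F .text        00000000              .hidden pixman_composite_scanline_add_asm_neon
009df11e l     F .text        00000000              .hidden pixman_composite_src_n_0565_asm_neon
"""

# First column: 8 hex digits (address); last whitespace-separated token: symbol name.
SYMBOL_LINE = re.compile(r"^([0-9a-fA-F]{8})\s.*\s(\S+)$")


def misaligned_symbols(objdump_text, alignment=4):
    """Return (name, address) pairs whose address is not a multiple of `alignment`."""
    bad = []
    for line in objdump_text.splitlines():
        match = SYMBOL_LINE.match(line.strip())
        if not match:
            continue  # skip headers and blank lines
        address = int(match.group(1), 16)
        if address % alignment != 0:
            bad.append((match.group(2), address))
    return bad


for name, address in misaligned_symbols(SAMPLE):
    print(f"{name}: 0x{address:08x} (address % 4 == {address % 4})")
```

Run against the first listing (before bug 672026), the same function would report nothing, since every address there is a multiple of 4.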

Now when we call these functions, the blx instruction implies 4-byte alignment of the target:

> 009da4dc <neon_composite_src_8888_8888+0x3c>:
>   9da4dc:       9000            str     r0, [sp, #0]
>   9da4de:       980b            ldr     r0, [sp, #44]
>   9da4e0:       f004 efa8       blx     9df434 <pixman_composite_src_8888_8888_asm_neon+0x2>
>   9da4e4:       b003            add     sp, #12
>   9da4e6:       bd00            pop     {pc}
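
The alignment implied by blx can be checked by decoding the instruction halfwords from the listing above. This is a rough Python sketch, assuming the standard Thumb-2 BL/BLX (immediate) encoding, in which the target is Align(PC, 4) plus an offset that is always a multiple of 4; the function name `decode_thumb_blx` is made up for illustration.

```python
def decode_thumb_blx(address, hw1, hw2):
    """Compute the branch target of a 32-bit Thumb BLX (immediate) instruction.

    address: address of the instruction; hw1/hw2: its two 16-bit halfwords.
    """
    s = (hw1 >> 10) & 1
    imm10h = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm10l = (hw2 >> 1) & 0x3FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 EOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 EOR S)
    # The two low bits of the offset are always zero in this encoding.
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10h << 12) | (imm10l << 2)
    if s:
        imm -= 1 << 25  # sign-extend the 25-bit offset
    pc = address + 4  # in Thumb state, PC reads as the instruction address + 4
    return (pc & ~3) + imm  # Align(PC, 4) + imm32


target = decode_thumb_blx(0x9DA4E0, 0xF004, 0xEFA8)
print(hex(target))  # 0x9df434
```

Because both Align(PC, 4) and the offset are multiples of 4, the instruction cannot express the real function address 0x9df432, and the call lands at 0x9df434 instead.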

So our nice ARM instructions:

> 009df432 <pixman_composite_src_8888_8888_asm_neon>:
>   9df432:       e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
>   9df436:       e59d4028        ldr     r4, [sp, #40]
>   9df43a:       e3a0a000        mov     sl, #0  ; 0x0
>   9df43e:       e59d502c        ldr     r5, [sp, #44]
>   9df442:       e1a06002        mov     r6, r2
>   9df446:       e1a0b004        mov     fp, r4
>   9df44a:       e1a0c006        mov     ip, r6
>   9df44e:       e1a0e007        mov     lr, r7

Turn into gibberish due to the 2-byte offset:

> 009df434 <pixman_composite_src_8888_8888_asm_neon+0x2>:
>   9df434:       4028e92d        eormi   lr, r8, sp, lsr #18
>   9df438:       a000e59d        mulge   r0, sp, r5
>   9df43c:       502ce3a0        eorpl   lr, ip, r0, lsr #7
>   9df440:       6002e59d        mulvs   r2, sp, r5
>   9df444:       b004e1a0        andlt   lr, r4, r0, lsr #3
>   9df448:       c006e1a0        andgt   lr, r6, r0, lsr #3
>   9df44c:       e007e1a0        and     lr, r7, r0, lsr #3
>   9df450:       9201e1a0        andls   lr, r1, #40     ; 0x28

And sooner or later we crash.
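
The gibberish is the same bytes reread two bytes later. As a small Python sketch (assuming little-endian 32-bit ARM instruction words, as in the listings above; `reread_with_offset` is a hypothetical helper name), packing the eight correct instruction words and unpacking them again from byte offset 2 reproduces the first seven garbled words:

```python
import struct

# The eight instruction words at 0x9df432, copied from the correct disassembly.
ARM_WORDS = [
    0xE92D5FF0, 0xE59D4028, 0xE3A0A000, 0xE59D502C,
    0xE1A06002, 0xE1A0B004, 0xE1A0C006, 0xE1A0E007,
]


def reread_with_offset(words, offset=2):
    """Serialize little-endian 32-bit words, then re-read them starting `offset` bytes in."""
    raw = struct.pack(f"<{len(words)}I", *words)
    shifted = raw[offset:]
    count = len(shifted) // 4  # drop the incomplete trailing word
    return list(struct.unpack(f"<{count}I", shifted[:count * 4]))


for word in reread_with_offset(ARM_WORDS):
    print(f"{word:08x}")
```

The first line printed is 4028e92d, which is the word the CPU decodes as `eormi` at 0x9df434.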

This only happens to that bit of NEON assembly, and our Tegra boards don't have NEON, so this was not caught by tests.

Also this doesn't happen with NDK5, so one more reason to switch :)  I will try to find out if NDK5 doesn't have this linker bug because it was fixed or because the conditions for the bug aren't met under NDK5.
Comment 19 Siarhei Siamashka 2011-07-22 12:25:45 PDT
Yes, this looks like the same issue as bug 666931 and bug 623161
Comment 20 Ben Hearsum (:bhearsum) 2011-07-22 12:34:41 PDT
In the future please don't trigger nightlies when regression hunting. If you need clean builds, use https://build.mozilla.org/clobberer/ to clobber the builder, and use normal opt builds. Triggering multiple nightlies in parallel has unknown behaviour, and seems to cause us to temporarily strand users (bug 673501).

Thanks!
Comment 21 Siarhei Siamashka 2011-07-22 14:47:04 PDT
I asked Timothy B. Terriberry on IRC and he provided more explanation of the problem and a link to this bug in the binutils bug tracker:
    http://sourceware.org/bugzilla/show_bug.cgi?id=12931

For now the workaround (also applied to WebM earlier) is to explicitly set alignment for code sections and the following patch should do it for pixman:
    http://lists.freedesktop.org/archives/pixman/2011-July/001347.html

Please confirm whether it really helps to resolve this bug. And if it does, then it makes sense to do a complete review of all the arm assembly code in Mozilla to see if such workarounds should be also applied somewhere else.
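
A first pass of such a review could be scripted. The Python sketch below only flags assembly sources that contain no GNU as alignment directive (`.align`, `.balign`, `.p2align`) at all. It is an assumption that this is a useful first filter; it does not prove that every code section in a file is aligned, and the file names in the example are hypothetical.

```python
import re

# Matches a GNU as alignment directive at the start of a line.
ALIGN_DIRECTIVE = re.compile(r"^\s*\.(align|balign|p2align)\b", re.MULTILINE)


def has_alignment_directive(asm_source):
    """True if the assembly text contains at least one alignment directive."""
    return ALIGN_DIRECTIVE.search(asm_source) is not None


def files_to_review(sources):
    """Given {file name: assembly text}, return the names lacking any alignment directive."""
    return sorted(
        name for name, text in sources.items()
        if not has_alignment_directive(text)
    )


# Hypothetical inputs, for illustration only.
example = {
    "with_align.S": ".text\n.align 2\nfunc:\n    bx lr\n",
    "without_align.S": ".text\nfunc:\n    bx lr\n",
}
print(files_to_review(example))  # ['without_align.S']
```

Files reported this way would still need a manual look, since a single directive elsewhere in a file says nothing about the alignment of each function.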
Comment 22 Jim Chen [:jchen] [:darchons] 2011-07-22 15:07:13 PDT
> Please confirm whether it really helps to resolve this bug. And if it does,
> then it makes sense to do a complete review of all the arm assembly code in
> Mozilla to see if such workarounds should be also applied somewhere else.

Yes, this does fix the bug. Thank you for identifying the issue.

I agree a complete review will be very helpful, before another innocent person gets bitten by this bug again :)
Comment 23 Mark Finkle (:mfinkle) (use needinfo?) 2011-07-25 14:36:26 PDT
Has a bug been filed to get the pixman alignment fix into the mozilla codebase?
Comment 24 Mark Finkle (:mfinkle) (use needinfo?) 2011-08-04 15:00:57 PDT
*** Bug 676368 has been marked as a duplicate of this bug. ***
Comment 25 christian 2011-09-15 12:58:37 PDT
So, what do we need to do here for Firefox 7? Nothing? This is an existing problem? Do we have the workaround mentioned in comment 21 in mozilla-central or mozilla-beta? We must get some action on this today, preferably a resolution if it needs it.
Comment 26 Siarhei Siamashka 2011-09-15 13:10:43 PDT
(In reply to Christian Legnitto [:LegNeato] from comment #25)
> So, what do we need to do here for Firefox 7?

I would suggest cherry picking and applying http://cgit.freedesktop.org/pixman/commit/?id=b8d6babc91459a9f854695b56f0265298a3c6427 to the Mozilla's copy of pixman.

And while you are at it, there is also bug 667284 with a simple fix available. Which would be also nice to have applied.
Comment 27 Jim Chen [:jchen] [:darchons] 2011-09-15 13:37:16 PDT
Created attachment 560443
Fix

Here's the patch for Mozilla. It's not in mozilla-central or anywhere else, but it would be very good to have.
Comment 28 Jim Chen [:jchen] [:darchons] 2011-09-15 14:04:38 PDT
Comment on attachment 560443
Fix

Also nominating for Aurora and Beta, since the crash was originally from Fennec 7.

The patch has virtually no risk; it specifies one alignment attribute for three assembly files.

Without it, any build can potentially contain this crash, and the tegras cannot catch it all the time.
Comment 29 christian 2011-09-19 14:18:44 PDT
There were 2 crashes on Android in the past week and ~45 on Windows. That is not high enough volume (though it is a startup crash, and those are underreported). Denying approval for beta.
Comment 30 Siarhei Siamashka 2011-09-19 14:43:51 PDT
Comment on attachment 560443
Fix

Review of attachment 560443:
-----------------------------------------------------------------

r+ from me
Comment 31 Johnny Stenback (:jst, jst@mozilla.com) 2011-09-20 14:40:54 PDT
Comment on attachment 560443
Fix

a=jst per today's driver meeting (this was reviewed, the flag just didn't get set, and given the nature of this change we're OK approving it before it has landed in mozilla-central).
Comment 32 Mark Finkle (:mfinkle) (use needinfo?) 2011-09-21 05:55:08 PDT
Comment on attachment 560443
Fix

making the r+ official
Comment 33 Mark Finkle (:mfinkle) (use needinfo?) 2011-09-21 06:07:14 PDT
pushed to aurora and inbound
https://hg.mozilla.org/releases/mozilla-aurora/rev/0b54cb43cec5
https://hg.mozilla.org/integration/mozilla-inbound/rev/da9d9d9d9809
Comment 34 Ed Morley [:emorley] 2011-09-21 18:17:40 PDT
https://hg.mozilla.org/mozilla-central/rev/da9d9d9d9809
