Closed Bug 672787 Opened 13 years ago Closed 13 years ago

Aurora build crashes [@ _moz_pixman_image_composite32] at start-up (07/20)

Product/Component: Core :: Graphics
Type: defect
Hardware: ARM
OS: Android
Priority: Not set
Severity: blocker

Status: RESOLVED FIXED
Target Milestone: mozilla8
Tracking Status:
firefox7 + wontfix
firefox8 --- fixed

People: Reporter: xti, Assigned: jchen

Attachments: 2 files

Attached file crash logcat
Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720
Firefox/7.0a2 Fennec/7.0a2
Device: Motorola Droid 2
OS: Android 2.2

Steps to reproduce:

Case 1:
If an outdated Aurora build is installed, update it from about:firefox. After the new build is installed, tap the Open button.

Case 2:
Go to http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-aurora-android/ and tap fennec-7.0a2.multi.eabi-arm.apk. After the app is installed, tap the Open button.

Expected result:
Aurora build opens normally.

Actual result:
The Aurora build crashes every time it is opened, and a Mozilla Crash dialog is displayed.

Note:
I cannot get the crash report from about:crashes because the Aurora build doesn't open at all.
I was able to get a crash report after I've installed the build from 20110719 over it: https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720
This issue doesn't occur on:
Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110719
Firefox/7.0a2 Fennec/7.0a2
Build config: http://hg.mozilla.org/releases/mozilla-aurora/rev/4d2a4e9e9730

But it occurs on:
Build id : Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720
Firefox/7.0a2 Fennec/7.0a2

A possible range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-07-19&enddate=2011-07-20+03%3A00
https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720
0 	libxul.so 	libxul.so@0x9df494 	
1 	libxul.so 	_moz_pixman_image_composite32 	gfx/cairo/libpixman/src/pixman.c:371
2 	libxul.so 	_clip_and_composite_boxes 	gfx/cairo/cairo/src/cairo-image-surface.c:3002
3 	libxul.so 	_cairo_image_surface_paint 	gfx/cairo/cairo/src/cairo-image-surface.c:3304
4 	libxul.so 	_cairo_surface_paint 	gfx/cairo/cairo/src/cairo-surface.c:2100
5 	libxul.so 	_cairo_gstate_paint 	gfx/cairo/cairo/src/cairo-gstate.c:1049
6 	libxul.so 	_moz_cairo_paint 	gfx/cairo/cairo/src/cairo.c:2238
7 	libxul.so 	_moz_cairo_paint_with_alpha 	gfx/cairo/cairo/src/cairo.c:2267
8 	libxul.so 	gfxContext::Paint 	gfx/thebes/gfxContext.cpp:772
9 	libxul.so 	gfxPlatform::OptimizeImage 	gfx/thebes/gfxPlatform.cpp:414
10 	libxul.so 	imgFrame::Optimize 	nsAutoPtr.h:954
11 	libxul.so 	mozilla::imagelib::RasterImage::DecodingComplete 	modules/libpr0n/src/RasterImage.cpp:1111
12 	libxul.so 	mozilla::imagelib::Decoder::PostDecodeDone 	nsCOMPtr.h:800
13 	libxul.so 	mozilla::imagelib::nsPNGDecoder::end_callback 	modules/libpr0n/decoders/nsPNGDecoder.cpp:863
14 	libxul.so 	MOZ_PNG_push_have_end 	modules/libimg/png/pngpread.c:1908
15 	libxul.so 	MOZ_PNG_push_read_chunk 	modules/libimg/png/pngpread.c:364
16 	libxul.so 	MOZ_PNG_proc_some_data 	modules/libimg/png/pngpread.c:65
17 	libxul.so 	MOZ_PNG_process_data 	modules/libimg/png/pngpread.c:39
18 	libxul.so 	mozilla::imagelib::nsPNGDecoder::WriteInternal 	modules/libpr0n/decoders/nsPNGDecoder.cpp:354
19 	libxul.so 	mozilla::imagelib::Decoder::Write 	modules/libpr0n/src/Decoder.cpp:104
20 	libxul.so 	mozilla::imagelib::RasterImage::WriteToDecoder 	modules/libpr0n/src/RasterImage.cpp:2277
As far as I know, you need to look at the Aurora pushlog instead, which is here:
http://hg.mozilla.org/releases/mozilla-aurora/
The problem is that, as far as I can tell, nothing there could trigger this crash.
Crash Signature: [@ _moz_pixman_image_composite32]
Summary: Aurora build crashes at start-up (07/20) → Aurora build crashes [@ _moz_pixman_image_composite32] at start-up (07/20)
Today's Aurora nightly starts up fine (no crash) on my Xoom.
I guess this is basically related to/the same as bug 623161.
By the way, I can reproduce this crash on start-up on the LG Optimus Black.
I don't crash on a Nexus One.
I'm in the Mountain View office with the crashing Aurora browser on my phone. If someone wants to investigate, they can grab my phone (I'm in the QA area).
I mentioned this to Naoki in case it is useful. There is a corresponding signature on the Firefox side with a fairly low crash volume: https://crash-stats.mozilla.com/report/list?signature=_moz_pixman_image_composite32
OK, this looks more like a Cairo bug to me, then. Moving it to Core->Graphics.
Component: General → Graphics
Product: Fennec → Core
QA Contact: general → thebes
Version: Firefox 7 → Trunk
This crashes (info pulled from application.ini):
Version=7.0a2
BuildID=20110720042444
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=579cbf7a9add

This runs:
Version=7.0a2
BuildID=20110719042859
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=4d2a4e9e9730
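The two application.ini dumps above are enough to build a pushlog link for exactly this range on the Aurora repository (rather than mozilla-central). A minimal Python sketch follows; `read_build_info` and `pushlog_url` are hypothetical helpers, the `[App]` section header is assumed (only the key=value lines are quoted above), and the `fromchange`/`tochange` query parameters are assumed to work on the Aurora pushlog as they do on other hg.mozilla.org repositories.

```python
import configparser
from typing import Dict
from urllib.parse import urlencode

FIELDS = ("Version", "BuildID", "SourceRepository", "SourceStamp")


def read_build_info(ini_text: str) -> Dict[str, str]:
    """Return the build fields quoted above from application.ini text.

    Assumes the keys live in an [App] section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case ("BuildID", not "buildid")
    parser.read_string(ini_text)
    app = parser["App"]
    return {key: app[key] for key in FIELDS}


def pushlog_url(good: Dict[str, str], bad: Dict[str, str]) -> str:
    """Pushlog link covering the pushes between two builds of one repository."""
    if good["SourceRepository"] != bad["SourceRepository"]:
        raise ValueError("the two builds come from different repositories")
    query = urlencode({"fromchange": good["SourceStamp"], "tochange": bad["SourceStamp"]})
    return f"{good['SourceRepository']}/pushloghtml?{query}"


# Values copied from the two dumps above; the [App] header is an assumption.
GOOD_INI = """[App]
Version=7.0a2
BuildID=20110719042859
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=4d2a4e9e9730
"""
BAD_INI = """[App]
Version=7.0a2
BuildID=20110720042444
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=579cbf7a9add
"""

if __name__ == "__main__":
    print(pushlog_url(read_build_info(GOOD_INI), read_build_info(BAD_INI)))
```

Deriving the link from the dumps avoids hand-picking dates, which is how the earlier mozilla-central range ended up pointing at the wrong repository.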

So these changesets landed in that span:

changeset:   72687:579cbf7a9add
user:        Simon Montagu <smontagu@smontagu.org>
date:        Mon Jul 11 06:40:51 2011 +0300
summary:     Don't resolve bidi paragraph in preformatted text until we really get to the end of the line. Bug 670226, r=roc, a=asa

changeset:   72686:433cd269be19
user:        Simon Montagu <smontagu@smontagu.org>
date:        Mon Jul 11 06:40:51 2011 +0300
summary:     Tests for bug 670226

changeset:   72685:ef4909389600
user:        Simon Montagu <smontagu@smontagu.org>
date:        Fri Jul 08 10:51:26 2011 +0300
summary:     Make sure that bidi continuation chains don't go beyond the end of the paragraph. Bug 668941, r=roc, a=asa

changeset:   72684:9a3234ac5c1c
user:        Myk Melez <myk@mozilla.org>
date:        Tue Jul 19 20:55:10 2011 -0700
summary:     update revision of Add-on SDK tests to latest tip; a=test-only

changeset:   72683:82f49f622e9d
user:        Luke Wagner <luke@mozilla.com>
date:        Mon Jul 18 17:37:19 2011 -0700
summary:     Bug 672026 - Ensure that there is an object principals finder during early startup (r=mrbkap,a=asa)
(In reply to comment #13)
> Talos regression in bug 672026

Correction: bug 654049
Depends on: 654049
No longer depends on: 672026
I helped dougt trigger the following jobs (from http://build.mozilla.org/builds/running.html):
mozilla-aurora	9a3234ac5c1c 	Android mozilla-aurora build 	
mozilla-aurora	82f49f622e9d 	Android mozilla-aurora build 	
mozilla-aurora	ef4909389600 	Android mozilla-aurora build 	
mozilla-aurora	433cd269be19 	Android mozilla-aurora build 	
	        579cbf7a9add 	Android mozilla-aurora build
	        433cd269be19 	Android mozilla-aurora build

He would not have had the means to trigger builds for the 3 changesets that were in the same push.

The builds should show up in http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android/
How exactly is bug 654049 involved in this?
It looks like the crash started here:

http://hg.mozilla.org/releases/mozilla-aurora/rev/82f49f622e9d
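Here the range was narrowed by building every changeset in it. For a longer range, a binary search over the same ordered list needs far fewer builds. The following is a minimal Python sketch, not the tooling that was actually used: `first_bad` is a hypothetical helper, and its `is_bad` predicate stands in for "install that changeset's build on an affected device and tap Open".

```python
from typing import Callable, List, Sequence

# Aurora changesets between the last good build (20110719, 4d2a4e9e9730)
# and the first crashing build (20110720, 579cbf7a9add), oldest first.
CHANGESETS: List[str] = [
    "82f49f622e9d",
    "9a3234ac5c1c",
    "ef4909389600",
    "433cd269be19",
    "579cbf7a9add",
]


def first_bad(changesets: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest changeset for which is_bad() is True.

    Assumes the list is ordered oldest to newest, its last entry is known to
    be bad, and every changeset after the first bad one is also bad.
    """
    lo, hi = 0, len(changesets) - 1  # changesets[hi] is always known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid + 1  # the regression is after mid
    return changesets[hi]


if __name__ == "__main__":
    # Hypothetical stand-in for "install this build and tap Open", using the
    # result reported above: everything from 82f49f622e9d onward crashes.
    tested = []

    def crashes(changeset: str) -> bool:
        tested.append(changeset)
        return CHANGESETS.index(changeset) >= CHANGESETS.index("82f49f622e9d")

    print(first_bad(CHANGESETS, crashes), "after testing", tested)
```

With the outcome reported above, this search tests three of the five changesets before settling on 82f49f622e9d.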
Depends on: 654049
No longer depends on: 654049
I did some debugging, and it seems this is not really a JS bug but rather some strange linker behavior.

Functions from pixman_arm_neon_asm.o are supposed to be at least 4-byte aligned, which was the case before bug 672026:

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e57b8 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e79f8 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc3d8 l     F .text        00000000              .hidden pixman_composite_scanline_add_asm_neon
> 009eb17c l     F .text        00000000              .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6158 l     F .text        00000000              .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8948 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e16c8 l     F .text        00000000              .hidden pixman_composite_add_n_8_8_asm_neon
> 009df054 l     F .text        00000000              .hidden pixman_composite_src_n_0565_asm_neon

But after bug 672026, everything from pixman_arm_neon_asm.o is now offset by 2 bytes (address in the first column):

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e5882 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e7ac2 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc4a2 l     F .text        00000000              .hidden pixman_composite_scanline_add_asm_neon
> 009eb246 l     F .text        00000000              .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6222 l     F .text        00000000              .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8a12 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e1792 l     F .text        00000000              .hidden pixman_composite_add_n_8_8_asm_neon
> 009df11e l     F .text        00000000              .hidden pixman_composite_src_n_0565_asm_neon

The strange thing is that this only happens to pixman_arm_neon_asm.o.
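The symptom can be checked mechanically: ARM-state (A32) code must start on a 4-byte boundary, so any address in listings like the two above that is not a multiple of 4 is suspect. Below is a minimal Python sketch that scans `objdump -t` rows; `misaligned_symbols` is a hypothetical helper, not part of any Mozilla tooling, and the check only makes sense for symbols known to be ARM-state code such as the `*_asm_neon` functions (Thumb functions may legitimately sit on 2-byte boundaries).

```python
from typing import Iterable, List, Tuple


def misaligned_symbols(objdump_lines: Iterable[str], alignment: int = 4) -> List[Tuple[str, int]]:
    """Return (name, address) for symbol-table rows whose address is not a
    multiple of `alignment`.

    Expects rows in `objdump -t` layout ("<addr> <flags> <section> <size> ...
    <name>"); a quoted "> " prefix is tolerated and other lines are skipped.
    """
    misaligned = []
    for line in objdump_lines:
        fields = line.lstrip("> ").split()
        if len(fields) < 2:
            continue
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # not a symbol-table row (command line, header, ...)
        if address % alignment:
            misaligned.append((fields[-1], address))
    return misaligned


# Two rows from each listing above.
BEFORE = [
    "009e57b8 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon",
    "009e79f8 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon",
]
AFTER = [
    "009e5882 l     F .text        00000000              .hidden pixman_composite_src_0888_0565_rev_asm_neon",
    "009e7ac2 l     F .text        00000000              .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon",
]

if __name__ == "__main__":
    print(misaligned_symbols(BEFORE))  # []
    for name, address in misaligned_symbols(AFTER):
        print(f"{address:08x} {name} (address % 4 == {address % 4})")
```

Fed the full output of the `objdump -t ... | grep 'pixman.\+neon'` command above, a check like this could flag the problem at build time instead of on a NEON device.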

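Now, when we call these functions from Thumb code, the blx instruction implies 4-byte alignment: it switches to ARM state and can only target a word-aligned address, so the call lands 2 bytes into the function: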

> 009da4dc <neon_composite_src_8888_8888+0x3c>:
>   9da4dc:       9000            str     r0, [sp, #0]
>   9da4de:       980b            ldr     r0, [sp, #44]
>   9da4e0:       f004 efa8       blx     9df434 <pixman_composite_src_8888_8888_asm_neon+0x2>
>   9da4e4:       b003            add     sp, #12
>   9da4e6:       bd00            pop     {pc}
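Decoding `f004 efa8` by hand shows why the call lands at `+0x2`. In Thumb-2, `blx <imm>` is encoding T2 of BLX (immediate): the offset's two low bits are always zero and it is added to the word-aligned PC, so every target is a multiple of 4 and a call to an ARM function at `0x9df432` cannot be encoded exactly. The sketch below follows the ARMv7-A/R Architecture Reference Manual pseudocode for that encoding; `decode_thumb_blx_imm` is a hypothetical helper written for this illustration.

```python
def decode_thumb_blx_imm(hw1: int, hw2: int, insn_addr: int) -> int:
    """Return the branch target of a Thumb-2 BLX (immediate), encoding T2.

    Per the ARMv7-A/R ARM pseudocode:
      I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S)
      imm32 = SignExtend(S:I1:I2:imm10H:imm10L:'00', 32)
      target = Align(PC, 4) + imm32, where PC reads as insn_addr + 4
    """
    if hw1 >> 11 != 0b11110 or (hw2 & 0xD000) != 0xC000:
        raise ValueError("not a 32-bit BLX (immediate) instruction")
    if hw2 & 1:
        raise ValueError("H bit set: UNDEFINED for BLX (immediate) T2")
    s = (hw1 >> 10) & 1
    imm10h = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm10l = (hw2 >> 1) & 0x3FF
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)
    imm32 = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10h << 12) | (imm10l << 2)
    if s:
        imm32 -= 1 << 25  # sign-extend the 25-bit offset
    aligned_pc = (insn_addr + 4) & ~3  # Align(PC, 4)
    return aligned_pc + imm32


if __name__ == "__main__":
    # The call from neon_composite_src_8888_8888 quoted above.
    target = decode_thumb_blx_imm(0xF004, 0xEFA8, 0x9DA4E0)
    print(hex(target))             # 0x9df434
    print(target % 4)              # 0: BLX targets are always word-aligned
    print(hex(target - 0x9DF432))  # 0x2: two bytes past the misaligned symbol
```

The encoded offset here is 0x4f50, added to the aligned PC 0x9da4e4; nothing in the encoding can express the extra 2 bytes the misaligned symbol would need.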

So our nice ARM instructions:

> 009df432 <pixman_composite_src_8888_8888_asm_neon>:
>   9df432:       e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
>   9df436:       e59d4028        ldr     r4, [sp, #40]
>   9df43a:       e3a0a000        mov     sl, #0  ; 0x0
>   9df43e:       e59d502c        ldr     r5, [sp, #44]
>   9df442:       e1a06002        mov     r6, r2
>   9df446:       e1a0b004        mov     fp, r4
>   9df44a:       e1a0c006        mov     ip, r6
>   9df44e:       e1a0e007        mov     lr, r7

Turn into gibberish due to the 2-byte offset:

> 009df434 <pixman_composite_src_8888_8888_asm_neon+0x2>:
>   9df434:       4028e92d        eormi   lr, r8, sp, lsr #18
>   9df438:       a000e59d        mulge   r0, sp, r5
>   9df43c:       502ce3a0        eorpl   lr, ip, r0, lsr #7
>   9df440:       6002e59d        mulvs   r2, sp, r5
>   9df444:       b004e1a0        andlt   lr, r4, r0, lsr #3
>   9df448:       c006e1a0        andgt   lr, r6, r0, lsr #3
>   9df44c:       e007e1a0        and     lr, r7, r0, lsr #3
>   9df450:       9201e1a0        andls   lr, r1, #40     ; 0x28

And sooner or later we crash.
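The "gibberish" is the same bytes read 2 bytes late. Because the instructions are stored little-endian, each bogus word is the upper halfword of one real instruction joined to the lower halfword of the next, e.g. `e92d5ff0` followed by `e59d4028` reads as `4028e92d`. This short Python sketch reproduces the second listing from the first; `words_at_offset` is a hypothetical helper for this illustration.

```python
import struct
from typing import List, Sequence

# First eight ARM instructions of pixman_composite_src_8888_8888_asm_neon,
# as listed above (starting at 0x9df432).
ALIGNED = [
    0xE92D5FF0, 0xE59D4028, 0xE3A0A000, 0xE59D502C,
    0xE1A06002, 0xE1A0B004, 0xE1A0C006, 0xE1A0E007,
]


def words_at_offset(words: Sequence[int], offset: int) -> List[int]:
    """Serialize `words` as little-endian 32-bit values, then re-read as many
    whole 32-bit words as fit starting `offset` bytes into the buffer."""
    blob = struct.pack(f"<{len(words)}I", *words)
    count = (len(blob) - offset) // 4
    return list(struct.unpack_from(f"<{count}I", blob, offset))


if __name__ == "__main__":
    # Prints 9df434: 4028e92d, 9df438: a000e59d, ... as in the listing above.
    for address, word in zip(range(0x9DF434, 0x9DF450, 4), words_at_offset(ALIGNED, 2)):
        print(f"{address:x}: {word:08x}")
```

Only seven words come back because the eighth would need the first half of an instruction that is not quoted here.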

This only happens in that bit of NEON assembly, and our Tegra boards don't have NEON, so this was not caught by tests.
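The point about the Tegra boards follows from pixman choosing its NEON routines at run time only on CPUs that support NEON: a green run on non-NEON hardware never executes the misaligned code. As an illustrative assumption (this is not how Mozilla's harness or pixman's own CPU detection is known to work), a test run could record whether the device advertises NEON in the `Features` line of Linux `/proc/cpuinfo`; `cpu_has_neon` and the sample strings below are made up.

```python
def cpu_has_neon(cpuinfo_text: str) -> bool:
    """Return True if a Linux/ARM /proc/cpuinfo dump lists 'neon' in Features."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return "neon" in value.split()
    return False


# Made-up sample text in the usual /proc/cpuinfo "key : value" layout.
WITH_NEON = "Processor\t: ARMv7 Processor rev 2 (v7l)\nFeatures\t: swp half thumb fastmult vfp edsp neon vfpv3\n"
WITHOUT_NEON = "Processor\t: ARMv7 Processor rev 0 (v7l)\nFeatures\t: swp half thumb fastmult vfp edsp vfpv3\n"

if __name__ == "__main__":
    print(cpu_has_neon(WITH_NEON), cpu_has_neon(WITHOUT_NEON))  # True False
```

On a device, the same function could be given the contents of `/proc/cpuinfo` directly, so that results from non-NEON boards are not mistaken for coverage of the NEON paths.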

Also, this doesn't happen with NDK5, which is one more reason to switch :) I will try to find out whether NDK5 avoids this linker bug because it was fixed or because the conditions for the bug aren't met under NDK5.
Yes, this looks like the same issue as bug 666931 and bug 623161
In the future, please don't trigger nightlies when regression hunting. If you need clean builds, use https://build.mozilla.org/clobberer/ to clobber the builder, and use normal opt builds. Triggering multiple nightlies in parallel has unknown behaviour and seems to temporarily strand users (bug 673501).

Thanks!
I asked Timothy B. Terriberry on IRC, and he provided more explanation of the problem and a link to this bug in the binutils bug tracker:
    http://sourceware.org/bugzilla/show_bug.cgi?id=12931

For now, the workaround (also applied to WebM earlier) is to explicitly set the alignment of code sections, and the following patch should do it for pixman:
    http://lists.freedesktop.org/archives/pixman/2011-July/001347.html

Please confirm whether it really helps to resolve this bug. And if it does, then it makes sense to do a complete review of all the arm assembly code in Mozilla to see if such workarounds should be also applied somewhere else.
> Please confirm whether it really helps to resolve this bug. And if it does,
> then it makes sense to do a complete review of all the arm assembly code in
> Mozilla to see if such workarounds should be also applied somewhere else.

Yes, this does fix the bug. Thank you for identifying the issue.

I agree a complete review will be very helpful, before another innocent person gets bitten by this bug again :)
Has a bug been filed to get the pixman alignment fix into the mozilla codebase?
Crash Signature: [@ _moz_pixman_image_composite32] → [@ libxul.so@0x9df494] [@ _moz_pixman_image_composite32]
So, what do we need to do here for Firefox 7? Nothing? This is an existing problem? Do we have the workaround mentioned in comment 21 in mozilla-central or mozilla-beta? We must get some action on this today, preferably a resolution if it needs it.
(In reply to Christian Legnitto [:LegNeato] from comment #25)
> So, what do we need to do here for Firefox 7?

I would suggest cherry-picking http://cgit.freedesktop.org/pixman/commit/?id=b8d6babc91459a9f854695b56f0265298a3c6427 and applying it to Mozilla's copy of pixman.

While you are at it, there is also bug 667284 with a simple fix available, which would also be nice to have applied.
Attached patch Fix
Here's the patch for Mozilla. It's not in mozilla-central or anywhere else, but it would be very good to have.
Attachment #560443 - Flags: review?(siarhei.siamashka)
Comment on attachment 560443
Fix

Also nominating for Aurora and Beta, since the crash was originally from Fennec 7.

The patch has virtually no risk; it specifies one alignment attribute for three assembly files.

Without it, any build can potentially contain this crash, and the Tegras cannot always catch it.
Attachment #560443 - Flags: approval-mozilla-beta?
Attachment #560443 - Flags: approval-mozilla-aurora?
There were 2 crashes on Android in the past week and ~45 on Windows. That is not high enough volume (though this is a startup crash, and those are underreported). Denying approval for beta.
Attachment #560443 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 560443
Fix

Review of attachment 560443:
-----------------------------------------------------------------

r+ from me
Comment on attachment 560443
Fix

a=jst per today's driver meeting (this was reviewed, the flag just didn't get set, and given the nature of this change we're OK approving it before it has landed in mozilla-central).
Attachment #560443 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 560443
Fix

making the r+ official
Attachment #560443 - Flags: review?(siarhei.siamashka) → review+
https://hg.mozilla.org/mozilla-central/rev/da9d9d9d9809
Assignee: nobody → jimnchen+bmo
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED