
Comment 3

•

12 years ago

Observed one source of crashes: SharedLibraryInfo::GetInfoForSelf() in tools/profiler/shared-libraries-linux.cc scans /proc/%d/maps and takes some transient mappings into account. Those mappings, or "false ELFs", can disappear or change before they are used, for example before stack unwinding. A minor problem is that the offsets may be larger than 4G, so they do not always fit in the unsigned long offset.
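To make the offset problem concrete, here is a minimal Python sketch (not the profiler's C++ code) that parses one /proc/<pid>/maps line and checks whether its offset fits in 32 bits, which is the width of unsigned long on 32-bit ARM. The sample line is the /dev/pvrsrvkm mapping quoted in comment 5; `MapEntry`, `parse_maps_line` and `fits_in_32_bits` are hypothetical names introduced only for this illustration.

```python
import re
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    """Hypothetical record for one /proc/<pid>/maps line (illustration only)."""
    start: int
    end: int
    perms: str
    offset: int
    path: str


# start-end perms offset dev inode [path]
_MAPS_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s*(.*)$"
)


def parse_maps_line(line: str) -> Optional[MapEntry]:
    """Parse a full maps line; return None if it does not look like one."""
    m = _MAPS_RE.match(line.strip())
    if m is None:
        return None
    start, end, perms, offset, _dev, _inode, path = m.groups()
    return MapEntry(int(start, 16), int(end, 16), perms, int(offset, 16), path)


def fits_in_32_bits(value: int) -> bool:
    """True if value is representable as a 32-bit unsigned integer."""
    return 0 <= value <= 0xFFFFFFFF


# Mapping quoted in comment 5 of this bug.
entry = parse_maps_line(
    "49db7000-49db8000 rwxs 800000ad000 00:0c 2147 /dev/pvrsrvkm"
)
if entry is not None and not fits_in_32_bits(entry.offset):
    print(f"offset {entry.offset:#x} of {entry.path} needs more than 32 bits")
```

In C or C++ the equivalent fix would presumably be to read the offset into a 64-bit type; that is an assumption about the remedy, not something stated in this bug.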

Comment 4

•

12 years ago

A few thoughts:

Is Flatfish JB-based? If so, bug 914190 could be the problem.

We should try to get the crash to happen under GDB, but from comment 2 it sounds like child processes are crashing on startup? If so, there's an environment variable (I forget what it is, but it's on one of the wikis somewhere) to make each child process sleep for a while after it's created so a debugger can be attached.

Do we know that the problem is stale mappings from SharedLibraryInfo, or is that just a guess?

Comment 5

•

12 years ago

(In reply to Jed Davis [:jld] from comment #4)
> A few thoughts:
>
> Is Flatfish JB-based? If so, bug 914190 could be the problem.

Yes, it's JB-based. The patch in bug 914190 was r-, but it should work, right? I'll try it ASAP. BTW, the problem in comment 3 still exists whether or not getline() works.

> We should try to get the crash to happen under GDB, but from comment 2 it
> sounds like child processes are crashing on startup? If so, there's an
> environment variable (I forget what it is, but it's on one of the wikis
> somewhere) to make each child process sleep for a while after it's created
> so a debugger can be attached.

Thanks. I tested with the following and was able to catch the crashes in gdb. The crashes seem to be very random.

./profile.sh start
or
MOZ_PROFILER_STARTUP=1 ./run-gdb.sh

> Do we know that the problem is stale mappings from SharedLibraryInfo, or is
> that just a guess?

I observed that SharedLibraryInfo contains entries like

49db7000-49db8000 rwxs 800000ad000 00:0c 2147 /dev/pvrsrvkm

and that the profiler tried to access addresses not in /proc/pid/maps. After removing the EHAddrSpace::Update(); call in EHABIStackWalkInit(), this kind of crash vanished.

Comment 6

•

12 years ago

One of the crashes: (segfault on accessing |container|, which is NULL)

#0 nsLayoutUtils::FontSizeInflationFor (aFrame=<optimized out>) at ../../../gecko/layout/base/nsLayoutUtils.cpp:5418
#1 0x40f546b8 in FontSizeInflationListMarginAdjustment (aFrame=0x480c7b58) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:101
#2 nsCSSOffsetState::ComputeMargin (this=0xbea2d148, aHorizontalPercentBasis=<optimized out>, aVerticalPercentBasis=76800) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:2487
#3 0x40f54888 in nsCSSOffsetState::InitOffsets (this=<optimized out>, aHorizontalPercentBasis=76800, aVerticalPercentBasis=76800, aFrameType=0x43c55ce0, aBorder=0x0, aPadding=0x0) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:2166
#4 0x40f54cf6 in nsHTMLReflowState::InitConstraints (this=0xbea2d148, aPresContext=0x4747e400, aContainingBlockWidth=76800, aContainingBlockHeight=43800, aBorder=0x0, aPadding=0x0, aFrameType=0x43c55ce0) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:1928
#5 0x40f556c2 in nsHTMLReflowState::Init (this=0xbea2d148, aPresContext=0x4747e400, aContainingBlockWidth=76800, aContainingBlockHeight=43800, aBorder=0x0, aPadding=0x0) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:319
#6 0x40f55968 in nsHTMLReflowState::nsHTMLReflowState (this=0xbea2d148, aPresContext=0x4747e400, aParentReflowState=..., aFrame=<optimized out>, aAvailableSpace=..., aContainingBlockWidth=76800, aContainingBlockHeight=43800, aFlags=0) at ../../../gecko/layout/generic/nsHTMLReflowState.cpp:190
#7 0x40f31822 in nsAbsoluteContainingBlock::ReflowAbsoluteFrame (this=<optimized out>, aDelegatingFrame=0x4809fdc0, aPresContext=0x4747e400, aReflowState=..., aContainingBlock=..., aConstrainHeight=true, aKidFrame=0x480c7b58, aStatus=@0xbea2d240, aOverflowAreas=0xbea2d600) at ../../../gecko/layout/generic/nsAbsoluteContainingBlock.cpp:390
#8 0x40f31af4 in nsAbsoluteContainingBlock::Reflow (this=0x479ffef8, aDelegatingFrame=0x4809fdc0, aPresContext=0x4747e400, aReflowState=..., aReflowStatus=@0xbea2d3dc, aContainingBlock=..., aConstrainHeight=true, aCBWidthChanged=true, aCBHeightChanged=true, aOverflowAreas=0xbea2d600) at ../../../gecko/layout/generic/nsAbsoluteContainingBlock.cpp:137
#9 0x40f3803e in nsBlockFrame::Reflow (this=0x4809fdc0, aPresContext=0x4747e400, aMetrics=..., aReflowState=<optimized out>, aStatus=@0xbea2d718) at ../../../gecko/layout/generic/nsBlockFrame.cpp:1165
#10 0x40f31894 in nsAbsoluteContainingBlock::ReflowAbsoluteFrame (this=<optimized out>, aDelegatingFrame=<optimized out>, aPresContext=0x4747e400, aReflowState=..., aContainingBlock=..., aConstrainHeight=true, aKidFrame=0x4809fdc0, aStatus=@0xbea2d718, aOverflowAreas=0xbea2dbc0) at ../../../gecko/layout/generic/nsAbsoluteContainingBlock.cpp:415
#11 0x40f31af4 in nsAbsoluteContainingBlock::Reflow (this=0x479ffea0, aDelegatingFrame=0x4805be88, aPresContext=0x4747e400, aReflowState=..., aReflowStatus=@0xbea2d8b4, aContainingBlock=..., aConstrainHeight=true, aCBWidthChanged=true, aCBHeightChanged=true, aOverflowAreas=0xbea2dbc0) at ../../../gecko/layout/generic/nsAbsoluteContainingBlock.cpp:137
#12 0x40f3803e in nsBlockFrame::Reflow (this=0x4805be88, aPresContext=0x4747e400, aMetrics=..., aReflowState=<optimized out>, aStatus=@0xbea2da7c) at ../../../gecko/layout/generic/nsBlockFrame.cpp:1165
#13 0x40f3d9a4 in nsContainerFrame::ReflowChild (this=<optimized out>, aKidFrame=0x4805be88, aPresContext=0x4747e400, aDesiredSize=..., aReflowState=..., aX=0, aY=0, aFlags=3, aStatus=@0xbea2da7c, aTracker=0x0) at ../../../gecko/layout/generic/nsContainerFrame.cpp:961
#14 0x40f4d6e4 in nsHTMLScrollFrame::ReflowScrolledFrame (this=0x4805bbd8, aState=0xbea2dc9c, aAssumeHScroll=<optimized out>, aAssumeVScroll=<optimized out>, aMetrics=0xbea2db94, aFirstPass=true) at ../../../gecko/layout/generic/nsGfxScrollFrame.cpp:441

Comment 7

•

12 years ago

Attached patch bug926734-unwind-no-rwx-hg0.diff (obsolete)

Does this patch help? It should exclude the writable/shared mappings from the graphics drivers. I don't understand why they need to be executable, but I assume we can't easily change that. The right solution is probably to have a real dl_iterate_phdr on b2g instead of walking the maps; bug 827846 was looking into this, but I don't know what its current status is.
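The idea of excluding writable/shared mappings can be sketched as a filter on the permissions field. This is a minimal Python illustration under assumptions: the attached patch itself is not quoted in this bug, so the exact rule it applies is unknown, and `is_plausible_code_mapping` and `filter_maps` are hypothetical names. The sketch keeps only private, readable, executable, non-writable mappings ("r-xp"). One sample line is the /dev/pvrsrvkm mapping from comment 5; the other two are in the abbreviated form later listed in comment 8.

```python
from typing import Iterable, List


def is_plausible_code_mapping(perms: str) -> bool:
    """Keep private, read+execute, non-writable mappings such as "r-xp"."""
    return (
        len(perms) == 4
        and perms[0] == "r"
        and perms[1] != "w"   # writable mappings are excluded
        and perms[2] == "x"
        and perms[3] == "p"   # shared ("s") mappings are excluded
    )


def filter_maps(lines: Iterable[str]) -> List[str]:
    """Return only the lines whose second field passes the permission check."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # not a maps-style line
        if is_plausible_code_mapping(fields[1]):
            kept.append(line)
    return kept


sample = [
    "49db7000-49db8000 rwxs 800000ad000 00:0c 2147 /dev/pvrsrvkm",
    "400e4000-40129000 r-xp 0 /system/lib/libc.so",
    "40aa4000-421ea000 r-xp 0 /system/b2g/libxul.so",
]
kept = filter_maps(sample)
for line in kept:
    print(line)
```

A permission filter only narrows down what /proc/<pid>/maps reports at one instant; it does not address mappings that change after the scan, which is why a real dl_iterate_phdr is suggested above as the proper fix.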

Flags: needinfo?(thuang)

Comment 8

•

12 years ago

Partially. There are still other kinds of crashes remaining. Here are the files after filtering:

8000-23000 r-xp 0 /system/b2g/b2g
4007c000-4008a000 r-xp 0 /system/bin/linker
40097000-400d9000 r-xp 0 /system/b2g/libmozglue.so
400df000-400e2000 r-xp 0 /system/lib/liblog.so
400e4000-40129000 r-xp 0 /system/lib/libc.so
40139000-4013a000 r-xp 0 /system/lib/libstdc++.so
4013c000-40151000 r-xp 0 /system/lib/libm.so
40153000-4015d000 r-xp 0 /system/lib/libui.so
4015f000-4016c000 r-xp 0 /system/lib/libcutils.so
4017d000-4017e000 r-xp 0 /system/lib/libhardware.so
40180000-40181000 r-xp 0 /system/lib/libsync.so
40183000-4019b000 r-xp 0 /system/lib/libutils.so
4019e000-401a1000 r-xp 0 /system/lib/libcorkscrew.so
401a3000-401a7000 r-xp 0 /system/lib/libgccdemangle.so
401aa000-401c0000 r-xp 0 /system/lib/libz.so
401c2000-401ff000 r-xp 0 /system/lib/libEGL.so
40207000-40247000 r-xp 0 /system/lib/libGLES_trace.so
40249000-4027d000 r-xp 0 /system/lib/libstlport.so
40281000-40285000 r-xp 0 /system/lib/libhardware_legacy.so
40288000-4028a000 r-xp 0 /system/lib/libwpa_client.so
4028c000-40291000 r-xp 0 /system/lib/libnetutils.so
40293000-402c7000 r-xp 0 /system/lib/libgui.so
402d3000-402f1000 r-xp 0 /system/lib/libbinder.so
402f7000-402fc000 r-xp 0 /system/lib/libGLESv2.so
402fe000-40300000 r-xp 0 /system/lib/libsuspend.so
4040b000-40454000 r-xp 0 /system/lib/libdbus.so
40457000-4046d000 r-xp 0 /system/lib/libcamera_client.so
40474000-40488000 r-xp 0 /system/lib/libdrmframework.so
4048c000-404a0000 r-xp 0 /system/lib/libexpat.so
404a3000-404a7000 r-xp 0 /system/lib/libgabi++.so
404a9000-404f7000 r-xp 0 /system/lib/libsonivox.so
408fe000-40a93000 r-xp 0 /system/b2g/libnss3.so
40aa4000-421ea000 r-xp 0 /system/b2g/libxul.so
42590000-42676000 r-xp 0 /system/lib/libstagefright.so
4267e000-42744000 r-xp 0 /system/lib/libcrypto.so
42759000-4286a000 r-xp 0 /system/lib/libicui18n.so
42872000-4295d000 r-xp 0 /system/lib/libicuuc.so
4296b000-429d2000 r-xp 0 /system/lib/libmedia.so
429e7000-429f2000 r-xp 0 /system/lib/libstagefright_foundation.so
429f4000-429f6000 r-xp 0 /system/lib/libaudioutils.so
429f8000-429fb000 r-xp 0 /system/lib/libspeexresampler.so
429fd000-429fe000 r-xp 0 /system/lib/libmedia_native.so
42a00000-42a32000 r-xp 0 /system/lib/libssl.so
42a38000-42a4a000 r-xp 0 /system/lib/libstagefright_omx.so
42a4d000-42a4f000 r-xp 0 /system/lib/libstagefright_yuv.so
42a51000-42a69000 r-xp 0 /system/lib/libvorbisidec.so
42a6b000-42a6c000 r-xp 0 /system/lib/libstagefright_enc_common.so
42a6e000-42a73000 r-xp 0 /system/lib/libstagefright_avc_common.so
42a75000-42a79000 r-xp 0 /system/lib/libsysutils.so
42a7c000-42a94000 r-xp 0 /system/lib/libsensorservice.so
42a99000-42aa2000 r-xp 0 /system/vendor/lib/hw/gralloc.flatfish.so
42aa4000-42ae2000 r-xp 0 /system/vendor/lib/libsrv_um.so
42ae5000-42aea000 r-xp 0 /system/vendor/lib/libpvr2d.so
42bf0000-42bf6000 r-xp 0 /system/vendor/lib/hw/hwcomposer.flatfish.so
42cf8000-42cf9000 r-xp 0 /system/lib/hw/power.default.so
ffff0000-ffff1000 r-xp 0 [vectors]

I added a check to remove [vectors] but the problem remains.

Flags: needinfo?(thuang)

Comment 9

•

12 years ago

The [vectors] entry shouldn't be a problem. Are the remaining crashes in the stack unwinder, or somewhere else? The crash in comment #6 appears to be unrelated to profiling.

Flags: needinfo?(thuang)

Comment 10

•

12 years ago

(In reply to Jed Davis [:jld] from comment #9)
> Are the remaining crashes in the stack unwinder, or somewhere else?

I don't know yet. The crashes are rather random, although none are in the stack unwinder.

> The crash in comment #6 appears to be unrelated to profiling.

It only happens when profiling is enabled. I have been trying to narrow the problem down, and it seems that the problem remains even when the contents of the signal handler are replaced with an empty, time-consuming loop. I'm wondering whether this is related to some kernel problem, but I have yet to verify the other parts before sending SIGPROF. Any ideas?

Flags: needinfo?(thuang)

Comment 11

•

12 years ago

Does the kernel need any special features enabled to support the Gecko profiler?

Comment 12

•

12 years ago

(In reply to thomas tsai from comment #11)
> Does kernel need to enable any special functions to support gecko profile?

AFAIK, no. On Linux, SPS is implemented with POSIX signals.

Comment 13

•

12 years ago

I saw several crashes in the JS parser. The logs make me more confident that the JS parser detected something wrong. Combined with the symptoms observed in bug 922548, memory corruption is very likely. I'm trying to run valgrind but am blocked by the linker: the linker crashes immediately when executed under valgrind. Thomas, are there any special patches to bionic on this platform? Could you help run valgrind on it?

Flags: needinfo?(ttsai)

Comment 14

•

12 years ago

bionic is from AOSP 4.2.2. There are no special patches for bionic. You can refer to "https://bitbucket.org/thomastsai/a31-b2g-manifest/src/6a8765e609490b79966f212dfcc6dbbd3bd1331a/base-jb.xml?at=aosp-4.2.2". valgrind was run a couple of weeks ago to debug graphics performance. I will arrange for someone to run valgrind again.

Flags: needinfo?(ttsai)

Updated

•

12 years ago

Flags: needinfo?(dliang)

Updated

•

12 years ago

Depends on: 940167

Comment 15

•

12 years ago

Comment on attachment 829579
bug926734-unwind-no-rwx-hg0.diff

I filed bug 940167 for this. If you need the patch on 1.2, I think the bug needs to be flagged koi? and go through triage for that.

Attachment #829579 - Attachment is obsolete: true

Comment 16

•

12 years ago

Can anyone confirm that this fixes the issue for Flatfish profiling?

Flags: needinfo?(vliu)

Vincent Liu[:vliu]

Reporter

Comment 17

•

12 years ago

Ting-Yuan and dliang are still working on this issue. As Ting-Yuan said in comment 13, they are still focused on running valgrind to investigate the problem.

Flags: needinfo?(vliu)

Francis Lee [:frlee]

Comment 18

•

12 years ago

Hi Thomas, this issue seems important for the developer release, so I intend to change it to 1.3+. Do you have any concerns?

Flags: needinfo?(ttsai)

Danny Liang [:dliang]

Comment 19

•

12 years ago

We can now use valgrind to debug B2G successfully. The software versions used with valgrind are:

Valgrind: 3.9.0 (download here: http://valgrind.org/downloads/valgrind-3.9.0.tar.bz2)
Android-NDK: R9b (download here: http://dl.google.com/android/ndk/android-ndk-r9b-linux-x86_64.tar.bz2)

We will post any further updates here.

Flags: needinfo?(dliang)

Updated

•

12 years ago

Flags: needinfo?(ttsai)

Jerry Shih[:jerry] (UTC+8) (inactive)

Comment 20

•

12 years ago

I've been looking more into this. The remaining crash seems to be strongly linked with the imgtec driver. Here's the information I have:
- Signaling the process doesn't cause the crash. (I suspected that interrupted sys calls in the imgtec driver may not have been handled properly.)
- Writing to a large heap-allocated block (the profiler's circular buffer) is enough to trigger the crash. The crash seems to happen in a minimal signal handler if and only if I touch this circular buffer.
- I tried playing with and scribbling on the popped stack in case imgtec was returning memory from the stack, but that wasn't the case. It didn't make a difference.

I'm not sure how the imgtec driver could be causing this. There's a 'gralloc_unregister_buffer' error. I tried patching GraphicsBuffer to log the alloc/free calls and they all seemed to be paired. Maybe these errors should be fixed before we continue to investigate this problem.

Vincent or Jerry, do you see the gralloc error with flatfish? I see them without the profiler running.

Flags: needinfo?(vlin)

Flags: needinfo?(hshih)

Comment 21

•

12 years ago

I see these messages without enabling the profiler. Are these error messages the same as yours? Vincent will ask imgtec about them.

E/IMGSRV ( 1273): :0: gralloc_device_alloc: Framebuffer/bypass usage bits are incompatible with non-GPU-renderable pixel format (1)
W/GraphicBufferAllocator( 1273): WOW! gralloc alloc failed, waiting for pending frees!
E/IMGSRV ( 1273): :0: gralloc_device_alloc: Framebuffer/bypass usage bits are incompatible with non-GPU-renderable pixel format (1)
E/IMGSRV ( 1273): :0: gralloc_unregister_buffer: Cannot unregister unregistered buffer (ID=9)
E/IMGSRV ( 1273): :0: gralloc_unregister_buffer: Cannot unregister unregistered buffer (ID=10)
E/IMGSRV ( 1273): :0: gralloc_unregister_buffer: Cannot unregister unregistered buffer (ID=16)

Flags: needinfo?(vlin)

Flags: needinfo?(hshih)

Vincent Lin[:vilin]

Comment 22

•

12 years ago

(In reply to Benoit Girard (:BenWa) from comment #20)
> I've been looking more into this. The remaining crash seems to be strongly
> linked with the imgtec driver. Here's the information I have:
> - Signaling the process doesn't cause the crash. (interrupted the sys calls
> the imgtec driver may of not been handled properly).
> - Writing to a large heap allocated block (the profiler's circular buffer)
> is enough to trigger the crash. The crash seems to happen in a minimal
> signal handler if-and-only-if I touch this circular buffer.
> - I tried playing and scribbling the popped stack in case imgtec was
> returning memory from the stack but that wasn't the case. It didn't make a
> difference.
>
> I'm not sure how the imgtec driver could be causing this. There's
> 'gralloc_unregister_buffer' error. I tried patching GraphicsBuffer to log
> the alloc/free and they all seemed pair. Maybe these errors should be fixed
> before we continue to investigate this problem.
>
> Vincent or Jerry do you see the gralloc error with flatfish? I see them
> without the profiler running.

I've also been seeing these error messages for a while, but the alloc/free calls in Gecko are paired.

Terry: we need your help to debug the graphics HAL.

Flags: needinfo?(phterry)

Updated

•

12 years ago

Depends on: 953018