Closed Bug 662936 Opened 13 years ago Closed 13 years ago

fennec silently fails while running talos

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: jchen)

References

Details

(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][mobile_dev_needed][android_tier_1])

Crash Data

Attachments

(1 file)

while running talos, we have a series of easily reproducible failures.  They occur across a variety of tests such as tsvg, tdhtml, tpan, tzoom, and tp4m.  Sometimes I can get 4 or 5 runs in a row (locally from my desktop to my tegra) with no failures; other times a run fails partway through the test.

This is not a scenario where Fennec hangs and we time out; this is a scenario where the fennec process terminates and we end the test early and inconclusively because the data is incomplete.

I have run hundreds of test runs, and between watching logcat and instrumenting the test files, I have not seen anything out of the ordinary.
We need a mobile developer to take a look at this.
I have updated my set of android dumps:
http://people.mozilla.org/~jmaher/android_dumps/

These contain information which should be helpful in pinpointing this.

After further debugging today, I see that we add a 'pageshow' event listener in messagemanager and we never receive that event.  Watching this on a screen, I see fennec loading the test url and crashing during the pageload.  I see checkerboard on half of the screen and the webpage on the other half.

This error is by far the most frequent orange that we get for android talos.
this could be the same crash as bug 613954
Depends on: 613954
here are the steps to reproduce this:
1) checkout talos:
  hg clone http://hg.mozilla.org/build/talos talos
2) cd talos/mobile_profile/extensions
3) check out fennecmark (inside mobile_profile):
  hg clone http://hg.mozilla.org/users/tglek_mozilla.com/fennecmark bench@tara.glek
4) setup a webserver (I use apache) to point at the root of your talos checkout
5) get a tegra (or compatible android device) with a SUTAgent installed
6) from the talos directory run:
python remotePerfConfigurator.py -v -e org.mozilla.fennec --activeTests tpan --resultsServer ' ' --resultsLink ' ' --output tpan.yml --remoteDevice <ip of your sutagent> --webServer <ip of your host>

7) from the talos directory run:
python run_tests.py -d -n tpan.yml

8) repeat up to 5 times; I usually get it to fail within 3 or 4 runs.
What kind of failure occurs in step 8 - crash, timeout, OOM, or incorrect result?
when it fails, I see fennec crash due to OOM, and talos reports back 'unexpected output' as it never found the log file with the timing data that it expects.
Ok, thanks.

I don't have a tegra, but I can try to reproduce on linux desktop. Unless there is an obvious reason why that is a waste of time?
I can never get tpan or tzoom working on a linux desktop build of fennec (I could be building it wrong).  Any android device would be ideal.  We don't see this on Maemo, so I suspect the problem is in Android only.  That is just me speculating though.
I can try on an Android phone then.

Searching for info on SUT agent, I found this page

https://wiki.mozilla.org/Mobile/Fennec_Unittests/Remote_Testing

which says it is possible to use adb devicemanager instead of the SUT agent. Would that be simpler if I am using a phone connected by USB? And what would the STR be to run talos that way?
From irc discussion it looks like I need the SUT agent.

How would I do the webserver part on a developer machine? Our firewall will prevent me just running a webserver there that the phone can access. Is there some way in Talos to get around that?
here are some steps to run without a server (i.e. skipping step 4 above).

6) from the talos directory run:
python remotePerfConfigurator.py -v -e org.mozilla.fennec --activeTests tpan --resultsServer ' ' --resultsLink ' ' --output tpan.yml --remoteDevice <ip of your sutagent> --webServer people.mozilla.org/~jmaher/talos

7) from the talos directory run:
python run_tests.py -d -n tpan.yml
ok, I have some new instructions:
1) download: http://people.mozilla.org/~jmaher/talos/talos_ADB.zip
2) unzip in a local talos directory on your machine
3) inside that talos directory, run:
python remotePerfConfigurator.py -v -e org.mozilla.fennec --activeTests tpan --resultsServer ' ' --resultsLink ' ' --output tpan.yml --remoteDevice ADB --webServer people.mozilla.org/~jmaher/talos

4)now run:
python run_tests.py -d -n tpan.yml

This requires that you have ADB access to your device.  This has been poorly tested, but it is a much simpler way to reproduce this bug.
Definitely getting closer, but the blocking factor at this point for me is that a rooted device is apparently needed for this approach, and I don't have one. Without root, copying application.ini is not possible. Trying run-as always results in

run-as: Package 'org.mozilla.fennec' has corrupt installation

Perhaps all of this is not needed though. If it's an OOM, it should be debuggable on desktop as well (it won't OOM, but we can see memory rise and understand why). However, I can't get Talos to run on linux desktop either. I got to the point where it loads a Fennec window, but aside from the window title nothing happens; it just waits forever.
Got a rooted device. Still no luck though, still getting 'restarting as root failed' when running remotePerfConfigurator.py.

Note btw that all the talos infrastructure stuff isn't necessarily needed on device. If I just had a URL of a website to visit that would put the browser through the same scrolling/page loads/etc. as tpan does, that would be enough. All I need is to be able to run the device through that on a profiling build of Fennec (I don't need any of the special control/logging stuff the talos system has).
here is some logcat output.

tpan successful iteration (opt build):
http://people.mozilla.org/~jmaher/android_dumps/successful_opt.txt

tpan failed iteration (opt build):
http://people.mozilla.org/~jmaher/android_dumps/failure_opt.txt

tpan failed iteration (debug build):
http://people.mozilla.org/~jmaher/android_dumps/failure_debug.txt
jmaher, following our IRC discussion, here is a special profiling build:

http://syntensity.com/static/fennec-profiling.apk

The logcat output of that build, when failing silently in tpan, would be very useful.

If you can, please include logs from several runs (useful data may not appear in every run).
Improved build based on initial results:

http://syntensity.com/static/fennec-profiling2.apk
http://people.mozilla.org/~jmaher/android_dumps/

a second crash with the same build:
profiled2-2.txt 

a successful run with the same build (immediately before the profiled2-2.txt crash):
profiled-2-pass.log
The data so far appears to show this is indeed an OOM. Need more data though. Here is a yet more improved build:

http://syntensity.com/static/fennec-profiling3.apk
Further profiling with jmaher indicates this is NOT an OOM. We sometimes get '4GB' as the RSS, but the device has 1GB. It also doesn't always crash when reporting '4GB', and the RSS returns to normal values. And in jmaher's last log,

http://people.mozilla.org/~jmaher/android_dumps/profiled4-2.log

there is a crash but no indication of excessive memory use.

What is worrying is that the logs show different Runnables being run right before the crash, which argues against a single cause here, unless the trigger isn't an event, timer, or IPC message.

At this point, we need to see some stack traces. The profiling builds should have crashreporter enabled, since I didn't disable it in the mozconfig. Hopefully we can get a stack trace that way. Otherwise, we will need to figure out another way.
Depends on: 664510
The stack traces we have from before all show crashes in libc, with just hex offsets, which is not very helpful.

blassey, is there some way to get clearer stack traces on Android?
(In reply to comment #24)
> blassey, is there some way to get clearer stack traces on Android?

Debug symbols for libc would do the trick. If you build your own rom you get those.
couldn't we just look at this from the pageload angle?  It always crashes during a pageload, usually when about 2/3 of the page is loaded and the rest is checkerboard.  

I am game for installing a custom ROM on my tegra board and giving it a go.
Can I suggest that you guys (Alon, Joel, and Bear) pick a board in staging, set up adb over tcp (for debugging purposes), and run the repro steps together? I can provide the couple of commands necessary to start this up, and then you can use everything else just like it happens in try, build, etc...
bmoss, thanks for the suggestion. Right now we have some other stuff to try, but if they don't work out we should do that.

Would we be able to modify the board from staging - install a different libc, install gdb, etc.?
jmaher, can you please reproduce the crash on your device with this debug build,

http://syntensity.com/static/fennec-debug.apk

After looking some more at the previous crashes you reported, it just looks like the stack traces are broken. Hopefully that won't happen with this debug build and we can get proper stack traces in the crash reports.
Alon, you should be able to do anything you would do from adb. The only caution: if you reboot, you will need to re-establish adb over tcp before you can continue. As long as the agent is installed you should be good to go; it always starts up on boot. If you need to completely reimage the board, that will most likely prove very difficult because it would remove the agent. Let me know if I can be of assistance.
It seems that having physical access to a device would be helpful, mainly to allow crashreporter to run and collect data that way.

Is there any way we could get Alon a tegra+screen+input to do this?  Even if it is just for a week or two (however long) until this is behind us?

It would be simple to pull a tegra from the automation pool and access it over the network.  Running the tests would be as simple as downloading the talos.zip and running the commands given above, just replacing the IP address with that of the tegra.  The reason I hesitate to jump on this approach is that we seem to find value in the crashreporter end of things.  We can set up adb over tcp/ip for a tegra from the pool and install the debugging tools on there if we wish.
It does sound like getting a board could help. But I do have some hope that just using the build in comment #29 will get us all the data we need here. So I would suggest trying that out first since it sounds much faster.
Alon, do you have a tegra yet?  if not, we need to bug bmoss or aki to get one.
As stated in IRC, there are tegras on the right hand side of my desk that won't boot without a usb keyboard or mouse attached, which makes them useless for production.  Alon can use one.
I have a tegra board as of yesterday, have just been trying to get gdb running on it with dougt. However we have not been able to do so. Perhaps there is something wrong with that board specifically and I should get another. Aki, I'll be in touch soon unless I figure this out with the current board.

Have also been trying to run gdb on some phones, but without success due to various phone-related issues.
With bmoss' help, I have a tegra board running. However pushing gdb to the device and then just running

adb shell /data/local/gdbserver

segfaults (this worked on the phones).

Currently blocked on this and have no idea how to proceed.
After a lot of work with dougt, the situation is this:

1. We are getting corrupt stack traces apparently all the time, bug 664510, both using crashreporter and manual gdb. We can't really do much until that is fixed. Not sure who needs to look at that.

2. Debug builds now appear to crash on startup. Maybe unrelated, but this also blocks work on this bug (since a debug build might have helped us get stack traces). And since we can't get stack traces, we can't see where this startup crash is.
For some reason, I don't think we are going to fix both of these issues any time soon.  Is there any other method we can use to help narrow this down?
It will be very sad if we do not fix the first issue very soon. We are basically flying blind right now; much of our incoming crash reporter data is meaningless!

Issue 2 might not be crucial for us here.

Aside from this, I have just one more idea at the moment: to automatically rewrite the entire tree to print out all functions as we enter them. Then we could narrow down where we are crashing here. It will be very hard to get useful data from this, but we can try it.
any update on this bug?
I am still working on the automatic rewriting patch. Nothing to update about yet.
Ok, this is probably *not* going to work, but worth a shot. Here is a special build of fennec that will print out each function as we enter it,

www.syntensity.com/static/fennec-functionprint.apk

This is slower than a debug build, almost unusably so. It also generates a ridiculous amount of output, so saving it to disk would probably not be practical. However, if we can run this on talos, reproduce the silent failure, and view the last say 10,000 lines of output, we would have a chance at processing that and figuring out what function is responsible.

jmaher, can you please try to run this build on talos to see if the crash is reproducible with it? If so, the next step would be for us to figure out how to save just the last 10,000 lines of output. (If not, back to the drawing board...)
Another direction we could pursue here, is to bisect on fennec builds on talos to see when the stack reporting breaks. That is, to find the latest version of fennec where we *can* get stack traces for this bug. There is a good chance having that would let us fix this.
Looks like we might have at least a partial fix for the stack traces issue, in bug 644707. I made an android build with it,

www.syntensity.com/static/fennec-aborthandler.apk

jmaher, can you please try to reproduce the talos problem in this bug using that build?
azakai, I have 1 device and all my other android work stops when I do this.  I was under the impression that ctalbert had a tegra you could use.  Please let me know if you have problems running the steps in comment 13 (https://bugzilla.mozilla.org/show_bug.cgi?id=662936#c13) to reproduce this.
Ok. I am currently sick at home, so can't use that tegra, but hopefully in a few days I'll be able to work on this.
I ran this and reproduced it a few times.  I have a log which has two of the crashes in it:
http://people.mozilla.org/~jmaher/android_dumps/aborted.log

I should have some cycles on my tegra today and this weekend, so I can run another build.
There won't be any useful information in the log, but hopefully crashreporter stack traces will now work.

The relevant patch landed on m-c this morning, so the same should be true for tomorrow's nightly.
A promising development is that the just-released Android NDK6 has an ndk-stack tool, which translates the raw dumps from logcat into proper stack traces. This could be very useful in this bug and elsewhere, potentially.

(I also suspect this tool would work even with binaries created with earlier NDK versions, but would need to check that.)
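For reference, a minimal sketch of driving ndk-stack from a script. The `-sym`/`-dump` flags are per the NDK docs for the tool; verify them against your NDK version, and the tool must be on PATH:

```python
import subprocess

def ndk_stack_cmd(symbols_dir, dump_file):
    """Build the ndk-stack command line: -sym points at the directory
    containing unstripped .so files, -dump at a saved logcat crash dump."""
    return ['ndk-stack', '-sym', symbols_dir, '-dump', dump_file]

def symbolicate(symbols_dir, dump_file):
    """Translate raw '#NN  pc XXXXXXXX' logcat frames into a readable
    stack trace.  Assumes ndk-stack (NDK r6+) is on PATH."""
    result = subprocess.run(ndk_stack_cmd(symbols_dir, dump_file),
                            capture_output=True, text=True)
    return result.stdout
```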
(In reply to comment #49)
> A promising development is that the just-released Android NDK6 has an
> ndk-stack tool, which translates the raw dumps from logcat into proper stack
> traces. This could be very useful in this bug and elsewhere, potentially.
> 
> (I also suspect this tool would work even with binaries created with earlier
> NDK versions, but would need to check that.)

So when can we expect to see some movement on this issue?  Do you still need a tegra to debug it?  Can we sit down next week and crank through this issue?
At this point, the next step is to run the latest nightly on talos (and then to inspect the crashreporter stacks). jmaher said he might have time to do that in comment 47. If not, let's do this together next week.
nightly build from yesterday and uploaded crash report:
https://crash-stats.mozilla.com/report/index/0bb6b0d2-a42a-4d9e-8f20-ab0d12110719
Sadly the work we did on better crash reports did not help, that stack trace is still just libc noise.

I will look into getting better stack traces with a custom-compiled libc.
Earlier I took a look at the crash log at http://people.mozilla.org/~jmaher/android_dumps/aborted.log. Combined with the libc.so in that directory and the source code, I translated the stack trace to the following libc functions:

0. abort() [basically anytime there's a SIGSEG with
            address 0xdeadbaad it's abort() causing it]
1. dlfree()
2. free()
3. fclose()
4. _files_getaddrinfo()
5. nsdispatch()
6. explore_fqdn()
7. getaddrinfo()

The source for getaddrinfo() and friends is here http://android.git.kernel.org/?p=platform/bionic.git;a=blob;f=libc/netbsd/net/getaddrinfo.c;hb=HEAD

So far, the stack trace says some function called getaddrinfo(), which eventually calls _files_getaddrinfo() which is responsible for resolving the host name through /etc/hosts. So _files_getaddrinfo() fopens /system/etc/hosts, reads through it, and somehow by the time it fcloses hosts, the FILE structure got corrupted to the point that free() detects it and decides to abort(). I don't know why it's getting corrupted though.

Too bad the log only shows 8 frames, but I'd be glad to help anyone getting more stack frames.
(In reply to Jim Chen [:jchen] (mobile intern :) from comment #54)
> the FILE structure got corrupted 

One correction: it's FILE::_bf getting corrupted. _bf is an internal buffer used by fgetc et al. _files_getaddrinfo only uses fgets, so it should be fgets that's corrupting the buffer. Still strange though.
Whiteboard: [orange][mobile_unittests][mobile_dev_needed] → [orange][mobile_unittests][mobile_dev_needed][android_tier_1]
(In reply to Jim Chen [:jchen] (mobile intern :) from comment #54)
> Earlier I took a look at the crash log at
> http://people.mozilla.org/~jmaher/android_dumps/aborted.log. Combined with
> the libc.so in that directory and the source code, I translated the stack
> trace 

Can you please elaborate on the steps you took to do that?
Attached file stack
This is a stack from the crash in comment 52. It goes off the rails after frame 8, but it looks sensible up to that point.
(In reply to Alon Zakai (:azakai) from comment #56)
> (In reply to Jim Chen [:jchen] (mobile intern :) from comment #54)
> > Earlier I took a look at the crash log at
> > http://people.mozilla.org/~jmaher/android_dumps/aborted.log. Combined with
> > the libc.so in that directory and the source code, I translated the stack
> > trace 
> 
> Can you please elaborate on the steps you took to do that?

I basically took the libc.so and did 'objdump -d', and found the instructions that matched the offset in the logcat stack trace:

> I/DEBUG   (  937):          #00  pc 00011f0c  /system/lib/libc.so
> I/DEBUG   (  937):          #01  pc 0000be08  /system/lib/libc.so
> I/DEBUG   (  937):          #02  pc 0000cd6c  /system/lib/libc.so
> I/DEBUG   (  937):          #03  pc 000147d4  /system/lib/libc.so
> I/DEBUG   (  937):          #04  pc 0002e58e  /system/lib/libc.so
> I/DEBUG   (  937):          #05  pc 0002cd1e  /system/lib/libc.so
> I/DEBUG   (  937):          #06  pc 0002d4ec  /system/lib/libc.so
> I/DEBUG   (  937):          #07  pc 0002d7f8  /system/lib/libc.so

From that I could verify the stack trace is good. objdump also gives you offsets in relation to exported symbols, so it was easy to map the frames to exported functions such as abort(), free(), nsdispatch(), getaddrinfo().

The stack trace now looks like this:
0. abort()
1. dlfree()
2. free()
3. fclose()
4. ???
5. nsdispatch()
6. ???
7. getaddrinfo()

Some functions are not exported, like _files_getaddrinfo() and explore_fqdn(). To get those I looked in the source code I linked and found that,

1) fclose() is called from android_getaddrinfo_proxy() and _endhtent(), and _endhtent() is called by _files_getaddrinfo(). The instructions around the fclose() call fit _files_getaddrinfo(). So you can infer that the missing frame between fclose() and nsdispatch() is _files_getaddrinfo(), and that _endhtent() was inlined into _files_getaddrinfo(). That's why it appears _files_getaddrinfo() calls fclose() directly.

2) getaddrinfo() calls explore_fqdn(), and explore_fqdn() calls nsdispatch(). So the other missing frame is explore_fqdn().

Also, if you disassemble fclose() and look at where free() is called, it corresponds to the line 'free(fp->_bf._base)' in the source code. Combined with the bionic free() behavior that it calls abort() whenever it detects corruption, you can infer that the FILE::_bf buffer was corrupted.
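The manual lookup described above can be mechanized. A rough sketch, assuming GNU binutils `objdump -T` output (address in the first column, symbol name in the last; check your objdump's actual columns):

```python
import bisect

def parse_objdump_T(output):
    """Parse `objdump -T libc.so` output into a sorted list of
    (address, symbol) pairs for .text symbols."""
    syms = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and '.text' in parts:
            try:
                addr = int(parts[0], 16)
            except ValueError:
                continue  # header lines, etc.
            syms.append((addr, parts[-1]))
    syms.sort()
    return syms

def nearest_symbol(syms, pc):
    """Map a pc offset (as seen in the logcat '#NN  pc XXXXXXXX' lines)
    to the nearest exported symbol at or below it."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = syms[i]
    return '%s+0x%x' % (name, pc - addr)
```

As in the analysis above, this only finds exported symbols; static functions like _files_getaddrinfo() still have to be inferred from the disassembly and the source.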
I could not reproduce this on my HTC Vision after running the test about 10 times. Here is my setup:

1) HTC Vision running CyanogenMod 7 with debug libc.so
2) latest nightly and latest SUTAgent from http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-central-android-r7
3) talos from http://people.mozilla.org/~jmaher/talos/talos_ADB.zip
4) xampp as host web server
5) phone networked to host through USB tethering
6) same python commands as in comment #5

The ADB way didn't work for me but SUTAgent did. There were talos warnings about failing to validate output, but no unexpected output errors.

Maybe there is something wrong with my setup? Or has this been reproduced on a non-tegra device before?
I have not tried to reproduce this on a non tegra, I can try on a Nexus S to see if I can make it reproduce.
I have a few roms that I can flash on the tegra 200 to see if it makes any difference.
I was not able to reproduce on my Nexus S with adb or sut, but could easily reproduce using the same build and tools on my tegra.
After more testing we found that the crash _is_ caused by getaddrinfo() in bionic, and more specifically by its handling of the /etc/hosts file. It looks to be a thread-unsafe operation in either stdio (likely) or malloc (unlikely).

One workaround is to put locks around getaddrinfo() in NSPR, i.e. only one thread can call it at a time. This works but is ugly.

A simpler workaround, confirmed by jmaher, is to delete /system/etc/hosts on the tegra. (Since /system is initially read-only, it should first be remounted with 'adb remount'.) This works by skipping etc/hosts handling entirely, which is fine for our tegra environments. Obviously, if it were a bug in Fennec we should fix it in Fennec, but at this point the crash looks to be a bug in bionic and only affects the tegras, so this workaround is the better option.
here is some sample code we can integrate into our setup/config scripts before running tests:

import devicemanager

# connect to the device via its SUT agent (ip, port)
dm = devicemanager.DeviceManager('192.168.1.101', 20701)

if dm.fileExists('/system/etc/hosts'):
  # /system is mounted read-only by default, so remount it read-write first
  dm.sendCMD(['exec mount -o remount,rw -t yaffs2 /dev/block/mtdblock3 /system'])
  dm.sendCMD(['exec rm /system/etc/hosts'])
  if dm.fileExists('/system/etc/hosts'):
    print "failed to remove /system/etc/hosts"
  else:
    print "successfully removed hosts file, we can test!!!"
  dm.reboot()

I have tested this many times on a fresh system with just the gainroot.sh and SUT agent installed.
Depends on: 678992
most of the 'depends on' bugs here are incorrect.  I think the only bug this one now depends on is bug 678992.  When that is fixed, we can mark this bug FIXED.
I ran this on the staging tegras and have restarted them.  The staging test master triggers a set of test runs every 15 minutes, so the overnight should produce some good data.
Assignee: nobody → nchen
(In reply to Doug Turner (:dougt) from comment #65)
> most of the depends on bugs here are incorrect.  I think the only bug that
> this bug now depends on is 678992.  When that is fixed, we can mark this bug
> FIXED

Bug 678992 was fixed on 23 Aug. Anything left to do here?
I vote for resolving this bug.
(In reply to Joel Maher (:jmaher) from comment #68)
> I vote for resolving this bug.

Since I'm assigned to it, I vote for resolving as well :)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Yeah, I'm seeing this on recent nightlies. I just had two such crashes in a row in a session with 1 wikipedia tab. New bug?
Yeah. File it?
(In reply to Jim Chen [:jchen] from comment #69)
> Since I'm assigned to it, I vote for resolving as well :)
What is the target milestone because it is still #2 top crasher in 7.0 Beta (#1 in 6.0)?
(In reply to Scoobidiver from comment #73)
> (In reply to Jim Chen [:jchen] from comment #69)
> > Since I'm assigned to it, I vote for resolving as well :)
> What is the target milestone because it is still #2 top crasher in 7.0 Beta
> (#1 in 6.0)?

I'm not sure. We still need to decide on a fix. Filed bug 687367 for that.
Whiteboard: [orange][mobile_unittests][mobile_dev_needed][android_tier_1] → [mobile_unittests][mobile_dev_needed][android_tier_1]