1027802 - Try to get a ANR report and stack trace following Android test hangs

Assignee

Description

•

11 years ago

In bug 960265, we considered but backed away from using "kill SIGABRT" to get a stack trace following detection of a test hang on Android. My experience tracking down a shutdown hang in bug 1020345 convinced me that we should revisit that idea, or otherwise invest in trying to get a stack trace following an Android hang.

Ed Morley [:emorley]

Updated

•

11 years ago

Depends on: 890026

Geoff Brown [:gbrown]

Assignee

Updated

•

11 years ago

Updated

•

11 years ago

Summary: Try to get a stack trace following Android test hangs → Try to get a ANR report and stack trace following Android test hangs

Geoff Brown [:gbrown]

Assignee

Comment 2

•

11 years ago

We want to SIGQUIT (kill -3) to generate an ANR report, then SIGABRT (kill -6) to trigger breadpad -- requested in bug 1029550.

Geoff Brown [:gbrown]

Assignee

Comment 3

•

11 years ago

When my previous patch was reviewed (https://bugzilla.mozilla.org/show_bug.cgi?id=960265#c12) we thought it might be good to leverage devicemanager.killProcess(). killProcess() takes a signal parameter, which is supported for ADB, but not for SUT (added for B2G in bug 931078). For SUT, the issue is that sutagent's kill command does not support a signal argument. Rather than block on a sutagent upgrade, I'm thinking I'll hack around sutagent with a shellCheckOutput command, and open another bug to support the signal parameter in sutagent.

Geoff Brown [:gbrown]

Assignee

Comment 4

•

11 years ago

I pushed an initial patch to try with: [ kill -3, kill -6, kill ]. It consistently produces an ANR report and consistently produces an .dmp file. But frequently (50%?), the dmp file has a bad header signature. This happens even if I [ kill -3, kill -6, sleep 3 seconds, kill ]. https://tbpl.mozilla.org/php/getParsedLog.php?id=42381315&tree=Try&full=1#error0 13:20:36 INFO - mozcrash INFO | Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/gbrown@mozilla.com-a7818fa4ad10/try-android/fennec-33.0a1.en-US.android-arm.crashreporter-symbols.zip 13:20:37 WARNING - PROCESS-CRASH | http://10.12.135.20:30737/tests/layout/reftests/reftest-sanity/test-async.html | application crashed [Unknown top frame] 13:20:37 INFO - Crash dump filename: /tmp/tmpLwEjSY/3a1bd888-2347-0b34-180d56c8-6ccc20aa.dmp 13:20:37 INFO - stderr from minidump_stackwalk: 13:20:37 INFO - 2014-06-24 13:20:37: minidump_processor.cc:264: INFO: Processing minidump in file /tmp/tmpLwEjSY/3a1bd888-2347-0b34-180d56c8-6ccc20aa.dmp 13:20:37 INFO - 2014-06-24 13:20:37: minidump.cc:3815: INFO: Minidump opened minidump /tmp/tmpLwEjSY/3a1bd888-2347-0b34-180d56c8-6ccc20aa.dmp 13:20:37 INFO - 2014-06-24 13:20:37: minidump.cc:3847: ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d 13:20:37 INFO - 2014-06-24 13:20:37: minidump_processor.cc:268: ERROR: Minidump /tmp/tmpLwEjSY/3a1bd888-2347-0b34-180d56c8-6ccc20aa.dmp could not be read 13:20:37 INFO - 2014-06-24 13:20:37: minidump.cc:3787: INFO: Minidump closing minidump 13:20:37 INFO - 2014-06-24 13:20:37: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed 13:20:37 INFO - minidump_stackwalk exited with return code 1 13:20:37 INFO - mozcrash INFO | Saved minidump as /builds/panda-0737/test/build/blobber_upload_dir/3a1bd888-2347-0b34-180d56c8-6ccc20aa.dmp I think we have seen this before, but I don't know the cause. :ted -- Any ideas? Is it worth sleeping longer before the final kill?

Geoff Brown [:gbrown]

Assignee

Updated

•

11 years ago

Flags: needinfo?(ted)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 5

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #4) > I pushed an initial patch to try with: [ kill -3, kill -6, kill ]. It > consistently produces an ANR report and consistently produces an .dmp file. > But frequently (50%?), the dmp file has a bad header signature. This happens > even if I [ kill -3, kill -6, sleep 3 seconds, kill ]. Not sure if it matters here, but I would put a sleep between the 'kill -3' and 'kill -6' too.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 6

•

11 years ago

Ugh, it's really depressing that the ANR output does not have the native stack. I guess they added that in later versions. When are we getting JB/KK emulators up? :)

Geoff Brown [:gbrown]

Assignee

Updated

•

11 years ago

Depends on: 1030342

Geoff Brown [:gbrown]

Assignee

Updated

•

11 years ago

Comment 7

•

11 years ago

(In reply to Geoff Brown [:gbrown] from comment #4) > :ted -- Any ideas? Is it worth sleeping longer before the final kill? Ideally the right behavior would be "kill -6; wait for the process to exit, if that times out kill -9". Maybe that's a pain to do, but we don't know how long it will take to write the minidump and exit cleanly.

Flags: needinfo?(ted)

Geoff Brown [:gbrown]

Assignee

Comment 8

•

11 years ago

Attached patch on hang, try to get Fennec ANR and stack trace — Details — Splinter Review

Following Ted's advice, the approach here is to kill -3 to generate an ANR report, wait 3 seconds, kill -6 to generate a .dmp file, wait up to 15 seconds for the process to end, kill -9 if still running, and finally fall back to the old killProcess behavior if *still* running. Testing on Android 2.3 always produces an ANR and a stack trace, and the process consistently dies quickly following kill -6. Testing on Android 4.0 always produces an ANR and sometimes produces a stack trace and dies quickly following kill -6. Frequently, kill -6 produces a .dmp file with a header signature mismatch (Comment 4) and Fennec does not die until kill -9; in these cases, even waiting for several minutes does not help. Although we don't get a stack trace in these cases, no harm is done and at least we get the ANR report.

Attachment #8450559 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

11 years ago

Comment on attachment 8450559 [details] [diff] [review] on hang, try to get Fennec ANR and stack trace Review of attachment 8450559 [details] [diff] [review]: ----------------------------------------------------------------- a few nits ::: build/mobile/remoteautomation.py @@ +301,5 @@ > + if pid and pid > 0: > + time.sleep(5) > + else: > + return > + retries += 1 if we retry and don't succeed, should we print a warning? @@ +302,5 @@ > + time.sleep(5) > + else: > + return > + retries += 1 > + self.dm.killProcess(self.procName, 9) do we need a 9 here or can we do the old school way? ::: testing/mozbase/mozdevice/mozdevice/devicemanagerSUT.py @@ +539,5 @@ > if sig: > + pid = self.processExist(appname) > + if pid and pid > 0: > + try: > + self.shellCheckOutput(['kill', '-%d' % sig, str(pid)], nit: trailing whitespace @@ +557,5 @@ > + if self.processExist(appname): > + self._runCmds([{ 'cmd': 'kill ' + appname }]) > + return > + except DMError, err: > + retries +=1 nit: retries += 1 <- add a space before the 1

Attachment #8450559 - Flags: review?(jmaher) → review+

Geoff Brown [:gbrown]

Assignee

Comment 10

•

11 years ago

Thanks! I fixed the nits and added a retry warning, but kept the kill -9 -- I am not sure it is needed but I think it is good practice to kill aggressively when faced with a process that is not exiting.

Geoff Brown [:gbrown]

Assignee

Comment 11

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7ba9bfbea36

Carsten Book [:Tomcat]

Comment 12

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/e7ba9bfbea36

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Ed Morley [:emorley]

Comment 13

•

11 years ago

\o/ I'm excited to see what this turns up for existing intermittent failure modes :-)

Bugzilla

Try to get a ANR report and stack trace following Android test hangs

Categories

(Testing :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Updated

Updated

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Attachment

General

Description

File Name

Content Type